GENOME.md — the-testament
Generated: 2026-04-15
Repo: Timmy_Foundation/the-testament
Analysis issue: timmy-home #675
Project Overview
The Testament is not a conventional software repo and not just a manuscript dump. It is a hybrid publishing system with four layers:
- narrative source files
- build/packaging pipelines
- presentation surfaces
- verification/quality gates
At the content layer, the repo holds a five-part novel with 18 chapter manuscripts, front/back matter, character sheets, worldbuilding notes, cover copy, soundtrack notes, and other companion artifacts.
At the software layer, it ships a small publishing toolchain that compiles the manuscript into:
- combined markdown
- EPUB
- HTML
- web-reader JSON
- checksum manifest
It also includes:
- a static promotional/reader website (website/index.html)
- an interactive companion experience (game/the-door.py / game/the-door.html)
- audiobook helper scripts (audiobook/)
- validation and smoke-check automation (scripts/ + .gitea/workflows/)
This makes the repo best understood as a sovereign multimedia book production system centered on a novel.
Runtime-confirmed facts from direct verification:
- scripts/build-verify.py --json passes and reports 18 chapters
- the verifier reports ~18,884 manuscript words in chapters and ~19,227 words in concatenated output
- bash scripts/smoke.sh passes and successfully builds markdown/epub/html
- python3 build/build.py --md succeeds
- python3 compile_all.py --check currently crashes due to a qrcode version lookup bug
Quick Facts
Repository composition from direct scan:
- 18 chapter manuscripts in chapters/
- top-level content/support directories include:
  - chapters/
  - build/
  - website/
  - audiobook/
  - game/
  - characters/
  - worldbuilding/
  - cover/
  - music/
- primary code entrypoints are Python scripts plus a static HTML site
- no dedicated tests/ directory
- validation is script-driven rather than unit-test-driven
Approximate non-output code inventory from pygount scan:
- ~3.6K lines of code-equivalent across Python/HTML/CSS/YAML/Bash/JSON
- code mass is concentrated in:
  - compile_all.py
  - build/build.py
  - compile.py
  - scripts/build-verify.py
  - website/index.html
  - game/the-door.py
Architecture
flowchart TD
A[chapters/*.md] --> B[compile_markdown]
C[front-matter.md / build/frontmatter.md] --> B
D[back-matter.md / build/backmatter.md] --> B
E[build/metadata.yaml] --> F[pandoc/reportlab packaging]
G[book-style.css] --> F
H[cover/cover-art.jpg] --> F
B --> I[testament-complete.md]
I --> F
F --> J[testament.epub]
F --> K[testament.html]
F --> L[testament.pdf]
A --> M[compile_chapters_json / website/build-chapters.py]
M --> N[website/chapters.json]
I --> O[generate_manifest]
J --> O
K --> O
L --> O
N --> O
O --> P[build-manifest.json]
A --> Q[scripts/index_generator.py]
R[characters/*.md] --> Q
Q --> S[KNOWLEDGE_GRAPH.md]
A --> T[build/semantic_linker.py]
T --> U[build/cross_refs.json]
A --> V[audiobook/extract_text.py]
V --> W[text excerpts]
W --> X[audiobook/generate_samples.sh]
X --> Y[audiobook sample files]
Y --> Z[audiobook/create_manifest.py]
Z --> AA[audiobook/manifest.md]
AB[scripts/build-verify.py] --> A
AB --> I
AC[scripts/smoke.sh] --> AB
AD[.gitea workflows] --> AC
AE[website/index.html] --> AF[static landing/reading experience]
AG[game/the-door.py / game/the-door.html] --> AH[interactive companion artifact]
Entry Points
Primary build entrypoint
compile_all.py
This is the canonical unified pipeline. It builds:
- combined markdown
- EPUB
- HTML
- website/chapters.json
- build-manifest.json
It also exposes:
- --check
- --clean
- format-specific flags (--md, --epub, --pdf, --html, --json)
Legacy build entrypoints
- build/build.py
- compile.py
These overlap with the unified pipeline and still work as alternate build surfaces.
build/build.py is the more structured legacy path.
compile.py is a simpler older compiler that still shells out to scripts/index_generator.py before building.
Verification entrypoints
- scripts/build-verify.py
- scripts/smoke.sh
- .gitea/workflows/build.yml
- .gitea/workflows/smoke.yml
- .gitea/workflows/validate.yml
These form the repo’s test/CI surface. There are no unit tests; these scripts are the executable contract.
Website/content export entrypoints
- website/build-chapters.py
- website/index.html
build-chapters.py converts chapter markdown into HTML snippets inside website/chapters.json.
website/index.html is a large static HTML/CSS/JS page used as the web-facing presentation layer.
Audiobook entrypoints
- audiobook/extract_text.py
- audiobook/create_manifest.py
- audiobook/generate_samples.sh
These scripts support excerpt extraction, sample generation, and audiobook manifest creation.
Companion/interactive entrypoints
- game/the-door.py
- game/the-door.html
These are sidecar experiences, not part of the core build pipeline, but they are part of the repo architecture.
Knowledge/indexing entrypoints
- scripts/index_generator.py
- build/semantic_linker.py
These create graph-like auxiliary artifacts from the manuscript corpus.
Data Flow
Main book build flow
chapter markdown + front matter + back matter
↓
compile_markdown()
↓
combined manuscript: testament-complete.md
↓
format-specific compilers
├─ pandoc -> EPUB
├─ pandoc -> standalone HTML
├─ xelatex / weasyprint / reportlab -> PDF
└─ metadata/css/cover integrated where available
↓
optional output hashing
↓
build-manifest.json
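The EPUB and HTML branches of this flow can be sketched as plain pandoc argument lists. This is a minimal sketch, not the repo's actual invocation: the helper name pandoc_command is hypothetical, and the asset paths are taken from the architecture diagram rather than from the build scripts. Optional assets are only added when the files exist, which is one reading of "integrated where available".

```python
from pathlib import Path


def pandoc_command(source: str, output: str, root: Path = Path(".")) -> list[str]:
    """Build a pandoc argv list for an EPUB or standalone HTML output.

    Hypothetical helper: the asset paths mirror the architecture diagram,
    not the real build scripts.
    """
    cmd = ["pandoc", source, "-o", output, "--standalone"]

    # Publication metadata (title, author, ...) if the file is present.
    metadata = root / "build" / "metadata.yaml"
    if metadata.exists():
        cmd.append(f"--metadata-file={metadata}")

    # Shared stylesheet for EPUB and HTML.
    css = root / "book-style.css"
    if css.exists():
        cmd.append(f"--css={css}")

    # The cover image option only applies to EPUB output.
    cover = root / "cover" / "cover-art.jpg"
    if output.endswith(".epub") and cover.exists():
        cmd.append(f"--epub-cover-image={cover}")

    return cmd
```

The returned list can be passed to subprocess.run without a shell, which avoids quoting problems with paths.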
Website/export flow
chapters/*.md
↓
website/build-chapters.py or compile_all.py::compile_chapters_json()
↓
extract heading + convert paragraphs/quotes/headings to HTML fragments
↓
website/chapters.json
Important nuance:
- website/chapters.json is produced by the toolchain
- current website/index.html appears to be a static landing/presentation page
- no direct fetch('chapters.json') usage was found in the current website HTML
So the JSON output is a generated artifact for a web-reader/export path, but not obviously consumed by the checked-in landing page itself.
Verification flow
chapter files + required support files
↓
scripts/build-verify.py
├─ count files
├─ validate heading format
├─ compute word counts
├─ check markdown integrity
├─ concatenate outputs
└─ write build-report.json when asked
Knowledge graph / semantic link flow
characters/*.md + chapters/*.md
↓
scripts/index_generator.py
↓
KNOWLEDGE_GRAPH.md
chapters/*.md
↓
build/semantic_linker.py
↓
build/cross_refs.json
Audiobook flow
chapter markdown
↓
audiobook/extract_text.py
↓
trimmed text excerpt
↓
audiobook/generate_samples.sh
↓
audio sample files
↓
audiobook/create_manifest.py
↓
audiobook/manifest.md
Key Abstractions
1. Chapter corpus
The core domain object of the repo is the ordered chapter set:
- chapters/chapter-01.md ... chapters/chapter-18.md
- exact numbering matters
- heading format matters
- concatenation order matters
Almost every script assumes this ordered corpus is the canonical source of truth.
2. Part boundaries (PARTS)
compile.py, build/build.py, and compile_all.py each define a PARTS mapping.
This injects higher-level narrative structure into the build output by adding part headers and descriptions at fixed chapter boundaries.
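A minimal sketch of how such a mapping can drive assembly, assuming PARTS maps the first chapter number of each part to a title and description. The boundaries, titles, and the assemble helper below are placeholders for illustration; the real values live in the build scripts and are not reproduced here.

```python
# Hypothetical part boundaries: first chapter of a part -> (title, description).
PARTS = {
    1: ("Part One", "placeholder description"),
    5: ("Part Two", "placeholder description"),
}


def assemble(chapters: dict[int, str], parts: dict[int, tuple[str, str]]) -> str:
    """Concatenate chapters in numeric order, inserting part headers."""
    blocks = []
    for number in sorted(chapters):
        if number in parts:
            title, description = parts[number]
            blocks.append(f"# {title}\n\n{description}")
        blocks.append(chapters[number].strip())
    return "\n\n".join(blocks) + "\n"
```

Because three scripts carry their own copy of the mapping, a shared module would be one way to keep the boundaries from drifting apart.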
3. Compiled manuscript
testament-complete.md is the normalized intermediate artifact.
It is the manuscript assembly layer from which downstream formats are built.
This is the closest thing the repo has to an internal IR (intermediate representation).
4. Multi-backend packaging
The build system supports multiple packaging backends:
- pandoc for EPUB and HTML
- xelatex for PDF when available
- weasyprint fallback
- reportlab fallback for fully local pure-Python PDF generation
This is a resilience pattern: the repo prefers multiple production paths rather than a single brittle dependency chain.
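The fallback order can be sketched as a small selection function. This is a sketch under assumptions: the names detect_backends and pick_pdf_backend are hypothetical, and detecting weasyprint and reportlab as importable modules is a guess about how the repo uses them.

```python
import importlib.util
import shutil
from typing import Optional

# Preference order as described above: xelatex, then weasyprint, then reportlab.
PDF_BACKENDS = ("xelatex", "weasyprint", "reportlab")


def detect_backends() -> set[str]:
    """Return the PDF backends that appear usable on this machine."""
    found = set()
    if shutil.which("xelatex"):
        found.add("xelatex")
    for module in ("weasyprint", "reportlab"):
        if importlib.util.find_spec(module) is not None:
            found.add(module)
    return found


def pick_pdf_backend(available: set[str]) -> Optional[str]:
    """Pick the first preferred backend that is available, or None."""
    for backend in PDF_BACKENDS:
        if backend in available:
            return backend
    return None
```

Keeping detection separate from selection means the ordering can be tested without installing or removing any toolchain.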
5. Manifested outputs
build-manifest.json stores output metadata and SHA256 checksums.
That turns built artifacts into auditable objects rather than opaque files.
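A minimal sketch of producing entries with the path, size_bytes, and sha256 fields named in this document. The function names and the top-level "files" key are assumptions; the actual layout of build-manifest.json may differ.

```python
import hashlib
from pathlib import Path


def manifest_entry(path: Path) -> dict:
    """Describe one built artifact by path, size, and SHA256 digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large outputs are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return {
        "path": str(path),
        "size_bytes": path.stat().st_size,
        "sha256": digest.hexdigest(),
    }


def build_manifest(paths: list[Path]) -> dict:
    """List only outputs that exist; the "files" key is an assumed layout."""
    return {"files": [manifest_entry(p) for p in paths if p.is_file()]}
```

Serializing the result with json.dumps(manifest, indent=2, sort_keys=True) would keep the file stable across runs when the artifacts are unchanged.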
6. Verification-as-tests
Because there is no tests/ suite, scripts/build-verify.py is effectively the main automated specification for integrity.
It asserts:
- chapter count
- naming/ordering
- heading format
- word-count sanity
- markdown integrity
- concatenation success
- required support files
7. Companion surfaces
The repo has non-manuscript presentation surfaces:
- static website
- interactive game/experience (The Door)
- audiobook assets and scripts
These make the repo a narrative system, not just a book build.
8. Knowledge graph / semantic linking
The repo contains lightweight symbolic tooling:
- regex-based character-to-chapter index generation
- capitalized-phrase cross-reference detection between chapters
This is a GOFAI-like layer over literary content.
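Capitalized-phrase cross-referencing can be sketched as follows. The regex, the common-word list, and the shared_phrases helper are illustrative assumptions, not the logic in build/semantic_linker.py.

```python
import re
from collections import defaultdict

# Runs of capitalized words, e.g. "The Door". Assumed pattern, not the repo's.
PHRASE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")

# Sentence-initial words that would otherwise create noisy links (assumed list).
COMMON = {"The", "He", "She", "It", "And", "But"}


def shared_phrases(chapters: dict[int, str]) -> dict[str, list[int]]:
    """Map each capitalized phrase to the chapters it appears in (2 or more)."""
    seen = defaultdict(set)
    for number, text in chapters.items():
        for phrase in PHRASE.findall(text):
            if phrase not in COMMON:
                seen[phrase].add(number)
    return {p: sorted(nums) for p, nums in seen.items() if len(nums) > 1}
```

With made-up sample text, shared_phrases({1: "Ada walked to the bridge. The rain fell.", 2: "Later, Ada slept."}) links "Ada" to chapters 1 and 2 and drops "The" and "Later".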
API Surface
This repo’s API surface is mostly CLI-based rather than network-based.
Canonical CLI surface
compile_all.py
Commands:
- python3 compile_all.py
- python3 compile_all.py --md
- python3 compile_all.py --epub
- python3 compile_all.py --pdf
- python3 compile_all.py --html
- python3 compile_all.py --json
- python3 compile_all.py --check
- python3 compile_all.py --clean
Outputs:
- testament-complete.md
- testament.epub
- testament.html
- testament.pdf
- website/chapters.json
- build-manifest.json
build/build.py
Commands:
- python3 build/build.py --md
- python3 build/build.py --epub
- python3 build/build.py --pdf
- python3 build/build.py --html
- default full build behavior
compile.py
Commands documented:
- python3 compile.py
- python3 compile.py --md
- python3 compile.py --epub
- python3 compile.py --html
- python3 compile.py --check
Observed quirk:
- scripts/smoke.sh calls python3 compile.py --validate
- no --validate handling exists in source
- the script still exits 0 because compile.py ignores unknown args and runs its default build path
That is a real contract quirk/drift worth remembering.
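One way to make this kind of drift visible is to parse flags strictly, so an unknown flag fails instead of falling through to the default build. The sketch below uses argparse with the documented compile.py flags. Whether compile.py parses arguments this way is not confirmed, so treat build_parser as a hypothetical replacement.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser for the documented compile.py flags (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="compile.py")
    for flag in ("--md", "--epub", "--html", "--check"):
        parser.add_argument(flag, action="store_true")
    return parser


if __name__ == "__main__":
    # An unknown flag such as --validate makes argparse print a usage error
    # and exit with status 2 instead of silently running the default build.
    args = build_parser().parse_args()
    print(vars(args))
```

With strict parsing, scripts/smoke.sh would fail loudly on --validate until the flag is either implemented or removed from the smoke script.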
scripts/build-verify.py
Commands:
- python3 scripts/build-verify.py
- python3 scripts/build-verify.py --ci
- python3 scripts/build-verify.py --json
Other tooling
- python3 website/build-chapters.py
- python3 scripts/index_generator.py
- python3 build/semantic_linker.py
- python3 audiobook/extract_text.py <input.md> <output.txt>
- python3 audiobook/create_manifest.py
- bash audiobook/generate_samples.sh
- bash scripts/smoke.sh
- python3 game/the-door.py
Data contracts
Chapter heading contract
build-verify.py expects each chapter to start with:
# Chapter N — Title
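The heading contract can be expressed as a single regular expression. This is a sketch, assuming the separator is exactly one em dash surrounded by spaces; the verifier's actual pattern may be looser or stricter, and parse_heading is a hypothetical name.

```python
import re

# "# Chapter N <em dash> Title"; \u2014 is the em dash used in the contract.
HEADING = re.compile(r"^# Chapter (\d+) \u2014 (\S.*)$")


def parse_heading(first_line: str):
    """Return (chapter_number, title) or None if the line breaks the contract."""
    match = HEADING.match(first_line.rstrip())
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```

Returning the parsed number makes it easy to also check that the heading agrees with the chapter number in the file name.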
File naming contract
- chapter files must match chapter-XX.md
- exactly 18 chapters are expected by the verifier
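A minimal sketch of checking the naming and count contract against a directory. The helper name chapter_problems is hypothetical, and treating any other .md file in chapters/ as a problem is an assumption; the verifier may ignore such files.

```python
import re
from pathlib import Path

CHAPTER_FILE = re.compile(r"^chapter-(\d{2})\.md$")
EXPECTED_CHAPTERS = 18  # count expected by the verifier per this document


def chapter_problems(chapters_dir: Path, expected: int = EXPECTED_CHAPTERS) -> list[str]:
    """Return human-readable contract violations; an empty list means OK."""
    numbers = []
    problems = []
    for path in sorted(chapters_dir.glob("*.md")):
        match = CHAPTER_FILE.match(path.name)
        if match is None:
            problems.append(f"unexpected file name: {path.name}")
        else:
            numbers.append(int(match.group(1)))
    # Every number from 1 to `expected` must be present.
    missing = sorted(set(range(1, expected + 1)) - set(numbers))
    if missing:
        problems.append(f"missing chapters: {missing}")
    return problems
```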
Output manifest contract
build-manifest.json includes, per file:
- path
- size_bytes
- sha256
Website chapters JSON contract
Entries include:
- number
- title
- html
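An entry with these three fields can be sketched as below. This is an illustration only: chapter_entry is a hypothetical helper, the choice of h3 for in-chapter headings is an assumption, and inline markup such as italics is not handled, so it is not a substitute for website/build-chapters.py or compile_chapters_json().

```python
import html
import re


def chapter_entry(number: int, markdown: str) -> dict:
    """Turn one chapter's markdown into a {number, title, html} entry."""
    lines = markdown.strip().splitlines()
    # The first line is the chapter heading; drop the leading "# ".
    title = lines[0].lstrip("# ").strip() if lines else ""

    fragments = []
    body = "\n".join(lines[1:]).strip()
    for block in re.split(r"\n\s*\n", body):
        block_lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not block_lines:
            continue
        if all(ln.startswith(">") for ln in block_lines):
            text = " ".join(ln.lstrip(">").strip() for ln in block_lines)
            fragments.append(f"<blockquote>{html.escape(text)}</blockquote>")
        elif block_lines[0].startswith("#"):
            text = block_lines[0].lstrip("# ").strip()
            fragments.append(f"<h3>{html.escape(text)}</h3>")
        else:
            text = " ".join(block_lines)
            fragments.append(f"<p>{html.escape(text)}</p>")

    return {"number": number, "title": title, "html": "\n".join(fragments)}
```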
Test Coverage Gaps
Current state
There is no unit-test suite and no tests/ directory.
Coverage is currently provided by:
- shell smoke checks
- build verification script
- CI workflow checks
That means the repo has verification, but not isolated regression tests.
What is already covered by script-based checks
- chapter count and naming
- heading format
- minimum word-count sanity
- markdown delimiter/link integrity
- concatenation success
- required-file existence
- basic syntax parsing for Python/YAML/shell/JSON
- secret-pattern grep scanning
Highest-value missing tests
- compile_all.py dependency-check behavior
  - there should be a regression test for --check
  - current runtime already revealed a concrete failure when qrcode.__version__ is missing
- compile_chapters_json() correctness
  - verify all 18 chapters are emitted
  - verify blockquotes/headings/italics render as expected
  - verify title extraction stays stable
- Manifest generation
  - verify build-manifest.json includes every built artifact actually present
  - verify sha256 and size fields are correct
- Build backend selection
  - verify fallback order for PDF generation behaves correctly when xelatex/weasyprint/reportlab availability changes
- scripts/index_generator.py
  - verify character mention detection and markdown output determinism
- build/semantic_linker.py
  - verify the proper-noun extraction and common-word filtering do not produce obviously bad edges
- Website/output parity
  - verify website/chapters.json matches chapter headings and ordering from source manuscripts
- Companion experience smoke tests
  - game/the-door.py has no automated behavior coverage
  - game/the-door.html has no structural or syntax verification
Recommended first tests
If this repo gets a tests/ directory, start here:
- test_compile_all_check_does_not_crash
- test_build_chapters_emits_18_ordered_entries
- test_manifest_contains_existing_outputs
- test_build_verify_rejects_missing_chapter
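The first of these tests could be built on a small subprocess helper. The sketch below is an assumption about how such a test might look; run_cli and crashed are hypothetical names, and the repository root would need to be supplied by the test setup.

```python
import subprocess
import sys


def run_cli(args: list[str], cwd: str = ".") -> subprocess.CompletedProcess:
    """Run a Python CLI with the current interpreter and capture its output."""
    return subprocess.run(
        [sys.executable, *args], cwd=cwd, capture_output=True, text=True
    )


def crashed(result: subprocess.CompletedProcess) -> bool:
    """A Python traceback on stderr counts as a crash, whatever the exit code."""
    return "Traceback (most recent call last)" in result.stderr
```

A pytest test named test_compile_all_check_does_not_crash could then assert not crashed(run_cli(["compile_all.py", "--check"], cwd=repo_root)), where repo_root is a fixture pointing at a checkout. A missing dependency may still produce a nonzero exit, but it should not produce a traceback.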
Security Considerations
1. Shelling out to external toolchains
The build system uses subprocess execution for:
- pandoc
- xelatex
- weasyprint-related flows
- helper scripts
This is reasonable for a publishing repo, but it means path handling and shell assumptions matter.
2. Remote font dependency in website HTML
website/index.html imports Google Fonts via CSS @import.
That means the website is not fully sovereign/local-first at render time.
If strict offline/local hosting matters, font bundling would be required.
3. Secret scanning exists, but is grep-based
Both CI and scripts/smoke.sh perform simple pattern scanning.
That is better than nothing, but it is heuristic rather than structured secret detection.
4. Artifact integrity is a strength
build-manifest.json with SHA256 hashes is a strong integrity pattern.
It gives the repo a lightweight provenance layer for distributables.
5. Build check path currently has a reliability bug
Runtime-confirmed:
- python3 compile_all.py --check crashes with: AttributeError: module 'qrcode' has no attribute '__version__'
This is not a remote exploit issue, but it is an operational integrity issue because the advertised safe preflight check is not robust.
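A defensive version lookup avoids this class of failure. The sketch below is one possible fix, not the change tracked in the follow-up issue: module_version is a hypothetical helper that tries the module attribute first, then falls back to installed package metadata.

```python
import importlib
from importlib import metadata
from typing import Optional


def module_version(module_name: str, dist_name: Optional[str] = None) -> Optional[str]:
    """Return a version string, "unknown", or None if the module is missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None  # dependency not installed at all

    # Not every package defines __version__, so never access it directly.
    version = getattr(module, "__version__", None)
    if version is not None:
        return str(version)

    # Fall back to the installed distribution's metadata.
    try:
        return metadata.version(dist_name or module_name)
    except metadata.PackageNotFoundError:
        return "unknown"
```

A preflight check built on this can report "installed, version unknown" for a package without __version__ instead of raising AttributeError.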
Follow-up issue filed:
- the-testament #51
- Timmy_Foundation/the-testament#51
Drift / Contradictions
1. README vs runtime word count
README says:
- ~70,000 word target
- ~19,000 words drafted
Runtime verification says:
- ~18,884 words in chapter corpus
- ~19,227 words in concatenated output
This is close enough to be directionally aligned, but the verifier is the stronger factual source for current draft size.
2. compile_all.py --check is documented but currently broken
Documented behavior:
- dependency verification
Observed behavior:
- crashes on qrcode version lookup
3. scripts/smoke.sh depends on undocumented compile.py --validate
- compile.py docs do not list --validate
- source contains no explicit --validate path
- smoke still passes because the script ignores unknown flags and performs its default build path
This is a subtle contract mismatch.
4. website/chapters.json generation is present, but current website landing page does not appear to consume it directly
That suggests either:
- a future/planned reader path
- an external consumer
- or leftover infrastructure from an earlier website design
Practical Mental Model
Think of the-testament as three repos living inside one repository:
- the manuscript repo
  - chapters
  - front/back matter
  - worldbuilding
  - character sheets
- the publishing pipeline repo
  - compile scripts
  - verification scripts
  - CI workflows
  - manifest generation
- the companion media repo
  - website
  - audiobook helpers
  - interactive game experience
  - soundtrack/cover assets
The connective tissue is the manuscript corpus. Almost everything else either:
- transforms it
- packages it
- validates it
- or re-presents it in another medium
Source Files of Highest Importance
- compile_all.py
  - canonical unified pipeline
  - best single source of repo architecture
- scripts/build-verify.py
  - real executable quality contract
- build/build.py
  - structured legacy builder still in active use
- compile.py
  - older build entrypoint still referenced by smoke flow
- website/index.html
  - primary web presentation artifact
- website/build-chapters.py
  - chapter-to-web JSON transform
- build/metadata.yaml
  - publication metadata contract
- build/semantic_linker.py
  - symbolic/literary relationship extraction
Recommended Next Refactors
- Make compile_all.py the only documented build entrypoint
  - de-emphasize or retire duplicated legacy flows once parity is confirmed
- Add real regression tests around build helpers
  - especially compile_all.py --check
  - chapter JSON generation
  - manifest generation
- Clarify the role of website/chapters.json
  - either wire it into the site, document its consumer, or remove the dead path
- Fix the undocumented compile.py --validate dependency in smoke
  - either implement the flag or stop invoking it
- Decide whether the companion game and website should remain in the same repo or be treated as first-class subprojects with their own tests
Bottom Line
the-testament is a sovereign novel-production repo with a manuscript at the center and a light but real software system around it.
Its architecture is not application-server-centric. It is pipeline-centric:
- content in
- validated compilation
- multi-format outputs
- integrity metadata
- companion experiences around the text
The strongest technical asset is the layered publishing pipeline plus manuscript verification. The biggest weakness is the absence of dedicated regression tests around the build system itself.
Source basis for this genome:
- README and manuscript structure docs
- direct source inspection of compile_all.py, build/build.py, compile.py, and the website/audiobook/indexing/verification scripts
- runtime verification of build and validation commands
- repo scan of content/build/workflow layout