timmy-home/the-testament-GENOME.md
Alexander Whitestone dda1e71029
Some checks failed
Smoke Test / smoke (pull_request) Failing after 16s
wip: add the-testament genome analysis for #675
2026-04-14 23:54:45 -04:00


GENOME.md — the-testament

Generated: 2026-04-15
Repo: Timmy_Foundation/the-testament
Analysis issue: timmy-home #675


Project Overview

The Testament is not a conventional software repo and not just a manuscript dump. It is a hybrid publishing system with four layers:

  1. narrative source files
  2. build/packaging pipelines
  3. presentation surfaces
  4. verification/quality gates

At the content layer, the repo holds a five-part novel with 18 chapter manuscripts, front/back matter, character sheets, worldbuilding notes, cover copy, soundtrack notes, and other companion artifacts.

At the software layer, it ships a small publishing toolchain that compiles the manuscript into:

  • combined markdown
  • EPUB
  • HTML
  • PDF
  • web-reader JSON
  • checksum manifest

It also includes:

  • a static promotional/reader website (website/index.html)
  • an interactive companion experience (game/the-door.py / game/the-door.html)
  • audiobook helper scripts (audiobook/)
  • validation and smoke-check automation (scripts/ + .gitea/workflows/)

This makes the repo best understood as a sovereign multimedia book production system centered on a novel.

Runtime-confirmed facts from direct verification:

  • scripts/build-verify.py --json passes and reports 18 chapters
  • the verifier reports ~18,884 manuscript words in chapters and ~19,227 words in concatenated output
  • bash scripts/smoke.sh passes and successfully builds markdown/epub/html
  • python3 build/build.py --md succeeds
  • python3 compile_all.py --check currently crashes due to a qrcode version lookup bug

Quick Facts

Repository composition from direct scan:

  • 18 chapter manuscripts in chapters/
  • top-level content/support directories include:
    • chapters/
    • build/
    • website/
    • audiobook/
    • game/
    • characters/
    • worldbuilding/
    • cover/
    • music/
  • primary code entrypoints are Python scripts plus a static HTML site
  • no dedicated tests/ directory
  • validation is script-driven rather than unit-test-driven

Approximate non-output code inventory from pygount scan:

  • ~3.6K lines of code-equivalent across Python/HTML/CSS/YAML/Bash/JSON
  • code mass is concentrated in:
    • compile_all.py
    • build/build.py
    • compile.py
    • scripts/build-verify.py
    • website/index.html
    • game/the-door.py

Architecture

flowchart TD
    A[chapters/*.md] --> B[compile_markdown]
    C[front-matter.md / build/frontmatter.md] --> B
    D[back-matter.md / build/backmatter.md] --> B
    E[build/metadata.yaml] --> F[pandoc/reportlab packaging]
    G[book-style.css] --> F
    H[cover/cover-art.jpg] --> F

    B --> I[testament-complete.md]
    I --> F

    F --> J[testament.epub]
    F --> K[testament.html]
    F --> L[testament.pdf]

    A --> M[compile_chapters_json / website/build-chapters.py]
    M --> N[website/chapters.json]

    I --> O[generate_manifest]
    J --> O
    K --> O
    L --> O
    N --> O
    O --> P[build-manifest.json]

    A --> Q[scripts/index_generator.py]
    R[characters/*.md] --> Q
    Q --> S[KNOWLEDGE_GRAPH.md]

    A --> T[build/semantic_linker.py]
    T --> U[build/cross_refs.json]

    A --> V[audiobook/extract_text.py]
    V --> W[text excerpts]
    W --> X[audiobook/generate_samples.sh]
    X --> Y[audiobook sample files]
    Y --> Z[audiobook/create_manifest.py]
    Z --> AA[audiobook/manifest.md]

    AB[scripts/build-verify.py] --> A
    AB --> I
    AC[scripts/smoke.sh] --> AB
    AD[.gitea workflows] --> AC

    AE[website/index.html] --> AF[static landing/reading experience]
    AG[game/the-door.py / game/the-door.html] --> AH[interactive companion artifact]

Entry Points

Primary build entrypoint

  1. compile_all.py

This is the canonical unified pipeline. It builds:

  • combined markdown
  • EPUB
  • PDF
  • HTML
  • website/chapters.json
  • build-manifest.json

It also exposes:

  • --check
  • --clean
  • format-specific flags (--md, --epub, --pdf, --html, --json)

Legacy build entrypoints

  1. build/build.py
  2. compile.py

These overlap with the unified pipeline and still work as alternate build surfaces. build/build.py is the more structured legacy path. compile.py is a simpler older compiler that still shells out to scripts/index_generator.py before building.

Verification entrypoints

  1. scripts/build-verify.py
  2. scripts/smoke.sh
  3. .gitea/workflows/build.yml
  4. .gitea/workflows/smoke.yml
  5. .gitea/workflows/validate.yml

These form the repo's test/CI surface. There are no unit tests; these scripts are the executable contract.

Website/content export entrypoints

  1. website/build-chapters.py
  2. website/index.html

build-chapters.py converts chapter markdown into HTML snippets inside website/chapters.json. website/index.html is a large static HTML/CSS/JS page used as the web-facing presentation layer.

Audiobook entrypoints

  1. audiobook/extract_text.py
  2. audiobook/create_manifest.py
  3. audiobook/generate_samples.sh

These scripts support excerpt extraction, sample generation, and audiobook manifest creation.

Companion/interactive entrypoints

  1. game/the-door.py
  2. game/the-door.html

These are sidecar experiences, not part of the core build pipeline, but they are part of the repo architecture.

Knowledge/indexing entrypoints

  1. scripts/index_generator.py
  2. build/semantic_linker.py

These create graph-like auxiliary artifacts from the manuscript corpus.


Data Flow

Main book build flow

chapter markdown + front matter + back matter
    ↓
compile_markdown()
    ↓
combined manuscript: testament-complete.md
    ↓
format-specific compilers
    ├─ pandoc -> EPUB
    ├─ pandoc -> standalone HTML
    ├─ xelatex / weasyprint / reportlab -> PDF
    └─ metadata/css/cover integrated where available
    ↓
optional output hashing
    ↓
build-manifest.json

Website/export flow

chapters/*.md
    ↓
website/build-chapters.py or compile_all.py::compile_chapters_json()
    ↓
extract heading + convert paragraphs/quotes/headings to HTML fragments
    ↓
website/chapters.json

Important nuance:

  • website/chapters.json is produced by the toolchain
  • current website/index.html appears to be a static landing/presentation page
  • no direct fetch('chapters.json') usage was found in the current website HTML

So the JSON output is a generated artifact for a web-reader/export path, but not obviously consumed by the checked-in landing page itself.

Verification flow

chapter files + required support files
    ↓
scripts/build-verify.py
    ├─ count files
    ├─ validate heading format
    ├─ compute word counts
    ├─ check markdown integrity
    ├─ concatenate outputs
    └─ write build-report.json when asked

Knowledge/indexing flow

characters/*.md + chapters/*.md
    ↓
scripts/index_generator.py
    ↓
KNOWLEDGE_GRAPH.md

chapters/*.md
    ↓
build/semantic_linker.py
    ↓
build/cross_refs.json

Audiobook flow

chapter markdown
    ↓
audiobook/extract_text.py
    ↓
trimmed text excerpt
    ↓
audiobook/generate_samples.sh
    ↓
audio sample files
    ↓
audiobook/create_manifest.py
    ↓
audiobook/manifest.md

Key Abstractions

1. Chapter corpus

The core domain object of the repo is the ordered chapter set:

  • chapters/chapter-01.md ... chapters/chapter-18.md
  • exact numbering matters
  • heading format matters
  • concatenation order matters

Almost every script assumes this ordered corpus is the canonical source of truth.

2. Part boundaries (PARTS)

All three build scripts (compile.py, build/build.py, and compile_all.py) define a PARTS mapping. This injects higher-level narrative structure into the build output by adding part headers and descriptions at fixed chapter boundaries.
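
A minimal sketch of how a PARTS-style mapping could inject part headers during concatenation. The mapping contents, the part titles, and the function name assemble are hypothetical placeholders; the real values live in the build scripts and are not reproduced here.

```python
from pathlib import Path

# Hypothetical mapping: first chapter of a part -> (part title, description).
# The real PARTS values are defined in the repo's build scripts.
PARTS = {
    1: ("Part One", "Placeholder description"),
    4: ("Part Two", "Placeholder description"),
}


def assemble(chapters_dir: Path, parts: dict[int, tuple[str, str]]) -> str:
    """Concatenate chapter-XX.md files in order, adding part headers."""
    blocks = []
    # Zero-padded names sort correctly as plain strings.
    for path in sorted(chapters_dir.glob("chapter-*.md")):
        number = int(path.stem.split("-")[1])
        if number in parts:
            title, description = parts[number]
            blocks.append(f"# {title}\n\n*{description}*")
        blocks.append(path.read_text(encoding="utf-8").strip())
    return "\n\n".join(blocks) + "\n"
```

Because the mapping is keyed by chapter number, renumbering chapters silently moves the part boundaries, which is one reason exact numbering matters.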

3. Compiled manuscript

testament-complete.md is the normalized intermediate artifact. It is the manuscript assembly layer from which downstream formats are built.

This is the closest thing the repo has to an internal IR (intermediate representation).

4. Multi-backend packaging

The build system supports multiple packaging backends:

  • pandoc for EPUB and HTML
  • xelatex for PDF when available
  • weasyprint fallback
  • reportlab fallback for fully local pure-Python PDF generation

This is a resilience pattern: the repo prefers multiple production paths rather than a single brittle dependency chain.
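
The fallback order above can be sketched as a small selection function. This is an illustration, not the repo's implementation: it assumes xelatex is detected as an executable on PATH and that weasyprint and reportlab are detected as importable Python modules. The function name pick_pdf_backend is hypothetical.

```python
import importlib.util
import shutil


def pick_pdf_backend(have_tool=shutil.which, have_module=importlib.util.find_spec):
    """Return the first available PDF backend name, or None.

    Order follows the list above: xelatex, then weasyprint, then reportlab.
    Both probes are injectable so the order can be tested without
    installing or removing anything.
    """
    if have_tool("xelatex"):
        return "xelatex"
    if have_module("weasyprint") is not None:
        return "weasyprint"
    if have_module("reportlab") is not None:
        return "reportlab"
    return None
```

Passing the probes as parameters is what makes a "build backend selection" regression test cheap to write.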

5. Manifested outputs

build-manifest.json stores output metadata and SHA256 checksums. That turns built artifacts into auditable objects rather than opaque files.
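
A minimal sketch of manifest generation under stated assumptions: the per-file fields (path, size_bytes, sha256) come from this document, while the top-level "files" key and both function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def manifest_entry(path: Path) -> dict:
    """Describe one built artifact by path, size, and SHA256 digest."""
    data = path.read_bytes()
    return {
        "path": str(path),
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def build_manifest(paths: list[Path]) -> dict:
    """Include only outputs that actually exist on disk."""
    # The "files" wrapper key is an assumption, not the repo's confirmed layout.
    return {"files": [manifest_entry(p) for p in paths if p.is_file()]}


def write_manifest(paths: list[Path], target: Path) -> None:
    target.write_text(json.dumps(build_manifest(paths), indent=2), encoding="utf-8")
```

Skipping missing files keeps the manifest honest when a format was not built, for example when no PDF backend is available.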

6. Verification-as-tests

Because there is no tests/ suite, scripts/build-verify.py is effectively the main automated specification for integrity. It asserts:

  • chapter count
  • naming/ordering
  • heading format
  • word-count sanity
  • markdown integrity
  • concatenation success
  • required support files

7. Companion surfaces

The repo has non-manuscript presentation surfaces:

  • static website
  • interactive game/experience (The Door)
  • audiobook assets and scripts

These make the repo a narrative system, not just a book build.

8. Knowledge graph / semantic linking

The repo contains lightweight symbolic tooling:

  • regex-based character-to-chapter index generation
  • capitalized-phrase cross-reference detection between chapters

This is a GOFAI-like layer over literary content.
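
Both techniques can be illustrated in a few lines. This is a sketch of the general approach, not the code in scripts/index_generator.py or build/semantic_linker.py; the function names and the exact regular expressions are assumptions.

```python
import re
from collections import defaultdict


def character_index(chapters: dict[int, str], names: list[str]) -> dict[str, list[int]]:
    """Map each character name to the chapters that mention it (whole word)."""
    index = {}
    for name in names:
        pattern = re.compile(rf"\b{re.escape(name)}\b")
        index[name] = [n for n, text in sorted(chapters.items()) if pattern.search(text)]
    return index


# Two or more consecutive capitalized words, e.g. a full name or a place.
CAPITALIZED_PHRASE = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")


def shared_phrases(chapters: dict[int, str]) -> dict[str, list[int]]:
    """Return capitalized phrases that occur in more than one chapter."""
    seen = defaultdict(set)
    for number, text in chapters.items():
        for phrase in CAPITALIZED_PHRASE.findall(text):
            seen[phrase].add(number)
    return {p: sorted(ns) for p, ns in seen.items() if len(ns) > 1}
```

A pattern like this also matches ordinary sentence openers such as "The Door", so a common-word filter of the kind mentioned under the test gaps is needed to keep the edges meaningful.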


API Surface

This repo's API surface is mostly CLI-based rather than network-based.

Canonical CLI surface

compile_all.py

Commands:

  • python3 compile_all.py
  • python3 compile_all.py --md
  • python3 compile_all.py --epub
  • python3 compile_all.py --pdf
  • python3 compile_all.py --html
  • python3 compile_all.py --json
  • python3 compile_all.py --check
  • python3 compile_all.py --clean

Outputs:

  • testament-complete.md
  • testament.epub
  • testament.html
  • testament.pdf
  • website/chapters.json
  • build-manifest.json

build/build.py

Commands:

  • python3 build/build.py --md
  • python3 build/build.py --epub
  • python3 build/build.py --pdf
  • python3 build/build.py --html
  • default full build behavior

compile.py

Commands documented:

  • python3 compile.py
  • python3 compile.py --md
  • python3 compile.py --epub
  • python3 compile.py --html
  • python3 compile.py --check

Observed quirk:

  • scripts/smoke.sh calls python3 compile.py --validate
  • no --validate handling exists in source
  • the call still exits 0 because compile.py ignores unknown args and runs its default build path

That is a real contract quirk/drift worth remembering.

scripts/build-verify.py

Commands:

  • python3 scripts/build-verify.py
  • python3 scripts/build-verify.py --ci
  • python3 scripts/build-verify.py --json

Other tooling

  • python3 website/build-chapters.py
  • python3 scripts/index_generator.py
  • python3 build/semantic_linker.py
  • python3 audiobook/extract_text.py <input.md> <output.txt>
  • python3 audiobook/create_manifest.py
  • bash audiobook/generate_samples.sh
  • bash scripts/smoke.sh
  • python3 game/the-door.py

Data contracts

Chapter heading contract

build-verify.py expects each chapter to start with:

  • # Chapter N — Title

File naming contract

  • chapter files must match chapter-XX.md
  • exactly 18 chapters are expected by the verifier
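
The heading and naming contracts can be sketched as two small checks. The regular expressions are inferred from the formats stated above; the function names and the filename-versus-heading number cross-check are assumptions, not confirmed behavior of build-verify.py.

```python
import re

FILENAME_RE = re.compile(r"^chapter-(\d{2})\.md$")
HEADING_RE = re.compile(r"^# Chapter (\d+) — (.+)$")
EXPECTED_CHAPTERS = 18


def check_chapter(filename: str, first_line: str) -> list[str]:
    """Return contract violations for one chapter (empty list if none)."""
    problems = []
    name = FILENAME_RE.match(filename)
    if not name:
        problems.append(f"{filename}: name does not match chapter-XX.md")
    heading = HEADING_RE.match(first_line.strip())
    if not heading:
        problems.append(f"{filename}: first line is not '# Chapter N — Title'")
    # Assumed extra check: the two numbers should agree.
    if name and heading and int(name.group(1)) != int(heading.group(1)):
        problems.append(f"{filename}: heading number differs from filename number")
    return problems


def check_count(filenames: list[str]) -> list[str]:
    """Verify that exactly EXPECTED_CHAPTERS well-named files are present."""
    found = sum(1 for f in filenames if FILENAME_RE.match(f))
    if found == EXPECTED_CHAPTERS:
        return []
    return [f"expected {EXPECTED_CHAPTERS} chapters, found {found}"]
```

Returning a list of problems rather than raising on the first one lets a verifier report every violation in a single run.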

Output manifest contract

build-manifest.json includes, per file:

  • path
  • size_bytes
  • sha256

Website chapters JSON contract

Entries include:

  • number
  • title
  • html
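
A rough sketch of producing one such entry, assuming each chapter opens with a single-line level-one heading and that blocks are separated by blank lines. The conversion rules, the h3 tag choice, and the function name chapter_to_entry are hypothetical; the real transform lives in website/build-chapters.py and compile_all.py.

```python
import html
import re


def chapter_to_entry(number: int, markdown: str) -> dict:
    """Turn one chapter's markdown into a {number, title, html} entry."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", markdown) if b.strip()]
    title = ""
    fragments = []
    for block in blocks:
        if block.startswith("# ") and not title:
            # The first level-one heading becomes the title, not body HTML.
            title = block[2:].strip()
        elif block.startswith("#"):
            text = html.escape(block.lstrip("#").strip(), quote=False)
            fragments.append(f"<h3>{text}</h3>")
        elif block.startswith(">"):
            quote = " ".join(line.lstrip("> ").strip() for line in block.splitlines())
            fragments.append(f"<blockquote>{html.escape(quote, quote=False)}</blockquote>")
        else:
            text = html.escape(" ".join(block.splitlines()), quote=False)
            text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
            fragments.append(f"<p>{text}</p>")
    return {"number": number, "title": title, "html": "\n".join(fragments)}
```

Escaping before applying the italics rule keeps stray angle brackets in the manuscript from becoming markup.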

Test Coverage Gaps

Current state

There is no unit-test suite and no tests/ directory. Coverage is currently provided by:

  • shell smoke checks
  • build verification script
  • CI workflow checks

That means the repo has verification, but not isolated regression tests.

What is already covered by script-based checks

  • chapter count and naming
  • heading format
  • minimum word-count sanity
  • markdown delimiter/link integrity
  • concatenation success
  • required-file existence
  • basic syntax parsing for Python/YAML/shell/JSON
  • secret-pattern grep scanning

Highest-value missing tests

  1. compile_all.py dependency-check behavior

    • there should be a regression test for --check
    • a runtime check already revealed a concrete failure: the installed qrcode module has no __version__ attribute
  2. compile_chapters_json() correctness

    • verify all 18 chapters are emitted
    • verify blockquotes/headings/italics render as expected
    • verify title extraction stays stable
  3. Manifest generation

    • verify build-manifest.json includes every built artifact actually present
    • verify sha256 and size fields are correct
  4. Build backend selection

    • verify fallback order for PDF generation behaves correctly when xelatex/weasyprint/reportlab availability changes
  5. scripts/index_generator.py

    • verify character mention detection and markdown output determinism
  6. build/semantic_linker.py

    • verify the proper-noun extraction and common-word filtering do not produce obviously bad edges
  7. Website/output parity

    • verify website/chapters.json matches chapter headings and ordering from source manuscripts
  8. Companion experience smoke tests

    • game/the-door.py has no automated behavior coverage
    • game/the-door.html has no structural or syntax verification

If this repo gets a tests/ directory, start here:

  1. test_compile_all_check_does_not_crash
  2. test_build_chapters_emits_18_ordered_entries
  3. test_manifest_contains_existing_outputs
  4. test_build_verify_rejects_missing_chapter
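
As a starting point, the first test could be built on a helper like the one below. This is a sketch: the helper names are hypothetical, and it deliberately checks only for an uncaught exception rather than a zero exit code, since this document does not state what exit code --check uses when a dependency is missing.

```python
import subprocess
import sys
from typing import Optional


def run_cli(args: list[str], cwd: Optional[str] = None) -> subprocess.CompletedProcess:
    """Run a Python command line with the current interpreter, capturing output."""
    return subprocess.run(
        [sys.executable, *args], cwd=cwd, capture_output=True, text=True, timeout=120
    )


def crashed(result: subprocess.CompletedProcess) -> bool:
    """True when the process died with an uncaught Python exception."""
    return "Traceback (most recent call last)" in result.stderr


def check_does_not_crash(repo_root: str) -> None:
    """Possible body for test_compile_all_check_does_not_crash."""
    result = run_cli(["compile_all.py", "--check"], cwd=repo_root)
    assert not crashed(result), result.stderr
```

Running the script as a subprocess tests the same surface that CI and contributors use, without importing build code into the test process.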

Security Considerations

1. Shelling out to external toolchains

The build system uses subprocess execution for:

  • pandoc
  • xelatex
  • weasyprint-related flows
  • helper scripts

This is reasonable for a publishing repo, but it means path handling and shell assumptions matter.

2. Remote font dependency in website HTML

website/index.html imports Google Fonts via CSS @import. That means the website is not fully sovereign/local-first at render time. If strict offline/local hosting matters, font bundling would be required.

3. Secret scanning exists, but is grep-based

Both CI and scripts/smoke.sh perform simple pattern scanning. That is better than nothing, but it is heuristic rather than structured secret detection.

4. Artifact integrity is a strength

build-manifest.json with SHA256 hashes is a strong integrity pattern. It gives the repo a lightweight provenance layer for distributables.

5. Build check path currently has a reliability bug

Runtime-confirmed:

  • python3 compile_all.py --check crashes with:
    • AttributeError: module 'qrcode' has no attribute '__version__'

This is not a remote exploit issue, but it is an operational integrity issue because the advertised safe preflight check is not robust.
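
One way to make a version lookup tolerant of modules that do not define __version__ is sketched below. It assumes the preflight check only needs a version string for reporting; how compile_all.py performs the lookup today is not shown in this document, and both function names are hypothetical.

```python
from importlib import metadata
from typing import Optional


def package_version(dist_name: str) -> Optional[str]:
    """Read the version from installed distribution metadata.

    Returns None when the distribution is not installed, instead of raising.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def module_version(module) -> str:
    """Fallback for an already imported module: never raises AttributeError."""
    return getattr(module, "__version__", "unknown")
```

Reading distribution metadata avoids depending on a module-level attribute that packages are free to omit.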

Follow-up issue filed:


Drift / Contradictions

1. README vs runtime word count

README says:

  • ~70,000 word target
  • ~19,000 words drafted

Runtime verification says:

  • ~18,884 words in chapter corpus
  • ~19,227 words in concatenated output

This is close enough to be directionally aligned, but the verifier is the stronger factual source for current draft size.

2. compile_all.py --check is documented but currently broken

Documented behavior:

  • dependency verification

Observed behavior:

  • crashes on qrcode version lookup

3. scripts/smoke.sh depends on undocumented compile.py --validate

  • compile.py docs do not list --validate
  • source contains no explicit --validate path
  • smoke still passes because compile.py ignores unknown flags and performs its default build path

This is a subtle contract mismatch.
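
For contrast, a strict parser turns this kind of drift into a visible failure. The sketch below uses the standard argparse module with the flags documented for compile.py; whether compile.py uses argparse at all is not confirmed here, and build_parser is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser that accepts only the documented compile.py flags."""
    # allow_abbrev=False stops prefixes such as --e from matching --epub.
    parser = argparse.ArgumentParser(prog="compile.py", allow_abbrev=False)
    for flag in ("--md", "--epub", "--html", "--check"):
        parser.add_argument(flag, action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

With this parser, an unrecognized flag such as --validate makes parse_args() print a usage error and exit with status 2, so the smoke script would fail instead of silently running a full build.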

4. website/chapters.json generation is present, but current website landing page does not appear to consume it directly

That suggests either:

  • a future/planned reader path
  • an external consumer
  • or leftover infrastructure from an earlier website design

Practical Mental Model

Think of the-testament as three repos living inside one repository:

  1. the manuscript repo

    • chapters
    • front/back matter
    • worldbuilding
    • character sheets
  2. the publishing pipeline repo

    • compile scripts
    • verification scripts
    • CI workflows
    • manifest generation
  3. the companion media repo

    • website
    • audiobook helpers
    • interactive game experience
    • soundtrack/cover assets

The connective tissue is the manuscript corpus. Almost everything else either:

  • transforms it
  • packages it
  • validates it
  • or re-presents it in another medium

Source Files of Highest Importance

  1. compile_all.py

    • canonical unified pipeline
    • best single source of repo architecture
  2. scripts/build-verify.py

    • real executable quality contract
  3. build/build.py

    • structured legacy builder still in active use
  4. compile.py

    • older build entrypoint still referenced by smoke flow
  5. website/index.html

    • primary web presentation artifact
  6. website/build-chapters.py

    • chapter-to-web JSON transform
  7. build/metadata.yaml

    • publication metadata contract
  8. build/semantic_linker.py

    • symbolic/literary relationship extraction

Recommendations

  1. Make compile_all.py the only documented build entrypoint

    • de-emphasize or retire duplicated legacy flows once parity is confirmed
  2. Add real regression tests around build helpers

    • especially compile_all.py --check
    • chapter JSON generation
    • manifest generation
  3. Clarify the role of website/chapters.json

    • either wire it into the site, document its consumer, or remove the dead path
  4. Fix the undocumented compile.py --validate dependency in smoke

    • either implement the flag or stop invoking it
  5. Decide whether the companion game and website should remain in the same repo or be treated as first-class subprojects with their own tests


Bottom Line

the-testament is a sovereign novel-production repo with a manuscript at the center and a light but real software system around it.

Its architecture is not application-server-centric. It is pipeline-centric:

  • content in
  • validated compilation
  • multi-format outputs
  • integrity metadata
  • companion experiences around the text

The strongest technical asset is the layered publishing pipeline plus manuscript verification. The biggest weakness is the absence of dedicated regression tests around the build system itself.

Source basis for this genome:

  • README and manuscript structure docs
  • direct source inspection of compile_all.py, build/build.py, compile.py, website/audiobook/indexing/verification scripts
  • runtime verification of build and validation commands
  • repo scan of content/build/workflow layout