Timmy_Foundation/timmy-home

Alexander Whitestone dda1e71029

wip: add the-testament genome analysis for #675

2026-04-14 23:54:45 -04:00

GENOME.md — the-testament

Generated: 2026-04-15
Repo: Timmy_Foundation/the-testament
Analysis issue: timmy-home #675

Project Overview

The Testament is not a conventional software repo and not just a manuscript dump. It is a hybrid publishing system with four layers:

narrative source files
build/packaging pipelines
presentation surfaces
verification/quality gates

At the content layer, the repo holds a five-part novel with 18 chapter manuscripts, front/back matter, character sheets, worldbuilding notes, cover copy, soundtrack notes, and other companion artifacts.

At the software layer, it ships a small publishing toolchain that compiles the manuscript into:

combined markdown
EPUB
HTML
PDF
web-reader JSON
checksum manifest

It also includes:

a static promotional/reader website (website/index.html)
an interactive companion experience (game/the-door.py / game/the-door.html)
audiobook helper scripts (audiobook/)
validation and smoke-check automation (scripts/ + .gitea/workflows/)

This makes the repo best understood as a sovereign multimedia book production system centered on a novel.

Runtime-confirmed facts from direct verification:

scripts/build-verify.py --json passes and reports 18 chapters
the verifier reports ~18,884 manuscript words in chapters and ~19,227 words in concatenated output
bash scripts/smoke.sh passes and successfully builds markdown/epub/html
python3 build/build.py --md succeeds
python3 compile_all.py --check currently crashes due to a qrcode version lookup bug

Quick Facts

Repository composition from direct scan:

18 chapter manuscripts in chapters/
top-level content/support directories include:
- chapters/
- build/
- website/
- audiobook/
- game/
- characters/
- worldbuilding/
- cover/
- music/
primary code entrypoints are Python scripts plus a static HTML site
no dedicated tests/ directory
validation is script-driven rather than unit-test-driven

Approximate non-output code inventory from pygount scan:

~3.6K lines of code-equivalent across Python/HTML/CSS/YAML/Bash/JSON
code mass is concentrated in:
- compile_all.py
- build/build.py
- compile.py
- scripts/build-verify.py
- website/index.html
- game/the-door.py

Architecture

flowchart TD
    A[chapters/*.md] --> B[compile_markdown]
    C[front-matter.md / build/frontmatter.md] --> B
    D[back-matter.md / build/backmatter.md] --> B
    E[build/metadata.yaml] --> F[pandoc/reportlab packaging]
    G[book-style.css] --> F
    H[cover/cover-art.jpg] --> F

    B --> I[testament-complete.md]
    I --> F

    F --> J[testament.epub]
    F --> K[testament.html]
    F --> L[testament.pdf]

    A --> M[compile_chapters_json / website/build-chapters.py]
    M --> N[website/chapters.json]

    I --> O[generate_manifest]
    J --> O
    K --> O
    L --> O
    N --> O
    O --> P[build-manifest.json]

    A --> Q[scripts/index_generator.py]
    R[characters/*.md] --> Q
    Q --> S[KNOWLEDGE_GRAPH.md]

    A --> T[build/semantic_linker.py]
    T --> U[build/cross_refs.json]

    A --> V[audiobook/extract_text.py]
    V --> W[text excerpts]
    W --> X[audiobook/generate_samples.sh]
    X --> Y[audiobook sample files]
    Y --> Z[audiobook/create_manifest.py]
    Z --> AA[audiobook/manifest.md]

    AB[scripts/build-verify.py] --> A
    AB --> I
    AC[scripts/smoke.sh] --> AB
    AD[.gitea workflows] --> AC

    AE[website/index.html] --> AF[static landing/reading experience]
    AG[game/the-door.py / game/the-door.html] --> AH[interactive companion artifact]

Entry Points

Primary build entrypoint

compile_all.py

This is the canonical unified pipeline. It builds:

combined markdown
EPUB
PDF
HTML
website/chapters.json
build-manifest.json

It also exposes:

--check
--clean
format-specific flags (--md, --epub, --pdf, --html, --json)

Legacy build entrypoints

build/build.py
compile.py

These overlap with the unified pipeline and still work as alternate build surfaces. build/build.py is the more structured legacy path. compile.py is a simpler older compiler that still shells out to scripts/index_generator.py before building.

Verification entrypoints

scripts/build-verify.py
scripts/smoke.sh
.gitea/workflows/build.yml
.gitea/workflows/smoke.yml
.gitea/workflows/validate.yml

These form the repo’s test/CI surface. There are no unit tests; these scripts are the executable contract.

Website/content export entrypoints

website/build-chapters.py
website/index.html

build-chapters.py converts chapter markdown into HTML snippets inside website/chapters.json. website/index.html is a large static HTML/CSS/JS page used as the web-facing presentation layer.

Audiobook entrypoints

audiobook/extract_text.py
audiobook/create_manifest.py
audiobook/generate_samples.sh

These scripts support excerpt extraction, sample generation, and audiobook manifest creation.

Companion/interactive entrypoints

game/the-door.py
game/the-door.html

These are sidecar experiences, not part of the core build pipeline, but they are part of the repo architecture.

Knowledge/indexing entrypoints

scripts/index_generator.py
build/semantic_linker.py

These create graph-like auxiliary artifacts from the manuscript corpus.

Data Flow

Main book build flow

chapter markdown + front matter + back matter
    ↓
compile_markdown()
    ↓
combined manuscript: testament-complete.md
    ↓
format-specific compilers
    ├─ pandoc -> EPUB
    ├─ pandoc -> standalone HTML
    ├─ xelatex / weasyprint / reportlab -> PDF
    └─ metadata/css/cover integrated where available
    ↓
optional output hashing
    ↓
build-manifest.json

Website/export flow

chapters/*.md
    ↓
website/build-chapters.py or compile_all.py::compile_chapters_json()
    ↓
extract heading + convert paragraphs/quotes/headings to HTML fragments
    ↓
website/chapters.json

Important nuance:

website/chapters.json is produced by the toolchain
current website/index.html appears to be a static landing/presentation page
no direct fetch('chapters.json') usage was found in the current website HTML

So the JSON output is a generated artifact for a web-reader/export path, but not obviously consumed by the checked-in landing page itself.

Verification flow

chapter files + required support files
    ↓
scripts/build-verify.py
    ├─ count files
    ├─ validate heading format
    ├─ compute word counts
    ├─ check markdown integrity
    ├─ concatenate outputs
    └─ write build-report.json when asked

Knowledge graph / semantic link flow

characters/*.md + chapters/*.md
    ↓
scripts/index_generator.py
    ↓
KNOWLEDGE_GRAPH.md

chapters/*.md
    ↓
build/semantic_linker.py
    ↓
build/cross_refs.json

Audiobook flow

chapter markdown
    ↓
audiobook/extract_text.py
    ↓
trimmed text excerpt
    ↓
audiobook/generate_samples.sh
    ↓
audio sample files
    ↓
audiobook/create_manifest.py
    ↓
audiobook/manifest.md

Key Abstractions

1. Chapter corpus

The core domain object of the repo is the ordered chapter set:

chapters/chapter-01.md ... chapters/chapter-18.md
exact numbering matters
heading format matters
concatenation order matters

Almost every script assumes this ordered corpus is the canonical source of truth.

2. Part boundaries (`PARTS`)

compile.py, build/build.py, and compile_all.py each define a PARTS mapping. This injects higher-level narrative structure into the build output by adding part headers and descriptions at fixed chapter boundaries.
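A minimal sketch of how a PARTS-style mapping could inject part headers during concatenation. The part titles and chapter boundaries below are placeholders, not the values used in the repo, and `assemble` is a hypothetical helper name.

```python
# Hypothetical boundaries: first chapter of a part -> part heading.
# The real PARTS values live in the repo's build scripts and are not reproduced here.
PARTS = {
    1: "Part One",
    5: "Part Two",
    9: "Part Three",
}


def assemble(chapters):
    """Join (number, markdown) pairs in numeric order, inserting a part
    header before each chapter that opens a part."""
    pieces = []
    for number, text in sorted(chapters):
        if number in PARTS:
            pieces.append(f"# {PARTS[number]}")
        pieces.append(text.strip())
    return "\n\n".join(pieces) + "\n"
```

Because the mapping is keyed by chapter number, a renumbered chapter silently moves a part boundary, which is one reason the exact numbering matters.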

3. Compiled manuscript

testament-complete.md is the normalized intermediate artifact. It is the manuscript assembly layer from which downstream formats are built.

This is the closest thing the repo has to an internal IR (intermediate representation).

4. Multi-backend packaging

The build system supports multiple packaging backends:

pandoc for EPUB and HTML
xelatex for PDF when available
weasyprint fallback
reportlab fallback for fully local pure-Python PDF generation

This is a resilience pattern: the repo prefers multiple production paths rather than a single brittle dependency chain.
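The fallback idea can be sketched as a small selection function. This assumes the order listed above (xelatex, then weasyprint, then reportlab); `pick_pdf_backend` and its injectable probes are hypothetical names, and the repo's actual selection logic may differ.

```python
import importlib.util
import shutil


def _has_module(name):
    """True when a top-level Python module can be imported."""
    return importlib.util.find_spec(name) is not None


def pick_pdf_backend(has_tool=shutil.which, has_module=_has_module):
    """Return the name of the first available PDF backend, or None.

    The probes are parameters so the order can be exercised in tests
    without installing or uninstalling anything.
    """
    if has_tool("xelatex"):
        return "xelatex"
    for module in ("weasyprint", "reportlab"):
        if has_module(module):
            return module
    return None
```

Injecting the probes makes the fallback order testable on a machine where every backend, or none, is installed.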

5. Manifested outputs

build-manifest.json stores output metadata and SHA256 checksums. That turns built artifacts into auditable objects rather than opaque files.

6. Verification-as-tests

Because there is no tests/ suite, scripts/build-verify.py is effectively the main automated specification for integrity. It asserts:

chapter count
naming/ordering
heading format
word-count sanity
markdown integrity
concatenation success
required support files

7. Companion surfaces

The repo has non-manuscript presentation surfaces:

static website
interactive game/experience (The Door)
audiobook assets and scripts

These make the repo a narrative system, not just a book build.

8. Knowledge graph / semantic linking

The repo contains lightweight symbolic tooling:

regex-based character-to-chapter index generation
capitalized-phrase cross-reference detection between chapters

This is a GOFAI-like layer over literary content.
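A rough sketch of capitalized-phrase cross-referencing, under the assumption that an edge means "two chapters share at least one capitalized phrase". The regex, the `COMMON` stop list, and the function names are illustrative and are not taken from build/semantic_linker.py.

```python
import re
from itertools import combinations

# Illustrative stop list; sentence-initial words would otherwise look like names.
COMMON = {"The", "A", "An", "He", "She", "It", "They", "But", "And", "At", "Chapter"}
PHRASE_RE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def proper_phrases(text):
    """Collect runs of capitalized words, dropping common single words."""
    return {p for p in PHRASE_RE.findall(text) if p not in COMMON}


def cross_refs(chapters):
    """chapters maps chapter number -> text.

    Returns {(a, b): sorted shared phrases} for each chapter pair that
    shares at least one phrase.
    """
    phrases = {n: proper_phrases(t) for n, t in chapters.items()}
    edges = {}
    for a, b in combinations(sorted(phrases), 2):
        shared = phrases[a] & phrases[b]
        if shared:
            edges[(a, b)] = sorted(shared)
    return edges
```

A heuristic like this will still produce false edges from sentence-initial words that are missing from the stop list, which is why filter quality is worth testing.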

API Surface

This repo’s API surface is mostly CLI-based rather than network-based.

Canonical CLI surface

`compile_all.py`

Commands:

python3 compile_all.py
python3 compile_all.py --md
python3 compile_all.py --epub
python3 compile_all.py --pdf
python3 compile_all.py --html
python3 compile_all.py --json
python3 compile_all.py --check
python3 compile_all.py --clean

Outputs:

testament-complete.md
testament.epub
testament.html
testament.pdf
website/chapters.json
build-manifest.json

`build/build.py`

Commands:

python3 build/build.py --md
python3 build/build.py --epub
python3 build/build.py --pdf
python3 build/build.py --html
python3 build/build.py (no flags: default full build)

`compile.py`

Commands documented:

python3 compile.py
python3 compile.py --md
python3 compile.py --epub
python3 compile.py --html
python3 compile.py --check

Observed quirk:

scripts/smoke.sh calls python3 compile.py --validate
no --validate handling exists in source
the script still exits 0 because compile.py ignores unknown args and runs its default build path

That is a real contract quirk/drift worth remembering.
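One way to surface this kind of drift is strict argument parsing. The sketch below uses `argparse` with the flags documented for compile.py; it is not the script's current implementation, and `parse_strict` is a hypothetical helper that swallows the exit only to make the behavior easy to demonstrate.

```python
import argparse


def build_parser():
    """Parser that knows only the documented compile.py flags."""
    parser = argparse.ArgumentParser(prog="compile.py")
    for flag in ("--md", "--epub", "--html", "--check"):
        parser.add_argument(flag, action="store_true")
    return parser


def parse_strict(argv):
    """Return the parsed namespace, or None when argv has an unknown flag.

    argparse reports unknown options and exits with status 2; a real CLI
    would let that exit propagate so callers such as smoke.sh fail loudly.
    """
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        return None
```

With parsing like this, `--validate` would fail the smoke run until the flag is either implemented or removed from the caller.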

`scripts/build-verify.py`

Commands:

python3 scripts/build-verify.py
python3 scripts/build-verify.py --ci
python3 scripts/build-verify.py --json

Other tooling

python3 website/build-chapters.py
python3 scripts/index_generator.py
python3 build/semantic_linker.py
python3 audiobook/extract_text.py <input.md> <output.txt>
python3 audiobook/create_manifest.py
bash audiobook/generate_samples.sh
bash scripts/smoke.sh
python3 game/the-door.py

Data contracts

Chapter heading contract

build-verify.py expects each chapter to start with:

# Chapter N — Title
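The heading contract can be expressed as a single regular expression. This sketch treats the literal form shown above as strict, including the dash character; whether build-verify.py is equally strict is an assumption, and `parse_heading` is a hypothetical name.

```python
import re

# Matches "# Chapter <number> — <title>" exactly as written in the contract.
HEADING_RE = re.compile(r"^# Chapter (\d+) — (.+)$")


def parse_heading(first_line):
    """Return (number, title) for a valid chapter heading, else None."""
    match = HEADING_RE.match(first_line.strip())
    if match is None:
        return None
    return int(match.group(1)), match.group(2).strip()
```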

File naming contract

chapter files must match chapter-XX.md
exactly 18 chapters are expected by the verifier
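A minimal sketch of the naming check, assuming chapters are numbered 01 through 18 with no gaps. `check_chapter_names` is a hypothetical helper, not a function from the verifier.

```python
import re

NAME_RE = re.compile(r"^chapter-(\d{2})\.md$")
EXPECTED = 18  # chapter count the verifier is documented to expect


def check_chapter_names(names, expected=EXPECTED):
    """Return a list of problems; an empty list means the set is valid."""
    problems = []
    numbers = set()
    for name in names:
        match = NAME_RE.match(name)
        if match is None:
            problems.append(f"unexpected file name: {name}")
        else:
            numbers.add(int(match.group(1)))
    wanted = set(range(1, expected + 1))
    missing = sorted(wanted - numbers)
    extra = sorted(numbers - wanted)
    if missing:
        problems.append(f"missing chapters: {missing}")
    if extra:
        problems.append(f"out-of-range chapters: {extra}")
    return problems
```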

Output manifest contract

build-manifest.json includes, per file:

path
size_bytes
sha256
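A sketch of how such entries could be produced with the standard library. The three field names come from the contract above; the top-level `"files"` key and the function names are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def manifest_entry(path):
    """Describe one output file by path, size, and SHA256 digest."""
    data = Path(path).read_bytes()
    return {
        "path": str(path),
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def build_manifest(paths):
    """List only the outputs that actually exist on disk."""
    return {"files": [manifest_entry(p) for p in paths if Path(p).is_file()]}
```

Skipping missing files keeps a partial build (for example, one without a PDF backend) from failing at the manifest step, at the cost of not flagging the absent artifact.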

Website chapters JSON contract

Entries include:

number
title
html
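A simplified sketch of producing one such entry from chapter markdown. It handles only paragraphs, blockquotes, and sub-headings, and it keeps the full heading text as the title; the real title extraction and HTML rules in website/build-chapters.py may differ, and `chapter_entry` is a hypothetical name.

```python
import html
import re


def chapter_entry(number, markdown):
    """Convert one chapter's markdown into a {number, title, html} entry."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", markdown.strip()) if b.strip()]
    has_heading = bool(blocks) and blocks[0].startswith("#")
    title = blocks[0].lstrip("#").strip() if has_heading else f"Chapter {number}"
    body = blocks[1:] if has_heading else blocks

    parts = []
    for block in body:
        if block.startswith(">"):
            # Join quoted lines after dropping the leading "> " markers.
            text = " ".join(line.lstrip("> ").strip() for line in block.splitlines())
            parts.append(f"<blockquote>{html.escape(text)}</blockquote>")
        elif block.startswith("#"):
            parts.append(f"<h3>{html.escape(block.lstrip('#').strip())}</h3>")
        else:
            # Collapse hard-wrapped lines into a single paragraph.
            parts.append(f"<p>{html.escape(' '.join(block.split()))}</p>")
    return {"number": number, "title": title, "html": "\n".join(parts)}
```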

Test Coverage Gaps

Current state

There is no unit-test suite and no tests/ directory. Coverage is currently provided by:

shell smoke checks
build verification script
CI workflow checks

That means the repo has verification, but not isolated regression tests.

What is already covered by script-based checks

chapter count and naming
heading format
minimum word-count sanity
markdown delimiter/link integrity
concatenation success
required-file existence
basic syntax parsing for Python/YAML/shell/JSON
secret-pattern grep scanning

Highest-value missing tests

compile_all.py dependency-check behavior
- there should be a regression test for --check
- current runtime already revealed a concrete failure when qrcode.__version__ is missing
compile_chapters_json() correctness
- verify all 18 chapters are emitted
- verify blockquotes/headings/italics render as expected
- verify title extraction stays stable
Manifest generation
- verify build-manifest.json includes every built artifact actually present
- verify sha256 and size fields are correct
Build backend selection
- verify fallback order for PDF generation behaves correctly when xelatex/weasyprint/reportlab availability changes
scripts/index_generator.py
- verify character mention detection and markdown output determinism
build/semantic_linker.py
- verify the proper-noun extraction and common-word filtering do not produce obviously bad edges
Website/output parity
- verify website/chapters.json matches chapter headings and ordering from source manuscripts
Companion experience smoke tests
- game/the-door.py has no automated behavior coverage
- game/the-door.html has no structural or syntax verification

Recommended first tests

If this repo gets a tests/ directory, start here:

test_compile_all_check_does_not_crash
test_build_chapters_emits_18_ordered_entries
test_manifest_contains_existing_outputs
test_build_verify_rejects_missing_chapter

Security Considerations

1. Shelling out to external toolchains

The build system uses subprocess execution for:

pandoc
xelatex
weasyprint-related flows
helper scripts

This is reasonable for a publishing repo, but it means path handling and shell assumptions matter.
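A sketch of the safer pattern: build the command as an argument list and run it without a shell, so spaces or special characters in paths are never reinterpreted. The pandoc options used here (`-o`, `--metadata-file`, `--css`, `--epub-cover-image`) are standard pandoc flags, but the function names and the exact option set are assumptions, not the repo's invocation.

```python
import subprocess


def pandoc_epub_command(source, output, metadata=None, css=None, cover=None):
    """Build an argv list for an EPUB build; optional inputs are added
    only when provided."""
    cmd = ["pandoc", str(source), "-o", str(output)]
    if metadata:
        cmd.append(f"--metadata-file={metadata}")
    if css:
        cmd.append(f"--css={css}")
    if cover:
        cmd.append(f"--epub-cover-image={cover}")
    return cmd


def run(cmd):
    """Run without a shell; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(cmd, check=True)
```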

2. Remote font dependency in website HTML

website/index.html imports Google Fonts via CSS @import. That means the website is not fully sovereign/local-first at render time. If strict offline/local hosting matters, font bundling would be required.
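A small check along these lines could flag remote stylesheet imports before publishing. The regular expression is a heuristic that covers common `@import` spellings only, and `remote_css_imports` is a hypothetical helper.

```python
import re

# Captures http(s) URLs in "@import url(...)" or "@import '...'" forms.
IMPORT_RE = re.compile(
    r"""@import\s+(?:url\(\s*)?["']?(https?://[^"')\s;]+)""",
    re.IGNORECASE,
)


def remote_css_imports(html_text):
    """Return the remote URLs pulled in via CSS @import."""
    return IMPORT_RE.findall(html_text)
```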

3. Secret scanning exists, but is grep-based

Both CI and scripts/smoke.sh perform simple pattern scanning. That is better than nothing, but it is heuristic rather than structured secret detection.
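To illustrate what pattern-based scanning amounts to, here is a sketch in Python. The two patterns are generic examples and are not the patterns used by the repo's CI or scripts/smoke.sh.

```python
import re

# Illustrative patterns only; not the repo's actual grep patterns.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_text(text):
    """Return (line_number, pattern_name) for every line that matches."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

A scan like this only finds secrets whose shape is known in advance, which is the limitation the paragraph above describes.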

4. Artifact integrity is a strength

build-manifest.json with SHA256 hashes is a strong integrity pattern. It gives the repo a lightweight provenance layer for distributables.

5. Build check path currently has a reliability bug

Runtime-confirmed:

python3 compile_all.py --check crashes with:
- AttributeError: module 'qrcode' has no attribute '__version__'

This is not a remote exploit issue, but it is an operational integrity issue because the advertised safe preflight check is not robust.
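A defensive version lookup avoids this class of crash. The sketch below is one possible approach, not the fix adopted in the repo: it tries the module attribute first, then falls back to installed-package metadata. `safe_version` is a hypothetical name.

```python
import importlib
from importlib import metadata


def safe_version(module_name, dist_name=None):
    """Return a version string, "unknown", or None if the module is absent.

    Never raises just because a module lacks a __version__ attribute.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None  # dependency is not installed
    version = getattr(module, "__version__", None)
    if version is not None:
        return str(version)
    try:
        # Distribution name may differ from the import name.
        return metadata.version(dist_name or module_name)
    except metadata.PackageNotFoundError:
        return "unknown"
```

Returning `None` for a missing module and `"unknown"` for a present module with no discoverable version lets a preflight check tell those two situations apart.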

Follow-up issue filed:

Timmy_Foundation/the-testament#51

Drift / Contradictions

1. README vs runtime word count

README says:

~70,000 word target
~19,000 words drafted

Runtime verification says:

~18,884 words in chapter corpus
~19,227 words in concatenated output

The README's rounded figure agrees with the verifier's counts to within a few hundred words, but the verifier is the stronger factual source for current draft size.

2. `compile_all.py --check` is documented but currently broken

Documented behavior:

dependency verification

Observed behavior:

crashes on qrcode version lookup

3. `scripts/smoke.sh` depends on undocumented `compile.py --validate`

compile.py docs do not list --validate
source contains no explicit --validate path
smoke still passes because compile.py ignores unknown flags and performs its default build path

This is a subtle contract mismatch.

4. `website/chapters.json` generation is present, but current website landing page does not appear to consume it directly

That suggests either:

a future/planned reader path
an external consumer
or leftover infrastructure from an earlier website design

Practical Mental Model

Think of the-testament as three repos living inside one repository:

the manuscript repo
- chapters
- front/back matter
- worldbuilding
- character sheets
the publishing pipeline repo
- compile scripts
- verification scripts
- CI workflows
- manifest generation
the companion media repo
- website
- audiobook helpers
- interactive game experience
- soundtrack/cover assets

The connective tissue is the manuscript corpus. Almost everything else either:

transforms it
packages it
validates it
or re-presents it in another medium

Source Files of Highest Importance

compile_all.py
- canonical unified pipeline
- best single source of repo architecture
scripts/build-verify.py
- real executable quality contract
build/build.py
- structured legacy builder still in active use
compile.py
- older build entrypoint still referenced by smoke flow
website/index.html
- primary web presentation artifact
website/build-chapters.py
- chapter-to-web JSON transform
build/metadata.yaml
- publication metadata contract
build/semantic_linker.py
- symbolic/literary relationship extraction

Recommended Next Refactors

Make compile_all.py the only documented build entrypoint
- de-emphasize or retire duplicated legacy flows once parity is confirmed
Add real regression tests around build helpers
- especially compile_all.py --check
- chapter JSON generation
- manifest generation
Clarify the role of website/chapters.json
- either wire it into the site, document its consumer, or remove the dead path
Fix the undocumented compile.py --validate dependency in smoke
- either implement the flag or stop invoking it
Decide whether the companion game and website should remain in the same repo or be treated as first-class subprojects with their own tests

Bottom Line

the-testament is a sovereign novel-production repo with a manuscript at the center and a light but real software system around it.

Its architecture is not application-server-centric. It is pipeline-centric:

content in
validated compilation
multi-format outputs
integrity metadata
companion experiences around the text

The strongest technical asset is the layered publishing pipeline plus manuscript verification. The biggest weakness is the absence of dedicated regression tests around the build system itself.

Source basis for this genome:

README and manuscript structure docs
direct source inspection of compile_all.py, build/build.py, compile.py, website/audiobook/indexing/verification scripts
runtime verification of build and validation commands
repo scan of content/build/workflow layout
