timmy-home/docs/KNOW_THY_FATHER_MULTIMODAL_PIPELINE.md
Alexander Whitestone 89dfa1e5de
feat: add Know Thy Father epic orchestrator (#582)
2026-04-15 01:52:58 -04:00

Know Thy Father — Multimodal Media Consumption Pipeline

Refs #582

This document makes the epic operational by naming the current source-of-truth scripts, their handoff artifacts, and the one-command runner that coordinates them.

Why this exists

The epic is already decomposed into four implemented phases plus a processing-log report, but the implementation truth is split across two script roots and a tracker that lives beside the archive data:

  • scripts/know_thy_father/ owns Phases 1, 3, and 4
  • scripts/twitter_archive/analyze_media.py owns Phase 2
  • twitter-archive/know-thy-father/tracker.py report owns the operator-facing status rollup

The new runner scripts/know_thy_father/epic_pipeline.py does not replace those scripts. It stitches them together into one explicit, reviewable plan.

Phase map

| Phase | Script | Primary output |
|---|---|---|
| 1. Media Indexing | scripts/know_thy_father/index_media.py | twitter-archive/know-thy-father/media_manifest.jsonl |
| 2. Multimodal Analysis | scripts/twitter_archive/analyze_media.py --batch 10 | twitter-archive/know-thy-father/analysis.jsonl + meaning-kernels.jsonl + pipeline-status.json |
| 3. Holographic Synthesis | scripts/know_thy_father/synthesize_kernels.py | twitter-archive/knowledge/fathers_ledger.jsonl |
| 4. Cross-Reference Audit | scripts/know_thy_father/crossref_audit.py | twitter-archive/notes/crossref_report.md |
| 5. Processing Log | twitter-archive/know-thy-father/tracker.py report | twitter-archive/know-thy-father/REPORT.md |
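
The phase map above can also be held as plain data, which is roughly what any runner needs before it can print a plan. The following is a minimal sketch and not the contents of epic_pipeline.py: the `Phase` dataclass, the `PHASES` list, and `render_plan` are hypothetical names introduced for illustration, and only the script and output paths are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """One row of the phase map (hypothetical structure, for illustration)."""
    number: int
    name: str
    command: str    # script plus any fixed arguments, as listed in the table
    outputs: tuple  # primary output paths as written in the table


ARCHIVE = "twitter-archive/know-thy-father"

PHASES = [
    Phase(1, "Media Indexing",
          "scripts/know_thy_father/index_media.py",
          (f"{ARCHIVE}/media_manifest.jsonl",)),
    Phase(2, "Multimodal Analysis",
          "scripts/twitter_archive/analyze_media.py --batch 10",
          # Only the first path is given in full in the table; the other two
          # are kept as bare file names rather than guessing their directory.
          (f"{ARCHIVE}/analysis.jsonl", "meaning-kernels.jsonl",
           "pipeline-status.json")),
    Phase(3, "Holographic Synthesis",
          "scripts/know_thy_father/synthesize_kernels.py",
          ("twitter-archive/knowledge/fathers_ledger.jsonl",)),
    Phase(4, "Cross-Reference Audit",
          "scripts/know_thy_father/crossref_audit.py",
          ("twitter-archive/notes/crossref_report.md",)),
    Phase(5, "Processing Log",
          f"{ARCHIVE}/tracker.py report",
          (f"{ARCHIVE}/REPORT.md",)),
]


def render_plan(phases):
    """Return one human-readable line per phase, in phase order."""
    lines = []
    for p in sorted(phases, key=lambda phase: phase.number):
        lines.append(f"{p.number}. {p.name}: {p.command} -> {', '.join(p.outputs)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_plan(PHASES))
```

Keeping the map as data means a plan printout and a status check can share one definition instead of repeating paths in two places.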

One command per phase

python3 scripts/know_thy_father/index_media.py --tweets twitter-archive/extracted/tweets.jsonl --output twitter-archive/know-thy-father/media_manifest.jsonl
python3 scripts/twitter_archive/analyze_media.py --batch 10
python3 scripts/know_thy_father/synthesize_kernels.py --input twitter-archive/media/manifest.jsonl --output twitter-archive/knowledge/fathers_ledger.jsonl --summary twitter-archive/knowledge/fathers_ledger.summary.json
python3 scripts/know_thy_father/crossref_audit.py --soul SOUL.md --kernels twitter-archive/notes/know_thy_father_crossref.md --output twitter-archive/notes/crossref_report.md
python3 twitter-archive/know-thy-father/tracker.py report
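
If the per-phase commands are chained by hand, the main thing to get right is stopping at the first failure so a later phase never reads a stale artifact. The sketch below shows one way to do that with the standard library. It is an assumption-laden illustration, not how epic_pipeline.py is implemented: `run_in_order` and `COMMANDS` are hypothetical names, and only two of the five commands are listed to keep it short.

```python
import shlex
import subprocess


def run_in_order(commands, dry_run=True):
    """Run command strings one at a time, stopping at the first failure.

    Returns the argv lists that were run (or would be run when dry_run is True).
    """
    done = []
    for command in commands:
        argv = shlex.split(command)
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # which prevents the remaining commands from running.
            subprocess.run(argv, check=True)
        done.append(argv)
    return done


# Two commands copied from the list above (hypothetical selection).
COMMANDS = [
    "python3 scripts/twitter_archive/analyze_media.py --batch 10",
    "python3 twitter-archive/know-thy-father/tracker.py report",
]

if __name__ == "__main__":
    # Dry run by default: print what would be executed without running it.
    for argv in run_in_order(COMMANDS, dry_run=True):
        print(" ".join(argv))
```

Defaulting to a dry run keeps the sketch safe to execute in a checkout where the archive is not present.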

Runner commands

# Print the orchestrated plan
python3 scripts/know_thy_father/epic_pipeline.py

# JSON status snapshot of scripts + known artifact paths
python3 scripts/know_thy_father/epic_pipeline.py --status --json

# Execute one concrete step
python3 scripts/know_thy_father/epic_pipeline.py --run-step phase2_multimodal_analysis --batch-size 10

Source-truth notes

  • Phase 2 already contains its own kernel extraction path (--extract-kernels) and status output. The epic runner does not reimplement that logic.
  • Phase 3's current implementation defaults to twitter-archive/media/manifest.jsonl as its input, not the Phase 1 output twitter-archive/know-thy-father/media_manifest.jsonl. The runner preserves that source truth rather than pretending a different handoff contract exists.
  • The processing log in twitter-archive/know-thy-father/PROCESSING_LOG.md can drift from current code reality. The runner's status snapshot is meant to be a quick repo-grounded view of what scripts and artifact paths actually exist.
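
A repo-grounded status view of the kind described in the last note can be as simple as checking which paths exist on disk. The sketch below is a guess at the idea only: the shape of the real `--status --json` output is not documented here, so the `snapshot` function and its `scripts`/`artifacts` keys are hypothetical. The paths themselves come from this document.

```python
import json
from pathlib import Path

# Script and artifact paths named in this document, relative to the repo root.
SCRIPTS = [
    "scripts/know_thy_father/index_media.py",
    "scripts/twitter_archive/analyze_media.py",
    "scripts/know_thy_father/synthesize_kernels.py",
    "scripts/know_thy_father/crossref_audit.py",
    "scripts/know_thy_father/epic_pipeline.py",
    "twitter-archive/know-thy-father/tracker.py",
]
ARTIFACTS = [
    "twitter-archive/know-thy-father/media_manifest.jsonl",
    "twitter-archive/knowledge/fathers_ledger.jsonl",
    "twitter-archive/notes/crossref_report.md",
    "twitter-archive/know-thy-father/PROCESSING_LOG.md",
]


def snapshot(repo_root, scripts, artifacts):
    """Map each known path to whether it currently exists under repo_root."""
    root = Path(repo_root)
    return {
        "scripts": {p: (root / p).is_file() for p in scripts},
        "artifacts": {p: (root / p).exists() for p in artifacts},
    }


if __name__ == "__main__":
    # Assumes the current working directory is the repository root.
    print(json.dumps(snapshot(".", SCRIPTS, ARTIFACTS), indent=2))
```

Because the result is derived from the filesystem on every call, it cannot drift the way a hand-maintained log can, though it only reports presence and says nothing about whether an artifact is complete or current.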

What this PR does not claim

  • It does not claim the local archive has been fully consumed.
  • It does not claim the halted processing log has been resumed.
  • It does not claim fact_store ingestion has been fully wired end-to-end.

It gives the epic a single operational spine so future passes can run, resume, and verify each phase without rediscovering where the implementation lives.