Know Thy Father — Multimodal Media Consumption Pipeline
Refs #582
This document makes the epic operational by naming the current source-of-truth scripts, their handoff artifacts, and the one-command runner that coordinates them.
Why this exists
The epic is already decomposed into four implemented phases, but the implementation truth is split across two script roots, plus a tracker command that lives inside the archive tree:
- scripts/know_thy_father/ owns Phases 1, 3, and 4
- scripts/twitter_archive/analyze_media.py owns Phase 2
- twitter-archive/know-thy-father/tracker.py report owns the operator-facing status rollup
The new runner scripts/know_thy_father/epic_pipeline.py does not replace those scripts. It stitches them together into one explicit, reviewable plan.
Phase map
| Phase | Script | Primary output |
|---|---|---|
| 1. Media Indexing | scripts/know_thy_father/index_media.py | twitter-archive/know-thy-father/media_manifest.jsonl |
| 2. Multimodal Analysis | scripts/twitter_archive/analyze_media.py --batch 10 | twitter-archive/know-thy-father/analysis.jsonl + meaning-kernels.jsonl + pipeline-status.json |
| 3. Holographic Synthesis | scripts/know_thy_father/synthesize_kernels.py | twitter-archive/knowledge/fathers_ledger.jsonl |
| 4. Cross-Reference Audit | scripts/know_thy_father/crossref_audit.py | twitter-archive/notes/crossref_report.md |
| 5. Processing Log | twitter-archive/know-thy-father/tracker.py report | twitter-archive/know-thy-father/REPORT.md |
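The phase map above can also be held as plain data, which makes it easy to ask which phase is expected to produce a given artifact. The following is a minimal sketch, not the runner's actual data model: the `Phase` dataclass, `PHASES`, and `phase_producing` are hypothetical names, and it assumes the two bare filenames in the Phase 2 row live in the same directory as analysis.jsonl.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

KTF_DIR = "twitter-archive/know-thy-father"


@dataclass(frozen=True)
class Phase:
    """One row of the phase map (hypothetical structure, for illustration)."""
    number: int
    name: str
    command: str
    outputs: Tuple[str, ...]


PHASES = (
    Phase(1, "Media Indexing", "scripts/know_thy_father/index_media.py",
          (f"{KTF_DIR}/media_manifest.jsonl",)),
    Phase(2, "Multimodal Analysis",
          "scripts/twitter_archive/analyze_media.py --batch 10",
          # Assumption: the bare filenames share analysis.jsonl's directory.
          (f"{KTF_DIR}/analysis.jsonl",
           f"{KTF_DIR}/meaning-kernels.jsonl",
           f"{KTF_DIR}/pipeline-status.json")),
    Phase(3, "Holographic Synthesis",
          "scripts/know_thy_father/synthesize_kernels.py",
          ("twitter-archive/knowledge/fathers_ledger.jsonl",)),
    Phase(4, "Cross-Reference Audit",
          "scripts/know_thy_father/crossref_audit.py",
          ("twitter-archive/notes/crossref_report.md",)),
    Phase(5, "Processing Log", f"{KTF_DIR}/tracker.py report",
          (f"{KTF_DIR}/REPORT.md",)),
)


def phase_producing(path: str) -> Optional[Phase]:
    """Return the phase that lists `path` as a primary output, if any."""
    for phase in PHASES:
        if path in phase.outputs:
            return phase
    return None
```

For example, `phase_producing("twitter-archive/notes/crossref_report.md")` returns the Phase 4 entry, while a path that no row lists as a primary output returns `None`.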
One command per phase
python3 scripts/know_thy_father/index_media.py --tweets twitter-archive/extracted/tweets.jsonl --output twitter-archive/know-thy-father/media_manifest.jsonl
python3 scripts/twitter_archive/analyze_media.py --batch 10
python3 scripts/know_thy_father/synthesize_kernels.py --input twitter-archive/media/manifest.jsonl --output twitter-archive/knowledge/fathers_ledger.jsonl --summary twitter-archive/knowledge/fathers_ledger.summary.json
python3 scripts/know_thy_father/crossref_audit.py --soul SOUL.md --kernels twitter-archive/notes/know_thy_father_crossref.md --output twitter-archive/notes/crossref_report.md
python3 twitter-archive/know-thy-father/tracker.py report
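If the five commands above are run back to back, it is usually safer to stop at the first failure so a later phase does not consume stale input. The sketch below shows one way to do that with the standard library; `run_in_order` and its `execute` hook are hypothetical helpers and not part of epic_pipeline.py, and the commands are assumed to be run from the repository root.

```python
import shlex
import subprocess

# The per-phase commands, copied verbatim from the list above.
COMMANDS = [
    "python3 scripts/know_thy_father/index_media.py"
    " --tweets twitter-archive/extracted/tweets.jsonl"
    " --output twitter-archive/know-thy-father/media_manifest.jsonl",
    "python3 scripts/twitter_archive/analyze_media.py --batch 10",
    "python3 scripts/know_thy_father/synthesize_kernels.py"
    " --input twitter-archive/media/manifest.jsonl"
    " --output twitter-archive/knowledge/fathers_ledger.jsonl"
    " --summary twitter-archive/knowledge/fathers_ledger.summary.json",
    "python3 scripts/know_thy_father/crossref_audit.py --soul SOUL.md"
    " --kernels twitter-archive/notes/know_thy_father_crossref.md"
    " --output twitter-archive/notes/crossref_report.md",
    "python3 twitter-archive/know-thy-father/tracker.py report",
]


def run_in_order(commands, execute=None):
    """Run commands sequentially and stop at the first non-zero exit code.

    `execute` takes an argv list and returns an exit code. It defaults to
    subprocess.run and can be swapped out for a dry run or a test double.
    Returns a list of (command, exit_code) pairs for the commands attempted.
    """
    if execute is None:
        execute = lambda argv: subprocess.run(argv).returncode
    results = []
    for command in commands:
        code = execute(shlex.split(command))
        results.append((command, code))
        if code != 0:
            break  # Later phases depend on earlier outputs.
    return results
```

Calling `run_in_order(COMMANDS)` executes the phases for real; passing `execute=lambda argv: 0` gives a dry run that only exercises the ordering.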
Runner commands
# Print the orchestrated plan
python3 scripts/know_thy_father/epic_pipeline.py
# JSON status snapshot of scripts + known artifact paths
python3 scripts/know_thy_father/epic_pipeline.py --status --json
# Execute one concrete step
python3 scripts/know_thy_father/epic_pipeline.py --run-step phase2_multimodal_analysis --batch-size 10
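The three invocations above imply a small command-line surface: a default plan mode, a status mode with optional JSON output, and a single-step mode. The following sketch shows how such a surface could be expressed with `argparse`. It is an illustration only: the real epic_pipeline.py may be organized differently, and `build_parser`, `choose_mode`, `phase2_argv`, and the default batch size of 10 are assumptions. In particular, forwarding the runner's `--batch-size` to Phase 2's `--batch` is a guess based on the commands shown in this document.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the flags used in the runner commands above."""
    parser = argparse.ArgumentParser(prog="epic_pipeline.py")
    parser.add_argument("--status", action="store_true",
                        help="report scripts and known artifact paths")
    parser.add_argument("--json", action="store_true",
                        help="emit machine-readable output")
    parser.add_argument("--run-step", metavar="STEP",
                        help="execute one named step")
    # Assumption: 10 mirrors the --batch 10 used for Phase 2 in this document.
    parser.add_argument("--batch-size", type=int, default=10)
    return parser


def choose_mode(args: argparse.Namespace) -> str:
    """Pick one mode; with no flags the runner just prints the plan."""
    if args.run_step:
        return "run-step"
    if args.status:
        return "status"
    return "plan"


def phase2_argv(batch_size: int) -> list:
    """Hypothetical translation of --batch-size into the Phase 2 command."""
    return ["python3", "scripts/twitter_archive/analyze_media.py",
            "--batch", str(batch_size)]
```

With this sketch, `build_parser().parse_args(["--status", "--json"])` selects the status mode with JSON output, and an empty argument list falls through to the plan mode.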
Source-truth notes
- Phase 2 already contains its own kernel extraction path (--extract-kernels) and status output. The epic runner does not reimplement that logic.
- Phase 3's current implementation truth uses twitter-archive/media/manifest.jsonl as its default input. The runner preserves current source truth instead of pretending a different handoff contract exists.
- The processing log in twitter-archive/know-thy-father/PROCESSING_LOG.md can drift from current code reality. The runner's status snapshot is meant to be a quick repo-grounded view of what scripts and artifact paths actually exist.
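A repo-grounded status view of the kind described in the notes above can be as simple as checking which known paths exist on disk. The sketch below is a guess at the idea, not the runner's actual output format: `TRACKED` and `status_snapshot` are hypothetical names, and the path list is limited to paths named in this document.

```python
import json
from pathlib import Path

# Paths taken from this document; the grouping is illustrative.
TRACKED = {
    "scripts": [
        "scripts/know_thy_father/index_media.py",
        "scripts/twitter_archive/analyze_media.py",
        "scripts/know_thy_father/synthesize_kernels.py",
        "scripts/know_thy_father/crossref_audit.py",
        "twitter-archive/know-thy-father/tracker.py",
    ],
    "artifacts": [
        "twitter-archive/know-thy-father/media_manifest.jsonl",
        "twitter-archive/media/manifest.jsonl",
        "twitter-archive/knowledge/fathers_ledger.jsonl",
        "twitter-archive/notes/crossref_report.md",
        "twitter-archive/know-thy-father/PROCESSING_LOG.md",
    ],
}


def status_snapshot(repo_root) -> dict:
    """Map each tracked path to whether it currently exists under repo_root."""
    root = Path(repo_root)
    return {
        kind: {rel: (root / rel).exists() for rel in paths}
        for kind, paths in TRACKED.items()
    }


def status_json(repo_root) -> str:
    """Serialize the snapshot, e.g. for a --json style output."""
    return json.dumps(status_snapshot(repo_root), indent=2, sort_keys=True)
```

Because the snapshot reads the filesystem rather than the processing log, it reflects what is actually present at the moment it runs.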
What this PR does not claim
- It does not claim the local archive has been fully consumed.
- It does not claim the halted processing log has been resumed.
- It does not claim fact_store ingestion has been fully wired end-to-end.
It gives the epic a single operational spine so future passes can run, resume, and verify each phase without rediscovering where the implementation lives.