# Know Thy Father — Multimodal Media Consumption Pipeline
Refs #582

This document makes the epic operational by naming the current source-of-truth scripts, their handoff artifacts, and the one-command runner that coordinates them.
## Why this exists
The epic is already decomposed into four implemented phases, but the implementation truth is split across two script roots, plus a tracker that lives alongside the archive data:

- `scripts/know_thy_father/` owns Phases 1, 3, and 4
- `scripts/twitter_archive/analyze_media.py` owns Phase 2
- `twitter-archive/know-thy-father/tracker.py report` owns the operator-facing status rollup

The new runner `scripts/know_thy_father/epic_pipeline.py` does not replace those scripts. It stitches them together into one explicit, reviewable plan.
## Phase map
| Phase | Script | Primary output |
|-------|--------|----------------|
| 1. Media Indexing | `scripts/know_thy_father/index_media.py` | `twitter-archive/know-thy-father/media_manifest.jsonl` |
| 2. Multimodal Analysis | `scripts/twitter_archive/analyze_media.py --batch 10` | `twitter-archive/know-thy-father/analysis.jsonl` + `meaning-kernels.jsonl` + `pipeline-status.json` |
| 3. Holographic Synthesis | `scripts/know_thy_father/synthesize_kernels.py` | `twitter-archive/knowledge/fathers_ledger.jsonl` |
| 4. Cross-Reference Audit | `scripts/know_thy_father/crossref_audit.py` | `twitter-archive/notes/crossref_report.md` |
| 5. Processing Log | `twitter-archive/know-thy-father/tracker.py report` | `twitter-archive/know-thy-father/REPORT.md` |
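
Several of the primary outputs in the phase map are JSONL files. As a minimal sketch of how an operator might sanity-check those handoff artifacts, the helper below counts the valid JSON records in each file. The function name `count_jsonl_records` is hypothetical and not part of the repository, and nothing is assumed about the record schema. `meaning-kernels.jsonl` is omitted because its directory is not stated here.

```python
import json
from pathlib import Path


def count_jsonl_records(path):
    """Count non-blank, valid JSON lines in a JSONL file.

    Returns None when the file does not exist. Hypothetical helper,
    not taken from the repository.
    """
    file_path = Path(path)
    if not file_path.is_file():
        return None
    count = 0
    with file_path.open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            text = line.strip()
            if not text:
                continue  # tolerate blank lines between records
            try:
                json.loads(text)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_number}: invalid JSON ({exc.msg})") from exc
            count += 1
    return count


# JSONL paths copied from the phase map; run from the repository root.
JSONL_ARTIFACTS = [
    "twitter-archive/know-thy-father/media_manifest.jsonl",
    "twitter-archive/know-thy-father/analysis.jsonl",
    "twitter-archive/knowledge/fathers_ledger.jsonl",
]

if __name__ == "__main__":
    for artifact in JSONL_ARTIFACTS:
        records = count_jsonl_records(artifact)
        print(f"{artifact}: {'missing' if records is None else records}")
```

A missing file is reported rather than raised, since an unrun phase is an expected state; a malformed line raises with its line number so a truncated write is easy to find.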
## One command per phase
```bash
# Phase 1: Media Indexing
python3 scripts/know_thy_father/index_media.py --tweets twitter-archive/extracted/tweets.jsonl --output twitter-archive/know-thy-father/media_manifest.jsonl
# Phase 2: Multimodal Analysis
python3 scripts/twitter_archive/analyze_media.py --batch 10
# Phase 3: Holographic Synthesis
python3 scripts/know_thy_father/synthesize_kernels.py --input twitter-archive/media/manifest.jsonl --output twitter-archive/knowledge/fathers_ledger.jsonl --summary twitter-archive/knowledge/fathers_ledger.summary.json
# Phase 4: Cross-Reference Audit
python3 scripts/know_thy_father/crossref_audit.py --soul SOUL.md --kernels twitter-archive/notes/know_thy_father_crossref.md --output twitter-archive/notes/crossref_report.md
# Phase 5: Processing Log
python3 twitter-archive/know-thy-father/tracker.py report
```
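
For illustration only, the per-phase commands above could be chained in order with a stop at the first failure. This is a sketch of the idea, not the contents of `epic_pipeline.py`: the names `PHASE_COMMANDS` and `run_phases` are hypothetical, and the sketch defaults to a dry run so nothing executes unless asked.

```python
import shlex
import subprocess

# Commands copied verbatim from the per-phase list; labels follow the phase map.
PHASE_COMMANDS = [
    ("Phase 1: Media Indexing",
     "python3 scripts/know_thy_father/index_media.py "
     "--tweets twitter-archive/extracted/tweets.jsonl "
     "--output twitter-archive/know-thy-father/media_manifest.jsonl"),
    ("Phase 2: Multimodal Analysis",
     "python3 scripts/twitter_archive/analyze_media.py --batch 10"),
    ("Phase 3: Holographic Synthesis",
     "python3 scripts/know_thy_father/synthesize_kernels.py "
     "--input twitter-archive/media/manifest.jsonl "
     "--output twitter-archive/knowledge/fathers_ledger.jsonl "
     "--summary twitter-archive/knowledge/fathers_ledger.summary.json"),
    ("Phase 4: Cross-Reference Audit",
     "python3 scripts/know_thy_father/crossref_audit.py --soul SOUL.md "
     "--kernels twitter-archive/notes/know_thy_father_crossref.md "
     "--output twitter-archive/notes/crossref_report.md"),
    ("Phase 5: Processing Log",
     "python3 twitter-archive/know-thy-father/tracker.py report"),
]


def run_phases(commands, dry_run=True, runner=subprocess.run):
    """Run commands in order and stop at the first non-zero exit code.

    Returns the labels that completed (or would run, in dry-run mode).
    """
    completed = []
    for label, command in commands:
        argv = shlex.split(command)
        if dry_run:
            print(f"[dry-run] {label}: {shlex.join(argv)}")
            completed.append(label)
            continue
        result = runner(argv)
        if result.returncode != 0:
            print(f"{label} failed with exit code {result.returncode}; stopping.")
            break
        completed.append(label)
    return completed


if __name__ == "__main__":
    run_phases(PHASE_COMMANDS, dry_run=True)
```

Passing `dry_run=False` from the repository root would execute the real scripts. The `runner` parameter exists only so the control flow can be exercised without launching subprocesses.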
## Runner commands
```bash
# Print the orchestrated plan
python3 scripts/know_thy_father/epic_pipeline.py

# JSON status snapshot of scripts + known artifact paths
python3 scripts/know_thy_father/epic_pipeline.py --status --json

# Execute one concrete step
python3 scripts/know_thy_father/epic_pipeline.py --run-step phase2_multimodal_analysis --batch-size 10
```
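
The three invocations above imply a small command-line surface: a default action that prints the plan, a status mode with optional JSON output, and a single-step mode. A rough `argparse` sketch of that surface follows. The flag names come from the commands above; everything else (the function names, the help strings, and the default batch size of 10) is an assumption and may not match the real runner.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags shown in the runner commands."""
    parser = argparse.ArgumentParser(
        description="Sketch of the epic runner's command-line surface."
    )
    parser.add_argument("--status", action="store_true",
                        help="report scripts and known artifact paths")
    parser.add_argument("--json", action="store_true",
                        help="emit machine-readable JSON")
    parser.add_argument("--run-step", metavar="STEP",
                        help="execute one named step")
    # Default of 10 is an assumption, taken from the example invocation.
    parser.add_argument("--batch-size", type=int, default=10,
                        help="batch size forwarded to the step")
    return parser


def choose_action(args):
    """Map parsed flags to one action; running a step takes priority."""
    if args.run_step:
        return f"run:{args.run_step}"
    if args.status:
        return "status-json" if args.json else "status"
    return "plan"


if __name__ == "__main__":
    demo = build_parser().parse_args(["--status", "--json"])
    print(choose_action(demo))
```

Keeping the action choice in a pure function separate from parsing makes the precedence between flags easy to review and test.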
## Source-truth notes
- Phase 2 already contains its own kernel extraction path (`--extract-kernels`) and status output. The epic runner does not reimplement that logic.
- Phase 3 currently defaults to `twitter-archive/media/manifest.jsonl` as its input, which differs from the Phase 1 output path listed in the phase map. The runner preserves that behavior rather than pretending a different handoff contract exists.
- The processing log in `twitter-archive/know-thy-father/PROCESSING_LOG.md` can drift from what the code actually does. The runner's status snapshot is meant to be a quick, repo-grounded view of which scripts and artifact paths actually exist.
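
To make the idea of a repo-grounded status snapshot concrete, here is a minimal sketch that checks whether a set of known paths exists on disk and serializes the result as JSON. The paths are taken from this document. The labels, the `snapshot` function, and the output shape are hypothetical and are not the runner's actual format.

```python
import json
from pathlib import Path

# Paths copied from this document; the dictionary keys are illustrative labels.
KNOWN_PATHS = {
    "phase1_script": "scripts/know_thy_father/index_media.py",
    "phase2_script": "scripts/twitter_archive/analyze_media.py",
    "phase3_script": "scripts/know_thy_father/synthesize_kernels.py",
    "phase4_script": "scripts/know_thy_father/crossref_audit.py",
    "tracker": "twitter-archive/know-thy-father/tracker.py",
    "phase3_default_input": "twitter-archive/media/manifest.jsonl",
    "processing_log": "twitter-archive/know-thy-father/PROCESSING_LOG.md",
}


def snapshot(paths, root="."):
    """Return {label: {"path": ..., "exists": bool}} for each known path."""
    base = Path(root)
    return {
        label: {"path": relative, "exists": (base / relative).exists()}
        for label, relative in paths.items()
    }


if __name__ == "__main__":
    # Run from the repository root to get a meaningful result.
    print(json.dumps(snapshot(KNOWN_PATHS), indent=2))
```

Because the snapshot only inspects the filesystem, it cannot drift the way a hand-maintained log can, although it says nothing about whether an artifact is complete or current.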
## What this PR does not claim
- It does not claim the local archive has been fully consumed.
- It does not claim the halted processing log has been resumed.
- It does not claim `fact_store` ingestion has been fully wired end-to-end.

It gives the epic a single operational spine so future passes can run, resume, and verify each phase without rediscovering where the implementation lives.