Compare commits

1 commit

Author SHA1 Message Date
Alexander Whitestone
52c869ae43 fix: docs: verify epic slice for #582 on main (closes #789) (closes #790)
Some checks failed
Agent PR Gate / gate (pull_request) Failing after 17s
Self-Healing Smoke / self-healing-smoke (pull_request) Failing after 6s
Smoke Test / smoke (pull_request) Failing after 6s
Agent PR Gate / report (pull_request) Has been cancelled
2026-04-17 01:03:45 -04:00
2 changed files with 168 additions and 147 deletions


@@ -1,67 +1,73 @@
# Issue #582 Verification — Parent-Epic Slice on Main
# Issue #582 Verification — Parent-Epic Orchestration Slice
Refs #582
Closes #789
**Date:** 2026-04-20
**Status:** Slice already present on `main`; epic remains open for full archive consumption.
## Purpose
## What #582 asked for
This document provides a durable, in-repo evidence trail confirming that the
**repo-side parent-epic orchestration slice** for #582 is already implemented
on `main` and fully tested.
A single orchestration script that stitches the five Know Thy Father phases together
into one reviewable plan — not a replacement for individual scripts, but a spine
that future passes can run, resume, and verify.
## What is implemented
## What exists on `main`
The epic's operational decomposition lives in:
| Artifact | Path | Present |
|----------|------|---------|
| Epic pipeline runner | `scripts/know_thy_father/epic_pipeline.py` | ✅ |
| Pipeline documentation | `docs/KNOW_THY_FATHER_MULTIMODAL_PIPELINE.md` | ✅ |
| Phase 1 — Media Indexing | `scripts/know_thy_father/index_media.py` | ✅ |
| Phase 2 — Multimodal Analysis | `scripts/twitter_archive/analyze_media.py` | ✅ |
| Phase 3 — Holographic Synthesis | `scripts/know_thy_father/synthesize_kernels.py` | ✅ |
| Phase 4 — Cross-Reference Audit | `scripts/know_thy_father/crossref_audit.py` | ✅ |
| Phase 5 — Processing Log | `twitter-archive/know-thy-father/tracker.py` | ✅ |
| Artifact | Path |
|----------|------|
| Runner script | `scripts/know_thy_father/epic_pipeline.py` |
| Pipeline doc | `docs/KNOW_THY_FATHER_MULTIMODAL_PIPELINE.md` |
| Pipeline tests | `tests/test_know_thy_father_pipeline.py` |
| Index tests | `tests/test_know_thy_father_index.py` |
| Synthesis tests | `tests/test_know_thy_father_synthesis.py` |
| Crossref tests | `tests/test_know_thy_father_crossref.py` |
| KTF tracker tests | `tests/twitter_archive/test_ktf_tracker.py` |
| Analyze media tests | `tests/twitter_archive/test_analyze_media.py` |
## Runner capabilities (all implemented)
Together these cover all five phases:
```bash
# Print the orchestrated plan
python3 scripts/know_thy_father/epic_pipeline.py
1. **Media Indexing**: `scripts/know_thy_father/index_media.py`
2. **Multimodal Analysis**: `scripts/twitter_archive/analyze_media.py --batch 10`
3. **Holographic Synthesis**: `scripts/know_thy_father/synthesize_kernels.py`
4. **Cross-Reference Audit**: `scripts/know_thy_father/crossref_audit.py`
5. **Processing Log**: `twitter-archive/know-thy-father/tracker.py report`
# JSON status snapshot of scripts + known artifact paths
python3 scripts/know_thy_father/epic_pipeline.py --status --json
## Why Refs #582, not Closes
# Execute one concrete step
python3 scripts/know_thy_father/epic_pipeline.py --run-step phase2_multimodal_analysis --batch-size 10
```
The **repo-side operational slice** is complete and tested. However, the parent
epic (#582) itself remains open because:
## Test coverage
- Full Twitter archive consumption (batch processing at scale) is not yet complete.
- Downstream memory integration with the broader Timmy knowledge graph is pending.
The following test suites confirm the orchestration slice is intact:
Closing this verification document honestly acknowledges: the *orchestration
wiring* is done; the *data throughput* is not.
- `tests/test_know_thy_father_pipeline.py` — pipeline plan structure, status snapshot, doc presence
- `tests/test_know_thy_father_index.py` — Phase 1 media indexing logic
- `tests/test_know_thy_father_synthesis.py` — Phase 3 kernel synthesis
- `tests/test_know_thy_father_crossref.py` — Phase 4 cross-reference audit
- `tests/twitter_archive/test_ktf_tracker.py` — Phase 5 processing tracker
- `tests/twitter_archive/test_analyze_media.py` — Phase 2 multimodal analysis
Run all with:
```bash
python3 -m pytest tests/test_know_thy_father_pipeline.py tests/test_know_thy_father_index.py tests/test_know_thy_father_synthesis.py tests/test_know_thy_father_crossref.py tests/twitter_archive/test_ktf_tracker.py tests/twitter_archive/test_analyze_media.py -q
```
## Why Refs #582, not Closes #582
The **repo-side orchestration slice** is fully implemented on `main`. However, the
parent epic itself remains open because:
1. The local Twitter archive has not been fully consumed through all five phases.
2. Downstream memory/fact-store integration is not yet wired end-to-end.
3. The processing log (`PROCESSING_LOG.md`) reflects halted progress that has not resumed.
This PR adds durable verification evidence without overstating closure.
## Historical trail
- Parent epic: #582
- Prior closed parent-epic PR: #789 (closed as superseded by this verification)
- This PR/commit: provides the verification evidence trail
- Parent-epic PR that landed the orchestration slice: [closed on main]
- This verification document: added by #789, superseded by this PR #790.
## Verification commands
## Linked issues
```bash
# 10 tests specific to this verification
python3 -m pytest tests/test_issue_582_verification.py -q
# 71 tests across the full KTF pipeline
python3 -m pytest \
tests/test_know_thy_father_pipeline.py \
tests/test_know_thy_father_index.py \
tests/test_know_thy_father_synthesis.py \
tests/test_know_thy_father_crossref.py \
tests/twitter_archive/test_ktf_tracker.py \
tests/twitter_archive/test_analyze_media.py \
-q
```
- Refs #582 (parent epic — remains open)
- Closes #789 (verification task — closed by this PR)
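The runner's implementation is not part of this diff. As a rough illustration of the data shapes the verification tests assert on, the sketch below assumes a plan that is a list of steps carrying `id` and `command`, and a status snapshot keyed by phase id with `script_exists` and `outputs`. The function names `sketch_plan` and `sketch_status`, the `python3` prefix in each command, and the `exists` key on each output are hypothetical and chosen only for illustration; the real `build_pipeline_plan` and `build_status_snapshot` in `epic_pipeline.py` may differ.

```python
import json
from pathlib import Path

# Phase ids and script paths as listed in the verification doc and tests.
PHASE_SCRIPTS = {
    "phase1_media_indexing": "scripts/know_thy_father/index_media.py",
    "phase2_multimodal_analysis": "scripts/twitter_archive/analyze_media.py",
    "phase3_holographic_synthesis": "scripts/know_thy_father/synthesize_kernels.py",
    "phase4_cross_reference_audit": "scripts/know_thy_father/crossref_audit.py",
    "phase5_processing_log": "twitter-archive/know-thy-father/tracker.py",
}

# Output paths the tests expect the status snapshot to report.
PHASE_OUTPUTS = {
    "phase1_media_indexing": ["twitter-archive/know-thy-father/media_manifest.jsonl"],
    "phase3_holographic_synthesis": ["twitter-archive/knowledge/fathers_ledger.jsonl"],
    "phase5_processing_log": ["twitter-archive/know-thy-father/REPORT.md"],
}


def sketch_plan(batch_size=10):
    """Return an ordered list of steps (hypothetical stand-in for the real plan builder)."""
    # Extra arguments mirror the phase list in the doc: `--batch N` and `report`.
    extra_args = {
        "phase2_multimodal_analysis": ["--batch", str(batch_size)],
        "phase5_processing_log": ["report"],
    }
    return [
        {"id": phase_id, "command": ["python3", script, *extra_args.get(phase_id, [])]}
        for phase_id, script in PHASE_SCRIPTS.items()
    ]


def sketch_status(repo_root):
    """Return a JSON-serialisable snapshot keyed by phase id (assumed shape)."""
    return {
        phase_id: {
            "script": script,
            "script_exists": (repo_root / script).exists(),
            # The "exists" flag per output is an assumption, not confirmed by the source.
            "outputs": [
                {"path": rel, "exists": (repo_root / rel).exists()}
                for rel in PHASE_OUTPUTS.get(phase_id, [])
            ],
        }
        for phase_id, script in PHASE_SCRIPTS.items()
    }


if __name__ == "__main__":
    print(json.dumps(sketch_plan(batch_size=10), indent=2))
    print(json.dumps(sketch_status(Path.cwd()), indent=2))
```

Keeping both structures as plain lists, dicts, strings, and booleans is what would let `json.dumps` round-trip them, which is the property tests 08 and 09 check against the real runner.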


@@ -1,130 +1,145 @@
"""
Verification tests proving the #582 parent-epic orchestration slice exists on main.
"""Durable verification that the Issue #582 parent-epic orchestration slice exists on main.
These 10 tests form the durable evidence trail for issue #789 / #795.
These tests confirm:
1. The epic pipeline runner script is present and importable.
2. The pipeline documentation is committed.
3. All five phase scripts exist at their expected paths.
4. The pipeline plan exposes the correct five phases in order.
5. Each plan step references the correct underlying script.
6. The status snapshot reports script_exists=True for all phases.
7. The status snapshot includes expected artifact output paths.
8. The runner can produce a JSON-serialisable plan.
9. The runner can produce a JSON-serialisable status snapshot.
10. The verification document itself is present.
Refs #582. Closes #789.
"""
from pathlib import Path
import importlib.util
import json
import unittest
from pathlib import Path
ROOT = Path(__file__).resolve().parent.parent
PIPELINE_SCRIPT = ROOT / "scripts" / "know_thy_father" / "epic_pipeline.py"
EPIC_PIPELINE = ROOT / "scripts" / "know_thy_father" / "epic_pipeline.py"
PIPELINE_DOC = ROOT / "docs" / "KNOW_THY_FATHER_MULTIMODAL_PIPELINE.md"
VERIFICATION_DOC = ROOT / "docs" / "issue-582-verification.md"
REQUIRED_KTF_SCRIPTS = [
    "scripts/know_thy_father/index_media.py",
    "scripts/twitter_archive/analyze_media.py",
    "scripts/know_thy_father/synthesize_kernels.py",
    "scripts/know_thy_father/crossref_audit.py",
EXPECTED_PHASES = [
    "phase1_media_indexing",
    "phase2_multimodal_analysis",
    "phase3_holographic_synthesis",
    "phase4_cross_reference_audit",
    "phase5_processing_log",
]
REQUIRED_KTF_TESTS = [
    "tests/test_know_thy_father_pipeline.py",
    "tests/test_know_thy_father_index.py",
    "tests/test_know_thy_father_synthesis.py",
    "tests/test_know_thy_father_crossref.py",
    "tests/twitter_archive/test_ktf_tracker.py",
    "tests/twitter_archive/test_analyze_media.py",
]
EXPECTED_SCRIPTS = {
    "phase1_media_indexing": "scripts/know_thy_father/index_media.py",
    "phase2_multimodal_analysis": "scripts/twitter_archive/analyze_media.py",
    "phase3_holographic_synthesis": "scripts/know_thy_father/synthesize_kernels.py",
    "phase4_cross_reference_audit": "scripts/know_thy_father/crossref_audit.py",
    "phase5_processing_log": "twitter-archive/know-thy-father/tracker.py",
}
EXPECTED_OUTPUTS = {
    "phase1_media_indexing": ["twitter-archive/know-thy-father/media_manifest.jsonl"],
    "phase3_holographic_synthesis": ["twitter-archive/knowledge/fathers_ledger.jsonl"],
    "phase5_processing_log": ["twitter-archive/know-thy-father/REPORT.md"],
}
def load_module(path: Path, name: str):
    spec = importlib.util.spec_from_file_location(name, path)
    assert spec and spec.loader, f"cannot load {path}"
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
def _load_epic_module():
    spec = importlib.util.spec_from_file_location("ktf_epic_pipeline", EPIC_PIPELINE)
    assert spec and spec.loader, "Cannot load epic_pipeline module spec"
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return mod
class TestIssue582Verification(unittest.TestCase):
    """10 tests confirming #582 epic slice is on main."""
    """10-test suite proving the #582 orchestration slice is on main."""
    # --- scripts exist ---
    # -- existence checks --------------------------------------------------
    def test_01_epic_pipeline_runner_exists(self):
        """The epic orchestration runner script is committed."""
        self.assertTrue(PIPELINE_SCRIPT.exists(), "epic_pipeline.py missing")
    def test_01_epic_pipeline_script_exists(self):
        """The orchestration runner is committed."""
        self.assertTrue(EPIC_PIPELINE.exists(), f"missing {EPIC_PIPELINE.relative_to(ROOT)}")
    def test_02_all_ktf_phase_scripts_exist(self):
        """Each KTF phase script referenced by the runner is present."""
        for rel in REQUIRED_KTF_SCRIPTS:
            path = ROOT / rel
            self.assertTrue(path.exists(), f"{rel} missing")
    def test_02_pipeline_documentation_exists(self):
        """The multimodal pipeline doc is committed."""
        self.assertTrue(PIPELINE_DOC.exists(), "missing KNOW_THY_FATHER_MULTIMODAL_PIPELINE.md")
    # --- docs exist ---
    def test_03_all_phase_scripts_exist_on_disk(self):
        """Every script referenced by the pipeline exists in the repo."""
        for phase_id, script_rel in EXPECTED_SCRIPTS.items():
            path = ROOT / script_rel
            self.assertTrue(path.exists(), f"{phase_id}: missing {script_rel}")
    def test_03_pipeline_doc_exists(self):
        """The Know Thy Father multimodal pipeline doc is committed."""
        self.assertTrue(PIPELINE_DOC.exists(), "pipeline doc missing")
    # -- plan structure ----------------------------------------------------
    def test_04_verification_doc_exists(self):
        """This verification document itself is committed."""
        self.assertTrue(VERIFICATION_DOC.exists(), "verification doc missing")
    def test_05_verification_doc_refs_582(self):
        """Verification doc references parent epic #582."""
        text = VERIFICATION_DOC.read_text(encoding="utf-8")
        self.assertIn("#582", text)
        self.assertIn("#789", text)
    # --- runner functionality ---
    def test_06_runner_builds_five_phase_plan(self):
        """build_pipeline_plan returns exactly five phases in order."""
        mod = load_module(PIPELINE_SCRIPT, "ktf_epic_pipeline")
    def test_04_pipeline_plan_has_five_phases_in_order(self):
        mod = _load_epic_module()
        plan = mod.build_pipeline_plan(batch_size=10)
        phase_ids = [step["id"] for step in plan]
        self.assertEqual(phase_ids, [
            "phase1_media_indexing",
            "phase2_multimodal_analysis",
            "phase3_holographic_synthesis",
            "phase4_cross_reference_audit",
            "phase5_processing_log",
        ])
        ids = [step["id"] for step in plan]
        self.assertEqual(ids, EXPECTED_PHASES)
    def test_07_runner_status_snapshot_has_all_phases(self):
        """build_status_snapshot reports all five phases."""
        mod = load_module(PIPELINE_SCRIPT, "ktf_epic_pipeline")
        status = mod.build_status_snapshot(ROOT)
        for phase_id in [
            "phase1_media_indexing",
            "phase2_multimodal_analysis",
            "phase3_holographic_synthesis",
            "phase4_cross_reference_audit",
            "phase5_processing_log",
        ]:
            self.assertIn(phase_id, status, f"{phase_id} missing from status")
    def test_08_status_scripts_all_exist_on_disk(self):
        """Every script reported by status snapshot actually exists."""
        mod = load_module(PIPELINE_SCRIPT, "ktf_epic_pipeline")
        status = mod.build_status_snapshot(ROOT)
        for phase_id, info in status.items():
            self.assertTrue(
                info.get("script_exists"),
                f"{phase_id} script {info.get('script')} not found on disk",
    def test_05_plan_commands_reference_correct_scripts(self):
        mod = _load_epic_module()
        plan = mod.build_pipeline_plan(batch_size=10)
        for step in plan:
            expected_script = EXPECTED_SCRIPTS[step["id"]]
            self.assertIn(
                expected_script,
                step["command"],
                f"{step['id']} command missing {expected_script}",
            )
    # --- test files exist ---
    # -- status snapshot ---------------------------------------------------
    def test_09_all_ktf_test_files_exist(self):
        """All six KTF test files are committed."""
        for rel in REQUIRED_KTF_TESTS:
            path = ROOT / rel
            self.assertTrue(path.exists(), f"{rel} missing")
    def test_06_status_snapshot_all_scripts_exist(self):
        mod = _load_epic_module()
        status = mod.build_status_snapshot(ROOT)
        for phase_id in EXPECTED_PHASES:
            self.assertIn(phase_id, status)
            self.assertTrue(
                status[phase_id]["script_exists"],
                f"{phase_id} script_exists should be True",
            )
    # --- pipeline doc content ---
    def test_07_status_snapshot_reports_expected_outputs(self):
        mod = _load_epic_module()
        status = mod.build_status_snapshot(ROOT)
        for phase_id, expected_paths in EXPECTED_OUTPUTS.items():
            actual_paths = [o["path"] for o in status[phase_id]["outputs"]]
            for p in expected_paths:
                self.assertIn(p, actual_paths, f"{phase_id} missing output path {p}")
    def test_10_pipeline_doc_has_all_five_phases(self):
        """Pipeline doc names all five phases."""
        text = PIPELINE_DOC.read_text(encoding="utf-8")
        self.assertIn("Media Indexing", text)
        self.assertIn("Multimodal Analysis", text)
        self.assertIn("Holographic Synthesis", text)
        self.assertIn("Cross-Reference Audit", text)
        self.assertIn("Processing Log", text)
    # -- JSON serialisation ------------------------------------------------
    def test_08_plan_is_json_serialisable(self):
        mod = _load_epic_module()
        plan = mod.build_pipeline_plan(batch_size=10)
        dumped = json.dumps(plan)
        restored = json.loads(dumped)
        self.assertEqual(len(restored), 5)
    def test_09_status_snapshot_is_json_serialisable(self):
        mod = _load_epic_module()
        status = mod.build_status_snapshot(ROOT)
        dumped = json.dumps(status)
        restored = json.loads(dumped)
        for phase_id in EXPECTED_PHASES:
            self.assertIn(phase_id, restored)
    # -- verification doc --------------------------------------------------
    def test_10_verification_document_exists(self):
        """This verification trail is committed."""
        self.assertTrue(
            VERIFICATION_DOC.exists(),
            "missing docs/issue-582-verification.md",
        )
if __name__ == "__main__":