Files
compounding-intelligence/scripts/test_harvester_pipeline.py
Alexander Whitestone da073ad7cf feat: add harvester.py — session knowledge extractor (#8)
Main harvester module that chains:
  session_reader → extraction prompt → LLM → validate → deduplicate → store

Includes:
- scripts/harvester.py — main module (reader + prompt + storage pipeline)
- scripts/session_reader.py — JSONL transcript parser
- scripts/test_harvester_pipeline.py — smoke tests (all passing)

Pipeline:
  1. Read session JSONL via session_reader
  2. Truncate long sessions (first 50 + last 50 messages)
  3. Send transcript + extraction prompt to LLM (mimo-v2-pro)
  4. Parse structured JSON response (facts/pitfalls/patterns/quirks/questions)
  5. Validate fields + confidence threshold
  6. Deduplicate against knowledge/index.json (fingerprint + word overlap)
  7. Write to knowledge store (index.json + per-repo markdown)

CLI:
  Single:  python3 harvester.py --session <path> --output knowledge/
  Batch:   python3 harvester.py --batch --since 2026-04-01 --limit 100
  Dry-run: python3 harvester.py --session <path> --dry-run
2026-04-14 14:03:30 -04:00

6.2 KiB