Commit Graph

5 Commits

Author SHA1 Message Date
Alexander Whitestone
da073ad7cf feat: add harvester.py — session knowledge extractor (#8)
Main harvester module that chains:
  session_reader → extraction prompt → LLM → validate → deduplicate → store

Includes:
- scripts/harvester.py — main module (reader + prompt + storage pipeline)
- scripts/session_reader.py — JSONL transcript parser
- scripts/test_harvester_pipeline.py — smoke tests (all passing)

Pipeline:
  1. Read session JSONL via session_reader
  2. Truncate long sessions (first 50 + last 50 messages)
  3. Send transcript + extraction prompt to LLM (mimo-v2-pro)
  4. Parse structured JSON response (facts/pitfalls/patterns/quirks/questions)
  5. Validate fields + confidence threshold
  6. Deduplicate against knowledge/index.json (fingerprint + word overlap)
  7. Write to knowledge store (index.json + per-repo markdown)
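
Steps 2, 5 and 6 above can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/harvester.py. The item fields (`text`, `confidence`), the threshold values, the Jaccard overlap measure and every function name are assumptions made for this example, since the commit message does not specify them.

```python
import hashlib
import re

HEAD = 50              # messages kept from the start (step 2)
TAIL = 50              # messages kept from the end (step 2)
MIN_CONFIDENCE = 0.5   # assumed threshold; the commit does not state one
OVERLAP_LIMIT = 0.8    # assumed Jaccard cutoff for near-duplicates


def truncate_messages(messages, head=HEAD, tail=TAIL):
    """Keep the first `head` and last `tail` messages of a long session."""
    if len(messages) <= head + tail:
        return list(messages)
    return list(messages[:head]) + list(messages[-tail:])


def _tokens(text):
    """Lowercased alphanumeric tokens, in order of appearance."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fingerprint(text):
    """Hash of the normalized token sequence (ignores case and punctuation)."""
    return hashlib.sha256(" ".join(_tokens(text)).encode("utf-8")).hexdigest()


def is_valid(item, min_confidence=MIN_CONFIDENCE):
    """Require a non-empty text field and a numeric confidence above the threshold."""
    text = item.get("text")
    conf = item.get("confidence")
    return (
        isinstance(text, str)
        and text.strip() != ""
        and isinstance(conf, (int, float))
        and not isinstance(conf, bool)
        and conf >= min_confidence
    )


def is_duplicate(item, existing, overlap_limit=OVERLAP_LIMIT):
    """Match on an exact fingerprint first, then on word-set overlap."""
    fp = fingerprint(item["text"])
    new_words = set(_tokens(item["text"]))
    for old in existing:
        if fingerprint(old["text"]) == fp:
            return True
        old_words = set(_tokens(old["text"]))
        union = new_words | old_words
        if union and len(new_words & old_words) / len(union) >= overlap_limit:
            return True
    return False
```

The fingerprint check catches entries that differ only in case or punctuation, while the word-overlap check catches light rewordings that hash differently.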

CLI:
  Single:  python3 harvester.py --session <path> --output knowledge/
  Batch:   python3 harvester.py --batch --since 2026-04-01 --limit 100
  Dry-run: python3 harvester.py --session <path> --dry-run
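
An argument parser that accepts the three invocations above might look like the sketch below. The flag names come from the commit message. The mutually exclusive `--session`/`--batch` group, the `knowledge/` default for `--output`, the help strings and the `build_parser` name are assumptions for illustration, not the actual harvester.py code.

```python
import argparse


def build_parser():
    """Build a parser for the single, batch and dry-run invocations."""
    parser = argparse.ArgumentParser(prog="harvester.py")

    # Assumption: exactly one of --session / --batch must be given.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--session", metavar="PATH",
                      help="path to one session JSONL transcript")
    mode.add_argument("--batch", action="store_true",
                      help="process many sessions")

    # Assumption: output defaults to knowledge/ when omitted.
    parser.add_argument("--output", default="knowledge/",
                        help="knowledge store directory")
    parser.add_argument("--since", metavar="YYYY-MM-DD",
                        help="batch mode: lower bound on session date")
    parser.add_argument("--limit", type=int,
                        help="batch mode: maximum number of sessions")
    parser.add_argument("--dry-run", action="store_true",
                        help="run the pipeline without writing to the store")
    return parser
```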
2026-04-14 14:03:30 -04:00
102ef67a8e Add test script for knowledge extraction prompt 2026-04-14 17:22:17 +00:00
d9f51b30a9 Add knowledge extraction prompt template for issue #7 2026-04-14 17:21:25 +00:00
Alexander Whitestone
b5873e9e3d Initial structure: knowledge store, scripts, metrics, templates 2026-04-14 11:17:01 -04:00
8252ef5b80 Initial commit 2026-04-14 15:11:53 +00:00