Alexander Whitestone
|
da073ad7cf
|
feat: add harvester.py — session knowledge extractor (#8)
Main harvester module that chains:
session_reader → extraction prompt → LLM → validate → deduplicate → store
Includes:
- scripts/harvester.py — main module (reader + prompt + storage pipeline)
- scripts/session_reader.py — JSONL transcript parser
- scripts/test_harvester_pipeline.py — smoke tests (all passing)
Pipeline:
1. Read session JSONL via session_reader
2. Truncate long sessions (first 50 + last 50 messages)
3. Send transcript + extraction prompt to LLM (mimo-v2-pro)
4. Parse structured JSON response (facts/pitfalls/patterns/quirks/questions)
5. Validate fields + confidence threshold
6. Deduplicate against knowledge/index.json (fingerprint + word overlap)
7. Write to knowledge store (index.json + per-repo markdown)
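Steps 2 and 6 above can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/harvester.py: the function names, the SHA-256 fingerprint over normalized text, the Jaccard word-overlap metric, and the 0.8 threshold are all assumptions made for the example. Only the "first 50 + last 50" rule and the "fingerprint + word overlap" idea come from the commit message.

```python
import hashlib
import re

HEAD = 50  # messages kept from the start of a session (from the commit message)
TAIL = 50  # messages kept from the end of a session (from the commit message)


def truncate_messages(messages, head=HEAD, tail=TAIL):
    """Keep the first `head` and last `tail` messages; short sessions pass through."""
    if len(messages) <= head + tail:
        return list(messages)
    # Slice from an explicit index so tail=0 yields an empty tail, not the whole list.
    return list(messages[:head]) + list(messages[len(messages) - tail:])


def _tokens(text):
    """Lowercase alphanumeric tokens; punctuation and spacing are ignored."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fingerprint(text):
    """Hypothetical fingerprint: SHA-256 of the normalized token stream."""
    return hashlib.sha256(" ".join(_tokens(text)).encode("utf-8")).hexdigest()


def word_overlap(a, b):
    """Jaccard similarity of the two word sets (an assumed overlap metric)."""
    wa, wb = set(_tokens(a)), set(_tokens(b))
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def is_duplicate(candidate, existing, threshold=0.8):
    """True if `candidate` matches an existing entry exactly or closely.

    `threshold` is an illustrative value, not one taken from the repository.
    """
    fp = fingerprint(candidate)
    for item in existing:
        if fingerprint(item) == fp or word_overlap(candidate, item) >= threshold:
            return True
    return False
```

In this sketch the fingerprint catches entries that differ only in case, punctuation, or spacing, while the word-overlap check catches near-duplicates with reordered or slightly changed wording.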
CLI:
Single: python3 harvester.py --session <path> --output knowledge/
Batch: python3 harvester.py --batch --since 2026-04-01 --limit 100
Dry-run: python3 harvester.py --session <path> --dry-run
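One way the flags shown above could be parsed with argparse is sketched below. The flag names are taken from the CLI examples. Everything else is assumed for illustration and may differ from the actual harvester.py: the build_parser name, making --session and --batch mutually exclusive, the default for --output, and the value types.

```python
import argparse


def build_parser():
    """Hypothetical parser for the flags listed in the commit message."""
    parser = argparse.ArgumentParser(prog="harvester.py")

    # Assumption: exactly one of single-session or batch mode must be chosen.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--session", metavar="PATH", help="path to one session JSONL file")
    mode.add_argument("--batch", action="store_true", help="process many sessions")

    # Assumption: the default mirrors the directory used in the single-session example.
    parser.add_argument("--output", default="knowledge/", help="knowledge store directory")
    parser.add_argument("--since", metavar="YYYY-MM-DD", help="batch: only sessions on or after this date")
    parser.add_argument("--limit", type=int, help="batch: maximum number of sessions")
    parser.add_argument("--dry-run", action="store_true", help="run without writing to the store")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```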
|
2026-04-14 14:03:30 -04:00 |
|
|
|
102ef67a8e
|
Add test script for knowledge extraction prompt
|
2026-04-14 17:22:17 +00:00 |
|
|
|
d9f51b30a9
|
Add knowledge extraction prompt template for issue #7
|
2026-04-14 17:21:25 +00:00 |
|
Alexander Whitestone
|
b5873e9e3d
|
Initial structure: knowledge store, scripts, metrics, templates
|
2026-04-14 11:17:01 -04:00 |
|
|
|
8252ef5b80
|
Initial commit
|
2026-04-14 15:11:53 +00:00 |
|