Files
compounding-intelligence/knowledge
Alexander Payne 7bcec41d16
Some checks failed
Test / pytest (pull_request) Failing after 12s
feat: add transcript_harvester — rule-based knowledge extraction from sessions
Implements issue #195 — harvest Q&A pairs, decisions, patterns, preferences,
and error-fix links from Hermes session JSONL transcripts without LLM.

- scripts/transcript_harvester.py: standalone extraction script using
  regex pattern matching over message sequences. Handles 5 categories:
  * qa_pair — user questions ending in ? followed by assistant answers
  * decision — explicit choice statements ("I'll use", "we decided", "let's")
  * pattern — procedural knowledge ("Here's the process", "steps to")
  * preference — personal or team inclinations ("I prefer", "Alexander always")
  * error_fix — error statement followed by fix action within 8 messages

- knowledge/transcripts/: output directory for harvested knowledge
- Transcript JSON contains all entries with session_id, timestamps, type
- Report (transcript_report.md) gives category counts and sample entries

Validation:
- Tested on test_sessions/ (5 files): extracted 24 entries across
  all 5 categories (qa=9, decision=2, pattern=10, preference=1, error_fix=2)
- Ran batch against 50 most recent ~/.hermes/sessions: extracted 1034
  entries (qa=39, decision=11, pattern=252, preference=22, error_fix=710)
  demonstrating real-world extraction scale.

Closes #195
2026-04-26 15:09:45 -04:00
..