Alexander Payne
|
7bcec41d16
|
feat: add transcript_harvester — rule-based knowledge extraction from sessions
Test / pytest (pull_request) Failing after 12s
Implements issue #195 — harvest Q&A pairs, decisions, patterns, preferences,
and error-fix links from Hermes session JSONL transcripts without an LLM.
- scripts/transcript_harvester.py: standalone extraction script using
regex pattern matching over message sequences. Handles 5 categories:
* qa_pair — user questions ending in ? followed by assistant answers
* decision — explicit choice statements ("I'll use", "we decided", "let's")
* pattern — procedural knowledge ("Here's the process", "steps to")
* preference — personal or team inclinations ("I prefer", "Alexander always")
* error_fix — error statement followed by fix action within 8 messages
- knowledge/transcripts/: output directory for harvested knowledge
- Transcript JSON contains all entries with session_id, timestamps, type
- Report (transcript_report.md) gives category counts and sample entries
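
For illustration, the rule-based pass described above might look roughly like the sketch below. It is not the code in scripts/transcript_harvester.py: the regexes, the `harvest` function name, the error and fix keywords, and the assumed message shape (dicts with `role` and `content`) are all hypothetical. Only the five category names, the quoted trigger phrases, and the 8-message window come from this commit message.

```python
import re

# Hypothetical patterns; the script's real regexes are not shown in this commit.
DECISION = re.compile(r"\b(I'll use|we decided|let's)\b", re.IGNORECASE)
PATTERN = re.compile(r"(here's the process|steps to)", re.IGNORECASE)
PREFERENCE = re.compile(r"\b(I prefer|Alexander always)\b", re.IGNORECASE)
ERROR = re.compile(r"\b(error|exception|traceback|failed)\b", re.IGNORECASE)  # assumed keywords
FIX = re.compile(r"\b(fixed|the fix is|resolved)\b", re.IGNORECASE)  # assumed keywords
FIX_WINDOW = 8  # an error is linked to a fix found within the next 8 messages


def harvest(messages):
    """Extract entries from a list of {'role': ..., 'content': ...} dicts (assumed shape)."""
    entries = []
    for i, msg in enumerate(messages):
        text = msg.get("content") or ""
        nxt = messages[i + 1] if i + 1 < len(messages) else None

        # qa_pair: a user message ending in "?" directly followed by an assistant reply
        if (msg.get("role") == "user" and text.rstrip().endswith("?")
                and nxt is not None and nxt.get("role") == "assistant"):
            entries.append({"type": "qa_pair", "question": text,
                            "answer": nxt.get("content") or ""})

        # Single-message categories matched by trigger phrase
        for kind, rx in (("decision", DECISION), ("pattern", PATTERN),
                         ("preference", PREFERENCE)):
            if rx.search(text):
                entries.append({"type": kind, "text": text})

        # error_fix: an error statement, then the first fix statement inside the window
        if ERROR.search(text):
            for later in messages[i + 1:i + 1 + FIX_WINDOW]:
                later_text = later.get("content") or ""
                if FIX.search(later_text):
                    entries.append({"type": "error_fix", "error": text,
                                    "fix": later_text})
                    break
    return entries
```

Each message is checked against every rule independently, so one message can yield entries in more than one category. A stricter variant could stop at the first match.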
Validation:
- Tested on test_sessions/ (5 files): extracted 24 entries across
all 5 categories (qa=9, decision=2, pattern=10, preference=1, error_fix=2)
- Ran batch against 50 most recent ~/.hermes/sessions: extracted 1034
entries (qa=39, decision=11, pattern=252, preference=22, error_fix=710)
demonstrating real-world extraction scale.
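
The per-category tallies quoted above are the kind of figure a short count over the harvested entries would produce. A minimal sketch, assuming each entry is a dict with a `type` key; the helper names `category_counts` and `format_counts` are hypothetical and not taken from the script:

```python
from collections import Counter


def category_counts(entries):
    """Count harvested entries per category, keyed by the assumed 'type' field."""
    return Counter(entry["type"] for entry in entries)


def format_counts(counts):
    """Render counts as 'name=n' pairs in alphabetical order by category name."""
    return ", ".join(f"{name}={n}" for name, n in sorted(counts.items()))
```

Summing the values of the returned counter should equal the total entry count, which gives a quick consistency check on a report.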
Closes #195
|
2026-04-26 15:09:45 -04:00 |
|