Alexander Payne
|
7bcec41d16
|
feat: add transcript_harvester — rule-based knowledge extraction from sessions
Test / pytest (pull_request) Failing after 12s
Implements issue #195: harvest Q&A pairs, decisions, patterns, preferences,
and error-fix links from Hermes session JSONL transcripts without an LLM.
- scripts/transcript_harvester.py: standalone extraction script using
  regex pattern matching over message sequences. Handles 5 categories:
  * qa_pair: user questions ending in "?" followed by assistant answers
  * decision: explicit choice statements ("I'll use", "we decided", "let's")
  * pattern: procedural knowledge ("Here's the process", "steps to")
  * preference: personal or team inclinations ("I prefer", "Alexander always")
  * error_fix: error statement followed by fix action within 8 messages
- knowledge/transcripts/: output directory for harvested knowledge
- Transcript JSON contains all entries with session_id, timestamps, type
- Report (transcript_report.md) gives category counts and sample entries
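
A minimal sketch of how rule-based extraction of this kind could look, covering three of the five categories (qa_pair, decision, error_fix). The regexes, the `harvest` function, and the `role`/`content` message shape are assumptions for illustration; the commit does not show the actual patterns in scripts/transcript_harvester.py. Only the trigger phrases for decisions and the 8-message fix window come from the description above.

```python
import re

# Hypothetical patterns. The decision phrases are quoted in the commit message;
# the error and fix keywords are assumptions made for this sketch.
DECISION_RE = re.compile(r"\b(I'll use|we decided|let's)\b", re.IGNORECASE)
ERROR_RE = re.compile(r"\b(error|exception|traceback|failed)\b", re.IGNORECASE)
FIX_RE = re.compile(r"\b(fixed|resolved|the fix)\b", re.IGNORECASE)
FIX_WINDOW = 8  # a fix must appear within 8 messages of the error


def harvest(messages):
    """Extract entries from a list of {'role': ..., 'content': ...} dicts."""
    entries = []
    for i, msg in enumerate(messages):
        text = (msg.get("content") or "").strip()
        role = msg.get("role")

        # qa_pair: a user question ending in "?" answered by the next message.
        if role == "user" and text.endswith("?") and i + 1 < len(messages):
            nxt = messages[i + 1]
            if nxt.get("role") == "assistant":
                entries.append({
                    "type": "qa_pair",
                    "question": text,
                    "answer": nxt.get("content") or "",
                })

        # decision: an explicit choice statement anywhere in the message.
        if DECISION_RE.search(text):
            entries.append({"type": "decision", "text": text})

        # error_fix: an error statement linked to the first fix in the window.
        if ERROR_RE.search(text):
            for later in messages[i + 1 : i + 1 + FIX_WINDOW]:
                later_text = later.get("content") or ""
                if FIX_RE.search(later_text):
                    entries.append({
                        "type": "error_fix",
                        "error": text,
                        "fix": later_text,
                    })
                    break
    return entries
```

Scanning a fixed window after each error statement keeps the linking step simple. A loose window can also over-match, which may be one reason error_fix dominates the batch counts reported below.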
Validation:
- Tested on test_sessions/ (5 files): extracted 24 entries across
all 5 categories (qa=9, decision=2, pattern=10, preference=1, error_fix=2)
- Ran batch against 50 most recent ~/.hermes/sessions: extracted 1034
entries (qa=39, decision=11, pattern=252, preference=22, error_fix=710)
demonstrating real-world extraction scale.
Closes #195
|
2026-04-26 15:09:45 -04:00 |
|
Timmy
|
b32d316023
|
feat(#10): knowledge file format schema + example knowledge files
- SCHEMA.md: full specification for index.json and YAML knowledge files
- knowledge/global/pitfalls.yaml: 8 cross-repo pitfalls
- knowledge/global/tool-quirks.yaml: 7 environment quirk facts
- knowledge/repos/hermes-agent.yaml: 8 per-repo pitfalls (cron, paths, SSH)
- knowledge/repos/the-nexus.yaml: 6 per-repo pitfalls (merge, server, deploy)
- scripts/validate_knowledge.py: schema validator (29 facts, all passing)
- knowledge/index.json: populated with 29 seed facts from real fleet data
Design decisions:
- YAML for humans, index.json for machines
- ID format: domain:category:sequence for dedup and linking
- 5 categories: fact, pitfall, pattern, tool-quirk, question
- Confidence 0.0-1.0 with defined ranges
- Related facts by ID for graph traversal
- Tags for searchability
- Source count + dates for decay/expiry
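
A minimal sketch of checking a fact against two of the design decisions above: the domain:category:sequence ID format and the 0.0-1.0 confidence range. This is not the code in scripts/validate_knowledge.py. The function names, the `id` and `confidence` field names, the numeric sequence, and the assumption that the ID's category segment is one of the 5 listed categories are all hypothetical.

```python
# The five categories named in the design decisions.
CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}


def parse_fact_id(fact_id):
    """Split 'domain:category:sequence' into (domain, category, int sequence)."""
    parts = fact_id.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected domain:category:sequence, got {fact_id!r}")
    domain, category, sequence = parts
    if not domain:
        raise ValueError("domain must not be empty")
    if category not in CATEGORIES:
        raise ValueError(f"unknown category {category!r}")
    if not sequence.isdigit():  # assumption: the sequence is a plain number
        raise ValueError(f"sequence must be numeric, got {sequence!r}")
    return domain, category, int(sequence)


def validate_fact(fact):
    """Return a list of error strings; an empty list means the fact passed."""
    errors = []
    try:
        parse_fact_id(fact.get("id", ""))
    except ValueError as exc:
        errors.append(str(exc))

    conf = fact.get("confidence")
    is_number = isinstance(conf, (int, float)) and not isinstance(conf, bool)
    if not is_number or not 0.0 <= conf <= 1.0:
        errors.append("confidence must be a number in [0.0, 1.0]")
    return errors
```

Because the ID is structured, two facts with the same ID can be treated as duplicates, and the list of related facts can store IDs directly for graph traversal.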
Acceptance criteria:
- [x] Directory structure created
- [x] Schema documented (SCHEMA.md)
- [x] index.json with real facts (29 total)
- [x] Example knowledge files for 2 repos (hermes-agent, the-nexus)
- [x] Validation script passes
|
2026-04-14 14:21:21 -04:00 |
|