compounding-intelligence

Timmy_Foundation/compounding-intelligence

Commit Graph

Author	SHA1	Message	Date

Alexander Payne

7bcec41d16

feat: add transcript_harvester — rule-based knowledge extraction from sessions

Test / pytest (pull_request) Failing after 12s

Implements issue #195 — harvest Q&A pairs, decisions, patterns, preferences,
and error-fix links from Hermes session JSONL transcripts without an LLM.

- scripts/transcript_harvester.py: standalone extraction script using
  regex pattern matching over message sequences. Handles 5 categories:
  * qa_pair — user questions ending in ? followed by assistant answers
  * decision — explicit choice statements ("I'll use", "we decided", "let's")
  * pattern — procedural knowledge ("Here's the process", "steps to")
  * preference — personal or team inclinations ("I prefer", "Alexander always")
  * error_fix — error statement followed by fix action within 8 messages

- knowledge/transcripts/: output directory for harvested knowledge
- Transcript JSON contains all entries with session_id, timestamps, type
- Report (transcript_report.md) gives category counts and sample entries
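
The five extraction rules listed above can be sketched roughly as follows. This is a minimal illustration, not the contents of scripts/transcript_harvester.py: the message shape (dicts with "role" and "content"), the function name `harvest`, and the error and fix trigger phrases are all assumptions. Only the quoted decision, pattern, and preference phrases and the 8-message window come from the commit message.

```python
import re

# Trigger phrases for decision/pattern/preference mirror the commit message examples.
DECISION_RE = re.compile(r"\b(I'll use|we decided|let's)\b", re.IGNORECASE)
PATTERN_RE = re.compile(r"(here's the process|steps to)", re.IGNORECASE)
PREFERENCE_RE = re.compile(r"\b(I prefer|always)\b", re.IGNORECASE)
# Hypothetical error/fix triggers: the commit message does not list them.
ERROR_RE = re.compile(r"\b(error|traceback|failed)\b", re.IGNORECASE)
FIX_RE = re.compile(r"\b(fixed|the fix is|resolved)\b", re.IGNORECASE)
ERROR_FIX_WINDOW = 8  # "within 8 messages", per the commit message


def harvest(messages):
    """Scan a list of {"role", "content"} messages and return typed entries."""
    entries = []
    for i, msg in enumerate(messages):
        text = msg["content"]
        nxt = messages[i + 1] if i + 1 < len(messages) else None

        # qa_pair: a user question ending in "?" followed by an assistant answer.
        if (msg["role"] == "user" and text.rstrip().endswith("?")
                and nxt is not None and nxt["role"] == "assistant"):
            entries.append({"type": "qa_pair", "question": text,
                            "answer": nxt["content"]})

        if DECISION_RE.search(text):
            entries.append({"type": "decision", "text": text})
        if PATTERN_RE.search(text):
            entries.append({"type": "pattern", "text": text})
        if PREFERENCE_RE.search(text):
            entries.append({"type": "preference", "text": text})

        # error_fix: an error statement linked to the first fix in the window.
        if ERROR_RE.search(text):
            for later in messages[i + 1 : i + 1 + ERROR_FIX_WINDOW]:
                if FIX_RE.search(later["content"]):
                    entries.append({"type": "error_fix", "error": text,
                                    "fix": later["content"]})
                    break
    return entries
```

Broad triggers such as "error" will match often, so a real harvester would likely need tighter patterns or deduplication for the error_fix category.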

Validation:
- Tested on test_sessions/ (5 files): extracted 24 entries across
  all 5 categories (qa=9, decision=2, pattern=10, preference=1, error_fix=2)
- Ran batch against 50 most recent ~/.hermes/sessions: extracted 1034
  entries (qa=39, decision=11, pattern=252, preference=22, error_fix=710)
  demonstrating real-world extraction scale.
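
As a sanity check on the figures above, the per-category counts can be tallied and summed. The sketch below assumes each harvested entry carries a "type" field (as the commit message states) and uses a hypothetical helper name `tally`; the short labels qa, decision, pattern, preference, and error_fix are copied from the commit message.

```python
from collections import Counter


def tally(entries):
    """Count harvested entries per category using their "type" field."""
    return Counter(entry["type"] for entry in entries)


# Category counts as reported in the commit message.
test_sessions = {"qa": 9, "decision": 2, "pattern": 10,
                 "preference": 1, "error_fix": 2}
recent_batch = {"qa": 39, "decision": 11, "pattern": 252,
                "preference": 22, "error_fix": 710}

print(sum(test_sessions.values()))  # 24
print(sum(recent_batch.values()))   # 1034
```

Both sums agree with the reported totals of 24 and 1034 entries.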

Closes #195

2026-04-26 15:09:45 -04:00

Timmy

b32d316023

feat(#10): knowledge file format schema + example knowledge files

- SCHEMA.md: full specification for index.json and YAML knowledge files
- knowledge/global/pitfalls.yaml: 8 cross-repo pitfalls
- knowledge/global/tool-quirks.yaml: 7 environment quirk facts
- knowledge/repos/hermes-agent.yaml: 8 per-repo pitfalls (cron, paths, SSH)
- knowledge/repos/the-nexus.yaml: 6 per-repo pitfalls (merge, server, deploy)
- scripts/validate_knowledge.py: schema validator (29 facts, all passing)
- knowledge/index.json: populated with 29 seed facts from real fleet data

Design decisions:
- YAML for humans, index.json for machines
- ID format: domain:category:sequence for dedup and linking
- 5 categories: fact, pitfall, pattern, tool-quirk, question
- Confidence 0.0-1.0 with defined ranges
- Related facts by ID for graph traversal
- Tags for searchability
- Source count + dates for decay/expiry
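
The ID, category, and confidence rules above could be checked along these lines. This is a sketch under stated assumptions, not the logic of scripts/validate_knowledge.py: the field names "id", "category", and "confidence", the allowed characters in each ID segment, the rule that the ID's middle segment must equal the category field, and the sample ID are all hypothetical. SCHEMA.md is the authoritative definition.

```python
import re

# The five categories named in the commit message.
CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}

# domain:category:sequence, with an assumed character set per segment.
ID_RE = re.compile(
    r"^(?P<domain>[A-Za-z0-9_.-]+):(?P<category>[a-z-]+):(?P<sequence>\d+)$"
)


def validate_fact(fact):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []

    match = ID_RE.match(str(fact.get("id", "")))
    if match is None:
        errors.append("id must look like domain:category:sequence")

    category = fact.get("category")
    if category not in CATEGORIES:
        errors.append(f"unknown category: {category!r}")
    elif match is not None and match.group("category") != category:
        # Assumed consistency rule, not confirmed by the commit message.
        errors.append("category in id does not match category field")

    confidence = fact.get("confidence")
    if (isinstance(confidence, bool)
            or not isinstance(confidence, (int, float))
            or not 0.0 <= confidence <= 1.0):
        errors.append("confidence must be a number in [0.0, 1.0]")

    return errors
```

For example, `validate_fact({"id": "hermes-agent:pitfall:001", "category": "pitfall", "confidence": 0.9})` returns an empty list, while a fact with a malformed ID, an unknown category, and a confidence of 1.5 yields three errors.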

Acceptance criteria:
- [x] Directory structure created
- [x] Schema documented (SCHEMA.md)
- [x] index.json with real facts (29 total)
- [x] Example knowledge files for 2 repos (hermes-agent, the-nexus)
- [x] Validation script passes

2026-04-14 14:21:21 -04:00

Alexander Whitestone

b5873e9e3d

Initial structure: knowledge store, scripts, metrics, templates

2026-04-14 11:17:01 -04:00

3 Commits