Alexander Payne 7bcec41d16
feat: add transcript_harvester — rule-based knowledge extraction from sessions
Implements issue #195 — harvest Q&A pairs, decisions, patterns, preferences,
and error-fix links from Hermes session JSONL transcripts without an LLM.

- scripts/transcript_harvester.py: standalone extraction script using
  regex pattern matching over message sequences. Handles 5 categories:
  * qa_pair — user questions ending in ? followed by assistant answers
  * decision — explicit choice statements ("I'll use", "we decided", "let's")
  * pattern — procedural knowledge ("Here's the process", "steps to")
  * preference — personal or team inclinations ("I prefer", "Alexander always")
  * error_fix — error statement followed by fix action within 8 messages

- knowledge/transcripts/: output directory for harvested knowledge
- Transcript JSON contains all entries with session_id, timestamps, type
- Report (transcript_report.md) gives category counts and sample entries

Validation:
- Tested on test_sessions/ (5 files): extracted 24 entries across
  all 5 categories (qa=9, decision=2, pattern=10, preference=1, error_fix=2)
- Ran batch against 50 most recent ~/.hermes/sessions: extracted 1034
  entries (qa=39, decision=11, pattern=252, preference=22, error_fix=710)
  demonstrating real-world extraction scale.

Closes #195
2026-04-26 15:09:45 -04:00

Compounding Intelligence

Turn 1B+ daily tokens into durable, compounding fleet intelligence.

The Problem

20,991 sessions on disk. Each one starts at zero. Every agent rediscovers that the same HTTP 405 is a branch-protection issue. The intelligence from a million tokens of work evaporates when the session ends.

The Solution

Three pipelines that form a compounding loop:

SESSION ENDS → HARVESTER → KNOWLEDGE STORE → BOOTSTRAPPER → NEW SESSION STARTS SMARTER
                              ↓
                         MEASURER → Prove it's working

Architecture

Pipeline 1: Harvester

Reads finished session transcripts. Extracts durable knowledge (facts, pitfalls, patterns, tool quirks) and stores it in knowledge/.
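Rule-based extraction of this kind can be done with regex cues over the message sequence, along the lines of the categories in the commit message above. A minimal sketch, assuming each message is a dict with `role` and `content` keys; the cue phrases and function names here are illustrative, not the actual patterns in harvester.py:

```python
import re

# Hypothetical cue phrases per category; the real harvester's
# regexes may be broader or stricter.
CUES = {
    "decision": re.compile(r"\b(I'll use|we decided|let's)\b", re.IGNORECASE),
    "pattern": re.compile(r"\b(here's the process|steps to)\b", re.IGNORECASE),
    "preference": re.compile(r"\bI prefer\b", re.IGNORECASE),
}

def harvest(messages):
    """Scan a session's messages and return (category, text) entries."""
    entries = []
    for i, msg in enumerate(messages):
        text = msg.get("content", "")
        # qa_pair: a user question followed by an assistant reply.
        if (msg.get("role") == "user" and text.rstrip().endswith("?")
                and i + 1 < len(messages)
                and messages[i + 1].get("role") == "assistant"):
            entries.append(("qa_pair", text))
        for category, cue in CUES.items():
            if cue.search(text):
                entries.append((category, text))
    return entries
```

Keeping the cues in one table makes it cheap to tune precision per category once false-positive rates are known.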

Pipeline 2: Bootstrapper

Before a session starts, queries the knowledge store for relevant facts, assembles a compact ~2k-token context, and injects it into the session so it starts with full situational awareness.
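The budgeted-context step can be sketched as a greedy selection over the fact index. This assumes knowledge/index.json maps fact IDs to objects with `text` and `tags` fields, which is a guess at the schema, not the documented format of bootstrapper.py:

```python
import json
from pathlib import Path

def bootstrap_context(index_path, repo, budget_tokens=2000):
    """Pick facts tagged for this repo until the token budget is spent."""
    index = json.loads(Path(index_path).read_text())
    lines, used = [], 0
    for fact in index.values():
        if repo not in fact.get("tags", []):
            continue
        cost = len(fact["text"].split())  # crude whitespace token estimate
        if used + cost > budget_tokens:
            break
        lines.append(f"- {fact['text']}")
        used += cost
    return "\n".join(lines)
```

A real implementation would rank facts by relevance before truncating; the point here is only the hard budget that keeps the injected context compact.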

Pipeline 3: Measurer

Tracks whether compounding is happening: knowledge velocity, error reduction, hit rate, task completion. A daily report proves the loop is working.
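One of these metrics, hit rate, reduces to the fraction of sessions whose injected facts were actually used. A hedged sketch; the field names are assumptions, and measurer.py may track different signals entirely:

```python
def hit_rate(sessions):
    """Fraction of sessions that referenced at least one injected fact.

    sessions: list of dicts like
        {"injected": [fact_id, ...], "referenced": [fact_id, ...]}
    """
    if not sessions:
        return 0.0
    hits = sum(
        1 for s in sessions
        if set(s["injected"]) & set(s["referenced"])
    )
    return hits / len(sessions)
```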

Directory Structure

├── knowledge/
│   ├── index.json          # Machine-readable fact index
│   ├── global/             # Cross-repo knowledge
│   ├── repos/{repo}.md     # Per-repo knowledge
│   └── agents/{agent}.md   # Agent-type notes
├── scripts/
│   ├── harvester.py        # Post-session knowledge extractor
│   ├── bootstrapper.py     # Pre-session context loader
│   ├── measurer.py         # Compounding metrics
│   └── session_reader.py   # JSONL parser
├── metrics/
│   └── dashboard.md        # Human-readable status
└── templates/
    ├── bootstrap-context.md
    └── harvest-prompt.md
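The shared session_reader.py layer amounts to streaming one JSON object per line. A minimal sketch of a JSONL reader for the transcript format; the actual module may normalize or filter fields:

```python
import json

def read_session(path):
    """Yield one message dict per non-empty line of a session transcript."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)
```

Yielding lazily keeps memory flat even on very large transcripts, which matters when batch-harvesting tens of thousands of sessions.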

The 100x Path

Month 1:  15,000 facts, sessions 20% faster
Month 2:  45,000 facts, sessions 40% faster, first-try success up 30%
Month 3:  90,000 facts, fleet measurably smarter per token

Each new session is better than the last. The intelligence compounds.

Issues

See all issues for the full roadmap.

Epics:

  • EPIC 1: Session Harvester (#2)
  • EPIC 2: Knowledge Store & Bootstrap (#3)
  • EPIC 3: Compounding Measurement (#4)
  • EPIC 4: Retroactive Harvest (#5)