# GENOME.md — compounding-intelligence

*Auto-generated codebase genome. Addresses timmy-home#676.*

---

## Project Overview

**What:** A system that turns 1B+ daily agent tokens into durable, compounding fleet intelligence.

**Why:** Every agent session starts at zero. The same mistakes get made repeatedly — the same HTTP 405 is rediscovered as a branch protection issue, the same token path is searched for from scratch. Intelligence evaporates when the session ends.

**How:** Three pipelines form a compounding loop:

```
SESSION ENDS → HARVESTER → KNOWLEDGE STORE → BOOTSTRAPPER → NEW SESSION STARTS SMARTER
                                  ↓
                              MEASURER → Prove it's working
```

**Status:** Early stage. Template and test scaffolding exist. Core pipeline scripts (`harvester.py`, `bootstrapper.py`, `measurer.py`, `session_reader.py`) are planned but not yet implemented. The knowledge extraction prompt is complete and validated.

---

## Architecture

```mermaid
graph TD
    A[Session Transcript<br/>.jsonl] --> B[Harvester]
    B --> C{Extract Knowledge}
    C --> D[knowledge/index.json]
    C --> E[knowledge/global/*.md]
    C --> F[knowledge/repos/{repo}.md]
    C --> G[knowledge/agents/{agent}.md]
    D --> H[Bootstrapper]
    H --> I[Bootstrap Context<br/>2k token injection]
    I --> J[New Session<br/>starts smarter]
    J --> A
    D --> K[Measurer]
    K --> L[metrics/dashboard.md]
    K --> M[Velocity / Hit Rate<br/>Error Reduction]
```

### Pipeline 1: Harvester

**Status:** Prompt designed. Script not implemented.

Reads finished session transcripts (JSONL). Uses `templates/harvest-prompt.md` to extract durable knowledge into five categories:

| Category | Description | Example |
|----------|-------------|---------|
| `fact` | Concrete, verifiable information | "Repository X has 5 files" |
| `pitfall` | Errors encountered, wrong assumptions | "Token is at ~/.config/gitea/token, not env var" |
| `pattern` | Successful action sequences | "Deploy: test → build → push → webhook" |
| `tool-quirk` | Environment-specific behaviors | "URL format requires trailing slash" |
| `question` | Identified but unanswered | "Need optimal batch size for harvesting" |

Output schema per knowledge item:

```json
{
  "fact": "One sentence description",
  "category": "fact|pitfall|pattern|tool-quirk|question",
  "repo": "repo-name or 'global'",
  "confidence": 0.0-1.0
}
```
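None of the pipeline scripts exist yet, so the following is illustrative only: a minimal sketch of how `harvester.py` might represent and validate items against this schema before writing them to the store. The class and function names are hypothetical.

```python
from dataclasses import dataclass

CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}

@dataclass
class KnowledgeItem:
    """One extracted fact: the atomic unit of the knowledge store."""
    fact: str
    category: str
    repo: str          # repo name, or "global" for cross-repo knowledge
    confidence: float  # 0.0-1.0, per the harvest-prompt rubric

def parse_item(raw: dict) -> KnowledgeItem:
    """Reject LLM output that drifts from the schema before it is stored."""
    if raw.get("category") not in CATEGORIES:
        raise ValueError(f"unknown category: {raw.get('category')!r}")
    confidence = float(raw.get("confidence", -1.0))
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    if not raw.get("fact") or not raw.get("repo"):
        raise ValueError("fact and repo are required")
    return KnowledgeItem(raw["fact"], raw["category"], raw["repo"], confidence)
```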
### Pipeline 2: Bootstrapper

**Status:** Not implemented.

Queries the knowledge store before session start. Assembles a compact 2k-token context from relevant facts. Injects it into session startup so the agent begins with full situational awareness.
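A minimal sketch of the planned assembly step, assuming `knowledge/index.json` is a flat JSON array of items in the schema above. The confidence cutoff mirrors the Bootstrap Context abstraction below; the 4-chars-per-token estimate and the function name are assumptions.

```python
import json

def assemble_bootstrap(index_path: str, repo: str, budget_tokens: int = 2000) -> str:
    """Select high-confidence, relevant, deduplicated facts until the budget is spent."""
    with open(index_path) as f:
        items = json.load(f)
    # Relevant = this repo or global; confident = above the 0.7 threshold.
    relevant = [i for i in items
                if i["repo"] in (repo, "global") and i["confidence"] > 0.7]
    relevant.sort(key=lambda i: i["confidence"], reverse=True)
    lines, seen, used = [], set(), 0
    for item in relevant:
        fact = item["fact"].strip()
        if fact.lower() in seen:      # crude dedup across sessions
            continue
        cost = len(fact) // 4 + 1     # rough ~4-chars-per-token estimate
        if used + cost > budget_tokens:
            break
        seen.add(fact.lower())
        lines.append(f"- [{item['category']}] {fact}")
        used += cost
    return "\n".join(lines)
```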
### Pipeline 3: Measurer

**Status:** Not implemented.

Tracks compounding metrics: knowledge velocity (facts/day), error reduction (%), hit rate (knowledge used / knowledge available), and task completion improvement.
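A sketch of how `measurer.py` might compute these numbers. The per-session record shape (`facts_used`, `facts_available`, `errors`) is assumed, since nothing logs those counts yet.

```python
def compounding_metrics(sessions: list[dict], total_facts: int, days: int) -> dict:
    """Compute the headline metrics from a hypothetical session history."""
    used = sum(s["facts_used"] for s in sessions)
    available = sum(s["facts_available"] for s in sessions)
    # Compare the first half of the history against the second half.
    midpoint = len(sessions) // 2
    early_errors = sum(s["errors"] for s in sessions[:midpoint])
    late_errors = sum(s["errors"] for s in sessions[midpoint:])
    return {
        "velocity_facts_per_day": total_facts / max(days, 1),
        "hit_rate": used / available if available else 0.0,
        "error_reduction_pct": (100 * (1 - late_errors / early_errors)
                                if early_errors else 0.0),
    }
```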
---

## Directory Structure

```
compounding-intelligence/
├── README.md                                # Project overview and architecture
├── GENOME.md                                # This file (codebase genome)
├── knowledge/                               # [PLANNED] Knowledge store
│   ├── index.json                           # Machine-readable fact index
│   ├── global/                              # Cross-repo knowledge
│   ├── repos/{repo}.md                      # Per-repo knowledge
│   └── agents/{agent}.md                    # Agent-type notes
├── scripts/
│   ├── test_harvest_prompt.py               # Basic prompt validation (2.5KB)
│   └── test_harvest_prompt_comprehensive.py # Full prompt structure test (6.8KB)
├── templates/
│   └── harvest-prompt.md                    # Knowledge extraction prompt (3.5KB)
├── test_sessions/
│   ├── session_success.jsonl                # Happy path test data
│   ├── session_failure.jsonl                # Failure path test data
│   ├── session_partial.jsonl                # Incomplete session test data
│   ├── session_patterns.jsonl               # Pattern extraction test data
│   └── session_questions.jsonl              # Question identification test data
└── metrics/                                 # [PLANNED] Compounding metrics
    └── dashboard.md
```

---

## Entry Points and Data Flow

### Entry Point 1: Knowledge Extraction (Harvester)

```
Input: Session transcript (JSONL)
    ↓
templates/harvest-prompt.md (LLM prompt)
    ↓
Knowledge items (JSON array)
    ↓
Output: knowledge/index.json + per-repo/per-agent markdown files
```
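A sketch of the first two hops of this flow: roughly what `session_reader.py` plus the prompt-assembly half of `harvester.py` could look like. How the transcript is embedded into the prompt is an assumption; the real contract will be defined when the scripts are written.

```python
import json
from pathlib import Path

def read_session(path: str) -> list[dict]:
    """Parse a JSONL transcript: one JSON event per non-empty line."""
    return [json.loads(line)
            for line in Path(path).read_text().splitlines()
            if line.strip()]

def build_harvest_request(transcript_path: str) -> str:
    """Splice a transcript into the extraction prompt for one LLM call."""
    prompt = Path("templates/harvest-prompt.md").read_text()
    transcript = "\n".join(json.dumps(e) for e in read_session(transcript_path))
    return f"{prompt}\n\n## Session Transcript\n\n{transcript}"
```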
### Entry Point 2: Session Bootstrap (Bootstrapper)

```
Input: Session context (repo, agent type, task type)
    ↓
knowledge/index.json (query relevant facts)
    ↓
2k-token bootstrap context
    ↓
Output: Injected into session startup
```

### Entry Point 3: Measurement (Measurer)

```
Input: knowledge/index.json + session history
    ↓
Velocity, hit rate, error reduction calculations
    ↓
Output: metrics/dashboard.md
```

---

## Key Abstractions

### Knowledge Item

The atomic unit. One sentence, one category, one confidence score. Designed to be small enough that 1000 items fit in a 2k-token bootstrap context.

### Knowledge Store

A directory structure that mirrors the fleet's mental model:

- `global/` — knowledge that applies everywhere (tool quirks, environment facts)
- `repos/` — knowledge specific to each repo
- `agents/` — knowledge specific to each agent type
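How individual items map onto this layout is not specified anywhere yet. One plausible routing rule, as a sketch; note that routing to `agents/` would require an agent field the current item schema does not define.

```python
from pathlib import Path

def target_file(item: dict, root: Path = Path("knowledge")) -> Path:
    """Route a knowledge item to the markdown file that mirrors this layout."""
    if item["repo"] == "global":
        # Assumed rule: one file per category under global/, e.g. tool-quirk.md.
        return root / "global" / f"{item['category']}.md"
    return root / "repos" / f"{item['repo']}.md"
```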
### Confidence Score

0.0–1.0 scale. Defines how certain the harvester is about each extracted fact:

- 0.9–1.0: Explicitly stated with verification
- 0.7–0.8: Clearly implied by multiple data points
- 0.5–0.6: Suggested but not fully verified
- 0.3–0.4: Inferred from limited data
- 0.1–0.2: Speculative or uncertain

### Bootstrap Context

The 2k-token injection that a new session receives. Assembled from the most relevant knowledge items for the current task, filtered by confidence > 0.7, deduplicated, and compressed.

---

## API Surface

### Internal (scripts not yet implemented)

| Script | Input | Output | Status |
|--------|-------|--------|--------|
| `harvester.py` | Session JSONL path | Knowledge items JSON | PLANNED |
| `bootstrapper.py` | Repo + agent type | 2k-token context string | PLANNED |
| `measurer.py` | Knowledge store path | Metrics JSON | PLANNED |
| `session_reader.py` | Session JSONL path | Parsed transcript | PLANNED |

### Prompt (`templates/harvest-prompt.md`)

The extraction prompt is the core "API." It takes a session transcript and returns structured JSON. It defines:

- Five extraction categories
- Output format (JSON array of knowledge items)
- Confidence scoring rubric
- Constraints (no hallucination, specificity, relevance, brevity)
- Example input/output pair

---

## Test Coverage

### What Exists

| File | Tests | Coverage |
|------|-------|----------|
| `scripts/test_harvest_prompt.py` | 2 tests | Prompt file existence, sample transcript |
| `scripts/test_harvest_prompt_comprehensive.py` | 5 tests | Prompt structure, categories, fields, confidence scoring, size limits |
| `test_sessions/*.jsonl` | 5 sessions | Success, failure, partial, patterns, questions |

### What's Missing

1. **Harvester integration test** — Does the prompt actually extract correct knowledge from real transcripts?
2. **Bootstrapper test** — Does it assemble relevant context correctly?
3. **Knowledge store test** — Does `index.json` maintain consistency?
4. **Confidence calibration test** — Do high-confidence facts actually prove true in later sessions?
5. **Deduplication test** — Are duplicate facts across sessions handled?
6. **Staleness test** — How does the system handle outdated knowledge?

---

## Security Considerations

1. **No secrets in knowledge store** — The harvester must filter API keys, tokens, and credentials out of extracted facts. The prompt constraints mention this, but there is no automated guard.
2. **Knowledge poisoning** — A malicious or corrupted session could inject false facts. Confidence scoring partially mitigates this, but there is no verification step.
3. **Access control** — The knowledge store has no access control; any process that can read the directory can read all facts. In a multi-tenant setup, this is a concern.
4. **Transcript privacy** — Session transcripts may contain user data. The harvester must not extract personally identifiable information into the knowledge store.

---

## The 100x Path (from README)

```
Month 1: 15,000 facts, sessions 20% faster
Month 2: 45,000 facts, sessions 40% faster, first-try success up 30%
Month 3: 90,000 facts, fleet measurably smarter per token
```

Each new session is better than the last. The intelligence compounds.

---

*Generated by codebase-genome pipeline. Ref: timmy-home#676.*