From cdb71adddf1103c9757775054e3b3f38f1871c0f Mon Sep 17 00:00:00 2001
From: Alexander Whitestone
Date: Tue, 14 Apr 2026 22:58:55 +0000
Subject: [PATCH] =?UTF-8?q?docs:=20GENOME.md=20=E2=80=94=20full=20codebase?=
 =?UTF-8?q?=20analysis=20#676?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 GENOME.md | 239 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 239 insertions(+)
 create mode 100644 GENOME.md

diff --git a/GENOME.md b/GENOME.md
new file mode 100644
index 0000000..f9b3a7d
--- /dev/null
+++ b/GENOME.md
@@ -0,0 +1,239 @@
+# GENOME.md — compounding-intelligence
+
+*Auto-generated codebase genome. See timmy-home#676.*
+
+---
+
+## Project Overview
+
+**What:** A system that turns 1B+ daily agent tokens into durable, compounding fleet intelligence.
+
+**Why:** Every agent session starts at zero. The same mistakes get made repeatedly — the same HTTP 405 is rediscovered as a branch protection issue, the same token path is searched for from scratch. Intelligence evaporates when the session ends.
+
+**How:** Three pipelines form a compounding loop:
+
+```
+SESSION ENDS → HARVESTER → KNOWLEDGE STORE → BOOTSTRAPPER → NEW SESSION STARTS SMARTER
+                              ↓
+                         MEASURER → Prove it's working
+```
+
+**Status:** Early stage. Template and test scaffolding exist. Core pipeline scripts (harvester.py, bootstrapper.py, measurer.py, session_reader.py) are planned but not yet implemented. The knowledge extraction prompt is complete and validated.
+
+---
+
+## Architecture
+
+```mermaid
+graph TD
+    A[Session Transcript<br/>.jsonl] --> B[Harvester]
+    B --> C{Extract Knowledge}
+    C --> D[knowledge/index.json]
+    C --> E[knowledge/global/*.md]
+    C --> F["knowledge/repos/{repo}.md"]
+    C --> G["knowledge/agents/{agent}.md"]
+    D --> H[Bootstrapper]
+    H --> I[Bootstrap Context<br/>2k token injection]
+    I --> J[New Session<br/>starts smarter]
+    J --> A
+    D --> K[Measurer]
+    K --> L[metrics/dashboard.md]
+    K --> M[Velocity / Hit Rate<br/>Error Reduction]
+```
+
+### Pipeline 1: Harvester
+
+**Status:** Prompt designed. Script not implemented.
+
+Reads finished session transcripts (JSONL). Uses `templates/harvest-prompt.md` to extract durable knowledge into five categories:
+
+| Category | Description | Example |
+|----------|-------------|---------|
+| `fact` | Concrete, verifiable information | "Repository X has 5 files" |
+| `pitfall` | Errors encountered, wrong assumptions | "Token is at ~/.config/gitea/token, not env var" |
+| `pattern` | Successful action sequences | "Deploy: test → build → push → webhook" |
+| `tool-quirk` | Environment-specific behaviors | "URL format requires trailing slash" |
+| `question` | Identified but unanswered | "Need optimal batch size for harvesting" |
+
+Output schema per knowledge item:
+```json
+{
+  "fact": "One sentence description",
+  "category": "fact|pitfall|pattern|tool-quirk|question",
+  "repo": "repo-name or 'global'",
+  "confidence": 0.0-1.0
+}
+```
+
+### Pipeline 2: Bootstrapper
+
+**Status:** Not implemented.
+
+Queries knowledge store before session start. Assembles a compact 2k-token context from relevant facts. Injects into session startup so the agent begins with full situational awareness.
+
+### Pipeline 3: Measurer
+
+**Status:** Not implemented.
+
+Tracks compounding metrics: knowledge velocity (facts/day), error reduction (%), hit rate (knowledge used / knowledge available), task completion improvement.
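The Measurer's metrics are simple ratios, so a small sketch may help pin down the definitions. The Python below is illustrative only: `HarvestRecord`, `knowledge_velocity`, `hit_rate`, and `error_reduction` are hypothetical names, and `measurer.py` itself is not yet implemented. It assumes velocity is averaged over the inclusive date span of the harvested items and that error reduction is measured against a baseline period.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class HarvestRecord:
    """One harvested knowledge item (hypothetical record shape)."""
    item_id: str
    harvested_on: date


def knowledge_velocity(records: list[HarvestRecord]) -> float:
    """Facts per day over the inclusive date span covered by the records."""
    if not records:
        return 0.0
    days = [r.harvested_on for r in records]
    span = (max(days) - min(days)).days + 1  # count both the first and last day
    return len(records) / span


def hit_rate(used_ids: set[str], available_ids: set[str]) -> float:
    """Knowledge used / knowledge available, counting only known items."""
    if not available_ids:
        return 0.0
    return len(used_ids & available_ids) / len(available_ids)


def error_reduction(baseline_errors: int, current_errors: int) -> float:
    """Percentage drop in errors relative to a baseline period."""
    if baseline_errors == 0:
        return 0.0  # no baseline errors, so no reduction can be measured
    return 100.0 * (baseline_errors - current_errors) / baseline_errors
```

Intersecting `used_ids` with `available_ids` keeps the hit rate at or below 1.0 even if a session reports an item that is no longer in the store.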
+
+---
+
+## Directory Structure
+
+```
+compounding-intelligence/
+├── README.md                                 # Project overview and architecture
+├── GENOME.md                                 # This file (codebase genome)
+├── knowledge/                                # [PLANNED] Knowledge store
+│   ├── index.json                            # Machine-readable fact index
+│   ├── global/                               # Cross-repo knowledge
+│   ├── repos/{repo}.md                       # Per-repo knowledge
+│   └── agents/{agent}.md                     # Agent-type notes
+├── scripts/
+│   ├── test_harvest_prompt.py                # Basic prompt validation (2.5KB)
+│   └── test_harvest_prompt_comprehensive.py  # Full prompt structure test (6.8KB)
+├── templates/
+│   └── harvest-prompt.md                     # Knowledge extraction prompt (3.5KB)
+├── test_sessions/
+│   ├── session_success.jsonl                 # Happy path test data
+│   ├── session_failure.jsonl                 # Failure path test data
+│   ├── session_partial.jsonl                 # Incomplete session test data
+│   ├── session_patterns.jsonl                # Pattern extraction test data
+│   └── session_questions.jsonl               # Question identification test data
+└── metrics/                                  # [PLANNED] Compounding metrics
+    └── dashboard.md
+```
+
+---
+
+## Entry Points and Data Flow
+
+### Entry Point 1: Knowledge Extraction (Harvester)
+
+```
+Input:  Session transcript (JSONL)
+          ↓
+        templates/harvest-prompt.md (LLM prompt)
+          ↓
+        Knowledge items (JSON array)
+          ↓
+Output: knowledge/index.json + per-repo/per-agent markdown files
+```
+
+### Entry Point 2: Session Bootstrap (Bootstrapper)
+
+```
+Input:  Session context (repo, agent type, task type)
+          ↓
+        knowledge/index.json (query relevant facts)
+          ↓
+        2k-token bootstrap context
+          ↓
+Output: Injected into session startup
+```
+
+### Entry Point 3: Measurement (Measurer)
+
+```
+Input:  knowledge/index.json + session history
+          ↓
+        Velocity, hit rate, error reduction calculations
+          ↓
+Output: metrics/dashboard.md
+```
+
+---
+
+## Key Abstractions
+
+### Knowledge Item
+The atomic unit. One sentence, one category, one confidence score. Designed to stay small: at an estimated 20 tokens per one-sentence item, roughly 100 items fit in a 2k-token bootstrap context.
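As a rough illustration of the knowledge item described here, the sketch below models one item as a frozen dataclass and validates it against the output schema (the four fields, the five categories, and the 0.0 to 1.0 confidence range). The names `KnowledgeItem`, `CATEGORIES`, and `parse_items` are hypothetical; the repository does not define them yet.

```python
from __future__ import annotations

import json
from dataclasses import dataclass

# The five categories named in the harvest prompt.
CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}


@dataclass(frozen=True)
class KnowledgeItem:
    """One harvested item; field names follow the output schema."""
    fact: str
    category: str
    repo: str
    confidence: float

    def __post_init__(self) -> None:
        if not self.fact.strip():
            raise ValueError("fact must be a non-empty sentence")
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")


def parse_items(raw_json: str) -> list[KnowledgeItem]:
    """Parse a JSON array of knowledge items, validating each entry.

    Missing or unexpected keys raise TypeError from the dataclass constructor.
    """
    return [KnowledgeItem(**entry) for entry in json.loads(raw_json)]
```

Validating at construction time means a malformed harvester response fails loudly before anything is written to the store.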
+
+### Knowledge Store
+A directory structure that mirrors the fleet's mental model:
+- `global/` — knowledge that applies everywhere (tool quirks, environment facts)
+- `repos/` — knowledge specific to each repo
+- `agents/` — knowledge specific to each agent type
+
+### Confidence Score
+0.0–1.0 scale. Defines how certain the harvester is about each extracted fact:
+- 0.9–1.0: Explicitly stated with verification
+- 0.7–0.8: Clearly implied by multiple data points
+- 0.5–0.6: Suggested but not fully verified
+- 0.3–0.4: Inferred from limited data
+- 0.1–0.2: Speculative or uncertain
+
+### Bootstrap Context
+The 2k-token injection that a new session receives. Assembled from the most relevant knowledge items for the current task, filtered by confidence > 0.7, deduplicated, and compressed.
+
+---
+
+## API Surface
+
+### Internal (scripts not yet implemented)
+
+| Script | Input | Output | Status |
+|--------|-------|--------|--------|
+| `harvester.py` | Session JSONL path | Knowledge items JSON | PLANNED |
+| `bootstrapper.py` | Repo + agent type | 2k-token context string | PLANNED |
+| `measurer.py` | Knowledge store path | Metrics JSON | PLANNED |
+| `session_reader.py` | Session JSONL path | Parsed transcript | PLANNED |
+
+### Prompt (templates/harvest-prompt.md)
+
+The extraction prompt is the core "API." It takes a session transcript and returns structured JSON. It defines:
+- Five extraction categories
+- Output format (JSON array of knowledge items)
+- Confidence scoring rubric
+- Constraints (no hallucination, specificity, relevance, brevity)
+- Example input/output pair
+
+---
+
+## Test Coverage
+
+### What Exists
+
+| File | Tests | Coverage |
+|------|-------|----------|
+| `scripts/test_harvest_prompt.py` | 2 tests | Prompt file existence, sample transcript |
+| `scripts/test_harvest_prompt_comprehensive.py` | 5 tests | Prompt structure, categories, fields, confidence scoring, size limits |
+| `test_sessions/*.jsonl` | 5 sessions | Success, failure, partial, patterns, questions |
+
+### What's Missing
+
+1. **Harvester integration test** — Does the prompt actually extract correct knowledge from real transcripts?
+2. **Bootstrapper test** — Does it assemble relevant context correctly?
+3. **Knowledge store test** — Does the index.json maintain consistency?
+4. **Confidence calibration test** — Do high-confidence facts actually prove true in later sessions?
+5. **Deduplication test** — Are duplicate facts across sessions handled?
+6. **Staleness test** — How does the system handle outdated knowledge?
+
+---
+
+## Security Considerations
+
+1. **No secrets in knowledge store** — The harvester must filter out API keys, tokens, and credentials from extracted facts. The prompt constraints mention this but there is no automated guard.
+
+2. **Knowledge poisoning** — A malicious or corrupted session could inject false facts. Confidence scoring partially mitigates this, but there is no verification step.
+
+3. **Access control** — The knowledge store has no access control. Any process that can read the directory can read all facts. In a multi-tenant setup, this is a concern.
+
+4. **Transcript privacy** — Session transcripts may contain user data. The harvester must not extract personally identifiable information into the knowledge store.
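The first consideration notes that no automated guard exists for secrets. One possible shape for such a guard is sketched below, assuming knowledge items arrive as dictionaries with a `fact` key as in the output schema. `SECRET_PATTERNS`, `looks_like_secret`, and `drop_secret_items` are hypothetical names, and the three regular expressions are illustrative rather than a complete rule set.

```python
from __future__ import annotations

import re

# Illustrative patterns only; a production guard would need a maintained rule set.
SECRET_PATTERNS = [
    # PEM-style private key headers.
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # "api_key=...", "token: ...", "password = ..." style assignments.
    re.compile(
        r"\b(?:api[_-]?key|token|secret|password)\b\s*[:=]\s*\S+",
        re.IGNORECASE,
    ),
    # HTTP bearer credentials of non-trivial length.
    re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]{16,}"),
]


def looks_like_secret(text: str) -> bool:
    """Return True if any pattern suggests the text embeds a credential."""
    return any(pattern.search(text) for pattern in SECRET_PATTERNS)


def drop_secret_items(items: list[dict]) -> list[dict]:
    """Keep only items whose 'fact' text does not look like it holds a secret."""
    return [item for item in items if not looks_like_secret(item.get("fact", ""))]
```

The assignment pattern requires a `:` or `=` after the keyword, so a fact that only describes where a token lives (for example "Token is at ~/.config/gitea/token, not env var") passes through, while one that quotes a value is dropped. A pattern filter like this would not address the knowledge poisoning or PII concerns, which need separate checks.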
+
+---
+
+## The 100x Path (from README)
+
+```
+Month 1: 15,000 facts, sessions 20% faster
+Month 2: 45,000 facts, sessions 40% faster, first-try success up 30%
+Month 3: 90,000 facts, fleet measurably smarter per token
+```
+
+Each new session is better than the last. The intelligence compounds.
+
+---
+
+*Generated by codebase-genome pipeline. Ref: timmy-home#676.*