
# GENOME.md — compounding-intelligence

*Auto-generated codebase genome. Addresses timmy-home#676.*

---

## Project Overview

**What:** A system that turns 1B+ daily agent tokens into durable, compounding fleet intelligence.

**Why:** Every agent session starts at zero. The same mistakes get made repeatedly — the same HTTP 405 is rediscovered as a branch protection issue, the same token path is searched for from scratch. Intelligence evaporates when the session ends.

**How:** Three pipelines form a compounding loop:

```
SESSION ENDS → HARVESTER → KNOWLEDGE STORE → BOOTSTRAPPER → NEW SESSION STARTS SMARTER
                              ↓
                         MEASURER → Prove it's working
```

**Status:** Early stage. Template and test scaffolding exist. The core pipeline scripts (`harvester.py`, `bootstrapper.py`, `measurer.py`, `session_reader.py`) are planned but not yet implemented. The knowledge extraction prompt is complete; so far it is validated only by the structural tests under `scripts/`, not against real transcripts.

---

## Architecture

```mermaid
graph TD
    A[Session Transcript<br/>.jsonl] --> B[Harvester]
    B --> C{Extract Knowledge}
    C --> D[knowledge/index.json]
    C --> E[knowledge/global/*.md]
    C --> F["knowledge/repos/{repo}.md"]
    C --> G["knowledge/agents/{agent}.md"]
    D --> H[Bootstrapper]
    H --> I[Bootstrap Context<br/>2k token injection]
    I --> J[New Session<br/>starts smarter]
    J --> A
    D --> K[Measurer]
    K --> L[metrics/dashboard.md]
    K --> M[Velocity / Hit Rate<br/>Error Reduction]
```

### Pipeline 1: Harvester

**Status:** Prompt designed. Script not implemented.

Reads finished session transcripts (JSONL). Uses `templates/harvest-prompt.md` to extract durable knowledge into five categories:

| Category | Description | Example |
|----------|-------------|---------|
| `fact` | Concrete, verifiable information | "Repository X has 5 files" |
| `pitfall` | Errors encountered, wrong assumptions | "Token is at ~/.config/gitea/token, not env var" |
| `pattern` | Successful action sequences | "Deploy: test → build → push → webhook" |
| `tool-quirk` | Environment-specific behaviors | "URL format requires trailing slash" |
| `question` | Identified but unanswered | "Need optimal batch size for harvesting" |

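Output schema per knowledge item (schematic rather than literal JSON: `category` takes exactly one of the five values, and `confidence` is a number from 0.0 to 1.0):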
```json
{
  "fact": "One sentence description",
  "category": "fact|pitfall|pattern|tool-quirk|question",
  "repo": "repo-name or 'global'",
  "confidence": 0.0-1.0
}
```
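
The schema above could be enforced in code before anything is written to the store. The following is a minimal sketch, not the project's implementation: `KnowledgeItem` and `parse_item` are hypothetical names, and defaulting a missing `repo` to `"global"` is an assumption.

```python
from dataclasses import dataclass

# The five categories from the table above.
CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}


@dataclass(frozen=True)
class KnowledgeItem:
    """One harvested item; field names follow the schema above."""
    fact: str
    category: str
    repo: str
    confidence: float

    def __post_init__(self):
        # Reject items that do not match the schema.
        if not self.fact.strip():
            raise ValueError("fact must be a non-empty sentence")
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within 0.0-1.0")


def parse_item(raw: dict) -> KnowledgeItem:
    """Build a validated item from one decoded JSON object (hypothetical helper)."""
    return KnowledgeItem(
        fact=str(raw["fact"]),
        category=str(raw["category"]),
        repo=str(raw.get("repo", "global")),  # assumption: missing repo means global
        confidence=float(raw["confidence"]),
    )
```

Validating at parse time means a malformed LLM reply fails loudly instead of silently polluting `knowledge/index.json`.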

### Pipeline 2: Bootstrapper

**Status:** Not implemented.

Queries the knowledge store before a session starts, assembles a compact 2k-token context from the relevant facts, and injects it at session startup so the agent begins with the fleet's accumulated knowledge about the repo and task instead of starting from zero.

### Pipeline 3: Measurer

**Status:** Not implemented.

Tracks compounding metrics: knowledge velocity (facts/day), error reduction (%), hit rate (knowledge used / knowledge available), task completion improvement.
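
Since `measurer.py` does not exist yet, the three ratio metrics can only be sketched. The function names and the choice of inputs (raw counts per period) are assumptions; only the hit-rate formula is taken from the text above.

```python
def knowledge_velocity(new_items: int, days: float) -> float:
    """Facts harvested per day over a period."""
    if days <= 0:
        raise ValueError("days must be positive")
    return new_items / days


def hit_rate(used: int, available: int) -> float:
    """Knowledge used / knowledge available; 0.0 when nothing was available."""
    return used / available if available else 0.0


def error_reduction(baseline_errors: int, current_errors: int) -> float:
    """Percentage drop in errors relative to a baseline period."""
    if baseline_errors <= 0:
        return 0.0  # no baseline errors, so nothing to reduce
    return 100.0 * (baseline_errors - current_errors) / baseline_errors
```

For example, 150 new items over 30 days gives a velocity of 5 facts/day, and 3 items used out of 12 available gives a hit rate of 0.25.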

---

## Directory Structure

```
compounding-intelligence/
├── README.md                           # Project overview and architecture
├── GENOME.md                           # This file (codebase genome)
├── knowledge/                          # [PLANNED] Knowledge store
│   ├── index.json                      # Machine-readable fact index
│   ├── global/                         # Cross-repo knowledge
│   ├── repos/{repo}.md                 # Per-repo knowledge
│   └── agents/{agent}.md               # Agent-type notes
├── scripts/
│   ├── test_harvest_prompt.py          # Basic prompt validation (2.5KB)
│   └── test_harvest_prompt_comprehensive.py  # Full prompt structure test (6.8KB)
├── templates/
│   └── harvest-prompt.md               # Knowledge extraction prompt (3.5KB)
├── test_sessions/
│   ├── session_success.jsonl           # Happy path test data
│   ├── session_failure.jsonl           # Failure path test data
│   ├── session_partial.jsonl           # Incomplete session test data
│   ├── session_patterns.jsonl          # Pattern extraction test data
│   └── session_questions.jsonl         # Question identification test data
└── metrics/                            # [PLANNED] Compounding metrics
    └── dashboard.md
```

---

## Entry Points and Data Flow

### Entry Point 1: Knowledge Extraction (Harvester)

```
Input:  Session transcript (JSONL)
        ↓
        templates/harvest-prompt.md (LLM prompt)
        ↓
        Knowledge items (JSON array)
        ↓
Output: knowledge/index.json + per-repo/per-agent markdown files
```
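
The flow above might look like the following once `harvester.py` is written. This is a sketch under stated assumptions: `read_transcript` and `harvest` are hypothetical names, the transcript is assumed to be one JSON object per line, and the LLM call is passed in as a plain `complete` callable because the document does not name a model client.

```python
import json
from typing import Callable


def read_transcript(path: str) -> list[dict]:
    """Read a JSONL transcript: one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def harvest(transcript: list[dict], prompt: str,
            complete: Callable[[str], str]) -> list[dict]:
    """Send prompt plus transcript to an LLM and parse the JSON array it returns."""
    body = "\n".join(json.dumps(turn) for turn in transcript)
    reply = complete(f"{prompt}\n\n{body}")  # `complete` is a caller-supplied LLM call
    items = json.loads(reply)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of knowledge items")
    # Keep only objects; anything else in the array is ignored.
    return [item for item in items if isinstance(item, dict)]
```

Injecting `complete` keeps the parsing logic testable with a canned reply, which is what a harvester integration test would need.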

### Entry Point 2: Session Bootstrap (Bootstrapper)

```
Input:  Session context (repo, agent type, task type)
        ↓
        knowledge/index.json (query relevant facts)
        ↓
        2k-token bootstrap context
        ↓
Output: Injected into session startup
```
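
The query step in this flow could be as simple as a filter over the index. The sketch below assumes `knowledge/index.json` is a flat JSON array of items in the harvester's schema, which the document does not specify; `load_index` and `relevant_items` are hypothetical names, and filtering by agent or task type is left out because the item schema has no field for them.

```python
import json


def load_index(path: str) -> list[dict]:
    """Load the fact index; assumed here to be a flat JSON array of items."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def relevant_items(index: list[dict], repo: str) -> list[dict]:
    """Keep items for this repo plus 'global' ones, most confident first."""
    matches = [item for item in index if item.get("repo") in (repo, "global")]
    return sorted(matches, key=lambda item: item.get("confidence", 0.0), reverse=True)
```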

### Entry Point 3: Measurement (Measurer)

```
Input:  knowledge/index.json + session history
        ↓
        Velocity, hit rate, error reduction calculations
        ↓
Output: metrics/dashboard.md
```

---

## Key Abstractions

### Knowledge Item
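The atomic unit. One sentence, one category, one confidence score. Kept small so that many items fit in a 2k-token bootstrap context. At a rough 20 tokens per one-sentence item (an estimate, not a measurement), that budget holds on the order of 100 items, not 1000.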

### Knowledge Store
A directory structure that mirrors the fleet's mental model:
- `global/` — knowledge that applies everywhere (tool quirks, environment facts)
- `repos/` — knowledge specific to each repo
- `agents/` — knowledge specific to each agent type
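
One way to route an item to its file in this layout is sketched below. `store_path` is a hypothetical helper, and the one-file-per-category naming under `global/` is an assumption, since the document only shows `global/*.md`. Routing to `agents/` is omitted because the item schema carries no agent field.

```python
from pathlib import PurePosixPath

ROOT = PurePosixPath("knowledge")


def store_path(item: dict) -> PurePosixPath:
    """Pick the markdown file an item would be appended to."""
    repo = item.get("repo", "global")
    if repo == "global":
        # Assumption: one file per category under global/.
        return ROOT / "global" / f"{item['category']}.md"
    # Refuse names that could escape the repos/ directory.
    if "/" in repo or repo in ("", ".", ".."):
        raise ValueError(f"unsafe repo name: {repo!r}")
    return ROOT / "repos" / f"{repo}.md"
```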

### Confidence Score
0.0–1.0 scale. Indicates how certain the harvester is about each extracted fact:
- 0.9–1.0: Explicitly stated with verification
- 0.7–0.8: Clearly implied by multiple data points
- 0.5–0.6: Suggested but not fully verified
- 0.3–0.4: Inferred from limited data
- 0.1–0.2: Speculative or uncertain
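
The rubric can be turned into a lookup for reporting or debugging. In this sketch, `confidence_band` is a hypothetical helper. The listed bands leave gaps (for example, 0.85 falls between two of them), so the sketch assigns each score to the band whose lower bound it meets; that tie-breaking rule is an assumption, not part of the rubric.

```python
def confidence_band(score: float) -> str:
    """Return the rubric label for a score, using each band's lower bound."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be within 0.0-1.0")
    bands = [
        (0.9, "explicitly stated with verification"),
        (0.7, "clearly implied by multiple data points"),
        (0.5, "suggested but not fully verified"),
        (0.3, "inferred from limited data"),
    ]
    for lower, label in bands:
        if score >= lower:
            return label
    # Everything below 0.3, including 0.0, lands in the lowest band.
    return "speculative or uncertain"
```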

### Bootstrap Context
The 2k-token injection that a new session receives. Assembled from the most relevant knowledge items for the current task, filtered by confidence > 0.7, deduplicated, and compressed.
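
The assembly steps described here (filter, deduplicate, fit the budget) can be sketched as follows. `build_context` and `estimate_tokens` are hypothetical names; the four-characters-per-token estimate stands in for a real tokenizer, and the compression step is not attempted.

```python
def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: roughly four characters per token."""
    return max(1, len(text) // 4)


def build_context(items: list[dict], budget: int = 2000,
                  min_confidence: float = 0.7) -> str:
    """Filter by confidence, drop duplicates, and pack lines within the budget."""
    seen: set[str] = set()
    lines: list[str] = []
    used = 0
    ranked = sorted(items, key=lambda item: item["confidence"], reverse=True)
    for item in ranked:
        if item["confidence"] <= min_confidence:  # keep only confidence > 0.7
            continue
        key = " ".join(item["fact"].lower().split())  # case/whitespace-insensitive key
        if key in seen:
            continue
        line = f"- [{item['category']}] {item['fact']}"
        cost = estimate_tokens(line)
        if used + cost > budget:
            break  # stop at the first item that does not fit
        seen.add(key)
        lines.append(line)
        used += cost
    return "\n".join(lines)
```

Ranking by confidence before packing means that when the budget runs out, the least certain items are the ones dropped.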

---

## API Surface

### Internal (scripts not yet implemented)

| Script | Input | Output | Status |
|--------|-------|--------|--------|
| `harvester.py` | Session JSONL path | Knowledge items JSON | PLANNED |
| `bootstrapper.py` | Repo + agent type | 2k-token context string | PLANNED |
| `measurer.py` | Knowledge store path | Metrics JSON | PLANNED |
| `session_reader.py` | Session JSONL path | Parsed transcript | PLANNED |

### Prompt (templates/harvest-prompt.md)

The extraction prompt is the core "API." It takes a session transcript and returns structured JSON. It defines:
- Five extraction categories
- Output format (JSON array of knowledge items)
- Confidence scoring rubric
- Constraints (no hallucination, specificity, relevance, brevity)
- Example input/output pair

---

## Test Coverage

### What Exists

| File | Tests | Coverage |
|------|-------|----------|
| `scripts/test_harvest_prompt.py` | 2 tests | Prompt file existence, sample transcript |
| `scripts/test_harvest_prompt_comprehensive.py` | 5 tests | Prompt structure, categories, fields, confidence scoring, size limits |
| `test_sessions/*.jsonl` | 5 sessions | Success, failure, partial, patterns, questions |

### What's Missing

1. **Harvester integration test** — Does the prompt actually extract correct knowledge from real transcripts?
2. **Bootstrapper test** — Does it assemble relevant context correctly?
3. **Knowledge store test** — Does the index.json maintain consistency?
4. **Confidence calibration test** — Do high-confidence facts actually prove true in later sessions?
5. **Deduplication test** — Are duplicate facts across sessions handled?
6. **Staleness test** — How does the system handle outdated knowledge?
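
As an illustration of what the confidence calibration test (item 4) might measure, the sketch below compares stated confidence with observed outcomes. `calibration` is a hypothetical helper, and the input format, a list of `(confidence, held_true)` pairs collected from later sessions, is an assumption about data the project does not record yet.

```python
from typing import Optional


def calibration(outcomes: list[tuple[float, bool]],
                threshold: float = 0.7) -> Optional[tuple[float, float]]:
    """Return (accuracy, gap) for facts above the threshold, or None if there are none.

    accuracy: fraction of high-confidence facts that later held true.
    gap: mean stated confidence minus accuracy; positive means overconfident.
    """
    high = [(conf, ok) for conf, ok in outcomes if conf > threshold]
    if not high:
        return None
    mean_confidence = sum(conf for conf, _ in high) / len(high)
    accuracy = sum(1 for _, ok in high if ok) / len(high)
    return accuracy, mean_confidence - accuracy
```

A gap near zero would suggest the harvester's scores can be trusted as a filter; a large positive gap would suggest the 0.7 cut-off admits too much.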

---

## Security Considerations

1. **No secrets in knowledge store** — The harvester must filter out API keys, tokens, and credentials from extracted facts. The prompt constraints mention this but there is no automated guard.

2. **Knowledge poisoning** — A malicious or corrupted session could inject false facts. Confidence scoring partially mitigates this, but there is no verification step.

3. **Access control** — The knowledge store has no access control. Any process that can read the directory can read all facts. In a multi-tenant setup, this is a concern.

4. **Transcript privacy** — Session transcripts may contain user data. The harvester must not extract personally identifiable information into the knowledge store.
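
An automated guard for item 1 could start as a pattern filter applied to every extracted fact. This is a sketch only: the patterns are illustrative assumptions, not a reviewed list, `looks_sensitive` and `drop_sensitive` are hypothetical names, and the personally identifiable information in item 4 is not covered.

```python
import re

# Illustrative patterns only; a real guard would need a broader, reviewed list.
SECRET_PATTERNS = [
    # key/value style: "api_key = abc123", "password: hunter2"
    re.compile(r"(?i)\b(api[_-]?key|token|password|secret)\b\s*[:=]\s*\S+"),
    # long hexadecimal strings, a common shape for tokens and hashes
    re.compile(r"\b[A-Fa-f0-9]{32,}\b"),
    # PEM private key headers
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]


def looks_sensitive(fact: str) -> bool:
    """True if the fact text matches any secret-like pattern."""
    return any(pattern.search(fact) for pattern in SECRET_PATTERNS)


def drop_sensitive(items: list[dict]) -> list[dict]:
    """Remove items whose fact text looks like it contains a credential."""
    return [item for item in items if not looks_sensitive(item.get("fact", ""))]
```

Note that a fact such as "Token is at ~/.config/gitea/token, not env var" passes this filter: it records where a credential lives, not its value.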

---

## The 100x Path (from README)

```
Month 1:  15,000 facts, sessions 20% faster
Month 2:  45,000 facts, sessions 40% faster, first-try success up 30%
Month 3:  90,000 facts, fleet measurably smarter per token
```

These are targets from the README, not measured results. The goal is for each new session to start better than the last, so that intelligence compounds.

---

*Generated by codebase-genome pipeline. Ref: timmy-home#676.*