GENOME.md — compounding-intelligence

Auto-generated codebase genome. See timmy-home#676.


Project Overview

What: A system that turns 1B+ daily agent tokens into durable, compounding fleet intelligence.

Why: Every agent session starts at zero. The same mistakes get made repeatedly — the same HTTP 405 is rediscovered as a branch protection issue, the same token path is searched for from scratch. Intelligence evaporates when the session ends.

How: Three pipelines form a compounding loop:

SESSION ENDS → HARVESTER → KNOWLEDGE STORE → BOOTSTRAPPER → NEW SESSION STARTS SMARTER
                              ↓
                         MEASURER → Prove it's working

Status: Early stage. Template and test scaffolding exist. Core pipeline scripts (harvester.py, bootstrapper.py, measurer.py, session_reader.py) are planned but not yet implemented. The knowledge extraction prompt is complete and validated.


Architecture

graph TD
    A[Session Transcript<br/>.jsonl] --> B[Harvester]
    B --> C{Extract Knowledge}
    C --> D[knowledge/index.json]
    C --> E[knowledge/global/*.md]
    C --> F[knowledge/repos/{repo}.md]
    C --> G[knowledge/agents/{agent}.md]
    D --> H[Bootstrapper]
    H --> I[Bootstrap Context<br/>2k token injection]
    I --> J[New Session<br/>starts smarter]
    J --> A
    D --> K[Measurer]
    K --> L[metrics/dashboard.md]
    K --> M[Velocity / Hit Rate<br/>Error Reduction]

Pipeline 1: Harvester

Status: Prompt designed. Script not implemented.

Reads finished session transcripts (JSONL). Uses templates/harvest-prompt.md to extract durable knowledge into five categories:

| Category | Description | Example |
|---|---|---|
| fact | Concrete, verifiable information | "Repository X has 5 files" |
| pitfall | Errors encountered, wrong assumptions | "Token is at ~/.config/gitea/token, not env var" |
| pattern | Successful action sequences | "Deploy: test → build → push → webhook" |
| tool-quirk | Environment-specific behaviors | "URL format requires trailing slash" |
| question | Identified but unanswered | "Need optimal batch size for harvesting" |

Output schema per knowledge item:

{
  "fact": "One sentence description",
  "category": "fact|pitfall|pattern|tool-quirk|question",
  "repo": "repo-name or 'global'",
  "confidence": 0.0-1.0
}
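Since harvester.py does not exist yet, the following is only a minimal sketch of how the schema above might be enforced in Python. The names `KnowledgeItem`, `CATEGORIES`, and `parse_item` are hypothetical, and rejecting empty strings is an assumption that goes beyond what the schema states.

```python
from dataclasses import dataclass

# The five categories listed in the harvester table.
CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}


@dataclass(frozen=True)
class KnowledgeItem:
    """Hypothetical in-memory form of one knowledge item."""
    fact: str
    category: str
    repo: str
    confidence: float

    def __post_init__(self):
        # Assumption: empty strings are rejected; the schema does not say so.
        if not self.fact.strip():
            raise ValueError("fact must be a non-empty sentence")
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if not self.repo.strip():
            raise ValueError("repo must be a repo name or 'global'")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within 0.0-1.0")


def parse_item(raw: dict) -> KnowledgeItem:
    """Build a validated item from one decoded JSON object."""
    return KnowledgeItem(
        fact=str(raw["fact"]),
        category=str(raw["category"]),
        repo=str(raw["repo"]),
        confidence=float(raw["confidence"]),
    )
```

Validating at parse time would keep malformed LLM output out of the store, at the cost of dropping items that a more lenient reader could have repaired.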

Pipeline 2: Bootstrapper

Status: Not implemented.

Queries the knowledge store before a session starts, assembles a compact 2k-token context from the relevant facts, and injects it into session startup so the agent begins with what earlier sessions already learned.

Pipeline 3: Measurer

Status: Not implemented.

Tracks compounding metrics: knowledge velocity (facts/day), error reduction (%), hit rate (knowledge used / knowledge available), task completion improvement.
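As measurer.py is not implemented, the formulas below are a sketch rather than the project's definitions. Hit rate follows the ratio given above; treating velocity as new items per elapsed day and error reduction as a percentage drop against a baseline count are assumptions, as are the function names.

```python
def knowledge_velocity(new_items: int, days: float) -> float:
    """Facts per day over a measurement window."""
    return new_items / days if days > 0 else 0.0


def hit_rate(used: int, available: int) -> float:
    """Knowledge used divided by knowledge available."""
    return used / available if available else 0.0


def error_reduction(baseline_errors: int, current_errors: int) -> float:
    """Percentage drop in errors relative to a baseline (assumed definition)."""
    if baseline_errors == 0:
        return 0.0
    return 100.0 * (baseline_errors - current_errors) / baseline_errors
```

For example, `hit_rate(30, 120)` gives 0.25 and `error_reduction(40, 30)` gives 25.0.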


Directory Structure

compounding-intelligence/
├── README.md                           # Project overview and architecture
├── GENOME.md                           # This file (codebase genome)
├── knowledge/                          # [PLANNED] Knowledge store
│   ├── index.json                      # Machine-readable fact index
│   ├── global/                         # Cross-repo knowledge
│   ├── repos/{repo}.md                 # Per-repo knowledge
│   └── agents/{agent}.md               # Agent-type notes
├── scripts/
│   ├── test_harvest_prompt.py          # Basic prompt validation (2.5KB)
│   └── test_harvest_prompt_comprehensive.py  # Full prompt structure test (6.8KB)
├── templates/
│   └── harvest-prompt.md               # Knowledge extraction prompt (3.5KB)
├── test_sessions/
│   ├── session_success.jsonl           # Happy path test data
│   ├── session_failure.jsonl           # Failure path test data
│   ├── session_partial.jsonl           # Incomplete session test data
│   ├── session_patterns.jsonl          # Pattern extraction test data
│   └── session_questions.jsonl         # Question identification test data
└── metrics/                            # [PLANNED] Compounding metrics
    └── dashboard.md

Entry Points and Data Flow

Entry Point 1: Knowledge Extraction (Harvester)

Input:  Session transcript (JSONL)
        ↓
        templates/harvest-prompt.md (LLM prompt)
        ↓
        Knowledge items (JSON array)
        ↓
Output: knowledge/index.json + per-repo/per-agent markdown files
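The flow above could be sketched roughly as follows. This is not the planned harvester.py or session_reader.py: the function names are hypothetical, the transcript record layout is not assumed beyond "one JSON object per line", and the LLM call itself is left out because the document does not specify a client.

```python
import json


def read_session(lines):
    """Parse JSONL lines, skipping blank and malformed ones."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            # Assumption: a partial session may end mid-line; skip it.
            continue
    return records


def build_prompt(template: str, records: list) -> str:
    """Append the transcript to the extraction prompt text."""
    transcript = "\n".join(json.dumps(r, sort_keys=True) for r in records)
    return f"{template.rstrip()}\n\n{transcript}\n"


def parse_response(text: str) -> list:
    """Decode the model reply, which should be a JSON array of items."""
    items = json.loads(text)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of knowledge items")
    return items
```

In use, `template` would be the contents of templates/harvest-prompt.md and `lines` an open file handle on one of the test_sessions/*.jsonl files.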

Entry Point 2: Session Bootstrap (Bootstrapper)

Input:  Session context (repo, agent type, task type)
        ↓
        knowledge/index.json (query relevant facts)
        ↓
        2k-token bootstrap context
        ↓
Output: Injected into session startup

Entry Point 3: Measurement (Measurer)

Input:  knowledge/index.json + session history
        ↓
        Velocity, hit rate, error reduction calculations
        ↓
Output: metrics/dashboard.md

Key Abstractions

Knowledge Item

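The atomic unit. One sentence, one category, one confidence score. Designed to be small: the stated target is 1000 items in a 2k-token bootstrap context, which works out to about 2 tokens per item, so reaching it depends on the compression step rather than on raw one-sentence items.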

Knowledge Store

A directory structure that mirrors the fleet's mental model:

  • global/ — knowledge that applies everywhere (tool quirks, environment facts)
  • repos/ — knowledge specific to each repo
  • agents/ — knowledge specific to each agent type
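One way the layout above might map an item to a file is sketched here. The routing rules are assumptions: the item schema has no agent field, so `agent` is a hypothetical extra argument, and naming files under global/ by category is a guess, since the document only shows `knowledge/global/*.md`.

```python
from pathlib import PurePosixPath
from typing import Optional


def _safe_name(name: str) -> str:
    """Reject names that could escape the store directory."""
    if not name or "/" in name or name in {".", ".."}:
        raise ValueError(f"unsafe store name: {name!r}")
    return name


def store_path(repo: str, category: str,
               agent: Optional[str] = None) -> PurePosixPath:
    """Pick the markdown file for an item (hypothetical routing rules)."""
    root = PurePosixPath("knowledge")
    if agent is not None:
        return root / "agents" / f"{_safe_name(agent)}.md"
    if repo == "global":
        # Assumption: one global file per category.
        return root / "global" / f"{_safe_name(category)}.md"
    return root / "repos" / f"{_safe_name(repo)}.md"
```

Checking names before building the path matters because repo and agent names would come from LLM output, not from a trusted source.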

Confidence Score

0.0-1.0 scale. Defines how certain the harvester is about each extracted fact:

  • 0.9-1.0: Explicitly stated with verification
  • 0.7-0.8: Clearly implied by multiple data points
  • 0.5-0.6: Suggested but not fully verified
  • 0.3-0.4: Inferred from limited data
  • 0.1-0.2: Speculative or uncertain
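A small helper can turn a score back into its rubric label, for instance when rendering the markdown files. The sketch below treats each band's lower bound as a threshold, which is an assumption: the rubric leaves values between bands, and below 0.1, undefined. The function name is hypothetical.

```python
def confidence_band(score: float) -> str:
    """Map a confidence score to the rubric label for its band."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("confidence must be within 0.0-1.0")
    if score >= 0.9:
        return "explicitly stated with verification"
    if score >= 0.7:
        return "clearly implied by multiple data points"
    if score >= 0.5:
        return "suggested but not fully verified"
    if score >= 0.3:
        return "inferred from limited data"
    # Assumption: anything below 0.3 falls into the lowest band.
    return "speculative or uncertain"
```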

Bootstrap Context

The 2k-token injection that a new session receives. Assembled from the most relevant knowledge items for the current task, filtered by confidence > 0.7, deduplicated, and compressed.
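The assembly step might look something like the sketch below, which is not the planned bootstrapper.py. It applies the confidence > 0.7 filter and a simple deduplication, then fills a token budget in descending confidence order. The four-characters-per-token estimate, the definition of "relevant" as same repo or global, and all function names are assumptions, and the compression step is not attempted.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def build_context(items, repo, budget=2000, min_conf=0.7):
    """Assemble a bootstrap context from knowledge item dicts."""
    relevant = [
        i for i in items
        if i["repo"] in (repo, "global") and i["confidence"] > min_conf
    ]
    seen = set()
    lines = []
    used = 0
    for item in sorted(relevant, key=lambda i: i["confidence"], reverse=True):
        # Deduplicate on case- and whitespace-normalized text.
        key = " ".join(item["fact"].lower().split())
        if key in seen:
            continue
        line = f"- [{item['category']}] {item['fact']}"
        cost = estimate_tokens(line)
        if used + cost > budget:
            break  # stop at the first item that would exceed the budget
        seen.add(key)
        lines.append(line)
        used += cost
    return "\n".join(lines)
```

Sorting by confidence before filling the budget means that when the store outgrows 2k tokens, the least certain facts are the ones left out.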


API Surface

Internal (scripts not yet implemented)

| Script | Input | Output | Status |
|---|---|---|---|
| harvester.py | Session JSONL path | Knowledge items JSON | PLANNED |
| bootstrapper.py | Repo + agent type | 2k-token context string | PLANNED |
| measurer.py | Knowledge store path | Metrics JSON | PLANNED |
| session_reader.py | Session JSONL path | Parsed transcript | PLANNED |

Prompt (templates/harvest-prompt.md)

The extraction prompt is the core "API." It takes a session transcript and returns structured JSON. It defines:

  • Five extraction categories
  • Output format (JSON array of knowledge items)
  • Confidence scoring rubric
  • Constraints (no hallucination, specificity, relevance, brevity)
  • Example input/output pair

Test Coverage

What Exists

| File | Tests | Coverage |
|---|---|---|
| scripts/test_harvest_prompt.py | 2 tests | Prompt file existence, sample transcript |
| scripts/test_harvest_prompt_comprehensive.py | 5 tests | Prompt structure, categories, fields, confidence scoring, size limits |
| test_sessions/*.jsonl | 5 sessions | Success, failure, partial, patterns, questions |

What's Missing

  1. Harvester integration test — Does the prompt actually extract correct knowledge from real transcripts?
  2. Bootstrapper test — Does it assemble relevant context correctly?
  3. Knowledge store test — Does the index.json maintain consistency?
  4. Confidence calibration test — Do high-confidence facts actually prove true in later sessions?
  5. Deduplication test — Are duplicate facts across sessions handled?
  6. Staleness test — How does the system handle outdated knowledge?

Security Considerations

  1. No secrets in knowledge store — The harvester must filter out API keys, tokens, and credentials from extracted facts. The prompt constraints mention this but there is no automated guard.

  2. Knowledge poisoning — A malicious or corrupted session could inject false facts. Confidence scoring partially mitigates this, but there is no verification step.

  3. Access control — The knowledge store has no access control. Any process that can read the directory can read all facts. In a multi-tenant setup, this is a concern.

  4. Transcript privacy — Session transcripts may contain user data. The harvester must not extract personally identifiable information into the knowledge store.
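The missing automated guard from item 1 could start as a pattern filter run on extracted facts before they are written. The sketch below is illustrative only: the patterns are generic guesses, not a vetted secret scanner, they do not address personally identifiable information (item 4), and the names are hypothetical.

```python
import re

# Generic, illustrative patterns (assumptions, not an exhaustive list).
SECRET_PATTERNS = [
    # key=value or key: value assignments with credential-like names
    re.compile(r"(?i)\b(api[_-]?key|token|password|secret)\b\s*[:=]\s*\S+"),
    # HTTP bearer credentials
    re.compile(r"(?i)\bbearer\s+[a-z0-9._~+/-]{16,}=*"),
    # Long hex strings; note this also matches full git commit hashes
    re.compile(r"\b[0-9a-fA-F]{32,}\b"),
]


def looks_sensitive(text: str) -> bool:
    """Return True if any pattern matches the text."""
    return any(p.search(text) for p in SECRET_PATTERNS)


def drop_sensitive(items: list) -> list:
    """Keep only items whose fact text passes the filter."""
    return [i for i in items if not looks_sensitive(i["fact"])]
```

A fact such as "Token is at ~/.config/gitea/token, not env var" passes, since it names a location and not a value, while "api_key=abc123" is dropped.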


The 100x Path (from README)

Month 1:  15,000 facts, sessions 20% faster
Month 2:  45,000 facts, sessions 40% faster, first-try success up 30%
Month 3:  90,000 facts, fleet measurably smarter per token

Each new session is better than the last. The intelligence compounds.


Generated by codebase-genome pipeline. Ref: timmy-home#676.