Timmy_Foundation/compounding-intelligence

Fork 0

Files

Alexander Whitestone cdb71adddf docs: GENOME.md — full codebase analysis #676

2026-04-14 22:58:55 +00:00

8.7 KiB

Raw Blame History

GENOME.md — compounding-intelligence

Auto-generated codebase genome. See timmy-home#676.

Project Overview

What: A system that turns 1B+ daily agent tokens into durable, compounding fleet intelligence.

Why: Every agent session starts at zero. The same mistakes get made repeatedly — the same HTTP 405 is rediscovered as a branch protection issue, the same token path is searched for from scratch. Intelligence evaporates when the session ends.

How: Three pipelines form a compounding loop:

SESSION ENDS → HARVESTER → KNOWLEDGE STORE → BOOTSTRAPPER → NEW SESSION STARTS SMARTER
                              ↓
                         MEASURER → Prove it's working

Status: Early stage. Template and test scaffolding exist. Core pipeline scripts (harvester.py, bootstrapper.py, measurer.py, session_reader.py) are planned but not yet implemented. The knowledge extraction prompt is complete and validated.

Architecture

graph TD
    A[Session Transcript<br/>.jsonl] --> B[Harvester]
    B --> C{Extract Knowledge}
    C --> D[knowledge/index.json]
    C --> E[knowledge/global/*.md]
    C --> F[knowledge/repos/{repo}.md]
    C --> G[knowledge/agents/{agent}.md]
    D --> H[Bootstrapper]
    H --> I[Bootstrap Context<br/>2k token injection]
    I --> J[New Session<br/>starts smarter]
    J --> A
    D --> K[Measurer]
    K --> L[metrics/dashboard.md]
    K --> M[Velocity / Hit Rate<br/>Error Reduction]

Pipeline 1: Harvester

Status: Prompt designed. Script not implemented.

Reads finished session transcripts (JSONL). Uses templates/harvest-prompt.md to extract durable knowledge into five categories:

Category	Description	Example
`fact`	Concrete, verifiable information	"Repository X has 5 files"
`pitfall`	Errors encountered, wrong assumptions	"Token is at ~/.config/gitea/token, not env var"
`pattern`	Successful action sequences	"Deploy: test → build → push → webhook"
`tool-quirk`	Environment-specific behaviors	"URL format requires trailing slash"
`question`	Identified but unanswered	"Need optimal batch size for harvesting"

Output schema per knowledge item:

{
  "fact": "One sentence description",
  "category": "fact|pitfall|pattern|tool-quirk|question",
  "repo": "repo-name or 'global'",
  "confidence": 0.0-1.0
}

Pipeline 2: Bootstrapper

Status: Not implemented.

Queries knowledge store before session start. Assembles a compact 2k-token context from relevant facts. Injects into session startup so the agent begins with full situational awareness.

Pipeline 3: Measurer

Status: Not implemented.

Tracks compounding metrics: knowledge velocity (facts/day), error reduction (%), hit rate (knowledge used / knowledge available), task completion improvement.

Directory Structure

compounding-intelligence/
├── README.md                           # Project overview and architecture
├── GENOME.md                           # This file (codebase genome)
├── knowledge/                          # [PLANNED] Knowledge store
│   ├── index.json                      # Machine-readable fact index
│   ├── global/                         # Cross-repo knowledge
│   ├── repos/{repo}.md                 # Per-repo knowledge
│   └── agents/{agent}.md               # Agent-type notes
├── scripts/
│   ├── test_harvest_prompt.py          # Basic prompt validation (2.5KB)
│   └── test_harvest_prompt_comprehensive.py  # Full prompt structure test (6.8KB)
├── templates/
│   └── harvest-prompt.md               # Knowledge extraction prompt (3.5KB)
├── test_sessions/
│   ├── session_success.jsonl           # Happy path test data
│   ├── session_failure.jsonl           # Failure path test data
│   ├── session_partial.jsonl           # Incomplete session test data
│   ├── session_patterns.jsonl          # Pattern extraction test data
│   └── session_questions.jsonl         # Question identification test data
└── metrics/                            # [PLANNED] Compounding metrics
    └── dashboard.md

Entry Points and Data Flow

Entry Point 1: Knowledge Extraction (Harvester)

Input:  Session transcript (JSONL)
        ↓
        templates/harvest-prompt.md (LLM prompt)
        ↓
        Knowledge items (JSON array)
        ↓
Output: knowledge/index.json + per-repo/per-agent markdown files

Entry Point 2: Session Bootstrap (Bootstrapper)

Input:  Session context (repo, agent type, task type)
        ↓
        knowledge/index.json (query relevant facts)
        ↓
        2k-token bootstrap context
        ↓
Output: Injected into session startup

Entry Point 3: Measurement (Measurer)

Input:  knowledge/index.json + session history
        ↓
        Velocity, hit rate, error reduction calculations
        ↓
Output: metrics/dashboard.md

Key Abstractions

Knowledge Item

The atomic unit. One sentence, one category, one confidence score. Designed to be small enough that 1000 items fit in a 2k-token bootstrap context.

Knowledge Store

A directory structure that mirrors the fleet's mental model:

global/ — knowledge that applies everywhere (tool quirks, environment facts)
repos/ — knowledge specific to each repo
agents/ — knowledge specific to each agent type

Confidence Score

0.0–1.0 scale. Defines how certain the harvester is about each extracted fact:

0.9–1.0: Explicitly stated with verification
0.7–0.8: Clearly implied by multiple data points
0.5–0.6: Suggested but not fully verified
0.3–0.4: Inferred from limited data
0.1–0.2: Speculative or uncertain

Bootstrap Context

The 2k-token injection that a new session receives. Assembled from the most relevant knowledge items for the current task, filtered by confidence > 0.7, deduplicated, and compressed.

API Surface

Internal (scripts not yet implemented)

Script	Input	Output	Status
`harvester.py`	Session JSONL path	Knowledge items JSON	PLANNED
`bootstrapper.py`	Repo + agent type	2k-token context string	PLANNED
`measurer.py`	Knowledge store path	Metrics JSON	PLANNED
`session_reader.py`	Session JSONL path	Parsed transcript	PLANNED

Prompt (templates/harvest-prompt.md)

The extraction prompt is the core "API." It takes a session transcript and returns structured JSON. It defines:

Five extraction categories
Output format (JSON array of knowledge items)
Confidence scoring rubric
Constraints (no hallucination, specificity, relevance, brevity)
Example input/output pair

Test Coverage

What Exists

File	Tests	Coverage
`scripts/test_harvest_prompt.py`	2 tests	Prompt file existence, sample transcript
`scripts/test_harvest_prompt_comprehensive.py`	5 tests	Prompt structure, categories, fields, confidence scoring, size limits
`test_sessions/*.jsonl`	5 sessions	Success, failure, partial, patterns, questions

What's Missing

Harvester integration test — Does the prompt actually extract correct knowledge from real transcripts?
Bootstrapper test — Does it assemble relevant context correctly?
Knowledge store test — Does the index.json maintain consistency?
Confidence calibration test — Do high-confidence facts actually prove true in later sessions?
Deduplication test — Are duplicate facts across sessions handled?
Staleness test — How does the system handle outdated knowledge?

Security Considerations

No secrets in knowledge store — The harvester must filter out API keys, tokens, and credentials from extracted facts. The prompt constraints mention this but there is no automated guard.
Knowledge poisoning — A malicious or corrupted session could inject false facts. Confidence scoring partially mitigates this, but there is no verification step.
Access control — The knowledge store has no access control. Any process that can read the directory can read all facts. In a multi-tenant setup, this is a concern.
Transcript privacy — Session transcripts may contain user data. The harvester must not extract personally identifiable information into the knowledge store.

The 100x Path (from README)

Month 1:  15,000 facts, sessions 20% faster
Month 2:  45,000 facts, sessions 40% faster, first-try success up 30%
Month 3:  90,000 facts, fleet measurably smarter per token

Each new session is better than the last. The intelligence compounds.

Generated by codebase-genome pipeline. Ref: timmy-home#676.

8.7 KiB Raw Blame History Unescape Escape