docs: GENOME.md — full codebase analysis #676

2026-04-14 22:58:55 +00:00
2 changed files with 239 additions and 282 deletions
--- a/GENOME.md
+++ b/GENOME.md
@@ -0,0 +1,239 @@
 # GENOME.md — compounding-intelligence
 *Auto-generated codebase genome. See timmy-home#676.*
 ---
 ## Project Overview
 **What:** A system that turns 1B+ daily agent tokens into durable, compounding fleet intelligence.
 **Why:** Every agent session starts at zero. The same mistakes get made repeatedly — the same HTTP 405 is rediscovered as a branch protection issue, the same token path is searched for from scratch. Intelligence evaporates when the session ends.
 **How:** Three pipelines form a compounding loop:
 ```
 SESSION ENDS → HARVESTER → KNOWLEDGE STORE → BOOTSTRAPPER → NEW SESSION STARTS SMARTER
                              ↓
                         MEASURER → Prove it's working
 ```
 **Status:** Early stage. Template and test scaffolding exist. Core pipeline scripts (harvester.py, bootstrapper.py, measurer.py, session_reader.py) are planned but not yet implemented. The knowledge extraction prompt is complete and validated.
 ---
 ## Architecture
 ```mermaid
 graph TD
    A[Session Transcript<br/>.jsonl] --> B[Harvester]
    B --> C{Extract Knowledge}
    C --> D[knowledge/index.json]
    C --> E[knowledge/global/*.md]
    C --> F[knowledge/repos/{repo}.md]
    C --> G[knowledge/agents/{agent}.md]
    D --> H[Bootstrapper]
    H --> I[Bootstrap Context<br/>2k token injection]
    I --> J[New Session<br/>starts smarter]
    J --> A
    D --> K[Measurer]
    K --> L[metrics/dashboard.md]
    K --> M[Velocity / Hit Rate<br/>Error Reduction]
 ```
 ### Pipeline 1: Harvester
 **Status:** Prompt designed. Script not implemented.
 Reads finished session transcripts (JSONL). Uses `templates/harvest-prompt.md` to extract durable knowledge into five categories:
 | Category | Description | Example |
 |----------|-------------|---------|
 | `fact` | Concrete, verifiable information | "Repository X has 5 files" |
 | `pitfall` | Errors encountered, wrong assumptions | "Token is at ~/.config/gitea/token, not env var" |
 | `pattern` | Successful action sequences | "Deploy: test → build → push → webhook" |
 | `tool-quirk` | Environment-specific behaviors | "URL format requires trailing slash" |
 | `question` | Identified but unanswered | "Need optimal batch size for harvesting" |
 Output schema per knowledge item:
 ```json
 {
  "fact": "One sentence description",
  "category": "fact|pitfall|pattern|tool-quirk|question",
  "repo": "repo-name or 'global'",
  "confidence": 0.0-1.0
 }
 ```
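Because the harvester script does not exist yet, the following is only a minimal sketch of how the schema above could be checked before an item is written to the store. The names `validate_item`, `CATEGORIES`, and `REQUIRED_FIELDS` are hypothetical and not part of the repository.

```python
from typing import Any

# The five categories and four fields documented in the schema above.
CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}
REQUIRED_FIELDS = ("fact", "category", "repo", "confidence")


def validate_item(item: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the item is valid."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in item]
    if problems:
        return problems
    if not isinstance(item["fact"], str) or not item["fact"].strip():
        problems.append("fact must be a non-empty string")
    category = item["category"]
    if not isinstance(category, str) or category not in CATEGORIES:
        problems.append(f"unknown category: {category!r}")
    if not isinstance(item["repo"], str) or not item["repo"]:
        problems.append("repo must be a repo name or 'global'")
    conf = item["confidence"]
    # bool is a subclass of int, so it is rejected explicitly.
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        problems.append("confidence must be a number in [0.0, 1.0]")
    return problems
```

Returning a list of problems instead of raising lets a harvester log every defect in an item at once and drop it without aborting the whole batch.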
 ### Pipeline 2: Bootstrapper
 **Status:** Not implemented.
 Queries the knowledge store before a session starts, assembles a compact 2k-token context from the relevant facts, and injects it into session startup so the agent begins with the most relevant prior knowledge instead of starting from zero.
 ### Pipeline 3: Measurer
 **Status:** Not implemented.
 Tracks compounding metrics: knowledge velocity (facts/day), error reduction (%), hit rate (knowledge used / knowledge available), task completion improvement.
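The metrics above are simple ratios, and a sketch makes the definitions concrete. This is not the planned `measurer.py`; the function names and the choice of inputs (harvest dates, counts of used and available items, error counts per period) are assumptions made for illustration.

```python
from datetime import date


def knowledge_velocity(harvest_dates: list[date]) -> float:
    """Facts per day over the inclusive span of harvest dates."""
    if not harvest_dates:
        return 0.0
    span_days = (max(harvest_dates) - min(harvest_dates)).days + 1
    return len(harvest_dates) / span_days


def hit_rate(used: int, available: int) -> float:
    """Knowledge used / knowledge available; 0.0 when nothing was available."""
    return used / available if available else 0.0


def error_reduction(baseline_errors: int, current_errors: int) -> float:
    """Percentage drop in errors relative to a baseline period."""
    if baseline_errors == 0:
        return 0.0
    return 100.0 * (baseline_errors - current_errors) / baseline_errors
```

For example, 40 errors in a baseline week and 30 in the current week give `error_reduction(40, 30) == 25.0`. How "knowledge used" would be detected in a session is left open by the document and is not addressed here.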
 ---
 ## Directory Structure
 ```
 compounding-intelligence/
 ├── README.md                           # Project overview and architecture
 ├── GENOME.md                           # This file (codebase genome)
 ├── knowledge/                          # [PLANNED] Knowledge store
 │   ├── index.json                      # Machine-readable fact index
 │   ├── global/                         # Cross-repo knowledge
 │   ├── repos/{repo}.md                 # Per-repo knowledge
 │   └── agents/{agent}.md               # Agent-type notes
 ├── scripts/
 │   ├── test_harvest_prompt.py          # Basic prompt validation (2.5KB)
 │   └── test_harvest_prompt_comprehensive.py  # Full prompt structure test (6.8KB)
 ├── templates/
 │   └── harvest-prompt.md               # Knowledge extraction prompt (3.5KB)
 ├── test_sessions/
 │   ├── session_success.jsonl           # Happy path test data
 │   ├── session_failure.jsonl           # Failure path test data
 │   ├── session_partial.jsonl           # Incomplete session test data
 │   ├── session_patterns.jsonl          # Pattern extraction test data
 │   └── session_questions.jsonl         # Question identification test data
 └── metrics/                            # [PLANNED] Compounding metrics
    └── dashboard.md
 ```
 ---
 ## Entry Points and Data Flow
 ### Entry Point 1: Knowledge Extraction (Harvester)
 ```
 Input:  Session transcript (JSONL)
        ↓
        templates/harvest-prompt.md (LLM prompt)
        ↓
        Knowledge items (JSON array)
        ↓
 Output: knowledge/index.json + per-repo/per-agent markdown files
 ```
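The first step of this flow, reading a JSONL transcript, could look roughly like the sketch below. It is not the planned `session_reader.py`: the function name `read_session` is hypothetical, and because the transcript record layout is not documented here, the sketch makes no assumption about keys and simply yields each JSON object.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def read_session(path: str) -> Iterator[dict[str, Any]]:
    """Yield one dict per well-formed JSON object line in a JSONL transcript."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # blank lines are common at the end of JSONL files
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # assumption: malformed lines are skipped, not fatal
            if isinstance(record, dict):
                yield record
```

Skipping malformed lines keeps a partially written transcript (such as the incomplete-session test data) readable, at the cost of silently losing the damaged records.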
 ### Entry Point 2: Session Bootstrap (Bootstrapper)
 ```
 Input:  Session context (repo, agent type, task type)
        ↓
        knowledge/index.json (query relevant facts)
        ↓
        2k-token bootstrap context
        ↓
 Output: Injected into session startup
 ```
 ### Entry Point 3: Measurement (Measurer)
 ```
 Input:  knowledge/index.json + session history
        ↓
        Velocity, hit rate, error reduction calculations
        ↓
 Output: metrics/dashboard.md
 ```
 ---
 ## Key Abstractions
 ### Knowledge Item
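 The atomic unit. One sentence, one category, one confidence score. Designed to be small so that many items fit in a 2k-token bootstrap context. The stated target of 1000 items would leave only about 2 tokens per item, so with one-sentence facts a figure on the order of 100 items is more realistic.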
 ### Knowledge Store
 A directory structure that mirrors the fleet's mental model:
 - `global/` — knowledge that applies everywhere (tool quirks, environment facts)
 - `repos/` — knowledge specific to each repo
 - `agents/` — knowledge specific to each agent type
 ### Confidence Score
 0.0–1.0 scale expressing how certain the harvester is about each extracted fact:
 - 0.9–1.0: Explicitly stated with verification
 - 0.7–0.8: Clearly implied by multiple data points
 - 0.5–0.6: Suggested but not fully verified
 - 0.3–0.4: Inferred from limited data
 - 0.1–0.2: Speculative or uncertain
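The rubric can be turned into a lookup. The sketch below treats each band's lower bound as a threshold, which is an assumption: the listed bands leave gaps (for example, 0.85 is not covered), and the rubric does not say how those scores should be read. `BANDS` and `confidence_label` are hypothetical names.

```python
# (lower bound, label) pairs taken from the rubric, highest first.
BANDS = [
    (0.9, "explicitly stated with verification"),
    (0.7, "clearly implied by multiple data points"),
    (0.5, "suggested but not fully verified"),
    (0.3, "inferred from limited data"),
    (0.1, "speculative or uncertain"),
]


def confidence_label(score: float) -> str:
    """Map a score to its rubric label, treating each lower bound as a threshold."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence out of range: {score}")
    for lower, label in BANDS:
        if score >= lower:
            return label
    return "below rubric"  # assumption: the rubric does not describe scores under 0.1
```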
 ### Bootstrap Context
 The 2k-token injection that a new session receives. Assembled from the most relevant knowledge items for the current task, filtered by confidence > 0.7, deduplicated, and compressed.
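A rough sketch of that assembly step follows. It is not the planned `bootstrapper.py`: the function names are hypothetical, the 4-characters-per-token estimate is a crude assumption rather than a real tokenizer, relevance is reduced to "same repo or global", and the compression step is omitted.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate (assumption: about 4 characters per token)."""
    return max(1, len(text) // 4)


def build_bootstrap(items: list[dict], repo: str, budget: int = 2000,
                    min_confidence: float = 0.7) -> str:
    """Select relevant, confident, unique facts until the token budget is spent."""
    relevant = [
        it for it in items
        if it["repo"] in (repo, "global") and it["confidence"] > min_confidence
    ]
    # Highest-confidence facts are placed first so they survive the budget cut.
    relevant.sort(key=lambda it: it["confidence"], reverse=True)

    seen: set[str] = set()
    lines: list[str] = []
    spent = 0
    for it in relevant:
        key = " ".join(it["fact"].lower().split())  # normalise case and whitespace
        if key in seen:
            continue  # duplicate of a fact already selected
        line = f"- [{it['category']}] {it['fact']}"
        cost = estimate_tokens(line)
        if spent + cost > budget:
            break
        seen.add(key)
        lines.append(line)
        spent += cost
    return "\n".join(lines)
```

Deduplication here only catches facts that differ in case or spacing; detecting paraphrased duplicates would need something stronger, such as similarity matching.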
 ---
 ## API Surface
 ### Internal (scripts not yet implemented)
 | Script | Input | Output | Status |
 |--------|-------|--------|--------|
 | `harvester.py` | Session JSONL path | Knowledge items JSON | PLANNED |
 | `bootstrapper.py` | Repo + agent type | 2k-token context string | PLANNED |
 | `measurer.py` | Knowledge store path | Metrics JSON | PLANNED |
 | `session_reader.py` | Session JSONL path | Parsed transcript | PLANNED |
 ### Prompt (templates/harvest-prompt.md)
 The extraction prompt is the core "API." It takes a session transcript and returns structured JSON. It defines:
 - Five extraction categories
 - Output format (JSON array of knowledge items)
 - Confidence scoring rubric
 - Constraints (no hallucination, specificity, relevance, brevity)
 - Example input/output pair
 ---
 ## Test Coverage
 ### What Exists
 | File | Tests | Coverage |
 |------|-------|----------|
 | `scripts/test_harvest_prompt.py` | 2 tests | Prompt file existence, sample transcript |
 | `scripts/test_harvest_prompt_comprehensive.py` | 5 tests | Prompt structure, categories, fields, confidence scoring, size limits |
 | `test_sessions/*.jsonl` | 5 sessions | Success, failure, partial, patterns, questions |
 ### What's Missing
 1. **Harvester integration test** — Does the prompt actually extract correct knowledge from real transcripts?
 2. **Bootstrapper test** — Does it assemble relevant context correctly?
 3. **Knowledge store test** — Does the index.json maintain consistency?
 4. **Confidence calibration test** — Do high-confidence facts actually prove true in later sessions?
 5. **Deduplication test** — Are duplicate facts across sessions handled?
 6. **Staleness test** — How does the system handle outdated knowledge?
 ---
 ## Security Considerations
 1. **No secrets in knowledge store** — The harvester must filter out API keys, tokens, and credentials from extracted facts. The prompt constraints mention this but there is no automated guard.
 2. **Knowledge poisoning** — A malicious or corrupted session could inject false facts. Confidence scoring partially mitigates this, but there is no verification step.
 3. **Access control** — The knowledge store has no access control. Any process that can read the directory can read all facts. In a multi-tenant setup, this is a concern.
 4. **Transcript privacy** — Session transcripts may contain user data. The harvester must not extract personally identifiable information into the knowledge store.
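As an illustration of the missing automated guard from item 1, a harvester could run extracted facts through a pattern filter before writing them. The sketch below is a starting point under stated assumptions, not a vetted secret scanner: the patterns are illustrative, the names `SECRET_PATTERNS`, `looks_like_secret`, and `drop_secrets` are hypothetical, and it does nothing about the PII concern in item 4.

```python
import re

# Illustrative patterns only; a real guard would need a broader, tested set.
SECRET_PATTERNS = [
    # key=value or key: value assignments for common credential names
    re.compile(r"(?i)(api[_-]?key|token|secret|password)\s*[:=]\s*\S+"),
    # long hex strings (this also flags git commit SHAs, a known false positive)
    re.compile(r"\b[0-9a-fA-F]{32,}\b"),
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]


def looks_like_secret(text: str) -> bool:
    """True if any pattern matches; errs on the side of dropping the fact."""
    return any(p.search(text) for p in SECRET_PATTERNS)


def drop_secrets(items: list[dict]) -> list[dict]:
    """Keep only items whose fact text trips none of the patterns."""
    return [it for it in items if not looks_like_secret(it["fact"])]
```

A fact that only names where a credential lives, such as "Token is at ~/.config/gitea/token, not env var", passes this filter, while an assignment like `GITEA_TOKEN=abc123` is dropped.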
 ---
 ## The 100x Path (from README)
 ```
 Month 1:  15,000 facts, sessions 20% faster
 Month 2:  45,000 facts, sessions 40% faster, first-try success up 30%
 Month 3:  90,000 facts, fleet measurably smarter per token
 ```
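 These figures are projections from the README, not measurements; the Measurer that would verify them is not implemented yet. The intent is that each new session is better than the last, so the intelligence compounds.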
 ---
 *Generated by codebase-genome pipeline. Ref: timmy-home#676.*
--- a/scripts/dead_code_detector.py
+++ b/scripts/dead_code_detector.py
@@ -1,282 +0,0 @@
 #!/usr/bin/env python3
 """
 Dead Code Detector for Python Codebases
 AST-based analysis to find defined but never-called functions and classes.
 Excludes entry points, plugin hooks, __init__ exports.
 Usage:
  python3 scripts/dead_code_detector.py /path/to/repo/
  python3 scripts/dead_code_detector.py hermes-agent/ --format json
  python3 scripts/dead_code_detector.py . --exclude tests/,venv/
 Output: file:line, function/class name, last git author (text format only, with --git-blame)
 """
 import argparse
 import ast
 import json
 import os
 import subprocess
 from pathlib import Path
 from typing import Optional
 # Names that are expected to be unused (entry points, protocol methods, etc.)
 SAFE_UNUSED_PATTERNS = {
    # Python dunders
    "__init__", "__str__", "__repr__", "__eq__", "__hash__", "__len__",
    "__getitem__", "__setitem__", "__contains__", "__iter__", "__next__",
    "__enter__", "__exit__", "__call__", "__bool__", "__del__",
    "__post_init__", "__class_getitem__",
    # Common entry points
    "main", "app", "handler", "setup", "teardown", "fixture",
    # pytest
    "conftest", "test_", "pytest_",  # prefix patterns
    # Protocols / abstract
    "abstractmethod", "abc_",
 }
 def is_safe_unused(name: str, filepath: str) -> bool:
    """Check if an unused name is expected to be unused."""
    # Test files are exempt
    if "test" in filepath.lower():
        return True
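    # Known patterns. Every entry is matched as a prefix, so "app" also
    # exempts names such as "apply" (an exact match is a prefix match too).
    for pattern in SAFE_UNUSED_PATTERNS:
        if name.startswith(pattern):
            return True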
    # __init__.py exports are often unused internally
    if filepath.endswith("__init__.py"):
        return True
    return False
 def get_git_blame(filepath: str, lineno: int) -> Optional[str]:
    """Get last author of a line via git blame."""
    try:
        result = subprocess.run(
            ["git", "blame", "-L", f"{lineno},{lineno}", "--porcelain", filepath],
            capture_output=True, text=True, timeout=5
        )
        for line in result.stdout.split("\n"):
            if line.startswith("author "):
                return line[7:]
    except (subprocess.SubprocessError, OSError):
        pass
    return None
 class DefinitionCollector(ast.NodeVisitor):
    """Collect all function and class definitions."""
    def __init__(self):
        self.definitions = []  # (name, type, lineno)
    def visit_FunctionDef(self, node):
        self.definitions.append((node.name, "function", node.lineno))
        self.generic_visit(node)
    def visit_AsyncFunctionDef(self, node):
        self.definitions.append((node.name, "async_function", node.lineno))
        self.generic_visit(node)
    def visit_ClassDef(self, node):
        self.definitions.append((node.name, "class", node.lineno))
        self.generic_visit(node)
 class NameUsageCollector(ast.NodeVisitor):
    """Collect all name references (calls, imports, attribute access)."""
    def __init__(self):
        self.names = set()
        self.calls = set()
        self.imports = set()
    def visit_Name(self, node):
        self.names.add(node.id)
        self.generic_visit(node)
    def visit_Attribute(self, node):
        if isinstance(node.value, ast.Name):
            self.names.add(node.value.id)
        self.generic_visit(node)
    def visit_Call(self, node):
        if isinstance(node.func, ast.Name):
            self.calls.add(node.func.id)
        elif isinstance(node.func, ast.Attribute):
            if isinstance(node.func.value, ast.Name):
                self.names.add(node.func.value.id)
            self.calls.add(node.func.attr)
        self.generic_visit(node)
    def visit_Import(self, node):
        for alias in node.names:
            self.imports.add(alias.asname or alias.name)
        self.generic_visit(node)
    def visit_ImportFrom(self, node):
        for alias in node.names:
            self.imports.add(alias.asname or alias.name)
        self.generic_visit(node)
 def analyze_file(filepath: str) -> dict:
    """Analyze a single Python file for dead code."""
    path = Path(filepath)
    try:
        content = path.read_text()
        tree = ast.parse(content, filename=str(filepath))
    except (SyntaxError, UnicodeDecodeError):
        return {"error": f"Could not parse {filepath}"}
    # Collect definitions
    def_collector = DefinitionCollector()
    def_collector.visit(tree)
    definitions = def_collector.definitions
    # Collect usage
    usage_collector = NameUsageCollector()
    usage_collector.visit(tree)
    used_names = usage_collector.names | usage_collector.calls | usage_collector.imports
    # Also scan the entire repo for references to this file's definitions
    # (this is done at the repo level, not file level)
    dead = []
    for name, def_type, lineno in definitions:
        # Single-underscore (private) names get no special treatment here.
        if name not in used_names:
            if not is_safe_unused(name, filepath):
                dead.append({
                    "name": name,
                    "type": def_type,
                    "file": filepath,
                    "line": lineno,
                })
    return {"definitions": len(definitions), "dead": dead}
 def scan_repo(repo_path: str, exclude_patterns: list = None) -> dict:
    """Scan an entire repo for dead code."""
    path = Path(repo_path)
    exclude = exclude_patterns or ["venv", ".venv", "node_modules", "__pycache__",
                                    ".git", "dist", "build", ".tox", "vendor"]
    all_definitions = {}  # name -> [{file, line, type}]
    all_files = []
    dead_code = []
    # First pass: collect all definitions across repo
    for fpath in path.rglob("*.py"):
        parts = fpath.parts
        if any(ex in parts for ex in exclude):
            continue
        if fpath.name.startswith("."):
            continue
        try:
            content = fpath.read_text(errors="ignore")
            tree = ast.parse(content, filename=str(fpath))
        except Exception:
            continue
        all_files.append(str(fpath))
        collector = DefinitionCollector()
        collector.visit(tree)
        for name, def_type, lineno in collector.definitions:
            rel_path = str(fpath.relative_to(path))
            if name not in all_definitions:
                all_definitions[name] = []
            all_definitions[name].append({
                "file": rel_path,
                "line": lineno,
                "type": def_type,
            })
    # Second pass: check each name for usage across entire repo
    all_used_names = set()
    for fpath_str in all_files:
        try:
            content = Path(fpath_str).read_text(errors="ignore")
            tree = ast.parse(content)
        except Exception:
            continue
        usage = NameUsageCollector()
        usage.visit(tree)
        all_used_names.update(usage.names)
        all_used_names.update(usage.calls)
        all_used_names.update(usage.imports)
    # Find dead code
    for name, locations in all_definitions.items():
        if name not in all_used_names:
            for loc in locations:
                if not is_safe_unused(name, loc["file"]):
                    dead_code.append({
                        "name": name,
                        "type": loc["type"],
                        "file": loc["file"],
                        "line": loc["line"],
                    })
    return {
        "repo": path.name,
        "files_scanned": len(all_files),
        "total_definitions": sum(len(v) for v in all_definitions.values()),
        "dead_code_count": len(dead_code),
        "dead_code": sorted(dead_code, key=lambda x: (x["file"], x["line"])),
    }
 def main():
    parser = argparse.ArgumentParser(description="Find dead code in Python codebases")
    parser.add_argument("repo", help="Repository path to scan")
    parser.add_argument("--format", choices=["text", "json"], default="text")
    parser.add_argument("--exclude", help="Comma-separated patterns to exclude")
    parser.add_argument("--git-blame", action="store_true", help="Include git blame info")
    args = parser.parse_args()
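    # Strip whitespace and slashes so "tests/,venv/" matches path components.
    exclude = [p.strip().strip("/") for p in args.exclude.split(",") if p.strip()] if args.exclude else None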
    result = scan_repo(args.repo, exclude)
    if args.format == "json":
        print(json.dumps(result, indent=2))
    else:
        print(f"Dead Code Report: {result['repo']}")
        print(f"Files scanned: {result['files_scanned']}")
        print(f"Total definitions: {result['total_definitions']}")
        print(f"Dead code found: {result['dead_code_count']}")
        print()
        if result["dead_code"]:
            print(f"{'File':<45} {'Line':>4} {'Type':<10} {'Name'}")
            print("-" * 85)
            for item in result["dead_code"]:
                author = ""
                if args.git_blame:
                    author = get_git_blame(
                        os.path.join(args.repo, item["file"]),
                        item["line"]
                    ) or ""
                    author = f" ({author})" if author else ""
                print(f"{item['file']:<45} {item['line']:>4} {item['type']:<10} {item['name']}{author}")
        else:
            print("No dead code detected!")
 if __name__ == "__main__":
    main()