feat: Add parser tests (closes #177)

feat: Add Gitea issue body parser (closes #177)
2026-04-15 03:50:04 +00:00 · 2026-04-15 03:49:00 +00:00
3 changed files with 240 additions and 239 deletions
--- a/GENOME.md
+++ b/GENOME.md
@@ -1,239 +0,0 @@
 # GENOME.md — compounding-intelligence
 *Auto-generated codebase genome. See timmy-home#676.*
 ---
 ## Project Overview
 **What:** A system that turns 1B+ daily agent tokens into durable, compounding fleet intelligence.
 **Why:** Every agent session starts at zero. The same mistakes get made repeatedly — the same HTTP 405 is rediscovered as a branch protection issue, the same token path is searched for from scratch. Intelligence evaporates when the session ends.
 **How:** Three pipelines form a compounding loop:
 ```
 SESSION ENDS → HARVESTER → KNOWLEDGE STORE → BOOTSTRAPPER → NEW SESSION STARTS SMARTER
                              ↓
                         MEASURER → Prove it's working
 ```
 **Status:** Early stage. Template and test scaffolding exist. Core pipeline scripts (harvester.py, bootstrapper.py, measurer.py, session_reader.py) are planned but not yet implemented. The knowledge extraction prompt is complete and validated.
 ---
 ## Architecture
 ```mermaid
 graph TD
    A[Session Transcript<br/>.jsonl] --> B[Harvester]
    B --> C{Extract Knowledge}
    C --> D[knowledge/index.json]
    C --> E[knowledge/global/*.md]
    C --> F["knowledge/repos/{repo}.md"]
    C --> G["knowledge/agents/{agent}.md"]
    D --> H[Bootstrapper]
    H --> I[Bootstrap Context<br/>2k token injection]
    I --> J[New Session<br/>starts smarter]
    J --> A
    D --> K[Measurer]
    K --> L[metrics/dashboard.md]
    K --> M[Velocity / Hit Rate<br/>Error Reduction]
 ```
 ### Pipeline 1: Harvester
 **Status:** Prompt designed. Script not implemented.
 Reads finished session transcripts (JSONL). Uses `templates/harvest-prompt.md` to extract durable knowledge into five categories:
 | Category | Description | Example |
 |----------|-------------|---------|
 | `fact` | Concrete, verifiable information | "Repository X has 5 files" |
 | `pitfall` | Errors encountered, wrong assumptions | "Token is at ~/.config/gitea/token, not env var" |
 | `pattern` | Successful action sequences | "Deploy: test → build → push → webhook" |
 | `tool-quirk` | Environment-specific behaviors | "URL format requires trailing slash" |
 | `question` | Identified but unanswered | "Need optimal batch size for harvesting" |
 Output schema per knowledge item:
 ```json
 {
  "fact": "One sentence description",
  "category": "fact|pitfall|pattern|tool-quirk|question",
  "repo": "repo-name or 'global'",
  "confidence": 0.0-1.0
 }
 ```
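A minimal sketch of how one item could be checked against this schema. The `validate_item` helper and its error messages are hypothetical and not part of the repository; only the field names and category values come from the schema and table above.

```python
from typing import Any, Dict, List

# Category names taken from the table above.
CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}


def validate_item(item: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the item looks valid."""
    problems = []

    fact = item.get("fact")
    if not isinstance(fact, str) or not fact.strip():
        problems.append("fact must be a non-empty string")

    category = item.get("category")
    if not isinstance(category, str) or category not in CATEGORIES:
        problems.append("category must be one of the five known values")

    repo = item.get("repo")
    if not isinstance(repo, str) or not repo:
        problems.append("repo must be a repo name or 'global'")

    conf = item.get("confidence")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        problems.append("confidence must be a number in [0.0, 1.0]")

    return problems
```

Returning a list of problems rather than raising lets a harvester log every defect in a batch before deciding whether to drop the item.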
 ### Pipeline 2: Bootstrapper
 **Status:** Not implemented.
 Queries knowledge store before session start. Assembles a compact 2k-token context from relevant facts. Injects into session startup so the agent begins with full situational awareness.
 ### Pipeline 3: Measurer
 **Status:** Not implemented.
 Tracks compounding metrics: knowledge velocity (facts/day), error reduction (%), hit rate (knowledge used / knowledge available), task completion improvement.
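The metrics above reduce to simple ratios. The sketch below is one possible reading, not the planned `measurer.py`: the function names are hypothetical, and how "used" knowledge or baseline errors would be counted is an assumption the source leaves open.

```python
def knowledge_velocity(new_facts: int, days: float) -> float:
    """Facts harvested per day over the measurement window."""
    if days <= 0:
        raise ValueError("days must be positive")
    return new_facts / days


def hit_rate(used: int, available: int) -> float:
    """Knowledge used divided by knowledge available (0.0 when nothing is available)."""
    if available == 0:
        return 0.0
    return used / available


def error_reduction_pct(baseline_errors: int, current_errors: int) -> float:
    """Percentage drop in errors relative to a baseline period (assumed definition)."""
    if baseline_errors == 0:
        return 0.0
    return 100.0 * (baseline_errors - current_errors) / baseline_errors
```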
 ---
 ## Directory Structure
 ```
 compounding-intelligence/
 ├── README.md                           # Project overview and architecture
 ├── GENOME.md                           # This file (codebase genome)
 ├── knowledge/                          # [PLANNED] Knowledge store
 │   ├── index.json                      # Machine-readable fact index
 │   ├── global/                         # Cross-repo knowledge
 │   ├── repos/{repo}.md                 # Per-repo knowledge
 │   └── agents/{agent}.md               # Agent-type notes
 ├── scripts/
 │   ├── test_harvest_prompt.py          # Basic prompt validation (2.5KB)
 │   └── test_harvest_prompt_comprehensive.py  # Full prompt structure test (6.8KB)
 ├── templates/
 │   └── harvest-prompt.md               # Knowledge extraction prompt (3.5KB)
 ├── test_sessions/
 │   ├── session_success.jsonl           # Happy path test data
 │   ├── session_failure.jsonl           # Failure path test data
 │   ├── session_partial.jsonl           # Incomplete session test data
 │   ├── session_patterns.jsonl          # Pattern extraction test data
 │   └── session_questions.jsonl         # Question identification test data
 └── metrics/                            # [PLANNED] Compounding metrics
    └── dashboard.md
 ```
 ---
 ## Entry Points and Data Flow
 ### Entry Point 1: Knowledge Extraction (Harvester)
 ```
 Input:  Session transcript (JSONL)
        ↓
        templates/harvest-prompt.md (LLM prompt)
        ↓
        Knowledge items (JSON array)
        ↓
 Output: knowledge/index.json + per-repo/per-agent markdown files
 ```
 ### Entry Point 2: Session Bootstrap (Bootstrapper)
 ```
 Input:  Session context (repo, agent type, task type)
        ↓
        knowledge/index.json (query relevant facts)
        ↓
        2k-token bootstrap context
        ↓
 Output: Injected into session startup
 ```
 ### Entry Point 3: Measurement (Measurer)
 ```
 Input:  knowledge/index.json + session history
        ↓
        Velocity, hit rate, error reduction calculations
        ↓
 Output: metrics/dashboard.md
 ```
 ---
 ## Key Abstractions
 ### Knowledge Item
 The atomic unit. One sentence, one category, one confidence score. Designed to be small enough that 1000 items fit in a 2k-token bootstrap context.
 ### Knowledge Store
 A directory structure that mirrors the fleet's mental model:
 - `global/` — knowledge that applies everywhere (tool quirks, environment facts)
 - `repos/` — knowledge specific to each repo
 - `agents/` — knowledge specific to each agent type
 ### Confidence Score
 0.0–1.0 scale. Defines how certain the harvester is about each extracted fact:
 - 0.9–1.0: Explicitly stated with verification
 - 0.7–0.8: Clearly implied by multiple data points
 - 0.5–0.6: Suggested but not fully verified
 - 0.3–0.4: Inferred from limited data
 - 0.1–0.2: Speculative or uncertain
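The rubric can be sketched as a lookup from score to band. The bands as written leave gaps (0.85, for example, falls between two of them), so this sketch assumes each band's lower bound acts as a threshold and that scores below 0.1 count as speculative. The `confidence_band` name is hypothetical.

```python
# (lower bound, label) pairs, highest band first; labels follow the rubric above.
BANDS = [
    (0.9, "explicitly stated with verification"),
    (0.7, "clearly implied by multiple data points"),
    (0.5, "suggested but not fully verified"),
    (0.3, "inferred from limited data"),
    (0.1, "speculative or uncertain"),
]


def confidence_band(score: float) -> str:
    """Map a 0.0-1.0 confidence score to its rubric label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0.0, 1.0]")
    for lower, label in BANDS:
        if score >= lower:
            return label
    # Assumption: anything below 0.1 is treated as the lowest band.
    return BANDS[-1][1]
```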
 ### Bootstrap Context
 The 2k-token injection that a new session receives. Assembled from the most relevant knowledge items for the current task, filtered by confidence > 0.7, deduplicated, and compressed.
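A rough sketch of the filter, deduplicate, and budget steps, under stated assumptions: relevance ranking is replaced here by a plain sort on confidence, token cost is approximated as four characters per token, and the compression step is omitted. The `build_bootstrap_context` function and its parameters are hypothetical, not the planned `bootstrapper.py`.

```python
from typing import Any, Dict, List


def build_bootstrap_context(
    items: List[Dict[str, Any]],
    token_budget: int = 2000,
    min_confidence: float = 0.7,
    chars_per_token: int = 4,  # crude estimate, not a real tokenizer
) -> str:
    seen = set()
    lines = []
    used = 0
    # Highest confidence first; sorted() is stable, so ties keep input order.
    for item in sorted(items, key=lambda i: i["confidence"], reverse=True):
        if item["confidence"] <= min_confidence:
            continue  # keep only confidence > 0.7, as described above
        # Deduplicate on a case- and whitespace-normalized form of the fact.
        key = " ".join(item["fact"].lower().split())
        if key in seen:
            continue
        line = f"- [{item['category']}] {item['fact']}"
        cost = len(line) // chars_per_token + 1
        if used + cost > token_budget:
            break  # stop once the next line would exceed the budget
        seen.add(key)
        lines.append(line)
        used += cost
    return "\n".join(lines)
```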
 ---
 ## API Surface
 ### Internal (scripts not yet implemented)
 | Script | Input | Output | Status |
 |--------|-------|--------|--------|
 | `harvester.py` | Session JSONL path | Knowledge items JSON | PLANNED |
 | `bootstrapper.py` | Repo + agent type | 2k-token context string | PLANNED |
 | `measurer.py` | Knowledge store path | Metrics JSON | PLANNED |
 | `session_reader.py` | Session JSONL path | Parsed transcript | PLANNED |
 ### Prompt (templates/harvest-prompt.md)
 The extraction prompt is the core "API." It takes a session transcript and returns structured JSON. It defines:
 - Five extraction categories
 - Output format (JSON array of knowledge items)
 - Confidence scoring rubric
 - Constraints (no hallucination, specificity, relevance, brevity)
 - Example input/output pair
 ---
 ## Test Coverage
 ### What Exists
 | File | Tests | Coverage |
 |------|-------|----------|
 | `scripts/test_harvest_prompt.py` | 2 tests | Prompt file existence, sample transcript |
 | `scripts/test_harvest_prompt_comprehensive.py` | 5 tests | Prompt structure, categories, fields, confidence scoring, size limits |
 | `test_sessions/*.jsonl` | 5 sessions | Success, failure, partial, patterns, questions |
 ### What's Missing
 1. **Harvester integration test** — Does the prompt actually extract correct knowledge from real transcripts?
 2. **Bootstrapper test** — Does it assemble relevant context correctly?
 3. **Knowledge store test** — Does the index.json maintain consistency?
 4. **Confidence calibration test** — Do high-confidence facts actually prove true in later sessions?
 5. **Deduplication test** — Are duplicate facts across sessions handled?
 6. **Staleness test** — How does the system handle outdated knowledge?
 ---
 ## Security Considerations
 1. **No secrets in knowledge store** — The harvester must filter out API keys, tokens, and credentials from extracted facts. The prompt constraints mention this but there is no automated guard.
 2. **Knowledge poisoning** — A malicious or corrupted session could inject false facts. Confidence scoring partially mitigates this, but there is no verification step.
 3. **Access control** — The knowledge store has no access control. Any process that can read the directory can read all facts. In a multi-tenant setup, this is a concern.
 4. **Transcript privacy** — Session transcripts may contain user data. The harvester must not extract personally identifiable information into the knowledge store.
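The missing automated guard from item 1 could start as a pattern check that drops suspicious facts before they reach the store. This is a sketch only: the patterns and the `looks_sensitive` and `drop_sensitive` helpers are hypothetical, they do not address the PII concern in item 4, and a real guard would need a reviewed, broader pattern list.

```python
import re
from typing import Any, Dict, List

# Hypothetical patterns for illustration.
SENSITIVE_PATTERNS = [
    # key=value or key: value assignments for common credential names
    re.compile(r"\b(?:api[_-]?key|token|password|secret)\b\s*[:=]\s*\S+", re.IGNORECASE),
    # 40-character hex strings (this also matches git commit SHAs, so expect false positives)
    re.compile(r"\b[0-9a-f]{40}\b", re.IGNORECASE),
]


def looks_sensitive(fact: str) -> bool:
    """True if any pattern matches somewhere in the fact text."""
    return any(pattern.search(fact) for pattern in SENSITIVE_PATTERNS)


def drop_sensitive(items: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only items whose fact text does not look like a credential."""
    return [item for item in items if not looks_sensitive(item["fact"])]
```

A fact that only names where a token lives, such as the `pitfall` example earlier in this file, passes the check because no value is assigned.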
 ---
 ## The 100x Path (from README)
 ```
 Month 1:  15,000 facts, sessions 20% faster
 Month 2:  45,000 facts, sessions 40% faster, first-try success up 30%
 Month 3:  90,000 facts, fleet measurably smarter per token
 ```
 Each new session is better than the last. The intelligence compounds.
 ---
 *Generated by codebase-genome pipeline. Ref: timmy-home#676.*
--- a/scripts/gitea_issue_parser.py
+++ b/scripts/gitea_issue_parser.py
@@ -0,0 +1,131 @@
 #!/usr/bin/env python3
 """
 Gitea Issue Body Parser — Extract structured data from markdown issue bodies.
 Usage:
    cat issue_body.txt | python3 scripts/gitea_issue_parser.py --stdin --pretty
    python3 scripts/gitea_issue_parser.py --url https://forge.../api/v1/repos/.../issues/123 --pretty
    python3 scripts/gitea_issue_parser.py body.txt --title "Fix thing (#42)" --labels pipeline extraction
 """
 import argparse
 import json
 import re
 import sys
 from typing import Dict, List, Any, Optional
 def parse_issue_body(body: str, title: str = "", labels: Optional[List[str]] = None) -> Dict[str, Any]:
    """Parse a Gitea issue markdown body into structured JSON.
    Extracted fields:
    - title: Issue title
    - context: Background/description section
    - criteria[]: Acceptance criteria (checkboxes or numbered lists)
    - labels[]: Issue labels
    - epic_ref: Parent/epic issue reference (from "Closes #N" or title)
    - sections{}: All ## sections as key-value pairs
    """
    result = {
        "title": title,
        "context": "",
        "criteria": [],
        "labels": labels or [],
        "epic_ref": None,
        "sections": {},
    }
    if not body:
        return result
    # Extract epic reference from title or body
    epic_patterns = [
        r"(?:closes|fixes|addresses|refs?)\s+#(\d+)",
        r"#(\d+)",
    ]
    for pattern in epic_patterns:
        match = re.search(pattern, (title + " " + body).lower())
        if match:
            result["epic_ref"] = int(match.group(1))
            break
    # Parse ## sections
    section_pattern = r"^##\s+(.+?)$\n((?:^(?!##\s).*$\n?)*)"
    for match in re.finditer(section_pattern, body, re.MULTILINE):
        section_name = match.group(1).strip().lower().replace(" ", "_")
        section_content = match.group(2).strip()
        result["sections"][section_name] = section_content
    # Extract acceptance criteria (checkboxes)
    checkbox_pattern = r"^\s*-\s*\[([ xX])\]\s*(.+)$"
    for match in re.finditer(checkbox_pattern, body, re.MULTILINE):
        checked = match.group(1).lower() == "x"
        text = match.group(2).strip()
        result["criteria"].append({"text": text, "checked": checked})
    # If no checkboxes, try numbered lists in "Acceptance Criteria" or "Criteria" section
    if not result["criteria"]:
        # Section keys were normalized to snake_case above, so only those forms can match
        for section_name in ["acceptance_criteria", "criteria"]:
            if section_name in result["sections"]:
                numbered = r"^\s*\d+\.\s*(.+)$"
                for match in re.finditer(numbered, result["sections"][section_name], re.MULTILINE):
                    result["criteria"].append({"text": match.group(1).strip(), "checked": False})
                break
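    # Extract context: text before the first ## heading, else the first
    # section's content, else the first paragraph
    first_heading = body.find("## ")
    if first_heading > 0:
        context_text = body[:first_heading].strip()
    elif first_heading == 0 and result["sections"]:
        # Body opens with a heading, so use that section's content
        context_text = next(iter(result["sections"].values()))
    else:
        context_text = body.split("\n\n")[0].strip()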
    # Clean up: remove "## Context" or "## Problem" header if present
    context_text = re.sub(r"^#+\s*\w+\s*\n?", "", context_text).strip()
    result["context"] = context_text[:500]  # Cap at 500 chars
    return result
 def fetch_issue_from_url(url: str) -> Dict[str, Any]:
    """Fetch an issue from a Gitea API URL and parse it."""
    import urllib.request
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        data = json.loads(resp.read())
    return parse_issue_body(
        body=data.get("body", ""),
        title=data.get("title", ""),
        labels=[label["name"] for label in data.get("labels", [])]
    )
 def main():
    parser = argparse.ArgumentParser(description="Parse Gitea issue markdown into structured JSON")
    parser.add_argument("file", nargs="?", help="Issue body file (or use --stdin)")
    parser.add_argument("--stdin", action="store_true", help="Read from stdin")
    parser.add_argument("--url", help="Gitea API URL to fetch issue from")
    parser.add_argument("--title", default="", help="Issue title")
    parser.add_argument("--labels", nargs="*", default=[], help="Issue labels")
    parser.add_argument("--pretty", action="store_true", help="Pretty-print JSON output")
    args = parser.parse_args()
    if args.url:
        result = fetch_issue_from_url(args.url)
    elif args.stdin:
        body = sys.stdin.read()
        result = parse_issue_body(body, args.title, args.labels)
    elif args.file:
        with open(args.file) as f:
            body = f.read()
        result = parse_issue_body(body, args.title, args.labels)
    else:
        parser.print_help()
        sys.exit(1)
    indent = 2 if args.pretty else None
    print(json.dumps(result, indent=indent))
 if __name__ == "__main__":
    main()
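To see what the section regex captures, the pattern can be run in isolation. This sketch copies `section_pattern` from the parser above; the `split_sections` helper and the sample body are hypothetical and exist only for illustration.

```python
import re
from typing import Dict

# Copied from parse_issue_body: a "## " heading line, then every following
# line up to (but not including) the next "## " heading.
SECTION_PATTERN = r"^##\s+(.+?)$\n((?:^(?!##\s).*$\n?)*)"


def split_sections(body: str) -> Dict[str, str]:
    """Return {normalized_heading: content} for each ## section in body."""
    sections = {}
    for match in re.finditer(SECTION_PATTERN, body, re.MULTILINE):
        name = match.group(1).strip().lower().replace(" ", "_")
        sections[name] = match.group(2).strip()
    return sections


sample = (
    "Intro text.\n"
    "\n"
    "## Context\n"
    "Background.\n"
    "\n"
    "## Acceptance Criteria\n"
    "- [ ] One\n"
    "### Notes\n"
    "Still inside the criteria section.\n"
)
sections = split_sections(sample)
print(sorted(sections))
```

In this sample, the `###` subheading stays inside its parent section, and the text before the first heading is not captured as a section.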
--- a/scripts/test_gitea_issue_parser.py
+++ b/scripts/test_gitea_issue_parser.py
@@ -0,0 +1,109 @@
 #!/usr/bin/env python3
 """Tests for scripts/gitea_issue_parser.py"""
 import sys
 import os
 sys.path.insert(0, os.path.dirname(__file__) or ".")
 # Import from sibling
 import importlib.util
 spec = importlib.util.spec_from_file_location("parser", os.path.join(os.path.dirname(__file__) or ".", "gitea_issue_parser.py"))
 mod = importlib.util.module_from_spec(spec)
 spec.loader.exec_module(mod)
 parse_issue_body = mod.parse_issue_body
 def test_basic_parsing():
    body = """## Context
 This is the background info.
 ## Acceptance Criteria
 - [ ] First criterion
 - [x] Second criterion (done)
 ## What to build
 Some description.
 """
    result = parse_issue_body(body, title="Test (#42)", labels=["bug"])
    assert result["title"] == "Test (#42)"
    assert result["labels"] == ["bug"]
    assert result["epic_ref"] == 42
    assert len(result["criteria"]) == 2
    assert result["criteria"][0]["text"] == "First criterion"
    assert result["criteria"][0]["checked"] is False
    assert result["criteria"][1]["checked"] is True
    assert "context" in result["sections"]
    print("PASS: test_basic_parsing")
 def test_numbered_criteria():
    body = """## Acceptance Criteria
 1. First item
 2. Second item
 3. Third item
 """
    result = parse_issue_body(body)
    assert len(result["criteria"]) == 3
    assert result["criteria"][0]["text"] == "First item"
    print("PASS: test_numbered_criteria")
 def test_epic_ref_from_body():
    body = "Closes #123\n\nSome description."
    result = parse_issue_body(body)
    assert result["epic_ref"] == 123
    print("PASS: test_epic_ref_from_body")
 def test_empty_body():
    result = parse_issue_body("")
    assert result["criteria"] == []
    assert result["context"] == ""
    assert result["sections"] == {}
    print("PASS: test_empty_body")
 def test_no_sections():
    body = "Just a plain issue body with no headings."
    result = parse_issue_body(body)
    assert result["context"] == "Just a plain issue body with no headings."
    print("PASS: test_no_sections")
 def test_multiple_sections():
    body = """## Problem
 Something is broken.
 ## Fix
 Do this instead.
 ## Notes
 Additional info.
 """
    result = parse_issue_body(body)
    assert "problem" in result["sections"]
    assert "fix" in result["sections"]
    assert "notes" in result["sections"]
    assert "Something is broken" in result["sections"]["problem"]
    print("PASS: test_multiple_sections")
 def run_all():
    test_basic_parsing()
    test_numbered_criteria()
    test_epic_ref_from_body()
    test_empty_body()
    test_no_sections()
    test_multiple_sections()
    print("\nAll 6 tests passed!")
 if __name__ == "__main__":
    run_all()
Author	SHA1	Message	Date
Alexander Whitestone	54f3bef7fc	feat: Add parser tests (closes #177)	2026-04-15 03:50:04 +00:00
Alexander Whitestone	4fcd372de4	feat: Add Gitea issue body parser (closes #177)	2026-04-15 03:49:00 +00:00