feat: add harvester.py — session knowledge extractor (#8)

feat: add session_reader.py for JSONL parsing (#8)
2026-04-14 17:27:20 +00:00 · 2026-04-14 17:26:21 +00:00
55 changed files with 134 additions and 10866 deletions
--- a/.gitea/workflows/test.yml
+++ b/.gitea/workflows/test.yml
@@ -1,22 +0,0 @@
-name: Test
-
-on:
-  pull_request:
-  push:
-    branches: [main]
-
-jobs:
-  pytest:
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v4
-      - uses: actions/setup-python@v5
-        with:
-          python-version: '3.11'
-      - name: Install test dependencies
-        run: |
-          python -m pip install --upgrade pip
-          pip install -r requirements.txt
-      - name: Run test suite
-        run: |
-          make test
--- a/.gitignore
+++ b/.gitignore
@@ -1,2 +0,0 @@
-__pycache__/
-*.pyc
--- a/GENOME.md
+++ b/GENOME.md
@@ -1,251 +0,0 @@
-# GENOME.md — compounding-intelligence
-
-**Generated:** 2026-04-17
-**Repo:** Timmy_Foundation/compounding-intelligence
-**Description:** Turn 1B+ daily agent tokens into durable, compounding fleet intelligence.
-
----
-
-## Project Overview
-
-Every agent session starts at zero. The same HTTP 405 gets rediscovered as a branch protection issue. The same token path gets searched from scratch. Intelligence evaporates when the session ends.
-
-Compounding-intelligence solves this with three pipelines forming a loop:
-
-```
-SESSION ENDS → HARVESTER → KNOWLEDGE STORE → BOOTSTRAPPER → NEW SESSION STARTS SMARTER
-                              ↓
-                         MEASURER → Prove it's working
-```
-
-**Status:** Active development. Core pipelines implemented. 20+ scripts, 14 test files, knowledge store populated with real data.
-
-## Architecture
-
-```mermaid
-graph TD
-    TRANS[Session Transcripts<br/>~/.hermes/sessions/*.jsonl] --> READER[session_reader.py]
-    READER --> HARVESTER[harvester.py]
-    HARVESTER -->|LLM extraction| PROMPT[harvest-prompt.md]
-    HARVESTER --> DEDUP[deduplicate()]
-    DEDUP --> INDEX[knowledge/index.json]
-    DEDUP --> GLOBAL[knowledge/global/*.yaml]
-    DEDUP --> REPO[knowledge/repos/*.yaml]
-
-    INDEX --> BOOTSTRAPPER[bootstrapper.py]
-    BOOTSTRAPPER -->|filter + rank + truncate| CONTEXT[Bootstrap Context<br/>2k token injection]
-    CONTEXT --> SESSION[New Session starts smarter]
-
-    INDEX --> VALIDATOR[validate_knowledge.py]
-    INDEX --> STALENESS[knowledge_staleness_check.py]
-    INDEX --> GAPS[knowledge_gap_identifier.py]
-
-    TRANS --> SAMPLER[sampler.py]
-    SAMPLER -->|score + rank| BEST[High-value sessions]
-    BEST --> HARVESTER
-
-    TRANS --> METADATA[session_metadata.py]
-    METADATA --> SUMMARY[SessionSummary objects]
-
-    KNOWLEDGE --> DIFF[diff_analyzer.py]
-    DIFF --> PROPOSALS[improvement_proposals.py]
-    PROPOSALS --> PRIORITIES[priority_rebalancer.py]
-```
-
-## Entry Points
-
-### Core Pipelines
-
-| Script | Purpose | Key Functions |
-|--------|---------|---------------|
-| `harvester.py` | Extract knowledge from session transcripts | `harvest_session()`, `call_llm()`, `deduplicate()`, `validate_fact()` |
-| `bootstrapper.py` | Build pre-session context from knowledge store | `build_bootstrap_context()`, `filter_facts()`, `sort_facts()`, `truncate_to_tokens()` |
-| `session_reader.py` | Parse JSONL session transcripts | `read_session()`, `extract_conversation()`, `messages_to_text()` |
-| `sampler.py` | Score and rank sessions for harvesting value | `scan_session_fast()`, `score_session()` |
-| `session_metadata.py` | Extract structured metadata from sessions | `extract_session_metadata()`, `SessionSummary` |
-
-### Analysis & Quality
-
-| Script | Purpose |
-|--------|---------|
-| `validate_knowledge.py` | Validate knowledge index schema compliance |
-| `knowledge_staleness_check.py` | Detect stale knowledge (source changed since extraction) |
-| `knowledge_gap_identifier.py` | Find untested functions, undocumented APIs, missing tests |
-| `diff_analyzer.py` | Analyze code diffs for improvement signals |
-| `improvement_proposals.py` | Generate ranked improvement proposals |
-| `priority_rebalancer.py` | Rebalance priorities across proposals |
-| `automation_opportunity_finder.py` | Find manual steps that can be automated |
-| `dead_code_detector.py` | Detect unused code |
-| `dependency_graph.py` | Map dependency relationships |
-| `perf_bottleneck_finder.py` | Find performance bottlenecks |
-| `refactoring_opportunity_finder.py` | Identify refactoring targets |
-| `gitea_issue_parser.py` | Parse Gitea issues for knowledge extraction |
-
-### Automation
-
-| Script | Purpose |
-|--------|---------|
-| `session_pair_harvester.py` | Extract training pairs from sessions |
-
-## Data Flow
-
-```
-1. Session ends → .jsonl written to ~/.hermes/sessions/
-2. sampler.py scores sessions by age, recency, repo coverage
-3. harvester.py reads top sessions, calls LLM with harvest-prompt.md
-4. LLM extracts facts/pitfalls/patterns/quirks/questions
-5. deduplicate() checks against existing index via fact_fingerprint()
-6. validate_fact() checks schema compliance
-7. write_knowledge() appends to knowledge/index.json + per-repo YAML
-8. On next session start, bootstrapper.py:
-   a. Loads knowledge/index.json
-   b. Filters by session's repo and agent type
-   c. Sorts by confidence (high first), then recency
-   d. Truncates to 2k token budget
-   e. Injects as pre-context
-9. Agent starts with full situational awareness instead of zero
-```
-
-## Key Abstractions
-
-### Knowledge Item (fact/pitfall/pattern/quirk/question)
-```json
-{
-  "fact": "Gitea token is at ~/.config/gitea/token",
-  "category": "tool-quirk",
-  "repo": "global",
-  "confidence": 0.9,
-  "evidence": "Found during clone attempt",
-  "source_session": "2026-04-13_abc123",
-  "extracted_at": "2026-04-13T20:00:00Z"
-}
-```
-
-### SessionSummary (session_metadata.py)
-Extracted metadata per session: duration, token count, tools used, repos touched, error count, outcome.
-
-### Gap / GapReport (knowledge_gap_identifier.py)
-Structured gap analysis: untested functions, undocumented APIs, missing tests. Severity: critical/high/medium/low.
-
-### Knowledge Index (knowledge/index.json)
-Machine-readable fact store. 12KB, populated with real data. Categories: fact, pitfall, pattern, tool-quirk, question.
-
-## Knowledge Store
-
-```
-knowledge/
-├── index.json              # Master fact store (12KB, populated)
-├── SCHEMA.md               # Schema documentation
-├── global/
-│   ├── pitfalls.yaml       # Cross-repo pitfalls (2KB)
-│   └── tool-quirks.yaml    # Tool-specific quirks (2KB)
-├── repos/
-│   ├── hermes-agent.yaml   # hermes-agent knowledge (2KB)
-│   └── the-nexus.yaml      # the-nexus knowledge (2KB)
-└── agents/                 # Per-agent knowledge (empty)
-```
-
-## API Surface
-
-### LLM API (consumed)
-| Provider | Endpoint | Usage |
-|----------|----------|-------|
-| Nous Research | `https://inference-api.nousresearch.com/v1` | Knowledge extraction |
-| Ollama | `http://localhost:11434/v1` | Local fallback |
-
-### File API (consumed/produced)
-| Path | Format | Direction |
-|------|--------|-----------|
-| `~/.hermes/sessions/*.jsonl` | JSONL | Input (session transcripts) |
-| `knowledge/index.json` | JSON | Output (master fact store) |
-| `knowledge/global/*.yaml` | YAML | Output (cross-repo knowledge) |
-| `knowledge/repos/*.yaml` | YAML | Output (per-repo knowledge) |
-| `templates/harvest-prompt.md` | Markdown | Config (extraction prompt) |
-
-## Test Coverage
-
-**14 test files** covering core pipelines:
-
-| Test File | Covers |
-|-----------|--------|
-| `test_harvest_prompt.py` | Prompt validation, hallucination detection |
-| `test_harvest_prompt_comprehensive.py` | Extended prompt testing |
-| `test_harvester_pipeline.py` | Harvester extraction + dedup |
-| `test_bootstrapper.py` | Context building, filtering, truncation |
-| `test_session_pair_harvester.py` | Training pair extraction |
-| `test_improvement_proposals.py` | Proposal generation |
-| `test_priority_rebalancer.py` | Priority scoring |
-| `test_knowledge_staleness.py` | Staleness detection |
-| `test_automation_opportunity_finder.py` | Automation detection |
-| `test_diff_analyzer.py` | Diff analysis |
-| `test_gitea_issue_parser.py` | Issue parsing |
-| `test_refactoring_opportunity_finder.py` | Refactoring signals |
-| `test_knowledge_gap_identifier.py` | Gap analysis |
-| `test_perf_bottleneck_finder.py` | Perf bottleneck detection |
-
-### Coverage Gaps
-
-1. **session_reader.py** — No dedicated test file (tested indirectly)
-2. **sampler.py** — No test file (scoring logic untested)
-3. **session_metadata.py** — No test file
-4. **validate_knowledge.py** — No test file
-5. **knowledge_staleness_check.py** — Tested but limited
-
-## Security Considerations
-
-### API Key Handling
-- `harvester.py` reads API key from `~/.hermes/auth.json` or env vars
-- Key passed to LLM API in request headers only
-- No key logging
-
-### Knowledge Integrity
-- `validate_fact()` checks schema before writing
-- `deduplicate()` prevents duplicate entries via fingerprint
-- `knowledge_staleness_check.py` detects when source code changed but knowledge didn't
-- Confidence scores prevent low-quality knowledge from polluting the store
-
-### File Safety
-- Knowledge writes are append-only (never deletes)
-- Bootstrap context is truncated to budget (no prompt injection via knowledge)
-- Session reader handles malformed JSONL gracefully
-
-## File Index
-
-```
-scripts/
-  harvester.py                          (473 lines) — Core knowledge extraction
-  bootstrapper.py                       (302 lines) — Pre-session context builder
-  session_reader.py                     (137 lines) — JSONL session parser
-  sampler.py                            (363 lines) — Session scoring + ranking
-  session_metadata.py                   (271 lines) — Session metadata extraction
-  validate_knowledge.py                  (44 lines) — Index validation
-  knowledge_staleness_check.py          (125 lines) — Staleness detection
-  knowledge_gap_identifier.py           (291 lines) — Gap analysis engine
-  diff_analyzer.py                      (203 lines) — Diff analysis
-  improvement_proposals.py              (518 lines) — Proposal generation
-  priority_rebalancer.py                (745 lines) — Priority scoring
-  automation_opportunity_finder.py      (600 lines) — Automation detection
-  dead_code_detector.py                 (270 lines) — Dead code detection
-  dependency_graph.py                   (220 lines) — Dependency mapping
-  perf_bottleneck_finder.py             (635 lines) — Perf analysis
-  refactoring_opportunity_finder.py      (46 lines) — Refactoring signals
-  gitea_issue_parser.py                 (140 lines) — Gitea issue parsing
-  session_pair_harvester.py             (224 lines) — Training pair extraction
-knowledge/
-  index.json                             (12KB)    — Master fact store
-  SCHEMA.md                              (3KB)     — Schema docs
-  global/pitfalls.yaml                   (2KB)     — Cross-repo pitfalls
-  global/tool-quirks.yaml                (2KB)     — Tool quirks
-  repos/hermes-agent.yaml                (2KB)     — Repo-specific knowledge
-  repos/the-nexus.yaml                   (2KB)     — Repo-specific knowledge
-templates/
-  harvest-prompt.md                      (4KB)     — Extraction prompt
-test_sessions/                            (5 files) — Sample transcripts
-tests/ + scripts/test_*                   (14 files)— Test suite
-```
-
-**Total:** ~6,500 lines of code across 18 scripts + 14 test files.
-
----
-
-*Generated by Codebase Genome pipeline — Issue #676*
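The bootstrap step in the removed GENOME.md (load the index, filter by repo, sort by confidence then recency, truncate to a 2k token budget) can be sketched roughly as follows. This is a minimal illustration, not the code in `bootstrapper.py`: the function names mirror those listed in the Entry Points table, but the bodies, the `build_context` helper, and the 4-characters-per-token estimate are assumptions made for the example.

```python
from typing import Dict, List

Fact = Dict[str, object]


def filter_facts(facts: List[Fact], repo: str) -> List[Fact]:
    """Keep facts for the given repo plus cross-repo ("global") facts."""
    return [f for f in facts if f.get("domain") in (repo, "global")]


def sort_facts(facts: List[Fact]) -> List[Fact]:
    """Order by confidence (high first), then by most recent confirmation.

    ISO-8601 date strings compare correctly as plain strings.
    """
    return sorted(
        facts,
        key=lambda f: (float(f.get("confidence", 0.0)), str(f.get("last_confirmed", ""))),
        reverse=True,
    )


def truncate_to_tokens(facts: List[Fact], budget: int = 2000) -> List[Fact]:
    """Take facts in order until the estimated token budget is spent.

    Assumption: roughly 4 characters per token; a real tokenizer will differ.
    """
    kept: List[Fact] = []
    used = 0
    for f in facts:
        cost = max(1, len(str(f.get("fact", ""))) // 4)
        if used + cost > budget:
            break
        kept.append(f)
        used += cost
    return kept


def build_context(facts: List[Fact], repo: str, budget: int = 2000) -> str:
    """Hypothetical helper: render the selected facts as a bullet list."""
    chosen = truncate_to_tokens(sort_facts(filter_facts(facts, repo)), budget)
    return "\n".join(f"- [{f['category']}] {f['fact']}" for f in chosen)
```

Sorting before truncating means that a tight budget drops the least certain and least recently confirmed facts first.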
--- a/4
+++ b/4
@@ -1,4 +0,0 @@
-.PHONY: test
-
-test:
-	python3 -m pytest tests/test_ci_config.py scripts/test_*.py -v
--- a/knowledge/SCHEMA.md
+++ b/knowledge/SCHEMA.md
@@ -1,114 +0,0 @@
-# Knowledge File Format Specification
-
-**Version:** 1
-**Issue:** #10
-**Status:** Draft
-
----
-
-## Overview
-
-The knowledge system has two layers:
-
-1. **index.json** — Machine-readable fact index. Fast lookups by ID, category, repo, tags.
-2. **Knowledge files** (YAML) — Human-readable, editable facts organized by domain.
-
-The harvester writes to both. The bootstrapper reads from index.json. Humans edit the YAML files directly.
-
----
-
-## index.json Schema
-
-```json
-{
-  "version": 1,
-  "last_updated": "ISO-8601 timestamp",
-  "total_facts": 0,
-  "facts": []
-}
-```
-
-### Fact Object
-
-| Field | Type | Required | Description |
-|-------|------|----------|-------------|
-| `id` | string | yes | Unique identifier: `{domain}:{category}:{sequence}` |
-| `fact` | string | yes | One-sentence description of the knowledge |
-| `category` | enum | yes | One of: `fact`, `pitfall`, `pattern`, `tool-quirk`, `question` |
-| `domain` | string | yes | Where this applies: repo name, `global`, or agent name |
-| `confidence` | float | yes | 0.0–1.0. How certain is this knowledge? |
-| `tags` | string[] | no | Searchable labels |
-| `source_count` | int | no | How many sessions confirmed this fact |
-| `first_seen` | date | no | ISO-8601 date first extracted |
-| `last_confirmed` | date | no | ISO-8601 date last seen in a session |
-| `expires` | date | no | Optional. After this date, fact is stale |
-| `related` | string[] | no | IDs of related facts |
-
-### ID Format: `{domain}:{category}:{sequence}`
-
-### Categories
-
-| Category | Definition |
-|----------|------------|
-| `fact` | Concrete, verifiable information |
-| `pitfall` | Errors, wrong assumptions, time-wasters |
-| `pattern` | Successful sequences of actions |
-| `tool-quirk` | Environment-specific behaviors |
-| `question` | Identified but unanswered |
-
-### Confidence Scoring
-
-| Range | Meaning |
-|-------|---------|
-| 0.9–1.0 | Explicitly stated and verified |
-| 0.7–0.8 | Clearly implied by multiple data points |
-| 0.5–0.6 | Suggested but not fully verified |
-| 0.3–0.4 | Inferred from limited data |
-| 0.1–0.2 | Speculative or uncertain |
-
----
-
-## Directory Structure
-
-```
-knowledge/
-├── index.json                  # Machine-readable fact index
-├── SCHEMA.md                   # This file
-├── global/                     # Cross-repo knowledge
-│   ├── pitfalls.yaml
-│   ├── patterns.yaml
-│   └── tool-quirks.yaml
-├── repos/                      # Per-repo knowledge
-│   ├── {repo-name}.yaml
-│   └── ...
-└── agents/                     # Agent-type knowledge
-    └── {agent-type}.yaml
-```
-
-## YAML File Format
-
-YAML files use frontmatter for metadata, then markdown sections with fact entries:
-
-```yaml
----
-domain: global
-category: tool-quirk
-version: 1
-last_updated: "2026-04-13"
----
-
-# Title
-
-## Section
-
-- id: global:tool-quirk:001
-  fact: "Description"
-  confidence: 0.95
-  tags: [tag1, tag2]
-  source_count: 5
-  first_seen: "2026-03-27"
-```
-
-## Validation
-
-Run `python scripts/validate_knowledge.py` to validate index.json.
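The Fact Object table in the removed SCHEMA.md translates fairly directly into a validation routine. The sketch below is one possible reading of that table, not the contents of `scripts/validate_knowledge.py`; `check_fact` is a hypothetical name, and the rule that the ID's domain and category segments must match the corresponding fields is an assumption drawn from the `{domain}:{category}:{sequence}` format.

```python
from typing import Dict, List

REQUIRED = ("id", "fact", "category", "domain", "confidence")
CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}


def check_fact(fact: Dict[str, object]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the fact passes."""
    problems = [f"missing required field: {name}" for name in REQUIRED if name not in fact]
    if problems:
        return problems

    if fact["category"] not in CATEGORIES:
        problems.append(f"unknown category: {fact['category']}")

    conf = fact["confidence"]
    is_number = isinstance(conf, (int, float)) and not isinstance(conf, bool)
    if not is_number or not 0.0 <= conf <= 1.0:
        problems.append("confidence must be a number in [0.0, 1.0]")

    # Assumed rule: the id segments must agree with the domain and category fields.
    parts = str(fact["id"]).split(":")
    if len(parts) != 3 or parts[0] != fact["domain"] or parts[1] != fact["category"] or not parts[2]:
        problems.append("id does not match {domain}:{category}:{sequence}")
    return problems
```

Returning a list of problems instead of raising on the first one lets a caller report every issue in an index file in a single pass.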
--- a/knowledge/global/pitfalls.yaml
+++ b/knowledge/global/pitfalls.yaml
@@ -1,80 +0,0 @@
----
-domain: global
-category: pitfall
-version: 1
-last_updated: "2026-04-13"
----
-
-# Pitfalls (Global)
-
-Cross-repo traps that waste time across the fleet.
-
-## Git & Forge
-
-- id: global:pitfall:001
-  fact: "Branch protection requires 1 approval on main - API merges fail with 405 without it"
-  confidence: 0.95
-  tags: [git, merge, branch-protection, gitea]
-  source_count: 12
-  first_seen: "2026-04-05"
-  last_confirmed: "2026-04-13"
-  related: [the-nexus:pitfall:001]
-
-- id: global:pitfall:002
-  fact: "Never use --no-verify on git commits - it bypasses all hooks including safety checks"
-  confidence: 0.95
-  tags: [git, hooks, safety]
-  source_count: 5
-  first_seen: "2026-03-28"
-  last_confirmed: "2026-04-13"
-
-- id: global:pitfall:003
-  fact: "Gitea PR creation workaround needed on the-nexus - direct API call fails, use alternative endpoint"
-  confidence: 0.9
-  tags: [gitea, pr, api, workaround]
-  source_count: 4
-  first_seen: "2026-04-06"
-  last_confirmed: "2026-04-12"
-
-## Agent Operations
-
-- id: global:pitfall:004
-  fact: "Anthropic is BANNED from fallback chain - if fallback triggers to Anthropic, something is wrong"
-  confidence: 0.95
-  tags: [provider, anthropic, fallback]
-  source_count: 7
-  first_seen: "2026-03-30"
-  last_confirmed: "2026-04-13"
-
-- id: global:pitfall:005
-  fact: "Telegram tokens expired - don't assume Telegram notifications work without checking"
-  confidence: 0.85
-  tags: [telegram, notifications, token]
-  source_count: 3
-  first_seen: "2026-04-02"
-
-- id: global:pitfall:006
-  fact: "Multiple gateways = 'cannot schedule futures' error - only one gateway process should run"
-  confidence: 0.9
-  tags: [gateway, cron, process]
-  source_count: 4
-  first_seen: "2026-04-04"
-  last_confirmed: "2026-04-11"
-
-## Testing
-
-- id: global:pitfall:007
-  fact: "pytest root collection picks up operational *_test.py scripts - restrict to tests/ directory"
-  confidence: 0.9
-  tags: [pytest, test, collection]
-  source_count: 3
-  first_seen: "2026-04-07"
-  last_confirmed: "2026-04-13"
-
-- id: global:pitfall:008
-  fact: "TDD: test 1 before building 55 - verify the cycle works before scaling"
-  confidence: 0.95
-  tags: [tdd, testing, methodology]
-  source_count: 8
-  first_seen: "2026-03-25"
-  last_confirmed: "2026-04-13"
--- a/knowledge/global/tool-quirks.yaml
+++ b/knowledge/global/tool-quirks.yaml
@@ -1,71 +0,0 @@
----
-domain: global
-category: tool-quirk
-version: 1
-last_updated: "2026-04-13"
----
-
-# Tool Quirks (Global)
-
-## Authentication
-
-- id: global:tool-quirk:001
-  fact: "Gitea token stored at ~/.config/gitea/token, not env var GITEA_TOKEN"
-  confidence: 0.95
-  tags: [git, auth, gitea, token]
-  source_count: 23
-  first_seen: "2026-03-27"
-  last_confirmed: "2026-04-13"
-  related: [global:pitfall:001]
-
-- id: global:tool-quirk:002
-  fact: "Gitea API uses 'Authorization: token TOKEN' header format, not Bearer"
-  confidence: 0.9
-  tags: [git, api, gitea]
-  source_count: 8
-  first_seen: "2026-03-28"
-  last_confirmed: "2026-04-12"
-
-- id: global:tool-quirk:003
-  fact: "Gitea Issues API type=issues param does NOT filter PRs - use truthiness check on pull_request field"
-  confidence: 0.95
-  tags: [gitea, api, issues, pr]
-  source_count: 6
-  first_seen: "2026-04-01"
-  last_confirmed: "2026-04-13"
-
-## Paths & Environment
-
-- id: global:tool-quirk:004
-  fact: "~/.hermes is the default hermes home - check get_hermes_home() not the path literal"
-  confidence: 0.9
-  tags: [paths, hermes, env]
-  source_count: 10
-  first_seen: "2026-03-30"
-  last_confirmed: "2026-04-13"
-  related: [hermes-agent:pitfall:005]
-
-- id: global:tool-quirk:005
-  fact: "Ansible vault-encrypted vars in YAML require vault_inline_vars plugin"
-  confidence: 0.85
-  tags: [ansible, vault, config]
-  source_count: 3
-  first_seen: "2026-04-02"
-
-## Model & Inference
-
-- id: global:tool-quirk:006
-  fact: "mimo-v2-pro via Nous Research is the default model - don't assume Anthropic is available"
-  confidence: 0.95
-  tags: [model, provider, nous, default]
-  source_count: 15
-  first_seen: "2026-03-25"
-  last_confirmed: "2026-04-13"
-
-- id: global:tool-quirk:007
-  fact: "Kill + restart with 'hermes chat' preserves old model state - NEVER use --resume"
-  confidence: 0.95
-  tags: [hermes, model, restart, session]
-  source_count: 8
-  first_seen: "2026-03-29"
-  last_confirmed: "2026-04-12"
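The three Gitea quirks recorded in the removed tool-quirks.yaml (token file location, `token` header format, and filtering pull requests out of issue listings) could be applied in client code along these lines. This is an illustrative sketch using only the standard library; the helper names are hypothetical and no HTTP request is made.

```python
from pathlib import Path
from typing import Dict, List


def read_token(path: Path = Path.home() / ".config" / "gitea" / "token") -> str:
    """Read the token from the file location named in global:tool-quirk:001."""
    return path.read_text(encoding="utf-8").strip()


def gitea_auth_header(token: str) -> Dict[str, str]:
    """Build the 'Authorization: token TOKEN' header from global:tool-quirk:002."""
    return {"Authorization": f"token {token.strip()}"}


def only_issues(items: List[dict]) -> List[dict]:
    """Drop pull requests by truthiness of 'pull_request' (global:tool-quirk:003)."""
    return [item for item in items if not item.get("pull_request")]
```

Using `item.get("pull_request")` treats a missing key, `None`, and an empty object the same way, which matches the truthiness check the quirk recommends.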
--- a/knowledge/index.json
+++ b/knowledge/index.json
@@ -1,472 +1,6 @@
 {
  "version": 1,
  "last_updated": "2026-04-13T20:00:00Z",
-  "total_facts": 29,
-  "facts": [
-    {
-      "id": "hermes-agent:pitfall:001",
-      "fact": "deploy-crons.py leaves jobs in mixed model format",
-      "category": "pitfall",
-      "domain": "hermes-agent",
-      "confidence": 0.95,
-      "tags": [
-        "cron",
-        "deploy",
-        "model",
-        "config"
-      ],
-      "source_count": 5,
-      "first_seen": "2026-04-08",
-      "last_confirmed": "2026-04-13",
-      "related": [
-        "hermes-agent:pitfall:002",
-        "hermes-agent:pitfall:003"
-      ]
-    },
-    {
-      "id": "hermes-agent:pitfall:002",
-      "fact": "deploy-crons.py --deploy doesn't set legacy skill field from skills list",
-      "category": "pitfall",
-      "domain": "hermes-agent",
-      "confidence": 0.9,
-      "tags": [
-        "cron",
-        "deploy",
-        "skills"
-      ],
-      "source_count": 3,
-      "first_seen": "2026-04-09",
-      "last_confirmed": "2026-04-13",
-      "related": [
-        "hermes-agent:pitfall:001"
-      ]
-    },
-    {
-      "id": "hermes-agent:pitfall:003",
-      "fact": "Cron jobs with blank fallback_model fields trigger spurious gateway warnings",
-      "category": "pitfall",
-      "domain": "hermes-agent",
-      "confidence": 0.9,
-      "tags": [
-        "cron",
-        "model",
-        "fallback"
-      ],
-      "source_count": 4,
-      "first_seen": "2026-04-07",
-      "last_confirmed": "2026-04-12",
-      "related": [
-        "hermes-agent:pitfall:001"
-      ]
-    },
-    {
-      "id": "hermes-agent:pitfall:004",
-      "fact": "model-watchdog.py checks first provider line, not model.provider - causes false drift alarms",
-      "category": "pitfall",
-      "domain": "hermes-agent",
-      "confidence": 0.9,
-      "tags": [
-        "watchdog",
-        "model",
-        "config"
-      ],
-      "source_count": 3,
-      "first_seen": "2026-04-08",
-      "last_confirmed": "2026-04-13"
-    },
-    {
-      "id": "hermes-agent:pitfall:005",
-      "fact": "10+ files read HERMES_HOME directly instead of get_hermes_home()",
-      "category": "pitfall",
-      "domain": "hermes-agent",
-      "confidence": 0.85,
-      "tags": [
-        "paths",
-        "env",
-        "hermes-home"
-      ],
-      "source_count": 6,
-      "first_seen": "2026-04-06",
-      "last_confirmed": "2026-04-12",
-      "related": [
-        "global:pitfall:002"
-      ]
-    },
-    {
-      "id": "hermes-agent:pitfall:006",
-      "fact": "get_hermes_home() doesn't expand tilde when HERMES_HOME=~/... is set",
-      "category": "pitfall",
-      "domain": "hermes-agent",
-      "confidence": 0.8,
-      "tags": [
-        "paths",
-        "env",
-        "bug"
-      ],
-      "source_count": 2,
-      "first_seen": "2026-04-05"
-    },
-    {
-      "id": "hermes-agent:pitfall:007",
-      "fact": "vps-agent-dispatch reports OK while remote hermes binary path is broken",
-      "category": "pitfall",
-      "domain": "hermes-agent",
-      "confidence": 0.9,
-      "tags": [
-        "ssh",
-        "dispatch",
-        "vps"
-      ],
-      "source_count": 4,
-      "first_seen": "2026-04-07",
-      "last_confirmed": "2026-04-11"
-    },
-    {
-      "id": "hermes-agent:pitfall:008",
-      "fact": "nightwatch-health-monitor SSH check fails on cloud-model-only deployments",
-      "category": "pitfall",
-      "domain": "hermes-agent",
-      "confidence": 0.85,
-      "tags": [
-        "ssh",
-        "health",
-        "cloud"
-      ],
-      "source_count": 2,
-      "first_seen": "2026-04-10"
-    },
-    {
-      "id": "the-nexus:pitfall:001",
-      "fact": "Merges fail with HTTP 405 due to branch protection",
-      "category": "pitfall",
-      "domain": "the-nexus",
-      "confidence": 0.95,
-      "tags": [
-        "git",
-        "merge",
-        "branch-protection",
-        "gitea"
-      ],
-      "source_count": 12,
-      "first_seen": "2026-04-05",
-      "last_confirmed": "2026-04-13",
-      "related": [
-        "global:pitfall:001"
-      ]
-    },
-    {
-      "id": "the-nexus:pitfall:002",
-      "fact": "ThreadingHTTPServer required for multi-user bridge - standard HTTPServer blocks on concurrent requests",
-      "category": "pitfall",
-      "domain": "the-nexus",
-      "confidence": 0.95,
-      "tags": [
-        "server",
-        "concurrency",
-        "bridge"
-      ],
-      "source_count": 5,
-      "first_seen": "2026-04-10",
-      "last_confirmed": "2026-04-13"
-    },
-    {
-      "id": "the-nexus:pitfall:003",
-      "fact": "ChatLog.log() crashes on message persistence when index.html has orphaned button tags",
-      "category": "pitfall",
-      "domain": "the-nexus",
-      "confidence": 0.9,
-      "tags": [
-        "html",
-        "crash",
-        "chatlog"
-      ],
-      "source_count": 3,
-      "first_seen": "2026-04-12",
-      "last_confirmed": "2026-04-13"
-    },
-    {
-      "id": "the-nexus:pitfall:004",
-      "fact": "Three.js LOD not implemented - local hardware struggles with full scene",
-      "category": "pitfall",
-      "domain": "the-nexus",
-      "confidence": 0.85,
-      "tags": [
-        "threejs",
-        "performance",
-        "lod"
-      ],
-      "source_count": 4,
-      "first_seen": "2026-04-09",
-      "last_confirmed": "2026-04-13"
-    },
-    {
-      "id": "the-nexus:pitfall:005",
-      "fact": "Duplicate content blocks appear in index.html when PR merges conflict silently",
-      "category": "pitfall",
-      "domain": "the-nexus",
-      "confidence": 0.8,
-      "tags": [
-        "html",
-        "merge-conflict",
-        "duplicate"
-      ],
-      "source_count": 3,
-      "first_seen": "2026-04-11",
-      "last_confirmed": "2026-04-13"
-    },
-    {
-      "id": "the-nexus:pitfall:006",
-      "fact": "Unified HTTP + WebSocket server required for proper URL deployment - separate servers break CORS",
-      "category": "pitfall",
-      "domain": "the-nexus",
-      "confidence": 0.9,
-      "tags": [
-        "deploy",
-        "websocket",
-        "http",
-        "cors"
-      ],
-      "source_count": 4,
-      "first_seen": "2026-04-10",
-      "last_confirmed": "2026-04-13"
-    },
-    {
-      "id": "global:tool-quirk:001",
-      "fact": "Gitea token stored at ~/.config/gitea/token, not env var GITEA_TOKEN",
-      "category": "tool-quirk",
-      "domain": "global",
-      "confidence": 0.95,
-      "tags": [
-        "git",
-        "auth",
-        "gitea",
-        "token"
-      ],
-      "source_count": 23,
-      "first_seen": "2026-03-27",
-      "last_confirmed": "2026-04-13",
-      "related": [
-        "global:pitfall:001"
-      ]
-    },
-    {
-      "id": "global:tool-quirk:002",
-      "fact": "Gitea API uses 'Authorization: token TOKEN' header format, not Bearer",
-      "category": "tool-quirk",
-      "domain": "global",
-      "confidence": 0.9,
-      "tags": [
-        "git",
-        "api",
-        "gitea"
-      ],
-      "source_count": 8,
-      "first_seen": "2026-03-28",
-      "last_confirmed": "2026-04-12"
-    },
-    {
-      "id": "global:tool-quirk:003",
-      "fact": "Gitea Issues API type=issues param does NOT filter PRs",
-      "category": "tool-quirk",
-      "domain": "global",
-      "confidence": 0.95,
-      "tags": [
-        "gitea",
-        "api",
-        "issues",
-        "pr"
-      ],
-      "source_count": 6,
-      "first_seen": "2026-04-01",
-      "last_confirmed": "2026-04-13"
-    },
-    {
-      "id": "global:tool-quirk:004",
-      "fact": "~/.hermes is the default hermes home - check get_hermes_home() not the path literal",
-      "category": "tool-quirk",
-      "domain": "global",
-      "confidence": 0.9,
-      "tags": [
-        "paths",
-        "hermes",
-        "env"
-      ],
-      "source_count": 10,
-      "first_seen": "2026-03-30",
-      "last_confirmed": "2026-04-13",
-      "related": [
-        "hermes-agent:pitfall:005"
-      ]
-    },
-    {
-      "id": "global:tool-quirk:005",
-      "fact": "Ansible vault-encrypted vars in YAML require vault_inline_vars plugin",
-      "category": "tool-quirk",
-      "domain": "global",
-      "confidence": 0.85,
-      "tags": [
-        "ansible",
-        "vault",
-        "config"
-      ],
-      "source_count": 3,
-      "first_seen": "2026-04-02"
-    },
-    {
-      "id": "global:tool-quirk:006",
-      "fact": "mimo-v2-pro via Nous Research is the default model - don't assume Anthropic is available",
-      "category": "tool-quirk",
-      "domain": "global",
-      "confidence": 0.95,
-      "tags": [
-        "model",
-        "provider",
-        "nous",
-        "default"
-      ],
-      "source_count": 15,
-      "first_seen": "2026-03-25",
-      "last_confirmed": "2026-04-13"
-    },
-    {
-      "id": "global:tool-quirk:007",
-      "fact": "Kill + restart with 'hermes chat' preserves old model state - NEVER use --resume",
-      "category": "tool-quirk",
-      "domain": "global",
-      "confidence": 0.95,
-      "tags": [
-        "hermes",
-        "model",
-        "restart",
-        "session"
-      ],
-      "source_count": 8,
-      "first_seen": "2026-03-29",
-      "last_confirmed": "2026-04-12"
-    },
-    {
-      "id": "global:pitfall:001",
-      "fact": "Branch protection requires 1 approval on main - API merges fail with 405 without it",
-      "category": "pitfall",
-      "domain": "global",
-      "confidence": 0.95,
-      "tags": [
-        "git",
-        "merge",
-        "branch-protection",
-        "gitea"
-      ],
-      "source_count": 12,
-      "first_seen": "2026-04-05",
-      "last_confirmed": "2026-04-13",
-      "related": [
-        "the-nexus:pitfall:001"
-      ]
-    },
-    {
-      "id": "global:pitfall:002",
-      "fact": "Never use --no-verify on git commits",
-      "category": "pitfall",
-      "domain": "global",
-      "confidence": 0.95,
-      "tags": [
-        "git",
-        "hooks",
-        "safety"
-      ],
-      "source_count": 5,
-      "first_seen": "2026-03-28",
-      "last_confirmed": "2026-04-13"
-    },
-    {
-      "id": "global:pitfall:003",
-      "fact": "Gitea PR creation workaround needed on the-nexus - direct API call fails",
-      "category": "pitfall",
-      "domain": "global",
-      "confidence": 0.9,
-      "tags": [
-        "gitea",
-        "pr",
-        "api",
-        "workaround"
-      ],
-      "source_count": 4,
-      "first_seen": "2026-04-06",
-      "last_confirmed": "2026-04-12"
-    },
-    {
-      "id": "global:pitfall:004",
-      "fact": "Anthropic is BANNED from fallback chain",
-      "category": "pitfall",
-      "domain": "global",
-      "confidence": 0.95,
-      "tags": [
-        "provider",
-        "anthropic",
-        "fallback"
-      ],
-      "source_count": 7,
-      "first_seen": "2026-03-30",
-      "last_confirmed": "2026-04-13"
-    },
-    {
-      "id": "global:pitfall:005",
-      "fact": "Telegram tokens expired - don't assume Telegram notifications work",
-      "category": "pitfall",
-      "domain": "global",
-      "confidence": 0.85,
-      "tags": [
-        "telegram",
-        "notifications",
-        "token"
-      ],
-      "source_count": 3,
-      "first_seen": "2026-04-02"
-    },
-    {
-      "id": "global:pitfall:006",
-      "fact": "Multiple gateways = 'cannot schedule futures' error - only one gateway process should run",
-      "category": "pitfall",
-      "domain": "global",
-      "confidence": 0.9,
-      "tags": [
-        "gateway",
-        "cron",
-        "process"
-      ],
-      "source_count": 4,
-      "first_seen": "2026-04-04",
-      "last_confirmed": "2026-04-11"
-    },
-    {
-      "id": "global:pitfall:007",
-      "fact": "pytest root collection picks up operational *_test.py scripts - restrict to tests/ directory",
-      "category": "pitfall",
-      "domain": "global",
-      "confidence": 0.9,
-      "tags": [
-        "pytest",
-        "test",
-        "collection"
-      ],
-      "source_count": 3,
-      "first_seen": "2026-04-07",
-      "last_confirmed": "2026-04-13"
-    },
-    {
-      "id": "global:pitfall:008",
-      "fact": "TDD: test 1 before building 55",
-      "category": "pitfall",
-      "domain": "global",
-      "confidence": 0.95,
-      "tags": [
-        "tdd",
-        "testing",
-        "methodology"
-      ],
-      "source_count": 8,
-      "first_seen": "2026-03-25",
-      "last_confirmed": "2026-04-13"
-    }
-  ]
+  "total_facts": 0,
+  "facts": []
 }
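A minimal sketch of how a consumer might read a facts store shaped like the one above. The field names (`total_facts`, `facts`, `confidence`, `tags`) come from the file; the helper names `load_store` and `select` and the sample payload are hypothetical, not part of the repository.

```python
import json

# Hypothetical sample that mirrors the field names used in the store above.
SAMPLE = """
{
  "total_facts": 2,
  "facts": [
    {"id": "global:pitfall:005", "confidence": 0.85, "tags": ["telegram", "token"]},
    {"id": "global:pitfall:006", "confidence": 0.9, "tags": ["gateway", "cron"]}
  ]
}
"""


def load_store(text):
    """Parse a store and check that total_facts agrees with the facts list."""
    store = json.loads(text)
    facts = store.get("facts", [])
    if store.get("total_facts") != len(facts):
        raise ValueError("total_facts does not match the number of facts")
    return facts


def select(facts, min_confidence=0.0, tag=None):
    """Keep facts at or above min_confidence, optionally restricted to one tag."""
    return [
        f for f in facts
        if f.get("confidence", 0.0) >= min_confidence
        and (tag is None or tag in f.get("tags", []))
    ]


facts = load_store(SAMPLE)
```

The consistency check is one possible guard; the emptied store in this diff (`"total_facts": 0`, `"facts": []`) passes it as well.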
--- a/knowledge/repos/hermes-agent.yaml
+++ b/knowledge/repos/hermes-agent.yaml
@@ -1,80 +0,0 @@
---
-domain: hermes-agent
-category: pitfall
-version: 1
-last_updated: "2026-04-13"
---
-
-# Pitfalls (hermes-agent)
-
-## Cron & Deployment
-
- id: hermes-agent:pitfall:001
-  fact: "deploy-crons.py leaves jobs in mixed model format - some have provider/model, some just model"
-  confidence: 0.95
-  tags: [cron, deploy, model, config]
-  source_count: 5
-  first_seen: "2026-04-08"
-  last_confirmed: "2026-04-13"
-  related: [hermes-agent:pitfall:002, hermes-agent:pitfall:003]
-
- id: hermes-agent:pitfall:002
-  fact: "deploy-crons.py --deploy doesn't set legacy skill field from skills list"
-  confidence: 0.9
-  tags: [cron, deploy, skills]
-  source_count: 3
-  first_seen: "2026-04-09"
-  last_confirmed: "2026-04-13"
-  related: [hermes-agent:pitfall:001]
-
- id: hermes-agent:pitfall:003
-  fact: "Cron jobs with blank fallback_model fields trigger spurious gateway warnings"
-  confidence: 0.9
-  tags: [cron, model, fallback]
-  source_count: 4
-  first_seen: "2026-04-07"
-  last_confirmed: "2026-04-12"
-  related: [hermes-agent:pitfall:001]
-
- id: hermes-agent:pitfall:004
-  fact: "model-watchdog.py checks first provider line, not model.provider - causes false drift alarms"
-  confidence: 0.9
-  tags: [watchdog, model, config]
-  source_count: 3
-  first_seen: "2026-04-08"
-  last_confirmed: "2026-04-13"
-
-## Path & Environment
-
- id: hermes-agent:pitfall:005
-  fact: "10+ files read HERMES_HOME directly instead of get_hermes_home() - breaks on custom paths"
-  confidence: 0.85
-  tags: [paths, env, hermes-home]
-  source_count: 6
-  first_seen: "2026-04-06"
-  last_confirmed: "2026-04-12"
-  related: [global:pitfall:002]
-
- id: hermes-agent:pitfall:006
-  fact: "get_hermes_home() doesn't expand tilde when HERMES_HOME=~/... is set"
-  confidence: 0.8
-  tags: [paths, env, bug]
-  source_count: 2
-  first_seen: "2026-04-05"
-
-## SSH & Dispatch
-
- id: hermes-agent:pitfall:007
-  fact: "vps-agent-dispatch reports OK while remote hermes binary path is broken"
-  confidence: 0.9
-  tags: [ssh, dispatch, vps]
-  source_count: 4
-  first_seen: "2026-04-07"
-  last_confirmed: "2026-04-11"
-
- id: hermes-agent:pitfall:008
-  fact: "nightwatch-health-monitor SSH check fails on cloud-model-only deployments"
-  confidence: 0.85
-  tags: [ssh, health, cloud]
-  source_count: 2
-  first_seen: "2026-04-10"
--- a/knowledge/repos/the-nexus.yaml
+++ b/knowledge/repos/the-nexus.yaml
@@ -1,63 +0,0 @@
---
-domain: the-nexus
-category: pitfall
-version: 1
-last_updated: "2026-04-13"
---
-
-# Pitfalls (the-nexus)
-
-## Git & Merging
-
- id: the-nexus:pitfall:001
-  fact: "Merges fail with HTTP 405 due to branch protection - must use merge API with 1 approval"
-  confidence: 0.95
-  tags: [git, merge, branch-protection, gitea]
-  source_count: 12
-  first_seen: "2026-04-05"
-  last_confirmed: "2026-04-13"
-  related: [global:pitfall:001]
-
- id: the-nexus:pitfall:002
-  fact: "ThreadingHTTPServer required for multi-user bridge - standard HTTPServer blocks on concurrent requests"
-  confidence: 0.95
-  tags: [server, concurrency, bridge]
-  source_count: 5
-  first_seen: "2026-04-10"
-  last_confirmed: "2026-04-13"
-
- id: the-nexus:pitfall:003
-  fact: "ChatLog.log() crashes on message persistence when index.html has orphaned button tags"
-  confidence: 0.9
-  tags: [html, crash, chatlog]
-  source_count: 3
-  first_seen: "2026-04-12"
-  last_confirmed: "2026-04-13"
-
-## Three.js & Performance
-
- id: the-nexus:pitfall:004
-  fact: "Three.js LOD not implemented - local hardware struggles with full scene without texture optimization"
-  confidence: 0.85
-  tags: [threejs, performance, lod]
-  source_count: 4
-  first_seen: "2026-04-09"
-  last_confirmed: "2026-04-13"
-
- id: the-nexus:pitfall:005
-  fact: "Duplicate content blocks appear in index.html when PR merges conflict silently"
-  confidence: 0.8
-  tags: [html, merge-conflict, duplicate]
-  source_count: 3
-  first_seen: "2026-04-11"
-  last_confirmed: "2026-04-13"
-
-## Deployment
-
- id: the-nexus:pitfall:006
-  fact: "Unified HTTP + WebSocket server required for proper URL deployment - separate servers break CORS"
-  confidence: 0.9
-  tags: [deploy, websocket, http, cors]
-  source_count: 4
-  first_seen: "2026-04-10"
-  last_confirmed: "2026-04-13"
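The entry IDs in these stores appear to follow a `domain:category:NNN` pattern, and `related` fields point at other IDs, sometimes in a different domain. A small sketch of parsing and cross-checking them, assuming that pattern holds; `FactId`, `parse_fact_id`, and `external_related` are hypothetical names introduced for illustration.

```python
import re
from typing import NamedTuple

# Assumed ID shape, inferred from entries such as "the-nexus:pitfall:001".
ID_PATTERN = re.compile(r"^(?P<domain>[a-z0-9-]+):(?P<category>[a-z]+):(?P<seq>\d{3})$")


class FactId(NamedTuple):
    domain: str
    category: str
    seq: int


def parse_fact_id(raw: str) -> FactId:
    """Split an ID into its parts, rejecting anything that does not fit the pattern."""
    match = ID_PATTERN.match(raw)
    if match is None:
        raise ValueError(f"unexpected fact id: {raw!r}")
    return FactId(match["domain"], match["category"], int(match["seq"]))


def external_related(entries):
    """List related IDs that are not defined within the given entries."""
    known = {e["id"] for e in entries}
    return sorted({r for e in entries for r in e.get("related", []) if r not in known})
```

An ID returned by `external_related` is not necessarily an error: it may simply live in another store, as `global:pitfall:001` does here.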
--- a/quality_gate.py
+++ b/quality_gate.py
@@ -1,297 +0,0 @@
-#!/usr/bin/env python3
-"""
-quality_gate.py — Score and filter knowledge entries.
-
-Scores each entry on 4 dimensions:
- Specificity: concrete examples vs vague generalities
- Actionability: can this be used to do something?
- Freshness: is this still accurate?
- Source quality: was the model/provider reliable?
-
-Usage:
-    from quality_gate import score_entry, filter_entries, quality_report
-
-    score = score_entry(entry)
-    filtered = filter_entries(entries, threshold=0.5)
-    report = quality_report(entries)
-"""
-
-import json
-import math
-import re
-import sys
-from datetime import datetime, timezone
-from pathlib import Path
-from typing import Dict, List, Any, Optional
-
-# Source quality scores (higher = more reliable)
-SOURCE_QUALITY = {
-    "claude-sonnet": 0.9,
-    "claude-opus": 0.95,
-    "gpt-4": 0.85,
-    "gpt-4-turbo": 0.85,
-    "gpt-5": 0.9,
-    "mimo-v2-pro": 0.8,
-    "gemini-pro": 0.8,
-    "llama-3-70b": 0.75,
-    "llama-3-8b": 0.7,
-    "ollama": 0.6,
-    "unknown": 0.5,
-}
-
-DEFAULT_SOURCE_QUALITY = 0.5
-
-# Specificity indicators
-SPECIFIC_INDICATORS = [
-    r"\b\d+\.\d+",  # decimal numbers
-    r"\b\d{4}-\d{2}-\d{2}",  # dates
-    r"(?-i:\b[A-Z][a-z]+\s[A-Z][a-z]+)",  # proper nouns (kept case-sensitive under re.IGNORECASE)
-    r"`[^`]+`",  # code/commands
-    r"https?://",  # URLs
-    r"\b(example|instance|specifically|concretely)\b",
-    r"\b(step \d|first|second|third)\b",
-    r"\b(exactly|precisely|measured|counted)\b",
-]
-
-# Vagueness indicators (penalty)
-VAGUE_INDICATORS = [
-    r"\b(generally|usually|often|sometimes|might|could|perhaps)\b",
-    r"\b(various|several|many|some|few)\b",
-    r"\b(it depends|varies|differs)\b",
-    r"\b(basically|essentially|fundamentally)\b",
-    r"\b(everyone knows|it's obvious|clearly)\b",
-]
-
-# Actionability indicators
-ACTIONABLE_INDICATORS = [
-    r"\b(run|execute|install|deploy|configure|set up)\b",
-    r"\b(use|apply|implement|create|build)\b",
-    r"\b(check|verify|test|validate|confirm)\b",
-    r"\b(fix|resolve|solve|debug|troubleshoot)\b",
-    r"\b(if .+ then|when .+ do|to .+ use)\b",
-    r"```[a-z]*\n",  # code blocks
-    r"\$\s",  # shell commands
-    r"\b\d+\.\s",  # numbered steps
-]
-
-
-def score_specificity(content: str) -> float:
-    """Score specificity: 0=vague, 1=very specific."""
-    content_lower = content.lower()
-    score = 0.5  # baseline
-
-    # Check for specific indicators
-    specific_count = sum(
-        len(re.findall(p, content, re.IGNORECASE))
-        for p in SPECIFIC_INDICATORS
-    )
-
-    # Check for vague indicators
-    vague_count = sum(
-        len(re.findall(p, content_lower))
-        for p in VAGUE_INDICATORS
-    )
-
-    # Adjust score
-    score += min(specific_count * 0.05, 0.4)
-    score -= min(vague_count * 0.08, 0.3)
-
-    # Length bonus (longer = more detail, up to a point)
-    word_count = len(content.split())
-    if word_count > 50:
-        score += min((word_count - 50) * 0.001, 0.1)
-
-    return max(0.0, min(1.0, score))
-
-
-def score_actionability(content: str) -> float:
-    """Score actionability: 0=abstract, 1=highly actionable."""
-    content_lower = content.lower()
-    score = 0.3  # baseline (most knowledge is informational)
-
-    # Check for actionable indicators
-    actionable_count = sum(
-        len(re.findall(p, content_lower))
-        for p in ACTIONABLE_INDICATORS
-    )
-
-    score += min(actionable_count * 0.1, 0.6)
-
-    # Code blocks are highly actionable
-    if "```" in content:
-        score += 0.2
-
-    # Numbered steps are actionable
-    if re.search(r"\d+\.\s+\w", content):
-        score += 0.1
-
-    return max(0.0, min(1.0, score))
-
-
-def score_freshness(timestamp: Optional[str]) -> float:
-    """Score freshness: 1=new, decays over time."""
-    if not timestamp:
-        return 0.5
-
-    try:
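-        if isinstance(timestamp, str):
-            ts = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
-        else:
-            ts = timestamp
-        if isinstance(ts, datetime) and ts.tzinfo is None:
-            ts = ts.replace(tzinfo=timezone.utc)  # treat naive (e.g. date-only) timestamps as UTC
-        now = datetime.now(timezone.utc)
-        age_days = (now - ts).days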
-
-        # Exponential decay with a 180-day time constant: 1.0 at day 0, ~0.37 at 180 days, ~0.13 at 365 days
-        score = math.exp(-age_days / 180)
-        return max(0.1, min(1.0, score))
-    except (ValueError, TypeError):
-        return 0.5
-
-
-def score_source_quality(model: Optional[str]) -> float:
-    """Score source quality based on model/provider."""
-    if not model:
-        return DEFAULT_SOURCE_QUALITY
-
-    # Normalize model name
-    model_lower = model.lower()
-    for key, score in SOURCE_QUALITY.items():
-        if key in model_lower:
-            return score
-
-    return DEFAULT_SOURCE_QUALITY
-
-
-def score_entry(entry: dict) -> float:
-    """
-    Score a knowledge entry on quality (0.0-1.0).
-
-    Weights:
-    - specificity: 0.3
-    - actionability: 0.3
-    - freshness: 0.2
-    - source_quality: 0.2
-    """
-    content = entry.get("content", entry.get("text", entry.get("response", "")))
-    model = entry.get("model", entry.get("provenance", {}).get("model"))
-    timestamp = entry.get("timestamp", entry.get("provenance", {}).get("timestamp"))
-
-    specificity = score_specificity(content)
-    actionability = score_actionability(content)
-    freshness = score_freshness(timestamp)
-    source = score_source_quality(model)
-
-    return round(
-        0.3 * specificity +
-        0.3 * actionability +
-        0.2 * freshness +
-        0.2 * source,
-        4
-    )
-
-
-def score_entry_detailed(entry: dict) -> dict:
-    """Score with breakdown."""
-    content = entry.get("content", entry.get("text", entry.get("response", "")))
-    model = entry.get("model", entry.get("provenance", {}).get("model"))
-    timestamp = entry.get("timestamp", entry.get("provenance", {}).get("timestamp"))
-
-    specificity = score_specificity(content)
-    actionability = score_actionability(content)
-    freshness = score_freshness(timestamp)
-    source = score_source_quality(model)
-
-    return {
-        "score": round(0.3 * specificity + 0.3 * actionability + 0.2 * freshness + 0.2 * source, 4),
-        "specificity": round(specificity, 4),
-        "actionability": round(actionability, 4),
-        "freshness": round(freshness, 4),
-        "source_quality": round(source, 4),
-    }
-
-
-def filter_entries(entries: List[dict], threshold: float = 0.5) -> List[dict]:
-    """Return only the entries scoring at or above the quality threshold."""
-    filtered = []
-    for entry in entries:
-        if score_entry(entry) >= threshold:
-            filtered.append(entry)
-    return filtered
-
-
-def quality_report(entries: List[dict]) -> str:
-    """Generate quality distribution report."""
-    if not entries:
-        return "No entries to analyze."
-
-    scores = [score_entry(e) for e in entries]
-
-    avg = sum(scores) / len(scores)
-    min_score = min(scores)
-    max_score = max(scores)
-
-    # Distribution buckets
-    buckets = {"high": 0, "medium": 0, "low": 0, "rejected": 0}
-    for s in scores:
-        if s >= 0.7:
-            buckets["high"] += 1
-        elif s >= 0.5:
-            buckets["medium"] += 1
-        elif s >= 0.3:
-            buckets["low"] += 1
-        else:
-            buckets["rejected"] += 1
-
-    lines = [
-        "=" * 50,
-        "  QUALITY GATE REPORT",
-        "=" * 50,
-        f"  Total entries:  {len(entries)}",
-        f"  Average score:  {avg:.3f}",
-        f"  Min:            {min_score:.3f}",
-        f"  Max:            {max_score:.3f}",
-        "",
-        "  Distribution:",
-    ]
-
-    for bucket, count in buckets.items():
-        pct = count / len(entries) * 100
-        bar = "█" * int(pct / 5)
-        lines.append(f"    {bucket:<12} {count:>5} ({pct:>5.1f}%) {bar}")
-
-    passed = buckets["high"] + buckets["medium"]
-    lines.append(f"\n  Pass rate (>= 0.5): {passed}/{len(entries)} ({passed/len(entries)*100:.1f}%)")
-    lines.append("=" * 50)
-
-    return "\n".join(lines)
-
-
-def main():
-    import argparse
-    parser = argparse.ArgumentParser(description="Knowledge quality gate")
-    parser.add_argument("files", nargs="+", help="JSONL files to score")
-    parser.add_argument("--threshold", type=float, default=0.5, help="Quality threshold")
-    parser.add_argument("--json", action="store_true", help="JSON output")
-    parser.add_argument("--filter", action="store_true", help="Filter by threshold and print how many entries pass (files are not modified)")
-    args = parser.parse_args()
-
-    all_entries = []
-    for filepath in args.files:
-        with open(filepath) as f:
-            for line in f:
-                if line.strip():
-                    all_entries.append(json.loads(line))
-
-    if args.json:
-        results = [{"entry": e, **score_entry_detailed(e)} for e in all_entries]
-        print(json.dumps(results, indent=2))
-    elif args.filter:
-        filtered = filter_entries(all_entries, args.threshold)
-        print(f"Kept {len(filtered)}/{len(all_entries)} entries (threshold: {args.threshold})")
-    else:
-        print(quality_report(all_entries))
-
-
-if __name__ == "__main__":
-    main()
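The weighting and decay used by `score_entry` and `score_freshness` above can be sketched in isolation. The weights and the 180-day constant are taken from the deleted file; the function names `freshness` and `combine` are hypothetical and exist only for this illustration.

```python
import math

# Weights as listed in the score_entry docstring.
WEIGHTS = {
    "specificity": 0.3,
    "actionability": 0.3,
    "freshness": 0.2,
    "source_quality": 0.2,
}


def freshness(age_days: float, scale_days: float = 180.0, floor: float = 0.1) -> float:
    """exp(-age / scale), clamped to the range [floor, 1.0]."""
    return max(floor, min(1.0, math.exp(-age_days / scale_days)))


def combine(parts: dict) -> float:
    """Weighted sum of the four dimension scores, rounded to 4 places."""
    return round(sum(WEIGHTS[name] * parts[name] for name in WEIGHTS), 4)
```

With this form the score falls to 1/e (about 0.37) after 180 days, so 180 is a time constant rather than a half-life; a true 180-day half-life would need `0.5 ** (age_days / 180)` instead.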
--- a/requirements.txt
+++ b/requirements.txt
@@ -1 +0,0 @@
-pytest>=8,<9
--- a/scripts/automation_opportunity_finder.py
+++ b/scripts/automation_opportunity_finder.py
@@ -1,554 +0,0 @@
-#!/usr/bin/env python3
-"""
-Automation Opportunity Finder — Scan fleet for manual processes that could be automated.
-
-Analyzes:
-1. Cron jobs — finds manual steps between scheduled tasks
-2. Documentation — extracts TODO/FIXME/manual-step patterns
-3. Scripts — detects repeated command sequences
-4. Session transcripts — finds repeated tool-call patterns
-
-Usage:
-    python3 scripts/automation_opportunity_finder.py --hermes-home ~/.hermes
-    python3 scripts/automation_opportunity_finder.py --hermes-home ~/.hermes --json
-    python3 scripts/automation_opportunity_finder.py --hermes-home ~/.hermes --output proposals.json
-"""
-
-import argparse
-import json
-import os
-import re
-import sys
-from collections import Counter, defaultdict
-from datetime import datetime, timezone
-from pathlib import Path
-from typing import Any, Dict, List, Optional
-
-
-# ---------------------------------------------------------------------------
-# Patterns that signal manual work
-# ---------------------------------------------------------------------------
-
-MANUAL_STEP_PATTERNS = [
-    # Explicit manual markers
-    (r"(?i)\bTODO[:\s]", "todo"),
-    (r"(?i)\bFIXME[:\s]", "fixme"),
-    (r"(?i)\bMANUAL[:\s]", "manual_step"),
-    (r"(?i)\bHACK[:\s]", "hack"),
-    (r"(?i)\bWORKAROUND[:\s]", "workaround"),
-    # Step-by-step instructions in docs
-    (r"(?i)^(\d+)[.\)]\s+(run|execute|ssh|scp|curl|cd|make|docker|ansible|git)", "sequential_step"),
-    # Explicitly manual operations
-    (r"(?i)\bmanually\b", "manual_keyword"),
-    (r"(?i)\bby hand\b", "manual_keyword"),
-    (r"(?i)\bdon\'?t forget to\b", "manual_keyword"),
-    (r"(?i)\bremember to\b", "manual_keyword"),
-    (r"(?i)\bmake sure to\b", "manual_keyword"),
-]
-
-# Shell commands that appear frequently in runbooks — signal automatable workflows
-SHELL_COMMAND_PATTERNS = [
-    "ssh ", "scp ", "rsync ", "curl ", "wget ",
-    "docker ", "docker-compose ", "kubectl ",
-    "ansible-playbook ", "terraform ", "systemctl ",
-    "systemctl restart", "systemctl status",
-    "git push", "git pull", "git merge", "git checkout",
-    "pip install", "npm install", "cargo build",
-]
-
-# Directories to skip during scans — large/uninteresting trees
-EXCLUDE_DIRS = frozenset({
-    "node_modules", "venv", ".venv", "__pycache__", ".git",
-    "site-packages", "dist", "build", ".tox", ".mypy_cache",
-    ".pytest_cache", "coverage", ".next", "vendor",
-    "skills",  # hermes skills dir is huge
-    "audio_cache", "skins", "profiles",
-})
-
-# Session tool calls that appear repeatedly — candidates for workflow automation
-TOOL_SEQUENCE_MIN_OCCURRENCES = 3
-
-
-# ---------------------------------------------------------------------------
-# Analyzers
-# ---------------------------------------------------------------------------
-
-def analyze_cron_jobs(hermes_home: str) -> List[Dict[str, Any]]:
-    """Analyze cron job definitions for automation gaps."""
-    proposals = []
-    cron_dir = Path(hermes_home) / "cron"
-    jobs_file = cron_dir / "jobs.json"
-
-    if not jobs_file.exists():
-        # Try YAML format
-        for ext in (".yaml", ".yml"):
-            alt = cron_dir / f"jobs{ext}"
-            if alt.exists():
-                jobs_file = alt
-                break
-
-    if not jobs_file.exists():
-        return proposals
-
-    try:
-        if jobs_file.suffix == ".json":
-            with open(jobs_file) as f:
-                jobs = json.load(f)
-        else:
-            try:
-                import yaml
-                with open(jobs_file) as f:
-                    jobs = yaml.safe_load(f)
-            except ImportError:
-                return proposals
-    except Exception:  # covers JSON/YAML parse errors and I/O failures
-        return proposals
-
-    if not isinstance(jobs, list):
-        return proposals
-
-    # Look for disabled jobs (someone turned them off — might need a different approach)
-    disabled = [j for j in jobs if not j.get("enabled", True)]
-    if disabled:
-        names = [j.get("name", j.get("id", "?")) for j in disabled[:5]]
-        proposals.append({
-            "category": "cron_disabled",
-            "title": f"{len(disabled)} disabled cron job(s) may need automation rework",
-            "description": f"These jobs were disabled: {', '.join(names)}. Investigate why and whether a different automation approach is needed.",
-            "confidence": 0.7,
-            "impact": "medium",
-            "sources": [str(jobs_file)],
-        })
-
-    # Look for jobs with high error counts
-    error_jobs = [j for j in jobs if j.get("last_status") == "error"]
-    if error_jobs:
-        names = [j.get("name", j.get("id", "?")) for j in error_jobs[:5]]
-        proposals.append({
-            "category": "cron_errors",
-            "title": f"{len(error_jobs)} cron job(s) failing — may need automation rework",
-            "description": f"Jobs with errors: {', '.join(names)}. Failure patterns suggest missing dependencies or fragile automation.",
-            "confidence": 0.8,
-            "impact": "high",
-            "sources": [str(jobs_file)],
-        })
-
-    # Look for jobs with delivery errors (platform issues)
-    delivery_errors = [j for j in jobs if j.get("last_delivery_error")]
-    if delivery_errors:
-        proposals.append({
-            "category": "cron_delivery",
-            "title": f"{len(delivery_errors)} cron job(s) have delivery failures",
-            "description": "Delivery failures suggest missing retry logic or platform integration gaps.",
-            "confidence": 0.75,
-            "impact": "medium",
-            "sources": [str(jobs_file)],
-        })
-
-    # Look for jobs on short intervals that could be event-driven
-    for job in jobs:
-        schedule = job.get("schedule", "")
-        # Check for very frequent schedules (every minute, every 5 min)
-        if isinstance(schedule, str) and re.match(r"^\*\/([1-5])\s", schedule):
-            proposals.append({
-                "category": "cron_frequency",
-                "title": f"Job '{job.get('name', job.get('id', '?'))}' runs every {schedule.split()[0]} — consider event-driven",
-                "description": f"High-frequency cron ({schedule}) may be better as event-driven or daemon.",
-                "confidence": 0.6,
-                "impact": "low",
-                "sources": [str(jobs_file)],
-            })
-
-    return proposals
-
-
-def analyze_documents(root_dirs: List[str]) -> List[Dict[str, Any]]:
-    """Scan documentation for manual step patterns."""
-    proposals = []
-    doc_extensions = {".md", ".txt", ".rst", ".adoc"}
-    findings_by_category = defaultdict(list)
-
-    for root_dir in root_dirs:
-        root = Path(root_dir)
-        if not root.exists():
-            continue
-
-        for path in root.rglob("*"):
-            if path.is_dir():
-                continue
-            if path.suffix not in doc_extensions:
-                continue
-            # Skip excluded dirs and hidden dirs
-            parts = path.relative_to(root).parts if root in path.parents or root == path.parent else path.parts
-            if any(p.startswith(".") or p in EXCLUDE_DIRS for p in parts):
-                continue
-            if len(parts) > 8:
-                continue
-
-            try:
-                content = path.read_text(errors="replace")
-            except (PermissionError, OSError):
-                continue
-
-            lines = content.split("\n")
-            for i, line in enumerate(lines):
-                stripped = line.strip()
-                if not stripped:
-                    continue
-
-                for pattern, category in MANUAL_STEP_PATTERNS:
-                    if re.search(pattern, stripped):
-                        findings_by_category[category].append({
-                            "file": str(path),
-                            "line": i + 1,
-                            "text": stripped[:200],
-                        })
-
-    # Generate proposals from findings
-    for category, findings in findings_by_category.items():
-        if len(findings) < 2:
-            continue
-
-        file_count = len(set(f["file"] for f in findings))
-        proposals.append({
-            "category": f"manual_{category}",
-            "title": f"{len(findings)} '{category}' markers across {file_count} doc(s)",
-            "description": f"Found in: {', '.join(set(Path(f['file']).name for f in findings[:5]))}",
-            "confidence": 0.65,
-            "impact": "medium",
-            "sources": list(set(f["file"] for f in findings[:10])),
-            "details": findings[:5],  # sample
-        })
-
-    return proposals
-
-
-def analyze_scripts(root_dirs: List[str]) -> List[Dict[str, Any]]:
-    """Detect repeated command sequences in scripts."""
-    proposals = []
-    script_extensions = {".py", ".sh", ".bash", ".zsh"}
-    command_counter = Counter()
-    command_locations = defaultdict(list)
-
-    for root_dir in root_dirs:
-        root = Path(root_dir)
-        if not root.exists():
-            continue
-
-        for path in root.rglob("*"):
-            if path.is_dir():
-                continue
-            if path.suffix not in script_extensions:
-                continue
-            parts = path.relative_to(root).parts if root in path.parents or root == path.parent else path.parts
-            if any(p.startswith(".") or p in EXCLUDE_DIRS for p in parts):
-                continue
-            if len(parts) > 8:
-                continue
-
-            try:
-                content = path.read_text(errors="replace")
-            except (PermissionError, OSError):
-                continue
-
-            lines = content.split("\n")
-            for i, line in enumerate(lines):
-                stripped = line.strip()
-                if not stripped or stripped.startswith("#"):
-                    continue
-
-                # Count each line once, even when it matches several prefixes
-                if any(cmd_prefix in stripped for cmd_prefix in SHELL_COMMAND_PATTERNS):
-                    # Normalize whitespace and truncate the command
-                    normalized = re.sub(r"\s+", " ", stripped)[:120]
-                    command_counter[normalized] += 1
-                    command_locations[normalized].append(f"{path}:{i+1}")
-
-    # Proposals for commands appearing 3+ times
-    for cmd, count in command_counter.most_common(20):
-        if count < 3:
-            break
-        locs = command_locations[cmd]
-        files = sorted({loc.rsplit(":", 1)[0] for loc in locs})  # strip only the trailing ":<line>"
-        proposals.append({
-            "category": "repeated_command",
-            "title": f"Command repeated {count}x across {len(files)} file(s): {cmd[:80]}",
-            "description": f"Locations: {', '.join(locs[:3])}",
-            "confidence": min(0.5 + (count * 0.1), 0.95),
-            "impact": "medium",
-            "sources": files,
-        })
-
-    return proposals
-
-
-def analyze_session_transcripts(session_dirs: List[str]) -> List[Dict[str, Any]]:
-    """Find repeated tool-call patterns in session transcripts."""
-    proposals = []
-    tool_sequence_counter = Counter()
-    tool_sequence_examples = {}
-
-    for session_dir in session_dirs:
-        session_path = Path(session_dir)
-        if not session_path.exists():
-            continue
-
-        for path in session_path.rglob("*.jsonl"):
-            try:
-                content = path.read_text(errors="replace")
-            except (PermissionError, OSError):
-                continue
-
-            # Extract tool calls in sequence
-            tool_sequence = []
-            for line in content.split("\n"):
-                line = line.strip()
-                if not line:
-                    continue
-                try:
-                    msg = json.loads(line)
-                except json.JSONDecodeError:
-                    continue
-
-                # Look for tool calls in assistant messages
-                if msg.get("role") == "assistant" and msg.get("tool_calls"):
-                    for tc in msg["tool_calls"]:
-                        func_name = tc.get("function", {}).get("name", "?")
-                        tool_sequence.append(func_name)
-
-            # Find 2-call sequences
-            for i in range(len(tool_sequence) - 1):
-                seq = (tool_sequence[i], tool_sequence[i + 1])
-                tool_sequence_counter[seq] += 1
-                if seq not in tool_sequence_examples:
-                    tool_sequence_examples[seq] = str(path.name)
-
-            # Find 3-call sequences
-            for i in range(len(tool_sequence) - 2):
-                seq = (tool_sequence[i], tool_sequence[i + 1], tool_sequence[i + 2])
-                tool_sequence_counter[seq] += 1
-                if seq not in tool_sequence_examples:
-                    tool_sequence_examples[seq] = str(path.name)
-
-    # Generate proposals for frequently repeated sequences
-    for seq, count in tool_sequence_counter.most_common(20):
-        if count < TOOL_SEQUENCE_MIN_OCCURRENCES:
-            break
-        seq_str = " -> ".join(seq)
-        proposals.append({
-            "category": "tool_sequence",
-            "title": f"Tool sequence '{seq_str}' repeated {count} times",
-            "description": "Consider creating a workflow/skill that automates this sequence.",
-            "confidence": min(0.5 + (count * 0.05), 0.9),
-            "impact": "medium",
-            "sources": [tool_sequence_examples.get(seq, "unknown")],
-        })
-
-    return proposals
-
-
-def analyze_shell_history(root_dirs: List[str]) -> List[Dict[str, Any]]:
-    """Find repeated shell commands from history files."""
-    proposals = []
-    command_counter = Counter()
-
-    for root_dir in root_dirs:
-        root = Path(root_dir)
-        history_files = []
-
-        # Look for shell history files
-        for name in (".bash_history", ".zsh_history", ".python_history"):
-            p = root / name
-            if p.exists():
-                history_files.append(p)
-
-        # Also check in hermes home
-        for p in root.glob("**/*history*"):
-            if p.is_file() and p.suffix in ("", ".txt", ".log"):
-                history_files.append(p)
-
-        for hf in dict.fromkeys(history_files):  # de-duplicate: the glob can re-find the dotfiles above
-            try:
-                content = hf.read_text(errors="replace")
-            except (PermissionError, OSError):
-                continue
-
-            for line in content.split("\n"):
-                stripped = line.strip()
-                # ZSH history format: ": 1234567890:0;command"
-                stripped = re.sub(r"^:\s*\d+:\d+;", "", stripped)
-                if not stripped or len(stripped) < 5:
-                    continue
-                # Skip trivial commands
-                if stripped in ("ls", "cd", "pwd", "clear", "exit"):
-                    continue
-                command_counter[stripped] += 1
-
-    for cmd, count in command_counter.most_common(10):
-        if count < 5:
-            break
-        proposals.append({
-            "category": "shell_repetition",
-            "title": f"Shell command run {count} times: {cmd[:80]}",
-            "description": "Frequently repeated shell command — candidate for alias, function, or script.",
-            "confidence": min(0.4 + (count * 0.05), 0.85),
-            "impact": "low",
-            "sources": ["shell_history"],
-        })
-
-    return proposals
-
-
-# ---------------------------------------------------------------------------
-# Proposal output
-# ---------------------------------------------------------------------------
-
-def deduplicate_proposals(proposals: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
-    """Remove duplicate proposals based on title similarity."""
-    seen_titles = set()
-    unique = []
-    for p in proposals:
-        # Normalize title for dedup
-        key = re.sub(r"\d+", "N", p["title"]).lower()
-        if key not in seen_titles:
-            seen_titles.add(key)
-            unique.append(p)
-    return unique
-
-
-def rank_proposals(proposals: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
-    """Sort proposals by impact * confidence (highest first)."""
-    impact_weight = {"critical": 1.0, "high": 0.8, "medium": 0.5, "low": 0.2}
-    return sorted(
-        proposals,
-        key=lambda p: impact_weight.get(p.get("impact", "low"), 0.2) * p.get("confidence", 0.5),
-        reverse=True,
-    )
-
-
-def format_text_report(proposals: List[Dict[str, Any]]) -> str:
-    """Format proposals as human-readable text."""
-    if not proposals:
-        return "No automation opportunities found."
-
-    lines = [
-        "=" * 70,
-        "  AUTOMATION OPPORTUNITY REPORT",
-        f"  Generated: {datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M UTC')}",
-        f"  Proposals: {len(proposals)}",
-        "=" * 70,
-        "",
-    ]
-
-    for i, p in enumerate(proposals, 1):
-        score = p.get("confidence", 0.5) * {"critical": 1.0, "high": 0.8, "medium": 0.5, "low": 0.2}.get(p.get("impact", "low"), 0.2)
-        lines.append(f"[{i}] {p['title']}")
-        lines.append(f"    Category: {p['category']}  |  Impact: {p.get('impact','?')}  |  Confidence: {p.get('confidence',0):.0%}  |  Score: {score:.2f}")
-        lines.append(f"    {p['description']}")
-        if p.get("sources"):
-            lines.append(f"    Sources: {', '.join(p['sources'][:3])}")
-        lines.append("")
-
-    # Summary by category
-    cat_counts = Counter(p["category"] for p in proposals)
-    lines.append("-" * 70)
-    lines.append("Summary by category:")
-    for cat, count in cat_counts.most_common():
-        lines.append(f"  {cat}: {count}")
-
-    return "\n".join(lines)
-
-
-# ---------------------------------------------------------------------------
-# Main
-# ---------------------------------------------------------------------------
-
-def main():
-    parser = argparse.ArgumentParser(description="Find automation opportunities across the fleet")
-    parser.add_argument("--hermes-home", default=os.path.expanduser("~/.hermes"),
-                        help="Path to hermes home directory (default: ~/.hermes)")
-    parser.add_argument("--scan-dirs", nargs="*",
-                        help="Additional directories to scan (default: hermes-home + cwd)")
-    parser.add_argument("--session-dirs", nargs="*",
-                        help="Session transcript directories (default: hermes-home/sessions)")
-    parser.add_argument("--json", action="store_true", help="Output as JSON")
-    parser.add_argument("--output", "-o", help="Write proposals to file")
-    parser.add_argument("--min-confidence", type=float, default=0.3,
-                        help="Minimum confidence threshold (default: 0.3)")
-    parser.add_argument("--categories", nargs="*",
-                        help="Only include these categories (cron, docs, scripts, sessions, shell)")
-    args = parser.parse_args()
-
-    hermes_home = os.path.expanduser(args.hermes_home)
-
-    # Default scan directories
-    scan_dirs = [hermes_home, "."]
-    if args.scan_dirs:
-        scan_dirs.extend(args.scan_dirs)
-
-    session_dirs = [os.path.join(hermes_home, "sessions")]
-    if args.session_dirs:
-        session_dirs.extend(args.session_dirs)
-
-    # Also check common session locations
-    for subdir in ("transcripts", "session-db"):
-        p = os.path.join(hermes_home, subdir)
-        if os.path.isdir(p):
-            session_dirs.append(p)
-
-    categories = set(args.categories) if args.categories else {"cron", "docs", "scripts", "sessions", "shell"}
-
-    # Run analyzers
-    all_proposals = []
-
-    if "cron" in categories:
-        all_proposals.extend(analyze_cron_jobs(hermes_home))
-
-    if "docs" in categories:
-        all_proposals.extend(analyze_documents(scan_dirs))
-
-    if "scripts" in categories:
-        all_proposals.extend(analyze_scripts(scan_dirs))
-
-    if "sessions" in categories:
-        all_proposals.extend(analyze_session_transcripts(session_dirs))
-
-    if "shell" in categories:
-        all_proposals.extend(analyze_shell_history(scan_dirs))
-
-    # Deduplicate and rank
-    all_proposals = deduplicate_proposals(all_proposals)
-    all_proposals = rank_proposals(all_proposals)
-
-    # Filter by confidence
-    all_proposals = [p for p in all_proposals if p.get("confidence", 0) >= args.min_confidence]
-
-    # Output
-    output = {
-        "generated_at": datetime.now(timezone.utc).isoformat(),
-        "hermes_home": hermes_home,
-        "scan_dirs": scan_dirs,
-        "total_proposals": len(all_proposals),
-        "proposals": all_proposals,
-    }
-
-    if args.json:
-        result = json.dumps(output, indent=2)
-    else:
-        result = format_text_report(all_proposals)
-
-    if args.output:
-        with open(args.output, "w") as f:
-            if args.json:
-                json.dump(output, f, indent=2)
-            else:
-                f.write(result)
-        print(f"Written to {args.output}", file=sys.stderr)
-    else:
-        print(result)
-
-    return 0 if all_proposals else 1
-
-
-if __name__ == "__main__":
-    sys.exit(main())
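Note on the removed ranking logic: `rank_proposals` orders proposals by an impact weight multiplied by confidence, and `format_text_report` recomputes the same product for display. A minimal standalone sketch of that scoring follows. The weights are copied from the removed code; the `score` and `rank` helper names and the sample proposals are hypothetical and only illustrate the arithmetic.

```python
from typing import Any, Dict, List

# Impact weights copied from the removed rank_proposals()
IMPACT_WEIGHT = {"critical": 1.0, "high": 0.8, "medium": 0.5, "low": 0.2}


def score(proposal: Dict[str, Any]) -> float:
    """Impact weight times confidence; unknown impact falls back to 0.2."""
    weight = IMPACT_WEIGHT.get(proposal.get("impact", "low"), 0.2)
    return weight * proposal.get("confidence", 0.5)


def rank(proposals: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return proposals sorted by score, highest first."""
    return sorted(proposals, key=score, reverse=True)


# Hypothetical sample data, not taken from the repository
sample = [
    {"title": "a", "impact": "low", "confidence": 0.9},
    {"title": "b", "impact": "critical", "confidence": 0.4},
    {"title": "c", "impact": "high", "confidence": 0.6},
]
ranked = rank(sample)
for p in ranked:
    print(f"{p['title']}: {score(p):.2f}")
```

Sharing one `score` helper between ranking and reporting would avoid the duplicated weight table seen in the removed file.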
--- a/scripts/bootstrapper.py
+++ b/scripts/bootstrapper.py
@@ -1,359 +0,0 @@
-#!/usr/bin/env python3
-"""
-Bootstrapper — assemble pre-session context from knowledge store.
-
-Reads the knowledge store and produces a compact context block (2k tokens max)
-that can be injected into a new session so it starts with situational awareness.
-
-Usage:
-    python3 bootstrapper.py --repo the-nexus --agent mimo-sprint
-    python3 bootstrapper.py --repo timmy-home --global
-    python3 bootstrapper.py --global
-    python3 bootstrapper.py --repo the-nexus --max-tokens 1000
-"""
-
-import argparse
-import json
-import sys
-from pathlib import Path
-from typing import Optional
-
-# Resolve knowledge root relative to this script's parent
-SCRIPT_DIR = Path(__file__).resolve().parent
-REPO_ROOT = SCRIPT_DIR.parent
-KNOWLEDGE_DIR = REPO_ROOT / "knowledge"
-INDEX_PATH = KNOWLEDGE_DIR / "index.json"
-
-# Approximate token count: ~4 chars per token for English text
-CHARS_PER_TOKEN = 4
-
-# Category sort priority (lower = shown first)
-CATEGORY_PRIORITY = {
-    "pitfall": 0,
-    "tool-quirk": 1,
-    "pattern": 2,
-    "fact": 3,
-    "question": 4,
-}
-
-
-def load_index(index_path: Path = INDEX_PATH) -> dict:
-    """Load and validate the knowledge index."""
-    if not index_path.exists():
-        return {"version": 1, "total_facts": 0, "facts": []}
-
-    with open(index_path) as f:
-        data = json.load(f)
-
-    if "facts" not in data:
-        print("WARNING: index.json missing 'facts' key", file=sys.stderr)
-        return {"version": 1, "total_facts": 0, "facts": []}
-
-    return data
-
-
-def filter_facts(
-    facts: list[dict],
-    repo: Optional[str] = None,
-    agent: Optional[str] = None,
-    include_global: bool = True,
-) -> list[dict]:
-    """Filter facts by repo, agent, and global scope."""
-    filtered = []
-
-    for fact in facts:
-        fact_repo = fact.get("repo", "global")
-        fact_agent = fact.get("agent", "")
-
-        # Match by repo (regardless of agent)
-        if repo and fact_repo == repo:
-            filtered.append(fact)
-            continue
-
-        # Match by exact agent type
-        if agent and fact_agent == agent:
-            filtered.append(fact)
-            continue
-
-        # Include global facts without agent restriction (universal facts)
-        if include_global and fact_repo == "global" and not fact_agent:
-            filtered.append(fact)
-
-    return filtered
-
-
-def sort_facts(facts: list[dict]) -> list[dict]:
-    """
-    Sort facts by: confidence (desc), then category priority, then fact text.
-    Most reliable and most dangerous facts come first.
-    """
-
-    def sort_key(f):
-        confidence = f.get("confidence", 0.5)
-        category = f.get("category", "fact")
-        cat_priority = CATEGORY_PRIORITY.get(category, 5)
-        return (-confidence, cat_priority, f.get("fact", ""))
-
-    return sorted(facts, key=sort_key)
-
-
-def load_repo_knowledge(repo: str) -> Optional[str]:
-    """Load per-repo knowledge markdown if it exists."""
-    repo_path = KNOWLEDGE_DIR / "repos" / f"{repo}.md"
-    if repo_path.exists():
-        return repo_path.read_text().strip()
-    return None
-
-
-def load_agent_knowledge(agent: str) -> Optional[str]:
-    """Load per-agent knowledge markdown if it exists."""
-    agent_path = KNOWLEDGE_DIR / "agents" / f"{agent}.md"
-    if agent_path.exists():
-        return agent_path.read_text().strip()
-    return None
-
-
-def load_global_knowledge() -> list[str]:
-    """Load all global knowledge markdown files."""
-    global_dir = KNOWLEDGE_DIR / "global"
-    if not global_dir.exists():
-        return []
-
-    chunks = []
-    for md_file in sorted(global_dir.glob("*.md")):
-        content = md_file.read_text().strip()
-        if content:
-            chunks.append(content)
-    return chunks
-
-
-def render_facts_section(facts: list[dict], category: str, label: str) -> str:
-    """Render a section of facts for a single category."""
-    cat_facts = [f for f in facts if f.get("category") == category]
-    if not cat_facts:
-        return ""
-
-    lines = [f"### {label}\n"]
-    for f in cat_facts:
-        conf = f.get("confidence", 0.5)
-        fact_text = f.get("fact", "")
-        repo_tag = f.get("repo", "")
-        if repo_tag and repo_tag != "global":
-            lines.append(f"- [{conf:.0%}] ({repo_tag}) {fact_text}")
-        else:
-            lines.append(f"- [{conf:.0%}] {fact_text}")
-
-    return "\n".join(lines) + "\n"
-
-
-def estimate_tokens(text: str) -> int:
-    """Rough token estimate."""
-    return len(text) // CHARS_PER_TOKEN
-
-
-def truncate_to_tokens(text: str, max_tokens: int) -> str:
-    """Truncate text to approximately max_tokens, cutting at line boundaries."""
-    max_chars = max_tokens * CHARS_PER_TOKEN
-    if len(text) <= max_chars:
-        return text
-
-    # Cut at last newline before the limit
-    truncated = text[:max_chars]
-    last_newline = truncated.rfind("\n")
-    if last_newline > 0:
-        truncated = truncated[:last_newline]
-
-    return truncated + "\n\n[... truncated to fit context window ...]"
-
-
-def build_bootstrap_context(
-    repo: Optional[str] = None,
-    agent: Optional[str] = None,
-    include_global: bool = True,
-    max_tokens: int = 2000,
-    index_path: Path = INDEX_PATH,
-) -> str:
-    """
-    Build the full bootstrap context block.
-
-    Returns a markdown string suitable for injection into a session prompt.
-    """
-    index = load_index(index_path)
-    facts = index.get("facts", [])
-
-    # Filter
-    filtered = filter_facts(facts, repo=repo, agent=agent, include_global=include_global)
-
-    # Sort
-    sorted_facts = sort_facts(filtered)
-
-    # Build sections
-    sections = ["## What You Know (bootstrapped)\n"]
-
-    # Per-repo markdown knowledge
-    if repo:
-        repo_md = load_repo_knowledge(repo)
-        if repo_md:
-            sections.append(f"### Repo Notes: {repo}\n")
-            sections.append(repo_md + "\n")
-
-    # Structured facts by category
-    if sorted_facts:
-        # Group by source
-        repo_facts = [f for f in sorted_facts if f.get("repo") == repo] if repo else []
-        global_facts = [f for f in sorted_facts if f.get("repo", "global") == "global"]
-        agent_facts = [f for f in sorted_facts if f.get("agent") == agent] if agent else []
-
-        if repo_facts:
-            sections.append(f"### Repo: {repo}\n")
-            for cat, label in [
-                ("pitfall", "PITFALLS"),
-                ("tool-quirk", "QUIRKS"),
-                ("pattern", "PATTERNS"),
-                ("fact", "FACTS"),
-                ("question", "OPEN QUESTIONS"),
-            ]:
-                section = render_facts_section(repo_facts, cat, label)
-                if section:
-                    sections.append(section)
-
-        if global_facts:
-            sections.append("### Global\n")
-            for cat, label in [
-                ("pitfall", "PITFALLS"),
-                ("tool-quirk", "QUIRKS"),
-                ("pattern", "PATTERNS"),
-                ("fact", "FACTS"),
-            ]:
-                section = render_facts_section(global_facts, cat, label)
-                if section:
-                    sections.append(section)
-
-        if agent_facts:
-            sections.append(f"### Agent Notes ({agent})\n")
-            for cat, label in [
-                ("pitfall", "PITFALLS"),
-                ("tool-quirk", "QUIRKS"),
-                ("pattern", "PATTERNS"),
-                ("fact", "FACTS"),
-            ]:
-                section = render_facts_section(agent_facts, cat, label)
-                if section:
-                    sections.append(section)
-
-    # Per-agent markdown knowledge
-    if agent:
-        agent_md = load_agent_knowledge(agent)
-        if agent_md:
-            sections.append(f"### Agent Profile: {agent}\n")
-            sections.append(agent_md + "\n")
-
-    # Global markdown knowledge
-    global_chunks = load_global_knowledge()
-    if global_chunks:
-        sections.append("### Global Notes\n")
-        sections.extend(chunk + "\n" for chunk in global_chunks)
-
-    # If nothing was found
-    if len(sections) == 1:
-        sections.append("_No relevant knowledge found. Starting fresh._\n")
-        if not facts:
-            sections.append(
-                "_Knowledge store is empty. Run the harvester to populate it._\n"
-            )
-
-    # Join and truncate
-    context = "\n".join(sections)
-    context = truncate_to_tokens(context, max_tokens)
-
-    return context
-
-
-def main():
-    parser = argparse.ArgumentParser(
-        description="Assemble pre-session context from knowledge store"
-    )
-    parser.add_argument(
-        "--repo",
-        type=str,
-        default=None,
-        help="Repository name to filter facts by",
-    )
-    parser.add_argument(
-        "--agent",
-        type=str,
-        default=None,
-        help="Agent type to filter facts by (e.g., mimo-sprint, groq-fast)",
-    )
-    parser.add_argument(
-        "--global",
-        dest="include_global",
-        action="store_true",
-        default=True,
-        help="Include global facts (default: true)",
-    )
-    parser.add_argument(
-        "--no-global",
-        dest="include_global",
-        action="store_false",
-        help="Exclude global facts",
-    )
-    parser.add_argument(
-        "--max-tokens",
-        type=int,
-        default=2000,
-        help="Maximum token count for output (default: 2000)",
-    )
-    parser.add_argument(
-        "--index",
-        type=str,
-        default=None,
-        help="Path to index.json (default: knowledge/index.json)",
-    )
-    parser.add_argument(
-        "--json",
-        dest="output_json",
-        action="store_true",
-        help="Output raw JSON instead of markdown",
-    )
-
-    args = parser.parse_args()
-
-    index_path = Path(args.index) if args.index else INDEX_PATH
-
-    if args.output_json:
-        # JSON mode: return the filtered, sorted facts
-        index = load_index(index_path)
-        facts = index.get("facts", [])
-        filtered = filter_facts(
-            facts,
-            repo=args.repo,
-            agent=args.agent,
-            include_global=args.include_global,
-        )
-        sorted_facts = sort_facts(filtered)
-        output = {
-            "repo": args.repo,
-            "agent": args.agent,
-            "include_global": args.include_global,
-            "total_indexed": len(facts),
-            "matched": len(sorted_facts),
-            "facts": sorted_facts,
-        }
-        print(json.dumps(output, indent=2))
-    else:
-        # Markdown mode: full bootstrap context
-        context = build_bootstrap_context(
-            repo=args.repo,
-            agent=args.agent,
-            include_global=args.include_global,
-            max_tokens=args.max_tokens,
-            index_path=index_path,
-        )
-        print(context)
-
-    return 0
-
-
-if __name__ == "__main__":
-    sys.exit(main())
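Note on the removed selection logic: `filter_facts` keeps a fact when it matches the requested repo, or matches the requested agent, or is a global fact with no agent attached; `sort_facts` then orders by confidence (descending), category priority, and text. The sketch below condenses both steps into one hypothetical `select` function, assuming global facts are always included. The sample facts are invented for illustration.

```python
from typing import Optional

# Category priorities copied from the removed bootstrapper.py
CATEGORY_PRIORITY = {"pitfall": 0, "tool-quirk": 1, "pattern": 2, "fact": 3, "question": 4}


def select(facts: list, repo: Optional[str] = None, agent: Optional[str] = None) -> list:
    """Keep repo matches, agent matches, and agent-less global facts, then sort."""
    kept = [
        f for f in facts
        if (repo and f.get("repo", "global") == repo)
        or (agent and f.get("agent", "") == agent)
        or (f.get("repo", "global") == "global" and not f.get("agent"))
    ]
    return sorted(
        kept,
        key=lambda f: (
            -f.get("confidence", 0.5),
            CATEGORY_PRIORITY.get(f.get("category", "fact"), 5),
            f.get("fact", ""),
        ),
    )


# Hypothetical facts for illustration only
facts = [
    {"fact": "Use the staging token", "repo": "the-nexus", "category": "fact", "confidence": 0.9},
    {"fact": "405 means branch protection", "repo": "global", "category": "pitfall", "confidence": 0.9},
    {"fact": "Other repo detail", "repo": "timmy-home", "category": "fact", "confidence": 1.0},
    {"fact": "Agent-only quirk", "repo": "global", "agent": "groq-fast",
     "category": "tool-quirk", "confidence": 0.8},
]
selected = select(facts, repo="the-nexus", agent="mimo-sprint")
for f in selected:
    print(f["fact"])
```

With equal confidence, the pitfall sorts ahead of the plain fact, and the fact tied to a different agent is dropped even though its repo is global.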
--- a/scripts/dead_code_detector.py
+++ b/scripts/dead_code_detector.py
@@ -1,282 +0,0 @@
-#!/usr/bin/env python3
-"""
-Dead Code Detector for Python Codebases
-
-AST-based analysis to find defined but never-called functions and classes.
-Excludes entry points, plugin hooks, __init__ exports.
-
-Usage:
-  python3 scripts/dead_code_detector.py /path/to/repo/
-  python3 scripts/dead_code_detector.py hermes-agent/ --format json
-  python3 scripts/dead_code_detector.py . --exclude tests/,venv/
-
-Output: file:line, function/class name, last git author (if available)
-"""
-
-import argparse
-import ast
-import json
-import os
-import subprocess
-import sys
-from collections import defaultdict
-from pathlib import Path
-from typing import Optional
-
-
-# Names that are expected to be unused (entry points, protocol methods, etc.)
-SAFE_UNUSED_PATTERNS = {
-    # Python dunders
-    "__init__", "__str__", "__repr__", "__eq__", "__hash__", "__len__",
-    "__getitem__", "__setitem__", "__contains__", "__iter__", "__next__",
-    "__enter__", "__exit__", "__call__", "__bool__", "__del__",
-    "__post_init__", "__class_getitem__",
-    # Common entry points
-    "main", "app", "handler", "setup", "teardown", "fixture",
-    # pytest
-    "conftest", "test_", "pytest_",  # prefix patterns
-    # Protocols / abstract
-    "abstractmethod", "abc_",
-}
-
-
-def is_safe_unused(name: str, filepath: str) -> bool:
-    """Check if an unused name is expected to be unused."""
-    # Test files are exempt
-    if "test" in filepath.lower():
-        return True
-
-    # Known patterns
-    for pattern in SAFE_UNUSED_PATTERNS:
-        if name.startswith(pattern) or name == pattern:
-            return True
-
-    # __init__.py exports are often unused internally
-    if filepath.endswith("__init__.py"):
-        return True
-
-    return False
-
-
-def get_git_blame(filepath: str, lineno: int) -> Optional[str]:
-    """Get last author of a line via git blame."""
-    try:
-        result = subprocess.run(
-            ["git", "blame", "-L", f"{lineno},{lineno}", "--porcelain", filepath],
-            capture_output=True, text=True, timeout=5
-        )
-        for line in result.stdout.split("\n"):
-            if line.startswith("author "):
-                return line[7:]
-    except (subprocess.SubprocessError, OSError):
-        pass
-    return None
-
-
-class DefinitionCollector(ast.NodeVisitor):
-    """Collect all function and class definitions."""
-
-    def __init__(self):
-        self.definitions = []  # (name, type, lineno)
-
-    def visit_FunctionDef(self, node):
-        self.definitions.append((node.name, "function", node.lineno))
-        self.generic_visit(node)
-
-    def visit_AsyncFunctionDef(self, node):
-        self.definitions.append((node.name, "async_function", node.lineno))
-        self.generic_visit(node)
-
-    def visit_ClassDef(self, node):
-        self.definitions.append((node.name, "class", node.lineno))
-        self.generic_visit(node)
-
-
-class NameUsageCollector(ast.NodeVisitor):
-    """Collect all name references (calls, imports, attribute access)."""
-
-    def __init__(self):
-        self.names = set()
-        self.calls = set()
-        self.imports = set()
-
-    def visit_Name(self, node):
-        self.names.add(node.id)
-        self.generic_visit(node)
-
-    def visit_Attribute(self, node):
-        if isinstance(node.value, ast.Name):
-            self.names.add(node.value.id)
-        self.generic_visit(node)
-
-    def visit_Call(self, node):
-        if isinstance(node.func, ast.Name):
-            self.calls.add(node.func.id)
-        elif isinstance(node.func, ast.Attribute):
-            if isinstance(node.func.value, ast.Name):
-                self.names.add(node.func.value.id)
-            self.calls.add(node.func.attr)
-        self.generic_visit(node)
-
-    def visit_Import(self, node):
-        for alias in node.names:
-            self.imports.add(alias.asname or alias.name)
-        self.generic_visit(node)
-
-    def visit_ImportFrom(self, node):
-        for alias in node.names:
-            self.imports.add(alias.asname or alias.name)
-        self.generic_visit(node)
-
-
-def analyze_file(filepath: str) -> dict:
-    """Analyze a single Python file for dead code."""
-    path = Path(filepath)
-    try:
-        content = path.read_text()
-        tree = ast.parse(content, filename=str(filepath))
-    except (SyntaxError, UnicodeDecodeError):
-        return {"error": f"Could not parse {filepath}"}
-
-    # Collect definitions
-    def_collector = DefinitionCollector()
-    def_collector.visit(tree)
-    definitions = def_collector.definitions
-
-    # Collect usage
-    usage_collector = NameUsageCollector()
-    usage_collector.visit(tree)
-    used_names = usage_collector.names | usage_collector.calls | usage_collector.imports
-
-    # Also scan the entire repo for references to this file's definitions
-    # (this is done at the repo level, not file level)
-
-    dead = []
-    for name, def_type, lineno in definitions:
-        if name.startswith("_") and not name.startswith("__"):
-            # Private functions — might be used externally, less likely dead
-            pass
-
-        if name not in used_names:
-            if not is_safe_unused(name, filepath):
-                dead.append({
-                    "name": name,
-                    "type": def_type,
-                    "file": filepath,
-                    "line": lineno,
-                })
-
-    return {"definitions": len(definitions), "dead": dead}
-
-
-def scan_repo(repo_path: str, exclude_patterns: list = None) -> dict:
-    """Scan an entire repo for dead code."""
-    path = Path(repo_path)
-    exclude = exclude_patterns or ["venv", ".venv", "node_modules", "__pycache__",
-                                    ".git", "dist", "build", ".tox", "vendor"]
-
-    all_definitions = {}  # name -> [{file, line, type}]
-    all_files = []
-    dead_code = []
-
-    # First pass: collect all definitions across repo
-    for fpath in path.rglob("*.py"):
-        parts = fpath.parts
-        if any(ex in parts for ex in exclude):
-            continue
-        if fpath.name.startswith("."):
-            continue
-
-        try:
-            content = fpath.read_text(errors="ignore")
-            tree = ast.parse(content, filename=str(fpath))
-        except Exception:
-            continue
-
-        all_files.append(str(fpath))
-        collector = DefinitionCollector()
-        collector.visit(tree)
-
-        for name, def_type, lineno in collector.definitions:
-            rel_path = str(fpath.relative_to(path))
-            if name not in all_definitions:
-                all_definitions[name] = []
-            all_definitions[name].append({
-                "file": rel_path,
-                "line": lineno,
-                "type": def_type,
-            })
-
-    # Second pass: check each name for usage across entire repo
-    all_used_names = set()
-    for fpath_str in all_files:
-        try:
-            content = Path(fpath_str).read_text(errors="ignore")
-            tree = ast.parse(content)
-        except Exception:
-            continue
-        usage = NameUsageCollector()
-        usage.visit(tree)
-        all_used_names.update(usage.names)
-        all_used_names.update(usage.calls)
-        all_used_names.update(usage.imports)
-
-    # Find dead code
-    for name, locations in all_definitions.items():
-        if name not in all_used_names:
-            for loc in locations:
-                if not is_safe_unused(name, loc["file"]):
-                    dead_code.append({
-                        "name": name,
-                        "type": loc["type"],
-                        "file": loc["file"],
-                        "line": loc["line"],
-                    })
-
-    return {
-        "repo": path.name,
-        "files_scanned": len(all_files),
-        "total_definitions": sum(len(v) for v in all_definitions.values()),
-        "dead_code_count": len(dead_code),
-        "dead_code": sorted(dead_code, key=lambda x: (x["file"], x["line"])),
-    }
-
-
-def main():
-    parser = argparse.ArgumentParser(description="Find dead code in Python codebases")
-    parser.add_argument("repo", help="Repository path to scan")
-    parser.add_argument("--format", choices=["text", "json"], default="text")
-    parser.add_argument("--exclude", help="Comma-separated patterns to exclude")
-    parser.add_argument("--git-blame", action="store_true", help="Include git blame info")
-    args = parser.parse_args()
-
-    exclude = [p.strip().strip("/") for p in args.exclude.split(",") if p.strip()] if args.exclude else None
-    result = scan_repo(args.repo, exclude)
-
-    if args.format == "json":
-        print(json.dumps(result, indent=2))
-    else:
-        print(f"Dead Code Report: {result['repo']}")
-        print(f"Files scanned: {result['files_scanned']}")
-        print(f"Total definitions: {result['total_definitions']}")
-        print(f"Dead code found: {result['dead_code_count']}")
-        print()
-
-        if result["dead_code"]:
-            print(f"{'File':<45} {'Line':>4} {'Type':<10} {'Name'}")
-            print("-" * 85)
-            for item in result["dead_code"]:
-                author = ""
-                if args.git_blame:
-                    author = get_git_blame(
-                        os.path.join(args.repo, item["file"]),
-                        item["line"]
-                    ) or ""
-                    author = f" ({author})" if author else ""
-                print(f"{item['file']:<45} {item['line']:>4} {item['type']:<10} {item['name']}{author}")
-        else:
-            print("No dead code detected!")
-
-
-if __name__ == "__main__":
-    main()
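Note on the removed detector: it collects every function and class definition with `ast`, collects every referenced name, and reports definitions that are never referenced. The sketch below shows that idea on a single in-memory module. It is a simplification: it treats any attribute name as a reference and has no allow-list for entry points, and `SOURCE` is an invented example.

```python
import ast

# Invented sample module: one function is called, one is not
SOURCE = '''
def used():
    return 1

def unused():
    return 2

print(used())
'''

tree = ast.parse(SOURCE)

# Every function or class definition, mapped to its line number
defined = {
    node.name: node.lineno
    for node in ast.walk(tree)
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
}

# Every bare name and attribute name that appears anywhere in the module
referenced = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
referenced |= {node.attr for node in ast.walk(tree) if isinstance(node, ast.Attribute)}

dead = sorted(name for name in defined if name not in referenced)
print(dead)
```

Because matching is by bare name only, two unrelated definitions that share a name shield each other from detection, which is a limitation the removed script shares.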
--- a/scripts/dedup.py
+++ b/scripts/dedup.py
@@ -1,317 +0,0 @@
-#!/usr/bin/env python3
-"""
-dedup.py — Knowledge deduplication: content hash + semantic similarity.
-
-Deduplicates harvested knowledge entries to avoid training on duplicates.
-Uses content hashing for exact matches and token overlap for near-duplicates.
-
-Usage:
-    python3 dedup.py --input knowledge/index.json --output knowledge/index_deduped.json
-    python3 dedup.py --input knowledge/index.json --dry-run
-    python3 dedup.py --test  # Run built-in dedup test
-"""
-
-import argparse
-import hashlib
-import json
-import re
-import sys
-from pathlib import Path
-from typing import List, Dict, Optional, Tuple
-
-
-def normalize_text(text: str) -> str:
-    """Normalize text for hashing: lowercase, collapse whitespace, strip."""
-    text = text.lower().strip()
-    text = re.sub(r'\s+', ' ', text)
-    return text
-
-
-def content_hash(text: str) -> str:
-    """SHA256 hash of normalized text for exact dedup."""
-    normalized = normalize_text(text)
-    return hashlib.sha256(normalized.encode('utf-8')).hexdigest()
-
-
-def tokenize(text: str) -> set:
-    """Simple tokenizer: lowercase words, 3+ chars."""
-    words = re.findall(r'[a-z0-9_]{3,}', text.lower())
-    return set(words)
-
-
-def token_similarity(a: str, b: str) -> float:
-    """Token-based Jaccard similarity (0.0-1.0).
-
-    Fast local alternative to embedding similarity.
-    Good enough for near-duplicate detection.
-    """
-    tokens_a = tokenize(a)
-    tokens_b = tokenize(b)
-    if not tokens_a or not tokens_b:
-        return 0.0
-    intersection = tokens_a & tokens_b
-    union = tokens_a | tokens_b
-    return len(intersection) / len(union)
-
-
-def quality_score(fact: dict) -> float:
-    """Compute quality score for merge ranking.
-
-    Higher is better. Factors:
-    - confidence (0-1)
-    - source_count (more confirmations = better)
-    - has tags (richer metadata)
-    """
-    confidence = fact.get('confidence', 0.5)
-    source_count = fact.get('source_count', 1)
-    has_tags = 1.0 if fact.get('tags') else 0.0
-    has_related = 1.0 if fact.get('related') else 0.0
-
-    # Weighted composite
-    score = (
-        confidence * 0.5 +
-        min(source_count / 10, 1.0) * 0.3 +
-        has_tags * 0.1 +
-        has_related * 0.1
-    )
-    return round(score, 4)
-
-
-def merge_facts(keep: dict, drop: dict) -> dict:
-    """Merge two near-duplicate facts, keeping higher-quality fields.
-
-    The 'keep' fact is enriched with metadata from 'drop'.
-    """
-    # Merge tags (union)
-    keep_tags = set(keep.get('tags', []))
-    drop_tags = set(drop.get('tags', []))
-    keep['tags'] = sorted(keep_tags | drop_tags)
-
-    # Merge related (union)
-    keep_related = set(keep.get('related', []))
-    drop_related = set(drop.get('related', []))
-    keep['related'] = sorted(keep_related | drop_related)
-
-    # Update source_count (sum)
-    keep['source_count'] = keep.get('source_count', 1) + drop.get('source_count', 1)
-
-    # Update confidence (max — we've now seen it from multiple sources)
-    keep['confidence'] = max(keep.get('confidence', 0), drop.get('confidence', 0))
-
-    # Track that we merged
-    if '_merged_from' not in keep:
-        keep['_merged_from'] = []
-    keep['_merged_from'].append(drop.get('id', 'unknown'))
-
-    return keep
-
-
-def dedup_facts(
-    facts: List[dict],
-    exact_threshold: float = 1.0,
-    near_threshold: float = 0.95,
-    dry_run: bool = False,
-) -> Tuple[List[dict], dict]:
-    """Deduplicate a list of knowledge facts.
-
-    Args:
-        facts: List of fact dicts (from index.json)
-        exact_threshold: Hash match = exact duplicate
-        near_threshold: Token similarity above this = near-duplicate
-        dry_run: If True, don't modify, just report
-
-    Returns:
-        (deduped_facts, stats_dict)
-    """
-    if not facts:
-        return [], {"total": 0, "exact_dupes": 0, "near_dupes": 0, "unique": 0}
-
-    # Phase 1: Exact dedup by content hash
-    hash_seen = {}  # hash -> index in deduped list
-    exact_dupes = 0
-    deduped = []
-
-    for fact in facts:
-        text = fact.get('fact', '')
-        h = content_hash(text)
-
-        if h in hash_seen:
-            # Exact duplicate — merge metadata into existing
-            existing_idx = hash_seen[h]
-            if not dry_run:
-                deduped[existing_idx] = merge_facts(deduped[existing_idx], fact)
-            exact_dupes += 1
-        else:
-            hash_seen[h] = len(deduped)
-            deduped.append(fact)
-
-    # Phase 2: Near-dup by token similarity
-    near_dupes = 0
-    i = 0
-    while i < len(deduped):
-        j = i + 1
-        while j < len(deduped):
-            sim = token_similarity(deduped[i].get('fact', ''), deduped[j].get('fact', ''))
-            if sim >= near_threshold:
-                # Near-duplicate — keep higher quality
-                q_i = quality_score(deduped[i])
-                q_j = quality_score(deduped[j])
-                if q_i >= q_j:
-                    if not dry_run:
-                        deduped[i] = merge_facts(deduped[i], deduped[j])
-                    deduped.pop(j)
-                else:
-                    # j is higher quality: merge i into j, then move the winner into slot i
-                    if not dry_run:
-                        deduped[j] = merge_facts(deduped[j], deduped[i])
-                    deduped[i] = deduped.pop(j)
-                    j = i + 1  # slot i changed, so rescan the remaining facts against it
-                near_dupes += 1
-            else:
-                j += 1
-        i += 1
-
-    stats = {
-        "total": len(facts),
-        "exact_dupes": exact_dupes,
-        "near_dupes": near_dupes,
-        "unique": len(deduped),
-        "removed": len(facts) - len(deduped),
-    }
-
-    return deduped, stats
-
-
-def dedup_index_file(
-    input_path: str,
-    output_path: Optional[str] = None,
-    near_threshold: float = 0.95,
-    dry_run: bool = False,
-) -> dict:
-    """Deduplicate an index.json file.
-
-    Args:
-        input_path: Path to index.json
-        output_path: Where to write deduped file (default: overwrite input)
-        near_threshold: Token similarity threshold for near-dupes
-        dry_run: Report only, don't write
-
-    Returns stats dict.
-    """
-    path = Path(input_path)
-    if not path.exists():
-        raise FileNotFoundError(f"Index file not found: {input_path}")
-
-    with open(path) as f:
-        data = json.load(f)
-
-    facts = data.get('facts', [])
-    deduped, stats = dedup_facts(facts, near_threshold=near_threshold, dry_run=dry_run)
-
-    if not dry_run:
-        data['facts'] = deduped
-        data['total_facts'] = len(deduped)
-        from datetime import datetime, timezone  # local import keeps this change self-contained
-        # Record when this dedup pass ran (UTC, ISO 8601)
-        data['last_dedup'] = datetime.now(timezone.utc).isoformat()
-
-        out_path = Path(output_path) if output_path else path
-        with open(out_path, 'w') as f:
-            json.dump(data, f, indent=2, ensure_ascii=False)
-
-    return stats
-
-
-def generate_test_duplicates(n: int = 20) -> List[dict]:
-    """Generate test facts with intentional duplicates for testing.
-
-    Creates n base facts (random topic/tip pairs, so some may repeat) plus n/4 exact dupes and n/4 rephrased near-dupes.
-    """
-    import random
-    random.seed(42)
-
-    unique_facts = []
-    for i in range(n):
-        topic = random.choice(["git", "python", "docker", "rust", "nginx"])
-        tip = random.choice(["use verbose flags", "check logs first", "restart service", "clear cache", "update config"])
-        unique_facts.append({
-            "id": f"test:fact:{i:03d}",
-            "fact": f"When working with {topic}, always {tip} before deploying.",
-            "category": "fact",
-            "domain": "test",
-            "confidence": round(random.uniform(0.5, 1.0), 2),
-            "source_count": random.randint(1, 5),
-            "tags": [topic, "test"],
-        })
-
-    # Add exact duplicates (same text, different IDs)
-    duped = list(unique_facts)
-    for i in range(n // 4):
-        original = unique_facts[i]
-        dupe = dict(original)
-        dupe["id"] = f"test:fact:dup{i:03d}"
-        dupe["confidence"] = round(random.uniform(0.3, 0.8), 2)
-        duped.append(dupe)
-
-    # Add near-duplicates (slightly different phrasing)
-    for i in range(n // 4):
-        original = unique_facts[i]
-        near = dict(original)
-        near["id"] = f"test:fact:near{i:03d}"
-        near["fact"] = original["fact"].replace("always", "should").replace("before deploying", "prior to deployment")
-        near["confidence"] = round(random.uniform(0.4, 0.9), 2)
-        duped.append(near)
-
-    return duped
-
-
-def main():
-    parser = argparse.ArgumentParser(description="Knowledge deduplication")
-    parser.add_argument("--input", help="Path to index.json")
-    parser.add_argument("--output", help="Output path (default: overwrite input)")
-    parser.add_argument("--threshold", type=float, default=0.95,
-                        help="Near-dup similarity threshold (default: 0.95)")
-    parser.add_argument("--dry-run", action="store_true", help="Report only, don't write")
-    parser.add_argument("--test", action="store_true", help="Run built-in dedup test")
-    parser.add_argument("--json", action="store_true", help="JSON output")
-    args = parser.parse_args()
-
-    if args.test:
-        test_facts = generate_test_duplicates(20)
-        print(f"Generated {len(test_facts)} test facts (20 unique + dupes)")
-        deduped, stats = dedup_facts(test_facts, near_threshold=args.threshold)
-        print("\nDedup results:")
-        print(f"  Total input:    {stats['total']}")
-        print(f"  Exact dupes:    {stats['exact_dupes']}")
-        print(f"  Near dupes:     {stats['near_dupes']}")
-        print(f"  Unique output:  {stats['unique']}")
-        print(f"  Removed:        {stats['removed']}")
-
-        # Verify: should have ~20 unique (some merged)
-        assert stats['unique'] <= 20, f"Too many unique: {stats['unique']} > 20"
-        assert stats['unique'] >= 15, f"Too few unique: {stats['unique']} < 15"
-        assert stats['removed'] > 0, "No duplicates removed"
-        print("\nOK: Dedup test passed")
-        return
-
-    if not args.input:
-        print("ERROR: Provide --input or --test")
-        sys.exit(1)
-
-    stats = dedup_index_file(args.input, args.output, args.threshold, args.dry_run)
-
-    if args.json:
-        print(json.dumps(stats, indent=2))
-    else:
-        print("Dedup results:")
-        print(f"  Total input:    {stats['total']}")
-        print(f"  Exact dupes:    {stats['exact_dupes']}")
-        print(f"  Near dupes:     {stats['near_dupes']}")
-        print(f"  Unique output:  {stats['unique']}")
-        print(f"  Removed:        {stats['removed']}")
-        if args.dry_run:
-            print("  (dry run — no changes written)")
-
-
-if __name__ == "__main__":
-    main()
--- a/scripts/dependency_graph.py
+++ b/scripts/dependency_graph.py
@@ -1,249 +0,0 @@
-#!/usr/bin/env python3
-"""
-Cross-Repo Dependency Graph Builder
-
-Scans repos for import/require/reference patterns and builds a directed
-dependency graph. Detects circular dependencies. Outputs DOT and Mermaid.
-
-Usage:
-  python3 scripts/dependency_graph.py /path/to/repos/
-  python3 scripts/dependency_graph.py --repos repo1,repo2,repo3 --format mermaid
-  python3 scripts/dependency_graph.py --repos-dir /path/to/ --format dot --output deps.dot
-
-Patterns detected:
-  - Python: import X, from X import Y
-  - JavaScript: require("X"), import ... from "X"
-  - Go: import "X"
-  - Ansible: include_role, import_role
-  - Docker/Compose: image: X, depends_on
-  - Config references: repo-name in YAML/TOML/JSON
-"""
-
-import argparse
-import json
-import os
-import re
-import sys
-from collections import defaultdict
-from pathlib import Path
-
-
-# Known repo names for matching
-KNOWN_REPOS = [
-    "hermes-agent", "timmy-config", "timmy-home", "the-nexus", "the-door",
-    "the-beacon", "fleet-ops", "burn-fleet", "timmy-dispatch", "turboquant",
-    "compounding-intelligence", "the-playground", "second-son-of-timmy",
-    "ai-safety-review", "the-echo-pattern", "timmy-academy", "wolf",
-    "the-testament",
-]
-
-
-def normalize_repo_name(name: str) -> str:
-    """Normalize a repo name for comparison."""
-    return name.strip().lower().replace("_", "-").removesuffix(".git")  # drop only a trailing ".git" (Python 3.9+)
-
-
-def scan_file_for_deps(filepath: str, content: str, own_repo: str) -> set:
-    """Scan a file's content for references to other repos."""
-    deps = set()
-    own_norm = normalize_repo_name(own_repo)
-
-    for repo in KNOWN_REPOS:
-        repo_norm = normalize_repo_name(repo)
-        if repo_norm == own_norm:
-            continue
-
-        # Direct name references
-        patterns = [
-            repo,  # exact name
-            repo.replace("-", "_"),  # underscore variant
-            repo.replace("-", ""),  # no separator
-            f"/{repo}/",  # path reference
-            f'"{repo}"',  # quoted
-            f"'{repo}'",  # single quoted
-            f"Timmy_Foundation/{repo}",  # full Gitea path
-            f"Timmy_Foundation.{repo}",  # Python module path
-        ]
-
-        for pattern in patterns:
-            if pattern in content:
-                deps.add(repo)
-                break
-
-    return deps
-
-
-def scan_repo(repo_path: str, repo_name: str = None) -> dict:
-    """Scan a repo directory for dependencies."""
-    path = Path(repo_path)
-    if not path.is_dir():
-        return {"error": f"Not a directory: {repo_path}"}
-
-    if not repo_name:
-        repo_name = path.name
-
-    deps = set()
-    files_scanned = 0
-    exts = {".py", ".js", ".ts", ".go", ".yaml", ".yml", ".toml", ".json",
-            ".md", ".sh", ".bash", ".Dockerfile", ".tf", ".hcl"}
-
-    for fpath in path.rglob("*"):
-        if not fpath.is_file():
-            continue
-        if fpath.suffix not in exts:
-            continue
-        # Skip common non-source dirs
-        parts = fpath.parts
-        if any(p in (".git", "node_modules", "__pycache__", ".venv", "venv",
-                      "vendor", "dist", "build", ".tox") for p in parts):
-            continue
-
-        try:
-            content = fpath.read_text(errors="ignore")
-        except OSError:  # unreadable file; a bare except would also swallow KeyboardInterrupt
-            continue
-
-        file_deps = scan_file_for_deps(str(fpath), content, repo_name)
-        deps.update(file_deps)
-        files_scanned += 1
-
-    return {
-        "repo": repo_name,
-        "dependencies": sorted(deps),
-        "files_scanned": files_scanned,
-    }
-
-
-def detect_cycles(graph: dict) -> list:
-    """Detect circular dependencies using DFS."""
-    cycles = []
-    visited = set()
-    rec_stack = set()
-
-    def dfs(node, path):
-        visited.add(node)
-        rec_stack.add(node)
-
-        for neighbor in graph.get(node, {}).get("dependencies", []):
-            if neighbor not in visited:
-                result = dfs(neighbor, path + [neighbor])
-                if result:
-                    return result
-            elif neighbor in path:  # back edge to the current DFS path; rec_stack goes stale after an early return
-                cycle_start = path.index(neighbor)
-                return path[cycle_start:] + [neighbor]
-
-        rec_stack.remove(node)
-        return None
-
-    for node in graph:
-        if node not in visited:
-            cycle = dfs(node, [node])
-            if cycle:
-                cycles.append(cycle)
-
-    return cycles
-
-
-def to_dot(graph: dict) -> str:
-    """Generate DOT format output."""
-    lines = ["digraph dependencies {"]
-    lines.append("  rankdir=LR;")
-    lines.append('  node [shape=box, style=filled, fillcolor="#1a1a2e", fontcolor="#e6edf3"];')
-    lines.append('  edge [color="#4a4a6a"];')
-    lines.append("")
-
-    for repo, data in sorted(graph.items()):
-        dep_count = len(data.get("dependencies", []))
-        fill = "#2d1b69" if dep_count > 2 else "#16213e"
-        lines.append(f'  "{repo}" [fillcolor="{fill}"];')
-        for dep in data.get("dependencies", []):
-            lines.append(f'  "{repo}" -> "{dep}";')
-
-    lines.append("}")
-    return "\n".join(lines)
-
-
-def to_mermaid(graph: dict) -> str:
-    """Generate Mermaid format output."""
-    lines = ["graph LR"]
-
-    for repo, data in sorted(graph.items()):
-        for dep in data.get("dependencies", []):
-            lines.append(f"    {repo.replace('-','_')} --> {dep.replace('-','_')}")
-
-    # Add node labels
-    lines.append("")
-    for repo in sorted(graph.keys()):
-        lines.append(f"    {repo.replace('-','_')}[{repo}]")
-
-    return "\n".join(lines)
-
-
-def main():
-    parser = argparse.ArgumentParser(description="Build cross-repo dependency graph")
-    parser.add_argument("repos_dir", nargs="?", help="Directory containing repos")
-    parser.add_argument("--repos", help="Comma-separated list of repo paths")
-    parser.add_argument("--format", choices=["dot", "mermaid", "json"], default="json")
-    parser.add_argument("--output", "-o", help="Output file (default: stdout)")
-    parser.add_argument("--cycles-only", action="store_true", help="Only report cycles")
-    args = parser.parse_args()
-
-    results = {}
-    repo_paths = []
-
-    if args.repos:
-        repo_paths = [p.strip() for p in args.repos.split(",")]
-    elif args.repos_dir:
-        base = Path(args.repos_dir)
-        repo_paths = [str(p) for p in base.iterdir() if p.is_dir() and not p.name.startswith(".")]
-    else:
-        parser.print_help()
-        sys.exit(1)
-
-    for rpath in repo_paths:
-        name = Path(rpath).name
-        print(f"Scanning {name}...", file=sys.stderr)
-        result = scan_repo(rpath, name)
-        if "error" not in result:
-            results[name] = result
-
-    # Detect cycles
-    cycles = detect_cycles(results)
-
-    if args.cycles_only:
-        if cycles:
-            print("CIRCULAR DEPENDENCIES DETECTED:")
-            for cycle in cycles:
-                print(f"  {' -> '.join(cycle)}")
-            sys.exit(1)
-        else:
-            print("No circular dependencies found.")
-            sys.exit(0)
-
-    # Output
-    output = {}
-    if args.format == "dot":
-        output = to_dot(results)
-    elif args.format == "mermaid":
-        output = to_mermaid(results)
-    else:
-        output = json.dumps({
-            "repos": results,
-            "cycles": cycles,
-            "summary": {
-                "total_repos": len(results),
-                "total_deps": sum(len(r["dependencies"]) for r in results.values()),
-                "cycles_found": len(cycles),
-            }
-        }, indent=2)
-
-    if args.output:
-        Path(args.output).write_text(output)
-        print(f"Written to {args.output}", file=sys.stderr)
-    else:
-        print(output)
-
-
-if __name__ == "__main__":
-    main()
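The cycle detection in `dependency_graph.py` can be illustrated with a small three-colour depth-first search. This is a minimal sketch, not the script's implementation: `find_cycles` and the `sample` graph are hypothetical names, and the graph is assumed to have the same `{repo: {"dependencies": [...]}}` shape that `scan_repo` returns. Unlike the original, it records one cycle per back edge instead of stopping at the first one.

```python
def find_cycles(graph: dict) -> list:
    """Return one cycle per back edge found by depth-first search."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    cycles = []

    def dfs(node, path):
        color[node] = GRAY
        for dep in graph.get(node, {}).get("dependencies", []):
            state = color.get(dep, WHITE)
            if state == WHITE:
                dfs(dep, path + [dep])
            elif state == GRAY:
                # dep is on the current path, so the tail of the path closes a loop
                cycles.append(path[path.index(dep):] + [dep])
        color[node] = BLACK

    for node in sorted(graph):
        if color.get(node, WHITE) == WHITE:
            dfs(node, [node])
    return cycles


# Hypothetical three-repo graph: a and b depend on each other, c depends on a.
sample = {
    "a": {"dependencies": ["b"]},
    "b": {"dependencies": ["a"]},
    "c": {"dependencies": ["a"]},
}
print(find_cycles(sample))  # [['a', 'b', 'a']]
```

Marking a node BLACK once all of its dependencies are explored means a later traversal (here, the one starting at `c`) cannot mistake a finished node for one that is still on the current path.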
--- a/scripts/diff_analyzer.py
+++ b/scripts/diff_analyzer.py
@@ -1,216 +0,0 @@
-#!/usr/bin/env python3
-"""
-Diff Analyzer — Parse unified diffs and categorize every change.
-
-Pipeline 6.1 for Compounding Intelligence.
-"""
-
-import re
-from dataclasses import dataclass, field, asdict
-from enum import Enum
-from typing import List, Dict, Any, Optional
-
-
-class ChangeCategory(Enum):
-    ADDED = "added"
-    DELETED = "deleted"
-    MODIFIED = "modified"
-    MOVED = "moved"
-    CONTEXT = "context"
-
-
-@dataclass
-class Hunk:
-    """A single diff hunk with header, line ranges, and category."""
-    header: str
-    old_start: int
-    old_count: int
-    new_start: int
-    new_count: int
-    lines: List[str] = field(default_factory=list)
-    category: ChangeCategory = ChangeCategory.CONTEXT
-
-    def to_dict(self) -> Dict[str, Any]:
-        d = asdict(self)
-        d["category"] = self.category.value
-        return d
-
-
-@dataclass
-class FileChange:
-    """A single file's changes."""
-    path: str
-    old_path: Optional[str] = None  # For renames
-    hunks: List[Hunk] = field(default_factory=list)
-    added_lines: int = 0
-    deleted_lines: int = 0
-    is_new: bool = False
-    is_deleted: bool = False
-    is_renamed: bool = False
-    is_binary: bool = False
-
-    def to_dict(self) -> Dict[str, Any]:
-        return {
-            "path": self.path,
-            "old_path": self.old_path,
-            "hunks": [h.to_dict() for h in self.hunks],
-            "added_lines": self.added_lines,
-            "deleted_lines": self.deleted_lines,
-            "is_new": self.is_new,
-            "is_deleted": self.is_deleted,
-            "is_renamed": self.is_renamed,
-            "is_binary": self.is_binary,
-        }
-
-
-@dataclass
-class ChangeSummary:
-    """Aggregate stats + per-file breakdown."""
-    files: List[FileChange] = field(default_factory=list)
-    total_added: int = 0
-    total_deleted: int = 0
-    total_files_changed: int = 0
-    total_hunks: int = 0
-    new_files: int = 0
-    deleted_files: int = 0
-    renamed_files: int = 0
-    binary_files: int = 0
-
-    def to_dict(self) -> Dict[str, Any]:
-        return {
-            "total_files_changed": self.total_files_changed,
-            "total_added": self.total_added,
-            "total_deleted": self.total_deleted,
-            "total_hunks": self.total_hunks,
-            "new_files": self.new_files,
-            "deleted_files": self.deleted_files,
-            "renamed_files": self.renamed_files,
-            "binary_files": self.binary_files,
-            "files": [f.to_dict() for f in self.files],
-        }
-
-
-class DiffAnalyzer:
-    """Parses unified diff format and produces structured ChangeSummary."""
-
-    HUNK_HEADER_RE = re.compile(r"^@@\s+-(\d+)(?:,(\d+))?\s+\+(\d+)(?:,(\d+))?\s+@@(.*)$")
-    DIFF_FILE_RE = re.compile(r"^diff --git a/(.*) b/(.*)")
-    RENAME_RE = re.compile(r"^rename from (.+)$")
-    RENAME_TO_RE = re.compile(r"^rename to (.+)$")
-    NEW_FILE_RE = re.compile(r"^new file mode")
-    DELETED_FILE_RE = re.compile(r"^deleted file mode")
-    BINARY_RE = re.compile(r"^Binary files .* differ")
-
-    def analyze(self, diff_text: str) -> ChangeSummary:
-        """Parse a unified diff and return a ChangeSummary."""
-        summary = ChangeSummary()
-        if not diff_text or not diff_text.strip():
-            return summary
-
-        # Split diff into per-file sections
-        file_diffs = self._split_files(diff_text)
-
-        for file_diff in file_diffs:
-            fc = self._parse_file_diff(file_diff)
-            summary.files.append(fc)
-            summary.total_added += fc.added_lines
-            summary.total_deleted += fc.deleted_lines
-            summary.total_hunks += len(fc.hunks)
-            if fc.is_new:
-                summary.new_files += 1
-            if fc.is_deleted:
-                summary.deleted_files += 1
-            if fc.is_renamed:
-                summary.renamed_files += 1
-            if fc.is_binary:
-                summary.binary_files += 1
-
-        summary.total_files_changed = len(summary.files)
-        return summary
-
-    def _split_files(self, diff_text: str) -> List[str]:
-        """Split a multi-file diff into individual file diffs."""
-        lines = diff_text.split("\n")
-        chunks = []
-        current = []
-        for line in lines:
-            if line.startswith("diff --git ") and current:
-                chunks.append("\n".join(current))
-                current = [line]
-            else:
-                current.append(line)
-        if current:
-            chunks.append("\n".join(current))
-        return chunks
-
-    def _parse_file_diff(self, diff_text: str) -> FileChange:
-        """Parse a single file's diff section."""
-        lines = diff_text.split("\n")
-        fc = FileChange(path="")
-
-        # Extract file paths
-        for line in lines:
-            m = self.DIFF_FILE_RE.match(line)
-            if m:
-                fc.path = m.group(2)
-                break
-
-        # Check for special states
-        for line in lines:
-            if self.NEW_FILE_RE.match(line):
-                fc.is_new = True
-            elif self.DELETED_FILE_RE.match(line):
-                fc.is_deleted = True
-            elif (m := self.RENAME_RE.match(line)):
-                fc.old_path = m.group(1)
-                fc.is_renamed = True
-            elif self.BINARY_RE.match(line):
-                fc.is_binary = True
-                return fc  # No hunks for binary
-
-        # Rename TO
-        for line in lines:
-            m = self.RENAME_TO_RE.match(line)
-            if m and fc.is_renamed:
-                fc.path = m.group(1)
-
-        # Parse hunks
-        current_hunk = None
-        for line in lines:
-            m = self.HUNK_HEADER_RE.match(line)
-            if m:
-                if current_hunk:
-                    self._classify_hunk(current_hunk, fc)
-                    fc.hunks.append(current_hunk)
-                current_hunk = Hunk(
-                    header=m.group(5).strip(),
-                    old_start=int(m.group(1)),
-                    old_count=int(m.group(2) or 1),
-                    new_start=int(m.group(3)),
-                    new_count=int(m.group(4) or 1),
-                )
-            elif current_hunk and (line.startswith("+") or line.startswith("-") or line.startswith(" ")):
-                current_hunk.lines.append(line)
-
-        if current_hunk:
-            self._classify_hunk(current_hunk, fc)
-            fc.hunks.append(current_hunk)
-
-        return fc
-
-    def _classify_hunk(self, hunk: Hunk, fc: FileChange):
-        """Classify a hunk and count lines."""
-        added = sum(1 for l in hunk.lines if l.startswith("+"))
-        deleted = sum(1 for l in hunk.lines if l.startswith("-"))
-
-        fc.added_lines += added
-        fc.deleted_lines += deleted
-
-        if added > 0 and deleted == 0:
-            hunk.category = ChangeCategory.ADDED
-        elif deleted > 0 and added == 0:
-            hunk.category = ChangeCategory.DELETED
-        elif added > 0 and deleted > 0:
-            hunk.category = ChangeCategory.MODIFIED
-        else:
-            hunk.category = ChangeCategory.CONTEXT
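As a quick illustration of how `HUNK_HEADER_RE` treats hunk headers, the sketch below applies the same pattern outside the class. `parse_hunk_header` is a hypothetical helper written for this example; it mirrors the `or 1` defaulting used in `_parse_file_diff`, where an omitted count means a single line.

```python
import re

# Same pattern as DiffAnalyzer.HUNK_HEADER_RE in the file above.
HUNK_HEADER_RE = re.compile(r"^@@\s+-(\d+)(?:,(\d+))?\s+\+(\d+)(?:,(\d+))?\s+@@(.*)$")


def parse_hunk_header(line: str):
    """Return (old_start, old_count, new_start, new_count), or None if no match."""
    m = HUNK_HEADER_RE.match(line)
    if m is None:
        return None
    old_start, old_count, new_start, new_count, _context = m.groups()
    # A missing count means 1; an explicit "0" is kept because "0" is a truthy string.
    return (int(old_start), int(old_count or 1), int(new_start), int(new_count or 1))


print(parse_hunk_header("@@ -1,22 +0,0 @@"))         # (1, 22, 0, 0)
print(parse_hunk_header("@@ -5 +5 @@ def main():"))  # (5, 1, 5, 1)
print(parse_hunk_header("-not a hunk header"))       # None
```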
--- a/scripts/freshness.py
+++ b/scripts/freshness.py
@@ -1,387 +0,0 @@
-#!/usr/bin/env python3
-"""
-Knowledge Freshness Cron — Detect stale entries from code changes (Issue #200)
-
-Automatically detects when knowledge entries become stale due to code changes.
-
-Detection Method:
-1. Track source file hash alongside knowledge entry
-2. Compare current file hashes vs stored
-3. Mismatch → flag entry as potentially stale
-4. Report stale entries and optionally re-extract
-
-Usage:
-    python3 scripts/freshness.py --knowledge-dir knowledge/
-    python3 scripts/freshness.py --knowledge-dir knowledge/ --json
-    python3 scripts/freshness.py --knowledge-dir knowledge/ --repo /path/to/repo
-    python3 scripts/freshness.py --knowledge-dir knowledge/ --auto-reextract
-"""
-
-import argparse
-import hashlib
-import json
-import os
-import subprocess
-import sys
-import yaml
-from datetime import datetime, timezone
-from pathlib import Path
-from typing import Dict, List, Any, Optional, Tuple
-
-
-def compute_file_hash(filepath: str) -> Optional[str]:
-    """Compute SHA-256 hash of a file. Returns None if file doesn't exist."""
-    try:
-        with open(filepath, "rb") as f:
-            return "sha256:" + hashlib.sha256(f.read()).hexdigest()
-    except (FileNotFoundError, IsADirectoryError, PermissionError):
-        return None
-
-
-def get_git_file_changes(repo_path: str, days: int = 1) -> Dict[str, List[str]]:
-    """
-    Get files changed in git in the last N days.
-    
-    Returns dict with 'modified', 'added', 'deleted' lists of file paths.
-    """
-    changes = {"modified": [], "added": [], "deleted": []}
-    
-    try:
-        # Get commits from last N days
-        cmd = [
-            "git", "-C", repo_path, "log",
-            f"--since={days} days ago",
-            "--name-status",
-            "--pretty=format:",
-            "--diff-filter=MAD"
-        ]
-        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
-        
-        if result.returncode != 0:
-            return changes
-        
-        for line in result.stdout.splitlines():
-            line = line.strip()
-            if not line:
-                continue
-            
-            parts = line.split('\t', 1)
-            if len(parts) != 2:
-                continue
-            
-            status, filepath = parts
-            if status == 'M':
-                changes["modified"].append(filepath)
-            elif status == 'A':
-                changes["added"].append(filepath)
-            elif status == 'D':
-                changes["deleted"].append(filepath)
-    
-    except (subprocess.TimeoutExpired, FileNotFoundError):
-        pass
-    
-    # Deduplicate
-    for key in changes:
-        changes[key] = list(set(changes[key]))
-    
-    return changes
-
-
-def load_knowledge_entries(knowledge_dir: str) -> List[Dict[str, Any]]:
-    """
-    Load knowledge entries from index.json and the YAML files in the knowledge directory.
-    
-    Supports:
-    - knowledge/index.json (legacy format)
-    - knowledge/global/*.yaml
-    - knowledge/repos/*.yaml
-    - knowledge/agents/*.yaml
-    """
-    entries = []
-    
-    # Load from index.json if exists
-    index_path = os.path.join(knowledge_dir, "index.json")
-    if os.path.exists(index_path):
-        try:
-            with open(index_path) as f:
-                data = json.load(f)
-            for fact in data.get("facts", []):
-                entries.append({
-                    "source": "index.json",
-                    "fact": fact.get("fact", ""),
-                    "source_file": fact.get("source_file"),
-                    "source_hash": fact.get("source_hash"),
-                    "category": fact.get("category", "unknown"),
-                    "confidence": fact.get("confidence", 0.5)
-                })
-        except (json.JSONDecodeError, KeyError):
-            pass
-    
-    # Load from YAML files
-    for subdir in ["global", "repos", "agents"]:
-        subdir_path = os.path.join(knowledge_dir, subdir)
-        if not os.path.isdir(subdir_path):
-            continue
-        
-        for filename in os.listdir(subdir_path):
-            if not filename.endswith((".yaml", ".yml")):
-                continue
-            
-            filepath = os.path.join(subdir_path, filename)
-            try:
-                with open(filepath) as f:
-                    data = yaml.safe_load(f)
-                
-                if not data or not isinstance(data, dict):
-                    continue
-                
-                # Extract entries from YAML structure
-                for key, value in data.items():
-                    if isinstance(value, list):
-                        for item in value:
-                            if isinstance(item, dict):
-                                entries.append({
-                                    "source": f"{subdir}/{filename}",
-                                    "fact": item.get("description", item.get("fact", "")),
-                                    "source_file": item.get("source_file"),
-                                    "source_hash": item.get("source_hash"),
-                                    "category": item.get("category", "unknown"),
-                                    "confidence": item.get("confidence", 0.5)
-                                })
-                    elif isinstance(value, dict):
-                        entries.append({
-                            "source": f"{subdir}/{filename}",
-                            "fact": value.get("description", value.get("fact", "")),
-                            "source_file": value.get("source_file"),
-                            "source_hash": value.get("source_hash"),
-                            "category": value.get("category", "unknown"),
-                            "confidence": value.get("confidence", 0.5)
-                        })
-            except (yaml.YAMLError, IOError):
-                pass
-    
-    return entries
-
-
-def check_freshness(knowledge_dir: str, repo_root: str = ".", 
-                   days: int = 1) -> Dict[str, Any]:
-    """
-    Check freshness of knowledge entries against recent code changes.
-    
-    Returns:
-        {
-            "timestamp": ISO timestamp,
-            "total_entries": int,
-            "stale_entries": [...],
-            "fresh_entries": [...],
-            "git_changes": {...},
-            "summary": {...}
-        }
-    """
-    entries = load_knowledge_entries(knowledge_dir)
-    git_changes = get_git_file_changes(repo_root, days)
-    
-    stale_entries = []
-    fresh_entries = []
-    
-    for entry in entries:
-        source_file = entry.get("source_file")
-        if not source_file:
-            # Entry without source file reference
-            fresh_entries.append({**entry, "status": "no_source"})
-            continue
-        
-        # Check if source file was recently modified
-        is_stale = False
-        reason = ""
-        
-        if source_file in git_changes["modified"]:
-            is_stale = True
-            reason = "source_modified"
-        elif source_file in git_changes["deleted"]:
-            is_stale = True
-            reason = "source_deleted"
-        elif source_file in git_changes["added"]:
-            is_stale = True
-            reason = "source_added"
-        
-        # Also check hash if available
-        stored_hash = entry.get("source_hash")
-        if stored_hash:
-            full_path = os.path.join(repo_root, source_file)
-            current_hash = compute_file_hash(full_path)
-            
-            if current_hash is None:
-                is_stale = True
-                reason = "source_missing"
-            elif current_hash != stored_hash:
-                is_stale = True
-                reason = "hash_mismatch"
-        
-        if is_stale:
-            stale_entries.append({
-                **entry,
-                "status": "stale",
-                "reason": reason
-            })
-        else:
-            fresh_entries.append({**entry, "status": "fresh"})
-    
-    # Compute summary
-    total = len(entries)
-    stale_count = len(stale_entries)
-    fresh_count = len(fresh_entries)
-    
-    # Group stale entries by reason
-    stale_by_reason = {}
-    for entry in stale_entries:
-        reason = entry.get("reason", "unknown")
-        if reason not in stale_by_reason:
-            stale_by_reason[reason] = 0
-        stale_by_reason[reason] += 1
-    
-    return {
-        "timestamp": datetime.now(timezone.utc).isoformat(),
-        "total_entries": total,
-        "stale_entries": stale_entries,
-        "fresh_entries": fresh_entries,
-        "git_changes": git_changes,
-        "summary": {
-            "total": total,
-            "stale": stale_count,
-            "fresh": fresh_count,
-            "stale_percentage": round(stale_count / total * 100, 1) if total > 0 else 0,
-            "stale_by_reason": stale_by_reason,
-            "git_changes_summary": {
-                "modified": len(git_changes["modified"]),
-                "added": len(git_changes["added"]),
-                "deleted": len(git_changes["deleted"])
-            }
-        }
-    }
-
-
-def update_stale_hashes(knowledge_dir: str, repo_root: str = ".") -> int:
-    """
-    Count entries whose stored source hash is missing or differs from the current file hash. Does not write anything back yet.
-    """
-    entries = load_knowledge_entries(knowledge_dir)
-    updated = 0
-    
-    # This is a simplified version - in practice, you'd need to
-    # write back to the specific YAML files
-    for entry in entries:
-        source_file = entry.get("source_file")
-        if not source_file:
-            continue
-        
-        full_path = os.path.join(repo_root, source_file)
-        current_hash = compute_file_hash(full_path)
-        
-        if current_hash and entry.get("source_hash") != current_hash:
-            # Mark for update (in practice, you'd write back to the file)
-            updated += 1
-    
-    return updated
-
-
-def format_report(result: Dict[str, Any], max_items: int = 20) -> str:
-    """Format freshness check results as a human-readable report."""
-    timestamp = result["timestamp"]
-    summary = result["summary"]
-    stale_entries = result["stale_entries"]
-    git_changes = result["git_changes"]
-    
-    lines = [
-        "Knowledge Freshness Report",
-        "=" * 50,
-        f"Generated: {timestamp}",
-        f"Total entries: {summary['total']}",
-        f"Stale entries: {summary['stale']} ({summary['stale_percentage']}%)",
-        f"Fresh entries: {summary['fresh']}",
-        ""
-    ]
-    
-    # Git changes summary
-    lines.extend([
-        "Git Changes (within the --days window):",
-        f"  Modified: {len(git_changes['modified'])} files",
-        f"  Added: {len(git_changes['added'])} files",
-        f"  Deleted: {len(git_changes['deleted'])} files",
-        ""
-    ])
-    
-    # Stale entries by reason
-    if summary.get("stale_by_reason"):
-        lines.extend([
-            "Stale Entries by Reason:",
-            ""
-        ])
-        for reason, count in summary["stale_by_reason"].items():
-            lines.append(f"  {reason}: {count}")
-        lines.append("")
-    
-    # List stale entries
-    if stale_entries:
-        lines.extend([
-            "Stale Entries:",
-            ""
-        ])
-        for i, entry in enumerate(stale_entries[:max_items], 1):
-            source = entry.get("source_file", "?")
-            reason = entry.get("reason", "unknown")
-            fact = entry.get("fact", "")[:60]
-            lines.append(f"{i:2d}. [{reason}] {source}")
-            if fact:
-                lines.append(f"    {fact}")
-        
-        if len(stale_entries) > max_items:
-            lines.append(f"\n... and {len(stale_entries) - max_items} more")
-    else:
-        lines.append("No stale entries found. All knowledge is fresh!")
-    
-    return "\n".join(lines)
-
-
-def main():
-    parser = argparse.ArgumentParser(
-        description="Knowledge Freshness Cron — detect stale entries from code changes")
-    parser.add_argument("--knowledge-dir", required=True,
-                       help="Path to knowledge directory")
-    parser.add_argument("--repo", default=".",
-                       help="Path to repository for git change detection")
-    parser.add_argument("--days", type=int, default=1,
-                       help="Number of days to check for git changes (default: 1)")
-    parser.add_argument("--json", action="store_true",
-                       help="Output as JSON instead of human-readable")
-    parser.add_argument("--max", type=int, default=20,
-                       help="Maximum stale entries to show (default: 20)")
-    parser.add_argument("--auto-reextract", action="store_true",
-                       help="Auto-re-extract knowledge for stale entries")
-    
-    args = parser.parse_args()
-    
-    if not os.path.isdir(args.knowledge_dir):
-        print(f"Error: {args.knowledge_dir} is not a directory", file=sys.stderr)
-        sys.exit(1)
-    
-    if not os.path.isdir(args.repo):
-        print(f"Error: {args.repo} is not a directory", file=sys.stderr)
-        sys.exit(1)
-    
-    result = check_freshness(args.knowledge_dir, args.repo, args.days)
-    
-    if args.json:
-        print(json.dumps(result, indent=2))
-    else:
-        print(format_report(result, args.max))
-    
-    # Auto-re-extract if requested
-    if args.auto_reextract and result["stale_entries"]:
-        print(f"\nAuto-re-extracting {len(result['stale_entries'])} stale entries...")
-        # In a real implementation, this would call the harvester
-        print("(Auto-re-extraction not yet implemented)")
-
-
-if __name__ == "__main__":
-    main()
--- a/scripts/gitea_issue_parser.py
+++ b/scripts/gitea_issue_parser.py
@@ -1,131 +0,0 @@
-#!/usr/bin/env python3
-"""
-Gitea Issue Body Parser — Extract structured data from markdown issue bodies.
-
-Usage:
-    cat issue_body.txt | python3 scripts/gitea_issue_parser.py --stdin --pretty
-    python3 scripts/gitea_issue_parser.py --url https://forge.../api/v1/repos/.../issues/123 --pretty
-    python3 scripts/gitea_issue_parser.py body.txt --title "Fix thing (#42)" --labels pipeline extraction
-"""
-
-import argparse
-import json
-import re
-import sys
-from typing import Dict, List, Any, Optional
-
-
-def parse_issue_body(body: str, title: str = "", labels: Optional[List[str]] = None) -> Dict[str, Any]:
-    """Parse a Gitea issue markdown body into structured JSON.
-    
-    Extracted fields:
-    - title: Issue title
-    - context: Background/description section
-    - criteria[]: Acceptance criteria (checkboxes or numbered lists)
-    - labels[]: Issue labels
-    - epic_ref: Parent/epic issue reference (from "Closes #N" or title)
-    - sections{}: All ## sections as key-value pairs
-    """
-    result = {
-        "title": title,
-        "context": "",
-        "criteria": [],
-        "labels": labels or [],
-        "epic_ref": None,
-        "sections": {},
-    }
-    
-    if not body:
-        return result
-    
-    # Extract epic reference from title or body
-    epic_patterns = [
-        r"(?:closes|fixes|addresses|refs?)\s+#(\d+)",
-        r"#(\d+)",
-    ]
-    for pattern in epic_patterns:
-        match = re.search(pattern, (title + " " + body).lower())
-        if match:
-            result["epic_ref"] = int(match.group(1))
-            break
-    
-    # Parse ## sections
-    section_pattern = r"^##\s+(.+?)$\n((?:^(?!##\s).*$\n?)*)"
-    for match in re.finditer(section_pattern, body, re.MULTILINE):
-        section_name = match.group(1).strip().lower().replace(" ", "_")
-        section_content = match.group(2).strip()
-        result["sections"][section_name] = section_content
-    
-    # Extract acceptance criteria (checkboxes)
-    checkbox_pattern = r"^\s*-\s*\[([ xX])\]\s*(.+)$"
-    for match in re.finditer(checkbox_pattern, body, re.MULTILINE):
-        checked = match.group(1).lower() == "x"
-        text = match.group(2).strip()
-        result["criteria"].append({"text": text, "checked": checked})
-    
-    # If no checkboxes, try numbered lists in "Acceptance Criteria" or "Criteria" section
-    if not result["criteria"]:
-        for section_name in ["acceptance_criteria", "criteria"]:  # keys are normalized with underscores
-            if section_name in result["sections"]:
-                numbered = r"^\s*\d+\.\s*(.+)$"
-                for match in re.finditer(numbered, result["sections"][section_name], re.MULTILINE):
-                    result["criteria"].append({"text": match.group(1).strip(), "checked": False})
-                break
-    
-    # Extract context (first section or first paragraph before any ## heading)
-    first_heading = body.find("## ")
-    if first_heading > 0:
-        context_text = body[:first_heading].strip()
-    else:
-        context_text = body.split("\n\n")[0].strip()
-    # Clean up: remove "## Context" or "## Problem" header if present
-    context_text = re.sub(r"^#+\s*\w+\s*\n?", "", context_text).strip()
-    result["context"] = context_text[:500]  # Cap at 500 chars
-    
-    return result
-
-
-def fetch_issue_from_url(url: str) -> Dict[str, Any]:
-    """Fetch an issue from a Gitea API URL and parse it."""
-    import urllib.request
-    req = urllib.request.Request(url, headers={"Accept": "application/json"})
-    with urllib.request.urlopen(req, timeout=30) as resp:  # avoid hanging forever on a stalled server
-        data = json.loads(resp.read())
-    
-    return parse_issue_body(
-        body=data.get("body") or "",
-        title=data.get("title") or "",
-        labels=[label["name"] for label in data.get("labels") or []],
-    )
-
-
-def main():
-    parser = argparse.ArgumentParser(description="Parse Gitea issue markdown into structured JSON")
-    parser.add_argument("file", nargs="?", help="Issue body file (or use --stdin)")
-    parser.add_argument("--stdin", action="store_true", help="Read from stdin")
-    parser.add_argument("--url", help="Gitea API URL to fetch issue from")
-    parser.add_argument("--title", default="", help="Issue title")
-    parser.add_argument("--labels", nargs="*", default=[], help="Issue labels")
-    parser.add_argument("--pretty", action="store_true", help="Pretty-print JSON output")
-    
-    args = parser.parse_args()
-    
-    if args.url:
-        result = fetch_issue_from_url(args.url)
-    elif args.stdin:
-        body = sys.stdin.read()
-        result = parse_issue_body(body, args.title, args.labels)
-    elif args.file:
-        with open(args.file) as f:
-            body = f.read()
-        result = parse_issue_body(body, args.title, args.labels)
-    else:
-        parser.print_help()
-        sys.exit(1)
-    
-    indent = 2 if args.pretty else None
-    print(json.dumps(result, indent=indent))
-
-
-if __name__ == "__main__":
-    main()
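
Reviewer sketch (not part of the patch): the section and checkbox extraction in `parse_issue_body` can be exercised in isolation. The two regular expressions are copied from the removed file; the helper names `split_sections` and `find_checkboxes` and the sample issue body are hypothetical, chosen only for illustration.

```python
import re

# Patterns copied from the removed scripts/gitea_issue_parser.py.
SECTION_RE = re.compile(r"^##\s+(.+?)$\n((?:^(?!##\s).*$\n?)*)", re.MULTILINE)
CHECKBOX_RE = re.compile(r"^\s*-\s*\[([ xX])\]\s*(.+)$", re.MULTILINE)


def split_sections(body: str) -> dict:
    """Map each '## Heading' to its text, with keys normalized to snake_case."""
    return {
        m.group(1).strip().lower().replace(" ", "_"): m.group(2).strip()
        for m in SECTION_RE.finditer(body)
    }


def find_checkboxes(body: str) -> list:
    """Return every markdown checkbox as {'text': ..., 'checked': ...}."""
    return [
        {"text": m.group(2).strip(), "checked": m.group(1).lower() == "x"}
        for m in CHECKBOX_RE.finditer(body)
    ]


# Hypothetical issue body used only to demonstrate the helpers.
SAMPLE = """Intro paragraph.

## Acceptance Criteria
- [x] Parser handles checkboxes
- [ ] Parser handles numbered lists

## Notes
Closes #8
"""

if __name__ == "__main__":
    print(split_sections(SAMPLE))
    print(find_checkboxes(SAMPLE))
```

Because section keys are normalized before storage, a lookup has to use `acceptance_criteria` rather than the heading text as written.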
--- a/scripts/improvement_proposals.py
+++ b/scripts/improvement_proposals.py
@@ -1,451 +0,0 @@
-#!/usr/bin/env python3
-"""
-Improvement Proposal Generator for compounding-intelligence.
-
-Analyzes fleet session data to identify waste patterns and generates
-concrete improvement proposals with ROI estimates.
-
-Input: Session analytics JSON (from fleet metrics or session database)
-Output: Markdown proposal document + JSON proposals
-
-Usage:
-    python3 scripts/improvement_proposals.py --input analytics.json
-    python3 scripts/improvement_proposals.py --input analytics.json --format json
-    python3 scripts/improvement_proposals.py --input analytics.json --output proposals.md
-    python3 scripts/improvement_proposals.py --input analytics.json --threshold 5.0
-"""
-
-import argparse
-import json
-import os
-import sys
-from collections import defaultdict
-from dataclasses import dataclass, field, asdict
-from datetime import datetime, timezone
-from typing import Any, Dict, List, Optional
-
-
-# ── Data types ─────────────────────────────────────────────────
-
-@dataclass
-class WastePattern:
-    """A detected waste pattern in fleet sessions."""
-    pattern_type: str       # "repeated_error", "manual_process", "slow_tool", "failed_retry"
-    description: str
-    occurrences: int
-    total_time_hours: float
-    affected_repos: List[str] = field(default_factory=list)
-    example_sessions: List[str] = field(default_factory=list)
-
-
-@dataclass
-class ImprovementProposal:
-    """A concrete improvement proposal with ROI estimate."""
-    title: str
-    problem: str
-    proposed_solution: str
-    estimated_weekly_hours_saved: float
-    estimated_monthly_hours_saved: float
-    implementation_hours: float
-    roi_weeks: float           # weeks to break even
-    priority: str              # "critical", "high", "medium", "low"
-    affected_area: str
-    supporting_evidence: List[str] = field(default_factory=list)
-
-
-# ── Session analysis ──────────────────────────────────────────
-
-def analyze_sessions(sessions: List[dict]) -> List[WastePattern]:
-    """
-    Analyze session data to find waste patterns.
-
-    Looks for:
-    - Repeated errors (same error across sessions)
-    - Manual processes (long sequences of similar actions)
-    - Slow tools (tools with high latency)
-    - Failed retries (multiple attempts at same task)
-    """
-    patterns = []
-
-    # Track error frequency across sessions
-    error_counts: Dict[str, List[dict]] = defaultdict(list)
-    tool_latencies: Dict[str, List[float]] = defaultdict(list)
-    retry_counts: Dict[str, int] = defaultdict(int)
-    manual_sequences: List[dict] = []
-
-    for session in sessions:
-        session_id = session.get("session_id", "unknown")
-        repo = session.get("repo", "global")
-        messages = session.get("messages", [])
-        errors = session.get("errors", [])
-        tool_calls = session.get("tool_calls", [])
-        duration_min = session.get("duration_minutes", 0)
-
-        # 1. Repeated errors
-        for err in errors:
-            err_key = _normalize_error(err.get("message", ""))
-            if err_key:
-                error_counts[err_key].append({
-                    "session_id": session_id,
-                    "repo": repo,
-                })
-
-        # 2. Tool latency tracking
-        for tc in tool_calls:
-            tool_name = tc.get("tool", "unknown")
-            latency = tc.get("latency_ms", 0)
-            if latency > 0:
-                tool_latencies[tool_name].append(latency)
-
-        # 3. Failed retries (same tool called 3+ times in sequence)
-        prev_tool = None
-        streak = 0
-        for tc in tool_calls:
-            tool_name = tc.get("tool", "unknown")
-            if tool_name == prev_tool:
-                streak += 1
-            else:
-                if streak >= 3:
-                    retry_counts[prev_tool] += 1
-                streak = 1
-            prev_tool = tool_name
-        if streak >= 3:
-            retry_counts[prev_tool] += 1
-
-        # 4. Manual processes (more than 10 tool calls using at most 3 distinct tools)
-        if len(tool_calls) > 10:
-            tool_sequence = [tc.get("tool") for tc in tool_calls]
-            unique_tools = set(tool_sequence)
-            if len(unique_tools) <= 3:
-                manual_sequences.append({
-                    "session_id": session_id,
-                    "repo": repo,
-                    "tool_count": len(tool_calls),
-                    "unique_tools": list(unique_tools),
-                    "duration_min": duration_min,
-                })
-
-    # Generate patterns from collected data
-
-    # Repeated errors (appearing in 3+ sessions)
-    for err_key, occurrences in error_counts.items():
-        if len(occurrences) >= 3:
-            repos = list(set(o["repo"] for o in occurrences))
-            sessions_list = [o["session_id"] for o in occurrences[:5]]
-            # Estimate time wasted: 5 min per error occurrence
-            hours = len(occurrences) * 5 / 60
-            patterns.append(WastePattern(
-                pattern_type="repeated_error",
-                description=f"Error: {err_key[:100]}",
-                occurrences=len(occurrences),
-                total_time_hours=round(hours, 1),
-                affected_repos=repos,
-                example_sessions=sessions_list,
-            ))
-
-    # Slow tools (avg latency > 5000ms across 5+ calls)
-    for tool, latencies in tool_latencies.items():
-        if len(latencies) >= 5:
-            avg_ms = sum(latencies) / len(latencies)
-            if avg_ms > 5000:
-                hours = sum(latencies) / 1000 / 3600
-                patterns.append(WastePattern(
-                    pattern_type="slow_tool",
-                    description=f"Tool '{tool}' averages {avg_ms:.0f}ms latency",
-                    occurrences=len(latencies),
-                    total_time_hours=round(hours, 1),
-                    affected_repos=["global"],
-                    example_sessions=[],
-                ))
-
-    # Failed retries
-    for tool, count in retry_counts.items():
-        if count >= 2:
-            hours = count * 10 / 60  # ~10 min per failed retry sequence
-            patterns.append(WastePattern(
-                pattern_type="failed_retry",
-                description=f"Tool '{tool}' had {count} retry sequences (3+ consecutive calls)",
-                occurrences=count,
-                total_time_hours=round(hours, 1),
-                affected_repos=["global"],
-                example_sessions=[],
-            ))
-
-    # Manual processes
-    for seq in manual_sequences:
-        hours = seq["duration_min"] / 60
-        patterns.append(WastePattern(
-            pattern_type="manual_process",
-            description=f"Session {seq['session_id'][:12]}: {seq['tool_count']} tool calls with only {len(seq['unique_tools'])} unique tools",
-            occurrences=1,
-            total_time_hours=round(hours, 1),
-            affected_repos=[seq["repo"]],
-            example_sessions=[seq["session_id"]],
-        ))
-
-    return sorted(patterns, key=lambda p: p.total_time_hours, reverse=True)
-
-
-def _normalize_error(message: str) -> str:
-    """Normalize error message to a common key."""
-    if not message:
-        return ""
-    msg = message.lower().strip()
-    # Remove variable parts (paths, IDs, timestamps)
-    import re
-    msg = re.sub(r'/\S+', '/PATH', msg)
-    msg = re.sub(r'\b[0-9a-f]{8,}\b', 'HASH', msg)
-    msg = re.sub(r'\d{4}-\d{2}-\d{2}[tT]\d{2}:\d{2}', 'TIME', msg)
-    return msg[:150]
-
-
-# ── Proposal generation ───────────────────────────────────────
-
-def generate_proposals(
-    patterns: List[WastePattern],
-    hourly_rate: float = 50.0,
-    implementation_overhead: float = 1.5,
-) -> List[ImprovementProposal]:
-    """
-    Generate improvement proposals from waste patterns.
-
-    Args:
-        patterns: Detected waste patterns
-        hourly_rate: Developer hourly rate (currently unused; ROI is computed in hours only)
-        implementation_overhead: Multiplier for implementation time estimate
-    """
-    proposals = []
-
-    # Group patterns by type
-    by_type: Dict[str, List[WastePattern]] = defaultdict(list)
-    for p in patterns:
-        by_type[p.pattern_type].append(p)
-
-    # 1. Repeated errors → Create fix/skill
-    for p in by_type.get("repeated_error", []):
-        weekly_hours = p.total_time_hours / 4  # monthly → weekly
-        impl_hours = max(1.0, p.occurrences * 0.25) * implementation_overhead
-        roi_weeks = impl_hours / weekly_hours if weekly_hours > 0 else float('inf')
-
-        proposals.append(ImprovementProposal(
-            title=f"Fix repeated error: {p.description[:60]}",
-            problem=f"This error occurred {p.occurrences} times across {len(p.affected_repos)} repos, wasting ~{p.total_time_hours:.1f} hours.",
-            proposed_solution="Root-cause the error and create a permanent fix or mitigation skill.",
-            estimated_weekly_hours_saved=round(weekly_hours, 1),
-            estimated_monthly_hours_saved=round(p.total_time_hours, 1),
-            implementation_hours=round(impl_hours, 1),
-            roi_weeks=round(roi_weeks, 1),
-            priority=_priority_from_roi(roi_weeks),
-            affected_area="reliability",
-            supporting_evidence=[f"{p.occurrences} occurrences in sessions: {', '.join(p.example_sessions[:3])}"],
-        ))
-
-    # 2. Slow tools → Optimize or replace
-    for p in by_type.get("slow_tool", []):
-        weekly_hours = p.total_time_hours / 4
-        impl_hours = 3.0 * implementation_overhead  # optimization task
-        roi_weeks = impl_hours / weekly_hours if weekly_hours > 0 else float('inf')
-
-        proposals.append(ImprovementProposal(
-            title=f"Optimize slow tool: {p.description[:60]}",
-            problem=f"Tool has {p.occurrences} calls with high latency, wasting ~{p.total_time_hours:.1f} hours total.",
-            proposed_solution="Profile the tool, add caching, or replace with a faster alternative.",
-            estimated_weekly_hours_saved=round(weekly_hours, 1),
-            estimated_monthly_hours_saved=round(p.total_time_hours, 1),
-            implementation_hours=round(impl_hours, 1),
-            roi_weeks=round(roi_weeks, 1),
-            priority=_priority_from_roi(roi_weeks),
-            affected_area="performance",
-            supporting_evidence=[f"{p.occurrences} slow calls detected"],
-        ))
-
-    # 3. Failed retries → Add retry logic or validation
-    for p in by_type.get("failed_retry", []):
-        weekly_hours = p.total_time_hours / 4
-        impl_hours = 2.0 * implementation_overhead
-        roi_weeks = impl_hours / weekly_hours if weekly_hours > 0 else float('inf')
-
-        proposals.append(ImprovementProposal(
-            title=f"Reduce retries: {p.description[:60]}",
-            problem=f"Tool had {p.occurrences} retry sequences, wasting ~{p.total_time_hours:.1f} hours.",
-            proposed_solution="Add input validation, pre-flight checks, or automatic retry with backoff.",
-            estimated_weekly_hours_saved=round(weekly_hours, 1),
-            estimated_monthly_hours_saved=round(p.total_time_hours, 1),
-            implementation_hours=round(impl_hours, 1),
-            roi_weeks=round(roi_weeks, 1),
-            priority=_priority_from_roi(roi_weeks),
-            affected_area="reliability",
-            supporting_evidence=[f"{p.occurrences} retry sequences detected"],
-        ))
-
-    # 4. Manual processes → Automate
-    total_manual_hours = sum(p.total_time_hours for p in by_type.get("manual_process", []))
-    manual_patterns = by_type.get("manual_process", [])
-    if manual_patterns:
-        weekly_hours = total_manual_hours / 4
-        impl_hours = len(manual_patterns) * 2.0 * implementation_overhead
-        roi_weeks = impl_hours / weekly_hours if weekly_hours > 0 else float('inf')
-
-        proposals.append(ImprovementProposal(
-            title=f"Automate {len(manual_patterns)} manual processes",
-            problem=f"{len(manual_patterns)} sessions had long manual tool sequences, wasting ~{total_manual_hours:.1f} hours.",
-            proposed_solution="Create composite skills or scripts that combine the repeated tool sequences into single operations.",
-            estimated_weekly_hours_saved=round(weekly_hours, 1),
-            estimated_monthly_hours_saved=round(total_manual_hours, 1),
-            implementation_hours=round(impl_hours, 1),
-            roi_weeks=round(roi_weeks, 1),
-            priority=_priority_from_roi(roi_weeks),
-            affected_area="automation",
-            supporting_evidence=[f"{len(manual_patterns)} manual sessions detected"],
-        ))
-
-    return sorted(proposals, key=lambda p: p.estimated_monthly_hours_saved, reverse=True)
-
-
-def _priority_from_roi(roi_weeks: float) -> str:
-    """Determine priority from ROI break-even time."""
-    if roi_weeks <= 1:
-        return "critical"
-    elif roi_weeks <= 4:
-        return "high"
-    elif roi_weeks <= 12:
-        return "medium"
-    return "low"
-
-
-# ── Output formatting ─────────────────────────────────────────
-
-def format_proposals_markdown(
-    proposals: List[ImprovementProposal],
-    patterns: List[WastePattern],
-    generated_at: str,
-) -> str:
-    """Format proposals as a markdown document."""
-    lines = [
-        "# Improvement Proposals",
-        "",
-        f"Generated: {generated_at}",
-        f"Based on analysis of {sum(p.occurrences for p in patterns)} waste events across {len(set(r for p in patterns for r in p.affected_repos))} repos.",
-        "",
-        "---",
-        "",
-        "## Summary",
-        "",
-        "| Metric | Value |",
-        "|--------|-------|",
-        f"| Total proposals | {len(proposals)} |",
-        f"| Critical priority | {sum(1 for p in proposals if p.priority == 'critical')} |",
-        f"| Total monthly hours wasted | {sum(p.estimated_monthly_hours_saved for p in proposals):.1f}h |",
-        f"| Total weekly hours recoverable | {sum(p.estimated_weekly_hours_saved for p in proposals):.1f}h |",
-        f"| Implementation investment | {sum(p.implementation_hours for p in proposals):.1f}h |",
-        "",
-        "---",
-        "",
-    ]
-
-    for i, prop in enumerate(proposals, 1):
-        lines.extend([
-            f"## {i}. {prop.title}",
-            "",
-            f"**Priority:** {prop.priority.upper()}  ",
-            f"**Area:** {prop.affected_area}  ",
-            f"**ROI break-even:** {prop.roi_weeks:.1f} weeks",
-            "",
-            "### Problem",
-            "",
-            prop.problem,
-            "",
-            "### Proposed Solution",
-            "",
-            prop.proposed_solution,
-            "",
-            "### ROI Estimate",
-            "",
-            f"- Weekly hours saved: **{prop.estimated_weekly_hours_saved}h**",
-            f"- Monthly hours saved: **{prop.estimated_monthly_hours_saved}h**",
-            f"- Implementation time: **{prop.implementation_hours}h**",
-            f"- Break-even: **{prop.roi_weeks:.1f} weeks**",
-            "",
-        ])
-
-        if prop.supporting_evidence:
-            lines.extend([
-                "### Evidence",
-                "",
-            ])
-            for ev in prop.supporting_evidence:
-                lines.append(f"- {ev}")
-            lines.append("")
-
-        lines.extend(["---", ""])
-
-    # Waste pattern appendix
-    lines.extend([
-        "## Appendix: Detected Waste Patterns",
-        "",
-        "| Type | Description | Occurrences | Hours |",
-        "|------|-------------|-------------|-------|",
-    ])
-    for p in patterns[:20]:
-        lines.append(
-            f"| {p.pattern_type} | {p.description[:60]} | {p.occurrences} | {p.total_time_hours}h |"
-        )
-    lines.append("")
-
-    return "\n".join(lines)
-
-
-def format_proposals_json(proposals: List[ImprovementProposal]) -> str:
-    """Format proposals as JSON."""
-    return json.dumps(
-        [asdict(p) for p in proposals],
-        indent=2,
-        default=str,
-    )
-
-
-# ── Main ──────────────────────────────────────────────────────
-
-def main():
-    parser = argparse.ArgumentParser(description="Generate improvement proposals from session analytics")
-    parser.add_argument("--input", required=True, help="Path to session analytics JSON file")
-    parser.add_argument("--output", help="Output file path (default: stdout)")
-    parser.add_argument("--format", choices=["markdown", "json"], default="markdown", help="Output format")
-    parser.add_argument("--hourly-rate", type=float, default=50.0, help="Developer hourly rate for ROI calc")
-    parser.add_argument("--threshold", type=float, default=2.0, help="Min monthly hours to include proposal")
-
-    args = parser.parse_args()
-
-    with open(args.input) as f:
-        data = json.load(f)
-
-    sessions = data if isinstance(data, list) else data.get("sessions", [])
-
-    if not sessions:
-        print("No sessions found in input file.", file=sys.stderr)
-        sys.exit(1)
-
-    # Analyze
-    patterns = analyze_sessions(sessions)
-    proposals = generate_proposals(patterns, hourly_rate=args.hourly_rate)
-
-    # Filter by threshold
-    proposals = [p for p in proposals if p.estimated_monthly_hours_saved >= args.threshold]
-
-    generated_at = datetime.now(timezone.utc).isoformat()
-
-    if args.format == "markdown":
-        output = format_proposals_markdown(proposals, patterns, generated_at)
-    else:
-        output = format_proposals_json(proposals)
-
-    if args.output:
-        with open(args.output, "w") as f:
-            f.write(output)
-        print(f"Wrote {len(proposals)} proposals to {args.output}", file=sys.stderr)
-    else:
-        print(output)
-
-
-if __name__ == "__main__":
-    main()
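
Reviewer sketch (not part of the patch): the break-even arithmetic for a repeated error can be worked through by hand. The constants (5 minutes per occurrence, a 4-week month, 0.25 implementation hours per occurrence with a 1-hour floor, a 1.5 overhead multiplier, and the 1/4/12-week priority cut-offs) are taken from the removed file. The function names `repeated_error_roi` and `priority_from_roi` are hypothetical, and the intermediate rounding done by the original is omitted here.

```python
def priority_from_roi(roi_weeks: float) -> str:
    """Bucket a break-even time (in weeks) using the removed file's thresholds."""
    if roi_weeks <= 1:
        return "critical"
    if roi_weeks <= 4:
        return "high"
    if roi_weeks <= 12:
        return "medium"
    return "low"


def repeated_error_roi(occurrences: int, overhead: float = 1.5) -> dict:
    """Estimate the ROI of fixing an error seen `occurrences` times in a month."""
    monthly_hours = occurrences * 5 / 60          # 5 minutes lost per occurrence
    weekly_hours = monthly_hours / 4              # monthly -> weekly
    impl_hours = max(1.0, occurrences * 0.25) * overhead
    roi_weeks = impl_hours / weekly_hours if weekly_hours > 0 else float("inf")
    return {
        "monthly_hours": monthly_hours,
        "weekly_hours": weekly_hours,
        "implementation_hours": impl_hours,
        "roi_weeks": roi_weeks,
        "priority": priority_from_roi(roi_weeks),
    }


if __name__ == "__main__":
    print(repeated_error_roi(12))
```

For 12 occurrences this gives 1.0 hours wasted per month, 0.25 hours per week, 4.5 implementation hours, and so 4.5 / 0.25 = 18 weeks to break even. With these constants, both the cost and the saving scale linearly once there are four or more occurrences, so the unrounded break-even stays at 18 weeks and such proposals fall into the "low" bucket.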
--- a/scripts/knowledge_gap_identifier.py
+++ b/scripts/knowledge_gap_identifier.py
@@ -1,275 +0,0 @@
-"""
-Knowledge Gap Identifier — Pipeline 10.7
-
-Cross-references code, docs, and tests to find gaps:
- Undocumented functions/classes
- Untested code paths
- Documented but missing implementations
- Test files without corresponding source
-
-Produces a gap report with severity and suggestions.
-"""
-
-from __future__ import annotations
-
-import ast
-import os
-import re
-from dataclasses import dataclass, field
-from enum import Enum
-from pathlib import Path
-from typing import Dict, List, Optional, Set
-
-
-class GapSeverity(Enum):
-    INFO = "info"
-    WARNING = "warning"
-    ERROR = "error"
-
-
-class GapType(Enum):
-    UNDOCUMENTED = "undocumented"
-    UNTESTED = "untested"
-    MISSING_IMPLEMENTATION = "missing_implementation"
-    ORPHAN_TEST = "orphan_test"
-    STALE_DOC = "stale_doc"
-
-
-@dataclass
-class Gap:
-    """A single knowledge gap."""
-    gap_type: GapType
-    severity: GapSeverity
-    file: str
-    line: Optional[int]
-    name: str
-    description: str
-    suggestion: str
-
-
-@dataclass
-class GapReport:
-    """Full gap analysis report."""
-    repo_path: str
-    gaps: List[Gap] = field(default_factory=list)
-    stats: Dict[str, int] = field(default_factory=dict)
-
-    def summary(self) -> str:
-        lines = [f"Gap Report for {self.repo_path}", "=" * 40]
-        by_type = {}
-        for g in self.gaps:
-            by_type.setdefault(g.gap_type.value, []).append(g)
-
-        for gtype, items in sorted(by_type.items()):
-            lines.append(f"\n{gtype.upper()} ({len(items)}):")
-            for g in items:
-                loc = f"{g.file}:{g.line}" if g.line else g.file
-                lines.append(f"  [{g.severity.value}] {g.name} @ {loc}")
-                lines.append(f"    {g.description}")
-
-        lines.append(f"\nTotal gaps: {len(self.gaps)}")
-        self.stats = {k: len(v) for k, v in by_type.items()}
-        return "\n".join(lines)
-
-    def to_dict(self) -> dict:
-        return {
-            "repo_path": self.repo_path,
-            "total_gaps": len(self.gaps),
-            "stats": {k.value: len(v) for k, v in
-                      {gt: [g for g in self.gaps if g.gap_type == gt]
-                       for gt in GapType}.items() if v},
-            "gaps": [
-                {
-                    "type": g.gap_type.value,
-                    "severity": g.severity.value,
-                    "file": g.file,
-                    "line": g.line,
-                    "name": g.name,
-                    "description": g.description,
-                    "suggestion": g.suggestion,
-                }
-                for g in self.gaps
-            ],
-        }
-
-
-def _collect_python_files(root: Path) -> List[Path]:
-    """Collect .py files, excluding venv/node_modules/.git."""
-    skip = {".git", "venv", "env", ".venv", "node_modules", "__pycache__", ".tox", ".mypy_cache"}
-    files = []
-    for dirpath, dirnames, filenames in os.walk(root):
-        dirnames[:] = [d for d in dirnames if d not in skip]
-        for f in filenames:
-            if f.endswith(".py"):
-                files.append(Path(dirpath) / f)
-    return files
-
-
-def _extract_python_symbols(filepath: Path) -> Set[str]:
-    """Extract top-level function and class names from a Python file."""
-    symbols = set()
-    try:
-        source = filepath.read_text(encoding="utf-8", errors="replace")
-        tree = ast.parse(source, filename=str(filepath))
-    except (OSError, SyntaxError, ValueError):  # unreadable file, invalid syntax, or null bytes
-        return symbols
-
-    for node in ast.iter_child_nodes(tree):
-        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
-            symbols.add(node.name)
-    return symbols
-
-
-def _extract_doc_symbols(filepath: Path) -> Set[str]:
-    """Extract function/class names mentioned in markdown docs."""
-    symbols = set()
-    try:
-        text = filepath.read_text(encoding="utf-8", errors="replace")
-    except (UnicodeDecodeError, OSError):
-        return symbols
-
-    # Match backtick-quoted identifiers: `ClassName`, `func_name`, `func()`
-    for m in re.finditer(r"`([A-Za-z_]\w+)(?:\(\))?`", text):
-        symbols.add(m.group(1))
-    # Match ## ClassName or ### func_name headings
-    for m in re.finditer(r"^#{1,4}\s+(\w+)", text, re.MULTILINE):
-        symbols.add(m.group(1))
-    return symbols
-
-
-def _collect_test_files(root: Path) -> Dict[str, Path]:
-    """Map test module names to their file paths."""
-    test_map = {}
-    for dirpath, dirnames, filenames in os.walk(root):
-        dirnames[:] = [d for d in dirnames if d not in {".git", "venv", "node_modules"}]
-        for f in filenames:
-            if f.startswith("test_") and f.endswith(".py"):
-                # test_foo.py -> foo
-                module_name = f[5:-3]
-                test_map[module_name] = Path(dirpath) / f
-    return test_map
-
-
-class KnowledgeGapIdentifier:
-    """Analyzes a repo for knowledge gaps between code, docs, and tests."""
-
-    def analyze(self, repo_path: str) -> GapReport:
-        root = Path(repo_path).resolve()
-        report = GapReport(repo_path=str(root))
-
-        if not root.is_dir():
-            report.gaps.append(Gap(
-                gap_type=GapType.UNDOCUMENTED,
-                severity=GapSeverity.ERROR,
-                file=str(root),
-                line=None,
-                name="repo",
-                description="Path is not a directory",
-                suggestion="Provide a valid repo directory",
-            ))
-            return report
-
-        # Collect artifacts
-        py_files = _collect_python_files(root)
-        doc_files = list(root.glob("docs/**/*.md")) + list(root.glob("*.md"))
-        test_map = _collect_test_files(root / "tests") if (root / "tests").is_dir() else {}
-
-        # Extract symbols from each source file
-        source_symbols: Dict[str, Set[str]] = {}  # relative_path -> symbols
-        all_source_symbols: Set[str] = set()
-
-        for pf in py_files:
-            rel = str(pf.relative_to(root))
-            # Skip test files and setup/config
-            if "/tests/" in rel or rel.startswith("tests/") or rel.startswith("test_"):
-                continue
-            if pf.name in ("setup.py", "conftest.py", "conf.py"):
-                continue
-
-            syms = _extract_python_symbols(pf)
-            if syms:
-                source_symbols[rel] = syms
-                all_source_symbols.update(syms)
-
-        # Extract documented symbols
-        doc_symbols: Set[str] = set()
-        for df in doc_files:
-            doc_symbols.update(_extract_doc_symbols(df))
-
-        # Extract test-covered symbols
-        tested_modules: Set[str] = set(test_map.keys())
-
-        # --- Find gaps ---
-
-        # 1. Undocumented: source symbols not in any doc
-        for rel_path, syms in source_symbols.items():
-            for sym in sorted(syms):
-                if sym.startswith("_") and not sym.startswith("__"):
-                    continue  # Skip private
-                if sym not in doc_symbols:
-                    report.gaps.append(Gap(
-                        gap_type=GapType.UNDOCUMENTED,
-                        severity=GapSeverity.WARNING,
-                        file=rel_path,
-                        line=None,
-                        name=sym,
-                        description=f"{sym} defined in {rel_path} but not referenced in any docs",
-                        suggestion=f"Add documentation for {sym} in a .md file",
-                    ))
-
-        # 2. Untested: source modules without a corresponding test file
-        for rel_path in source_symbols:
-            module_name = Path(rel_path).stem
-            if module_name not in tested_modules and module_name not in ("__init__", "main", "config"):
-                report.gaps.append(Gap(
-                    gap_type=GapType.UNTESTED,
-                    severity=GapSeverity.ERROR,
-                    file=rel_path,
-                    line=None,
-                    name=module_name,
-                    description=f"No test file found for {rel_path}",
-                    suggestion=f"Create tests/test_{module_name}.py",
-                ))
-
-        # 3. Missing implementation: doc references symbol not in any source
-        referenced_but_missing = doc_symbols - all_source_symbols
-        for sym in sorted(referenced_but_missing):
-            # Filter out common non-code terms
-            if sym.lower() in {"todo", "fixme", "note", "example", "usage", "api",
-                               "install", "setup", "config", "license", "contributing",
-                               "changelog", "readme", "python", "bash", "json", "yaml",
-                               "http", "url", "cli", "gui", "ui", "rest"}:
-                continue
-            if len(sym) < 3:
-                continue
-            report.gaps.append(Gap(
-                gap_type=GapType.MISSING_IMPLEMENTATION,
-                severity=GapSeverity.INFO,
-                file="(docs)",
-                line=None,
-                name=sym,
-                description=f"{sym} referenced in docs but not found in source code",
-                suggestion=f"Verify if {sym} should be implemented or update docs",
-            ))
-
-        # 4. Orphan tests: test files without matching source
-        for test_mod, test_path in test_map.items():
-            if test_mod not in tested_modules and not any(
-                test_mod in Path(f).stem for f in source_symbols
-            ):
-                # Check if any source file partially matches
-                matches_source = any(test_mod.replace("_", "-") in f or test_mod.replace("_", "") in Path(f).stem
-                                     for f in source_symbols)
-                if not matches_source:
-                    rel = str(test_path.relative_to(root))
-                    report.gaps.append(Gap(
-                        gap_type=GapType.ORPHAN_TEST,
-                        severity=GapSeverity.WARNING,
-                        file=rel,
-                        line=None,
-                        name=test_mod,
-                        description=f"Test file {rel} exists but no matching source module found",
-                        suggestion="Verify whether the source was renamed or removed",
-                    ))
-
-        return report
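The undocumented and missing-implementation checks in the removed gap finder both reduce to set differences between source symbols and doc symbols. A minimal sketch of that idea follows; `is_private` and `split_gaps` are hypothetical names for illustration, not functions from the removed file, and the stop-word filter is left out for brevity.

```python
from typing import Dict, List, Set, Tuple


def is_private(sym: str) -> bool:
    """Single-underscore names are private; dunder names are not."""
    return sym.startswith("_") and not sym.startswith("__")


def split_gaps(
    source_symbols: Dict[str, Set[str]], doc_symbols: Set[str]
) -> Tuple[List[str], List[str]]:
    """Return (undocumented, missing_implementation) symbol lists."""
    # Flatten the per-file symbol sets into one set of known source symbols.
    all_source: Set[str] = set()
    for syms in source_symbols.values():
        all_source |= syms

    # Public source symbols that no doc mentions.
    undocumented = sorted(
        s for s in all_source if not is_private(s) and s not in doc_symbols
    )
    # Doc symbols with no source definition; very short names are skipped.
    missing = sorted(s for s in doc_symbols - all_source if len(s) >= 3)
    return undocumented, missing
```

Calling `split_gaps({"a.py": {"harvest", "_helper"}}, {"harvest", "bootstrap"})` would report nothing undocumented and `bootstrap` as referenced but missing.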
--- a/scripts/knowledge_staleness_check.py
+++ b/scripts/knowledge_staleness_check.py
@@ -1,131 +0,0 @@
-#!/usr/bin/env python3
-"""
-Knowledge Store Staleness Detector — Detect stale knowledge entries by comparing source file hashes.
-
-Usage:
-    python3 scripts/knowledge_staleness_check.py --index knowledge/index.json
-    python3 scripts/knowledge_staleness_check.py --index knowledge/index.json --json
-    python3 scripts/knowledge_staleness_check.py --index knowledge/index.json --fix
-"""
-
-import argparse
-import hashlib
-import json
-import os
-import sys
-from datetime import datetime, timezone
-from pathlib import Path
-from typing import Dict, List, Any, Optional
-
-
-def compute_file_hash(filepath: str) -> Optional[str]:
-    """Compute SHA-256 hash of a file. Returns None if file doesn't exist."""
-    try:
-        with open(filepath, "rb") as f:
-            return "sha256:" + hashlib.sha256(f.read()).hexdigest()
-    except (FileNotFoundError, IsADirectoryError, PermissionError):
-        return None
-
-
-def check_staleness(index_path: str, repo_root: str = ".") -> List[Dict[str, Any]]:
-    """Check all entries in knowledge index for staleness.
-    
-    Returns list of entries with staleness info:
-    - status: "fresh" | "stale" | "missing_source" | "no_hash" | "no_source"
-    - current_hash: computed hash (if source exists)
-    - stored_hash: hash from index
-    """
-    with open(index_path) as f:
-        data = json.load(f)
-
-    facts = data.get("facts", [])
-    results = []
-
-    for entry in facts:
-        source_file = entry.get("source_file")
-        stored_hash = entry.get("source_hash")
-
-        if not source_file:
-            results.append({**entry, "status": "no_source", "current_hash": None})
-            continue
-
-        full_path = os.path.join(repo_root, source_file)
-        current_hash = compute_file_hash(full_path)
-
-        if current_hash is None:
-            results.append({**entry, "status": "missing_source", "current_hash": None})
-        elif not stored_hash:
-            results.append({**entry, "status": "no_hash", "current_hash": current_hash})
-        elif current_hash != stored_hash:
-            results.append({**entry, "status": "stale", "current_hash": current_hash})
-        else:
-            results.append({**entry, "status": "fresh", "current_hash": current_hash})
-
-    return results
-
-
-def fix_hashes(index_path: str, repo_root: str = ".") -> int:
-    """Add hashes to entries missing them. Returns count of fixed entries."""
-    with open(index_path) as f:
-        data = json.load(f)
-
-    fixed = 0
-    for entry in data.get("facts", []):
-        if entry.get("source_hash"):
-            continue
-        source_file = entry.get("source_file")
-        if not source_file:
-            continue
-        full_path = os.path.join(repo_root, source_file)
-        h = compute_file_hash(full_path)
-        if h:
-            entry["source_hash"] = h
-            fixed += 1
-
-    with open(index_path, "w") as f:
-        json.dump(data, f, indent=2)
-
-    return fixed
-
-
-def main():
-    parser = argparse.ArgumentParser(description="Check knowledge store staleness")
-    parser.add_argument("--index", required=True, help="Path to knowledge/index.json")
-    parser.add_argument("--repo", default=".", help="Repo root for source file resolution")
-    parser.add_argument("--json", action="store_true", help="Output as JSON")
-    parser.add_argument("--fix", action="store_true", help="Add hashes to entries missing them")
-
-    args = parser.parse_args()
-
-    if args.fix:
-        fixed = fix_hashes(args.index, args.repo)
-        print(f"Fixed {fixed} entries with missing hashes.")
-        return
-
-    results = check_staleness(args.index, args.repo)
-
-    if args.json:
-        print(json.dumps(results, indent=2))
-    else:
-        stale = [r for r in results if r["status"] != "fresh"]
-        fresh = [r for r in results if r["status"] == "fresh"]
-
-        print("Knowledge Store Staleness Check")
-        print(f"  Total entries: {len(results)}")
-        print(f"  Fresh: {len(fresh)}")
-        print(f"  Stale/Issues: {len(stale)}")
-        print()
-
-        if stale:
-            print("Issues found:")
-            for r in stale:
-                status = r["status"]
-                fact = r.get("fact", "?")[:60]
-                source = r.get("source_file", "?")
-                print(f"  [{status}] {source}: {fact}")
-        else:
-            print("All entries are fresh!")
-
-
-if __name__ == "__main__":
-    main()
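The staleness rules in the removed script can be traced on a single entry. The sketch below is an illustration only: `file_hash`, `classify`, and `demo` are hypothetical names, and catching `OSError` is a slightly broader assumption than the three exception types the original lists.

```python
import hashlib
import os
import tempfile
from typing import List, Optional


def file_hash(path: str) -> Optional[str]:
    """Return 'sha256:<hex>' for a file, or None if it cannot be read."""
    try:
        with open(path, "rb") as f:
            return "sha256:" + hashlib.sha256(f.read()).hexdigest()
    except OSError:
        return None


def classify(entry: dict, repo_root: str) -> str:
    """Apply the same status order as check_staleness to one entry."""
    source = entry.get("source_file")
    if not source:
        return "no_source"
    current = file_hash(os.path.join(repo_root, source))
    if current is None:
        return "missing_source"
    if not entry.get("source_hash"):
        return "no_hash"
    return "fresh" if current == entry["source_hash"] else "stale"


def demo() -> List[str]:
    """Walk one entry through fresh, stale, and missing_source."""
    statuses = []
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "notes.txt")
        with open(path, "w") as f:
            f.write("v1")
        entry = {"fact": "example", "source_file": "notes.txt",
                 "source_hash": file_hash(path)}
        statuses.append(classify(entry, root))  # stored hash matches
        with open(path, "w") as f:
            f.write("v2")
        statuses.append(classify(entry, root))  # content changed
        os.remove(path)
        statuses.append(classify(entry, root))  # source file deleted
    return statuses
```

The order of the checks matters: an entry whose source file is gone is reported as `missing_source` even when it also lacks a stored hash.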
--- a/scripts/perf_bottleneck_finder.py
+++ b/scripts/perf_bottleneck_finder.py
@@ -1,551 +0,0 @@
-#!/usr/bin/env python3
-"""
-Performance Bottleneck Finder — Identify slow tests, builds, and CI steps.
-
-Analyzes:
-  1. Pytest output for slow tests
-  2. Build logs for slow steps
-  3. CI workflow durations
-  4. File system for large/slow artifacts
-
-Usage:
-    python3 scripts/perf_bottleneck_finder.py --repo /path/to/repo
-    python3 scripts/perf_bottleneck_finder.py --repo /path/to/repo --json
-    python3 scripts/perf_bottleneck_finder.py --repo /path/to/repo --report metrics/perf_report.md
-
-Weekly cron:
-    0 9 * * 1 cd /path/to/compounding-intelligence && python3 scripts/perf_bottleneck_finder.py --repo /path/to/target --report metrics/perf_report.md
-"""
-
-import argparse
-import json
-import os
-import re
-import subprocess
-import sys
-from collections import defaultdict
-from dataclasses import dataclass, field, asdict
-from datetime import datetime, timezone
-from pathlib import Path
-from typing import Any, Dict, List, Optional, Tuple
-
-
-# ── Configuration ──────────────────────────────────────────────────
-
-SLOW_TEST_THRESHOLD_S = 2.0      # Tests slower than this are flagged
-SLOW_BUILD_STEP_THRESHOLD_S = 10.0
-TOP_N_BOTTLENECKS = 10           # Report top N bottlenecks
-PYTEST_DURATIONS_COUNT = 20      # Number of slow tests to collect
-LOG_EXTENSIONS = {".log", ".txt"}
-
-
-@dataclass
-class Bottleneck:
-    """A single performance bottleneck."""
-    category: str          # "test", "build", "ci", "artifact", "import"
-    name: str              # What's slow
-    duration_s: float      # How long it takes
-    severity: str          # "critical", "warning", "info"
-    recommendation: str    # How to fix
-    file_path: Optional[str] = None
-    line_number: Optional[int] = None
-
-
-@dataclass
-class PerfReport:
-    """Full performance report."""
-    timestamp: str
-    repo_path: str
-    bottlenecks: List[Bottleneck] = field(default_factory=list)
-    summary: Dict[str, Any] = field(default_factory=dict)
-    test_stats: Dict[str, Any] = field(default_factory=dict)
-    build_stats: Dict[str, Any] = field(default_factory=dict)
-    ci_stats: Dict[str, Any] = field(default_factory=dict)
-
-    def to_dict(self) -> dict:
-        d = asdict(self)
-        return d
-
-
-# ── Test Analysis ──────────────────────────────────────────────────
-
-def find_slow_tests_pytest(repo_path: str) -> List[Bottleneck]:
-    """Collect tests with pytest, then read slow-test durations from the pytest cache if present."""
-    bottlenecks = []
-
-    # Try to run pytest with durations
-    try:
-        result = subprocess.run(
-            ["python3", "-m", "pytest", "--co", "-q", "--durations=0"],
-            cwd=repo_path, capture_output=True, text=True, timeout=30
-        )
-        # If tests exist, try to get durations from last run
-        durations_file = os.path.join(repo_path, ".pytest_cache", "v", "cache", "durations")
-        if os.path.exists(durations_file):
-            with open(durations_file) as f:
-                for line in f:
-                    parts = line.strip().split()
-                    if len(parts) >= 2:
-                        try:
-                            duration = float(parts[0])
-                            test_name = " ".join(parts[1:])
-                            if duration > SLOW_TEST_THRESHOLD_S:
-                                severity = "critical" if duration > 10 else "warning"
-                                bottlenecks.append(Bottleneck(
-                                    category="test",
-                                    name=test_name,
-                                    duration_s=duration,
-                                    severity=severity,
-                                    recommendation=f"Test takes {duration:.1f}s. Consider mocking slow I/O, using fixtures, or marking with @pytest.mark.slow."
-                                ))
-                        except ValueError:
-                            continue
-    except (subprocess.TimeoutExpired, FileNotFoundError):
-        pass
-
-    return bottlenecks
-
-
-def find_slow_tests_by_scan(repo_path: str) -> List[Bottleneck]:
-    """Scan test files for patterns that indicate slow tests."""
-    bottlenecks = []
-    test_patterns = [
-        (r"time\.sleep\((\d+(?:\.\d+)?)\)", "Contains time.sleep() — consider using mock or async wait"),
-        (r"subprocess\.run\(.*timeout=(\d+)", "Subprocess with timeout — may block test"),
-        (r"requests\.(get|post|put|delete)\(", "Real HTTP call — mock with responses or httpretty"),
-        (r"open\([^)]*[\x27\x22]w[\x27\x22]", "File I/O in test — use tmp_path fixture"),
-    ]
-
-    for root, dirs, files in os.walk(repo_path):
-        # Skip hidden and cache dirs
-        dirs[:] = [d for d in dirs if not d.startswith(('.', '__pycache__', 'node_modules', '.git'))]
-
-        for fname in files:
-            if not (fname.startswith("test_") or fname.endswith("_test.py")):
-                continue
-            if not fname.endswith(".py"):
-                continue
-
-            fpath = os.path.join(root, fname)
-            rel_path = os.path.relpath(fpath, repo_path)
-
-            try:
-                with open(fpath) as f:
-                    lines = f.readlines()
-            except (PermissionError, UnicodeDecodeError):
-                continue
-
-            for i, line in enumerate(lines):
-                for pattern, recommendation in test_patterns:
-                    match = re.search(pattern, line)
-                    if match:
-                        duration = 1.0  # Default estimate
-                        if "sleep" in pattern:
-                            try:
-                                duration = float(match.group(1))
-                            except (ValueError, IndexError):
-                                duration = 1.0
-                        elif "timeout" in pattern:
-                            try:
-                                duration = float(match.group(1))
-                            except (ValueError, IndexError):
-                                duration = 10.0
-                        else:
-                            duration = 2.0  # Estimated
-
-                        bottlenecks.append(Bottleneck(
-                            category="test",
-                            name=f"{rel_path}:{i+1}",
-                            duration_s=duration,
-                            severity="warning" if duration < 5 else "critical",
-                            recommendation=recommendation,
-                            file_path=rel_path,
-                            line_number=i + 1
-                        ))
-
-    return bottlenecks
-
-
-# ── Build Analysis ─────────────────────────────────────────────────
-
-def analyze_build_artifacts(repo_path: str) -> List[Bottleneck]:
-    """Find large build artifacts that slow down builds."""
-    bottlenecks = []
-    large_dirs = {
-        "node_modules": "Consider using npm ci --production or yarn --production",
-        "__pycache__": "Consider .gitignore and cleaning before builds",
-        ".tox": "Consider caching tox environments",
-        ".pytest_cache": "Consider cleaning between CI runs",
-        "dist": "Check if dist/ artifacts are being rebuilt unnecessarily",
-        "build": "Check if build/ artifacts are being rebuilt unnecessarily",
-        ".next": "Next.js cache — consider incremental builds",
-        "venv": "Virtual env in repo — move outside or use Docker",
-    }
-
-    for dirname, recommendation in large_dirs.items():
-        dirpath = os.path.join(repo_path, dirname)
-        if os.path.isdir(dirpath):
-            total_size = 0
-            file_count = 0
-            for root, dirs, files in os.walk(dirpath):
-                for f in files:
-                    try:
-                        fpath = os.path.join(root, f)
-                        total_size += os.path.getsize(fpath)
-                        file_count += 1
-                    except OSError:
-                        pass
-
-            if total_size > 10 * 1024 * 1024:  # > 10MB
-                size_mb = total_size / (1024 * 1024)
-                bottlenecks.append(Bottleneck(
-                    category="build",
-                    name=f"{dirname}/ ({size_mb:.1f}MB, {file_count} files)",
-                    duration_s=size_mb * 0.5,  # Rough estimate
-                    severity="critical" if size_mb > 100 else "warning",
-                    recommendation=recommendation
-                ))
-
-    return bottlenecks
-
-
-def analyze_makefile_targets(repo_path: str) -> List[Bottleneck]:
-    """Analyze Makefile for potentially slow targets."""
-    bottlenecks = []
-    makefiles = []
-
-    for root, dirs, files in os.walk(repo_path):
-        dirs[:] = [d for d in dirs if not d.startswith(('.', '__pycache__'))]
-        for f in files:
-            if f in ("Makefile", "makefile", "GNUmakefile"):
-                makefiles.append(os.path.join(root, f))
-
-    slow_patterns = [
-        (r"pip install", "pip install without --no-deps or constraints"),
-        (r"npm install(?!.*--production)", "npm install without --production flag"),
-        (r"docker build", "Docker build — consider multi-stage and layer caching"),
-        (r"pytest(?!.*(?:-x|--maxfail))", "pytest without early exit on failure"),
-        (r"mypy", "Type checking — consider incremental mode"),
-    ]
-
-    for mfile in makefiles:
-        rel_path = os.path.relpath(mfile, repo_path)
-        try:
-            with open(mfile) as f:
-                content = f.read()
-        except (PermissionError, UnicodeDecodeError):
-            continue
-
-        for pattern, recommendation in slow_patterns:
-            if re.search(pattern, content):
-                bottlenecks.append(Bottleneck(
-                    category="build",
-                    name=f"{rel_path}: {pattern}",
-                    duration_s=5.0,
-                    severity="info",
-                    recommendation=recommendation,
-                    file_path=rel_path
-                ))
-
-    return bottlenecks
-
-
-# ── CI Analysis ────────────────────────────────────────────────────
-
-def analyze_github_actions(repo_path: str) -> List[Bottleneck]:
-    """Analyze GitHub Actions workflow files for inefficiencies."""
-    bottlenecks = []
-    workflow_dir = os.path.join(repo_path, ".github", "workflows")
-
-    if not os.path.isdir(workflow_dir):
-        return bottlenecks
-
-    slow_patterns = [
-        (r"runs-on:\s*ubuntu-latest", 0, "Consider caching dependencies between runs"),
-        (r"npm install", 2, "Use npm ci instead of npm install for reproducible builds"),
-        (r"pip install(?!.*--cache-dir)", 2, "Add --cache-dir or use pip cache action"),
-        (r"docker build(?!.*--cache-from)", 5, "Use Docker layer caching"),
-        (r"python -m pytest(?!.*(?:-n|--numprocesses))", 3, "Consider pytest-xdist for parallel test execution"),
-    ]
-
-    for fname in os.listdir(workflow_dir):
-        if not fname.endswith(('.yml', '.yaml')):
-            continue
-
-        fpath = os.path.join(workflow_dir, fname)
-        try:
-            with open(fpath) as f:
-                content = f.read()
-        except (PermissionError, UnicodeDecodeError):
-            continue
-
-        for pattern, est_savings, recommendation in slow_patterns:
-            if re.search(pattern, content):
-                bottlenecks.append(Bottleneck(
-                    category="ci",
-                    name=f"{fname}: {pattern}",
-                    duration_s=est_savings,
-                    severity="info",
-                    recommendation=recommendation,
-                    file_path=f".github/workflows/{fname}"
-                ))
-
-    return bottlenecks
-
-
-def analyze_gitea_ci(repo_path: str) -> List[Bottleneck]:
-    """Analyze Gitea, Drone, and Woodpecker CI config files."""
-    bottlenecks = []
-    ci_files = [".gitea/workflows", ".drone.yml", ".woodpecker.yml"]
-
-    for ci_path in ci_files:
-        full_path = os.path.join(repo_path, ci_path)
-        if os.path.isfile(full_path):
-            try:
-                with open(full_path) as f:
-                    content = f.read()
-            except (PermissionError, UnicodeDecodeError):
-                continue
-
-            if "pip install" in content and "--cache-dir" not in content:
-                bottlenecks.append(Bottleneck(
-                    category="ci",
-                    name=f"{ci_path}: pip without cache",
-                    duration_s=5.0,
-                    severity="warning",
-                    recommendation="Add --cache-dir or mount pip cache volume",
-                    file_path=ci_path
-                ))
-
-        elif os.path.isdir(full_path):
-            for fname in os.listdir(full_path):
-                if not fname.endswith(('.yml', '.yaml')):
-                    continue
-                fpath = os.path.join(full_path, fname)
-                try:
-                    with open(fpath) as f:
-                        content = f.read()
-                except (PermissionError, UnicodeDecodeError):
-                    continue
-
-                if "pip install" in content and "--cache-dir" not in content:
-                    bottlenecks.append(Bottleneck(
-                        category="ci",
-                        name=f"{ci_path}/{fname}: pip without cache",
-                        duration_s=5.0,
-                        severity="warning",
-                        recommendation="Add --cache-dir or mount pip cache volume",
-                        file_path=f"{ci_path}/{fname}"
-                    ))
-
-    return bottlenecks
-
-
-# ── Import Analysis ────────────────────────────────────────────────
-
-def find_slow_imports(repo_path: str) -> List[Bottleneck]:
-    """Find Python files with heavy import chains."""
-    bottlenecks = []
-    heavy_imports = {
-        "pandas": 0.5,
-        "numpy": 0.3,
-        "torch": 2.0,
-        "tensorflow": 3.0,
-        "scipy": 0.5,
-        "matplotlib": 0.8,
-        "sklearn": 0.5,
-        "transformers": 1.5,
-    }
-
-    for root, dirs, files in os.walk(repo_path):
-        dirs[:] = [d for d in dirs if not d.startswith(('.', '__pycache__', 'node_modules'))]
-        for fname in files:
-            if not fname.endswith(".py"):
-                continue
-
-            fpath = os.path.join(root, fname)
-            rel_path = os.path.relpath(fpath, repo_path)
-
-            try:
-                with open(fpath) as f:
-                    lines = f.readlines()
-            except (PermissionError, UnicodeDecodeError):
-                continue
-
-            for i, line in enumerate(lines):
-                stripped = line.strip()
-                if stripped.startswith("import ") or stripped.startswith("from "):
-                    for heavy, est_time in heavy_imports.items():
-                        if re.search(rf"\b{re.escape(heavy)}\b", stripped):
-                            bottlenecks.append(Bottleneck(
-                                category="import",
-                                name=f"{rel_path}:{i+1}: import {heavy}",
-                                duration_s=est_time,
-                                severity="info" if est_time < 1.0 else "warning",
-                                recommendation=f"Heavy import ({heavy} ~{est_time}s). Consider lazy import or conditional import.",
-                                file_path=rel_path,
-                                line_number=i + 1
-                            ))
-
-    return bottlenecks
-
-
-# ── Report Generation ──────────────────────────────────────────────
-
-def severity_sort_key(b: Bottleneck) -> Tuple[int, float]:
-    """Sort by severity then duration."""
-    sev_order = {"critical": 0, "warning": 1, "info": 2}
-    return (sev_order.get(b.severity, 3), -b.duration_s)
-
-
-def generate_report(repo_path: str) -> PerfReport:
-    """Run all analyses and generate a performance report."""
-    report = PerfReport(
-        timestamp=datetime.now(timezone.utc).isoformat(),
-        repo_path=os.path.abspath(repo_path)
-    )
-
-    # Collect all bottlenecks
-    all_bottlenecks = []
-
-    print("Scanning for slow tests (pytest cache)...")
-    all_bottlenecks.extend(find_slow_tests_pytest(repo_path))
-
-    print("Scanning for slow test patterns...")
-    all_bottlenecks.extend(find_slow_tests_by_scan(repo_path))
-
-    print("Analyzing build artifacts...")
-    all_bottlenecks.extend(analyze_build_artifacts(repo_path))
-
-    print("Analyzing Makefiles...")
-    all_bottlenecks.extend(analyze_makefile_targets(repo_path))
-
-    print("Analyzing CI workflows...")
-    all_bottlenecks.extend(analyze_github_actions(repo_path))
-    all_bottlenecks.extend(analyze_gitea_ci(repo_path))
-
-    print("Scanning for heavy imports...")
-    all_bottlenecks.extend(find_slow_imports(repo_path))
-
-    # Sort by severity and duration
-    all_bottlenecks.sort(key=severity_sort_key)
-    report.bottlenecks = all_bottlenecks[:TOP_N_BOTTLENECKS * 2]  # Keep more for stats
-
-    # Compute summary
-    by_category = defaultdict(list)
-    for b in all_bottlenecks:
-        by_category[b.category].append(b)
-
-    report.summary = {
-        "total_bottlenecks": len(all_bottlenecks),
-        "critical": sum(1 for b in all_bottlenecks if b.severity == "critical"),
-        "warning": sum(1 for b in all_bottlenecks if b.severity == "warning"),
-        "info": sum(1 for b in all_bottlenecks if b.severity == "info"),
-        "estimated_total_slowdown_s": sum(b.duration_s for b in all_bottlenecks),
-        "by_category": {cat: len(items) for cat, items in by_category.items()},
-    }
-
-    report.test_stats = {
-        "slow_tests": len(by_category.get("test", [])),
-        "total_estimated_s": sum(b.duration_s for b in by_category.get("test", [])),
-    }
-
-    report.build_stats = {
-        "build_issues": len(by_category.get("build", [])),
-        "total_estimated_s": sum(b.duration_s for b in by_category.get("build", [])),
-    }
-
-    report.ci_stats = {
-        "ci_issues": len(by_category.get("ci", [])),
-        "total_estimated_s": sum(b.duration_s for b in by_category.get("ci", [])),
-    }
-
-    return report
-
-
-def format_markdown(report: PerfReport) -> str:
-    """Format report as markdown."""
-    lines = []
-    lines.append(f"# Performance Bottleneck Report")
-    lines.append(f"")
-    lines.append(f"Generated: {report.timestamp}")
-    lines.append(f"Repository: {report.repo_path}")
-    lines.append(f"")
-
-    # Summary
-    s = report.summary
-    lines.append(f"## Summary")
-    lines.append(f"")
-    lines.append(f"- **Total bottlenecks:** {s['total_bottlenecks']}")
-    lines.append(f"- **Critical:** {s['critical']} | **Warning:** {s['warning']} | **Info:** {s['info']}")
-    lines.append(f"- **Estimated total slowdown:** {s['estimated_total_slowdown_s']:.1f}s")
-    lines.append(f"- **By category:** {', '.join(f'{k}: {v}' for k, v in s['by_category'].items())}")
-    lines.append(f"")
-
-    # Top bottlenecks
-    lines.append(f"## Top {min(TOP_N_BOTTLENECKS, len(report.bottlenecks))} Bottlenecks")
-    lines.append(f"")
-
-    for i, b in enumerate(report.bottlenecks[:TOP_N_BOTTLENECKS], 1):
-        icon = {"critical": "🔴", "warning": "🟡", "info": "🔵"}.get(b.severity, "⚪")
-        loc = f" ({b.file_path}:{b.line_number})" if b.file_path and b.line_number else (f" ({b.file_path})" if b.file_path else "")
-        lines.append(f"{i}. {icon} **{b.category}** — {b.name}{loc}")
-        lines.append(f"   - Duration: ~{b.duration_s:.1f}s | Severity: {b.severity}")
-        lines.append(f"   - Fix: {b.recommendation}")
-        lines.append(f"")
-
-    # Category breakdowns
-    for cat in ["test", "build", "ci", "import"]:
-        items = [b for b in report.bottlenecks if b.category == cat]
-        if items:
-            lines.append(f"## {cat.title()} Bottlenecks")
-            lines.append(f"")
-            for b in items:
-                icon = {"critical": "🔴", "warning": "🟡", "info": "🔵"}.get(b.severity, "⚪")
-                loc = f" ({b.file_path}:{b.line_number})" if b.file_path and b.line_number else (f" ({b.file_path})" if b.file_path else "")
-                lines.append(f"- {icon} {b.name}{loc} — ~{b.duration_s:.1f}s — {b.recommendation}")
-            lines.append(f"")
-
-    return "\n".join(lines)
-
-
-
-# ── Main ───────────────────────────────────────────────────────────
-
-def main():
-    parser = argparse.ArgumentParser(description="Performance Bottleneck Finder")
-    parser.add_argument("--repo", default=".", help="Path to repository to analyze")
-    parser.add_argument("--json", action="store_true", help="Output as JSON")
-    parser.add_argument("--report", help="Write markdown report to file")
-    parser.add_argument("--threshold", type=float, default=SLOW_TEST_THRESHOLD_S,
-                        help="Slow test threshold in seconds")
-    args = parser.parse_args()
-
-    # Apply the --threshold override to the module-level constant read by find_slow_tests_pytest
-    globals()["SLOW_TEST_THRESHOLD_S"] = args.threshold
-
-    if not os.path.isdir(args.repo):
-        print(f"Error: {args.repo} is not a directory", file=sys.stderr)
-        sys.exit(1)
-
-    report = generate_report(args.repo)
-
-    if args.json:
-        print(json.dumps(report.to_dict(), indent=2))
-    else:
-        md = format_markdown(report)
-        if args.report:
-            os.makedirs(os.path.dirname(args.report) or ".", exist_ok=True)
-            with open(args.report, "w") as f:
-                f.write(md)
-            print(f"Report written to {args.report}")
-        else:
-            print(md)
-
-    # Exit code: 1 if critical bottlenecks found
-    if report.summary.get("critical", 0) > 0:
-        sys.exit(1)
-
-
-if __name__ == "__main__":
-    main()
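The Makefile and workflow scans in the removed script rely on negative lookaheads such as "pytest not followed by an early-exit flag". When the lookahead contains an alternation, grouping decides what is negated. The sketch below compares an ungrouped and a grouped form; `UNGROUPED`, `GROUPED`, and `flags_line` are hypothetical names used only for this illustration.

```python
import re

# Alternation binds loosest: this negates either ".*-x" or a "--maxfail"
# that starts immediately after "pytest".
UNGROUPED = re.compile(r"pytest(?!.*-x|--maxfail)")

# The group makes ".*" apply to both flags, so either flag anywhere later
# on the same line suppresses the match.
GROUPED = re.compile(r"pytest(?!.*(?:-x|--maxfail))")


def flags_line(pattern: "re.Pattern[str]", line: str) -> bool:
    """Return True when the scan would report this line as a bottleneck."""
    return pattern.search(line) is not None
```

For a line such as `pytest tests/ --maxfail=1`, the ungrouped pattern still reports the line, while the grouped pattern does not. Both forms only look ahead within one line, because `.` does not match a newline by default.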
--- a/scripts/priority_rebalancer.py
+++ b/scripts/priority_rebalancer.py
@@ -1,682 +0,0 @@
-#!/usr/bin/env python3
-"""
-Priority Rebalancer — Re-evaluate issue priorities based on accumulated data.
-
-Reads pipeline outputs, knowledge store, and Gitea issues to suggest
-priority changes based on what the fleet has learned.
-
-Usage:
-    python3 scripts/priority_rebalancer.py --org Timmy_Foundation
-    python3 scripts/priority_rebalancer.py --org Timmy_Foundation --repo compounding-intelligence
-    python3 scripts/priority_rebalancer.py --org Timmy_Foundation --dry-run
-    python3 scripts/priority_rebalancer.py --org Timmy_Foundation --apply
-
-Output:
-    metrics/priority_report.json   — full analysis
-    metrics/priority_suggestions.md — human-readable suggestions
-"""
-
-import argparse
-import json
-import os
-import sys
-from datetime import datetime, timezone, timedelta
-from pathlib import Path
-from typing import Dict, List, Any, Optional, Tuple
-from dataclasses import dataclass, field, asdict
-from collections import Counter, defaultdict
-import urllib.request
-import urllib.error
-
-
-# ============================================================
-# Data Models
-# ============================================================
-
-@dataclass
-class IssueScore:
-    issue_id: int
-    repo: str
-    title: str
-    current_labels: List[str]
-    current_priority: Optional[str]
-    suggested_priority: Optional[str]
-    score: float
-    reasons: List[str]
-    age_days: int
-    comment_count: int
-    assignee: Optional[str]
-    dependencies: List[str] = field(default_factory=list)
-    blocking: List[str] = field(default_factory=list)
-
-
-@dataclass
-class PipelineSignal:
-    source: str  # "knowledge", "metrics", "sessions", "staleness"
-    signal_type: str  # "stale_knowledge", "high_error_rate", "missing_coverage", etc.
-    weight: float  # 0.0 - 1.0
-    detail: str
-    affected_repos: List[str] = field(default_factory=list)
-    affected_issues: List[int] = field(default_factory=list)
-
-
-# ============================================================
-# Gitea API Client
-# ============================================================
-
-class GiteaClient:
-    def __init__(self, base_url: str, token: str):
-        self.base_url = base_url.rstrip("/")
-        self.token = token
-
-    def _request(self, path: str, params: Optional[Dict] = None) -> Any:
-        url = f"{self.base_url}/api/v1{path}"
-        if params:
-            qs = urllib.parse.urlencode({k: v for k, v in params.items() if v is not None})
-            url += f"?{qs}"
-
-        req = urllib.request.Request(url)
-        req.add_header("Authorization", f"token {self.token}")
-        req.add_header("Content-Type", "application/json")
-
-        try:
-            with urllib.request.urlopen(req, timeout=30) as resp:
-                return json.loads(resp.read().decode())
-        except urllib.error.HTTPError as e:
-            print(f"API error {e.code} for {path}: {e.read().decode()[:200]}", file=sys.stderr)
-            return None
-
-    def get_org_repos(self, org: str) -> List[Dict]:
-        repos = []
-        page = 1
-        while True:
-            batch = self._request(f"/orgs/{org}/repos", {"limit": 50, "page": page})
-            if not batch:
-                break
-            repos.extend(batch)
-            if len(batch) < 50:
-                break
-            page += 1
-        return repos
-
-    def get_issues(self, org: str, repo: str, state: str = "open") -> List[Dict]:
-        issues = []
-        page = 1
-        while True:
-            batch = self._request(f"/repos/{org}/{repo}/issues",
-                                  {"state": state, "limit": 50, "page": page, "type": "issues"})
-            if not batch:
-                break
-            issues.extend(batch)
-            if len(batch) < 50:
-                break
-            page += 1
-        return issues
-
-    def add_label_to_issue(self, org: str, repo: str, issue_num: int, label_ids: List[int]) -> bool:
-        url = f"{self.base_url}/api/v1/repos/{org}/{repo}/issues/{issue_num}/labels"
-        data = json.dumps({"labels": label_ids}).encode()
-        req = urllib.request.Request(url, data=data, method="POST")
-        req.add_header("Authorization", f"token {self.token}")
-        req.add_header("Content-Type", "application/json")
-        try:
-            with urllib.request.urlopen(req, timeout=15) as resp:
-                return resp.status == 200
-        except Exception:
-            return False
-
-    def remove_label_from_issue(self, org: str, repo: str, issue_num: int, label_id: int) -> bool:
-        url = f"{self.base_url}/api/v1/repos/{org}/{repo}/issues/{issue_num}/labels/{label_id}"
-        req = urllib.request.Request(url, method="DELETE")
-        req.add_header("Authorization", f"token {self.token}")
-        try:
-            with urllib.request.urlopen(req, timeout=15) as resp:
-                return resp.status in (200, 204)  # Gitea answers a successful label DELETE with 204 No Content
-        except Exception:
-            return False
-
-    def get_repo_labels(self, org: str, repo: str) -> List[Dict]:
-        labels = []
-        page = 1
-        while True:
-            batch = self._request(f"/repos/{org}/{repo}/labels", {"limit": 50, "page": page})
-            if not batch:
-                break
-            labels.extend(batch)
-            if len(batch) < 50:
-                break
-            page += 1
-        return labels
-
-    def add_comment(self, org: str, repo: str, issue_num: int, body: str) -> bool:
-        url = f"{self.base_url}/api/v1/repos/{org}/{repo}/issues/{issue_num}/comments"
-        data = json.dumps({"body": body}).encode()
-        req = urllib.request.Request(url, data=data, method="POST")
-        req.add_header("Authorization", f"token {self.token}")
-        req.add_header("Content-Type", "application/json")
-        try:
-            with urllib.request.urlopen(req, timeout=15) as resp:
-                return resp.status == 201
-        except Exception:
-            return False
-
-
-# ============================================================
-# Pipeline Signal Collectors
-# ============================================================
-
-def collect_knowledge_signals(knowledge_dir: str) -> List[PipelineSignal]:
-    """Analyze knowledge store for coverage gaps and staleness."""
-    signals = []
-    index_path = os.path.join(knowledge_dir, "index.json")
-
-    if not os.path.exists(index_path):
-        signals.append(PipelineSignal(
-            source="knowledge",
-            signal_type="missing_index",
-            weight=0.8,
-            detail="knowledge/index.json not found — no knowledge base exists"
-        ))
-        return signals
-
-    try:
-        with open(index_path) as f:
-            data = json.load(f)
-    except (json.JSONDecodeError, IOError) as e:
-        signals.append(PipelineSignal(
-            source="knowledge",
-            signal_type="corrupt_index",
-            weight=0.9,
-            detail=f"knowledge/index.json is corrupt: {e}"
-        ))
-        return signals
-
-    facts = data.get("facts", [])
-    total = len(facts)
-
-    if total == 0:
-        signals.append(PipelineSignal(
-            source="knowledge",
-            signal_type="empty_knowledge",
-            weight=0.7,
-            detail="Knowledge store has 0 facts — harvester not running or not finding sessions"
-        ))
-        return signals
-
-    # Check staleness
-    stale_count = 0
-    missing_source = 0
-    for fact in facts:
-        status = fact.get("status", "unknown")
-        if status == "stale":
-            stale_count += 1
-        elif status in ("missing_source", "no_source"):
-            missing_source += 1
-
-    if stale_count > 0:
-        signals.append(PipelineSignal(
-            source="knowledge",
-            signal_type="stale_knowledge",
-            weight=min(1.0, stale_count / max(1, total)),
-            detail=f"{stale_count}/{total} facts are stale (source files changed)"
-        ))
-
-    if missing_source > 0:
-        signals.append(PipelineSignal(
-            source="knowledge",
-            signal_type="missing_sources",
-            weight=min(1.0, missing_source / max(1, total)),
-            detail=f"{missing_source}/{total} facts have missing source files"
-        ))
-
-    # Coverage by repo
-    repo_counts = Counter(f.get("repo", "unknown") for f in facts)
-    if len(repo_counts) < 3:
-        signals.append(PipelineSignal(
-            source="knowledge",
-            signal_type="low_coverage",
-            weight=0.5,
-            detail=f"Knowledge covers only {len(repo_counts)} repos — expand harvester scope",
-            affected_repos=list(repo_counts.keys())
-        ))
-
-    return signals
-
-
-def collect_staleness_signals(scripts_dir: str, knowledge_dir: str) -> List[PipelineSignal]:
-    """Run staleness checker if available."""
-    signals = []
-    checker = os.path.join(scripts_dir, "knowledge_staleness_check.py")
-    index_path = os.path.join(knowledge_dir, "index.json")
-
-    if not os.path.exists(checker) or not os.path.exists(index_path):
-        return signals
-
-    try:
-        import subprocess
-        result = subprocess.run(
-            [sys.executable, checker, "--index", index_path, "--json"],
-            capture_output=True, text=True, timeout=30
-        )
-        if result.returncode == 0:
-            data = json.loads(result.stdout)
-            stale = data.get("stale_count", 0)
-            total = data.get("total", 0)
-            if stale > 0:
-                signals.append(PipelineSignal(
-                    source="staleness",
-                    signal_type="stale_knowledge",
-                    weight=min(1.0, stale / max(1, total)),
-                    detail=f"Staleness checker found {stale}/{total} stale entries"
-                ))
-    except Exception:
-        pass
-
-    return signals
-
-
-def collect_metrics_signals(metrics_dir: str) -> List[PipelineSignal]:
-    """Analyze metrics directory for pipeline health."""
-    signals = []
-
-    if not os.path.isdir(metrics_dir):
-        return signals
-
-    files = [f for f in os.listdir(metrics_dir) if f != ".gitkeep"]
-    if not files:  # Nothing besides the .gitkeep placeholder
-        signals.append(PipelineSignal(
-            source="metrics",
-            signal_type="no_metrics",
-            weight=0.6,
-            detail="Metrics directory is empty — measurer pipeline not producing output"
-        ))
-
-    return signals
-
-
-# ============================================================
-# Priority Scoring Engine
-# ============================================================
-
-PRIORITY_LEVELS = ["P0", "P1", "P2", "P3", "none"]
-PRIORITY_LABELS = {"p0": "P0", "p1": "P1", "p2": "P2", "p3": "P3",
-                   "priority:critical": "P0", "priority:high": "P1",
-                   "priority:medium": "P2", "priority:low": "P3"}
-
-
-def extract_priority(labels: List[str]) -> Optional[str]:
-    """Extract priority level from issue labels."""
-    for label in labels:
-        lower = label.lower().strip()
-        if lower in PRIORITY_LABELS:
-            return PRIORITY_LABELS[lower]
-    return None
-
-
-def compute_issue_score(
-    issue: Dict,
-    repo: str,
-    signals: List[PipelineSignal],
-    now: datetime
-) -> IssueScore:
-    """Compute priority score for a single issue."""
-
-    labels = [l.get("name", "") if isinstance(l, dict) else l for l in issue.get("labels", [])]
-    title = issue.get("title", "")
-    issue_id = issue.get("number", 0)
-    current_priority = extract_priority(labels)
-
-    # Parse dates
-    created_str = issue.get("created_at", "")
-    try:
-        created = datetime.fromisoformat(created_str.replace("Z", "+00:00"))
-        age_days = (now - created.astimezone(timezone.utc).replace(tzinfo=None)).days  # convert to UTC before dropping the offset
-    except (ValueError, AttributeError):
-        age_days = 0
-
-    comment_count = issue.get("comments", 0)
-    assignee = None
-    assignees = issue.get("assignees") or []
-    if assignees:
-        assignee = assignees[0].get("login") if isinstance(assignees[0], dict) else str(assignees[0])
-
-    # Base score
-    score = 0.0
-    reasons = []
-
-    # Age factor: older issues drift down unless actively discussed
-    if age_days > 90 and comment_count < 2:
-        score -= 15
-        reasons.append(f"Dormant: {age_days} days old with only {comment_count} comments")
-    elif age_days > 30:
-        score -= 5
-        reasons.append(f"Aging: {age_days} days old")
-
-    # Activity factor: recent discussion suggests urgency
-    if comment_count > 5:
-        score += 10
-        reasons.append(f"Active discussion: {comment_count} comments")
-    elif comment_count > 0:
-        score += 3
-
-    # Assignment factor: unassigned issues need triage
-    if not assignee:
-        score += 5
-        reasons.append("Unassigned — needs triage")
-
-    # Pipeline signal alignment
-    for signal in signals:
-        title_lower = title.lower()
-        if signal.signal_type == "stale_knowledge" and "stale" in title_lower:
-            score += signal.weight * 20
-            reasons.append(f"Matches signal: {signal.detail}")
-        elif signal.signal_type == "empty_knowledge" and ("harvester" in title_lower or "knowledge" in title_lower):
-            score += signal.weight * 25
-            reasons.append(f"Critical gap: {signal.detail}")
-        elif signal.signal_type == "no_metrics" and "measur" in title_lower:
-            score += signal.weight * 15
-            reasons.append(f"Pipeline gap: {signal.detail}")
-        elif signal.signal_type == "low_coverage" and any(r.lower() in title_lower for r in signal.affected_repos):
-            score += signal.weight * 10
-            reasons.append(f"Coverage gap: {signal.detail}")
-
-    # Keyword boosts
-    keyword_scores = {
-        "broken": 20, "bug": 15, "fix": 10, "error": 12, "fail": 15,
-        "security": 25, "auth": 20, "data loss": 30, "crash": 25,
-        "blocker": 20, "urgent": 15, "critical": 15,
-        "epic": 8, "feature": -3, "nice to have": -10, "someday": -15
-    }
-    title_lower = title.lower()
-    for keyword, boost in keyword_scores.items():
-        if keyword in title_lower:
-            score += boost
-            if boost > 0:
-                reasons.append(f"Keyword match: '{keyword}' (+{boost})")
-
-    # Label-based adjustments
-    for label in labels:
-        lower = label.lower()
-        if lower == "pipeline":
-            score += 5  # Pipeline issues are infrastructure
-        elif lower == "bug":
-            score += 12
-        elif lower == "enhancement":
-            score -= 2
-        elif lower == "documentation":
-            score -= 5
-        elif "epic" in lower:
-            score += 3
-
-    # Determine suggested priority
-    if score >= 40:
-        suggested = "P0"
-    elif score >= 25:
-        suggested = "P1"
-    elif score >= 10:
-        suggested = "P2"
-    elif score >= 0:
-        suggested = "P3"
-    else:
-        suggested = None  # Consider closing or deprioritizing
-
-    return IssueScore(
-        issue_id=issue_id,
-        repo=repo,
-        title=title,
-        current_labels=labels,
-        current_priority=current_priority,
-        suggested_priority=suggested,
-        score=round(score, 1),
-        reasons=reasons if reasons else ["No strong signals"],
-        age_days=age_days,
-        comment_count=comment_count,
-        assignee=assignee
-    )
-
-
-# ============================================================
-# Report Generation
-# ============================================================
-
-def generate_report(
-    scores: List[IssueScore],
-    signals: List[PipelineSignal],
-    org: str,
-    repos_scanned: List[str]
-) -> Dict[str, Any]:
-    """Generate the full priority report."""
-    now = datetime.now(timezone.utc).isoformat()
-
-    # Categorize changes
-    upgrades = [s for s in scores if s.suggested_priority and s.current_priority and
-                PRIORITY_LEVELS.index(s.suggested_priority) < PRIORITY_LEVELS.index(s.current_priority)]
-    downgrades = [s for s in scores if s.suggested_priority and s.current_priority and
-                  PRIORITY_LEVELS.index(s.suggested_priority) > PRIORITY_LEVELS.index(s.current_priority)]
-    new_assignments = [s for s in scores if s.suggested_priority and not s.current_priority]
-    no_change = [s for s in scores if s.suggested_priority == s.current_priority]
-
-    return {
-        "generated_at": now,
-        "org": org,
-        "repos_scanned": repos_scanned,
-        "total_issues": len(scores),
-        "signals": [asdict(s) for s in signals],
-        "summary": {
-            "suggested_upgrades": len(upgrades),
-            "suggested_downgrades": len(downgrades),
-            "suggested_new_priorities": len(new_assignments),
-            "unchanged": len(no_change)
-        },
-        "top_priority": [asdict(s) for s in sorted(scores, key=lambda x: x.score, reverse=True)[:10]],
-        "upgrades": [asdict(s) for s in upgrades],
-        "downgrades": [asdict(s) for s in downgrades],
-        "new_assignments": [asdict(s) for s in new_assignments],
-        "all_scores": [asdict(s) for s in sorted(scores, key=lambda x: x.score, reverse=True)]
-    }
-
-
-def generate_markdown_report(report: Dict[str, Any]) -> str:
-    """Generate human-readable markdown report."""
-    lines = []
-    lines.append("# Priority Rebalancer Report")
-    lines.append(f"**Generated:** {report['generated_at']}")
-    lines.append(f"**Org:** {report['org']}")
-    lines.append(f"**Repos scanned:** {', '.join(report['repos_scanned'])}")
-    lines.append(f"**Issues analyzed:** {report['total_issues']}")
-    lines.append("")
-
-    # Signals
-    if report["signals"]:
-        lines.append("## Pipeline Signals")
-        for sig in report["signals"]:
-            weight_bar = "█" * int(sig["weight"] * 10) + "░" * (10 - int(sig["weight"] * 10))
-            lines.append(f"- [{weight_bar}] **{sig['source']}/{sig['signal_type']}** — {sig['detail']}")
-        lines.append("")
-
-    # Summary
-    s = report["summary"]
-    lines.append("## Summary")
-    lines.append(f"- Suggested upgrades: **{s['suggested_upgrades']}**")
-    lines.append(f"- Suggested downgrades: **{s['suggested_downgrades']}**")
-    lines.append(f"- New priority assignments: **{s['suggested_new_priorities']}**")
-    lines.append(f"- Unchanged: **{s['unchanged']}**")
-    lines.append("")
-
-    # Top 10
-    lines.append("## Top 10 by Score")
-    for i, item in enumerate(report["top_priority"][:10], 1):
-        cur = item["current_priority"] or "none"
-        sug = item["suggested_priority"] or "none"
-        arrow = "↑" if PRIORITY_LEVELS.index(sug) < PRIORITY_LEVELS.index(cur) else "↓" if PRIORITY_LEVELS.index(sug) > PRIORITY_LEVELS.index(cur) else "→"
-        lines.append(f"{i}. **[{item['repo']}#{item['issue_id']}]** {item['title']}")
-        lines.append(f"   Score: {item['score']} | Current: {cur} {arrow} Suggested: {sug}")
-        lines.append(f"   Reasons: {'; '.join(item['reasons'][:3])}")
-        lines.append("")
-
-    # Upgrades
-    if report["upgrades"]:
-        lines.append("## Suggested Upgrades")
-        for item in report["upgrades"]:
-            lines.append(f"- **[{item['repo']}#{item['issue_id']}]** {item['title']}")
-            lines.append(f"  {item['current_priority']} → {item['suggested_priority']} (score: {item['score']})")
-            lines.append(f"  {'; '.join(item['reasons'][:2])}")
-        lines.append("")
-
-    # Downgrades
-    if report["downgrades"]:
-        lines.append("## Suggested Downgrades")
-        for item in report["downgrades"]:
-            lines.append(f"- **[{item['repo']}#{item['issue_id']}]** {item['title']}")
-            lines.append(f"  {item['current_priority']} → {item['suggested_priority']} (score: {item['score']})")
-            lines.append(f"  {'; '.join(item['reasons'][:2])}")
-        lines.append("")
-
-    # New assignments
-    if report["new_assignments"]:
-        lines.append("## New Priority Suggestions (currently unlabelled)")
-        for item in report["new_assignments"][:20]:
-            lines.append(f"- **[{item['repo']}#{item['issue_id']}]** {item['title']}")
-            lines.append(f"  Suggested: {item['suggested_priority']} (score: {item['score']})")
-            lines.append(f"  {'; '.join(item['reasons'][:2])}")
-        lines.append("")
-
-    return "\n".join(lines)
-
-
-# ============================================================
-# Main
-# ============================================================
-
-def main():
-    parser = argparse.ArgumentParser(description="Priority Rebalancer — re-score issues based on pipeline data")
-    parser.add_argument("--org", default="Timmy_Foundation", help="Gitea org name")
-    parser.add_argument("--repo", help="Single repo to analyze (default: all)")
-    parser.add_argument("--base-url", default="https://forge.alexanderwhitestone.com", help="Gitea base URL")
-    parser.add_argument("--knowledge-dir", default=None, help="Path to knowledge directory")
-    parser.add_argument("--metrics-dir", default=None, help="Path to metrics directory")
-    parser.add_argument("--scripts-dir", default=None, help="Path to scripts directory")
-    parser.add_argument("--output-dir", default=None, help="Path to output directory")
-    parser.add_argument("--dry-run", action="store_true", help="Show what would change without applying")
-    parser.add_argument("--apply", action="store_true", help="Apply priority changes via API")
-    parser.add_argument("--json", action="store_true", help="Output JSON instead of markdown")
-
-    args = parser.parse_args()
-
-    # Resolve paths relative to script location
-    script_dir = Path(__file__).parent
-    repo_root = script_dir.parent
-
-    knowledge_dir = args.knowledge_dir or str(repo_root / "knowledge")
-    metrics_dir = args.metrics_dir or str(repo_root / "metrics")
-    scripts_dir = args.scripts_dir or str(repo_root / "scripts")
-    output_dir = args.output_dir or str(repo_root / "metrics")
-
-    # Get token
-    token = os.environ.get("GITEA_TOKEN")
-    if not token:
-        token_path = os.path.expanduser("~/.config/gitea/token")
-        if os.path.exists(token_path):
-            with open(token_path) as f:
-                token = f.read().strip()
-
-    if not token:
-        print("Error: No Gitea token found. Set GITEA_TOKEN or create ~/.config/gitea/token", file=sys.stderr)
-        sys.exit(1)
-
-    client = GiteaClient(args.base_url, token)
-    now = datetime.now(timezone.utc).replace(tzinfo=None)  # naive UTC; datetime.utcnow() is deprecated since Python 3.12
-
-    # Collect pipeline signals
-    print("Collecting pipeline signals...", file=sys.stderr)
-    signals = []
-    signals.extend(collect_knowledge_signals(knowledge_dir))
-    signals.extend(collect_staleness_signals(scripts_dir, knowledge_dir))
-    signals.extend(collect_metrics_signals(metrics_dir))
-    print(f"  Found {len(signals)} signals", file=sys.stderr)
-
-    # Get repos
-    if args.repo:
-        repos = [{"name": args.repo}]
-    else:
-        repos = client.get_org_repos(args.org)
-
-    repo_names = [r["name"] for r in repos]
-    print(f"Scanning {len(repo_names)} repos: {', '.join(repo_names[:5])}{'...' if len(repo_names) > 5 else ''}", file=sys.stderr)
-
-    # Score all issues
-    all_scores = []
-    for repo in repos:
-        repo_name = repo["name"]
-        issues = client.get_issues(args.org, repo_name)
-        print(f"  {repo_name}: {len(issues)} open issues", file=sys.stderr)
-
-        for issue in issues:
-            if issue.get("pull_request"):
-                continue
-            score = compute_issue_score(issue, repo_name, signals, now)
-            all_scores.append(score)
-
-    # Generate report
-    report = generate_report(all_scores, signals, args.org, repo_names)
-
-    # Output
-    os.makedirs(output_dir, exist_ok=True)
-
-    if args.json:
-        print(json.dumps(report, indent=2))
-    else:
-        md = generate_markdown_report(report)
-        print(md)
-
-    # Write files
-    report_path = os.path.join(output_dir, "priority_report.json")
-    with open(report_path, "w") as f:
-        json.dump(report, f, indent=2)
-    print(f"\nFull report: {report_path}", file=sys.stderr)
-
-    md_path = os.path.join(output_dir, "priority_suggestions.md")
-    with open(md_path, "w") as f:
-        f.write(generate_markdown_report(report))
-    print(f"Suggestions: {md_path}", file=sys.stderr)
-
-    # Apply changes if requested
-    if args.apply:
-        print("\nApplying priority changes...", file=sys.stderr)
-        applied = 0
-
-        # Get label IDs for priority labels
-        priority_label_map = {}
-        for repo_name in repo_names:
-            labels = client.get_repo_labels(args.org, repo_name)
-            for label in labels:
-                name = label.get("name", "").lower()
-                if name in ("p0", "p1", "p2", "p3"):
-                    priority_label_map[(repo_name, name)] = label["id"]
-
-        for score in all_scores:
-            if score.suggested_priority and score.suggested_priority != score.current_priority:
-                sug_lower = score.suggested_priority.lower()
-                label_id = priority_label_map.get((score.repo, sug_lower))
-                if label_id:
-                    ok = client.add_label_to_issue(args.org, score.repo, score.issue_id, [label_id])
-                    if ok:
-                        applied += 1
-                        print(f"  Applied: [{score.repo}#{score.issue_id}] → {score.suggested_priority}", file=sys.stderr)
-
-                    # Add comment explaining the change
-                    comment = f"**Priority Rebalancer** suggested: **{score.suggested_priority}** (was: {score.current_priority or 'none'})\n\n"
-                    comment += f"Score: {score.score}\n"
-                    comment += "Reasons:\n"
-                    for r in score.reasons[:5]:
-                        comment += f"- {r}\n"
-                    client.add_comment(args.org, score.repo, score.issue_id, comment)
-
-        print(f"Applied {applied} priority changes", file=sys.stderr)
-
-    elif args.dry_run:
-        print(f"\nDry run — {report['summary']['suggested_upgrades'] + report['summary']['suggested_downgrades'] + report['summary']['suggested_new_priorities']} changes would be applied", file=sys.stderr)
-
-
-if __name__ == "__main__":
-    main()
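
Review note on the removed rebalancer: the score-to-priority thresholds in `compute_issue_score` and the arrow logic in `generate_markdown_report` can be read as two small pure functions. The sketch below is a minimal restatement for reference, not the script's actual structure; `suggest_priority` and `classify_change` are hypothetical names introduced here, and only `PRIORITY_LEVELS` and the threshold values come from the removed file.

```python
from typing import Optional

# Same ordering as the removed script: lower index means higher priority.
PRIORITY_LEVELS = ["P0", "P1", "P2", "P3", "none"]


def suggest_priority(score: float) -> Optional[str]:
    """Hypothetical helper: map a score to a bucket using the removed thresholds."""
    if score >= 40:
        return "P0"
    if score >= 25:
        return "P1"
    if score >= 10:
        return "P2"
    if score >= 0:
        return "P3"
    return None  # negative score: candidate for closing or deprioritizing


def classify_change(current: Optional[str], suggested: Optional[str]) -> str:
    """Hypothetical helper: compare two priorities, treating a missing one as 'none'."""
    cur = PRIORITY_LEVELS.index(current or "none")
    sug = PRIORITY_LEVELS.index(suggested or "none")
    if sug < cur:
        return "upgrade"
    if sug > cur:
        return "downgrade"
    return "unchanged"


if __name__ == "__main__":
    for score, current in [(42.0, "P2"), (12.5, None), (-3.0, "P1")]:
        suggested = suggest_priority(score)
        print(score, current, suggested, classify_change(current, suggested))
```

Unlike `generate_report`, which counts unlabelled issues separately as new assignments, this sketch folds them into the same comparison the markdown arrows use.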
--- a/scripts/refactoring_opportunity_finder.py
+++ b/scripts/refactoring_opportunity_finder.py
@@ -1,290 +0,0 @@
-#!/usr/bin/env python3
-"""
-Finds refactoring opportunities in codebases
-
-Engine ID: 10.4
-
-Usage:
-    python3 scripts/refactoring_opportunity_finder.py --output proposals/refactoring_opportunity_finder.json
-    python3 scripts/refactoring_opportunity_finder.py --output proposals/refactoring_opportunity_finder.json --dry-run
-"""
-
-import argparse
-import ast
-import json
-import os
-import sys
-from dataclasses import dataclass, field
-from datetime import datetime, timezone
-from typing import Optional, Tuple
-
-
-# ── Data Classes ────────────────────────────────────────────────────────
-
-@dataclass
-class FileMetrics:
-    """Metrics for a single source file."""
-    path: str
-    lines: int = 0
-    complexity: float = 0.0
-    max_complexity: int = 0
-    functions: int = 0
-    classes: int = 0
-    churn_30d: int = 0
-    churn_90d: int = 0
-    test_coverage: Optional[float] = None
-    refactoring_score: float = 0.0
-
-
-# ── Complexity Analysis ─────────────────────────────────────────────────
-
-class ComplexityVisitor(ast.NodeVisitor):
-    """AST visitor that computes cyclomatic complexity per function."""
-
-    def __init__(self):
-        self.complexities = []
-        self.function_count = 0
-        self.class_count = 0
-        self._current_complexity = 0
-        self._in_function = False
-
-    def visit_FunctionDef(self, node):
-        self.function_count += 1
-        old_complexity = self._current_complexity
-        old_in_function = self._in_function
-        self._current_complexity = 1  # Base complexity
-        self._in_function = True
-
-        self.generic_visit(node)
-
-        self.complexities.append(self._current_complexity)
-        self._current_complexity = old_complexity
-        self._in_function = old_in_function
-
-    visit_AsyncFunctionDef = visit_FunctionDef
-
-    def visit_ClassDef(self, node):
-        self.class_count += 1
-        self.generic_visit(node)
-
-    def visit_If(self, node):
-        if self._in_function:
-            self._current_complexity += 1
-        self.generic_visit(node)
-
-    def visit_For(self, node):
-        if self._in_function:
-            self._current_complexity += 1
-        self.generic_visit(node)
-
-    visit_AsyncFor = visit_For
-
-    def visit_While(self, node):
-        if self._in_function:
-            self._current_complexity += 1
-        self.generic_visit(node)
-
-    def visit_ExceptHandler(self, node):
-        if self._in_function:
-            self._current_complexity += 1
-        self.generic_visit(node)
-
-    def visit_With(self, node):
-        if self._in_function:
-            self._current_complexity += 1
-        self.generic_visit(node)
-
-    visit_AsyncWith = visit_With
-
-    def visit_Assert(self, node):
-        if self._in_function:
-            self._current_complexity += 1
-        self.generic_visit(node)
-
-    def visit_BoolOp(self, node):
-        # Each 'and'/'or' adds a branch
-        if self._in_function:
-            self._current_complexity += len(node.values) - 1
-        self.generic_visit(node)
-
-    def visit_IfExp(self, node):
-        # Ternary expression
-        if self._in_function:
-            self._current_complexity += 1
-        self.generic_visit(node)
-
-
-def compute_file_complexity(filepath: str) -> Tuple[float, int, int, int, int]:
-    """
-    Compute cyclomatic complexity for a Python file.
-
-    Returns:
-        (avg_complexity, max_complexity, function_count, class_count, line_count)
-    """
-    try:
-        with open(filepath, 'r', encoding='utf-8', errors='replace') as f:
-            source = f.read()
-    except (IOError, OSError):
-        return 0.0, 0, 0, 0, 0
-
-    try:
-        tree = ast.parse(source, filename=filepath)
-    except SyntaxError:
-        return 0.0, 0, 0, 0, 0
-
-    visitor = ComplexityVisitor()
-    visitor.visit(tree)
-
-    line_count = len(source.splitlines())
-
-    if not visitor.complexities:
-        # No functions, but might have classes
-        return 0.0, 0, visitor.function_count, visitor.class_count, line_count
-
-    avg = sum(visitor.complexities) / len(visitor.complexities)
-    max_c = max(visitor.complexities)
-
-    return avg, max_c, visitor.function_count, visitor.class_count, line_count
-
-
-# ── Refactoring Score ───────────────────────────────────────────────────
-
-def calculate_refactoring_score(metrics: FileMetrics) -> float:
-    """
-    Calculate a refactoring priority score (0-100) based on file metrics.
-
-    Higher score = higher priority for refactoring.
-    Components:
-        - Complexity (0-30 points): higher avg/max complexity = higher score
-        - Size (0-20 points): larger files = higher score
-        - Churn (0-30 points): more changes recently = higher score
-        - Coverage (0-20 points): lower test coverage = higher score
-    """
-    score = 0.0
-
-    # Complexity component (0-30)
-    # avg * 2 + max * 0.5, capped at 30 (e.g. avg=10 with max=20 → 30 points)
-    complexity_score = min(30.0, (metrics.complexity * 2) + (metrics.max_complexity * 0.5))
-    score += max(0.0, complexity_score)
-
-    # Size component (0-20)
-    # 500+ lines → 20 points
-    size_score = min(20.0, metrics.lines / 25.0)
-    score += max(0.0, size_score)
-
-    # Churn component (0-30)
-    # Weighted: recent churn (30d) counts more than older (90d)
-    churn_score = min(30.0, (metrics.churn_30d * 2) + (metrics.churn_90d * 0.5))
-    score += max(0.0, churn_score)
-
-    # Coverage component (0-20)
-    # Lower coverage → higher score
-    if metrics.test_coverage is not None:
-        # coverage=0 → 20 points, coverage=1 → 0 points
-        coverage_score = (1.0 - metrics.test_coverage) * 20.0
-    else:
-        # No data → assume medium risk (10 points)
-        coverage_score = 10.0
-    score += max(0.0, coverage_score)
-
-    return min(100.0, max(0.0, score))
-
-
-# ── Proposal Generation ─────────────────────────────────────────────────
-
-def scan_directory(directory: str, extensions: tuple = ('.py',)) -> list:
-    """Scan directory for source files."""
-    files = []
-    for root, dirs, filenames in os.walk(directory):
-        # Skip hidden dirs and common non-source dirs
-        dirs[:] = [d for d in dirs if not d.startswith('.') and d not in (
-            '__pycache__', 'node_modules', 'venv', '.venv', 'env',
-            'build', 'dist', '.git', '.tox'
-        )]
-        for fname in filenames:
-            if any(fname.endswith(ext) for ext in extensions):
-                files.append(os.path.join(root, fname))
-    return files
-
-
-def generate_proposals(directory: str = '.', min_score: float = 30.0) -> list:
-    """Generate refactoring proposals by analyzing source files."""
-    proposals = []
-    files = scan_directory(directory)
-
-    for filepath in files:
-        avg, max_c, funcs, classes, lines = compute_file_complexity(filepath)
-
-        if funcs == 0 and classes == 0:
-            continue
-
-        metrics = FileMetrics(
-            path=filepath,
-            lines=lines,
-            complexity=avg,
-            max_complexity=max_c,
-            functions=funcs,
-            classes=classes
-        )
-        score = calculate_refactoring_score(metrics)
-        metrics.refactoring_score = score
-
-        if score >= min_score:
-            reasons = []
-            if max_c > 10:
-                reasons.append(f"high max complexity ({max_c})")
-            if avg > 5:
-                reasons.append(f"high avg complexity ({avg:.1f})")
-            if lines > 300:
-                reasons.append(f"large file ({lines} lines)")
-
-            proposals.append({
-                "title": f"Refactor {os.path.basename(filepath)} (score: {score:.0f})",
-                "description": f"{filepath}: {', '.join(reasons) if reasons else 'general improvement candidate'}",
-                "impact": min(10, int(score / 10)),
-                "effort": min(10, max(1, int(max_c / 3))),
-                "category": "refactoring",
-                "source_engine": "10.4",
-                "timestamp": datetime.now(timezone.utc).isoformat(),
-                "metrics": {
-                    "path": filepath,
-                    "score": round(score, 2),
-                    "avg_complexity": round(avg, 2),
-                    "max_complexity": max_c,
-                    "lines": lines,
-                    "functions": funcs,
-                    "classes": classes
-                }
-            })
-
-    # Sort by score descending
-    proposals.sort(key=lambda p: p.get('metrics', {}).get('score', 0), reverse=True)
-    return proposals
-
-
-# ── CLI ─────────────────────────────────────────────────────────────────
-
-def main():
-    parser = argparse.ArgumentParser(description="Finds refactoring opportunities in codebases")
-    parser.add_argument("--output", required=True, help="Output file for proposals")
-    parser.add_argument("--dry-run", action="store_true", help="Don't write output file")
-    parser.add_argument("--directory", default=".", help="Directory to scan")
-    parser.add_argument("--min-score", type=float, default=30.0, help="Minimum score threshold")
-
-    args = parser.parse_args()
-
-    proposals = generate_proposals(args.directory, args.min_score)
-
-    if not args.dry_run:
-        os.makedirs(os.path.dirname(args.output) or '.', exist_ok=True)
-        with open(args.output, "w") as f:
-            json.dump({"proposals": proposals}, f, indent=2)
-        print(f"Generated {len(proposals)} proposals -> {args.output}")
-    else:
-        print(f"Would generate {len(proposals)} proposals")
-        for p in proposals:
-            print(f"  - {p['title']}")
-
-
-if __name__ == "__main__":
-    main()
--- a/scripts/sampler.py
+++ b/scripts/sampler.py
@@ -1,353 +0,0 @@
-#!/usr/bin/env python3
-"""
-sampler.py — Score and rank sessions by harvest value.
-
-With 20k+ sessions on disk, we can't harvest all at once. This script
-scores each session by how likely it is to contain valuable knowledge,
-so the harvester processes the best ones first.
-
-Scoring strategy:
-  - Recency: last 7d=3pts, last 30d=2pts, older=1pt
-  - Length: >50 messages=3pts, >20=2pts, <=20=1pt
-  - Repo uniqueness: first session for a repo=5pts, repeat repo=1pt, no repo=0pts
-  - Outcome: failure=3pts (most to learn), success=2pts, unknown=1pt
-  - Tool calls: >10 tool invocations=2pts (complex sessions)
-
-Usage:
-    python3 sampler.py --count 100                          # Top 100 sessions
-    python3 sampler.py --repo the-nexus --count 20          # Top 20 for a repo
-    python3 sampler.py --since 2026-04-01                   # All sessions since date
-    python3 sampler.py --count 50 --min-score 8             # Only high-value sessions
-    python3 sampler.py --count 100 --output sample.json     # Save to file
-"""
-
-import argparse
-import json
-import os
-import sys
-import time
-from datetime import datetime, timedelta, timezone
-from pathlib import Path
-from typing import Optional
-
-
-# --- Fast session scanning (no full parse) ---
-
-def scan_session_fast(path: str) -> dict:
-    """Extract scoring metadata from a session without parsing the full JSONL.
-    
-    Reads only the first line and the last 8 KB (up to 30 tail lines), and
-    estimates the message count, so 20k sessions scan in seconds, not minutes.
-    """
-    meta = {
-        'path': path,
-        'message_count': 0,
-        'has_tool_calls': False,
-        'tool_call_count': 0,
-        'first_timestamp': '',
-        'last_timestamp': '',
-        'is_failure': False,
-        'repos_mentioned': set(),
-        'first_role': '',
-        'last_content_preview': '',
-    }
-    
-    try:
-        file_size = os.path.getsize(path)
-        if file_size == 0:
-            return meta
-        
-        with open(path, 'r', encoding='utf-8', errors='replace') as f:
-            # Read first line for timestamp + role
-            first_line = f.readline().strip()
-            if first_line:
-                try:
-                    first_msg = json.loads(first_line)
-                    meta['first_timestamp'] = first_msg.get('timestamp', '')
-                    meta['first_role'] = first_msg.get('role', '')
-                except json.JSONDecodeError:
-                    pass
-            
-            # Fast line count + collect tail lines
-            # For the tail, seek to near end of file
-            tail_lines = []
-            line_count = 1  # already read first
-            
-            if file_size > 8192:
-                # Seek to last 8KB for tail sampling
-                f.seek(max(0, file_size - 8192))
-                f.readline()  # skip partial line
-                for line in f:
-                    line = line.strip()
-                    if line:
-                        tail_lines.append(line)
-                    line_count += 1
-                # We lost the exact count for big files — estimate from file size
-                # Average JSONL line is ~500 bytes
-                if line_count < 100:
-                    line_count = max(line_count, file_size // 500)
-            else:
-                # Small file — read all
-                for line in f:
-                    line = line.strip()
-                    if line:
-                        tail_lines.append(line)
-                    line_count += 1
-            
-            meta['message_count'] = line_count
-            
-            # Parse tail lines for outcome, tool calls, repos
-            for line in tail_lines[-30:]:  # last 30 non-empty lines
-                try:
-                    msg = json.loads(line)
-                    
-                    # Track last timestamp
-                    ts = msg.get('timestamp', '')
-                    if ts:
-                        meta['last_timestamp'] = ts
-                    
-                    # Count tool calls
-                    if msg.get('tool_calls'):
-                        meta['has_tool_calls'] = True
-                        meta['tool_call_count'] += len(msg['tool_calls'])
-                    
-                    # Detect failure signals in content
-                    content = ''
-                    if isinstance(msg.get('content'), str):
-                        content = msg['content'].lower()
-                    elif isinstance(msg.get('content'), list):
-                        for part in msg['content']:
-                            if isinstance(part, dict) and part.get('type') == 'text':
-                                content += part.get('text', '').lower()
-                    
-                    if content:
-                        meta['last_content_preview'] = content[:200]
-                        failure_signals = ['error', 'failed', 'cannot', 'unable', 
-                                          'exception', 'traceback', 'rejected', 'denied']
-                        if any(sig in content for sig in failure_signals):
-                            meta['is_failure'] = True
-                    
-                    # Extract repo references from tool call arguments
-                    if msg.get('tool_calls'):
-                        for tc in msg['tool_calls']:
-                            args = tc.get('function', {}).get('arguments', '')
-                            if isinstance(args, str):
-                                # Look for repo patterns
-                                for pattern in ['Timmy_Foundation/', 'Rockachopa/', 'compounding-intelligence', 'the-nexus', 'timmy-home', 'hermes-agent', 'the-beacon', 'the-door']:
-                                    if pattern in args:
-                                        repo = pattern.rstrip('/')
-                                        meta['repos_mentioned'].add(repo)
-                
-                except json.JSONDecodeError:
-                    continue
-    
-    except (IOError, OSError):
-        pass
-    
-    meta['repos_mentioned'] = list(meta['repos_mentioned'])
-    return meta
-
-
-# --- Filename timestamp parsing ---
-
-def parse_session_timestamp(filename: str) -> Optional[datetime]:
-    """Parse timestamp from session filename.
-    
-    Common formats:
-      session_20260413_123456_hash.jsonl
-      20260413_123456_hash.jsonl
-    """
-    stem = Path(filename).stem
-    parts = stem.split('_')
-    
-    # Try session_YYYYMMDD_HHMMSS format
-    for i, part in enumerate(parts):
-        if len(part) == 8 and part.isdigit():
-            date_part = part
-            time_part = parts[i + 1] if i + 1 < len(parts) and len(parts[i + 1]) == 6 else '000000'
-            try:
-                return datetime.strptime(f"{date_part}_{time_part}", '%Y%m%d_%H%M%S').replace(tzinfo=timezone.utc)
-            except ValueError:
-                continue
-    
-    # No timestamp in the filename; the caller falls back to file modification time
-    return None
-
-
-# --- Scoring ---
-
-def score_session(meta: dict, now: datetime, seen_repos: set) -> tuple[int, dict]:
-    """Score a session for harvest value. Returns (score, breakdown)."""
-    score = 0
-    breakdown = {}
-    
-    # 1. Recency
-    ts = parse_session_timestamp(os.path.basename(meta['path']))
-    if ts is None:
-        # Fallback to mtime
-        try:
-            ts = datetime.fromtimestamp(os.path.getmtime(meta['path']), tz=timezone.utc)
-        except OSError:
-            ts = now - timedelta(days=365)
-    
-    age_days = (now - ts).days
-    if age_days <= 7:
-        recency = 3
-    elif age_days <= 30:
-        recency = 2
-    else:
-        recency = 1
-    score += recency
-    breakdown['recency'] = recency
-    
-    # 2. Length
-    count = meta['message_count']
-    if count > 50:
-        length = 3
-    elif count > 20:
-        length = 2
-    else:
-        length = 1
-    score += length
-    breakdown['length'] = length
-    
-    # 3. Repo uniqueness (first session mentioning a repo gets bonus)
-    repo_score = 0
-    for repo in meta.get('repos_mentioned', []):
-        if repo not in seen_repos:
-            seen_repos.add(repo)
-            repo_score = max(repo_score, 5)
-        else:
-            repo_score = max(repo_score, 1)
-    score += repo_score
-    breakdown['repo_unique'] = repo_score
-    
-    # 4. Outcome
-    if meta.get('is_failure'):
-        outcome = 3
-    elif meta.get('last_content_preview', '').strip():
-        outcome = 2  # has some content = likely completed
-    else:
-        outcome = 1
-    score += outcome
-    breakdown['outcome'] = outcome
-    
-    # 5. Tool calls
-    if meta.get('tool_call_count', 0) > 10:
-        tool = 2
-    else:
-        tool = 0
-    score += tool
-    breakdown['tool_calls'] = tool
-    
-    return score, breakdown
-
-
-# --- Main ---
-
-def main():
-    parser = argparse.ArgumentParser(description="Score and rank sessions for harvesting")
-    parser.add_argument('--sessions-dir', default=os.path.expanduser('~/.hermes/sessions'),
-                        help='Directory containing session files')
-    parser.add_argument('--count', type=int, default=100, help='Number of top sessions to return')
-    parser.add_argument('--repo', default='', help='Filter to sessions mentioning this repo')
-    parser.add_argument('--since', default='', help='Only score sessions after this date (YYYY-MM-DD)')
-    parser.add_argument('--min-score', type=int, default=0, help='Minimum score threshold')
-    parser.add_argument('--output', default='', help='Output file (JSON). Default: stdout')
-    parser.add_argument('--format', choices=['json', 'paths', 'table'], default='table',
-                        help='Output format: json (full), paths (one per line), table (human)')
-    parser.add_argument('--top-percent', type=float, default=0, help='Return top N%% instead of --count')
-    
-    args = parser.parse_args()
-    
-    sessions_dir = Path(args.sessions_dir)
-    if not sessions_dir.is_dir():
-        print(f"ERROR: Sessions directory not found: {sessions_dir}", file=sys.stderr)
-        sys.exit(1)
-    
-    # Find all JSONL files
-    print(f"Scanning {sessions_dir}...", file=sys.stderr)
-    t0 = time.time()
-    
-    session_files = list(sessions_dir.glob('*.jsonl'))
-    total = len(session_files)
-    print(f"Found {total} session files", file=sys.stderr)
-    
-    # Parse since date
-    since_dt = None
-    if args.since:
-        since_dt = datetime.strptime(args.since, '%Y-%m-%d').replace(tzinfo=timezone.utc)
-    
-    # Score all sessions
-    now = datetime.now(timezone.utc)
-    seen_repos = set()  # Track repos for uniqueness scoring
-    scored = []
-    
-    for i, sf in enumerate(session_files):
-        # Date filter (fast path: check filename first)
-        if since_dt:
-            ts = parse_session_timestamp(sf.name)
-            if ts and ts < since_dt:
-                continue
-        
-        meta = scan_session_fast(str(sf))
-        
-        # Repo filter
-        if args.repo:
-            repos = meta.get('repos_mentioned', [])
-            if args.repo.lower() not in [r.lower() for r in repos]:
-                # Also check filename
-                if args.repo.lower() not in sf.name.lower():
-                    continue
-        
-        score, breakdown = score_session(meta, now, seen_repos)
-        
-        if score >= args.min_score:
-            scored.append({
-                'path': str(sf),
-                'filename': sf.name,
-                'score': score,
-                'breakdown': breakdown,
-                'message_count': meta['message_count'],
-                'repos': meta['repos_mentioned'],
-                'is_failure': meta['is_failure'],
-            })
-        
-        if (i + 1) % 5000 == 0:
-            elapsed = time.time() - t0
-            print(f"  Scanned {i + 1}/{total} ({elapsed:.1f}s)", file=sys.stderr)
-    
-    elapsed = time.time() - t0
-    print(f"Scored {len(scored)} sessions in {elapsed:.1f}s", file=sys.stderr)
-    
-    # Sort by score descending
-    scored.sort(key=lambda x: x['score'], reverse=True)
-    
-    # Apply count or percent
-    if args.top_percent > 0:
-        count = max(1, int(len(scored) * args.top_percent / 100))
-    else:
-        count = args.count
-    scored = scored[:count]
-    
-    # Output
-    if args.output:
-        with open(args.output, 'w', encoding='utf-8') as f:
-            json.dump(scored, f, indent=2)
-        print(f"Wrote {len(scored)} sessions to {args.output}", file=sys.stderr)
-    elif args.format == 'json':
-        json.dump(scored, sys.stdout, indent=2)
-    elif args.format == 'paths':
-        for s in scored:
-            print(s['path'])
-    else:  # table
-        print(f"{'SCORE':>5}  {'MSGS':>5}  {'REPOS':<25}  {'FILE'}")
-        print(f"{'-'*5}  {'-'*5}  {'-'*25}  {'-'*40}")
-        for s in scored:
-            repos = ', '.join(s['repos'][:2]) if s['repos'] else '-'
-            fail = ' FAIL' if s['is_failure'] else ''
-            print(f"{s['score']:>5}  {s['message_count']:>5}  {repos:<25}  {s['filename'][:40]}{fail}")
-
-
-if __name__ == '__main__':
-    main()
--- a/scripts/session_metadata.py
+++ b/scripts/session_metadata.py
@@ -1,276 +0,0 @@
-#!/usr/bin/env python3
-"""
-session_metadata.py - Extract structured metadata from Hermes session transcripts.
-Works alongside session_reader.py to provide higher-level session analysis.
-"""
-
-import json
-import re
-import sys
-from dataclasses import dataclass, asdict
-from datetime import datetime
-from pathlib import Path
-from typing import Dict, List, Optional, Any
-
-# Import from session_reader (the canonical reader)
-from session_reader import read_session
-
-
-@dataclass
-class SessionSummary:
-    """Structured summary of a Hermes session transcript."""
-    session_id: str
-    model: str
-    repo: str
-    outcome: str
-    message_count: int
-    tool_calls: int
-    duration_estimate: str
-    key_actions: List[str]
-    errors_encountered: List[str]
-    start_time: Optional[str] = None
-    end_time: Optional[str] = None
-    total_tokens_estimate: int = 0
-    user_messages: int = 0
-    assistant_messages: int = 0
-    tool_outputs: int = 0
-
-
-def extract_session_metadata(file_path: str) -> SessionSummary:
-    """
-    Extract structured metadata from a Hermes session JSONL transcript.
-    Uses session_reader.read_session() for file reading.
-    """
-    session_id = Path(file_path).stem
-    messages = []
-    model = "unknown"
-    repo = "unknown"
-    tool_calls_count = 0
-    key_actions = []
-    errors = []
-    start_time = None
-    end_time = None
-    total_tokens = 0
-    
-    # Common repo patterns to look for
-    repo_patterns = [
-        r"(?:the-nexus|compounding-intelligence|timmy-config|hermes-agent)",
-        r"(?:forge\.alexanderwhitestone\.com/([^/]+/[^/\s]+))",
-        r"(?:github\.com/([^/]+/[^/\s]+))",
-        r"(?:Timmy_Foundation/([^/\s]+))",
-    ]
-    
-    try:
-        # Use the canonical reader from session_reader.py
-        messages = read_session(file_path)
-    except FileNotFoundError:
-        return SessionSummary(
-            session_id=session_id,
-            model="unknown",
-            repo="unknown",
-            outcome="failure",
-            message_count=0,
-            tool_calls=0,
-            duration_estimate="0m",
-            key_actions=[],
-            errors_encountered=[f"File not found: {file_path}"]
-        )
-    
-    # Process messages for metadata
-    for entry in messages:
-        # Extract model from assistant messages
-        if entry.get("role") == "assistant" and entry.get("model"):
-            model = entry["model"]
-        
-        # Extract timestamps
-        if entry.get("timestamp"):
-            ts = entry["timestamp"]
-            if start_time is None:
-                start_time = ts
-            end_time = ts
-        
-        # Count tool calls
-        if entry.get("tool_calls"):
-            tool_calls_count += len(entry["tool_calls"])
-            for tc in entry["tool_calls"]:
-                if tc.get("function", {}).get("name"):
-                    action = f"{tc['function']['name']}"
-                    if action not in key_actions:
-                        key_actions.append(action)
-        
-        # Estimate tokens from content length
-        content = entry.get("content", "")
-        if isinstance(content, str):
-            total_tokens += len(content.split())
-        elif isinstance(content, list):
-            for item in content:
-                if isinstance(item, dict) and "text" in item:
-                    total_tokens += len(item["text"].split())
-        
-        # Look for repo mentions in content
-        if entry.get("content"):
-            content_str = str(entry["content"])
-            for pattern in repo_patterns:
-                match = re.search(pattern, content_str, re.IGNORECASE)
-                if match:
-                    if match.groups():
-                        repo = match.group(1)
-                    else:
-                        repo = match.group(0)
-                    break
-        
-        # Look for error messages
-        if entry.get("role") == "tool" and entry.get("is_error"):
-            error_msg = entry.get("content", "Unknown error")
-            if isinstance(error_msg, str) and len(error_msg) < 200:
-                errors.append(error_msg[:200])
-    
-    # Count message types
-    user_messages = sum(1 for m in messages if m.get("role") == "user")
-    assistant_messages = sum(1 for m in messages if m.get("role") == "assistant")
-    tool_outputs = sum(1 for m in messages if m.get("role") == "tool")
-    
-    # Calculate duration estimate
-    duration_estimate = "unknown"
-    if start_time and end_time:
-        try:
-            # Try to parse timestamps
-            start_dt = None
-            end_dt = None
-            
-            # Handle various timestamp formats
-            for fmt in ["%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d %H:%M:%S"]:
-                try:
-                    if start_dt is None:
-                        start_dt = datetime.strptime(start_time, fmt)
-                    if end_dt is None:
-                        end_dt = datetime.strptime(end_time, fmt)
-                except ValueError:
-                    continue
-            
-            if start_dt and end_dt:
-                duration = end_dt - start_dt
-                minutes = duration.total_seconds() / 60
-                duration_estimate = f"{minutes:.0f}m"
-        except Exception:
-            pass
-    
-    # Classify outcome
-    outcome = "unknown"
-    if errors:
-        # Check if any errors are fatal
-        fatal_errors = any("405" in e or "permission" in e.lower() or "authentication" in e.lower() 
-                          for e in errors)
-        if fatal_errors:
-            outcome = "failure"
-        else:
-            outcome = "partial"
-    elif messages:
-        # Check last message for success indicators
-        last_msg = messages[-1]
-        if last_msg.get("role") == "assistant":
-            content = last_msg.get("content", "")
-            if isinstance(content, str):
-                success_indicators = ["done", "completed", "success", "merged", "pushed"]
-                if any(indicator in content.lower() for indicator in success_indicators):
-                    outcome = "success"
-                else:
-                    outcome = "unknown"
-    
-    # Deduplicate key actions (keep unique, limit to 10)
-    unique_actions = []
-    for action in key_actions:
-        if action not in unique_actions:
-            unique_actions.append(action)
-        if len(unique_actions) >= 10:
-            break
-    
-    # Deduplicate errors (keep unique, limit to 5)
-    unique_errors = []
-    for error in errors:
-        if error not in unique_errors:
-            unique_errors.append(error)
-        if len(unique_errors) >= 5:
-            break
-    
-    return SessionSummary(
-        session_id=session_id,
-        model=model,
-        repo=repo,
-        outcome=outcome,
-        message_count=len(messages),
-        tool_calls=tool_calls_count,
-        duration_estimate=duration_estimate,
-        key_actions=unique_actions,
-        errors_encountered=unique_errors,
-        start_time=start_time,
-        end_time=end_time,
-        total_tokens_estimate=total_tokens,
-        user_messages=user_messages,
-        assistant_messages=assistant_messages,
-        tool_outputs=tool_outputs
-    )
-
-
-def process_session_directory(directory_path: str, output_file: Optional[str] = None) -> List[SessionSummary]:
-    """
-    Process all JSONL files in a directory.
-    """
-    directory = Path(directory_path)
-    if not directory.exists():
-        print(f"Error: Directory {directory_path} does not exist", file=sys.stderr)
-        return []
-    
-    jsonl_files = list(directory.glob("*.jsonl"))
-    if not jsonl_files:
-        print(f"Warning: No JSONL files found in {directory_path}", file=sys.stderr)
-        return []
-    
-    summaries = []
-    for jsonl_file in sorted(jsonl_files):
-        print(f"Processing {jsonl_file.name}...", file=sys.stderr)
-        summary = extract_session_metadata(str(jsonl_file))
-        summaries.append(summary)
-    
-    if output_file:
-        with open(output_file, 'w', encoding='utf-8') as f:
-            json.dump([asdict(s) for s in summaries], f, indent=2)
-        print(f"Wrote {len(summaries)} summaries to {output_file}", file=sys.stderr)
-    
-    return summaries
-
-
-def main():
-    """CLI entry point."""
-    import argparse
-    
-    parser = argparse.ArgumentParser(description="Extract metadata from Hermes session JSONL transcripts")
-    parser.add_argument("path", help="Path to JSONL file or directory of session files")
-    parser.add_argument("-o", "--output", help="Output JSON file (default: stdout)")
-    parser.add_argument("-v", "--verbose", action="store_true", help="Verbose output")
-    
-    args = parser.parse_args()
-    
-    path = Path(args.path)
-    
-    if path.is_file():
-        summary = extract_session_metadata(str(path))
-        if args.output:
-            with open(args.output, 'w') as f:
-                json.dump(asdict(summary), f, indent=2)
-            print(f"Wrote summary to {args.output}", file=sys.stderr)
-        else:
-            print(json.dumps(asdict(summary), indent=2))
-    
-    elif path.is_dir():
-        summaries = process_session_directory(str(path), args.output)
-        if not args.output:
-            print(json.dumps([asdict(s) for s in summaries], indent=2))
-    
-    else:
-        print(f"Error: {args.path} is not a file or directory", file=sys.stderr)
-        sys.exit(1)
-
-
-if __name__ == "__main__":
-    main()
--- a/scripts/session_pair_harvester.py
+++ b/scripts/session_pair_harvester.py
@@ -1,234 +0,0 @@
-#!/usr/bin/env python3
-"""
-Session Transcript → Training Pair Harvester
-
-Scans Hermes session JSONL files for Q&A patterns and extracts
-terse→rich training pairs. Outputs JSONL matching the timmy-config
-training pairs spec.
-
-Usage:
-  python3 scripts/session_pair_harvester.py ~/.hermes/sessions/
-  python3 scripts/session_pair_harvester.py session.jsonl --output pairs.jsonl
-  python3 scripts/session_pair_harvester.py --dir ~/.hermes/sessions/ --min-ratio 2.0
-
-Output format:
-  {"terse": "user short prompt", "rich": "ai detailed response", "source": "session_id", "model": "..."}
-"""
-
-import argparse
-import hashlib
-import json
-import sys
-from pathlib import Path
-from typing import Optional
-
-
-def compute_hash(text: str) -> str:
-    """Content hash for deduplication."""
-    return hashlib.sha256(text.encode()).hexdigest()[:16]
-
-
-def extract_pairs_from_session(session_data: dict, min_ratio: float = 1.5,
-                                min_response_words: int = 20) -> list:
-    """Extract terse→rich pairs from a single session object."""
-    pairs = []
-    conversations = session_data.get("conversations", [])
-    session_id = session_data.get("id", "unknown")
-    model = session_data.get("model", "unknown")
-
-    seen_hashes = set()
-
-    for i, msg in enumerate(conversations):
-        # Look for assistant/gpt responses
-        if msg.get("from") not in ("gpt", "assistant"):
-            continue
-
-        response_text = msg.get("value", "")
-        if not response_text or len(response_text.split()) < min_response_words:
-            continue
-
-        # Find the preceding human message
-        prompt_text = ""
-        for j in range(i - 1, -1, -1):
-            if conversations[j].get("from") == "human":
-                prompt_text = conversations[j].get("value", "")
-                break
-
-        if not prompt_text:
-            continue
-
-        # Filter: skip tool results, system messages embedded as human
-        if prompt_text.startswith("{") and "output" in prompt_text[:100]:
-            continue  # likely a tool result
-        if prompt_text.startswith("# SOUL.md") or prompt_text.startswith("You are"):
-            continue  # system prompt leak
-
-        # Quality filters
-        prompt_words = len(prompt_text.split())
-        response_words = len(response_text.split())
-
-        # Must have meaningful length ratio
-        if prompt_words == 0 or response_words == 0:
-            continue
-        ratio = response_words / prompt_words
-        if ratio < min_ratio:
-            continue
-
-        # Skip responses with 2+ fenced code blocks and under 50 characters overall
-        code_blocks = response_text.count("```")
-        if code_blocks >= 4 and len(response_text.replace("```", "").strip()) < 50:
-            continue
-
-        # Skip responses with tool call artifacts
-        if "tool_call" in response_text[:100] or "function_call" in response_text[:100]:
-            continue
-
-        # Deduplicate by content hash
-        content_hash = compute_hash(prompt_text + response_text[:200])
-        if content_hash in seen_hashes:
-            continue
-        seen_hashes.add(content_hash)
-
-        # No cleanup is applied yet; the response is used as-is
-        clean_response = response_text
-
-        pairs.append({
-            "terse": prompt_text.strip(),
-            "rich": clean_response.strip(),
-            "source": session_id,
-            "model": model,
-            "prompt_words": prompt_words,
-            "response_words": response_words,
-            "ratio": round(ratio, 2),
-        })
-
-    return pairs
-
-
-def extract_from_jsonl_file(filepath: str, **kwargs) -> list:
-    """Extract pairs from a session JSONL file."""
-    pairs = []
-    path = Path(filepath)
-
-    if not path.exists():
-        print(f"Warning: {filepath} not found", file=sys.stderr)
-        return pairs
-
-    content = path.read_text()
-    lines = content.strip().split("\n")
-
-    for line in lines:
-        line = line.strip()
-        if not line:
-            continue
-        try:
-            session = json.loads(line)
-        except json.JSONDecodeError:
-            continue
-
-        session_pairs = extract_pairs_from_session(session, **kwargs)
-        pairs.extend(session_pairs)
-
-    return pairs
-
-
-def deduplicate_pairs(pairs: list) -> list:
-    """Remove duplicate pairs across files."""
-    seen = set()
-    unique = []
-    for pair in pairs:
-        key = compute_hash(pair["terse"] + pair["rich"][:200])
-        if key not in seen:
-            seen.add(key)
-            unique.append(pair)
-    return unique
-
-
-def main():
-    parser = argparse.ArgumentParser(description="Harvest training pairs from session transcripts")
-    parser.add_argument("input", nargs="?", help="Session JSONL file or directory")
-    parser.add_argument("--dir", "-d", help="Directory to scan for session files")
-    parser.add_argument("--output", "-o", default="harvested_pairs.jsonl", help="Output file")
-    parser.add_argument("--min-ratio", type=float, default=1.5, help="Min response/prompt word ratio")
-    parser.add_argument("--min-words", type=int, default=20, help="Min response word count")
-    parser.add_argument("--dry-run", action="store_true", help="Print stats without writing")
-    args = parser.parse_args()
-
-    all_pairs = []
-    files_scanned = 0
-
-    scan_dir = args.dir or args.input
-    if not scan_dir:
-        parser.print_help()
-        sys.exit(1)
-
-    scan_path = Path(scan_dir)
-    if scan_path.is_dir():
-        jsonl_files = sorted(scan_path.rglob("*.jsonl"))
-        print(f"Scanning {len(jsonl_files)} files in {scan_dir}...", file=sys.stderr)
-        for fpath in jsonl_files:
-            pairs = extract_from_jsonl_file(
-                str(fpath),
-                min_ratio=args.min_ratio,
-                min_response_words=args.min_words
-            )
-            all_pairs.extend(pairs)
-            files_scanned += 1
-    else:
-        pairs = extract_from_jsonl_file(
-            str(scan_path),
-            min_ratio=args.min_ratio,
-            min_response_words=args.min_words
-        )
-        all_pairs.extend(pairs)
-        files_scanned = 1
-
-    # Deduplicate
-    unique_pairs = deduplicate_pairs(all_pairs)
-
-    # Stats
-    if unique_pairs:
-        avg_prompt = sum(p["prompt_words"] for p in unique_pairs) / len(unique_pairs)
-        avg_response = sum(p["response_words"] for p in unique_pairs) / len(unique_pairs)
-        avg_ratio = sum(p["ratio"] for p in unique_pairs) / len(unique_pairs)
-    else:
-        avg_prompt = avg_response = avg_ratio = 0
-
-    stats = {
-        "files_scanned": files_scanned,
-        "raw_pairs": len(all_pairs),
-        "unique_pairs": len(unique_pairs),
-        "duplicates_removed": len(all_pairs) - len(unique_pairs),
-        "avg_prompt_words": round(avg_prompt, 1),
-        "avg_response_words": round(avg_response, 1),
-        "avg_ratio": round(avg_ratio, 2),
-    }
-
-    print(json.dumps(stats, indent=2), file=sys.stderr)
-
-    if args.dry_run:
-        # Print sample pairs
-        for pair in unique_pairs[:3]:
-            print(f"\n--- Source: {pair['source']} (ratio: {pair['ratio']}) ---", file=sys.stderr)
-            print(f"TERSE: {pair['terse'][:100]}...", file=sys.stderr)
-            print(f"RICH: {pair['rich'][:150]}...", file=sys.stderr)
-        return
-
-    # Write output
-    output_path = Path(args.output)
-    with open(output_path, "w") as f:
-        for pair in unique_pairs:
-            # Strip internal fields for output
-            output = {
-                "terse": pair["terse"],
-                "rich": pair["rich"],
-                "source": pair["source"],
-                "model": pair["model"],
-            }
-            f.write(json.dumps(output) + "\n")
-
-    print(f"\nWrote {len(unique_pairs)} pairs to {output_path}", file=sys.stderr)
-
-
-if __name__ == "__main__":
-    main()
--- a/scripts/test_automation_opportunity_finder.py
+++ b/scripts/test_automation_opportunity_finder.py
@@ -1,147 +0,0 @@
-#!/usr/bin/env python3
-"""Tests for scripts/automation_opportunity_finder.py — 8 tests."""
-
-import json
-import os
-import sys
-import tempfile
-
-sys.path.insert(0, os.path.dirname(__file__) or ".")
-import importlib.util
-spec = importlib.util.spec_from_file_location(
-    "aof",
-    os.path.join(os.path.dirname(__file__) or ".", "automation_opportunity_finder.py"),
-)
-mod = importlib.util.module_from_spec(spec)
-spec.loader.exec_module(mod)
-
-
-def test_analyze_cron_jobs_no_file():
-    """Returns empty list when no cron jobs file exists."""
-    with tempfile.TemporaryDirectory() as tmpdir:
-        result = mod.analyze_cron_jobs(tmpdir)
-        assert result == []
-        print("PASS: test_analyze_cron_jobs_no_file")
-
-
-def test_analyze_cron_jobs_disabled():
-    """Detects disabled cron jobs."""
-    with tempfile.TemporaryDirectory() as tmpdir:
-        cron_dir = os.path.join(tmpdir, "cron")
-        os.makedirs(cron_dir)
-        jobs = [
-            {"id": "j1", "name": "backup", "enabled": False, "schedule": "0 * * * *"},
-            {"id": "j2", "name": "health", "enabled": True, "schedule": "*/5 * * * *"},
-        ]
-        with open(os.path.join(cron_dir, "jobs.json"), "w") as f:
-            json.dump(jobs, f)
-        result = mod.analyze_cron_jobs(tmpdir)
-        assert any(p["category"] == "cron_disabled" for p in result)
-        print("PASS: test_analyze_cron_jobs_disabled")
-
-
-def test_analyze_cron_jobs_errors():
-    """Detects cron jobs with error status."""
-    with tempfile.TemporaryDirectory() as tmpdir:
-        cron_dir = os.path.join(tmpdir, "cron")
-        os.makedirs(cron_dir)
-        jobs = [
-            {"id": "j1", "name": "broken", "enabled": True, "last_status": "error", "schedule": "0 * * * *"},
-        ]
-        with open(os.path.join(cron_dir, "jobs.json"), "w") as f:
-            json.dump(jobs, f)
-        result = mod.analyze_cron_jobs(tmpdir)
-        assert any(p["category"] == "cron_errors" for p in result)
-        print("PASS: test_analyze_cron_jobs_errors")
-
-
-def test_analyze_documents_finds_todos():
-    """Detects TODO markers in documents."""
-    with tempfile.TemporaryDirectory() as tmpdir:
-        docs_dir = os.path.join(tmpdir, "docs")
-        os.makedirs(docs_dir)
-        for i in range(3):
-            with open(os.path.join(docs_dir, f"guide{i}.md"), "w") as f:
-                f.write(f"# Guide {i}\n\nTODO: Automate this step\n")
-        result = mod.analyze_documents([tmpdir])
-        assert any(p["category"] == "manual_todo" for p in result)
-        todo_proposals = [p for p in result if p["category"] == "manual_todo"]
-        assert len(todo_proposals[0]["details"]) == 3
-        print("PASS: test_analyze_documents_finds_todos")
-
-
-def test_analyze_scripts_repeated_commands():
-    """Detects repeated shell commands across scripts."""
-    with tempfile.TemporaryDirectory() as tmpdir:
-        scripts_dir = os.path.join(tmpdir, "scripts")
-        os.makedirs(scripts_dir)
-        repeated_cmd = "docker restart myapp"
-        for i in range(4):
-            with open(os.path.join(scripts_dir, f"deploy{i}.sh"), "w") as f:
-                f.write(f"#!/bin/bash\n{repeated_cmd}\n")
-        result = mod.analyze_scripts([tmpdir])
-        assert any(p["category"] == "repeated_command" for p in result)
-        print("PASS: test_analyze_scripts_repeated_commands")
-
-
-def test_analyze_session_transcripts():
-    """Detects repeated tool-call sequences."""
-    with tempfile.TemporaryDirectory() as tmpdir:
-        sessions_dir = os.path.join(tmpdir, "sessions")
-        os.makedirs(sessions_dir)
-        for i in range(4):
-            with open(os.path.join(sessions_dir, f"session{i}.jsonl"), "w") as f:
-                f.write(json.dumps({"role": "user", "content": f"task {i}"}) + "\n")
-                f.write(json.dumps({
-                    "role": "assistant",
-                    "content": "working",
-                    "tool_calls": [
-                        {"function": {"name": "read_file"}},
-                        {"function": {"name": "write_file"}},
-                    ]
-                }) + "\n")
-        result = mod.analyze_session_transcripts([sessions_dir])
-        assert any(p["category"] == "tool_sequence" for p in result)
-        seq_proposals = [p for p in result if p["category"] == "tool_sequence"]
-        assert any("read_file" in p["title"] and "write_file" in p["title"] for p in seq_proposals)
-        print("PASS: test_analyze_session_transcripts")
-
-
-def test_deduplicate_proposals():
-    """Deduplicates proposals with similar titles."""
-    proposals = [
-        {"title": "TODO found 3 times", "category": "manual_todo", "confidence": 0.7, "impact": "medium", "description": "x", "sources": []},
-        {"title": "TODO found 3 times", "category": "manual_todo", "confidence": 0.7, "impact": "medium", "description": "x", "sources": []},
-        {"title": "FIXME found 5 times", "category": "manual_fixme", "confidence": 0.8, "impact": "medium", "description": "y", "sources": []},
-    ]
-    result = mod.deduplicate_proposals(proposals)
-    assert len(result) == 2
-    print("PASS: test_deduplicate_proposals")
-
-
-def test_rank_proposals():
-    """Ranks proposals by impact * confidence."""
-    proposals = [
-        {"title": "low", "category": "x", "confidence": 0.9, "impact": "low", "description": "", "sources": []},
-        {"title": "high", "category": "x", "confidence": 0.8, "impact": "high", "description": "", "sources": []},
-        {"title": "med", "category": "x", "confidence": 0.7, "impact": "medium", "description": "", "sources": []},
-    ]
-    result = mod.rank_proposals(proposals)
-    assert result[0]["title"] == "high"
-    assert result[-1]["title"] == "low"
-    print("PASS: test_rank_proposals")
-
-
-if __name__ == "__main__":
-    tests = [v for k, v in globals().items() if k.startswith("test_")]
-    passed = 0
-    failed = 0
-    for t in tests:
-        try:
-            t()
-            passed += 1
-        except Exception as e:
-            print(f"FAIL: {t.__name__}: {e}")
-            failed += 1
-    print(f"\n{passed}/{passed+failed} tests passed")
-    sys.exit(1 if failed else 0)
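Review note: the tests above only pin down the observable behaviour of ranking ("impact * confidence") and deduplication (identical titles collapse). A minimal sketch of logic that would satisfy them follows. The numeric impact weights, the `_sketch` function names, and the (category, title) dedup key are assumptions for illustration and are not taken from `automation_opportunity_finder.py`.

```python
# Hypothetical weights: the tests only require high > medium > low.
IMPACT_WEIGHT = {"high": 3.0, "medium": 2.0, "low": 1.0}


def rank_proposals_sketch(proposals):
    """Sort proposals by impact weight times confidence, highest first."""
    def score(p):
        return IMPACT_WEIGHT.get(p.get("impact"), 0.0) * p.get("confidence", 0.0)
    return sorted(proposals, key=score, reverse=True)


def deduplicate_proposals_sketch(proposals):
    """Keep the first proposal seen for each (category, normalized title)."""
    seen = set()
    unique = []
    for p in proposals:
        key = (p.get("category"), p.get("title", "").strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(p)
    return unique
```

With the fixture from `test_rank_proposals`, the scores under these assumed weights are 0.9 (low), 2.4 (high) and 1.4 (medium), so "high" sorts first and "low" last.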
--- a/scripts/test_bootstrapper.py
+++ b/scripts/test_bootstrapper.py
@@ -1,239 +0,0 @@
-#!/usr/bin/env python3
-"""
-Tests for bootstrapper.py — context assembly from knowledge store.
-"""
-
-import json
-import sys
-import tempfile
-from pathlib import Path
-
-# Add scripts dir to path for import
-sys.path.insert(0, str(Path(__file__).resolve().parent))
-
-from bootstrapper import (
-    build_bootstrap_context,
-    estimate_tokens,
-    filter_facts,
-    load_index,
-    sort_facts,
-    truncate_to_tokens,
-)
-
-
-def make_index(facts: list[dict], tmp_dir: Path) -> Path:
-    """Create a temporary index.json with given facts."""
-    index = {
-        "version": 1,
-        "last_updated": "2026-04-13T20:00:00Z",
-        "total_facts": len(facts),
-        "facts": facts,
-    }
-    path = tmp_dir / "index.json"
-    with open(path, "w") as f:
-        json.dump(index, f)
-    return path
-
-
-def test_empty_index():
-    """Empty knowledge store produces graceful output."""
-    with tempfile.TemporaryDirectory() as tmp:
-        tmp_dir = Path(tmp)
-        index_path = make_index([], tmp_dir)
-
-        # Create empty knowledge dirs
-        for sub in ["repos", "agents", "global"]:
-            (tmp_dir / sub).mkdir(exist_ok=True)
-
-        context = build_bootstrap_context(
-            repo="the-nexus", index_path=index_path
-        )
-        assert "No relevant knowledge found" in context
-        assert "Starting fresh" in context
-        print("PASS: empty_index")
-
-
-def test_filter_by_repo():
-    """Filter facts by repository."""
-    facts = [
-        {"fact": "A", "category": "fact", "repo": "the-nexus", "confidence": 0.9},
-        {"fact": "B", "category": "fact", "repo": "fleet-ops", "confidence": 0.8},
-        {"fact": "C", "category": "fact", "repo": "global", "confidence": 0.7},
-    ]
-    filtered = filter_facts(facts, repo="the-nexus", include_global=True)
-    texts = [f["fact"] for f in filtered]
-    assert "A" in texts
-    assert "B" not in texts
-    assert "C" in texts
-    print("PASS: filter_by_repo")
-
-
-def test_filter_by_agent():
-    """Filter facts by agent type."""
-    facts = [
-        {"fact": "A", "category": "pattern", "repo": "global", "agent": "mimo-sprint", "confidence": 0.8},
-        {"fact": "B", "category": "pattern", "repo": "global", "agent": "groq-fast", "confidence": 0.7},
-        {"fact": "C", "category": "fact", "repo": "global", "confidence": 0.9},
-    ]
-    filtered = filter_facts(facts, agent="mimo-sprint", include_global=True)
-    texts = [f["fact"] for f in filtered]
-    assert "A" in texts
-    assert "B" not in texts
-    assert "C" in texts  # global, no agent restriction
-    print("PASS: filter_by_agent")
-
-
-def test_no_global_flag():
-    """Excluding global facts works."""
-    facts = [
-        {"fact": "A", "category": "fact", "repo": "the-nexus", "confidence": 0.9},
-        {"fact": "B", "category": "fact", "repo": "global", "confidence": 0.8},
-    ]
-    filtered = filter_facts(facts, repo="the-nexus", include_global=False)
-    texts = [f["fact"] for f in filtered]
-    assert "A" in texts
-    assert "B" not in texts
-    print("PASS: no_global_flag")
-
-
-def test_sort_by_confidence():
-    """Facts sort by confidence descending."""
-    facts = [
-        {"fact": "low", "category": "fact", "repo": "global", "confidence": 0.3},
-        {"fact": "high", "category": "fact", "repo": "global", "confidence": 0.95},
-        {"fact": "mid", "category": "fact", "repo": "global", "confidence": 0.7},
-    ]
-    sorted_f = sort_facts(facts)
-    assert sorted_f[0]["fact"] == "high"
-    assert sorted_f[1]["fact"] == "mid"
-    assert sorted_f[2]["fact"] == "low"
-    print("PASS: sort_by_confidence")
-
-
-def test_sort_pitfalls_first():
-    """Pitfalls sort before facts at same confidence."""
-    facts = [
-        {"fact": "regular fact", "category": "fact", "repo": "global", "confidence": 0.8},
-        {"fact": "danger pitfall", "category": "pitfall", "repo": "global", "confidence": 0.8},
-    ]
-    sorted_f = sort_facts(facts)
-    assert sorted_f[0]["category"] == "pitfall"
-    print("PASS: sort_pitfalls_first")
-
-
-def test_truncate_to_tokens():
-    """Truncation cuts at line boundary."""
-    text = "line1\nline2\nline3\nline4\nline5\n"
-    truncated = truncate_to_tokens(text, max_tokens=2)  # ~8 chars
-    assert "line1" in truncated
-    assert "truncated" in truncated.lower()
-    print("PASS: truncate_to_tokens")
-
-
-def test_estimate_tokens():
-    """Token estimation is reasonable."""
-    text = "a" * 400
-    tokens = estimate_tokens(text)
-    assert 90 <= tokens <= 110  # ~100 tokens
-    print("PASS: estimate_tokens")
-
-
-def test_build_full_context():
-    """Full context with facts renders correctly."""
-    facts = [
-        {"fact": "API merges fail with 405", "category": "pitfall", "repo": "the-nexus", "confidence": 0.95},
-        {"fact": "Has 50+ open PRs", "category": "fact", "repo": "the-nexus", "confidence": 0.9},
-        {"fact": "Token at ~/.config/gitea/token", "category": "tool-quirk", "repo": "global", "confidence": 0.9},
-        {"fact": "Check git remote -v first", "category": "pattern", "repo": "global", "confidence": 0.8},
-    ]
-
-    with tempfile.TemporaryDirectory() as tmp:
-        tmp_dir = Path(tmp)
-        index_path = make_index(facts, tmp_dir)
-
-        # Create knowledge dirs
-        for sub in ["repos", "agents", "global"]:
-            (tmp_dir / sub).mkdir(exist_ok=True)
-
-        context = build_bootstrap_context(
-            repo="the-nexus",
-            agent="mimo-sprint",
-            include_global=True,
-            index_path=index_path,
-        )
-
-        assert "What You Know" in context
-        assert "PITFALLS" in context
-        assert "API merges fail with 405" in context
-        assert "the-nexus" in context
-        assert "Token at" in context  # global fact included
-        print("PASS: build_full_context")
-
-
-def test_max_tokens_respected():
-    """Output respects max_tokens limit."""
-    # Generate lots of facts
-    facts = [
-        {"fact": f"Fact number {i} with some detail about things", "category": "fact", "repo": "global", "confidence": 0.8}
-        for i in range(100)
-    ]
-
-    with tempfile.TemporaryDirectory() as tmp:
-        tmp_dir = Path(tmp)
-        index_path = make_index(facts, tmp_dir)
-        for sub in ["repos", "agents", "global"]:
-            (tmp_dir / sub).mkdir(exist_ok=True)
-
-        context = build_bootstrap_context(
-            repo=None,
-            max_tokens=500,
-            index_path=index_path,
-        )
-
-        actual_tokens = estimate_tokens(context)
-        # Allow 10% overshoot since we cut at line boundaries
-        assert actual_tokens <= 550, f"Expected ~500 tokens, got {actual_tokens}"
-        print(f"PASS: max_tokens_respected (got {actual_tokens} tokens)")
-
-
-def test_missing_index_graceful():
-    """Missing index.json doesn't crash."""
-    with tempfile.TemporaryDirectory() as tmp:
-        tmp_dir = Path(tmp)
-        # Don't create index.json
-        for sub in ["repos", "agents", "global"]:
-            (tmp_dir / sub).mkdir(exist_ok=True)
-
-        fake_index = tmp_dir / "nonexistent.json"
-        context = build_bootstrap_context(repo="anything", index_path=fake_index)
-        assert "No relevant knowledge found" in context
-        print("PASS: missing_index_graceful")
-
-
-if __name__ == "__main__":
-    tests = [
-        test_empty_index,
-        test_filter_by_repo,
-        test_filter_by_agent,
-        test_no_global_flag,
-        test_sort_by_confidence,
-        test_sort_pitfalls_first,
-        test_truncate_to_tokens,
-        test_estimate_tokens,
-        test_build_full_context,
-        test_max_tokens_respected,
-        test_missing_index_graceful,
-    ]
-
-    passed = 0
-    failed = 0
-    for test in tests:
-        try:
-            test()
-            passed += 1
-        except Exception as e:
-            print(f"FAIL: {test.__name__} — {e}")
-            failed += 1
-
-    print(f"\n{passed} passed, {failed} failed")
-    sys.exit(0 if failed == 0 else 1)
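Review note: `test_estimate_tokens` and `test_truncate_to_tokens` imply a heuristic of roughly 4 characters per token and truncation at a line boundary with a "truncated" marker. The sketch below shows one way that could work. It is an assumption-based illustration, not the code in `bootstrapper.py`, and the marker text and the `_sketch` names are hypothetical.

```python
CHARS_PER_TOKEN = 4  # assumed heuristic, matching the "~100 tokens for 400 chars" test


def estimate_tokens_sketch(text: str) -> int:
    """Rough token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN


def truncate_to_tokens_sketch(text: str, max_tokens: int,
                              marker: str = "[... truncated ...]") -> str:
    """Cut text at a line boundary so it fits the character budget.

    The first line is always kept, even if it alone exceeds the budget,
    so the output is never just the marker.
    """
    if estimate_tokens_sketch(text) <= max_tokens:
        return text
    budget = max_tokens * CHARS_PER_TOKEN
    kept = []
    used = 0
    for line in text.splitlines():
        cost = len(line) + 1  # +1 for the newline
        if kept and used + cost > budget:
            break
        kept.append(line)
        used += cost
    return "\n".join(kept) + "\n" + marker
```

Cutting at line boundaries means the result can overshoot the budget slightly, which is consistent with the 10% tolerance in `test_max_tokens_respected`.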
--- a/scripts/test_diff_analyzer.py
+++ b/scripts/test_diff_analyzer.py
@@ -1,189 +0,0 @@
-#!/usr/bin/env python3
-"""Tests for scripts/diff_analyzer.py — 10 tests."""
-
-import sys
-import os
-sys.path.insert(0, os.path.dirname(__file__) or ".")
-
-import importlib.util
-spec = importlib.util.spec_from_file_location("da", os.path.join(os.path.dirname(__file__) or ".", "diff_analyzer.py"))
-mod = importlib.util.module_from_spec(spec)
-spec.loader.exec_module(mod)
-DiffAnalyzer = mod.DiffAnalyzer
-ChangeCategory = mod.ChangeCategory
-
-
-SAMPLE_ADD = """diff --git a/new.py b/new.py
-new file mode 100644
---- /dev/null
-+++ b/new.py
-@@ -0,0 +1,3 @@
-+def hello():
-+    print("world")
-+    return True
-"""
-
-SAMPLE_DELETE = """diff --git a/old.py b/old.py
-deleted file mode 100644
---- a/old.py
-+++ /dev/null
-@@ -1,2 +0,0 @@
--def goodbye():
--    pass
-"""
-
-SAMPLE_MODIFY = """diff --git a/app.py b/app.py
---- a/app.py
-+++ b/app.py
-@@ -1,3 +1,4 @@
- def main():
--    print("old")
-+    print("new")
-+    print("extra")
-     return 0
-"""
-
-SAMPLE_RENAME = """diff --git a/old_name.py b/new_name.py
-rename from old_name.py
-rename to new_name.py
---- a/old_name.py
-+++ b/new_name.py
-@@ -1,1 +1,1 @@
--old content
-+new content
-"""
-
-SAMPLE_MULTI = """diff --git a/a.py b/a.py
---- a/a.py
-+++ b/a.py
-@@ -1,1 +1,2 @@
- existing
-+added line
-diff --git a/b.py b/b.py
-new file mode 100644
---- /dev/null
-+++ b/b.py
-@@ -0,0 +1,1 @@
-+new file
-"""
-
-SAMPLE_BINARY = """diff --git a/img.png b/img.png
-Binary files a/img.png and b/img.png differ
-"""
-
-
-def test_empty():
-    a = DiffAnalyzer()
-    s = a.analyze("")
-    assert s.total_files_changed == 0
-    print("PASS: test_empty")
-
-def test_addition():
-    a = DiffAnalyzer()
-    s = a.analyze(SAMPLE_ADD)
-    assert s.total_files_changed == 1
-    assert s.total_added == 3
-    assert s.total_deleted == 0
-    assert s.new_files == 1
-    assert s.files[0].hunks[0].category == ChangeCategory.ADDED
-    print("PASS: test_addition")
-
-def test_deletion():
-    a = DiffAnalyzer()
-    s = a.analyze(SAMPLE_DELETE)
-    assert s.total_deleted == 2
-    assert s.deleted_files == 1
-    assert s.files[0].hunks[0].category == ChangeCategory.DELETED
-    print("PASS: test_deletion")
-
-def test_modification():
-    a = DiffAnalyzer()
-    s = a.analyze(SAMPLE_MODIFY)
-    assert s.total_added == 2
-    assert s.total_deleted == 1
-    assert s.files[0].hunks[0].category == ChangeCategory.MODIFIED
-    print("PASS: test_modification")
-
-def test_rename():
-    a = DiffAnalyzer()
-    s = a.analyze(SAMPLE_RENAME)
-    assert s.renamed_files == 1
-    assert s.files[0].old_path == "old_name.py"
-    assert s.files[0].path == "new_name.py"
-    assert s.files[0].is_renamed is True
-    print("PASS: test_rename")
-
-def test_multiple_files():
-    a = DiffAnalyzer()
-    s = a.analyze(SAMPLE_MULTI)
-    assert s.total_files_changed == 2
-    assert s.new_files == 1
-    print("PASS: test_multiple_files")
-
-def test_binary():
-    a = DiffAnalyzer()
-    s = a.analyze(SAMPLE_BINARY)
-    assert s.binary_files == 1
-    assert s.files[0].is_binary is True
-    assert len(s.files[0].hunks) == 0
-    print("PASS: test_binary")
-
-def test_to_dict():
-    a = DiffAnalyzer()
-    s = a.analyze(SAMPLE_MODIFY)
-    d = s.to_dict()
-    assert "total_files_changed" in d
-    assert "files" in d
-    assert isinstance(d["files"], list)
-    print("PASS: test_to_dict")
-
-def test_context_only():
-    diff = """diff --git a/f.py b/f.py
---- a/f.py
-+++ b/f.py
-@@ -1,3 +1,3 @@
- line1
--old
-+new
- line3
-"""
-    a = DiffAnalyzer()
-    s = a.analyze(diff)
-    # Has both added and deleted = MODIFIED
-    assert s.files[0].hunks[0].category == ChangeCategory.MODIFIED
-    print("PASS: test_context_only")
-
-def test_multi_hunk():
-    diff = """diff --git a/f.py b/f.py
---- a/f.py
-+++ b/f.py
-@@ -1,1 +1,2 @@
- existing
-+first addition
-@@ -10,1 +11,2 @@
- more
-+second addition
-"""
-    a = DiffAnalyzer()
-    s = a.analyze(diff)
-    assert s.total_hunks == 2
-    assert s.total_added == 2
-    print("PASS: test_multi_hunk")
-
-
-def run_all():
-    test_empty()
-    test_addition()
-    test_deletion()
-    test_modification()
-    test_rename()
-    test_multiple_files()
-    test_binary()
-    test_to_dict()
-    test_context_only()
-    test_multi_hunk()
-    print("\nAll 10 tests passed!")
-
-
-if __name__ == "__main__":
-    run_all()
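Review note: the assertions above classify a hunk as ADDED, DELETED, or MODIFIED depending on which kinds of lines it contains. A minimal sketch of that rule follows, assuming the caller passes only the hunk body (the lines after the `@@` header). `HunkKind` and `classify_hunk` are hypothetical names and do not reproduce the `ChangeCategory` or `DiffAnalyzer` API in `diff_analyzer.py`.

```python
from enum import Enum


class HunkKind(Enum):
    ADDED = "added"
    DELETED = "deleted"
    MODIFIED = "modified"
    CONTEXT = "context"


def classify_hunk(body_lines):
    """Count added and deleted lines in a hunk body and label the hunk.

    A hunk with both additions and deletions is MODIFIED; one with neither
    contains only context lines.
    """
    added = sum(1 for line in body_lines if line.startswith("+"))
    deleted = sum(1 for line in body_lines if line.startswith("-"))
    if added and deleted:
        kind = HunkKind.MODIFIED
    elif added:
        kind = HunkKind.ADDED
    elif deleted:
        kind = HunkKind.DELETED
    else:
        kind = HunkKind.CONTEXT
    return added, deleted, kind
```

Because the classification depends on the leading `+` or `-` of each body line, the sample fixtures only exercise the DELETED and MODIFIED paths when their removed lines keep that leading `-`.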
--- a/scripts/test_gitea_issue_parser.py
+++ b/scripts/test_gitea_issue_parser.py
@@ -1,109 +0,0 @@
-#!/usr/bin/env python3
-"""Tests for scripts/gitea_issue_parser.py"""
-
-import sys
-import os
-sys.path.insert(0, os.path.dirname(__file__) or ".")
-
-# Import from sibling
-import importlib.util
-spec = importlib.util.spec_from_file_location("parser", os.path.join(os.path.dirname(__file__) or ".", "gitea_issue_parser.py"))
-mod = importlib.util.module_from_spec(spec)
-spec.loader.exec_module(mod)
-parse_issue_body = mod.parse_issue_body
-
-
-def test_basic_parsing():
-    body = """## Context
-
-This is the background info.
-
-## Acceptance Criteria
-
-- [ ] First criterion
-- [x] Second criterion (done)
-
-## What to build
-
-Some description.
-"""
-    result = parse_issue_body(body, title="Test (#42)", labels=["bug"])
-    assert result["title"] == "Test (#42)"
-    assert result["labels"] == ["bug"]
-    assert result["epic_ref"] == 42
-    assert len(result["criteria"]) == 2
-    assert result["criteria"][0]["text"] == "First criterion"
-    assert result["criteria"][0]["checked"] is False
-    assert result["criteria"][1]["checked"] is True
-    assert "context" in result["sections"]
-    print("PASS: test_basic_parsing")
-
-
-def test_numbered_criteria():
-    body = """## Acceptance Criteria
-
-1. First item
-2. Second item
-3. Third item
-"""
-    result = parse_issue_body(body)
-    assert len(result["criteria"]) == 3
-    assert result["criteria"][0]["text"] == "First item"
-    print("PASS: test_numbered_criteria")
-
-
-def test_epic_ref_from_body():
-    body = "Closes #123\n\nSome description."
-    result = parse_issue_body(body)
-    assert result["epic_ref"] == 123
-    print("PASS: test_epic_ref_from_body")
-
-
-def test_empty_body():
-    result = parse_issue_body("")
-    assert result["criteria"] == []
-    assert result["context"] == ""
-    assert result["sections"] == {}
-    print("PASS: test_empty_body")
-
-
-def test_no_sections():
-    body = "Just a plain issue body with no headings."
-    result = parse_issue_body(body)
-    assert result["context"] == "Just a plain issue body with no headings."
-    print("PASS: test_no_sections")
-
-
-def test_multiple_sections():
-    body = """## Problem
-
-Something is broken.
-
-## Fix
-
-Do this instead.
-
-## Notes
-
-Additional info.
-"""
-    result = parse_issue_body(body)
-    assert "problem" in result["sections"]
-    assert "fix" in result["sections"]
-    assert "notes" in result["sections"]
-    assert "Something is broken" in result["sections"]["problem"]
-    print("PASS: test_multiple_sections")
-
-
-def run_all():
-    test_basic_parsing()
-    test_numbered_criteria()
-    test_epic_ref_from_body()
-    test_empty_body()
-    test_no_sections()
-    test_multiple_sections()
-    print("\nAll 6 tests passed!")
-
-
-if __name__ == "__main__":
-    run_all()
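Review note: the parser tests expect checkbox items (`- [ ]`, `- [x]`) and numbered items to become criteria, and the first `#N` reference to become `epic_ref`. The sketch below illustrates one regex-based approach. It scans every line instead of only the "Acceptance Criteria" section, and all names in it are hypothetical, so it should not be read as the implementation in `gitea_issue_parser.py`.

```python
import re

# Assumed patterns: markdown task-list items, numbered items, issue references.
CHECKBOX = re.compile(r"^\s*[-*]\s*\[([ xX])\]\s+(.*)$")
NUMBERED = re.compile(r"^\s*\d+[.)]\s+(.*)$")
ISSUE_REF = re.compile(r"#(\d+)")


def parse_criteria_sketch(body: str):
    """Return a list of {"text", "checked"} dicts for list-style criteria."""
    criteria = []
    for line in body.splitlines():
        m = CHECKBOX.match(line)
        if m:
            criteria.append({
                "text": m.group(2).strip(),
                "checked": m.group(1).lower() == "x",
            })
            continue
        m = NUMBERED.match(line)
        if m:
            criteria.append({"text": m.group(1).strip(), "checked": False})
    return criteria


def first_issue_ref_sketch(text: str):
    """Return the first #N reference as an int, or None if there is none."""
    m = ISSUE_REF.search(text)
    return int(m.group(1)) if m else None
```

Restricting the scan to a named section, as the real tests suggest, would need a heading-aware pass before this step.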
--- a/scripts/test_harvest_prompt.py
+++ b/scripts/test_harvest_prompt.py
@@ -1,129 +1,41 @@
 #!/usr/bin/env python3
 """
-Test harness for knowledge extraction prompt.
-Validates output structure, content quality, and hallucination resistance.
-
-Usage:
-    python3 scripts/test_harvest_prompt.py                    # Run all tests
-    python3 scripts/test_harvest_prompt.py --transcript FILE  # Test against a real transcript
-    python3 scripts/test_harvest_prompt.py --validate FILE    # Validate an existing extraction JSON
+Test script for knowledge extraction prompt.
+Validates that the prompt produces consistent, structured output.
 """

 import json
 import sys
-import argparse
 from pathlib import Path

-VALID_CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}
-REQUIRED_FIELDS = {"fact", "category", "repo", "confidence", "evidence"}
-REQUIRED_META = {"session_outcome", "tools_used", "repos_touched", "error_count", "knowledge_count"}
-
-
-def validate_knowledge_item(item, idx):
-    """Validate a single knowledge item. Returns list of errors."""
-    errors = []
-    if not isinstance(item, dict):
-        return [f"Item {idx}: not a dict"]
-    for field in REQUIRED_FIELDS:
+def validate_knowledge_item(item):
+    """Validate a single knowledge item."""
+    required_fields = ["fact", "category", "repo", "confidence"]
+    for field in required_fields:
        if field not in item:
-            errors.append(f"Item {idx}: missing field '{field}'")
-    if not isinstance(item.get("fact", ""), str) or len(item.get("fact", "").strip()) == 0:
-        errors.append(f"Item {idx}: fact must be a non-empty string")
-    if item.get("category") not in VALID_CATEGORIES:
-        errors.append(f"Item {idx}: invalid category '{item.get('category')}'")
-    if not isinstance(item.get("repo", ""), str) or len(item.get("repo", "").strip()) == 0:
-        errors.append(f"Item {idx}: repo must be a non-empty string")
-    conf = item.get("confidence")
-    if not isinstance(conf, (int, float)) or not (0.0 <= conf <= 1.0):
-        errors.append(f"Item {idx}: confidence must be a number 0.0-1.0, got {conf}")
-    if not isinstance(item.get("evidence", ""), str) or len(item.get("evidence", "").strip()) == 0:
-        errors.append(f"Item {idx}: evidence must be a non-empty string (hallucination check)")
-    return errors
+            return False, f"Missing field: {field}"
+
+    if not isinstance(item["fact"], str) or len(item["fact"].strip()) == 0:
+        return False, "Fact must be a non-empty string"
+
+    valid_categories = ["fact", "pitfall", "pattern", "tool-quirk", "question"]
+    if item["category"] not in valid_categories:
+        return False, f"Invalid category: {item['category']}"
+
+    if not isinstance(item["repo"], str):
+        return False, "Repo must be a string"
+
+    if not isinstance(item["confidence"], (int, float)):
+        return False, "Confidence must be a number"
+
+    if not (0.0 <= item["confidence"] <= 1.0):
+        return False, "Confidence must be between 0.0 and 1.0"
+
+    return True, "Valid"

-
-def validate_extraction(data):
-    """Validate a full extraction result. Returns (is_valid, errors, warnings)."""
-    errors = []
-    warnings = []
-
-    if not isinstance(data, dict):
-        return False, ["Root is not a JSON object"], []
-
-    if "knowledge" not in data:
-        return False, ["Missing 'knowledge' array"], []
-
-    if not isinstance(data["knowledge"], list):
-        return False, ["'knowledge' is not an array"], []
-
-    for i, item in enumerate(data["knowledge"]):
-        errors.extend(validate_knowledge_item(item, i))
-
-    # Meta block validation
-    if "meta" not in data:
-        warnings.append("Missing 'meta' block (session_outcome, tools_used, etc.)")
-    else:
-        meta = data["meta"]
-        for field in REQUIRED_META:
-            if field not in meta:
-                warnings.append(f"Meta missing field '{field}'")
-
-    # Quality checks
-    facts = data["knowledge"]
-    if len(facts) == 0:
-        warnings.append("No knowledge extracted (empty session or extraction failure)")
-
-    # Check for near-duplicate facts
-    seen_facts = set()
-    for item in facts:
-        normalized = item.get("fact", "").lower().strip()[:80]
-        if normalized in seen_facts:
-            warnings.append(f"Duplicate fact detected: '{normalized[:50]}...'")
-        seen_facts.add(normalized)
-
-    # Check confidence distribution
-    confidences = [item.get("confidence", 0) for item in facts]
-    if confidences:
-        avg_conf = sum(confidences) / len(confidences)
-        if avg_conf > 0.9:
-            warnings.append(f"Average confidence {avg_conf:.2f} is suspiciously high (possible hallucination)")
-        if avg_conf < 0.4:
-            warnings.append(f"Average confidence {avg_conf:.2f} is very low (extraction may be too cautious)")
-
-    return len(errors) == 0, errors, warnings
-
-
-def validate_transcript_coverage(data, transcript):
-    """
-    Check that extracted facts are actually supported by the transcript.
-    This is a heuristic — checks that key terms from facts appear in transcript.
-    Returns list of potential hallucinations.
-    """
-    hallucinations = []
-    transcript_lower = transcript.lower()
-
-    for item in data.get("knowledge", []):
-        fact = item.get("fact", "")
-        evidence = item.get("evidence", "")
-
-        # Check if evidence string appears in transcript
-        if evidence and evidence.lower() not in transcript_lower:
-            # Partial match — check if key terms are present
-            evidence_words = set(evidence.lower().split())
-            transcript_words = set(transcript_lower.split())
-            overlap = evidence_words & transcript_words
-            if len(overlap) < min(3, len(evidence_words) * 0.3):
-                hallucinations.append({
-                    "fact": fact[:80],
-                    "evidence": evidence[:80],
-                    "issue": "Evidence not found in transcript — possible hallucination"
-                })
-
-    return hallucinations
-
-
-# --- Test transcripts ---
-
-SAMPLE_TRANSCRIPT_SUCCESS = """
+def test_sample_transcript():
+    """Test with a sample transcript."""
+    sample_transcript = """
 User: Clone the repo at forge.alexanderwhitestone.com/Timmy_Foundation/compounding-intelligence
 Agent: Trying git clone... Error: 401 Unauthorized
 Agent: Checking for token... Found in ~/.config/gitea/token
@@ -132,253 +44,30 @@ Agent: Clone successful with token authentication
 Agent: Now examining the repo structure...
 Agent: Found knowledge/index.json with 0 facts
 Agent: Creating harvest-prompt.md in templates/
-Agent: Done. Files created: templates/harvest-prompt.md, scripts/test_harvest_prompt.py
 """
-
-SAMPLE_TRANSCRIPT_FAILURE = """
-User: Deploy the cron jobs to production
-Agent: Connecting to VPS... SSH timeout after 30s
-Agent: Retrying with verbose output... Connection refused on port 22
-Agent: Checking firewall rules... iptables shows port 22 is open
-Agent: Checking sshd config... sshd is not running
-Agent: Attempting to start sshd... systemctl start sshd failed: unit not found
-Agent: ERROR: OpenSSH server not installed on this VPS
-Agent: Cannot proceed without SSH access. Deployment blocked.
-"""
-
-SAMPLE_TRANSCRIPT_API_QUIRK = """
-User: List all open issues in hermes-agent
-Agent: Querying Gitea API... endpoint: /repos/Timmy_Foundation/hermes-agent/issues?state=open&limit=50
-Agent: Response: 0 items. That's odd.
-Agent: Trying without state filter... endpoint: /repos/Timmy_Foundation/hermes-agent/issues?limit=50
-Agent: Response: 50 items returned, but all have pull_request field. These are PRs, not issues.
-Agent: Gitea issues API returns PRs alongside issues. Must filter by presence of pull_request key.
-Agent: Also discovered: state=open parameter returns empty when used with limit=50, but works with limit=5.
-Agent: This appears to be a Gitea API quirk — pagination interacts with state filter unexpectedly.
-Agent: Workaround: fetch without state filter, then filter client-side for state=='open' and 'pull_request' not in item.
-"""
-
-
-def run_tests():
-    """Run the built-in test suite."""
-    tests_passed = 0
-    tests_failed = 0
-
-    print("=" * 60)
-    print("KNOWLEDGE EXTRACTION PROMPT — TEST SUITE")
-    print("=" * 60)
-
-    # Test 1: Prompt file exists and is under 2k tokens (~8k chars)
-    print("\n[Test 1] Prompt file size constraint")
-    prompt_path = Path("templates/harvest-prompt.md")
-    if not prompt_path.exists():
-        print("  FAIL: harvest-prompt.md not found")
-        tests_failed += 1
-    else:
-        size = prompt_path.stat().st_size
-        # Rough token estimate: ~4 chars per token
-        est_tokens = size / 4
-        print(f"  Prompt size: {size} bytes (~{est_tokens:.0f} tokens)")
-        if est_tokens > 2000:
-            print("  WARN: Prompt exceeds the ~2000-token budget")
-        else:
-            print("  PASS: Within token budget")
-        tests_passed += 1
-
-    # Test 2: Validate a well-formed extraction
-    print("\n[Test 2] Valid extraction passes validation")
-    valid_extraction = {
-        "knowledge": [
-            {
-                "fact": "Gitea auth token is at ~/.config/gitea/token",
-                "category": "tool-quirk",
-                "repo": "global",
-                "confidence": 0.9,
-                "evidence": "Found in ~/.config/gitea/token"
-            },
-            {
-                "fact": "Clone fails with 401 when no token is provided",
-                "category": "pitfall",
-                "repo": "compounding-intelligence",
-                "confidence": 0.9,
-                "evidence": "Error: 401 Unauthorized"
-            }
-        ],
-        "meta": {
-            "session_outcome": "success",
-            "tools_used": ["git"],
-            "repos_touched": ["compounding-intelligence"],
-            "error_count": 1,
-            "knowledge_count": 2
-        }
-    }
-    is_valid, errors, warnings = validate_extraction(valid_extraction)
-    if is_valid:
-        print(f"  PASS: Valid extraction accepted ({len(warnings)} warnings)")
-        tests_passed += 1
-    else:
-        print(f"  FAIL: Valid extraction rejected: {errors}")
-        tests_failed += 1
-
-    # Test 3: Reject missing fields
-    print("\n[Test 3] Missing fields are rejected")
-    bad_extraction = {
-        "knowledge": [
-            {"fact": "Something learned", "category": "fact"}  # Missing repo, confidence, evidence
-        ]
-    }
-    is_valid, errors, warnings = validate_extraction(bad_extraction)
-    if not is_valid:
-        print(f"  PASS: Rejected with {len(errors)} errors")
-        tests_passed += 1
-    else:
-        print(f"  FAIL: Should have rejected missing fields")
-        tests_failed += 1
-
-    # Test 4: Reject invalid category
-    print("\n[Test 4] Invalid category is rejected")
-    bad_cat = {
-        "knowledge": [
-            {"fact": "Test", "category": "discovery", "repo": "x", "confidence": 0.8, "evidence": "test"}
-        ]
-    }
-    is_valid, errors, warnings = validate_extraction(bad_cat)
-    if not is_valid and any("category" in e for e in errors):
-        print(f"  PASS: Invalid category 'discovery' rejected")
-        tests_passed += 1
-    else:
-        print(f"  FAIL: Should have rejected invalid category")
-        tests_failed += 1
-
-    # Test 5: Detect near-duplicates
-    print("\n[Test 5] Duplicate detection")
-    dup_extraction = {
-        "knowledge": [
-            {"fact": "Token is at ~/.config/gitea/token", "category": "fact", "repo": "x", "confidence": 0.9, "evidence": "a"},
-            {"fact": "Token is at ~/.config/gitea/token", "category": "fact", "repo": "x", "confidence": 0.9, "evidence": "b"}
-        ],
-        "meta": {"session_outcome": "success", "tools_used": [], "repos_touched": [], "error_count": 0, "knowledge_count": 2}
-    }
-    is_valid, errors, warnings = validate_extraction(dup_extraction)
-    if any("Duplicate" in w for w in warnings):
-        print(f"  PASS: Duplicate detected")
-        tests_passed += 1
-    else:
-        print(f"  FAIL: Should have detected duplicate")
-        tests_failed += 1
-
-    # Test 6: Hallucination check against transcript
-    print("\n[Test 6] Hallucination detection")
-    hallucinated = {
-        "knowledge": [
-            {
-                "fact": "Database port is 5433",
-                "category": "fact",
-                "repo": "x",
-                "confidence": 0.9,
-                "evidence": "PostgreSQL listening on port 5433"
-            }
-        ],
-        "meta": {"session_outcome": "success", "tools_used": [], "repos_touched": [], "error_count": 0, "knowledge_count": 1}
-    }
-    hallucinations = validate_transcript_coverage(hallucinated, SAMPLE_TRANSCRIPT_SUCCESS)
-    if hallucinations:
-        print(f"  PASS: Hallucination detected ({len(hallucinations)} items)")
-        tests_passed += 1
-    else:
-        print(f"  FAIL: Should have detected hallucinated evidence")
-        tests_failed += 1
-
-    # Test 7: Failed session should extract pitfalls
-    print("\n[Test 7] Failed session extraction shape")
-    failed_extraction = {
-        "knowledge": [
-            {
-                "fact": "SSH server not installed on target VPS",
-                "category": "pitfall",
-                "repo": "global",
-                "confidence": 0.9,
-                "evidence": "ERROR: OpenSSH server not installed on this VPS"
-            },
-            {
-                "fact": "VPS blocks deployment without SSH access",
-                "category": "question",
-                "repo": "global",
-                "confidence": 0.7,
-                "evidence": "Cannot proceed without SSH access. Deployment blocked."
-            }
-        ],
-        "meta": {
-            "session_outcome": "failed",
-            "tools_used": ["ssh", "systemctl"],
-            "repos_touched": [],
-            "error_count": 3,
-            "knowledge_count": 2
-        }
-    }
-    is_valid, errors, warnings = validate_extraction(failed_extraction)
-    if is_valid:
-        categories = [item["category"] for item in failed_extraction["knowledge"]]
-        if "pitfall" in categories:
-            print(f"  PASS: Failed session extracted {len(categories)} items including pitfalls")
-            tests_passed += 1
-        else:
-            print(f"  FAIL: Failed session should extract pitfalls")
-            tests_failed += 1
-    else:
-        print(f"  FAIL: {errors}")
-        tests_failed += 1
-
-    # Test 8: Empty extraction is warned
-    print("\n[Test 8] Empty extraction warning")
-    empty = {"knowledge": [], "meta": {"session_outcome": "success", "tools_used": [], "repos_touched": [], "error_count": 0, "knowledge_count": 0}}
-    is_valid, errors, warnings = validate_extraction(empty)
-    if any("No knowledge" in w for w in warnings):
-        print(f"  PASS: Empty extraction warned")
-        tests_passed += 1
-    else:
-        print(f"  FAIL: Should warn on empty extraction")
-        tests_failed += 1
-
-    # Summary
-    print(f"\n{'=' * 60}")
-    print(f"Results: {tests_passed} passed, {tests_failed} failed")
-    print(f"{'=' * 60}")
-    return tests_failed == 0
-
-
-def validate_file(filepath):
-    """Validate an existing extraction JSON file."""
-    path = Path(filepath)
-    if not path.exists():
-        print(f"ERROR: {filepath} not found")
-        return False
-
-    data = json.loads(path.read_text())
-    is_valid, errors, warnings = validate_extraction(data)
-
-    print(f"Validation of {filepath}:")
-    print(f"  Knowledge items: {len(data.get('knowledge', []))}")
-    print(f"  Errors: {len(errors)}")
-    print(f"  Warnings: {len(warnings)}")
-
-    for e in errors:
-        print(f"  ERROR: {e}")
-    for w in warnings:
-        print(f"  WARN: {w}")
-
-    return is_valid
-
+    
+    # This would be replaced with actual prompt execution
+    print("Sample transcript processed")
+    print("Expected categories: fact, pitfall, pattern, tool-quirk, question")
+    return True

 if __name__ == "__main__":
-    parser = argparse.ArgumentParser(description="Test knowledge extraction prompt")
-    parser.add_argument("--validate", help="Validate an existing extraction JSON file")
-    parser.add_argument("--transcript", help="Test against a real transcript file (informational)")
-    args = parser.parse_args()
-
-    if args.validate:
-        success = validate_file(args.validate)
-        sys.exit(0 if success else 1)
-    else:
-        success = run_tests()
-        sys.exit(0 if success else 1)
+    print("Testing knowledge extraction prompt...")
+    
+    # Test 1: Validate prompt file exists
+    prompt_path = Path("templates/harvest-prompt.md")
+    if not prompt_path.exists():
+        print("ERROR: harvest-prompt.md not found")
+        sys.exit(1)
+    
+    print(f"OK: Prompt file exists: {prompt_path}")
+    
+    # Test 2: Check prompt size
+    prompt_size = prompt_path.stat().st_size
+    print(f"OK: Prompt size: {prompt_size} bytes")
+    
+    # Test 3: Test sample transcript processing
+    if test_sample_transcript():
+        print("OK: Sample transcript test passed")
+    
+    print("\nAll tests passed!")
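Note on the size check: the removed version estimated tokens at roughly four characters each and warned when the estimate exceeded 2000, whereas the replacement only prints the byte count. A minimal sketch of that heuristic as a reusable helper follows. The names `estimate_tokens` and `within_budget` are hypothetical and do not appear in the repository, and the four-characters-per-token ratio is only a rough approximation.

```python
def estimate_tokens(size_bytes: int, chars_per_token: float = 4.0) -> float:
    """Rough token estimate from a byte count (assumes ~4 chars per token)."""
    return size_bytes / chars_per_token


def within_budget(size_bytes: int, max_tokens: int = 2000) -> bool:
    """Return True when the estimated token count does not exceed the budget."""
    return estimate_tokens(size_bytes) <= max_tokens


if __name__ == "__main__":
    # Print the estimate and the budget verdict for a few sample sizes.
    for size in (4000, 8000, 8004):
        print(size, round(estimate_tokens(size)), within_budget(size))
```

Keeping the threshold as a parameter means the warning text and the comparison can be derived from the same value instead of drifting apart.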
--- a/scripts/test_harvest_prompt_comprehensive.py
+++ b/scripts/test_harvest_prompt_comprehensive.py
@@ -1,72 +0,0 @@
-#!/usr/bin/env python3
-"""Comprehensive tests for knowledge extraction prompt."""
-import json, re
-from pathlib import Path
-
-def check_prompt_structure():
-    p = Path("templates/harvest-prompt.md")
-    if not p.exists(): return False, "harvest-prompt.md not found"
-    c = p.read_text()
-    for s in ["System Prompt","Instructions","Categories","Output Format","Confidence Scoring","Constraints","Example"]:
-        if s.lower() not in c.lower(): return False, f"Missing section: {s}"
-    for cat in ["fact","pitfall","pattern","tool-quirk","question"]:
-        if cat not in c: return False, f"Missing category: {cat}"
-    if len(c) > 5000: return False, f"Too large: {len(c)}"
-    if len(c) < 1000: return False, f"Too small: {len(c)}"
-    return True, "Prompt structure is valid"
-
-def check_confidence_scoring():
-    c = Path("templates/harvest-prompt.md").read_text()
-    for l in ["0.9-1.0","0.7-0.8","0.5-0.6","0.3-0.4","0.1-0.2"]:
-        if l not in c: return False, f"Missing level: {l}"
-    return True, "Confidence scoring defined"
-
-def check_example_quality():
-    c = Path("templates/harvest-prompt.md").read_text()
-    if "example" not in c.lower(): return False, "No examples"
-    m = re.search(r'"knowledge"', c[c.lower().find("example"):])
-    if not m: return False, "No JSON example"
-    return True, "Examples present"
-
-def check_constraint_coverage():
-    c = Path("templates/harvest-prompt.md").read_text()
-    for x in ["no hallucination","explicitly","partial","failed sessions"]:
-        if x not in c.lower(): return False, f"Missing: {x}"
-    return True, "Constraints covered"
-
-def check_test_sessions():
-    d = Path("test_sessions")
-    if not d.exists(): return False, "test_sessions/ not found"
-    files = list(d.glob("*.jsonl"))
-    if len(files) < 5: return False, f"Only {len(files)} sessions"
-    for f in files:
-        for i, line in enumerate(f.read_text().strip().split("\n"), 1):
-            try: json.loads(line)
-            except json.JSONDecodeError as e: return False, f"{f.name}:{i}: {e}"
-    return True, f"{len(files)} valid sessions"
-
-def test_prompt_structure():
-    passed, msg = check_prompt_structure()
-    assert passed, msg
-
-def test_confidence_scoring():
-    passed, msg = check_confidence_scoring()
-    assert passed, msg
-
-def test_example_quality():
-    passed, msg = check_example_quality()
-    assert passed, msg
-
-def test_constraint_coverage():
-    passed, msg = check_constraint_coverage()
-    assert passed, msg
-
-def test_test_sessions():
-    passed, msg = check_test_sessions()
-    assert passed, msg
-
-if __name__ == "__main__":
-    checks = [check_prompt_structure, check_confidence_scoring, check_example_quality, check_constraint_coverage, check_test_sessions]
-    for fn in checks:
-        ok, msg = fn()
-        print(f"{'PASS' if ok else 'FAIL'}: {fn.__name__} -- {msg}")
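Note on `check_test_sessions`: it parses every line of each `.jsonl` file and reports the first failure as `file:line`. The same idea can be sketched on an in-memory string, which makes it easy to exercise without fixture files. The helper name `first_invalid_line` is hypothetical and is not part of the removed script.

```python
import json
from typing import Optional, Tuple


def first_invalid_line(text: str) -> Optional[Tuple[int, str]]:
    """Return (line_number, error) for the first line that is not valid JSON.

    Returns None when every line parses. Line numbers start at 1, matching
    the enumerate(..., 1) convention used in the removed check.
    """
    for number, line in enumerate(text.strip().split("\n"), 1):
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            return number, str(exc)
    return None


if __name__ == "__main__":
    sample = '{"role": "user", "content": "Hello"}\nnot json'
    print(first_invalid_line(sample))
```

Returning the location instead of a boolean keeps the failure message useful when a session file has thousands of lines.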
--- a/scripts/test_harvester_pipeline.py
+++ b/scripts/test_harvester_pipeline.py
@@ -1,162 +0,0 @@
-#!/usr/bin/env python3
-"""
-Smoke test for harvester pipeline — verifies the full chain:
-session_reader -> prompt -> LLM (mocked) -> validate -> deduplicate -> store
-
-Does NOT call the real LLM. Tests plumbing only.
-"""
-
-import json
-import sys
-import tempfile
-import os
-from pathlib import Path
-
-# Setup path
-SCRIPT_DIR = Path(__file__).parent.absolute()
-sys.path.insert(0, str(SCRIPT_DIR))
-
-from session_reader import read_session, extract_conversation, truncate_for_context, messages_to_text
-from harvester import validate_fact, deduplicate, load_existing_knowledge, fact_fingerprint
-
-
-def test_session_reader():
-    """Test that session_reader parses JSONL correctly."""
-    with tempfile.NamedTemporaryFile(mode='w', suffix='.jsonl', delete=False) as f:
-        f.write('{"role": "user", "content": "Hello", "timestamp": "2026-04-13T10:00:00Z"}\n')
-        f.write('{"role": "assistant", "content": "Hi there", "timestamp": "2026-04-13T10:00:01Z"}\n')
-        f.write('{"role": "user", "content": "Clone the repo", "timestamp": "2026-04-13T10:00:02Z"}\n')
-        f.write('{"role": "assistant", "content": "Cloned successfully", "timestamp": "2026-04-13T10:00:05Z"}\n')
-        path = f.name
-
-    messages = read_session(path)
-    assert len(messages) == 4, f"Expected 4 messages, got {len(messages)}"
-
-    conv = extract_conversation(messages)
-    assert len(conv) == 4, f"Expected 4 conversation turns, got {len(conv)}"
-
-    text = messages_to_text(conv)
-    assert "USER: Hello" in text
-    assert "ASSISTANT: Hi there" in text
-
-    truncated = truncate_for_context(conv, head=2, tail=2)
-    assert len(truncated) == 4  # 4 <= head+tail, so no truncation
-
-    os.unlink(path)
-    print("  [PASS] session_reader pipeline works")
-
-
-def test_validate_fact():
-    """Test fact validation."""
-    good = {"fact": "Gitea token is at ~/.config/gitea/token", "category": "tool-quirk", "repo": "global", "confidence": 0.9}
-    assert validate_fact(good), "Valid fact should pass"
-
-    bad_missing = {"fact": "Something", "category": "fact"}
-    assert not validate_fact(bad_missing), "Missing fields should fail"
-
-    bad_category = {"fact": "Something", "category": "nonsense", "repo": "x", "confidence": 0.5}
-    assert not validate_fact(bad_category), "Bad category should fail"
-
-    bad_conf = {"fact": "Something", "category": "fact", "repo": "x", "confidence": 1.5}
-    assert not validate_fact(bad_conf), "Confidence > 1.0 should fail"
-
-    print("  [PASS] fact validation works")
-
-
-def test_deduplicate():
-    """Test deduplication."""
-    existing = [
-        {"fact": "Token is at ~/.config/gitea/token", "category": "tool-quirk", "repo": "global", "confidence": 0.9}
-    ]
-    new = [
-        {"fact": "Token is at ~/.config/gitea/token", "category": "tool-quirk", "repo": "global", "confidence": 0.9},  # exact dup
-        {"fact": "Deploy uses Ansible on port 22", "category": "pattern", "repo": "fleet", "confidence": 0.8},  # unique
-    ]
-    result = deduplicate(new, existing)
-    assert len(result) == 1, f"Expected 1 unique, got {len(result)}"
-    assert result[0]["fact"] == "Deploy uses Ansible on port 22"
-    print("  [PASS] deduplication works")
-
-
-def test_knowledge_store_roundtrip():
-    """Test loading and writing knowledge index."""
-    with tempfile.TemporaryDirectory() as tmpdir:
-        # Load empty index
-        index = load_existing_knowledge(tmpdir)
-        assert index["total_facts"] == 0
-
-        # Write a fact
-        new_facts = [{"fact": "Test fact", "category": "fact", "repo": "test", "confidence": 0.9}]
-
-        # Use harvester's write function
-        from harvester import write_knowledge
-        write_knowledge(index, new_facts, tmpdir, source_session="test.jsonl")
-
-        # Reload and verify
-        index2 = load_existing_knowledge(tmpdir)
-        assert index2["total_facts"] == 1
-        assert index2["facts"][0]["fact"] == "Test fact"
-        assert index2["facts"][0]["source_session"] == "test.jsonl"
-
-        # Check markdown was written
-        md_path = Path(tmpdir) / "repos" / "test.md"
-        assert md_path.exists(), "Markdown file should be created"
-
-    print("  [PASS] knowledge store roundtrip works")
-
-
-def test_full_chain_no_llm():
-    """Test the full pipeline minus the LLM call."""
-    with tempfile.NamedTemporaryFile(mode='w', suffix='.jsonl', delete=False) as f:
-        f.write('{"role": "user", "content": "Clone compounding-intelligence", "timestamp": "2026-04-13T10:00:00Z"}\n')
-        f.write('{"role": "assistant", "content": "Cloned successfully", "timestamp": "2026-04-13T10:00:05Z"}\n')
-        session_path = f.name
-
-    with tempfile.TemporaryDirectory() as knowledge_dir:
-        # Step 1: Read
-        messages = read_session(session_path)
-        assert len(messages) == 2
-
-        # Step 2: Extract conversation
-        conv = extract_conversation(messages)
-        assert len(conv) == 2
-
-        # Step 3: Truncate
-        truncated = truncate_for_context(conv, head=50, tail=50)
-
-        # Step 4: Convert to text (this goes to the LLM)
-        transcript = messages_to_text(truncated)
-        assert "Clone compounding-intelligence" in transcript
-
-        # Step 5: The LLM call would happen here; we simulate its output instead,
-        # then validate the simulated facts before deduplicating and storing them.
-        mock_facts = [
-            {"fact": "compounding-intelligence repo was cloned", "category": "fact", "repo": "compounding-intelligence", "confidence": 0.9}
-        ]
-        valid = [f for f in mock_facts if validate_fact(f)]
-
-        # Step 6: Deduplicate
-        index = load_existing_knowledge(knowledge_dir)
-        new_facts = deduplicate(valid, index.get("facts", []))
-        assert len(new_facts) == 1
-
-        # Step 7: Store
-        from harvester import write_knowledge
-        write_knowledge(index, new_facts, knowledge_dir, source_session=session_path)
-
-        # Verify
-        index2 = load_existing_knowledge(knowledge_dir)
-        assert index2["total_facts"] == 1
-
-    os.unlink(session_path)
-    print("  [PASS] full chain (reader -> validate -> dedup -> store) works")
-
-
-if __name__ == "__main__":
-    print("Running harvester pipeline smoke tests...")
-    test_session_reader()
-    test_validate_fact()
-    test_deduplicate()
-    test_knowledge_store_roundtrip()
-    test_full_chain_no_llm()
-    print("\nAll tests passed.")
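Note on deduplication: `test_deduplicate` expects an exact repeat of a stored fact to be dropped and a new fact to be kept. The diff does not show how `harvester.deduplicate` or `fact_fingerprint` are implemented, so the following is only a sketch of one plausible approach, assuming a fingerprint built from the case- and whitespace-normalized fact text. The names `fingerprint` and `dedupe` are hypothetical.

```python
import hashlib
from typing import Dict, List


def fingerprint(fact: Dict) -> str:
    """Hash the case- and whitespace-normalized fact text (hypothetical scheme)."""
    normalized = " ".join(str(fact.get("fact", "")).lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe(new: List[Dict], existing: List[Dict]) -> List[Dict]:
    """Keep new facts whose fingerprint is not already known, preserving order."""
    seen = {fingerprint(f) for f in existing}
    unique = []
    for fact in new:
        key = fingerprint(fact)
        if key not in seen:
            seen.add(key)  # also drops repeats within the new batch
            unique.append(fact)
    return unique


if __name__ == "__main__":
    stored = [{"fact": "Token is at ~/.config/gitea/token"}]
    incoming = [
        {"fact": "Token is at ~/.config/gitea/token"},
        {"fact": "Deploy uses Ansible on port 22"},
    ]
    print(dedupe(incoming, stored))
```

An exact-text fingerprint only catches literal repeats; paraphrased duplicates would need a fuzzier comparison, which this sketch does not attempt.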
--- a/scripts/test_improvement_proposals.py
+++ b/scripts/test_improvement_proposals.py
@@ -1,282 +0,0 @@
-#!/usr/bin/env python3
-"""Tests for scripts/improvement_proposals.py — 15 tests."""
-
-import json
-import os
-import sys
-import tempfile
-
-sys.path.insert(0, os.path.dirname(__file__) or ".")
-import importlib.util
-
-spec = importlib.util.spec_from_file_location(
-    "ip", os.path.join(os.path.dirname(__file__) or ".", "improvement_proposals.py")
-)
-mod = importlib.util.module_from_spec(spec)
-spec.loader.exec_module(mod)
-
-analyze_sessions = mod.analyze_sessions
-generate_proposals = mod.generate_proposals
-format_proposals_markdown = mod.format_proposals_markdown
-format_proposals_json = mod.format_proposals_json
-_normalize_error = mod._normalize_error
-
-
-# ── Helper to build test sessions ─────────────────────────────
-
-def _make_session(session_id, repo="test-repo", errors=None, tool_calls=None, duration=30):
-    return {
-        "session_id": session_id,
-        "repo": repo,
-        "errors": [{"message": e} for e in (errors or [])],
-        "tool_calls": tool_calls or [],
-        "duration_minutes": duration,
-        "messages": [],
-    }
-
-
-def _make_tool_calls(repeats):
-    """Create tool call list with repeated tools."""
-    calls = []
-    for tool, count in repeats:
-        for _ in range(count):
-            calls.append({"tool": tool, "latency_ms": 100})
-    return calls
-
-
-# ── Tests ─────────────────────────────────────────────────────
-
-def test_empty_sessions():
-    patterns = analyze_sessions([])
-    assert patterns == []
-    print("PASS: test_empty_sessions")
-
-
-def test_no_patterns_on_clean_sessions():
-    sessions = [
-        _make_session("s1", tool_calls=[{"tool": "read_file", "latency_ms": 50}]),
-        _make_session("s2", tool_calls=[{"tool": "write_file", "latency_ms": 80}]),
-    ]
-    patterns = analyze_sessions(sessions)
-    # No repeated errors, no slow tools, no retries
-    assert len(patterns) == 0
-    print("PASS: test_no_patterns_on_clean_sessions")
-
-
-def test_repeated_error_detection():
-    """Same error across 3+ sessions triggers pattern."""
-    sessions = [
-        _make_session(f"s{i}", errors=["ModuleNotFoundError: No module named bannerlord_trace"])
-        for i in range(4)
-    ]
-    patterns = analyze_sessions(sessions)
-    repeated = [p for p in patterns if p.pattern_type == "repeated_error"]
-    assert len(repeated) == 1
-    assert repeated[0].occurrences == 4
-    print("PASS: test_repeated_error_detection")
-
-
-def test_repeated_error_threshold():
-    """2 occurrences should NOT trigger (threshold is 3)."""
-    sessions = [
-        _make_session("s1", errors=["TimeoutError: connection timed out"]),
-        _make_session("s2", errors=["TimeoutError: connection timed out"]),
-    ]
-    patterns = analyze_sessions(sessions)
-    repeated = [p for p in patterns if p.pattern_type == "repeated_error"]
-    assert len(repeated) == 0
-    print("PASS: test_repeated_error_threshold")
-
-
-def test_slow_tool_detection():
-    """Tool with avg latency > 5000ms across 5+ calls."""
-    calls = [{"tool": "git_push", "latency_ms": 8000} for _ in range(10)]
-    sessions = [_make_session("s1", tool_calls=calls)]
-    patterns = analyze_sessions(sessions)
-    slow = [p for p in patterns if p.pattern_type == "slow_tool"]
-    assert len(slow) == 1
-    assert "git_push" in slow[0].description
-    print("PASS: test_slow_tool_detection")
-
-
-def test_fast_tool_not_flagged():
-    """Tool under 5000ms avg should not trigger."""
-    calls = [{"tool": "read_file", "latency_ms": 50} for _ in range(10)]
-    sessions = [_make_session("s1", tool_calls=calls)]
-    patterns = analyze_sessions(sessions)
-    slow = [p for p in patterns if p.pattern_type == "slow_tool"]
-    assert len(slow) == 0
-    print("PASS: test_fast_tool_not_flagged")
-
-
-def test_failed_retry_detection():
-    """3+ consecutive calls to same tool triggers retry pattern."""
-    calls = _make_tool_calls([("execute_code", 5)])
-    sessions = [_make_session("s1", tool_calls=calls)]
-    sessions.extend([
-        _make_session(f"s{i}", tool_calls=_make_tool_calls([("execute_code", 4)]))
-        for i in range(2, 5)
-    ])
-    patterns = analyze_sessions(sessions)
-    retries = [p for p in patterns if p.pattern_type == "failed_retry"]
-    assert len(retries) >= 1
-    print("PASS: test_failed_retry_detection")
-
-
-def test_manual_process_detection():
-    """10+ tool calls with <= 3 unique tools."""
-    calls = _make_tool_calls([("terminal", 8), ("read_file", 5)])
-    sessions = [_make_session("s1", tool_calls=calls, duration=25)]
-    patterns = analyze_sessions(sessions)
-    manual = [p for p in patterns if p.pattern_type == "manual_process"]
-    assert len(manual) == 1
-    print("PASS: test_manual_process_detection")
-
-
-def test_generate_proposals_from_patterns():
-    """Proposals generated from waste patterns."""
-    sessions = [
-        _make_session(f"s{i}", errors=["Error: push timeout"])
-        for i in range(5)
-    ]
-    patterns = analyze_sessions(sessions)
-    proposals = generate_proposals(patterns)
-    assert len(proposals) >= 1
-    assert proposals[0].estimated_monthly_hours_saved > 0
-    assert proposals[0].priority in ("critical", "high", "medium", "low")
-    print("PASS: test_generate_proposals_from_patterns")
-
-
-def test_proposal_roi_positive():
-    """ROI weeks should be a positive number for recoverable time."""
-    patterns = [mod.WastePattern(
-        pattern_type="repeated_error",
-        description="Test error",
-        occurrences=10,
-        total_time_hours=5.0,
-        affected_repos=["test"],
-    )]
-    proposals = generate_proposals(patterns)
-    assert len(proposals) == 1
-    assert proposals[0].roi_weeks > 0
-    assert proposals[0].roi_weeks < 100
-    print("PASS: test_proposal_roi_positive")
-
-
-def test_proposals_sorted_by_impact():
-    """Proposals should be sorted by monthly hours saved (descending)."""
-    sessions = [
-        _make_session("s1", errors=["Minor warning"] * 3, duration=5),
-        _make_session("s2", errors=["Critical failure: deploy crashed"] * 5, duration=60),
-    ]
-    # Add more sessions to cross threshold
-    for i in range(3, 7):
-        sessions.append(_make_session(f"s{i}", errors=["Critical failure: deploy crashed"]))
-
-    patterns = analyze_sessions(sessions)
-    proposals = generate_proposals(patterns)
-    if len(proposals) >= 2:
-        for i in range(len(proposals) - 1):
-            assert proposals[i].estimated_monthly_hours_saved >= proposals[i + 1].estimated_monthly_hours_saved
-    print("PASS: test_proposals_sorted_by_impact")
-
-
-def test_format_markdown():
-    """Markdown output should contain expected sections."""
-    patterns = [mod.WastePattern(
-        pattern_type="repeated_error", description="Test", occurrences=5,
-        total_time_hours=2.5, affected_repos=["repo"],
-    )]
-    proposals = generate_proposals(patterns)
-    md = format_proposals_markdown(proposals, patterns, "2026-04-15T00:00:00Z")
-    assert "# Improvement Proposals" in md
-    assert "## Summary" in md
-    assert "### Problem" in md
-    assert "### ROI Estimate" in md
-    assert "## Appendix" in md
-    print("PASS: test_format_markdown")
-
-
-def test_format_json():
-    """JSON output should be valid and parseable."""
-    patterns = [mod.WastePattern(
-        pattern_type="slow_tool", description="Slow", occurrences=10,
-        total_time_hours=3.0, affected_repos=["global"],
-    )]
-    proposals = generate_proposals(patterns)
-    output = format_proposals_json(proposals)
-    parsed = json.loads(output)
-    assert isinstance(parsed, list)
-    assert len(parsed) == len(proposals)
-    assert "title" in parsed[0]
-    assert "roi_weeks" in parsed[0]
-    print("PASS: test_format_json")
-
-
-def test_normalize_error():
-    """Error normalization should remove paths and hashes."""
-    err1 = _normalize_error("Failed to clone /Users/apayne/repo with token abc123def456")
-    assert "/PATH" in err1
-    assert "HASH" in err1
-    assert "/Users/apayne" not in err1
-
-    err2 = _normalize_error("")
-    assert err2 == ""
-
-    err3 = _normalize_error("Simple error message")
-    assert "simple error" in err3
-    print("PASS: test_normalize_error")
-
-
-def test_cli_integration():
-    """End-to-end test: write input JSON, run script, check output."""
-    import subprocess
-
-    sessions = [
-        _make_session(f"s{i}", errors=["Connection refused: port 8080"])
-        for i in range(5)
-    ]
-
-    with tempfile.TemporaryDirectory() as tmpdir:
-        input_path = os.path.join(tmpdir, "analytics.json")
-        output_path = os.path.join(tmpdir, "proposals.md")
-
-        with open(input_path, "w") as f:
-            json.dump({"sessions": sessions}, f)
-
-        script = os.path.join(os.path.dirname(__file__) or ".", "improvement_proposals.py")
-        result = subprocess.run(
-            [sys.executable, script, "--input", input_path, "--output", output_path],
-            capture_output=True, text=True, timeout=10,
-        )
-
-        assert result.returncode == 0, f"CLI failed: {result.stderr}"
-        assert os.path.exists(output_path)
-
-        with open(output_path) as f:
-            content = f.read()
-        assert "# Improvement Proposals" in content
-        print("PASS: test_cli_integration")
-
-
-def run_all():
-    test_empty_sessions()
-    test_no_patterns_on_clean_sessions()
-    test_repeated_error_detection()
-    test_repeated_error_threshold()
-    test_slow_tool_detection()
-    test_fast_tool_not_flagged()
-    test_failed_retry_detection()
-    test_manual_process_detection()
-    test_generate_proposals_from_patterns()
-    test_proposal_roi_positive()
-    test_proposals_sorted_by_impact()
-    test_format_markdown()
-    test_format_json()
-    test_normalize_error()
-    test_cli_integration()
-    print("\nAll 15 tests passed!")
-
-
-if __name__ == "__main__":
-    run_all()
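Note on error normalization: `test_normalize_error` expects paths to become `/PATH`, hash-like tokens to become `HASH`, and the remaining text to be lowercased, so that the same error from different machines groups together. The real `_normalize_error` is not shown in this diff. The sketch below is one way to satisfy those assertions, using regular expressions that are assumptions rather than the project's own patterns.

```python
import re

# Assumed patterns: a path is one or more /segment parts; a "hash" is any
# run of 8 or more lowercase hex characters standing alone as a word.
_PATH_RE = re.compile(r"(?:/[\w.\-]+)+")
_HASH_RE = re.compile(r"\b[0-9a-f]{8,}\b")


def normalize_error(message: str) -> str:
    """Lowercase the message, then mask filesystem paths and hash-like tokens."""
    lowered = message.lower().strip()
    without_paths = _PATH_RE.sub("/PATH", lowered)
    return _HASH_RE.sub("HASH", without_paths)


if __name__ == "__main__":
    print(normalize_error("Failed to clone /Users/apayne/repo with token abc123def456"))
```

Lowercasing happens before masking so the uppercase placeholders survive, and the hex pattern will also mask long decimal numbers, which may or may not be desirable.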
--- a/scripts/test_knowledge_staleness.py
+++ b/scripts/test_knowledge_staleness.py
@@ -1,129 +0,0 @@
-#!/usr/bin/env python3
-"""Tests for scripts/knowledge_staleness_check.py — 8 tests."""
-
-import json
-import os
-import sys
-import tempfile
-
-sys.path.insert(0, os.path.dirname(__file__) or ".")
-import importlib.util
-spec = importlib.util.spec_from_file_location("ks", os.path.join(os.path.dirname(__file__) or ".", "knowledge_staleness_check.py"))
-mod = importlib.util.module_from_spec(spec)
-spec.loader.exec_module(mod)
-check_staleness = mod.check_staleness
-fix_hashes = mod.fix_hashes
-compute_file_hash = mod.compute_file_hash
-
-
-def test_fresh_entry():
-    with tempfile.TemporaryDirectory() as tmpdir:
-        src = os.path.join(tmpdir, "source.py")
-        with open(src, "w") as f:
-            f.write("print('hello')")
-        h = compute_file_hash(src)
-        idx = os.path.join(tmpdir, "index.json")
-        with open(idx, "w") as f:
-            json.dump({"facts": [{"fact": "hello", "source_file": "source.py", "source_hash": h}]}, f)
-        results = check_staleness(idx, tmpdir)
-        assert results[0]["status"] == "fresh"
-        print("PASS: test_fresh_entry")
-
-
-def test_stale_entry():
-    with tempfile.TemporaryDirectory() as tmpdir:
-        src = os.path.join(tmpdir, "source.py")
-        with open(src, "w") as f:
-            f.write("original content")
-        idx = os.path.join(tmpdir, "index.json")
-        with open(idx, "w") as f:
-            json.dump({"facts": [{"fact": "old", "source_file": "source.py", "source_hash": "sha256:wrong"}]}, f)
-        # Now change the source
-        with open(src, "w") as f:
-            f.write("modified content")
-        results = check_staleness(idx, tmpdir)
-        assert results[0]["status"] == "stale"
-        print("PASS: test_stale_entry")
-
-
-def test_missing_source():
-    with tempfile.TemporaryDirectory() as tmpdir:
-        idx = os.path.join(tmpdir, "index.json")
-        with open(idx, "w") as f:
-            json.dump({"facts": [{"fact": "gone", "source_file": "nonexistent.py", "source_hash": "sha256:abc"}]}, f)
-        results = check_staleness(idx, tmpdir)
-        assert results[0]["status"] == "missing_source"
-        print("PASS: test_missing_source")
-
-
-def test_no_hash():
-    with tempfile.TemporaryDirectory() as tmpdir:
-        src = os.path.join(tmpdir, "source.py")
-        with open(src, "w") as f:
-            f.write("content")
-        idx = os.path.join(tmpdir, "index.json")
-        with open(idx, "w") as f:
-            json.dump({"facts": [{"fact": "no hash", "source_file": "source.py"}]}, f)
-        results = check_staleness(idx, tmpdir)
-        assert results[0]["status"] == "no_hash"
-        assert results[0]["current_hash"].startswith("sha256:")
-        print("PASS: test_no_hash")
-
-
-def test_no_source_field():
-    with tempfile.TemporaryDirectory() as tmpdir:
-        idx = os.path.join(tmpdir, "index.json")
-        with open(idx, "w") as f:
-            json.dump({"facts": [{"fact": "orphan"}]}, f)
-        results = check_staleness(idx, tmpdir)
-        assert results[0]["status"] == "no_source"
-        print("PASS: test_no_source_field")
-
-
-def test_fix_hashes():
-    with tempfile.TemporaryDirectory() as tmpdir:
-        src = os.path.join(tmpdir, "source.py")
-        with open(src, "w") as f:
-            f.write("content for hashing")
-        idx = os.path.join(tmpdir, "index.json")
-        with open(idx, "w") as f:
-            json.dump({"facts": [{"fact": "needs hash", "source_file": "source.py"}]}, f)
-        fixed = fix_hashes(idx, tmpdir)
-        assert fixed == 1
-        # Verify hash was added
-        with open(idx) as f:
-            data = json.load(f)
-        assert data["facts"][0]["source_hash"].startswith("sha256:")
-        print("PASS: test_fix_hashes")
-
-
-def test_empty_index():
-    with tempfile.TemporaryDirectory() as tmpdir:
-        idx = os.path.join(tmpdir, "index.json")
-        with open(idx, "w") as f:
-            json.dump({"facts": []}, f)
-        results = check_staleness(idx, tmpdir)
-        assert results == []
-        print("PASS: test_empty_index")
-
-
-def test_compute_hash_nonexistent():
-    h = compute_file_hash("/nonexistent/path/file.py")
-    assert h is None
-    print("PASS: test_compute_hash_nonexistent")
-
-
-def run_all():
-    test_fresh_entry()
-    test_stale_entry()
-    test_missing_source()
-    test_no_hash()
-    test_no_source_field()
-    test_fix_hashes()
-    test_empty_index()
-    test_compute_hash_nonexistent()
-    print("\nAll 8 tests passed!")
-
-
-if __name__ == "__main__":
-    run_all()
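Note on staleness statuses: the removed tests exercise five outcomes (`fresh`, `stale`, `missing_source`, `no_hash`, `no_source`) and a `sha256:`-prefixed hash that is `None` for a nonexistent file. The sketch below shows one decision order consistent with those tests. The function names `file_hash` and `classify` are hypothetical, and the actual `knowledge_staleness_check.py` may be structured differently.

```python
import hashlib
import os
from typing import Optional


def file_hash(path: str) -> Optional[str]:
    """Return 'sha256:<hex>' for an existing file, or None if it does not exist."""
    if not os.path.isfile(path):
        return None
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large source files are not loaded at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


def classify(fact: dict, repo_root: str) -> str:
    """Map one index entry to a staleness status string."""
    source = fact.get("source_file")
    if not source:
        return "no_source"
    current = file_hash(os.path.join(repo_root, source))
    if current is None:
        return "missing_source"
    stored = fact.get("source_hash")
    if not stored:
        return "no_hash"
    return "fresh" if stored == current else "stale"
```

Checking for a missing source before a missing hash matters: in the removed `test_missing_source`, the entry carries a stored hash but must still be reported as `missing_source`.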
--- a/scripts/test_priority_rebalancer.py
+++ b/scripts/test_priority_rebalancer.py
@@ -1,305 +0,0 @@
-#!/usr/bin/env python3
-"""
-Tests for Priority Rebalancer
-"""
-
-import json
-import os
-import sys
-import tempfile
-from datetime import datetime, timedelta
-from pathlib import Path
-
-# Add script dir to path
-sys.path.insert(0, str(Path(__file__).parent))
-
-from priority_rebalancer import (
-    GiteaClient,
-    IssueScore,
-    PipelineSignal,
-    compute_issue_score,
-    collect_knowledge_signals,
-    collect_metrics_signals,
-    extract_priority,
-    generate_report,
-    generate_markdown_report,
-    PRIORITY_LEVELS,
-)
-
-# ============================================================
-# Test Helpers
-# ============================================================
-
-PASS = 0
-FAIL = 0
-
-def test(name):
-    def decorator(fn):
-        global PASS, FAIL
-        try:
-            fn()
-            PASS += 1
-            print(f"  [PASS] {name}")
-        except Exception as e:
-            FAIL += 1
-            print(f"  [FAIL] {name}: {e}")
-    return decorator
-
-def assert_eq(a, b, msg=""):
-    if a != b:
-        raise AssertionError(f"{msg} expected {b!r}, got {a!r}")
-
-def assert_true(v, msg=""):
-    if not v:
-        raise AssertionError(msg or "Expected True")
-
-def assert_false(v, msg=""):
-    if v:
-        raise AssertionError(msg or "Expected False")
-
-
-# ============================================================
-# Priority Extraction Tests
-# ============================================================
-
-print("=== Priority Rebalancer Tests ===\n")
-
-print("-- Priority Extraction --")
-
-@test("extract P0 from label")
-def _():
-    assert_eq(extract_priority(["P0", "bug"]), "P0")
-
-@test("extract P1 from priority:high")
-def _():
-    assert_eq(extract_priority(["priority:high"]), "P1")
-
-@test("extract P2 from priority:medium")
-def _():
-    assert_eq(extract_priority(["priority:medium"]), "P2")
-
-@test("extract P3 from priority:low")
-def _():
-    assert_eq(extract_priority(["priority:low"]), "P3")
-
-@test("returns None for no priority")
-def _():
-    assert_eq(extract_priority(["bug", "enhancement"]), None)
-
-@test("case insensitive")
-def _():
-    assert_eq(extract_priority(["p1"]), "P1")
-    assert_eq(extract_priority(["PRIORITY:CRITICAL"]), "P0")
-
-
-# ============================================================
-# Issue Scoring Tests
-# ============================================================
-
-print("\n-- Issue Scoring --")
-
-def make_issue(**kwargs):
-    defaults = {
-        "number": 1,
-        "title": "Test issue",
-        "labels": [],
-        "created_at": (datetime.utcnow() - timedelta(days=5)).isoformat() + "Z",
-        "comments": 0,
-        "assignees": None,
-    }
-    defaults.update(kwargs)
-    return defaults
-
-@test("bug gets score boost")
-def _():
-    issue = make_issue(title="Incorrect output format", labels=["bug"])
-    score = compute_issue_score(issue, "test-repo", [], datetime.utcnow())
-    assert_true(score.score > 0, f"Bug should boost score, got {score.score}")
-    # Bug label alone should be P2 or P3 (not P0)
-    assert_true(score.suggested_priority in ("P2", "P3"),
-                f"Bug label alone should be P2/P3, got {score.suggested_priority}")
-
-@test("security gets high score")
-def _():
-    issue = make_issue(title="Security: auth bypass", labels=["bug"])
-    score = compute_issue_score(issue, "test-repo", [], datetime.utcnow())
-    assert_true(score.score >= 25, f"Security should score high, got {score.score}")
-
-@test("old dormant issue gets penalized")
-def _():
-    issue = make_issue(
-        title="Some old feature",
-        created_at=(datetime.utcnow() - timedelta(days=120)).isoformat() + "Z",
-        comments=0
-    )
-    score = compute_issue_score(issue, "test-repo", [], datetime.utcnow())
-    assert_true(score.score < 0, f"Old dormant should be negative, got {score.score}")
-    assert_true(any("Dormant" in r for r in score.reasons), "Should mention dormancy")
-
-@test("active discussion boosts score")
-def _():
-    issue = make_issue(title="Important fix", comments=8)
-    score = compute_issue_score(issue, "test-repo", [], datetime.utcnow())
-    assert_true(score.score > 5, f"Active discussion should boost, got {score.score}")
-    assert_true(any("Active" in r for r in score.reasons))
-
-@test("unassigned gets slight boost")
-def _():
-    issue = make_issue(title="Fix bug", assignees=None)
-    score = compute_issue_score(issue, "test-repo", [], datetime.utcnow())
-    assert_true(any("Unassigned" in r for r in score.reasons))
-
-@test("assigned issue notes assignee")
-def _():
-    issue = make_issue(title="Fix bug", assignees=[{"login": "alice"}])
-    score = compute_issue_score(issue, "test-repo", [], datetime.utcnow())
-    assert_eq(score.assignee, "alice")
-
-@test("nice-to-have gets penalized")
-def _():
-    issue = make_issue(title="Nice to have: fancy animation")
-    score = compute_issue_score(issue, "test-repo", [], datetime.utcnow())
-    assert_true(score.score < 0, f"Nice-to-have should be negative, got {score.score}")
-
-
-# ============================================================
-# Pipeline Signal Tests
-# ============================================================
-
-print("\n-- Pipeline Signals --")
-
-@test("signal alignment boosts matching issues")
-def _():
-    signals = [PipelineSignal(
-        source="knowledge",
-        signal_type="stale_knowledge",
-        weight=0.8,
-        detail="20 stale facts"
-    )]
-    issue = make_issue(title="Fix stale knowledge entries")
-    score = compute_issue_score(issue, "test-repo", signals, datetime.utcnow())
-    assert_true(any("Matches signal" in r for r in score.reasons))
-
-@test("empty knowledge boosts harvester issues")
-def _():
-    signals = [PipelineSignal(
-        source="knowledge",
-        signal_type="empty_knowledge",
-        weight=0.7,
-        detail="0 facts"
-    )]
-    issue = make_issue(title="Implement harvester pipeline")
-    score = compute_issue_score(issue, "test-repo", signals, datetime.utcnow())
-    assert_true(any("Critical gap" in r for r in score.reasons))
-
-
-# ============================================================
-# Knowledge Signal Collection Tests
-# ============================================================
-
-print("\n-- Knowledge Signal Collection --")
-
-@test("missing index generates signal")
-def _():
-    with tempfile.TemporaryDirectory() as tmpdir:
-        signals = collect_knowledge_signals(tmpdir)
-        assert_true(len(signals) > 0)
-        assert_eq(signals[0].signal_type, "missing_index")
-
-@test("empty knowledge generates signal")
-def _():
-    with tempfile.TemporaryDirectory() as tmpdir:
-        idx = os.path.join(tmpdir, "index.json")
-        with open(idx, "w") as f:
-            json.dump({"facts": []}, f)
-        signals = collect_knowledge_signals(tmpdir)
-        assert_true(any(s.signal_type == "empty_knowledge" for s in signals))
-
-@test("corrupt index generates signal")
-def _():
-    with tempfile.TemporaryDirectory() as tmpdir:
-        idx = os.path.join(tmpdir, "index.json")
-        with open(idx, "w") as f:
-            f.write("not json {{{")
-        signals = collect_knowledge_signals(tmpdir)
-        assert_true(any(s.signal_type == "corrupt_index" for s in signals))
-
-@test("knowledge with facts passes")
-def _():
-    with tempfile.TemporaryDirectory() as tmpdir:
-        idx = os.path.join(tmpdir, "index.json")
-        with open(idx, "w") as f:
-            json.dump({"facts": [
-                {"id": 1, "repo": "test", "status": "fresh"},
-                {"id": 2, "repo": "test", "status": "fresh"},
-            ]}, f)
-        signals = collect_knowledge_signals(tmpdir)
-        # Should not generate missing_index or empty_knowledge
-        assert_false(any(s.signal_type in ("missing_index", "empty_knowledge") for s in signals))
-
-
-# ============================================================
-# Metrics Signal Collection Tests
-# ============================================================
-
-print("\n-- Metrics Signal Collection --")
-
-@test("empty metrics dir generates signal")
-def _():
-    with tempfile.TemporaryDirectory() as tmpdir:
-        signals = collect_metrics_signals(tmpdir)
-        assert_true(any(s.signal_type == "no_metrics" for s in signals))
-
-@test("metrics with files passes")
-def _():
-    with tempfile.TemporaryDirectory() as tmpdir:
-        # Create files (simulating real metrics dir with .gitkeep + actual files)
-        with open(os.path.join(tmpdir, ".gitkeep"), "w") as f:
-            f.write("")
-        with open(os.path.join(tmpdir, "report.json"), "w") as f:
-            f.write("{}")
-        signals = collect_metrics_signals(tmpdir)
-        assert_false(any(s.signal_type == "no_metrics" for s in signals))
-
-
-# ============================================================
-# Report Generation Tests
-# ============================================================
-
-print("\n-- Report Generation --")
-
-@test("report has correct structure")
-def _():
-    scores = [
-        IssueScore(1, "repo1", "Bug fix", ["bug"], None, "P1", 30.0, ["test"], 5, 3, None),
-        IssueScore(2, "repo1", "Feature", ["enhancement"], "P3", None, -5.0, ["test"], 60, 0, "alice"),
-    ]
-    signals = [PipelineSignal("knowledge", "stale_knowledge", 0.5, "10 stale")]
-    report = generate_report(scores, signals, "test-org", ["repo1"])
-
-    assert_eq(report["org"], "test-org")
-    assert_eq(report["total_issues"], 2)
-    assert_true("generated_at" in report)
-    assert_true("summary" in report)
-    assert_true("top_priority" in report)
-    assert_eq(report["summary"]["suggested_new_priorities"], 1)
-
-@test("markdown report is non-empty")
-def _():
-    scores = [IssueScore(1, "repo1", "Test", ["bug"], None, "P2", 15.0, ["reason"], 5, 0, None)]
-    report = generate_report(scores, [], "test-org", ["repo1"])
-    md = generate_markdown_report(report)
-    assert_true(len(md) > 100)
-    assert_true("Priority Rebalancer Report" in md)
-    assert_true("Top 10" in md)
-
-
-# ============================================================
-# Summary
-# ============================================================
-
-print("\n=== Summary ===")
-print(f"Total: {PASS + FAIL} | Passed: {PASS} | Failed: {FAIL}")
-
-if FAIL > 0:
-    sys.exit(1)
--- a/scripts/test_refactoring_opportunity_finder.py
+++ b/scripts/test_refactoring_opportunity_finder.py
@@ -1,242 +0,0 @@
-#!/usr/bin/env python3
-"""Tests for scripts/refactoring_opportunity_finder.py — 10 tests."""
-
-import json
-import os
-import sys
-import tempfile
-
-sys.path.insert(0, os.path.dirname(__file__) or ".")
-import importlib.util
-spec = importlib.util.spec_from_file_location(
-    "rof", os.path.join(os.path.dirname(__file__) or ".", "refactoring_opportunity_finder.py"))
-mod = importlib.util.module_from_spec(spec)
-spec.loader.exec_module(mod)
-
-compute_file_complexity = mod.compute_file_complexity
-calculate_refactoring_score = mod.calculate_refactoring_score
-FileMetrics = mod.FileMetrics
-
-
-def test_complexity_simple_function():
-    """Simple function should have low complexity."""
-    with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
-        f.write("""
-def simple():
-    return 42
-""")
-        f.flush()
-        avg, max_c, funcs, classes, lines = compute_file_complexity(f.name)
-        assert avg == 1.0, f"Expected 1.0, got {avg}"
-        assert max_c == 1, f"Expected 1, got {max_c}"
-        assert funcs == 1, f"Expected 1, got {funcs}"
-        assert classes == 0, f"Expected 0, got {classes}"
-        os.unlink(f.name)
-    print("PASS: test_complexity_simple_function")
-
-
-def test_complexity_with_conditionals():
-    """Function with if/else should have higher complexity."""
-    with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
-        f.write("""
-def complex_func(x):
-    if x > 0:
-        if x > 10:
-            return "big"
-        else:
-            return "small"
-    elif x < 0:
-        return "negative"
-    else:
-        return "zero"
-""")
-        f.flush()
-        avg, max_c, funcs, classes, lines = compute_file_complexity(f.name)
-        # Base 1 + if + nested if + elif = 4 (higher if else branches are also counted)
-        assert max_c >= 4, f"Expected max_c >= 4, got {max_c}"
-        assert funcs == 1, f"Expected 1, got {funcs}"
-        os.unlink(f.name)
-    print("PASS: test_complexity_with_conditionals")
-
-
-def test_complexity_with_loops():
-    """Function with loops should increase complexity."""
-    with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
-        f.write("""
-def loop_func(items):
-    result = []
-    for item in items:
-        if item > 0:
-            result.append(item)
-    while len(result) > 10:
-        result.pop()
-    return result
-""")
-        f.flush()
-        avg, max_c, funcs, classes, lines = compute_file_complexity(f.name)
-        # Base 1 + 1 for + 1 if + 1 while = 4
-        assert max_c >= 3, f"Expected max_c >= 3, got {max_c}"
-        os.unlink(f.name)
-    print("PASS: test_complexity_with_loops")
-
-
-def test_complexity_with_class():
-    """Class with methods should count both."""
-    with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
-        f.write("""
-class MyClass:
-    def method1(self):
-        if True:
-            pass
-    
-    def method2(self):
-        for i in range(10):
-            pass
-""")
-        f.flush()
-        avg, max_c, funcs, classes, lines = compute_file_complexity(f.name)
-        assert classes == 1, f"Expected 1 class, got {classes}"
-        assert funcs == 2, f"Expected 2 functions, got {funcs}"
-        os.unlink(f.name)
-    print("PASS: test_complexity_with_class")
-
-
-def test_complexity_syntax_error():
-    """File with syntax error should return zeros."""
-    with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
-        f.write("def broken(:\n    pass")
-        f.flush()
-        avg, max_c, funcs, classes, lines = compute_file_complexity(f.name)
-        assert avg == 0.0, f"Expected 0.0, got {avg}"
-        assert funcs == 0, f"Expected 0, got {funcs}"
-        os.unlink(f.name)
-    print("PASS: test_complexity_syntax_error")
-
-
-def test_refactoring_score_high_complexity():
-    """High complexity should give high score."""
-    metrics = FileMetrics(
-        path="test.py",
-        lines=200,
-        complexity=15.0,
-        max_complexity=25,
-        functions=10,
-        classes=2,
-        churn_30d=5,
-        churn_90d=15,
-        test_coverage=0.3,
-        refactoring_score=0.0
-    )
-    score = calculate_refactoring_score(metrics)
-    assert score > 50, f"Expected score > 50, got {score}"
-    print("PASS: test_refactoring_score_high_complexity")
-
-
-def test_refactoring_score_low_complexity():
-    """Low complexity should give lower score."""
-    metrics = FileMetrics(
-        path="test.py",
-        lines=50,
-        complexity=2.0,
-        max_complexity=3,
-        functions=3,
-        classes=0,
-        churn_30d=0,
-        churn_90d=1,
-        test_coverage=0.9,
-        refactoring_score=0.0
-    )
-    score = calculate_refactoring_score(metrics)
-    assert score < 30, f"Expected score < 30, got {score}"
-    print("PASS: test_refactoring_score_low_complexity")
-
-
-def test_refactoring_score_high_churn():
-    """High churn should increase score."""
-    metrics = FileMetrics(
-        path="test.py",
-        lines=100,
-        complexity=5.0,
-        max_complexity=8,
-        functions=5,
-        classes=0,
-        churn_30d=10,
-        churn_90d=20,
-        test_coverage=0.5,
-        refactoring_score=0.0
-    )
-    score = calculate_refactoring_score(metrics)
-    # Churn should contribute significantly
-    assert score > 40, f"Expected score > 40 for high churn, got {score}"
-    print("PASS: test_refactoring_score_high_churn")
-
-
-def test_refactoring_score_no_coverage():
-    """No coverage data should assume medium risk."""
-    metrics = FileMetrics(
-        path="test.py",
-        lines=100,
-        complexity=5.0,
-        max_complexity=8,
-        functions=5,
-        classes=0,
-        churn_30d=1,
-        churn_90d=2,
-        test_coverage=None,
-        refactoring_score=0.0
-    )
-    score = calculate_refactoring_score(metrics)
-    # Should have some score from the 5-point coverage component
-    assert score > 0, f"Expected positive score, got {score}"
-    print("PASS: test_refactoring_score_no_coverage")
-
-
-def test_refactoring_score_large_file():
-    """Large files should score higher."""
-    metrics_small = FileMetrics(
-        path="small.py",
-        lines=50,
-        complexity=5.0,
-        max_complexity=8,
-        functions=3,
-        classes=0,
-        churn_30d=1,
-        churn_90d=2,
-        test_coverage=0.8,
-        refactoring_score=0.0
-    )
-    metrics_large = FileMetrics(
-        path="large.py",
-        lines=1000,
-        complexity=5.0,
-        max_complexity=8,
-        functions=3,
-        classes=0,
-        churn_30d=1,
-        churn_90d=2,
-        test_coverage=0.8,
-        refactoring_score=0.0
-    )
-    score_small = calculate_refactoring_score(metrics_small)
-    score_large = calculate_refactoring_score(metrics_large)
-    assert score_large > score_small, \
-        f"Large file ({score_large}) should score higher than small ({score_small})"
-    print("PASS: test_refactoring_score_large_file")
-
-
-def run_all():
-    test_complexity_simple_function()
-    test_complexity_with_conditionals()
-    test_complexity_with_loops()
-    test_complexity_with_class()
-    test_complexity_syntax_error()
-    test_refactoring_score_high_complexity()
-    test_refactoring_score_low_complexity()
-    test_refactoring_score_high_churn()
-    test_refactoring_score_no_coverage()
-    test_refactoring_score_large_file()
-    print("\nAll 10 tests passed!")
-
-
-if __name__ == "__main__":
-    run_all()
--- a/scripts/test_session_pair_harvester.py
+++ b/scripts/test_session_pair_harvester.py
@@ -1,90 +0,0 @@
-#!/usr/bin/env python3
-"""Tests for session_pair_harvester."""
-
-import json
-import sys
-import os
-import tempfile
-
-sys.path.insert(0, os.path.dirname(__file__))
-from session_pair_harvester import extract_pairs_from_session, deduplicate_pairs, compute_hash
-
-
-def test_basic_extraction():
-    session = {
-        "id": "test_001",
-        "model": "test-model",
-        "conversations": [
-            {"from": "system", "value": "You are helpful."},
-            {"from": "human", "value": "What is Python?"},
-            {"from": "gpt", "value": "Python is a high-level programming language known for its readability and versatility. It supports multiple paradigms including procedural, object-oriented, and functional programming. Python is widely used in web development, data science, machine learning, and automation."},
-        ]
-    }
-    pairs = extract_pairs_from_session(session, min_ratio=1.5, min_response_words=10)
-    assert len(pairs) == 1
-    assert pairs[0]["terse"] == "What is Python?"
-    assert "programming language" in pairs[0]["rich"]
-    assert pairs[0]["source"] == "test_001"
-    print("PASS: test_basic_extraction")
-
-
-def test_filters_short_responses():
-    session = {
-        "id": "test_002",
-        "model": "test",
-        "conversations": [
-            {"from": "human", "value": "Hi"},
-            {"from": "gpt", "value": "Hello!"},
-        ]
-    }
-    pairs = extract_pairs_from_session(session, min_ratio=1.5, min_response_words=20)
-    assert len(pairs) == 0
-    print("PASS: test_filters_short_responses")
-
-
-def test_skips_tool_results():
-    session = {
-        "id": "test_003",
-        "model": "test",
-        "conversations": [
-            {"from": "human", "value": '{"output": "file content", "exit_code": 0}'},
-            {"from": "gpt", "value": "The file was read successfully. Now let me analyze the content and provide a detailed summary of what was found in the file system."},
-        ]
-    }
-    pairs = extract_pairs_from_session(session, min_ratio=1.5, min_response_words=10)
-    assert len(pairs) == 0
-    print("PASS: test_skips_tool_results")
-
-
-def test_deduplication():
-    pairs = [
-        {"terse": "What is X?", "rich": "X is Y.", "source": "s1", "model": "m"},
-        {"terse": "What is X?", "rich": "X is Y.", "source": "s2", "model": "m"},
-        {"terse": "What is Z?", "rich": "Z is W.", "source": "s1", "model": "m"},
-    ]
-    unique = deduplicate_pairs(pairs)
-    assert len(unique) == 2
-    print("PASS: test_deduplication")
-
-
-def test_ratio_filter():
-    session = {
-        "id": "test_005",
-        "model": "test",
-        "conversations": [
-            {"from": "human", "value": "Explain quantum computing in detail with examples and applications"},
-            {"from": "gpt", "value": "OK."},
-        ]
-    }
-    pairs = extract_pairs_from_session(session, min_ratio=1.5, min_response_words=10)
-    assert len(pairs) == 0  # response too short relative to prompt
-    print("PASS: test_ratio_filter")
-
-
-if __name__ == "__main__":
-    test_basic_extraction()
-    test_filters_short_responses()
-    test_skips_tool_results()
-    test_deduplication()
-    test_ratio_filter()
-    print("\nAll tests passed.")
--- a/scripts/validate_doc_links.py
+++ b/scripts/validate_doc_links.py
@@ -1,132 +0,0 @@
-#!/usr/bin/env python3
-"""
-Doc Link Validator: extract and verify all documentation links.
-Issue: #103 - 4.8: Doc Link Validator
-
-Acceptance:
-  Extracts links from docs | HTTP HEAD check | Reports broken links
-  (Weekly cron/CI integration out of scope for this minimal script)
-"""
-
-import argparse
-import json
-import re
-import sys
-from pathlib import Path
-from typing import List, Tuple, Optional
-from urllib.request import Request, urlopen
-from urllib.error import URLError, HTTPError
-from urllib.parse import urlparse
-
-# Markdown link patterns
-INLINE_LINK_RE = re.compile(r'\[[^\]]*\]\(([^)\s]+)(?:\s+"[^"]*")?\)')
-AUTOLINK_RE = re.compile(r'<([^>]+)>')
-
-
-def extract_links(content: str) -> List[str]:
-    urls = [m.group(1) for m in INLINE_LINK_RE.finditer(content)]
-    urls += [m.group(1) for m in AUTOLINK_RE.finditer(content)]
-    return urls
-
-
-def is_ignorable(url: str, ignore_prefixes: List[str]) -> bool:
-    p = urlparse(url)
-    if p.scheme not in ('http', 'https'):
-        return True
-    host = p.hostname or ''
-    if host in ('localhost', '127.0.0.1', '::1'):
-        return True
-    # Private IPv4 ranges
-    if re.match(r'^(10\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[01])\.)', host):
-        return True
-    for prefix in ignore_prefixes:
-        if url.startswith(prefix):
-            return True
-    return False
-
-
-def check_url(url: str, timeout: float = 8.0) -> Tuple[bool, Optional[int], str]:
-    try:
-        req = Request(url, method='HEAD')
-        req.add_header('User-Agent', 'DocLinkValidator/1.0')
-        try:
-            with urlopen(req, timeout=timeout) as resp:
-                return True, resp.getcode(), "OK"
-        except HTTPError as e:
-            if e.code in (405, 403, 400):
-                req2 = Request(url, method='GET')
-                req2.add_header('User-Agent', 'DocLinkValidator/1.0')
-                req2.add_header('Range', 'bytes=0-1')
-                with urlopen(req2, timeout=timeout) as resp2:
-                    return True, resp2.getcode(), "OK via GET"
-            return False, e.code, e.reason
-    except URLError as e:
-        return False, None, str(e.reason) if hasattr(e, 'reason') else str(e)
-    except Exception as e:
-        return False, None, str(e)
-
-
-def main() -> int:
-    p = argparse.ArgumentParser(description="Validate documentation links")
-    p.add_argument('--root', default='.', help='Repository root')
-    p.add_argument('--fail-on-broken', action='store_true', help='Exit non-zero if broken links found')
-    p.add_argument('--json', action='store_true', help='Emit JSON report')
-    p.add_argument('--ignore', default='', help='Comma-separated URL prefixes to ignore')
-    args = p.parse_args()
-
-    root = Path(args.root).resolve()
-    ignore_prefixes = [x.strip() for x in args.ignore.split(',') if x.strip()]
-
-    md_files = list(root.rglob('*.md'))
-    if not md_files:
-        print("No markdown files found.", file=sys.stderr)
-        return 1
-
-    print(f"Scanning {len(md_files)} markdown files")
-
-    all_links: List[Tuple[Path, str]] = []
-    for md in md_files:
-        content = md.read_text(errors='replace')
-        for m in INLINE_LINK_RE.finditer(content):
-            all_links.append((md, m.group(1)))
-        for m in AUTOLINK_RE.finditer(content):
-            all_links.append((md, m.group(1)))
-
-    print(f"Raw link occurrences: {len(all_links)}")
-
-    # De-duplicate by URL, keep first file context
-    first_file: dict[str, Path] = {}
-    unique_urls: List[str] = []
-    for file, url in all_links:
-        if url not in first_file:
-            first_file[url] = file
-            unique_urls.append(url)
-
-    print(f"Unique URLs to check: {len(unique_urls)}")
-
-    broken: List[dict] = []
-    ok_count = 0
-    for url in unique_urls:
-        if is_ignorable(url, ignore_prefixes):
-            continue
-        ok, code, reason = check_url(url)
-        if ok:
-            ok_count += 1
-        else:
-            broken.append({"url": url, "file": str(first_file[url]), "error": reason})
-
-    print(f"OK: {ok_count}   Broken: {len(broken)}")
-    if broken:
-        print("\nBroken links:")
-        for b in broken:
-            print(f"  [{b['file']}] {b['url']} — {b['error']}")
-
-    if args.json:
-        print(json.dumps({"scanned": len(unique_urls), "ok": ok_count,
-                          "broken": len(broken), "broken_links": broken}, indent=2))
-
-    return 1 if (args.fail_on_broken and broken) else 0
-
-
-if __name__ == '__main__':
-    sys.exit(main())
--- a/scripts/validate_knowledge.py
+++ b/scripts/validate_knowledge.py
@@ -1,38 +0,0 @@
-#!/usr/bin/env python3
-"""Validate knowledge files and index.json against the schema."""
-import json, sys
-from pathlib import Path
-
-VALID_CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}
-REQUIRED = {"id", "fact", "category", "domain", "confidence"}
-
-def validate_fact(fact, src=""):
-    errs = []
-    for f in REQUIRED:
-        if f not in fact: errs.append(f"{src}: missing '{f}'")
-    if "category" in fact and fact["category"] not in VALID_CATEGORIES:
-        errs.append(f"{src}: invalid category '{fact['category']}'")
-    if "confidence" in fact:
-        if not isinstance(fact["confidence"], (int, float)) or not (0 <= fact["confidence"] <= 1):
-            errs.append(f"{src}: confidence must be 0.0-1.0")
-    if "id" in fact:
-        parts = fact["id"].split(":")
-        if len(parts) != 3: errs.append(f"{src}: id must be domain:category:sequence")
-    return errs
-
-def main():
-    idx = Path(__file__).parent.parent / "knowledge" / "index.json"
-    if not idx.exists(): print(f"FAILED: {idx} not found"); sys.exit(1)
-    data = json.loads(idx.read_text())
-    errs = []
-    seen = set()
-    for i, f in enumerate(data.get("facts", [])):
-        errs.extend(validate_fact(f, f"[{i}]"))
-        if "id" in f:
-            if f["id"] in seen: errs.append(f"duplicate id '{f['id']}'")
-            seen.add(f["id"])
-    if errs:
-        print(f"FAILED - {len(errs)} errors:"); [print(f"  x {e}") for e in errs]; sys.exit(1)
-    print(f"PASSED - {len(data.get('facts', []))} facts")
-
-if __name__ == "__main__": main()
--- a/templates/harvest-prompt.md
+++ b/templates/harvest-prompt.md
@@ -2,107 +2,98 @@

 ## System Prompt

-You are a knowledge extraction engine. You read session transcripts and output ONLY structured JSON. You never infer. You never assume. You extract only what the transcript explicitly states.
+You are a knowledge extraction engine. Your task is to analyze a session transcript and extract durable knowledge that will help future sessions be more efficient.

-## Prompt
+## Instructions

+Read the session transcript carefully. Extract ONLY information that is explicitly stated in the transcript. Do NOT infer, assume, or hallucinate information.
+
+### Categories
+
+Extract knowledge into these categories:
+
+1. **fact**: Concrete, verifiable information learned (e.g., "Repository X has 5 files", "API returns JSON with field Y")
+2. **pitfall**: Errors encountered, wrong assumptions, things that wasted time (e.g., "Assumed API token was in env var GITEA_TOKEN, but it's in ~/.config/gitea/token")
+3. **pattern**: Successful sequences of actions (e.g., "To deploy: 1. Run tests 2. Build 3. Push to Gitea 4. Trigger webhook")
+4. **tool-quirk**: Environment-specific behaviors (e.g., "Token paths are different on macOS vs Linux", "URL format requires trailing slash")
+5. **question**: Things identified but not answered (e.g., "Need to determine optimal batch size for harvesting")
+
+### Output Format
+
+Return a JSON object with an array of extracted knowledge items. Each item must have:
+
+```json
+{
+  "fact": "One sentence description of the knowledge",
+  "category": "fact|pitfall|pattern|tool-quirk|question",
+  "repo": "Repository name this applies to, or 'global' if general",
+  "confidence": 0.0-1.0
+}
 ```
-TASK: Extract durable knowledge from this session transcript.

-RULES:
-1. Extract ONLY information explicitly stated in the transcript.
-2. Do NOT infer, assume, or hallucinate.
-3. Every fact must be verifiable by pointing to a specific line in the transcript.
-4. If the session failed or was partial, extract pitfalls and questions — these are the most valuable.
-5. Be specific. "Gitea API is slow" is worthless. "Gitea issues endpoint with state=open returns empty when limit=50 but works with limit=5" is knowledge.
+### Confidence Scoring

-CATEGORIES (assign exactly one per item):
- fact: Concrete, verifiable thing learned (paths, formats, counts, configs)
- pitfall: Error hit, wrong assumption, time wasted, thing that didn't work
- pattern: Successful sequence that should be reused (deploy steps, debug flow)
- tool-quirk: Environment-specific behavior (token paths, URL formats, API gotchas)
- question: Something identified but not answered — the NEXT agent should investigate
+- 0.9-1.0: Explicitly stated with verification (e.g., "Error message shows X")
+- 0.7-0.8: Clearly implied by multiple data points
+- 0.5-0.6: Suggested but not fully verified
+- 0.3-0.4: Inferred from limited data
+- 0.1-0.2: Speculative or uncertain

-CONFIDENCE:
- 0.9: Directly observed with error output or explicit verification
- 0.7: Multiple data points confirm, but not explicitly verified
- 0.5: Suggested by context, not tested
- 0.3: Inferred from limited evidence
+### Constraints

-OUTPUT FORMAT (valid JSON only, no markdown, no explanation):
+1. **No hallucination**: Only extract what's explicitly in the transcript
+2. **Specificity**: Each fact must be specific and actionable
+3. **Relevance**: Only extract knowledge that would help future sessions
+4. **Brevity**: One sentence per fact
+5. **Partial sessions**: Even failed or incomplete sessions may contain valuable pitfalls
+
+### Example Input/Output
+
+**Input Transcript (excerpt):**
+```
+User: Clone the repo at forge.alexanderwhitestone.com/Timmy_Foundation/compounding-intelligence
+Agent: Trying git clone... Error: 401 Unauthorized
+Agent: Checking for token... Found in ~/.config/gitea/token
+Agent: Token is gitea_token format, not OAuth
+Agent: Clone successful with token authentication
+```
+
+**Output:**
+```json
 {
  "knowledge": [
    {
-      "fact": "One specific sentence of knowledge",
-      "category": "fact|pitfall|pattern|tool-quirk|question",
-      "repo": "repo-name or global",
-      "confidence": 0.0-1.0,
-      "evidence": "Brief quote or reference from transcript that supports this"
+      "fact": "Gitea repo at forge.alexanderwhitestone.com requires authentication for cloning",
+      "category": "fact",
+      "repo": "compounding-intelligence",
+      "confidence": 0.9
+    },
+    {
+      "fact": "Gitea authentication token is stored at ~/.config/gitea/token",
+      "category": "tool-quirk",
+      "repo": "global",
+      "confidence": 0.9
+    },
+    {
+      "fact": "Gitea token is in gitea_token format, not OAuth",
+      "category": "tool-quirk",
+      "repo": "global",
+      "confidence": 0.8
+    },
+    {
+      "fact": "Clone fails with 401 when no token is provided",
+      "category": "pitfall",
+      "repo": "compounding-intelligence",
+      "confidence": 0.9
    }
-  ],
-  "meta": {
-    "session_outcome": "success|partial|failed",
-    "tools_used": ["tool1", "tool2"],
-    "repos_touched": ["repo1"],
-    "error_count": 0,
-    "knowledge_count": 0
-  }
+  ]
 }
-
-TRANSCRIPT:
-{{transcript}}
 ```

-## Design Notes
+## Final Notes

-### Why this works with mimo-v2-pro
-
-Mimo needs:
- Explicit format constraints ("valid JSON only, no markdown")
- Clear category definitions with concrete examples
- Hard rules before soft guidance
- The transcript at the END (so it reads all instructions first)
-
-This prompt front-loads all rules, then gives the transcript last. Mimo follows the pattern.
-
-### Handling partial/failed sessions
-
-Failed sessions are the richest source of pitfalls. The prompt explicitly says:
-> "If the session failed or was partial, extract pitfalls and questions — these are the most valuable."
-
-This reframes failure as valuable output, not noise to discard.
-
-### The `evidence` field
-
-Added to the original spec. Every extracted item must cite where in the transcript it came from. This:
- Prevents hallucination (can't cite what isn't there)
- Enables verification (reviewer can check the source)
- Trains confidence calibration (the agent must find evidence, not just claim it)
-
-### Token budget
-
-Target: ~1,000 tokens for the prompt (excluding transcript).
-
-```
-System prompt:        ~50 tokens
-Rules:               ~200 tokens
-Categories:          ~150 tokens
-Confidence:          ~100 tokens
-Output format:       ~200 tokens
-Design notes:        NOT included in prompt (documentation only)
-─────────────────────────────
-Total prompt:        ~700 tokens
-```
-
-Leaves ~300 tokens headroom for variable content (transcript insertion, edge cases).
-
-### What this replaces
-
-The v1 prompt had:
- Verbose prose explanations (waste tokens for mimo)
- No `evidence` field (hallucination risk)
- No `meta` block (no session-level metadata)
- No explicit handling of failed sessions
- Example was too long (~150 tokens of example for a 1k prompt)
-
-This v2 is tighter, more structured, and adds the evidence requirement that prevents the #1 failure mode of extraction prompts: generating plausible-sounding facts that aren't in the transcript.
+- Process the entire transcript, not just the beginning
+- Pay special attention to errors and corrections
+- Note any environment-specific details
+- Track tool-specific behaviors and quirks
+- If the session failed, focus on pitfalls and questions
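
The output format shown in the prompt (a `knowledge` list whose items carry `fact`, `category`, `repo`, and `confidence`) can be checked mechanically before anything is stored. The following is a minimal sketch of such a check, not the repository's harvester code: the function name `validate_knowledge` and the choice to silently drop invalid items are assumptions made for illustration. The category names and the 0.0 to 1.0 confidence range are taken from the prompt text.

```python
import json

# Category names and required keys come from the prompt text;
# everything else here (names, drop-on-invalid policy) is hypothetical.
CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}
REQUIRED_KEYS = ("fact", "category", "repo", "confidence")


def validate_knowledge(raw: str) -> list[dict]:
    """Parse model output and keep only items that match the prompt's schema."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return []  # the model returned something other than JSON
    if not isinstance(payload, dict):
        return []
    items = payload.get("knowledge")
    if not isinstance(items, list):
        return []

    valid = []
    for item in items:
        if not isinstance(item, dict):
            continue
        if any(key not in item for key in REQUIRED_KEYS):
            continue
        if item["category"] not in CATEGORIES:
            continue
        conf = item["confidence"]
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(conf, bool) or not isinstance(conf, (int, float)):
            continue
        if not 0.0 <= conf <= 1.0:
            continue
        if not isinstance(item["fact"], str) or not item["fact"].strip():
            continue
        valid.append(item)
    return valid
```

Dropping invalid items rather than raising keeps one malformed entry from discarding the rest of a session's output. A stricter pipeline might prefer to log or reject the whole response instead.
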
--- a/test_sessions/session_failure.jsonl
+++ b/test_sessions/session_failure.jsonl
@@ -1,18 +0,0 @@
-{"role": "user", "content": "Deploy the application to production"}
-{"role": "assistant", "content": "I'll deploy the application to production.", "tool_calls": [{"function": "terminal", "arguments": {"command": "ssh deploy@prod-server 'cd /app && git pull && docker-compose up -d'"}}]}
-{"role": "tool", "content": "ssh: connect to host prod-server port 22: Connection refused"}
-{"role": "assistant", "content": "Connection refused. Let me check the server status.", "tool_calls": [{"function": "terminal", "arguments": {"command": "ping -c 3 prod-server"}}]}
-{"role": "tool", "content": "PING prod-server (192.168.1.100): 56 data bytes\nRequest timeout for icmp_seq 0\nRequest timeout for icmp_seq 1\nRequest timeout for icmp_seq 2\n--- prod-server ping statistics ---\n3 packets transmitted, 0 packets received, 100.0% packet loss"}
-{"role": "assistant", "content": "The server appears to be down. Let me check if there's an alternative deployment method."}
-{"role": "user", "content": "Try using the VPS instead"}
-{"role": "assistant", "content": "I'll try deploying to the VPS.", "tool_calls": [{"function": "terminal", "arguments": {"command": "ssh deploy@vps-server 'cd /app && git pull && docker-compose up -d'"}}]}
-{"role": "tool", "content": "Permission denied (publickey)."}
-{"role": "assistant", "content": "Permission denied. The SSH key might not be set up correctly. Let me check the SSH configuration.", "tool_calls": [{"function": "terminal", "arguments": {"command": "cat ~/.ssh/config"}}]}
-{"role": "tool", "content": "Host vps-server\n    HostName 167.99.126.228\n    User deploy\n    IdentityFile ~/.ssh/id_rsa_vps"}
-{"role": "assistant", "content": "The SSH config looks correct. Let me check if the key exists.", "tool_calls": [{"function": "terminal", "arguments": {"command": "ls -la ~/.ssh/id_rsa_vps"}}]}
-{"role": "tool", "content": "ls: /Users/user/.ssh/id_rsa_vps: No such file or directory"}
-{"role": "assistant", "content": "The SSH key doesn't exist. This is a pitfall - we need to generate the SSH key first."}
-{"role": "user", "content": "Generate a new SSH key for the VPS"}
-{"role": "assistant", "content": "I'll generate a new SSH key.", "tool_calls": [{"function": "terminal", "arguments": {"command": "ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa_vps -N ''"}}]}
-{"role": "tool", "content": "Generating public/private rsa key pair.\nYour identification has been saved in /Users/user/.ssh/id_rsa_vps\nYour public key has been saved in /Users/user/.ssh/id_rsa_vps.pub\nThe key fingerprint is:\nSHA256:abc123... user@machine"}
-{"role": "assistant", "content": "SSH key generated. Now I need to add it to the VPS."}
--- a/test_sessions/session_partial.jsonl
+++ b/test_sessions/session_partial.jsonl
@@ -1,14 +0,0 @@
-{"role": "user", "content": "Check the Gitea API for open issues"}
-{"role": "assistant", "content": "I'll check the Gitea API for open issues.", "tool_calls": [{"function": "terminal", "arguments": {"command": "curl -H 'Authorization: token $GITEA_TOKEN' https://forge.alexanderwhitestone.com/api/v1/repos/Timmy_Foundation/compounding-intelligence/issues?state=open"}}]}
-{"role": "tool", "content": "curl: (6) Could not resolve host: forge.alexanderwhitestone.com"}
-{"role": "assistant", "content": "DNS resolution failed. Let me check the network connectivity.", "tool_calls": [{"function": "terminal", "arguments": {"command": "nslookup forge.alexanderwhitestone.com"}}]}
-{"role": "tool", "content": "Server:		8.8.8.8\nAddress:	8.8.8.8#53\n\n** server can't find forge.alexanderwhitestone.com: NXDOMAIN"}
-{"role": "assistant", "content": "The domain doesn't resolve. This might be a DNS issue. Let me try using the IP address directly.", "tool_calls": [{"function": "terminal", "arguments": {"command": "curl -H 'Authorization: token $GITEA_TOKEN' https://167.99.126.228/api/v1/repos/Timmy_Foundation/compounding-intelligence/issues?state=open"}}]}
-{"role": "tool", "content": "curl: (60) SSL certificate problem: self signed certificate"}
-{"role": "assistant", "content": "SSL certificate issue. The server is using a self-signed certificate. Let me try with -k flag to skip SSL verification.", "tool_calls": [{"function": "terminal", "arguments": {"command": "curl -k -H 'Authorization: token $GITEA_TOKEN' https://167.99.126.228/api/v1/repos/Timmy_Foundation/compounding-intelligence/issues?state=open"}}]}
-{"role": "tool", "content": "[]"}
-{"role": "assistant", "content": "No open issues found. The API returned an empty array."}
-{"role": "user", "content": "Check if there are any pull requests"}
-{"role": "assistant", "content": "Let me check for pull requests.", "tool_calls": [{"function": "terminal", "arguments": {"command": "curl -k -H 'Authorization: token $GITEA_TOKEN' https://167.99.126.228/api/v1/repos/Timmy_Foundation/compounding-intelligence/pulls?state=open"}}]}
-{"role": "tool", "content": "[{"id": 20, "title": "feat: add harvester.py + session_reader.py — session knowledge extractor (closes #8)", "state": "open", "user": {"login": "Rockachopa"}, "created_at": "2026-04-14T17:28:20Z"}]"}
-{"role": "assistant", "content": "Found 1 open pull request: PR #20 - "feat: add harvester.py + session_reader.py — session knowledge extractor (closes #8)" by Rockachopa."}
--- a/test_sessions/session_patterns.jsonl
+++ b/test_sessions/session_patterns.jsonl
@@ -1,10 +0,0 @@
-{"role": "user", "content": "How do I deploy a new version of the application?"}
-{"role": "assistant", "content": "Here's the deployment process:\n1. Run tests: pytest\n2. Build Docker image: docker build -t app:latest .\n3. Push to registry: docker push registry.example.com/app:latest\n4. Update production: ssh prod 'cd /app && docker-compose pull && docker-compose up -d'\n5. Verify: curl https://app.example.com/health"}
-{"role": "user", "content": "What if tests fail?"}
-{"role": "assistant", "content": "If tests fail:\n1. Check the error logs: pytest -v\n2. Fix the failing tests\n3. Re-run tests: pytest\n4. If still failing, check recent changes: git log --oneline -10\n5. Consider reverting: git revert HEAD"}
-{"role": "user", "content": "What if the Docker build fails?"}
-{"role": "assistant", "content": "If Docker build fails:\n1. Check Dockerfile syntax: docker build --no-cache -t app:latest .\n2. Check for missing dependencies: cat requirements.txt\n3. Check disk space: df -h\n4. Clean up Docker: docker system prune -f\n5. Try building with verbose output: docker build --progress=plain -t app:latest ."}
-{"role": "user", "content": "What if the push to registry fails?"}
-{"role": "assistant", "content": "If push to registry fails:\n1. Check registry credentials: docker login registry.example.com\n2. Check registry space: curl https://registry.example.com/v2/_catalog\n3. Check image size: docker images app:latest\n4. Try tagging with version: docker tag app:latest registry.example.com/app:v1.0.0\n5. Push with version tag: docker push registry.example.com/app:v1.0.0"}
-{"role": "user", "content": "What if the production update fails?"}
-{"role": "assistant", "content": "If production update fails:\n1. Check SSH connection: ssh prod 'echo connected'\n2. Check disk space on prod: ssh prod 'df -h'\n3. Check Docker status: ssh prod 'docker ps -a'\n4. Check logs: ssh prod 'docker-compose logs --tail=100'\n5. Rollback: ssh prod 'cd /app && git checkout HEAD~1 && docker-compose up -d'"}
--- a/test_sessions/session_questions.jsonl
+++ b/test_sessions/session_questions.jsonl
@@ -1,12 +0,0 @@
-{"role": "user", "content": "I need to implement a new feature for the application"}
-{"role": "assistant", "content": "I can help you implement a new feature. What feature would you like to add?"}
-{"role": "user", "content": "I want to add user authentication"}
-{"role": "assistant", "content": "User authentication is a good feature. I have a few questions:\n1. What authentication method do you prefer? (JWT, OAuth, Session-based)\n2. Do you need social login? (Google, GitHub, etc.)\n3. What user data do you need to store?\n4. Do you need role-based access control?\n5. What's your database? (PostgreSQL, MySQL, SQLite)"}
-{"role": "user", "content": "I prefer JWT, no social login, just email and password, no roles, PostgreSQL"}
-{"role": "assistant", "content": "Great choices. Here's the implementation plan:\n1. Create user model with email and password hash\n2. Create JWT token generation and validation\n3. Create login endpoint: POST /api/auth/login\n4. Create register endpoint: POST /api/auth/register\n5. Create middleware for protected routes\n6. Add password hashing with bcrypt\n7. Add rate limiting for login attempts\n8. Add password reset functionality\n9. Add email verification\n10. Test all endpoints"}
-{"role": "user", "content": "What about password reset?"}
-{"role": "assistant", "content": "For password reset:\n1. Create reset token table\n2. Create forgot password endpoint: POST /api/auth/forgot-password\n3. Send reset email with token\n4. Create reset password endpoint: POST /api/auth/reset-password\n5. Validate reset token and expiration\n6. Update password and delete token\n7. Send confirmation email"}
-{"role": "user", "content": "What about email verification?"}
-{"role": "assistant", "content": "For email verification:\n1. Add email_verified field to user model\n2. Create verification token table\n3. Send verification email on registration\n4. Create verify email endpoint: GET /api/auth/verify-email?token=...\n5. Validate token and update email_verified\n6. Add middleware to check email_verified for certain routes\n7. Add resend verification email endpoint"}
-{"role": "user", "content": "What about rate limiting?"}
-{"role": "assistant", "content": "For rate limiting:\n1. Use Redis for rate limit storage\n2. Create rate limit middleware\n3. Set limits: 5 login attempts per minute, 3 password resets per hour\n4. Return 429 Too Many Requests when limit exceeded\n5. Add X-RateLimit headers to responses\n6. Log rate limit violations\n7. Add whitelist for trusted IPs"}
--- a/test_sessions/session_success.jsonl
+++ b/test_sessions/session_success.jsonl
@@ -1,13 +0,0 @@
-{"role": "user", "content": "Clone the repo at forge.alexanderwhitestone.com/Timmy_Foundation/compounding-intelligence"}
-{"role": "assistant", "content": "I'll clone the repository for you.", "tool_calls": [{"function": "terminal", "arguments": {"command": "git clone https://forge.alexanderwhitestone.com/Timmy_Foundation/compounding-intelligence.git"}}]}
-{"role": "tool", "content": "Cloning into 'compounding-intelligence'...\nremote: Enumerating objects: 15, done.\nremote: Counting objects: 100% (15/15), done.\nremote: Compressing objects: 100% (12/12), done.\nremote: Total 15 (delta 2), reused 0 (delta 0), pack-reused 0\nReceiving objects: 100% (15/15), done.\nResolving deltas: 100% (2/2), done."}
-{"role": "assistant", "content": "Repository cloned successfully. Let me check the structure.", "tool_calls": [{"function": "terminal", "arguments": {"command": "ls -la compounding-intelligence/"}}]}
-{"role": "tool", "content": "total 8\ndrwxr-xr-x   5 user  staff  160 Apr 14 17:21 .\ndrwxr-xr-x  12 user  staff  384 Apr 14 17:21 ..\ndrwxr-xr-x  13 user  staff  416 Apr 14 17:21 .git\n-rw-r--r--   1 user  staff  1234 Apr 14 17:21 README.md\ndrwxr-xr-x   3 user  staff   96 Apr 14 17:21 knowledge\n-rw-r--r--   1 user  staff   456 Apr 14 17:21 metrics\ndrwxr-xr-x   4 user  staff  128 Apr 14 17:21 scripts\ndrwxr-xr-x   3 user  staff   96 Apr 14 17:21 templates"}
-{"role": "user", "content": "What files are in the templates directory?"}
-{"role": "assistant", "content": "Let me check the templates directory.", "tool_calls": [{"function": "terminal", "arguments": {"command": "ls -la compounding-intelligence/templates/"}}]}
-{"role": "tool", "content": "total 0\ndrwxr-xr-x  3 user  staff   96 Apr 14 17:21 .\ndrwxr-xr-x  5 user  staff  160 Apr 14 17:21 ..\n-rw-r--r--  1 user  staff    0 Apr 14 17:21 .gitkeep"}
-{"role": "assistant", "content": "The templates directory currently only contains a .gitkeep file. No templates have been added yet."}
-{"role": "user", "content": "Create a harvest-prompt.md template in the templates directory."}
-{"role": "assistant", "content": "I'll create the harvest-prompt.md template for you.", "tool_calls": [{"function": "terminal", "arguments": {"command": "cat > compounding-intelligence/templates/harvest-prompt.md << 'EOF'\n# Knowledge Extraction Prompt\n\nExtract knowledge from session transcripts.\nEOF"}}]}
-{"role": "tool", "content": ""}
-{"role": "assistant", "content": "Template created successfully. The harvest-prompt.md file is now in the templates directory."}
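
Each fixture line above is meant to be one JSON object with a `role`, a `content` string, and optionally a `tool_calls` list whose entries carry `arguments.command`. The sketch below shows one way such lines could be read defensively. It is an illustration only, not the repository's `session_reader.py`: the names `iter_messages` and `tool_commands` are hypothetical, and skipping lines that fail to parse (for example, a `content` value with unescaped inner quotes) is an assumed policy.

```python
import json
from typing import Iterable, Iterator


def iter_messages(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one message dict per well-formed JSONL line, skipping the rest."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed line, e.g. unescaped quotes inside "content"
        if isinstance(message, dict) and "role" in message:
            yield message


def tool_commands(messages: Iterable[dict]) -> list[str]:
    """Collect the command strings from tool_calls, in transcript order."""
    commands = []
    for message in messages:
        for call in message.get("tool_calls") or []:
            if not isinstance(call, dict):
                continue
            arguments = call.get("arguments")
            if isinstance(arguments, dict) and arguments.get("command"):
                commands.append(arguments["command"])
    return commands
```

With a file on disk, `iter_messages(open(path, encoding="utf-8"))` would stream messages one at a time, so a long session never has to be held in memory as raw text.
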
--- a/tests/test_ci_config.py
+++ b/tests/test_ci_config.py
@@ -1,19 +0,0 @@
-from pathlib import Path
-
-
-def test_requirements_makefile_and_workflow_exist() -> None:
-    assert Path("requirements.txt").exists()
-    assert Path("Makefile").exists()
-    assert Path(".gitea/workflows/test.yml").exists()
-
-
-def test_ci_workflow_runs_project_test_command() -> None:
-    workflow = Path(".gitea/workflows/test.yml").read_text(encoding="utf-8")
-    requirements = Path("requirements.txt").read_text(encoding="utf-8")
-    makefile = Path("Makefile").read_text(encoding="utf-8")
-
-    assert "pytest" in requirements
-    assert "test:" in makefile
-    assert "python3 -m pytest tests/test_ci_config.py scripts/test_*.py -v" in makefile
-    assert "pip install -r requirements.txt" in workflow
-    assert "make test" in workflow
--- a/tests/test_dedup.py
+++ b/tests/test_dedup.py
@@ -1,207 +0,0 @@
-"""Tests for knowledge deduplication module (Issue #196)."""
-
-import json
-import sys
-from pathlib import Path
-
-import pytest
-
-sys.path.insert(0, str(Path(__file__).parent.parent / "scripts"))
-
-from dedup import (
-    normalize_text,
-    content_hash,
-    tokenize,
-    token_similarity,
-    quality_score,
-    merge_facts,
-    dedup_facts,
-    generate_test_duplicates,
-)
-
-
-class TestNormalize:
-    def test_lowercases(self):
-        assert normalize_text("Hello World") == "hello world"
-
-    def test_collapses_whitespace(self):
-        assert normalize_text("  hello   world  ") == "hello world"
-
-    def test_strips(self):
-        assert normalize_text("  text  ") == "text"
-
-
-class TestContentHash:
-    def test_deterministic(self):
-        h1 = content_hash("Hello World")
-        h2 = content_hash("hello world")
-        h3 = content_hash("  Hello   World  ")
-        assert h1 == h2 == h3
-
-    def test_different_texts(self):
-        h1 = content_hash("Hello")
-        h2 = content_hash("World")
-        assert h1 != h2
-
-    def test_returns_hex(self):
-        h = content_hash("test")
-        assert len(h) == 64  # SHA256
-        assert all(c in '0123456789abcdef' for c in h)
-
-
-class TestTokenize:
-    def test_extracts_words(self):
-        tokens = tokenize("Hello World Test")
-        assert "hello" in tokens
-        assert "world" in tokens
-        assert "test" in tokens
-
-    def test_skips_short_words(self):
-        tokens = tokenize("a to is the hello")
-        assert "a" not in tokens
-        assert "to" not in tokens
-        assert "hello" in tokens
-
-    def test_returns_set(self):
-        tokens = tokenize("hello hello world")
-        assert isinstance(tokens, set)
-        assert len(tokens) == 2
-
-
-class TestTokenSimilarity:
-    def test_identical(self):
-        assert token_similarity("hello world", "hello world") == 1.0
-
-    def test_no_overlap(self):
-        assert token_similarity("alpha beta", "gamma delta") == 0.0
-
-    def test_partial_overlap(self):
-        sim = token_similarity("hello world test", "hello universe test")
-        assert 0.3 < sim < 0.7
-
-    def test_empty(self):
-        assert token_similarity("", "hello") == 0.0
-        assert token_similarity("hello", "") == 0.0
-
-    def test_symmetric(self):
-        a = "hello world test"
-        b = "hello universe test"
-        assert token_similarity(a, b) == token_similarity(b, a)
-
-
-class TestQualityScore:
-    def test_high_confidence(self):
-        fact = {"confidence": 0.95, "source_count": 5, "tags": ["test"], "related": ["x"]}
-        score = quality_score(fact)
-        assert score > 0.7
-
-    def test_low_confidence(self):
-        fact = {"confidence": 0.3, "source_count": 1}
-        score = quality_score(fact)
-        assert score < 0.5
-
-    def test_defaults(self):
-        score = quality_score({})
-        assert 0 < score < 1
-
-
-class TestMergeFacts:
-    def test_merges_tags(self):
-        keep = {"id": "a", "fact": "test", "tags": ["git"], "confidence": 0.9}
-        drop = {"id": "b", "fact": "test", "tags": ["python"], "confidence": 0.8}
-        merged = merge_facts(keep, drop)
-        assert "git" in merged["tags"]
-        assert "python" in merged["tags"]
-
-    def test_merges_source_count(self):
-        keep = {"id": "a", "fact": "test", "source_count": 3}
-        drop = {"id": "b", "fact": "test", "source_count": 2}
-        merged = merge_facts(keep, drop)
-        assert merged["source_count"] == 5
-
-    def test_keeps_higher_confidence(self):
-        keep = {"id": "a", "fact": "test", "confidence": 0.7}
-        drop = {"id": "b", "fact": "test", "confidence": 0.9}
-        merged = merge_facts(keep, drop)
-        assert merged["confidence"] == 0.9
-
-    def test_tracks_merged_from(self):
-        keep = {"id": "a", "fact": "test"}
-        drop = {"id": "b", "fact": "test"}
-        merged = merge_facts(keep, drop)
-        assert "b" in merged["_merged_from"]
-
-
-class TestDedupFacts:
-    def test_removes_exact_dupes(self):
-        facts = [
-            {"id": "1", "fact": "Always use git rebase"},
-            {"id": "2", "fact": "Always use git rebase"},  # exact dupe
-            {"id": "3", "fact": "Check logs first"},
-        ]
-        deduped, stats = dedup_facts(facts)
-        assert stats["exact_dupes"] == 1
-        assert stats["unique"] == 2
-
-    def test_removes_near_dupes(self):
-        facts = [
-            {"id": "1", "fact": "Always check logs before deploying to production server"},
-            {"id": "2", "fact": "Always check logs before deploying to production environment"},
-            {"id": "3", "fact": "Use docker compose for local development environments"},
-        ]
-        deduped, stats = dedup_facts(facts, near_threshold=0.5)
-        assert stats["near_dupes"] >= 1
-        assert stats["unique"] == 2
-
-    def test_preserves_unique(self):
-        facts = [
-            {"id": "1", "fact": "Use git rebase for clean history"},
-            {"id": "2", "fact": "Docker containers should be stateless"},
-            {"id": "3", "fact": "Always write tests before code"},
-        ]
-        deduped, stats = dedup_facts(facts)
-        assert stats["unique"] == 3
-        assert stats["removed"] == 0
-
-    def test_empty_input(self):
-        deduped, stats = dedup_facts([])
-        assert stats["total"] == 0
-        assert stats["unique"] == 0
-
-    def test_keeps_higher_quality_near_dup(self):
-        facts = [
-            {"id": "1", "fact": "Check logs before deploying to production server", "confidence": 0.5, "source_count": 1},
-            {"id": "2", "fact": "Check logs before deploying to production environment", "confidence": 0.9, "source_count": 5, "tags": ["ops"]},
-        ]
-        deduped, stats = dedup_facts(facts, near_threshold=0.5)
-        assert stats["unique"] == 1
-        # Higher quality fact should be kept
-        assert deduped[0]["confidence"] == 0.9
-
-    def test_dry_run_does_not_modify(self):
-        facts = [
-            {"id": "1", "fact": "Same text"},
-            {"id": "2", "fact": "Same text"},
-        ]
-        deduped, stats = dedup_facts(facts, dry_run=True)
-        assert stats["exact_dupes"] == 1
-        # In dry_run, merge_facts is skipped so facts aren't modified
-        assert len(deduped) == 1
-
-
-class TestGenerateTestDuplicates:
-    def test_generates_correct_count(self):
-        facts = generate_test_duplicates(20)
-        assert len(facts) > 20  # 20 unique + duplicates
-
-    def test_has_exact_dupes(self):
-        facts = generate_test_duplicates(20)
-        hashes = [content_hash(f["fact"]) for f in facts]
-        # Should have some duplicate hashes
-        assert len(hashes) != len(set(hashes))
-
-    def test_dedup_removes_dupes(self):
-        facts = generate_test_duplicates(20)
-        deduped, stats = dedup_facts(facts)
-        assert stats["unique"] <= 20
-        assert stats["removed"] > 0
--- a/tests/test_freshness.py
+++ b/tests/test_freshness.py
@@ -1,227 +0,0 @@
-#!/usr/bin/env python3
-"""Tests for scripts/freshness.py — 8 tests."""
-
-import json
-import os
-import sys
-import tempfile
-
-sys.path.insert(0, os.path.join(os.path.dirname(__file__) or ".", ".."))
-import importlib.util
-spec = importlib.util.spec_from_file_location(
-    "freshness", os.path.join(os.path.dirname(__file__) or ".", "..", "scripts", "freshness.py"))
-mod = importlib.util.module_from_spec(spec)
-spec.loader.exec_module(mod)
-
-compute_file_hash = mod.compute_file_hash
-check_freshness = mod.check_freshness
-load_knowledge_entries = mod.load_knowledge_entries
-
-
-def test_compute_file_hash():
-    """File hash should be computed correctly."""
-    with tempfile.NamedTemporaryFile(mode='w', delete=False) as f:
-        f.write("test content")
-        f.flush()
-        h = compute_file_hash(f.name)
-        assert h is not None
-        assert h.startswith("sha256:")
-        os.unlink(f.name)
-    print("PASS: test_compute_file_hash")
-
-
-def test_compute_file_hash_nonexistent():
-    """Nonexistent file should return None."""
-    h = compute_file_hash("/nonexistent/file.txt")
-    assert h is None
-    print("PASS: test_compute_file_hash_nonexistent")
-
-
-def test_load_knowledge_entries_empty():
-    """Empty knowledge dir should return empty list."""
-    with tempfile.TemporaryDirectory() as tmpdir:
-        entries = load_knowledge_entries(tmpdir)
-        assert entries == []
-    print("PASS: test_load_knowledge_entries_empty")
-
-
-def test_load_knowledge_entries_from_index():
-    """Should load entries from index.json."""
-    with tempfile.TemporaryDirectory() as tmpdir:
-        # Create index.json
-        index_path = os.path.join(tmpdir, "index.json")
-        with open(index_path, "w") as f:
-            json.dump({
-                "facts": [
-                    {
-                        "fact": "Test fact",
-                        "source_file": "test.py",
-                        "source_hash": "sha256:abc123",
-                        "category": "fact",
-                        "confidence": 0.9
-                    }
-                ]
-            }, f)
-        
-        entries = load_knowledge_entries(tmpdir)
-        assert len(entries) == 1
-        assert entries[0]["fact"] == "Test fact"
-        assert entries[0]["source_file"] == "test.py"
-    print("PASS: test_load_knowledge_entries_from_index")
-
-
-def test_load_knowledge_entries_from_yaml():
-    """Should load entries from YAML files."""
-    with tempfile.TemporaryDirectory() as tmpdir:
-        # Create global directory
-        global_dir = os.path.join(tmpdir, "global")
-        os.makedirs(global_dir)
-        
-        # Create YAML file
-        yaml_path = os.path.join(global_dir, "test.yaml")
-        with open(yaml_path, "w") as f:
-            f.write("""
-pitfalls:
-  - description: "Test pitfall"
-    source_file: "test.py"
-    source_hash: "sha256:def456"
-    category: "pitfall"
-    confidence: 0.8
-""")
-        
-        entries = load_knowledge_entries(tmpdir)
-        assert len(entries) == 1
-        assert entries[0]["fact"] == "Test pitfall"
-        assert entries[0]["category"] == "pitfall"
-    print("PASS: test_load_knowledge_entries_from_yaml")
-
-
-def test_check_freshness_no_changes():
-    """With no source file reference, entries should be counted correctly."""
-    with tempfile.TemporaryDirectory() as tmpdir:
-        # Create knowledge dir
-        knowledge_dir = os.path.join(tmpdir, "knowledge")
-        os.makedirs(knowledge_dir)
-        
-        # Create repo dir
-        repo_dir = os.path.join(tmpdir, "repo")
-        os.makedirs(repo_dir)
-        
-        # Create index.json with entry that has no source_file
-        index_path = os.path.join(knowledge_dir, "index.json")
-        with open(index_path, "w") as f:
-            json.dump({
-                "facts": [
-                    {
-                        "fact": "General knowledge",
-                        "category": "fact",
-                        "confidence": 0.9
-                        # No source_file or source_hash
-                    }
-                ]
-            }, f)
-        
-        result = check_freshness(knowledge_dir, repo_dir, days=1)
-        
-        # Entry without source_file should be counted as "fresh" (no_source status)
-        assert result["summary"]["total"] == 1
-        assert result["summary"]["stale"] == 0
-        assert result["summary"]["fresh"] == 1
-        assert result["fresh_entries"][0]["status"] == "no_source"
-    print("PASS: test_check_freshness_no_changes")
-
-
-def test_check_freshness_with_hash_mismatch():
-    """Hash mismatch should mark entry as stale."""
-    with tempfile.TemporaryDirectory() as tmpdir:
-        # Create knowledge dir
-        knowledge_dir = os.path.join(tmpdir, "knowledge")
-        os.makedirs(knowledge_dir)
-        
-        # Create repo dir with a file
-        repo_dir = os.path.join(tmpdir, "repo")
-        os.makedirs(repo_dir)
-        
-        test_file = os.path.join(repo_dir, "test.py")
-        with open(test_file, "w") as f:
-            f.write("print('hello')")
-        
-        # Create index.json with wrong hash
-        index_path = os.path.join(knowledge_dir, "index.json")
-        with open(index_path, "w") as f:
-            json.dump({
-                "facts": [
-                    {
-                        "fact": "Test fact",
-                        "source_file": "test.py",
-                        "source_hash": "sha256:wronghash",
-                        "category": "fact",
-                        "confidence": 0.9
-                    }
-                ]
-            }, f)
-        
-        # Initialize git repo
-        os.system(f"cd {repo_dir} && git init && git add . && git commit -m 'init' 2>/dev/null")
-        
-        result = check_freshness(knowledge_dir, repo_dir, days=1)
-        
-        assert result["summary"]["total"] == 1
-        assert result["summary"]["stale"] == 1
-        assert result["summary"]["fresh"] == 0
-        assert result["stale_entries"][0]["reason"] == "hash_mismatch"
-    print("PASS: test_check_freshness_with_hash_mismatch")
-
-
-def test_check_freshness_missing_source():
-    """Missing source file should mark entry as stale."""
-    with tempfile.TemporaryDirectory() as tmpdir:
-        # Create knowledge dir
-        knowledge_dir = os.path.join(tmpdir, "knowledge")
-        os.makedirs(knowledge_dir)
-        
-        # Create repo dir (without the referenced file)
-        repo_dir = os.path.join(tmpdir, "repo")
-        os.makedirs(repo_dir)
-        
-        # Create index.json referencing nonexistent file
-        index_path = os.path.join(knowledge_dir, "index.json")
-        with open(index_path, "w") as f:
-            json.dump({
-                "facts": [
-                    {
-                        "fact": "Test fact",
-                        "source_file": "nonexistent.py",
-                        "source_hash": "sha256:abc123",
-                        "category": "fact",
-                        "confidence": 0.9
-                    }
-                ]
-            }, f)
-        
-        # Initialize git repo
-        os.system(f"cd {repo_dir} && git init && git add . && git commit -m 'init' 2>/dev/null")
-        
-        result = check_freshness(knowledge_dir, repo_dir, days=1)
-        
-        assert result["summary"]["total"] == 1
-        assert result["summary"]["stale"] == 1
-        assert result["summary"]["fresh"] == 0
-        assert result["stale_entries"][0]["reason"] == "source_missing"
-    print("PASS: test_check_freshness_missing_source")
-
-
-def run_all():
-    test_compute_file_hash()
-    test_compute_file_hash_nonexistent()
-    test_load_knowledge_entries_empty()
-    test_load_knowledge_entries_from_index()
-    test_load_knowledge_entries_from_yaml()
-    test_check_freshness_no_changes()
-    test_check_freshness_with_hash_mismatch()
-    test_check_freshness_missing_source()
-    print("\nAll 8 tests passed!")
-
-
-if __name__ == "__main__":
-    run_all()
--- a/tests/test_knowledge_gap_identifier.py
+++ b/tests/test_knowledge_gap_identifier.py
@@ -1,141 +0,0 @@
-"""Tests for knowledge_gap_identifier module."""
-
-import sys
-import os
-import tempfile
-import shutil
-from pathlib import Path
-
-sys.path.insert(0, os.path.join(os.path.dirname(__file__), '..', 'scripts'))
-
-from knowledge_gap_identifier import KnowledgeGapIdentifier, GapType, GapSeverity
-
-
-def _make_repo(tmpdir, structure):
-    """Create a test repo from a dict of {path: content}."""
-    for rel_path, content in structure.items():
-        p = Path(tmpdir) / rel_path
-        p.parent.mkdir(parents=True, exist_ok=True)
-        p.write_text(content)
-
-
-def test_undocumented_symbol():
-    with tempfile.TemporaryDirectory() as tmpdir:
-        _make_repo(tmpdir, {
-            "src/calculator.py": "def add(a, b):\n    return a + b\n",
-            "README.md": "# Calculator\n",
-        })
-        report = KnowledgeGapIdentifier().analyze(tmpdir)
-        undocumented = [g for g in report.gaps if g.gap_type == GapType.UNDOCUMENTED]
-        assert any(g.name == "add" for g in undocumented), "add should be undocumented"
-
-
-def test_documented_symbol_no_gap():
-    with tempfile.TemporaryDirectory() as tmpdir:
-        _make_repo(tmpdir, {
-            "src/calculator.py": "def add(a, b):\n    return a + b\n",
-            "README.md": "# Calculator\nUse `add()` to add numbers.\n",
-        })
-        report = KnowledgeGapIdentifier().analyze(tmpdir)
-        undocumented = [g for g in report.gaps
-                       if g.gap_type == GapType.UNDOCUMENTED and g.name == "add"]
-        assert len(undocumented) == 0, "add is documented, should not be flagged"
-
-
-def test_untested_module():
-    with tempfile.TemporaryDirectory() as tmpdir:
-        _make_repo(tmpdir, {
-            "src/calculator.py": "def add(a, b):\n    return a + b\n",
-            "src/helper.py": "def format(x):\n    return str(x)\n",
-            "tests/test_calculator.py": "from src.calculator import add\nassert add(1,2) == 3\n",
-        })
-        report = KnowledgeGapIdentifier().analyze(tmpdir)
-        untested = [g for g in report.gaps if g.gap_type == GapType.UNTESTED]
-        assert any("helper" in g.name for g in untested), "helper should be untested"
-
-
-def test_tested_module_no_gap():
-    with tempfile.TemporaryDirectory() as tmpdir:
-        _make_repo(tmpdir, {
-            "src/calculator.py": "def add(a, b):\n    return a + b\n",
-            "tests/test_calculator.py": "def test_add():\n    assert True\n",
-        })
-        report = KnowledgeGapIdentifier().analyze(tmpdir)
-        untested = [g for g in report.gaps
-                    if g.gap_type == GapType.UNTESTED and "calculator" in g.name]
-        assert len(untested) == 0, "calculator has tests, should not be flagged"
-
-
-def test_missing_implementation():
-    with tempfile.TemporaryDirectory() as tmpdir:
-        _make_repo(tmpdir, {
-            "src/app.py": "def run():\n    pass\n",
-            "docs/api.md": "# API\nUse `NonExistentClass` to do things.\n",
-        })
-        report = KnowledgeGapIdentifier().analyze(tmpdir)
-        missing = [g for g in report.gaps if g.gap_type == GapType.MISSING_IMPLEMENTATION]
-        assert any(g.name == "NonExistentClass" for g in missing)
-
-
-def test_private_symbols_skipped():
-    with tempfile.TemporaryDirectory() as tmpdir:
-        _make_repo(tmpdir, {
-            "src/app.py": "def _internal():\n    pass\ndef public():\n    pass\n",
-            "README.md": "# App\n",
-        })
-        report = KnowledgeGapIdentifier().analyze(tmpdir)
-        undocumented_names = [g.name for g in report.gaps if g.gap_type == GapType.UNDOCUMENTED]
-        assert "_internal" not in undocumented_names, "Private symbols should be skipped"
-        assert "public" in undocumented_names
-
-
-def test_empty_repo():
-    with tempfile.TemporaryDirectory() as tmpdir:
-        report = KnowledgeGapIdentifier().analyze(tmpdir)
-        assert len(report.gaps) == 0
-
-
-def test_invalid_path():
-    report = KnowledgeGapIdentifier().analyze("/nonexistent/path/xyz")
-    assert len(report.gaps) == 1
-    assert report.gaps[0].severity == GapSeverity.ERROR
-
-
-def test_report_summary():
-    with tempfile.TemporaryDirectory() as tmpdir:
-        _make_repo(tmpdir, {
-            "src/app.py": "class MyService:\n    def handle(self):\n        pass\n",
-            "README.md": "# App\n",
-        })
-        report = KnowledgeGapIdentifier().analyze(tmpdir)
-        summary = report.summary()
-        assert "UNDOCUMENTED" in summary
-        assert "MyService" in summary
-
-
-def test_report_to_dict():
-    with tempfile.TemporaryDirectory() as tmpdir:
-        _make_repo(tmpdir, {
-            "src/app.py": "def hello():\n    pass\n",
-            "README.md": "# App\n",
-        })
-        report = KnowledgeGapIdentifier().analyze(tmpdir)
-        d = report.to_dict()
-        assert "total_gaps" in d
-        assert "gaps" in d
-        assert isinstance(d["gaps"], list)
-        assert d["total_gaps"] > 0
-
-
-if __name__ == "__main__":
-    test_undocumented_symbol()
-    test_documented_symbol_no_gap()
-    test_untested_module()
-    test_tested_module_no_gap()
-    test_missing_implementation()
-    test_private_symbols_skipped()
-    test_empty_repo()
-    test_invalid_path()
-    test_report_summary()
-    test_report_to_dict()
-    print("All 10 tests passed.")
--- a/tests/test_perf_bottleneck_finder.py
+++ b/tests/test_perf_bottleneck_finder.py
@@ -1,319 +0,0 @@
-#!/usr/bin/env python3
-"""
-Tests for Performance Bottleneck Finder.
-"""
-
-import json
-import os
-import tempfile
-import textwrap
-from pathlib import Path
-
-import pytest
-
-# Add scripts to path
-import sys
-sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "scripts"))
-
-from perf_bottleneck_finder import (
-    Bottleneck,
-    PerfReport,
-    find_slow_tests_by_scan,
-    analyze_build_artifacts,
-    analyze_makefile_targets,
-    find_slow_imports,
-    generate_report,
-    format_markdown,
-    severity_sort_key,
-)
-
-
-class TestBottleneck:
-    """Test Bottleneck dataclass."""
-
-    def test_creation(self):
-        b = Bottleneck(
-            category="test",
-            name="test_foo",
-            duration_s=5.0,
-            severity="warning",
-            recommendation="Mock it"
-        )
-        assert b.category == "test"
-        assert b.name == "test_foo"
-        assert b.duration_s == 5.0
-        assert b.severity == "warning"
-        assert b.recommendation == "Mock it"
-        assert b.file_path is None
-        assert b.line_number is None
-
-    def test_with_location(self):
-        b = Bottleneck(
-            category="test",
-            name="test_bar",
-            duration_s=2.0,
-            severity="info",
-            recommendation="Consider",
-            file_path="tests/test_bar.py",
-            line_number=42
-        )
-        assert b.file_path == "tests/test_bar.py"
-        assert b.line_number == 42
-
-    def test_to_dict(self):
-        b = Bottleneck("test", "x", 1.0, "info", "y")
-        d = b.__dict__
-        assert "category" in d
-        assert "duration_s" in d
-
-
-class TestPerfReport:
-    """Test PerfReport dataclass."""
-
-    def test_creation(self):
-        report = PerfReport(
-            timestamp="2026-01-01T00:00:00Z",
-            repo_path="/tmp/repo"
-        )
-        assert report.timestamp == "2026-01-01T00:00:00Z"
-        assert report.bottlenecks == []
-        assert report.summary == {}
-
-    def test_to_dict(self):
-        report = PerfReport(
-            timestamp="2026-01-01T00:00:00Z",
-            repo_path="/tmp/repo",
-            bottlenecks=[Bottleneck("test", "x", 1.0, "info", "y")]
-        )
-        d = report.to_dict()
-        assert "bottlenecks" in d
-        assert len(d["bottlenecks"]) == 1
-
-
-class TestSeveritySort:
-    """Test severity sorting."""
-
-    def test_critical_first(self):
-        items = [
-            Bottleneck("test", "a", 1.0, "info", ""),
-            Bottleneck("test", "b", 0.5, "critical", ""),
-            Bottleneck("test", "c", 2.0, "warning", ""),
-        ]
-        items.sort(key=severity_sort_key)
-        assert items[0].severity == "critical"
-        assert items[1].severity == "warning"
-        assert items[2].severity == "info"
-
-    def test_duration_within_severity(self):
-        items = [
-            Bottleneck("test", "slow", 10.0, "warning", ""),
-            Bottleneck("test", "fast", 1.0, "warning", ""),
-        ]
-        items.sort(key=severity_sort_key)
-        assert items[0].name == "slow"  # Higher duration first within same severity
-
-
-class TestSlowTestScan:
-    """Test slow test pattern scanning."""
-
-    def test_finds_sleep(self, tmp_path):
-        test_file = tmp_path / "test_sleepy.py"
-        test_file.write_text(textwrap.dedent('''
-            import time
-
-            def test_slow():
-                time.sleep(5)
-                assert True
-        '''))
-
-        bottlenecks = find_slow_tests_by_scan(str(tmp_path))
-        assert len(bottlenecks) >= 1
-        assert any("sleep" in b.recommendation.lower() for b in bottlenecks)
-
-    def test_finds_http_calls(self, tmp_path):
-        test_file = tmp_path / "test_http.py"
-        test_file.write_text(textwrap.dedent('''
-            import requests
-
-            def test_api():
-                resp = requests.get("https://example.com")
-                assert resp.status_code == 200
-        '''))
-
-        bottlenecks = find_slow_tests_by_scan(str(tmp_path))
-        assert len(bottlenecks) >= 1
-        assert any("HTTP" in b.recommendation or "mock" in b.recommendation.lower() for b in bottlenecks)
-
-    def test_skips_non_test_files(self, tmp_path):
-        src_file = tmp_path / "main.py"
-        src_file.write_text("import time\ntime.sleep(10)\n")
-
-        bottlenecks = find_slow_tests_by_scan(str(tmp_path))
-        assert len(bottlenecks) == 0
-
-    def test_handles_missing_dir(self):
-        bottlenecks = find_slow_tests_by_scan("/nonexistent/path")
-        assert bottlenecks == []
-
-    def test_file_path_populated(self, tmp_path):
-        test_file = tmp_path / "test_example.py"
-        test_file.write_text("import time\n\ndef test_it():\n    time.sleep(2)\n")
-
-        bottlenecks = find_slow_tests_by_scan(str(tmp_path))
-        assert len(bottlenecks) >= 1
-        assert bottlenecks[0].file_path is not None
-        assert bottlenecks[0].line_number is not None
-
-
-class TestBuildArtifacts:
-    """Test build artifact analysis."""
-
-    def test_finds_large_node_modules(self, tmp_path):
-        nm = tmp_path / "node_modules"
-        nm.mkdir()
-        # Create a file > 10MB
-        big_file = nm / "big.txt"
-        big_file.write_bytes(b"x" * (11 * 1024 * 1024))
-
-        bottlenecks = analyze_build_artifacts(str(tmp_path))
-        assert len(bottlenecks) >= 1
-        assert any("node_modules" in b.name for b in bottlenecks)
-
-    def test_ignores_small_dirs(self, tmp_path):
-        nm = tmp_path / "node_modules"
-        nm.mkdir()
-        small_file = nm / "small.txt"
-        small_file.write_bytes(b"x" * 100)
-
-        bottlenecks = analyze_build_artifacts(str(tmp_path))
-        assert not any("node_modules" in b.name for b in bottlenecks)
-
-    def test_finds_pycache(self, tmp_path):
-        cache = tmp_path / "__pycache__"
-        cache.mkdir()
-        big_file = cache / "big.pyc"
-        big_file.write_bytes(b"x" * (11 * 1024 * 1024))
-
-        bottlenecks = analyze_build_artifacts(str(tmp_path))
-        assert any("__pycache__" in b.name for b in bottlenecks)
-
-
-class TestMakefileAnalysis:
-    """Test Makefile analysis."""
-
-    def test_finds_pip_install(self, tmp_path):
-        makefile = tmp_path / "Makefile"
-        makefile.write_text(textwrap.dedent('''
-            install:
-                pip install -r requirements.txt
-
-            test:
-                pytest
-        '''))
-
-        bottlenecks = analyze_makefile_targets(str(tmp_path))
-        assert len(bottlenecks) >= 1
-
-    def test_no_makefile(self, tmp_path):
-        bottlenecks = analyze_makefile_targets(str(tmp_path))
-        assert bottlenecks == []
-
-
-class TestImportAnalysis:
-    """Test heavy import detection."""
-
-    def test_finds_pandas(self, tmp_path):
-        src = tmp_path / "analysis.py"
-        src.write_text("import pandas as pd\n")
-
-        bottlenecks = find_slow_imports(str(tmp_path))
-        assert len(bottlenecks) >= 1
-        assert any("pandas" in b.name for b in bottlenecks)
-
-    def test_finds_torch(self, tmp_path):
-        src = tmp_path / "model.py"
-        src.write_text("import torch\n")
-
-        bottlenecks = find_slow_imports(str(tmp_path))
-        assert any("torch" in b.name for b in bottlenecks)
-
-    def test_skips_light_imports(self, tmp_path):
-        src = tmp_path / "utils.py"
-        src.write_text("import json\nimport os\nimport sys\n")
-
-        bottlenecks = find_slow_imports(str(tmp_path))
-        assert len(bottlenecks) == 0
-
-
-class TestGenerateReport:
-    """Test full report generation."""
-
-    def test_empty_repo(self, tmp_path):
-        report = generate_report(str(tmp_path))
-        assert report.summary["total_bottlenecks"] >= 0
-        assert "critical" in report.summary
-        assert "warning" in report.summary
-
-    def test_with_findings(self, tmp_path):
-        # Create a test file with issues
-        test_file = tmp_path / "test_slow.py"
-        test_file.write_text(textwrap.dedent('''
-            import time
-            import requests
-
-            def test_sleepy():
-                time.sleep(3)
-
-            def test_http():
-                requests.get("https://example.com")
-        '''))
-
-        report = generate_report(str(tmp_path))
-        assert report.summary["total_bottlenecks"] >= 2
-        assert len(report.bottlenecks) > 0
-
-    def test_summary_categories(self, tmp_path):
-        report = generate_report(str(tmp_path))
-        assert "by_category" in report.summary
-
-
-class TestMarkdownReport:
-    """Test markdown output."""
-
-    def test_format(self):
-        report = PerfReport(
-            timestamp="2026-01-01T00:00:00Z",
-            repo_path="/tmp/repo",
-            bottlenecks=[
-                Bottleneck("test", "slow_test", 5.0, "critical", "Fix it")
-            ],
-            summary={
-                "total_bottlenecks": 1,
-                "critical": 1,
-                "warning": 0,
-                "info": 0,
-                "estimated_total_slowdown_s": 5.0,
-                "by_category": {"test": 1},
-            }
-        )
-        md = format_markdown(report)
-        assert "# Performance Bottleneck Report" in md
-        assert "slow_test" in md
-        assert "🔴" in md
-        assert "Fix it" in md
-
-    def test_empty_report(self):
-        report = PerfReport(
-            timestamp="2026-01-01T00:00:00Z",
-            repo_path="/tmp/repo",
-            summary={
-                "total_bottlenecks": 0,
-                "critical": 0,
-                "warning": 0,
-                "info": 0,
-                "estimated_total_slowdown_s": 0,
-                "by_category": {},
-            }
-        )
-        md = format_markdown(report)
-        assert "Total bottlenecks:** 0" in md
--- a/tests/test_quality_gate.py
+++ b/tests/test_quality_gate.py
@@ -1,108 +0,0 @@
-"""
-Tests for quality_gate.py — Knowledge entry quality scoring.
-"""
-
-import unittest
-from datetime import datetime, timezone, timedelta
-
-import sys
-from pathlib import Path
-sys.path.insert(0, str(Path(__file__).parent.parent))
-
-from quality_gate import (
-    score_specificity,
-    score_actionability,
-    score_freshness,
-    score_source_quality,
-    score_entry,
-    filter_entries,
-)
-
-
-class TestScoreSpecificity(unittest.TestCase):
-    def test_specific_content_scores_high(self):
-        content = "Run `python3 deploy.py --env prod` on 2026-04-15. Example: step 1 configure nginx."
-        score = score_specificity(content)
-        self.assertGreater(score, 0.6)
-
-    def test_vague_content_scores_low(self):
-        content = "It generally depends. Various factors might affect this. Basically, it varies."
-        score = score_specificity(content)
-        self.assertLess(score, 0.5)
-
-    def test_empty_scores_baseline(self):
-        score = score_specificity("")
-        self.assertAlmostEqual(score, 0.5, delta=0.1)
-
-
-class TestScoreActionability(unittest.TestCase):
-    def test_actionable_content_scores_high(self):
-        content = "1. Run `pip install -r requirements.txt`\n2. Execute `python3 train.py`\n3. Verify with `pytest`"
-        score = score_actionability(content)
-        self.assertGreater(score, 0.6)
-
-    def test_abstract_content_scores_low(self):
-        content = "The concept of intelligence is fascinating and multifaceted."
-        score = score_actionability(content)
-        self.assertLess(score, 0.5)
-
-
-class TestScoreFreshness(unittest.TestCase):
-    def test_recent_timestamp_scores_high(self):
-        recent = datetime.now(timezone.utc).isoformat()
-        score = score_freshness(recent)
-        self.assertGreater(score, 0.9)
-
-    def test_old_timestamp_scores_low(self):
-        old = (datetime.now(timezone.utc) - timedelta(days=365)).isoformat()
-        score = score_freshness(old)
-        self.assertLess(score, 0.2)
-
-    def test_none_returns_baseline(self):
-        score = score_freshness(None)
-        self.assertEqual(score, 0.5)
-
-
-class TestScoreSourceQuality(unittest.TestCase):
-    def test_claude_scores_high(self):
-        self.assertGreater(score_source_quality("claude-sonnet"), 0.85)
-
-    def test_ollama_scores_lower(self):
-        self.assertLess(score_source_quality("ollama"), 0.7)
-
-    def test_unknown_returns_default(self):
-        self.assertEqual(score_source_quality("unknown"), 0.5)
-
-
-class TestScoreEntry(unittest.TestCase):
-    def test_good_entry_scores_high(self):
-        entry = {
-            "content": "To deploy: run `kubectl apply -f deployment.yaml`. Verify with `kubectl get pods`.",
-            "model": "claude-sonnet",
-            "timestamp": datetime.now(timezone.utc).isoformat(),
-        }
-        score = score_entry(entry)
-        self.assertGreater(score, 0.6)
-
-    def test_poor_entry_scores_low(self):
-        entry = {
-            "content": "It depends. Various things might happen.",
-            "model": "unknown",
-        }
-        score = score_entry(entry)
-        self.assertLess(score, 0.5)
-
-
-class TestFilterEntries(unittest.TestCase):
-    def test_filters_low_quality(self):
-        entries = [
-            {"content": "Run `deploy.py` to fix the issue.", "model": "claude"},
-            {"content": "It might work sometimes.", "model": "unknown"},
-            {"content": "Configure nginx: step 1 edit nginx.conf", "model": "gpt-4"},
-        ]
-        filtered = filter_entries(entries, threshold=0.5)
-        self.assertGreaterEqual(len(filtered), 2)
-
-
-if __name__ == "__main__":
-    unittest.main()
Author	SHA1	Message	Date
Alexander Whitestone	9a2135b1df	feat: add harvester.py — session knowledge extractor (#8)	2026-04-14 17:27:20 +00:00
Alexander Whitestone	b2a9bca162	feat: add session_reader.py for JSONL parsing (#8)	2026-04-14 17:26:21 +00:00