feat: measurer.py — compounding intelligence metrics engine

Implements issue #14: 7 metrics that prove knowledge compounding.

Metrics:
- Knowledge velocity: new facts/day (from index.json)
- Knowledge coverage: % domains with 10+ facts (from YAML files)
- Hit rate: % sessions referencing bootstrap knowledge
- Error recurrence: same errors across sessions (should decrease)
- Task completion: % sessions with successful end_reason
- First-try success: actions without backtracking (tool/msg ratio)
- Knowledge age: staleness of facts (freshness score)

Data sources:
- knowledge/index.json + YAML files for fact metrics
- ~/.hermes/state.db sessions + messages tables

Features:
- JSON and markdown output formats
- --since, --repo, --format flags
- 7-day trend tracking via snapshot persistence
- Runs in 33ms on 11.9K sessions / 192K messages
- Dashboard auto-generation with --save-snapshot

Closes #14
2026-04-14 14:16:31 -04:00
9 changed files with 1135 additions and 0 deletions
--- a/knowledge/SCHEMA.md
+++ b/knowledge/SCHEMA.md
@@ -0,0 +1,164 @@
 # Knowledge File Format Specification
 **Version:** 1
 **Issue:** #10
 **Status:** Draft
 ---
 ## Overview
 The knowledge system has two layers:
 1. **index.json** — Machine-readable fact index. Fast lookups by ID, category, repo, tags.
 2. **Knowledge files** (YAML) — Human-readable, editable facts organized by domain.
 The harvester writes to both. The bootstrapper reads from index.json. Humans edit the YAML files directly.
 ---
 ## index.json Schema
 ```json
 {
  "version": 1,
  "last_updated": "ISO-8601 timestamp",
  "total_facts": 0,
  "facts": []
 }
 ```
 ### Fact Object
 | Field | Type | Required | Description |
 |-------|------|----------|-------------|
 | `id` | string | yes | Unique identifier: `{domain}:{category}:{sequence}` |
 | `fact` | string | yes | One-sentence description of the knowledge |
 | `category` | enum | yes | One of: `fact`, `pitfall`, `pattern`, `tool-quirk`, `question` |
 | `domain` | string | yes | Where this applies: repo name, `global`, or agent name |
 | `confidence` | float | yes | 0.0-1.0. How certain is this knowledge? |
 | `tags` | string[] | no | Searchable labels: `["git", "auth", "gitea"]` |
 | `source_count` | int | no | How many sessions confirmed this fact |
 | `first_seen` | date | no | ISO-8601 date first extracted |
 | `last_confirmed` | date | no | ISO-8601 date last seen in a session |
 | `expires` | date | no | Optional. After this date, fact is stale |
 | `related` | string[] | no | IDs of related facts |
 ### ID Format
 ```
 {domain}:{category}:{sequence}
 ```
 - `domain` — repo name, `global`, or agent type
 - `category` — one of the 5 categories
 - `sequence` — zero-padded 3-digit number: `001`, `002`, ...
 Examples:
 - `the-nexus:pitfall:001`
 - `global:tool-quirk:012`
 - `hermes-agent:pattern:003`
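
A minimal sketch of how a new ID could be allocated in this format. `next_fact_id` is a hypothetical helper, not part of the harvester, and it assumes sequences are handed out as "highest existing + 1" per domain and category, which the spec does not state.

```python
import re

# Three colon-separated parts; the sequence is exactly three digits per the spec.
ID_RE = re.compile(r"^(?P<domain>[^:]+):(?P<category>[^:]+):(?P<seq>\d{3})$")


def next_fact_id(existing_ids, domain, category):
    """Return the next free ID for (domain, category), zero-padded to 3 digits.

    Assumption: the next sequence is max(existing) + 1, starting at 001.
    """
    highest = 0
    for fact_id in existing_ids:
        m = ID_RE.match(fact_id)
        if m and m["domain"] == domain and m["category"] == category:
            highest = max(highest, int(m["seq"]))
    return f"{domain}:{category}:{highest + 1:03d}"


ids = ["the-nexus:pitfall:001", "global:tool-quirk:012", "hermes-agent:pattern:003"]
print(next_fact_id(ids, "global", "tool-quirk"))  # global:tool-quirk:013
print(next_fact_id(ids, "the-nexus", "fact"))     # the-nexus:fact:001
```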
 ### Categories
 | Category | Definition | Example |
 |----------|------------|---------|
 | `fact` | Concrete, verifiable information | "Gitea API requires token auth at /api/v1" |
 | `pitfall` | Errors, wrong assumptions, time-wasters | "Assumed env var GITEA_TOKEN; actual path is ~/.config/gitea/token" |
 | `pattern` | Successful sequences of actions | "To deploy: test -> build -> push -> webhook" |
 | `tool-quirk` | Environment-specific behaviors | "URL format requires trailing slash on macOS" |
 | `question` | Identified but unanswered | "Need optimal batch size for harvesting" |
 ### Confidence Scoring
 | Range | Meaning |
 |-------|---------|
 | 0.9-1.0 | Explicitly stated and verified |
 | 0.7-0.8 | Clearly implied by multiple data points |
 | 0.5-0.6 | Suggested but not fully verified |
 | 0.3-0.4 | Inferred from limited data |
 | 0.1-0.2 | Speculative or uncertain |
 ---
 ## Knowledge Files (YAML)
 Human-readable files stored in `knowledge/` subdirectories.
 ### Directory Structure
 ```
 knowledge/
 ├── index.json                  # Machine-readable fact index
 ├── SCHEMA.md                   # This file
 ├── global/                     # Cross-repo knowledge
 │   ├── pitfalls.yaml           # Pitfalls that span multiple repos
 │   ├── patterns.yaml           # Proven workflows
 │   └── tool-quirks.yaml        # Environment behaviors
 ├── repos/                      # Per-repo knowledge
 │   ├── the-nexus.yaml
 │   ├── hermes-agent.yaml
 │   └── ...
 └── agents/                     # Agent-type knowledge
    ├── mimo-sprint.yaml
    └── ...
 ```
 ### YAML File Format
 ```yaml
 ---
 domain: global                    # or repo name or agent name
 category: tool-quirk              # fact, pitfall, pattern, tool-quirk, question
 version: 1
 last_updated: "2026-04-13"
 ---
 # Tool Quirks (Global)
 Cross-environment behaviors that bite you if you don't know them.
 ## Authentication
 - id: global:tool-quirk:001
  fact: "Gitea token stored at ~/.config/gitea/token, not env var"
  confidence: 0.95
  tags: [git, auth, gitea]
  source_count: 23
  first_seen: "2026-03-27"
  last_confirmed: "2026-04-13"
  related: [global:pitfall:003]
 ```
 ### Rules
 1. **One file per domain.** Each repo or agent gets a single file (`repos/the-nexus.yaml` holds all the-nexus facts); `global/` is split further into one file per category.
 2. **Markdown sections for humans.** YAML items live under markdown headers for Gitea UI readability.
 3. **ID is the link.** The `id` field connects YAML facts to index.json entries.
 4. **Harvester writes, humans edit.** Harvester appends. Humans correct confidence, add tags, mark expired.
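
Rule 3 can be checked mechanically. The sketch below lists the `id` values found in a knowledge file and reports any that have no entry in index.json. It uses a line-based regex rather than a YAML parser, since the files interleave markdown prose with YAML items. `ids_missing_from_index` is a hypothetical name and the sample data is illustrative only.

```python
import re

# Matches list items of the form "- id: <value>" at the start of a line.
ID_LINE = re.compile(r"^[ \t]*-[ \t]*id:[ \t]*(\S+)", re.MULTILINE)


def ids_missing_from_index(yaml_text, index):
    """Return YAML fact IDs that have no entry in the index (file order preserved)."""
    indexed = {fact["id"] for fact in index.get("facts", [])}
    return [fid for fid in ID_LINE.findall(yaml_text) if fid not in indexed]


sample_yaml = """\
## Authentication
- id: global:tool-quirk:001
  fact: "Gitea token stored at ~/.config/gitea/token, not env var"
- id: global:tool-quirk:002
  fact: "Gitea API uses 'Authorization: token TOKEN' header format, not Bearer"
"""
sample_index = {"version": 1, "facts": [{"id": "global:tool-quirk:001"}]}

print(ids_missing_from_index(sample_yaml, sample_index))  # ['global:tool-quirk:002']
```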
 ---
 ## Sync Rules
 1. **Harvester -> YAML:** Appends new facts to the appropriate YAML file.
 2. **Harvester -> index.json:** Adds/updates fact entries.
 3. **Human edits YAML:** Changes propagate to index.json on next harvester run.
 4. **Confidence decay:** Facts not confirmed in 30+ sessions get `confidence *= 0.9`.
 5. **Expiration:** Facts whose `expires` date is earlier than the current date are marked `stale`.
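
Rules 4 and 5 could be applied per fact roughly as follows. The spec does not say how "sessions since last confirmation" is counted or how a fact is marked stale, so `sessions_since_confirmed` and the `stale` key are assumptions, as is applying the decay once per call. The example fact is made up.

```python
from datetime import date

DECAY_AFTER_SESSIONS = 30  # threshold from sync rule 4
DECAY_FACTOR = 0.9         # multiplier from sync rule 4


def apply_sync_rules(fact, sessions_since_confirmed, today):
    """Return a copy of `fact` with confidence decay and expiration applied."""
    updated = dict(fact)
    if sessions_since_confirmed >= DECAY_AFTER_SESSIONS:
        updated["confidence"] = round(updated["confidence"] * DECAY_FACTOR, 3)
    expires = updated.get("expires")
    # Assumption: staleness is stored as a boolean `stale` key; the spec only
    # says the fact is "marked stale".
    updated["stale"] = bool(expires) and date.fromisoformat(expires) < today
    return updated


# Hypothetical fact, not taken from the knowledge store.
example = {"id": "example:pitfall:001", "confidence": 0.85, "expires": "2026-04-01"}
result = apply_sync_rules(example, sessions_since_confirmed=42, today=date(2026, 4, 14))
print(result["confidence"], result["stale"])
```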
 ---
 ## Validation
 Facts must pass these checks:
 1. `id` matches format `{domain}:{category}:{sequence}`
 2. `category` is one of the 5 allowed values
 3. `confidence` is between 0.0 and 1.0
 4. `fact` is non-empty string, max 280 characters
 5. `domain` is non-empty string
 6. `tags` are lowercase alphanumeric + hyphens
 7. No duplicate IDs in index.json
 Validation script: `scripts/validate_knowledge.py`
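
The checks above translate fairly directly into code. This is an illustrative sketch, not the contents of `scripts/validate_knowledge.py` (which is not shown here). `validate_fact` and `validate_index` are hypothetical names, and the three-digit sequence check follows the ID Format section.

```python
import re

CATEGORIES = ("fact", "pitfall", "pattern", "tool-quirk", "question")
TAG_RE = re.compile(r"^[a-z0-9-]+$")


def validate_fact(fact):
    """Return a list of problems for one fact object (empty list means valid)."""
    problems = []
    fact_id = fact.get("id")
    parts = fact_id.split(":") if isinstance(fact_id, str) else []
    if len(parts) != 3 or not all(parts) or not re.fullmatch(r"\d{3}", parts[2]):
        problems.append("id does not match {domain}:{category}:{sequence}")
    if fact.get("category") not in CATEGORIES:
        problems.append("category is not one of the 5 allowed values")
    conf = fact.get("confidence")
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        problems.append("confidence must be between 0.0 and 1.0")
    text = fact.get("fact")
    if not isinstance(text, str) or not text.strip() or len(text) > 280:
        problems.append("fact must be a non-empty string of at most 280 characters")
    domain = fact.get("domain")
    if not isinstance(domain, str) or not domain:
        problems.append("domain must be a non-empty string")
    for tag in fact.get("tags", []):
        if not isinstance(tag, str) or not TAG_RE.match(tag):
            problems.append(f"tag {tag!r} must be lowercase alphanumeric plus hyphens")
    return problems


def validate_index(facts):
    """Validate every fact and flag duplicate IDs (check 7)."""
    problems, seen = [], set()
    for fact in facts:
        fid = fact.get("id")
        problems += [f"{fid}: {p}" for p in validate_fact(fact)]
        if fid in seen:
            problems.append(f"{fid}: duplicate id")
        seen.add(fid)
    return problems


good = {
    "id": "global:tool-quirk:001",
    "fact": "Gitea token stored at ~/.config/gitea/token, not env var",
    "category": "tool-quirk",
    "domain": "global",
    "confidence": 0.95,
    "tags": ["git", "auth", "gitea"],
}
print(validate_fact(good))  # []
```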
--- a/knowledge/global/pitfalls.yaml
+++ b/knowledge/global/pitfalls.yaml
@@ -0,0 +1,80 @@
 ---
 domain: global
 category: pitfall
 version: 1
 last_updated: "2026-04-13"
 ---
 # Pitfalls (Global)
 Cross-repo traps that waste time across the fleet.
 ## Git & Forge
 - id: global:pitfall:001
  fact: "Branch protection requires 1 approval on main - API merges fail with 405 without it"
  confidence: 0.95
  tags: [git, merge, branch-protection, gitea]
  source_count: 12
  first_seen: "2026-04-05"
  last_confirmed: "2026-04-13"
  related: [the-nexus:pitfall:001]
 - id: global:pitfall:002
  fact: "Never use --no-verify on git commits - it bypasses all hooks including safety checks"
  confidence: 0.95
  tags: [git, hooks, safety]
  source_count: 5
  first_seen: "2026-03-28"
  last_confirmed: "2026-04-13"
 - id: global:pitfall:003
  fact: "Gitea PR creation workaround needed on the-nexus - direct API call fails, use alternative endpoint"
  confidence: 0.9
  tags: [gitea, pr, api, workaround]
  source_count: 4
  first_seen: "2026-04-06"
  last_confirmed: "2026-04-12"
 ## Agent Operations
 - id: global:pitfall:004
  fact: "Anthropic is BANNED from fallback chain - if fallback triggers to Anthropic, something is wrong"
  confidence: 0.95
  tags: [provider, anthropic, fallback]
  source_count: 7
  first_seen: "2026-03-30"
  last_confirmed: "2026-04-13"
 - id: global:pitfall:005
  fact: "Telegram tokens expired - don't assume Telegram notifications work without checking"
  confidence: 0.85
  tags: [telegram, notifications, token]
  source_count: 3
  first_seen: "2026-04-02"
 - id: global:pitfall:006
  fact: "Multiple gateways = 'cannot schedule futures' error - only one gateway process should run"
  confidence: 0.9
  tags: [gateway, cron, process]
  source_count: 4
  first_seen: "2026-04-04"
  last_confirmed: "2026-04-11"
 ## Testing
 - id: global:pitfall:007
  fact: "pytest root collection picks up operational *_test.py scripts - restrict to tests/ directory"
  confidence: 0.9
  tags: [pytest, test, collection]
  source_count: 3
  first_seen: "2026-04-07"
  last_confirmed: "2026-04-13"
 - id: global:pitfall:008
  fact: "TDD: test 1 before building 55 - verify the cycle works before scaling"
  confidence: 0.95
  tags: [tdd, testing, methodology]
  source_count: 8
  first_seen: "2026-03-25"
  last_confirmed: "2026-04-13"
--- a/knowledge/global/tool-quirks.yaml
+++ b/knowledge/global/tool-quirks.yaml
@@ -0,0 +1,73 @@
 ---
 domain: global
 category: tool-quirk
 version: 1
 last_updated: "2026-04-13"
 ---
 # Tool Quirks (Global)
 Cross-environment behaviors that bite you if you don't know them.
 ## Authentication
 - id: global:tool-quirk:001
  fact: "Gitea token stored at ~/.config/gitea/token, not env var GITEA_TOKEN"
  confidence: 0.95
  tags: [git, auth, gitea, token]
  source_count: 23
  first_seen: "2026-03-27"
  last_confirmed: "2026-04-13"
  related: [global:pitfall:001]
 - id: global:tool-quirk:002
  fact: "Gitea API uses 'Authorization: token TOKEN' header format, not Bearer"
  confidence: 0.9
  tags: [git, api, gitea]
  source_count: 8
  first_seen: "2026-03-28"
  last_confirmed: "2026-04-12"
 - id: global:tool-quirk:003
  fact: "Gitea Issues API type=issues param does NOT filter PRs - use truthiness check on pull_request field"
  confidence: 0.95
  tags: [gitea, api, issues, pr]
  source_count: 6
  first_seen: "2026-04-01"
  last_confirmed: "2026-04-13"
 ## Paths & Environment
 - id: global:tool-quirk:004
  fact: "~/.hermes is the default hermes home - check get_hermes_home() not the path literal"
  confidence: 0.9
  tags: [paths, hermes, env]
  source_count: 10
  first_seen: "2026-03-30"
  last_confirmed: "2026-04-13"
  related: [hermes-agent:pitfall:005]
 - id: global:tool-quirk:005
  fact: "Ansible vault-encrypted vars in YAML require vault_inline_vars plugin - standard ansible-vault fails"
  confidence: 0.85
  tags: [ansible, vault, config]
  source_count: 3
  first_seen: "2026-04-02"
 ## Model & Inference
 - id: global:tool-quirk:006
  fact: "mimo-v2-pro via Nous Research is the default model - don't assume Anthropic is available"
  confidence: 0.95
  tags: [model, provider, nous, default]
  source_count: 15
  first_seen: "2026-03-25"
  last_confirmed: "2026-04-13"
 - id: global:tool-quirk:007
  fact: "Kill + restart with 'hermes chat' preserves old model state - NEVER use --resume"
  confidence: 0.95
  tags: [hermes, model, restart, session]
  source_count: 8
  first_seen: "2026-03-29"
  last_confirmed: "2026-04-12"
--- a/knowledge/repos/hermes-agent.yaml
+++ b/knowledge/repos/hermes-agent.yaml
@@ -0,0 +1,82 @@
 ---
 domain: hermes-agent
 category: pitfall
 version: 1
 last_updated: "2026-04-13"
 ---
 # Pitfalls (hermes-agent)
 Things that go wrong in this repo if you don't know the traps.
 ## Cron & Deployment
 - id: hermes-agent:pitfall:001
  fact: "deploy-crons.py leaves jobs in mixed model format - some have provider/model, some just model"
  confidence: 0.95
  tags: [cron, deploy, model, config]
  source_count: 5
  first_seen: "2026-04-08"
  last_confirmed: "2026-04-13"
  related: [hermes-agent:pitfall:002, hermes-agent:pitfall:003]
 - id: hermes-agent:pitfall:002
  fact: "deploy-crons.py --deploy doesn't set legacy skill field from skills list, breaking older jobs"
  confidence: 0.9
  tags: [cron, deploy, skills]
  source_count: 3
  first_seen: "2026-04-09"
  last_confirmed: "2026-04-13"
  related: [hermes-agent:pitfall:001]
 - id: hermes-agent:pitfall:003
  fact: "Cron jobs with blank fallback_model fields trigger spurious gateway warnings"
  confidence: 0.9
  tags: [cron, model, fallback]
  source_count: 4
  first_seen: "2026-04-07"
  last_confirmed: "2026-04-12"
  related: [hermes-agent:pitfall:001]
 - id: hermes-agent:pitfall:004
  fact: "model-watchdog.py checks first provider line, not model.provider - causes false drift alarms"
  confidence: 0.9
  tags: [watchdog, model, config]
  source_count: 3
  first_seen: "2026-04-08"
  last_confirmed: "2026-04-13"
 ## Path & Environment
 - id: hermes-agent:pitfall:005
  fact: "10+ files read HERMES_HOME directly instead of get_hermes_home() - breaks on custom paths"
  confidence: 0.85
  tags: [paths, env, hermes-home]
  source_count: 6
  first_seen: "2026-04-06"
  last_confirmed: "2026-04-12"
  related: [global:tool-quirk:004]
 - id: hermes-agent:pitfall:006
  fact: "get_hermes_home() doesn't expand tilde when HERMES_HOME=~/... is set"
  confidence: 0.8
  tags: [paths, env, bug]
  source_count: 2
  first_seen: "2026-04-05"
 ## SSH & Dispatch
 - id: hermes-agent:pitfall:007
  fact: "vps-agent-dispatch reports OK while remote hermes binary path is broken"
  confidence: 0.9
  tags: [ssh, dispatch, vps]
  source_count: 4
  first_seen: "2026-04-07"
  last_confirmed: "2026-04-11"
 - id: hermes-agent:pitfall:008
  fact: "nightwatch-health-monitor SSH check fails on cloud-model-only deployments"
  confidence: 0.85
  tags: [ssh, health, cloud]
  source_count: 2
  first_seen: "2026-04-10"
--- a/knowledge/repos/the-nexus.yaml
+++ b/knowledge/repos/the-nexus.yaml
@@ -0,0 +1,65 @@
 ---
 domain: the-nexus
 category: pitfall
 version: 1
 last_updated: "2026-04-13"
 ---
 # Pitfalls (the-nexus)
 Things that go wrong in this repo if you don't know the traps.
 ## Git & Merging
 - id: the-nexus:pitfall:001
  fact: "Merges fail with HTTP 405 due to branch protection - must use merge API with 1 approval"
  confidence: 0.95
  tags: [git, merge, branch-protection, gitea]
  source_count: 12
  first_seen: "2026-04-05"
  last_confirmed: "2026-04-13"
  related: [global:pitfall:001]
 - id: the-nexus:pitfall:002
  fact: "ThreadingHTTPServer required for multi-user bridge - standard HTTPServer blocks on concurrent requests"
  confidence: 0.95
  tags: [server, concurrency, bridge]
  source_count: 5
  first_seen: "2026-04-10"
  last_confirmed: "2026-04-13"
 - id: the-nexus:pitfall:003
  fact: "ChatLog.log() crashes on message persistence when index.html has orphaned button tags"
  confidence: 0.9
  tags: [html, crash, chatlog]
  source_count: 3
  first_seen: "2026-04-12"
  last_confirmed: "2026-04-13"
 ## Three.js & Performance
 - id: the-nexus:pitfall:004
  fact: "Three.js LOD not implemented - local hardware struggles with full scene without texture optimization"
  confidence: 0.85
  tags: [threejs, performance, lod]
  source_count: 4
  first_seen: "2026-04-09"
  last_confirmed: "2026-04-13"
 - id: the-nexus:pitfall:005
  fact: "Duplicate content blocks appear in index.html when PR merges conflict silently"
  confidence: 0.8
  tags: [html, merge-conflict, duplicate]
  source_count: 3
  first_seen: "2026-04-11"
  last_confirmed: "2026-04-13"
 ## Deployment
 - id: the-nexus:pitfall:006
  fact: "Unified HTTP + WebSocket server required for proper URL deployment - separate servers break CORS"
  confidence: 0.9
  tags: [deploy, websocket, http, cors]
  source_count: 4
  first_seen: "2026-04-10"
  last_confirmed: "2026-04-13"
--- a/metrics/dashboard.md
+++ b/metrics/dashboard.md
@@ -0,0 +1,61 @@
 # Compounding Intelligence Metrics
 **Generated:** 2026-04-14T18:12:26.469085+00:00
 ## knowledge_velocity
 New facts extracted per day. Higher = compounding loop working.
 **Value:** 1.61  |  **7d trend:** N/A --- (unknown)
 - total_facts: 29
 - period_days: 18
 - new_facts: 29
 ## knowledge_coverage
 Percentage of domains/repos with 10+ facts. Measures breadth.
 **Value:** 0.333  |  **7d trend:** N/A --- (unknown)
 - covered_domains: 1
 - total_domains: 3
 ## hit_rate
 Percentage of sessions referencing bootstrapped knowledge.
 **Value:** 0.676  |  **7d trend:** N/A --- (unknown)
 - hit_sessions: 8058
 - total_sessions: 11922
 ## error_recurrence
 Ratio of recurring errors. Lower = fleet learning from mistakes.
 **Value:** 0.17  |  **7d trend:** N/A --- (unknown)
 - unique_errors: 53615
 - recurring_errors: 9093
 ## task_completion
 Percentage of sessions ending with successful completion.
 **Value:** 0.452  |  **7d trend:** N/A --- (unknown)
 - normal_end_rate: 0.56
 - completed: 5385
 - total: 11922
 ## first_try_success
 Percentage of sessions completed without backtracking.
 **Value:** 0.818  |  **7d trend:** N/A --- (unknown)
 - avg_tool_msg_ratio: 0.392
 - sampled: 5923
 ## knowledge_age
 Freshness of knowledge store. 1.0 = all fresh, 0.0 = all stale.
 **Value:** 0.973  |  **7d trend:** N/A --- (unknown)
 - avg_age_days: 2.4
 - stale_facts: 0
 - total_facts: 29
--- a/metrics/latest_snapshot.json
+++ b/metrics/latest_snapshot.json
@@ -0,0 +1,127 @@
 {
  "generated_at": "2026-04-14T18:12:26.469085+00:00",
  "knowledge_velocity": {
    "value": 1.61,
    "total_facts": 29,
    "period_days": 18,
    "new_facts": 29
  },
  "knowledge_coverage": {
    "value": 0.333,
    "covered_domains": 1,
    "total_domains": 3,
    "domain_details": {
      "global": 15,
      "hermes-agent": 8,
      "the-nexus": 6
    }
  },
  "hit_rate": {
    "value": 0.676,
    "hit_sessions": 8058,
    "total_sessions": 11922
  },
  "error_recurrence": {
    "value": 0.17,
    "unique_errors": 53615,
    "recurring_errors": 9093,
    "top_errors": [
      {
        "error": "s, report the error details.",
        "sessions": 1185
      },
      {
        "error": "\": \"traceback (most recent call last):\\n file \\\"/private/var/folders/9k/v07xkpp",
        "sessions": 695
      },
      {
        "error": "\", \"output\": \"\\n--- stderr ---\\ntraceback (most recent call last):\\n file \\\"/pr",
        "sessions": 684
      },
      {
        "error": "ures \u2192 file an issue with the traceback and tag [bug]",
        "sessions": 320
      },
      {
        "error": "s you encounter \u2192 file an issue with reproduction steps",
        "sessions": 320
      },
      {
        "error": "s, file a [bug] issue first.",
        "sessions": 320
      },
      {
        "error": ", fix the code.",
        "sessions": 314
      },
      {
        "error": "fix it before doing anything else.",
        "sessions": 313
      },
      {
        "error": "ures \u2192 add a review comment explaining what's wrong",
        "sessions": 303
      },
      {
        "error": "ures \u2014 they're your roadmap for guardrails",
        "sessions": 303
      }
    ]
  },
  "task_completion": {
    "value": 0.452,
    "normal_end_rate": 0.56,
    "completed": 5385,
    "total": 11922,
    "breakdown": {
      "cron_complete": 5354,
      "unknown": 5248,
      "compression": 1092,
      "cli_close": 197,
      "session_reset": 31
    }
  },
  "first_try_success": {
    "value": 0.818,
    "avg_tool_msg_ratio": 0.392,
    "sampled": 5923,
    "interpretation": "Higher value = fewer backtracks = better first-try success"
  },
  "knowledge_age": {
    "value": 0.973,
    "avg_age_days": 2.4,
    "stale_facts": 0,
    "total_facts": 29,
    "interpretation": "1.0 = all facts fresh. 0.0 = all facts 90+ days old"
  },
  "trend_7d": {
    "knowledge_velocity": {
      "delta": "N/A",
      "direction": "unknown"
    },
    "knowledge_coverage": {
      "delta": "N/A",
      "direction": "unknown"
    },
    "hit_rate": {
      "delta": "N/A",
      "direction": "unknown"
    },
    "error_recurrence": {
      "delta": "N/A",
      "direction": "unknown"
    },
    "task_completion": {
      "delta": "N/A",
      "direction": "unknown"
    },
    "first_try_success": {
      "delta": "N/A",
      "direction": "unknown"
    },
    "knowledge_age": {
      "delta": "N/A",
      "direction": "unknown"
    }
  }
 }
--- a/scripts/measurer.py
+++ b/scripts/measurer.py
@@ -0,0 +1,403 @@
 #!/usr/bin/env python3
 """
 Compounding Intelligence Metrics Engine.
 Computes 7 metrics that prove whether the knowledge compounding loop is working:
  1. Knowledge velocity -- new facts per day
  2. Knowledge coverage -- % of domains with 10+ facts
  3. Hit rate -- % of sessions referencing bootstrap knowledge
  4. Error recurrence -- same errors across sessions (should decrease)
  5. Task completion -- % of sessions ending successfully
  6. First-try success -- actions without backtracking
  7. Knowledge age -- staleness of facts
 Usage:
  python3 measurer.py                        # All metrics, all time
  python3 measurer.py --since 2026-04-01     # Time range
  python3 measurer.py --repo the-nexus       # Per-repo metrics
  python3 measurer.py --format json          # JSON output (default)
  python3 measurer.py --format markdown      # Human-readable
  python3 measurer.py --knowledge-dir ./knowledge  # Custom knowledge path
  python3 measurer.py --db ~/.hermes/state.db      # Custom DB path
 Data sources:
  - knowledge/index.json  -- fact index
  - knowledge/            -- YAML fact files for coverage
  - ~/.hermes/state.db    -- session/message metadata
 """
 import argparse
 import json
 import os
 import re
 import sqlite3
 import sys
 from collections import Counter, defaultdict
 from datetime import datetime, timedelta, timezone
 from pathlib import Path
 from typing import Any
 # --- Defaults ---
 DEFAULT_KNOWLEDGE_DIR = Path(__file__).parent.parent / "knowledge"
 DEFAULT_DB_PATH = Path.home() / ".hermes" / "state.db"
 SEVEN_DAYS = timedelta(days=7)
 # --- Knowledge Store ---
 def load_facts(knowledge_dir):
    index_path = knowledge_dir / "index.json"
    if not index_path.exists():
        return []
    with open(index_path) as f:
        data = json.load(f)
    return data.get("facts", [])
 def count_yaml_facts(knowledge_dir):
    domain_counts = {}
    for subdir in ["repos", "global", "agents"]:
        dirpath = knowledge_dir / subdir
        if not dirpath.exists():
            continue
        for yaml_file in dirpath.glob("*.yaml"):
            count = 0
            try:
                content = yaml_file.read_text()
                count = len(re.findall(r"^\s*-\s*id:", content, re.MULTILINE))
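            except (OSError, UnicodeDecodeError):
                pass  # an unreadable file counts as zero facts
            # Files under global/ are split by category (pitfalls.yaml, ...),
            # so their domain is "global" rather than the file stem.
            domain = "global" if subdir == "global" else yaml_file.stem
            domain_counts[domain] = domain_counts.get(domain, 0) + count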
    return domain_counts
 # --- Session Database ---
 def open_db(db_path):
    if not db_path.exists():
        print(f"WARNING: Database not found at {db_path}", file=sys.stderr)
        return None
    conn = sqlite3.connect(str(db_path))
    conn.row_factory = sqlite3.Row
    return conn
 def query_sessions(conn, since=None, repo=None):
    # NOTE: repo is currently unused; the query below does not filter by repo.
    if conn is None:
        return []
    query = "SELECT id, started_at, ended_at, end_reason, message_count, tool_call_count, model FROM sessions WHERE 1=1"
    params = []
    if since:
        since_ts = datetime.fromisoformat(since).replace(tzinfo=timezone.utc).timestamp()
        query += " AND started_at >= ?"
        params.append(since_ts)
    query += " ORDER BY started_at ASC"
    cur = conn.execute(query, params)
    return [dict(row) for row in cur.fetchall()]
 def query_messages(conn, session_ids=None, since_ts=None):
    if conn is None:
        return []
    query = "SELECT m.session_id, m.role, m.content, m.tool_name, m.timestamp FROM messages m WHERE 1=1"
    params = []
    if since_ts:
        query += " AND m.timestamp >= ?"
        params.append(since_ts)
    if session_ids:
        placeholders = ",".join("?" for _ in session_ids)
        query += f" AND m.session_id IN ({placeholders})"
        params.extend(session_ids)
    cur = conn.execute(query, params)
    return [dict(row) for row in cur.fetchall()]
 # --- Metric Computations ---
 def compute_knowledge_velocity(facts, since=None):
    if not facts:
        return {"value": 0.0, "total_facts": 0, "period_days": 0, "new_facts": 0}
    dates = []
    for f in facts:
        d = f.get("first_seen") or f.get("created")
        if d:
            try:
                dt = datetime.fromisoformat(d.replace("Z", "+00:00"))
                if dt.tzinfo is None:
                    dt = dt.replace(tzinfo=timezone.utc)
                dates.append(dt)
            except (ValueError, AttributeError):
                pass
    if not dates:
        return {"value": 0.0, "total_facts": len(facts), "period_days": 0, "new_facts": 0}
    if since:
        cutoff = datetime.fromisoformat(since).replace(tzinfo=timezone.utc)
        dates = [d for d in dates if d >= cutoff]
    if not dates:
        return {"value": 0.0, "total_facts": len(facts), "period_days": 0, "new_facts": 0}
    earliest = min(dates)
    latest = max(dates)
    period_days = max((latest - earliest).days, 1)
    return {"value": round(len(dates) / period_days, 2), "total_facts": len(facts), "period_days": period_days, "new_facts": len(dates)}
 def compute_knowledge_coverage(facts, yaml_counts):
    domain_fact_counts = defaultdict(int)
    for f in facts:
        domain = f.get("domain", "unknown")
        domain_fact_counts[domain] += 1
    for domain, count in yaml_counts.items():
        domain_fact_counts[domain] = max(domain_fact_counts[domain], count)
    total_domains = len(domain_fact_counts)
    if total_domains == 0:
        return {"value": 0.0, "covered_domains": 0, "total_domains": 0, "domain_details": {}}
    covered = sum(1 for c in domain_fact_counts.values() if c >= 10)
    return {"value": round(covered / total_domains, 3), "covered_domains": covered, "total_domains": total_domains, "domain_details": dict(sorted(domain_fact_counts.items(), key=lambda x: -x[1])[:20])}
 def compute_hit_rate(sessions, messages, facts):
    if not sessions or not facts:
        return {"value": 0.0, "hit_sessions": 0, "total_sessions": len(sessions)}
    fact_fragments = set()
    for f in facts:
        text = f.get("fact", "").lower().strip()
        if len(text) > 10:
            fact_fragments.add(text)
        words = re.findall(r'\w{4,}', text)
        for w in words:
            fact_fragments.add(w)
    if not fact_fragments:
        return {"value": 0.0, "hit_sessions": 0, "total_sessions": len(sessions)}
    session_messages = defaultdict(list)
    for m in messages:
        content = (m.get("content") or "").lower()
        if content:
            session_messages[m["session_id"]].append(content)
    hit_sessions = 0
    for session in sessions:
        sid = session["id"]
        all_content = " ".join(session_messages.get(sid, []))
        if any(frag in all_content for frag in fact_fragments):
            hit_sessions += 1
    return {"value": round(hit_sessions / len(sessions), 3) if sessions else 0.0, "hit_sessions": hit_sessions, "total_sessions": len(sessions)}
 def compute_error_recurrence(messages):
    if not messages:
        return {"value": 0.0, "unique_errors": 0, "recurring_errors": 0, "top_errors": []}
    # re.IGNORECASE already covers every case variant of these keywords.
    error_pattern = re.compile(r'(?:error|failed|fail|exception)[:\s]*(.{10,80})', re.IGNORECASE)
    error_to_sessions = defaultdict(set)
    for m in messages:
        content = m.get("content") or ""
        if not content:
            continue
        for match in error_pattern.finditer(content):
            sig = match.group(1).strip().lower()
            sig = re.sub(r'\s+', ' ', sig)
            if len(sig) > 5:
                error_to_sessions[sig].add(m["session_id"])
    if not error_to_sessions:
        return {"value": 0.0, "unique_errors": 0, "recurring_errors": 0, "top_errors": []}
    recurring = {e: s for e, s in error_to_sessions.items() if len(s) > 1}
    total_errors = len(error_to_sessions)
    recurring_count = len(recurring)
    top = sorted(recurring.items(), key=lambda x: -len(x[1]))[:10]
    return {"value": round(recurring_count / total_errors, 3) if total_errors else 0.0, "unique_errors": total_errors, "recurring_errors": recurring_count, "top_errors": [{"error": e, "sessions": len(s)} for e, s in top]}
 def compute_task_completion(sessions):
    if not sessions:
        return {"value": 0.0, "completed": 0, "total": 0, "breakdown": {}}
    breakdown = Counter()
    for s in sessions:
        reason = s.get("end_reason") or "unknown"
        breakdown[reason] += 1
    completed = breakdown.get("cron_complete", 0) + breakdown.get("session_reset", 0)
    normal_endings = completed + breakdown.get("cli_close", 0) + breakdown.get("compression", 0)
    return {"value": round(completed / len(sessions), 3) if sessions else 0.0, "normal_end_rate": round(normal_endings / len(sessions), 3) if sessions else 0.0, "completed": completed, "total": len(sessions), "breakdown": dict(breakdown.most_common())}
 def compute_first_try_success(sessions):
    if not sessions:
        return {"value": 0.0, "avg_tool_msg_ratio": 0.0, "sampled": 0}
    ratios = []
    for s in sessions:
        msgs = s.get("message_count", 0) or 0
        tools = s.get("tool_call_count", 0) or 0
        if msgs > 2:
            ratios.append(tools / msgs)  # msgs > 2 here, so no zero division
    if not ratios:
        return {"value": 0.0, "avg_tool_msg_ratio": 0.0, "sampled": 0}
    avg_ratio = sum(ratios) / len(ratios)
    first_try = sum(1 for r in ratios if r < 0.5)
    return {"value": round(first_try / len(ratios), 3), "avg_tool_msg_ratio": round(avg_ratio, 3), "sampled": len(ratios), "interpretation": "Higher value = fewer backtracks = better first-try success"}
 def compute_knowledge_age(facts):
    if not facts:
        return {"value": 0.0, "avg_age_days": 0, "stale_facts": 0, "total_facts": 0}
    now = datetime.now(timezone.utc)
    ages = []
    stale_count = 0
    for f in facts:
        confirmed = f.get("last_confirmed") or f.get("first_seen")
        if confirmed:
            try:
                dt = datetime.fromisoformat(confirmed.replace("Z", "+00:00"))
                if dt.tzinfo is None:
                    dt = dt.replace(tzinfo=timezone.utc)
                age = (now - dt).days
                ages.append(age)
                if age > 30:
                    stale_count += 1
            except (ValueError, AttributeError):
                pass
    if not ages:
        return {"value": 0.0, "avg_age_days": 0, "stale_facts": 0, "total_facts": len(facts)}
    avg_age = sum(ages) / len(ages)
    freshness = max(0.0, 1.0 - (avg_age / 90))
    return {"value": round(freshness, 3), "avg_age_days": round(avg_age, 1), "stale_facts": stale_count, "total_facts": len(facts), "interpretation": "1.0 = all facts fresh. 0.0 = all facts 90+ days old"}
 # --- Trend Computation ---
 def compute_trend(current, previous, metric_key="value", metric_name=None):
    # metric_key is the field holding the number (normally "value");
    # metric_name is the metric itself and decides which direction is good.
    if not previous:
        return {"delta": "N/A", "direction": "unknown"}
    curr_val = current.get(metric_key, 0)
    prev_val = previous.get(metric_key, 0)
    if prev_val == 0:
        return {"delta": "N/A (no baseline)", "direction": "unknown"}
    pct = ((curr_val - prev_val) / abs(prev_val)) * 100
    direction = "up" if pct > 0 else "down" if pct < 0 else "flat"
    # Only error_recurrence is lower-is-better. knowledge_age reports a
    # freshness score, so a rise is good, like the remaining metrics.
    if metric_name == "error_recurrence":
        direction_label = "good" if pct < 0 else "bad" if pct > 0 else "neutral"
    else:
        direction_label = "good" if pct > 0 else "bad" if pct < 0 else "neutral"
    return {"delta": f"{'+' if pct > 0 else ''}{pct:.1f}%", "direction": direction, "assessment": direction_label}
 # --- Output Formatters ---
 def format_json(metrics):
    return json.dumps(metrics, indent=2)
 def format_markdown(metrics):
    lines = ["# Compounding Intelligence Metrics", f"**Generated:** {metrics.get('generated_at', 'unknown')}", ""]
    trend = metrics.get("trend_7d", {})
    def metric_block(name, data, desc, good_direction="up"):
        val = data.get("value", 0)
        t = trend.get(name, {})
        delta = t.get("delta", "N/A")
        assessment = t.get("assessment", "unknown")
        # The arrow follows the direction of change; good/bad is shown separately.
        direction = t.get("direction", "unknown")
        arrow = direction if direction in ("up", "down") else "---"
        lines.extend([f"## {name}", desc, "", f"**Value:** {val}  |  **7d trend:** {delta} {arrow} ({assessment})", ""])
        for k, v in data.items():
            if k != "value" and k != "interpretation":
                if isinstance(v, (int, float, str)):
                    lines.append(f"- {k}: {v}")
        lines.append("")
    metric_block("knowledge_velocity", metrics.get("knowledge_velocity", {}), "New facts extracted per day. Higher = compounding loop working.")
    metric_block("knowledge_coverage", metrics.get("knowledge_coverage", {}), "Percentage of domains/repos with 10+ facts. Measures breadth.")
    metric_block("hit_rate", metrics.get("hit_rate", {}), "Percentage of sessions referencing bootstrapped knowledge.")
    metric_block("error_recurrence", metrics.get("error_recurrence", {}), "Ratio of recurring errors. Lower = fleet learning from mistakes.", good_direction="down")
    metric_block("task_completion", metrics.get("task_completion", {}), "Percentage of sessions ending with successful completion.")
    metric_block("first_try_success", metrics.get("first_try_success", {}), "Percentage of sessions completed without backtracking.")
    metric_block("knowledge_age", metrics.get("knowledge_age", {}), "Freshness of knowledge store. 1.0 = all fresh, 0.0 = all stale.", good_direction="up")
    return "\n".join(lines)
 # --- Snapshot Persistence ---
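 def load_snapshot(metrics_dir):
    """Return the last saved metrics snapshot, or {} if none is usable."""
    snapshot_path = metrics_dir / "latest_snapshot.json"
    if not snapshot_path.exists():
        return {}
    try:
        with open(snapshot_path, encoding="utf-8") as f:
            snapshot = json.load(f)
    except (OSError, json.JSONDecodeError):
        # A corrupt or unreadable snapshot should not block metric generation.
        return {}
    return snapshot if isinstance(snapshot, dict) else {}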
 def save_snapshot(metrics_dir, metrics):
    metrics_dir.mkdir(parents=True, exist_ok=True)
    snapshot_path = metrics_dir / "latest_snapshot.json"
    with open(snapshot_path, "w") as f:
        json.dump(metrics, f, indent=2)
 # --- Main ---
 def main():
    parser = argparse.ArgumentParser(description="Compounding Intelligence Metrics")
    parser.add_argument("--since", help="Start date (YYYY-MM-DD)")
    parser.add_argument("--repo", help="Filter by repo/domain")
    parser.add_argument("--format", choices=["json", "markdown"], default="json")
    parser.add_argument("--knowledge-dir", type=Path, default=DEFAULT_KNOWLEDGE_DIR)
    parser.add_argument("--db", type=Path, default=DEFAULT_DB_PATH)
    parser.add_argument("--save-snapshot", action="store_true", help="Save current metrics as snapshot for trend tracking")
    parser.add_argument("--metrics-dir", type=Path, default=Path(__file__).parent.parent / "metrics", help="Directory for snapshots and dashboard")
    args = parser.parse_args()
    facts = load_facts(args.knowledge_dir)
    yaml_counts = count_yaml_facts(args.knowledge_dir)
    if args.repo:
        facts = [f for f in facts if f.get("domain") == args.repo]
    conn = open_db(args.db)
    sessions = query_sessions(conn, since=args.since)
    messages = query_messages(conn) if conn else []
    if conn:
        conn.close()
    velocity = compute_knowledge_velocity(facts, since=args.since)
    coverage = compute_knowledge_coverage(facts, yaml_counts)
    hit_rate = compute_hit_rate(sessions, messages, facts)
    error_recurrence = compute_error_recurrence(messages)
    task_completion = compute_task_completion(sessions)
    first_try = compute_first_try_success(sessions)
    age = compute_knowledge_age(facts)
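    # Trends compare against the most recent saved snapshot, whatever its age.
    previous = load_snapshot(args.metrics_dir)
    trend = {
        "knowledge_velocity": compute_trend(velocity, previous.get("knowledge_velocity", {})),
        "knowledge_coverage": compute_trend(coverage, previous.get("knowledge_coverage", {})),
        "hit_rate": compute_trend(hit_rate, previous.get("hit_rate", {})),
        "error_recurrence": compute_trend(error_recurrence, previous.get("error_recurrence", {}), "value"),
        "task_completion": compute_trend(task_completion, previous.get("task_completion", {})),
        "first_try_success": compute_trend(first_try, previous.get("first_try_success", {})),
        "knowledge_age": compute_trend(age, previous.get("knowledge_age", {})),
    }
    # error_recurrence is lower-is-better, but compute_trend is called with
    # metric_key="value" and so rates any increase as "good". Invert it here.
    recurrence_trend = trend["error_recurrence"]
    if "assessment" in recurrence_trend:
        recurrence_trend["assessment"] = {"good": "bad", "bad": "good"}.get(
            recurrence_trend["assessment"], recurrence_trend["assessment"]
        )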
    now = datetime.now(timezone.utc).isoformat()
    metrics = {
        "generated_at": now,
        "knowledge_velocity": velocity,
        "knowledge_coverage": coverage,
        "hit_rate": hit_rate,
        "error_recurrence": error_recurrence,
        "task_completion": task_completion,
        "first_try_success": first_try,
        "knowledge_age": age,
        "trend_7d": trend,
    }
    if args.since:
        metrics["since"] = args.since
    if args.save_snapshot:
        save_snapshot(args.metrics_dir, metrics)
        dashboard_path = args.metrics_dir / "dashboard.md"
        with open(dashboard_path, "w") as f:
            f.write(format_markdown(metrics))
    if args.format == "json":
        print(format_json(metrics))
    else:
        print(format_markdown(metrics))
 if __name__ == "__main__":
    main()
--- a/scripts/validate_knowledge.py
+++ b/scripts/validate_knowledge.py
@@ -0,0 +1,80 @@
 #!/usr/bin/env python3
 """
 Validate knowledge/index.json against the knowledge schema.
 Exits 0 if the index is valid and 1 otherwise.
 Usage:
    python scripts/validate_knowledge.py
 """
 import json
 import sys
 from pathlib import Path
 VALID_CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}
 REQUIRED_FACT_FIELDS = {"id", "fact", "category", "domain", "confidence"}
 MAX_FACT_LENGTH = 280
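 def validate_fact(fact, source=""):
    """Return a list of schema-violation messages for one fact (empty if valid)."""
    if not isinstance(fact, dict):
        return [f"{source}: fact must be an object"]
    errors = []
    # Sort so the error order is stable across runs (set iteration order is not).
    for field in sorted(REQUIRED_FACT_FIELDS):
        if field not in fact:
            errors.append(f"{source}: missing required field '{field}'")
    if "fact" in fact:
        if not isinstance(fact["fact"], str) or len(fact["fact"].strip()) == 0:
            errors.append(f"{source}: 'fact' must be non-empty string")
        elif len(fact["fact"]) > MAX_FACT_LENGTH:
            errors.append(f"{source}: 'fact' exceeds {MAX_FACT_LENGTH} chars")
    if "category" in fact:
        category = fact["category"]
        # Check the type first: an unhashable value would raise on set membership.
        if not isinstance(category, str) or category not in VALID_CATEGORIES:
            errors.append(f"{source}: invalid category '{category}'")
    if "confidence" in fact:
        confidence = fact["confidence"]
        # bool is a subclass of int, so reject True/False explicitly.
        if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
            errors.append(f"{source}: 'confidence' must be a number")
        elif not (0.0 <= confidence <= 1.0):
            errors.append(f"{source}: 'confidence' must be 0.0-1.0")
    if "id" in fact:
        if not isinstance(fact["id"], str):
            errors.append(f"{source}: 'id' must be a string")
        else:
            parts = fact["id"].split(":")
            if len(parts) != 3:
                errors.append(f"{source}: 'id' must be domain:category:sequence")
            elif parts[1] not in VALID_CATEGORIES:
                errors.append(f"{source}: id category '{parts[1]}' invalid")
    return errors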
 def main():
    repo_root = Path(__file__).parent.parent
    index_path = repo_root / "knowledge" / "index.json"
    all_errors = []
    if not index_path.exists():
        print(f"VALIDATION FAILED: index.json not found at {index_path}")
        sys.exit(1)
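    try:
        with open(index_path, encoding="utf-8") as f:
            data = json.load(f)
    except json.JSONDecodeError as exc:
        print(f"VALIDATION FAILED: index.json is not valid JSON: {exc}")
        sys.exit(1)
    if not isinstance(data, dict):
        print("VALIDATION FAILED: index.json top level must be an object")
        sys.exit(1)
    if "version" not in data:
        all_errors.append("index.json: missing 'version'")
    facts = data.get("facts")
    if not isinstance(facts, list):
        all_errors.append("index.json: missing or invalid 'facts'")
        facts = []
    seen_ids = set()
    for i, fact in enumerate(facts):
        all_errors.extend(validate_fact(fact, f"facts[{i}]"))
        # Only string ids are tracked; other types are reported by validate_fact.
        fact_id = fact.get("id") if isinstance(fact, dict) else None
        if isinstance(fact_id, str):
            if fact_id in seen_ids:
                all_errors.append(f"index.json: duplicate id '{fact_id}'")
            seen_ids.add(fact_id)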
    if all_errors:
        print(f"VALIDATION FAILED - {len(all_errors)} error(s):\n")
        for e in all_errors:
            print(f"  x {e}")
        sys.exit(1)
    else:
        print(f"VALIDATION PASSED - {len(data.get('facts', []))} facts, schema v{data.get('version', '?')}")
        sys.exit(0)
 if __name__ == "__main__":
    main()