diff --git a/knowledge/SCHEMA.md b/knowledge/SCHEMA.md new file mode 100644 index 0000000..50679e7 --- /dev/null +++ b/knowledge/SCHEMA.md @@ -0,0 +1,171 @@ +# Knowledge File Format Specification + +**Version:** 1 +**Issue:** #10 +**Status:** Draft + +--- + +## Overview + +The knowledge system has two layers: + +1. **index.json** — Machine-readable fact index. Fast lookups by ID, category, repo, tags. +2. **Knowledge files** (YAML) — Human-readable, editable facts organized by domain. + +The harvester writes to both. The bootstrapper reads from index.json. Humans edit the YAML files directly. + +--- + +## index.json Schema + +```json +{ + "version": 1, + "last_updated": "ISO-8601 timestamp", + "total_facts": 0, + "facts": [] +} +``` + +### Fact Object + +| Field | Type | Required | Description | +|-------|------|----------|-------------| +| `id` | string | yes | Unique identifier: `{domain}:{category}:{sequence}` | +| `fact` | string | yes | One-sentence description of the knowledge | +| `category` | enum | yes | One of: `fact`, `pitfall`, `pattern`, `tool-quirk`, `question` | +| `domain` | string | yes | Where this applies: repo name, `global`, or agent name | +| `confidence` | float | yes | 0.0–1.0. How certain is this knowledge? | +| `tags` | string[] | no | Searchable labels: `["git", "auth", "gitea"]` | +| `source_count` | int | no | How many sessions confirmed this fact | +| `first_seen` | date | no | ISO-8601 date first extracted | +| `last_confirmed` | date | no | ISO-8601 date last seen in a session | +| `expires` | date | no | Optional. After this date, fact is stale | +| `related` | string[] | no | IDs of related facts | + +### ID Format + +``` +{domain}:{category}:{sequence} +``` + +- `domain` — repo name, `global`, or agent type +- `category` — one of the 5 categories +- `sequence` — zero-padded 3-digit number: `001`, `002`, ... 
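The ID format above can be checked mechanically. The following is a minimal sketch, not the project's `scripts/validate_knowledge.py`: the helper names `parse_fact_id` and `format_fact_id` are hypothetical, and the domain character set (lowercase alphanumerics plus hyphens) and the 0 to 999 sequence range are assumptions, since the spec does not state them.

```python
import re

# The five categories listed in the Fact Object table.
CATEGORIES = ("fact", "pitfall", "pattern", "tool-quirk", "question")

# Assumption: a domain is lowercase alphanumerics plus hyphens.
# The spec only says "repo name, `global`, or agent type".
ID_PATTERN = re.compile(
    r"(?P<domain>[a-z0-9][a-z0-9-]*):"
    r"(?P<category>" + "|".join(re.escape(c) for c in CATEGORIES) + r"):"
    r"(?P<sequence>\d{3})"
)


def parse_fact_id(fact_id: str) -> dict:
    """Split a fact ID into its three parts, or raise ValueError."""
    match = ID_PATTERN.fullmatch(fact_id)
    if match is None:
        raise ValueError(f"malformed fact id: {fact_id!r}")
    return match.groupdict()


def format_fact_id(domain: str, category: str, sequence: int) -> str:
    """Build a fact ID with a zero-padded 3-digit sequence."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    # Assumption: sequences run from 0 to 999 so they fit in three digits.
    if not 0 <= sequence <= 999:
        raise ValueError(f"sequence out of range: {sequence}")
    return f"{domain}:{category}:{sequence:03d}"
```

Using `fullmatch` rather than `match` rejects IDs with trailing characters, including a stray newline.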
+ +Examples: +- `the-nexus:pitfall:001` +- `global:tool-quirk:012` +- `hermes-agent:pattern:003` + +### Categories + +| Category | Definition | Example | +|----------|------------|---------| +| `fact` | Concrete, verifiable information | "Gitea API requires token auth at /api/v1" | +| `pitfall` | Errors, wrong assumptions, time-wasters | "Assumed env var GITEA_TOKEN; actual path is ~/.config/gitea/token" | +| `pattern` | Successful sequences of actions | "To deploy: test → build → push → webhook" | +| `tool-quirk` | Environment-specific behaviors | "URL format requires trailing slash on macOS" | +| `question` | Identified but unanswered | "Need optimal batch size for harvesting" | + +### Confidence Scoring + +| Range | Meaning | +|-------|---------| +| 0.9–1.0 | Explicitly stated and verified | +| 0.7–0.8 | Clearly implied by multiple data points | +| 0.5–0.6 | Suggested but not fully verified | +| 0.3–0.4 | Inferred from limited data | +| 0.1–0.2 | Speculative or uncertain | + +--- + +## Knowledge Files (YAML) + +Human-readable files stored in `knowledge/` subdirectories. + +### Directory Structure + +``` +knowledge/ +├── index.json # Machine-readable fact index +├── SCHEMA.md # This file +├── global/ # Cross-repo knowledge +│ ├── pitfalls.yaml # Pitfalls that span multiple repos +│ ├── patterns.yaml # Proven workflows +│ └── tool-quirks.yaml # Environment behaviors +├── repos/ # Per-repo knowledge +│ ├── the-nexus.yaml +│ ├── hermes-agent.yaml +│ └── ... +└── agents/ # Agent-type knowledge + ├── mimo-sprint.yaml + └── ... +``` + +### YAML File Format + +```yaml +--- +domain: global # or repo name or agent name +category: tool-quirk # fact, pitfall, pattern, tool-quirk, question +version: 1 +last_updated: "2026-04-13" +--- + +# Tool Quirks (Global) + +Cross-environment behaviors that bite you if you don't know them. 
+ +## Authentication + +- id: global:tool-quirk:001 + fact: "Gitea token stored at ~/.config/gitea/token, not env var" + confidence: 0.95 + tags: [git, auth, gitea] + source_count: 23 + first_seen: "2026-03-27" + last_confirmed: "2026-04-13" + related: [global:pitfall:003] + +- id: global:tool-quirk:002 + fact: "Gitea API uses 'token' header format: Authorization: token TOKEN" + confidence: 0.9 + tags: [git, api, gitea] + source_count: 8 + first_seen: "2026-03-28" +``` + +### Rules + +1. **One file per domain per category.** `repos/the-nexus.yaml` holds all the-nexus facts. Don't mix categories across files. +2. **Markdown sections for humans.** The YAML items live under markdown headers. This makes the files readable in Gitea's UI. +3. **ID is the link.** The `id` field connects YAML facts to index.json entries. Same ID = same fact. +4. **Harvester writes, humans edit.** The harvester appends new facts. Humans can correct confidence, add tags, or mark facts as expired. + +--- + +## Sync Rules + +1. **Harvester → YAML:** Appends new facts to the appropriate YAML file. +2. **Harvester → index.json:** Adds/updates fact entries. +3. **Human edits YAML:** Changes propagate to index.json on the next harvester run. +4. **Confidence decay:** Facts not confirmed in 30 or more sessions have their confidence multiplied by 0.9. +5. **Expiration:** Facts whose `expires` date has passed are marked `stale` in index.json. + +--- + +## Validation + +Facts must pass these checks before entering the index: + +1. `id` matches the format `{domain}:{category}:{sequence}` +2. `category` is one of the 5 allowed values +3. `confidence` is between 0.0 and 1.0 +4. `fact` is a non-empty string of at most 280 characters +5. `domain` is a non-empty string +6. `tags` contain only lowercase alphanumerics and hyphens +7. 
No duplicate IDs in index.json + +Validation script: `scripts/validate_knowledge.py` diff --git a/knowledge/global/pitfalls.yaml b/knowledge/global/pitfalls.yaml new file mode 100644 index 0000000..2695d4d --- /dev/null +++ b/knowledge/global/pitfalls.yaml @@ -0,0 +1,80 @@ +--- +domain: global +category: pitfall +version: 1 +last_updated: "2026-04-13" +--- + +# Pitfalls (Global) + +Cross-repo traps that waste time across the fleet. + +## Git & Forge + +- id: global:pitfall:001 + fact: "Branch protection requires 1 approval on main — API merges fail with 405 without it" + confidence: 0.95 + tags: [git, merge, branch-protection, gitea] + source_count: 12 + first_seen: "2026-04-05" + last_confirmed: "2026-04-13" + related: [the-nexus:pitfall:001] + +- id: global:pitfall:002 + fact: "Never use --no-verify on git commits — it bypasses all hooks including safety checks" + confidence: 0.95 + tags: [git, hooks, safety] + source_count: 5 + first_seen: "2026-03-28" + last_confirmed: "2026-04-13" + +- id: global:pitfall:003 + fact: "Gitea PR creation workaround needed on the-nexus — direct API call fails, use alternative endpoint" + confidence: 0.9 + tags: [gitea, pr, api, workaround] + source_count: 4 + first_seen: "2026-04-06" + last_confirmed: "2026-04-12" + +## Agent Operations + +- id: global:pitfall:004 + fact: "Anthropic is BANNED from fallback chain — if fallback triggers to Anthropic, something is wrong" + confidence: 0.95 + tags: [provider, anthropic, fallback] + source_count: 7 + first_seen: "2026-03-30" + last_confirmed: "2026-04-13" + +- id: global:pitfall:005 + fact: "Telegram tokens expired — don't assume Telegram notifications work without checking" + confidence: 0.85 + tags: [telegram, notifications, token] + source_count: 3 + first_seen: "2026-04-02" + +- id: global:pitfall:006 + fact: "Multiple gateways = 'cannot schedule futures' error — only one gateway process should run" + confidence: 0.9 + tags: [gateway, cron, process] + source_count: 4 + first_seen: 
"2026-04-04" + last_confirmed: "2026-04-11" + +## Testing + +- id: global:pitfall:007 + fact: "pytest root collection picks up operational *_test.py scripts — restrict to tests/ directory" + confidence: 0.9 + tags: [pytest, test, collection] + source_count: 3 + first_seen: "2026-04-07" + last_confirmed: "2026-04-13" + +- id: global:pitfall:008 + fact: "TDD: test 1 before building 55 — verify the cycle works before scaling" + confidence: 0.95 + tags: [tdd, testing, methodology] + source_count: 8 + first_seen: "2026-03-25" + last_confirmed: "2026-04-13" diff --git a/knowledge/global/tool-quirks.yaml b/knowledge/global/tool-quirks.yaml new file mode 100644 index 0000000..7e4f631 --- /dev/null +++ b/knowledge/global/tool-quirks.yaml @@ -0,0 +1,73 @@ +--- +domain: global +category: tool-quirk +version: 1 +last_updated: "2026-04-13" +--- + +# Tool Quirks (Global) + +Cross-environment behaviors that bite you if you don't know them. + +## Authentication + +- id: global:tool-quirk:001 + fact: "Gitea token stored at ~/.config/gitea/token, not env var GITEA_TOKEN" + confidence: 0.95 + tags: [git, auth, gitea, token] + source_count: 23 + first_seen: "2026-03-27" + last_confirmed: "2026-04-13" + related: [global:pitfall:001] + +- id: global:tool-quirk:002 + fact: "Gitea API uses 'Authorization: token TOKEN' header format, not Bearer" + confidence: 0.9 + tags: [git, api, gitea] + source_count: 8 + first_seen: "2026-03-28" + last_confirmed: "2026-04-12" + +- id: global:tool-quirk:003 + fact: "Gitea Issues API type=issues param does NOT filter PRs — use truthiness check on pull_request field" + confidence: 0.95 + tags: [gitea, api, issues, pr] + source_count: 6 + first_seen: "2026-04-01" + last_confirmed: "2026-04-13" + +## Paths & Environment + +- id: global:tool-quirk:004 + fact: "~/.hermes is the default hermes home — check get_hermes_home() not the path literal" + confidence: 0.9 + tags: [paths, hermes, env] + source_count: 10 + first_seen: "2026-03-30" + last_confirmed: 
"2026-04-13" + related: [hermes-agent:pitfall:005] + +- id: global:tool-quirk:005 + fact: "Ansible vault-encrypted vars in YAML require vault_inline_vars plugin — standard ansible-vault fails" + confidence: 0.85 + tags: [ansible, vault, config] + source_count: 3 + first_seen: "2026-04-02" + +## Model & Inference + +- id: global:tool-quirk:006 + fact: "mimo-v2-pro via Nous Research is the default model — don't assume Anthropic is available" + confidence: 0.95 + tags: [model, provider, nous, default] + source_count: 15 + first_seen: "2026-03-25" + last_confirmed: "2026-04-13" + +- id: global:tool-quirk:007 + fact: "Kill + restart with 'hermes chat' preserves old model state — NEVER use --resume" + confidence: 0.95 + tags: [hermes, model, restart, session] + source_count: 8 + first_seen: "2026-03-29" + last_confirmed: "2026-04-12" diff --git a/knowledge/index.json b/knowledge/index.json index dd3e0d4..f75b3ff 100644 --- a/knowledge/index.json +++ b/knowledge/index.json @@ -1,6 +1,489 @@ { "version": 1, - "last_updated": "2026-04-13T20:00:00Z", - "total_facts": 0, - "facts": [] + "last_updated": "2026-04-14T18:07:27.448168Z", + "total_facts": 55, + "migration": { + "migrated_from": [ + "mempalace", + "fact_store", + "skills" + ], + "migrated_at": "2026-04-14T18:07:27.448362Z", + "sources": { + "mempalace": 11, + "fact_store": 29, + "skills": 15 + } + }, + "facts": [ + { + "fact": "Timmy Foundation: 17 repos, 282 open issues, 63.0% closure rate", + "category": "fact", + "repo": "global", + "confidence": 0.95, + "source": "mempalace", + "source_file": "forge.json" + }, + { + "fact": "Timmy_Foundation/timmy-home: 227 open issues", + "category": "fact", + "repo": "timmy-home", + "confidence": 0.95, + "source": "mempalace", + "source_file": "forge.json" + }, + { + "fact": "Timmy_Foundation/timmy-config: 133 open issues", + "category": "fact", + "repo": "timmy-config", + "confidence": 0.95, + "source": "mempalace", + "source_file": "forge.json" + }, + { + "fact": 
"Timmy_Foundation/the-nexus: 72 open issues", + "category": "fact", + "repo": "the-nexus", + "confidence": 0.95, + "source": "mempalace", + "source_file": "forge.json" + }, + { + "fact": "Timmy_Foundation/fleet-ops: 47 open issues", + "category": "fact", + "repo": "fleet-ops", + "confidence": 0.95, + "source": "mempalace", + "source_file": "forge.json" + }, + { + "fact": "Timmy_Foundation/the-beacon: 12 open issues", + "category": "fact", + "repo": "the-beacon", + "confidence": 0.95, + "source": "mempalace", + "source_file": "forge.json" + }, + { + "fact": "Assignment coverage: 99.6% (281 assigned, 1 unassigned)", + "category": "fact", + "repo": "global", + "confidence": 0.95, + "source": "mempalace", + "source_file": "forge.json" + }, + { + "fact": "Priority: 4 P0, 8 P1, 11 epics", + "category": "fact", + "repo": "global", + "confidence": 0.95, + "source": "mempalace", + "source_file": "forge.json" + }, + { + "fact": "CRITICAL timmy-home#580: Harden SOUL.md against Claude identity hijacking - Security: Protects the core inscription of Timmy's values on-chain", + "category": "pitfall", + "repo": "timmy-home", + "confidence": 0.9, + "source": "mempalace", + "source_file": "forge-palace-summary.json" + }, + { + "fact": "CRITICAL timmy-home#579: [RCA] Ezra and Bezalel do not respond to Gitea @mention tags - DevOps: Two VPS wizard houses are not receiving critical notifications", + "category": "pitfall", + "repo": "timmy-home", + "confidence": 0.9, + "source": "mempalace", + "source_file": "forge-palace-summary.json" + }, + { + "fact": "CRITICAL the-nexus#1125: [COMPUTER_USE] Add Desktop Automation Primitives to Hermes - Feature: Unlocks computer-use capability in agent toolkit", + "category": "pitfall", + "repo": "the-nexus", + "confidence": 0.9, + "source": "mempalace", + "source_file": "forge-palace-summary.json" + }, + { + "fact": "Alexander prefers rate-limited stretches over underutilization. 
'I would rather get rate limited and have it stretch out a bit than underutilize.'", + "category": "fact", + "repo": "global", + "confidence": 0.5, + "source": "fact_store", + "source_id": 1, + "source_tags": "preference" + }, + { + "fact": "Alexander's frustration: reading source code instead of testing the actual command first. Validate with CLI first, code second.", + "category": "fact", + "repo": "global", + "confidence": 0.5, + "source": "fact_store", + "source_id": 2, + "source_tags": "preference" + }, + { + "fact": "KEYMAXXING: ~/.hermes/keymaxxing/. inbox/ for drops, detect_provider.py, watcher.sh (60s poll). First key = Nous/OpenRouter (391 models, 25 free).", + "category": "tool-quirk", + "repo": "global", + "confidence": 0.5, + "source": "fact_store", + "source_id": 3, + "source_tags": "keymaxxing,nous" + }, + { + "fact": "Automation philosophy: aggressive utilization but outcome-focused. No duplicate PRs, no noise. Quality gates mandatory. 'Build things to be a masterwork.'", + "category": "fact", + "repo": "global", + "confidence": 0.5, + "source": "fact_store", + "source_id": 4, + "source_tags": "automation,philosophy" + }, + { + "fact": "Mnemosyne = priority project. Tag [Mnemosyne] issues for swarm priority. WebSocket bridge for live memory (issue #1164).", + "category": "fact", + "repo": "the-nexus", + "confidence": 0.5, + "source": "fact_store", + "source_id": 5, + "source_tags": "mnemosyne,nexus" + }, + { + "fact": "Bitcoin inscription #90707: Sermon on the Mount (Matthew 5-7 ESV), block 776549, Feb 14 2023. Gospel immutable on-chain.", + "category": "fact", + "repo": "global", + "confidence": 0.5, + "source": "fact_store", + "source_id": 6, + "source_tags": "bitcoin,faith" + }, + { + "fact": "Gitea API gotcha: 'labels' field requires integer IDs, not string names. GET /labels first to resolve. 
String names return HTTP 422.", + "category": "tool-quirk", + "repo": "global", + "confidence": 0.5, + "source": "fact_store", + "source_id": 8, + "source_tags": "gitea,api" + }, + { + "fact": "CRON WORKER: Jobs needing files MUST use real scripts, not LLM prompts. LLM cron can't ls/cat - needs execute_code. Test one cycle before scaling. (55 cron jobs, 0 PRs for 37 minutes.)", + "category": "tool-quirk", + "repo": "hermes-agent", + "confidence": 0.5, + "source": "fact_store", + "source_id": 11, + "source_tags": "cron,worker" + }, + { + "fact": "HERMES CLI: hermes chat -q 'prompt' --provider nous -m xiaomi/mimo-v2-pro. -q=query text, -p=profile name. Mixing causes silent failure.", + "category": "tool-quirk", + "repo": "hermes-agent", + "confidence": 0.5, + "source": "fact_store", + "source_id": 12, + "source_tags": "hermes,cli" + }, + { + "fact": "Core preferences: 'Don't be precious.' Parallel over sequential. Test before scaling. Direct communicator. Satoshi/Hal engineering philosophy.", + "category": "fact", + "repo": "global", + "confidence": 0.5, + "source": "fact_store", + "source_id": 13, + "source_tags": "preferences" + }, + { + "fact": "Gitea API: /orgs/TimmyFoundation/repos returns 404. Use /user/repos?limit=50 instead for all repos across all orgs.", + "category": "tool-quirk", + "repo": "global", + "confidence": 0.5, + "source": "fact_store", + "source_id": 15, + "source_tags": "gitea,api" + }, + { + "fact": "CRITICAL: tool_use_enforcement must be 'true' in config.yaml. Without it, mimo-v2-pro generates text descriptions instead of executing tools. 36 PRs/day with this fix ($0).", + "category": "tool-quirk", + "repo": "hermes-agent", + "confidence": 0.5, + "source": "fact_store", + "source_id": 19, + "source_tags": "hermes,config,critical" + }, + { + "fact": "forge.alexanderwhitestone.com clone: depth 50 times out. 
Use --depth 5 --single-branch instead.", + "category": "tool-quirk", + "repo": "global", + "confidence": 0.5, + "source": "fact_store", + "source_id": 20, + "source_tags": "git,forge" + }, + { + "fact": "Telegram: Bot token ~/.config/telegram/special_bot. Alexander chat ID: 7635059073. API: POST /bot{token}/sendMessage.", + "category": "tool-quirk", + "repo": "global", + "confidence": 0.5, + "source": "fact_store", + "source_id": 23, + "source_tags": "telegram" + }, + { + "fact": "Communication: Gitea for reports/deliverables. Telegram for urgent only. Wants proactive monitoring, visual confirmation, action-oriented.", + "category": "fact", + "repo": "global", + "confidence": 0.5, + "source": "fact_store", + "source_id": 24, + "source_tags": "communication" + }, + { + "fact": "Kimi API: model ID is 'kimi-for-coding' not 'kimi-k2.5'. Key prefix sk-kimi- routes to api.kimi.com/coding/v1. One model only.", + "category": "fact", + "repo": "global", + "confidence": 0.5, + "source": "fact_store", + "source_id": 26, + "source_tags": "kimi,api" + }, + { + "fact": "model-watchdog.py restarts panes without -p flag, losing profile. Falls through to hermes3 (8K context). Fix: preserve -p flag.", + "category": "fact", + "repo": "global", + "confidence": 0.5, + "source": "fact_store", + "source_id": 28, + "source_tags": "watchdog,bug,critical" + }, + { + "fact": "INCIDENT: Modified hermes profile configs without permission. Corrupted fenrir/bezalel. NEVER modify configs. Report issues, wait. Anthropic BANNED.", + "category": "fact", + "repo": "timmy-config", + "confidence": 0.5, + "source": "fact_store", + "source_id": 30, + "source_tags": "incident,config" + }, + { + "fact": "CRON GOTCHAS: .tick.lock blocks jobs on crash. tool_choice='required' crashes AIAgent. save_jobs needs fcntl.flock. Error jobs stay error. 
Tick backlog: 56 jobs + 6 workers = 9min.", + "category": "tool-quirk", + "repo": "hermes-agent", + "confidence": 0.5, + "source": "fact_store", + "source_id": 31, + "source_tags": "cron,gotchas" + }, + { + "fact": "OpenRouter = FREE MODELS ONLY. All fallbacks must use :free suffix. Never paid models.", + "category": "fact", + "repo": "global", + "confidence": 0.5, + "source": "fact_store", + "source_id": 32, + "source_tags": "openrouter,rules" + }, + { + "fact": "Config is infra-as-code. Canonical: Rockachopa/hermes-config on forge. Local config = source of truth for live system.", + "category": "fact", + "repo": "timmy-config", + "confidence": 0.5, + "source": "fact_store", + "source_id": 33, + "source_tags": "config" + }, + { + "fact": "Accountability: Check edit history before claims. Never modify configs without instruction. When broken: create issue, stop using it.", + "category": "fact", + "repo": "global", + "confidence": 0.5, + "source": "fact_store", + "source_id": 34, + "source_tags": "accountability" + }, + { + "fact": "OpenAI tool_calls: two argument formats - tc.arguments or tc.function.arguments. hermes-agent uses function format. Check both.", + "category": "tool-quirk", + "repo": "global", + "confidence": 0.5, + "source": "fact_store", + "source_id": 36, + "source_tags": "openai,api" + }, + { + "fact": "TMUX RULE: Alexander creates windows/splits. Timmy NEVER creates layouts - only send-keys to existing panes.", + "category": "tool-quirk", + "repo": "global", + "confidence": 0.5, + "source": "fact_store", + "source_id": 39, + "source_tags": "tmux,rule" + }, + { + "fact": "Long-running agents > fresh one-shots. Context compounds. Optimize persistent lanes, not disposable workers. Never rotate panes across repos.", + "category": "fact", + "repo": "global", + "confidence": 0.5, + "source": "fact_store", + "source_id": 40, + "source_tags": "dispatch,lanes" + }, + { + "fact": "Dispatch style: 'Go. repo #issue. Description. 
Clone, implement, branch NAME, commit push PR.' Zero questions, immediate execution, results-only.", + "category": "fact", + "repo": "global", + "confidence": 0.5, + "source": "fact_store", + "source_id": 42, + "source_tags": "dispatch,workflow" + }, + { + "fact": "Protocol: Done/stuck \u2192 commit, push, PR, next issue. File new issues to Gitea via API. Multiple agents can work same issue (different branches).", + "category": "fact", + "repo": "global", + "confidence": 0.5, + "source": "fact_store", + "source_id": 43, + "source_tags": "protocol" + }, + { + "fact": "cron/__init__.py imports ModelContextError/CRON_MIN_CONTEXT_TOKENS from scheduler - neither exists. ImportError. Fix: remove imports. Issue #541.", + "category": "fact", + "repo": "hermes-agent", + "confidence": 0.5, + "source": "fact_store", + "source_id": 46, + "source_tags": "cron,bug" + }, + { + "fact": "Gitea tokens: main (~/.config/gitea/token) = Rockachopa admin. timmy-token = Timmy bot. Contents API works with main token. 
PR creation with either.", + "category": "tool-quirk", + "repo": "global", + "confidence": 0.5, + "source": "fact_store", + "source_id": 48, + "source_tags": "gitea,auth" + }, + { + "fact": "Skill: gitea-burn-cycle - Automated burn cycles on Gitea repos", + "category": "pattern", + "repo": "global", + "confidence": 0.85, + "source": "skill", + "source_path": "~/.hermes/skills/gitea-burn-cycle/SKILL.md" + }, + { + "fact": "Skill: hermes-agent - Complete Hermes Agent guide - CLI, gateway, cron, profiles", + "category": "pattern", + "repo": "global", + "confidence": 0.85, + "source": "skill", + "source_path": "~/.hermes/skills/autonomous-ai-agents/hermes-agent/SKILL.md" + }, + { + "fact": "Skill: cron-infra-as-code - Source-control cron jobs as YAML", + "category": "pattern", + "repo": "global", + "confidence": 0.85, + "source": "skill", + "source_path": "~/.hermes/skills/devops/cron-infra-as-code/SKILL.md" + }, + { + "fact": "Skill: burn-loop-health-monitoring - Detect silent burn loop failures", + "category": "pattern", + "repo": "global", + "confidence": 0.85, + "source": "skill", + "source_path": "~/.hermes/skills/devops/burn-loop-health-monitoring/SKILL.md" + }, + { + "fact": "Skill: fleet-config-deploy - Deploy config across VPS fleet with canary", + "category": "pattern", + "repo": "global", + "confidence": 0.85, + "source": "skill", + "source_path": "~/.hermes/skills/devops/fleet-config-deploy/SKILL.md" + }, + { + "fact": "Skill: mimo-swarm - Coordinated mimo-v2-pro swarm: claim-work-release", + "category": "pattern", + "repo": "global", + "confidence": 0.85, + "source": "skill", + "source_path": "~/.hermes/skills/devops/mimo-swarm/SKILL.md" + }, + { + "fact": "Skill: session-signal-extraction-pitfalls - Pitfalls extracting behavioral signals from sessions", + "category": "pattern", + "repo": "global", + "confidence": 0.85, + "source": "skill", + "source_path": "~/.hermes/skills/devops/session-signal-extraction-pitfalls/SKILL.md" + }, + { + "fact": "Skill: 
json-repair-for-tool-calls - Fix JSON parse failures in tool calls - 14 patterns", + "category": "pattern", + "repo": "global", + "confidence": 0.85, + "source": "skill", + "source_path": "~/.hermes/skills/devops/json-repair-for-tool-calls/SKILL.md" + }, + { + "fact": "Skill: poka-yoke-guards - Mistake-proofing guards for weak model agents", + "category": "pattern", + "repo": "global", + "confidence": 0.85, + "source": "skill", + "source_path": "~/.hermes/skills/poka-yoke-guards/SKILL.md" + }, + { + "fact": "Skill: tmux-supervisor - Monitor tmux panes - drift detection", + "category": "pattern", + "repo": "global", + "confidence": 0.85, + "source": "skill", + "source_path": "~/.hermes/skills/devops/tmux-supervisor/SKILL.md" + }, + { + "fact": "Skill: approval-threat-model-extension - Threat model: LLM jailbreaks, accidents, supply chain", + "category": "pattern", + "repo": "global", + "confidence": 0.85, + "source": "skill", + "source_path": "~/.hermes/skills/security/approval-threat-model-extension/SKILL.md" + }, + { + "fact": "Skill: deploy-crons-fix - Fix deploy-crons.py model/provider dropping", + "category": "pattern", + "repo": "global", + "confidence": 0.85, + "source": "skill", + "source_path": "~/.hermes/skills/deploy-crons-fix/SKILL.md" + }, + { + "fact": "Skill: sovereign-heart-architecture - State-based compassion interface pattern", + "category": "pattern", + "repo": "global", + "confidence": 0.85, + "source": "skill", + "source_path": "~/.hermes/skills/autonomous-ai-agents/sovereign-heart-architecture/SKILL.md" + }, + { + "fact": "Skill: burn-night-operations - Max-throughput burn night scheduling", + "category": "pattern", + "repo": "global", + "confidence": 0.85, + "source": "skill", + "source_path": "~/.hermes/skills/devops/burn-night-operations/SKILL.md" + }, + { + "fact": "Skill: cron-scaling-patterns - Cron scaling: workers, tick intervals, checkpoints", + "category": "pattern", + "repo": "global", + "confidence": 0.85, + "source": "skill", + 
"source_path": "~/.hermes/skills/devops/cron-scaling-patterns/SKILL.md" + } + ] } \ No newline at end of file diff --git a/knowledge/repos/hermes-agent.yaml b/knowledge/repos/hermes-agent.yaml new file mode 100644 index 0000000..bbf9a45 --- /dev/null +++ b/knowledge/repos/hermes-agent.yaml @@ -0,0 +1,82 @@ +--- +domain: hermes-agent +category: pitfall +version: 1 +last_updated: "2026-04-13" +--- + +# Pitfalls (hermes-agent) + +Things that go wrong in this repo if you don't know the traps. + +## Cron & Deployment + +- id: hermes-agent:pitfall:001 + fact: "deploy-crons.py leaves jobs in mixed model format — some have provider/model, some just model" + confidence: 0.95 + tags: [cron, deploy, model, config] + source_count: 5 + first_seen: "2026-04-08" + last_confirmed: "2026-04-13" + related: [hermes-agent:pitfall:002, hermes-agent:pitfall:003] + +- id: hermes-agent:pitfall:002 + fact: "deploy-crons.py --deploy doesn't set legacy skill field from skills list, breaking older jobs" + confidence: 0.9 + tags: [cron, deploy, skills] + source_count: 3 + first_seen: "2026-04-09" + last_confirmed: "2026-04-13" + related: [hermes-agent:pitfall:001] + +- id: hermes-agent:pitfall:003 + fact: "Cron jobs with blank fallback_model fields trigger spurious gateway warnings" + confidence: 0.9 + tags: [cron, model, fallback] + source_count: 4 + first_seen: "2026-04-07" + last_confirmed: "2026-04-12" + related: [hermes-agent:pitfall:001] + +- id: hermes-agent:pitfall:004 + fact: "model-watchdog.py checks first provider line, not model.provider — causes false drift alarms" + confidence: 0.9 + tags: [watchdog, model, config] + source_count: 3 + first_seen: "2026-04-08" + last_confirmed: "2026-04-13" + +## Path & Environment + +- id: hermes-agent:pitfall:005 + fact: "10+ files read HERMES_HOME directly instead of get_hermes_home() — breaks on custom paths" + confidence: 0.85 + tags: [paths, env, hermes-home] + source_count: 6 + first_seen: "2026-04-06" + last_confirmed: "2026-04-12" + 
related: [global:pitfall:002] + +- id: hermes-agent:pitfall:006 + fact: "get_hermes_home() doesn't expand tilde when HERMES_HOME=~/... is set" + confidence: 0.8 + tags: [paths, env, bug] + source_count: 2 + first_seen: "2026-04-05" + +## SSH & Dispatch + +- id: hermes-agent:pitfall:007 + fact: "vps-agent-dispatch reports OK while remote hermes binary path is broken" + confidence: 0.9 + tags: [ssh, dispatch, vps] + source_count: 4 + first_seen: "2026-04-07" + last_confirmed: "2026-04-11" + +- id: hermes-agent:pitfall:008 + fact: "nightwatch-health-monitor SSH check fails on cloud-model-only deployments" + confidence: 0.85 + tags: [ssh, health, cloud] + source_count: 2 + first_seen: "2026-04-10" diff --git a/knowledge/repos/the-nexus.yaml b/knowledge/repos/the-nexus.yaml new file mode 100644 index 0000000..783d437 --- /dev/null +++ b/knowledge/repos/the-nexus.yaml @@ -0,0 +1,68 @@ +--- +domain: the-nexus +category: pitfall +version: 1 +last_updated: "2026-04-13" +--- + +# Pitfalls (the-nexus) + +Things that go wrong in this repo if you don't know the traps. 
+ +## Git & Merging + +- id: the-nexus:pitfall:001 + fact: "Merges fail with HTTP 405 due to branch protection — must use merge API with 1 approval" + confidence: 0.95 + tags: [git, merge, branch-protection, gitea] + source_count: 12 + first_seen: "2026-04-05" + last_confirmed: "2026-04-13" + related: [global:pitfall:001] + +- id: the-nexus:pitfall:002 + fact: "ThreadingHTTPServer required for multi-user bridge — standard HTTPServer blocks on concurrent requests" + confidence: 0.95 + tags: [server, concurrency, bridge] + source_count: 5 + first_seen: "2026-04-10" + last_confirmed: "2026-04-13" + related: [the-nexus:pattern:001] + +- id: the-nexus:pitfall:003 + fact: "ChatLog.log() crashes on message persistence when index.html has orphaned button tags" + confidence: 0.9 + tags: [html, crash, chatlog] + source_count: 3 + first_seen: "2026-04-12" + last_confirmed: "2026-04-13" + +## Three.js & Performance + +- id: the-nexus:pitfall:004 + fact: "Three.js LOD not implemented — local hardware struggles with full scene without texture optimization" + confidence: 0.85 + tags: [threejs, performance, lod] + source_count: 4 + first_seen: "2026-04-09" + last_confirmed: "2026-04-13" + related: [the-nexus:pattern:002] + +- id: the-nexus:pitfall:005 + fact: "Duplicate content blocks appear in index.html when PR merges conflict silently" + confidence: 0.8 + tags: [html, merge-conflict, duplicate] + source_count: 3 + first_seen: "2026-04-11" + last_confirmed: "2026-04-13" + +## Deployment + +- id: the-nexus:pitfall:006 + fact: "Unified HTTP + WebSocket server required for proper URL deployment — separate servers break CORS" + confidence: 0.9 + tags: [deploy, websocket, http, cors] + source_count: 4 + first_seen: "2026-04-10" + last_confirmed: "2026-04-13" + related: [the-nexus:pattern:001] diff --git a/metrics/dashboard.md b/metrics/dashboard.md new file mode 100644 index 0000000..8d4a4c0 --- /dev/null +++ b/metrics/dashboard.md @@ -0,0 +1,61 @@ +# Compounding Intelligence Metrics 
+**Generated:** 2026-04-14T18:07:26.169469+00:00 + +## knowledge_velocity +New facts extracted per day. Higher = compounding loop working. + +**Value:** 1.61 | **7d trend:** N/A — (unknown) + +- total_facts: 44 +- period_days: 18 +- new_facts: 29 + +## knowledge_coverage +Percentage of domains/repos with 10+ facts. Measures breadth. + +**Value:** 0.333 | **7d trend:** N/A — (unknown) + +- covered_domains: 2 +- total_domains: 6 + +## hit_rate +Percentage of sessions referencing bootstrapped knowledge. + +**Value:** 0.677 | **7d trend:** N/A — (unknown) + +- hit_sessions: 8064 +- total_sessions: 11919 + +## error_recurrence +Ratio of recurring errors. Lower = fleet learning from mistakes. + +**Value:** 0.169 | **7d trend:** N/A — (unknown) + +- unique_errors: 53556 +- recurring_errors: 9075 + +## task_completion +Percentage of sessions ending with successful completion. + +**Value:** 0.452 | **7d trend:** N/A — (unknown) + +- normal_end_rate: 0.56 +- completed: 5385 +- total: 11919 + +## first_try_success +Percentage of sessions completed without backtracking. + +**Value:** 0.818 | **7d trend:** N/A — (unknown) + +- avg_tool_msg_ratio: 0.391 +- sampled: 5921 + +## knowledge_age +Freshness of knowledge store. 1.0 = all fresh, 0.0 = all stale. 
+ +**Value:** 0.973 | **7d trend:** N/A — (unknown) + +- avg_age_days: 2.4 +- stale_facts: 0 +- total_facts: 44 diff --git a/metrics/latest_snapshot.json b/metrics/latest_snapshot.json new file mode 100644 index 0000000..dcbd1d0 --- /dev/null +++ b/metrics/latest_snapshot.json @@ -0,0 +1,130 @@ +{ + "generated_at": "2026-04-14T18:07:26.169469+00:00", + "knowledge_velocity": { + "value": 1.61, + "total_facts": 44, + "period_days": 18, + "new_facts": 29 + }, + "knowledge_coverage": { + "value": 0.333, + "covered_domains": 2, + "total_domains": 6, + "domain_details": { + "global": 15, + "unknown": 15, + "hermes-agent": 8, + "pitfalls": 8, + "tool-quirks": 7, + "the-nexus": 6 + } + }, + "hit_rate": { + "value": 0.677, + "hit_sessions": 8064, + "total_sessions": 11919 + }, + "error_recurrence": { + "value": 0.169, + "unique_errors": 53556, + "recurring_errors": 9075, + "top_errors": [ + { + "error": "s, report the error details.", + "sessions": 1185 + }, + { + "error": "\": \"traceback (most recent call last):\\n file \\\"/private/var/folders/9k/v07xkpp", + "sessions": 694 + }, + { + "error": "\", \"output\": \"\\n--- stderr ---\\ntraceback (most recent call last):\\n file \\\"/pr", + "sessions": 684 + }, + { + "error": "ures \u2192 file an issue with the traceback and tag [bug]", + "sessions": 320 + }, + { + "error": "s you encounter \u2192 file an issue with reproduction steps", + "sessions": 320 + }, + { + "error": "s, file a [bug] issue first.", + "sessions": 320 + }, + { + "error": ", fix the code.", + "sessions": 314 + }, + { + "error": "fix it before doing anything else.", + "sessions": 313 + }, + { + "error": "ures \u2192 add a review comment explaining what's wrong", + "sessions": 303 + }, + { + "error": "ures \u2014 they're your roadmap for guardrails", + "sessions": 303 + } + ] + }, + "task_completion": { + "value": 0.452, + "normal_end_rate": 0.56, + "completed": 5385, + "total": 11919, + "breakdown": { + "cron_complete": 5354, + "unknown": 5245, + 
"compression": 1092, + "cli_close": 197, + "session_reset": 31 + } + }, + "first_try_success": { + "value": 0.818, + "avg_tool_msg_ratio": 0.391, + "sampled": 5921, + "interpretation": "Higher value = fewer backtracks = better first-try success" + }, + "knowledge_age": { + "value": 0.973, + "avg_age_days": 2.4, + "stale_facts": 0, + "total_facts": 44, + "interpretation": "1.0 = all facts fresh. 0.0 = all facts 90+ days old" + }, + "trend_7d": { + "knowledge_velocity": { + "delta": "N/A", + "direction": "unknown" + }, + "knowledge_coverage": { + "delta": "N/A", + "direction": "unknown" + }, + "hit_rate": { + "delta": "N/A", + "direction": "unknown" + }, + "error_recurrence": { + "delta": "N/A", + "direction": "unknown" + }, + "task_completion": { + "delta": "N/A", + "direction": "unknown" + }, + "first_try_success": { + "delta": "N/A", + "direction": "unknown" + }, + "knowledge_age": { + "delta": "N/A", + "direction": "unknown" + } + } +} \ No newline at end of file diff --git a/scripts/measurer.py b/scripts/measurer.py new file mode 100644 index 0000000..3953461 --- /dev/null +++ b/scripts/measurer.py @@ -0,0 +1,607 @@ +#!/usr/bin/env python3 +""" +Compounding Intelligence Metrics Engine. + +Computes 7 metrics that prove whether the knowledge compounding loop is working: + 1. Knowledge velocity — new facts per day + 2. Knowledge coverage — % of domains with >10 facts + 3. Hit rate — % of sessions referencing bootstrap knowledge + 4. Error recurrence — same errors across sessions (should decrease) + 5. Task completion — % of sessions ending successfully + 6. First-try success — actions without backtracking + 7. 
Knowledge age — staleness of facts + +Usage: + python3 measurer.py # All metrics, all time + python3 measurer.py --since 2026-04-01 # Time range + python3 measurer.py --repo the-nexus # Per-repo metrics + python3 measurer.py --format json # JSON output (default) + python3 measurer.py --format markdown # Human-readable + python3 measurer.py --knowledge-dir ./knowledge # Custom knowledge path + python3 measurer.py --db ~/.hermes/state.db # Custom DB path + +Data sources: + - knowledge/index.json — fact index + - knowledge/ — YAML fact files for coverage + - ~/.hermes/state.db — session/message metadata +""" + +import argparse +import json +import os +import re +import sqlite3 +import sys +from collections import Counter, defaultdict +from datetime import datetime, timedelta, timezone +from pathlib import Path +from typing import Any + + +# ─── Defaults ─────────────────────────────────────────────────────────────────── + +DEFAULT_KNOWLEDGE_DIR = Path(__file__).parent.parent / "knowledge" +DEFAULT_DB_PATH = Path.home() / ".hermes" / "state.db" +SEVEN_DAYS = timedelta(days=7) + + +# ─── Knowledge Store ──────────────────────────────────────────────────────────── + +def load_facts(knowledge_dir: Path) -> list[dict]: + """Load all facts from index.json.""" + index_path = knowledge_dir / "index.json" + if not index_path.exists(): + return [] + with open(index_path) as f: + data = json.load(f) + return data.get("facts", []) + + +def count_yaml_facts(knowledge_dir: Path) -> dict[str, int]: + """Count facts per domain from YAML files (coverage source).""" + domain_counts: dict[str, int] = {} + # Walk repos/, global/, agents/ subdirs + for subdir in ["repos", "global", "agents"]: + dirpath = knowledge_dir / subdir + if not dirpath.exists(): + continue + for yaml_file in dirpath.glob("*.yaml"): + # Count lines that start with "- id:" — each is a fact + count = 0 + try: + content = yaml_file.read_text() + count = len(re.findall(r"^\s*-\s*id:", content, re.MULTILINE)) + except 
Exception: + pass + domain = yaml_file.stem + domain_counts[domain] = domain_counts.get(domain, 0) + count + return domain_counts + + +# ─── Session Database ─────────────────────────────────────────────────────────── + +def open_db(db_path: Path) -> sqlite3.Connection: + """Open session database.""" + if not db_path.exists(): + print(f"WARNING: Database not found at {db_path}", file=sys.stderr) + return None + conn = sqlite3.connect(str(db_path)) + conn.row_factory = sqlite3.Row + return conn + + +def query_sessions(conn: sqlite3.Connection, since: str = None, repo: str = None) -> list[dict]: + """Query sessions with optional filters.""" + if conn is None: + return [] + + query = """ + SELECT id, started_at, ended_at, end_reason, message_count, + tool_call_count, model + FROM sessions + WHERE 1=1 + """ + params = [] + + if since: + since_ts = datetime.fromisoformat(since).replace(tzinfo=timezone.utc).timestamp() + query += " AND started_at >= ?" + params.append(since_ts) + + query += " ORDER BY started_at ASC" + + cur = conn.execute(query, params) + return [dict(row) for row in cur.fetchall()] + + +def query_messages(conn: sqlite3.Connection, session_ids: list[str] = None, + since_ts: float = None) -> list[dict]: + """Query messages with optional session filter.""" + if conn is None: + return [] + + query = """ + SELECT m.session_id, m.role, m.content, m.tool_name, m.timestamp + FROM messages m + WHERE 1=1 + """ + params = [] + + if since_ts: + query += " AND m.timestamp >= ?" + params.append(since_ts) + + if session_ids: + placeholders = ",".join("?" for _ in session_ids) + query += f" AND m.session_id IN ({placeholders})" + params.extend(session_ids) + + cur = conn.execute(query, params) + return [dict(row) for row in cur.fetchall()] + + +# ─── Metric Computations ─────────────────────────────────────────────────────── + +def compute_knowledge_velocity(facts: list[dict], since: str = None) -> dict: + """Metric 1: New facts per day. 
Higher = compounding loop working.""" + if not facts: + return {"value": 0.0, "total_facts": 0, "period_days": 0, "new_facts": 0} + + dates = [] + for f in facts: + d = f.get("first_seen") or f.get("created") + if d: + try: + dt = datetime.fromisoformat(d.replace("Z", "+00:00")) + if dt.tzinfo is None: + dt = dt.replace(tzinfo=timezone.utc) + dates.append(dt) + except (ValueError, AttributeError): + pass + + if not dates: + return {"value": 0.0, "total_facts": len(facts), "period_days": 0, "new_facts": 0} + + if since: + cutoff = datetime.fromisoformat(since).replace(tzinfo=timezone.utc) + dates = [d for d in dates if d >= cutoff] + + if not dates: + return {"value": 0.0, "total_facts": len(facts), "period_days": 0, "new_facts": 0} + + earliest = min(dates) + latest = max(dates) + period_days = max((latest - earliest).days, 1) + + return { + "value": round(len(dates) / period_days, 2), + "total_facts": len(facts), + "period_days": period_days, + "new_facts": len(dates), + } + + +def compute_knowledge_coverage(facts: list[dict], yaml_counts: dict[str, int]) -> dict: + """Metric 2: % of domains with 10+ facts.
Breadth indicator.""" + domain_fact_counts: dict[str, int] = defaultdict(int) + for f in facts: + domain = f.get("domain", "unknown") + domain_fact_counts[domain] += 1 + + # Merge YAML counts (may have facts not yet indexed) + for domain, count in yaml_counts.items(): + domain_fact_counts[domain] = max(domain_fact_counts[domain], count) + + total_domains = len(domain_fact_counts) + if total_domains == 0: + return {"value": 0.0, "covered_domains": 0, "total_domains": 0, "domain_details": {}} + + covered = sum(1 for c in domain_fact_counts.values() if c >= 10) + + return { + "value": round(covered / total_domains, 3), + "covered_domains": covered, + "total_domains": total_domains, + "domain_details": dict(sorted(domain_fact_counts.items(), key=lambda x: -x[1])[:20]), + } + + +def compute_hit_rate(sessions: list[dict], messages: list[dict], + facts: list[dict]) -> dict: + """Metric 3: % of sessions that reference bootstrap knowledge. + + Looks for message content matching known fact text. + """ + if not sessions or not facts: + return {"value": 0.0, "hit_sessions": 0, "total_sessions": len(sessions)} + + # Build a set of searchable fact fragments (lowercased, 4+ word phrases) + fact_fragments: set[str] = set() + for f in facts: + text = f.get("fact", "").lower().strip() + # Add full fact + if len(text) > 10: + fact_fragments.add(text) + # Add significant words + words = re.findall(r'\w{4,}', text) + for w in words: + fact_fragments.add(w) + + if not fact_fragments: + return {"value": 0.0, "hit_sessions": 0, "total_sessions": len(sessions)} + + # Group messages by session + session_messages: dict[str, list[str]] = defaultdict(list) + for m in messages: + content = (m.get("content") or "").lower() + if content: + session_messages[m["session_id"]].append(content) + + # Check each session for fact references + hit_sessions = 0 + for session in sessions: + sid = session["id"] + all_content = " ".join(session_messages.get(sid, [])) + if any(frag in all_content for frag in 
fact_fragments): + hit_sessions += 1 + + return { + "value": round(hit_sessions / len(sessions), 3) if sessions else 0.0, + "hit_sessions": hit_sessions, + "total_sessions": len(sessions), + } + + +def compute_error_recurrence(messages: list[dict]) -> dict: + """Metric 4: Same errors appearing across sessions. Should decrease. + + Extracts error signatures and counts how many sessions each appears in. + """ + if not messages: + return {"value": 0.0, "unique_errors": 0, "recurring_errors": 0, "top_errors": []} + + # Extract error patterns from message content (all roles; no role filter is applied). + # The trailing \b stops "errors" and "failures" from matching mid-word, which + # otherwise yields signatures such as "s, report the error details." + error_pattern = re.compile( + r'(?:error|failed|fail|exception)\b[:\s]*(.{10,80})', + re.IGNORECASE + ) + + error_to_sessions: dict[str, set[str]] = defaultdict(set) + + for m in messages: + content = m.get("content") or "" + if not content: + continue + for match in error_pattern.finditer(content): + sig = match.group(1).strip().lower() + # Normalize whitespace + sig = re.sub(r'\s+', ' ', sig) + if len(sig) > 5: + error_to_sessions[sig].add(m["session_id"]) + + if not error_to_sessions: + return {"value": 0.0, "unique_errors": 0, "recurring_errors": 0, "top_errors": []} + + recurring = {e: s for e, s in error_to_sessions.items() if len(s) > 1} + total_errors = len(error_to_sessions) + recurring_count = len(recurring) + + # Top recurring errors + top = sorted(recurring.items(), key=lambda x: -len(x[1]))[:10] + + return { + "value": round(recurring_count / total_errors, 3) if total_errors else 0.0, + "unique_errors": total_errors, + "recurring_errors": recurring_count, + "top_errors": [{"error": e, "sessions": len(s)} for e, s in top], + } + + +def compute_task_completion(sessions: list[dict]) -> dict: + """Metric 5: % of sessions ending with successful status.""" + if not sessions: + return {"value": 0.0, "completed": 0, "total": 0, "breakdown": {}} + + breakdown: Counter = Counter() + for s in sessions: + reason = s.get("end_reason") or "unknown" + breakdown[reason] += 1 + +
completed = breakdown.get("cron_complete", 0) + breakdown.get("session_reset", 0) + # "cli_close" and "compression" are also normal endings + normal_endings = completed + breakdown.get("cli_close", 0) + breakdown.get("compression", 0) + + return { + "value": round(completed / len(sessions), 3) if sessions else 0.0, + "normal_end_rate": round(normal_endings / len(sessions), 3) if sessions else 0.0, + "completed": completed, + "total": len(sessions), + "breakdown": dict(breakdown.most_common()), + } + + +def compute_first_try_success(sessions: list[dict]) -> dict: + """Metric 6: Sessions completed without excessive backtracking. + + Proxy: ratio of tool_call_count to message_count. + Low ratio = fewer retries = more first-try success. + We invert this: high tool/msg ratio means more backtracking (bad). + """ + if not sessions: + return {"value": 0.0, "avg_tool_msg_ratio": 0.0, "sampled": 0} + + ratios = [] + for s in sessions: + msgs = s.get("message_count", 0) or 0 + tools = s.get("tool_call_count", 0) or 0 + if msgs > 2: # Skip trivial sessions + ratios.append(tools / msgs if msgs > 0 else 0) + + if not ratios: + return {"value": 0.0, "avg_tool_msg_ratio": 0.0, "sampled": 0} + + avg_ratio = sum(ratios) / len(ratios) + # First-try success: sessions with tool_msg_ratio < 0.5 (few tools per message) + first_try = sum(1 for r in ratios if r < 0.5) + + return { + "value": round(first_try / len(ratios), 3), + "avg_tool_msg_ratio": round(avg_ratio, 3), + "sampled": len(ratios), + "interpretation": "Higher value = fewer backtracks = better first-try success", + } + + +def compute_knowledge_age(facts: list[dict]) -> dict: + """Metric 7: Days since facts were last confirmed. 
Staleness indicator.""" + if not facts: + return {"value": 0.0, "avg_age_days": 0, "stale_facts": 0, "total_facts": 0} + + now = datetime.now(timezone.utc) + ages = [] + stale_count = 0 # Facts not confirmed in 30+ days + + for f in facts: + confirmed = f.get("last_confirmed") or f.get("first_seen") + if confirmed: + try: + dt = datetime.fromisoformat(confirmed.replace("Z", "+00:00")) + if dt.tzinfo is None: + dt = dt.replace(tzinfo=timezone.utc) + age = (now - dt).days + ages.append(age) + if age > 30: + stale_count += 1 + except (ValueError, AttributeError): + pass + + if not ages: + return {"value": 0.0, "avg_age_days": 0, "stale_facts": 0, "total_facts": len(facts)} + + avg_age = sum(ages) / len(ages) + # Lower avg age = fresher = better. Invert for a 0-1 score. + freshness = max(0.0, 1.0 - (avg_age / 90)) # 90 days = 0 freshness + + return { + "value": round(freshness, 3), + "avg_age_days": round(avg_age, 1), + "stale_facts": stale_count, + "total_facts": len(facts), + "interpretation": "1.0 = all facts fresh. 
0.0 = all facts 90+ days old", + } + + +# ─── Trend Computation ───────────────────────────────────────────────────────── + +def compute_trend(current: dict, previous: dict, metric_key: str = "value", + lower_is_better: bool = False) -> dict: + """Compute 7-day trend between two metric snapshots. + + Pass lower_is_better=True for metrics where a decrease is an improvement, + such as error_recurrence. + """ + if not previous: + return {"delta": "N/A", "direction": "unknown"} + + curr_val = current.get(metric_key, 0) + prev_val = previous.get(metric_key, 0) + + if prev_val == 0: + return {"delta": "N/A (no baseline)", "direction": "unknown"} + + pct = ((curr_val - prev_val) / abs(prev_val)) * 100 + direction = "up" if pct > 0 else "down" if pct < 0 else "flat" + + # metric_key names a field inside the snapshot (normally "value"), so it cannot + # identify the metric. The caller states which direction counts as good. + if lower_is_better: + direction_label = "good" if pct < 0 else "bad" if pct > 0 else "neutral" + else: + direction_label = "good" if pct > 0 else "bad" if pct < 0 else "neutral" + + return { + "delta": f"{'+' if pct > 0 else ''}{pct:.1f}%", + "direction": direction, + "assessment": direction_label, + } + + +# ─── Output Formatters ───────────────────────────────────────────────────────── + +def format_json(metrics: dict) -> str: + """Format metrics as JSON.""" + return json.dumps(metrics, indent=2) + + +def format_markdown(metrics: dict) -> str: + """Format metrics as human-readable markdown.""" + lines = [ + "# Compounding Intelligence Metrics", + f"**Generated:** {metrics.get('generated_at', 'unknown')}", + "", + ] + + trend = metrics.get("trend_7d", {}) + + def metric_block(name: str, data: dict, desc: str, good_direction: str = "up"): + val = data.get("value", 0) + t = trend.get(name, {}) + delta = t.get("delta", "N/A") + assessment = t.get("assessment", "unknown") + arrow = "↑" if assessment == "good" else "↓" if assessment == "bad" else "—" + + lines.extend([ + f"## {name}", + f"{desc}", + "", + f"**Value:** {val} | **7d trend:** {delta} {arrow} ({assessment})", + "", + ]) + + # Add key details + for k, v in data.items(): + if k != "value" and k != 
"interpretation": + if isinstance(v, (int, float, str)): + lines.append(f"- {k}: {v}") + lines.append("") + + metric_block( + "knowledge_velocity", + metrics.get("knowledge_velocity", {}), + "New facts extracted per day. Higher = compounding loop working.", + ) + metric_block( + "knowledge_coverage", + metrics.get("knowledge_coverage", {}), + "Percentage of domains/repos with 10+ facts. Measures breadth.", + ) + metric_block( + "hit_rate", + metrics.get("hit_rate", {}), + "Percentage of sessions referencing bootstrapped knowledge.", + ) + metric_block( + "error_recurrence", + metrics.get("error_recurrence", {}), + "Ratio of recurring errors. Lower = fleet learning from mistakes.", + good_direction="down", + ) + metric_block( + "task_completion", + metrics.get("task_completion", {}), + "Percentage of sessions ending with successful completion.", + ) + metric_block( + "first_try_success", + metrics.get("first_try_success", {}), + "Percentage of sessions completed without backtracking.", + ) + metric_block( + "knowledge_age", + metrics.get("knowledge_age", {}), + "Freshness of knowledge store. 
1.0 = all fresh, 0.0 = all stale.", + good_direction="up", + ) + + return "\n".join(lines) + + +# ─── Snapshot Persistence ─────────────────────────────────────────────────────── + +def load_snapshot(metrics_dir: Path) -> dict: + """Load most recent metrics snapshot for trend computation.""" + snapshot_path = metrics_dir / "latest_snapshot.json" + if snapshot_path.exists(): + with open(snapshot_path) as f: + return json.load(f) + return {} + + +def save_snapshot(metrics_dir: Path, metrics: dict): + """Save current metrics as latest snapshot.""" + metrics_dir.mkdir(parents=True, exist_ok=True) + snapshot_path = metrics_dir / "latest_snapshot.json" + with open(snapshot_path, "w") as f: + json.dump(metrics, f, indent=2) + + +# ─── Main ─────────────────────────────────────────────────────────────────────── + +def main(): + parser = argparse.ArgumentParser(description="Compounding Intelligence Metrics") + parser.add_argument("--since", help="Start date (YYYY-MM-DD)") + parser.add_argument("--repo", help="Filter by repo/domain") + parser.add_argument("--format", choices=["json", "markdown"], default="json") + parser.add_argument("--knowledge-dir", type=Path, default=DEFAULT_KNOWLEDGE_DIR) + parser.add_argument("--db", type=Path, default=DEFAULT_DB_PATH) + parser.add_argument("--save-snapshot", action="store_true", + help="Save current metrics as snapshot for trend tracking") + parser.add_argument("--metrics-dir", type=Path, + default=Path(__file__).parent.parent / "metrics", + help="Directory for snapshots and dashboard") + args = parser.parse_args() + + # ── Load data ─────────────────────────────────────────────────────────── + facts = load_facts(args.knowledge_dir) + yaml_counts = count_yaml_facts(args.knowledge_dir) + + if args.repo: + facts = [f for f in facts if f.get("domain") == args.repo] + + conn = open_db(args.db) + sessions = query_sessions(conn, since=args.since) + messages = query_messages(conn) if conn else [] + + if conn: + conn.close() + + # ── Compute 
metrics ───────────────────────────────────────────────────── + velocity = compute_knowledge_velocity(facts, since=args.since) + coverage = compute_knowledge_coverage(facts, yaml_counts) + hit_rate = compute_hit_rate(sessions, messages, facts) + error_recurrence = compute_error_recurrence(messages) + task_completion = compute_task_completion(sessions) + first_try = compute_first_try_success(sessions) + age = compute_knowledge_age(facts) + + # ── Compute trends ────────────────────────────────────────────────────── + previous = load_snapshot(args.metrics_dir) + trend = { + "knowledge_velocity": compute_trend(velocity, previous.get("knowledge_velocity", {})), + "knowledge_coverage": compute_trend(coverage, previous.get("knowledge_coverage", {})), + "hit_rate": compute_trend(hit_rate, previous.get("hit_rate", {})), + "error_recurrence": compute_trend(error_recurrence, previous.get("error_recurrence", {}), + "value"), + "task_completion": compute_trend(task_completion, previous.get("task_completion", {})), + "first_try_success": compute_trend(first_try, previous.get("first_try_success", {})), + "knowledge_age": compute_trend(age, previous.get("knowledge_age", {})), + } + + # ── Assemble output ───────────────────────────────────────────────────── + now = datetime.now(timezone.utc).isoformat() + metrics = { + "generated_at": now, + "knowledge_velocity": velocity, + "knowledge_coverage": coverage, + "hit_rate": hit_rate, + "error_recurrence": error_recurrence, + "task_completion": task_completion, + "first_try_success": first_try, + "knowledge_age": age, + "trend_7d": trend, + } + + if args.since: + metrics["since"] = args.since + + # ── Save snapshot if requested ────────────────────────────────────────── + if args.save_snapshot: + save_snapshot(args.metrics_dir, metrics) + # Also write dashboard + dashboard_path = args.metrics_dir / "dashboard.md" + with open(dashboard_path, "w") as f: + f.write(format_markdown(metrics)) + + # ── Output 
────────────────────────────────────────────────────────────── + if args.format == "json": + print(format_json(metrics)) + else: + print(format_markdown(metrics)) + + +if __name__ == "__main__": + main() diff --git a/scripts/validate_knowledge.py b/scripts/validate_knowledge.py new file mode 100644 index 0000000..2519c4d --- /dev/null +++ b/scripts/validate_knowledge.py @@ -0,0 +1,155 @@ +#!/usr/bin/env python3 +""" +Validate knowledge files and index.json against the schema. + +Usage: + python scripts/validate_knowledge.py [--fix] + +Without --fix: reports errors and exits non-zero if any found. +With --fix: auto-generates missing IDs and updates index.json. +""" + +import json +import sys +import os +from pathlib import Path +from datetime import datetime + +VALID_CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"} +REQUIRED_FACT_FIELDS = {"id", "fact", "category", "domain", "confidence"} +MAX_FACT_LENGTH = 280 + + +def validate_fact(fact: dict, source: str = "") -> list[str]: + """Validate a single fact dict. 
Returns list of errors.""" + errors = [] + + for field in REQUIRED_FACT_FIELDS: + if field not in fact: + errors.append(f"{source}: missing required field '{field}'") + + if "fact" in fact: + if not isinstance(fact["fact"], str) or len(fact["fact"].strip()) == 0: + errors.append(f"{source}: 'fact' must be non-empty string") + elif len(fact["fact"]) > MAX_FACT_LENGTH: + errors.append(f"{source}: 'fact' exceeds {MAX_FACT_LENGTH} chars ({len(fact['fact'])})") + + if "category" in fact and fact["category"] not in VALID_CATEGORIES: + errors.append(f"{source}: invalid category '{fact['category']}' — must be one of {VALID_CATEGORIES}") + + if "confidence" in fact: + if not isinstance(fact["confidence"], (int, float)): + errors.append(f"{source}: 'confidence' must be a number") + elif not (0.0 <= fact["confidence"] <= 1.0): + errors.append(f"{source}: 'confidence' must be 0.0–1.0, got {fact['confidence']}") + + if "id" in fact: + parts = fact["id"].split(":") + if len(parts) != 3: + errors.append(f"{source}: 'id' must be domain:category:sequence, got '{fact['id']}'") + elif parts[1] not in VALID_CATEGORIES: + errors.append(f"{source}: id category '{parts[1]}' not in {VALID_CATEGORIES}") + + if "tags" in fact: + if not isinstance(fact["tags"], list): + errors.append(f"{source}: 'tags' must be a list") + else: + for tag in fact["tags"]: + # The check accepts letters, digits, hyphens, and underscores; case is not enforced. + if not isinstance(tag, str) or not tag.replace("-", "").replace("_", "").isalnum(): + errors.append(f"{source}: tag '{tag}' must contain only letters, digits, hyphens, or underscores") + + return errors + + +def validate_index(index_path: Path) -> list[str]: + """Validate index.json.""" + errors = [] + + if not index_path.exists(): + return [f"index.json not found at {index_path}"] + + try: + with open(index_path) as f: + data = json.load(f) + except json.JSONDecodeError as e: + return [f"index.json: invalid JSON — {e}"] + + if "version" not in data: + errors.append("index.json: missing 'version' field") + if "facts" not in data: + errors.append("index.json: 
missing 'facts' field") + elif not isinstance(data["facts"], list): + errors.append("index.json: 'facts' must be a list") + + seen_ids = set() + for i, fact in enumerate(data.get("facts", [])): + fact_errors = validate_fact(fact, source=f"index.json facts[{i}]") + errors.extend(fact_errors) + + if "id" in fact: + if fact["id"] in seen_ids: + errors.append(f"index.json: duplicate id '{fact['id']}'") + seen_ids.add(fact["id"]) + + return errors + + +def validate_yaml_facts(facts: list[dict], source: str) -> list[str]: + """Validate facts extracted from a YAML file.""" + errors = [] + seen_ids = set() + + for i, fact in enumerate(facts): + fact_errors = validate_fact(fact, source=f"{source}[{i}]") + errors.extend(fact_errors) + + if "id" in fact: + if fact["id"] in seen_ids: + errors.append(f"{source}: duplicate id '{fact['id']}'") + seen_ids.add(fact["id"]) + + return errors + + +def main(): + fix_mode = "--fix" in sys.argv + repo_root = Path(__file__).parent.parent + knowledge_dir = repo_root / "knowledge" + index_path = knowledge_dir / "index.json" + + all_errors = [] + + # Validate index.json + index_errors = validate_index(index_path) + all_errors.extend(index_errors) + + # Validate YAML files (basic existence check — full YAML parsing requires pyyaml) + yaml_dirs = ["global", "repos", "agents"] + for dir_name in yaml_dirs: + dir_path = knowledge_dir / dir_name + if not dir_path.exists(): + all_errors.append(f"knowledge/{dir_name}/ directory not found") + + # Report + if all_errors: + print(f"VALIDATION FAILED — {len(all_errors)} error(s):\n") + for err in all_errors: + print(f" ✗ {err}") + sys.exit(1) + else: + # Count facts + try: + with open(index_path) as f: + data = json.load(f) + fact_count = len(data.get("facts", [])) + except (OSError, json.JSONDecodeError): + # Catch only read/parse failures; a bare except would also swallow KeyboardInterrupt + fact_count = 0 + + print("VALIDATION PASSED") + print(f" index.json: {fact_count} facts") + print(" schema: v1") + sys.exit(0) + + +if __name__ == "__main__": + main()
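The headline values in `metrics/latest_snapshot.json` can be cross-checked against the raw counts stored beside them. The sketch below is a hypothetical helper, not part of this diff: the names `SNAPSHOT`, `recompute`, and `mismatches` are made up for illustration, and the counts are copied by hand from the snapshot. It applies the same ratios that `measurer.py` uses. `first_try_success` is left out because the snapshot does not store the per-session ratios it is derived from, and `knowledge_age` is only approximate because `avg_age_days` is already rounded.

```python
# Counts copied by hand from metrics/latest_snapshot.json (illustrative subset).
SNAPSHOT = {
    "knowledge_velocity": {"value": 1.61, "new_facts": 29, "period_days": 18},
    "knowledge_coverage": {"value": 0.333, "covered_domains": 2, "total_domains": 6},
    "hit_rate": {"value": 0.677, "hit_sessions": 8064, "total_sessions": 11919},
    "error_recurrence": {"value": 0.169, "unique_errors": 53556, "recurring_errors": 9075},
    "task_completion": {
        "value": 0.452,
        "normal_end_rate": 0.56,
        "total": 11919,
        "breakdown": {
            "cron_complete": 5354,
            "unknown": 5245,
            "compression": 1092,
            "cli_close": 197,
            "session_reset": 31,
        },
    },
    "knowledge_age": {"value": 0.973, "avg_age_days": 2.4},
}


def recompute(snapshot: dict) -> dict:
    """Re-derive each headline value with the ratios used in measurer.py."""
    velocity = snapshot["knowledge_velocity"]
    coverage = snapshot["knowledge_coverage"]
    hits = snapshot["hit_rate"]
    errors = snapshot["error_recurrence"]
    task = snapshot["task_completion"]
    age = snapshot["knowledge_age"]

    breakdown = task["breakdown"]
    # measurer.py counts cron_complete and session_reset as "completed" ...
    completed = breakdown["cron_complete"] + breakdown["session_reset"]
    # ... and additionally treats cli_close and compression as normal endings.
    normal = completed + breakdown["cli_close"] + breakdown["compression"]

    return {
        "knowledge_velocity": round(velocity["new_facts"] / velocity["period_days"], 2),
        "knowledge_coverage": round(coverage["covered_domains"] / coverage["total_domains"], 3),
        "hit_rate": round(hits["hit_sessions"] / hits["total_sessions"], 3),
        "error_recurrence": round(errors["recurring_errors"] / errors["unique_errors"], 3),
        "task_completion": round(completed / task["total"], 3),
        "normal_end_rate": round(normal / task["total"], 3),
        # Freshness score: 90 days of average age maps to 0.0.
        "knowledge_age": round(max(0.0, 1.0 - age["avg_age_days"] / 90), 3),
    }


def mismatches(snapshot: dict) -> list[str]:
    """Return the names of metrics whose stored value differs from the re-derived one."""
    stored = {name: data["value"] for name, data in snapshot.items()}
    stored["normal_end_rate"] = snapshot["task_completion"]["normal_end_rate"]
    derived = recompute(snapshot)
    return [name for name, value in derived.items() if value != stored[name]]
```

Calling `mismatches(SNAPSHOT)` on the numbers above returns an empty list, which suggests the committed snapshot is internally consistent. A check like this could run in CI against the real file, assuming the snapshot keeps its current keys.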