Compare commits

1 Commits

Author SHA1 Message Date
Timmy
bf003cd944 feat: measurer.py — compounding intelligence metrics engine
Implements issue #14: 7 metrics that prove knowledge compounding.

Metrics:
- Knowledge velocity: new facts/day (from index.json)
- Knowledge coverage: % domains with 10+ facts (from YAML files)
- Hit rate: % sessions referencing bootstrap knowledge
- Error recurrence: same errors across sessions (should decrease)
- Task completion: % sessions with successful end_reason
- First-try success: actions without backtracking (tool/msg ratio)
- Knowledge age: staleness of facts (freshness score)

Data sources:
- knowledge/index.json + YAML files for fact metrics
- ~/.hermes/state.db sessions + messages tables

Features:
- JSON and markdown output formats
- --since, --repo, --format flags
- 7-day trend tracking via snapshot persistence
- Runs in 33ms on 11.9K sessions / 192K messages
- Dashboard auto-generation with --save-snapshot

Closes #14
2026-04-14 14:16:31 -04:00
9 changed files with 751 additions and 528 deletions


@@ -36,79 +36,129 @@ The harvester writes to both. The bootstrapper reads from index.json. Humans edi
| `fact` | string | yes | One-sentence description of the knowledge |
| `category` | enum | yes | One of: `fact`, `pitfall`, `pattern`, `tool-quirk`, `question` |
| `domain` | string | yes | Where this applies: repo name, `global`, or agent name |
| `confidence` | float | yes | 0.0–1.0. How certain is this knowledge? |
| `tags` | string[] | no | Searchable labels |
| `confidence` | float | yes | 0.0-1.0. How certain is this knowledge? |
| `tags` | string[] | no | Searchable labels: `["git", "auth", "gitea"]` |
| `source_count` | int | no | How many sessions confirmed this fact |
| `first_seen` | date | no | ISO-8601 date first extracted |
| `last_confirmed` | date | no | ISO-8601 date last seen in a session |
| `expires` | date | no | Optional. After this date, fact is stale |
| `related` | string[] | no | IDs of related facts |
### ID Format: `{domain}:{category}:{sequence}`
### ID Format
```
{domain}:{category}:{sequence}
```
- `domain` — repo name, `global`, or agent type
- `category` — one of the 5 categories
- `sequence` — zero-padded 3-digit number: `001`, `002`, ...
Examples:
- `the-nexus:pitfall:001`
- `global:tool-quirk:012`
- `hermes-agent:pattern:003`
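
The ID format above can be parsed and generated mechanically. The following is a minimal sketch, not code from this PR: `FactId`, `parse_fact_id`, and `format_fact_id` are hypothetical names, and the assumption that a domain never contains a colon is inferred from the examples rather than stated in the schema.

```python
import re
from typing import NamedTuple

# The five categories listed in the schema.
CATEGORIES = ("fact", "pitfall", "pattern", "tool-quirk", "question")

# Assumption: domains never contain ":"; the sequence is exactly three digits.
ID_RE = re.compile(r"(?P<domain>[^:]+):(?P<category>[^:]+):(?P<sequence>\d{3})")


class FactId(NamedTuple):
    domain: str
    category: str
    sequence: int


def parse_fact_id(raw: str) -> FactId:
    """Split '{domain}:{category}:{sequence}' into its parts, or raise ValueError."""
    m = ID_RE.fullmatch(raw)
    if m is None:
        raise ValueError(f"malformed fact id: {raw!r}")
    if m["category"] not in CATEGORIES:
        raise ValueError(f"unknown category in fact id: {raw!r}")
    return FactId(m["domain"], m["category"], int(m["sequence"]))


def format_fact_id(domain: str, category: str, sequence: int) -> str:
    """Build an ID with the zero-padded 3-digit sequence."""
    return f"{domain}:{category}:{sequence:03d}"


print(parse_fact_id("the-nexus:pitfall:001"))
print(format_fact_id("global", "tool-quirk", 12))
```

Keeping the sequence as an integer internally and padding only on output makes it straightforward to compute the next free ID for a domain and category.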
### Categories
| Category | Definition |
|----------|------------|
| `fact` | Concrete, verifiable information |
| `pitfall` | Errors, wrong assumptions, time-wasters |
| `pattern` | Successful sequences of actions |
| `tool-quirk` | Environment-specific behaviors |
| `question` | Identified but unanswered |
| Category | Definition | Example |
|----------|------------|---------|
| `fact` | Concrete, verifiable information | "Gitea API requires token auth at /api/v1" |
| `pitfall` | Errors, wrong assumptions, time-wasters | "Assumed env var GITEA_TOKEN; actual path is ~/.config/gitea/token" |
| `pattern` | Successful sequences of actions | "To deploy: test -> build -> push -> webhook" |
| `tool-quirk` | Environment-specific behaviors | "URL format requires trailing slash on macOS" |
| `question` | Identified but unanswered | "Need optimal batch size for harvesting" |
### Confidence Scoring
| Range | Meaning |
|-------|---------|
| 0.9–1.0 | Explicitly stated and verified |
| 0.7–0.8 | Clearly implied by multiple data points |
| 0.5–0.6 | Suggested but not fully verified |
| 0.3–0.4 | Inferred from limited data |
| 0.1–0.2 | Speculative or uncertain |
| 0.9-1.0 | Explicitly stated and verified |
| 0.7-0.8 | Clearly implied by multiple data points |
| 0.5-0.6 | Suggested but not fully verified |
| 0.3-0.4 | Inferred from limited data |
| 0.1-0.2 | Speculative or uncertain |
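
For tooling that needs to turn a numeric confidence back into one of these meanings, a lookup along the following lines may help. This is a sketch under stated assumptions: `confidence_label` is a hypothetical helper, and because the table leaves gaps (for example 0.85, or anything below 0.1), the sketch assumes each band extends up to the start of the next and that values below 0.1 fall into the lowest band.

```python
# Lower bound of each band, highest first, with the meaning from the table.
CONFIDENCE_BANDS = [
    (0.9, "Explicitly stated and verified"),
    (0.7, "Clearly implied by multiple data points"),
    (0.5, "Suggested but not fully verified"),
    (0.3, "Inferred from limited data"),
    (0.1, "Speculative or uncertain"),
]


def confidence_label(confidence: float) -> str:
    """Return the table meaning for a confidence score in [0.0, 1.0]."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    for lower_bound, label in CONFIDENCE_BANDS:
        if confidence >= lower_bound:
            return label
    # Assumption: scores below 0.1 are treated as the lowest band.
    return CONFIDENCE_BANDS[-1][1]


print(confidence_label(0.95))
print(confidence_label(0.85))
```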
---
## Directory Structure
## Knowledge Files (YAML)
Human-readable files stored in `knowledge/` subdirectories.
### Directory Structure
```
knowledge/
├── index.json # Machine-readable fact index
├── SCHEMA.md # This file
├── global/ # Cross-repo knowledge
│ ├── pitfalls.yaml
│ ├── patterns.yaml
│ └── tool-quirks.yaml
│ ├── pitfalls.yaml # Pitfalls that span multiple repos
│ ├── patterns.yaml # Proven workflows
│ └── tool-quirks.yaml # Environment behaviors
├── repos/ # Per-repo knowledge
│ ├── {repo-name}.yaml
│ ├── the-nexus.yaml
│ ├── hermes-agent.yaml
│ └── ...
└── agents/ # Agent-type knowledge
└── {agent-type}.yaml
├── mimo-sprint.yaml
└── ...
```
## YAML File Format
YAML files use frontmatter for metadata, then markdown sections with fact entries:
### YAML File Format
```yaml
---
domain: global
category: tool-quirk
domain: global # or repo name or agent name
category: tool-quirk # fact, pitfall, pattern, tool-quirk, question
version: 1
last_updated: "2026-04-13"
---
# Title
# Tool Quirks (Global)
## Section
Cross-environment behaviors that bite you if you don't know them.
## Authentication
- id: global:tool-quirk:001
fact: "Description"
fact: "Gitea token stored at ~/.config/gitea/token, not env var"
confidence: 0.95
tags: [tag1, tag2]
source_count: 5
tags: [git, auth, gitea]
source_count: 23
first_seen: "2026-03-27"
last_confirmed: "2026-04-13"
related: [global:pitfall:003]
```
### Rules
1. **One file per domain per category.** `repos/the-nexus.yaml` holds all the-nexus facts.
2. **Markdown sections for humans.** YAML items live under markdown headers for Gitea UI readability.
3. **ID is the link.** The `id` field connects YAML facts to index.json entries.
4. **Harvester writes, humans edit.** Harvester appends. Humans correct confidence, add tags, mark expired.
---
## Sync Rules
1. **Harvester -> YAML:** Appends new facts to the appropriate YAML file.
2. **Harvester -> index.json:** Adds/updates fact entries.
3. **Human edits YAML:** Changes propagate to index.json on next harvester run.
4. **Confidence decay:** Facts not confirmed in 30+ sessions get confidence *= 0.9.
5. **Expiration:** Facts with `expires` date past current date are marked `stale`.
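
Rules 4 and 5 can be sketched as two small pure functions. This is an illustration only, not the harvester's code: `decay_confidence` and `is_stale` are hypothetical names, the number of sessions since the last confirmation is passed in because the schema does not say how it is counted, and the decay is assumed to apply once per call.

```python
from datetime import date

DECAY_AFTER_SESSIONS = 30  # "not confirmed in 30+ sessions"
DECAY_FACTOR = 0.9         # "confidence *= 0.9"


def decay_confidence(confidence: float, sessions_since_confirmed: int) -> float:
    """Apply one decay step when a fact has gone unconfirmed for 30+ sessions."""
    if sessions_since_confirmed >= DECAY_AFTER_SESSIONS:
        return round(confidence * DECAY_FACTOR, 4)
    return confidence


def is_stale(fact: dict, today: date) -> bool:
    """A fact is stale once its optional `expires` date lies before `today`."""
    expires = fact.get("expires")
    if not expires:
        return False
    return date.fromisoformat(expires) < today


today = date(2026, 4, 14)
print(decay_confidence(0.9, 30))
print(is_stale({"expires": "2026-04-01"}, today))
```

Returning a boolean from `is_stale` rather than mutating the fact sidesteps the question of where the `stale` marker is stored, which the schema does not define.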
---
## Validation
Run `python scripts/validate_knowledge.py` to validate index.json.
Facts must pass these checks:
1. `id` matches format `{domain}:{category}:{sequence}`
2. `category` is one of the 5 allowed values
3. `confidence` is between 0.0 and 1.0
4. `fact` is non-empty string, max 280 characters
5. `domain` is non-empty string
6. `tags` are lowercase alphanumeric + hyphens
7. No duplicate IDs in index.json
Validation script: `scripts/validate_knowledge.py`
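
The seven checks translate fairly directly into code. The sketch below is not the contents of `scripts/validate_knowledge.py`, which this diff does not show; `validate_fact` and `validate_index` are hypothetical names, and the regular expressions are one reading of "matches format" and "lowercase alphanumeric + hyphens".

```python
import re

CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}
ID_RE = re.compile(r"[^:]+:[^:]+:\d{3}")  # {domain}:{category}:{sequence}
TAG_RE = re.compile(r"[a-z0-9-]+")        # lowercase alphanumeric + hyphens


def validate_fact(fact: dict) -> list[str]:
    """Return a list of human-readable problems for one fact (checks 1-6)."""
    errors = []
    fact_id = fact.get("id")
    if not isinstance(fact_id, str) or not ID_RE.fullmatch(fact_id):
        errors.append("id does not match {domain}:{category}:{sequence}")
    if fact.get("category") not in CATEGORIES:
        errors.append("category is not one of the 5 allowed values")
    conf = fact.get("confidence")
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        errors.append("confidence is not between 0.0 and 1.0")
    text = fact.get("fact")
    if not isinstance(text, str) or not text.strip() or len(text) > 280:
        errors.append("fact must be a non-empty string of at most 280 characters")
    domain = fact.get("domain")
    if not isinstance(domain, str) or not domain:
        errors.append("domain must be a non-empty string")
    tags = fact.get("tags", [])
    if not all(isinstance(t, str) and TAG_RE.fullmatch(t) for t in tags):
        errors.append("tags must be lowercase alphanumeric plus hyphens")
    return errors


def validate_index(index: dict) -> list[str]:
    """Validate every fact and flag duplicate IDs (check 7)."""
    errors = []
    seen = set()
    for fact in index.get("facts", []):
        fact_id = fact.get("id", "<missing id>")
        errors.extend(f"{fact_id}: {e}" for e in validate_fact(fact))
        if fact_id in seen:
            errors.append(f"{fact_id}: duplicate id")
        seen.add(fact_id)
    return errors


example = {
    "id": "global:tool-quirk:001",
    "fact": "Gitea token stored at ~/.config/gitea/token, not env var",
    "category": "tool-quirk",
    "domain": "global",
    "confidence": 0.95,
    "tags": ["git", "auth", "gitea"],
}
print(validate_index({"facts": [example, example]}))
```

Collecting all problems instead of raising on the first one lets a single run report everything a human editor needs to fix.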


@@ -7,6 +7,8 @@ last_updated: "2026-04-13"
# Tool Quirks (Global)
Cross-environment behaviors that bite you if you don't know them.
## Authentication
- id: global:tool-quirk:001
@@ -46,7 +48,7 @@ last_updated: "2026-04-13"
related: [hermes-agent:pitfall:005]
- id: global:tool-quirk:005
fact: "Ansible vault-encrypted vars in YAML require vault_inline_vars plugin"
fact: "Ansible vault-encrypted vars in YAML require vault_inline_vars plugin - standard ansible-vault fails"
confidence: 0.85
tags: [ansible, vault, config]
source_count: 3


@@ -1,472 +1,6 @@
{
"version": 1,
"last_updated": "2026-04-13T20:00:00Z",
"total_facts": 29,
"facts": [
{
"id": "hermes-agent:pitfall:001",
"fact": "deploy-crons.py leaves jobs in mixed model format",
"category": "pitfall",
"domain": "hermes-agent",
"confidence": 0.95,
"tags": [
"cron",
"deploy",
"model",
"config"
],
"source_count": 5,
"first_seen": "2026-04-08",
"last_confirmed": "2026-04-13",
"related": [
"hermes-agent:pitfall:002",
"hermes-agent:pitfall:003"
]
},
{
"id": "hermes-agent:pitfall:002",
"fact": "deploy-crons.py --deploy doesn't set legacy skill field from skills list",
"category": "pitfall",
"domain": "hermes-agent",
"confidence": 0.9,
"tags": [
"cron",
"deploy",
"skills"
],
"source_count": 3,
"first_seen": "2026-04-09",
"last_confirmed": "2026-04-13",
"related": [
"hermes-agent:pitfall:001"
]
},
{
"id": "hermes-agent:pitfall:003",
"fact": "Cron jobs with blank fallback_model fields trigger spurious gateway warnings",
"category": "pitfall",
"domain": "hermes-agent",
"confidence": 0.9,
"tags": [
"cron",
"model",
"fallback"
],
"source_count": 4,
"first_seen": "2026-04-07",
"last_confirmed": "2026-04-12",
"related": [
"hermes-agent:pitfall:001"
]
},
{
"id": "hermes-agent:pitfall:004",
"fact": "model-watchdog.py checks first provider line, not model.provider - causes false drift alarms",
"category": "pitfall",
"domain": "hermes-agent",
"confidence": 0.9,
"tags": [
"watchdog",
"model",
"config"
],
"source_count": 3,
"first_seen": "2026-04-08",
"last_confirmed": "2026-04-13"
},
{
"id": "hermes-agent:pitfall:005",
"fact": "10+ files read HERMES_HOME directly instead of get_hermes_home()",
"category": "pitfall",
"domain": "hermes-agent",
"confidence": 0.85,
"tags": [
"paths",
"env",
"hermes-home"
],
"source_count": 6,
"first_seen": "2026-04-06",
"last_confirmed": "2026-04-12",
"related": [
"global:pitfall:002"
]
},
{
"id": "hermes-agent:pitfall:006",
"fact": "get_hermes_home() doesn't expand tilde when HERMES_HOME=~/... is set",
"category": "pitfall",
"domain": "hermes-agent",
"confidence": 0.8,
"tags": [
"paths",
"env",
"bug"
],
"source_count": 2,
"first_seen": "2026-04-05"
},
{
"id": "hermes-agent:pitfall:007",
"fact": "vps-agent-dispatch reports OK while remote hermes binary path is broken",
"category": "pitfall",
"domain": "hermes-agent",
"confidence": 0.9,
"tags": [
"ssh",
"dispatch",
"vps"
],
"source_count": 4,
"first_seen": "2026-04-07",
"last_confirmed": "2026-04-11"
},
{
"id": "hermes-agent:pitfall:008",
"fact": "nightwatch-health-monitor SSH check fails on cloud-model-only deployments",
"category": "pitfall",
"domain": "hermes-agent",
"confidence": 0.85,
"tags": [
"ssh",
"health",
"cloud"
],
"source_count": 2,
"first_seen": "2026-04-10"
},
{
"id": "the-nexus:pitfall:001",
"fact": "Merges fail with HTTP 405 due to branch protection",
"category": "pitfall",
"domain": "the-nexus",
"confidence": 0.95,
"tags": [
"git",
"merge",
"branch-protection",
"gitea"
],
"source_count": 12,
"first_seen": "2026-04-05",
"last_confirmed": "2026-04-13",
"related": [
"global:pitfall:001"
]
},
{
"id": "the-nexus:pitfall:002",
"fact": "ThreadingHTTPServer required for multi-user bridge - standard HTTPServer blocks on concurrent requests",
"category": "pitfall",
"domain": "the-nexus",
"confidence": 0.95,
"tags": [
"server",
"concurrency",
"bridge"
],
"source_count": 5,
"first_seen": "2026-04-10",
"last_confirmed": "2026-04-13"
},
{
"id": "the-nexus:pitfall:003",
"fact": "ChatLog.log() crashes on message persistence when index.html has orphaned button tags",
"category": "pitfall",
"domain": "the-nexus",
"confidence": 0.9,
"tags": [
"html",
"crash",
"chatlog"
],
"source_count": 3,
"first_seen": "2026-04-12",
"last_confirmed": "2026-04-13"
},
{
"id": "the-nexus:pitfall:004",
"fact": "Three.js LOD not implemented - local hardware struggles with full scene",
"category": "pitfall",
"domain": "the-nexus",
"confidence": 0.85,
"tags": [
"threejs",
"performance",
"lod"
],
"source_count": 4,
"first_seen": "2026-04-09",
"last_confirmed": "2026-04-13"
},
{
"id": "the-nexus:pitfall:005",
"fact": "Duplicate content blocks appear in index.html when PR merges conflict silently",
"category": "pitfall",
"domain": "the-nexus",
"confidence": 0.8,
"tags": [
"html",
"merge-conflict",
"duplicate"
],
"source_count": 3,
"first_seen": "2026-04-11",
"last_confirmed": "2026-04-13"
},
{
"id": "the-nexus:pitfall:006",
"fact": "Unified HTTP + WebSocket server required for proper URL deployment - separate servers break CORS",
"category": "pitfall",
"domain": "the-nexus",
"confidence": 0.9,
"tags": [
"deploy",
"websocket",
"http",
"cors"
],
"source_count": 4,
"first_seen": "2026-04-10",
"last_confirmed": "2026-04-13"
},
{
"id": "global:tool-quirk:001",
"fact": "Gitea token stored at ~/.config/gitea/token, not env var GITEA_TOKEN",
"category": "tool-quirk",
"domain": "global",
"confidence": 0.95,
"tags": [
"git",
"auth",
"gitea",
"token"
],
"source_count": 23,
"first_seen": "2026-03-27",
"last_confirmed": "2026-04-13",
"related": [
"global:pitfall:001"
]
},
{
"id": "global:tool-quirk:002",
"fact": "Gitea API uses 'Authorization: token TOKEN' header format, not Bearer",
"category": "tool-quirk",
"domain": "global",
"confidence": 0.9,
"tags": [
"git",
"api",
"gitea"
],
"source_count": 8,
"first_seen": "2026-03-28",
"last_confirmed": "2026-04-12"
},
{
"id": "global:tool-quirk:003",
"fact": "Gitea Issues API type=issues param does NOT filter PRs",
"category": "tool-quirk",
"domain": "global",
"confidence": 0.95,
"tags": [
"gitea",
"api",
"issues",
"pr"
],
"source_count": 6,
"first_seen": "2026-04-01",
"last_confirmed": "2026-04-13"
},
{
"id": "global:tool-quirk:004",
"fact": "~/.hermes is the default hermes home - check get_hermes_home() not the path literal",
"category": "tool-quirk",
"domain": "global",
"confidence": 0.9,
"tags": [
"paths",
"hermes",
"env"
],
"source_count": 10,
"first_seen": "2026-03-30",
"last_confirmed": "2026-04-13",
"related": [
"hermes-agent:pitfall:005"
]
},
{
"id": "global:tool-quirk:005",
"fact": "Ansible vault-encrypted vars in YAML require vault_inline_vars plugin",
"category": "tool-quirk",
"domain": "global",
"confidence": 0.85,
"tags": [
"ansible",
"vault",
"config"
],
"source_count": 3,
"first_seen": "2026-04-02"
},
{
"id": "global:tool-quirk:006",
"fact": "mimo-v2-pro via Nous Research is the default model - don't assume Anthropic is available",
"category": "tool-quirk",
"domain": "global",
"confidence": 0.95,
"tags": [
"model",
"provider",
"nous",
"default"
],
"source_count": 15,
"first_seen": "2026-03-25",
"last_confirmed": "2026-04-13"
},
{
"id": "global:tool-quirk:007",
"fact": "Kill + restart with 'hermes chat' preserves old model state - NEVER use --resume",
"category": "tool-quirk",
"domain": "global",
"confidence": 0.95,
"tags": [
"hermes",
"model",
"restart",
"session"
],
"source_count": 8,
"first_seen": "2026-03-29",
"last_confirmed": "2026-04-12"
},
{
"id": "global:pitfall:001",
"fact": "Branch protection requires 1 approval on main - API merges fail with 405 without it",
"category": "pitfall",
"domain": "global",
"confidence": 0.95,
"tags": [
"git",
"merge",
"branch-protection",
"gitea"
],
"source_count": 12,
"first_seen": "2026-04-05",
"last_confirmed": "2026-04-13",
"related": [
"the-nexus:pitfall:001"
]
},
{
"id": "global:pitfall:002",
"fact": "Never use --no-verify on git commits",
"category": "pitfall",
"domain": "global",
"confidence": 0.95,
"tags": [
"git",
"hooks",
"safety"
],
"source_count": 5,
"first_seen": "2026-03-28",
"last_confirmed": "2026-04-13"
},
{
"id": "global:pitfall:003",
"fact": "Gitea PR creation workaround needed on the-nexus - direct API call fails",
"category": "pitfall",
"domain": "global",
"confidence": 0.9,
"tags": [
"gitea",
"pr",
"api",
"workaround"
],
"source_count": 4,
"first_seen": "2026-04-06",
"last_confirmed": "2026-04-12"
},
{
"id": "global:pitfall:004",
"fact": "Anthropic is BANNED from fallback chain",
"category": "pitfall",
"domain": "global",
"confidence": 0.95,
"tags": [
"provider",
"anthropic",
"fallback"
],
"source_count": 7,
"first_seen": "2026-03-30",
"last_confirmed": "2026-04-13"
},
{
"id": "global:pitfall:005",
"fact": "Telegram tokens expired - don't assume Telegram notifications work",
"category": "pitfall",
"domain": "global",
"confidence": 0.85,
"tags": [
"telegram",
"notifications",
"token"
],
"source_count": 3,
"first_seen": "2026-04-02"
},
{
"id": "global:pitfall:006",
"fact": "Multiple gateways = 'cannot schedule futures' error - only one gateway process should run",
"category": "pitfall",
"domain": "global",
"confidence": 0.9,
"tags": [
"gateway",
"cron",
"process"
],
"source_count": 4,
"first_seen": "2026-04-04",
"last_confirmed": "2026-04-11"
},
{
"id": "global:pitfall:007",
"fact": "pytest root collection picks up operational *_test.py scripts - restrict to tests/ directory",
"category": "pitfall",
"domain": "global",
"confidence": 0.9,
"tags": [
"pytest",
"test",
"collection"
],
"source_count": 3,
"first_seen": "2026-04-07",
"last_confirmed": "2026-04-13"
},
{
"id": "global:pitfall:008",
"fact": "TDD: test 1 before building 55",
"category": "pitfall",
"domain": "global",
"confidence": 0.95,
"tags": [
"tdd",
"testing",
"methodology"
],
"source_count": 8,
"first_seen": "2026-03-25",
"last_confirmed": "2026-04-13"
}
]
"total_facts": 0,
"facts": []
}


@@ -7,6 +7,8 @@ last_updated: "2026-04-13"
# Pitfalls (hermes-agent)
Things that go wrong in this repo if you don't know the traps.
## Cron & Deployment
- id: hermes-agent:pitfall:001
@@ -19,7 +21,7 @@ last_updated: "2026-04-13"
related: [hermes-agent:pitfall:002, hermes-agent:pitfall:003]
- id: hermes-agent:pitfall:002
fact: "deploy-crons.py --deploy doesn't set legacy skill field from skills list"
fact: "deploy-crons.py --deploy doesn't set legacy skill field from skills list, breaking older jobs"
confidence: 0.9
tags: [cron, deploy, skills]
source_count: 3


@@ -7,6 +7,8 @@ last_updated: "2026-04-13"
# Pitfalls (the-nexus)
Things that go wrong in this repo if you don't know the traps.
## Git & Merging
- id: the-nexus:pitfall:001

61
metrics/dashboard.md Normal file

@@ -0,0 +1,61 @@
# Compounding Intelligence Metrics
**Generated:** 2026-04-14T18:12:26.469085+00:00
## knowledge_velocity
New facts extracted per day. Higher = compounding loop working.
**Value:** 1.61 | **7d trend:** N/A --- (unknown)
- total_facts: 29
- period_days: 18
- new_facts: 29
## knowledge_coverage
Percentage of domains/repos with 10+ facts. Measures breadth.
**Value:** 0.333 | **7d trend:** N/A --- (unknown)
- covered_domains: 1
- total_domains: 3
## hit_rate
Percentage of sessions referencing bootstrapped knowledge.
**Value:** 0.676 | **7d trend:** N/A --- (unknown)
- hit_sessions: 8058
- total_sessions: 11922
## error_recurrence
Ratio of recurring errors. Lower = fleet learning from mistakes.
**Value:** 0.17 | **7d trend:** N/A --- (unknown)
- unique_errors: 53615
- recurring_errors: 9093
## task_completion
Percentage of sessions ending with successful completion.
**Value:** 0.452 | **7d trend:** N/A --- (unknown)
- normal_end_rate: 0.56
- completed: 5385
- total: 11922
## first_try_success
Percentage of sessions completed without backtracking.
**Value:** 0.818 | **7d trend:** N/A --- (unknown)
- avg_tool_msg_ratio: 0.392
- sampled: 5923
## knowledge_age
Freshness of knowledge store. 1.0 = all fresh, 0.0 = all stale.
**Value:** 0.973 | **7d trend:** N/A --- (unknown)
- avg_age_days: 2.4
- stale_facts: 0
- total_facts: 29
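
The headline values in this dashboard can be cross-checked from the detail lines, using the formulas in `scripts/measurer.py`. The snippet below is a worked check rather than part of the PR; the numbers are copied from the dashboard, and the knowledge-age check is approximate in principle because `avg_age_days` is itself rounded to one decimal.

```python
# Recompute each headline value from the detail lines reported above.
recomputed = {
    "knowledge_velocity": round(29 / 18, 2),         # new_facts / period_days
    "knowledge_coverage": round(1 / 3, 3),           # covered_domains / total_domains
    "hit_rate": round(8058 / 11922, 3),              # hit_sessions / total_sessions
    "error_recurrence": round(9093 / 53615, 3),      # recurring_errors / unique_errors
    "task_completion": round(5385 / 11922, 3),       # completed / total
    "knowledge_age": round(max(0.0, 1.0 - 2.4 / 90), 3),  # 1 - avg_age_days / 90
}

# Values as printed in the dashboard.
reported = {
    "knowledge_velocity": 1.61,
    "knowledge_coverage": 0.333,
    "hit_rate": 0.676,
    "error_recurrence": 0.17,
    "task_completion": 0.452,
    "knowledge_age": 0.973,
}

for name, value in recomputed.items():
    status = "ok" if value == reported[name] else "MISMATCH"
    print(f"{name}: recomputed={value} reported={reported[name]} {status}")
```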


@@ -0,0 +1,127 @@
{
"generated_at": "2026-04-14T18:12:26.469085+00:00",
"knowledge_velocity": {
"value": 1.61,
"total_facts": 29,
"period_days": 18,
"new_facts": 29
},
"knowledge_coverage": {
"value": 0.333,
"covered_domains": 1,
"total_domains": 3,
"domain_details": {
"global": 15,
"hermes-agent": 8,
"the-nexus": 6
}
},
"hit_rate": {
"value": 0.676,
"hit_sessions": 8058,
"total_sessions": 11922
},
"error_recurrence": {
"value": 0.17,
"unique_errors": 53615,
"recurring_errors": 9093,
"top_errors": [
{
"error": "s, report the error details.",
"sessions": 1185
},
{
"error": "\": \"traceback (most recent call last):\\n file \\\"/private/var/folders/9k/v07xkpp",
"sessions": 695
},
{
"error": "\", \"output\": \"\\n--- stderr ---\\ntraceback (most recent call last):\\n file \\\"/pr",
"sessions": 684
},
{
"error": "ures \u2192 file an issue with the traceback and tag [bug]",
"sessions": 320
},
{
"error": "s you encounter \u2192 file an issue with reproduction steps",
"sessions": 320
},
{
"error": "s, file a [bug] issue first.",
"sessions": 320
},
{
"error": ", fix the code.",
"sessions": 314
},
{
"error": "fix it before doing anything else.",
"sessions": 313
},
{
"error": "ures \u2192 add a review comment explaining what's wrong",
"sessions": 303
},
{
"error": "ures \u2014 they're your roadmap for guardrails",
"sessions": 303
}
]
},
"task_completion": {
"value": 0.452,
"normal_end_rate": 0.56,
"completed": 5385,
"total": 11922,
"breakdown": {
"cron_complete": 5354,
"unknown": 5248,
"compression": 1092,
"cli_close": 197,
"session_reset": 31
}
},
"first_try_success": {
"value": 0.818,
"avg_tool_msg_ratio": 0.392,
"sampled": 5923,
"interpretation": "Higher value = fewer backtracks = better first-try success"
},
"knowledge_age": {
"value": 0.973,
"avg_age_days": 2.4,
"stale_facts": 0,
"total_facts": 29,
"interpretation": "1.0 = all facts fresh. 0.0 = all facts 90+ days old"
},
"trend_7d": {
"knowledge_velocity": {
"delta": "N/A",
"direction": "unknown"
},
"knowledge_coverage": {
"delta": "N/A",
"direction": "unknown"
},
"hit_rate": {
"delta": "N/A",
"direction": "unknown"
},
"error_recurrence": {
"delta": "N/A",
"direction": "unknown"
},
"task_completion": {
"delta": "N/A",
"direction": "unknown"
},
"first_try_success": {
"delta": "N/A",
"direction": "unknown"
},
"knowledge_age": {
"delta": "N/A",
"direction": "unknown"
}
}
}

403
scripts/measurer.py Normal file

@@ -0,0 +1,403 @@
#!/usr/bin/env python3
"""
Compounding Intelligence Metrics Engine.

Computes 7 metrics that prove whether the knowledge compounding loop is working:
  1. Knowledge velocity -- new facts per day
  2. Knowledge coverage -- % of domains with 10+ facts
  3. Hit rate -- % of sessions referencing bootstrap knowledge
  4. Error recurrence -- same errors across sessions (should decrease)
  5. Task completion -- % of sessions ending successfully
  6. First-try success -- actions without backtracking
  7. Knowledge age -- staleness of facts

Usage:
    python3 measurer.py                              # All metrics, all time
    python3 measurer.py --since 2026-04-01           # Time range
    python3 measurer.py --repo the-nexus             # Per-repo metrics
    python3 measurer.py --format json                # JSON output (default)
    python3 measurer.py --format markdown            # Human-readable
    python3 measurer.py --knowledge-dir ./knowledge  # Custom knowledge path
    python3 measurer.py --db ~/.hermes/state.db      # Custom DB path

Data sources:
  - knowledge/index.json -- fact index
  - knowledge/ -- YAML fact files for coverage
  - ~/.hermes/state.db -- session/message metadata
"""
import argparse
import json
import os
import re
import sqlite3
import sys
from collections import Counter, defaultdict
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Any

# --- Defaults ---
DEFAULT_KNOWLEDGE_DIR = Path(__file__).parent.parent / "knowledge"
DEFAULT_DB_PATH = Path.home() / ".hermes" / "state.db"
SEVEN_DAYS = timedelta(days=7)

# --- Knowledge Store ---
def load_facts(knowledge_dir):
    index_path = knowledge_dir / "index.json"
    if not index_path.exists():
        return []
    with open(index_path) as f:
        data = json.load(f)
    return data.get("facts", [])


def count_yaml_facts(knowledge_dir):
    domain_counts = {}
    for subdir in ["repos", "global", "agents"]:
        dirpath = knowledge_dir / subdir
        if not dirpath.exists():
            continue
        for yaml_file in dirpath.glob("*.yaml"):
            count = 0
            try:
                content = yaml_file.read_text()
                count = len(re.findall(r"^\s*-\s*id:", content, re.MULTILINE))
            except Exception:
                pass
            domain = yaml_file.stem
            domain_counts[domain] = domain_counts.get(domain, 0) + count
    return domain_counts

# --- Session Database ---
def open_db(db_path):
    if not db_path.exists():
        print(f"WARNING: Database not found at {db_path}", file=sys.stderr)
        return None
    conn = sqlite3.connect(str(db_path))
    conn.row_factory = sqlite3.Row
    return conn


def query_sessions(conn, since=None, repo=None):
    # NOTE: `repo` is accepted but not applied to the query.
    if conn is None:
        return []
    query = "SELECT id, started_at, ended_at, end_reason, message_count, tool_call_count, model FROM sessions WHERE 1=1"
    params = []
    if since:
        since_ts = datetime.fromisoformat(since).replace(tzinfo=timezone.utc).timestamp()
        query += " AND started_at >= ?"
        params.append(since_ts)
    query += " ORDER BY started_at ASC"
    cur = conn.execute(query, params)
    return [dict(row) for row in cur.fetchall()]


def query_messages(conn, session_ids=None, since_ts=None):
    if conn is None:
        return []
    query = "SELECT m.session_id, m.role, m.content, m.tool_name, m.timestamp FROM messages m WHERE 1=1"
    params = []
    if since_ts:
        query += " AND m.timestamp >= ?"
        params.append(since_ts)
    if session_ids:
        placeholders = ",".join("?" for _ in session_ids)
        query += f" AND m.session_id IN ({placeholders})"
        params.extend(session_ids)
    cur = conn.execute(query, params)
    return [dict(row) for row in cur.fetchall()]

# --- Metric Computations ---
def compute_knowledge_velocity(facts, since=None):
    if not facts:
        return {"value": 0.0, "total_facts": 0, "period_days": 0, "new_facts": 0}
    dates = []
    for f in facts:
        d = f.get("first_seen") or f.get("created")
        if d:
            try:
                dt = datetime.fromisoformat(d.replace("Z", "+00:00"))
                if dt.tzinfo is None:
                    dt = dt.replace(tzinfo=timezone.utc)
                dates.append(dt)
            except (ValueError, AttributeError):
                pass
    if not dates:
        return {"value": 0.0, "total_facts": len(facts), "period_days": 0, "new_facts": 0}
    if since:
        cutoff = datetime.fromisoformat(since).replace(tzinfo=timezone.utc)
        dates = [d for d in dates if d >= cutoff]
    if not dates:
        return {"value": 0.0, "total_facts": len(facts), "period_days": 0, "new_facts": 0}
    earliest = min(dates)
    latest = max(dates)
    period_days = max((latest - earliest).days, 1)
    return {"value": round(len(dates) / period_days, 2), "total_facts": len(facts), "period_days": period_days, "new_facts": len(dates)}


def compute_knowledge_coverage(facts, yaml_counts):
    domain_fact_counts = defaultdict(int)
    for f in facts:
        domain = f.get("domain", "unknown")
        domain_fact_counts[domain] += 1
    for domain, count in yaml_counts.items():
        domain_fact_counts[domain] = max(domain_fact_counts[domain], count)
    total_domains = len(domain_fact_counts)
    if total_domains == 0:
        return {"value": 0.0, "covered_domains": 0, "total_domains": 0, "domain_details": {}}
    covered = sum(1 for c in domain_fact_counts.values() if c >= 10)
    return {"value": round(covered / total_domains, 3), "covered_domains": covered, "total_domains": total_domains, "domain_details": dict(sorted(domain_fact_counts.items(), key=lambda x: -x[1])[:20])}


def compute_hit_rate(sessions, messages, facts):
    if not sessions or not facts:
        return {"value": 0.0, "hit_sessions": 0, "total_sessions": len(sessions)}
    # Fragments are full fact sentences plus every word of 4+ characters,
    # so a single common word in a session counts as a hit (a loose match).
    fact_fragments = set()
    for f in facts:
        text = f.get("fact", "").lower().strip()
        if len(text) > 10:
            fact_fragments.add(text)
        words = re.findall(r'\w{4,}', text)
        for w in words:
            fact_fragments.add(w)
    if not fact_fragments:
        return {"value": 0.0, "hit_sessions": 0, "total_sessions": len(sessions)}
    session_messages = defaultdict(list)
    for m in messages:
        content = (m.get("content") or "").lower()
        if content:
            session_messages[m["session_id"]].append(content)
    hit_sessions = 0
    for session in sessions:
        sid = session["id"]
        all_content = " ".join(session_messages.get(sid, []))
        if any(frag in all_content for frag in fact_fragments):
            hit_sessions += 1
    return {"value": round(hit_sessions / len(sessions), 3) if sessions else 0.0, "hit_sessions": hit_sessions, "total_sessions": len(sessions)}


def compute_error_recurrence(messages):
    if not messages:
        return {"value": 0.0, "unique_errors": 0, "recurring_errors": 0, "top_errors": []}
    # re.IGNORECASE already covers the upper- and mixed-case spellings.
    error_pattern = re.compile(r'(?:error|failed|fail|exception)[:\s]*(.{10,80})', re.IGNORECASE)
    error_to_sessions = defaultdict(set)
    for m in messages:
        content = m.get("content") or ""
        if not content:
            continue
        for match in error_pattern.finditer(content):
            # Normalize the text after the keyword into a comparable signature.
            sig = match.group(1).strip().lower()
            sig = re.sub(r'\s+', ' ', sig)
            if len(sig) > 5:
                error_to_sessions[sig].add(m["session_id"])
    if not error_to_sessions:
        return {"value": 0.0, "unique_errors": 0, "recurring_errors": 0, "top_errors": []}
    # An error is "recurring" when its signature appears in more than one session.
    recurring = {e: s for e, s in error_to_sessions.items() if len(s) > 1}
    total_errors = len(error_to_sessions)
    recurring_count = len(recurring)
    top = sorted(recurring.items(), key=lambda x: -len(x[1]))[:10]
    return {"value": round(recurring_count / total_errors, 3) if total_errors else 0.0, "unique_errors": total_errors, "recurring_errors": recurring_count, "top_errors": [{"error": e, "sessions": len(s)} for e, s in top]}


def compute_task_completion(sessions):
    if not sessions:
        return {"value": 0.0, "completed": 0, "total": 0, "breakdown": {}}
    breakdown = Counter()
    for s in sessions:
        reason = s.get("end_reason") or "unknown"
        breakdown[reason] += 1
    completed = breakdown.get("cron_complete", 0) + breakdown.get("session_reset", 0)
    normal_endings = completed + breakdown.get("cli_close", 0) + breakdown.get("compression", 0)
    return {"value": round(completed / len(sessions), 3) if sessions else 0.0, "normal_end_rate": round(normal_endings / len(sessions), 3) if sessions else 0.0, "completed": completed, "total": len(sessions), "breakdown": dict(breakdown.most_common())}


def compute_first_try_success(sessions):
    if not sessions:
        return {"value": 0.0, "avg_tool_msg_ratio": 0.0, "sampled": 0}
    ratios = []
    for s in sessions:
        msgs = s.get("message_count", 0) or 0
        tools = s.get("tool_call_count", 0) or 0
        # Only sessions with more than two messages are sampled.
        if msgs > 2:
            ratios.append(tools / msgs)
    if not ratios:
        return {"value": 0.0, "avg_tool_msg_ratio": 0.0, "sampled": 0}
    avg_ratio = sum(ratios) / len(ratios)
    # A tool/message ratio below 0.5 is used as the proxy for "no backtracking".
    first_try = sum(1 for r in ratios if r < 0.5)
    return {"value": round(first_try / len(ratios), 3), "avg_tool_msg_ratio": round(avg_ratio, 3), "sampled": len(ratios), "interpretation": "Higher value = fewer backtracks = better first-try success"}


def compute_knowledge_age(facts):
    if not facts:
        return {"value": 0.0, "avg_age_days": 0, "stale_facts": 0, "total_facts": 0}
    now = datetime.now(timezone.utc)
    ages = []
    stale_count = 0
    for f in facts:
        confirmed = f.get("last_confirmed") or f.get("first_seen")
        if confirmed:
            try:
                dt = datetime.fromisoformat(confirmed.replace("Z", "+00:00"))
                if dt.tzinfo is None:
                    dt = dt.replace(tzinfo=timezone.utc)
                age = (now - dt).days
                ages.append(age)
                if age > 30:
                    stale_count += 1
            except (ValueError, AttributeError):
                pass
    if not ages:
        return {"value": 0.0, "avg_age_days": 0, "stale_facts": 0, "total_facts": len(facts)}
    avg_age = sum(ages) / len(ages)
    freshness = max(0.0, 1.0 - (avg_age / 90))
    return {"value": round(freshness, 3), "avg_age_days": round(avg_age, 1), "stale_facts": stale_count, "total_facts": len(facts), "interpretation": "1.0 = all facts fresh. 0.0 = all facts 90+ days old"}

# --- Trend Computation ---
def compute_trend(current, previous, metric_key="value", metric_name=None):
    if not previous:
        return {"delta": "N/A", "direction": "unknown"}
    curr_val = current.get(metric_key, 0)
    prev_val = previous.get(metric_key, 0)
    if prev_val == 0:
        return {"delta": "N/A (no baseline)", "direction": "unknown"}
    pct = ((curr_val - prev_val) / abs(prev_val)) * 100
    direction = "up" if pct > 0 else "down" if pct < 0 else "flat"
    # `metric_key` is the lookup key (normally "value"), so comparing it against
    # a metric name never matched. `metric_name` is an optional addition that
    # falls back to `metric_key`, so existing call sites keep working.
    # error_recurrence is the one metric where a drop is an improvement;
    # knowledge_age reports freshness, so higher is better there.
    name = metric_name or metric_key
    if name == "error_recurrence":
        direction_label = "good" if pct < 0 else "bad" if pct > 0 else "neutral"
    else:
        direction_label = "good" if pct > 0 else "bad" if pct < 0 else "neutral"
    return {"delta": f"{'+' if pct > 0 else ''}{pct:.1f}%", "direction": direction, "assessment": direction_label}

# --- Output Formatters ---
def format_json(metrics):
    return json.dumps(metrics, indent=2)


def format_markdown(metrics):
    lines = ["# Compounding Intelligence Metrics", f"**Generated:** {metrics.get('generated_at', 'unknown')}", ""]
    trend = metrics.get("trend_7d", {})

    def metric_block(name, data, desc, good_direction="up"):
        val = data.get("value", 0)
        t = trend.get(name, {})
        delta = t.get("delta", "N/A")
        assessment = t.get("assessment", "unknown")
        arrow = "up" if assessment == "good" else "down" if assessment == "bad" else "---"
        lines.extend([f"## {name}", desc, "", f"**Value:** {val} | **7d trend:** {delta} {arrow} ({assessment})", ""])
        for k, v in data.items():
            if k != "value" and k != "interpretation":
                if isinstance(v, (int, float, str)):
                    lines.append(f"- {k}: {v}")
        lines.append("")

    metric_block("knowledge_velocity", metrics.get("knowledge_velocity", {}), "New facts extracted per day. Higher = compounding loop working.")
    metric_block("knowledge_coverage", metrics.get("knowledge_coverage", {}), "Percentage of domains/repos with 10+ facts. Measures breadth.")
    metric_block("hit_rate", metrics.get("hit_rate", {}), "Percentage of sessions referencing bootstrapped knowledge.")
    metric_block("error_recurrence", metrics.get("error_recurrence", {}), "Ratio of recurring errors. Lower = fleet learning from mistakes.", good_direction="down")
    metric_block("task_completion", metrics.get("task_completion", {}), "Percentage of sessions ending with successful completion.")
    metric_block("first_try_success", metrics.get("first_try_success", {}), "Percentage of sessions completed without backtracking.")
    metric_block("knowledge_age", metrics.get("knowledge_age", {}), "Freshness of knowledge store. 1.0 = all fresh, 0.0 = all stale.", good_direction="up")
    return "\n".join(lines)


# --- Snapshot Persistence ---

def load_snapshot(metrics_dir):
    snapshot_path = metrics_dir / "latest_snapshot.json"
    if snapshot_path.exists():
        with open(snapshot_path) as f:
            return json.load(f)
    return {}


def save_snapshot(metrics_dir, metrics):
    metrics_dir.mkdir(parents=True, exist_ok=True)
    snapshot_path = metrics_dir / "latest_snapshot.json"
    with open(snapshot_path, "w") as f:
        json.dump(metrics, f, indent=2)


# --- Main ---

def main():
    parser = argparse.ArgumentParser(description="Compounding Intelligence Metrics")
    parser.add_argument("--since", help="Start date (YYYY-MM-DD)")
    parser.add_argument("--repo", help="Filter by repo/domain")
    parser.add_argument("--format", choices=["json", "markdown"], default="json")
    parser.add_argument("--knowledge-dir", type=Path, default=DEFAULT_KNOWLEDGE_DIR)
    parser.add_argument("--db", type=Path, default=DEFAULT_DB_PATH)
    parser.add_argument("--save-snapshot", action="store_true", help="Save current metrics as snapshot for trend tracking")
    parser.add_argument("--metrics-dir", type=Path, default=Path(__file__).parent.parent / "metrics", help="Directory for snapshots and dashboard")
    args = parser.parse_args()

    facts = load_facts(args.knowledge_dir)
    yaml_counts = count_yaml_facts(args.knowledge_dir)
    if args.repo:
        facts = [f for f in facts if f.get("domain") == args.repo]

    conn = open_db(args.db)
    sessions = query_sessions(conn, since=args.since)
    messages = query_messages(conn) if conn else []
    if conn:
        conn.close()

    velocity = compute_knowledge_velocity(facts, since=args.since)
    coverage = compute_knowledge_coverage(facts, yaml_counts)
    hit_rate = compute_hit_rate(sessions, messages, facts)
    error_recurrence = compute_error_recurrence(messages)
    task_completion = compute_task_completion(sessions)
    first_try = compute_first_try_success(sessions)
    age = compute_knowledge_age(facts)

    previous = load_snapshot(args.metrics_dir)
    trend = {
        "knowledge_velocity": compute_trend(velocity, previous.get("knowledge_velocity", {})),
        "knowledge_coverage": compute_trend(coverage, previous.get("knowledge_coverage", {})),
        "hit_rate": compute_trend(hit_rate, previous.get("hit_rate", {})),
        "error_recurrence": compute_trend(error_recurrence, previous.get("error_recurrence", {}), "value"),
        "task_completion": compute_trend(task_completion, previous.get("task_completion", {})),
        "first_try_success": compute_trend(first_try, previous.get("first_try_success", {})),
        "knowledge_age": compute_trend(age, previous.get("knowledge_age", {})),
    }

    now = datetime.now(timezone.utc).isoformat()
    metrics = {
        "generated_at": now,
        "knowledge_velocity": velocity,
        "knowledge_coverage": coverage,
        "hit_rate": hit_rate,
        "error_recurrence": error_recurrence,
        "task_completion": task_completion,
        "first_try_success": first_try,
        "knowledge_age": age,
        "trend_7d": trend,
    }
    if args.since:
        metrics["since"] = args.since

    if args.save_snapshot:
        save_snapshot(args.metrics_dir, metrics)
        dashboard_path = args.metrics_dir / "dashboard.md"
        with open(dashboard_path, "w") as f:
            f.write(format_markdown(metrics))

    if args.format == "json":
        print(format_json(metrics))
    else:
        print(format_markdown(metrics))


if __name__ == "__main__":
    main()
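
The trend computation above decides between "good" and "bad" by comparing `metric_key` against metric names. One possible alternative, sketched here under assumptions, keys the assessment off the metric name directly. The names `trend_for` and `LOWER_IS_BETTER`, and the choice of which metrics count as lower-is-better, are hypothetical and not part of measurer.py.

```python
# Hypothetical: which metrics improve when their value falls is an assumption.
LOWER_IS_BETTER = {"error_recurrence"}


def trend_for(metric_name, current, previous, key="value"):
    """Percent change of current[key] vs previous[key], assessed by metric polarity."""
    if not previous:
        return {"delta": "N/A", "direction": "unknown"}
    curr_val = current.get(key, 0)
    prev_val = previous.get(key, 0)
    if prev_val == 0:
        return {"delta": "N/A (no baseline)", "direction": "unknown"}

    pct = (curr_val - prev_val) / abs(prev_val) * 100
    direction = "up" if pct > 0 else "down" if pct < 0 else "flat"

    if pct == 0:
        assessment = "neutral"
    else:
        # A falling value is an improvement only for lower-is-better metrics.
        improved = (pct < 0) if metric_name in LOWER_IS_BETTER else (pct > 0)
        assessment = "good" if improved else "bad"

    sign = "+" if pct > 0 else ""
    return {"delta": f"{sign}{pct:.1f}%", "direction": direction, "assessment": assessment}


if __name__ == "__main__":
    # The same drop in value is assessed differently depending on the metric.
    print(trend_for("error_recurrence", {"value": 0.25}, {"value": 0.5}))
    print(trend_for("hit_rate", {"value": 0.25}, {"value": 0.5}))
```

Passing the metric name separately from the dict key keeps the "which field to read" and "which way is good" decisions independent of each other.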

View File

@@ -1,38 +1,80 @@
 #!/usr/bin/env python3
-"""Validate knowledge files and index.json against the schema."""
-import json, sys
+"""
+Validate knowledge files and index.json against the schema.
+Usage:
+    python scripts/validate_knowledge.py
+"""
+import json
+import sys
 from pathlib import Path
 VALID_CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}
-REQUIRED = {"id", "fact", "category", "domain", "confidence"}
+REQUIRED_FACT_FIELDS = {"id", "fact", "category", "domain", "confidence"}
+MAX_FACT_LENGTH = 280
-def validate_fact(fact, src=""):
-    errs = []
-    for f in REQUIRED:
-        if f not in fact: errs.append(f"{src}: missing '{f}'")
+def validate_fact(fact, source=""):
+    errors = []
+    for field in REQUIRED_FACT_FIELDS:
+        if field not in fact:
+            errors.append(f"{source}: missing required field '{field}'")
+    if "fact" in fact:
+        if not isinstance(fact["fact"], str) or len(fact["fact"].strip()) == 0:
+            errors.append(f"{source}: 'fact' must be non-empty string")
+        elif len(fact["fact"]) > MAX_FACT_LENGTH:
+            errors.append(f"{source}: 'fact' exceeds {MAX_FACT_LENGTH} chars")
     if "category" in fact and fact["category"] not in VALID_CATEGORIES:
-        errs.append(f"{src}: invalid category '{fact['category']}'")
+        errors.append(f"{source}: invalid category '{fact['category']}'")
     if "confidence" in fact:
-        if not isinstance(fact["confidence"], (int, float)) or not (0 <= fact["confidence"] <= 1):
-            errs.append(f"{src}: confidence must be 0.0-1.0")
+        if not isinstance(fact["confidence"], (int, float)):
+            errors.append(f"{source}: 'confidence' must be a number")
+        elif not (0.0 <= fact["confidence"] <= 1.0):
+            errors.append(f"{source}: 'confidence' must be 0.0-1.0")
     if "id" in fact:
         parts = fact["id"].split(":")
-        if len(parts) != 3: errs.append(f"{src}: id must be domain:category:sequence")
-    return errs
+        if len(parts) != 3:
+            errors.append(f"{source}: 'id' must be domain:category:sequence")
+        elif parts[1] not in VALID_CATEGORIES:
+            errors.append(f"{source}: id category '{parts[1]}' invalid")
+    return errors
 def main():
-    idx = Path(__file__).parent.parent / "knowledge" / "index.json"
-    if not idx.exists(): print(f"FAILED: {idx} not found"); sys.exit(1)
-    data = json.load(open(idx))
-    errs = []
-    seen = set()
-    for i, f in enumerate(data.get("facts", [])):
-        errs.extend(validate_fact(f, f"[{i}]"))
-        if "id" in f:
-            if f["id"] in seen: errs.append(f"duplicate id '{f['id']}'")
-            seen.add(f["id"])
-    if errs:
-        print(f"FAILED - {len(errs)} errors:"); [print(f" x {e}") for e in errs]; sys.exit(1)
-    print(f"PASSED - {len(data.get('facts', []))} facts")
+    repo_root = Path(__file__).parent.parent
+    index_path = repo_root / "knowledge" / "index.json"
+    all_errors = []
-if __name__ == "__main__": main()
+    if not index_path.exists():
+        print(f"VALIDATION FAILED: index.json not found at {index_path}")
+        sys.exit(1)
+    with open(index_path) as f:
+        data = json.load(f)
+    if "version" not in data:
+        all_errors.append("index.json: missing 'version'")
+    if "facts" not in data or not isinstance(data["facts"], list):
+        all_errors.append("index.json: missing or invalid 'facts'")
+    seen_ids = set()
+    for i, fact in enumerate(data.get("facts", [])):
+        all_errors.extend(validate_fact(fact, f"facts[{i}]"))
+        if "id" in fact:
+            if fact["id"] in seen_ids:
+                all_errors.append(f"index.json: duplicate id '{fact['id']}'")
+            seen_ids.add(fact["id"])
+    if all_errors:
+        print(f"VALIDATION FAILED - {len(all_errors)} error(s):\n")
+        for e in all_errors:
+            print(f" x {e}")
+        sys.exit(1)
+    else:
+        print(f"VALIDATION PASSED - {len(data.get('facts', []))} facts, schema v{data.get('version', '?')}")
+        sys.exit(0)
+if __name__ == "__main__":
+    main()
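
For orientation, a minimal index payload that the checks above would appear to accept might look like the following sketch. The version string, the fact text, and the helper `id_parts_ok` are hypothetical and chosen only for illustration; the category set and the 280-character limit are taken from the validator.

```python
import json

VALID_CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}
MAX_FACT_LENGTH = 280

# Hypothetical sample data: the version value and the fact itself are made up.
sample_index = {
    "version": "1",
    "facts": [
        {
            "id": "global:pitfall:001",
            "fact": "Example pitfall text kept well under the length limit.",
            "category": "pitfall",
            "domain": "global",
            "confidence": 0.8,
        }
    ],
}


def id_parts_ok(fact_id):
    """Return True when the id is domain:category:sequence with a known category."""
    parts = fact_id.split(":")
    return len(parts) == 3 and parts[1] in VALID_CATEGORIES


if __name__ == "__main__":
    for fact in sample_index["facts"]:
        assert id_parts_ok(fact["id"])
        assert 0.0 <= fact["confidence"] <= 1.0
        assert 0 < len(fact["fact"].strip()) <= MAX_FACT_LENGTH
    # Print the payload in the shape it would take in knowledge/index.json.
    print(json.dumps(sample_index, indent=2))
```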