Implements issue #14: 7 metrics that prove knowledge compounding.

Metrics:
- Knowledge velocity: new facts/day (from index.json)
- Knowledge coverage: % domains with 10+ facts (from YAML files)
- Hit rate: % sessions referencing bootstrap knowledge
- Error recurrence: same errors across sessions (should decrease)
- Task completion: % sessions with successful end_reason
- First-try success: actions without backtracking (tool/msg ratio)
- Knowledge age: staleness of facts (freshness score)

Data sources:
- knowledge/index.json + YAML files for fact metrics
- ~/.hermes/state.db sessions + messages tables

Features:
- JSON and markdown output formats
- --since, --repo, --format flags
- 7-day trend tracking via snapshot persistence
- Runs in 33ms on 11.9K sessions / 192K messages
- Dashboard auto-generation with --save-snapshot

Closes #14
Knowledge File Format Specification
Version: 1
Issue: #10
Status: Draft
Overview
The knowledge system has two layers:
- index.json — Machine-readable fact index. Fast lookups by ID, category, repo, tags.
- Knowledge files (YAML) — Human-readable, editable facts organized by domain.
The harvester writes to both. The bootstrapper reads from index.json. Humans edit the YAML files directly.
index.json Schema
{
  "version": 1,
  "last_updated": "ISO-8601 timestamp",
  "total_facts": 0,
  "facts": []
}
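The overview says index.json supports fast lookups by ID, category, repo, and tags. A minimal sketch of how a reader might build such lookups from a document with the schema above follows; `build_lookups` and the sample values are illustrative assumptions, not part of the harvester or bootstrapper.

```python
import json
from collections import defaultdict


def build_lookups(index: dict) -> dict:
    """Group the facts of an index.json document by id, category, and tag.

    Hypothetical helper: the spec only says such lookups are possible,
    not how the bootstrapper implements them.
    """
    by_id = {}
    by_category = defaultdict(list)
    by_tag = defaultdict(list)
    for fact in index.get("facts", []):
        by_id[fact["id"]] = fact
        by_category[fact["category"]].append(fact["id"])
        for tag in fact.get("tags", []):  # tags is an optional field
            by_tag[tag].append(fact["id"])
    return {
        "by_id": by_id,
        "by_category": dict(by_category),
        "by_tag": dict(by_tag),
    }


# Sample document following the schema above (values are illustrative).
sample = json.loads("""
{
  "version": 1,
  "last_updated": "2026-04-13T00:00:00Z",
  "total_facts": 1,
  "facts": [
    {
      "id": "global:tool-quirk:001",
      "fact": "Gitea token stored at ~/.config/gitea/token, not env var",
      "category": "tool-quirk",
      "domain": "global",
      "confidence": 0.95,
      "tags": ["git", "auth", "gitea"]
    }
  ]
}
""")

lookups = build_lookups(sample)
```

Keeping the lookups as plain dictionaries derived from the `facts` array means index.json itself stays a flat list, which matches the schema shown above.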
Fact Object
| Field | Type | Required | Description |
|---|---|---|---|
| `id` | string | yes | Unique identifier: `{domain}:{category}:{sequence}` |
| `fact` | string | yes | One-sentence description of the knowledge |
| `category` | enum | yes | One of: `fact`, `pitfall`, `pattern`, `tool-quirk`, `question` |
| `domain` | string | yes | Where this applies: repo name, `global`, or agent name |
| `confidence` | float | yes | 0.0-1.0. How certain is this knowledge? |
| `tags` | string[] | no | Searchable labels: `["git", "auth", "gitea"]` |
| `source_count` | int | no | How many sessions confirmed this fact |
| `first_seen` | date | no | ISO-8601 date first extracted |
| `last_confirmed` | date | no | ISO-8601 date last seen in a session |
| `expires` | date | no | Optional. After this date, fact is stale |
| `related` | string[] | no | IDs of related facts |
ID Format
{domain}:{category}:{sequence}
- `domain`: repo name, `global`, or agent type
- `category`: one of the 5 categories
- `sequence`: zero-padded 3-digit number: `001`, `002`, ...
Examples:
- `the-nexus:pitfall:001`
- `global:tool-quirk:012`
- `hermes-agent:pattern:003`
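The ID format can be checked and generated mechanically. The sketch below parses an ID, formats one, and picks the next free sequence number for a domain and category. The helper names are hypothetical, and the domain pattern (lowercase letters, digits, hyphens) is an assumption inferred from the examples, not something the spec states.

```python
import re

CATEGORIES = ("fact", "pitfall", "pattern", "tool-quirk", "question")

# Assumption: domains use lowercase letters, digits, and hyphens only.
ID_RE = re.compile(
    r"^(?P<domain>[a-z0-9][a-z0-9-]*)"
    r":(?P<category>" + "|".join(CATEGORIES) + r")"
    r":(?P<sequence>\d{3})$"
)


def parse_fact_id(fact_id: str) -> tuple:
    """Split an ID into (domain, category, sequence as int)."""
    match = ID_RE.match(fact_id)
    if match is None:
        raise ValueError(f"not a valid fact ID: {fact_id!r}")
    return match["domain"], match["category"], int(match["sequence"])


def format_fact_id(domain: str, category: str, sequence: int) -> str:
    """Build an ID with a zero-padded 3-digit sequence."""
    return f"{domain}:{category}:{sequence:03d}"


def next_fact_id(existing_ids, domain: str, category: str) -> str:
    """Return the ID after the highest sequence already used.

    Note: sequences above 999 would no longer fit the 3-digit format.
    """
    used = [
        seq
        for d, c, seq in map(parse_fact_id, existing_ids)
        if d == domain and c == category
    ]
    return format_fact_id(domain, category, max(used, default=0) + 1)
```

For example, `next_fact_id(["global:tool-quirk:012"], "global", "tool-quirk")` continues the sequence for that domain and category, while a domain with no existing facts starts at `001`.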
Categories
| Category | Definition | Example |
|---|---|---|
| `fact` | Concrete, verifiable information | "Gitea API requires token auth at /api/v1" |
| `pitfall` | Errors, wrong assumptions, time-wasters | "Assumed env var GITEA_TOKEN; actual path is ~/.config/gitea/token" |
| `pattern` | Successful sequences of actions | "To deploy: test -> build -> push -> webhook" |
| `tool-quirk` | Environment-specific behaviors | "URL format requires trailing slash on macOS" |
| `question` | Identified but unanswered | "Need optimal batch size for harvesting" |
Confidence Scoring
| Range | Meaning |
|---|---|
| 0.9-1.0 | Explicitly stated and verified |
| 0.7-0.8 | Clearly implied by multiple data points |
| 0.5-0.6 | Suggested but not fully verified |
| 0.3-0.4 | Inferred from limited data |
| 0.1-0.2 | Speculative or uncertain |
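The table leaves small gaps between bands (for example, 0.85 falls between 0.7-0.8 and 0.9-1.0). One possible reading, sketched below, treats each band's lower bound as a threshold, so a value belongs to the highest band whose lower bound it reaches. This interpretation and the `confidence_meaning` helper are assumptions, not rules from the spec.

```python
# Lower bound of each band from the table, highest first.
BANDS = [
    (0.9, "Explicitly stated and verified"),
    (0.7, "Clearly implied by multiple data points"),
    (0.5, "Suggested but not fully verified"),
    (0.3, "Inferred from limited data"),
    (0.1, "Speculative or uncertain"),
]


def confidence_meaning(confidence: float):
    """Map a confidence score to the table's description.

    Assumption: values between bands fall into the lower band.
    Returns None below 0.1, which the table does not describe.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    for lower_bound, meaning in BANDS:
        if confidence >= lower_bound:
            return meaning
    return None
```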
Knowledge Files (YAML)
Human-readable files stored in knowledge/ subdirectories.
Directory Structure
knowledge/
├── index.json # Machine-readable fact index
├── SCHEMA.md # This file
├── global/ # Cross-repo knowledge
│ ├── pitfalls.yaml # Pitfalls that span multiple repos
│ ├── patterns.yaml # Proven workflows
│ └── tool-quirks.yaml # Environment behaviors
├── repos/ # Per-repo knowledge
│ ├── the-nexus.yaml
│ ├── hermes-agent.yaml
│ └── ...
└── agents/ # Agent-type knowledge
    ├── mimo-sprint.yaml
    └── ...
YAML File Format
---
domain: global # or repo name or agent name
category: tool-quirk # fact, pitfall, pattern, tool-quirk, question
version: 1
last_updated: "2026-04-13"
---
# Tool Quirks (Global)
Cross-environment behaviors that bite you if you don't know them.
## Authentication
- id: global:tool-quirk:001
  fact: "Gitea token stored at ~/.config/gitea/token, not env var"
  confidence: 0.95
  tags: [git, auth, gitea]
  source_count: 23
  first_seen: "2026-03-27"
  last_confirmed: "2026-04-13"
  related: [global:pitfall:003]
Rules
- One file per domain per category. `repos/the-nexus.yaml` holds all the-nexus facts.
- Markdown sections for humans. YAML items live under markdown headers for Gitea UI readability.
- ID is the link. The `id` field connects YAML facts to index.json entries.
- Harvester writes, humans edit. Harvester appends. Humans correct confidence, add tags, mark expired.
Sync Rules
- Harvester -> YAML: Appends new facts to the appropriate YAML file.
- Harvester -> index.json: Adds/updates fact entries.
- Human edits YAML: Changes propagate to index.json on next harvester run.
- Confidence decay: Facts not confirmed in 30+ sessions get confidence *= 0.9.
- Expiration: Facts whose `expires` date is earlier than the current date are marked `stale`.
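The decay and expiration rules can be sketched as a single pass over a fact. The spec does not say how the number of sessions since the last confirmation is tracked, how often decay is applied, or how `stale` is stored. The sketch assumes the caller supplies `sessions_since_confirmed`, applies decay once per call, and records staleness in a hypothetical `stale` key.

```python
from datetime import date

DECAY_THRESHOLD_SESSIONS = 30  # "not confirmed in 30+ sessions"
DECAY_FACTOR = 0.9             # "confidence *= 0.9"


def apply_sync_rules(fact: dict, sessions_since_confirmed: int, today: date) -> dict:
    """Return a copy of the fact with decay and expiration applied.

    Assumptions (not in the spec): the session count is passed in by
    the caller, decay is applied once per call, and staleness is
    stored as a boolean "stale" key.
    """
    updated = dict(fact)

    # Confidence decay for facts that have gone unconfirmed.
    if sessions_since_confirmed >= DECAY_THRESHOLD_SESSIONS:
        updated["confidence"] = round(updated["confidence"] * DECAY_FACTOR, 4)

    # Expiration: the optional expires field is an ISO-8601 date string.
    expires = updated.get("expires")
    if expires is not None and date.fromisoformat(expires) < today:
        updated["stale"] = True

    return updated
```

Returning a copy rather than mutating the input keeps the original fact available for comparison when deciding whether index.json needs to be rewritten.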
Validation
Facts must pass these checks:
- `id` matches format `{domain}:{category}:{sequence}`
- `category` is one of the 5 allowed values
- `confidence` is between 0.0 and 1.0
- `fact` is non-empty string, max 280 characters
- `domain` is non-empty string
- `tags` are lowercase alphanumeric + hyphens
- No duplicate IDs in index.json
Validation script: scripts/validate_knowledge.py
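The checks listed above could be implemented roughly as follows. This is an illustrative sketch, not the contents of scripts/validate_knowledge.py; the function names, error messages, and return shapes are assumptions.

```python
import re

CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}
ID_RE = re.compile(r"^[^:]+:[^:]+:\d{3}$")  # {domain}:{category}:{sequence}
TAG_RE = re.compile(r"^[a-z0-9-]+$")        # lowercase alphanumeric + hyphens
MAX_FACT_LENGTH = 280


def validate_fact(fact: dict) -> list:
    """Return a list of error messages; an empty list means the fact passes."""
    errors = []

    fact_id = fact.get("id")
    if not isinstance(fact_id, str) or not ID_RE.match(fact_id):
        errors.append("id does not match {domain}:{category}:{sequence}")

    category = fact.get("category")
    if not isinstance(category, str) or category not in CATEGORIES:
        errors.append("category is not one of the 5 allowed values")

    confidence = fact.get("confidence")
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0.0 <= confidence <= 1.0:
        errors.append("confidence is not between 0.0 and 1.0")

    text = fact.get("fact")
    if not isinstance(text, str) or not text or len(text) > MAX_FACT_LENGTH:
        errors.append("fact must be a non-empty string of at most 280 characters")

    domain = fact.get("domain")
    if not isinstance(domain, str) or not domain:
        errors.append("domain must be a non-empty string")

    tags = fact.get("tags") or []  # tags is optional
    if not isinstance(tags, list) or not all(
        isinstance(tag, str) and TAG_RE.match(tag) for tag in tags
    ):
        errors.append("tags must be lowercase alphanumeric + hyphens")

    return errors


def validate_index(index: dict) -> dict:
    """Validate every fact and flag duplicate IDs; maps id -> error list."""
    problems = {}
    seen = set()
    for fact in index.get("facts", []):
        errors = validate_fact(fact)
        fact_id = str(fact.get("id"))
        if fact_id in seen:
            errors.append("duplicate id in index.json")
        seen.add(fact_id)
        if errors:
            problems.setdefault(fact_id, []).extend(errors)
    return problems
```

Collecting every error per fact, rather than stopping at the first, lets a human editor fix a YAML entry in one pass.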