compounding-intelligence/knowledge/SCHEMA.md

Knowledge File Format Specification

Version: 1 Issue: #10 Status: Draft


Overview

The knowledge system has two layers:

  1. index.json — Machine-readable fact index. Fast lookups by ID, category, repo, tags.
  2. Knowledge files (YAML) — Human-readable, editable facts organized by domain.

The harvester writes to both. The bootstrapper reads from index.json. Humans edit the YAML files directly.


index.json Schema

{
  "version": 1,
  "last_updated": "ISO-8601 timestamp",
  "total_facts": 0,
  "facts": []
}

Fact Object

| Field | Type | Required | Description |
|---|---|---|---|
| id | string | yes | Unique identifier: {domain}:{category}:{sequence} |
| fact | string | yes | One-sentence description of the knowledge |
| category | enum | yes | One of: fact, pitfall, pattern, tool-quirk, question |
| domain | string | yes | Where this applies: repo name, global, or agent name |
| confidence | float | yes | 0.0-1.0. How certain is this knowledge? |
| tags | string[] | no | Searchable labels: ["git", "auth", "gitea"] |
| source_count | int | no | How many sessions confirmed this fact |
| first_seen | date | no | ISO-8601 date first extracted |
| last_confirmed | date | no | ISO-8601 date last seen in a session |
| expires | date | no | Optional. After this date, fact is stale |
| related | string[] | no | IDs of related facts |
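
As an illustration of how a fact object and the surrounding index document fit together, here is a minimal Python sketch. The helper names `make_fact` and `make_index` are hypothetical and not part of the harvester. The sketch assumes `total_facts` is simply the length of `facts` and that `last_updated` is a UTC timestamp.

```python
import json
from datetime import datetime, timezone


def make_fact(domain, category, sequence, fact, confidence, **optional):
    """Build a fact dict with the five required fields plus any optional ones."""
    entry = {
        "id": f"{domain}:{category}:{sequence:03d}",  # zero-padded sequence
        "fact": fact,
        "category": category,
        "domain": domain,
        "confidence": confidence,
    }
    entry.update(optional)  # tags, source_count, first_seen, ...
    return entry


def make_index(facts):
    """Wrap a list of facts in the top-level index.json structure."""
    return {
        "version": 1,
        "last_updated": datetime.now(timezone.utc).isoformat(),
        "total_facts": len(facts),  # assumption: always equals len(facts)
        "facts": facts,
    }


fact = make_fact(
    "global", "tool-quirk", 1,
    "Gitea token stored at ~/.config/gitea/token, not env var",
    0.95,
    tags=["git", "auth", "gitea"],
    source_count=23,
    first_seen="2026-03-27",
    last_confirmed="2026-04-13",
)
index = make_index([fact])
print(json.dumps(index, indent=2))
```

Optional fields are passed as keyword arguments, so a fact that only has the required fields stays small.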

ID Format

{domain}:{category}:{sequence}
  • domain — repo name, global, or agent type
  • category — one of the 5 categories
  • sequence — zero-padded 3-digit number: 001, 002, ...

Examples:

  • the-nexus:pitfall:001
  • global:tool-quirk:012
  • hermes-agent:pattern:003
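
A small sketch of how IDs in this format could be parsed and allocated. The function names `parse_fact_id` and `next_fact_id` are hypothetical. The sketch assumes the sequence is exactly three digits and that a domain contains no colons or whitespace; the spec does not say what happens past 999.

```python
import re

CATEGORIES = ("fact", "pitfall", "pattern", "tool-quirk", "question")

# domain : one of the five categories : three-digit sequence
ID_RE = re.compile(
    r"(?P<domain>[^:\s]+):(?P<category>"
    + "|".join(re.escape(c) for c in CATEGORIES)
    + r"):(?P<sequence>\d{3})"
)


def parse_fact_id(fact_id):
    """Split an ID into (domain, category, sequence) or raise ValueError."""
    match = ID_RE.fullmatch(fact_id)
    if match is None:
        raise ValueError(f"malformed fact ID: {fact_id!r}")
    return match["domain"], match["category"], int(match["sequence"])


def next_fact_id(domain, category, existing_ids):
    """Return the next free ID for a domain/category pair."""
    highest = 0
    for fact_id in existing_ids:
        d, c, seq = parse_fact_id(fact_id)
        if (d, c) == (domain, category):
            highest = max(highest, seq)
    return f"{domain}:{category}:{highest + 1:03d}"


print(parse_fact_id("the-nexus:pitfall:001"))  # ('the-nexus', 'pitfall', 1)
print(next_fact_id("global", "tool-quirk",
                   ["global:tool-quirk:012", "global:pitfall:003"]))  # global:tool-quirk:013
```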

Categories

| Category | Definition | Example |
|---|---|---|
| fact | Concrete, verifiable information | "Gitea API requires token auth at /api/v1" |
| pitfall | Errors, wrong assumptions, time-wasters | "Assumed env var GITEA_TOKEN; actual path is ~/.config/gitea/token" |
| pattern | Successful sequences of actions | "To deploy: test -> build -> push -> webhook" |
| tool-quirk | Environment-specific behaviors | "URL format requires trailing slash on macOS" |
| question | Identified but unanswered | "Need optimal batch size for harvesting" |

Confidence Scoring

| Range | Meaning |
|---|---|
| 0.9-1.0 | Explicitly stated and verified |
| 0.7-0.8 | Clearly implied by multiple data points |
| 0.5-0.6 | Suggested but not fully verified |
| 0.3-0.4 | Inferred from limited data |
| 0.1-0.2 | Speculative or uncertain |
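
One way to turn a stored confidence into the meaning listed above is sketched here. The bands are written to one decimal place, so this sketch assumes scores are rounded to one decimal before lookup. That rounding rule is an assumption, not something this spec states. The function name `confidence_meaning` is hypothetical, and values that round below 0.1 return `None` because the table gives them no meaning.

```python
# Lower bound of each band, highest first, with its meaning from the table.
BANDS = [
    (0.9, "Explicitly stated and verified"),
    (0.7, "Clearly implied by multiple data points"),
    (0.5, "Suggested but not fully verified"),
    (0.3, "Inferred from limited data"),
    (0.1, "Speculative or uncertain"),
]


def confidence_meaning(confidence):
    """Return the band meaning for a confidence score, or None below 0.1."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    rounded = round(confidence, 1)  # assumption: bands apply after rounding
    for lower_bound, meaning in BANDS:
        if rounded >= lower_bound:
            return meaning
    return None


print(confidence_meaning(0.95))
print(confidence_meaning(0.4))
```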

Knowledge Files (YAML)

Human-readable files stored in knowledge/ subdirectories.

Directory Structure

knowledge/
├── index.json                  # Machine-readable fact index
├── SCHEMA.md                   # This file
├── global/                     # Cross-repo knowledge
│   ├── pitfalls.yaml           # Pitfalls that span multiple repos
│   ├── patterns.yaml           # Proven workflows
│   └── tool-quirks.yaml        # Environment behaviors
├── repos/                      # Per-repo knowledge
│   ├── the-nexus.yaml
│   ├── hermes-agent.yaml
│   └── ...
└── agents/                     # Agent-type knowledge
    ├── mimo-sprint.yaml
    └── ...

YAML File Format

---
domain: global                    # or repo name or agent name
category: tool-quirk              # fact, pitfall, pattern, tool-quirk, question
version: 1
last_updated: "2026-04-13"
---

# Tool Quirks (Global)

# Cross-environment behaviors that bite you if you don't know them.

## Authentication

- id: global:tool-quirk:001
  fact: "Gitea token stored at ~/.config/gitea/token, not env var"
  confidence: 0.95
  tags: [git, auth, gitea]
  source_count: 23
  first_seen: "2026-03-27"
  last_confirmed: "2026-04-13"
  related: [global:pitfall:003]

Rules

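  1. One file per domain. repos/the-nexus.yaml holds all the-nexus facts, whatever their category. Only global/ is split further, into one file per category (pitfalls.yaml, patterns.yaml, tool-quirks.yaml).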
  2. Markdown sections for humans. YAML items live under markdown headers for Gitea UI readability.
  3. ID is the link. The id field connects YAML facts to index.json entries.
  4. Harvester writes, humans edit. Harvester appends. Humans correct confidence, add tags, mark expired.

Sync Rules

  1. Harvester -> YAML: Appends new facts to the appropriate YAML file.
  2. Harvester -> index.json: Adds/updates fact entries.
  3. Human edits YAML: Changes propagate to index.json on next harvester run.
  4. Confidence decay: A fact that goes 30 or more sessions without being confirmed has its confidence multiplied by 0.9.
  5. Expiration: Facts whose expires date is earlier than the current date are marked stale.
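
Rules 4 and 5 could be sketched roughly as below. This is an illustration only, not the harvester's code. The names `apply_decay` and `is_stale` are hypothetical, and the caller is assumed to supply `sessions_since_confirmed`, since the schema has no field for it. The sketch applies the 0.9 factor once per call; whether decay repeats every further 30 sessions is not specified here.

```python
from datetime import date

DECAY_AFTER_SESSIONS = 30  # from sync rule 4
DECAY_FACTOR = 0.9         # from sync rule 4


def apply_decay(fact, sessions_since_confirmed):
    """Return a copy of the fact, with confidence decayed if it went unconfirmed."""
    updated = dict(fact)
    if sessions_since_confirmed >= DECAY_AFTER_SESSIONS:
        updated["confidence"] = round(fact["confidence"] * DECAY_FACTOR, 4)
    return updated


def is_stale(fact, today):
    """True when the fact has an expires date strictly before today."""
    expires = fact.get("expires")
    if expires is None:
        return False  # facts without an expires date never expire
    return date.fromisoformat(expires) < today


fact = {"id": "global:tool-quirk:001", "confidence": 0.5, "expires": "2026-04-01"}
print(apply_decay(fact, sessions_since_confirmed=30)["confidence"])
print(is_stale(fact, today=date(2026, 4, 14)))
```

Returning a copy rather than mutating the fact keeps the original entry available for comparison when the index is rewritten.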

Validation

Facts must pass these checks:

  1. id matches format {domain}:{category}:{sequence}
  2. category is one of the 5 allowed values
  3. confidence is between 0.0 and 1.0
  4. fact is non-empty string, max 280 characters
  5. domain is non-empty string
  6. tags are lowercase alphanumeric + hyphens
  7. No duplicate IDs in index.json

Validation script: scripts/validate_knowledge.py
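
The seven checks could look roughly like the following sketch. This is not the contents of scripts/validate_knowledge.py, which is not reproduced here. The function names `validate_fact` and `validate_index` and the wording of the messages are hypothetical.

```python
import re

ALLOWED_CATEGORIES = ("fact", "pitfall", "pattern", "tool-quirk", "question")
ID_RE = re.compile(r"([^:\s]+):([a-z-]+):(\d{3})")
TAG_RE = re.compile(r"[a-z0-9-]+")


def validate_fact(fact):
    """Return a list of problems; an empty list means checks 1-6 pass."""
    problems = []

    id_match = ID_RE.fullmatch(str(fact.get("id", "")))
    if id_match is None or id_match.group(2) not in ALLOWED_CATEGORIES:
        problems.append("id does not match {domain}:{category}:{sequence}")

    if fact.get("category") not in ALLOWED_CATEGORIES:
        problems.append("category is not one of the 5 allowed values")

    confidence = fact.get("confidence")
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0.0 <= confidence <= 1.0:
        problems.append("confidence is not between 0.0 and 1.0")

    text = fact.get("fact")
    if not isinstance(text, str) or not text.strip() or len(text) > 280:
        problems.append("fact must be a non-empty string of at most 280 characters")

    domain = fact.get("domain")
    if not isinstance(domain, str) or not domain.strip():
        problems.append("domain must be a non-empty string")

    for tag in fact.get("tags", []):
        if not isinstance(tag, str) or TAG_RE.fullmatch(tag) is None:
            problems.append(f"tag {tag!r} is not lowercase alphanumeric + hyphens")

    return problems


def validate_index(index):
    """Map each offending fact ID to its problems, including duplicates (check 7)."""
    report = {}
    seen = set()
    for fact in index.get("facts", []):
        fact_id = fact.get("id", "<missing id>")
        problems = validate_fact(fact)
        if fact_id in seen:
            problems.append("duplicate id in index.json")
        seen.add(fact_id)
        if problems:
            report.setdefault(fact_id, []).extend(problems)
    return report
```

Collecting every problem instead of stopping at the first one lets a human fix a YAML file in a single pass.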