compounding-intelligence/knowledge/SCHEMA.md
Timmy b32d316023 feat(#10): knowledge file format schema + example knowledge files
- SCHEMA.md: full specification for index.json and YAML knowledge files
- knowledge/global/pitfalls.yaml: 8 cross-repo pitfalls
- knowledge/global/tool-quirks.yaml: 7 environment quirk facts
- knowledge/repos/hermes-agent.yaml: 8 per-repo pitfalls (cron, paths, SSH)
- knowledge/repos/the-nexus.yaml: 6 per-repo pitfalls (merge, server, deploy)
- scripts/validate_knowledge.py: schema validator (29 facts, all passing)
- knowledge/index.json: populated with 29 seed facts from real fleet data

Design decisions:
- YAML for humans, index.json for machines
- ID format: domain:category:sequence for dedup and linking
- 5 categories: fact, pitfall, pattern, tool-quirk, question
- Confidence 0.0-1.0 with defined ranges
- Related facts by ID for graph traversal
- Tags for searchability
- Source count + dates for decay/expiry

Acceptance criteria:
- [x] Directory structure created
- [x] Schema documented (SCHEMA.md)
- [x] index.json with real facts (29 total)
- [x] Example knowledge files for 2 repos (hermes-agent, the-nexus)
- [x] Validation script passes
2026-04-14 14:21:21 -04:00

Knowledge File Format Specification

Version: 1 | Issue: #10 | Status: Draft


Overview

The knowledge system has two layers:

  1. index.json — Machine-readable fact index. Fast lookups by ID, category, repo, tags.
  2. Knowledge files (YAML) — Human-readable, editable facts organized by domain.

The harvester writes to both. The bootstrapper reads from index.json. Humans edit the YAML files directly.
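The lookups mentioned above (by ID, category, repo, tags) can be sketched as a few in-memory tables built from the facts list. This is an illustrative sketch, not the bootstrapper's actual code: the function name `build_lookups` and the sample facts are hypothetical, and it assumes the repo lookup keys on the `domain` field defined in the Fact Object section.

```python
from collections import defaultdict


def build_lookups(facts):
    """Build lookup tables from a list of fact objects (hypothetical helper)."""
    by_id = {}
    by_category = defaultdict(list)
    by_domain = defaultdict(list)
    by_tag = defaultdict(list)
    for fact in facts:
        by_id[fact["id"]] = fact
        by_category[fact["category"]].append(fact["id"])
        by_domain[fact["domain"]].append(fact["id"])
        # tags is an optional field, so fall back to an empty list
        for tag in fact.get("tags", []):
            by_tag[tag].append(fact["id"])
    return by_id, by_category, by_domain, by_tag


# Placeholder facts for illustration only; not real entries from the index.
facts = [
    {"id": "global:tool-quirk:001", "fact": "Description",
     "category": "tool-quirk", "domain": "global",
     "confidence": 0.95, "tags": ["tag1", "tag2"]},
    {"id": "hermes-agent:pitfall:001", "fact": "Placeholder",
     "category": "pitfall", "domain": "hermes-agent", "confidence": 0.7},
]
by_id, by_category, by_domain, by_tag = build_lookups(facts)
print(by_domain["hermes-agent"])
print(by_tag["tag1"])
```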


index.json Schema

{
  "version": 1,
  "last_updated": "ISO-8601 timestamp",
  "total_facts": 0,
  "facts": []
}
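A minimal sketch of producing this top-level object in Python. It assumes that `total_facts` is meant to equal the length of `facts` and that `last_updated` is written in UTC; neither is stated above, and `make_index` is a hypothetical name.

```python
import json
from datetime import datetime, timezone


def make_index(facts):
    """Wrap a list of fact objects in the top-level index.json structure."""
    return {
        "version": 1,
        # Assumption: timestamps are UTC; the schema only says ISO-8601.
        "last_updated": datetime.now(timezone.utc).isoformat(),
        # Assumption: total_facts mirrors the number of entries in facts.
        "total_facts": len(facts),
        "facts": list(facts),
    }


index = make_index([])
print(json.dumps(index, indent=2))
```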

Fact Object

| Field | Type | Required | Description |
|---|---|---|---|
| id | string | yes | Unique identifier: {domain}:{category}:{sequence} |
| fact | string | yes | One-sentence description of the knowledge |
| category | enum | yes | One of: fact, pitfall, pattern, tool-quirk, question |
| domain | string | yes | Where this applies: repo name, global, or agent name |
| confidence | float | yes | 0.0-1.0. How certain is this knowledge? |
| tags | string[] | no | Searchable labels |
| source_count | int | no | How many sessions confirmed this fact |
| first_seen | date | no | ISO-8601 date first extracted |
| last_confirmed | date | no | ISO-8601 date last seen in a session |
| expires | date | no | Optional. After this date, fact is stale |
| related | string[] | no | IDs of related facts |

ID Format: {domain}:{category}:{sequence}, for example global:tool-quirk:001
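The ID format can be parsed by splitting on the colons. The sketch below makes three assumptions that the spec does not state: domains never contain a colon, the sequence is all digits, and new IDs are zero-padded to three digits as in global:tool-quirk:001. The helper names are hypothetical.

```python
CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}


def parse_fact_id(fact_id):
    """Split '{domain}:{category}:{sequence}' into its three parts."""
    parts = fact_id.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected 3 colon-separated parts: {fact_id!r}")
    domain, category, sequence = parts
    if not domain:
        raise ValueError(f"empty domain: {fact_id!r}")
    if category not in CATEGORIES:
        raise ValueError(f"unknown category {category!r} in {fact_id!r}")
    # Assumption: the sequence is a run of ASCII digits.
    if not (sequence.isascii() and sequence.isdigit()):
        raise ValueError(f"non-numeric sequence in {fact_id!r}")
    return domain, category, int(sequence)


def format_fact_id(domain, category, sequence):
    """Build an ID; three-digit zero padding is assumed from the example."""
    return f"{domain}:{category}:{sequence:03d}"


print(parse_fact_id("global:tool-quirk:001"))
print(format_fact_id("hermes-agent", "pitfall", 2))
```

Hyphens in the domain or category (hermes-agent, tool-quirk) are harmless here because only the colon acts as a separator.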

Categories

| Category | Definition |
|---|---|
| fact | Concrete, verifiable information |
| pitfall | Errors, wrong assumptions, time-wasters |
| pattern | Successful sequences of actions |
| tool-quirk | Environment-specific behaviors |
| question | Identified but unanswered |
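Taken together, the Fact Object fields and the category list suggest a per-fact check along the following lines. This is a sketch under stated assumptions, not the logic of scripts/validate_knowledge.py: `validate_fact` is a hypothetical name, and it assumes date fields are stored as ISO-8601 strings.

```python
from datetime import date

CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}
REQUIRED = ("id", "fact", "category", "domain", "confidence")
DATE_FIELDS = ("first_seen", "last_confirmed", "expires")


def validate_fact(fact):
    """Return a list of problems; an empty list means the fact passed."""
    errors = []
    for field in REQUIRED:
        if field not in fact:
            errors.append(f"missing required field: {field}")
    if "category" in fact and fact["category"] not in CATEGORIES:
        errors.append(f"unknown category: {fact['category']!r}")
    conf = fact.get("confidence")
    if conf is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(conf, (int, float)) and not isinstance(conf, bool)
        if not is_number or not 0.0 <= conf <= 1.0:
            errors.append("confidence must be a number in 0.0-1.0")
    for field in DATE_FIELDS:
        if field in fact:
            try:
                date.fromisoformat(fact[field])
            except (TypeError, ValueError):
                errors.append(f"{field} is not an ISO-8601 date")
    return errors


example = {
    "id": "global:tool-quirk:001", "fact": "Description",
    "category": "tool-quirk", "domain": "global",
    "confidence": 0.95, "first_seen": "2026-03-27",
}
print(validate_fact(example))
print(validate_fact({"id": "global:fact:001", "confidence": 1.5}))
```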

Confidence Scoring

| Range | Meaning |
|---|---|
| 0.9-1.0 | Explicitly stated and verified |
| 0.7-0.8 | Clearly implied by multiple data points |
| 0.5-0.6 | Suggested but not fully verified |
| 0.3-0.4 | Inferred from limited data |
| 0.1-0.2 | Speculative or uncertain |
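A score can be mapped back to its meaning with a threshold lookup. The table leaves two things open, so this sketch assumes them: a score between bands (such as 0.85) takes the band whose lower bound it meets, and scores below 0.1 fall into the lowest band. `confidence_meaning` is a hypothetical helper.

```python
# (lower bound, meaning), ordered from highest band to lowest.
BANDS = [
    (0.9, "Explicitly stated and verified"),
    (0.7, "Clearly implied by multiple data points"),
    (0.5, "Suggested but not fully verified"),
    (0.3, "Inferred from limited data"),
    (0.0, "Speculative or uncertain"),  # assumption: covers 0.0-0.2
]


def confidence_meaning(score):
    """Return the meaning of the first band whose lower bound is met."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence out of range: {score}")
    for lower, meaning in BANDS:
        if score >= lower:
            return meaning


print(confidence_meaning(0.95))
print(confidence_meaning(0.4))
```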

Directory Structure

knowledge/
├── index.json                  # Machine-readable fact index
├── SCHEMA.md                   # This file
├── global/                     # Cross-repo knowledge
│   ├── pitfalls.yaml
│   ├── patterns.yaml
│   └── tool-quirks.yaml
├── repos/                      # Per-repo knowledge
│   ├── {repo-name}.yaml
│   └── ...
└── agents/                     # Agent-type knowledge
    └── {agent-type}.yaml
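The layout above can be walked with pathlib to list the YAML files in each layer. This is a sketch only: `knowledge_files` is a hypothetical helper, and the demo builds a throwaway tree in a temporary directory rather than touching a real checkout.

```python
import tempfile
from pathlib import Path

LAYERS = ("global", "repos", "agents")


def knowledge_files(root):
    """Map each layer directory to the sorted names of its YAML files."""
    root = Path(root)
    return {
        layer: sorted(p.name for p in (root / layer).glob("*.yaml"))
        for layer in LAYERS
    }


# Demo against a temporary tree that mimics part of the documented layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "knowledge"
    for rel in ("global/pitfalls.yaml", "global/tool-quirks.yaml",
                "repos/hermes-agent.yaml", "repos/the-nexus.yaml"):
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    found = knowledge_files(root)

print(found)
```

A layer directory that does not exist yet simply yields an empty list, which suits a tree that is populated over time.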

YAML File Format

Each YAML file opens with a frontmatter block of metadata, followed by Markdown-style headings and a list of fact entries:

---
domain: global
category: tool-quirk
version: 1
last_updated: "2026-04-13"
---

# Title

## Section

- id: global:tool-quirk:001
  fact: "Description"
  confidence: 0.95
  tags: [tag1, tag2]
  source_count: 5
  first_seen: "2026-03-27"
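To a YAML parser, the two `---` lines are document markers and the `#` heading lines are comments, so a file in this layout reads as a two-document stream: a metadata mapping, then a list of facts. The sketch below separates the two parts using only the standard library. It is deliberately naive: `split_frontmatter` is a hypothetical helper, it handles only flat `key: value` frontmatter, and it leaves every value as a string. A real YAML library would be the better tool for the fact entries themselves.

```python
SAMPLE = """\
---
domain: global
category: tool-quirk
version: 1
last_updated: "2026-04-13"
---

# Title

## Section

- id: global:tool-quirk:001
  fact: "Description"
  confidence: 0.95
"""


def split_frontmatter(text):
    """Return (metadata dict, body text) for a frontmatter-style file."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter present
    try:
        end = lines.index("---", 1)  # closing marker
    except ValueError:
        raise ValueError("frontmatter is not closed by a second '---' line")
    meta = {}
    for line in lines[1:end]:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip().strip('"')
    return meta, "\n".join(lines[end + 1:])


meta, body = split_frontmatter(SAMPLE)
print(meta)
```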

Validation

Run python scripts/validate_knowledge.py to validate index.json.
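The checks the script performs are not listed here. Purely as an illustration of index-level checks the schema implies, the sketch below verifies that `total_facts` matches the list length (an assumption about that field), that IDs are unique, and that every `related` ID resolves. `check_index` is a hypothetical function, not part of scripts/validate_knowledge.py.

```python
def check_index(index):
    """Return a list of index-level problems; empty means consistent."""
    errors = []
    facts = index.get("facts", [])
    if index.get("total_facts") != len(facts):
        errors.append(
            f"total_facts is {index.get('total_facts')} but facts has {len(facts)} entries"
        )
    seen = set()
    for fact in facts:
        fact_id = fact.get("id")
        if fact_id in seen:
            errors.append(f"duplicate id: {fact_id}")
        seen.add(fact_id)
    # Second pass, so a fact may reference one that appears later in the list.
    for fact in facts:
        for related_id in fact.get("related", []):
            if related_id not in seen:
                errors.append(f"{fact.get('id')}: related id not found: {related_id}")
    return errors


# Placeholder index for illustration only.
index = {
    "version": 1,
    "last_updated": "2026-04-13T00:00:00+00:00",
    "total_facts": 2,
    "facts": [
        {"id": "global:pitfall:001", "related": ["global:tool-quirk:001"]},
        {"id": "global:tool-quirk:001"},
    ],
}
print(check_index(index))
```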