- SCHEMA.md: full specification for index.json and YAML knowledge files
- knowledge/global/pitfalls.yaml: 8 cross-repo pitfalls
- knowledge/global/tool-quirks.yaml: 7 environment quirk facts
- knowledge/repos/hermes-agent.yaml: 8 per-repo pitfalls (cron, paths, SSH)
- knowledge/repos/the-nexus.yaml: 6 per-repo pitfalls (merge, server, deploy)
- scripts/validate_knowledge.py: schema validator (29 facts, all passing)
- knowledge/index.json: populated with 29 seed facts from real fleet data

Design decisions:
- YAML for humans, index.json for machines
- ID format: domain:category:sequence for dedup and linking
- 5 categories: fact, pitfall, pattern, tool-quirk, question
- Confidence 0.0-1.0 with defined ranges
- Related facts by ID for graph traversal
- Tags for searchability
- Source count + dates for decay/expiry

Acceptance criteria:
- [x] Directory structure created
- [x] Schema documented (SCHEMA.md)
- [x] index.json with real facts (29 total)
- [x] Example knowledge files for 2 repos (hermes-agent, the-nexus)
- [x] Validation script passes
Knowledge File Format Specification
Version: 1; Issue: #10; Status: Draft
Overview
The knowledge system has two layers:
- index.json — Machine-readable fact index. Fast lookups by ID, category, domain (repo, global, or agent), and tags.
- Knowledge files (YAML) — Human-readable, editable facts organized by domain.
The harvester writes to both. The bootstrapper reads from index.json. Humans edit the YAML files directly.
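As a rough illustration of the lookups index.json is meant to support, the sketch below builds in-memory tables keyed by ID, category, domain, and tag from the facts array. The class name FactIndex and its query method are hypothetical; they are not part of the harvester or bootstrapper described here.

```python
from collections import defaultdict


class FactIndex:
    """Hypothetical in-memory lookup tables over the `facts` array of index.json."""

    def __init__(self, facts: list[dict]):
        # Assumes every fact carries the required id, category, and domain fields.
        self.by_id = {f["id"]: f for f in facts}
        self._by = {"category": defaultdict(set), "domain": defaultdict(set), "tag": defaultdict(set)}
        for f in facts:
            self._by["category"][f["category"]].add(f["id"])
            self._by["domain"][f["domain"]].add(f["id"])
            for tag in f.get("tags", []):  # tags are optional
                self._by["tag"][tag].add(f["id"])

    def query(self, **filters: str) -> list[dict]:
        """Return facts matching every filter (category=, domain=, tag=), sorted by ID.

        An unsupported filter name raises KeyError.
        """
        ids = set(self.by_id)
        for key, value in filters.items():
            # .get() avoids inserting empty entries into the defaultdict
            ids &= self._by[key].get(value, set())
        return [self.by_id[i] for i in sorted(ids)]
```

Intersecting ID sets keeps combined queries such as query(domain="hermes-agent", tag="ssh") cheap without scanning every fact.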
index.json Schema
{
"version": 1,
"last_updated": "ISO-8601 timestamp",
"total_facts": 0,
"facts": []
}
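A minimal sketch of producing this envelope in Python, assuming total_facts is meant to equal the length of facts and that last_updated is written in UTC. The function names build_index and write_index are illustrative, not taken from the harvester.

```python
import json
from datetime import datetime, timezone


def build_index(facts: list[dict]) -> dict:
    """Wrap a list of fact objects in the top-level index.json envelope."""
    return {
        "version": 1,
        # ISO-8601 timestamp in UTC, e.g. "2026-04-13T09:30:00+00:00"
        "last_updated": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        # Assumption: total_facts always mirrors the number of entries in facts
        "total_facts": len(facts),
        "facts": facts,
    }


def write_index(path: str, facts: list[dict]) -> None:
    """Serialize the envelope to disk as pretty-printed UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(build_index(facts), fh, indent=2, ensure_ascii=False)
        fh.write("\n")
```

Note that write_index overwrites the file; merging with facts already on disk is left out of the sketch.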
Fact Object
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | yes | Unique identifier in the form {domain}:{category}:{sequence} |
| fact | string | yes | One-sentence description of the knowledge |
| category | enum | yes | One of: fact, pitfall, pattern, tool-quirk, question |
| domain | string | yes | Where the fact applies: a repo name, global, or an agent name |
| confidence | float | yes | 0.0–1.0; how certain the knowledge is (see Confidence Scoring) |
| tags | string[] | no | Searchable labels |
| source_count | int | no | Number of sessions that confirmed the fact |
| first_seen | date | no | ISO-8601 date the fact was first extracted |
| last_confirmed | date | no | ISO-8601 date the fact was last seen in a session |
| expires | date | no | ISO-8601 date after which the fact is considered stale |
| related | string[] | no | IDs of related facts |
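The table translates fairly directly into per-fact checks. The sketch below is an assumption about what such checks could look like, not the logic of scripts/validate_knowledge.py. It also assumes that date fields are plain YYYY-MM-DD strings and that source_count must be non-negative.

```python
from datetime import date

CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}
# Required fields and the Python types their JSON values decode to.
REQUIRED = {"id": str, "fact": str, "category": str, "domain": str, "confidence": (int, float)}
DATE_FIELDS = ("first_seen", "last_confirmed", "expires")


def _is_str_list(value) -> bool:
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def validate_fact(fact: dict) -> list[str]:
    """Return human-readable problems with one fact; an empty list means it passes."""
    errors = []
    for field, expected in REQUIRED.items():
        if field not in fact:
            errors.append(f"missing required field: {field}")
        elif isinstance(fact[field], bool) or not isinstance(fact[field], expected):
            # bool is a subclass of int, so reject it explicitly
            errors.append(f"{field} has wrong type: {type(fact[field]).__name__}")

    if isinstance(fact.get("category"), str) and fact["category"] not in CATEGORIES:
        errors.append(f"unknown category: {fact['category']}")

    conf = fact.get("confidence")
    if isinstance(conf, (int, float)) and not isinstance(conf, bool) and not 0.0 <= conf <= 1.0:
        errors.append(f"confidence out of range [0.0, 1.0]: {conf}")

    for field in ("tags", "related"):
        if field in fact and not _is_str_list(fact[field]):
            errors.append(f"{field} must be a list of strings")

    count = fact.get("source_count")
    if "source_count" in fact and (isinstance(count, bool) or not isinstance(count, int) or count < 0):
        errors.append(f"source_count must be a non-negative integer: {count!r}")

    for field in DATE_FIELDS:
        if field in fact:
            try:
                date.fromisoformat(fact[field])
            except (TypeError, ValueError):
                errors.append(f"{field} is not an ISO-8601 date: {fact[field]!r}")
    return errors
```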
ID Format: {domain}:{category}:{sequence}, for example global:tool-quirk:001.
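One way to split and generate IDs in this format is sketched below. It assumes that domain and category are lowercase tokens that may contain hyphens, and that the sequence is a decimal number zero-padded to three digits, as in the global:tool-quirk:001 example. parse_fact_id and next_id are hypothetical helpers.

```python
import re

# Assumed token rules: lowercase letters, digits, and hyphens; numeric sequence.
ID_RE = re.compile(r"(?P<domain>[a-z0-9][a-z0-9-]*):(?P<category>[a-z][a-z-]*):(?P<sequence>\d+)")


def parse_fact_id(fact_id: str) -> tuple[str, str, int]:
    """Split 'domain:category:sequence' into its parts, raising ValueError if malformed."""
    m = ID_RE.fullmatch(fact_id)
    if m is None:
        raise ValueError(f"malformed fact id: {fact_id!r}")
    return m["domain"], m["category"], int(m["sequence"])


def next_id(domain: str, category: str, existing_ids: list[str]) -> str:
    """Return the next unused ID for a domain/category pair, zero-padded to 3 digits."""
    used = [seq for d, c, seq in map(parse_fact_id, existing_ids) if (d, c) == (domain, category)]
    return f"{domain}:{category}:{max(used, default=0) + 1:03d}"
```

Taking max + 1 rather than filling gaps means a deleted fact's ID is never reused, which matters if other facts still list it under related.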
Categories
| Category | Definition |
|---|---|
| fact | Concrete, verifiable information |
| pitfall | Errors, wrong assumptions, and time-wasters |
| pattern | Successful sequences of actions |
| tool-quirk | Environment-specific behaviors |
| question | Questions identified but not yet answered |
Confidence Scoring
| Range | Meaning |
|---|---|
| 0.9–1.0 | Explicitly stated and verified |
| 0.7–0.8 | Clearly implied by multiple data points |
| 0.5–0.6 | Suggested but not fully verified |
| 0.3–0.4 | Inferred from limited data |
| 0.1–0.2 | Speculative or uncertain |
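Scores such as 0.85 fall between the listed ranges. The sketch below assumes two rules that this spec does not state: each score maps to the highest band whose lower bound it meets, and scores below 0.1 count as speculative.

```python
# Lower bound of each band in the table, highest first.
BANDS = [
    (0.9, "Explicitly stated and verified"),
    (0.7, "Clearly implied by multiple data points"),
    (0.5, "Suggested but not fully verified"),
    (0.3, "Inferred from limited data"),
    (0.1, "Speculative or uncertain"),
]


def confidence_band(score: float) -> str:
    """Map a confidence score to the meaning of its band in the table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence must be within [0.0, 1.0]: {score}")
    for lower, meaning in BANDS:
        if score >= lower:
            return meaning
    # Assumption: the table stops at 0.1, so lower scores are treated as speculative.
    return BANDS[-1][1]
```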
Directory Structure
knowledge/
├── index.json # Machine-readable fact index
├── SCHEMA.md # This file
├── global/ # Cross-repo knowledge
│ ├── pitfalls.yaml
│ ├── patterns.yaml
│ └── tool-quirks.yaml
├── repos/ # Per-repo knowledge
│ ├── {repo-name}.yaml
│ └── ...
└── agents/ # Agent-type knowledge
└── {agent-type}.yaml
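A small sketch that gathers the YAML files in each layer of this layout, for example so a tool could load every human-edited file in a fixed order. The function name and the sorted ordering are assumptions. index.json and SCHEMA.md are skipped because they are not layer files.

```python
from pathlib import Path

# The three layers of the layout, in the order they are listed.
LAYERS = ("global", "repos", "agents")


def knowledge_files(root) -> dict[str, list[Path]]:
    """Map each layer name to its *.yaml files, sorted by path; missing layers map to []."""
    base = Path(root)
    result = {}
    for layer in LAYERS:
        folder = base / layer
        result[layer] = sorted(folder.glob("*.yaml")) if folder.is_dir() else []
    return result
```

Run from the repository root, knowledge_files("knowledge")["repos"] would return the per-repo files such as hermes-agent.yaml and the-nexus.yaml.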
YAML File Format
YAML files use frontmatter for metadata, then markdown sections with fact entries:
---
domain: global
category: tool-quirk
version: 1
last_updated: "2026-04-13"
---
# Title
## Section
- id: global:tool-quirk:001
  fact: "Description"
  confidence: 0.95
  tags: [tag1, tag2]
  source_count: 5
  first_seen: "2026-03-27"
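These files mix YAML frontmatter with markdown sections, so a loader has to separate the two before parsing. The sketch below handles only flat key: value frontmatter and finds fact IDs with a regular expression. A real loader would more likely pass the frontmatter to a YAML library such as PyYAML's yaml.safe_load. split_frontmatter and fact_ids are hypothetical names.

```python
import re

# Matches list entries like "- id: global:tool-quirk:001" anywhere in the body.
ID_LINE = re.compile(r"^\s*-\s+id:\s*(\S+)\s*$", re.MULTILINE)


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a knowledge file into (flat frontmatter dict, markdown body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("file must start with a '---' frontmatter line")
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("unterminated frontmatter") from None
    meta = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"bad frontmatter line: {line!r}")
        # Values stay strings; surrounding double quotes are stripped.
        meta[key.strip()] = value.strip().strip('"')
    return meta, "\n".join(lines[end + 1:])


def fact_ids(body: str) -> list[str]:
    """Return the IDs of all fact entries in the markdown body, in file order."""
    return ID_LINE.findall(body)
```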
Validation
Run python scripts/validate_knowledge.py to validate index.json.
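Some rules span more than one fact and cannot be checked field by field. IDs must be unique for deduplication, related entries should point at facts that exist, and total_facts should match the facts array. The sketch below shows those checks on a loaded index. This spec does not say whether scripts/validate_knowledge.py performs them, so treat check_index as a hypothetical complement.

```python
import json
from collections import Counter


def check_index(index: dict) -> list[str]:
    """Cross-fact checks on a loaded index.json; an empty list means no problems found."""
    facts = index.get("facts", [])
    errors = []
    if index.get("total_facts") != len(facts):
        errors.append(f"total_facts is {index.get('total_facts')} but {len(facts)} facts are listed")
    counts = Counter(f.get("id") for f in facts)
    for fact_id, n in counts.items():
        if n > 1:
            errors.append(f"duplicate id: {fact_id} ({n} times)")
    known = set(counts)
    for f in facts:
        for rel in f.get("related", []):
            if rel not in known:
                errors.append(f"{f.get('id')}: related id {rel} not found")
    return errors


def check_index_file(path: str) -> list[str]:
    """Load index.json from disk and run check_index on it."""
    with open(path, encoding="utf-8") as fh:
        return check_index(json.load(fh))
```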