# Knowledge File Format Specification

**Version:** 1
**Issue:** #10
**Status:** Draft

---

## Overview

The knowledge system has two layers:

1. **index.json**: Machine-readable fact index. Fast lookups by ID, category, repo, tags.
2. **Knowledge files** (YAML): Human-readable, editable facts organized by domain.

The harvester writes to both. The bootstrapper reads from index.json. Humans edit the YAML files directly.

---

## index.json Schema

```json
{
  "version": 1,
  "last_updated": "ISO-8601 timestamp",
  "total_facts": 0,
  "facts": []
}
```

### Fact Object

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `id` | string | yes | Unique identifier: `{domain}:{category}:{sequence}` |
| `fact` | string | yes | One-sentence description of the knowledge |
| `category` | enum | yes | One of: `fact`, `pitfall`, `pattern`, `tool-quirk`, `question` |
| `domain` | string | yes | Where this applies: repo name, `global`, or agent name |
| `confidence` | float | yes | 0.0–1.0. How certain is this knowledge? |
| `tags` | string[] | no | Searchable labels |
| `source_count` | int | no | How many sessions confirmed this fact |
| `first_seen` | date | no | ISO-8601 date first extracted |
| `last_confirmed` | date | no | ISO-8601 date last seen in a session |
| `expires` | date | no | Optional. After this date, fact is stale |
| `related` | string[] | no | IDs of related facts |

### ID Format: `{domain}:{category}:{sequence}`

### Categories

| Category | Definition |
|----------|------------|
| `fact` | Concrete, verifiable information |
| `pitfall` | Errors, wrong assumptions, time-wasters |
| `pattern` | Successful sequences of actions |
| `tool-quirk` | Environment-specific behaviors |
| `question` | Identified but unanswered |

### Confidence Scoring

| Range | Meaning |
|-------|---------|
| 0.9–1.0 | Explicitly stated and verified |
| 0.7–0.8 | Clearly implied by multiple data points |
| 0.5–0.6 | Suggested but not fully verified |
| 0.3–0.4 | Inferred from limited data |
| 0.1–0.2 | Speculative or uncertain |

---

## Directory Structure

```
knowledge/
├── index.json           # Machine-readable fact index
├── SCHEMA.md            # This file
├── global/              # Cross-repo knowledge
│   ├── pitfalls.yaml
│   ├── patterns.yaml
│   └── tool-quirks.yaml
├── repos/               # Per-repo knowledge
│   ├── {repo-name}.yaml
│   └── ...
└── agents/              # Agent-type knowledge
    └── {agent-type}.yaml
```

## YAML File Format

YAML files use frontmatter for metadata, then markdown sections with fact entries:

```yaml
---
domain: global
category: tool-quirk
version: 1
last_updated: "2026-04-13"
---

# Title

## Section

- id: global:tool-quirk:001
  fact: "Description"
  confidence: 0.95
  tags: [tag1, tag2]
  source_count: 5
  first_seen: "2026-03-27"
```

## Validation

Run `python scripts/validate_knowledge.py` to validate index.json.
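
The checks that `scripts/validate_knowledge.py` actually performs are not listed in this document. As a rough sketch only, the schema rules above (required fields, category enum, confidence range, unique IDs) could be checked along the following lines. The function names `validate_fact` and `validate_index` are hypothetical, and three of the checks are assumptions rather than stated rules: that an ID begins with the fact's own `domain` and `category`, that `{sequence}` is all digits, and that `total_facts` equals the length of `facts`.

```python
import re

CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}
REQUIRED = ("id", "fact", "category", "domain", "confidence")


def validate_fact(fact: dict) -> list[str]:
    """Return the problems found in one fact object (empty list if none)."""
    errors = [f"missing required field: {name}" for name in REQUIRED if name not in fact]
    if errors:
        return errors

    if fact["category"] not in CATEGORIES:
        errors.append(f"unknown category: {fact['category']}")

    confidence = fact["confidence"]
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        errors.append("confidence must be a number")
    elif not 0.0 <= confidence <= 1.0:
        errors.append(f"confidence out of range: {confidence}")

    # Assumption: the ID repeats the fact's domain and category,
    # and the sequence part consists of digits only (e.g. "001").
    prefix = f"{fact['domain']}:{fact['category']}:"
    fact_id = fact["id"]
    if not (
        isinstance(fact_id, str)
        and fact_id.startswith(prefix)
        and re.fullmatch(r"[0-9]+", fact_id[len(prefix):])
    ):
        errors.append(f"id does not match {prefix}<sequence>: {fact_id}")
    return errors


def validate_index(index: dict) -> list[str]:
    """Validate every fact in an index.json document and check ID uniqueness."""
    facts = index.get("facts", [])
    errors = []
    seen = set()
    for position, fact in enumerate(facts):
        errors += [f"facts[{position}]: {problem}" for problem in validate_fact(fact)]
        fact_id = fact.get("id")
        if isinstance(fact_id, str):
            if fact_id in seen:
                errors.append(f"facts[{position}]: duplicate id: {fact_id}")
            seen.add(fact_id)
    # Assumption: total_facts is meant to equal the number of entries in facts.
    if index.get("total_facts") != len(facts):
        errors.append("total_facts does not match the number of facts")
    return errors


example_index = {
    "version": 1,
    "last_updated": "2026-04-13",
    "total_facts": 1,
    "facts": [
        {
            "id": "global:tool-quirk:001",
            "fact": "Description",
            "category": "tool-quirk",
            "domain": "global",
            "confidence": 0.95,
        }
    ],
}
print(validate_index(example_index))  # prints [] for this index
```

Returning a list of messages instead of raising on the first problem lets a single run report every invalid fact, which suits a file that humans and the harvester both modify.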