# Knowledge File Format Specification

**Version:** 1
**Issue:** #10
**Status:** Draft

---

## Overview

The knowledge system has two layers:

1. **index.json**: Machine-readable fact index for fast lookups by ID, category, repo, or tags.
2. **Knowledge files** (YAML): Human-readable, editable facts organized by domain.

The harvester writes to both layers. The bootstrapper reads from index.json. Humans edit the YAML files directly.

---

## index.json Schema

```json
{
  "version": 1,
  "last_updated": "ISO-8601 timestamp",
  "total_facts": 0,
  "facts": []
}
```
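
As an illustration of the skeleton above, here is a minimal Python sketch that builds an index document with the same four top-level keys. The helper name `new_index` is hypothetical, and keeping `total_facts` equal to the length of `facts` is an assumption, since the schema does not state that rule explicitly.

```python
import json
from datetime import datetime, timezone


def new_index(facts=None):
    """Build an index document with the four top-level keys of the schema."""
    facts = list(facts or [])
    return {
        "version": 1,
        # ISO-8601 timestamp in UTC, e.g. 2026-04-13T09:30:00+00:00
        "last_updated": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        # Assumption: total_facts mirrors the number of entries in "facts".
        "total_facts": len(facts),
        "facts": facts,
    }


if __name__ == "__main__":
    print(json.dumps(new_index(), indent=2))
```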

### Fact Object

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `id` | string | yes | Unique identifier: `{domain}:{category}:{sequence}` |
| `fact` | string | yes | One-sentence description of the knowledge |
| `category` | enum | yes | One of: `fact`, `pitfall`, `pattern`, `tool-quirk`, `question` |
| `domain` | string | yes | Where this applies: repo name, `global`, or agent name |
| `confidence` | float | yes | 0.0–1.0. How certain is this knowledge? |
| `tags` | string[] | no | Searchable labels |
| `source_count` | int | no | How many sessions confirmed this fact |
| `first_seen` | date | no | ISO-8601 date first extracted |
| `last_confirmed` | date | no | ISO-8601 date last seen in a session |
| `expires` | date | no | Date after which the fact is considered stale |
| `related` | string[] | no | IDs of related facts |
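
The lookups by ID, category, repo, and tags mentioned in the overview could be served by a few in-memory dictionaries built from the `facts` array. The sketch below is one possible approach, not the bootstrapper's actual code; `build_lookups` is a hypothetical helper and the sample values mirror the placeholder entry used in the YAML example.

```python
from collections import defaultdict


def build_lookups(facts):
    """Group fact objects by ID, category, domain, and tag."""
    by_id = {}
    by_category = defaultdict(list)
    by_domain = defaultdict(list)
    by_tag = defaultdict(list)
    for fact in facts:
        by_id[fact["id"]] = fact
        by_category[fact["category"]].append(fact)
        by_domain[fact["domain"]].append(fact)
        # `tags` is optional, so fall back to an empty list.
        for tag in fact.get("tags", []):
            by_tag[tag].append(fact)
    return by_id, by_category, by_domain, by_tag


facts = [
    {
        "id": "global:tool-quirk:001",
        "fact": "Description",
        "category": "tool-quirk",
        "domain": "global",
        "confidence": 0.95,
        "tags": ["tag1", "tag2"],
    },
]
by_id, by_category, by_domain, by_tag = build_lookups(facts)
print(by_id["global:tool-quirk:001"]["confidence"])  # 0.95
print(len(by_tag["tag1"]))  # 1
```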

### ID Format: `{domain}:{category}:{sequence}`
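
An ID following this pattern looks like `global:tool-quirk:001`. The sketch below shows one way to split such an ID into its three parts. `parse_fact_id` is a hypothetical helper, and two details are assumptions rather than rules from this specification: the sequence is treated as numeric (as in `001`), and the split is taken from the right so that only the last two colons act as separators.

```python
CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}


def parse_fact_id(fact_id):
    """Split '{domain}:{category}:{sequence}' into its three parts."""
    # Split from the right so only the last two colons act as separators.
    parts = fact_id.rsplit(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected domain:category:sequence, got {fact_id!r}")
    domain, category, sequence = parts
    if category not in CATEGORIES:
        raise ValueError(f"unknown category {category!r} in {fact_id!r}")
    # Assumption: the sequence is numeric, as in the example ID '001'.
    if not sequence.isdigit():
        raise ValueError(f"non-numeric sequence {sequence!r} in {fact_id!r}")
    return domain, category, sequence


print(parse_fact_id("global:tool-quirk:001"))  # ('global', 'tool-quirk', '001')
```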

### Categories

| Category | Definition |
|----------|------------|
| `fact` | Concrete, verifiable information |
| `pitfall` | Errors, wrong assumptions, time-wasters |
| `pattern` | Successful sequences of actions |
| `tool-quirk` | Environment-specific behaviors |
| `question` | Identified but unanswered |

### Confidence Scoring

| Range | Meaning |
|-------|---------|
| 0.9–1.0 | Explicitly stated and verified |
| 0.7–0.8 | Clearly implied by multiple data points |
| 0.5–0.6 | Suggested but not fully verified |
| 0.3–0.4 | Inferred from limited data |
| 0.1–0.2 | Speculative or uncertain |
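
The bands above can be turned into a small lookup. This is a sketch under two assumptions that the table does not settle: a score that falls between two listed ranges (for example 0.85) is assigned to the band below it, and scores under 0.1 are treated as speculative. `confidence_label` is a hypothetical helper name.

```python
# Lower bound of each band in the table, highest first.
BANDS = [
    (0.9, "Explicitly stated and verified"),
    (0.7, "Clearly implied by multiple data points"),
    (0.5, "Suggested but not fully verified"),
    (0.3, "Inferred from limited data"),
    (0.1, "Speculative or uncertain"),
]


def confidence_label(score):
    """Map a confidence score in [0.0, 1.0] to the meaning of its band."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence must be between 0.0 and 1.0, got {score}")
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    # Assumption: anything below 0.1 is treated like the lowest band.
    return "Speculative or uncertain"


print(confidence_label(0.95))  # Explicitly stated and verified
print(confidence_label(0.5))   # Suggested but not fully verified
```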

---

## Directory Structure

```
knowledge/
├── index.json              # Machine-readable fact index
├── SCHEMA.md               # This file
├── global/                 # Cross-repo knowledge
│   ├── pitfalls.yaml
│   ├── patterns.yaml
│   └── tool-quirks.yaml
├── repos/                  # Per-repo knowledge
│   ├── {repo-name}.yaml
│   └── ...
└── agents/                 # Agent-type knowledge
    └── {agent-type}.yaml
```

## YAML File Format

YAML files use frontmatter for metadata, then markdown sections with fact entries:

```yaml
---
domain: global
category: tool-quirk
version: 1
last_updated: "2026-04-13"
---

# Title

## Section

- id: global:tool-quirk:001
  fact: "Description"
  confidence: 0.95
  tags: [tag1, tag2]
  source_count: 5
  first_seen: "2026-03-27"
```
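
To show how the frontmatter can be separated from the rest of a knowledge file, here is a standard-library sketch. It handles only flat `key: value` lines and keeps every value as a string; the fact entries themselves would need a full YAML parser. `split_frontmatter` is a hypothetical helper, not part of the harvester or bootstrapper.

```python
def split_frontmatter(text):
    """Return (metadata, body) for a file that starts with a '---' block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("file does not start with a frontmatter block")
    try:
        # Index of the line that closes the frontmatter block.
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("frontmatter block is not closed") from None
    metadata = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        # Values stay strings; surrounding double quotes are dropped.
        metadata[key.strip()] = value.strip().strip('"')
    body = "\n".join(lines[end + 1:])
    return metadata, body


sample = '''---
domain: global
category: tool-quirk
version: 1
last_updated: "2026-04-13"
---

# Title
'''
meta, body = split_frontmatter(sample)
print(meta["domain"], meta["category"], meta["version"])  # global tool-quirk 1
```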

## Validation

Run `python scripts/validate_knowledge.py` to validate index.json.
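
The checks performed by `scripts/validate_knowledge.py` are not listed in this document. As a rough illustration only, the sketch below applies the constraints implied by the tables above: required fields, unique IDs, known categories, and a confidence between 0.0 and 1.0. The function `validate_index` is hypothetical, and the rule that `total_facts` equals the number of entries is an assumption.

```python
CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}
REQUIRED = ("id", "fact", "category", "domain", "confidence")


def validate_index(index):
    """Return a list of problems found in a parsed index.json document."""
    errors = []
    facts = index.get("facts", [])
    # Assumption: total_facts must equal the number of fact objects.
    if index.get("total_facts") != len(facts):
        errors.append("total_facts does not match the number of facts")
    seen = set()
    for i, fact in enumerate(facts):
        missing = [key for key in REQUIRED if key not in fact]
        if missing:
            errors.append(f"facts[{i}]: missing {', '.join(missing)}")
            continue
        if fact["id"] in seen:
            errors.append(f"facts[{i}]: duplicate id {fact['id']}")
        seen.add(fact["id"])
        if fact["category"] not in CATEGORIES:
            errors.append(f"facts[{i}]: unknown category {fact['category']}")
        confidence = fact["confidence"]
        is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
        if not is_number or not 0.0 <= confidence <= 1.0:
            errors.append(f"facts[{i}]: confidence must be between 0.0 and 1.0")
    return errors


good = {
    "version": 1,
    "last_updated": "2026-04-13",
    "total_facts": 1,
    "facts": [
        {
            "id": "global:tool-quirk:001",
            "fact": "Description",
            "category": "tool-quirk",
            "domain": "global",
            "confidence": 0.95,
        }
    ],
}
print(validate_index(good))  # []
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in one run.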