Timmy_Foundation/compounding-intelligence


Timmy b32d316023 feat(#10): knowledge file format schema + example knowledge files

- SCHEMA.md: full specification for index.json and YAML knowledge files
- knowledge/global/pitfalls.yaml: 8 cross-repo pitfalls
- knowledge/global/tool-quirks.yaml: 7 environment quirk facts
- knowledge/repos/hermes-agent.yaml: 8 per-repo pitfalls (cron, paths, SSH)
- knowledge/repos/the-nexus.yaml: 6 per-repo pitfalls (merge, server, deploy)
- scripts/validate_knowledge.py: schema validator (29 facts, all passing)
- knowledge/index.json: populated with 29 seed facts from real fleet data

Design decisions:
- YAML for humans, index.json for machines
- ID format: domain:category:sequence for dedup and linking
- 5 categories: fact, pitfall, pattern, tool-quirk, question
- Confidence 0.0-1.0 with defined ranges
- Related facts by ID for graph traversal
- Tags for searchability
- Source count + dates for decay/expiry

Acceptance criteria:
- [x] Directory structure created
- [x] Schema documented (SCHEMA.md)
- [x] index.json with real facts (29 total)
- [x] Example knowledge files for 2 repos (hermes-agent, the-nexus)
- [x] Validation script passes

2026-04-14 14:21:21 -04:00


Knowledge File Format Specification

Version: 1
Issue: #10
Status: Draft

Overview

The knowledge system has two layers:

index.json — Machine-readable fact index. Fast lookups by ID, category, repo, tags.
Knowledge files (YAML) — Human-readable, editable facts organized by domain.

The harvester writes to both. The bootstrapper reads from index.json. Humans edit the YAML files directly.
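
The lookups by ID, category, repo, and tags mentioned above could be served from a few in-memory maps built once after loading index.json. The following is a minimal sketch, not the bootstrapper's actual code: `build_lookups` is a hypothetical helper, the sample facts are invented for illustration, and the field names follow the Fact Object table later in this document (the `domain` field carries the repo name).

```python
from collections import defaultdict


def build_lookups(facts):
    """Group fact dicts by id, category, domain, and tag (hypothetical helper)."""
    by_id = {}
    by_category = defaultdict(list)
    by_domain = defaultdict(list)
    by_tag = defaultdict(list)
    for fact in facts:
        by_id[fact["id"]] = fact
        by_category[fact["category"]].append(fact["id"])
        by_domain[fact["domain"]].append(fact["id"])
        for tag in fact.get("tags", []):  # tags are optional in the schema
            by_tag[tag].append(fact["id"])
    return by_id, dict(by_category), dict(by_domain), dict(by_tag)


# Invented sample facts, for illustration only.
facts = [
    {"id": "global:tool-quirk:001", "fact": "Example quirk",
     "category": "tool-quirk", "domain": "global",
     "confidence": 0.95, "tags": ["shell"]},
    {"id": "hermes-agent:pitfall:001", "fact": "Example pitfall",
     "category": "pitfall", "domain": "hermes-agent", "confidence": 0.8},
]

by_id, by_category, by_domain, by_tag = build_lookups(facts)
print(by_domain["hermes-agent"])
```

The category, domain, and tag maps store IDs rather than full fact objects, so `by_id` remains the single place where a fact's content lives.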

index.json Schema

{
  "version": 1,
  "last_updated": "ISO-8601 timestamp",
  "total_facts": 0,
  "facts": []
}
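
As a rough illustration of how a writer might keep the top-level fields consistent, the sketch below creates an empty index and appends a fact. It assumes that `total_facts` should always equal the length of `facts` and that `last_updated` is a UTC timestamp; neither rule is stated in this specification, and `new_index` and `add_fact` are hypothetical names, not the harvester's API.

```python
import json
from datetime import datetime, timezone


def _now():
    """Current UTC time as an ISO-8601 string (assumed timestamp convention)."""
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


def new_index():
    """Return an empty index with the four top-level fields from the schema."""
    return {"version": 1, "last_updated": _now(), "total_facts": 0, "facts": []}


def add_fact(index, fact):
    """Append a fact, rejecting duplicate IDs, and refresh the counters."""
    if any(existing["id"] == fact["id"] for existing in index["facts"]):
        raise ValueError(f"duplicate fact id: {fact['id']}")
    index["facts"].append(fact)
    index["total_facts"] = len(index["facts"])  # assumption: kept in sync
    index["last_updated"] = _now()
    return index


index = new_index()
add_fact(index, {"id": "global:fact:001", "fact": "Example", "category": "fact",
                 "domain": "global", "confidence": 0.9})
print(json.dumps(index, indent=2))
```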

Fact Object

Field	Type	Required	Description
`id`	string	yes	Unique identifier: `{domain}:{category}:{sequence}`
`fact`	string	yes	One-sentence description of the knowledge
`category`	enum	yes	One of: `fact`, `pitfall`, `pattern`, `tool-quirk`, `question`
`domain`	string	yes	Where this applies: repo name, `global`, or agent name
`confidence`	float	yes	0.0–1.0. How certain is this knowledge?
`tags`	string[]	no	Searchable labels
`source_count`	int	no	How many sessions confirmed this fact
`first_seen`	date	no	ISO-8601 date first extracted
`last_confirmed`	date	no	ISO-8601 date last seen in a session
`expires`	date	no	Optional. After this date, fact is stale
`related`	string[]	no	IDs of related facts
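
The `expires` field lends itself to a simple staleness check. The sketch below assumes dates are stored as `YYYY-MM-DD` strings and that a fact without `expires` never goes stale; `is_stale` is a hypothetical helper and not part of the repository.

```python
from datetime import date


def is_stale(fact, today):
    """Return True if the fact has an `expires` date earlier than `today`."""
    expires = fact.get("expires")
    if expires is None:
        return False  # assumption: no expiry means the fact never goes stale
    return date.fromisoformat(expires) < today


fact = {"id": "global:fact:001", "expires": "2026-01-01"}
print(is_stale(fact, date(2026, 1, 2)))
```

The fact is still treated as current on the expiry date itself, matching the table's wording that it becomes stale after that date.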

ID Format: `{domain}:{category}:{sequence}`

Category	Definition
`fact`	Concrete, verifiable information
`pitfall`	Errors, wrong assumptions, time-wasters
`pattern`	Successful sequences of actions
`tool-quirk`	Environment-specific behaviors
`question`	Identified but unanswered
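
An ID in this format can be split and checked against the category list with a short regular expression. This is a sketch under two assumptions that the specification does not state: the sequence is purely numeric (as in the `001` example later in this document), and domain names never contain a colon. `parse_fact_id` is a hypothetical name.

```python
import re

CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}

# Assumed shape: no colons inside the domain, digits only in the sequence.
ID_PATTERN = re.compile(
    r"^(?P<domain>[^:]+):(?P<category>[^:]+):(?P<sequence>\d+)$"
)


def parse_fact_id(fact_id):
    """Split an ID into (domain, category, sequence) or raise ValueError."""
    match = ID_PATTERN.match(fact_id)
    if match is None:
        raise ValueError(f"malformed fact id: {fact_id!r}")
    domain, category, sequence = match.group("domain", "category", "sequence")
    if category not in CATEGORIES:
        raise ValueError(f"unknown category in id: {category!r}")
    return domain, category, int(sequence)


print(parse_fact_id("global:tool-quirk:001"))
```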

Confidence Scoring

Range	Meaning
0.9–1.0	Explicitly stated and verified
0.7–0.8	Clearly implied by multiple data points
0.5–0.6	Suggested but not fully verified
0.3–0.4	Inferred from limited data
0.1–0.2	Speculative or uncertain
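
The table leaves some values unassigned, for example anything between 0.8 and 0.9 or below 0.1. One possible reading, shown here as an assumption rather than as the specification's rule, treats each band's lower bound as a threshold, so 0.85 falls into the 0.7 band. `confidence_band` is a hypothetical helper.

```python
# (lower bound, meaning), highest band first. Meanings are taken from the table.
BANDS = [
    (0.9, "Explicitly stated and verified"),
    (0.7, "Clearly implied by multiple data points"),
    (0.5, "Suggested but not fully verified"),
    (0.3, "Inferred from limited data"),
    (0.1, "Speculative or uncertain"),
]


def confidence_band(confidence):
    """Map a confidence score to its band, or None below the lowest band."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    for lower_bound, meaning in BANDS:
        if confidence >= lower_bound:
            return meaning
    return None  # the table defines no band below 0.1


print(confidence_band(0.95))
```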

Directory Structure

knowledge/
├── index.json                  # Machine-readable fact index
├── SCHEMA.md                   # This file
├── global/                     # Cross-repo knowledge
│   ├── pitfalls.yaml
│   ├── patterns.yaml
│   └── tool-quirks.yaml
├── repos/                      # Per-repo knowledge
│   ├── {repo-name}.yaml
│   └── ...
└── agents/                     # Agent-type knowledge
    └── {agent-type}.yaml
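
Given this layout, a tool could discover knowledge files by scanning the three subdirectories. The sketch below is illustrative only: `list_knowledge_files` is a hypothetical helper, and the demo builds a throwaway directory tree rather than touching a real checkout.

```python
import tempfile
from pathlib import Path

SCOPES = ("global", "repos", "agents")


def list_knowledge_files(root):
    """Return {scope: sorted YAML file names} for each known subdirectory."""
    root = Path(root)
    found = {}
    for scope in SCOPES:
        directory = root / scope
        if directory.is_dir():
            found[scope] = sorted(p.name for p in directory.glob("*.yaml"))
        else:
            found[scope] = []  # a missing subdirectory is treated as empty
    return found


# Build a temporary tree that mirrors part of the layout above.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "knowledge"
    (root / "global").mkdir(parents=True)
    (root / "global" / "pitfalls.yaml").touch()
    (root / "repos").mkdir()
    (root / "repos" / "hermes-agent.yaml").touch()
    files = list_knowledge_files(root)

print(files)
```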

YAML File Format

Each YAML file opens with a frontmatter block of metadata, followed by Markdown-style headings that group the fact entries:

---
domain: global
category: tool-quirk
version: 1
last_updated: "2026-04-13"
---

# Title

## Section

- id: global:tool-quirk:001
  fact: "Description"
  confidence: 0.95
  tags: [tag1, tag2]
  source_count: 5
  first_seen: "2026-03-27"
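
To show how the two parts of such a file separate, the sketch below splits the frontmatter from the body using only the standard library. It is deliberately simplified: it handles flat `key: value` lines, keeps every value as a string, and does not parse the fact entries. A full YAML parser would be the sturdier choice for real use. `split_frontmatter` is a hypothetical helper.

```python
def split_frontmatter(text):
    """Split a knowledge file into (metadata dict, body string)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("file does not start with a frontmatter delimiter")
    # Find the closing delimiter, skipping the opening one at index 0.
    end = next(
        (i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"),
        None,
    )
    if end is None:
        raise ValueError("frontmatter block is not closed")
    metadata = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        metadata[key.strip()] = value.strip().strip('"')  # values stay strings
    body = "\n".join(lines[end + 1:])
    return metadata, body


SAMPLE = """---
domain: global
category: tool-quirk
version: 1
last_updated: "2026-04-13"
---

# Title

- id: global:tool-quirk:001
  fact: "Description"
"""

metadata, body = split_frontmatter(SAMPLE)
print(metadata)
```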

Validation

Run `python scripts/validate_knowledge.py` to validate `index.json`.
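
The contents of `scripts/validate_knowledge.py` are not reproduced here. As an illustration of the kind of per-fact checks the Fact Object table implies, a validator might look roughly like the sketch below. `validate_fact` is a hypothetical function, and the rule that an ID must begin with the fact's own domain and category is inferred from the ID format rather than stated outright.

```python
REQUIRED_FIELDS = ("id", "fact", "category", "domain", "confidence")
CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}


def validate_fact(fact):
    """Return a list of problems found in one fact dict (empty means valid)."""
    errors = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if name not in fact
    ]
    category = fact.get("category")
    if category is not None and category not in CATEGORIES:
        errors.append(f"unknown category: {category!r}")
    confidence = fact.get("confidence")
    if confidence is not None:
        is_number = isinstance(confidence, (int, float)) and not isinstance(
            confidence, bool
        )
        if not is_number or not 0.0 <= confidence <= 1.0:
            errors.append("confidence must be a number between 0.0 and 1.0")
    # Inferred rule: the ID starts with "{domain}:{category}:".
    has_parts = all(name in fact for name in ("id", "domain", "category"))
    if has_parts and category in CATEGORIES:
        prefix = f"{fact['domain']}:{category}:"
        if not str(fact["id"]).startswith(prefix):
            errors.append(f"id {fact['id']!r} does not start with {prefix!r}")
    return errors


good = {"id": "global:tool-quirk:001", "fact": "Description",
        "category": "tool-quirk", "domain": "global", "confidence": 0.95}
bad = {"id": "global:pitfall:002", "fact": "Description",
       "category": "gotcha", "domain": "global", "confidence": 1.5}

print(validate_fact(good))
print(validate_fact(bad))
```

Returning a list of messages instead of raising on the first problem lets a caller report every issue in a file in one pass.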
