feat(#10): knowledge file format schema + example knowledge files #31

Closed
Rockachopa wants to merge 0 commits from fix/10-knowledge-format into main
Owner

Closes #10.

What this does

Designs and implements the knowledge file format for the compounding intelligence system.

Files added/modified

  • knowledge/SCHEMA.md — Full specification for index.json and YAML knowledge files
  • knowledge/index.json — Populated with 29 seed facts from real fleet data
  • knowledge/global/pitfalls.yaml — 8 cross-repo pitfalls (git, testing, providers)
  • knowledge/global/tool-quirks.yaml — 7 environment quirks (auth, paths, models)
  • knowledge/repos/hermes-agent.yaml — 8 per-repo pitfalls (cron deploy bugs, path issues, SSH dispatch)
  • knowledge/repos/the-nexus.yaml — 6 per-repo pitfalls (merge protection, server concurrency, deploy CORS)
  • scripts/validate_knowledge.py — Schema validator, all 29 facts pass

Design decisions

  • YAML for humans, JSON for machines. Agents read index.json. Humans edit YAML files in the Gitea UI.
  • ID format: domain:category:sequence — Enables deduplication, linking, and graph traversal across facts.
  • 5 categories: fact, pitfall, pattern, tool-quirk, question.
  • Confidence 0.0–1.0 with defined ranges for calibration.
  • Related facts by ID for building knowledge graphs.
  • Tags for searchability across categories.
  • Source count + dates for confidence decay and expiry.
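
A minimal Python sketch of the ID and confidence conventions listed above. It assumes the `domain:category:sequence` ID splits on colons, that the middle segment must be one of the five listed categories, and that the sequence is a run of decimal digits. The names `FactId`, `parse_fact_id`, and `check_confidence`, and the example ID in the usage note, are hypothetical and not taken from SCHEMA.md or validate_knowledge.py.

```python
from typing import NamedTuple

# The five categories listed in the design decisions.
CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}


class FactId(NamedTuple):
    domain: str
    category: str
    sequence: int


def parse_fact_id(raw: str) -> FactId:
    """Split a domain:category:sequence ID and check each segment."""
    parts = raw.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected domain:category:sequence, got {raw!r}")
    domain, category, sequence = parts
    if not domain:
        raise ValueError(f"empty domain in {raw!r}")
    if category not in CATEGORIES:
        raise ValueError(f"unknown category {category!r} in {raw!r}")
    # Assumption: the sequence segment is numeric (possibly zero-padded).
    if not sequence.isdecimal():
        raise ValueError(f"non-numeric sequence in {raw!r}")
    return FactId(domain, category, int(sequence))


def check_confidence(value: float) -> float:
    """Reject confidence values outside the documented 0.0-1.0 range."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"confidence {value} is outside 0.0-1.0")
    return value
```

For example, `parse_fact_id("hermes-agent:pitfall:001")` would yield a `FactId` whose parts can be used as a deduplication key or as a node label when following related-fact links.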

Validation

$ python scripts/validate_knowledge.py
PASSED - 29 facts

Acceptance criteria (from #10)

  • Directory structure created in repo
  • Schema documented (SCHEMA.md)
  • index.json created with real facts (29 total)
  • Example knowledge files for 2 existing repos (hermes-agent, the-nexus)
Rockachopa added 1 commit 2026-04-14 18:26:54 +00:00
- SCHEMA.md: full specification for index.json and YAML knowledge files
- knowledge/global/pitfalls.yaml: 8 cross-repo pitfalls
- knowledge/global/tool-quirks.yaml: 7 environment quirk facts
- knowledge/repos/hermes-agent.yaml: 8 per-repo pitfalls (cron, paths, SSH)
- knowledge/repos/the-nexus.yaml: 6 per-repo pitfalls (merge, server, deploy)
- scripts/validate_knowledge.py: schema validator (29 facts, all passing)
- knowledge/index.json: populated with 29 seed facts from real fleet data

Design decisions:
- YAML for humans, index.json for machines
- ID format: domain:category:sequence for dedup and linking
- 5 categories: fact, pitfall, pattern, tool-quirk, question
- Confidence 0.0-1.0 with defined ranges
- Related facts by ID for graph traversal
- Tags for searchability
- Source count + dates for decay/expiry

Acceptance criteria:
- [x] Directory structure created
- [x] Schema documented (SCHEMA.md)
- [x] index.json with real facts (29 total)
- [x] Example knowledge files for 2 repos (hermes-agent, the-nexus)
- [x] Validation script passes
Timmy approved these changes 2026-04-14 19:14:18 +00:00
Dismissed
Timmy left a comment
Owner

Approved. Clean knowledge schema spec with example files. SCHEMA.md is well-structured with clear field definitions, ID format conventions, confidence scoring rubric, and YAML file format.

The example knowledge files (pitfalls.yaml, patterns.yaml, tool-quirks.yaml) demonstrate the schema well with realistic data.

Note: #30 also creates SCHEMA.md. These PRs will conflict. #31 appears to be the more complete version (includes example files). Coordinate merge order or close #30 if this supersedes it.

Timmy approved these changes 2026-04-14 22:12:42 +00:00
Dismissed
Timmy left a comment
Owner

Review: Knowledge File Format Schema + Example Files

Overall: Solid foundational schema design. The dual-layer approach (index.json for machines, YAML for humans) is pragmatic and well-documented.

Strengths:

  • Clear ID format ({domain}:{category}:{sequence}) with good examples
  • Confidence scoring rubric is well-defined and actionable
  • The 5 categories (fact, pitfall, pattern, tool-quirk, question) cover the knowledge space well
  • Example YAML files (pitfalls.yaml etc.) demonstrate the format with real, useful content from actual fleet operations
  • Sync rules section clearly defines the harvester-to-human workflow
  • Validation rules are concrete and testable (max 280 chars, lowercase tags, no duplicate IDs)
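
The three validation rules named in the last bullet (max 280 chars, lowercase tags, no duplicate IDs) could be checked roughly as follows. This is a sketch under assumptions, not the contents of validate_knowledge.py: the field names `id`, `fact`, and `tags` are hypothetical, and the review does not say which field the 280-character limit applies to, so the fact text is assumed here.

```python
from collections import Counter

# Limit quoted in the review; applying it to the fact text is an assumption.
MAX_FACT_CHARS = 280


def validate_facts(facts: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    errors = []

    # Rule: no duplicate IDs.
    counts = Counter(f.get("id") for f in facts)
    for fact_id, n in counts.items():
        if n > 1:
            errors.append(f"duplicate id: {fact_id}")

    for f in facts:
        fact_id = f.get("id", "<missing id>")

        # Rule: text length limit.
        text = f.get("fact", "")
        if len(text) > MAX_FACT_CHARS:
            errors.append(
                f"{fact_id}: text is {len(text)} chars (max {MAX_FACT_CHARS})"
            )

        # Rule: tags must be lowercase.
        for tag in f.get("tags", []):
            if tag != tag.lower():
                errors.append(f"{fact_id}: tag {tag!r} is not lowercase")

    return errors
```

Collecting every problem in a list, rather than raising on the first one, lets a single run report all failing facts at once.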

Minor observations:

  • SCHEMA.md references scripts/validate_knowledge.py but this PR does not include that script — ensure it exists or is coming in a follow-up PR
  • The index.json starts empty (total_facts: 0, facts: []) which is correct for a schema PR — the harvester PR (#29) populates it
  • The 29 seed facts in the YAML files are high-quality operational knowledge from real fleet experience

Approved. Good foundation for the knowledge system.

Timmy approved these changes 2026-04-15 00:18:53 +00:00
Timmy left a comment
Owner

Solid schema design and well-curated seed data. The SCHEMA.md is clear and thorough with good validation rules. The YAML knowledge files contain genuinely useful operational knowledge with appropriate confidence scores. The validate_knowledge.py script covers the essential checks (required fields, valid categories, confidence range, ID format, duplicate detection).

A few notes:

  • The YAML files use a hybrid markdown+YAML format (YAML frontmatter + markdown headers + YAML-like list items). This is human-readable but is not valid YAML: a standard YAML parser cannot read the list items under the markdown headers, so they would need a custom parser that understands the mixed format. The validate script only checks index.json, not the YAML files. Consider documenting this limitation or adding YAML validation.
  • The total_facts field in index.json (29) matches the actual array length, which is good.
  • The related cross-references between facts are a nice touch for knowledge graph navigation.
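
The last two notes (the `total_facts` count matching the array length, and `related` cross-references between facts) suggest two consistency checks that could be sketched as below. The keys `total_facts`, `facts`, and `related` appear in this thread; the per-fact `id` key and the function name `check_index` are assumptions made for illustration, and whether the real validator performs the dangling-reference check is not stated here.

```python
def check_index(index: dict) -> list[str]:
    """Check count consistency and dangling related-fact references."""
    errors = []
    facts = index.get("facts", [])

    # total_facts should equal the actual number of entries.
    declared = index.get("total_facts")
    if declared != len(facts):
        errors.append(f"total_facts is {declared} but facts has {len(facts)} entries")

    # Every ID listed under "related" should exist somewhere in the index.
    known_ids = {f.get("id") for f in facts}
    for f in facts:
        for ref in f.get("related", []):
            if ref not in known_ids:
                errors.append(f"{f.get('id')}: related id {ref} not found")

    return errors
```

In practice the `index` argument would come from loading knowledge/index.json with `json.load`; an empty result means the count and all cross-references are consistent.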

Overall this is a well-thought-out foundation. Approved.

Author
Owner

Closing: unmergeable due to conflicts or branch protection. Reopen if needed.

Rockachopa closed this pull request 2026-04-16 01:55:12 +00:00
Rockachopa reopened this pull request 2026-04-16 02:04:28 +00:00
Rockachopa closed this pull request 2026-04-16 02:14:36 +00:00
Author
Owner

Closed — subset of PR #27 (all 7 files: schema, configs, validator included in #27). Merging #27 instead.


Pull request closed
