Commit Graph

64 Commits

Author SHA1 Message Date
160dfcf419 feat: add session_metadata.py — structured session metadata extractor (#6) 2026-04-14 19:06:16 +00:00
8d716ff03f Add comprehensive test script for harvest prompt validation 2026-04-14 19:02:41 +00:00
920510996e Add test session 5: Session with questions 2026-04-14 19:01:03 +00:00
1fafeaf5a4 Add test session 4: Session with patterns 2026-04-14 19:01:00 +00:00
36b440f998 Add test session 3: Partial session with tool quirks 2026-04-14 19:00:58 +00:00
9f3caabf42 Add test session 2: Failed session with pitfalls 2026-04-14 19:00:56 +00:00
a21f3a44e1 Add test session 1: Successful session 2026-04-14 18:58:05 +00:00
Timmy
b32d316023 feat(#10): knowledge file format schema + example knowledge files
- SCHEMA.md: full specification for index.json and YAML knowledge files
- knowledge/global/pitfalls.yaml: 8 cross-repo pitfalls
- knowledge/global/tool-quirks.yaml: 7 environment quirk facts
- knowledge/repos/hermes-agent.yaml: 8 per-repo pitfalls (cron, paths, SSH)
- knowledge/repos/the-nexus.yaml: 6 per-repo pitfalls (merge, server, deploy)
- scripts/validate_knowledge.py: schema validator (29 facts, all passing)
- knowledge/index.json: populated with 29 seed facts from real fleet data

Design decisions:
- YAML for humans, index.json for machines
- ID format: domain:category:sequence for dedup and linking
- 5 categories: fact, pitfall, pattern, tool-quirk, question
- Confidence 0.0-1.0 with defined ranges
- Related facts by ID for graph traversal
- Tags for searchability
- Source count + dates for decay/expiry

Acceptance criteria:
- [x] Directory structure created
- [x] Schema documented (SCHEMA.md)
- [x] index.json with real facts (29 total)
- [x] Example knowledge files for 2 repos (hermes-agent, the-nexus)
- [x] Validation script passes
2026-04-14 14:21:21 -04:00
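
The design decisions in b32d316023 (ID format `domain:category:sequence`, five categories, confidence in 0.0-1.0) can be illustrated with a minimal validation sketch. This is not the contents of scripts/validate_knowledge.py, which are not shown here. The field names `id` and `confidence`, the regular expression, and the helper name `validate_fact` are assumptions made for illustration.

```python
import re

# The five categories named in the commit message.
CATEGORIES = {"fact", "pitfall", "pattern", "tool-quirk", "question"}

# Assumption: domain is lowercase letters, digits, and hyphens; sequence is digits.
ID_RE = re.compile(
    r"^(?P<domain>[a-z0-9][a-z0-9-]*):(?P<category>[a-z-]+):(?P<sequence>\d+)$"
)


def validate_fact(fact):
    """Return a list of error strings; an empty list means the fact passed."""
    errors = []

    match = ID_RE.match(str(fact.get("id", "")))
    if match is None:
        errors.append("id must look like domain:category:sequence")
    elif match.group("category") not in CATEGORIES:
        errors.append("unknown category: " + match.group("category"))

    conf = fact.get("confidence")
    # bool is a subclass of int in Python, so reject it explicitly.
    is_number = isinstance(conf, (int, float)) and not isinstance(conf, bool)
    if not is_number or not 0.0 <= conf <= 1.0:
        errors.append("confidence must be a number in [0.0, 1.0]")

    return errors
```

Returning a list of errors rather than raising on the first problem lets a validator report every issue in a file in one pass.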
Timmy
b65256bf76 feat: build bootstrapper.py - pre-session context assembler
Assembles relevant knowledge from the store into a compact 2k-token
context block for session injection.

Features:
- Filter by repo, agent type, and global scope
- Sort by confidence (pitfalls first, patterns, facts)
- Per-repo and per-agent markdown knowledge files
- Graceful empty-store handling
- JSON output mode for programmatic use
- Token-count-aware truncation at line boundaries

Closes #11
2026-04-14 14:05:30 -04:00
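
Two of the features listed for b65256bf76, the sort order and the token-aware truncation at line boundaries, can be sketched as follows. This is a hedged illustration and not the code in bootstrapper.py. It assumes that "pitfalls first, patterns, facts" means category priority with descending confidence inside each category, and it estimates tokens at roughly four characters each. The names `order_facts`, `estimate_tokens`, and `truncate_at_line_boundary` are hypothetical.

```python
# Assumed category priority: lower number sorts first.
CATEGORY_ORDER = {"pitfall": 0, "pattern": 1, "fact": 2}


def order_facts(facts):
    """Sort by category priority, then by descending confidence."""
    return sorted(
        facts,
        key=lambda f: (
            CATEGORY_ORDER.get(f["category"], len(CATEGORY_ORDER)),
            -f["confidence"],
        ),
    )


def estimate_tokens(text):
    """Rough heuristic (an assumption): about four characters per token."""
    return (len(text) + 3) // 4


def truncate_at_line_boundary(lines, max_tokens=2000):
    """Keep whole lines until the next one would exceed the token budget."""
    kept = []
    used = 0
    for line in lines:
        cost = estimate_tokens(line + "\n")
        if used + cost > max_tokens:
            break  # never cut a line in half
        kept.append(line)
        used += cost
    return "\n".join(kept)
```

Because the facts are ordered before truncation, whatever falls off the end of the budget is the lowest-priority material.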
Alexander Whitestone
da073ad7cf feat: add harvester.py — session knowledge extractor (#8)
Main harvester module that chains:
  session_reader → extraction prompt → LLM → validate → deduplicate → store

Includes:
- scripts/harvester.py — main module (reader + prompt + storage pipeline)
- scripts/session_reader.py — JSONL transcript parser
- scripts/test_harvester_pipeline.py — smoke tests (all passing)

Pipeline:
  1. Read session JSONL via session_reader
  2. Truncate long sessions (first 50 + last 50 messages)
  3. Send transcript + extraction prompt to LLM (mimo-v2-pro)
  4. Parse structured JSON response (facts/pitfalls/patterns/quirks/questions)
  5. Validate fields + confidence threshold
  6. Deduplicate against knowledge/index.json (fingerprint + word overlap)
  7. Write to knowledge store (index.json + per-repo markdown)

CLI:
  Single:  python3 harvester.py --session <path> --output knowledge/
  Batch:   python3 harvester.py --batch --since 2026-04-01 --limit 100
  Dry-run: python3 harvester.py --session <path> --dry-run
2026-04-14 14:03:30 -04:00
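
Steps 2 and 6 of the pipeline described in da073ad7cf (keep the first 50 and last 50 messages; deduplicate by fingerprint plus word overlap) can be sketched as below. This is an illustration under stated assumptions, not the code in harvester.py: the fingerprint is taken to be a SHA-256 hash of the normalized text, word overlap is taken to be Jaccard similarity over word sets, and the 0.8 threshold and all function names are hypothetical.

```python
import hashlib
import re


def truncate_messages(messages, head=50, tail=50):
    """Keep the first `head` and last `tail` messages of a long session."""
    if len(messages) <= head + tail:
        return list(messages)
    return list(messages[:head]) + list(messages[len(messages) - tail:])


def _words(text):
    # Lowercase and drop punctuation so trivial edits do not defeat matching.
    return re.findall(r"[a-z0-9]+", text.lower())


def fingerprint(text):
    """Hash of the normalized text; catches exact duplicates after cleanup."""
    return hashlib.sha256(" ".join(_words(text)).encode("utf-8")).hexdigest()


def word_overlap(a, b):
    """Jaccard similarity of the two word sets, in [0.0, 1.0]."""
    wa, wb = set(_words(a)), set(_words(b))
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def is_duplicate(candidate, existing, threshold=0.8):
    """True if candidate matches any existing fact exactly or nearly."""
    fp = fingerprint(candidate)
    return any(
        fp == fingerprint(e) or word_overlap(candidate, e) >= threshold
        for e in existing
    )
```

The fingerprint check handles exact repeats cheaply, while the overlap check catches reworded versions of the same fact; the threshold trades missed duplicates against false merges.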
102ef67a8e Add test script for knowledge extraction prompt 2026-04-14 17:22:17 +00:00
d9f51b30a9 Add knowledge extraction prompt template for issue #7 2026-04-14 17:21:25 +00:00
Alexander Whitestone
b5873e9e3d Initial structure: knowledge store, scripts, metrics, templates 2026-04-14 11:17:01 -04:00
8252ef5b80 Initial commit 2026-04-14 15:11:53 +00:00