[EVAL] Live memory-provider snapshot matrix — training data pipeline #276

Open
Rockachopa wants to merge 1 commits from step35/199-feat-training-data-pipeline into main

1 Commits

Author SHA1 Message Date
Step35
86eb1c9a50 feat: training data pipeline — knowledge entries → JSONL training pairs
Some checks failed
Test / pytest (pull_request) Failing after 7s
Add scripts/knowledge_to_training_pairs.py which reads quality-gated
knowledge entries from knowledge/index.json and emits terse→rich
training pairs in JSONL format.

Features:
- Derives terse queries from facts via category-aware heuristics
- Configurable quality filters: min-confidence, model-filter, date range
- Output includes domain, source_confidence, source_model
- Smoke tests added in tests/test_knowledge_to_training_pairs.py

Deliverables for #199:
1. Pipeline script: scripts/knowledge_to_training_pairs.py
2. End-to-end: knowledge/index.json → training_pairs.jsonl (or custom JSONL)
3. Config: min-confidence, model-filter, after/before date filters
4. Test: 9 smoke tests covering conversion, filtering, and end-to-end run

Closes #199
2026-04-26 13:03:06 -04:00