Step35
86eb1c9a50
Test / pytest (pull_request) Failing after 7s
feat: training data pipeline — knowledge entries → JSONL training pairs
Add scripts/knowledge_to_training_pairs.py, which reads quality-gated
knowledge entries from knowledge/index.json and emits terse→rich
training pairs in JSONL format.
Features:
- Derives terse queries from facts via category-aware heuristics
- Configurable quality filters: min-confidence, model-filter, date range
- Output includes domain, source_confidence, source_model
- Smoke tests added in tests/test_knowledge_to_training_pairs.py
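
The conversion and filtering described above might look roughly like the sketch below. It is not the script's actual code: the entry fields (`fact`, `category`, `confidence`, `model`, `date`), the `terse`/`rich` output keys, the query templates, and every function name are assumptions made for illustration. Only the output fields `domain`, `source_confidence`, and `source_model` come from the commit message.

```python
import json
from datetime import date

# Hypothetical category -> query templates; the real heuristics are not shown here.
TEMPLATES = {
    "pitfall": "pitfalls: {topic}",
    "pattern": "pattern for {topic}",
}


def terse_query(entry):
    """Derive a short query from a fact, using its category to pick a template."""
    # Illustrative heuristic: the first four words of the fact become the topic.
    topic = " ".join(entry["fact"].split()[:4]).rstrip(".,;:").lower()
    template = TEMPLATES.get(entry.get("category"), "{topic}?")
    return template.format(topic=topic)


def keep(entry, min_confidence=0.0, model=None, after=None, before=None):
    """Apply the quality filters: confidence floor, model match, date range."""
    if entry.get("confidence", 0.0) < min_confidence:
        return False
    if model is not None and entry.get("model") != model:
        return False
    if after is not None or before is not None:
        # Assumes ISO dates (YYYY-MM-DD); entries without a date are dropped.
        raw = entry.get("date")
        if raw is None:
            return False
        entry_date = date.fromisoformat(raw)
        if after is not None and entry_date < after:
            return False
        if before is not None and entry_date > before:
            return False
    return True


def to_pairs(entries, **filters):
    """Yield one training pair per entry that passes the filters."""
    for entry in entries:
        if keep(entry, **filters):
            yield {
                "terse": terse_query(entry),
                "rich": entry["fact"],
                "domain": entry.get("domain", "unknown"),
                "source_confidence": entry.get("confidence"),
                "source_model": entry.get("model"),
            }


def to_jsonl(pairs):
    """Serialize pairs as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(pair, ensure_ascii=False) for pair in pairs)
```

Filtering before conversion keeps the heuristics from running on entries that would be discarded anyway, and yielding pairs lazily lets the output be streamed to a file line by line.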
Deliverables for #199:
1. Pipeline script: scripts/knowledge_to_training_pairs.py
2. End-to-end: knowledge/index.json → training_pairs.jsonl (or custom JSONL)
3. Config: min-confidence, model-filter, after/before date filters
4. Test: 9 smoke tests covering conversion, filtering, and end-to-end run
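
A consumer-side check on the emitted JSONL could be sketched as follows. This is not one of the nine smoke tests, whose contents are not shown here. The helper name `load_pairs` is hypothetical, and the required keys are taken from the output fields listed under Features.

```python
import json

# Output fields named in the commit message.
REQUIRED_KEYS = {"domain", "source_confidence", "source_model"}


def load_pairs(jsonl_text):
    """Parse JSONL text and verify each record carries the required fields."""
    records = []
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines such as a trailing newline
        record = json.loads(line)
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"line {lineno}: missing {sorted(missing)}")
        records.append(record)
    return records
```

Reporting the line number on failure makes a malformed record easy to locate in a large training_pairs.jsonl file.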
Closes #199
2026-04-26 13:03:06 -04:00