Alexander Whitestone
8e791afecc
feat: backfill provenance on all training data (#752)
Architecture Lint / Linter Tests (pull_request) Successful in 21s
Smoke Test / smoke (pull_request) Failing after 22s
Validate Config / YAML Lint (pull_request) Failing after 16s
Validate Config / JSON Validate (pull_request) Successful in 14s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 33s
Validate Config / Cron Syntax Check (pull_request) Successful in 12s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 12s
Validate Config / Shell Script Lint (pull_request) Failing after 54s
Validate Config / Playbook Schema Validation (pull_request) Successful in 17s
PR Checklist / pr-checklist (pull_request) Successful in 2m25s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
scripts/backfill_training_provenance.py:
- Backfills provenance metadata on all JSONL training files
- Adds source_session_id, model, timestamp, source_type
- Supports --dry-run mode, --json output, and parse error handling
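
For illustration, the backfill step described above might look roughly like the sketch below. It is not the code in scripts/backfill_training_provenance.py: the function names, the `defaults` mapping, and the choice to keep unparseable lines untouched are assumptions. Only the four field names come from this commit message.

```python
import json

# Field names taken from the commit message; everything else is hypothetical.
PROVENANCE_FIELDS = ("source_session_id", "model", "timestamp", "source_type")


def backfill_line(line, defaults):
    """Add any missing provenance fields to one JSONL record.

    Returns (serialized_record, changed). Raises ValueError on input that is
    not a JSON object (json.JSONDecodeError is a subclass of ValueError).
    """
    record = json.loads(line)
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object per line")
    changed = False
    for field in PROVENANCE_FIELDS:
        if field not in record:
            record[field] = defaults[field]
            changed = True
    return json.dumps(record, ensure_ascii=False), changed


def backfill_lines(lines, defaults, dry_run=False):
    """Backfill an iterable of JSONL lines; in dry-run mode only count changes."""
    out = []
    stats = {"total": 0, "updated": 0, "parse_errors": 0}
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        try:
            new_line, changed = backfill_line(stripped, defaults)
        except ValueError:
            stats["parse_errors"] += 1
            out.append(stripped)  # assumption: leave bad lines as they are
            continue
        stats["total"] += 1
        stats["updated"] += int(changed)
        out.append(stripped if dry_run else new_line)
    return out, stats
```

Existing values are never overwritten in this sketch, so a record that already names its model keeps it.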
Result: 11,007 pairs across 45 files now have provenance
Coverage: 0% -> 100%
Validation: python3 scripts/provenance_validate.py --threshold 50
PASS: 3800/3800 pairs have provenance
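
A rough sketch of a threshold check of this kind follows. It assumes that --threshold is a minimum coverage percentage and that a pair "has provenance" when all four fields are present and non-empty. Both are assumptions, and `coverage` and `check` are hypothetical names, not the functions in scripts/provenance_validate.py.

```python
import json

REQUIRED = ("source_session_id", "model", "timestamp", "source_type")


def coverage(lines):
    """Count (pairs_with_provenance, total_pairs) over JSONL lines."""
    total = with_prov = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # assumption: unparseable lines are not counted as pairs
        if not isinstance(record, dict):
            continue
        total += 1
        if all(record.get(field) for field in REQUIRED):
            with_prov += 1
    return with_prov, total


def check(lines, threshold):
    """Return (status, message); threshold is treated as a percentage."""
    with_prov, total = coverage(lines)
    pct = 100.0 * with_prov / total if total else 0.0
    status = "PASS" if pct >= threshold else "FAIL"
    return status, f"{status}: {with_prov}/{total} pairs have provenance"
```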
Dashboard: python3 scripts/provenance_dashboard.py
Shows pair count by model, source, coverage
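
The per-model and per-source counts could be computed along these lines. This is a sketch only: `summarize` and the "(missing)" bucket are hypothetical, and the actual dashboard script may group or render the data differently.

```python
import json
from collections import Counter


def summarize(lines):
    """Return (pairs_by_model, pairs_by_source_type) for JSONL lines."""
    by_model, by_source = Counter(), Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # assumption: skip lines that do not parse
        if not isinstance(record, dict):
            continue
        # str() keeps the keys hashable even if a field holds a list or dict.
        by_model[str(record.get("model", "(missing)"))] += 1
        by_source[str(record.get("source_type", "(missing)"))] += 1
    return by_model, by_source
```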
2026-04-18 15:59:17 -04:00