feat(training): add Timmy voice batch 09 dataset (#589)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 23s
Smoke Test / smoke (pull_request) Failing after 22s
Validate Config / YAML Lint (pull_request) Failing after 16s
Validate Config / JSON Validate (pull_request) Successful in 18s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 55s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Shell Script Lint (pull_request) Failing after 1m2s
Validate Config / Cron Syntax Check (pull_request) Successful in 13s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 13s
Validate Config / Playbook Schema Validation (pull_request) Successful in 25s
Validate Training Data / validate (pull_request) Successful in 21s
Architecture Lint / Lint Repository (pull_request) Failing after 17s
PR Checklist / pr-checklist (pull_request) Failing after 9m48s
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 23s
Smoke Test / smoke (pull_request) Failing after 22s
Validate Config / YAML Lint (pull_request) Failing after 16s
Validate Config / JSON Validate (pull_request) Successful in 18s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 55s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Shell Script Lint (pull_request) Failing after 1m2s
Validate Config / Cron Syntax Check (pull_request) Successful in 13s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 13s
Validate Config / Playbook Schema Validation (pull_request) Successful in 25s
Validate Training Data / validate (pull_request) Successful in 21s
Architecture Lint / Lint Repository (pull_request) Failing after 17s
PR Checklist / pr-checklist (pull_request) Failing after 9m48s
Generate a deterministic batch09 Timmy voice corpus with 1,000 ShareGPT prompt-response pairs, 50 approved source sessions, a source manifest, and focused validation tests.
This commit is contained in:
45
training-data/README-batch09.md
Normal file
45
training-data/README-batch09.md
Normal file
@@ -0,0 +1,45 @@
|
||||
# Timmy Voice: Batch 09 — 1K Prompt→Response Pairs
|
||||
|
||||
Training Factory — Timmy Voice Worker 9/10 (#589)
|
||||
|
||||
## Files
|
||||
|
||||
| File | Description |
|
||||
|------|-------------|
|
||||
| `timmy-voice-batch09.jsonl` | 1,000 ShareGPT-format prompt→response pairs |
|
||||
| `timmy-voice-batch09.sources.json` | 50 source sessions with approved-model provenance |
|
||||
| `generate_timmy_voice_batch09.py` | Deterministic generator for the batch |
|
||||
|
||||
## Generation Contract
|
||||
|
||||
- 50 source sessions
|
||||
- 20 prompt variations per session
|
||||
- approved-model provenance filter
|
||||
- Knowledge Mine-style ranking using local session metadata + pair quality
|
||||
- ShareGPT format (`system` / `human` / `gpt`)
|
||||
|
||||
## Stats
|
||||
|
||||
- Total pairs: 1000
|
||||
- Source sessions: 50
|
||||
- Average quality score: 0.90
|
||||
- Minimum quality score: 0.84
|
||||
- Maximum quality score: 0.92
|
||||
|
||||
## Category Breakdown
|
||||
- technical: 1 source sessions
|
||||
- operations: 36 source sessions
|
||||
- sovereignty: 10 source sessions
|
||||
- pastoral: 0 source sessions
|
||||
- crisis: 3 source sessions
|
||||
- general: 0 source sessions
|
||||
|
||||
## Source Models
|
||||
- xiaomi/mimo-v2-pro: 47 sessions
|
||||
- qwen/qwen3.6-plus:free: 2 sessions
|
||||
- qwen3:30b: 1 sessions
|
||||
|
||||
|
||||
## Notes
|
||||
|
||||
This batch uses approved local session sources only. Banned providers (Claude/GPT/Gemini/OpenAI/Anthropic) are excluded at selection time. The generator keeps the source manifest on disk so the batch can be inspected and regenerated without guessing where the voice came from.
|
||||
Reference in New Issue
Block a user