# Timmy Voice Batch 05
- **Issue:** [#585](https://forge.alexanderwhitestone.com/Timmy_Foundation/timmy-config/issues/585)
- **Worker:** 5/10
- **Pairs:** 1,000
- **Format:** ShareGPT JSONL
- **Quality Threshold:** ≥ 0.80
- **Avg Quality:** 0.83
## Files
- `training-data/timmy-voice-batch05.jsonl` — 1,000 prompt→response pairs
- `training-data/generate_timmy_voice_batch05.py` — generation script
## Generation Details
- **Seed:** 585 (deterministic)
- **Source:** 40% prompts from `training/data/curated_dataset.jsonl`, 60% synthetic base prompts
- **Variations:** 20 prompt paraphrases per base prompt
- **Categories:**
  - Hermes/Timmy-specific: 464
  - Sovereignty & ethics: 441
  - Crisis-adjacent: 95
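
The seeded 40/60 source mix described above could be sketched roughly as follows. This is a minimal illustration, not the contents of `generate_timmy_voice_batch05.py`: the helper name `build_prompts`, its parameters, and the choice to apply the split to the final pair count (rather than to base prompts before paraphrasing) are all assumptions.

```python
import random


def build_prompts(curated, synthetic, total=1000, seed=585, curated_share=0.4):
    """Hypothetical helper: draw a fixed curated/synthetic mix reproducibly."""
    # A local RNG keeps the run deterministic without touching global state.
    rng = random.Random(seed)
    n_curated = round(total * curated_share)
    n_synthetic = total - n_curated
    # choices() samples with replacement, so small pools still work.
    picked = [("curated", p) for p in rng.choices(curated, k=n_curated)]
    picked += [("synthetic", p) for p in rng.choices(synthetic, k=n_synthetic)]
    rng.shuffle(picked)
    return picked
```

Calling the function twice with the same inputs and seed returns the same list in the same order, which is the property the "deterministic" note relies on.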
## Voice Rules (SOUL.md)
- Speak plainly. Short sentences.
- Answer the question asked before the one not asked.
- "I don't know" is better than a confident guess.
- Brevity is a kindness.
- Sovereignty and service always.
## Validation
All 1,000 entries pass:
- Required fields check (id, model, batch, source, quality_score, conversations)
- 3-turn conversation structure [system, human, gpt]
- Quality score ≥ 0.80
- Response length ≤ 100 words
- Crisis protocol compliance (988 / "Are you safe" where applicable)
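
A rough sketch of how the checks above might be applied per entry is shown below. It is not the project's actual validator. It assumes the common ShareGPT layout where each turn is an object with `from` and `value` keys, it treats either "988" or "Are you safe" as satisfying the crisis check, and it takes a hypothetical `is_crisis` flag from the caller because this README does not say how crisis-adjacent entries are marked.

```python
import json

REQUIRED = ("id", "model", "batch", "source", "quality_score", "conversations")
ROLES = ["system", "human", "gpt"]


def validate_entry(entry, is_crisis=False):
    """Return a list of problems; an empty list means the entry passes."""
    missing = [key for key in REQUIRED if key not in entry]
    if missing:
        return [f"missing fields: {missing}"]
    turns = entry["conversations"]
    # Assumed ShareGPT layout: each turn is {"from": role, "value": text}.
    if [turn.get("from") for turn in turns] != ROLES:
        return ["conversation is not [system, human, gpt]"]
    problems = []
    if entry["quality_score"] < 0.80:
        problems.append("quality_score below 0.80")
    response = turns[2].get("value", "")
    if len(response.split()) > 100:
        problems.append("response longer than 100 words")
    # is_crisis is a hypothetical flag supplied by the caller.
    if is_crisis and "988" not in response and "are you safe" not in response.lower():
        problems.append("crisis response lacks 988 / 'Are you safe'")
    return problems


def validate_jsonl(lines):
    """Map 1-based line number to problems for every failing line."""
    failures = {}
    for number, line in enumerate(lines, start=1):
        if line.strip():
            problems = validate_entry(json.loads(line))
            if problems:
                failures[number] = problems
    return failures
```

To check a whole batch, open the `.jsonl` file and pass the file object to `validate_jsonl`. An empty result means every line passed the non-crisis checks.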