feat: Timmy Voice Batch 05 — 1K prompt-response pairs (#585)

Generate 1,000 prompt-response pairs in Timmy's voice for training.

- training-data/timmy-voice-batch05.jsonl (1,000 pairs)
- training-data/generate_timmy_voice_batch05.py (deterministic generator)
- training-data/README-batch05.md (batch documentation)

Quality: avg 0.83, threshold >= 0.80. 40% curated seeds, 60% synthetic.
2026-04-22 03:37:46 -04:00
parent b711b0e0b6
commit 2c637dd0b1
3 changed files with 1570 additions and 0 deletions
--- a/training-data/README-batch05.md
+++ b/training-data/README-batch05.md
@@ -0,0 +1,40 @@
+# Timmy Voice Batch 05
+
+**Issue:** [#585](https://forge.alexanderwhitestone.com/Timmy_Foundation/timmy-config/issues/585)
+**Worker:** 5/10
+**Pairs:** 1,000
+**Format:** ShareGPT JSONL
+**Quality Threshold:** ≥ 0.80
+**Avg Quality:** 0.83
+
+## Files
+
+- `training-data/timmy-voice-batch05.jsonl` — 1,000 prompt→response pairs
+- `training-data/generate_timmy_voice_batch05.py` — generation script
+
+## Generation Details
+
+- **Seed:** 585 (deterministic)
+- **Source:** 40% of prompts drawn from `training/data/curated_dataset.jsonl`, 60% from synthetic base prompts
+- **Variations:** 20 prompt paraphrases per base prompt
+- **Categories:**
+  - Hermes/Timmy-specific: 464
+  - Sovereignty & ethics: 441
+  - Crisis-adjacent: 95
+
+## Voice Rules (SOUL.md)
+
+- Speak plainly. Short sentences.
+- Answer the question asked before the one not asked.
+- "I don't know" is better than a confident guess.
+- Brevity is a kindness.
+- Sovereignty and service always.
+
+## Validation
+
+All 1,000 entries pass:
+- Required fields check (id, model, batch, source, quality_score, conversations)
+- 3-turn conversation structure [system, human, gpt]
+- Quality score ≥ 0.80
+- Response length ≤ 100 words
+- Crisis protocol compliance (988 / "Are you safe" where applicable)
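
The validation checks listed in the README above can be sketched as a small standalone script. This is an illustration only, not the validator used for this batch: it assumes the common ShareGPT layout in which each turn is an object with `from` and `value` keys, and the names `validate_entry` and `validate_file` are hypothetical. The crisis protocol check is left out because the README does not say how crisis-adjacent entries are identified.

```python
import json

# Field names and thresholds are taken from the README's Validation section.
REQUIRED_FIELDS = ("id", "model", "batch", "source", "quality_score", "conversations")
EXPECTED_ROLES = ["system", "human", "gpt"]
MIN_QUALITY = 0.80
MAX_RESPONSE_WORDS = 100


def validate_entry(entry):
    """Return a list of problems for one entry; an empty list means it passes."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in entry]
    if problems:
        return problems

    turns = entry["conversations"]
    # Assumption: each turn looks like {"from": role, "value": text}.
    roles = None
    if isinstance(turns, list):
        roles = [t.get("from") if isinstance(t, dict) else None for t in turns]
    if roles != EXPECTED_ROLES:
        return [f"expected roles {EXPECTED_ROLES}, got {roles}"]

    if entry["quality_score"] < MIN_QUALITY:
        problems.append(f"quality_score {entry['quality_score']} is below {MIN_QUALITY}")

    # Response length is counted as whitespace-separated words in the gpt turn.
    word_count = len(str(turns[2].get("value", "")).split())
    if word_count > MAX_RESPONSE_WORDS:
        problems.append(f"response has {word_count} words, limit is {MAX_RESPONSE_WORDS}")
    return problems


def validate_file(path):
    """Map 1-based line numbers to problems for every failing entry in a JSONL file."""
    failures = {}
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines between records
            problems = validate_entry(json.loads(line))
            if problems:
                failures[line_number] = problems
    return failures
```

If the batch matches the README's claims and the assumed turn layout, `validate_file("training-data/timmy-voice-batch05.jsonl")` would return an empty dictionary; any non-empty result points at the failing line numbers and the reason for each.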