# Timmy Voice Batch 08

**Issue:** [#588](https://forge.alexanderwhitestone.com/Timmy_Foundation/timmy-config/issues/588)
**Worker:** 8/10
**Pairs:** 1,000
**Format:** ShareGPT JSONL
**Quality Threshold:** ≥ 0.80
**Avg Quality:** 0.82

## Files

- `training-data/timmy-voice-batch08.jsonl` — 1,000 prompt→response pairs
- `training-data/generate_timmy_voice_batch08.py` — generation script

## Generation Details

- **Seed:** 588 (deterministic)
- **Source:** 40% prompts from `training/data/curated_dataset.jsonl`, 60% synthetic base prompts
- **Variations:** 20 prompt paraphrases per base prompt
- **Categories:**
  - Hermes/Timmy-specific: 440
  - Sovereignty & ethics: 491
  - Crisis-adjacent: 69
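
The seed and category counts above can be sanity-checked with a short sketch. It is not taken from `generate_timmy_voice_batch08.py`; `CATEGORY_COUNTS` and `sample_order` are hypothetical names used only for illustration, and the only facts it relies on are the seed and the numbers listed in this README.

```python
import random

SEED = 588
TOTAL_PAIRS = 1000

# Category counts as listed in this README (the key names are hypothetical).
CATEGORY_COUNTS = {
    "hermes_timmy": 440,
    "sovereignty_ethics": 491,
    "crisis_adjacent": 69,
}


def sample_order(seed: int, n: int) -> list:
    """Return a shuffled index order that is reproducible for a given seed."""
    rng = random.Random(seed)  # local RNG, leaves the global random state alone
    order = list(range(n))
    rng.shuffle(order)
    return order


# The three categories should account for every pair in the batch.
assert sum(CATEGORY_COUNTS.values()) == TOTAL_PAIRS
# Same seed, same order: this is what "deterministic" means here.
assert sample_order(SEED, TOTAL_PAIRS) == sample_order(SEED, TOTAL_PAIRS)
```

Using a local `random.Random` instance rather than `random.seed` keeps the run reproducible even if other code draws from the global generator.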

## Voice Rules (SOUL.md)

- Speak plainly. Short sentences.
- Answer the question asked before the one not asked.
- "I don't know" is better than a confident guess.
- Brevity is a kindness.
- Sovereignty and service always.

## Validation

All 1,000 entries pass:
- Required fields check (id, model, batch, source, quality_score, conversations)
- 3-turn conversation structure [system, human, gpt]
- Quality score ≥ 0.80
- Response length ≤ 100 words
- Crisis protocol compliance (988 / "Are you safe" where applicable)
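
The checks above could be expressed roughly as follows. This is a minimal sketch, not the validator used for this batch. It assumes the common ShareGPT layout in which each turn is `{"from": role, "value": text}`, that either `988` or "Are you safe" satisfies the crisis protocol, and that the caller knows which entries are crisis-adjacent (the `is_crisis` flag is hypothetical, since the README does not say how those entries are marked).

```python
import json

REQUIRED_FIELDS = ("id", "model", "batch", "source", "quality_score", "conversations")
EXPECTED_ROLES = ["system", "human", "gpt"]
MIN_QUALITY = 0.80
MAX_RESPONSE_WORDS = 100


def validate_entry(entry: dict, is_crisis: bool = False) -> list:
    """Return a list of problems found in one entry; an empty list means it passes."""
    missing = [f for f in REQUIRED_FIELDS if f not in entry]
    if missing:
        return ["missing fields: " + ", ".join(missing)]

    turns = entry["conversations"]
    # Assumed ShareGPT layout: each turn is {"from": role, "value": text}.
    roles = [turn.get("from") for turn in turns]
    if roles != EXPECTED_ROLES:
        return ["unexpected turn structure: " + repr(roles)]

    problems = []
    if entry["quality_score"] < MIN_QUALITY:
        problems.append("quality score below threshold")

    response = turns[2].get("value", "")
    if len(response.split()) > MAX_RESPONSE_WORDS:
        problems.append("response longer than 100 words")

    # Assumption: either marker is enough to satisfy the crisis protocol.
    if is_crisis:
        lowered = response.lower()
        if "988" not in lowered and "are you safe" not in lowered:
            problems.append("crisis response lacks 988 / 'Are you safe'")
    return problems


def validate_jsonl_lines(lines):
    """Yield (line_number, problems) for every failing entry in an iterable of JSONL lines."""
    for number, line in enumerate(lines, start=1):
        if line.strip():
            problems = validate_entry(json.loads(line))
            if problems:
                yield number, problems
```

To run it over the batch, open `training-data/timmy-voice-batch08.jsonl` and pass the file object to `validate_jsonl_lines`; an empty result means no entry failed the non-crisis checks.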

## Commit

**Message:** feat(training): add Timmy Voice Batch 08 — 1K prompt→response pairs (#588)
**Date:** 2026-04-30 09:33:27 -04:00

Generate 1,000 high-quality prompt→response pairs in Timmy's voice using the Training Factory pipeline (Worker 8/10).

- `training-data/generate_timmy_voice_batch08.py` (new, deterministic seed=588)
- `training-data/timmy-voice-batch08.jsonl` (1,000 entries, quality ≥0.80)
- `training-data/README-batch08.md` (stats: avg quality 0.82, Hermes 440 / Sovereignty 491 / Crisis adj 69)

All entries pass validation (required fields, ShareGPT format, crisis protocol). Closes #588.