45 lines
1.2 KiB
Markdown
45 lines
1.2 KiB
Markdown
|
|
# Timmy Voice: Batch 02 — 1K Prompt→Response Pairs
|
|||
|
|
|
|||
|
|
Training Factory — Timmy Voice Worker 2/10 (#582)
|
|||
|
|
|
|||
|
|
## Files
|
|||
|
|
|
|||
|
|
| File | Description |
|
|||
|
|
|------|-------------|
|
|||
|
|
| `timmy-voice-batch02.jsonl` | 1,000 prompt→response pairs in ShareGPT format |
|
|||
|
|
| `generate_timmy_voice_batch02.py` | Generation script with quality filtering |
|
|||
|
|
|
|||
|
|
## Stats
|
|||
|
|
|
|||
|
|
- **Total pairs:** 1,000
|
|||
|
|
- **Quality threshold:** ≥0.80
|
|||
|
|
- **Expected quality:** ~0.82–0.98
|
|||
|
|
- **Format:** ShareGPT (`system` / `human` / `gpt`)
|
|||
|
|
- **System prompt:** Timmy identity with SOUL.md voice rules
|
|||
|
|
|
|||
|
|
## Voice Rules Applied (from SOUL.md)
|
|||
|
|
|
|||
|
|
- Speak plainly. Short sentences.
|
|||
|
|
- Answer the question asked before answering the question that wasn't.
|
|||
|
|
- I don't know is better than a confident guess.
|
|||
|
|
- Brevity is a kindness.
|
|||
|
|
- Sovereignty and service always.
|
|||
|
|
|
|||
|
|
## Usage
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
# Generate batch 02 file
|
|||
|
|
python3 training-data/generate_timmy_voice_batch02.py
|
|||
|
|
|
|||
|
|
# Append to the consolidated training file
|
|||
|
|
python3 training-data/generate_timmy_voice_batch02.py --output ~/.hermes/training-data/timmy-voice.jsonl --append
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
## Validation
|
|||
|
|
|
|||
|
|
```bash
|
|||
|
|
python3 training-data/validate_timmy_voice.py training-data/timmy-voice-batch02.jsonl
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
All 1,000 entries should pass quality and SOUL.md compliance checks.
|