Files
timmy-config/training-data/README-batch02.md
Rockachopa 5cf1020ed6
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 31s
PR Checklist / pr-checklist (pull_request) Successful in 3m53s
Smoke Test / smoke (pull_request) Failing after 24s
Validate Config / YAML Lint (pull_request) Failing after 17s
Validate Config / JSON Validate (pull_request) Successful in 18s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 1m4s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Shell Script Lint (pull_request) Failing after 1m3s
Validate Config / Cron Syntax Check (pull_request) Successful in 12s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 13s
Validate Config / Playbook Schema Validation (pull_request) Successful in 25s
Validate Training Data / validate (pull_request) Successful in 21s
Architecture Lint / Lint Repository (pull_request) Failing after 19s
feat(training): generate 1K Timmy voice prompt-to-response pairs (#582)
Batch 02 — Worker 2/10 of the Timmy Voice training factory.

- Added generate_timmy_voice_batch02.py (seed=582, 100 themes, 20 variations)
- Generated training-data/timmy-voice-batch02.jsonl (1,000 pairs)
- Quality scores: 0.82–0.87, 100% SOUL.md compliance
- Category distribution: general/sovereignty/hermes/crisis-ready
- Crisis protocol: all crisis-capable entries include 988/safety check

Output format: ShareGPT (system/human/gpt) with Timmy identity.
Validation: all entries pass training-data/validate_timmy_voice.py.
2026-04-30 04:07:28 -04:00

45 lines
1.2 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Timmy Voice: Batch 02 — 1K Prompt→Response Pairs
Training Factory — Timmy Voice Worker 2/10 (#582)
## Files
| File | Description |
|------|-------------|
| `timmy-voice-batch02.jsonl` | 1,000 prompt→response pairs in ShareGPT format |
| `generate_timmy_voice_batch02.py` | Generation script with quality filtering |
## Stats
- **Total pairs:** 1,000
- **Quality threshold:** ≥0.80
- **Expected quality:** ~0.820.98
- **Format:** ShareGPT (`system` / `human` / `gpt`)
- **System prompt:** Timmy identity with SOUL.md voice rules
## Voice Rules Applied (from SOUL.md)
- Speak plainly. Short sentences.
- Answer the question asked before answering the question that wasn't.
- I don't know is better than a confident guess.
- Brevity is a kindness.
- Sovereignty and service always.
## Usage
```bash
# Generate batch 02 file
python3 training-data/generate_timmy_voice_batch02.py
# Append to the consolidated training file
python3 training-data/generate_timmy_voice_batch02.py --output ~/.hermes/training-data/timmy-voice.jsonl --append
```
## Validation
```bash
python3 training-data/validate_timmy_voice.py training-data/timmy-voice-batch02.jsonl
```
All 1,000 entries should pass quality and SOUL.md compliance checks.