feat: Timmy Voice Batch 05 — 1K prompt-response pairs (#585)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 31s
Smoke Test / smoke (pull_request) Failing after 28s
Validate Config / YAML Lint (pull_request) Failing after 19s
Validate Config / JSON Validate (pull_request) Successful in 23s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 58s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Shell Script Lint (pull_request) Failing after 26s
Validate Config / Cron Syntax Check (pull_request) Successful in 3s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 4s
Validate Config / Playbook Schema Validation (pull_request) Successful in 8s
Validate Training Data / validate (pull_request) Successful in 7s
Architecture Lint / Lint Repository (pull_request) Failing after 28s
PR Checklist / pr-checklist (pull_request) Failing after 9m28s
Generate 1,000 prompt-response pairs in Timmy's voice for training.

- training-data/timmy-voice-batch05.jsonl (1,000 pairs)
- training-data/generate_timmy_voice_batch05.py (deterministic generator)
- training-data/README-batch05.md (batch documentation)

Quality: avg 0.83, threshold >= 0.80. 40% curated seeds, 60% synthetic.
40
training-data/README-batch05.md
Normal file
@@ -0,0 +1,40 @@
# Timmy Voice Batch 05

**Issue:** [#585](https://forge.alexanderwhitestone.com/Timmy_Foundation/timmy-config/issues/585)
**Worker:** 5/10
**Pairs:** 1,000
**Format:** ShareGPT JSONL
**Quality Threshold:** ≥ 0.80
**Avg Quality:** 0.83

## Files

- `training-data/timmy-voice-batch05.jsonl` — 1,000 prompt→response pairs
- `training-data/generate_timmy_voice_batch05.py` — generation script

## Generation Details

- **Seed:** 585 (deterministic)
- **Source:** 40% prompts from `training/data/curated_dataset.jsonl`, 60% synthetic base prompts
- **Variations:** 20 prompt paraphrases per base prompt
- **Categories:**
  - Hermes/Timmy-specific: 464
  - Sovereignty & ethics: 441
  - Crisis-adjacent: 95

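The generation parameters above can be sketched as a small, seeded planning step. This is only an illustration, not the contents of `generate_timmy_voice_batch05.py`: the function name `plan_batch`, the input lists, and the assumption that every base prompt yields exactly 20 variations (so 1,000 pairs come from 50 base prompts, 20 curated and 30 synthetic) are all hypothetical.

```python
import random

SEED = 585            # seed stated in the README
TOTAL_PAIRS = 1000    # batch size stated in the README
VARIATIONS = 20       # paraphrases per base prompt, per the README
CURATED_SHARE = 0.40  # share of curated base prompts, per the README


def plan_batch(curated, synthetic, seed=SEED):
    """Hypothetical helper: choose base prompts and expand each into
    VARIATIONS slots of the form (source, prompt, variation_index)."""
    rng = random.Random(seed)  # private RNG, so the global random state is untouched
    n_base = TOTAL_PAIRS // VARIATIONS         # assumed: 50 base prompts
    n_curated = round(n_base * CURATED_SHARE)  # assumed: 20 curated
    n_synthetic = n_base - n_curated           # assumed: 30 synthetic
    bases = [("curated", p) for p in rng.sample(curated, n_curated)]
    bases += [("synthetic", p) for p in rng.sample(synthetic, n_synthetic)]
    rng.shuffle(bases)
    return [(source, prompt, k) for source, prompt in bases for k in range(VARIATIONS)]
```

Because the random generator is constructed from a fixed seed, calling `plan_batch` twice with the same inputs returns the same plan, which is what "deterministic" would mean under these assumptions.
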
## Voice Rules (SOUL.md)

- Speak plainly. Short sentences.
- Answer the question asked before the one not asked.
- "I don't know" is better than a confident guess.
- Brevity is a kindness.
- Sovereignty and service always.

## Validation

All 1,000 entries pass:

- Required fields check (id, model, batch, source, quality_score, conversations)
- 3-turn conversation structure [system, human, gpt]
- Quality score ≥ 0.80
- Response length ≤ 100 words
- Crisis protocol compliance (988 / "Are you safe" where applicable)
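
The checks listed above could be expressed roughly as follows. This is a minimal sketch, not the validator that CI actually runs. It assumes the common ShareGPT turn layout with `from` and `value` keys, assumes the caller knows which entries are crisis-adjacent (the hypothetical `is_crisis` flag), and assumes either "988" or "Are you safe" satisfies the crisis check.

```python
import json

REQUIRED_FIELDS = ("id", "model", "batch", "source", "quality_score", "conversations")
EXPECTED_ROLES = ["system", "human", "gpt"]
MIN_QUALITY = 0.80
MAX_RESPONSE_WORDS = 100


def validate_entry(entry, is_crisis=False):
    """Return a list of problems for one record; an empty list means it passes."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in entry]
    if problems:
        return problems
    turns = entry["conversations"]
    roles = [turn.get("from") for turn in turns]  # assumed ShareGPT key name
    if roles != EXPECTED_ROLES:
        return [f"unexpected turn structure: {roles}"]
    if entry["quality_score"] < MIN_QUALITY:
        problems.append("quality score below threshold")
    response = turns[2].get("value", "")  # assumed ShareGPT key name
    if len(response.split()) > MAX_RESPONSE_WORDS:
        problems.append("response longer than 100 words")
    # Assumption: either marker is enough for a crisis-adjacent response.
    if is_crisis and "988" not in response and "are you safe" not in response.lower():
        problems.append("crisis response lacks 988 or 'Are you safe'")
    return problems


def validate_file(path):
    """Validate a JSONL file; returns {line_number: problems} for failing lines."""
    failures = {}
    with open(path, encoding="utf-8") as handle:
        for lineno, line in enumerate(handle, start=1):
            if line.strip():
                problems = validate_entry(json.loads(line))
                if problems:
                    failures[lineno] = problems
    return failures
```

Returning a list of problems instead of a boolean keeps failure reports readable when a record breaks more than one rule.
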