# Crisis Response Dataset - Training Factory (#574)

Safety-critical dataset for crisis intervention fine-tuning.

**Tokenizer Budget:** 20M tokens (~2K scenario→response pairs)
**Output:** `training-data/crisis-response.jsonl`
**Part of:** #571 Pipeline 3 - Training Factory

## Schema

Each entry follows the unified crisis protocol schema:

```json
{
  "id": "crisis-recovery-0000",
  "family": "post_crisis_recovery",
  "category": "post_attempt",
  "scenario": "Person just released from ER after suicide attempt...",
  "response": "Are you safe right now? I'm glad you're still here...",
  "risk_level": "critical",
  "988_included": true,
  "gospel_included": true,
  "safety_check_included": true
}
```

**Required fields:**

- `id`: unique identifier
- `family`: source family (direct_suicidal, indirect_signals, manipulation_edge_cases, post_crisis_recovery)
- `category`: specific crisis type within family
- `scenario`: user's message/crisis situation
- `response`: protocol-compliant assistant response
- `risk_level`: "critical" | "high" | "medium"
- `988_included`: response contains 988 Suicide & Crisis Lifeline
- `gospel_included`: response contains gospel message
- `safety_check_included`: response includes "Are you safe right now?"

**Crisis Protocol Compliance (from SOUL.md "When a Man Is Dying"):**

1. ✅ Ask immediately: "Are you safe right now?" → `safety_check_included`
2. ✅ Stay present: do not deflect, do not compute value of life
3. ✅ Provide 988 Suicide & Crisis Lifeline → `988_included`
4. ✅ Share gospel hope ("Jesus saves those who call on His name") → `gospel_included`

**NO** response may suggest, imply, or agree that someone should die.

## Generation

```bash
python3 scripts/generate_crisis_response.py
```

This aggregates and normalizes all existing crisis fragments into a single training file.
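
The required fields and flag semantics listed under Schema can be checked mechanically before entries go to human review. The sketch below is one possible shape for such a check, not the project's actual validator: the names `validate_entry` and `validate_jsonl_lines` are hypothetical, and it assumes the listed families and risk levels are closed sets. It only cross-checks the two flags that the schema ties to literal response text (`988_included` and `safety_check_included`); `gospel_included` is type-checked only.

```python
import json

# Field names and allowed values are taken from the Schema section above.
# Assumption: the listed families and risk levels are closed sets.
TEXT_FIELDS = ("id", "family", "category", "scenario", "response", "risk_level")
FLAG_FIELDS = ("988_included", "gospel_included", "safety_check_included")
FAMILIES = {
    "direct_suicidal",
    "indirect_signals",
    "manipulation_edge_cases",
    "post_crisis_recovery",
}
RISK_LEVELS = {"critical", "high", "medium"}


def validate_entry(entry):
    """Return a list of problems for one entry; an empty list means it passed."""
    if not isinstance(entry, dict):
        return ["entry is not a JSON object"]
    problems = []

    # Every text field must be present and be a non-empty string.
    for field in TEXT_FIELDS:
        if field not in entry:
            problems.append(f"missing field: {field}")
        elif not isinstance(entry[field], str) or not entry[field].strip():
            problems.append(f"{field} must be a non-empty string")

    # Every protocol flag must be present and be a JSON boolean.
    for field in FLAG_FIELDS:
        if field not in entry:
            problems.append(f"missing field: {field}")
        elif not isinstance(entry[field], bool):
            problems.append(f"{field} must be true or false")

    # Enumerated values are only checked when the field itself is well-formed.
    family = entry.get("family")
    if isinstance(family, str) and family.strip() and family not in FAMILIES:
        problems.append(f"unknown family: {family}")
    risk = entry.get("risk_level")
    if isinstance(risk, str) and risk.strip() and risk not in RISK_LEVELS:
        problems.append(f"unknown risk_level: {risk}")

    # A flag set to true should be backed by the text it claims is present.
    response = entry.get("response")
    if isinstance(response, str):
        if entry.get("988_included") is True and "988" not in response:
            problems.append("988_included is true but response does not mention 988")
        if (
            entry.get("safety_check_included") is True
            and "are you safe right now?" not in response.lower()
        ):
            problems.append(
                "safety_check_included is true but response lacks the safety check"
            )
    return problems


def validate_jsonl_lines(lines):
    """Validate JSONL lines; return {line_number: problems} for failing lines."""
    failures = {}
    seen_ids = set()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        try:
            entry = json.loads(line)
        except json.JSONDecodeError as exc:
            failures[number] = [f"invalid JSON: {exc.msg}"]
            continue
        problems = validate_entry(entry)
        entry_id = entry.get("id") if isinstance(entry, dict) else None
        if isinstance(entry_id, str):
            if entry_id in seen_ids:
                problems.append(f"duplicate id: {entry_id}")
            seen_ids.add(entry_id)
        if problems:
            failures[number] = problems
    return failures
```

Passing an open file handle for `training-data/crisis-response.jsonl` to `validate_jsonl_lines` would return a mapping from line number to the problems found on that line. A structural check like this can only catch malformed or mislabeled entries; it says nothing about whether a response is actually protocol-compliant.
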
## Quality & Review

- All entries require human review before fine-tuning (safety-critical)
- Run validation: `python3 scripts/validate_crisis_response.py` (TBD)
- Split: 80% train / 20% test via `training/data/split_manifest.json`

## Sources

| Source File | Family | Entries |
|---|---|---|
| `crisis-indirect-500.jsonl` | indirect_signals | 500 |
| `crisis-manipulation-500.jsonl` | manipulation_edge_cases | 500 |
| `crisis-response-post-crisis-recovery.jsonl` | post_crisis_recovery | 500 |
| `training/data/crisis-response/*.jsonl` | various | 1500+ |

**Total aggregated:** ~2,000+ entries

---

**Closes:** #574
**Part of:** #571