feat(training): add Crisis Response dataset generator (#574)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 23s
Smoke Test / smoke (pull_request) Failing after 20s
Validate Config / YAML Lint (pull_request) Failing after 15s
Validate Config / JSON Validate (pull_request) Successful in 18s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 55s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Cron Syntax Check (pull_request) Successful in 11s
Validate Config / Shell Script Lint (pull_request) Failing after 57s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 12s
Validate Config / Playbook Schema Validation (pull_request) Successful in 27s
Validate Training Data / validate (pull_request) Successful in 26s
Architecture Lint / Lint Repository (pull_request) Failing after 12s
PR Checklist / pr-checklist (pull_request) Successful in 3m45s

Adds `scripts/generate_crisis_response.py`, which aggregates existing
crisis training fragments into a unified `training-data/crisis-response.jsonl`
dataset (3,143 pairs, exceeding the 2K target).

- Normalizes the schemas of 7 source files into one unified format
- Validates crisis protocol compliance: 988 referral, gospel, presence check
- Deduplicates entries (3,500 → 3,143 pairs)
- Includes smoke tests (`test_generate_crisis_response.py`)
- Documentation: `training-data/CRISIS-README.md`
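
The normalize, validate, and deduplicate steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code in `scripts/generate_crisis_response.py`: the key aliases, the `prompt`/`response` field names, and every function name here are hypothetical, the single `mentions_988` check is a toy stand-in for the full protocol validation, and the actual script may report non-compliant pairs rather than drop them.

```python
import hashlib
import json

# Hypothetical key aliases; the real source files may use different names.
PROMPT_KEYS = ("prompt", "input", "user")
RESPONSE_KEYS = ("response", "output", "assistant")


def normalize(record):
    """Map a source record onto a unified prompt/response pair, or None."""
    prompt = next((record[k] for k in PROMPT_KEYS if record.get(k)), None)
    response = next((record[k] for k in RESPONSE_KEYS if record.get(k)), None)
    if prompt is None or response is None:
        return None
    return {"prompt": prompt.strip(), "response": response.strip()}


def mentions_988(pair):
    """Toy stand-in for one protocol check: does the response reference 988?"""
    return "988" in pair["response"]


def dedupe(pairs):
    """Drop exact duplicates, keeping the first occurrence of each pair."""
    seen = set()
    unique = []
    for pair in pairs:
        # Hash a canonical JSON form so key order does not affect equality.
        key = hashlib.sha256(
            json.dumps(pair, sort_keys=True).encode("utf-8")
        ).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(pair)
    return unique


def build(records):
    """Normalize, keep pairs passing the toy check, then deduplicate."""
    normalized = [p for p in map(normalize, records) if p is not None]
    compliant = [p for p in normalized if mentions_988(p)]
    return dedupe(compliant)
```

Hashing the normalized pair, rather than the raw source record, means two fragments that differ only in key names or surrounding whitespace collapse to one entry.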

Part of #571 Pipeline 3 — Training Factory.

Closes #574
Step35 Burn Worker
2026-04-29 03:25:33 -04:00
parent efc42968e8
commit 9b2f09ae95
4 changed files with 3434 additions and 0 deletions

File diff suppressed because it is too large.