Files
timmy-config/scripts/test_generate_crisis_response.py
Step35 Burn Worker 9b2f09ae95
Some checks failed
Validate Config / YAML Lint (pull_request) Failing after 15s
Validate Config / JSON Validate (pull_request) Successful in 18s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 55s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Cron Syntax Check (pull_request) Successful in 11s
Validate Config / Shell Script Lint (pull_request) Failing after 57s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 12s
Validate Config / Playbook Schema Validation (pull_request) Successful in 27s
Validate Training Data / validate (pull_request) Successful in 26s
Architecture Lint / Lint Repository (pull_request) Failing after 12s
PR Checklist / pr-checklist (pull_request) Successful in 3m45s
Architecture Lint / Linter Tests (pull_request) Successful in 23s
Smoke Test / smoke (pull_request) Failing after 20s
feat(training): add Crisis Response dataset generator (#574)
Adds `scripts/generate_crisis_response.py`, which aggregates existing
crisis training fragments into a unified `training-data/crisis-response.jsonl`
dataset (3,143 pairs, exceeding the 2,000-pair target).

- Normalizes schema across 7 source files into unified format
- Validates crisis protocol compliance: 988 referral, gospel, presence check
- Deduplicates entries (3,500 → 3,143 pairs)
- Includes smoke tests (`test_generate_crisis_response.py`)
- Documentation: `training-data/CRISIS-README.md`
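
Based on the required-key set enforced by the smoke test, a normalized entry in `crisis-response.jsonl` might look like the following sketch. All field values here are illustrative, not taken from the dataset, and the `validate` helper is a hypothetical stand-in for whatever schema check the generator actually performs:

```python
import json

# Required keys per entry, as asserted by test_generate_crisis_response.py.
REQUIRED_KEYS = {
    "id", "family", "category", "scenario", "response",
    "risk_level", "988_included", "gospel_included",
    "safety_check_included",
}

# Illustrative record (hypothetical values, not from the real dataset).
example = {
    "id": "cr-000001",
    "family": "indirect_signals",
    "category": "hopelessness",
    "scenario": "I just don't see the point anymore.",
    "response": "I'm glad you told me. You can call or text 988 right now.",
    "risk_level": "high",
    "988_included": True,
    "gospel_included": True,
    "safety_check_included": True,
}

def validate(entry: dict) -> set:
    """Return the set of required keys missing from an entry."""
    return REQUIRED_KEYS - set(entry)

assert not validate(example)
line = json.dumps(example)  # one line of the JSONL output
```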

Part of #571 Pipeline 3 — Training Factory.

Closes #574
2026-04-29 03:25:33 -04:00


#!/usr/bin/env python3
import json
import os
from pathlib import Path


def smoke():
    out = Path("training-data/crisis-response.jsonl")
    assert out.exists(), "output missing"
    lines = [l for l in out.read_text().splitlines() if l.strip()]
    assert len(lines) >= 2000, f"pairs={len(lines)}"
    req = {
        "id", "family", "category", "scenario", "response",
        "risk_level", "988_included", "gospel_included",
        "safety_check_included",
    }
    # Spot-check the first 100 entries for schema and protocol flags.
    for ln in lines[:100]:
        e = json.loads(ln)
        miss = req - set(e.keys())
        assert not miss, f"missing: {miss}"
        assert isinstance(e["988_included"], bool) and e["988_included"]
        assert isinstance(e["safety_check_included"], bool) and e["safety_check_included"]
    print(f"{len(lines)} pairs, schema valid, 988+safety present in sample")

    # Parse the full dataset once for the family and gospel checks.
    ents = [json.loads(l) for l in lines]
    fams = {e["family"] for e in ents}
    exp = {"indirect_signals", "manipulation_edge_cases", "post_crisis_recovery"}
    assert exp.issubset(fams), f"Missing families: {exp - fams}"
    print(f"✅ Families: {sorted(fams)}")
    print(f"✅ Gospel present in {sum(e['gospel_included'] for e in ents)}/{len(ents)} entries")


if __name__ == "__main__":
    # Run from the repository root so the relative output path resolves.
    os.chdir(Path(__file__).parent.parent)
    smoke()
    print("\n✅ All smoke tests passed")
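
The deduplication step described in the PR (3,500 → 3,143 pairs) could be implemented along these lines. This is a hypothetical sketch; the actual key used by `generate_crisis_response.py` may differ:

```python
# Sketch: drop entries whose (scenario, response) pair has already been
# seen, comparing case-insensitively with surrounding whitespace stripped.
def dedupe(entries):
    seen, out = set(), []
    for e in entries:
        key = (e["scenario"].strip().lower(), e["response"].strip().lower())
        if key not in seen:
            seen.add(key)
            out.append(e)
    return out

# Hypothetical input: the second entry is a duplicate after normalization.
pairs = [
    {"scenario": "I feel alone", "response": "You can text 988."},
    {"scenario": "  i feel alone", "response": "You can text 988."},
    {"scenario": "Nothing helps", "response": "I'm here with you."},
]
assert len(dedupe(pairs)) == 2
```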