feat(training): add Crisis Response dataset generator (#574)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 23s
Smoke Test / smoke (pull_request) Failing after 20s
Validate Config / YAML Lint (pull_request) Failing after 15s
Validate Config / JSON Validate (pull_request) Successful in 18s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 55s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Cron Syntax Check (pull_request) Successful in 11s
Validate Config / Shell Script Lint (pull_request) Failing after 57s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 12s
Validate Config / Playbook Schema Validation (pull_request) Successful in 27s
Validate Training Data / validate (pull_request) Successful in 26s
Architecture Lint / Lint Repository (pull_request) Failing after 12s
PR Checklist / pr-checklist (pull_request) Successful in 3m45s
Adds `scripts/generate_crisis_response.py` that aggregates existing crisis training fragments into a unified `training-data/crisis-response.jsonl` dataset (3,143 pairs, exceeds 2K target).

- Normalizes schema across 7 source files into unified format
- Validates crisis protocol compliance: 988 referral, gospel, presence check
- Deduplicates entries (3500 → 3143 pairs)
- Includes smoke tests (`test_generate_crisis_response.py`)
- Documentation: `training-data/CRISIS-README.md`

Part of #571 Pipeline 3 - Training Factory.

Closes #574
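The deduplication step mentioned above could look roughly like the sketch below. This is not the code from `scripts/generate_crisis_response.py`: the helper names `dedup_key` and `deduplicate` are hypothetical, and the matching criterion (a hash of the whitespace- and case-normalized `scenario` plus `response`) is an assumption about what counts as a duplicate pair.

```python
import hashlib


def dedup_key(entry):
    """Return a stable key for one pair (assumed criterion, see lead-in)."""
    # Collapse runs of whitespace and ignore case so trivially different
    # copies of the same pair map to the same key.
    scenario = " ".join(entry["scenario"].split()).lower()
    response = " ".join(entry["response"].split()).lower()
    return hashlib.sha256(f"{scenario}\n{response}".encode("utf-8")).hexdigest()


def deduplicate(entries):
    """Keep the first occurrence of each pair, preserving input order."""
    seen = set()
    unique = []
    for entry in entries:
        key = dedup_key(entry)
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique
```

Keeping the first occurrence preserves the order of the source files, so the output stays reproducible as long as the inputs are read in a fixed order.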
30
scripts/test_generate_crisis_response.py
Normal file
@@ -0,0 +1,30 @@
#!/usr/bin/env python3
import json, os
from pathlib import Path

def smoke():
    out = Path("training-data/crisis-response.jsonl")
    assert out.exists(), "output missing"
    lines = [l for l in out.read_text(encoding="utf-8").splitlines() if l.strip()]
    assert len(lines) >= 2000, f"pairs={len(lines)}"
    req = {"id","family","category","scenario","response","risk_level","988_included","gospel_included","safety_check_included"}
    for ln in lines[:100]:
        e = json.loads(ln)
        miss = req - set(e.keys())
        assert not miss, f"missing: {miss}"
        assert isinstance(e["988_included"], bool) and e["988_included"]
        assert isinstance(e["safety_check_included"], bool) and e["safety_check_included"]
    print(f"✅ {len(lines)} pairs, schema valid, 988+safety present in sample")

    fams = {json.loads(l)["family"] for l in lines}  # reuse filtered lines; blank lines would break json.loads
    exp = {"indirect_signals","manipulation_edge_cases","post_crisis_recovery"}
    assert exp.issubset(fams), f"Missing families: {exp-fams}"
    print(f"✅ Families: {sorted(fams)}")

    ents = [json.loads(l) for l in lines]
    print(f"✅ Gospel present in {sum(e['gospel_included'] for e in ents)}/{len(ents)} entries")

if __name__ == "__main__":
    os.chdir(Path(__file__).parent.parent)
    smoke()
    print("\n✅ All smoke tests passed")
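The smoke test checks the schema on only the first 100 lines and stops at the first failed assertion. As a possible complement, the sketch below validates every record and collects errors with their line numbers. `validate_records` is a hypothetical helper that is not part of this commit; the required keys are copied from the test above, and treating all three `*_included` fields as booleans is an assumption (the test only type-checks two of them).

```python
import json

# Required keys copied from the smoke test in this commit.
REQUIRED_KEYS = {
    "id", "family", "category", "scenario", "response", "risk_level",
    "988_included", "gospel_included", "safety_check_included",
}
# Assumption: all three compliance flags are meant to be booleans.
BOOL_FLAGS = ("988_included", "gospel_included", "safety_check_included")


def validate_records(lines):
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    for lineno, raw in enumerate(lines, start=1):
        if not raw.strip():
            continue  # skip blank lines, as the smoke test does
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(entry, dict):
            errors.append(f"line {lineno}: expected a JSON object")
            continue
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            errors.append(f"line {lineno}: missing {sorted(missing)}")
            continue
        for flag in BOOL_FLAGS:
            if not isinstance(entry[flag], bool):
                errors.append(f"line {lineno}: {flag} is not a bool")
    return errors
```

A caller could pass the lines of `training-data/crisis-response.jsonl` and fail the check only if the returned list is non-empty, which reports all problems in one run.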