Some checks failed
Validate Config / YAML Lint (pull_request) Failing after 15s
Validate Config / JSON Validate (pull_request) Successful in 18s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 55s
Validate Config / Python Test Suite (pull_request) Has been skipped
Validate Config / Cron Syntax Check (pull_request) Successful in 11s
Validate Config / Shell Script Lint (pull_request) Failing after 57s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 12s
Validate Config / Playbook Schema Validation (pull_request) Successful in 27s
Validate Training Data / validate (pull_request) Successful in 26s
Architecture Lint / Lint Repository (pull_request) Failing after 12s
PR Checklist / pr-checklist (pull_request) Successful in 3m45s
Architecture Lint / Linter Tests (pull_request) Successful in 23s
Smoke Test / smoke (pull_request) Failing after 20s
Adds `scripts/generate_crisis_response.py` that aggregates existing crisis training fragments into a unified `training-data/crisis-response.jsonl` dataset (3,143 pairs, exceeding the 2K target).

- Normalizes schema across 7 source files into a unified format
- Validates crisis protocol compliance: 988 referral, gospel, presence check
- Deduplicates entries (3,500 → 3,143 pairs)
- Includes smoke tests (`test_generate_crisis_response.py`)
- Documentation: `training-data/CRISIS-README.md`

Part of #571 Pipeline 3 — Training Factory. Closes #574
31 lines
1.3 KiB
Python
#!/usr/bin/env python3
import json, os
from pathlib import Path


def smoke():
    out = Path("training-data/crisis-response.jsonl")
    assert out.exists(), "output missing"

    # Schema check on a sample of the first 100 entries.
    lines = [l for l in open(out) if l.strip()]
    assert len(lines) >= 2000, f"pairs={len(lines)}"
    req = {"id", "family", "category", "scenario", "response", "risk_level",
           "988_included", "gospel_included", "safety_check_included"}
    for ln in lines[:100]:
        e = json.loads(ln)
        miss = req - set(e.keys())
        assert not miss, f"missing: {miss}"
        assert isinstance(e["988_included"], bool) and e["988_included"]
        assert isinstance(e["safety_check_included"], bool) and e["safety_check_included"]
    print(f"✅ {len(lines)} pairs, schema valid, 988+safety present in sample")

    # Every expected scenario family must be represented.
    fams = {json.loads(l)["family"] for l in lines}
    exp = {"indirect_signals", "manipulation_edge_cases", "post_crisis_recovery"}
    assert exp.issubset(fams), f"Missing families: {exp - fams}"
    print(f"✅ Families: {sorted(fams)}")

    # Gospel inclusion is reported, not asserted -- it is not required per entry.
    ents = [json.loads(l) for l in lines]
    print(f"✅ Gospel present in {sum(e['gospel_included'] for e in ents)}/{len(ents)} entries")


if __name__ == "__main__":
    # Run from the repository root so the relative output path resolves.
    os.chdir(Path(__file__).parent.parent)
    smoke()
    print("\n✅ All smoke tests passed")