the-door/tests
Timmy 722feae199
feat: Crisis detection A/B test framework
Closes #101

Adds A/B testing capability for crisis detection algorithms:

1. Feature flag: crisis detection algorithm A vs B
   - Variant A: Current canonical detector (crisis/detect.py)
   - Variant B: Enhanced detector (more sensitive to MEDIUM indicators)
   - Configurable traffic split (default 50/50)
   - Deterministic variant assignment via text hash
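
A minimal sketch of how hash-based assignment with a configurable split could work. This is not the code from this commit: the function name `assign_variant`, the `split_a` parameter, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def assign_variant(text: str, split_a: float = 0.5) -> str:
    """Return "A" or "B" for a text, always the same answer for the same text.

    Hypothetical helper: split_a is the fraction of traffic sent to variant A.
    """
    if not 0.0 <= split_a <= 1.0:
        raise ValueError("split_a must be between 0 and 1")
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer bucket in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # Compare in integer space so split_a=1.0 always selects A.
    threshold = int(split_a * 2**64)
    return "A" if bucket < threshold else "B"
```

Because the bucket depends only on the text, the same message is routed to the same variant on every call, which keeps logged results reproducible without storing any per-user state.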

2. Logging: records which variant handled each event
   - JSONL log file with event details
   - Privacy-preserving (text hashed, not stored)
   - Includes latency measurements
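
The logging behavior described above might look roughly like the following sketch. The field names (`text_hash`, `variant`, `level`, `latency_ms`), the helper names `timed_detect` and `log_event`, and the use of SHA-256 are assumptions, not the schema used by this commit.

```python
import hashlib
import json
import time


def timed_detect(detector, text):
    """Run a detector callable and return (result, latency in milliseconds)."""
    start = time.perf_counter()
    result = detector(text)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return result, latency_ms


def log_event(path, text, variant, level, latency_ms):
    """Append one JSON object per line; only a hash of the text is written."""
    record = {
        "timestamp": time.time(),
        "text_hash": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "variant": variant,
        "level": level,
        "latency_ms": round(latency_ms, 3),
    }
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

Appending one self-contained JSON object per line means the log can be tailed or processed line by line without parsing the whole file.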

3. Metrics: false positive rate, detection latency per variant
   - Detection distribution by level
   - Average latency per variant
   - False positive rate (requires human-labeled events)
   - Disagreement rate between variants
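
As an illustration of the metrics listed above, here is one way they could be computed from logged events. Everything here is an assumption: the event fields, the `"NONE"` level name, the `is_crisis` label key, and the definition of false positive rate as flagged events divided by all events a human labeled as non-crisis.

```python
from collections import Counter, defaultdict

NO_CRISIS = "NONE"  # assumed name for the "nothing detected" level


def summarize(events):
    """Per-variant metrics from event dicts.

    Each event is assumed to carry "variant", "level", "latency_ms" and,
    once a human has reviewed it, a boolean "is_crisis" label.
    """
    grouped = defaultdict(list)
    for event in events:
        grouped[event["variant"]].append(event)

    summary = {}
    for variant, rows in grouped.items():
        # Only events a human labeled as non-crisis count toward the rate.
        negatives = [r for r in rows if r.get("is_crisis") is False]
        false_positives = [r for r in negatives if r["level"] != NO_CRISIS]
        summary[variant] = {
            "count": len(rows),
            "levels": dict(Counter(r["level"] for r in rows)),
            "avg_latency_ms": sum(r["latency_ms"] for r in rows) / len(rows),
            "false_positive_rate": (
                len(false_positives) / len(negatives) if negatives else None
            ),
        }
    return summary


def disagreement_rate(pairs):
    """Fraction of (level_a, level_b) pairs where the two variants differ."""
    pairs = list(pairs)
    if not pairs:
        return None
    return sum(1 for a, b in pairs if a != b) / len(pairs)
```

Returning `None` when there are no labeled negatives (or no pairs) avoids reporting a misleading rate of zero before any human review has happened.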

4. API:
   - CrisisABTester class for full control
   - detect_crisis_ab() convenience function
   - compare_results() for side-by-side analysis
   - label_event() for human review
   - get_report() for human-readable output

23 tests passing.
2026-04-15 10:49:51 -04:00