Closes #101
Adds A/B testing capability for crisis detection algorithms:
1. Feature flag: crisis detection algorithm A vs B
- Variant A: Current canonical detector (crisis/detect.py)
- Variant B: Enhanced detector (more sensitive to MEDIUM indicators)
- Configurable traffic split (default 50/50)
- Deterministic variant assignment via text hash
2. Logging: which variant triggered for each event
- JSONL log file with event details
- Privacy-preserving (text hashed, not stored)
- Includes latency measurements
3. Metrics: false positive rate, detection latency per variant
- Detection distribution by level
- Average latency per variant
- False positive rate (computed only from human-labeled events)
- Disagreement rate between variants
4. API:
- CrisisABTester class for full control
- detect_crisis_ab() convenience function
- compare_results() for side-by-side analysis
- label_event() for human review
- get_report() for human-readable output
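
For reviewers, here is a minimal sketch of how items 1 and 2 could work together: deterministic assignment from a text hash, and a JSONL record that stores only the hash. This is an illustration, not the code in this PR. The names `assign_variant`, `build_log_line`, and `split_b`, the choice of SHA-256, and the record fields are all assumptions.

```python
import hashlib
import json
import time


def assign_variant(text: str, split_b: float = 0.5) -> str:
    """Map text to "A" or "B" deterministically (hypothetical helper).

    split_b is the share of traffic sent to variant B (0.5 = 50/50).
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # First 4 bytes as an integer, scaled into [0, 1). The same text
    # always lands in the same bucket, so assignment is repeatable.
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return "B" if bucket < split_b else "A"


def build_log_line(text: str, variant: str, level: str, latency_ms: float) -> str:
    """Serialize one event as a single JSONL line (assumed field names).

    Only a hash of the text is written; the raw text never reaches the log.
    """
    record = {
        "timestamp": time.time(),
        "text_hash": hashlib.sha256(text.encode("utf-8")).hexdigest(),
        "variant": variant,
        "level": level,
        "latency_ms": round(latency_ms, 3),
    }
    return json.dumps(record)
```

Hashing the text instead of drawing a random number means a repeated message is always routed to the same variant, which keeps reruns and comparisons reproducible.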
23 tests passing.
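
The metrics in item 3 can be sketched roughly as below. This is not the PR's implementation: `summarize`, `disagreement_rate`, the event field names, the `is_crisis` label, and the `"NONE"` level for "nothing detected" are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def summarize(events):
    """Per-variant average latency and false positive rate (hypothetical helper).

    Each event is a dict with "variant", "level", "latency_ms", and an
    optional human label "is_crisis" (True/False). Unlabeled events count
    toward latency but are skipped for the false positive rate.
    """
    by_variant = defaultdict(list)
    for event in events:
        by_variant[event["variant"]].append(event)

    report = {}
    for variant, items in by_variant.items():
        # False positive rate = flagged events among those a human labeled
        # as non-crisis. "NONE" as the no-detection level is an assumption.
        negatives = [e for e in items if e.get("is_crisis") is False]
        flagged = [e for e in negatives if e["level"] != "NONE"]
        report[variant] = {
            "avg_latency_ms": mean(e["latency_ms"] for e in items),
            "false_positive_rate": (
                len(flagged) / len(negatives) if negatives else None
            ),
        }
    return report


def disagreement_rate(pairs):
    """Share of (level_a, level_b) pairs where the two variants differ."""
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)
```

Returning `None` when a variant has no labeled non-crisis events keeps "no data yet" distinct from a measured rate of zero.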