Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Failing after 38s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 28s
Tests / e2e (pull_request) Successful in 2m18s
Tests / test (pull_request) Failing after 34m6s
Resolves #660. Documents the 81-point gap between retrieval success (98.4% R@5) and answering accuracy (17% E2E).

docs/r5-vs-e2e-gap-analysis.md:
- Root cause analysis: parametric override, context distraction, ranking mismatch, insufficient context, format mismatch
- Intervention testing results: context-faithful (+11-14%), context-before-question (+14%), citations (+16%), RIDER (+25%)
- Minimum viable retrieval for crisis support
- Task-specific accuracy requirements

scripts/benchmark_r5_e2e.py:
- Benchmark script for measuring R@5 vs E2E gap
- Supports baseline, context-faithful, and RIDER interventions
- Reports gap analysis with per-question details
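
For reviewers who want a concrete picture of the metric, the gap can be sketched as below. This is a minimal illustration and not the contents of scripts/benchmark_r5_e2e.py: `QuestionResult`, `recall_at_k`, `e2e_accuracy`, and `gap_report` are hypothetical names, and R@5 is assumed here to mean "at least one gold passage appears in the top 5 retrieved", which may differ from the definition used in the analysis doc.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class QuestionResult:
    """One benchmark question (hypothetical record layout)."""
    question_id: str
    retrieved_ids: List[str]  # ranked passage ids, best first
    gold_ids: Set[str]        # passage ids that contain the answer
    answer_correct: bool      # whether the generated answer was judged correct


def _hit(result: QuestionResult, k: int) -> bool:
    # Assumption: a "hit" means any gold passage is in the top k.
    return bool(result.gold_ids & set(result.retrieved_ids[:k]))


def recall_at_k(results: List[QuestionResult], k: int = 5) -> float:
    if not results:
        raise ValueError("no results to score")
    return sum(_hit(r, k) for r in results) / len(results)


def e2e_accuracy(results: List[QuestionResult]) -> float:
    if not results:
        raise ValueError("no results to score")
    return sum(r.answer_correct for r in results) / len(results)


def gap_report(results: List[QuestionResult], k: int = 5) -> Dict[str, object]:
    """Summarize retrieval vs. answering, with per-question detail."""
    r_at_k = recall_at_k(results, k)
    e2e = e2e_accuracy(results)
    # Questions where retrieval succeeded but the answer was still wrong:
    # these are the cases the gap analysis is concerned with.
    retrieved_but_wrong = [
        r.question_id for r in results if _hit(r, k) and not r.answer_correct
    ]
    return {
        "recall_at_k": r_at_k,
        "e2e_accuracy": e2e,
        "gap_points": (r_at_k - e2e) * 100,
        "retrieved_but_wrong": retrieved_but_wrong,
    }
```

Listing the `retrieved_but_wrong` questions separately is one way to produce per-question details: those are the cases where the answer was available in context yet the final response was still incorrect.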