Files
hermes-agent/docs
Hermes Agent aa2809882e
Some checks failed
Contributor Attribution Check / check-attribution (pull_request) Failing after 38s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 28s
Tests / e2e (pull_request) Successful in 2m18s
Tests / test (pull_request) Failing after 34m6s
docs+feat: R@5 vs E2E accuracy gap analysis — WHY retrieval fails (#660)
Resolves #660. Documents the 81-point gap between retrieval success
(98.4% R@5) and answering accuracy (17% E2E).

docs/r5-vs-e2e-gap-analysis.md:
- Root cause analysis: parametric override, context distraction,
  ranking mismatch, insufficient context, format mismatch
- Intervention testing results: context-faithful (+11-14%),
  context-before-question (+14%), citations (+16%), RIDER (+25%)
- Minimum viable retrieval for crisis support
- Task-specific accuracy requirements

scripts/benchmark_r5_e2e.py:
- Benchmark script for measuring R@5 vs E2E gap
- Supports baseline, context-faithful, and RIDER interventions
- Reports gap analysis with per-question details
2026-04-15 10:26:38 -04:00
..
2026-04-03 01:46:45 -07:00