Research: R@5 vs End-to-End Accuracy Gap — WHY Does Retrieval Succeed but Answering Fail? #660
Open
opened 2026-04-14 19:07:33 +00:00 by Rockachopa
1 comment
Labels: p0-critical
Assignee: Timmy
Reference: Timmy_Foundation/hermes-agent#660
Critical Finding from SOTA
MemPalace achieved 98.4% R@5 (the correct documents were among the top 5 retrieved) but only 17% end-to-end answer accuracy (the LLM actually produced the correct answer). This 81-point gap is the most important discovery from our research.
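To make the two metrics concrete, here is a minimal sketch of how R@5, end-to-end accuracy, and the gap between them could be computed. The function names, the toy data, and the exact-match scoring rule are illustrative assumptions, not the evaluation code behind the reported numbers.

```python
from typing import Sequence


def recall_at_k(retrieved: Sequence[Sequence[str]], gold: Sequence[str], k: int = 5) -> float:
    """Fraction of queries whose gold document ID appears in the top-k retrieved IDs."""
    hits = sum(1 for docs, g in zip(retrieved, gold) if g in docs[:k])
    return hits / len(gold)


def e2e_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of answers that match the reference (assumed rule: case-insensitive exact match)."""
    correct = sum(
        1 for p, r in zip(predictions, references)
        if p.strip().lower() == r.strip().lower()
    )
    return correct / len(references)


def gap_points(recall: float, accuracy: float) -> float:
    """Gap between retrieval recall and answer accuracy, in percentage points."""
    return round((recall - accuracy) * 100, 1)


# Toy data (hypothetical): three queries, one gold document each.
toy_retrieved = [["d1", "d2"], ["d9", "d3"], ["d4"]]
toy_gold = ["d2", "d3", "d7"]
toy_predictions = ["Paris", "blue", "42"]
toy_references = ["paris", "red", "41"]

toy_recall = recall_at_k(toy_retrieved, toy_gold)
toy_accuracy = e2e_accuracy(toy_predictions, toy_references)

# The figures reported in this issue: 98.4% R@5 and 17% correct answers.
reported_gap = gap_points(0.984, 0.17)
```

With the reported figures, `reported_gap` is 81.4, which is the "81-point gap" quoted above. On the toy data, retrieval hits two of three queries while only one of three answers is correct, which is the same pattern at small scale.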
Research Questions
WHY does retrieval succeed but answering fail?
What patterns bridge the gap?
What's the minimum viable retrieval for crisis support?
Methodology
Acceptance Criteria
Source
Research Complete: R@5 vs End-to-End Accuracy Gap
Report: research_r5_vs_e2e_gap.md (301 lines)
Status: COMPLETE
Key Findings
The gap is a FUNDAMENTAL UTILIZATION BOTTLENECK, not a retrieval problem.
Even with oracle retrieval (the correct documents are guaranteed to be in the context), models below 7B parameters fail to extract the answer 85-100% of the time.
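An oracle-retrieval check of this kind can be sketched as follows: the gold passage is handed directly to the model, so any remaining failure is attributable to utilization rather than retrieval. Everything here is hypothetical scaffolding: `Example`, `oracle_failure_rate`, the substring-match scoring rule, and the two stub "models" stand in for a real LLM call and a real dataset.

```python
from typing import Callable, Iterable, NamedTuple


class Example(NamedTuple):
    question: str
    gold_passage: str  # the passage known to contain the answer
    answer: str


def oracle_failure_rate(
    examples: Iterable[Example],
    answer_fn: Callable[[str, str], str],
) -> float:
    """Share of examples the model gets wrong even when given the gold passage.

    Assumed scoring rule: an output counts as correct if it contains the
    reference answer as a case-insensitive substring.
    """
    items = list(examples)
    failures = sum(
        1 for ex in items
        if ex.answer.lower() not in answer_fn(ex.question, ex.gold_passage).lower()
    )
    return failures / len(items)


# Stub "models" for illustration only; a real run would call an LLM here.
def context_blind_model(question: str, passage: str) -> str:
    """Ignores the supplied passage entirely."""
    return "I am not sure."


def copy_model(question: str, passage: str) -> str:
    """Echoes the passage, so the answer is always present."""
    return passage


toy_examples = [
    Example("Which number should be called?", "The lifeline number is 988.", "988"),
    Example("What colour is the folder?", "The folder is green.", "green"),
]

blind_rate = oracle_failure_rate(toy_examples, context_blind_model)
copy_rate = oracle_failure_rate(toy_examples, copy_model)
```

Running the same harness with and without oracle passages separates the two sources of error: if the failure rate stays high with gold passages supplied, improving the retriever will not close the gap.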
Three Failure Modes
Eight Patterns That Bridge the Gap
Realistic Target
Combined interventions can reduce the gap from 81 points to roughly 25-45 points, which corresponds to an E2E accuracy of about 50-70%.
Recommendation for Hermes
Critical Insight
The crisis support domain involves complex queries (levels 3-4: interpretable/hidden rationale), which have the LOWEST utilization rates. This explains the extreme 81-point gap.