Implement Reader-Guided Reranking — Bridge R@5 vs E2E Gap (+10-20 Accuracy) #666

Closed
opened 2026-04-14 19:34:56 +00:00 by Rockachopa · 0 comments
Owner

Research Source

Issue #660: R@5 vs End-to-End Accuracy Gap
Finding: Reader-guided reranking achieves +10-20 top-1 accuracy gains

Problem

Retrieval succeeds (98.4% R@5) but end-to-end answering fails (17% E2E accuracy): the LLM cannot make use of what it retrieves. Reader-guided reranking addresses this by using the LLM's own answer predictions to rerank passages.

Solution

Implement RIDER (Reader-Guided Passage Reranking):

  1. Retrieve top-K passages (existing vector/FTS5 search)
  2. Have LLM predict answer from each passage individually
  3. Rerank passages by LLM confidence in its predictions
  4. Return top-N reranked passages for final answer generation

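A minimal sketch of the four steps above. The `predict_fn` callable and `RankedPassage` dataclass are illustrative assumptions, not existing project APIs; in practice `predict_fn` would wrap a (batched) LLM call that returns an answer and a confidence signal for one passage:

```python
from dataclasses import dataclass


@dataclass
class RankedPassage:
    text: str
    score: float


class RIDER:
    """Reader-guided reranker sketch: score each passage by the reader's
    confidence in the answer it predicts from that passage alone."""

    def __init__(self, predict_fn, top_n=5):
        # predict_fn(query, passage) -> (answer, confidence) is a
        # hypothetical hook; the real implementation would wrap the
        # project's LLM client.
        self.predict_fn = predict_fn
        self.top_n = top_n

    def rerank(self, passages, query):
        # Step 2: predict an answer from each passage individually.
        scored = [
            RankedPassage(p, self.predict_fn(query, p)[1]) for p in passages
        ]
        # Step 3: rerank by the reader's confidence.
        scored.sort(key=lambda rp: rp.score, reverse=True)
        # Step 4: return the top-N passages for final answer generation.
        return scored[: self.top_n]
```

A stub reader makes the behaviour concrete: passages the reader answers confidently from move to the top, regardless of their original retrieval rank.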
Why It Works

  • Aligns retrieval with what the LLM can actually use
  • No training required — uses reader's predictions as signal
  • Achieves 48.3 EM on Natural Questions with only 1,024 tokens
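One way to obtain the "no training required" confidence signal, assuming the LLM API exposes per-token log-probabilities for the generated answer (an assumption; not every client does), is the geometric mean of token probabilities:

```python
import math


def answer_confidence(token_logprobs):
    """Collapse per-token log-probabilities of a generated answer into a
    single score in [0, 1]: the geometric mean token probability.
    Returns 0.0 for an empty generation."""
    if not token_logprobs:
        return 0.0
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_logprob)
```

The geometric mean avoids penalising longer answers the way a raw sum of log-probabilities would.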

Acceptance Criteria

  • RIDER class with rerank(passages, query) method
  • LLM prediction from each passage (batch inference)
  • Confidence-based reranking
  • Integration with session_search
  • Benchmark: measure E2E accuracy before/after
  • Target: +10-20 top-1 accuracy improvement
  • Tests: test_reader_guided_reranking.py
  • Documentation: docs/reader-guided-reranking.md

Implementation Steps

  1. Create agent/rider.py with RIDER class
  2. Implement batch LLM prediction from passages
  3. Implement confidence scoring and reranking
  4. Wire into session_search pipeline
  5. Benchmark on LongMemEval subset
  6. Measure E2E accuracy improvement
  7. Write tests
  8. Document
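Step 2 (batch LLM prediction) could be sketched with bounded-concurrency `asyncio` fan-out; `llm_call` is a hypothetical async client wrapper, to be swapped for the project's real LLM client:

```python
import asyncio


async def batch_predict(llm_call, query, passages, max_concurrency=8):
    """Run the reader over all candidate passages concurrently.

    llm_call(query, passage) is an assumed async callable returning
    (answer, confidence). Results are returned in passage order, so
    they can be zipped back onto the candidate list for reranking."""
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(passage):
        # Cap in-flight LLM requests to avoid rate-limit errors.
        async with sem:
            return await llm_call(query, passage)

    return await asyncio.gather(*(bounded(p) for p in passages))
```

`asyncio.gather` preserves argument order, which keeps predictions aligned with their source passages.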

Success Metric

+10-20 point top-1 accuracy improvement, with a measurable E2E accuracy increase on the LongMemEval subset benchmark.

Effort: 3 days

Rockachopa added the p0-critical label 2026-04-14 19:34:56 +00:00
Timmy was assigned by Rockachopa 2026-04-14 19:34:56 +00:00
Timmy closed this issue 2026-04-15 11:58:04 +00:00

Reference: Timmy_Foundation/hermes-agent#666