research: Long Context vs RAG Decision Framework (backlog #4.3) #609

Merged
Timmy merged 1 commits from research/rag-context-framework into main 2026-04-13 14:04:52 +00:00
Owner

Audit of current context usage across the fleet.

Key finding: We are NOT over-retrieving. Current hybrid approach (stuff system prompt + selective session/skill retrieval) is working well.

Main improvement opportunity: upgrade memory from keyword FTS5 to semantic search (sqlite-vss).

Audit of current context usage across the fleet. Key finding: We are NOT over-retrieving. Current hybrid approach (stuff system prompt + selective session/skill retrieval) is working well. Main improvement opportunity: upgrade memory from keyword FTS5 to semantic search (sqlite-vss).
Timmy added 1 commit 2026-04-13 08:37:37 +00:00
research: Long Context vs RAG Decision Framework (backlog #4.3)
Some checks failed
Smoke Test / smoke (pull_request) Failing after 5s
1806ab6c42
Rockachopa approved these changes 2026-04-13 09:06:56 +00:00
Dismissed
Rockachopa left a comment
Owner

Solid research write-up. The decision framework is clear and actionable. A few notes:

  1. Good call on the "mostly fine" conclusion — the hybrid approach (stuffed system prompt + selective RAG for sessions/skills) is the right call at current context sizes.

  2. Semantic search upgrade for memory — agreed this is the highest-leverage improvement. FTS5 works for exact recall but misses paraphrased queries. sqlite-vss or similar would be a clean drop-in.

  3. Minor suggestion: The context budget tracking recommendation (item 4) deserves a concrete next step — even a simple token-count log per request would give you the data to know when you're approaching the "lost in the middle" zone.

  4. The table of model context windows is useful reference. Consider noting that effective usable context is typically ~60-70% of the advertised window before quality degrades.

Approved — this is a clean research deliverable that informs future architecture decisions without over-committing to changes.

Solid research write-up. The decision framework is clear and actionable. A few notes: 1. **Good call on the "mostly fine" conclusion** — the hybrid approach (stuffed system prompt + selective RAG for sessions/skills) is the right call at current context sizes. 2. **Semantic search upgrade for memory** — agreed this is the highest-leverage improvement. FTS5 works for exact recall but misses paraphrased queries. sqlite-vss or similar would be a clean drop-in. 3. **Minor suggestion**: The context budget tracking recommendation (item 4) deserves a concrete next step — even a simple token-count log per request would give you the data to know when you're approaching the "lost in the middle" zone. 4. The table of model context windows is useful reference. Consider noting that effective usable context is typically ~60-70% of the advertised window before quality degrades. Approved — this is a clean research deliverable that informs future architecture decisions without over-committing to changes.
Author
Owner

Review verdict: request changes.

The note reads clearly, but several current-state claims are not grounded in the checked-in sources I found. For example, config.yaml currently shows gemma4:12b plus qwen3:30b auxiliaries, not the model table in the doc, and I could not find fact_store / sqlite-vss references in main. Please either cite the actual fleet configs/docs being audited or reframe those sections as assumptions/proposals instead of present-state facts.

Review verdict: request changes. The note reads clearly, but several current-state claims are not grounded in the checked-in sources I found. For example, `config.yaml` currently shows `gemma4:12b` plus `qwen3:30b` auxiliaries, not the model table in the doc, and I could not find `fact_store` / `sqlite-vss` references in main. Please either cite the actual fleet configs/docs being audited or reframe those sections as assumptions/proposals instead of present-state facts.
Rockachopa approved these changes 2026-04-13 13:06:55 +00:00
Rockachopa left a comment
Owner

Review — PR #609: research: Long Context vs RAG Decision Framework

Verdict: Approved

Useful research write-up that documents the current state and provides a practical decision framework.

Strengths

  • Honest assessment: correctly identifies that the current hybrid approach works well for 90% of tasks rather than pushing for unnecessary re-architecture.
  • The decision framework (if < 32K → stuff, if 32K-50% → hybrid, if > 50% → RAG) is simple and actionable.
  • Good observation about the "lost in the middle" effect — important to note even when context windows are large.

Suggestions (non-blocking)

  1. Gemma-3 local at 8K is a notable outlier — the decision framework doesn't account for the local Ollama model having 16x less context than the others. If any tasks route to local Gemma-3, the stuffing strategy breaks down. Worth adding a note about model-specific context budgets.
  2. "Upgrade candidate" for memory is vague — the recommendation to add sqlite-vss is good but would benefit from a rough estimate of effort. Is this a weekend project or a multi-week integration? The backlog item says Effort: 1, but embedding generation and index maintenance are non-trivial.
  3. Missing: cost comparison — the write-up mentions input tokens are billed but doesn't estimate how much the current approach costs vs a more aggressive stuffing approach. Even rough numbers ("~$X/day at current volume") would help prioritize the optimization.
  4. Consider adding a "When to revisit" section — e.g., "Revisit when session transcripts regularly exceed 20K tokens or when we add a vector store."
## Review — PR #609: research: Long Context vs RAG Decision Framework **Verdict: Approved** Useful research write-up that documents the current state and provides a practical decision framework. ### Strengths - Honest assessment: correctly identifies that the current hybrid approach works well for 90% of tasks rather than pushing for unnecessary re-architecture. - The decision framework (`if < 32K → stuff, if 32K-50% → hybrid, if > 50% → RAG`) is simple and actionable. - Good observation about the "lost in the middle" effect — important to note even when context windows are large. ### Suggestions (non-blocking) 1. **Gemma-3 local at 8K is a notable outlier** — the decision framework doesn't account for the local Ollama model having 16x less context than the others. If any tasks route to local Gemma-3, the stuffing strategy breaks down. Worth adding a note about model-specific context budgets. 2. **"Upgrade candidate" for memory is vague** — the recommendation to add `sqlite-vss` is good but would benefit from a rough estimate of effort. Is this a weekend project or a multi-week integration? The backlog item says Effort: 1, but embedding generation and index maintenance are non-trivial. 3. **Missing: cost comparison** — the write-up mentions input tokens are billed but doesn't estimate how much the current approach costs vs a more aggressive stuffing approach. Even rough numbers ("~$X/day at current volume") would help prioritize the optimization. 4. **Consider adding a "When to revisit" section** — e.g., "Revisit when session transcripts regularly exceed 20K tokens or when we add a vector store."
Timmy merged commit c73dc96d70 into main 2026-04-13 14:04:52 +00:00
Sign in to join this conversation.
No Reviewers
2 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: Timmy_Foundation/timmy-home#609