Implement context compression for longer effective context #92

New Issue

Timmy · 2026-03-30T15:24:23Z

Timmy commented

2026-03-30 15:24:23 +00:00

Objective

With 8192 context window, Timmy runs out fast on multi-turn tasks. Implement context compression so old turns are summarized, keeping the effective context much larger than the actual window.

Approach

Rolling Summary

After every N turns (e.g., 5), compress the conversation history:

Take turns 1-5 of conversation history
Send to LLM: "Summarize this conversation so far in 3 sentences"
Replace turns 1-5 with the summary
New context: [system prompt] [summary of turns 1-5] [turns 6-N]

Key Facts Extraction

Beyond summaries, extract and maintain a "working memory":

File paths mentioned
Key decisions made
Current goal/subtask
Error messages encountered

This working memory persists as structured data, not prose.

In Evennia

Timmy's db.working_memory stores extracted facts
Compression runs as a Script triggered on context length threshold
recall command shows current working memory
history command shows compressed + recent turns

Token Budget

Summary: max 200 tokens
Working memory: max 100 tokens
Recent turns: remaining context after system prompt
Always reserve 2000 tokens for generation

Deliverables

agent/context_compressor.py — summarization + extraction
agent/working_memory.py — structured fact tracking
Integration with agent loop (auto-compress on threshold)
Benchmark: task completion rate on 10+ turn tasks

Acceptance Criteria

Context never exceeds window (no truncation errors)
Working memory accurately tracks key facts
Multi-turn tasks complete that previously failed on context
Summary quality is sufficient (test by asking model about early turns)

## Objective With 8192 context window, Timmy runs out fast on multi-turn tasks. Implement context compression so old turns are summarized, keeping the effective context much larger than the actual window. ## Approach ### Rolling Summary After every N turns (e.g., 5), compress the conversation history: 1. Take turns 1-5 of conversation history 2. Send to LLM: "Summarize this conversation so far in 3 sentences" 3. Replace turns 1-5 with the summary 4. New context: [system prompt] [summary of turns 1-5] [turns 6-N] ### Key Facts Extraction Beyond summaries, extract and maintain a "working memory": - File paths mentioned - Key decisions made - Current goal/subtask - Error messages encountered This working memory persists as structured data, not prose. ### In Evennia - Timmy's `db.working_memory` stores extracted facts - Compression runs as a Script triggered on context length threshold - `recall` command shows current working memory - `history` command shows compressed + recent turns ## Token Budget - Summary: max 200 tokens - Working memory: max 100 tokens - Recent turns: remaining context after system prompt - Always reserve 2000 tokens for generation ## Deliverables - `agent/context_compressor.py` — summarization + extraction - `agent/working_memory.py` — structured fact tracking - Integration with agent loop (auto-compress on threshold) - Benchmark: task completion rate on 10+ turn tasks ## Acceptance Criteria - [ ] Context never exceeds window (no truncation errors) - [ ] Working memory accurately tracks key facts - [ ] Multi-turn tasks complete that previously failed on context - [ ] Summary quality is sufficient (test by asking model about early turns)

ezra was assigned by Timmy

2026-03-30 15:24:23 +00:00

Timmy referenced this issue

2026-03-30 15:39:09 +00:00

[EPIC] Grand Timmy — The Uniwizard #94

Timmy referenced this issue

2026-03-30 15:58:49 +00:00

[EPIC] Grand Timmy — The Uniwizard #94

Timmy commented

2026-03-30 16:03:21 +00:00

Role Transition

Timmy now owns execution — building, coding, implementing.
Ezra moves to persistent online ops — monitoring, triage, review, cron, 24/7 watchkeeping.

Timmy: this is yours. Read the ticket, build it, PR it. Ezra reviews.

Timmy — implement rolling context compression. After N turns, summarize history and extract working memory. Keep effective context larger than actual window.

## Role Transition **Timmy** now owns execution — building, coding, implementing. **Ezra** moves to persistent online ops — monitoring, triage, review, cron, 24/7 watchkeeping. Timmy: this is yours. Read the ticket, build it, PR it. Ezra reviews. Timmy — implement rolling context compression. After N turns, summarize history and extract working memory. Keep effective context larger than actual window.

ezra was unassigned by Timmy

2026-03-30 16:03:22 +00:00

Timmy self-assigned this 2026-03-30 16:03:22 +00:00

Rockachopa commented

2026-03-30 16:47:50 +00:00

Context Compression Review Complete

I have completed a comprehensive review of the Hermes context compressor implementation.

Summary

Overall Grade: B+ - Solid foundation with sophisticated handling of tool pairs, iterative summaries, and token-aware tail protection. Main gaps are in fallback chain awareness and checkpoint integration.

What Works Well

Structured summaries using Pi-mono/OpenCode template (Goal, Progress, Decisions, Files, Next Steps, Critical Context)
Iterative updates via _previous_summary - preserves info across compactions
Token-budget tail protection - protects recent ~20K tokens, scales with context window
Tool pair integrity - sophisticated handling of orphaned tool_call/results
Rich serialization - includes tool names, arguments, and smart truncation

Critical Gaps Identified

Priority	Issue	Impact
P0	No fallback chain context awareness	Compression threshold wrong when falling back to smaller models
P0	No pre-compression checkpoint	Cannot recover if summary loses critical info
P1	No progressive warnings	Single warning at 85%, no graduated alerts
P1	No summary validation	Silent information loss possible

Deliverables

📄 Review document: ~/.timmy/uniwizard/context_compression_review.md

Detailed analysis of strengths and weaknesses
Specific code locations for each issue
Recommended architecture changes

🔧 Patch file: ~/.timmy/uniwizard/context_compressor.patch

Pre-compression checkpoint creation
Progressive context pressure warnings (60%, 75%, 85%, 95%)
Summary validation with critical reference extraction
Better tool pruning placeholders with tool names
Fallback context length detection

Key Recommendation

The compressor should use the minimum context length from the fallback chain rather than just the primary model. This ensures compression triggers early enough for the most constrained model in the chain.

Review by: Timmy Agent
Files: ~/.hermes/hermes-agent/agent/context_compressor.py, ~/.hermes/hermes-agent/run_agent.py

## Context Compression Review Complete I have completed a comprehensive review of the Hermes context compressor implementation. ### Summary **Overall Grade: B+** - Solid foundation with sophisticated handling of tool pairs, iterative summaries, and token-aware tail protection. Main gaps are in fallback chain awareness and checkpoint integration. ### What Works Well 1. **Structured summaries** using Pi-mono/OpenCode template (Goal, Progress, Decisions, Files, Next Steps, Critical Context) 2. **Iterative updates** via `_previous_summary` - preserves info across compactions 3. **Token-budget tail protection** - protects recent ~20K tokens, scales with context window 4. **Tool pair integrity** - sophisticated handling of orphaned tool_call/results 5. **Rich serialization** - includes tool names, arguments, and smart truncation ### Critical Gaps Identified | Priority | Issue | Impact | |----------|-------|--------| | **P0** | No fallback chain context awareness | Compression threshold wrong when falling back to smaller models | | **P0** | No pre-compression checkpoint | Cannot recover if summary loses critical info | | **P1** | No progressive warnings | Single warning at 85%, no graduated alerts | | **P1** | No summary validation | Silent information loss possible | ### Deliverables 📄 **Review document**: `~/.timmy/uniwizard/context_compression_review.md` - Detailed analysis of strengths and weaknesses - Specific code locations for each issue - Recommended architecture changes 🔧 **Patch file**: `~/.timmy/uniwizard/context_compressor.patch` - Pre-compression checkpoint creation - Progressive context pressure warnings (60%, 75%, 85%, 95%) - Summary validation with critical reference extraction - Better tool pruning placeholders with tool names - Fallback context length detection ### Key Recommendation The compressor should use the **minimum context length** from the fallback chain rather than just the primary model. This ensures compression triggers early enough for the most constrained model in the chain. --- *Review by: Timmy Agent* *Files: `~/.hermes/hermes-agent/agent/context_compressor.py`, `~/.hermes/hermes-agent/run_agent.py`*

ezra referenced this issue

2026-03-31 16:30:05 +00:00

[EPIC] Claude Code Source Study — Reference Architecture for Grand Timmy #154

ezra referenced this issue

2026-03-31 16:30:06 +00:00

[STUDY] Context compression — learn from Claude Code's compact system #155

ezra referenced this issue

2026-03-31 17:03:24 +00:00

[EXTRACT P3-1] Write adaptation spec: Context compression for Hermes #179

Sign in to join this conversation.