# Context Compression Review

## Gitea Issue: timmy-home #92

**Date:** 2026-03-30
**Reviewer:** Timmy (Agent)
**Scope:** `~/.hermes/hermes-agent/agent/context_compressor.py`

---

## Executive Summary

The Hermes context compressor is a **mature, well-architected implementation** with sophisticated handling of tool call pairs, iterative summary updates, and token-aware tail protection. However, there are several **high-impact gaps** related to fallback chain awareness, early warning systems, and checkpoint integration that should be addressed for production reliability.

**Overall Grade:** B+ (Solid foundation, needs edge-case hardening)

---

## What the Current Implementation Does Well

### 1. Structured Summary Template (Lines 276-303)

The compressor uses a Pi-mono/OpenCode-inspired structured format:

- **Goal**: What the user is trying to accomplish
- **Constraints & Preferences**: User preferences, coding style
- **Progress**: Done / In Progress / Blocked sections
- **Key Decisions**: Important technical decisions with rationale
- **Relevant Files**: Files read/modified/created with notes
- **Next Steps**: What needs to happen next
- **Critical Context**: Values, error messages, config details that would otherwise be lost

This is **best-in-class** compared to most context compression implementations.

### 2. Iterative Summary Updates (Lines 264-304)

The `_previous_summary` mechanism preserves information across multiple compactions:

- On the first compaction: summarizes from scratch
- On subsequent compactions: updates the previous summary with new progress
- Moves items from "In Progress" to "Done" when completed
- Accumulates constraints and file references across compactions

### 3. Token-Budget Tail Protection (Lines 490-539)

Instead of protecting a fixed number of recent messages, the compressor protects the most recent N tokens:

```python
tail_token_budget = threshold_tokens * summary_target_ratio
# Default: 50% of 128K context = 64K threshold → ~13K token tail
```

This scales automatically with the model's context window.
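To make the tail-protection behavior concrete, here is a minimal standalone sketch of the idea: walk backwards from the newest message, keeping messages until the tail token budget is spent. The helper names (`estimate_tokens`, `split_tail`) and the 4-chars-per-token heuristic are assumptions for illustration, not the actual Hermes implementation.

```python
def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token (assumed heuristic)."""
    return max(1, len(text) // 4)

def split_tail(messages: list[dict], threshold_tokens: int,
               summary_target_ratio: float = 0.2) -> tuple[list[dict], list[dict]]:
    """Split messages into (head_to_compress, protected_tail).

    Walks backwards from the newest message, accumulating estimated tokens;
    messages that fit within the tail budget are protected from compression.
    """
    tail_budget = int(threshold_tokens * summary_target_ratio)
    spent = 0
    cut = len(messages)  # Index where the protected tail begins
    for i in range(len(messages) - 1, -1, -1):
        spent += estimate_tokens(messages[i].get("content", "") or "")
        if spent > tail_budget:
            break
        cut = i
    return messages[:cut], messages[cut:]
```

Because the budget is derived from `threshold_tokens` (itself a fraction of the model's context length), a larger context window automatically yields a larger protected tail, with no fixed message count to tune.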
### 4. Tool Call/Result Pair Integrity (Lines 392-450)

Sophisticated handling of orphaned tool pairs:

- `_sanitize_tool_pairs()`: Removes orphaned results, adds stubs for missing results
- `_align_boundary_forward/backward()`: Prevents splitting tool groups
- Protects the integrity of the message sequence for API compliance

### 5. Tool Output Pruning Pre-Pass (Lines 152-182)

A cheap first pass replaces old tool results with placeholders:

```python
_PRUNED_TOOL_PLACEHOLDER = "[Old tool output cleared to save context space]"
```

Only content longer than 200 characters is pruned, preserving smaller results.

### 6. Rich Serialization for Summary Input (Lines 199-248)

The serializer includes tool call arguments and truncates intelligently:

- Tool results: up to 3000 chars, with smart truncation that keeps the head and tail
- Tool calls: function name AND arguments (truncated to 400 chars if needed)
- All roles: 3000-char limit with ellipses

### 7. Proper Integration with Agent Loop

- Initialized in `AIAgent.__init__()` (lines 1191-1203)
- Triggered in `_compress_context()` (line 5259)
- Resets state in `reset_session_state()` (lines 1263-1271)
- Updates token counts via `update_from_response()` (lines 122-126)

---

## What's Missing or Broken

### 🔴 CRITICAL: No Fallback Chain Context Window Awareness

**Issue:** When the agent falls back to a model with a smaller context window (e.g., primary Claude 1M tokens → fallback GPT-4 128K tokens), the compressor's threshold is based on the **original model**, not the fallback model.

**Location:** `run_agent.py` compression initialization (lines 1191-1203)

**Impact:**

- The fallback model may hit its context limit before compression triggers
- Or compression may trigger too aggressively for smaller models

**Evidence:**

```python
# In AIAgent.__init__():
self.context_compressor = ContextCompressor(
    model=self.model,  # Original model only
    # ... no fallback context lengths passed
)
```

**Fix Needed:** Pass the fallback chain's context lengths and use the minimum:

```python
# Suggested approach:
context_lengths = [get_model_context_length(m) for m in [primary] + fallbacks]
effective_context = min(context_lengths)  # Conservative
```

---

### 🔴 HIGH: No Pre-Compression Checkpoint

**Issue:** When compression occurs, the pre-compression state is lost. Users cannot "rewind" to before compression if the summary drops critical information.

**Location:** `run_agent.py` `_compress_context()` (line 5259)

**Impact:**

- Information loss is irreversible
- If the summary misses critical context, the conversation is corrupted
- No audit trail of what was removed

**Fix Needed:** Create a checkpoint before compression:

```python
def _compress_context(self, messages, system_message, ...):
    # Create checkpoint BEFORE compression
    if self._checkpoint_mgr:
        self._checkpoint_mgr.create_checkpoint(
            name=f"pre-compression-{self.context_compressor.compression_count}",
            messages=messages,  # Full pre-compression state
        )
    compressed = self.context_compressor.compress(messages, ...)
```

---

### 🟡 MEDIUM: No Progressive Context Pressure Warnings

**Issue:** There is only a single warning at 85% (line 7871), then sudden compression at the 50-100% threshold. No graduated alert system.

**Location:** `run_agent.py` context pressure check (lines 7865-7872)

**Current:**

```python
if _compaction_progress >= 0.85 and not self._context_pressure_warned:
    self._context_pressure_warned = True
```

**Better:**

```python
# Progressive warnings at 60%, 75%, 85%, 95%
warning_levels = [(0.60, "info"), (0.75, "notice"), (0.85, "warning"), (0.95, "critical")]
```

---

### 🟡 MEDIUM: Summary Validation Missing

**Issue:** Nothing verifies that the generated summary actually contains the critical information from the compressed turns.
**Location:** `context_compressor.py` `_generate_summary()` (lines 250-369)

**Risk:** If the summarization model fails or produces low-quality output, critical information is silently lost.

**Fix Needed:** Add summary quality checks:

```python
def _validate_summary(self, summary: str, turns: list) -> bool:
    """Verify the summary captures critical information.

    - Check for key file paths mentioned in turns
    - Check for error messages that were present
    - Check for specific values/IDs
    - Return False if validation fails, triggering a fallback
    """
```

---

### 🟡 MEDIUM: No Semantic Deduplication

**Issue:** The same information may be repeated across the original turns and the previous summary, bloating the input to the summarizer.

**Location:** `_generate_summary()` iterative update path (lines 264-304)

**Example:** If the previous summary already mentions "file X was modified" and new turns also mention it, the information appears twice in the summarizer input.

---

### đŸŸĸ LOW: Tool Result Placeholder Not Actionable

**Issue:** The placeholder `[Old tool output cleared to save context space]` tells the user nothing about what was lost.

**Location:** Line 45

**Better:**

```python
# Include the tool name and a truncated preview
_PRUNED_TOOL_PLACEHOLDER_TEMPLATE = (
    "[Tool output for {tool_name} cleared. "
    "Preview: {preview}... ({original_chars} chars removed)]"
)
```

---

### đŸŸĸ LOW: Compression Metrics Not Tracked

**Issue:** No tracking of compression ratio, frequency, or information density over time.

**Useful metrics to track:**

- Tokens saved per compression
- Compression ratio (input tokens / output tokens)
- Frequency of compression (compressions per 100 turns)
- Average summary length

---

## Specific Code Improvements

### 1. Add Fallback Context Length Detection

**File:** `run_agent.py` (~line 1191)

```python
# Before initializing the compressor, collect all context lengths
def _get_fallback_context_lengths(self, _agent_cfg: dict) -> list:
    """Get context lengths for all models in the fallback chain."""
    lengths = []
    # Primary model
    lengths.append(get_model_context_length(
        self.model, base_url=self.base_url,
        api_key=self.api_key, provider=self.provider
    ))
    # Fallback models from config
    fallback_providers = _agent_cfg.get("fallback_providers", [])
    for fb in fallback_providers:
        if isinstance(fb, dict):
            fb_model = fb.get("model", "")
            fb_base = fb.get("base_url", "")
            fb_provider = fb.get("provider", "")
            fb_key_env = fb.get("api_key_env", "")
            fb_key = os.getenv(fb_key_env, "")
            if fb_model:
                lengths.append(get_model_context_length(
                    fb_model, base_url=fb_base,
                    api_key=fb_key, provider=fb_provider
                ))
    return [length for length in lengths if length and length > 0]

# Use the minimum context length for conservative compression
_fallback_contexts = self._get_fallback_context_lengths(_agent_cfg)
_effective_context = min(_fallback_contexts) if _fallback_contexts else None
```

### 2. Add Pre-Compression Checkpoint

**File:** `run_agent.py` `_compress_context()` method

See patch file for implementation.

### 3. Add Summary Validation

**File:** `context_compressor.py`

```python
import re
from typing import Dict, List, Set, Tuple

def _extract_critical_refs(self, turns: List[Dict]) -> Set[str]:
    """Extract critical references that must appear in the summary."""
    critical = set()
    for msg in turns:
        content = msg.get("content", "") or ""
        # File paths
        for match in re.finditer(r'[\w\-./]+\.(py|js|ts|json|yaml|md)\b', content):
            critical.add(match.group(0))
        # Error messages
        if "error" in content.lower() or "exception" in content.lower():
            for line in content.split('\n'):
                if any(k in line.lower() for k in ["error", "exception", "traceback"]):
                    critical.add(line[:100])  # First 100 chars of the error line
    return critical

def _validate_summary(self, summary: str, turns: List[Dict]) -> Tuple[bool, List[str]]:
    """Validate that the summary captures critical information.

    Returns (is_valid, missing_items).
    """
    if not summary or len(summary) < 100:
        return False, ["summary too short"]
    critical = self._extract_critical_refs(turns)
    missing = [ref for ref in critical if ref not in summary]
    # Allow some loss, but not too much
    if len(missing) > len(critical) * 0.5:
        return False, missing[:5]  # Return the first 5 missing items
    return True, []
```

### 4. Progressive Context Pressure Warnings

**File:** `run_agent.py` context pressure section (~line 7865)

```python
# Replace the single warning with a progressive system
_CONTEXT_PRESSURE_LEVELS = [
    (0.60, "â„šī¸ Context usage at 60% — monitoring"),
    (0.75, "📊 Context usage at 75% — consider wrapping up soon"),
    (0.85, "âš ī¸ Context usage at 85% — compression imminent"),
    (0.95, "🔴 Context usage at 95% — compression will trigger soon"),
]

# Track which levels have already been reported
if not hasattr(self, '_context_pressure_reported'):
    self._context_pressure_reported = set()

for threshold, message in _CONTEXT_PRESSURE_LEVELS:
    if _compaction_progress >= threshold and threshold not in self._context_pressure_reported:
        self._context_pressure_reported.add(threshold)
        if self.status_callback:
            self.status_callback("warning", message)
        if not self.quiet_mode:
            print(f"\n{message}\n")
```

---

## Interaction with Fallback Chain

### Current Behavior

The compressor is initialized once at agent startup with the primary model's context length:

```python
self.context_compressor = ContextCompressor(
    model=self.model,  # Primary model only
    threshold_percent=compression_threshold,  # Default 50%
    # ...
)
```

### Problems

1. **No dynamic adjustment:** If fallback occurs to a smaller model, the compression threshold is wrong
2. **No re-initialization on model switch:** The `/model` command doesn't update the compressor
3. **Context probe affects wrong model:** If the primary probe fails, fallback models may already have been used

### Recommended Architecture

```python
class AIAgent:
    def _update_compressor_for_model(self, model: str, base_url: str, provider: str):
        """Reconfigure the compressor when the model changes (fallback or /model command)."""
        new_context = get_model_context_length(model, base_url=base_url, provider=provider)
        if new_context != self.context_compressor.context_length:
            self.context_compressor.context_length = new_context
            self.context_compressor.threshold_tokens = int(
                new_context * self.context_compressor.threshold_percent
            )
            logger.info(f"Compressor adjusted for {model}: {new_context:,} tokens")

    def _handle_fallback(self, fallback_model: str, ...):
        """Update the compressor when falling back to a different model."""
        self._update_compressor_for_model(fallback_model, ...)
```

---

## Testing Gaps

1. **No fallback chain test:** Tests don't verify behavior when context limits differ
2. **No checkpoint integration test:** The pre-compression checkpoint is not tested
3. **No summary validation test:** No test for detecting poor-quality summaries
4. **No progressive warning test:** Only the 85% threshold is tested
5. **No tool result deduplication test:** Tests verify pairs are preserved but not deduplicated

---

## Recommendations

| Priority | Item | Effort | Impact |
|----------|------|--------|--------|
| P0 | Pre-compression checkpoint | Medium | Critical |
| P0 | Fallback context awareness | Medium | High |
| P1 | Progressive warnings | Low | Medium |
| P1 | Summary validation | Medium | High |
| P2 | Semantic deduplication | High | Medium |
| P2 | Better pruning placeholders | Low | Low |
| P3 | Compression metrics | Low | Low |

---

## Conclusion

The context compressor is a **solid, well-engineered implementation** with sophisticated handling of the core compression problem. The structured summary format and iterative update mechanism are particularly well-designed.
The main gaps are in **edge-case hardening**:

1. Fallback chain awareness needs to be addressed for multi-model reliability
2. A pre-compression checkpoint is essential for information recovery
3. Summary validation would prevent silent information loss

These are incremental improvements to an already strong foundation.

---

*Review conducted by Timmy Agent*
*For Gitea issue timmy-home #92*