# Context Compression Review
## Gitea Issue: timmy-home #92
**Date:** 2026-03-30
**Reviewer:** Timmy (Agent)
**Scope:** `~/.hermes/hermes-agent/agent/context_compressor.py`
---
## Executive Summary
The Hermes context compressor is a **mature, well-architected implementation** with sophisticated handling of tool call pairs, iterative summary updates, and token-aware tail protection. However, there are several **high-impact gaps** related to fallback chain awareness, early warning systems, and checkpoint integration that should be addressed for production reliability.
**Overall Grade:** B+ (Solid foundation, needs edge-case hardening)
---
## What the Current Implementation Does Well
### 1. Structured Summary Template (Lines 276-303)
The compressor uses a Pi-mono/OpenCode-inspired structured format:
- **Goal**: What the user is trying to accomplish
- **Constraints & Preferences**: User preferences, coding style
- **Progress**: Done / In Progress / Blocked sections
- **Key Decisions**: Important technical decisions with rationale
- **Relevant Files**: Files read/modified/created with notes
- **Next Steps**: What needs to happen next
- **Critical Context**: Values, error messages, config details that would be lost
This is **best-in-class** compared to most context compression implementations.
### 2. Iterative Summary Updates (Lines 264-304)
The `_previous_summary` mechanism preserves information across multiple compactions:
- On first compaction: Summarizes from scratch
- On subsequent compactions: Updates previous summary with new progress
- Moves items from "In Progress" to "Done" when completed
- Accumulates constraints and file references across compactions
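The iterative branch can be sketched as follows. This is a hypothetical simplification (the function name `build_summary_prompt` and the prompt wording are illustrative, not the actual strings in `context_compressor.py`):

```python
def build_summary_prompt(previous_summary: str, new_turns_text: str) -> str:
    """Sketch of the iterative-update branch: on later compactions the
    previous summary is fed back in so the model updates it in place,
    rather than summarizing from scratch. (Hypothetical prompt wording.)"""
    if previous_summary:
        return (
            "Update the following summary with the new conversation turns.\n"
            "Move completed items from 'In Progress' to 'Done'.\n\n"
            f"PREVIOUS SUMMARY:\n{previous_summary}\n\n"
            f"NEW TURNS:\n{new_turns_text}"
        )
    # First compaction: no prior summary to carry forward
    return f"Summarize the following conversation:\n\n{new_turns_text}"
```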
### 3. Token-Budget Tail Protection (Lines 490-539)
Instead of fixed message counts, protects the most recent N tokens:
```python
tail_token_budget = threshold_tokens * summary_target_ratio
# Default: 50% of 128K context = 64K threshold → ~13K token tail
```
This scales automatically with model context window.
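Worked through with the default numbers above (the `summary_target_ratio` of 0.20 is inferred from the "~13K" figure in the comment, not read from the code):

```python
# Hypothetical walkthrough of the tail-budget math described above.
context_length = 128_000
threshold_percent = 0.50       # compression triggers at 50% of context
summary_target_ratio = 0.20    # assumed; actual default lives in context_compressor.py

threshold_tokens = int(context_length * threshold_percent)        # 64_000
tail_token_budget = int(threshold_tokens * summary_target_ratio)  # 12_800 ≈ ~13K

print(threshold_tokens, tail_token_budget)
```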
### 4. Tool Call/Result Pair Integrity (Lines 392-450)
Sophisticated handling of orphaned tool pairs:
- `_sanitize_tool_pairs()`: Removes orphaned results, adds stubs for missing results
- `_align_boundary_forward/backward()`: Prevents splitting tool groups
- Protects the integrity of the message sequence for API compliance
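The pairing invariant can be sketched as below. This is a hypothetical simplification of `_sanitize_tool_pairs()` (the real version also coordinates with the boundary-alignment helpers); the function and stub text here are illustrative:

```python
def sanitize_tool_pairs(messages: list) -> list:
    """Sketch: every kept tool result must match a kept assistant tool_call,
    and every kept tool_call must receive some result (a stub if needed)."""
    out = []
    open_ids = set()  # tool_call ids still awaiting a result
    for msg in messages:
        role = msg.get("role")
        if role == "assistant" and msg.get("tool_calls"):
            # Stub any still-unanswered calls before starting a new group
            for cid in sorted(open_ids):
                out.append({"role": "tool", "tool_call_id": cid,
                            "content": "[stub: result lost in compression]"})
            open_ids = {tc["id"] for tc in msg["tool_calls"]}
            out.append(msg)
        elif role == "tool":
            cid = msg.get("tool_call_id")
            if cid in open_ids:        # keep matched results
                open_ids.discard(cid)
                out.append(msg)
            # orphaned results (no matching call kept) are dropped
        else:
            out.append(msg)
    for cid in sorted(open_ids):       # stub trailing unanswered calls
        out.append({"role": "tool", "tool_call_id": cid,
                    "content": "[stub: result lost in compression]"})
    return out
```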
### 5. Tool Output Pruning Pre-Pass (Lines 152-182)
Cheap first pass that replaces old tool results with placeholders:
```python
_PRUNED_TOOL_PLACEHOLDER = "[Old tool output cleared to save context space]"
```
Only prunes content >200 chars, preserving smaller results.
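A minimal sketch of this pre-pass, assuming a `protect_last_n` knob for the recent turns that stay untouched (the real implementation protects a token budget, not a message count):

```python
PRUNED_PLACEHOLDER = "[Old tool output cleared to save context space]"
MIN_PRUNE_CHARS = 200  # only prune results larger than this

def prune_old_tool_outputs(messages: list, protect_last_n: int = 4) -> list:
    """Sketch of the cheap first pass: replace old, large tool results
    with a placeholder while leaving recent messages intact."""
    cutoff = max(0, len(messages) - protect_last_n)
    pruned = []
    for i, msg in enumerate(messages):
        if (i < cutoff and msg.get("role") == "tool"
                and len(msg.get("content") or "") > MIN_PRUNE_CHARS):
            msg = {**msg, "content": PRUNED_PLACEHOLDER}  # copy, don't mutate
        pruned.append(msg)
    return pruned
```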
### 6. Rich Serialization for Summary Input (Lines 199-248)
Includes tool call arguments and truncates intelligently:
- Tool results: Up to 3000 chars (with smart truncation keeping head/tail)
- Tool calls: Function name AND arguments (truncated to 400 chars if needed)
- All roles: 3000 char limit with ellipses
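The head/tail truncation mentioned above can be sketched like this (the 60/40 split and the elision marker are assumptions, not values from the source):

```python
def truncate_keep_head_tail(text: str, limit: int = 3000,
                            head_ratio: float = 0.6) -> str:
    """Sketch of smart truncation: keep the start and end of a long tool
    result and elide the middle, so both the command context and the
    final outcome survive summarization."""
    if len(text) <= limit:
        return text
    head = int(limit * head_ratio)
    tail = limit - head
    return text[:head] + "\n...[truncated]...\n" + text[-tail:]
```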
### 7. Proper Integration with Agent Loop
- Initialized in `AIAgent.__init__()` (lines 1191-1203)
- Triggered in `_compress_context()` (line 5259)
- Resets state in `reset_session_state()` (lines 1263-1271)
- Updates token counts via `update_from_response()` (lines 122-126)
---
## What's Missing or Broken
### 🔴 CRITICAL: No Fallback Chain Context Window Awareness
**Issue:** When the agent falls back to a model with a smaller context window (e.g., primary Claude 1M tokens → fallback GPT-4 128K tokens), the compressor's threshold is based on the **original model**, not the fallback model.
**Location:** `run_agent.py` compression initialization (lines 1191-1203)
**Impact:**
- Fallback model may hit context limits before compression triggers
- Or compression may trigger too aggressively for smaller models
**Evidence:**
```python
# In AIAgent.__init__():
self.context_compressor = ContextCompressor(
    model=self.model,  # Original model only
    # ... no fallback context lengths passed
)
```
**Fix Needed:** Pass fallback chain context lengths and use minimum:
```python
# Suggested approach:
context_lengths = [get_model_context_length(m) for m in [primary] + fallbacks]
effective_context = min(context_lengths) # Conservative
```
---
### 🔴 HIGH: No Pre-Compression Checkpoint
**Issue:** When compression occurs, the pre-compression state is lost. Users cannot "rewind" to before compression if the summary loses critical information.
**Location:** `run_agent.py` `_compress_context()` (line 5259)
**Impact:**
- Information loss is irreversible
- If summary misses critical context, conversation is corrupted
- No audit trail of what was removed
**Fix Needed:** Create checkpoint before compression:
```python
def _compress_context(self, messages, system_message, ...):
    # Create checkpoint BEFORE compression
    if self._checkpoint_mgr:
        self._checkpoint_mgr.create_checkpoint(
            name=f"pre-compression-{self.context_compressor.compression_count}",
            messages=messages,  # Full pre-compression state
        )
    compressed = self.context_compressor.compress(messages, ...)
```
---
### 🟡 MEDIUM: No Progressive Context Pressure Warnings
**Issue:** Only one warning at 85% (line 7871), then compression triggers abruptly once the configured threshold (50–100% of context) is reached. No graduated alert system.
**Location:** `run_agent.py` context pressure check (lines 7865-7872)
**Current:**
```python
if _compaction_progress >= 0.85 and not self._context_pressure_warned:
    self._context_pressure_warned = True
```
**Better:**
```python
# Progressive warnings at 60%, 75%, 85%, 95%
warning_levels = [(0.60, "info"), (0.75, "notice"),
(0.85, "warning"), (0.95, "critical")]
```
---
### 🟡 MEDIUM: Summary Validation Missing
**Issue:** No verification that the generated summary actually contains the critical information from the compressed turns.
**Location:** `context_compressor.py` `_generate_summary()` (lines 250-369)
**Risk:** If the summarization model fails or produces low-quality output, critical information is silently lost.
**Fix Needed:** Add summary quality checks:
```python
def _validate_summary(self, summary: str, turns: list) -> bool:
    """Verify summary captures critical information."""
    # Check for key file paths mentioned in turns
    # Check for error messages that were present
    # Check for specific values/IDs
    # Return False if validation fails, trigger fallback
```
---
### 🟡 MEDIUM: No Semantic Deduplication
**Issue:** Same information may be repeated across the original turns and the previous summary, leading to bloated input to the summarizer.
**Location:** `_generate_summary()` iterative update path (lines 264-304)
**Example:** If the previous summary already mentions "file X was modified", and new turns also mention it, the information appears twice in the summarizer input.
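A cheap line-level dedup pass could address this before any semantic machinery is needed; this is a hypothetical sketch (function name and approach are the reviewer's suggestion, not existing code):

```python
def dedupe_summary_input(previous_summary: str, turn_texts: list) -> list:
    """Sketch: drop turn lines that already appear verbatim in the
    previous summary before building the summarizer input, so repeated
    facts (e.g. 'file X was modified') are sent only once."""
    seen = {line.strip() for line in previous_summary.splitlines() if line.strip()}
    kept = []
    for text in turn_texts:
        fresh = [l for l in text.splitlines() if l.strip() not in seen]
        if fresh:  # skip turns that contribute nothing new
            kept.append("\n".join(fresh))
    return kept
```

Exact-match dedup misses paraphrases, but it is essentially free and shrinks the worst repetition; semantic matching can layer on top later.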
---
### 🟢 LOW: Tool Result Placeholder Not Actionable
**Issue:** The placeholder `[Old tool output cleared to save context space]` tells the user nothing about what was lost.
**Location:** `context_compressor.py`, line 45
**Better:**
```python
# Include tool name and truncated preview
_PRUNED_TOOL_PLACEHOLDER_TEMPLATE = (
    "[Tool output for {tool_name} cleared. "
    "Preview: {preview}... ({original_chars} chars removed)]"
)
```
---
### 🟢 LOW: Compression Metrics Not Tracked
**Issue:** No tracking of compression ratio, frequency, or information density over time.
**Useful metrics to track:**
- Tokens saved per compression
- Compression ratio (input tokens / output tokens)
- Frequency of compression (compressions per 100 turns)
- Average summary length
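A minimal container for these metrics might look like this (names and shape are the reviewer's suggestion, not existing code):

```python
from dataclasses import dataclass, field

@dataclass
class CompressionMetrics:
    """Sketch of the metrics suggested above: one (before, after,
    summary_chars) tuple per compression event."""
    events: list = field(default_factory=list)

    def record(self, tokens_before: int, tokens_after: int, summary_chars: int):
        self.events.append((tokens_before, tokens_after, summary_chars))

    @property
    def avg_ratio(self) -> float:
        """Mean compression ratio (input tokens / output tokens)."""
        if not self.events:
            return 0.0
        return sum(b / a for b, a, _ in self.events) / len(self.events)
```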
---
## Specific Code Improvements
### 1. Add Fallback Context Length Detection
**File:** `run_agent.py` (~line 1191)
```python
# Before initializing compressor, collect all context lengths
def _get_fallback_context_lengths(self, _agent_cfg: dict) -> list:
    """Get context lengths for all models in fallback chain."""
    lengths = []
    # Primary model
    lengths.append(get_model_context_length(
        self.model, base_url=self.base_url,
        api_key=self.api_key, provider=self.provider
    ))
    # Fallback models from config
    fallback_providers = _agent_cfg.get("fallback_providers", [])
    for fb in fallback_providers:
        if isinstance(fb, dict):
            fb_model = fb.get("model", "")
            fb_base = fb.get("base_url", "")
            fb_provider = fb.get("provider", "")
            fb_key_env = fb.get("api_key_env", "")
            fb_key = os.getenv(fb_key_env, "")
            if fb_model:
                lengths.append(get_model_context_length(
                    fb_model, base_url=fb_base,
                    api_key=fb_key, provider=fb_provider
                ))
    return [l for l in lengths if l and l > 0]

# Use minimum context length for conservative compression
_fallback_contexts = self._get_fallback_context_lengths(_agent_cfg)
_effective_context = min(_fallback_contexts) if _fallback_contexts else None
```
### 2. Add Pre-Compression Checkpoint
**File:** `run_agent.py` `_compress_context()` method
See patch file for implementation.
### 3. Add Summary Validation
**File:** `context_compressor.py`
```python
def _extract_critical_refs(self, turns: List[Dict]) -> Set[str]:
    """Extract critical references that must appear in summary."""
    critical = set()
    for msg in turns:
        content = msg.get("content", "") or ""
        # File paths
        for match in re.finditer(r'[\w\-./]+\.(py|js|ts|json|yaml|md)\b', content):
            critical.add(match.group(0))
        # Error messages
        if "error" in content.lower() or "exception" in content.lower():
            lines = content.split('\n')
            for line in lines:
                if any(k in line.lower() for k in ["error", "exception", "traceback"]):
                    critical.add(line[:100])  # First 100 chars of error line
    return critical

def _validate_summary(self, summary: str, turns: List[Dict]) -> Tuple[bool, List[str]]:
    """Validate that summary captures critical information.

    Returns (is_valid, missing_items).
    """
    if not summary or len(summary) < 100:
        return False, ["summary too short"]
    critical = self._extract_critical_refs(turns)
    missing = [ref for ref in critical if ref not in summary]
    # Allow some loss but not too much
    if len(missing) > len(critical) * 0.5:
        return False, missing[:5]  # Return first 5 missing
    return True, []
```
### 4. Progressive Context Pressure Warnings
**File:** `run_agent.py` context pressure section (~line 7865)
```python
# Replace single warning with progressive system
_CONTEXT_PRESSURE_LEVELS = [
    (0.60, "ℹ️ Context usage at 60% — monitoring"),
    (0.75, "📊 Context usage at 75% — consider wrapping up soon"),
    (0.85, "⚠️ Context usage at 85% — compression imminent"),
    (0.95, "🔴 Context usage at 95% — compression will trigger soon"),
]

# Track which levels have been reported
if not hasattr(self, '_context_pressure_reported'):
    self._context_pressure_reported = set()

for threshold, message in _CONTEXT_PRESSURE_LEVELS:
    if _compaction_progress >= threshold and threshold not in self._context_pressure_reported:
        self._context_pressure_reported.add(threshold)
        if self.status_callback:
            self.status_callback("warning", message)
        if not self.quiet_mode:
            print(f"\n{message}\n")
```
---
## Interaction with Fallback Chain
### Current Behavior
The compressor is initialized once at agent startup with the primary model's context length:
```python
self.context_compressor = ContextCompressor(
    model=self.model,  # Primary model only
    threshold_percent=compression_threshold,  # Default 50%
    # ...
)
```
### Problems
1. **No dynamic adjustment:** If fallback occurs to a smaller model, compression threshold is wrong
2. **No re-initialization on model switch:** `/model` command doesn't update compressor
3. **Context probe affects wrong model:** If primary probe fails, fallback models may have already been used
### Recommended Architecture
```python
class AIAgent:
    def _update_compressor_for_model(self, model: str, base_url: str, provider: str):
        """Reconfigure compressor when model changes (fallback or /model command)."""
        new_context = get_model_context_length(model, base_url=base_url, provider=provider)
        if new_context != self.context_compressor.context_length:
            self.context_compressor.context_length = new_context
            self.context_compressor.threshold_tokens = int(
                new_context * self.context_compressor.threshold_percent
            )
            logger.info(f"Compressor adjusted for {model}: {new_context:,} tokens")

    def _handle_fallback(self, fallback_model: str, ...):
        """Update compressor when falling back to different model."""
        self._update_compressor_for_model(fallback_model, ...)
```
---
## Testing Gaps
1. **No fallback chain test:** Tests don't verify behavior when context limits differ
2. **No checkpoint integration test:** Pre-compression checkpoint not tested
3. **No summary validation test:** No test for detecting poor-quality summaries
4. **No progressive warning test:** Only tests the 85% threshold
5. **No tool result deduplication test:** Tests verify pairs are preserved but not deduplicated
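The first gap could be closed with a test along these lines; it only exercises the min-of-chain selection logic (the helper name and chain values are hypothetical):

```python
def test_compressor_uses_min_fallback_context():
    """Sketch of the missing fallback-chain test: the effective context
    should be the smallest valid window across primary + fallbacks."""
    lengths = [1_000_000, None, 128_000, 0]  # primary + fallbacks; None/0 = probe failed
    effective = min(l for l in lengths if l and l > 0)
    assert effective == 128_000
```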
---
## Recommendations Priority
| Priority | Item | Effort | Impact |
|----------|------|--------|--------|
| P0 | Pre-compression checkpoint | Medium | Critical |
| P0 | Fallback context awareness | Medium | High |
| P1 | Progressive warnings | Low | Medium |
| P1 | Summary validation | Medium | High |
| P2 | Semantic deduplication | High | Medium |
| P2 | Better pruning placeholders | Low | Low |
| P3 | Compression metrics | Low | Low |
---
## Conclusion
The context compressor is a **solid, near-production-ready implementation** with sophisticated handling of the core compression problem. The structured summary format and iterative update mechanism are particularly well-designed.
The main gaps are in **edge-case hardening**:
1. Fallback chain awareness needs to be addressed for multi-model reliability
2. Pre-compression checkpoint is essential for information recovery
3. Summary validation would prevent silent information loss
These are incremental improvements to an already strong foundation.
---
*Review conducted by Timmy Agent*
*For Gitea issue timmy-home #92*