# Context Compression Review

## Gitea Issue: timmy-home #92

**Date:** 2026-03-30
**Reviewer:** Timmy (Agent)
**Scope:** `~/.hermes/hermes-agent/agent/context_compressor.py`

---

## Executive Summary

The Hermes context compressor is a **mature, well-architected implementation** with sophisticated handling of tool call pairs, iterative summary updates, and token-aware tail protection. However, there are several **high-impact gaps** related to fallback chain awareness, early warning systems, and checkpoint integration that should be addressed for production reliability.

**Overall Grade:** B+ (Solid foundation, needs edge-case hardening)

---

## What the Current Implementation Does Well
### 1. Structured Summary Template (Lines 276-303)

The compressor uses a Pi-mono/OpenCode-inspired structured format:

- **Goal**: What the user is trying to accomplish
- **Constraints & Preferences**: User preferences, coding style
- **Progress**: Done / In Progress / Blocked sections
- **Key Decisions**: Important technical decisions with rationale
- **Relevant Files**: Files read/modified/created with notes
- **Next Steps**: What needs to happen next
- **Critical Context**: Values, error messages, config details that would be lost

This is **best-in-class** compared to most context compression implementations.
### 2. Iterative Summary Updates (Lines 264-304)

The `_previous_summary` mechanism preserves information across multiple compactions:

- On first compaction: Summarizes from scratch
- On subsequent compactions: Updates previous summary with new progress
- Moves items from "In Progress" to "Done" when completed
- Accumulates constraints and file references across compactions
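The two paths above amount to a prompt-selection step. A minimal sketch (the function name and prompt wording are illustrative, not the actual Hermes strings):

```python
from typing import Optional

def build_summary_prompt(serialized_turns: str, previous_summary: Optional[str]) -> str:
    """First compaction summarizes from scratch; later ones update the prior summary."""
    if previous_summary is None:
        return (
            "Summarize the conversation below using the structured template "
            "(Goal, Constraints, Progress, ...).\n\n" + serialized_turns
        )
    # Iterative path: carry the old summary forward and fold in new progress,
    # promoting finished items from "In Progress" to "Done".
    return (
        "Update this summary with the new turns, moving completed work to Done.\n\n"
        f"Previous summary:\n{previous_summary}\n\nNew turns:\n{serialized_turns}"
    )
```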
### 3. Token-Budget Tail Protection (Lines 490-539)

Instead of fixed message counts, protects the most recent N tokens:

```python
tail_token_budget = threshold_tokens * summary_target_ratio
# Default: 50% of 128K context = 64K threshold → ~13K token tail
```

This scales automatically with model context window.
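Worked through with the default numbers (the parameter names follow the snippet above; the 0.2 ratio is inferred from the "~13K" figure, not read from the source):

```python
def tail_token_budget(context_length: int, threshold_percent: float = 0.5,
                      summary_target_ratio: float = 0.2) -> int:
    """Protected-tail size derived from the compression threshold, not message counts."""
    threshold_tokens = int(context_length * threshold_percent)
    return int(threshold_tokens * summary_target_ratio)
```

For a 128K model this yields a 64,000-token threshold and a 12,800-token protected tail; a 1M-context model gets a proportionally larger tail with no configuration change.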
### 4. Tool Call/Result Pair Integrity (Lines 392-450)

Sophisticated handling of orphaned tool pairs:

- `_sanitize_tool_pairs()`: Removes orphaned results, adds stubs for missing results
- `_align_boundary_forward/backward()`: Prevents splitting tool groups
- Protects the integrity of the message sequence for API compliance
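The invariant `_sanitize_tool_pairs()` maintains — every tool result has a matching call, and every call gets a result — can be sketched like this (a simplified stand-in, not the Hermes implementation):

```python
def sanitize_tool_pairs(messages: list) -> list:
    """Drop tool results whose call was compressed away; stub results for unanswered calls."""
    call_ids = {tc["id"] for m in messages for tc in m.get("tool_calls") or []}
    result_ids = {m["tool_call_id"] for m in messages if m.get("role") == "tool"}
    out = []
    for m in messages:
        if m.get("role") == "tool" and m["tool_call_id"] not in call_ids:
            continue  # orphaned result: no matching call survives
        out.append(m)
        for tc in m.get("tool_calls") or []:
            if tc["id"] not in result_ids:
                # Chat APIs require a result for every call, so insert a stub
                out.append({"role": "tool", "tool_call_id": tc["id"],
                            "content": "[result lost during compression]"})
    return out
```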
### 5. Tool Output Pruning Pre-Pass (Lines 152-182)

Cheap first pass that replaces old tool results with placeholders:

```python
_PRUNED_TOOL_PLACEHOLDER = "[Old tool output cleared to save context space]"
```

Only prunes content >200 chars, preserving smaller results.
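The shape of this pre-pass can be sketched as follows. Note that `keep_last` here is a message-count stand-in for the real token-budget tail, and the helper name is illustrative:

```python
_PRUNED_TOOL_PLACEHOLDER = "[Old tool output cleared to save context space]"

def prune_old_tool_outputs(messages: list, keep_last: int = 4,
                           min_chars: int = 200) -> list:
    """Replace large tool results outside the protected tail with a placeholder."""
    cutoff = len(messages) - keep_last
    out = []
    for i, m in enumerate(messages):
        if (m.get("role") == "tool" and i < cutoff
                and len(m.get("content") or "") > min_chars):
            m = {**m, "content": _PRUNED_TOOL_PLACEHOLDER}  # copy, don't mutate input
        out.append(m)
    return out
```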
### 6. Rich Serialization for Summary Input (Lines 199-248)

Includes tool call arguments and truncates intelligently:

- Tool results: Up to 3000 chars (with smart truncation keeping head/tail)
- Tool calls: Function name AND arguments (truncated to 400 chars if needed)
- All roles: 3000 char limit with ellipses
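Head/tail truncation of the kind described can be sketched as (the split proportions are illustrative):

```python
def truncate_keep_ends(text: str, limit: int = 3000,
                       marker: str = "\n...[truncated]...\n") -> str:
    """Keep the head and tail of an overlong string; the middle is usually boilerplate."""
    if len(text) <= limit:
        return text
    keep = limit - len(marker)
    head = keep * 2 // 3  # bias toward the head, where commands/paths tend to appear
    tail = keep - head
    return text[:head] + marker + text[-tail:]
```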
### 7. Proper Integration with Agent Loop

- Initialized in `AIAgent.__init__()` (lines 1191-1203)
- Triggered in `_compress_context()` (line 5259)
- Resets state in `reset_session_state()` (lines 1263-1271)
- Updates token counts via `update_from_response()` (lines 122-126)

---

## What's Missing or Broken
### 🔴 CRITICAL: No Fallback Chain Context Window Awareness

**Issue:** When the agent falls back to a model with a smaller context window (e.g., primary Claude 1M tokens → fallback GPT-4 128K tokens), the compressor's threshold is based on the **original model**, not the fallback model.

**Location:** `run_agent.py` compression initialization (lines 1191-1203)

**Impact:**
- Fallback model may hit context limits before compression triggers
- Or compression may trigger too aggressively for smaller models

**Evidence:**
```python
# In AIAgent.__init__():
self.context_compressor = ContextCompressor(
    model=self.model,  # Original model only
    # ... no fallback context lengths passed
)
```

**Fix Needed:** Pass fallback chain context lengths and use the minimum:
```python
# Suggested approach:
context_lengths = [get_model_context_length(m) for m in [primary] + fallbacks]
effective_context = min(context_lengths)  # Conservative
```

---
### 🔴 HIGH: No Pre-Compression Checkpoint

**Issue:** When compression occurs, the pre-compression state is lost. Users cannot "rewind" to before compression if the summary loses critical information.

**Location:** `run_agent.py` `_compress_context()` (line 5259)

**Impact:**
- Information loss is irreversible
- If the summary misses critical context, the conversation is corrupted
- No audit trail of what was removed

**Fix Needed:** Create a checkpoint before compression:
```python
def _compress_context(self, messages, system_message, ...):
    # Create checkpoint BEFORE compression
    if self._checkpoint_mgr:
        self._checkpoint_mgr.create_checkpoint(
            name=f"pre-compression-{self.context_compressor.compression_count}",
            messages=messages,  # Full pre-compression state
        )
    compressed = self.context_compressor.compress(messages, ...)
```

---
### 🟡 MEDIUM: No Progressive Context Pressure Warnings

**Issue:** Only one warning at 85% (line 7871), then sudden compression at the 50-100% threshold. No graduated alert system.

**Location:** `run_agent.py` context pressure check (lines 7865-7872)

**Current:**
```python
if _compaction_progress >= 0.85 and not self._context_pressure_warned:
    self._context_pressure_warned = True
```

**Better:**
```python
# Progressive warnings at 60%, 75%, 85%, 95%
warning_levels = [(0.60, "info"), (0.75, "notice"),
                  (0.85, "warning"), (0.95, "critical")]
```

---
### 🟡 MEDIUM: Summary Validation Missing

**Issue:** No verification that the generated summary actually contains the critical information from the compressed turns.

**Location:** `context_compressor.py` `_generate_summary()` (lines 250-369)

**Risk:** If the summarization model fails or produces low-quality output, critical information is silently lost.

**Fix Needed:** Add summary quality checks:
```python
def _validate_summary(self, summary: str, turns: list) -> bool:
    """Verify the summary captures critical information.

    - Check for key file paths mentioned in turns
    - Check for error messages that were present
    - Check for specific values/IDs
    - Return False if validation fails, triggering a fallback
    """
    ...
```

---
### 🟡 MEDIUM: No Semantic Deduplication

**Issue:** Same information may be repeated across the original turns and the previous summary, leading to bloated input to the summarizer.

**Location:** `_generate_summary()` iterative update path (lines 264-304)

**Example:** If the previous summary already mentions "file X was modified", and new turns also mention it, the information appears twice in the summarizer input.
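True semantic deduplication (embeddings, paraphrase detection) is the hard part; a cheap first step is exact-line dedup of the serialized turns against the previous summary. A minimal sketch, with an illustrative helper name:

```python
def dedup_against_summary(turn_lines: list, previous_summary: str) -> list:
    """Drop serialized turn lines already present verbatim in the previous summary."""
    seen = {line.strip() for line in previous_summary.splitlines() if line.strip()}
    return [t for t in turn_lines if t.strip() not in seen]
```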
---
### 🟢 LOW: Tool Result Placeholder Not Actionable

**Issue:** The placeholder `[Old tool output cleared to save context space]` tells the user nothing about what was lost.

**Location:** Line 45

**Better:**
```python
# Include tool name and truncated preview
_PRUNED_TOOL_PLACEHOLDER_TEMPLATE = (
    "[Tool output for {tool_name} cleared. "
    "Preview: {preview}... ({original_chars} chars removed)]"
)
```

---
### 🟢 LOW: Compression Metrics Not Tracked

**Issue:** No tracking of compression ratio, frequency, or information density over time.

**Useful metrics to track:**
- Tokens saved per compression
- Compression ratio (input tokens / output tokens)
- Frequency of compression (compressions per 100 turns)
- Average summary length
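A minimal sketch of what such tracking could look like (the class and its fields are proposed, not existing Hermes code):

```python
from dataclasses import dataclass, field

@dataclass
class CompressionMetrics:
    """Running record of compression events for logging/telemetry."""
    events: list = field(default_factory=list)  # (tokens_before, tokens_after) pairs

    def record(self, tokens_before: int, tokens_after: int) -> None:
        self.events.append((tokens_before, tokens_after))

    @property
    def total_saved(self) -> int:
        return sum(b - a for b, a in self.events)

    @property
    def avg_ratio(self) -> float:
        ratios = [b / a for b, a in self.events if a > 0]
        return sum(ratios) / len(ratios) if ratios else 1.0
```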
---
## Specific Code Improvements

### 1. Add Fallback Context Length Detection

**File:** `run_agent.py` (~line 1191)

```python
# Before initializing the compressor, collect all context lengths
def _get_fallback_context_lengths(self, _agent_cfg: dict) -> list:
    """Get context lengths for all models in the fallback chain."""
    lengths = []

    # Primary model
    lengths.append(get_model_context_length(
        self.model, base_url=self.base_url,
        api_key=self.api_key, provider=self.provider
    ))

    # Fallback models from config
    fallback_providers = _agent_cfg.get("fallback_providers", [])
    for fb in fallback_providers:
        if isinstance(fb, dict):
            fb_model = fb.get("model", "")
            fb_base = fb.get("base_url", "")
            fb_provider = fb.get("provider", "")
            fb_key_env = fb.get("api_key_env", "")
            fb_key = os.getenv(fb_key_env, "")
            if fb_model:
                lengths.append(get_model_context_length(
                    fb_model, base_url=fb_base,
                    api_key=fb_key, provider=fb_provider
                ))

    return [l for l in lengths if l and l > 0]

# Use the minimum context length for conservative compression
_fallback_contexts = self._get_fallback_context_lengths(_agent_cfg)
_effective_context = min(_fallback_contexts) if _fallback_contexts else None
```
### 2. Add Pre-Compression Checkpoint

**File:** `run_agent.py` `_compress_context()` method

See patch file for implementation.

### 3. Add Summary Validation

**File:** `context_compressor.py`
```python
import re
from typing import Dict, List, Set, Tuple

def _extract_critical_refs(self, turns: List[Dict]) -> Set[str]:
    """Extract critical references that must appear in the summary."""
    critical = set()
    for msg in turns:
        content = msg.get("content", "") or ""
        # File paths
        for match in re.finditer(r'[\w\-./]+\.(py|js|ts|json|yaml|md)\b', content):
            critical.add(match.group(0))
        # Error messages
        if "error" in content.lower() or "exception" in content.lower():
            for line in content.split('\n'):
                if any(k in line.lower() for k in ["error", "exception", "traceback"]):
                    critical.add(line[:100])  # First 100 chars of error line
    return critical

def _validate_summary(self, summary: str, turns: List[Dict]) -> Tuple[bool, List[str]]:
    """Validate that the summary captures critical information.

    Returns (is_valid, missing_items).
    """
    if not summary or len(summary) < 100:
        return False, ["summary too short"]

    critical = self._extract_critical_refs(turns)
    missing = [ref for ref in critical if ref not in summary]

    # Allow some loss but not too much
    if len(missing) > len(critical) * 0.5:
        return False, missing[:5]  # Return first 5 missing

    return True, []
```
### 4. Progressive Context Pressure Warnings

**File:** `run_agent.py` context pressure section (~line 7865)

```python
# Replace single warning with a progressive system
_CONTEXT_PRESSURE_LEVELS = [
    (0.60, "ℹ️ Context usage at 60% — monitoring"),
    (0.75, "📊 Context usage at 75% — consider wrapping up soon"),
    (0.85, "⚠️ Context usage at 85% — compression imminent"),
    (0.95, "🔴 Context usage at 95% — compression will trigger soon"),
]

# Track which levels have been reported
if not hasattr(self, '_context_pressure_reported'):
    self._context_pressure_reported = set()

for threshold, message in _CONTEXT_PRESSURE_LEVELS:
    if _compaction_progress >= threshold and threshold not in self._context_pressure_reported:
        self._context_pressure_reported.add(threshold)
        if self.status_callback:
            self.status_callback("warning", message)
        if not self.quiet_mode:
            print(f"\n{message}\n")
```

---
## Interaction with Fallback Chain

### Current Behavior

The compressor is initialized once at agent startup with the primary model's context length:

```python
self.context_compressor = ContextCompressor(
    model=self.model,  # Primary model only
    threshold_percent=compression_threshold,  # Default 50%
    # ...
)
```

### Problems

1. **No dynamic adjustment:** If fallback occurs to a smaller model, the compression threshold is wrong
2. **No re-initialization on model switch:** The `/model` command doesn't update the compressor
3. **Context probe affects wrong model:** If the primary probe fails, fallback models may have already been used

### Recommended Architecture
```python
class AIAgent:
    def _update_compressor_for_model(self, model: str, base_url: str, provider: str):
        """Reconfigure the compressor when the model changes (fallback or /model command)."""
        new_context = get_model_context_length(model, base_url=base_url, provider=provider)
        if new_context != self.context_compressor.context_length:
            self.context_compressor.context_length = new_context
            self.context_compressor.threshold_tokens = int(
                new_context * self.context_compressor.threshold_percent
            )
            logger.info(f"Compressor adjusted for {model}: {new_context:,} tokens")

    def _handle_fallback(self, fallback_model: str, ...):
        """Update the compressor when falling back to a different model."""
        self._update_compressor_for_model(fallback_model, ...)
```

---
## Testing Gaps

1. **No fallback chain test:** Tests don't verify behavior when context limits differ
2. **No checkpoint integration test:** Pre-compression checkpoint not tested
3. **No summary validation test:** No test for detecting poor-quality summaries
4. **No progressive warning test:** Only the 85% threshold is tested
5. **No tool result deduplication test:** Tests verify pairs are preserved but not deduplicated
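For gap 4, the graduated-warning logic is easy to isolate and test once extracted from the agent loop. A sketch (the helper is a proposed extraction, not existing code):

```python
def due_warnings(progress: float, already_reported: set,
                 levels=(0.60, 0.75, 0.85, 0.95)) -> list:
    """Return thresholds that should fire now; each level fires at most once."""
    due = [t for t in levels if progress >= t and t not in already_reported]
    already_reported.update(due)
    return due

def test_progressive_warning_levels():
    reported = set()
    assert due_warnings(0.50, reported) == []
    assert due_warnings(0.80, reported) == [0.60, 0.75]
    assert due_warnings(0.80, reported) == []          # no repeats
    assert due_warnings(0.96, reported) == [0.85, 0.95]
```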
---
## Recommendations Priority

| Priority | Item | Effort | Impact |
|----------|------|--------|--------|
| P0 | Pre-compression checkpoint | Medium | Critical |
| P0 | Fallback context awareness | Medium | High |
| P1 | Progressive warnings | Low | Medium |
| P1 | Summary validation | Medium | High |
| P2 | Semantic deduplication | High | Medium |
| P2 | Better pruning placeholders | Low | Low |
| P3 | Compression metrics | Low | Low |

---
## Conclusion

The context compressor is a **solid, production-ready implementation** with sophisticated handling of the core compression problem. The structured summary format and iterative update mechanism are particularly well-designed.

The main gaps are in **edge-case hardening**:

1. Fallback chain awareness needs to be addressed for multi-model reliability
2. A pre-compression checkpoint is essential for information recovery
3. Summary validation would prevent silent information loss

These are incremental improvements to an already strong foundation.

---

*Review conducted by Timmy Agent*
*For Gitea issue timmy-home #92*