timmy-home/uniwizard/context_compression_review.md


Context Compression Review

Gitea Issue: timmy-home #92

Date: 2026-03-30
Reviewer: Timmy (Agent)
Scope: ~/.hermes/hermes-agent/agent/context_compressor.py


Executive Summary

The Hermes context compressor is a mature, well-architected implementation with sophisticated handling of tool call pairs, iterative summary updates, and token-aware tail protection. However, there are several high-impact gaps related to fallback chain awareness, early warning systems, and checkpoint integration that should be addressed for production reliability.

Overall Grade: B+ (Solid foundation, needs edge-case hardening)


What the Current Implementation Does Well

1. Structured Summary Template (Lines 276-303)

The compressor uses a Pi-mono/OpenCode-inspired structured format:

  • Goal: What the user is trying to accomplish
  • Constraints & Preferences: User preferences, coding style
  • Progress: Done / In Progress / Blocked sections
  • Key Decisions: Important technical decisions with rationale
  • Relevant Files: Files read/modified/created with notes
  • Next Steps: What needs to happen next
  • Critical Context: Values, error messages, config details that would be lost

This structured template is best-in-class among context compression implementations, most of which emit an unstructured free-form summary.

2. Iterative Summary Updates (Lines 264-304)

The _previous_summary mechanism preserves information across multiple compactions:

  • On first compaction: Summarizes from scratch
  • On subsequent compactions: Updates previous summary with new progress
  • Moves items from "In Progress" to "Done" when completed
  • Accumulates constraints and file references across compactions
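The branching described above can be sketched as follows. This is a hypothetical simplification; the real prompt text and attribute names in `context_compressor.py` will differ.

```python
def build_summary_prompt(previous_summary, new_turns_text):
    """Return the summarizer prompt for first vs. follow-up compactions."""
    if previous_summary is None:
        # First compaction: summarize the conversation from scratch.
        return (
            "Summarize the conversation below using the structured template "
            "(Goal / Constraints / Progress / Key Decisions / ...):\n\n"
            + new_turns_text
        )
    # Subsequent compactions: fold new progress into the prior summary,
    # promoting finished items from "In Progress" to "Done".
    return (
        "Update the previous summary with the new turns. Move completed "
        "items from 'In Progress' to 'Done' and keep all accumulated "
        "constraints and file references.\n\n"
        "PREVIOUS SUMMARY:\n" + previous_summary
        + "\n\nNEW TURNS:\n" + new_turns_text
    )
```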

3. Token-Budget Tail Protection (Lines 490-539)

Instead of fixed message counts, protects the most recent N tokens:

tail_token_budget = threshold_tokens * summary_target_ratio
# Default: 50% of 128K context = 64K threshold → ~13K token tail

This scales automatically with model context window.
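The arithmetic is simple enough to show directly. A minimal sketch, with `summary_target_ratio` assumed to be 0.20 to reproduce the ~13K figure quoted above:

```python
def tail_token_budget(context_length, threshold_percent=0.50,
                      summary_target_ratio=0.20):
    """Compute the protected-tail budget from the model context window.

    The two ratio defaults are assumptions for illustration, not values
    read from the source.
    """
    threshold_tokens = int(context_length * threshold_percent)
    return int(threshold_tokens * summary_target_ratio)
```

A 128K-context model yields a 64K threshold and a 12,800-token protected tail; a 1M-context model scales to a 100K tail with no config change.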

4. Tool Call/Result Pair Integrity (Lines 392-450)

Sophisticated handling of orphaned tool pairs:

  • _sanitize_tool_pairs(): Removes orphaned results, adds stubs for missing results
  • _align_boundary_forward/backward(): Prevents splitting tool groups
  • Protects the integrity of the message sequence for API compliance
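The pairing logic can be illustrated with a simplified sketch. This is a hypothetical reduction of `_sanitize_tool_pairs()`, not the actual implementation: it drops results whose call was compressed away and stubs results for calls whose output was dropped, keeping the call/result adjacency the API requires.

```python
def sanitize_tool_pairs(messages):
    """Repair orphaned tool call/result pairs (hypothetical sketch)."""
    result_ids = {m.get("tool_call_id") for m in messages
                  if m.get("role") == "tool"}
    call_ids = set()
    out = []
    for m in messages:
        if m.get("role") == "assistant" and m.get("tool_calls"):
            call_ids.update(tc["id"] for tc in m["tool_calls"])
            out.append(m)
            # Stub a result for any call whose output was dropped.
            for tc in m["tool_calls"]:
                if tc["id"] not in result_ids:
                    out.append({"role": "tool", "tool_call_id": tc["id"],
                                "content": "[result removed during compression]"})
        elif m.get("role") == "tool":
            # Keep a result only if its originating call survived.
            if m.get("tool_call_id") in call_ids:
                out.append(m)
        else:
            out.append(m)
    return out
```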

5. Tool Output Pruning Pre-Pass (Lines 152-182)

Cheap first pass that replaces old tool results with placeholders:

_PRUNED_TOOL_PLACEHOLDER = "[Old tool output cleared to save context space]"

Only prunes content >200 chars, preserving smaller results.

6. Rich Serialization for Summary Input (Lines 199-248)

Includes tool call arguments and truncates intelligently:

  • Tool results: Up to 3000 chars (with smart truncation keeping head/tail)
  • Tool calls: Function name AND arguments (truncated to 400 chars if needed)
  • All roles: 3000 char limit with ellipses
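Head/tail truncation keeps both the beginning of a result (the command, the file header) and its end (the exit status, the final error). A sketch of the idea, with the head/tail split fraction assumed rather than taken from the source:

```python
def truncate_keep_head_tail(text, limit=3000, tail_frac=0.3):
    """Truncate to `limit` chars, preserving head and tail around a marker."""
    if len(text) <= limit:
        return text
    marker = "\n... [truncated] ...\n"
    budget = limit - len(marker)
    tail = int(budget * tail_frac)
    head = budget - tail
    return text[:head] + marker + text[-tail:]
```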

7. Proper Integration with Agent Loop

  • Initialized in AIAgent.__init__() (lines 1191-1203)
  • Triggered in _compress_context() (line 5259)
  • Resets state in reset_session_state() (lines 1263-1271)
  • Updates token counts via update_from_response() (lines 122-126)

What's Missing or Broken

🔴 CRITICAL: No Fallback Chain Context Window Awareness

Issue: When the agent falls back to a model with a smaller context window (e.g., primary Claude 1M tokens → fallback GPT-4 128K tokens), the compressor's threshold is based on the original model, not the fallback model.

Location: run_agent.py compression initialization (lines 1191-1203)

Impact:

  • Fallback model may hit context limits before compression triggers
  • Or compression may trigger too aggressively for smaller models

Evidence:

# In AIAgent.__init__():
self.context_compressor = ContextCompressor(
    model=self.model,  # Original model only
    # ... no fallback context lengths passed
)

Fix Needed: Pass fallback chain context lengths and use minimum:

# Suggested approach:
context_lengths = [get_model_context_length(m) for m in [primary] + fallbacks]
effective_context = min(context_lengths)  # Conservative

🔴 HIGH: No Pre-Compression Checkpoint

Issue: When compression occurs, the pre-compression state is lost. Users cannot "rewind" to before compression if the summary loses critical information.

Location: run_agent.py _compress_context() (line 5259)

Impact:

  • Information loss is irreversible
  • If summary misses critical context, conversation is corrupted
  • No audit trail of what was removed

Fix Needed: Create checkpoint before compression:

def _compress_context(self, messages, system_message, ...):
    # Create checkpoint BEFORE compression
    if self._checkpoint_mgr:
        self._checkpoint_mgr.create_checkpoint(
            name=f"pre-compression-{self.context_compressor.compression_count}",
            messages=messages,  # Full pre-compression state
        )
    compressed = self.context_compressor.compress(messages, ...)

🟡 MEDIUM: No Progressive Context Pressure Warnings

Issue: Only one warning fires, at 85% of the compaction threshold (line 7871); compression then triggers abruptly once the threshold is reached. There is no graduated alert system.

Location: run_agent.py context pressure check (lines 7865-7872)

Current:

if _compaction_progress >= 0.85 and not self._context_pressure_warned:
    self._context_pressure_warned = True

Better:

# Progressive warnings at 60%, 75%, 85%, 95%
warning_levels = [(0.60, "info"), (0.75, "notice"), 
                  (0.85, "warning"), (0.95, "critical")]

🟡 MEDIUM: Summary Validation Missing

Issue: No verification that the generated summary actually contains the critical information from the compressed turns.

Location: context_compressor.py _generate_summary() (lines 250-369)

Risk: If the summarization model fails or produces low-quality output, critical information is silently lost.

Fix Needed: Add summary quality checks:

def _validate_summary(self, summary: str, turns: list) -> bool:
    """Verify summary captures critical information."""
    # Check for key file paths mentioned in turns
    # Check for error messages that were present
    # Check for specific values/IDs
    # Return False if validation fails, trigger fallback

🟡 MEDIUM: No Semantic Deduplication

Issue: Same information may be repeated across the original turns and the previous summary, leading to bloated input to the summarizer.

Location: _generate_summary() iterative update path (lines 264-304)

Example: If the previous summary already mentions "file X was modified", and new turns also mention it, the information appears twice in the summarizer input.
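True semantic deduplication would need embeddings, but even a cheap lexical pass would trim the bloat described above. A hypothetical sketch that drops serialized-turn lines already covered verbatim by the previous summary:

```python
def dedupe_against_summary(previous_summary, turns_text):
    """Drop turn lines that repeat the previous summary word-for-word.

    A lexical stand-in for semantic deduplication: exact-match only,
    case-insensitive, so paraphrased repeats still get through.
    """
    seen = {line.strip().lower() for line in previous_summary.splitlines()
            if line.strip()}
    kept = [line for line in turns_text.splitlines()
            if line.strip().lower() not in seen]
    return "\n".join(kept)
```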


🟢 LOW: Tool Result Placeholder Not Actionable

Issue: The placeholder [Old tool output cleared to save context space] tells the user nothing about what was lost.

Location: Line 45

Better:

# Include tool name and truncated preview
_PRUNED_TOOL_PLACEHOLDER_TEMPLATE = (
    "[Tool output for {tool_name} cleared. "
    "Preview: {preview}... ({original_chars} chars removed)]"
)

🟢 LOW: Compression Metrics Not Tracked

Issue: No tracking of compression ratio, frequency, or information density over time.

Useful metrics to track:

  • Tokens saved per compression
  • Compression ratio (input tokens / output tokens)
  • Frequency of compression (compressions per 100 turns)
  • Average summary length
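A small accumulator would cover all four metrics. A hypothetical sketch (the class and method names are not from the source):

```python
from dataclasses import dataclass, field

@dataclass
class CompressionMetrics:
    """Running counters for compression-quality metrics."""
    compressions: int = 0
    tokens_in: int = 0
    tokens_out: int = 0
    summary_lengths: list = field(default_factory=list)

    def record(self, input_tokens, output_tokens, summary_len):
        """Call once per compression with its before/after token counts."""
        self.compressions += 1
        self.tokens_in += input_tokens
        self.tokens_out += output_tokens
        self.summary_lengths.append(summary_len)

    @property
    def ratio(self):
        """Compression ratio: input tokens / output tokens."""
        return self.tokens_in / self.tokens_out if self.tokens_out else 0.0

    @property
    def tokens_saved(self):
        return self.tokens_in - self.tokens_out
```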

Specific Code Improvements

1. Add Fallback Context Length Detection

File: run_agent.py (~line 1191)

# Before initializing compressor, collect all context lengths
def _get_fallback_context_lengths(self, _agent_cfg: dict) -> list:
    """Get context lengths for all models in fallback chain."""
    lengths = []
    
    # Primary model
    lengths.append(get_model_context_length(
        self.model, base_url=self.base_url, 
        api_key=self.api_key, provider=self.provider
    ))
    
    # Fallback models from config
    fallback_providers = _agent_cfg.get("fallback_providers", [])
    for fb in fallback_providers:
        if isinstance(fb, dict):
            fb_model = fb.get("model", "")
            fb_base = fb.get("base_url", "")
            fb_provider = fb.get("provider", "")
            fb_key_env = fb.get("api_key_env", "")
            fb_key = os.getenv(fb_key_env, "")
            if fb_model:
                lengths.append(get_model_context_length(
                    fb_model, base_url=fb_base,
                    api_key=fb_key, provider=fb_provider
                ))
    
    return [l for l in lengths if l and l > 0]

# Use minimum context length for conservative compression
_fallback_contexts = self._get_fallback_context_lengths(_agent_cfg)
_effective_context = min(_fallback_contexts) if _fallback_contexts else None

2. Add Pre-Compression Checkpoint

File: run_agent.py _compress_context() method

See patch file for implementation.

3. Add Summary Validation

File: context_compressor.py

import re
from typing import Dict, List, Set, Tuple

def _extract_critical_refs(self, turns: List[Dict]) -> Set[str]:
    """Extract critical references that must appear in summary."""
    critical = set()
    for msg in turns:
        content = msg.get("content", "") or ""
        # File paths
        for match in re.finditer(r'[\w\-./]+\.(py|js|ts|json|yaml|md)\b', content):
            critical.add(match.group(0))
        # Error messages
        if "error" in content.lower() or "exception" in content.lower():
            lines = content.split('\n')
            for line in lines:
                if any(k in line.lower() for k in ["error", "exception", "traceback"]):
                    critical.add(line[:100])  # First 100 chars of error line
    return critical

def _validate_summary(self, summary: str, turns: List[Dict]) -> Tuple[bool, List[str]]:
    """Validate that summary captures critical information.
    
    Returns (is_valid, missing_items).
    """
    if not summary or len(summary) < 100:
        return False, ["summary too short"]
    
    critical = self._extract_critical_refs(turns)
    missing = [ref for ref in critical if ref not in summary]
    
    # Allow some loss but not too much
    if len(missing) > len(critical) * 0.5:
        return False, missing[:5]  # Return first 5 missing
    
    return True, []

4. Progressive Context Pressure Warnings

File: run_agent.py context pressure section (~line 7865)

# Replace single warning with progressive system
_CONTEXT_PRESSURE_LEVELS = [
    (0.60, "ℹ️ Context usage at 60% — monitoring"),
    (0.75, "📊 Context usage at 75% — consider wrapping up soon"),
    (0.85, "⚠️ Context usage at 85% — compression imminent"),
    (0.95, "🔴 Context usage at 95% — compression will trigger soon"),
]

# Track which levels have been reported
if not hasattr(self, '_context_pressure_reported'):
    self._context_pressure_reported = set()

for threshold, message in _CONTEXT_PRESSURE_LEVELS:
    if _compaction_progress >= threshold and threshold not in self._context_pressure_reported:
        self._context_pressure_reported.add(threshold)
        if self.status_callback:
            self.status_callback("warning", message)
        if not self.quiet_mode:
            print(f"\n{message}\n")

Interaction with Fallback Chain

Current Behavior

The compressor is initialized once at agent startup with the primary model's context length:

self.context_compressor = ContextCompressor(
    model=self.model,  # Primary model only
    threshold_percent=compression_threshold,  # Default 50%
    # ...
)

Problems

  1. No dynamic adjustment: If fallback occurs to a smaller model, compression threshold is wrong
  2. No re-initialization on model switch: /model command doesn't update compressor
  3. Context probe affects wrong model: If primary probe fails, fallback models may have already been used

Fix Needed: Reconfigure the compressor whenever the active model changes:

class AIAgent:
    def _update_compressor_for_model(self, model: str, base_url: str, provider: str):
        """Reconfigure compressor when model changes (fallback or /model command)."""
        new_context = get_model_context_length(model, base_url=base_url, provider=provider)
        if new_context != self.context_compressor.context_length:
            self.context_compressor.context_length = new_context
            self.context_compressor.threshold_tokens = int(
                new_context * self.context_compressor.threshold_percent
            )
            logger.info(f"Compressor adjusted for {model}: {new_context:,} tokens")
    
    def _handle_fallback(self, fallback_model: str, ...):
        """Update compressor when falling back to different model."""
        self._update_compressor_for_model(fallback_model, ...)

Testing Gaps

  1. No fallback chain test: Tests don't verify behavior when context limits differ
  2. No checkpoint integration test: Pre-compression checkpoint not tested
  3. No summary validation test: No test for detecting poor-quality summaries
  4. No progressive warning test: Only tests the 85% threshold
  5. No tool result deduplication test: Tests verify pairs are preserved but not deduplicated

Recommendations Priority

| Priority | Item                        | Effort | Impact   |
|----------|-----------------------------|--------|----------|
| P0       | Pre-compression checkpoint  | Medium | Critical |
| P0       | Fallback context awareness  | Medium | High     |
| P1       | Progressive warnings        | Low    | Medium   |
| P1       | Summary validation          | Medium | High     |
| P2       | Semantic deduplication      | High   | Medium   |
| P2       | Better pruning placeholders | Low    | Low      |
| P3       | Compression metrics         | Low    | Low      |

Conclusion

The context compressor is a solid implementation with sophisticated handling of the core compression problem. The structured summary format and iterative update mechanism are particularly well-designed.

The main gaps are in edge-case hardening:

  1. Fallback chain awareness needs to be addressed for multi-model reliability
  2. Pre-compression checkpoint is essential for information recovery
  3. Summary validation would prevent silent information loss

These are incremental improvements to an already strong foundation.


Review conducted by Timmy Agent
For Gitea issue timmy-home #92