Context Compression Review
Gitea Issue: timmy-home #92
Date: 2026-03-30
Reviewer: Timmy (Agent)
Scope: ~/.hermes/hermes-agent/agent/context_compressor.py
Executive Summary
The Hermes context compressor is a mature, well-architected implementation with sophisticated handling of tool call pairs, iterative summary updates, and token-aware tail protection. However, there are several high-impact gaps related to fallback chain awareness, early warning systems, and checkpoint integration that should be addressed for production reliability.
Overall Grade: B+ (Solid foundation, needs edge-case hardening)
What the Current Implementation Does Well
1. Structured Summary Template (Lines 276-303)
The compressor uses a Pi-mono/OpenCode-inspired structured format:
- Goal: What the user is trying to accomplish
- Constraints & Preferences: User preferences, coding style
- Progress: Done / In Progress / Blocked sections
- Key Decisions: Important technical decisions with rationale
- Relevant Files: Files read/modified/created with notes
- Next Steps: What needs to happen next
- Critical Context: Values, error messages, config details that would be lost
This structure is best-in-class; most context compression implementations produce far less structured summaries.
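To make the template concrete, it could be rendered as a skeleton like the following. This is a hypothetical sketch built from the section list above; the real constant lives at lines 276-303 of context_compressor.py, and `SUMMARY_TEMPLATE` is an invented name.

```python
# Hypothetical skeleton of the structured summary described above.
SUMMARY_TEMPLATE = """\
## Goal
{goal}

## Constraints & Preferences
{constraints}

## Progress
### Done
{done}
### In Progress
{in_progress}
### Blocked
{blocked}

## Key Decisions
{decisions}

## Relevant Files
{files}

## Next Steps
{next_steps}

## Critical Context
{critical}
"""

summary = SUMMARY_TEMPLATE.format(
    goal="Harden the context compressor",
    constraints="Prefer minimal diffs",
    done="Reviewed compressor source",
    in_progress="Drafting fixes",
    blocked="None",
    decisions="Token-budget tail protection over fixed message counts",
    files="agent/context_compressor.py",
    next_steps="Add pre-compression checkpoint",
    critical="threshold_percent defaults to 0.5",
)
```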
2. Iterative Summary Updates (Lines 264-304)
The _previous_summary mechanism preserves information across multiple compactions:
- On first compaction: Summarizes from scratch
- On subsequent compactions: Updates previous summary with new progress
- Moves items from "In Progress" to "Done" when completed
- Accumulates constraints and file references across compactions
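A minimal sketch of how the iterative branch might assemble its prompt; `build_summary_prompt` and the prompt wording are assumptions, not the actual implementation.

```python
# Hypothetical sketch: on later compactions the previous summary is fed back
# so the model revises it instead of summarizing from scratch.
def build_summary_prompt(previous_summary, new_turns):
    if previous_summary:
        return (
            "Update the summary below with the new conversation turns. "
            "Move finished items from 'In Progress' to 'Done' and keep all "
            "constraints and file references.\n\n"
            f"PREVIOUS SUMMARY:\n{previous_summary}\n\n"
            f"NEW TURNS:\n{new_turns}"
        )
    return f"Summarize the following conversation turns:\n{new_turns}"
```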
3. Token-Budget Tail Protection (Lines 490-539)
Instead of using fixed message counts, it protects the most recent N tokens:

```python
tail_token_budget = threshold_tokens * summary_target_ratio
# Default: 50% of 128K context = 64K threshold → ~13K token tail
```

This scales automatically with the model's context window.
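The arithmetic behind the default figures can be checked directly. The attribute names mirror the snippet above, and the 0.2 ratio is inferred from the 64K → ~13K example rather than read from the source.

```python
# Worked example of the default tail-budget arithmetic (ratio is inferred).
context_length = 128_000
threshold_percent = 0.5        # compression triggers at 50% of context
summary_target_ratio = 0.2     # inferred: fraction of threshold kept as tail

threshold_tokens = int(context_length * threshold_percent)        # 64,000
tail_token_budget = int(threshold_tokens * summary_target_ratio)  # 12,800 ≈ 13K
```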
4. Tool Call/Result Pair Integrity (Lines 392-450)
Sophisticated handling of orphaned tool pairs:
- _sanitize_tool_pairs(): Removes orphaned results, adds stubs for missing results
- _align_boundary_forward/backward(): Prevents splitting tool groups
- Protects the integrity of the message sequence for API compliance
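A minimal sketch of the orphan-handling idea, assuming OpenAI-style message dicts; `sanitize_tool_pairs` here is a simplified stand-in for the real _sanitize_tool_pairs, which is more thorough.

```python
# Simplified sketch: drop "tool" results whose call was compressed away,
# and stub a result for any tool call that never received one.
def sanitize_tool_pairs(messages):
    issued = {
        tc["id"]
        for m in messages if m.get("role") == "assistant"
        for tc in (m.get("tool_calls") or [])
    }
    answered = {m.get("tool_call_id") for m in messages if m.get("role") == "tool"}
    out = []
    for m in messages:
        if m.get("role") == "tool" and m.get("tool_call_id") not in issued:
            continue  # orphaned result: its originating call is gone
        out.append(m)
        if m.get("role") == "assistant":
            for tc in m.get("tool_calls") or []:
                if tc["id"] not in answered:
                    out.append({"role": "tool", "tool_call_id": tc["id"],
                                "content": "[result pruned]"})  # stub missing result
    return out
```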
5. Tool Output Pruning Pre-Pass (Lines 152-182)
A cheap first pass replaces old tool results with placeholders:

```python
_PRUNED_TOOL_PLACEHOLDER = "[Old tool output cleared to save context space]"
```

It only prunes content longer than 200 characters, preserving smaller results.
6. Rich Serialization for Summary Input (Lines 199-248)
Includes tool call arguments and truncates intelligently:
- Tool results: Up to 3000 chars (with smart truncation keeping head/tail)
- Tool calls: Function name AND arguments (truncated to 400 chars if needed)
- All roles: 3000 char limit with ellipses
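The head/tail "smart truncation" idea can be sketched as follows; `truncate_head_tail` and the two-thirds head split are assumptions, and the actual split in the serializer may differ.

```python
# Sketch of smart truncation: keep the start and end of a long tool result,
# eliding the middle, so file paths and trailing error lines survive.
def truncate_head_tail(text, limit=3000):
    if len(text) <= limit:
        return text
    head = limit * 2 // 3          # assumed split; keep most of the head
    tail = limit - head
    return text[:head] + "\n…[truncated]…\n" + text[-tail:]
```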
7. Proper Integration with Agent Loop
- Initialized in AIAgent.__init__() (lines 1191-1203)
- Triggered in _compress_context() (line 5259)
- Resets state in reset_session_state() (lines 1263-1271)
- Updates token counts via update_from_response() (lines 122-126)
What's Missing or Broken
🔴 CRITICAL: No Fallback Chain Context Window Awareness
Issue: When the agent falls back to a model with a smaller context window (e.g., primary Claude 1M tokens → fallback GPT-4 128K tokens), the compressor's threshold is based on the original model, not the fallback model.
Location: run_agent.py compression initialization (lines 1191-1203)
Impact:
- Fallback model may hit context limits before compression triggers
- Or compression may trigger too aggressively for smaller models
Evidence:

```python
# In AIAgent.__init__():
self.context_compressor = ContextCompressor(
    model=self.model,  # Original model only
    # ... no fallback context lengths passed
)
```
Fix Needed: Pass the fallback chain's context lengths and use the minimum:

```python
# Suggested approach:
context_lengths = [get_model_context_length(m) for m in [primary] + fallbacks]
effective_context = min(context_lengths)  # Conservative
```
🔴 HIGH: No Pre-Compression Checkpoint
Issue: When compression occurs, the pre-compression state is lost. Users cannot "rewind" to before compression if the summary loses critical information.
Location: run_agent.py _compress_context() (line 5259)
Impact:
- Information loss is irreversible
- If summary misses critical context, conversation is corrupted
- No audit trail of what was removed
Fix Needed: Create a checkpoint before compression:

```python
def _compress_context(self, messages, system_message, ...):
    # Create checkpoint BEFORE compression
    if self._checkpoint_mgr:
        self._checkpoint_mgr.create_checkpoint(
            name=f"pre-compression-{self.context_compressor.compression_count}",
            messages=messages,  # Full pre-compression state
        )
    compressed = self.context_compressor.compress(messages, ...)
```
🟡 MEDIUM: No Progressive Context Pressure Warnings
Issue: Only one warning fires, at 85% (line 7871); after that, compression triggers abruptly once the configured threshold (50-100%) is crossed. There is no graduated alert system.
Location: run_agent.py context pressure check (lines 7865-7872)
Current:

```python
if _compaction_progress >= 0.85 and not self._context_pressure_warned:
    self._context_pressure_warned = True
```

Better:

```python
# Progressive warnings at 60%, 75%, 85%, 95%
warning_levels = [(0.60, "info"), (0.75, "notice"),
                  (0.85, "warning"), (0.95, "critical")]
```
🟡 MEDIUM: Summary Validation Missing
Issue: No verification that the generated summary actually contains the critical information from the compressed turns.
Location: context_compressor.py _generate_summary() (lines 250-369)
Risk: If the summarization model fails or produces low-quality output, critical information is silently lost.
Fix Needed: Add summary quality checks:

```python
def _validate_summary(self, summary: str, turns: list) -> bool:
    """Verify the summary captures critical information."""
    # Check for key file paths mentioned in turns
    # Check for error messages that were present
    # Check for specific values/IDs
    # Return False if validation fails, triggering the fallback
```
🟡 MEDIUM: No Semantic Deduplication
Issue: Same information may be repeated across the original turns and the previous summary, leading to bloated input to the summarizer.
Location: _generate_summary() iterative update path (lines 264-304)
Example: If the previous summary already mentions "file X was modified", and new turns also mention it, the information appears twice in the summarizer input.
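A cheap line-level deduplication pass could mitigate this. This is a sketch: `dedupe_against_summary` is an invented helper, and genuinely semantic deduplication would need fuzzier matching than exact normalized lines.

```python
# Sketch: drop turn snippets already present (normalized) in the previous
# summary, so the summarizer input does not repeat known facts.
def dedupe_against_summary(turn_texts, previous_summary):
    seen = {line.strip().lower()
            for line in previous_summary.splitlines() if line.strip()}
    kept = []
    for text in turn_texts:
        normalized = text.strip().lower()
        if normalized and normalized in seen:
            continue  # already captured in the previous summary
        kept.append(text)
    return kept
```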
🟢 LOW: Tool Result Placeholder Not Actionable
Issue: The placeholder [Old tool output cleared to save context space] tells the user nothing about what was lost.
Location: Line 45
Better:

```python
# Include tool name and truncated preview
_PRUNED_TOOL_PLACEHOLDER_TEMPLATE = (
    "[Tool output for {tool_name} cleared. "
    "Preview: {preview}... ({original_chars} chars removed)]"
)
```
🟢 LOW: Compression Metrics Not Tracked
Issue: No tracking of compression ratio, frequency, or information density over time.
Useful metrics to track:
- Tokens saved per compression
- Compression ratio (input tokens / output tokens)
- Frequency of compression (compressions per 100 turns)
- Average summary length
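These metrics could be tracked with a small dataclass; `CompressionMetrics` and its field names are invented for illustration.

```python
# Sketch of per-compression metrics tracking for the list above.
from dataclasses import dataclass, field

@dataclass
class CompressionMetrics:
    compressions: int = 0
    tokens_in: int = 0
    tokens_out: int = 0
    summary_lengths: list = field(default_factory=list)

    def record(self, tokens_before, tokens_after, summary_len):
        """Record one compression event."""
        self.compressions += 1
        self.tokens_in += tokens_before
        self.tokens_out += tokens_after
        self.summary_lengths.append(summary_len)

    @property
    def ratio(self):
        """Compression ratio: input tokens / output tokens."""
        return self.tokens_in / self.tokens_out if self.tokens_out else 0.0
```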
Specific Code Improvements
1. Add Fallback Context Length Detection
File: run_agent.py (~line 1191)

```python
# Before initializing the compressor, collect all context lengths
def _get_fallback_context_lengths(self, _agent_cfg: dict) -> list:
    """Get context lengths for all models in the fallback chain."""
    lengths = []
    # Primary model
    lengths.append(get_model_context_length(
        self.model, base_url=self.base_url,
        api_key=self.api_key, provider=self.provider,
    ))
    # Fallback models from config
    fallback_providers = _agent_cfg.get("fallback_providers", [])
    for fb in fallback_providers:
        if isinstance(fb, dict):
            fb_model = fb.get("model", "")
            fb_base = fb.get("base_url", "")
            fb_provider = fb.get("provider", "")
            fb_key_env = fb.get("api_key_env", "")
            fb_key = os.getenv(fb_key_env, "")
            if fb_model:
                lengths.append(get_model_context_length(
                    fb_model, base_url=fb_base,
                    api_key=fb_key, provider=fb_provider,
                ))
    return [l for l in lengths if l and l > 0]

# Use the minimum context length for conservative compression
_fallback_contexts = self._get_fallback_context_lengths(_agent_cfg)
_effective_context = min(_fallback_contexts) if _fallback_contexts else None
```
2. Add Pre-Compression Checkpoint
File: run_agent.py _compress_context() method
See patch file for implementation.
3. Add Summary Validation
File: context_compressor.py

```python
def _extract_critical_refs(self, turns: List[Dict]) -> Set[str]:
    """Extract critical references that must appear in the summary."""
    critical = set()
    for msg in turns:
        content = msg.get("content", "") or ""
        # File paths
        for match in re.finditer(r'[\w\-./]+\.(py|js|ts|json|yaml|md)\b', content):
            critical.add(match.group(0))
        # Error messages
        if "error" in content.lower() or "exception" in content.lower():
            for line in content.split('\n'):
                if any(k in line.lower() for k in ["error", "exception", "traceback"]):
                    critical.add(line[:100])  # First 100 chars of the error line
    return critical

def _validate_summary(self, summary: str, turns: List[Dict]) -> Tuple[bool, List[str]]:
    """Validate that the summary captures critical information.

    Returns (is_valid, missing_items).
    """
    if not summary or len(summary) < 100:
        return False, ["summary too short"]
    critical = self._extract_critical_refs(turns)
    missing = [ref for ref in critical if ref not in summary]
    # Allow some loss, but not too much
    if len(missing) > len(critical) * 0.5:
        return False, missing[:5]  # Return the first 5 missing items
    return True, []
```
4. Progressive Context Pressure Warnings
File: run_agent.py context pressure section (~line 7865)

```python
# Replace the single warning with a progressive system
_CONTEXT_PRESSURE_LEVELS = [
    (0.60, "ℹ️ Context usage at 60% — monitoring"),
    (0.75, "📊 Context usage at 75% — consider wrapping up soon"),
    (0.85, "⚠️ Context usage at 85% — compression imminent"),
    (0.95, "🔴 Context usage at 95% — compression will trigger soon"),
]

# Track which levels have been reported
if not hasattr(self, '_context_pressure_reported'):
    self._context_pressure_reported = set()

for threshold, message in _CONTEXT_PRESSURE_LEVELS:
    if _compaction_progress >= threshold and threshold not in self._context_pressure_reported:
        self._context_pressure_reported.add(threshold)
        if self.status_callback:
            self.status_callback("warning", message)
        if not self.quiet_mode:
            print(f"\n{message}\n")
```
Interaction with Fallback Chain
Current Behavior
The compressor is initialized once at agent startup with the primary model's context length:
```python
self.context_compressor = ContextCompressor(
    model=self.model,  # Primary model only
    threshold_percent=compression_threshold,  # Default 50%
    # ...
)
```
Problems
- No dynamic adjustment: If fallback occurs to a smaller model, the compression threshold is wrong
- No re-initialization on model switch: the /model command doesn't update the compressor
- Context probe affects wrong model: If the primary probe fails, fallback models may have already been used
Recommended Architecture
```python
class AIAgent:
    def _update_compressor_for_model(self, model: str, base_url: str, provider: str):
        """Reconfigure compressor when the model changes (fallback or /model command)."""
        new_context = get_model_context_length(model, base_url=base_url, provider=provider)
        if new_context != self.context_compressor.context_length:
            self.context_compressor.context_length = new_context
            self.context_compressor.threshold_tokens = int(
                new_context * self.context_compressor.threshold_percent
            )
            logger.info(f"Compressor adjusted for {model}: {new_context:,} tokens")

    def _handle_fallback(self, fallback_model: str, ...):
        """Update compressor when falling back to a different model."""
        self._update_compressor_for_model(fallback_model, ...)
```
Testing Gaps
- No fallback chain test: Tests don't verify behavior when context limits differ
- No checkpoint integration test: Pre-compression checkpoint not tested
- No summary validation test: No test for detecting poor-quality summaries
- No progressive warning test: Only tests the 85% threshold
- No tool result deduplication test: Tests verify pairs are preserved but not deduplicated
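The first gap could be covered by a test along these lines; `effective_context` is a hypothetical helper mirroring the minimum-of-chain fix proposed earlier in this review.

```python
# Sketch test for the fallback-chain gap: the compressor should adopt the
# *smallest* context length available in the chain.
def effective_context(primary, fallbacks):
    """Return the minimum context length, ignoring missing/zero entries."""
    lengths = [primary] + [l for l in fallbacks if l and l > 0]
    return min(lengths)

def test_fallback_uses_minimum_context():
    # A 1M-token primary with a 128K fallback must compress for 128K.
    assert effective_context(1_000_000, [128_000]) == 128_000
    # Missing or zero fallback lengths are ignored, not treated as 0.
    assert effective_context(200_000, [0, None]) == 200_000
```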
Recommendations Priority
| Priority | Item | Effort | Impact |
|---|---|---|---|
| P0 | Pre-compression checkpoint | Medium | Critical |
| P0 | Fallback context awareness | Medium | High |
| P1 | Progressive warnings | Low | Medium |
| P1 | Summary validation | Medium | High |
| P2 | Semantic deduplication | High | Medium |
| P2 | Better pruning placeholders | Low | Low |
| P3 | Compression metrics | Low | Low |
Conclusion
The context compressor is a solid implementation with sophisticated handling of the core compression problem. The structured summary format and iterative update mechanism are particularly well-designed.
The main gaps are in edge-case hardening:
- Fallback chain awareness needs to be addressed for multi-model reliability
- Pre-compression checkpoint is essential for information recovery
- Summary validation would prevent silent information loss
These are incremental improvements to an already strong foundation.
Review conducted by Timmy Agent
For Gitea issue timmy-home #92