""" Automatic context window compression for long conversations.
Self - contained class with its own OpenAI client for summarization .
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
Uses auxiliary model ( cheap / fast ) to summarize middle turns while
2026-02-21 22:31:43 -08:00
protecting head and tail context .
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
Improvements over v1 :
- Structured summary template ( Goal , Progress , Decisions , Files , Next Steps )
- Iterative summary updates ( preserves info across multiple compactions )
- Token - budget tail protection instead of fixed message count
- Tool output pruning before LLM summarization ( cheap pre - pass )
- Scaled summary budget ( proportional to compressed content )
- Richer tool call / result detail in summarizer input
2026-02-21 22:31:43 -08:00
"""
import logging
from typing import Any, Dict, List, Optional

from agent.auxiliary_client import call_llm
from agent.model_metadata import (
    get_model_context_length,
    estimate_messages_tokens_rough,
)

logger = logging.getLogger(__name__)
SUMMARY_PREFIX = (
    "[CONTEXT COMPACTION] Earlier turns in this conversation were compacted "
    "to save context space. The summary below describes work that was "
    "already completed, and the current session state may still reflect "
    "that work (for example, files may already be changed). Use the summary "
    "and the current state to continue from where things left off, and "
    "avoid repeating work: "
)
LEGACY_SUMMARY_PREFIX = "[CONTEXT SUMMARY]: "
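After compaction, the conversation keeps its protected head and tail and injects a single message carrying this prefix. A minimal sketch of the resulting shape (message contents and the abbreviated prefix text are hypothetical):

```python
# Hypothetical shape of a conversation after compaction: protected head,
# one summary message carrying the prefix, protected tail.
SUMMARY_PREFIX = "[CONTEXT COMPACTION] Earlier turns were compacted: "

compacted = [
    {"role": "system", "content": "You are a coding agent."},          # protected head
    {"role": "user", "content": SUMMARY_PREFIX + "Goal: fix parser"},  # summary message
    {"role": "assistant", "content": "Continuing from the summary."},  # protected tail
]
```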
# Minimum tokens for the summary output
_MIN_SUMMARY_TOKENS = 2000
# Proportion of compressed content to allocate for summary
_SUMMARY_RATIO = 0.20
# Absolute ceiling for summary tokens (even on very large context windows)
_SUMMARY_TOKENS_CEILING = 12_000
# Placeholder used when pruning old tool results
_PRUNED_TOOL_PLACEHOLDER = "[Old tool output cleared to save context space]"
# Chars per token rough estimate
_CHARS_PER_TOKEN = 4
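`_CHARS_PER_TOKEN` drives a simple length-based token estimate. A self-contained sketch of that heuristic (the real `estimate_messages_tokens_rough` in `agent.model_metadata` may count differently):

```python
from typing import Any, Dict, List

_CHARS_PER_TOKEN = 4  # rough chars-per-token heuristic

def rough_token_estimate(messages: List[Dict[str, Any]]) -> int:
    """Approximate token count as total content characters / 4."""
    total_chars = sum(len(str(m.get("content") or "")) for m in messages)
    return total_chars // _CHARS_PER_TOKEN
```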
class ContextCompressor:
    """Compresses conversation context when approaching the model's context limit.

    Algorithm:
        1. Prune old tool results (cheap, no LLM call)
        2. Protect head messages (system prompt + first exchange)
        3. Protect tail messages by token budget (most recent ~20K tokens)
        4. Summarize middle turns with structured LLM prompt
        5. On subsequent compactions, iteratively update the previous summary
    """
    def __init__(
        self,
        model: str,
        threshold_percent: float = 0.50,
        protect_first_n: int = 3,
        protect_last_n: int = 20,
        summary_target_ratio: float = 0.20,
        quiet_mode: bool = False,
        summary_model_override: Optional[str] = None,
        base_url: str = "",
        api_key: str = "",
        config_context_length: int | None = None,
        provider: str = "",
    ):
        self.model = model
        self.base_url = base_url
        self.api_key = api_key
        self.provider = provider
        self.threshold_percent = threshold_percent
        self.protect_first_n = protect_first_n
        self.protect_last_n = protect_last_n
        self.summary_target_ratio = max(0.10, min(summary_target_ratio, 0.80))
        self.quiet_mode = quiet_mode
        self.context_length = get_model_context_length(
            model, base_url=base_url, api_key=api_key,
            config_context_length=config_context_length,
            provider=provider,
        )
        self.threshold_tokens = int(self.context_length * threshold_percent)
        self.compression_count = 0

        # Derive token budgets: ratio is relative to the threshold, not total context
        target_tokens = int(self.threshold_tokens * self.summary_target_ratio)
        self.tail_token_budget = target_tokens
        self.max_summary_tokens = min(
            int(self.context_length * 0.05), _SUMMARY_TOKENS_CEILING,
        )

        if not quiet_mode:
            logger.info(
                "Context compressor initialized: model=%s context_length=%d"
                " threshold=%d (%.0f%%) target_ratio=%.0f%% tail_budget=%d"
                " provider=%s base_url=%s",
                model, self.context_length, self.threshold_tokens,
                threshold_percent * 100, self.summary_target_ratio * 100,
                self.tail_token_budget,
                provider or "none", base_url or "none",
            )

        self._context_probed = False  # True after a step-down from context error
        self.last_prompt_tokens = 0
        self.last_completion_tokens = 0
        self.last_total_tokens = 0
        self.summary_model = summary_model_override or ""

        # Stores the previous compaction summary for iterative updates
        self._previous_summary: Optional[str] = None
    def update_from_response(self, usage: Dict[str, Any]):
        """Update tracked token usage from API response."""
        self.last_prompt_tokens = usage.get("prompt_tokens", 0)
        self.last_completion_tokens = usage.get("completion_tokens", 0)
        self.last_total_tokens = usage.get("total_tokens", 0)

    def should_compress(self, prompt_tokens: Optional[int] = None) -> bool:
        """Check if context exceeds the compression threshold."""
        tokens = prompt_tokens if prompt_tokens is not None else self.last_prompt_tokens
        return tokens >= self.threshold_tokens

    def should_compress_preflight(self, messages: List[Dict[str, Any]]) -> bool:
        """Quick pre-flight check using rough estimate (before API call)."""
        rough_estimate = estimate_messages_tokens_rough(messages)
        return rough_estimate >= self.threshold_tokens

    def get_status(self) -> Dict[str, Any]:
        """Get current compression status for display/logging."""
        return {
            "last_prompt_tokens": self.last_prompt_tokens,
            "threshold_tokens": self.threshold_tokens,
            "context_length": self.context_length,
            "usage_percent": min(100, (self.last_prompt_tokens / self.context_length * 100)) if self.context_length else 0,
            "compression_count": self.compression_count,
        }
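The threshold check reduces to simple arithmetic. A standalone illustration with a hypothetical 128K-context model at the default 50% threshold:

```python
# Hypothetical values: the real class resolves context_length itself.
context_length = 128_000   # resolved context window
threshold_percent = 0.50   # default compression threshold
threshold_tokens = int(context_length * threshold_percent)

def should_compress(prompt_tokens: int) -> bool:
    """Compress once the prompt reaches the threshold."""
    return prompt_tokens >= threshold_tokens
```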

    # ------------------------------------------------------------------
    # Tool output pruning (cheap pre-pass, no LLM call)
    # ------------------------------------------------------------------
    def _prune_old_tool_results(
        self, messages: List[Dict[str, Any]], protect_tail_count: int,
    ) -> tuple[List[Dict[str, Any]], int]:
        """Replace old tool result contents with a short placeholder.

        Protects the most recent ``protect_tail_count`` messages; tool
        results before that boundary get their content replaced with a
        placeholder string.

        Returns (pruned_messages, pruned_count).
        """
        if not messages:
            return messages, 0
        result = [m.copy() for m in messages]
        pruned = 0
        prune_boundary = len(result) - protect_tail_count
        for i in range(prune_boundary):
            msg = result[i]
            if msg.get("role") != "tool":
                continue
            content = msg.get("content", "")
            if not content or content == _PRUNED_TOOL_PLACEHOLDER:
                continue
            # Only prune if the content is substantial (>200 chars)
            if len(content) > 200:
                result[i] = {**msg, "content": _PRUNED_TOOL_PLACEHOLDER}
                pruned += 1
        return result, pruned
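The pruning pass above can be exercised in isolation. A standalone sketch mirroring its logic, showing that only large tool results outside the protected tail are replaced (and the input list is left untouched):

```python
_PRUNED = "[Old tool output cleared to save context space]"

def prune_old_tool_results(messages, protect_tail_count):
    """Prune large tool results that fall outside the protected tail."""
    result = [m.copy() for m in messages]
    pruned = 0
    for i in range(len(result) - protect_tail_count):
        msg = result[i]
        if msg.get("role") != "tool":
            continue
        content = msg.get("content", "")
        if content and content != _PRUNED and len(content) > 200:
            result[i] = {**msg, "content": _PRUNED}
            pruned += 1
    return result, pruned

msgs = [
    {"role": "tool", "content": "x" * 500},   # old and large: pruned
    {"role": "assistant", "content": "ok"},   # not a tool result: kept
    {"role": "tool", "content": "y" * 500},   # inside protected tail: kept
]
pruned_msgs, count = prune_old_tool_results(msgs, protect_tail_count=1)
```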

    # ------------------------------------------------------------------
    # Summarization
    # ------------------------------------------------------------------
    def _compute_summary_budget(self, turns_to_summarize: List[Dict[str, Any]]) -> int:
        """Scale summary token budget with the amount of content being compressed.

        The maximum scales with the model's context window (5% of context,
        capped at ``_SUMMARY_TOKENS_CEILING``) so large-context models get
        richer summaries instead of being hard-capped at 8K tokens.
        """
        content_tokens = estimate_messages_tokens_rough(turns_to_summarize)
        budget = int(content_tokens * _SUMMARY_RATIO)
        return max(_MIN_SUMMARY_TOKENS, min(budget, self.max_summary_tokens))
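The clamping behaves as follows. A standalone sketch using the module constants and a hypothetical 12K cap (in the class, `max_summary_tokens` is derived from the context window at init):

```python
_MIN_SUMMARY_TOKENS = 2000
_SUMMARY_RATIO = 0.20
max_summary_tokens = 12_000  # hypothetical: min(5% of context, ceiling)

def summary_budget(content_tokens: int) -> int:
    """20% of the compressed content, clamped to [min, max]."""
    return max(_MIN_SUMMARY_TOKENS,
               min(int(content_tokens * _SUMMARY_RATIO), max_summary_tokens))
```

Small conversations hit the floor (20% of 5K is under 2K), mid-sized ones scale linearly, and very long ones hit the ceiling.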
    def _serialize_for_summary(self, turns: List[Dict[str, Any]]) -> str:
        """Serialize conversation turns into labeled text for the summarizer.

        Includes tool call arguments and result content (up to 3000 chars
        per message) so the summarizer can preserve specific details like
        file paths, commands, and outputs.
        """
        parts = []
        for msg in turns:
            role = msg.get("role", "unknown")
            content = msg.get("content") or ""
            # Tool results: keep more content than before (3000 chars)
            if role == "tool":
                tool_id = msg.get("tool_call_id", "")
                if len(content) > 3000:
                    content = content[:2000] + "\n...[truncated]...\n" + content[-800:]
                parts.append(f"[TOOL RESULT {tool_id}]: {content}")
                continue

            # Assistant messages: include tool call names AND arguments
            if role == "assistant":
                if len(content) > 3000:
                    content = content[:2000] + "\n...[truncated]...\n" + content[-800:]
                tool_calls = msg.get("tool_calls", [])
                if tool_calls:
                    tc_parts = []
                    for tc in tool_calls:
                        if isinstance(tc, dict):
                            fn = tc.get("function", {})
                            name = fn.get("name", "?")
                            args = fn.get("arguments", "")
                            # Truncate long arguments but keep enough for context
                            if len(args) > 500:
                                args = args[:400] + "..."
                            tc_parts.append(f"{name}({args})")
                        else:
                            fn = getattr(tc, "function", None)
                            name = getattr(fn, "name", "?") if fn else "?"
                            tc_parts.append(f"{name}(...)")
                    content += "\n[Tool calls:\n" + "\n".join(tc_parts) + "\n]"
                parts.append(f"[ASSISTANT]: {content}")
                continue

            # User and other roles
            if len(content) > 3000:
                content = content[:2000] + "\n...[truncated]...\n" + content[-800:]
            parts.append(f"[{role.upper()}]: {content}")
        return "\n\n".join(parts)
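A minimal sketch of the head-and-tail truncation scheme applied in each branch above: content over 3000 characters keeps its first 2000 and last 800 characters with a marker in between, so both the start of an output and its final result survive.

```python
# Sketch of the serializer's truncation scheme (same constants as above).
def truncate_middle(content: str, limit: int = 3000) -> str:
    if len(content) <= limit:
        return content
    # Keep the head (often the command/context) and the tail (often the result).
    return content[:2000] + "\n...[truncated]...\n" + content[-800:]
```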
    def _generate_summary(self, turns_to_summarize: List[Dict[str, Any]]) -> Optional[str]:
        """Generate a structured summary of conversation turns.

        Uses a structured template (Goal, Progress, Decisions, Files, Next Steps)
        inspired by Pi-mono and OpenCode. When a previous summary exists,
        generates an iterative update instead of summarizing from scratch.

        Returns None if all attempts fail — the caller should drop
        the middle turns without a summary rather than inject a useless
        placeholder.
        """
        summary_budget = self._compute_summary_budget(turns_to_summarize)
        content_to_summarize = self._serialize_for_summary(turns_to_summarize)
        if self._previous_summary:
            # Iterative update: preserve existing info, add new progress
            prompt = f"""You are updating a context compaction summary. A previous compaction produced the summary below. New conversation turns have occurred since then and need to be incorporated.

PREVIOUS SUMMARY:
{self._previous_summary}

NEW TURNS TO INCORPORATE:
{content_to_summarize}

Update the summary using this exact structure. PRESERVE all existing information that is still relevant. ADD new progress. Move items from "In Progress" to "Done" when completed. Remove information only if it is clearly obsolete.

## Goal
[What the user is trying to accomplish — preserve from previous summary, update if goal evolved]

## Constraints & Preferences
[User preferences, coding style, constraints, important decisions — accumulate across compactions]

## Progress
### Done
[Completed work — include specific file paths, commands run, results obtained]
### In Progress
[Work currently underway]
### Blocked
[Any blockers or issues encountered]

## Key Decisions
[Important technical decisions and why they were made]

## Relevant Files
[Files read, modified, or created — with brief note on each. Accumulate across compactions.]
## Next Steps
[What needs to happen next to continue the work]
## Critical Context
[Any specific values, error messages, configuration details, or data that would be lost without explicit preservation]

Target ~{summary_budget} tokens. Be specific — include file paths, command outputs, error messages, and concrete values rather than vague descriptions.

Write only the summary body. Do not include any preamble or prefix."""
        else:
            # First compaction: summarize from scratch
            prompt = f"""Create a structured handoff summary for a later assistant that will continue this conversation after earlier turns are compacted.

TURNS TO SUMMARIZE:
{content_to_summarize}
Use this exact structure:

## Goal
[What the user is trying to accomplish]

## Constraints & Preferences
[User preferences, coding style, constraints, important decisions]

## Progress
### Done
[Completed work — include specific file paths, commands run, results obtained]
### In Progress
[Work currently underway]
### Blocked
[Any blockers or issues encountered]

## Key Decisions
[Important technical decisions and why they were made]

## Relevant Files
[Files read, modified, or created — with brief note on each]

## Next Steps
[What needs to happen next to continue the work]

## Critical Context
[Any specific values, error messages, configuration details, or data that would be lost without explicit preservation]

Target ~{summary_budget} tokens. Be specific — include file paths, command outputs, error messages, and concrete values rather than vague descriptions. The goal is to prevent the next assistant from repeating work or losing important details.

Write only the summary body. Do not include any preamble or prefix."""

        try:
            call_kwargs = {
                "task": "compression",
                "messages": [{"role": "user", "content": prompt}],
                "temperature": 0.3,
                "max_tokens": summary_budget * 2,
                # timeout resolved from auxiliary.compression.timeout config by call_llm
            }
            if self.summary_model:
                call_kwargs["model"] = self.summary_model
            response = call_llm(**call_kwargs)
            content = response.choices[0].message.content
            # Handle cases where content is not a string (e.g., dict from llama.cpp)
            if not isinstance(content, str):
                content = str(content) if content else ""
            summary = content.strip()
            # Store for iterative updates on next compaction
            self._previous_summary = summary
            return self._with_summary_prefix(summary)
        except RuntimeError:
            logging.warning("Context compression: no provider available for "
                            "summary. Middle turns will be dropped without summary.")
            return None
        except Exception as e:
            logging.warning("Failed to generate context summary: %s", e)
            return None

    @staticmethod
    def _with_summary_prefix(summary: str) -> str:
        """Normalize summary text to the current compaction handoff format."""
        text = (summary or "").strip()
        for prefix in (LEGACY_SUMMARY_PREFIX, SUMMARY_PREFIX):
            if text.startswith(prefix):
                text = text[len(prefix):].lstrip()
                break
        return f"{SUMMARY_PREFIX}\n{text}" if text else SUMMARY_PREFIX
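A standalone sketch of the normalization above, showing why stripping any existing prefix first makes the operation idempotent. The prefix constants here are placeholder values, not the module's real `SUMMARY_PREFIX` / `LEGACY_SUMMARY_PREFIX`:

```python
# Placeholder prefixes for illustration only.
SUMMARY_PREFIX_DEMO = "[CONTEXT SUMMARY]"
LEGACY_PREFIX_DEMO = "[OLD SUMMARY]"

def with_prefix(summary: str) -> str:
    # Strip any known prefix before re-adding the current one, so calling
    # this twice (or on a legacy-format summary) yields the same result.
    text = (summary or "").strip()
    for prefix in (LEGACY_PREFIX_DEMO, SUMMARY_PREFIX_DEMO):
        if text.startswith(prefix):
            text = text[len(prefix):].lstrip()
            break
    return f"{SUMMARY_PREFIX_DEMO}\n{text}" if text else SUMMARY_PREFIX_DEMO
```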

    # ------------------------------------------------------------------
    # Tool-call / tool-result pair integrity helpers
    # ------------------------------------------------------------------

    @staticmethod
    def _get_tool_call_id(tc) -> str:
        """Extract the call ID from a tool_call entry (dict or SimpleNamespace)."""
        if isinstance(tc, dict):
            return tc.get("id", "")
        return getattr(tc, "id", "") or ""
    def _sanitize_tool_pairs(self, messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        """Fix orphaned tool_call / tool_result pairs after compression.

        Two failure modes:

        1. A tool *result* references a call_id whose assistant tool_call was
           removed (summarized/truncated). The API rejects this with
           "No tool call found for function call output with call_id ...".
        2. An assistant message has tool_calls whose results were dropped.
           The API rejects this because every tool_call must be followed by
           a tool result with the matching call_id.

        This method removes orphaned results and inserts stub results for
        orphaned calls so the message list is always well-formed.
        """
        surviving_call_ids: set = set()
        for msg in messages:
            if msg.get("role") == "assistant":
                for tc in msg.get("tool_calls") or []:
                    cid = self._get_tool_call_id(tc)
                    if cid:
                        surviving_call_ids.add(cid)

        result_call_ids: set = set()
        for msg in messages:
            if msg.get("role") == "tool":
                cid = msg.get("tool_call_id")
                if cid:
                    result_call_ids.add(cid)

        # 1. Remove tool results whose call_id has no matching assistant tool_call
        orphaned_results = result_call_ids - surviving_call_ids
        if orphaned_results:
            messages = [
                m for m in messages
                if not (m.get("role") == "tool" and m.get("tool_call_id") in orphaned_results)
            ]
            if not self.quiet_mode:
                logger.info("Compression sanitizer: removed %d orphaned tool result(s)", len(orphaned_results))

        # 2. Add stub results for assistant tool_calls whose results were dropped
        missing_results = surviving_call_ids - result_call_ids
        if missing_results:
            patched: List[Dict[str, Any]] = []
            for msg in messages:
                patched.append(msg)
                if msg.get("role") == "assistant":
                    for tc in msg.get("tool_calls") or []:
                        cid = self._get_tool_call_id(tc)
                        if cid in missing_results:
                            patched.append({
                                "role": "tool",
                                "content": "[Result from earlier conversation — see context summary above]",
                                "tool_call_id": cid,
                            })
            messages = patched
            if not self.quiet_mode:
                logger.info("Compression sanitizer: added %d stub tool result(s)", len(missing_results))

        return messages
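A minimal sketch of the integrity check above: both failure modes reduce to a set difference between the assistant-side call IDs and the tool-result IDs. This stripped-down helper (illustrative, dict-only) returns both orphan sets:

```python
# Dict-only sketch of the orphan detection in _sanitize_tool_pairs.
def find_orphans(messages):
    call_ids = {tc["id"]
                for m in messages if m.get("role") == "assistant"
                for tc in m.get("tool_calls") or []}
    result_ids = {m["tool_call_id"]
                  for m in messages if m.get("role") == "tool"}
    # (results with no matching call, calls with no matching result)
    return result_ids - call_ids, call_ids - result_ids
```

Orphaned results get dropped; calls missing a result get a stub appended right after their assistant message.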
    def _align_boundary_forward(self, messages: List[Dict[str, Any]], idx: int) -> int:
        """Push a compress-start boundary forward past any orphan tool results.

        If ``messages[idx]`` is a tool result, slide forward until we hit a
        non-tool message so we don't start the summarised region mid-group.
        """
        while idx < len(messages) and messages[idx].get("role") == "tool":
            idx += 1
        return idx
    def _align_boundary_backward(self, messages: List[Dict[str, Any]], idx: int) -> int:
        """Pull a compress-end boundary backward to avoid splitting a
        tool_call / result group.

        If the boundary falls in the middle of a tool-result group (i.e.
        there are consecutive tool messages before ``idx``), walk backward
        past all of them to find the parent assistant message. If found,
        move the boundary before the assistant so the entire
        assistant + tool_results group is included in the summarised region
        rather than being split (which causes silent data loss when
        ``_sanitize_tool_pairs`` removes the orphaned tail results).
        """
        if idx <= 0 or idx >= len(messages):
            return idx

        # Walk backward past consecutive tool results
        check = idx - 1
        while check >= 0 and messages[check].get("role") == "tool":
            check -= 1

        # If we landed on the parent assistant with tool_calls, pull the
        # boundary before it so the whole group gets summarised together.
        if check >= 0 and messages[check].get("role") == "assistant" and messages[check].get("tool_calls"):
            idx = check

        return idx
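A standalone sketch of the backward alignment on a toy message list, so the boundary behavior is easy to verify in isolation (same logic as the method above, minus `self`):

```python
# Illustrative, dict-only version of the backward boundary alignment.
def align_backward(messages, idx):
    if idx <= 0 or idx >= len(messages):
        return idx
    # Walk back past the run of tool results ending just before idx.
    check = idx - 1
    while check >= 0 and messages[check].get("role") == "tool":
        check -= 1
    # Pull the cut before the parent assistant so the group stays together.
    if check >= 0 and messages[check].get("role") == "assistant" and messages[check].get("tool_calls"):
        return check
    return idx
```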
    # ------------------------------------------------------------------
    # Tail protection by token budget
    # ------------------------------------------------------------------

    def _find_tail_cut_by_tokens(
        self, messages: List[Dict[str, Any]], head_end: int,
        token_budget: int | None = None,
    ) -> int:
        """Walk backward from the end of ``messages``, accumulating tokens
        until the budget is reached. Returns the index where the tail starts.

        ``token_budget`` defaults to ``self.tail_token_budget``, which is
        derived from ``summary_target_ratio * context_length``, so it
        scales automatically with the model's context window.
        Never cuts inside a tool_call/result group. Falls back to the old
        ``protect_last_n`` behavior if the budget would protect fewer messages.
        """
        if token_budget is None:
            token_budget = self.tail_token_budget
        n = len(messages)
        min_tail = self.protect_last_n
        accumulated = 0
        cut_idx = n  # start from beyond the end
        for i in range(n - 1, head_end - 1, -1):
            msg = messages[i]
            content = msg.get("content") or ""
            msg_tokens = len(content) // _CHARS_PER_TOKEN + 10  # +10 for role/metadata
            # Include tool call arguments in the estimate
            for tc in msg.get("tool_calls") or []:
                if isinstance(tc, dict):
                    args = tc.get("function", {}).get("arguments", "")
                    msg_tokens += len(args) // _CHARS_PER_TOKEN
            if accumulated + msg_tokens > token_budget and (n - i) >= min_tail:
                break
            accumulated += msg_tokens
            cut_idx = i
        # Ensure we protect at least protect_last_n messages
        fallback_cut = n - min_tail
        if cut_idx > fallback_cut:
            cut_idx = fallback_cut
        # If the token budget would protect everything (small conversations),
        # fall back to the fixed protect_last_n approach so compression can
        # still remove middle turns.
        if cut_idx <= head_end:
            cut_idx = fallback_cut
        # Align to avoid splitting tool groups
        cut_idx = self._align_boundary_backward(messages, cut_idx)
        return max(cut_idx, head_end + 1)
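As a standalone illustration, the backward token-budget walk reduces to the sketch below. This is a minimal sketch, not the method itself: `CHARS_PER_TOKEN = 4` is an assumed heuristic, the message shape is illustrative, and tool-group alignment plus the `protect_last_n` fallback are omitted.

```python
from typing import Any, Dict, List

CHARS_PER_TOKEN = 4  # rough chars-per-token heuristic (assumed value)

def find_tail_cut(messages: List[Dict[str, Any]], head_end: int, token_budget: int) -> int:
    """Walk backward, keeping messages until roughly token_budget tokens accumulate."""
    accumulated = 0
    cut_idx = len(messages)  # start from beyond the end
    for i in range(len(messages) - 1, head_end - 1, -1):
        content = messages[i].get("content") or ""
        msg_tokens = len(content) // CHARS_PER_TOKEN + 10  # +10 for role/metadata
        if accumulated + msg_tokens > token_budget:
            break
        accumulated += msg_tokens
        cut_idx = i
    return cut_idx

# Ten messages of ~35 tokens each (100 chars -> 25 content + 10 overhead);
# a 100-token budget keeps only the last two, so the tail starts at index 8.
msgs = [{"role": "user", "content": "x" * 100} for _ in range(10)]
print(find_tail_cut(msgs, head_end=1, token_budget=100))  # -> 8
```

Note how the cut adapts to message density: bulkier messages exhaust the budget sooner and protect fewer of them.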
    # ------------------------------------------------------------------
    # Main compression entry point
    # ------------------------------------------------------------------
    def compress(self, messages: List[Dict[str, Any]], current_tokens: int | None = None) -> List[Dict[str, Any]]:
        """Compress conversation messages by summarizing middle turns.
        Algorithm:
            1. Prune old tool results (cheap pre-pass, no LLM call)
            2. Protect head messages (system prompt + first exchange)
            3. Find the tail boundary by token budget (~20K tokens of recent context)
            4. Summarize middle turns with a structured LLM prompt
            5. On re-compression, iteratively update the previous summary

        After compression, orphaned tool_call/tool_result pairs are cleaned
        up so the API never receives mismatched IDs.
        """
        n_messages = len(messages)
        if n_messages <= self.protect_first_n + self.protect_last_n + 1:
            if not self.quiet_mode:
                logger.warning(
                    "Cannot compress: only %d messages (need > %d)",
                    n_messages,
                    self.protect_first_n + self.protect_last_n + 1,
                )
            return messages
        display_tokens = (
            current_tokens
            if current_tokens
            else self.last_prompt_tokens or estimate_messages_tokens_rough(messages)
        )

        # Phase 1: Prune old tool results (cheap, no LLM call)
        messages, pruned_count = self._prune_old_tool_results(
            messages, protect_tail_count=self.protect_last_n * 3,
        )
        if pruned_count and not self.quiet_mode:
            logger.info("Pre-compression: pruned %d old tool result(s)", pruned_count)
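The pruning pre-pass itself is simple enough to sketch standalone. `_prune_old_tool_results` is internal and not shown here, so the function below is a hypothetical reconstruction of the idea: replace the contents of tool results outside the protected tail with a placeholder, at zero LLM cost. The placeholder text and exact signature are assumptions.

```python
from typing import Any, Dict, List, Tuple

PLACEHOLDER = "[tool output pruned to save context]"  # assumed placeholder text

def prune_old_tool_results(
    messages: List[Dict[str, Any]], protect_tail_count: int
) -> Tuple[List[Dict[str, Any]], int]:
    """Replace old tool-result contents with a placeholder (no LLM call)."""
    cutoff = max(len(messages) - protect_tail_count, 0)
    out, pruned = [], 0
    for i, msg in enumerate(messages):
        msg = dict(msg)  # shallow copy; never mutate the caller's messages
        if i < cutoff and msg.get("role") == "tool" and msg.get("content"):
            msg["content"] = PLACEHOLDER
            pruned += 1
        out.append(msg)
    return out, pruned

# Five bulky old tool results plus one recent one; protecting a 1-message
# tail prunes the first five and leaves the recent output intact.
msgs = [{"role": "tool", "content": "x" * 400} for _ in range(5)]
msgs.append({"role": "tool", "content": "recent output"})
pruned_msgs, count = prune_old_tool_results(msgs, protect_tail_count=1)
print(count)  # -> 5
```

Because only stale tool payloads are touched, the conversation's structure (roles, tool_call IDs) is preserved for the API.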
        # Phase 2: Determine boundaries
        compress_start = self.protect_first_n
        compress_start = self._align_boundary_forward(messages, compress_start)
        # Use token-budget tail protection instead of a fixed message count
        compress_end = self._find_tail_cut_by_tokens(messages, compress_start)
        if compress_start >= compress_end:
            return messages
        turns_to_summarize = messages[compress_start:compress_end]
        if not self.quiet_mode:
            logger.info(
                "Context compression triggered (%d tokens >= %d threshold)",
                display_tokens,
                self.threshold_tokens,
            )
            logger.info(
                "Model context limit: %d tokens (%.0f%% = %d)",
                self.context_length,
                self.threshold_percent * 100,
                self.threshold_tokens,
            )
            tail_msgs = n_messages - compress_end
            logger.info(
                "Summarizing turns %d-%d (%d turns), protecting %d head + %d tail messages",
                compress_start + 1,
                compress_end,
                len(turns_to_summarize),
                compress_start,
                tail_msgs,
            )
        # Phase 3: Generate structured summary
        summary = self._generate_summary(turns_to_summarize)
        # Phase 4: Assemble compressed message list
        compressed = []
        for i in range(compress_start):
            msg = messages[i].copy()
            if i == 0 and msg.get("role") == "system" and self.compression_count == 0:
                msg["content"] = (
                    (msg.get("content") or "")
                    + "\n\n[Note: Some earlier conversation turns have been compacted into a handoff summary to preserve context space. The current session state may still reflect earlier work, so build on that summary and state rather than re-doing work.]"
                )
            compressed.append(msg)
        _merge_summary_into_tail = False
        if summary:
            last_head_role = messages[compress_start - 1].get("role", "user") if compress_start > 0 else "user"
            first_tail_role = messages[compress_end].get("role", "user") if compress_end < n_messages else "user"
            # Pick a role that avoids consecutive same-role messages with both
            # neighbors. Priority: avoid colliding with the head (already
            # committed), then with the tail.
            if last_head_role in ("assistant", "tool"):
                summary_role = "user"
            else:
                summary_role = "assistant"
            # If the chosen role collides with the tail AND flipping wouldn't
            # collide with the head, flip it.
            if summary_role == first_tail_role:
                flipped = "assistant" if summary_role == "user" else "user"
                if flipped != last_head_role:
                    summary_role = flipped
                else:
                    # Both roles would create consecutive same-role messages
                    # (e.g. head=assistant, tail=user: neither role works).
                    # Merge the summary into the first tail message instead of
                    # inserting a standalone message that breaks alternation.
                    _merge_summary_into_tail = True
            if not _merge_summary_into_tail:
                compressed.append({"role": summary_role, "content": summary})
        else:
            if not self.quiet_mode:
                logger.warning("No summary model available; middle turns dropped without summary")
        for i in range(compress_end, n_messages):
            msg = messages[i].copy()
            if _merge_summary_into_tail and i == compress_end:
                original = msg.get("content") or ""
                msg["content"] = summary + "\n\n" + original
                _merge_summary_into_tail = False
            compressed.append(msg)

        self.compression_count += 1
        compressed = self._sanitize_tool_pairs(compressed)
        if not self.quiet_mode:
            new_estimate = estimate_messages_tokens_rough(compressed)
            saved_estimate = display_tokens - new_estimate
            logger.info(
                "Compressed: %d -> %d messages (~%d tokens saved)",
                n_messages,
                len(compressed),
                saved_estimate,
            )
            logger.info("Compression #%d complete", self.compression_count)
        return compressed
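The summary-role selection in Phase 4 is a small decision table that is easy to exercise in isolation. The sketch below reproduces the same logic as the inline code; it is an illustrative extraction, not part of the class, and assumes the standard user/assistant/tool role names.

```python
from typing import Tuple

def pick_summary_role(last_head_role: str, first_tail_role: str) -> Tuple[str, bool]:
    """Return (role, merge_into_tail) for an inserted summary message.

    Prefers a role that alternates with the last protected head message;
    flips if that collides with the first tail message; signals a merge
    when neither role avoids a same-role collision.
    """
    role = "user" if last_head_role in ("assistant", "tool") else "assistant"
    if role == first_tail_role:
        flipped = "assistant" if role == "user" else "user"
        if flipped != last_head_role:
            return flipped, False
        return role, True  # neither role works: merge summary into the tail
    return role, False

print(pick_summary_role("assistant", "assistant"))  # -> ('user', False)
print(pick_summary_role("assistant", "user"))       # -> ('user', True), must merge
```

Returning a merge flag instead of forcing a role keeps strict user/assistant alternation intact for APIs that reject consecutive same-role messages.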