Two fixes for context overflow handling:

1. Proactive compression after tool execution: the compression check now estimates the next prompt size using real token counts from the last API response (prompt_tokens + completion_tokens) plus a conservative estimate of the newly appended tool results (chars // 3 for JSON-heavy content). Previously, should_compress() only checked last_prompt_tokens, which didn't account for tool results, so a 130k-token prompt plus 100k chars of tool output would pass the 140k threshold check but still exceed the 200k API limit.

2. Safety net: added 'prompt is too long' to the context-length error detection phrases. Anthropic returns 'prompt is too long: N tokens > M maximum' with HTTP 400, which the existing phrases didn't match. This ensures compression fires even when the proactive check underestimates.

Fixes #813
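The proactive check described in fix 1 might look like the following sketch. The function and variable names (should_compress, COMPRESS_THRESHOLD_TOKENS) and the 140k threshold are assumptions drawn from the description above, not the actual implementation.

```python
import json

# Assumed soft threshold, kept well below the 200k hard API limit.
COMPRESS_THRESHOLD_TOKENS = 140_000


def estimate_tool_result_tokens(tool_results: list[dict]) -> int:
    """Conservative estimate for JSON-heavy content: roughly 3 chars per token."""
    chars = sum(len(json.dumps(r)) for r in tool_results)
    return chars // 3


def should_compress(last_prompt_tokens: int,
                    last_completion_tokens: int,
                    tool_results: list[dict]) -> bool:
    # Project the size of the *next* prompt: everything the model just saw
    # (last prompt), plus what it generated (completion), plus the newly
    # appended tool results that were never counted by the API yet.
    projected = (last_prompt_tokens
                 + last_completion_tokens
                 + estimate_tool_result_tokens(tool_results))
    return projected > COMPRESS_THRESHOLD_TOKENS
```

With the old behavior (checking only last_prompt_tokens), the 130k + 100k-chars case from the description passes the threshold; the projected estimate catches it.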
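The safety net in fix 2 amounts to a substring match over known error phrases. The pre-existing phrases below are illustrative assumptions; only 'prompt is too long' is confirmed by the description.

```python
# Phrases that identify a context-length error in a provider error message.
# The first two entries are hypothetical examples of pre-existing phrases;
# 'prompt is too long' is the one added for Anthropic's HTTP 400 response
# ('prompt is too long: N tokens > M maximum').
CONTEXT_LENGTH_PHRASES = (
    "context length",
    "maximum context",
    "prompt is too long",
)


def is_context_length_error(message: str) -> bool:
    """Return True if the error message indicates the prompt exceeded the context window."""
    msg = message.lower()
    return any(phrase in msg for phrase in CONTEXT_LENGTH_PHRASES)
```

When this matches, the caller can trigger compression and retry, so overflow is handled even if the proactive estimate was too low.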