Two fixes for the case where a user switches to a model with a smaller
context window while having a large existing session:
1. Preflight compression in run_conversation(): Before the main loop,
estimate tokens of loaded history + system prompt. If it exceeds the
model's compression threshold (85% of context), compress proactively
with up to 3 passes. This naturally handles model switches because
the gateway creates a fresh AIAgent per message with the current
model's context length.
2. Error handler reordering: Context-length errors (400 with 'maximum
context length' etc.) are now checked BEFORE the generic 4xx handler.
Previously, OpenRouter's 400-status context-length errors were caught
as non-retryable client errors and aborted immediately, never reaching
the compression+retry logic.
Reported by Sonicrida on Discord: 840-message session (2MB+) crashed
after switching from a large-context model to minimax via OpenRouter.
The 413 "Request Entity Too Large" error from the LLM API was caught by the
generic 4xx handler which aborts immediately. This is wrong for 413 — it's a
payload-size issue that can be resolved by compressing conversation history.
- Intercept 413 before the generic 4xx block and route to _compress_context
- Exclude 413 from generic is_client_error detection
- Add 'request entity too large' to context-length phrases as safety net
- Add tests for 413 compression behavior
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>