PR #3566 intentionally routes suppressed content to stream_delta_callback
when tool calls are present, so reasoning tag extraction can fire during
streaming. The test was still asserting the old behavior where content
after tool calls was fully suppressed from the callback.
Updated the assertion to match: content IS delivered to the callback
(for tag extraction), with display-level suppression handled by the
CLI's _stream_delta.
Root cause: Anthropic buffers entire tool call arguments and goes silent
for minutes while thinking (verified: 167s gap with zero SSE events on
direct API). OpenRouter's upstream proxy times out after ~125s of
inactivity and drops the connection with 'Network connection lost'.
Fix: Send the x-anthropic-beta: fine-grained-tool-streaming-2025-05-14
header for Claude models on OpenRouter. This makes Anthropic stream
tool call arguments token-by-token instead of buffering them, keeping
the connection alive through OpenRouter's proxy.
Live-tested: the exact prompt that consistently failed at ~128s now
completes successfully — 2,972 lines written, 49K tokens, 8 minutes.
Additional improvements:
1. Send explicit max_tokens for Claude through OpenRouter. Without it,
OpenRouter defaults to 65,536 (confirmed via echo_upstream_body) —
only half of Opus 4.6's 128K limit.
2. Classify SSE 'Network connection lost' as retryable in the streaming
inner retry loop. The OpenAI SDK raises APIError from SSE error
events, which was bypassing our transient error retry logic.
3. Actionable diagnostic guidance when stream-drop retries exhaust.
After streaming retries are exhausted on transient errors, fall back to
non-streaming instead of propagating the error. Also fall back for any
other pre-delivery stream error (not just 'streaming not supported').
Added user-facing message when streaming is not supported by a model/
provider, directing users to set display.streaming: false in config.yaml
to avoid the fallback delay.
Cherry-picked from PR #3008 by kshitijk4poor. Added UX message for
streaming-not-supported detection.
Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
Previously the fallback only triggered on specific error keywords like
'streaming is not supported'. Many third-party providers have partial
or broken streaming — rejecting stream=True, crashing on stream_options,
dropping connections mid-stream, returning malformed chunks, etc.
Now: any exception during the streaming API call triggers an automatic
fallback to the standard non-streaming request path. The error is logged
at INFO level for diagnostics but never surfaces to the user. If the
fallback also fails, THAT error propagates normally.
This ensures streaming is additive — it improves UX when it works but
never breaks providers that don't support it.
Tests: 2 new (any-error fallback, double-failure propagation), 15 total.