hermes-agent/tests/run_agent/test_anthropic_truncation_continuation.py


fix(anthropic): complete third-party Anthropic-compatible provider support (#12846)

Third-party gateways that speak the native Anthropic protocol (MiniMax, Zhipu GLM, Alibaba DashScope, Kimi, LiteLLM proxies) now work end-to-end with the same feature set as direct api.anthropic.com callers. Synthesizes eight stale community PRs into one consolidated change.

Six fixes:

- URL detection: consolidate three inline `endswith("/anthropic")` checks in runtime_provider.py into the shared _detect_api_mode_for_url helper. Third-party /anthropic endpoints now auto-resolve to api_mode=anthropic_messages via one code path instead of three.
- OAuth leak-guard: all five sites that assign `_is_anthropic_oauth` (__init__, switch_model, _try_refresh_anthropic_client_credentials, _swap_credential, _try_activate_fallback) now gate on `provider == "anthropic"`, so a stale ANTHROPIC_TOKEN never trips Claude-Code identity injection on third-party endpoints. Previously only 2 of 5 sites were guarded.
- Prompt caching: new method `_anthropic_prompt_cache_policy()` returns `(should_cache, use_native_layout)` per endpoint. Replaces three inline conditions and the `native_anthropic=(api_mode=='anthropic_messages')` call-site flag. Native Anthropic and third-party Anthropic gateways both get the native cache_control layout; OpenRouter gets the envelope layout. The layout is persisted in `_primary_runtime` so fallback restoration preserves the per-endpoint choice.
- Auxiliary client: `_try_custom_endpoint` honors `api_mode=anthropic_messages` and builds `AnthropicAuxiliaryClient` instead of silently downgrading to an OpenAI-wire client. Degrades gracefully to OpenAI-wire when the anthropic SDK isn't installed.
- Config hygiene: `_update_config_for_provider` (hermes_cli/auth.py) clears stale `api_key`/`api_mode` when switching to a built-in provider, so a previous MiniMax custom endpoint's credentials can't leak into a later OpenRouter session.
- Truncation continuation: length-continuation and tool-call-truncation retry now cover `anthropic_messages` in addition to `chat_completions` and `bedrock_converse`. Reuses the existing `_build_assistant_message` path via `normalize_anthropic_response()` so the interim message shape is byte-identical to the non-truncated path.

Tests: 6 new files, 42 test cases. Targeted run + tests/run_agent, tests/agent, tests/hermes_cli all pass (4554 passed).

Synthesized from (credits preserved via Co-authored-by trailers):

- #7410 @nocoo — URL detection helper
- #7393 @keyuyuan — OAuth 5-site guard
- #7367 @n-WN — OAuth guard (narrower cousin, kept comment)
- #8636 @sgaofen — caching helper + native-vs-proxy layout split
- #10954 @Only-Code-A — caching on anthropic_messages + Claude
- #7648 @zhongyueming1121 — aux client anthropic_messages branch
- #6096 @hansnow — /model switch clears stale api_mode
- #9691 @TroyMitchell911 — anthropic_messages truncation continuation

Closes: #7366, #8294 (third-party Anthropic identity + caching).
Supersedes: #7410, #7367, #7393, #8636, #10954, #7648, #6096, #9691.
Rejects: #9621 (OpenAI-wire caching with incomplete blocklist — risky), #7242 (superseded by #9691, stale branch), #8321 (targets smart_model_routing which was removed in #12732).

Co-authored-by: nocoo <nocoo@users.noreply.github.com>
Co-authored-by: Keyu Yuan <leoyuan0099@gmail.com>
Co-authored-by: Zoee <30841158+n-WN@users.noreply.github.com>
Co-authored-by: sgaofen <135070653+sgaofen@users.noreply.github.com>
Co-authored-by: Only-Code-A <bxzt2006@163.com>
Co-authored-by: zhongyueming <mygamez@163.com>
Co-authored-by: Xiaohan Li <hansnow@users.noreply.github.com>
Co-authored-by: Troy Mitchell <i@troy-y.org>
2026-04-19 22:43:09 -07:00
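The URL-detection consolidation described in the commit reduces to a path-suffix check. A minimal sketch, assuming a standalone helper (the real `_detect_api_mode_for_url` in runtime_provider.py may differ in signature and defaults; the gateway URLs below are made up for illustration):

```python
from urllib.parse import urlparse


def detect_api_mode_for_url(base_url: str) -> str:
    """Sketch of the consolidated check: an endpoint whose path ends in
    "/anthropic" speaks the native Anthropic Messages protocol; anything
    else is assumed to default to the OpenAI-compatible wire format."""
    # Normalize a trailing slash so ".../anthropic/" matches too.
    path = urlparse(base_url).path.rstrip("/")
    return "anthropic_messages" if path.endswith("/anthropic") else "chat_completions"
```

Routing through one helper instead of three inline checks means a new gateway shape only has to be taught to the code base once.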
"""Regression test for anthropic_messages truncation continuation.
When an Anthropic response hits ``stop_reason: max_tokens`` (mapped to
``finish_reason == 'length'`` in run_agent), the agent must retry with
a continuation prompt the same behavior it has always had for
chat_completions and bedrock_converse. Before this PR, the
``if self.api_mode in ('chat_completions', 'bedrock_converse'):`` guard
silently dropped Anthropic-wire truncations on the floor, returning a
half-finished response with no retry.
We don't exercise the full agent loop here (it's 3000 lines of inference,
streaming, plugin hooks, etc.) instead we verify the normalization
adapter produces exactly the shape the continuation block now consumes.
"""
from __future__ import annotations

from types import SimpleNamespace

import pytest


def _make_anthropic_text_block(text: str) -> SimpleNamespace:
    return SimpleNamespace(type="text", text=text)


def _make_anthropic_tool_use_block(name: str = "my_tool") -> SimpleNamespace:
    return SimpleNamespace(
        type="tool_use",
        id="toolu_01",
        name=name,
        input={"foo": "bar"},
    )


def _make_anthropic_response(blocks, stop_reason: str = "max_tokens"):
    return SimpleNamespace(
        id="msg_01",
        type="message",
        role="assistant",
        model="claude-sonnet-4-6",
        content=blocks,
        stop_reason=stop_reason,
        stop_sequence=None,
        usage=SimpleNamespace(input_tokens=100, output_tokens=200),
    )
class TestTruncatedAnthropicResponseNormalization:
    """normalize_anthropic_response() gives us the shape _build_assistant_message expects."""

    def test_text_only_truncation_produces_text_content_no_tool_calls(self):
        """Pure-text Anthropic truncation → continuation path should fire."""
        from agent.anthropic_adapter import normalize_anthropic_response

        response = _make_anthropic_response(
            [_make_anthropic_text_block("partial response that was cut off")]
        )
        msg, finish = normalize_anthropic_response(response)

        # The continuation block checks these two attributes:
        #   assistant_message.content    → appended to truncated_response_prefix
        #   assistant_message.tool_calls → guards the text-retry branch
        assert msg.content is not None
        assert "partial response" in msg.content
        assert not msg.tool_calls, (
            "Pure-text truncation must have no tool_calls so the text-continuation "
            "branch (not the tool-retry branch) fires"
        )
        assert finish == "length", "max_tokens stop_reason must map to OpenAI-style 'length'"

    def test_truncated_tool_call_produces_tool_calls(self):
        """Tool-use truncation → tool-call retry path should fire."""
        from agent.anthropic_adapter import normalize_anthropic_response

        response = _make_anthropic_response(
            [
                _make_anthropic_text_block("thinking..."),
                _make_anthropic_tool_use_block(),
            ]
        )
        msg, finish = normalize_anthropic_response(response)

        assert bool(msg.tool_calls), (
            "Truncation mid-tool_use must expose tool_calls so the "
            "tool-call retry branch fires instead of text continuation"
        )
        assert finish == "length"

    def test_empty_content_does_not_crash(self):
        """Empty response.content — defensive: treat as a truncation with no text."""
        from agent.anthropic_adapter import normalize_anthropic_response

        response = _make_anthropic_response([])
        msg, finish = normalize_anthropic_response(response)

        # Depending on the adapter, content may be "" or None — both are
        # acceptable; what matters is no exception.
        assert msg is not None
        assert not msg.tool_calls
class TestContinuationLogicBranching:
    """Symbolic check that the api_mode gate now includes anthropic_messages."""

    @pytest.mark.parametrize(
        "api_mode", ["chat_completions", "bedrock_converse", "anthropic_messages"]
    )
    def test_all_three_api_modes_hit_continuation_branch(self, api_mode):
        # The guard in run_agent.py is:
        #   if self.api_mode in ("chat_completions", "bedrock_converse", "anthropic_messages"):
        assert api_mode in ("chat_completions", "bedrock_converse", "anthropic_messages")

    def test_codex_responses_still_excluded(self):
        # codex_responses has its own truncation path (not continuation-based)
        # and should NOT be routed through the shared block.
        assert "codex_responses" not in ("chat_completions", "bedrock_converse", "anthropic_messages")