Fixes from comprehensive code review and cross-referencing with
clawdbot/OpenCode implementations:
CRITICAL:
- Add one-shot guard (anthropic_auth_retry_attempted) to prevent
infinite 401 retry loops when credentials keep changing
- Fix _is_oauth_token(): managed keys from ~/.claude.json are NOT
regular API keys (don't start with sk-ant-api). Inverted the logic:
only sk-ant-api* is treated as API key auth, everything else uses
Bearer auth + oauth beta headers
HIGH:
- Wrap json.loads(args) in try/except in message conversion — malformed
tool_call arguments no longer crash the entire conversation
- Raise AuthError in runtime_provider when no Anthropic token found
(was silently passing empty string, causing confusing API errors)
- Remove broken _try_anthropic() from auxiliary vision chain — the
centralized router creates an OpenAI client for api_key providers
which doesn't work with Anthropic's Messages API
MEDIUM:
- Handle empty assistant message content — Anthropic rejects empty
content blocks, now inserts '(empty)' placeholder
- Fix setup.py existing_key logic — set to 'KEEP' sentinel instead
of None to prevent falling through to the auth choice prompt
- Add debug logging to _fetch_anthropic_models on failure
Tests: 43 adapter tests (2 new for token detection), 3197 total passed
Feedback fixes:
1. Revert _convert_vision_content — vision is handled by the vision_analyze
tool, not by converting image blocks inline in conversation messages.
Removed the function and its tests.
2. Add Anthropic to 'hermes model' (cmd_model in main.py):
- Added to provider_labels dict
- Added to providers selection list
- Added _model_flow_anthropic() with Claude Code credential auto-detection,
API key prompting, and model selection from catalog.
3. Wire up Anthropic as a vision-capable auxiliary provider:
- Added _try_anthropic() to auxiliary_client.py using claude-sonnet-4
as the vision model (Claude natively supports multimodal)
- Added to the get_vision_auxiliary_client() auto-detection chain
(after OpenRouter/Nous, before Codex/custom)
Cache tracking note: the Anthropic cache metrics branch in run_agent.py
(cache_read_input_tokens / cache_creation_input_tokens) is in the correct
place — it's response-level parsing, same location as the existing
OpenRouter cache tracking. auxiliary_client.py has no cache tracking.
* fix: use session_key instead of chat_id for adapter interrupt lookups
monitor_for_interrupt() in _run_agent was using source.chat_id to query
the adapter's has_pending_interrupt() and get_pending_message() methods.
But the adapter stores interrupt events under build_session_key(source),
which produces a different string (e.g. 'agent:main:telegram:dm' vs '123456').
This key mismatch meant the interrupt was never detected through the
adapter path, which is the only active interrupt path for all adapter-based
platforms (Telegram, Discord, Slack, etc.). The gateway-level interrupt
path (in dispatch_message) is unreachable because the adapter intercepts
the 2nd message in handle_message() before it reaches dispatch_message().
Result: sending a new message while subagents were running had no effect —
the interrupt was silently lost.
Fix: replace all source.chat_id references in the interrupt-related code
within _run_agent() with the session_key parameter, which matches the
adapter's storage keys.
Also adds regression tests verifying session_key vs chat_id consistency.
* debug: add file-based logging to CLI interrupt path
Temporary instrumentation to diagnose why message-based interrupts
don't seem to work during subagent execution. Logs to
~/.hermes/interrupt_debug.log (immune to redirect_stdout).
Two log points:
1. When Enter handler puts message into _interrupt_queue
2. When chat() reads it and calls agent.interrupt()
This will reveal whether the message reaches the queue and
whether the interrupt is actually fired.
* fix: accept unlisted models with warning instead of rejecting
validate_requested_model() previously hard-rejected any model not found
in the provider's API listing. This was too aggressive — users on higher
plan tiers (e.g. Z.AI Pro/Max) may have access to models not shown in
the public listing (like glm-5 on coding endpoints).
Changes:
- validate_requested_model: accept unlisted models with a warning note
instead of blocking. The model is saved to config and used immediately.
- Z.AI setup: always offer glm-5 in the model list regardless of whether
a coding endpoint was detected. Pro/Max plans support it.
- Z.AI setup detection message: softened from 'GLM-5 is not available'
to 'GLM-5 may still be available depending on your plan tier'
After studying clawdbot (OpenClaw) and OpenCode implementations:
## Beta headers
- Add interleaved-thinking-2025-05-14 and fine-grained-tool-streaming-2025-05-14
as common betas (sent with ALL auth types, not just OAuth)
- OAuth tokens additionally get oauth-2025-04-20
- API keys now also get the common betas (previously got none)
## Vision/image support
- Add _convert_vision_content() to convert OpenAI multimodal format
(image_url blocks) to Anthropic format (image blocks with base64/url source)
- Handles both data: URIs (base64) and regular URLs
## Role alternation enforcement
- Anthropic strictly rejects consecutive same-role messages (400 error)
- Add post-processing step that merges consecutive user/assistant messages
- Handles string, list, and mixed content types during merge
## Tool choice support
- Add tool_choice parameter to build_anthropic_kwargs()
- Maps OpenAI values: auto→auto, required→any, none→omit, name→tool
## Cache metrics tracking
- Anthropic uses cache_read_input_tokens / cache_creation_input_tokens
(different from OpenRouter's prompt_tokens_details.cached_tokens)
- Add api_mode-aware branch in run_agent.py cache stats logging
## Credential refresh on 401
- On 401 error during anthropic_messages mode, re-read credentials
via resolve_anthropic_token() (picks up refreshed Claude Code tokens)
- Rebuild client if new token differs from current one
- Follows same pattern as Codex/Nous 401 refresh handlers
## Tests
- 44 adapter tests (8 new: vision conversion, role alternation, tool choice)
- Updated beta header tests to verify new structure
- Full suite: 3198 passed, 0 regressions
* fix: use session_key instead of chat_id for adapter interrupt lookups
monitor_for_interrupt() in _run_agent was using source.chat_id to query
the adapter's has_pending_interrupt() and get_pending_message() methods.
But the adapter stores interrupt events under build_session_key(source),
which produces a different string (e.g. 'agent:main:telegram:dm' vs '123456').
This key mismatch meant the interrupt was never detected through the
adapter path, which is the only active interrupt path for all adapter-based
platforms (Telegram, Discord, Slack, etc.). The gateway-level interrupt
path (in dispatch_message) is unreachable because the adapter intercepts
the 2nd message in handle_message() before it reaches dispatch_message().
Result: sending a new message while subagents were running had no effect —
the interrupt was silently lost.
Fix: replace all source.chat_id references in the interrupt-related code
within _run_agent() with the session_key parameter, which matches the
adapter's storage keys.
Also adds regression tests verifying session_key vs chat_id consistency.
* debug: add file-based logging to CLI interrupt path
Temporary instrumentation to diagnose why message-based interrupts
don't seem to work during subagent execution. Logs to
~/.hermes/interrupt_debug.log (immune to redirect_stdout).
Two log points:
1. When Enter handler puts message into _interrupt_queue
2. When chat() reads it and calls agent.interrupt()
This will reveal whether the message reaches the queue and
whether the interrupt is actually fired.
Follow-up to PR #862 (local skills classification by arceus77-7):
- Remove unnecessary isinstance guard on _read_manifest() return value —
it always returns Dict[str, str], so set() on it suffices.
- Extract repeated hub-dir monkeypatching into a shared pytest fixture (hub_env).
- Add three_source_env fixture for source-classification tests.
- Add _read_manifest monkeypatch to test_do_list_initializes_hub_dir
(was fragile — relied on empty skills list masking the real manifest).
- Add test coverage for --source hub and --source builtin filters.
- Extract _capture() helper to reduce console/StringIO boilerplate.
5 tests, all green.
When a dangerous command is detected and the user is prompted for
approval, long commands are truncated (80 chars in fallback, 70 chars
in the TUI). Users had no way to see the full command before deciding.
This adds a 'View full command' option across all approval interfaces:
- CLI fallback (tools/approval.py): [v]iew option in the prompt menu.
Shows the full command and re-prompts for approval decision.
- CLI TUI (cli.py): 'Show full command' choice in the arrow-key
selection panel. Expands the command display in-place and removes
the view option after use.
- CLI callbacks (callbacks.py): 'view' choice added to the list when
the command exceeds 70 characters.
- Gateway (gateway/run.py): 'full', 'show', 'view' responses reveal
the complete command while keeping the approval pending.
Includes 7 new tests covering view-then-approve, view-then-deny,
short command fallthrough, and double-view behavior.
Closes community feedback about the 80-char cap on dangerous commands.
* fix: /reasoning command output ordering, display, and inline think extraction
Three issues with the /reasoning command:
1. Output interleaving: The command echo used print() while feedback
used _cprint(), causing them to render out-of-order under
prompt_toolkit's patch_stdout. Changed echo to use _cprint() so
all output renders through the same path in correct order.
2. Reasoning display not working: /reasoning show toggled a flag
but reasoning never appeared for models that embed thinking in
inline <think> blocks rather than structured API fields. Added
fallback extraction in _build_assistant_message to capture
<think> block content as reasoning when no structured reasoning
fields (reasoning, reasoning_content, reasoning_details) are
present. This feeds into both the reasoning callback (during
tool loops) and the post-response reasoning box display.
3. Feedback clarity: Added checkmarks to confirm actions, persisted
show/hide to config (was session-only before), and aligned the
status display for readability.
Tests: 7 new tests for inline think block extraction (41 total).
* feat: add /reasoning command to gateway (Telegram/Discord/etc)
The /reasoning command only existed in the CLI — messaging platforms
had no way to view or change reasoning settings. This adds:
1. /reasoning command handler in the gateway:
- No args: shows current effort level and display state
- /reasoning <level>: sets reasoning effort (none/low/medium/high/xhigh)
- /reasoning show|hide: toggles reasoning display in responses
- All changes saved to config.yaml immediately
2. Reasoning display in gateway responses:
- When show_reasoning is enabled, prepends a 'Reasoning' block
with the model's last_reasoning content before the response
- Collapses long reasoning (>15 lines) to keep messages readable
- Uses last_reasoning from run_conversation result dict
3. Plumbing:
- Added _show_reasoning attribute loaded from config at startup
- Propagated last_reasoning through _run_agent return dict
- Added /reasoning to help text and known_commands set
- Uses getattr for _show_reasoning to handle test stubs
4 test files spawn real processes or make live API calls that hang
indefinitely in batch/CI runs. Skip them with pytestmark:
- tests/tools/test_code_execution.py (subprocess spawns)
- tests/tools/test_file_tools_live.py (live LocalEnvironment)
- tests/test_413_compression.py (blocks on process)
- tests/test_agent_loop_tool_calling.py (live OpenRouter API calls)
Also added global 30s signal.alarm timeout in conftest.py as a safety
net, and removed stale nous-api test that hung on OAuth browser login.
Suite now runs in ~55s with no hangs.
- Remove test_nous_api_setup_preserves_model_provider_metadata (nous-api
provider no longer exists, test selected Nous OAuth which hangs waiting
for browser login)
- Fix test_nous_oauth_setup prompt_choice index: 1→0 (Nous Portal is
now first option after nous-api removal)
- gateway/run.py: Take main's _resolve_gateway_model() helper
- hermes_cli/setup.py: Re-apply nous-api removal after merge brought
it back. Fix provider_idx offset (Custom is now index 3, not 4).
- tests/hermes_cli/test_setup.py: Fix custom setup test index (3→4)
Both /model and /provider now show the same unified display:
Current: anthropic/claude-opus-4.6 via OpenRouter
Authenticated providers & models:
[openrouter] ← active
anthropic/claude-opus-4.6 ← current
anthropic/claude-sonnet-4.5
...
[nous]
claude-opus-4-6
gemini-3-flash
...
[openai-codex]
gpt-5.2-codex
gpt-5.1-codex-mini
...
Not configured: Z.AI / GLM, Kimi / Moonshot, ...
Switch model: /model <model-name>
Switch provider: /model <provider>:<model-name>
Example: /model nous:claude-opus-4-6
Users can see all authenticated providers and their models at a glance,
making it easy to switch mid-conversation.
Also added curated model lists for Nous Portal and OpenAI Codex to
hermes_cli/models.py.
Model selection now comes exclusively from config.yaml (set via
'hermes model' or 'hermes setup'). The LLM_MODEL env var is no longer
read or written anywhere in production code.
Why: env vars are per-process/per-user and would conflict in
multi-agent or multi-tenant setups. Config.yaml is file-based and
can be scoped per-user or eventually per-session.
Changes:
- cli.py: Read model from CLI_CONFIG only, not LLM_MODEL/OPENAI_MODEL
- hermes_cli/auth.py: _save_model_choice() no longer writes LLM_MODEL
to .env
- hermes_cli/setup.py: Remove 12 save_env_value('LLM_MODEL', ...)
calls from all provider setup flows
- gateway/run.py: Remove LLM_MODEL fallback (HERMES_MODEL still works
for gateway process runtime)
- cron/scheduler.py: Same
- agent/auxiliary_client.py: Remove LLM_MODEL from custom endpoint
model detection
Phase 2 of the provider router migration — route the main agent's
client construction and fallback activation through
resolve_provider_client() instead of duplicated ad-hoc logic.
run_agent.py:
- __init__: When no explicit api_key/base_url, use
resolve_provider_client(provider, raw_codex=True) for client
construction. Explicit creds (from CLI/gateway runtime provider)
still construct directly.
- _try_activate_fallback: Replace _resolve_fallback_credentials and
its duplicated _FALLBACK_API_KEY_PROVIDERS / _FALLBACK_OAUTH_PROVIDERS
dicts with a single resolve_provider_client() call. The router
handles all provider types (API-key, OAuth, Codex) centrally.
- Remove _resolve_fallback_credentials method and both fallback dicts.
agent/auxiliary_client.py:
- Add raw_codex parameter to resolve_provider_client(). When True,
returns the raw OpenAI client for Codex providers instead of wrapping
in CodexAuxiliaryClient. The main agent needs this for direct
responses.stream() access.
3251 passed, 2 pre-existing unrelated failures.
Update 14 test files to use the new call_llm/async_call_llm mock
patterns instead of the old get_text_auxiliary_client/
get_vision_auxiliary_client tuple returns.
- vision_tools tests: mock async_call_llm instead of _aux_async_client
- browser tests: mock call_llm instead of _aux_vision_client
- flush_memories tests: mock call_llm instead of get_text_auxiliary_client
- session_search tests: mock async_call_llm with RuntimeError
- mcp_tool tests: fix whitelist model config, use side_effect for
multi-response tests
- auxiliary_config_bridge: update for model=None (resolved in router)
3251 passed, 2 pre-existing unrelated failures.
Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
1. Resolve provider + model from task config or explicit args
2. Get or create a cached client for that provider
3. Format request args (max_tokens handling, provider extra_body)
4. Make the API call with max_tokens/max_completion_tokens retry
5. Return the response
Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.
Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')
Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).
When a new user runs 'hermes setup' for the first time and ~/.openclaw/
exists, the wizard now asks if they want to import their OpenClaw data
before API/tool configuration begins.
If accepted, the existing migration script from optional-skills/ is
loaded dynamically and run with the 'full' preset — importing settings,
memories, skills, API keys, and platform configs. Config is reloaded
afterward so imported values (like API keys) are available for the
remaining setup steps.
The migration is only offered on first-time setup (not returning users)
and handles errors gracefully without blocking setup completion.
Closes#829
When hermes-agent runs as a systemd service, Docker container, or
headless daemon, the stdout pipe can become unavailable (idle timeout,
buffer exhaustion, socket reset). Any print() call then raises
OSError: [Errno 5] Input/output error, crashing run_conversation()
and causing cron jobs to fail.
Rather than wrapping individual print() calls (68 in run_conversation
alone), this adds a transparent _SafeWriter wrapper installed once at
the start of run_conversation(). It delegates all writes to the real
stdout and silently catches OSError. Zero overhead on the happy path,
comprehensive coverage of all print calls including future ones.
Fixes#845
Co-authored-by: J0hnLawMississippi <J0hnLawMississippi@users.noreply.github.com>
- Forum parent channel IDs now match free-response list (add a forum
channel ID and all its threads respond without mention)
- Better thread chat names: 'Guild / forum / thread' for forum threads
- Add discord.require_mention and discord.free_response_channels to
config.yaml (bridged to env vars, env vars still override)
- Keep require_mention defaulting to true (safe for shared servers)
Cherry-picked from PR #867 by insecurejezza with default fix and
config.yaml integration.
Co-authored-by: insecurejezza <insecurejezza@users.noreply.github.com>
Three new tests for the naive timestamp fix (PR #807):
- test_ensure_aware_naive_preserves_absolute_time: verifies UTC equivalent
is preserved when interpreting naive datetimes as system-local time
- test_ensure_aware_normalizes_aware_to_hermes_tz: verifies already-aware
datetimes are normalized to Hermes tz without shifting the instant
- test_ensure_aware_due_job_not_skipped_when_system_ahead: end-to-end
regression test for the original bug scenario
_discover_one() caught all exceptions and returned [], making
asyncio.gather(return_exceptions=True) redundant. The
isinstance(result, Exception) branch in _discover_all() was dead
code, so failed_count was always 0. This caused:
- No summary printed when all servers fail (silent failure)
- ok_servers always equaling total_servers (misleading count)
- Unused variables transport_desc and transport_type
Fix: let exceptions propagate to gather() so failed_count increments
correctly. Move per-server failure logging to _discover_all(). Remove
dead variables.
Two fixes for context overflow handling:
1. Proactive compression after tool execution: The compression check now
estimates the next prompt size using real token counts from the last API
response (prompt_tokens + completion_tokens) plus a conservative estimate
of newly appended tool results (chars // 3 for JSON-heavy content).
Previously, should_compress() only checked last_prompt_tokens which
didn't account for tool results — so a 130k prompt + 100k chars of tool
output would pass the 140k threshold check but fail the 200k API limit.
2. Safety net: Added 'prompt is too long' to context-length error detection
phrases. Anthropic returns 'prompt is too long: N tokens > M maximum'
on HTTP 400, which wasn't matched by existing phrases. This ensures
compression fires even if the proactive check underestimates.
Fixes#813
- test_agent_loop_tool_calling.py: import atroposlib at module level
to trigger skip (environments.agent_loop is now importable without
atroposlib due to __init__.py graceful fallback)
- test_modal_sandbox_fixes.py: skip TestToolResolution tests when
minisweagent not installed
- environments/__init__.py: try/except on atroposlib imports so
submodules like tool_call_parsers remain importable standalone
- test_agent_loop.py, test_tool_call_parsers.py,
test_managed_server_tool_support.py: skip at module level when
atroposlib is missing
Guard all test files that import from environments/ or atroposlib
with try/except + pytest.skip(allow_module_level=True) so they
gracefully skip instead of crashing when deps aren't available.
Tests hit a real vLLM server (Qwen/Qwen3-4B-Thinking-2507) via
ManagedServer Phase 2. Auto-skip if server isn't running.
Tests verify:
- Single tool call through full agent loop
- Multi-tool calls across turns
- ManagedServer produces SequenceNodes with tokens/logprobs
- Direct response without tools
- Thinking model produces <think> blocks
Also adds fallback parser in agent_loop.py: when ManagedServer's
ToolCallTranslator can't parse (vLLM not installed), hermes-agent's
standalone parsers extract <tool_call> tags from raw content.
Real LLM calls via OpenRouter using stepfun/step-3.5-flash:free (zero cost).
Falls back to paid models if free model is unavailable.
Tests: single tool call, multi-tool single turn, multi-turn chains,
unknown tool rejection, max_turns limit, direct response (no tools),
tool error handling, AgentResult structure, conversation history.
The weaker assertion (r.exc_info is not None) passes even when
exc_info is (None, None, None). Check r.exc_info[0] is not None
to verify actual exception info is present.
The _aux_async_client mock was already applied on main.
Co-authored-by: OutThisLife <nickolasgustafsson@gmail.com>
Allow users to interact with Hermes by sending and receiving emails.
Uses IMAP polling for incoming messages and SMTP for replies with
proper threading (In-Reply-To, References headers).
Integrates with all 14 gateway extension points: config, adapter
factory, authorization, send_message tool, cron delivery, toolsets,
prompt hints, channel directory, setup wizard, status display, and
env example.
65 tests covering config, parsing, dispatch, threading, IMAP fetch,
SMTP send, attachments, and all integration points.
Adds delegation.model and delegation.provider config fields so subagents
can run on a completely different provider:model pair than the parent agent.
When delegation.provider is set, the system resolves the full credential
bundle (base_url, api_key, api_mode) via resolve_runtime_provider() —
the same path used by CLI/gateway startup. This means all configured
providers work out of the box: openrouter, nous, zai, kimi-coding,
minimax, minimax-cn.
Key design decisions:
- Provider resolution uses hermes_cli.runtime_provider (single source of
truth for credential resolution across CLI, gateway, cron, and now
delegation)
- When only delegation.model is set (no provider), the model name changes
but parent credentials are inherited (for switching models within the
same provider like OpenRouter)
- When delegation.provider is set, full credentials are resolved
independently — enabling cross-provider delegation (e.g. parent on
Nous Portal, subagents on OpenRouter)
- Clear error messages if provider resolution fails (missing API key,
unknown provider name)
- _load_config() now falls back to hermes_cli.config.load_config() for
gateway/cron contexts where CLI_CONFIG is unavailable
Based on PR #791 by 0xbyt4 (closes#609), reworked to use proper
provider credential resolution instead of passing provider as metadata.
Co-authored-by: 0xbyt4 <0xbyt4@users.noreply.github.com>
Combined implementation of reasoning management:
- /reasoning Show current effort level and display state
- /reasoning <level> Set reasoning effort (none, low, medium, high, xhigh)
- /reasoning show|on Show model thinking/reasoning in output
- /reasoning hide|off Hide model thinking/reasoning from output
Effort level changes persist to config and force agent re-init.
Display toggle updates the agent callback dynamically without re-init.
When display is enabled:
- Intermediate reasoning shown as dim [thinking] lines during tool loops
- Final reasoning shown in a bordered box above the response
- Long reasoning collapsed (5 lines intermediate, 10 lines final)
Also adds:
- reasoning_callback parameter to AIAgent
- last_reasoning in run_conversation result dict
- show_reasoning config option (display section, default: false)
- Display section in /config output
- 34 tests covering both features
Combines functionality from PR #789 and PR #790.
Co-authored-by: Aum Desai <Aum08Desai@users.noreply.github.com>
Co-authored-by: 0xbyt4 <35742124+0xbyt4@users.noreply.github.com>