Route all remaining ad-hoc auxiliary LLM call sites through
resolve_provider_client() so auth, headers, and API format (Chat
Completions vs Responses API) are handled consistently in one place.
Files changed:
- tools/openrouter_client.py: Replace manual AsyncOpenAI construction
with resolve_provider_client('openrouter', async_mode=True). The
shared client module now delegates entirely to the router.
- tools/skills_guard.py: Replace inline OpenAI client construction
(hardcoded OpenRouter base_url, manual api_key lookup, manual
headers) with resolve_provider_client('openrouter'). Remove unused
OPENROUTER_BASE_URL import.
- trajectory_compressor.py: Add _detect_provider() to map config
base_url to a provider name, then route through
resolve_provider_client. Falls back to raw construction for
unrecognized custom endpoints.
- mini_swe_runner.py: Route default case (no explicit api_key/base_url)
through resolve_provider_client('openrouter') with auto-detection
fallback. Preserves direct construction when explicit creds are
passed via CLI args.
- agent/auxiliary_client.py: Fix stale module docstring — vision auto
mode now correctly documents that Codex and custom endpoints are
tried (not skipped).
Three interconnected fixes for auxiliary client infrastructure:
1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
Add resolve_provider_client(provider, model, async_mode) — a single
entry point for creating properly configured clients. Given a provider
name and optional model, it handles auth lookup (env vars, OAuth
tokens, auth.json), base URL resolution, provider-specific headers,
and API format differences (Chat Completions vs Responses API for
Codex). All auxiliary consumers should route through this instead of
ad-hoc env var lookups.
Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
and get_vision_auxiliary_client to use the router internally.
2. FIX CODEX VISION BYPASS (vision_tools.py)
vision_tools.py was constructing a raw AsyncOpenAI client from the
sync vision client's api_key/base_url, completely bypassing the Codex
Responses API adapter. When the vision provider resolved to Codex,
the raw client would hit chatgpt.com/backend-api/codex with
chat.completions.create() which only supports the Responses API.
Fix: Added get_async_vision_auxiliary_client() which properly wraps
Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
instead of manual client construction.
3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
- context_compressor.py: Removed _get_fallback_client() which blindly
looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
API-key providers, users without OPENAI_BASE_URL set). Replaced
with fallback loop through resolve_provider_client() for each
known provider, with same-provider dedup.
- vision_tools.py: Added error detection for vision capability
failures. Returns clear message to the model when the configured
model doesn't support vision, instead of a generic error.
Addresses #886
When hermes-agent runs as a systemd service, Docker container, or
headless daemon, the stdout pipe can become unavailable (idle timeout,
buffer exhaustion, socket reset). Any print() call then raises
OSError: [Errno 5] Input/output error, crashing run_conversation()
and causing cron jobs to fail.
Rather than wrapping individual print() calls (68 in run_conversation
alone), this adds a transparent _SafeWriter wrapper installed once at
the start of run_conversation(). It delegates all writes to the real
stdout and silently catches OSError. Zero overhead on the happy path,
comprehensive coverage of all print calls including future ones.
Fixes#845
Co-authored-by: J0hnLawMississippi <J0hnLawMississippi@users.noreply.github.com>
Previously the early return for unconfigured vision model was silent.
Now logs an error so the failure is visible in logs for debugging.
Inspired by PR #839 by aydnOktay.
Co-authored-by: aydnOktay <aydnOktay@users.noreply.github.com>
MiniMax APIs (global and China) don't support /v1/models, causing
hermes doctor to always show HTTP 404 even with valid API keys.
Skip the HTTP check for these providers and show '(key configured)'
when the API key is present.
Cherry-picked from PR #822 by Bartok9, rebased onto current main.
Fixes#811
Co-authored-by: Bartok9 <259807879+Bartok9@users.noreply.github.com>
Two fixes for context overflow handling:
1. Proactive compression after tool execution: The compression check now
estimates the next prompt size using real token counts from the last API
response (prompt_tokens + completion_tokens) plus a conservative estimate
of newly appended tool results (chars // 3 for JSON-heavy content).
Previously, should_compress() only checked last_prompt_tokens which
didn't account for tool results — so a 130k prompt + 100k chars of tool
output would pass the 140k threshold check but fail the 200k API limit.
2. Safety net: Added 'prompt is too long' to context-length error detection
phrases. Anthropic returns 'prompt is too long: N tokens > M maximum'
on HTTP 400, which wasn't matched by existing phrases. This ensures
compression fires even if the proactive check underestimates.
Fixes#813
The old flow blindly asked for an OpenRouter API key after ANY non-OR
provider selection, even for Nous Portal and Codex which already
support vision natively. This was confusing and annoying.
New behavior:
- OpenRouter: skip — vision uses Gemini via their OR key
- Nous Portal OAuth: skip — vision uses Gemini via Nous
- OpenAI Codex: skip — gpt-5.3-codex supports vision
- Custom endpoint (api.openai.com): show OpenAI vision model picker
(gpt-4o, gpt-4o-mini, gpt-4.1, etc.), saves AUXILIARY_VISION_MODEL
- Custom (other) / z.ai / kimi / minimax / nous-api:
- First checks if existing OR/Nous creds already cover vision
- If not, offers friendly choice: OpenRouter / OpenAI / Skip
- No more 'enter OpenRouter key' thrown in your face
Also fixes the setup summary to check actual vision availability
across all providers instead of hardcoding 'requires OPENROUTER_API_KEY'.
MoA still correctly requires OpenRouter (calls multiple frontier models).
Show users the specific commands for each config area (hermes model,
hermes tools, hermes config set, hermes gateway setup) and then
present 'hermes setup' as the option to configure everything at once.
The installer already handles full setup (provider config, etc.), so
telling users to run 'hermes setup' post-install is redundant and
confusing. Updated all docs to reflect the correct flow:
1. Run the installer (handles everything including provider setup)
2. Use 'hermes model', 'hermes tools', 'hermes gateway setup' to
reconfigure individual settings later
Files updated:
- README.md: removed setup from quick install & getting started
- installation.md: updated post-install, manual step 9, troubleshooting
- quickstart.md: updated provider section & quick reference table
- cli-commands.md: updated hermes setup description
- faq.md: replaced hermes setup references with specific commands
- max_retries reduced from 6 to 3 — 6 retries with exponential backoff
could stall for ~275s total on persistent errors
- ValueError and TypeError now detected as non-retryable client errors
and abort immediately instead of being retried with backoff (these are
local validation/programming errors that will never succeed on retry)
_preflight_codex_api_kwargs rejected these three fields as unsupported,
but _build_api_kwargs adds them to every codex request. This caused a
ValueError before _interruptible_api_call was reached, which was caught
by the retry loop and retried with exponential backoff — appearing as
an infinite hang in tests (275s total backoff across 6 retries).
The fix adds these keys to allowed_keys and passes them through to the
normalized request dict.
This fixes the hanging test_cron_run_job_codex_path_handles_internal_401_refresh
test (now passes in 2.6s instead of timing out).
- test_agent_loop_tool_calling.py: import atroposlib at module level
to trigger skip (environments.agent_loop is now importable without
atroposlib due to __init__.py graceful fallback)
- test_modal_sandbox_fixes.py: skip TestToolResolution tests when
minisweagent not installed
- environments/__init__.py: try/except on atroposlib imports so
submodules like tool_call_parsers remain importable standalone
- test_agent_loop.py, test_tool_call_parsers.py,
test_managed_server_tool_support.py: skip at module level when
atroposlib is missing
Guard all test files that import from environments/ or atroposlib
with try/except + pytest.skip(allow_module_level=True) so they
gracefully skip instead of crashing when deps aren't available.
TBLite eval was bypassing ManagedServer and calling ServerManager
directly, which uses /v1/chat/completions — not available on the
atropos vllm_api_server (/generate only).
Now uses _use_managed_server() to detect vLLM/SGLang backends and
route through ManagedServer (Phase 2) with proper tool_parser and
/generate endpoint. Falls back to Phase 1 for OpenAI endpoints.
Also adds local_vllm.yaml config for running against a local vLLM
server with Docker sandboxes.
vLLM's ToolCallTranslator returns tool_calls as dicts, while
OpenAI API returns them as objects with .id, .function.name etc.
Normalize both formats in the agent loop.
Tests hit a real vLLM server (Qwen/Qwen3-4B-Thinking-2507) via
ManagedServer Phase 2. Auto-skip if server isn't running.
Tests verify:
- Single tool call through full agent loop
- Multi-tool calls across turns
- ManagedServer produces SequenceNodes with tokens/logprobs
- Direct response without tools
- Thinking model produces <think> blocks
Also adds fallback parser in agent_loop.py: when ManagedServer's
ToolCallTranslator can't parse (vLLM not installed), hermes-agent's
standalone parsers extract <tool_call> tags from raw content.
Migrate from old tool_call_parser (instance) to new ToolCallTranslator
pattern from atropos add-openai-endpoint-for-managed-server branch:
- Set tool_parser on ServerManager (string name, e.g. 'hermes')
- Use managed_server(tokenizer=..., preserve_think_blocks=...)
instead of managed_server(tokenizer=..., tool_call_parser=instance)
- ManagedServer now handles tool call translation internally via
ToolCallTranslator (bidirectional raw text <-> OpenAI tool_calls)
- Remove old parser loading code (get_parser/KeyError fallback)
The hermes-agent tool_call_parsers/ directory is preserved as a
standalone fallback for environments that don't use vLLM's parsers.
Real LLM calls via OpenRouter using stepfun/step-3.5-flash:free (zero cost).
Falls back to paid models if free model is unavailable.
Tests: single tool call, multi-tool single turn, multi-turn chains,
unknown tool rejection, max_turns limit, direct response (no tools),
tool error handling, AgentResult structure, conversation history.
- Add eval_concurrency config field with asyncio.Semaphore
- Add local.yaml config using Docker backend (sandboxed, no cloud costs)
- Register docker_image alongside modal_image for backend flexibility
- Default: 8 parallel tasks for local runs
Fixes discovered while running TBLite baseline evaluation:
1. ephemeral_disk param not supported in modal 1.3.5 - check before passing
2. Modal legacy image builder requires working pip - add ensurepip fix via
setup_dockerfile_commands to handle task images with broken pip
3. Host cwd leaked into Modal sandbox - add /home/ to host prefix check
4. Tilde ~ not expanded by subprocess.run(cwd=) in sandboxes - use /root
5. install_pipx must stay True for swerex-remote to be available
Dependencies also needed (not in this commit):
- git submodule update --init mini-swe-agent
- uv pip install swe-rex boto3
Adds pytest-xdist to dev dependencies and -n auto to default pytest addopts
for parallel test execution across CPU cores.
Authored by OutThisLife.
Co-authored-by: OutThisLife <OutThisLife@users.noreply.github.com>
Post-merge fixes for the email gateway (PR #797):
1. Add Platform.EMAIL to all 4 platform-to-toolset/config mapping
dicts in gateway/run.py. Without this, email sessions silently
fell back to the Telegram toolset because these dicts were added
after the PR branched off main.
2. Add email (and signal) to hermes_cli/tools_config.py and
hermes_cli/skills_config.py PLATFORMS dicts so they appear in
'hermes tools' and 'hermes skills' CLI commands.
3. Add full email setup documentation:
- website/docs/user-guide/messaging/email.md — setup guide with
Gmail/Outlook instructions, configuration, troubleshooting,
security advice, and env var reference
- Update messaging/index.md — add email to architecture diagram,
platform toolset table, security examples, and next steps
The weaker assertion (r.exc_info is not None) passes even when
exc_info is (None, None, None). Check r.exc_info[0] is not None
to verify actual exception info is present.
The _aux_async_client mock was already applied on main.
Co-authored-by: OutThisLife <nickolasgustafsson@gmail.com>
Allow users to interact with Hermes by sending and receiving emails.
Uses IMAP polling for incoming messages and SMTP for replies with
proper threading (In-Reply-To, References headers).
Integrates with all 14 gateway extension points: config, adapter
factory, authorization, send_message tool, cron delivery, toolsets,
prompt hints, channel directory, setup wizard, status display, and
env example.
65 tests covering config, parsing, dispatch, threading, IMAP fetch,
SMTP send, attachments, and all integration points.
Adds delegation.model and delegation.provider config fields so subagents
can run on a completely different provider:model pair than the parent agent.
When delegation.provider is set, the system resolves the full credential
bundle (base_url, api_key, api_mode) via resolve_runtime_provider() —
the same path used by CLI/gateway startup. This means all configured
providers work out of the box: openrouter, nous, zai, kimi-coding,
minimax, minimax-cn.
Key design decisions:
- Provider resolution uses hermes_cli.runtime_provider (single source of
truth for credential resolution across CLI, gateway, cron, and now
delegation)
- When only delegation.model is set (no provider), the model name changes
but parent credentials are inherited (for switching models within the
same provider like OpenRouter)
- When delegation.provider is set, full credentials are resolved
independently — enabling cross-provider delegation (e.g. parent on
Nous Portal, subagents on OpenRouter)
- Clear error messages if provider resolution fails (missing API key,
unknown provider name)
- _load_config() now falls back to hermes_cli.config.load_config() for
gateway/cron contexts where CLI_CONFIG is unavailable
Based on PR #791 by 0xbyt4 (closes#609), reworked to use proper
provider credential resolution instead of passing provider as metadata.
Co-authored-by: 0xbyt4 <0xbyt4@users.noreply.github.com>
Combined implementation of reasoning management:
- /reasoning Show current effort level and display state
- /reasoning <level> Set reasoning effort (none, low, medium, high, xhigh)
- /reasoning show|on Show model thinking/reasoning in output
- /reasoning hide|off Hide model thinking/reasoning from output
Effort level changes persist to config and force agent re-init.
Display toggle updates the agent callback dynamically without re-init.
When display is enabled:
- Intermediate reasoning shown as dim [thinking] lines during tool loops
- Final reasoning shown in a bordered box above the response
- Long reasoning collapsed (5 lines intermediate, 10 lines final)
Also adds:
- reasoning_callback parameter to AIAgent
- last_reasoning in run_conversation result dict
- show_reasoning config option (display section, default: false)
- Display section in /config output
- 34 tests covering both features
Combines functionality from PR #789 and PR #790.
Co-authored-by: Aum Desai <Aum08Desai@users.noreply.github.com>
Co-authored-by: 0xbyt4 <35742124+0xbyt4@users.noreply.github.com>
Adds defensive guard against empty/None/missing choices in SamplingHandler.__call__
before accessing response.choices[0]. Returns proper ErrorData instead of crashing
with IndexError/TypeError on content filtering, provider errors, or rate limits.
Authored by 0xbyt4.
Co-authored-by: 0xbyt4 <0xbyt4@users.noreply.github.com>
Replaces all ad-hoc print() calls in the Slack gateway adapter with
proper logging.getLogger(__name__) calls, matching the pattern already
used by every other platform adapter (telegram, discord, whatsapp,
signal, homeassistant).
Changes:
- Add import logging + module-level logger
- Use logger.error for failures, logger.warning for non-critical
fallbacks, logger.info for status, logger.debug for routine ops
- Add exc_info=True for full stack traces on all error/warning paths
- Use %s format strings (lazy evaluation) instead of f-strings
- Wrap disconnect() in try/except for safety
- Add structured context (file paths, channel IDs, URLs) to log messages
- Convert document handling prints added after the original PR
Cherry-picked from PR #778 by aydnOktay, rebased onto current main
with conflict resolution and extended to cover document/video methods
added since the PR was created.
Co-authored-by: aydnOktay <xaydinoktay@gmail.com>
Stray print() in write_file_tool exception handler leaked debug output
to stdout. Replaced with logger.error() which is already set up in
the file.
Authored by memosr.
Co-authored-by: memosr <memosr@users.noreply.github.com>
Adds tool_choice=auto, parallel_tool_calls=true, and prompt_cache_key
to Codex Responses API requests, matching the official Codex CLI.
Root cause fix for #747 (agent claiming no shell access).
Adds tool_choice, parallel_tool_calls, and prompt_cache_key to the
Codex Responses API request kwargs — matching what the official Codex
CLI sends.
- tool_choice: 'auto' — enables the model to proactively call tools.
Without this, the model may default to not using tools, which explains
reports of the agent claiming it lacks shell access (#747).
- parallel_tool_calls: True — allows the model to issue multiple tool
calls in a single turn for efficiency.
- prompt_cache_key: session_id — enables server-side prompt caching
across turns in the same session, reducing latency and cost.
Refs #747
The MEDIA routing in _process_message_background passes
metadata=_thread_metadata to send_video, send_document, and
send_image_file — but none accepted it, causing TypeError silently
caught by the except handler. Files just failed to send.
Fix: add **kwargs to all four base class media methods and their
Telegram overrides.
Isolate Telegram forum topic sessions — each topic gets its own independent session key, history, and interrupt tracking. Progress, hygiene, and cron messages all route to the correct topic.
Four cleanups to code merged today:
1. New hermes_cli/curses_ui.py — shared curses_checklist() used by both
hermes tools and hermes skills. Eliminates ~140 lines of near-identical
curses code (scrolling, key handling, color setup, numbered fallback).
2. Fix _find_all_skills() perf — was calling load_config() per skill
(~100+ YAML parses). Now loads disabled set once via
_get_disabled_skill_names() and does a set lookup.
3. Eliminate _list_all_skills_unfiltered() duplication — _find_all_skills()
now accepts skip_disabled=True for the config UI, removing 30 lines
of copy-pasted discovery logic from skills_config.py.
4. Fix fragile label round-trip in skills_command — was building label
strings, passing to checklist, then mapping labels back to skill names
(collision-prone). Now works with indices directly, like tools_config.
When wl-paste produces empty output, the destination file was left as
a 0-byte orphan. Added dest.unlink() before returning False, matching
the existing cleanup pattern in the exception handler.
Authored by 0xbyt4.
Co-authored-by: 0xbyt4 <0xbyt4@users.noreply.github.com>
Authored by teyrebaz33. Closes#643.
- /personality none/default/neutral clears system prompt overlay
- Dict format personalities with description, tone, style fields
- Works in both CLI and gateway
- 18 tests