Both /model and /provider now show the same unified display:
Current: anthropic/claude-opus-4.6 via OpenRouter
Authenticated providers & models:
[openrouter] ← active
anthropic/claude-opus-4.6 ← current
anthropic/claude-sonnet-4.5
...
[nous]
claude-opus-4-6
gemini-3-flash
...
[openai-codex]
gpt-5.2-codex
gpt-5.1-codex-mini
...
Not configured: Z.AI / GLM, Kimi / Moonshot, ...
Switch model: /model <model-name>
Switch provider: /model <provider>:<model-name>
Example: /model nous:claude-opus-4-6
Users can see all authenticated providers and their models at a glance,
making it easy to switch mid-conversation.
Also added curated model lists for Nous Portal and OpenAI Codex to
hermes_cli/models.py.
Model selection now comes exclusively from config.yaml (set via
'hermes model' or 'hermes setup'). The LLM_MODEL env var is no longer
read or written anywhere in production code.
Why: env vars are per-process/per-user and would conflict in
multi-agent or multi-tenant setups. Config.yaml is file-based and
can be scoped per-user or eventually per-session.
Changes:
- cli.py: Read model from CLI_CONFIG only, not LLM_MODEL/OPENAI_MODEL
- hermes_cli/auth.py: _save_model_choice() no longer writes LLM_MODEL
to .env
- hermes_cli/setup.py: Remove 12 save_env_value('LLM_MODEL', ...)
calls from all provider setup flows
- gateway/run.py: Remove LLM_MODEL fallback (HERMES_MODEL still works
for gateway process runtime)
- cron/scheduler.py: Same
- agent/auxiliary_client.py: Remove LLM_MODEL from custom endpoint
model detection
Phase 2 of the provider router migration — route the main agent's
client construction and fallback activation through
resolve_provider_client() instead of duplicated ad-hoc logic.
run_agent.py:
- __init__: When no explicit api_key/base_url, use
resolve_provider_client(provider, raw_codex=True) for client
construction. Explicit creds (from CLI/gateway runtime provider)
still construct directly.
- _try_activate_fallback: Replace _resolve_fallback_credentials and
its duplicated _FALLBACK_API_KEY_PROVIDERS / _FALLBACK_OAUTH_PROVIDERS
dicts with a single resolve_provider_client() call. The router
handles all provider types (API-key, OAuth, Codex) centrally.
- Remove _resolve_fallback_credentials method and both fallback dicts.
agent/auxiliary_client.py:
- Add raw_codex parameter to resolve_provider_client(). When True,
returns the raw OpenAI client for Codex providers instead of wrapping
in CodexAuxiliaryClient. The main agent needs this for direct
responses.stream() access.
3251 passed, 2 pre-existing unrelated failures.
Update 14 test files to use the new call_llm/async_call_llm mock
patterns instead of the old get_text_auxiliary_client/
get_vision_auxiliary_client tuple returns.
- vision_tools tests: mock async_call_llm instead of _aux_async_client
- browser tests: mock call_llm instead of _aux_vision_client
- flush_memories tests: mock call_llm instead of get_text_auxiliary_client
- session_search tests: mock async_call_llm with RuntimeError
- mcp_tool tests: fix whitelist model config, use side_effect for
multi-response tests
- auxiliary_config_bridge: update for model=None (resolved in router)
3251 passed, 2 pre-existing unrelated failures.
Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
1. Resolve provider + model from task config or explicit args
2. Get or create a cached client for that provider
3. Format request args (max_tokens handling, provider extra_body)
4. Make the API call with max_tokens/max_completion_tokens retry
5. Return the response
Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.
Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')
Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).
When hermes-agent runs as a systemd service, Docker container, or
headless daemon, the stdout pipe can become unavailable (idle timeout,
buffer exhaustion, socket reset). Any print() call then raises
OSError: [Errno 5] Input/output error, crashing run_conversation()
and causing cron jobs to fail.
Rather than wrapping individual print() calls (68 in run_conversation
alone), this adds a transparent _SafeWriter wrapper installed once at
the start of run_conversation(). It delegates all writes to the real
stdout and silently catches OSError. Zero overhead on the happy path,
comprehensive coverage of all print calls including future ones.
Fixes#845
Co-authored-by: J0hnLawMississippi <J0hnLawMississippi@users.noreply.github.com>
Two fixes for context overflow handling:
1. Proactive compression after tool execution: The compression check now
estimates the next prompt size using real token counts from the last API
response (prompt_tokens + completion_tokens) plus a conservative estimate
of newly appended tool results (chars // 3 for JSON-heavy content).
Previously, should_compress() only checked last_prompt_tokens which
didn't account for tool results — so a 130k prompt + 100k chars of tool
output would pass the 140k threshold check but fail the 200k API limit.
2. Safety net: Added 'prompt is too long' to context-length error detection
phrases. Anthropic returns 'prompt is too long: N tokens > M maximum'
on HTTP 400, which wasn't matched by existing phrases. This ensures
compression fires even if the proactive check underestimates.
Fixes#813
- test_agent_loop_tool_calling.py: import atroposlib at module level
to trigger skip (environments.agent_loop is now importable without
atroposlib due to __init__.py graceful fallback)
- test_modal_sandbox_fixes.py: skip TestToolResolution tests when
minisweagent not installed
- environments/__init__.py: try/except on atroposlib imports so
submodules like tool_call_parsers remain importable standalone
- test_agent_loop.py, test_tool_call_parsers.py,
test_managed_server_tool_support.py: skip at module level when
atroposlib is missing
Guard all test files that import from environments/ or atroposlib
with try/except + pytest.skip(allow_module_level=True) so they
gracefully skip instead of crashing when deps aren't available.
Tests hit a real vLLM server (Qwen/Qwen3-4B-Thinking-2507) via
ManagedServer Phase 2. Auto-skip if server isn't running.
Tests verify:
- Single tool call through full agent loop
- Multi-tool calls across turns
- ManagedServer produces SequenceNodes with tokens/logprobs
- Direct response without tools
- Thinking model produces <think> blocks
Also adds fallback parser in agent_loop.py: when ManagedServer's
ToolCallTranslator can't parse (vLLM not installed), hermes-agent's
standalone parsers extract <tool_call> tags from raw content.
Real LLM calls via OpenRouter using stepfun/step-3.5-flash:free (zero cost).
Falls back to paid models if free model is unavailable.
Tests: single tool call, multi-tool single turn, multi-turn chains,
unknown tool rejection, max_turns limit, direct response (no tools),
tool error handling, AgentResult structure, conversation history.
The weaker assertion (r.exc_info is not None) passes even when
exc_info is (None, None, None). Check r.exc_info[0] is not None
to verify actual exception info is present.
The _aux_async_client mock was already applied on main.
Co-authored-by: OutThisLife <nickolasgustafsson@gmail.com>
Allow users to interact with Hermes by sending and receiving emails.
Uses IMAP polling for incoming messages and SMTP for replies with
proper threading (In-Reply-To, References headers).
Integrates with all 14 gateway extension points: config, adapter
factory, authorization, send_message tool, cron delivery, toolsets,
prompt hints, channel directory, setup wizard, status display, and
env example.
65 tests covering config, parsing, dispatch, threading, IMAP fetch,
SMTP send, attachments, and all integration points.
Adds delegation.model and delegation.provider config fields so subagents
can run on a completely different provider:model pair than the parent agent.
When delegation.provider is set, the system resolves the full credential
bundle (base_url, api_key, api_mode) via resolve_runtime_provider() —
the same path used by CLI/gateway startup. This means all configured
providers work out of the box: openrouter, nous, zai, kimi-coding,
minimax, minimax-cn.
Key design decisions:
- Provider resolution uses hermes_cli.runtime_provider (single source of
truth for credential resolution across CLI, gateway, cron, and now
delegation)
- When only delegation.model is set (no provider), the model name changes
but parent credentials are inherited (for switching models within the
same provider like OpenRouter)
- When delegation.provider is set, full credentials are resolved
independently — enabling cross-provider delegation (e.g. parent on
Nous Portal, subagents on OpenRouter)
- Clear error messages if provider resolution fails (missing API key,
unknown provider name)
- _load_config() now falls back to hermes_cli.config.load_config() for
gateway/cron contexts where CLI_CONFIG is unavailable
Based on PR #791 by 0xbyt4 (closes#609), reworked to use proper
provider credential resolution instead of passing provider as metadata.
Co-authored-by: 0xbyt4 <0xbyt4@users.noreply.github.com>
Combined implementation of reasoning management:
- /reasoning Show current effort level and display state
- /reasoning <level> Set reasoning effort (none, low, medium, high, xhigh)
- /reasoning show|on Show model thinking/reasoning in output
- /reasoning hide|off Hide model thinking/reasoning from output
Effort level changes persist to config and force agent re-init.
Display toggle updates the agent callback dynamically without re-init.
When display is enabled:
- Intermediate reasoning shown as dim [thinking] lines during tool loops
- Final reasoning shown in a bordered box above the response
- Long reasoning collapsed (5 lines intermediate, 10 lines final)
Also adds:
- reasoning_callback parameter to AIAgent
- last_reasoning in run_conversation result dict
- show_reasoning config option (display section, default: false)
- Display section in /config output
- 34 tests covering both features
Combines functionality from PR #789 and PR #790.
Co-authored-by: Aum Desai <Aum08Desai@users.noreply.github.com>
Co-authored-by: 0xbyt4 <35742124+0xbyt4@users.noreply.github.com>
Adds defensive guard against empty/None/missing choices in SamplingHandler.__call__
before accessing response.choices[0]. Returns proper ErrorData instead of crashing
with IndexError/TypeError on content filtering, provider errors, or rate limits.
Authored by 0xbyt4.
Co-authored-by: 0xbyt4 <0xbyt4@users.noreply.github.com>
Adds tool_choice, parallel_tool_calls, and prompt_cache_key to the
Codex Responses API request kwargs — matching what the official Codex
CLI sends.
- tool_choice: 'auto' — enables the model to proactively call tools.
Without this, the model may default to not using tools, which explains
reports of the agent claiming it lacks shell access (#747).
- parallel_tool_calls: True — allows the model to issue multiple tool
calls in a single turn for efficiency.
- prompt_cache_key: session_id — enables server-side prompt caching
across turns in the same session, reducing latency and cost.
Refs #747
Isolate Telegram forum topic sessions — each topic gets its own independent session key, history, and interrupt tracking. Progress, hygiene, and cron messages all route to the correct topic.
Four cleanups to code merged today:
1. New hermes_cli/curses_ui.py — shared curses_checklist() used by both
hermes tools and hermes skills. Eliminates ~140 lines of near-identical
curses code (scrolling, key handling, color setup, numbered fallback).
2. Fix _find_all_skills() perf — was calling load_config() per skill
(~100+ YAML parses). Now loads disabled set once via
_get_disabled_skill_names() and does a set lookup.
3. Eliminate _list_all_skills_unfiltered() duplication — _find_all_skills()
now accepts skip_disabled=True for the config UI, removing 30 lines
of copy-pasted discovery logic from skills_config.py.
4. Fix fragile label round-trip in skills_command — was building label
strings, passing to checklist, then mapping labels back to skill names
(collision-prone). Now works with indices directly, like tools_config.
Authored by teyrebaz33. Closes#643.
- /personality none/default/neutral clears system prompt overlay
- Dict format personalities with description, tone, style fields
- Works in both CLI and gateway
- 18 tests
Add /background <prompt> to the gateway, allowing users on Telegram,
Discord, Slack, etc. to fire off a prompt in a separate agent session.
The result is delivered back to the same chat when done, without
modifying the active conversation history.
Implementation:
- _handle_background_command: validates input, spawns asyncio task
- _run_background_task: creates AIAgent in executor thread, delivers
result (text, images, media files) back via the platform adapter
- Inherits model, toolsets, provider routing from gateway config
- Error handling with user-visible failure messages
Also adds /background to hermes_cli/commands.py registry so it
appears in /help and autocomplete.
Tests: 15 new tests covering usage, task creation, uniqueness,
multi-platform, error paths, and help/autocomplete integration.
Fixes#898 — Python 3.11 changed argparse to raise an exception on
duplicate subparser names (CPython #94331). The 'skills' name was
registered twice: once for Skills Hub and once for skills config.
Changes:
- Remove duplicate 'skills' subparser registration
- Add 'config' as a sub-action under the existing 'hermes skills' command
- Route 'hermes skills config' to skills_config module
- Add regression test to catch future duplicates
Migration: 'hermes skills' (config) is now 'hermes skills config'
Two-tier warning system that nudges the LLM as it approaches
max_iterations, injected into the last tool result JSON rather
than as a separate system message:
- Caution (70%): {"_budget_warning": "[BUDGET: 42/60...]"}
- Warning (90%): {"_budget_warning": "[BUDGET WARNING: 54/60...]"}
For JSON tool results, adds a _budget_warning field to the existing
dict. For plain text results, appends the warning as text.
Key properties:
- No system messages injected mid-conversation
- No changes to message structure
- Prompt cache stays valid
- Configurable thresholds (0.7 / 0.9)
- Can be disabled: _budget_pressure_enabled = False
Inspired by PR #421 (@Bartok9) and issue #414.
8 tests covering thresholds, edge cases, JSON and text injection.
Replaces head-only stdout capture with a two-buffer approach (40% head,
60% tail rolling window) so scripts that print() their final results
at the end never lose them. Adds truncation notice between sections.
Cherry-picked from PR #755, conflict resolved (test file additions).
3 new tests for short output, head+tail preservation, and notice format.
Adds configurable bot message filtering via DISCORD_ALLOW_BOTS env var:
- 'none' (default): ignore all bot messages
- 'mentions': accept bots only when they @mention us
- 'all': accept all bot messages
Includes 8 tests.
Authored by teyrebaz33. Adds config-driven quick commands that execute
shell commands without invoking the LLM — zero token usage, works from
Telegram/Discord/Slack/etc. Closes#744.
Follow-up to 58dbd81 — ensures smooth transition for existing users:
- Backward compat: old session files without last_prompt_tokens
default to 0 via data.get('last_prompt_tokens', 0)
- /compress, /undo, /retry: reset last_prompt_tokens to 0 after
rewriting transcripts (stale token counts would under-report)
- Auto-compression hygiene: reset last_prompt_tokens after rewriting
- update_session: use None sentinel (not 0) as default so callers
can explicitly reset to 0 while normal calls don't clobber
- 6 new tests covering: default value, serialization roundtrip,
old-format migration, set/reset/no-change semantics
- /reset: new SessionEntry naturally gets last_prompt_tokens=0
2942 tests pass.
Adds two tests for _handle_retry_command: verifies /retry returns the
agent response (not None), and verifies graceful handling when no
previous message exists.
Cherry-picked from PR #731 by teyrebaz33. Regression coverage for
the fix merged in PR #441.
Co-authored-by: teyrebaz33 <teyrebaz33@users.noreply.github.com>
Authored by dmahan93. Adds HERMES_YOLO_MODE env var and --yolo CLI flag
to auto-approve all dangerous command prompts.
Post-merge: renamed --fuck-it-ship-it to --yolo for brevity,
resolved conflict with --checkpoints flag.
On macOS, Docker Desktop installs the CLI to /usr/local/bin/docker, but
when Hermes runs as a gateway service (launchd) or in other non-login
contexts, /usr/local/bin is often not in PATH. This causes the Docker
requirements check to fail with 'No such file or directory: docker' even
though docker works fine from the user's terminal.
Add find_docker() helper that uses shutil.which() first, then probes
common Docker Desktop install paths on macOS (/usr/local/bin,
/opt/homebrew/bin, Docker.app bundle). The resolved path is cached and
passed to mini-swe-agent via its 'executable' parameter.
- tools/environments/docker.py: add find_docker(), use it in
_storage_opt_supported() and pass to _Docker(executable=...)
- tools/terminal_tool.py: use find_docker() in requirements check
- tests/tools/test_docker_find.py: 4 tests (PATH, fallback, not found, cache)
2877 tests pass.
Shows an immediate status message and braille spinner for slow slash
commands (/skills search|browse|inspect|install, /reload-mcp). Makes
input read-only while the command runs so the CLI doesn't appear frozen.
Cherry-picked from PR #714 by vilkasdev, rebased onto current main
with conflict resolution and bug fix (get_hint_text duplicate return).
Fixes#636
Co-authored-by: vilkasdev <vilkasdev@users.noreply.github.com>
Two related bugs prevented users from reliably switching providers:
1. OPENAI_BASE_URL poisoning OpenRouter resolution: When a user with a
custom endpoint ran /model openrouter:model, _resolve_openrouter_runtime
picked up OPENAI_BASE_URL instead of the OpenRouter URL, causing model
validation to probe the wrong API and reject valid models.
Fix: skip OPENAI_BASE_URL when requested_provider is explicitly
'openrouter'.
2. Provider never saved to config: _save_model_choice() could save
config.model as a plain string. All five _model_flow_* functions then
checked isinstance(model, dict) before writing the provider — which
silently failed on strings. With no provider in config, auto-detection
would pick up stale credentials (e.g. Codex desktop app) instead of
the user's explicit choice.
Fix: _save_model_choice() now always saves as dict format. All flow
functions also normalize string->dict as a safety net before writing
provider.
Adds 4 regression tests. 2873 tests pass.
Follow-up to PR #716 (0xbyt4):
- Log the third remaining silent except-pass in scheduler (prefill
messages JSON parse failure)
- Fix test mock: run → run_conversation (matches actual agent API)
- Remove unused imports (asyncio, AsyncMock)
- Add test for prefill_messages parse failure logging
The 4 early-return paths in _spawn_training_run (API exit, trainer
exit, env not found, env exit) were doing manual process.terminate()
or returning without cleanup, leaking open log file handles. Now all
paths call _stop_training_run() which handles both process termination
and file handle closure.
Also adds 12 tests for _stop_training_run covering file handle
cleanup, process termination, status transitions, and edge cases.
Inspired by PR #715 (0xbyt4) which identified the early-return issue.
Core file handle fix was already on main via e28dc13 (memosr.eth).