Images pasted in the CLI were embedded as raw base64 image_url content
parts in the conversation history, which only works with vision-capable
models. If the main model (e.g. Nous API) doesn't support vision, this
breaks the request and poisons all subsequent messages.
Now the CLI uses the same approach as the messaging gateway: images are
pre-processed through the auxiliary vision model (Gemini Flash via
OpenRouter or Nous Portal) and converted to text descriptions. The
local file path is included so the agent can re-examine via
vision_analyze if needed. Works with any model.
Fixes#638.
User messaging improvements:
- Rejection: '(>_<) Error: not a valid model' instead of '(^_^) Warning: Error:'
- Rejection: shows 'Model unchanged' + tip about /model and /provider
- Session-only: explains 'this session only' with reason and 'will revert on restart'
- Saved: clear '(saved to config)' confirmation
Docs updated:
- cli-commands.md, cli.md, messaging/index.md: /model now shows
provider:model syntax, /provider command added to tables
Test fixes: deduplicated test names, assertions match new messages.
/provider command (CLI + gateway):
Shows all providers with auth status (✓/✗), aliases, and active marker.
Users can now discover what provider names work with provider:model syntax.
Gateway bugs fixed:
- Config was saved even when validation.persist=False (told user 'session
only' but actually persisted the unvalidated model)
- HERMES_INFERENCE_PROVIDER env var not set on provider switch, causing
the switch to be silently overridden if that env var was already set
parse_model_input hardened:
- Colon only treated as provider delimiter if left side is a recognized
provider name or alias. 'anthropic/claude-3.5-sonnet:beta' now passes
through as a model name instead of trying provider='anthropic/claude-3.5-sonnet'.
- HTTP URLs, random colons no longer misinterpreted.
56 tests passing across model validation, CLI commands, and integration.
'auto' doesn't always mean openrouter — it could be nous, zai,
kimi-coding, etc. depending on configured credentials. Reverted the
hardcoded mapping and now both CLI and gateway call
resolve_provider() to detect the actual active provider when 'auto'
is set. Falls back to openrouter only if resolution fails.
- normalize_provider('auto') now returns 'openrouter' (the default)
so /model shows the curated model list instead of nothing
- CLI /model display uses normalize_provider before looking up labels
- Gateway /model handler now uses the same validation logic as CLI:
live API probe, provider:model syntax, curated model list display
Add provider:model syntax to /model command for runtime provider switching:
/model zai:glm-5 → switch to Z.AI provider with glm-5
/model nous:hermes-3 → switch to Nous Portal with hermes-3
/model openrouter:anthropic/claude-sonnet-4.5 → explicit OpenRouter
When switching providers, credentials are resolved via resolve_runtime_provider
and validated before committing. Both model and provider are saved to config.
Provider aliases work (glm: → zai, kimi: → kimi-coding, etc.).
Enhanced /model (no args) display now shows:
- Current model and provider
- Curated model list for the current provider with ← marker
- Usage examples including provider:model syntax
39 tests covering parse_model_input, curated_models_for_provider,
provider switching (success + credential failure), and display output.
Replace the static catalog-based model validation with a live API probe.
The /model command now hits the provider's /models endpoint to check if
the requested model actually exists:
- Model found in API → accepted + saved to config
- Model NOT found in API → rejected with 'Error: not a valid model'
and fuzzy-match suggestions from the live model list
- API unreachable → graceful fallback to hardcoded catalog (session-only
for unrecognized models)
- Format errors (empty, spaces, missing '/') still caught instantly
without a network call
The API probe takes ~0.2s for OpenRouter (346 models) and works with any
OpenAI-compatible endpoint (Ollama, vLLM, custom, etc.).
32 tests covering all paths: format checks, API found, API not found,
API unreachable fallback, CLI integration.
- Wrap validate_requested_model in try/except so /model doesn't crash
if validation itself fails (falls back to old accept+save behavior)
- Remove unnecessary sys.path.insert from both test files
- Expand test_model_validation.py: 4 → 23 tests covering normalize_provider,
provider_model_ids, empty/whitespace/spaces rejection, OpenRouter format
validation, custom endpoints, nous provider, provider aliases, unknown
providers, fuzzy suggestions
- Expand test_cli_model_command.py: 2 → 5 tests adding known-model save,
validation crash fallback, and /model with no argument
Two fixes:
1. Gateway CWD override: TERMINAL_CWD from config.yaml was being
unconditionally overwritten by the messaging_cwd fallback (line 114).
Now explicit paths in config.yaml are respected — only '.' / 'auto' /
'cwd' (or unset) fall back to MESSAGING_CWD or home directory.
2. sandbox_dir config: Added terminal.sandbox_dir to config.yaml bridge
in gateway/run.py, cli.py, and hermes_cli/config.py. Maps to
TERMINAL_SANDBOX_DIR env var, which get_sandbox_dir() reads to
determine where Docker/Singularity sandbox data is stored (default:
~/.hermes/sandboxes/). Users can now set:
hermes config set terminal.sandbox_dir /data/hermes-sandboxes
Critical fixes:
- Add --worktree/-w to hermes_cli/main.py argparse (both chat
subcommand and top-level parser) so 'hermes -w' works via the
actual CLI entry point, not just 'python cli.py -w'
- Pass worktree flag through cmd_chat() kwargs to cli_main()
- Handle worktree attr in bare 'hermes' and --resume/--continue paths
Bug fixes in cli.py:
- Skip worktree creation for --list-tools/--list-toolsets (wasteful)
- Wrap git worktree subprocess.run in try/except (crash on timeout)
- Add stale worktree pruning on startup (_prune_stale_worktrees):
removes clean worktrees older than 24h left by crashed/killed sessions
Documentation updates:
- AGENTS.md: add --worktree to CLI commands table
- cli-config.yaml.example: add worktree config section
- website/docs/reference/cli-commands.md: add to core commands
- website/docs/user-guide/cli.md: add usage examples
- website/docs/user-guide/configuration.md: add config docs
Test improvements (17 → 31 tests):
- Stale worktree pruning (prune old clean, keep recent, keep dirty)
- Directory symlink via .worktreeinclude
- Edge cases (no commits, not a repo, pre-existing .worktrees/)
- CLI flag/config OR logic
- TERMINAL_CWD integration
- System prompt injection format
Add a --worktree (-w) flag to the hermes CLI that creates an isolated
git worktree for the session. This allows running multiple hermes-agent
instances concurrently on the same repo without file collisions.
How it works:
- On startup with -w: detects git repo, creates .worktrees/<session>/
with its own branch (hermes/<session-id>), sets TERMINAL_CWD to it
- Each agent works in complete isolation — independent HEAD, index,
and working tree, shared git object store
- On exit: auto-removes worktree and branch if clean, warns and
keeps if there are uncommitted changes
- .worktreeinclude file support: list gitignored files (.env, .venv/)
to auto-copy/symlink into new worktrees
- .worktrees/ is auto-added to .gitignore
- Agent gets a system prompt note about the worktree context
- Config support: set worktree: true in config.yaml to always enable
Usage:
hermes -w # Interactive mode in worktree
hermes -w -q "Fix issue #123" # Single query in worktree
# Or in config.yaml:
worktree: true
Includes 17 tests covering: repo detection, worktree creation,
independence verification, cleanup (clean/dirty), .worktreeinclude,
.gitignore management, and 10 concurrent worktrees.
Closes#652
The /clear command was using Rich's console.clear() and console.print()
which write directly to stdout. Inside the TUI, prompt_toolkit's
patch_stdout intercepts stdout via StdoutProxy, which doesn't interpret
screen-clearing escape sequences and mangles Rich's ANSI output,
resulting in raw escape codes dumped to the terminal.
Fix:
- Use prompt_toolkit's output.erase_screen() + cursor_goto() to clear
the terminal directly (bypasses patch_stdout's StdoutProxy)
- Render the banner through ChatConsole (which routes Rich output
through prompt_toolkit's native print_formatted_text/ANSI renderer)
- Use _cprint for the status message (prompt_toolkit-compatible)
- Fall back to the old behavior when not inside the TUI (e.g. startup)
Reduces token usage and latency for most tasks by defaulting to
medium reasoning effort instead of xhigh. Users can still override
via config or CLI flag. Updates code, tests, example config, and docs.
Subagent tool calls now count toward the same session-wide iteration
limit as the parent agent. Previously, each subagent had its own
independent counter, so a parent with max_iterations=60 could spawn
3 subagents each doing 50 calls = 150 total tool calls unmetered.
Changes:
- IterationBudget: thread-safe shared counter (run_agent.py)
- consume(): try to use one iteration, returns False if exhausted
- refund(): give back one iteration (for execute_code turns)
- Thread-safe via Lock (subagents run in ThreadPoolExecutor)
- Parent creates the budget, children inherit it via delegate_tool.py
- execute_code turns are refunded (don't count against budget)
- Default raised from 60 → 90 to account for shared consumption
- Per-child cap (50) still applies as a safety valve
The per-child max_iterations (default 50) remains as a per-child
ceiling, but the shared budget is the hard session-wide limit.
A child stops at whichever comes first.
Adds 4 new direct API-key providers (zai, kimi-coding, minimax, minimax-cn)
to the inference provider system. All use standard OpenAI-compatible
chat/completions endpoints with Bearer token auth.
Core changes:
- auth.py: Extended ProviderConfig with api_key_env_vars and base_url_env_var
fields. Added providers to PROVIDER_REGISTRY. Added provider aliases
(glm, z-ai, zhipu, kimi, moonshot). Added auto-detection of API-key
providers in resolve_provider(). Added resolve_api_key_provider_credentials()
and get_api_key_provider_status() helpers.
- runtime_provider.py: Added generic API-key provider branch in
resolve_runtime_provider() — any provider with auth_type='api_key'
is automatically handled.
- main.py: Added providers to hermes model menu with generic
_model_flow_api_key_provider() flow. Updated _has_any_provider_configured()
to check all provider env vars. Updated argparse --provider choices.
- setup.py: Added providers to setup wizard with API key prompts and
curated model lists.
- config.py: Added env vars (GLM_API_KEY, KIMI_API_KEY, MINIMAX_API_KEY,
etc.) to OPTIONAL_ENV_VARS.
- status.py: Added API key display and provider status section.
- doctor.py: Added connectivity checks for each provider endpoint.
- cli.py: Updated provider docstrings.
Docs: Updated README.md, .env.example, cli-config.yaml.example,
cli-commands.md, environment-variables.md, configuration.md.
Tests: 50 new tests covering registry, aliases, resolution, auto-detection,
credential resolution, and runtime provider dispatch.
Inspired by PR #33 (numman-ali) which proposed a provider registry approach.
Credit to tars90percent (PR #473) and manuelschipper (PR #420) for related
provider improvements merged earlier in this changeset.
API key selection is now base_url-aware: when the resolved base_url
targets OpenRouter, OPENROUTER_API_KEY takes priority (preserving the
#289 fix). When hitting any other endpoint (Z.ai, vLLM, custom, etc.),
OPENAI_API_KEY takes priority so the OpenRouter key doesn't leak.
Applied in both the runtime provider resolver (the real code path) and
the CLI initial default (for consistency).
Fixes#560.
Inspired by Claude Code's /insights, adapted for Hermes Agent's multi-platform
architecture. Analyzes session history from state.db to produce comprehensive
usage insights.
Features:
- Overview stats: sessions, messages, tokens, estimated cost, active time
- Model breakdown: per-model sessions, tokens, and cost estimation
- Platform breakdown: CLI vs Telegram vs Discord etc. (unique to Hermes)
- Tool usage ranking: most-used tools with percentages
- Activity patterns: day-of-week chart, peak hours, streaks
- Notable sessions: longest, most messages, most tokens, most tool calls
- Cost estimation: real pricing data for 25+ models (OpenAI, Anthropic,
DeepSeek, Google, Meta) with fuzzy model name matching
- Configurable time window: --days flag (default 30)
- Source filtering: --source flag to filter by platform
Three entry points:
- /insights slash command in CLI (supports --days and --source flags)
- /insights slash command in gateway (compact markdown format)
- hermes insights CLI subcommand (standalone)
Includes 56 tests covering pricing helpers, format helpers, empty DB,
populated DB with multi-platform data, filtering, formatting, and edge cases.
Authored by rovle. Adds Daytona as the sixth terminal execution backend
with cloud sandboxes, persistent workspaces, and full CLI/gateway integration.
Includes 24 unit tests and 8 integration tests.
Alt key combos pass through all terminal emulators (sent as ESC + key),
unlike Ctrl+V which terminals intercept for text paste. This is the
reliable way to attach clipboard images on WSL2, Windows Terminal,
VSCode, and SSH sessions where Ctrl+V never reaches the application
for image-only clipboard content.
Also adds 'Paste image: Alt+V (or /paste)' hint to /help output.
The original implementation only supported xclip (X11), which silently
fails on WSL2 (can't access Windows clipboard for images), Wayland
desktops (xclip is X11-only), and VSCode terminal on WSL2.
Clipboard backend changes (hermes_cli/clipboard.py):
- WSL2: detect via /proc/version, use powershell.exe with .NET
System.Windows.Forms.Clipboard to extract images as base64 PNG
- Wayland: use wl-paste with MIME type detection, auto-convert BMP
to PNG for WSLg environments (via Pillow or ImageMagick)
- Dispatch order: WSL → Wayland → X11 (xclip), with fallthrough
- New has_clipboard_image() for lightweight clipboard checks
- Cache WSL detection result per-process
CLI changes (cli.py):
- /paste command: explicit clipboard image check for terminals where
BracketedPaste doesn't fire (image-only clipboard in VSCode/WinTerm)
- Ctrl+V keybinding: fallback for Linux terminals where Ctrl+V sends
raw byte instead of triggering bracketed paste
Tests: 80 tests (up from 37) covering WSL, Wayland, X11 dispatch,
BMP conversion, has_clipboard_image, and /paste command.
Copy an image to clipboard (screenshot, browser, etc.) and paste into
the Hermes CLI. The image is saved to ~/.hermes/images/, shown as a
badge above the input ([📎 Image #1]), and sent to the model as a
base64-encoded OpenAI vision multimodal content block.
Implementation:
- hermes_cli/clipboard.py: clean module with platform-specific extraction
- macOS: pngpaste (if installed) → osascript fallback (always available)
- Linux: xclip (apt install xclip)
- cli.py: BracketedPaste key handler checks clipboard on every paste,
image bar widget shows attached images, chat() converts to multimodal
content format, Ctrl+C clears attachments
Inspired by @m0at's fork (https://github.com/m0at/hermes-agent) which
implemented image paste support for local vision models. Reimplemented
cleanly as a separate module with tests.
Replaces the unsafe 128K fallback for unknown models with a descending
probe strategy (2M → 1M → 512K → 200K → 128K → 64K → 32K). When a
context-length error occurs, the agent steps down tiers and retries.
The discovered limit is cached per model+provider combo in
~/.hermes/context_length_cache.yaml so subsequent sessions skip probing.
Also parses API error messages to extract the actual context limit
(e.g. 'maximum context length is 32768 tokens') for instant resolution.
The CLI banner now displays the context window size next to the model
name (e.g. 'claude-opus-4 · 200K context · Nous Research').
Changes:
- agent/model_metadata.py: CONTEXT_PROBE_TIERS, persistent cache
(save/load/get), parse_context_limit_from_error(), get_next_probe_tier()
- agent/context_compressor.py: accepts base_url, passes to metadata
- run_agent.py: step-down logic in context error handler, caches on success
- cli.py + hermes_cli/banner.py: context length in welcome banner
- tests: 22 new tests for probing, parsing, and caching
Addresses #132. PR #319's approach (8K default) rejected — too conservative.
Previously, pressing Ctrl+C while text was typed in the input prompt
would immediately exit Hermes. Now follows standard shell behavior:
- Text in buffer → Ctrl+C clears the line (like bash)
- Empty buffer → Ctrl+C exits
This means accidentally hitting Ctrl+C while composing a message just
clears the input instead of killing the session. A second Ctrl+C on
the empty prompt still exits as expected.
The TextArea uses multiline=True, so up/down arrows only moved the
cursor within text — history browsing via FileHistory was attached
but inaccessible.
Two fixes:
1. Add up/down key bindings in normal input mode that call
Buffer.auto_up()/auto_down(). These intelligently handle both:
cursor movement when editing multi-line text, and history
browsing when on the first/last line.
2. Pass append_to_history=True to buffer.reset() in the Enter
handler so messages actually get saved to ~/.hermes_history.
History persists across sessions via FileHistory. The bindings are
filtered out during clarify, approval, and sudo prompts (which
have their own up/down handlers).
After /reload-mcp updates self.agent.tools, immediately call
_persist_session() so the session JSON file at ~/.hermes/sessions/
reflects the new tools list. Without this, the tools field in the
session log would only update on the next conversation turn — if
the user quit after reloading, the log would have stale tools.
- CLI: After reload, refreshes self.agent.tools and valid_tool_names
so the model sees updated tools on its next API call
- Both CLI and Gateway: Appends a [SYSTEM: ...] message at the END
of conversation history explaining what changed (added/removed/
reconnected servers, tool count). This preserves prompt-cache for
the system prompt and earlier messages — only the tail changes.
- Gateway already creates a new AIAgent per message so tools refresh
naturally; the injected message provides context for the model
Banner integration:
- MCP Servers section in CLI startup banner between Tools and Skills
- Shows each server with transport type, tool count, connection status
- Failed servers shown in red; section hidden when no MCP configured
- Summary line includes MCP server count
- Removed raw print() calls from discovery (banner handles display)
/reload-mcp command:
- New slash command in both CLI and gateway
- Disconnects all MCP servers, re-reads config.yaml, reconnects
- Reports what changed (added/removed/reconnected servers)
- Allows adding/removing MCP servers without restarting
Resources & Prompts support:
- 4 utility tools registered per server: list_resources, read_resource,
list_prompts, get_prompt
- Exposes MCP Resources (data sources) and Prompts (templates) as tools
- Proper parameter schemas (uri for read_resource, name for get_prompt)
- Handles text and binary resource content
- 23 new tests covering schemas, handlers, and registration
Test coverage: 74 MCP tests total, 1186 tests pass overall.
When both OPENROUTER_API_KEY and OPENAI_API_KEY are set (e.g. OPENAI_API_KEY
in .bashrc), the wrong key was sent to OpenRouter causing auth failures.
Fixed key resolution order in cli.py and runtime_provider.py.
Fixes#289
Connect to external MCP servers via stdio transport, discover their tools
at startup, and register them into the hermes-agent tool registry.
- New tools/mcp_tool.py: config loading, server connection via background
event loop, tool handler factories, discovery, and graceful shutdown
- model_tools.py: trigger MCP discovery after built-in tool imports
- cli.py: call shutdown_mcp_servers in _run_cleanup
- pyproject.toml: add mcp>=1.2.0 as optional dependency
- 27 unit tests covering config, schema conversion, handlers, registration,
SDK interaction, toolset injection, graceful fallback, and shutdown
Config format (in ~/.hermes/config.yaml):
mcp_servers:
filesystem:
command: "npx"
args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
The OpenAI API returns content: null on assistant messages with tool
calls. msg.get('content', '') returns None when the key exists with
value None, causing TypeError on len(), string concatenation, and
.strip() in downstream code paths.
Fixed 4 locations that process conversation messages:
- agent/auxiliary_client.py:84 — None passed to API calls
- cli.py:1288 — crash on content[:200] and len(content)
- run_agent.py:3444 — crash on None.strip()
- honcho_integration/session.py:445 — 'None' rendered in transcript
13 other instances were verified safe (already protected, only process
user/tool messages, or use the safe pattern).
Pattern: msg.get('content', '') → msg.get('content') or ''
Fixes#276
Priority is: CLI arg > config file > env var > default
(not env var > config file as the old comment stated)
The test failed because config.yaml had max_turns at both root level
and inside agent section. The test cleared agent.max_turns but the
root-level value still took precedence over the env var. Fixed the
test to clear both, and corrected the comment to match the intended
priority order.
load_cli_config() only merged keys present in its hardcoded defaults
dict, silently dropping user-added keys like platform_toolsets (saved
by 'hermes tools'), provider_routing, memory, honcho, etc.
Added a second pass to carry over all file_config keys that aren't in
defaults, so 'hermes tools' changes actually take effect in CLI mode.
The gateway was unaffected (reads YAML directly via yaml.safe_load).
Introduced a new `provider_routing` section in the CLI configuration to control how requests are routed across providers when using OpenRouter. This includes options for sorting providers by throughput, latency, or price, as well as allowing or ignoring specific providers, setting the order of provider attempts, and managing data collection policies. Updated relevant classes and documentation to support these features, enhancing flexibility in provider selection.
Updated the CLI header formatting for tool and configuration displays to center titles within their respective widths. Enhanced the display of command descriptions to include an ellipsis for longer texts, ensuring better readability. This refactor improves the overall user interface of the CLI.
Introduced a new command "/usage" in the CLI to show cumulative token usage for the current session. This includes details on prompt tokens, completion tokens, total tokens, API calls, and context state. Updated command documentation to reflect this addition. Enhanced the AIAgent class to track token usage throughout the session.
Introduced a new command "/compress" to the CLI, allowing users to manually trigger context compression on the current conversation. The method checks for sufficient conversation history and active agent status before performing compression, providing feedback on the number of messages and tokens before and after the operation. Updated command documentation accordingly.
- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.