- Updated `_on_tool_gen_start` method in `HermesCLI` to close open streaming boxes exactly once, preventing potential multiple closures.
- Added a check for `_stream_box_opened` to manage the state of the streaming box more effectively, enhancing user experience during large payload streaming.
- Introduced `_on_tool_gen_start` in `HermesCLI` to indicate when tool-call arguments are being generated, enhancing user feedback during streaming.
- Updated `AIAgent` to support a new `tool_gen_callback`, notifying the display layer when tool generation starts, allowing for better user experience during large payloads.
- Ensured that the callback is triggered appropriately during streaming events to prevent user interface freezing.
On macOS with Homebrew (Apple Silicon), Node.js and agent-browser
binaries live under /opt/homebrew/bin/ which is not included in the
_SANE_PATH fallback used by browser_tool.py and environments/local.py.
When Hermes runs with a filtered PATH (e.g. as a systemd service),
these binaries are invisible, causing 'env: node: No such file or
directory' errors when using browser tools.
Changes:
- Add /opt/homebrew/bin and /opt/homebrew/sbin to _SANE_PATH in both
browser_tool.py and environments/local.py
- Add _discover_homebrew_node_dirs() to find versioned Node installs
(e.g. brew install node@24) that aren't linked into /opt/homebrew/bin
- Extend _find_agent_browser() to search Homebrew and Hermes-managed
dirs when agent-browser isn't on the current PATH
- Include discovered Homebrew node dirs in subprocess PATH when
launching agent-browser
- Add 11 new tests covering all Homebrew path discovery logic
The gateway memory flush agent reviews old conversation history on session
reset/expiry and writes to memory. It had no awareness of memory changes
made after that conversation ended (by the live agent, cron jobs, or other
sessions), causing silent overwrites of newer entries.
Two fixes:
1. Skip memory flush entirely for cron sessions (session IDs starting with
'cron_'). Cron sessions are headless with no meaningful user conversation
to extract memories from.
2. Inject the current live memory state (MEMORY.md + USER.md) directly into
the flush prompt. The flush agent can now see what's already saved and
make informed decisions — only adding genuinely new information rather
than blindly overwriting entries that may have been updated since the
conversation ended.
Addresses the root cause identified in #2670: the flush agent was making
memory decisions blind to the current state of memory, causing stale
context to overwrite newer entries on gateway restarts and session resets.
Co-authored-by: devorun <devorun@users.noreply.github.com>
Co-authored-by: dlkakbs <dlkakbs@users.noreply.github.com>
* feat(config): support ${ENV_VAR} substitution in config.yaml
* fix: extend env var expansion to CLI and gateway config loaders
The original PR (#2680) only wired _expand_env_vars into load_config(),
which is used by 'hermes tools' and 'hermes setup'. The two primary
config paths were missed:
- load_cli_config() in cli.py (interactive CLI)
- Module-level _cfg in gateway/run.py (gateway — bridges api_keys to env vars)
Also:
- Remove redundant 'import re' (already imported at module level)
- Add missing blank lines between top-level functions (PEP 8)
- Add tests for load_cli_config() expansion
---------
Co-authored-by: teyrebaz33 <hakanerten02@hotmail.com>
echo was called with the full unquoted path (~username/suffix), allowing
command substitution in the suffix (e.g. ~user/$(malicious)) to execute
arbitrary shell commands. The fix expands only the validated ~username
portion via the shell and concatenates the suffix as a plain string.
Co-authored-by: Gutslabs <gutslabsxyz@gmail.com>
A bare `except Exception: pass` meant any YAML syntax error, bad value,
or unexpected structure in config.yaml was silently ignored and the
gateway fell back to .env / gateway.json without any indication.
Users had no way to know why their config changes had no effect.
Co-authored-by: sprmn24 <oncuevtv@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
httpx.AsyncClient awaits event hooks. The sync _ssrf_redirect_guard
returned None, causing 'object NoneType can't be used in await
expression' on any vision_analyze call that followed redirects.
Caught during live PTY testing of the merged SSRF protection.
* fix(security): add SSRF protection to vision_tools and web_tools
Both vision_analyze and web_extract/web_crawl accept arbitrary URLs
without checking if they target private/internal network addresses.
A prompt-injected or malicious skill could use this to access cloud
metadata endpoints (169.254.169.254), localhost services, or private
network hosts.
Adds a shared url_safety.is_safe_url() that resolves hostnames and
blocks private, loopback, link-local, and reserved IP ranges. Also
blocks known internal hostnames (metadata.google.internal).
Integrated at the URL validation layer in vision_tools and before
each website_policy check in web_tools (extract, crawl).
* test(vision): update localhost test to reflect SSRF protection
The existing test_valid_url_with_port asserted localhost URLs pass
validation. With SSRF protection, localhost is now correctly blocked.
Update the test to verify the block, and add a separate test for
valid URLs with ports using a public hostname.
* fix(security): harden SSRF protection — fail-closed, CGNAT, multicast, redirect guard
Follow-up hardening on top of dieutx's SSRF protection (PR #2630):
- Change fail-open to fail-closed: DNS errors and unexpected exceptions
now block the request instead of allowing it (OWASP best practice)
- Block CGNAT range (100.64.0.0/10): Python's ipaddress.is_private
does NOT cover this range (returns False for both is_private and
is_global). Used by Tailscale/WireGuard and carrier infrastructure.
- Add is_multicast and is_unspecified checks: multicast (224.0.0.0/4)
and unspecified (0.0.0.0) addresses were not caught by the original
four-check chain
- Add redirect guard for vision_tools: httpx event hook re-validates
each redirect target against SSRF checks, preventing the classic
redirect-based SSRF bypass (302 to internal IP)
- Move SSRF filtering before backend dispatch in web_extract: now
covers Parallel and Tavily backends, not just Firecrawl
- Extract _is_blocked_ip() helper for cleaner IP range checking
- Add 24 new tests (CGNAT, multicast, IPv4-mapped IPv6, fail-closed
behavior, parametrized blocked/allowed IP lists)
- Fix existing tests to mock DNS resolution for test hostnames
---------
Co-authored-by: dieutx <dangtc94@gmail.com>
Root cause: terminal_tool, execute_code, and process_registry returned raw
subprocess output with ANSI escape sequences intact. The model saw these
in tool results and copied them into file writes.
Previous fix (PR #2532) stripped ANSI at the write point in file_tools.py,
but this was a band-aid — regex on file content risks corrupting legitimate
content, and doesn't prevent ANSI from wasting tokens in the model context.
Source-level fix:
- New tools/ansi_strip.py with comprehensive ECMA-48 regex covering CSI
(incl. private-mode, colon-separated, intermediate bytes), OSC (both
terminators), DCS/SOS/PM/APC strings, Fp/Fe/Fs/nF escapes, 8-bit C1
- terminal_tool.py: strip output before returning to model
- code_execution_tool.py: strip stdout/stderr before returning
- process_registry.py: strip output in poll/read_log/wait
- file_tools.py: remove _strip_ansi band-aid (no longer needed)
Verified: `ls --color=always` output returned as clean text to model,
file written from that output contains zero ESC bytes.
* fix(gateway): added MATRIX_ALLOWED_USERS to list of env vars checked by gateway
* fix(gateway): add all missing platform allowlist env vars to startup check
The startup warning for 'No user allowlists configured' was only checking
TELEGRAM, DISCORD, WHATSAPP, SLACK, and SMS — missing SIGNAL, EMAIL,
MATTERMOST, and DINGTALK. Users of those platforms would see a spurious
warning even with their platform-specific allowlist configured.
Now matches the canonical platform_env_map in _is_user_authorized().
---------
Co-authored-by: SteelPh0enix <wojciech_olech@hotmail.com>
Cherry-picked from PR #2576 by ereid7, plus read-side fix from 173a5c62.
Both fixes were originally landed in 173a5c62 but were inadvertently
reverted by commit 34be3f8b (a squash-merge that bundled unrelated
tools_config.py changes).
Save side (_save_platform_tools): exclude platform default toolset
names (hermes-cli, hermes-telegram) from preserved entries so they
don't silently re-enable everything.
Read side (_get_platform_tools): when the saved list contains explicit
configurable keys, use direct membership instead of subset inference.
The subset approach is broken when composite toolsets like hermes-cli
resolve to ALL tools.
Cherry-picked from PR #2583 by Glucksberg.
The MEDIA: regex used \S+ which truncated paths at the first space.
Added a space-aware alternative anchored to known media extensions.
Also updated extract_local_files to allow spaces in path segments.
Follow-up fix: changed \s to [^\S\n] in the space-matching group
so the regex doesn't greedily match across newlines (broke multi-line
MEDIA: tags).
Cherry-picked from PR #2575 by ticketclosed-wontfix.
Filters out Discord system messages (thread renames, pins, member joins,
boosts) that were being treated as regular user messages.
Follow-up fix: also allow MessageType.reply (value 19) — the original
filter only allowed MessageType.default, which would silently drop all
reply-based interactions.
Added pytest.importorskip for discord dependency in tests.
An agent session killed the systemd-managed gateway (PID 1605) and restarted
it with '&disown', taking it outside systemd's Restart= management. When the
orphaned process later received SIGTERM, nothing restarted it.
Add dangerous command patterns to detect:
- 'gateway run' with & (background), disown, nohup, or setsid
- These should use 'systemctl --user restart hermes-gateway' instead
Also applied directly to main repo and fixed the systemd service:
- Changed Restart=on-failure to Restart=always (clean SIGTERM = exit 0 = not
a 'failure', so on-failure never triggered)
- RestartSec=10 for reasonable restart delay
Previously 'Activated skills: xxx' was printed above the banner in
show_banner(). Now it prints directly after the 'Welcome to Hermes
Agent!' line in run(), which is a more natural placement.
Path('~/.hermes/image.png').is_file() returns False because Path
doesn't expand tilde. This caused the tool to fall through to URL
validation, which also failed, producing a confusing error:
'Invalid image source. Provide an HTTP/HTTPS URL or a valid local
file path.'
Fix: use os.path.expanduser() before constructing the Path object.
Added two tests for tilde expansion (success and nonexistent file).
When a messaging platform fails to connect at startup (e.g. transient DNS
failure) or disconnects at runtime with a retryable error, the gateway now
queues it for background reconnection instead of giving up permanently.
- New _platform_reconnect_watcher background task runs alongside the
existing session expiry watcher
- Exponential backoff: 30s, 60s, 120s, 240s, 300s cap
- Max 20 retry attempts before giving up on a platform
- Non-retryable errors (bad auth token, etc.) are not retried
- Runtime disconnections via _handle_adapter_fatal_error now queue
retryable failures instead of triggering gateway shutdown
- On successful reconnect, adapter is wired up and channel directory
is rebuilt automatically
Fixes the case where a DNS blip during gateway startup caused Telegram
and Discord to be permanently unavailable until manual restart.
Local LLM servers (llama.cpp, ollama, vLLM, etc.) typically don't
require authentication. When a custom base_url is configured but no
API key is found, use a placeholder instead of failing with
'Provider resolver returned an empty API key.'
The OpenAI SDK accepts any string as api_key, and local servers
simply ignore the Authorization header.
Fixes issue reported by @ThatWolfieGuy — llama.cpp stopped working
after updating because the new runtime provider resolver enforces
non-empty API keys even for keyless local endpoints.
Added explicit closing of the OpenAI/httpx client in the background review process to prevent "Event loop is closed" errors. This change ensures that the client is properly cleaned up when the review agent is no longer needed, enhancing stability and resource management.
When AsyncOpenAI clients are garbage-collected after the event loop
closes, their AsyncHttpxClientWrapper.__del__ tries to schedule
aclose() on the dead loop, causing RuntimeError: Event loop is closed.
prompt_toolkit catches this as an unhandled exception and shows
'Press ENTER to continue...' which blocks CLI exit.
Fix: Add shutdown_cached_clients() to auxiliary_client.py that marks
all cached async clients' underlying httpx transport as CLOSED before
GC runs. This prevents __del__ from attempting the aclose() call.
- _force_close_async_httpx(): sets httpx AsyncClient._state to CLOSED
- shutdown_cached_clients(): iterates _client_cache, closes sync clients
normally and marks async clients as closed
- Also fix stale client eviction in _get_cached_client to mark evicted
async clients as closed (was just del-ing them, triggering __del__)
- Call shutdown_cached_clients() from _run_cleanup() in cli.py
The previous commit capped the 1.4x at 95% of context, but the multiplier
itself is unnecessary and confusing:
85% threshold × 1.4 = 119% of context → never fires
95% warn × 1.4 = 133% of context → never warns
The 85% hygiene threshold already provides ample headroom over the agent's
own 50% compressor. Even if rough estimates overestimate by 50%, hygiene
would fire at ~57% actual usage — safe and harmless.
Remove the multiplier entirely. Both actual and estimated token paths
now use the same 85% / 95% thresholds. Update tests and comments.
Three bugs in gateway session hygiene pre-compression caused 'Session too
large' errors for ~200K context models like GLM-5-turbo on z.ai:
1. Gateway hygiene called get_model_context_length(model) without passing
config_context_length, provider, or base_url — so user overrides like
model.context_length: 180000 were ignored, and provider-aware detection
(models.dev, z.ai endpoint) couldn't fire. The agent's own compressor
correctly passed all three (run_agent.py line 1038).
2. The 1.4x safety factor on rough token estimates pushed the compression
threshold above the model's actual context limit:
200K * 0.85 * 1.4 = 238K > 200K (model limit)
So hygiene never compressed, sessions grew past the limit, and the API
rejected the request.
3. Same issue for the warn threshold: 200K * 0.95 * 1.4 = 266K.
Fix:
- Read model.context_length, provider, and base_url from config.yaml
(same as run_agent.py does) and pass them to get_model_context_length()
- Resolve provider/base_url from runtime when not in config
- Cap the 1.4x-adjusted compress threshold at 95% of context_length
- Cap the 1.4x-adjusted warn threshold at context_length
Affects: z.ai GLM-5/GLM-5-turbo, any ~200K or smaller context model
where the 1.4x factor would push 85% above 100%.
Ref: Discord report from Ddox — glm-5-turbo on z.ai coding plan
* fix(mcp-oauth): port mismatch, path traversal, and shared state in OAuth flow
Three bugs in the new MCP OAuth 2.1 PKCE implementation:
1. CRITICAL: OAuth redirect port mismatch — build_oauth_auth() calls
_find_free_port() to register the redirect_uri, but _wait_for_callback()
calls _find_free_port() again getting a DIFFERENT port. Browser redirects
to port A, server listens on port B — callback never arrives, 120s timeout.
Fix: share the port via module-level _oauth_port variable.
2. MEDIUM: Path traversal via unsanitized server_name — HermesTokenStorage
uses server_name directly in filenames. A name like "../../.ssh/config"
writes token files outside ~/.hermes/mcp-tokens/.
Fix: sanitize server_name with the same regex pattern used elsewhere.
3. MEDIUM: Class-level auth_code/state on _CallbackHandler causes data
races if concurrent OAuth flows run. Second callback overwrites first.
Fix: factory function _make_callback_handler() returns a handler class
with a closure-scoped result dict, isolating each flow.
* test: add tests for MCP OAuth path traversal, handler isolation, and port sharing
7 new tests covering:
- Path traversal blocked (../../.ssh/config stays in mcp-tokens/)
- Dots/slashes sanitized and resolved within base dir
- Normal server names preserved
- Special characters sanitized (@, :, /)
- Concurrent handler result dicts are independent
- Handler writes to its own result dict, not class-level
- build_oauth_auth stores port in module-level _oauth_port
---------
Co-authored-by: 0xbyt4 <35742124+0xbyt4@users.noreply.github.com>
Models occasionally copy ANSI escape sequences from terminal output
or display formatting into file content, breaking shebangs and
injecting binary characters into scripts.
Strip ANSI codes (CSI, OSC, simple escapes) from:
- write_file content
- patch old_string, new_string, and V4A patch content
The check is fast (skips entirely if no ESC byte present).
Reported by Andi Jaeger.
When a session expires (daily schedule or idle timeout) and is
automatically reset, send a notification to the user explaining
what happened:
◐ Session automatically reset (inactive for 24h).
Conversation history cleared.
Use /resume to browse and restore a previous session.
Adjust reset timing in config.yaml under session_reset.
Notifications are suppressed when:
- The expired session had no activity (no tokens used)
- The platform is excluded (api_server, webhook by default)
- notify: false in config
Changes:
- session.py: _should_reset() returns reason string ('idle'/'daily')
instead of bool; SessionEntry gains auto_reset_reason and
reset_had_activity fields; old entry's total_tokens checked
- config.py: SessionResetPolicy gains notify (bool, default: true)
and notify_exclude_platforms (default: api_server, webhook)
- run.py: sends notification via adapter.send() before processing
the user's message, with activity + platform checks
- 13 new tests
Config (config.yaml):
session_reset:
notify: true
notify_exclude_platforms: [api_server, webhook]
Three fixes for the Matrix adapter:
1. Remove RoomMessageMedia callback registration — RoomMessageImage
inherits from it, causing images to be processed twice.
2. Add event ID deduplication to both text and media handlers.
nio can fire the same event more than once; bounded deque+set
tracks the last 1000 events.
3. Cache images locally via Matrix client download. MXC URLs require
authentication, so the vision pipeline couldn't access them.
Images are now downloaded via the authenticated client and saved
to the local cache (same pattern as Telegram/Discord).
Cherry-picked from PR #2353 by williamtwomey.
Co-authored-by: williamtwomey <williamtwomey@users.noreply.github.com>
Closes#2476
The polling error callback previously only handled Conflict errors
(409 from multiple getUpdates callers). All other errors, including
NetworkError and TimedOut that python-telegram-bot raises when the
host loses connectivity (Mac sleep, WiFi switch, VPN reconnect),
were logged and silently discarded. The bot would stop responding
until manually restarted.
Fix:
- Add _looks_like_network_error() to classify transient connectivity
errors (NetworkError, TimedOut, OSError, ConnectionError).
- Add _handle_polling_network_error() with exponential back-off
reconnect: retries up to 10 times with delays 5s, 10s, 20s, 40s,
60s (capped). On exhaustion, marks the adapter retryable-fatal so
launchd/systemd can restart the gateway process.
- Refactor _polling_error_callback() to route network errors to the
new handler before falling through to a generic error log.
- Track _polling_network_error_count (reset on successful reconnect)
independently from _polling_conflict_count.
The context length resolver was querying the /models endpoint for known
providers like GitHub Copilot, which returns a provider-imposed limit
(128k) instead of the model's actual context window (400k for gpt-5.4).
Since this check happened before the models.dev lookup, the wrong value
won every time.
Fix:
- Add api.githubcopilot.com and models.github.ai to _URL_TO_PROVIDER
- Skip the endpoint metadata probe for known providers — their /models
data is unreliable for context length. models.dev has the correct
per-provider values.
Reported by danny [DUMB] — gpt-5.4 via Copilot was resolving to 128k
instead of the correct 400k from models.dev.
- Download and cache .pdf, .docx, .xlsx, .pptx attachments locally
instead of passing expiring CDN URLs to the agent
- Inject .txt and .md content (≤100 KB) into event.text so the agent
sees file content without needing to fetch the URL
- Add 20 MB size guard and SUPPORTED_DOCUMENT_TYPES allowlist
- Fix: unsupported types (.zip etc.) no longer get MessageType.DOCUMENT
- Add 9 unit tests in test_discord_document_handling.py
Mirrors the Slack implementation from PR #784. Discord CDN URLs are
publicly accessible so no auth header is needed (unlike Slack).
Co-authored-by: Dilee <uzmpsk.dilekakbas@gmail.com>
- test_plugins.py: remove tests for unimplemented plugin command API
(get_plugin_command_handler, register_command never existed)
- test_redact.py: add autouse fixture to clear HERMES_REDACT_SECRETS
env var leaked by cli.py import in other tests
- test_signal.py: same HERMES_REDACT_SECRETS fix for phone redaction
- test_mattermost.py: add @bot_user_id to test messages after the
mention-only filter was added in #2443
- test_context_token_tracking.py: mock resolve_provider_client for
openai-codex provider that requires real OAuth credentials
Full suite: 5893 passed, 0 failed.
Based on PR #2454 by @kshitijk4poor (reimplemented lean — 127 lines
vs original 715).
Type @ in the CLI input to get autocomplete suggestions for context
references:
- Static: @diff, @staged, @file:, @folder:, @git:, @url:
- @file:path and @folder:path browse the filesystem
- Bare @ or @partial shows matching files/folders from cwd
Dropped from original: .hermesignore walking, custom shell tokenizer,
PathToken dataclass, fuzzy matching, token estimates. Kept: all
user-facing functionality.
Reads auxiliary.vision.timeout from config.yaml (default: 30s) and
passes it to async_call_llm. Useful for slow local vision models
that need more than 30 seconds.
Setting is in config.yaml (not .env) since it's not a secret:
auxiliary:
vision:
timeout: 120
Based on PR #2306.
Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
* fix: respect DashScope v1 runtime mode for alibaba
Remove the hardcoded Alibaba branch from resolve_runtime_provider()
that forced api_mode='anthropic_messages' regardless of the base URL.
Alibaba now goes through the generic API-key provider path, which
auto-detects the protocol from the URL:
- /apps/anthropic → anthropic_messages (via endswith check)
- /v1 → chat_completions (default)
This fixes Alibaba setup with OpenAI-compatible DashScope endpoints
(e.g. coding-intl.dashscope.aliyuncs.com/v1) that were broken because
runtime always forced Anthropic mode even when setup saved a /v1 URL.
Based on PR #2024 by @kshitijk4poor.
* docs(skill): add split, merge, search examples to ocr-and-documents skill
Adds pymupdf examples for PDF splitting, merging, and text search
to the existing ocr-and-documents skill. No new dependencies — pymupdf
already covers all three operations natively.
* fix: replace all production print() calls with logger in rl_training_tool
Replace all bare print() calls in production code paths with proper logger calls.
- Add `import logging` and module-level `logger = logging.getLogger(__name__)`
- Replace print() in _start_training_run() with logger.info()
- Replace print() in _stop_training_run() with logger.info()
- Replace print(Warning/Note) calls with logger.warning() and logger.info()
Using the logging framework allows log level filtering, proper formatting,
and log routing instead of always printing to stdout.
* fix(gateway): process /queue'd messages after agent completion
/queue stored messages in adapter._pending_messages but never consumed
them after normal (non-interrupted) completion. The consumption path
at line 5219 only checked pending messages when result.get('interrupted')
was True — since /queue deliberately doesn't interrupt, queued messages
were silently dropped.
Now checks adapter._pending_messages after both interrupted AND normal
completion. For queued messages (non-interrupt), the first response is
delivered before recursing to process the queued follow-up. Skips the
direct send when streaming already delivered the response.
Reported by GhostMode on Discord.
* chore: add minimax/minimax-m2.7 to OpenRouter and MiniMax model catalogs
---------
Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
Co-authored-by: memosr.eth <96793918+memosr@users.noreply.github.com>
Follow-up to 669c60a6 (cherry-pick of PR #2187, fixes#2177).
The original fix emits a "\n\n" delta immediately after every
_execute_tool_calls() invocation. When the model runs multiple
consecutive tool iterations before producing text (common with
search → read → analyze flows), each iteration appends its own
paragraph break, resulting in 4-6+ blank lines before the actual
response.
Replace the immediate delta with a deferred flag
(_stream_needs_break). _fire_stream_delta() checks the flag and
prepends a single "\n\n" only when the first real text delta
arrives, so multiple back-to-back tool iterations still produce
exactly one paragraph break.