The '📨 Queued:' label was misleading — it looked like the message was
silently deferred when it was actually being sent immediately after the
interrupt. Changed to '⚡ Sending after interrupt:' with multi-message
count when the user typed several messages during agent execution.
Added comment documenting that this code path only applies when
busy_input_mode == 'interrupt' (the default).
Based on PR #4821 by iRonin.
Co-authored-by: iRonin <iRonin@users.noreply.github.com>
The original PR missed the third CLI restore path — the /session switch
command that loads history via get_messages_as_conversation() without
stripping session_meta entries.
Add a provider-agnostic role allowlist guard to _sanitize_api_messages()
that drops messages with roles not accepted by the chat-completions API
(e.g. session_meta). This prevents CLI resume/session restore from
leaking transcript-only metadata into the outgoing messages payload.
Two layers of defense:
1. API-boundary guard: _sanitize_api_messages() now filters messages by
role allowlist (system/user/assistant/tool/function/developer) before
the existing orphaned tool-call repair logic. This protects all
current and future call paths.
2. CLI restore defense-in-depth: Both session restore paths in cli.py
now strip session_meta entries before loading history into
conversation_history, matching the existing gateway behavior.
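As a sketch, the API-boundary guard could look like this (the role set is from the description above; the function body and exact name are assumptions, not the actual implementation):

```python
# Hedged sketch of the role-allowlist guard; the real helper is
# _sanitize_api_messages() and also repairs orphaned tool calls.
ALLOWED_API_ROLES = {"system", "user", "assistant", "tool", "function", "developer"}

def sanitize_api_messages(messages):
    # Drop transcript-only entries (e.g. role == "session_meta") before
    # the payload reaches the chat-completions API.
    return [m for m in messages if m.get("role") in ALLOWED_API_ROLES]
```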
Closes #4715
gateway stop and restart previously called kill_gateway_processes() which
scans ps aux and kills ALL gateway processes across all profiles. Starting
a profile gateway would nuke the main one (and vice versa).
Now:
- hermes gateway stop → only kills the current profile's gateway (PID file)
- hermes -p work gateway stop → only kills the 'work' profile's gateway
- hermes gateway stop --all → kills every gateway process (old behavior)
- hermes gateway restart → profile-scoped for manual fallback path
- hermes update → discovers and restarts ALL profile gateways (systemctl
list-units hermes-gateway*) since the code update is shared
Added stop_profile_gateway() which uses the HERMES_HOME-scoped PID file
instead of global process scanning.
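A minimal sketch of the PID-file-scoped stop (the function name is from the message above; the PID-file filename and error handling are assumptions):

```python
# Hypothetical sketch: kill only this profile's gateway via its PID file,
# instead of scanning `ps aux` and killing every gateway process.
import os
import signal
from pathlib import Path

def stop_profile_gateway(hermes_home: Path) -> bool:
    pid_file = hermes_home / "gateway.pid"  # assumed filename
    if not pid_file.exists():
        return False
    try:
        pid = int(pid_file.read_text().strip())
        os.kill(pid, signal.SIGTERM)
    except (ValueError, ProcessLookupError):
        return False  # stale or corrupt PID file
    finally:
        pid_file.unlink(missing_ok=True)
    return True
```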
Windows (CRLF) and old Mac (CR) line endings are normalised to LF
before the 5-line collapse threshold is checked in handle_paste.
Without this, markdown copied from Windows sources contains \r\n but
the line counter (pasted_text.count('\n')) still works — however
buf.insert_text() leaves bare \r characters in the buffer which some
terminals render by moving the cursor to the start of the line,
making multi-line pastes appear as a single overwritten line.
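The normalisation itself is small; a sketch (the helper name is hypothetical, handle_paste is where it would be called):

```python
# Normalise CRLF (Windows) and bare CR (old Mac) to LF before the
# 5-line collapse threshold is checked.
def normalize_line_endings(pasted_text: str) -> str:
    # Order matters: collapse \r\n first so the second replace only
    # catches lone \r characters.
    return pasted_text.replace("\r\n", "\n").replace("\r", "\n")
```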
- Add .zip to SUPPORTED_DOCUMENT_TYPES so gateway platforms (Telegram,
Slack, Discord) cache uploaded zip files instead of rejecting them.
- Add get_cache_directory_mounts() and iter_cache_files() to
credential_files.py for host-side cache directory passthrough
(documents, images, audio, screenshots).
- Docker: bind-mount cache dirs read-only alongside credentials/skills.
Changes are live (bind mount semantics).
- Modal: mount cache files at sandbox creation + resync before each
command via _sync_files() with mtime+size change detection.
- Handles backward-compat with legacy dir names (document_cache,
image_cache, audio_cache, browser_screenshots) via get_hermes_dir().
- Container paths always use the new cache/<subdir> layout regardless
of host layout.
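The mtime+size change detection used for the Modal resync can be sketched as follows (all names here are hypothetical; only the detection criterion comes from the list above):

```python
# Sketch: return only files whose (mtime, size) signature changed since
# the last sync, so each command only re-uploads what actually changed.
from pathlib import Path

def changed_files(paths, seen):
    out = []
    for p in paths:
        st = p.stat()
        sig = (st.st_mtime, st.st_size)
        if seen.get(str(p)) != sig:
            seen[str(p)] = sig  # remember signature for the next sync
            out.append(p)
    return out
```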
This replaces the need for a dedicated extract_archive tool (PR #4819)
— the agent can now use standard terminal commands (unzip, tar) on
uploaded files inside remote containers.
Supersedes PR #4819 by kshitijk4poor.
Three fixes for memory+profile isolation bugs:
1. memory_tool.py: Replace module-level MEMORY_DIR constant with
get_memory_dir() function that calls get_hermes_home() dynamically.
The old constant was cached at import time and could go stale if
HERMES_HOME changed after import. Internal MemoryStore methods now
call get_memory_dir() directly. MEMORY_DIR kept as backward-compat
alias.
2. profiles.py: profile create --clone now copies MEMORY.md and USER.md
from the source profile. These curated memory files are part of the
agent's identity (same as SOUL.md) and should carry over on clone.
3. holographic plugin: initialize() now expands $HERMES_HOME and
${HERMES_HOME} in the db_path config value, so users can write
'db_path: $HERMES_HOME/memory_store.db' and it resolves to the
active profile directory, not the default home.
Tests updated to mock get_memory_dir() alongside the legacy MEMORY_DIR.
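The expansion in fix 3 can be sketched like this (a targeted replacement rather than full `os.path.expandvars`, so only HERMES_HOME is touched; the function name is an assumption):

```python
# Sketch: expand $HERMES_HOME and ${HERMES_HOME} in a configured db_path
# against the active profile's home, not the default one.
def expand_hermes_home(db_path: str, hermes_home: str) -> str:
    for token in ("${HERMES_HOME}", "$HERMES_HOME"):
        db_path = db_path.replace(token, hermes_home)
    return db_path
```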
uv.lock was stale at v0.5.0 and missing exa-py (core dep), causing
ModuleNotFoundError for Nix flake builds. Also syncs faster-whisper
placement (core → voice extra), adds feishu/debugpy/lark-oapi extras.
Fixes #4648
Credit to @lvnilesh for identifying the issue in PR #4649.
The pipeline prefetch design was firing `brv query` in a background
thread *after* each response, meaning the context injected at turn N
was from turn N-1's message — and the first turn got no BRV context
at all. Replace the async prefetch pipeline with a synchronous query
in `prefetch()` so recall runs before the first API call on every
turn. Make `queue_prefetch()` a no-op and remove the now-unused
pipeline state.
Memory provider tools (hindsight_retain, honcho_search, etc.) were
advertised to the model via tool schemas but failed with 'Unknown tool'
at execution time. The concurrent path (_invoke_tool) correctly checks
self._memory_manager.has_tool() before falling through to the registry,
but the sequential path (_execute_tool_calls_sequential) was never
updated with this check. Since sequential is the default for single
tool calls, memory provider tools always hit the registry dispatcher
which returns 'Unknown tool' because they're not registered there.
Add the memory_manager dispatch check between the delegate_task handler
and the quiet_mode fallthrough in the sequential path, with proper
spinner/display handling to match the existing pattern.
Reported by KiBenderOP — all memory providers affected (Honcho,
Hindsight, Holographic, etc.).
The API server adapter created AIAgent instances without passing
session_db, so conversations via Open WebUI and other OpenAI-compatible
frontends were never persisted to state.db. This meant 'hermes sessions
list' showed no API server sessions — they were effectively stateless.
Changes:
- Add _ensure_session_db() helper for lazy SessionDB initialization
- Pass session_db=self._ensure_session_db() in _create_agent()
- Refactor existing X-Hermes-Session-Id handler to use the shared helper
Sessions now persist with source='api_server' and are visible alongside
CLI and gateway sessions in hermes sessions list/search.
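The lazy-initialization helper can be sketched as follows (SessionDB here is a placeholder stand-in; only the helper name comes from the list above):

```python
# Sketch of _ensure_session_db(): create the session store on first use
# and reuse it, so every agent built by _create_agent() persists to the
# same state.db.
class SessionDB:
    """Placeholder for the real session store."""

class ApiServerAdapter:
    def __init__(self):
        self._session_db = None

    def _ensure_session_db(self):
        if self._session_db is None:
            self._session_db = SessionDB()
        return self._session_db
```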
Two fixes for Discord exec approval:
1. Register /approve and /deny as native Discord slash commands so they
appear in Discord's command picker (autocomplete). Previously they
were only handled as text commands, so users saw 'no commands found'
when typing /approve.
2. Wire up the existing ExecApprovalView button UI (was dead code):
- ExecApprovalView now calls resolve_gateway_approval() to actually
unblock the waiting agent thread when a button is clicked
- Gateway's _approval_notify_sync() detects adapters with
send_exec_approval() and routes through the button UI
- Added 'Allow Session' button for parity with /approve session
- send_exec_approval() now accepts session_key and metadata for
thread support
- Graceful fallback to text-based /approve prompt if button send fails
Also updates test mocks to include grey/secondary ButtonStyle and
purple Color (used by new button styles).
Three interconnected bugs caused `hermes skills config` per-platform
settings to be silently ignored:
1. telegram_menu_commands() never filtered disabled skills — all skills
consumed menu slots regardless of platform config, hitting Telegram's
100 command cap. Now loads disabled skills for 'telegram' and excludes
them from the menu.
2. Gateway skill dispatch executed disabled skills because
get_skill_commands() (process-global cache) only filters by the global
disabled list at scan time. Added per-platform check before execution,
returning an actionable 'skill is disabled' message.
3. get_disabled_skill_names() only checked HERMES_PLATFORM env var, but
the gateway sets HERMES_SESSION_PLATFORM instead. Added
HERMES_SESSION_PLATFORM as fallback, plus an explicit platform=
parameter for callers that know their platform (menu builder, gateway
dispatch). Also added platform to prompt_builder's skills cache key
so multi-platform gateways get correct per-platform skill prompts.
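The resolution order from point 3, as a sketch (the env var names are from the message above; the function itself is an assumption):

```python
# Sketch: explicit parameter wins, then the CLI env var, then the
# gateway's session env var as fallback.
import os

def resolve_platform(platform=None):
    if platform:  # callers that know their platform (menu builder, dispatch)
        return platform
    return (os.environ.get("HERMES_PLATFORM")
            or os.environ.get("HERMES_SESSION_PLATFORM"))  # gateway sets this
```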
Reported by SteveSkedasticity (CLAW community).
When the agent is blocked on a dangerous command approval (threading.Event
wait inside tools/approval.py), incoming /approve and /deny commands were
falling through to the generic interrupt path instead of being dispatched
to their command handlers. The interrupt sets _interrupt_requested on the
agent, but the agent thread is blocked on event.wait() — not checking the
flag. Result: approval times out after 300s (5 minutes) before executing.
Fix: intercept /approve and /deny in the running-agent early-intercept
block (alongside /stop, /new, /queue) and route directly to
_handle_approve_command / _handle_deny_command.
Addresses common questions from the Nous Research community Discord:
- Multi-model workflows via delegation config
- WhatsApp per-chat binding limitations and workarounds
- Controlling tool progress display on Telegram
- Per-platform skills config and Telegram 100-command limit
- Shared thread sessions across multiple users
- Exporting/migrating Hermes to a new machine
- Permission denied on shell reload after install
- HTTP 400 on first agent run
When config.yaml has 'mcp_servers:' with no value, YAML parses it as
None. dict.get('mcp_servers', {}) only returns the default when the key
is absent, not when it's explicitly None. Use 'or {}' pattern to handle
both cases, matching the other two assignment sites in the same file.
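The distinction the fix relies on, in two lines:

```python
# dict.get() only falls back when the key is ABSENT, not when YAML
# parsed 'mcp_servers:' (no value) as an explicit None.
config = {"mcp_servers": None}           # what 'mcp_servers:' parses to
broken = config.get("mcp_servers", {})   # default ignored -> still None
fixed = config.get("mcp_servers") or {}  # handles absent AND None
```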
Four fixes for MCP server stability issues reported by community member
(terminal lockup, zombie processes, escape sequence pollution, startup hang):
1. MCP reload timeout guard (cli.py): _check_config_mcp_changes now runs
_reload_mcp in a separate daemon thread with a 30s hard timeout. Previously,
a hung MCP server could block the process_loop thread indefinitely, freezing
the entire TUI (user can type but nothing happens, only Ctrl+D/Ctrl+\ work).
2. MCP stdio subprocess PID tracking (mcp_tool.py): Tracks child PIDs spawned
by stdio_client via before/after snapshots of /proc children. On shutdown,
_stop_mcp_loop force-kills any tracked PIDs that survived the SDK's graceful
SIGTERM→SIGKILL cleanup. Prevents zombie MCP server processes from
accumulating across sessions.
3. MCP event loop exception handler (mcp_tool.py): Installs
_mcp_loop_exception_handler on the MCP background event loop — same pattern
as the existing _suppress_closed_loop_errors on prompt_toolkit's loop.
Suppresses benign 'Event loop is closed' RuntimeError from httpx transport
__del__ during MCP shutdown. Salvaged from PR #2538 (acsezen).
4. MCP OAuth non-blocking (mcp_oauth.py): Replaces blocking input() call in
_wait_for_callback with OAuthNonInteractiveError raise. Adds _is_interactive()
TTY detection. In non-interactive environments, build_oauth_auth() still
returns a provider (cached tokens + refresh work), but the callback handler
raises immediately instead of blocking the MCP event loop for 120s. Re-raises
OAuth setup failures in _run_http so failed servers are reported cleanly
without blocking others. Salvaged from PRs #4521 (voidborne-d) and #4465
(heathley).
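Fix 1's hard-timeout pattern can be sketched as follows (a minimal stand-in for the daemon-thread wrapper; `reload_with_timeout` is a hypothetical name):

```python
# Sketch: run a possibly-hanging reload in a daemon thread and give up
# after a deadline instead of freezing the caller's thread.
import threading

def reload_with_timeout(reload_fn, timeout=30.0):
    t = threading.Thread(target=reload_fn, daemon=True)
    t.start()
    t.join(timeout)
    return not t.is_alive()  # False -> reload hung past the deadline
```

Because the thread is a daemon, a hung MCP server no longer keeps the process alive or blocks the TUI loop; the cost is that the hung reload is abandoned rather than cancelled.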
Closes #2537, closes #4462
Related: #4128, #3436
Three fixes for long-running gateway sessions that enter a death spiral
when API disconnects prevent token data collection, which prevents
compression, which causes more disconnects:
Layer 1 — Stale token counter fallback (run_agent.py in-loop):
When last_prompt_tokens is 0 (stale after API disconnect or provider
returned no usage data), fall back to estimate_messages_tokens_rough()
instead of passing 0 to should_compress(), which would never fire.
Layer 2 — Server disconnect heuristic (run_agent.py error handler):
When ReadError/RemoteProtocolError hits a large session (>60% context
or >200 messages), treat it as a context-length error and trigger
compression rather than burning through retries that all fail the
same way.
Layer 3 — Hard message count limit (gateway/run.py hygiene):
Force compression when a session exceeds 400 messages, regardless of
token estimates. This catches runaway growth even when all token-based
checks fail due to missing API data.
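Layer 1's fallback can be sketched like this (the ~4-characters-per-token heuristic is a common rough estimate and an assumption here, not taken from the source):

```python
# Sketch of the stale-counter fallback: never hand should_compress() a
# zero from missing usage data; estimate from message lengths instead.
def estimate_messages_tokens_rough(messages):
    return sum(len(str(m.get("content") or "")) for m in messages) // 4

def effective_prompt_tokens(last_prompt_tokens, messages):
    return last_prompt_tokens or estimate_messages_tokens_rough(messages)
```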
Based on the analysis from PR #2157 by ygd58 — the gateway threshold
direction fix (1.4x multiplier) was already resolved on main.
Anthropic returns HTTP 429 'Extra usage is required for long context
requests' when a Claude Max subscription doesn't include the 1M context
tier. This is NOT a transient rate limit — retrying won't help.
Only applies to Sonnet models (Opus 1M is general access). Detects
this specific error before the generic rate-limit handler and:
1. Reduces context_length from 1M to 200k (the standard tier)
2. Triggers context compression to fit
3. Retries with the reduced context
The reduction is session-scoped (not persisted) so it auto-recovers
if the user later enables extra usage on their subscription.
Fixes: Sonnet 4.6 instant rate limits on Claude Max without extra usage
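The detection and session-scoped reduction can be sketched as follows (the marker string is from the message above; the handler shape and field names are assumptions):

```python
# Sketch: recognise the non-transient long-context 429 before the
# generic rate-limit handler, then reduce and compress instead of retrying.
LONG_CONTEXT_MARKER = "Extra usage is required for long context"

def is_long_context_429(status, body):
    return status == 429 and LONG_CONTEXT_MARKER in body

def handle_long_context_429(session):
    # Session-scoped, not persisted: auto-recovers if the user later
    # enables extra usage on their subscription.
    session["context_length"] = 200_000
    session["needs_compression"] = True
```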
Follow-up for salvaged PR #2352:
- Replace hardcoded Path(os.getenv('HERMES_HOME', ...)) with
get_hermes_home() from hermes_constants (2 places)
- Consolidate redundant git_cmd_base into the existing git_cmd
variable, constructed once before fork detection
- Update autostash tests for the unmerged index check added
in the previous commit
- Detect if origin points to a fork (not NousResearch/hermes-agent)
- Show warning when updating from a fork: origin URL
- After pulling from origin/main on a fork:
- Prompt to add upstream remote if not present
- Respect ~/.hermes/.skip_upstream_prompt to avoid repeated prompts
- Compare origin/main with upstream/main
- If origin has commits not on upstream, skip (don't trample user's work)
- If upstream is ahead, pull from upstream and try to sync fork
- Use --force-with-lease for safe fork syncing
Non-main branches are unaffected: they just pull from origin/{branch}.
Co-authored-by: Avery <avery@hermes-agent.ai>
* fix(gateway): race condition, photo media loss, and flood control in Telegram
Three bugs causing intermittent silent drops, partial responses, and
flood control delays on the Telegram platform:
1. Race condition in handle_message() — _active_sessions was set inside
the background task, not before create_task(). Two rapid messages
could both pass the guard and spawn duplicate processing tasks.
Fix: set _active_sessions synchronously before spawning the task
(grammY sequentialize / aiogram EventIsolation pattern).
2. Photo media loss on dequeue — when a photo (no caption) was queued
during active processing and later dequeued, only .text was
extracted. Empty text → message silently dropped.
Fix: _build_media_placeholder() creates text context for media-only
events so they survive the dequeue path.
3. Progress message edits triggered Telegram flood control — rapid tool
calls edited the progress message every 0.3s, hitting Telegram's
rate limit (23s+ waits). This blocked progress updates and could
cause stream consumer timeouts.
Fix: throttle edits to 1.5s minimum interval, detect flood control
errors and gracefully degrade to new messages. edit_message() now
returns failure for flood waits >5s instead of blocking.
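Fix 3's throttle can be sketched like this (only the 1.5s interval comes from the list above; the class shape is an assumption):

```python
# Sketch: allow a progress-message edit only if the minimum interval
# has elapsed; callers batch updates that arrive inside the window.
import time

class EditThrottle:
    def __init__(self, min_interval=1.5):
        self.min_interval = min_interval
        self._last = 0.0

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        if now - self._last < self.min_interval:
            return False  # too soon: batch this update
        self._last = now
        return True
```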
* fix(gateway): downgrade empty/None response log from WARNING to DEBUG
This warning fires on every successful streamed response (streaming
delivers the text, handler returns None via already_sent=True) and
on every queued message during active processing. Both are expected
behavior, not error conditions. Downgrade to DEBUG to reduce log noise.
* fix(gateway): prevent stuck sessions with agent timeout and staleness eviction
Three changes to prevent sessions from getting permanently locked:
1. Agent execution timeout (HERMES_AGENT_TIMEOUT, default 10min):
Wraps run_in_executor with asyncio.wait_for so a hung API call or
runaway tool can't lock a session indefinitely. On timeout, the
agent is interrupted and the user gets an actionable error message.
2. Staleness eviction for _running_agents:
Tracks start timestamps for each session entry. When a new message
arrives and the entry is older than timeout + 1min grace, it's
evicted as a leaked lock. Safety net for any cleanup path that
fails to remove the entry.
3. Cron job timeout (HERMES_CRON_TIMEOUT, default 10min):
Wraps run_conversation in a ThreadPoolExecutor with timeout so a
hung cron job doesn't block the ticker thread (and all subsequent
cron jobs) indefinitely.
Follows grammY runner's per-update timeout pattern and aiogram's
asyncio.wait_for approach for handler deadlines.
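Change 1's timeout wrap can be sketched as follows (the env var and default are from the list above; the wrapper function and its error string are assumptions):

```python
# Sketch: bound the blocking agent call with asyncio.wait_for so a hung
# API call or runaway tool can't lock the session indefinitely.
import asyncio
import os

AGENT_TIMEOUT = float(os.environ.get("HERMES_AGENT_TIMEOUT", 600))

async def run_agent_with_timeout(loop, run_agent_fn, timeout=AGENT_TIMEOUT):
    try:
        return await asyncio.wait_for(
            loop.run_in_executor(None, run_agent_fn), timeout=timeout
        )
    except asyncio.TimeoutError:
        # In the real fix the agent is also interrupted here.
        return "Agent timed out; session released."
```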
* fix(gateway): STT config resolution, stream consumer flood control fallback
Three targeted fixes from user-reported issues:
1. STT config resolution (transcription_tools.py):
_has_openai_audio_backend() and _resolve_openai_audio_client_config()
now check stt.openai.api_key/base_url in config.yaml FIRST, before
falling back to env vars. Fixes voice transcription breaking when
using a custom OpenAI-compatible endpoint via config.yaml.
2. Stream consumer flood control fallback (stream_consumer.py):
When an edit fails mid-stream (e.g., Telegram flood control returns
failure for waits >5s), reset _already_sent to False so the normal
final send path delivers the complete response. Previously, a
truncated partial was left as the final message.
3. Telegram edit_message comment alignment (telegram.py):
Clarify that long flood waits return failure so streaming can fall
back to a normal final send.
* refactor: simplify and harden PR fixes after review
- Fix cron ThreadPoolExecutor blocking on timeout: use shutdown(wait=False,
cancel_futures=True) instead of context manager that waits indefinitely
- Extract _dequeue_pending_text() to deduplicate media-placeholder logic
in interrupt and normal-completion dequeue paths
- Remove hasattr guards for _running_agents_ts: add class-level default
so partial test construction works without scattered defensive checks
- Move `import concurrent.futures` to top of cron/scheduler.py
- Progress throttle: sleep remaining interval instead of busy-looping
0.1s (~15 wakeups per 1.5s window → 1 wakeup)
- Deduplicate _load_stt_config() in transcription_tools.py:
_has_openai_audio_backend() now delegates to _resolve_openai_audio_client_config()
* fix: move class-level attribute after docstring, clarify throttle comment
Follow-up nits for salvaged PR #4577:
- Move _running_agents_ts class attribute below the docstring so
GatewayRunner.__doc__ is preserved.
- Add clarifying comment explaining the throttle continue behavior
(batches queued messages during the throttle interval).
* fix(update): handle conflicted git index during hermes update
When the git index has unmerged entries (e.g. from an interrupted
merge or rebase), git stash fails with 'needs merge / could not
write index'. Detect this with git ls-files --unmerged and clear
the conflict state with git reset before attempting the stash.
Working-tree changes are preserved.
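The guard can be sketched as follows (the two git commands are from the message above; the Python wrapper is an assumption):

```python
# Sketch: detect unmerged index entries that would make `git stash`
# fail with 'needs merge', and clear them with `git reset` (which keeps
# working-tree changes).
import subprocess

def clear_unmerged_index(repo):
    out = subprocess.run(
        ["git", "-C", repo, "ls-files", "--unmerged"],
        capture_output=True, text=True,
    ).stdout
    if not out.strip():
        return False  # clean index, nothing to do
    subprocess.run(["git", "-C", repo, "reset"], check=True)
    return True
```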
Reported by @LLMJunky — package-lock.json conflict from a prior
merge left the index dirty, blocking hermes update entirely.
---------
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
launchd_restart() already prints stop/start confirmation via its
internal helpers — the extra 'Gateway restarted via launchd' line
was redundant. Update test assertion to match.
Previously, `hermes gateway install --system` hard-refused to create a
service running as root, even when explicitly requested via
`--run-as-user root`. This forced LXC/container users (where root is
the only user) to either create throwaway users or comment out the check
in source.
Changes:
- Auto-detected root (no explicit --run-as-user) still raises, but with
a message explaining how to override
- Explicit `--run-as-user root` now allowed with a warning about
security implications
- Interactive setup wizard prompt accepts 'root' as a valid username
(warning comes from _system_service_identity downstream)
- Added tests for all three paths: auto-detected root rejection,
explicit root allowance, and normal non-root passthrough
When /history is used in an empty chat or /resume with no argument,
show an inline table of recent resumable sessions with title, preview,
relative timestamp, and session ID instead of a dead-end message.
Table formatting matches the existing hermes sessions list style
(column headers + thin separators, no box drawing).
Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
The PR changed prev_tools from list[str] to list[dict] with name/result
keys. The gateway's _step_callback_sync passed this directly to hooks
as 'tool_names', breaking user-authored hooks that call
', '.join(tool_names).
Now:
- 'tool_names' always contains strings (backward-compatible)
- 'tools' carries the enriched dicts for hooks that want results
Also adds summary logging to register_mcp_servers() and comprehensive
tests for all three PR changes:
- sanitize_mcp_name_component edge cases
- register_mcp_servers public API
- _register_session_mcp_servers ACP integration
- step_callback result forwarding
- gateway normalization backward compat
The step_callback previously only forwarded tool names as strings,
so build_tool_complete received result=None and ACP tool_call_update
events had empty content/rawOutput. Now prev_tools carries dicts with
both name and result by pairing each tool_call with its matching
tool-role message via tool_call_id.
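The pairing can be sketched as follows (message shapes follow the OpenAI-style schema the log implies; the function name is hypothetical):

```python
# Sketch: pair each assistant tool_call with its tool-role result
# message via tool_call_id, yielding {name, result} dicts for prev_tools.
def enrich_prev_tools(assistant_msg, messages):
    results = {
        m.get("tool_call_id"): m.get("content")
        for m in messages
        if m.get("role") == "tool"
    }
    return [
        {"name": tc["function"]["name"], "result": results.get(tc["id"])}
        for tc in assistant_msg.get("tool_calls", [])
    ]
```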