The gateway created a fresh AIAgent per message, rebuilding the system
prompt (including memory, skills, context files) every turn. This broke
prompt prefix caching — providers like Anthropic charge ~10x more for
uncached prefixes.
Now caches AIAgent instances per session_key with a config signature.
The cached agent is reused across messages in the same session,
preserving the frozen system prompt and tool schemas. Cache is
invalidated when:
- Config changes (model, provider, toolsets, reasoning, ephemeral
prompt) — detected via signature mismatch
- /new, /reset, /clear — explicit session reset
- /model — global model change clears all cached agents
- /reasoning — global reasoning change clears all cached agents
Per-message state (callbacks, stream consumers, progress queues) is
set on the agent instance before each run_conversation() call.
This matches CLI behavior where a single AIAgent lives across all turns
in a session, with _cached_system_prompt built once and reused.
The /title command would fail with 'Session not found in database.' when
used as the first command in a new session. This happened because:
1. Gateway creates session in session_store (in-memory)
2. But SQLite _session_db only gets sessions when agent flushes messages
3. set_session_title() does UPDATE which fails if row doesn't exist
Now we check if session exists in SQLite and create it if needed before
attempting to set the title.
Fixes: Session not found in database. error on /title in new chats
When streaming is enabled, text chunks are sent to the user in
real-time including raw MEDIA: tags. The normal post-processing in
_process_message_background is skipped when already_sent=True, so
MEDIA: files were never extracted or delivered — the user just saw
the raw MEDIA:/path/to/file text.
Fix: after streaming completes, extract MEDIA: tags and local file
paths from the response and deliver them via the platform adapter.
The text is already sent (with the raw tag visible in the stream),
but the actual files now get delivered as attachments.
Add @file:path, @folder:dir, @diff, @staged, @git:N, and @url:
references that expand inline before the message reaches the LLM.
Supports line ranges (@file:main.py:10-50), token budget enforcement
(soft warn at 25%, hard block at 50%), and path sandboxing for gateway.
Core module from PR #2090 by @kshitijk4poor. CLI and gateway wiring
rewritten against current main. Fixed asyncio.run() crash when called
from inside a running event loop (gateway).
Closes#682.
When streaming is enabled, the base adapter receives None from
_handle_message (already_sent=True) and cannot run auto-TTS for
voice input. The runner was unconditionally skipping voice input
TTS assuming the base adapter would handle it.
Now the runner takes over TTS responsibility when streaming has
already delivered the text response, so voice channel playback
works with both streaming on and off.
Streaming off behavior is unchanged (default already_sent=False
preserves the original code path exactly).
Co-authored-by: 0xbyt4 <35742124+0xbyt4@users.noreply.github.com>
Mirrors the CLI fix for the gateway /model handler. When the user is on
a custom provider (provider=custom, localhost, or 127.0.0.1 endpoint),
/model <name> no longer tries to auto-detect a provider switch.
Previously, typing /model openrouter/nvidia/nemotron:free on Telegram
while on a localhost endpoint would silently accept the model name on
the local server — auto-detection failed to match the free model, so
the provider stayed as custom with the localhost base_url. The user saw
'Model changed' but requests still went to localhost, which doesn't
serve that model.
Now shows the endpoint URL and provider:model syntax tip, matching
the CLI behavior.
Adds /queue <prompt> (alias /q) that queues a message for the next
turn while the agent is busy, without interrupting the current run.
- CLI: /queue <prompt> puts it in _pending_input for the next turn
- Gateway: /queue <prompt> creates a pending MessageEvent on the
adapter, picked up after the current agent run finishes
- Enter still interrupts as usual (no behavior change)
- /queue with no prompt shows usage
- /queue when agent is idle tells user to just type normally
Co-authored-by: Test <test@test.com>
Two fixes for Telegram/gateway-specific bugs:
1. Anthropic adapter: strip orphaned tool_result blocks (mirror of
existing tool_use stripping). Context compression or session
truncation can remove an assistant message containing a tool_use
while leaving the subsequent tool_result intact. Anthropic rejects
these with a 400: 'unexpected tool_use_id found in tool_result
blocks'. The adapter now collects all tool_use IDs and filters out
any tool_result blocks referencing IDs not in that set.
2. Gateway: /reset and /new now bypass the running-agent guard (like
/status already does). Previously, sending /reset while an agent
was running caused the raw text to be queued and later fed back as
a user message with the same broken history — replaying the
corrupted session instead of resetting it. Now the running agent is
interrupted, pending messages are cleared, and the reset command
dispatches immediately.
Tests updated: existing tests now include proper tool_use→tool_result
pairs; two new tests cover orphaned tool_result stripping.
Co-authored-by: Test <test@test.com>
* feat: context pressure warnings for CLI and gateway
User-facing notifications as context approaches the compaction threshold.
Warnings fire at 60% and 85% of the way to compaction — relative to
the configured compression threshold, not the raw context window.
CLI: Formatted line with a progress bar showing distance to compaction.
Cyan at 60% (approaching), bold yellow at 85% (imminent).
◐ context ▰▰▰▰▰▰▰▰▰▰▰▰▱▱▱▱▱▱▱▱ 60% to compaction 100k threshold (50%) · approaching compaction
⚠ context ▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▱▱▱ 85% to compaction 100k threshold (50%) · compaction imminent
Gateway: Plain-text notification sent to the user's chat via the new
status_callback mechanism (asyncio.run_coroutine_threadsafe bridge,
same pattern as step_callback).
Does NOT inject into the message stream. The LLM never sees these
warnings. Flags reset after each compaction cycle.
Files changed:
- agent/display.py — format_context_pressure(), format_context_pressure_gateway()
- run_agent.py — status_callback param, _context_50/70_warned flags,
_emit_context_pressure(), flag reset in _compress_context()
- gateway/run.py — _status_callback_sync bridge, wired to AIAgent
- tests/test_context_pressure.py — 23 tests
* Merge remote-tracking branch 'origin/main' into hermes/hermes-7ea545bf
---------
Co-authored-by: Test <test@test.com>
Place a sentinel in _running_agents immediately after the "already
running" guard check passes — before any await. Without this, the
numerous await points between the guard (line 1324) and agent
registration (track_agent at line 4790) create a window where a
second message for the same session can bypass the guard and start
a duplicate agent, corrupting the transcript.
The await gap includes: hook emissions, vision enrichment (external
API call), audio transcription (external API call), session hygiene
compression, and the run_in_executor call itself. For messages with
media attachments the window can be several seconds wide.
The sentinel is wrapped in try/finally so it is always cleaned up —
even if the handler raises or takes an early-return path. When the
real AIAgent is created, track_agent() overwrites the sentinel with
the actual instance (preserving interrupt support).
Also handles the edge case where a message arrives while the sentinel
is set but no real agent exists yet: the message is queued via the
adapter's pending-message mechanism instead of attempting to call
interrupt() on the sentinel object.
Two issues with /model preventing proper provider switching:
1. Bare provider names not detected: typing '/model nous' treated 'nous'
as a model name instead of triggering a provider switch. Fixed by adding
step 0 in detect_provider_for_model() that checks if the input matches
a known provider name/alias (excluding 'custom'/'openrouter' which need
explicit model names) and returns that provider's default model.
2. Custom endpoint details hidden: /model (no args) showed '[custom]' with
just a usage hint but no endpoint URL or model name. Now displays the
configured base_url for custom providers in both CLI and gateway.
Note: config base_url and OPENAI_BASE_URL are intentionally NOT cleared on
provider switch — dedicated provider paths (nous, anthropic, codex) have
their own credential resolution that ignores these, and clearing them would
destroy the user's custom endpoint config, preventing switching back.
Co-authored-by: Test <test@test.com>
The gateway approval system previously intercepted bare 'yes'/'no' text
from the user's next message to approve/deny dangerous commands. This was
fragile and dangerous — if the agent asked a clarify question and the user
said 'yes' to answer it, the gateway would execute the pending dangerous
command instead. (Fixes#1888)
Changes:
- Remove bare text matching ('yes', 'y', 'approve', 'ok', etc.) from
_handle_message approval check
- Add /approve and /deny as gateway-only slash commands in the command
registry
- /approve supports scoping: /approve (one-time), /approve session,
/approve always (permanent)
- Add 5-minute timeout for stale approvals
- Gateway appends structured instructions to the agent response when a
dangerous command is pending, telling the user exactly how to respond
- 9 tests covering approve, deny, timeout, scoping, and verification
that bare 'yes' no longer triggers execution
Credit to @solo386 and @FlyByNight69420 for identifying and reporting
this security issue in PR #1971 and issue #1888.
Co-authored-by: Test <test@test.com>
Add unauthorized_dm_behavior config (pair|ignore) with global default
and per-platform override. WhatsApp can silently drop unknown DMs
instead of sending pairing codes.
Adapted config bridging to work with gw_data dict (pre-construction)
rather than config object. Dropped implementation plan document.
Co-authored-by: Frederico Ribeiro <fr@tecompanytea.com>
Only mention the hermes-agent-setup skill in STT failure notes (both
the direct user message and the agent context note) when the skill is
actually installed. Uses _find_skill() from skill_manager_tool.
Also confirmed: STT is the only user-facing failure case where the
setup skill hint helps. Vision failures are transient API issues,
runtime transcription errors indicate a configured-but-broken provider,
and platform startup warnings are server logs.
Skill now covers full CLI usage (hermes setup, hermes skills, hermes
tools, hermes config, session management, etc.), config file reference,
and expanded gateway commands.
Agent context notes for STT failure now mention the hermes-agent-setup
skill is available to help users configure Hermes features.
When a user sends a voice message and STT isn't configured, the gateway
now sends a clear message directly to the user explaining how to set up
voice transcription, rather than relying on the agent to relay an
injected context note (which often gets misinterpreted).
Also adds a hermes-agent-setup bundled skill covering STT/TTS setup,
tool configuration, dependency installation, and troubleshooting.
Add first-class GitHub Copilot and Copilot ACP provider support across
model selection, runtime provider resolution, CLI sessions, delegated
subagents, cron jobs, and the Telegram gateway.
This also normalizes Copilot model catalogs and API modes, introduces a
Copilot ACP OpenAI-compatible shim, and fixes service-mode auth by
resolving Homebrew-installed gh binaries under launchd.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Distinguish plan usage limits (429 with usage_limit_reached) from transient rate limits
- Show approximate reset time in hours for plan limits
- Treat HTTP 500 with large sessions as context overflow (same as 400)
- Move history length check earlier for reuse across status codes
* feat: OpenAI-compatible API server platform adapter
Salvaged from PR #956, updated for current main.
Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.
Endpoints:
- POST /v1/chat/completions — stateless Chat Completions API
- POST /v1/responses — stateful Responses API with chaining
- GET /v1/responses/{id} — retrieve stored response
- DELETE /v1/responses/{id} — delete stored response
- GET /v1/models — list hermes-agent as available model
- GET /health — health check
Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses
Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()
Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)
Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference
* feat(whatsapp): make reply prefix configurable via config.yaml
Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.
The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:
whatsapp:
reply_prefix: '' # disable header
reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix
How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix
Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.
Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.
---------
Co-authored-by: Test <test@test.com>
- Add summary_base_url config option to compression block for custom
OpenAI-compatible endpoints (e.g. zai, DeepSeek, Ollama)
- Remove compression env var bridges from cli.py and gateway/run.py
(CONTEXT_COMPRESSION_* env vars no longer set from config)
- Switch run_agent.py to read compression config directly from
config.yaml instead of env vars
- Fix backwards-compat block in _resolve_task_provider_model to also
fire when auxiliary.compression.provider is 'auto' (DEFAULT_CONFIG
sets this, which was silently preventing the compression section's
summary_* keys from being read)
- Add test for summary_base_url config-to-client flow
- Update docs to show compression as config.yaml-only
Closes#1591
Based on PR #1702 by @uzaylisak
Based on PR #1432 by @bayrakdarerdem. session:start was already on main; this adds the session:end event.
Co-authored-by: bayrakdarerdem <bayrakdarerdem@users.noreply.github.com>
After the first user→assistant exchange, Hermes now generates a short
descriptive session title via the auxiliary LLM (compression task config).
Title generation runs in a background thread so it never delays the
user-facing response.
Key behaviors:
- Fires only on the first 1-2 exchanges (checks user message count)
- Skips if a title already exists (user-set titles are never overwritten)
- Uses call_llm with compression task config (cheapest/fastest model)
- Truncates long messages to keep the title generation request small
- Cleans up LLM output: strips quotes, 'Title:' prefixes, enforces 80 char max
- Works in both CLI and gateway (Telegram/Discord/etc.)
Also updates /title (no args) to show the session ID alongside the title
in both CLI and gateway.
Implements #1426
Salvaged from PR #1573 by @eren-karakus0. Cherry-picked with authorship preserved.
Fixes#1143 — background process notifications resume after gateway restart.
Co-authored-by: Muhammet Eren Karakuş <erenkar950@gmail.com>
The privacy.redact_pii config reader on line 1546 used bare 'yaml'
which is not in scope — yaml is imported as '_yaml' at module level
(line 93) and as '_y' in other methods. The NameError was silently
caught by the try/except, so PII redaction never activated even when
configured.
Add a local 'import yaml as _pii_yaml' consistent with the pattern
used elsewhere in the file.
Line 1482 referenced 'session_key' which is not defined until line 1519,
causing a NameError on every skill slash command invocation in the gateway
(e.g. /deploy, /plan-with-skill). The try/except silently swallowed the
error, making all user-defined skill slash commands silently fail.
The correct variable is '_quick_key', defined at line 1292 (same variable
used by the /plan handler on line 1379).
Add support for Mattermost (self-hosted Slack alternative) and Matrix
(federated messaging protocol) as messaging platforms.
Mattermost adapter:
- REST API v4 client for posts, files, channels, typing indicators
- WebSocket listener for real-time 'posted' events with reconnect backoff
- Thread support via root_id
- File upload/download with auth-aware caching
- Dedup cache (5min TTL, 2000 entries)
- Full self-hosted instance support
Matrix adapter:
- matrix-nio AsyncClient with sync loop
- Dual auth: access token or user_id + password
- Optional E2EE via matrix-nio[e2e] (libolm)
- Thread support via m.thread (MSC3440)
- Reply support via m.in_reply_to with fallback stripping
- Media upload/download via mxc:// URLs (authenticated v1.11+ endpoint)
- Auto-join on room invite
- DM detection via m.direct account data with sync fallback
- Markdown to HTML conversion
Fixes applied over original PR #1225 by @cyb0rgk1tty:
- Mattermost: add timeout to file downloads, wrap API helpers in
try/except for network errors, download incoming files immediately
with auth headers instead of passing auth-required URLs
- Matrix: use authenticated media endpoint (/_matrix/client/v1/media/),
robust m.direct cache with sync fallback, prefer aiohttp over httpx
Install Matrix support: pip install 'hermes-agent[matrix]'
Mattermost needs no extra deps (uses aiohttp).
Salvaged from PR #1225 by @cyb0rgk1tty with fixes.
Add DingTalk as a messaging platform using the dingtalk-stream SDK
for real-time message reception via Stream Mode (no webhook needed).
Replies are sent via session webhook using markdown format.
Features:
- Stream Mode connection (long-lived WebSocket, no public URL needed)
- Text and rich text message support
- DM and group chat support
- Message deduplication with 5-minute window
- Auto-reconnection with exponential backoff
- Session webhook caching for reply routing
Configuration:
export DINGTALK_CLIENT_ID=your-app-key
export DINGTALK_CLIENT_SECRET=your-app-secret
# or in config.yaml:
platforms:
dingtalk:
enabled: true
extra:
client_id: your-app-key
client_secret: your-app-secret
Files:
- gateway/platforms/dingtalk.py (340 lines) — adapter implementation
- gateway/config.py — add DINGTALK to Platform enum
- gateway/run.py — add DingTalk to _create_adapter
- hermes_cli/config.py — add env vars to _EXTRA_ENV_KEYS
- hermes_cli/tools_config.py — add dingtalk to PLATFORMS
- tests/gateway/test_dingtalk.py — 21 tests
* fix: thread safety for concurrent subagent delegation
Four thread-safety fixes that prevent crashes and data races when
running multiple subagents concurrently via delegate_task:
1. Remove redirect_stdout/stderr from delegate_tool — mutating global
sys.stdout races with the spinner thread when multiple children start
concurrently, causing segfaults. Children already run with
quiet_mode=True so the redirect was redundant.
2. Split _run_single_child into _build_child_agent (main thread) +
_run_single_child (worker thread). AIAgent construction creates
httpx/SSL clients which are not thread-safe to initialize
concurrently.
3. Add threading.Lock to SessionDB — subagents share the parent's
SessionDB and call create_session/append_message from worker threads
with no synchronization.
4. Add _active_children_lock to AIAgent — interrupt() iterates
_active_children while worker threads append/remove children.
5. Add _client_cache_lock to auxiliary_client — multiple subagent
threads may resolve clients concurrently via call_llm().
Based on PR #1471 by peteromallet.
* feat: Honcho base_url override via config.yaml + quick command alias type
Two features salvaged from PR #1576:
1. Honcho base_url override: allows pointing Hermes at a remote
self-hosted Honcho deployment via config.yaml:
honcho:
base_url: "http://192.168.x.x:8000"
When set, this overrides the Honcho SDK's environment mapping
(production/local), enabling LAN/VPN Honcho deployments without
requiring the server to live on localhost. Uses config.yaml instead
of env var (HONCHO_URL) per project convention.
2. Quick command alias type: adds a new 'alias' quick command type
that rewrites to another slash command before normal dispatch:
quick_commands:
sc:
type: alias
target: /context
Supports both CLI and gateway. Arguments are forwarded to the
target command.
Based on PR #1576 by redhelix.
---------
Co-authored-by: peteromallet <peteromallet@users.noreply.github.com>
Co-authored-by: redhelix <redhelix@users.noreply.github.com>
Docker terminal sessions are secret-dark by default. This adds
terminal.docker_forward_env as an explicit allowlist for env vars
that may be forwarded into Docker containers.
Values resolve from the current shell first, then fall back to
~/.hermes/.env. Only variables the user explicitly lists are
forwarded — nothing is auto-exposed.
Cherry-picked from PR #1449 by @teknium1, conflict-resolved onto
current main.
Fixes#1436
Supersedes #1439
* fix: prevent infinite 400 failure loop on context overflow (#1630)
When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message. This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error. Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.
Three-layer fix:
1. run_agent.py — Fallback heuristic: when a 400 error has a very short
generic message AND the session is large (>40% of context or >80
messages), treat it as a probable context overflow and trigger
compression instead of aborting.
2. run_agent.py + gateway/run.py — Don't persist failed messages:
when the agent returns failed=True before generating any response,
skip writing the user's message to the transcript/DB. This prevents
the session from growing on each failure.
3. gateway/run.py — Smarter error messages: detect context-overflow
failures and suggest /compact or /reset specifically, instead of a
generic 'try again' that will fail identically.
* fix(skills): detect prompt injection patterns and block cache file reads
Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):
1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
(index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
was the original injection vector — untrusted skill descriptions
in the catalog contained adversarial text that the model executed.
2. skill_view: warns when skills are loaded from outside the trusted
~/.hermes/skills/ directory, and detects common injection patterns
in skill content ("ignore previous instructions", "<system>", etc.).
Cherry-picked from PR #1562 by ygd58.
* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)
Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.
- Apply truncate_message() chunking in _send_to_platform() before
dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement
Cherry-picked from PR #1557 by llbn.
* fix(approval): show full command in dangerous command approval (#1553)
Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:
- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests
Cherry-picked from PR #1566 by crazywriter1.
* fix(cli): flush stdout during agent loop to prevent macOS display freeze (#1624)
The interrupt polling loop in chat() waited on the queue without
invalidating the prompt_toolkit renderer. On macOS, the StdoutProxy
buffer only flushed on input events, causing the CLI to appear frozen
during tool execution until the user typed a key.
Fix: call _invalidate() on each queue timeout (every ~100ms, throttled
to 150ms) to force the renderer to flush buffered agent output.
* fix(claw): warn when API keys are skipped during OpenClaw migration (#1580)
When --migrate-secrets is not passed (the default), API keys like
OPENROUTER_API_KEY are silently skipped with no warning. Users don't
realize their keys weren't migrated until the agent fails to connect.
Add a post-migration warning with actionable instructions: either
re-run with --migrate-secrets or add the key manually via
hermes config set.
Cherry-picked from PR #1593 by ygd58.
* fix(security): block sandbox backend creds from subprocess env (#1264)
Add Modal and Daytona sandbox credentials to the subprocess env
blocklist so they're not leaked to agent terminal sessions via
printenv/env.
Cherry-picked from PR #1571 by ygd58.
* fix(gateway): cap interrupt recursion depth to prevent resource exhaustion (#816)
When a user sends multiple messages while the agent keeps failing,
_run_agent() calls itself recursively with no depth limit. This can
exhaust stack/memory if the agent is in a failure loop.
Add _MAX_INTERRUPT_DEPTH = 3. When exceeded, the pending message is
logged and the current result is returned instead of recursing deeper.
The log handler duplication bug described in #816 was already fixed
separately (AIAgent.__init__ deduplicates handlers).
* fix(gateway): /model shows active fallback model instead of config default (#1615)
When the agent falls back to a different model (e.g. due to rate
limiting), /model still showed the config default. Now tracks the
effective model/provider after each agent run and displays it.
Cleared when the primary model succeeds again or the user explicitly
switches via /model.
Cherry-picked from PR #1616 by MaxKerkula. Added hasattr guard for
test compatibility.
* feat(gateway): inject reply-to message context for out-of-session replies (#1594)
When a user replies to a Telegram message, check if the quoted text
exists in the current session transcript. If missing (from cron jobs,
background tasks, or old sessions), prepend [Replying to: "..."] to
the message so the agent has context about what's being referenced.
- Add reply_to_text field to MessageEvent (base.py)
- Populate from Telegram's reply_to_message (text or caption)
- Inject context in _handle_message when not found in history
Based on PR #1596 by anpicasso (cherry-picked reply-to feature only,
excluded unrelated /server command and background delegation changes).
---------
Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
Co-authored-by: Max K <MaxKerkula@users.noreply.github.com>
Co-authored-by: Angello Picasso <angello.picasso@devsu.com>
* fix: prevent infinite 400 failure loop on context overflow (#1630)
When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message. This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error. Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.
Three-layer fix:
1. run_agent.py — Fallback heuristic: when a 400 error has a very short
generic message AND the session is large (>40% of context or >80
messages), treat it as a probable context overflow and trigger
compression instead of aborting.
2. run_agent.py + gateway/run.py — Don't persist failed messages:
when the agent returns failed=True before generating any response,
skip writing the user's message to the transcript/DB. This prevents
the session from growing on each failure.
3. gateway/run.py — Smarter error messages: detect context-overflow
failures and suggest /compact or /reset specifically, instead of a
generic 'try again' that will fail identically.
* fix(skills): detect prompt injection patterns and block cache file reads
Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):
1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
(index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
was the original injection vector — untrusted skill descriptions
in the catalog contained adversarial text that the model executed.
2. skill_view: warns when skills are loaded from outside the trusted
~/.hermes/skills/ directory, and detects common injection patterns
in skill content ("ignore previous instructions", "<system>", etc.).
Cherry-picked from PR #1562 by ygd58.
* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)
Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.
- Apply truncate_message() chunking in _send_to_platform() before
dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement
Cherry-picked from PR #1557 by llbn.
* fix(approval): show full command in dangerous command approval (#1553)
Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:
- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests
Cherry-picked from PR #1566 by crazywriter1.
* fix(cli): flush stdout during agent loop to prevent macOS display freeze (#1624)
The interrupt polling loop in chat() waited on the queue without
invalidating the prompt_toolkit renderer. On macOS, the StdoutProxy
buffer only flushed on input events, causing the CLI to appear frozen
during tool execution until the user typed a key.
Fix: call _invalidate() on each queue timeout (every ~100ms, throttled
to 150ms) to force the renderer to flush buffered agent output.
* fix(claw): warn when API keys are skipped during OpenClaw migration (#1580)
When --migrate-secrets is not passed (the default), API keys like
OPENROUTER_API_KEY are silently skipped with no warning. Users don't
realize their keys weren't migrated until the agent fails to connect.
Add a post-migration warning with actionable instructions: either
re-run with --migrate-secrets or add the key manually via
hermes config set.
Cherry-picked from PR #1593 by ygd58.
* fix(security): block sandbox backend creds from subprocess env (#1264)
Add Modal and Daytona sandbox credentials to the subprocess env
blocklist so they're not leaked to agent terminal sessions via
printenv/env.
Cherry-picked from PR #1571 by ygd58.
* fix(gateway): cap interrupt recursion depth to prevent resource exhaustion (#816)
When a user sends multiple messages while the agent keeps failing,
_run_agent() calls itself recursively with no depth limit. This can
exhaust stack/memory if the agent is in a failure loop.
Add _MAX_INTERRUPT_DEPTH = 3. When exceeded, the pending message is
logged and the current result is returned instead of recursing deeper.
The log handler duplication bug described in #816 was already fixed
separately (AIAgent.__init__ deduplicates handlers).
* fix(gateway): /model shows active fallback model instead of config default (#1615)
When the agent falls back to a different model (e.g. due to rate
limiting), /model still showed the config default. Now tracks the
effective model/provider after each agent run and displays it.
Cleared when the primary model succeeds again or the user explicitly
switches via /model.
Cherry-picked from PR #1616 by MaxKerkula. Added hasattr guard for
test compatibility.
---------
Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
Co-authored-by: Max K <MaxKerkula@users.noreply.github.com>
* fix: prevent infinite 400 failure loop on context overflow (#1630)
When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message. This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error. Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.
Three-layer fix:
1. run_agent.py — Fallback heuristic: when a 400 error has a very short
generic message AND the session is large (>40% of context or >80
messages), treat it as a probable context overflow and trigger
compression instead of aborting.
2. run_agent.py + gateway/run.py — Don't persist failed messages:
when the agent returns failed=True before generating any response,
skip writing the user's message to the transcript/DB. This prevents
the session from growing on each failure.
3. gateway/run.py — Smarter error messages: detect context-overflow
failures and suggest /compact or /reset specifically, instead of a
generic 'try again' that will fail identically.
* fix(skills): detect prompt injection patterns and block cache file reads
Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):
1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
(index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
was the original injection vector — untrusted skill descriptions
in the catalog contained adversarial text that the model executed.
2. skill_view: warns when skills are loaded from outside the trusted
~/.hermes/skills/ directory, and detects common injection patterns
in skill content ("ignore previous instructions", "<system>", etc.).
Cherry-picked from PR #1562 by ygd58.
* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)
Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.
- Apply truncate_message() chunking in _send_to_platform() before
dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement
Cherry-picked from PR #1557 by llbn.
* fix(approval): show full command in dangerous command approval (#1553)
Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:
- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests
Cherry-picked from PR #1566 by crazywriter1.
* fix(cli): flush stdout during agent loop to prevent macOS display freeze (#1624)
The interrupt polling loop in chat() waited on the queue without
invalidating the prompt_toolkit renderer. On macOS, the StdoutProxy
buffer only flushed on input events, causing the CLI to appear frozen
during tool execution until the user typed a key.
Fix: call _invalidate() on each queue timeout (every ~100ms, throttled
to 150ms) to force the renderer to flush buffered agent output.
* fix(claw): warn when API keys are skipped during OpenClaw migration (#1580)
When --migrate-secrets is not passed (the default), API keys like
OPENROUTER_API_KEY are silently skipped with no warning. Users don't
realize their keys weren't migrated until the agent fails to connect.
Add a post-migration warning with actionable instructions: either
re-run with --migrate-secrets or add the key manually via
hermes config set.
Cherry-picked from PR #1593 by ygd58.
* fix(security): block sandbox backend creds from subprocess env (#1264)
Add Modal and Daytona sandbox credentials to the subprocess env
blocklist so they're not leaked to agent terminal sessions via
printenv/env.
Cherry-picked from PR #1571 by ygd58.
* fix(gateway): cap interrupt recursion depth to prevent resource exhaustion (#816)
When a user sends multiple messages while the agent keeps failing,
_run_agent() calls itself recursively with no depth limit. This can
exhaust stack/memory if the agent is in a failure loop.
Add _MAX_INTERRUPT_DEPTH = 3. When exceeded, the pending message is
logged and the current result is returned instead of recursing deeper.
The log handler duplication bug described in #816 was already fixed
separately (AIAgent.__init__ deduplicates handlers).
---------
Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
* fix: prevent infinite 400 failure loop on context overflow (#1630)
When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message. This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error. Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.
Three-layer fix:
1. run_agent.py — Fallback heuristic: when a 400 error has a very short
generic message AND the session is large (>40% of context or >80
messages), treat it as a probable context overflow and trigger
compression instead of aborting.
2. run_agent.py + gateway/run.py — Don't persist failed messages:
when the agent returns failed=True before generating any response,
skip writing the user's message to the transcript/DB. This prevents
the session from growing on each failure.
3. gateway/run.py — Smarter error messages: detect context-overflow
failures and suggest /compact or /reset specifically, instead of a
generic 'try again' that will fail identically.
* fix(skills): detect prompt injection patterns and block cache file reads
Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):
1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
(index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
was the original injection vector — untrusted skill descriptions
in the catalog contained adversarial text that the model executed.
2. skill_view: warns when skills are loaded from outside the trusted
~/.hermes/skills/ directory, and detects common injection patterns
in skill content ("ignore previous instructions", "<system>", etc.).
Cherry-picked from PR #1562 by ygd58.
---------
Co-authored-by: buray <ygd58@users.noreply.github.com>
* refactor: centralize slash command registry
Replace 7+ scattered command definition sites with a single
CommandDef registry in hermes_cli/commands.py. All downstream
consumers now derive from this registry:
- CLI process_command() resolves aliases via resolve_command()
- Gateway _known_commands uses GATEWAY_KNOWN_COMMANDS frozenset
- Gateway help text generated by gateway_help_lines()
- Telegram BotCommands generated by telegram_bot_commands()
- Slack subcommand map generated by slack_subcommand_map()
Adding a command or alias is now a one-line change to
COMMAND_REGISTRY instead of touching 6+ files.
Bugfixes included:
- Telegram now registers /rollback, /background (were missing)
- Slack now has /voice, /update, /reload-mcp (were missing)
- Gateway duplicate 'reasoning' dispatch (dead code) removed
- Gateway help text can no longer drift from CLI help
Backwards-compatible: COMMANDS and COMMANDS_BY_CATEGORY dicts are
rebuilt from the registry, so existing imports work unchanged.
* docs: update developer docs for centralized command registry
Update AGENTS.md with full 'Slash Command Registry' and 'Adding a
Slash Command' sections covering CommandDef fields, registry helpers,
and the one-line alias workflow.
Also update:
- CONTRIBUTING.md: commands.py description
- website/docs/reference/slash-commands.md: reference central registry
- docs/plans/centralize-command-registry.md: mark COMPLETED
- plans/checkpoint-rollback.md: reference new pattern
- hermes-agent-dev skill: architecture table
* chore: remove stale plan docs