Grok models (x-ai/grok-4.20-beta, grok-code-fast-1) now receive tool-use
enforcement guidance, steering them to actually call tools instead of
describing intended actions. Matches both OpenRouter (x-ai/grok-*) and
direct xAI API usage.
After restarting a service-managed gateway (systemd/launchd), the
stale-process sweep calls find_gateway_pids() which returns ALL gateway
PIDs via ps aux — including the one just spawned by the service manager.
The sweep kills it, leaving the user with a stopped gateway and a
confusing 'Restart manually' message.
Fix: add _get_service_pids() to query systemd MainPID and launchd PID
for active gateway services, then exclude those PIDs from the sweep.
Also add exclude_pids parameter to find_gateway_pids() and
kill_gateway_processes() so callers can skip known service-managed PIDs.
Adds 9 targeted tests covering:
- _get_service_pids() for systemd, launchd, empty, and zero-PID cases
- find_gateway_pids() exclude_pids filtering
- cmd_update integration: service PID not killed after restart
- cmd_update integration: manual PID killed while service PID preserved
Cron agents were burning iterations trying to use send_message (which is
disabled via messaging toolset) because their prompts said things like
'send the report to Telegram'. The scheduler handles delivery
automatically via the deliver setting, but nothing told the agent that.
Add a delivery guidance hint to _build_job_prompt alongside the existing
[SILENT] hint: tells agents their final response is auto-delivered and
they should NOT use send_message.
Before: only [SILENT] suppression hint
After: delivery guidance ('do NOT use send_message') + [SILENT] hint
Port the gateway's inactivity-based timeout pattern (PR #5389) to the
cron scheduler. The agent can now run for hours if it's actively calling
tools or receiving stream tokens — only genuine inactivity (no activity
for HERMES_CRON_TIMEOUT seconds, default 600s) triggers a timeout.
This fixes the Sunday PR scouts (openclaw, nanoclaw, ironclaw) which
all hit the hard 600s wall-clock limit while actively working.
Changes:
- Replace flat future.result(timeout=N) with a polling loop that checks
agent.get_activity_summary() every 5s (same pattern as gateway)
- Timeout error now includes diagnostic info: last activity description,
idle duration, current tool, iteration count
- HERMES_CRON_TIMEOUT=0 means unlimited (no timeout)
- Move sys.path.insert before repo-level imports to fix
ModuleNotFoundError for hermes_time on stale gateway processes
- Add time import needed by the polling loop
- Add 9 tests covering active/idle/unlimited/env-var/diagnostic scenarios
- Rename per-LLM-call hooks from pre_llm_request/post_llm_request for clarity vs pre_llm_call
- Emit summary kwargs only (counts, usage dict from normalize_usage); keep env_var_enabled for HERMES_DUMP_REQUESTS
- Add is_truthy_value/env_var_enabled to utils; wire hermes_cli.plugins._env_enabled through it
- Update Langfuse local setup doc; add scripts/langfuse_smoketest.py and optional ~/.hermes plugin tests
Made-with: Cursor
Add validate_config_structure() that catches common config.yaml mistakes:
- custom_providers as dict instead of list (missing '-' in YAML)
- fallback_model accidentally nested inside another section
- custom_providers entries missing required fields (name, base_url)
- Missing model section when custom_providers is configured
- Root-level keys that look like misplaced custom_providers fields
Surface these diagnostics at three levels:
1. Startup: print_config_warnings() runs at CLI and gateway module load,
so users see issues before hitting cryptic errors
2. Error time: 'Unknown provider' errors in auth.py and model_switch.py
now include config diagnostics with fix suggestions
3. Doctor: 'hermes doctor' shows a Config Structure section with all
issues and fix hints
Also adds a warning log in runtime_provider.py when custom_providers
is a dict (previously returned None silently).
Motivated by a Discord user who had malformed custom_providers YAML
and got only 'Unknown Provider' with no guidance on what was wrong.
17 new tests covering all validation paths.
Consolidated salvage from PRs #5301 (qaqcvc), #5339 (lance0),
#5058 and #5098 (maymuneth).
Mem0 API v2 compatibility (#5301):
- All reads use filters={user_id: ...} instead of bare user_id= kwarg
- All writes use filters with user_id + agent_id for attribution
- Response unwrapping for v2 dict format {results: [...]}
- Split _read_filters() vs _write_filters() — reads are user-scoped
only for cross-session recall, writes include agent_id
- Preserved 'hermes-user' default (no breaking change for existing users)
- Omitted run_id scoping from #5301 — cross-session memory is Mem0's
core value, session-scoping reads would defeat that purpose
Memory prefetch context fencing (#5339):
- Wraps prefetched memory in <memory-context> fenced blocks with system
note marking content as recalled context, NOT user input
- Sanitizes provider output to strip fence-escape sequences, preventing
injection where memory content breaks out of the fence
- API-call-time only — never persisted to session history
Secret redaction (#5058, #5098):
- Added prefix patterns for Groq (gsk_), Matrix (syt_), RetainDB
(retaindb_), Hindsight (hsk-), Mem0 (mem0_), ByteRover (brv_)
All 35 subprocess.run() calls in hermes_cli/gateway.py lacked timeout
parameters. If systemctl, launchctl, loginctl, wmic, or ps blocks,
hermes gateway start/stop/restart/status/install/uninstall hangs
indefinitely with no feedback.
Timeouts tiered by operation type:
- 10s: instant queries (is-active, status, list, ps, tail, journalctl)
- 30s: fast lifecycle (daemon-reload, enable, start, bootstrap, kickstart)
- 90s: graceful shutdown (stop, restart, bootout, kickstart -k) — exceeds
our TimeoutStopSec=60 to avoid premature timeout during shutdown
Special handling: _is_service_running() and launchd_status() catch
TimeoutExpired and treat it as not-running/not-loaded, consistent with
how non-zero return codes are already handled.
Inspired by PR #3732 (dlkakbs) and issue #4057 (SHL0MS).
Reimplemented on current main which has significantly changed launchctl
handling (bootout/bootstrap/kickstart vs legacy load/unload/start/stop).
Display live per-million-token pricing from /v1/models when listing
models for OpenRouter or Nous Portal. Prices are shown in a
column-aligned table with decimal points vertically aligned for
easy comparison.
Pricing appears in three places:
- /provider slash command (table with In/Out headers)
- hermes model picker (aligned columns in both TerminalMenu and
numbered fallback)
Implementation:
- Add fetch_models_with_pricing() in models.py with per-base_url
module-level cache (one network call per endpoint per session)
- Add _format_price_per_mtok() with fixed 2-decimal formatting
- Add format_model_pricing_table() for terminal table display
- Add get_pricing_for_provider() convenience wrapper
- Update _prompt_model_selection() to accept optional pricing dict
- Wire pricing through _model_flow_openrouter/nous in main.py
- Update test mocks for new pricing parameter
Adds OPENAI_MODEL_EXECUTION_GUIDANCE — XML-tagged behavioral guidance
injected for GPT and Codex models alongside the existing tool-use
enforcement. Targets four specific failure modes:
- <tool_persistence>: retry on empty/partial results instead of giving up
- <prerequisite_checks>: do discovery/lookup before jumping to final action
- <verification>: check correctness/grounding/formatting before finalizing
- <missing_context>: use lookup tools instead of hallucinating
Follows the same injection pattern as GOOGLE_MODEL_OPERATIONAL_GUIDANCE
for Gemini/Gemma models. Inspired by OpenClaw PR #38953 and OpenAI's
GPT-5.4 prompting guide patterns.
Threads (Telegram forum topics, Discord threads, Slack threads) now default
to shared sessions where all participants see the same conversation. This is
the expected UX for threaded conversations where multiple users @mention the
bot and interact collaboratively.
Changes:
- build_session_key(): when thread_id is present, user_id is no longer
appended to the session key (threads are shared by default)
- New config: thread_sessions_per_user (default: false) — opt-in to restore
per-user isolation in threads if needed
- Sender attribution: messages in shared threads are prefixed with
[sender name] so the agent can tell participants apart
- System prompt: shared threads show 'Multi-user thread' note instead of
a per-turn User line (avoids busting prompt cache)
- Wired through all callers: gateway/run.py, base.py, telegram.py, feishu.py
- Regular group messages (no thread) remain per-user isolated (unchanged)
- DM threads are unaffected (they have their own keying logic)
Closes community request from demontut_ re: thread-based shared sessions.
Reject "." as a plugin name — it resolves to the plugins directory
itself, which in force-install flows causes shutil.rmtree to wipe the
entire plugins tree.
- reject "." early with a clear error message
- explicit check for target == plugins_resolved (raise instead of allow)
- switch boundary check from string-prefix to Path.relative_to()
- add regression tests for sanitizer + install flow
Co-authored-by: Dusk1e <yusufalweshdemir@gmail.com>
Before launching an MCP server via npx/uvx, queries the OSV (Open Source
Vulnerabilities) API to check if the package has known malware advisories
(MAL-* IDs). Regular CVEs are ignored — only confirmed malware is blocked.
- Free, public API (Google-maintained), ~300ms per query
- Runs once per MCP server launch, inside _run_stdio() before subprocess spawn
- Parallel with other MCP servers (asyncio.gather already in place)
- Fail-open: network errors, timeouts, unrecognized commands → allow
- Parses npm (scoped @scope/pkg@version) and PyPI (name[extras]==version)
Inspired by Block/goose extension malware check.
Only request the privileged members intent when DISCORD_ALLOWED_USERS includes non-numeric entries that need username resolution. Also release the Discord token lock when startup fails so retries and restarts are not blocked by a stale lock.\n\nAdds regression tests for conditional intents and startup lock cleanup.
Existing honcho.json configs without an explicit observationMode now
default to 'unified' (the old default) instead of being silently
switched to 'directional'. New installations get 'directional' as
the new default.
Detection: _explicitly_configured (host block exists or enabled=true)
signals an existing config. When true and no observationMode is set
anywhere in the config chain, falls back to 'unified'. When false
(fresh install), uses 'directional'.
Users who explicitly set observationMode or granular observation
booleans are unaffected — explicit config always wins.
5 new tests covering all migration paths.
discover_plugin_cli_commands() now reads memory.provider from config.yaml
and only loads CLI registration for the active provider. If no memory
provider is set, no plugin CLI commands appear in the CLI.
Only one memory provider can be active at a time — at most one set of
plugin CLI commands is registered. Users who haven't configured honcho
(or any memory provider) won't see 'hermes honcho' in their help output.
Adds test for inactive provider returning empty results.
As the agent navigates into subdirectories via tool calls (read_file,
terminal, search_files, etc.), automatically discover and load project
context files (AGENTS.md, CLAUDE.md, .cursorrules) from those directories.
Previously, context files were only loaded from the CWD at session start.
If the agent moved into backend/, frontend/, or any subdirectory with its
own AGENTS.md, those instructions were never seen.
Now, SubdirectoryHintTracker watches tool call arguments for file paths
and shell commands, resolves directories, and loads hint files on first
access. Discovered hints are appended to the tool result so the model
gets relevant context at the moment it starts working in a new area —
without modifying the system prompt (preserving prompt caching).
Features:
- Extracts paths from tool args (path, workdir) and shell commands
- Loads AGENTS.md, CLAUDE.md, .cursorrules (first match per directory)
- Deduplicates — each directory loaded at most once per session
- Ignores paths outside the working directory
- Truncates large hint files at 8K chars
- Works on both sequential and concurrent tool execution paths
Inspired by Block/goose SubdirectoryHintTracker.
Plain functions imported as class attributes in APIServerAdapter get
auto-bound as methods via Python's descriptor protocol. Every
self._cron_*() call injected self as the first positional argument,
causing TypeError on all 8 cron API endpoints at runtime.
Wrap each import with staticmethod() so self._cron_*() calls dispatch
correctly without modifying any call sites.
Co-authored-by: teknium <teknium@nousresearch.com>
Add POST /v1/runs to start async agent runs and GET /v1/runs/{run_id}/events
for SSE streaming of typed lifecycle events (tool.started, tool.completed,
message.delta, reasoning.available, run.completed, run.failed).
Changes the internal tool_progress_callback signature from positional
(tool_name, preview, args) to event-type-first
(event_type, tool_name, preview, args, **kwargs). Existing consumers
filter on event_type and remain backward-compatible.
Adds concurrency limit (_MAX_CONCURRENT_RUNS=10) and orphaned run sweep.
Fixes logic inversion in cli.py _on_tool_progress where the original PR
would have displayed internal tools instead of non-internal ones.
Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
Send AvailableCommandsUpdate on session create/load/resume/fork so ACP
clients (Zed, etc.) can discover /help, /model, /tools, /compact, etc.
Also rewrites /compact to use agent._compress_context() properly with
token estimation and session DB isolation.
Co-authored-by: NexVeridian <NexVeridian@users.noreply.github.com>
Route AIAgent print output to stderr via _print_fn for ACP stdio sessions.
Gate quiet-mode spinner startup on _should_start_quiet_spinner() so JSON-RPC
on stdout isn't corrupted. Child agents inherit the redirect.
Co-authored-by: Git-on-my-level <Git-on-my-level@users.noreply.github.com>
Removing an env-seeded credential (e.g. from OPENROUTER_API_KEY) via
'hermes auth' previously had no lasting effect -- the entry was deleted
from auth.json but load_pool() re-created it on the next call because
the env var was still set.
Now auth_remove_command detects env-sourced entries (source starts with
'env:') and calls the new remove_env_value() to strip the var from both
.env and os.environ, preventing re-seeding.
Changes:
- hermes_cli/config.py: add remove_env_value() -- atomically removes a
line from .env and pops from os.environ
- hermes_cli/auth_commands.py: auth_remove_command clears env var when
removing an env-seeded pool entry
- 8 new tests covering remove_env_value and the full zombie-credential
lifecycle (remove -> reload -> stays gone)
Previously, typing a /command that isn't a built-in, plugin, or skill
would silently fall through to the LLM as plain text. The model often
interprets it as a loose instruction and invents unrelated tool calls —
e.g. a stray /claude_code slipped through and the model fabricated a
delegate_task invocation that got stuck in an OAuth loop.
Now we check GATEWAY_KNOWN_COMMANDS after the skill / plugin /
unavailable-skill lookups and return an actionable message pointing the
user at /commands. The user gets feedback, and the agent doesn't waste
a round-trip guessing what /foo-bar was supposed to mean.
Telegram's Bot API disallows hyphens in command names, so
_build_telegram_menu registers /claude-code as /claude_code. When the
user taps it from autocomplete, the gateway dispatch did a direct
lookup against skill_cmds (keyed on the hyphenated form) and missed,
silently falling through to the LLM as plain text. The model would
then typically call delegate_task, spawning a Hermes subagent instead
of invoking the intended skill.
Normalize underscores to hyphens in skill and plugin command lookup,
matching the existing pattern in _check_unavailable_skill.
Telegram polling can inherit a stale webhook registration when a deployment
switches transport modes, which leaves getUpdates idle even though the gateway
starts cleanly. Outbound send also treats Telegram retry_after responses as
terminal errors, so brief flood control can drop tool progress and replies.
Constraint: Keep the PR narrowly scoped to upstream/main Telegram adapter behavior
Rejected: Port OpenClaw's broader polling supervisor and offset persistence | too broad for an isolated fix PR
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Polling mode should clear webhook state before starting getUpdates, and send-path retry logic must distinguish flood control from timeouts
Tested: uv run --extra dev pytest tests/gateway/test_telegram_* -q
Not-tested: Live Telegram webhook-to-polling migration and real Bot API 429 behavior
Replace the two-step stop/start restart with a single
launchctl kickstart -k call. When the gateway triggers a
restart from inside its own process tree, the old stop
command kills the shell before the start half is reached.
kickstart -k lets launchd handle the kill+restart atomically.
launchctl load/unload/start/stop are deprecated on macOS since 10.10
and fail silently on modern versions. This replaces them with the
current equivalents:
- load -> bootstrap gui/<uid> <plist>
- unload -> bootout gui/<uid>/<label>
- start -> kickstart gui/<uid>/<label>
- stop -> kill SIGTERM gui/<uid>/<label>
Adds _launchd_domain() helper returning the gui/<uid> target domain.
Updates test assertions to match the new command signatures.
Fixes#4820
When an agent was actively processing a message, /status sent via Telegram
(or any gateway) was queued as a pending interrupt instead of being dispatched
immediately. The base platform adapter's handle_message() only had special-case
bypass logic for /approve and /deny, so /status fell through to the default
interrupt path and was never processed as a system command.
Apply the same bypass pattern used by /approve//deny: detect cmd == 'status'
inside the active-session guard, dispatch directly to the message handler, and
send the response without touching session lifecycle or interrupt state.
Adds a regression test that verifies /status is dispatched and responded to
immediately even when _active_sessions contains an entry for the session.
* feat: coerce tool call arguments to match JSON Schema types
LLMs frequently return numbers as strings ("42" instead of 42) and
booleans as strings ("true" instead of true). This causes silent
failures with MCP tools and any tool with strictly-typed parameters.
Added coerce_tool_args() in model_tools.py that runs before every tool
dispatch. For each argument, it checks the tool registry schema and
attempts safe coercion:
- "42" → 42 when schema says "type": "integer"
- "3.14" → 3.14 when schema says "type": "number"
- "true"/"false" → True/False when schema says "type": "boolean"
- Union types tried in order
- Original values preserved when coercion fails or is not applicable
Inspired by Block/goose tool argument coercion system.
* fix: accept reasoning-only responses without retries — set content to "(empty)"
Previously, when a model returned reasoning/thinking but no visible
content, we entered a 120-line retry/classify/compress/salvage cascade
that wasted 3+ API calls trying to "fix" the response. The model was
done thinking — retrying with the same input just burned money.
Now reasoning-only responses are accepted immediately:
- Reasoning stays in the `reasoning` field (semantically correct)
- Content set to "(empty)" — valid non-empty string every provider accepts
- No retries, no compression triggers, no salvage logic
- Session history contains "(empty)" not "" — prevents #2128 session
poisoning where empty assistant content caused prefill rejections
Removes ~120 lines, adds ~15. Saves 2-3 API calls per reasoning-only
response. Fixes#2128.
Cherry-picked from PR #4338 by nepenth, resolved against current main.
Adds:
- Processing lifecycle reactions (eyes/checkmark/cross) via MATRIX_REACTIONS env
- Reaction send/receive with ReactionEvent + UnknownEvent fallback for older nio
- Fire-and-forget read receipts on text and media messages
- Message redaction, room history fetch, room creation, user invite
- Presence status control (online/offline/unavailable)
- Emote (/me) and notice message types with HTML rendering
- XSS-hardened markdown-to-HTML converter (strips raw HTML preprocessor,
sanitizes link URLs against javascript:/data:/vbscript: schemes)
- Comprehensive regex fallback with full block/inline markdown support
- Markdown>=3.6 added to [matrix] extras in pyproject.toml
- 46 new tests covering all features and security hardening
Cherry-picked from PR #3140 by chalkers, resolved against current main.
Registers RoomEncryptedImage/Audio/Video/File callbacks, decrypts
attachments via nio.crypto, caches all media types (images, audio,
documents), prevents ciphertext URL fallback for encrypted media.
Unifies the separate voice-message download into the main cache block.
Preserves main's MATRIX_REQUIRE_MENTION, auto-thread, and mention
stripping features. Includes 355 lines of encrypted media tests.
Cherry-picked from PR #3695 by binhnt92.
Matrix _sync_loop() and Mattermost _ws_loop() were retrying all errors
forever, including permanent auth failures (expired tokens, revoked
access). Now detects M_UNKNOWN_TOKEN, M_FORBIDDEN, 401/403 and stops
instead of spinning. Includes 216 lines of tests.
Cherry-picked from PR #5136 by thakoreh.
setup_gateway() uses shutil.which('uv') at line 2126 but shutil was
never imported at module level, causing NameError during Matrix E2EE
auto-install. Adds top-level import and regression test.
- Add OLLAMA_API_KEY to credential resolution chain for ollama.com endpoints
- Update requested_provider/_explicit_api_key/_explicit_base_url after /model
switch so _ensure_runtime_credentials() doesn't revert the switch
- Pass base_url/api_key from fallback config to resolve_provider_client()
- Add DirectAlias system: user-configurable model_aliases in config.yaml
checked before catalog resolution, with reverse lookup by model ID
- Add /model tab completion showing aliases with provider metadata
Co-authored-by: LucidPaths <LucidPaths@users.noreply.github.com>
LLMs frequently return numbers as strings ("42" instead of 42) and
booleans as strings ("true" instead of true). This causes silent
failures with MCP tools and any tool with strictly-typed parameters.
Added coerce_tool_args() in model_tools.py that runs before every tool
dispatch. For each argument, it checks the tool registry schema and
attempts safe coercion:
- "42" → 42 when schema says "type": "integer"
- "3.14" → 3.14 when schema says "type": "number"
- "true"/"false" → True/False when schema says "type": "boolean"
- Union types tried in order
- Original values preserved when coercion fails or is not applicable
Inspired by Block/goose tool argument coercion system.
Previously, tool results exceeding 100K characters were silently chopped
with only a '[Truncated]' notice — the rest of the content was lost
permanently. The model had no way to access the truncated portion.
Now, oversized results are written to HERMES_HOME/cache/tool_responses/
and the model receives:
- A 1,500-char head preview for immediate context
- The file path so it can use read_file/search_files on the full output
This preserves the context window protection (inline content stays small)
while making the full data recoverable. Falls back to the old destructive
truncation if the file write fails.
Inspired by Block/goose's large response handler pattern.
* feat(gateway): live-stream /update output + forward interactive prompts
Adds real-time output streaming and interactive prompt forwarding for
the gateway /update command, so users on Telegram/Discord/etc see the
full update progress and can respond to prompts (stash restore, config
migration) without needing terminal access.
Changes:
hermes_cli/main.py:
- Add --gateway flag to 'hermes update' argparse
- Add _gateway_prompt() file-based IPC function that writes
.update_prompt.json and polls for .update_response
- Modify _restore_stashed_changes() to accept optional input_fn
parameter for gateway mode prompt forwarding
- cmd_update() uses _gateway_prompt when --gateway is set, enabling
interactive stash restore and config migration prompts
gateway/run.py:
- _handle_update_command: spawn with --gateway flag and
PYTHONUNBUFFERED=1 for real-time output flushing
- Store session_key in .update_pending.json for cross-restart
session matching
- Add _update_prompt_pending dict to track sessions awaiting
update prompt responses
- Replace _watch_for_update_completion with _watch_update_progress:
streams output chunks every ~4s, detects .update_prompt.json and
forwards prompts to the user, handles completion/failure/timeout
- Add update prompt interception in _handle_message: when a prompt
is pending, the user's next message is written to .update_response
instead of being processed normally
- Preserve _send_update_notification as legacy fallback for
post-restart cases where adapter isn't available yet
File-based IPC protocol:
- .update_prompt.json: written by update process with prompt text,
default value, and unique ID
- .update_response: written by gateway with user's answer
- .update_output.txt: existing, now streamed in real-time
- .update_exit_code: existing completion marker
Tests: 16 new tests covering _gateway_prompt IPC, output streaming,
prompt detection/forwarding, message interception, and cleanup.
* feat: interactive buttons for update prompts (Telegram + Discord)
Telegram: Inline keyboard with ✓ Yes / ✗ No buttons. Clicking a button
answers the callback query, edits the message to show the choice, and
writes .update_response directly. CallbackQueryHandler registered on
the update_prompt: prefix.
Discord: UpdatePromptView (discord.ui.View) with green Yes / red No
buttons. Follows the ExecApprovalView pattern — auth check, embed color
update, disabled-after-click. Writes .update_response on click.
All platforms: /approve and /deny (and /yes, /no) now work as shorthand
for yes/no when an update prompt is pending. The text fallback message
instructs users to use these commands. Raw message interception still
works as a fallback for non-command responses.
Gateway watcher checks adapter for send_update_prompt method (class-level
check to avoid MagicMock false positives) and falls back to text prompt
with /approve instructions when unavailable.
* fix: block /update on non-messaging platforms (API, webhooks, ACP)
Add _UPDATE_ALLOWED_PLATFORMS frozenset that explicitly lists messaging
platforms where /update is permitted. API server, webhook, and ACP
platforms get a clear error directing them to run hermes update from
the terminal instead.
ACP and API server already don't reach _handle_message (separate
codepaths), and webhooks have distinct session keys that can't collide
with messaging sessions. This guard is belt-and-suspenders.