- Rename per-LLM-call hooks from pre_llm_request/post_llm_request for clarity vs pre_llm_call
- Emit summary kwargs only (counts, usage dict from normalize_usage); keep env_var_enabled for HERMES_DUMP_REQUESTS
- Add is_truthy_value/env_var_enabled to utils; wire hermes_cli.plugins._env_enabled through it
- Update Langfuse local setup doc; add scripts/langfuse_smoketest.py and optional ~/.hermes plugin tests
Made-with: Cursor
Add validate_config_structure() that catches common config.yaml mistakes:
- custom_providers as dict instead of list (missing '-' in YAML)
- fallback_model accidentally nested inside another section
- custom_providers entries missing required fields (name, base_url)
- Missing model section when custom_providers is configured
- Root-level keys that look like misplaced custom_providers fields
Surface these diagnostics at three levels:
1. Startup: print_config_warnings() runs at CLI and gateway module load,
so users see issues before hitting cryptic errors
2. Error time: 'Unknown provider' errors in auth.py and model_switch.py
now include config diagnostics with fix suggestions
3. Doctor: 'hermes doctor' shows a Config Structure section with all
issues and fix hints
Also adds a warning log in runtime_provider.py when custom_providers
is a dict (previously returned None silently).
Motivated by a Discord user who had malformed custom_providers YAML
and got only 'Unknown Provider' with no guidance on what was wrong.
17 new tests covering all validation paths.
Consolidated salvage from PRs #5301 (qaqcvc), #5339 (lance0),
#5058 and #5098 (maymuneth).
Mem0 API v2 compatibility (#5301):
- All reads use filters={user_id: ...} instead of bare user_id= kwarg
- All writes use filters with user_id + agent_id for attribution
- Response unwrapping for v2 dict format {results: [...]}
- Split _read_filters() vs _write_filters() — reads are user-scoped
only for cross-session recall, writes include agent_id
- Preserved 'hermes-user' default (no breaking change for existing users)
- Omitted run_id scoping from #5301 — cross-session memory is Mem0's
core value, session-scoping reads would defeat that purpose
Memory prefetch context fencing (#5339):
- Wraps prefetched memory in <memory-context> fenced blocks with system
note marking content as recalled context, NOT user input
- Sanitizes provider output to strip fence-escape sequences, preventing
injection where memory content breaks out of the fence
- API-call-time only — never persisted to session history
Secret redaction (#5058, #5098):
- Added prefix patterns for Groq (gsk_), Matrix (syt_), RetainDB
(retaindb_), Hindsight (hsk-), Mem0 (mem0_), ByteRover (brv_)
All 35 subprocess.run() calls in hermes_cli/gateway.py lacked timeout
parameters. If systemctl, launchctl, loginctl, wmic, or ps blocks,
hermes gateway start/stop/restart/status/install/uninstall hangs
indefinitely with no feedback.
Timeouts tiered by operation type:
- 10s: instant queries (is-active, status, list, ps, tail, journalctl)
- 30s: fast lifecycle (daemon-reload, enable, start, bootstrap, kickstart)
- 90s: graceful shutdown (stop, restart, bootout, kickstart -k) — exceeds
our TimeoutStopSec=60 to avoid premature timeout during shutdown
Special handling: _is_service_running() and launchd_status() catch
TimeoutExpired and treat it as not-running/not-loaded, consistent with
how non-zero return codes are already handled.
Inspired by PR #3732 (dlkakbs) and issue #4057 (SHL0MS).
Reimplemented on current main which has significantly changed launchctl
handling (bootout/bootstrap/kickstart vs legacy load/unload/start/stop).
Display live per-million-token pricing from /v1/models when listing
models for OpenRouter or Nous Portal. Prices are shown in a
column-aligned table with decimal points vertically aligned for
easy comparison.
Pricing appears in three places:
- /provider slash command (table with In/Out headers)
- hermes model picker (aligned columns in both TerminalMenu and
numbered fallback)
Implementation:
- Add fetch_models_with_pricing() in models.py with per-base_url
module-level cache (one network call per endpoint per session)
- Add _format_price_per_mtok() with fixed 2-decimal formatting
- Add format_model_pricing_table() for terminal table display
- Add get_pricing_for_provider() convenience wrapper
- Update _prompt_model_selection() to accept optional pricing dict
- Wire pricing through _model_flow_openrouter/nous in main.py
- Update test mocks for new pricing parameter
Adds OPENAI_MODEL_EXECUTION_GUIDANCE — XML-tagged behavioral guidance
injected for GPT and Codex models alongside the existing tool-use
enforcement. Targets four specific failure modes:
- <tool_persistence>: retry on empty/partial results instead of giving up
- <prerequisite_checks>: do discovery/lookup before jumping to final action
- <verification>: check correctness/grounding/formatting before finalizing
- <missing_context>: use lookup tools instead of hallucinating
Follows the same injection pattern as GOOGLE_MODEL_OPERATIONAL_GUIDANCE
for Gemini/Gemma models. Inspired by OpenClaw PR #38953 and OpenAI's
GPT-5.4 prompting guide patterns.
Threads (Telegram forum topics, Discord threads, Slack threads) now default
to shared sessions where all participants see the same conversation. This is
the expected UX for threaded conversations where multiple users @mention the
bot and interact collaboratively.
Changes:
- build_session_key(): when thread_id is present, user_id is no longer
appended to the session key (threads are shared by default)
- New config: thread_sessions_per_user (default: false) — opt-in to restore
per-user isolation in threads if needed
- Sender attribution: messages in shared threads are prefixed with
[sender name] so the agent can tell participants apart
- System prompt: shared threads show 'Multi-user thread' note instead of
a per-turn User line (avoids busting prompt cache)
- Wired through all callers: gateway/run.py, base.py, telegram.py, feishu.py
- Regular group messages (no thread) remain per-user isolated (unchanged)
- DM threads are unaffected (they have their own keying logic)
Closes community request from demontut_ re: thread-based shared sessions.
Reject "." as a plugin name — it resolves to the plugins directory
itself, which in force-install flows causes shutil.rmtree to wipe the
entire plugins tree.
- reject "." early with a clear error message
- explicit check for target == plugins_resolved (raise instead of allow)
- switch boundary check from string-prefix to Path.relative_to()
- add regression tests for sanitizer + install flow
Co-authored-by: Dusk1e <yusufalweshdemir@gmail.com>
Before launching an MCP server via npx/uvx, queries the OSV (Open Source
Vulnerabilities) API to check if the package has known malware advisories
(MAL-* IDs). Regular CVEs are ignored — only confirmed malware is blocked.
- Free, public API (Google-maintained), ~300ms per query
- Runs once per MCP server launch, inside _run_stdio() before subprocess spawn
- Parallel with other MCP servers (asyncio.gather already in place)
- Fail-open: network errors, timeouts, unrecognized commands → allow
- Parses npm (scoped @scope/pkg@version) and PyPI (name[extras]==version)
Inspired by Block/goose extension malware check.
Only request the privileged members intent when DISCORD_ALLOWED_USERS includes non-numeric entries that need username resolution. Also release the Discord token lock when startup fails so retries and restarts are not blocked by a stale lock.\n\nAdds regression tests for conditional intents and startup lock cleanup.
Existing honcho.json configs without an explicit observationMode now
default to 'unified' (the old default) instead of being silently
switched to 'directional'. New installations get 'directional' as
the new default.
Detection: _explicitly_configured (host block exists or enabled=true)
signals an existing config. When true and no observationMode is set
anywhere in the config chain, falls back to 'unified'. When false
(fresh install), uses 'directional'.
Users who explicitly set observationMode or granular observation
booleans are unaffected — explicit config always wins.
5 new tests covering all migration paths.
discover_plugin_cli_commands() now reads memory.provider from config.yaml
and only loads CLI registration for the active provider. If no memory
provider is set, no plugin CLI commands appear in the CLI.
Only one memory provider can be active at a time — at most one set of
plugin CLI commands is registered. Users who haven't configured honcho
(or any memory provider) won't see 'hermes honcho' in their help output.
Adds test for inactive provider returning empty results.
As the agent navigates into subdirectories via tool calls (read_file,
terminal, search_files, etc.), automatically discover and load project
context files (AGENTS.md, CLAUDE.md, .cursorrules) from those directories.
Previously, context files were only loaded from the CWD at session start.
If the agent moved into backend/, frontend/, or any subdirectory with its
own AGENTS.md, those instructions were never seen.
Now, SubdirectoryHintTracker watches tool call arguments for file paths
and shell commands, resolves directories, and loads hint files on first
access. Discovered hints are appended to the tool result so the model
gets relevant context at the moment it starts working in a new area —
without modifying the system prompt (preserving prompt caching).
Features:
- Extracts paths from tool args (path, workdir) and shell commands
- Loads AGENTS.md, CLAUDE.md, .cursorrules (first match per directory)
- Deduplicates — each directory loaded at most once per session
- Ignores paths outside the working directory
- Truncates large hint files at 8K chars
- Works on both sequential and concurrent tool execution paths
Inspired by Block/goose SubdirectoryHintTracker.
Plain functions imported as class attributes in APIServerAdapter get
auto-bound as methods via Python's descriptor protocol. Every
self._cron_*() call injected self as the first positional argument,
causing TypeError on all 8 cron API endpoints at runtime.
Wrap each import with staticmethod() so self._cron_*() calls dispatch
correctly without modifying any call sites.
Co-authored-by: teknium <teknium@nousresearch.com>
Add POST /v1/runs to start async agent runs and GET /v1/runs/{run_id}/events
for SSE streaming of typed lifecycle events (tool.started, tool.completed,
message.delta, reasoning.available, run.completed, run.failed).
Changes the internal tool_progress_callback signature from positional
(tool_name, preview, args) to event-type-first
(event_type, tool_name, preview, args, **kwargs). Existing consumers
filter on event_type and remain backward-compatible.
Adds concurrency limit (_MAX_CONCURRENT_RUNS=10) and orphaned run sweep.
Fixes logic inversion in cli.py _on_tool_progress where the original PR
would have displayed internal tools instead of non-internal ones.
Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
Send AvailableCommandsUpdate on session create/load/resume/fork so ACP
clients (Zed, etc.) can discover /help, /model, /tools, /compact, etc.
Also rewrites /compact to use agent._compress_context() properly with
token estimation and session DB isolation.
Co-authored-by: NexVeridian <NexVeridian@users.noreply.github.com>
Route AIAgent print output to stderr via _print_fn for ACP stdio sessions.
Gate quiet-mode spinner startup on _should_start_quiet_spinner() so JSON-RPC
on stdout isn't corrupted. Child agents inherit the redirect.
Co-authored-by: Git-on-my-level <Git-on-my-level@users.noreply.github.com>
Removing an env-seeded credential (e.g. from OPENROUTER_API_KEY) via
'hermes auth' previously had no lasting effect -- the entry was deleted
from auth.json but load_pool() re-created it on the next call because
the env var was still set.
Now auth_remove_command detects env-sourced entries (source starts with
'env:') and calls the new remove_env_value() to strip the var from both
.env and os.environ, preventing re-seeding.
Changes:
- hermes_cli/config.py: add remove_env_value() -- atomically removes a
line from .env and pops from os.environ
- hermes_cli/auth_commands.py: auth_remove_command clears env var when
removing an env-seeded pool entry
- 8 new tests covering remove_env_value and the full zombie-credential
lifecycle (remove -> reload -> stays gone)
Previously, typing a /command that isn't a built-in, plugin, or skill
would silently fall through to the LLM as plain text. The model often
interprets it as a loose instruction and invents unrelated tool calls —
e.g. a stray /claude_code slipped through and the model fabricated a
delegate_task invocation that got stuck in an OAuth loop.
Now we check GATEWAY_KNOWN_COMMANDS after the skill / plugin /
unavailable-skill lookups and return an actionable message pointing the
user at /commands. The user gets feedback, and the agent doesn't waste
a round-trip guessing what /foo-bar was supposed to mean.
Telegram's Bot API disallows hyphens in command names, so
_build_telegram_menu registers /claude-code as /claude_code. When the
user taps it from autocomplete, the gateway dispatch did a direct
lookup against skill_cmds (keyed on the hyphenated form) and missed,
silently falling through to the LLM as plain text. The model would
then typically call delegate_task, spawning a Hermes subagent instead
of invoking the intended skill.
Normalize underscores to hyphens in skill and plugin command lookup,
matching the existing pattern in _check_unavailable_skill.
Telegram polling can inherit a stale webhook registration when a deployment
switches transport modes, which leaves getUpdates idle even though the gateway
starts cleanly. Outbound send also treats Telegram retry_after responses as
terminal errors, so brief flood control can drop tool progress and replies.
Constraint: Keep the PR narrowly scoped to upstream/main Telegram adapter behavior
Rejected: Port OpenClaw's broader polling supervisor and offset persistence | too broad for an isolated fix PR
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Polling mode should clear webhook state before starting getUpdates, and send-path retry logic must distinguish flood control from timeouts
Tested: uv run --extra dev pytest tests/gateway/test_telegram_* -q
Not-tested: Live Telegram webhook-to-polling migration and real Bot API 429 behavior
Replace the two-step stop/start restart with a single
launchctl kickstart -k call. When the gateway triggers a
restart from inside its own process tree, the old stop
command kills the shell before the start half is reached.
kickstart -k lets launchd handle the kill+restart atomically.
launchctl load/unload/start/stop are deprecated on macOS since 10.10
and fail silently on modern versions. This replaces them with the
current equivalents:
- load -> bootstrap gui/<uid> <plist>
- unload -> bootout gui/<uid>/<label>
- start -> kickstart gui/<uid>/<label>
- stop -> kill SIGTERM gui/<uid>/<label>
Adds _launchd_domain() helper returning the gui/<uid> target domain.
Updates test assertions to match the new command signatures.
Fixes#4820
When an agent was actively processing a message, /status sent via Telegram
(or any gateway) was queued as a pending interrupt instead of being dispatched
immediately. The base platform adapter's handle_message() only had special-case
bypass logic for /approve and /deny, so /status fell through to the default
interrupt path and was never processed as a system command.
Apply the same bypass pattern used by /approve//deny: detect cmd == 'status'
inside the active-session guard, dispatch directly to the message handler, and
send the response without touching session lifecycle or interrupt state.
Adds a regression test that verifies /status is dispatched and responded to
immediately even when _active_sessions contains an entry for the session.
* feat: coerce tool call arguments to match JSON Schema types
LLMs frequently return numbers as strings ("42" instead of 42) and
booleans as strings ("true" instead of true). This causes silent
failures with MCP tools and any tool with strictly-typed parameters.
Added coerce_tool_args() in model_tools.py that runs before every tool
dispatch. For each argument, it checks the tool registry schema and
attempts safe coercion:
- "42" → 42 when schema says "type": "integer"
- "3.14" → 3.14 when schema says "type": "number"
- "true"/"false" → True/False when schema says "type": "boolean"
- Union types tried in order
- Original values preserved when coercion fails or is not applicable
Inspired by Block/goose tool argument coercion system.
* fix: accept reasoning-only responses without retries — set content to "(empty)"
Previously, when a model returned reasoning/thinking but no visible
content, we entered a 120-line retry/classify/compress/salvage cascade
that wasted 3+ API calls trying to "fix" the response. The model was
done thinking — retrying with the same input just burned money.
Now reasoning-only responses are accepted immediately:
- Reasoning stays in the `reasoning` field (semantically correct)
- Content set to "(empty)" — valid non-empty string every provider accepts
- No retries, no compression triggers, no salvage logic
- Session history contains "(empty)" not "" — prevents #2128 session
poisoning where empty assistant content caused prefill rejections
Removes ~120 lines, adds ~15. Saves 2-3 API calls per reasoning-only
response. Fixes#2128.
Cherry-picked from PR #4338 by nepenth, resolved against current main.
Adds:
- Processing lifecycle reactions (eyes/checkmark/cross) via MATRIX_REACTIONS env
- Reaction send/receive with ReactionEvent + UnknownEvent fallback for older nio
- Fire-and-forget read receipts on text and media messages
- Message redaction, room history fetch, room creation, user invite
- Presence status control (online/offline/unavailable)
- Emote (/me) and notice message types with HTML rendering
- XSS-hardened markdown-to-HTML converter (strips raw HTML preprocessor,
sanitizes link URLs against javascript:/data:/vbscript: schemes)
- Comprehensive regex fallback with full block/inline markdown support
- Markdown>=3.6 added to [matrix] extras in pyproject.toml
- 46 new tests covering all features and security hardening
Cherry-picked from PR #3140 by chalkers, resolved against current main.
Registers RoomEncryptedImage/Audio/Video/File callbacks, decrypts
attachments via nio.crypto, caches all media types (images, audio,
documents), prevents ciphertext URL fallback for encrypted media.
Unifies the separate voice-message download into the main cache block.
Preserves main's MATRIX_REQUIRE_MENTION, auto-thread, and mention
stripping features. Includes 355 lines of encrypted media tests.
Cherry-picked from PR #3695 by binhnt92.
Matrix _sync_loop() and Mattermost _ws_loop() were retrying all errors
forever, including permanent auth failures (expired tokens, revoked
access). Now detects M_UNKNOWN_TOKEN, M_FORBIDDEN, 401/403 and stops
instead of spinning. Includes 216 lines of tests.
Cherry-picked from PR #5136 by thakoreh.
setup_gateway() uses shutil.which('uv') at line 2126 but shutil was
never imported at module level, causing NameError during Matrix E2EE
auto-install. Adds top-level import and regression test.
- Add OLLAMA_API_KEY to credential resolution chain for ollama.com endpoints
- Update requested_provider/_explicit_api_key/_explicit_base_url after /model
switch so _ensure_runtime_credentials() doesn't revert the switch
- Pass base_url/api_key from fallback config to resolve_provider_client()
- Add DirectAlias system: user-configurable model_aliases in config.yaml
checked before catalog resolution, with reverse lookup by model ID
- Add /model tab completion showing aliases with provider metadata
Co-authored-by: LucidPaths <LucidPaths@users.noreply.github.com>
LLMs frequently return numbers as strings ("42" instead of 42) and
booleans as strings ("true" instead of true). This causes silent
failures with MCP tools and any tool with strictly-typed parameters.
Added coerce_tool_args() in model_tools.py that runs before every tool
dispatch. For each argument, it checks the tool registry schema and
attempts safe coercion:
- "42" → 42 when schema says "type": "integer"
- "3.14" → 3.14 when schema says "type": "number"
- "true"/"false" → True/False when schema says "type": "boolean"
- Union types tried in order
- Original values preserved when coercion fails or is not applicable
Inspired by Block/goose tool argument coercion system.
Previously, tool results exceeding 100K characters were silently chopped
with only a '[Truncated]' notice — the rest of the content was lost
permanently. The model had no way to access the truncated portion.
Now, oversized results are written to HERMES_HOME/cache/tool_responses/
and the model receives:
- A 1,500-char head preview for immediate context
- The file path so it can use read_file/search_files on the full output
This preserves the context window protection (inline content stays small)
while making the full data recoverable. Falls back to the old destructive
truncation if the file write fails.
Inspired by Block/goose's large response handler pattern.
* feat(gateway): live-stream /update output + forward interactive prompts
Adds real-time output streaming and interactive prompt forwarding for
the gateway /update command, so users on Telegram/Discord/etc see the
full update progress and can respond to prompts (stash restore, config
migration) without needing terminal access.
Changes:
hermes_cli/main.py:
- Add --gateway flag to 'hermes update' argparse
- Add _gateway_prompt() file-based IPC function that writes
.update_prompt.json and polls for .update_response
- Modify _restore_stashed_changes() to accept optional input_fn
parameter for gateway mode prompt forwarding
- cmd_update() uses _gateway_prompt when --gateway is set, enabling
interactive stash restore and config migration prompts
gateway/run.py:
- _handle_update_command: spawn with --gateway flag and
PYTHONUNBUFFERED=1 for real-time output flushing
- Store session_key in .update_pending.json for cross-restart
session matching
- Add _update_prompt_pending dict to track sessions awaiting
update prompt responses
- Replace _watch_for_update_completion with _watch_update_progress:
streams output chunks every ~4s, detects .update_prompt.json and
forwards prompts to the user, handles completion/failure/timeout
- Add update prompt interception in _handle_message: when a prompt
is pending, the user's next message is written to .update_response
instead of being processed normally
- Preserve _send_update_notification as legacy fallback for
post-restart cases where adapter isn't available yet
File-based IPC protocol:
- .update_prompt.json: written by update process with prompt text,
default value, and unique ID
- .update_response: written by gateway with user's answer
- .update_output.txt: existing, now streamed in real-time
- .update_exit_code: existing completion marker
Tests: 16 new tests covering _gateway_prompt IPC, output streaming,
prompt detection/forwarding, message interception, and cleanup.
* feat: interactive buttons for update prompts (Telegram + Discord)
Telegram: Inline keyboard with ✓ Yes / ✗ No buttons. Clicking a button
answers the callback query, edits the message to show the choice, and
writes .update_response directly. CallbackQueryHandler registered on
the update_prompt: prefix.
Discord: UpdatePromptView (discord.ui.View) with green Yes / red No
buttons. Follows the ExecApprovalView pattern — auth check, embed color
update, disabled-after-click. Writes .update_response on click.
All platforms: /approve and /deny (and /yes, /no) now work as shorthand
for yes/no when an update prompt is pending. The text fallback message
instructs users to use these commands. Raw message interception still
works as a fallback for non-command responses.
Gateway watcher checks adapter for send_update_prompt method (class-level
check to avoid MagicMock false positives) and falls back to text prompt
with /approve instructions when unavailable.
* fix: block /update on non-messaging platforms (API, webhooks, ACP)
Add _UPDATE_ALLOWED_PLATFORMS frozenset that explicitly lists messaging
platforms where /update is permitted. API server, webhook, and ACP
platforms get a clear error directing them to run hermes update from
the terminal instead.
ACP and API server already don't reach _handle_message (separate
codepaths), and webhooks have distinct session keys that can't collide
with messaging sessions. This guard is belt-and-suspenders.
Resolve exact label matches before treating digit-only input as a positional index so destructive auth removal does not mis-target credentials named with numeric labels.
Constraint: The CLI remove path must keep supporting existing index-based usage while adding safer label targeting
Rejected: Ban numeric labels | labels are free-form and existing users may already rely on them
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: When a destructive command accepts multiple identifier forms, prefer exact identity matches before fallback parsing heuristics
Tested: Focused pytest slice for auth commands, credential pool recovery, and routing (273 passed); py_compile on changed Python files
Not-tested: Full repository pytest suite
Persist structured exhaustion metadata from provider errors, use explicit reset timestamps when available, and expose label-based credential targeting in the auth CLI. This keeps long-lived Codex cooldowns from being misreported as one-hour waits and avoids forcing operators to manage entries by list position alone.
Constraint: Existing credential pool JSON needs to remain backward compatible with stored entries that only record status code and timestamp
Constraint: Runtime recovery must keep the existing retry-then-rotate semantics for 429s while enriching pool state with provider metadata
Rejected: Add a separate credential scheduler subsystem | too large for the Hermes pool architecture and unnecessary for this fix
Rejected: Only change CLI formatting | would leave runtime rotation blind to resets_at and preserve the serial-failure behavior
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Preserve structured rate-limit metadata when new providers expose reset hints; do not collapse back to status-code-only exhaustion tracking
Tested: Focused pytest slice for auth commands, credential pool recovery, and routing (272 passed); py_compile on changed Python files; hermes -w auth list/remove smoke test with temporary HERMES_HOME
Not-tested: Full repository pytest suite, broader gateway/integration flows outside the touched auth and pool paths
Add 5 regression tests from PR #4476 (gnanam1990) to prevent re-introducing
the IGNORECASE bug that caused lowercase Python/TypeScript variable assignments
to be incorrectly redacted as secrets. The core fix landed in 6367e1c4.
Tests cover:
- Lowercase Python variable with 'token' in name
- Lowercase Python variable with 'api_key' in name
- TypeScript 'await' not treated as secret value
- TypeScript 'secret' variable assignment
- 'export' prefix preserved for uppercase env vars
Co-authored-by: gnanam1990 <gnanam1990@users.noreply.github.com>
TimedOut is a subclass of NetworkError in python-telegram-bot. The
inner retry loop in send() and the outer _send_with_retry() in base.py
both treated it as a transient connection error and retried — but
send_message is not idempotent. When the request reaches Telegram but
the HTTP response times out, the message is already delivered. Retrying
sends duplicates. Worst case: up to 9 copies (inner 3x × outer 3x).
Inner loop (telegram.py):
- Import TimedOut separately, isinstance-check before generic
NetworkError retry (same pattern as BadRequest carve-out from #3390)
- Re-raise immediately — no retry
- Mark as retryable=False in outer exception handler
Outer loop (base.py):
- Remove 'timeout', 'timed out', 'readtimeout', 'writetimeout' from
_RETRYABLE_ERROR_PATTERNS (read/write timeouts are delivery-ambiguous)
- Add 'connecttimeout' (safe — connection never established)
- Keep 'network' (other platforms still need it)
- Add _is_timeout_error() + early return to prevent plain-text fallback
on timeout errors (would also cause duplicate delivery)
Connection errors (ConnectionReset, ConnectError, etc.) are still
retried — these fail before the request reaches the server.
Credit: tmdgusya (PR #3899), barun1997 (PR #3904) for identifying the
bug and proposing fixes.
Closes#3899, closes#3904.
When streaming is enabled, the GatewayStreamConsumer sends raw text
chunks directly to the platform without post-processing. This causes
MEDIA:/path/to/file tags and [[audio_as_voice]] directives to appear
as visible text in the user's chat instead of being stripped.
The non-streaming path already handles this correctly via
extract_media() in base.py, but the streaming path was missing
equivalent cleanup.
Add _clean_for_display() to GatewayStreamConsumer that strips MEDIA:
tags and internal markers before any text reaches the platform. The
actual media file delivery is unaffected — _deliver_media_from_response()
in gateway/run.py still extracts files from the agent's final_response
(separate from the stream consumer's display text).
Reported by Ao [FotM] on Discord.
When commands like grep, diff, test, or find return non-zero exit codes
that aren't actual errors (grep 1 = no matches, diff 1 = files differ),
the model wastes turns investigating non-problems. This adds an
exit_code_meaning field to the terminal JSON result that explains
informational exit codes, so the agent can move on instead of debugging.
Covers grep/rg/ag/ack (no matches), diff (files differ), find (partial
access), test/[ (condition false), curl (timeouts, DNS, HTTP errors),
and git (context-dependent). Correctly extracts the last command from
pipelines and chains, strips full paths and env var assignments.
The exit_code field itself is unchanged — this is purely additive context.
Plugin context from pre_llm_call hooks was injected into the system
prompt, breaking the prompt cache prefix every turn when content
changed (typical for memory plugins). Now all plugin context goes
into the current turn's user message — the system prompt stays
identical across turns, preserving cached tokens.
The system prompt is reserved for Hermes internals. Plugins
contribute context alongside the user's input.
Also adds comprehensive documentation for all 6 plugin hooks:
pre_tool_call, post_tool_call, pre_llm_call, post_llm_call,
on_session_start, on_session_end — each with full callback
signatures, parameter tables, firing conditions, and examples.
Supersedes #5138 which identified the same cache-busting bug
and proposed an uncached system suffix approach. This fix goes
further by removing system prompt injection entirely.
Co-identified-by: OutThisLife (PR #5138)