One-shot local execution built `printf FENCE; <cmd>; __hermes_rc=...`, so a
command ending in a heredoc produced a closing line like `EOF; __hermes_rc=...`,
which is not a valid delimiter. Bash then treated the rest of the wrapper as
heredoc body, leaking it into tool output (e.g. gh issue/PR flows).
Use newline-separated wrapper lines so the delimiter stays alone and the
trailer runs after the heredoc completes.
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Add per-task timeout settings under auxiliary.{task}.timeout in config.yaml
instead of hardcoded values. Users with slow local models (Ollama, llama.cpp)
can now increase timeouts for compression, vision, session search, etc.
Defaults:
- auxiliary.compression.timeout: 120s (was hardcoded 45s)
- auxiliary.vision.timeout: 30s (unchanged)
- all other aux tasks: 30s (was hardcoded 30s)
- title_generator: 30s (was hardcoded 15s)
call_llm/async_call_llm now auto-resolve timeout from config when not
explicitly passed. Callers can still override with an explicit timeout arg.
Based on PR #3406 by alanfwilliams. Converted from env vars to config.yaml
per project conventions.
Co-authored-by: alanfwilliams <alanfwilliams@users.noreply.github.com>
Prevent the agent from accidentally killing its own process with
pkill -f gateway, killall hermes, etc. Adds a dangerous command
pattern that triggers the approval flow.
Co-authored-by: arasovic <arasovic@users.noreply.github.com>
Adds 'hermes webhook' CLI subcommand and a skill — zero new model tools.
CLI commands (require webhook platform to be enabled):
hermes webhook subscribe <name> [--events, --prompt, --deliver, ...]
hermes webhook list
hermes webhook remove <name>
hermes webhook test <name>
All commands gate on webhook platform being enabled in config. If not
configured, prints setup instructions (gateway setup wizard, manual
config.yaml, or env vars).
The agent uses these via terminal tool, guided by the webhook-subscriptions
skill which documents setup, common patterns (GitHub, Stripe, CI/CD,
monitoring), prompt template syntax, security, and troubleshooting.
Adapter enhancement: webhook.py hot-reloads dynamic subscriptions from
~/.hermes/webhook_subscriptions.json on each incoming request (mtime-gated).
Static config.yaml routes always take precedence.
Docs: updated webhooks.md with Dynamic Subscriptions section, added
hermes webhook to cli-commands.md reference.
No new model tools. No toolset changes.
24 new tests for CLI CRUD, persistence, enabled-gate, and adapter
dynamic route loading.
Two fixes for /skills install and /skills uninstall slash commands:
1. input() hangs indefinitely inside prompt_toolkit's TUI event loop,
soft-locking the CLI. The user typing the slash command is already
implicit consent, so confirmation is now always skipped.
2. Cache invalidation was unconditional — installing or uninstalling a
skill mid-session silently broke the prompt cache, increasing costs.
The slash handler now defers cache invalidation by default (skill
takes effect next session). Pass --now to invalidate immediately,
with a message explaining the cost tradeoff. The CLI argparse path
(hermes skills install) is unaffected and still invalidates.
Fixes#3474
Salvaged from PR #3496 by dlkakbs.
hermes update hangs on input() when run from cron, scripts, or piped
contexts. Check both stdin and stdout isatty(), catch EOFError as a
fallback, and print guidance to run 'hermes config migrate' later.
Co-authored-by: phippsbot-byte <phippsbot-byte@users.noreply.github.com>
When all messaging platforms exhaust retries and get queued for background
reconnection, exit with code 1 so systemd Restart=on-failure can restart
the process. Previously the gateway stayed alive as a zombie with no
connected platforms and exit code 0.
Salvaged from PR #3567 by kelsia14. Test updates added.
Co-authored-by: kelsia14 <kelsia14@users.noreply.github.com>
* fix: keep gateway running through telegram proxy failures
- continue gateway startup in degraded mode when Telegram cannot connect yet
- ensure Telegram fallback transport also honors proxy env vars
- support reconnect retries without taking down the whole gateway
* test(telegram): cover proxy env handling in fallback transport
---------
Co-authored-by: kufufu9 <pi@local>
Salvage of PR #2173 (hanai) and PR #3432 (timknip).
Injects PATH, VIRTUAL_ENV, and HERMES_HOME into the macOS launchd plist so gateway subprocesses find user-installed tools (node, ffmpeg, etc.). Matches systemd unit parity with venv/bin, node_modules/.bin, and resolved node dir in PATH. Includes 7 new tests and docs updates across 4 pages.
Co-Authored-By: Han <ihanai1991@gmail.com>
Co-Authored-By: timknip <timknip@users.noreply.github.com>
Without enabling the Messages Tab in App Home settings, users see
"Sending messages to this app has been turned off" when trying to DM
the bot — even with all correct scopes and event subscriptions.
Add Step 5 (Enable the Messages Tab) between Event Subscriptions and
Install App, with a danger admonition. Also add troubleshooting entry
for this specific error message. Renumber subsequent steps (6→7→8→9).
Co-authored-by: Alberto Leal <mail4alberto@gmail.com>
Use atomic_json_write() from utils.py instead of plain open()/json.dump()
for the models.dev disk cache. Prevents corrupted cache if the process is
killed mid-write — _load_disk_cache() silently returns {} on corrupt JSON,
losing all model metadata until the next successful API fetch.
Co-authored-by: memosr <memosr@users.noreply.github.com>
Ollama reuses index 0 for every tool call in a parallel batch,
distinguishing them only by id. The streaming accumulator now
detects a new non-empty id at an already-active index and redirects
it to a fresh slot, preventing names and arguments from being
concatenated into a single tool call.
No-op for normal providers that use incrementing indices.
Co-authored-by: dmater01 <dmater01@users.noreply.github.com>
* Fixing mattermost configuration parsing bugs
* fix: add homeassistant to skills_config + platform consistency tests
Follow-up for cherry-picked #3512:
- Add homeassistant to skills_config.py PLATFORMS (was in tools_config
but missing from skills_config)
- Add 3 consistency tests that verify all platforms in tools_config have
matching toolset definitions, gateway includes, and skills_config entries
— prevents this class of bug from recurring
---------
Co-authored-by: DaneelV3 <dannel@v3rtical.tech>
Commands sent directly to the bot in groups include @botname suffix
(e.g. /compress@TigerNanoBot). get_command() now strips the @anything
part before lookup, matching how Telegram bot menu generates commands.
Fixes all slash commands silently doing nothing when sent with @mention.
Co-authored-by: MacroAnarchy <MacroAnarchy@users.noreply.github.com>
Add X-Content-Type-Options: nosniff and Referrer-Policy: no-referrer
to all API server responses via a new security_headers_middleware.
Co-authored-by: Oktay Aydin <aydnOktay@users.noreply.github.com>
Adds Access-Control-Max-Age: 600 to CORS preflight responses, telling
browsers to cache the preflight for 10 minutes. Reduces redundant OPTIONS
requests and improves perceived latency for browser-based API clients.
Salvaged from PR #3514 by aydnOktay.
Co-authored-by: aydnOktay <xaydinoktay@gmail.com>
* feat: GPT tool-use steering + strip budget warnings from history
Two changes to improve tool reliability, especially for OpenAI GPT models:
1. GPT tool-use enforcement prompt: Adds GPT_TOOL_USE_GUIDANCE to the
system prompt when the model name contains 'gpt' and tools are loaded.
This addresses a known behavioral pattern where GPT models describe
intended actions ('I will run the tests') instead of actually making
tool calls. Inspired by similar steering in OpenCode (beast.txt) and
Cline (GPT-5.1 variant).
2. Budget warning history stripping: Budget pressure warnings injected by
_get_budget_warning() into tool results are now stripped when
conversation history is replayed via run_conversation(). Previously,
these turn-scoped signals persisted across turns, causing models to
avoid tool calls in all subsequent messages after any turn that hit
the 70-90% iteration threshold.
* fix: replace hardcoded ~/.hermes paths with get_hermes_home() for profile support
Prep for the upcoming profiles feature — each profile is a separate
HERMES_HOME directory, so all paths must respect the env var.
Fixes:
- gateway/platforms/matrix.py: Matrix E2EE store was hardcoded to
~/.hermes/matrix/store, ignoring HERMES_HOME. Now uses
get_hermes_home() so each profile gets its own Matrix state.
- gateway/platforms/telegram.py: Two locations reading config.yaml via
Path.home()/.hermes instead of get_hermes_home(). DM topic thread_id
persistence and hot-reload would read the wrong config in a profile.
- tools/file_tools.py: Security path for hub index blocking was
hardcoded to ~/.hermes, would miss the actual profile's hub cache.
- hermes_cli/gateway.py: Service naming now uses the profile name
(hermes-gateway-coder) instead of a cryptic hash suffix. Extracted
_profile_suffix() helper shared by systemd and launchd.
- hermes_cli/gateway.py: Launchd plist path and Label now scoped per
profile (ai.hermes.gateway-coder.plist). Previously all profiles
would collide on the same plist file on macOS.
- hermes_cli/gateway.py: Launchd plist now includes HERMES_HOME in
EnvironmentVariables — was missing entirely, making custom
HERMES_HOME broken on macOS launchd (pre-existing bug).
- All launchctl commands in gateway.py, main.py, status.py updated
to use get_launchd_label() instead of hardcoded string.
Test fixes: DM topic tests now set HERMES_HOME env var alongside
Path.home() mock. Launchd test uses get_launchd_label() for expected
commands.
PR #3566 intentionally routes suppressed content to stream_delta_callback
when tool calls are present, so reasoning tag extraction can fire during
streaming. The test was still asserting the old behavior where content
after tool calls was fully suppressed from the callback.
Updated the assertion to match: content IS delivered to the callback
(for tag extraction), with display-level suppression handled by the
CLI's _stream_delta.
StreamResponse headers are flushed on prepare() before the CORS
middleware can inject them. Resolve CORS headers up front using
_cors_headers_for_origin() so the full set (including
Access-Control-Allow-Origin) is present on SSE streams.
Co-authored-by: ygd58 <ygd58@users.noreply.github.com>
Add GET /v1/health as an alias to the existing /health endpoint so
OpenAI-compatible health checks work out of the box.
Co-authored-by: Oktay Aydin <aydnOktay@users.noreply.github.com>
The base orchestrator passes metadata=_thread_metadata to
send_image_file, send_video, and send_document. WhatsApp was the
only platform adapter missing the parameter, causing TypeError
crashes when sending media.
Extended to all three methods (original PR only fixed send_image_file).
Salvaged from PR #3144.
Co-authored-by: afifai <afifai@users.noreply.github.com>
* fix(provider): remove MiniMax /v1→/anthropic auto-correction to allow user override
The minimax-specific auto-correction in runtime_provider.py was
preventing users from overriding to the OpenAI-compatible endpoint
via MINIMAX_BASE_URL. Users in certain regions get nginx 404 on
api.minimax.io/anthropic and need to switch to api.minimax.chat/v1.
The generic URL-suffix detection already handles /anthropic →
anthropic_messages, so the minimax-specific code was redundant for
the default path and harmful for the override path.
Now: default /anthropic URL works via generic detection, user
override to /v1 gets chat_completions mode naturally.
Closes#3546 (different approach — respects user overrides instead
of changing the default endpoint).
* fix(display): show reasoning during streaming even when tool calls suppress content
When a model generates content (containing <REASONING_SCRATCHPAD> tags)
alongside tool calls in the same API response, content deltas were
suppressed from streaming once any tool call chunk arrived. This
prevented the CLI's tag extraction from running, so reasoning was
never shown during streaming. The post-response fallback then
displayed reasoning AFTER the already-visible streamed response,
creating a confusing reversed order.
Fix: route suppressed content to stream_delta_callback even when tool
calls are present. The CLI's _stream_delta handles tag extraction —
reasoning tags are routed to the reasoning display box, while
non-reasoning text is handled by the existing stream display logic.
This ensures reasoning appears before tool execution and the final
response, matching the expected visual order.
The TOOL_USE_ENFORCEMENT_GUIDANCE injection (added in #3528) was
hardcoded to only match gpt/codex model names. This makes it a
config option so users can turn it on for any model family.
New config key: agent.tool_use_enforcement
- "auto" (default): matches gpt/codex (existing behavior)
- true: inject for all models
- false: never inject
- list of strings: custom model-name substrings to match
e.g. ["gpt", "codex", "deepseek", "qwen"]
No version bump needed — deep merge provides the default
automatically for existing installs.
12 new tests covering all config modes.
The documentation incorrectly instructed users to set Public Bot to OFF,
but this prevents using the Discord-provided invite link (recommended method),
causing the error: 'Private application cannot have a default authorization link'.
Changes:
- Changed Step 2: Public Bot now set to ON (required for Installation tab method)
- Added info callout explaining the Private Bot alternative (use Manual URL)
- Added note in Step 5 Option A clarifying the Public Bot requirement
Fixes Discord bot setup flow for new users following the recommended path.
Co-authored-by: Docs Fix <docs-fix@example.com>
* fix(gateway): preserve full transcript on /compress instead of overwriting
The /compress command calls _compress_context() which correctly ends the
old session (preserving its full transcript in SQLite) and creates a new
session_id for the continuation. However, it then immediately called
rewrite_transcript() on the OLD session_id, overwriting the preserved
transcript with the compressed version — destroying searchable history.
Auto-compression (triggered by context pressure) does not have this bug
because the gateway already handles the session_id swap via the
agent.session_id != session_id check after _run_agent_sync.
Fix: after _compress_context creates the new session, write the compressed
messages into the NEW session_id and update the session store pointer.
The old session's full transcript stays intact and searchable via
session_search.
Before: /compress destroys original messages, session_search can't find
details from compressed portions.
After: /compress behaves like /new for history — full transcript preserved,
compressed context for the live session.
* fix(gateway): preserve transcript on /compress and hygiene compression
Apply session_id swap after _compress_context in both /compress handler
and hygiene pre-compression. _compress_context creates a new session
(ending the old one), but both paths were calling rewrite_transcript on
the OLD session_id — overwriting the preserved transcript and destroying
searchable history.
Now follows the same pattern as the auto-compression handler (lines
5415-5423): detect the new session_id, update the session store entry,
and write compressed messages to the new session.
Also fix FakeCompressAgent test mock to include session_id attribute
and simulate the session_id change that real _compress_context performs.
Co-authored-by: MacroAnarchy <MacroAnarchy@users.noreply.github.com>
---------
Co-authored-by: MacroAnarchy <MacroAnarchy@users.noreply.github.com>
* fix(matrix): harden e2ee access-token handling
* fix: patch nio mock in e2ee maintenance sync loop test
The sync_loop now imports nio for SyncError checking (from PR #3280),
so the test needs to inject a fake nio module via sys.modules.
---------
Co-authored-by: Cortana <andrew+cortana@chalkley.org>
dict.get(key, default) only returns default when key is ABSENT.
When YAML has 'key:' with no value, it parses as None — .get()
returns None, then .strip() crashes with AttributeError.
Use (x or '') pattern to handle both missing and null cases.
Salvaged from PR #3217.
Co-authored-by: erosika <erosika@users.noreply.github.com>
The minimax-specific auto-correction in runtime_provider.py was
preventing users from overriding to the OpenAI-compatible endpoint
via MINIMAX_BASE_URL. Users in certain regions get nginx 404 on
api.minimax.io/anthropic and need to switch to api.minimax.chat/v1.
The generic URL-suffix detection already handles /anthropic →
anthropic_messages, so the minimax-specific code was redundant for
the default path and harmful for the override path.
Now: default /anthropic URL works via generic detection, user
override to /v1 gets chat_completions mode naturally.
Closes#3546 (different approach — respects user overrides instead
of changing the default endpoint).
Drop the swe-rex dependency for Modal terminal backend and use the
Modal SDK directly (Sandbox.create + Sandbox.exec). This fixes:
- AsyncUsageWarning from synchronous App.lookup() in async context
- DeprecationError from unencrypted_ports / .url on unencrypted tunnels
(deprecated 2026-03-05)
The new implementation:
- Uses modal.App.lookup.aio() for async-safe app creation
- Uses Sandbox.create.aio() with 'sleep infinity' entrypoint
- Uses Sandbox.exec.aio() for direct command execution (no HTTP server
or tunnel needed)
- Keeps all existing features: persistent filesystem snapshots,
configurable resources (CPU/memory/disk), sudo support, interrupt
handling, _AsyncWorker for event loop safety
Consistent with the Docker backend precedent (PR #2804) where we
removed mini-swe-agent in favor of direct docker run.
Files changed:
- tools/environments/modal.py - core rewrite
- tools/terminal_tool.py - health check: modal instead of swerex
- hermes_cli/setup.py - install modal instead of swe-rex[modal]
- pyproject.toml - modal extra: modal>=1.0.0 instead of swe-rex[modal]
- scripts/kill_modal.sh - grep for hermes-agent instead of swe-rex
- tests/ - updated for new implementation
- environments/README.md - updated patches section
- website/docs - updated install command
The plugin system defined six lifecycle hooks but only pre_tool_call and
post_tool_call were invoked. This activates the remaining four so that
external plugins (e.g. memory systems) can hook into the conversation
loop without touching core code.
Hook semantics:
- on_session_start: fires once when a new session is created
- pre_llm_call: fires once per turn before the tool-calling loop;
plugins can return {"context": "..."} to inject into the ephemeral
system prompt (not cached, not persisted)
- post_llm_call: fires once per turn after the loop completes, with
user_message and assistant_response for sync/storage
- on_session_end: fires at the end of every run_conversation call
invoke_hook() now returns a list of non-None callback return values,
enabling pre_llm_call context injection while remaining backward
compatible (existing hooks that return None are unaffected).
Salvaged from PR #2823.
Co-authored-by: Nicolò Boschi <boschi1997@gmail.com>
Browser clients using the Idempotency-Key header for request
deduplication were blocked by CORS preflight because the header
was not listed in Access-Control-Allow-Headers.
Add Idempotency-Key to _CORS_HEADERS and add tests for both the
new header allowance and the existing Vary: Origin behavior.
Co-authored-by: aydnOktay <aydnOktay@users.noreply.github.com>
Co-authored-by: Hermes Agent <hermes@nousresearch.com>
load_jobs() uses strict json.load() which rejects bare control characters
(e.g. literal newlines) in JSON string values. When a cron job prompt
contains such characters, the parser throws JSONDecodeError and the
function silently returns an empty list — causing ALL scheduled jobs
to stop firing with no error logged.
Fix: on JSONDecodeError, retry with json.loads(strict=False). If jobs
are recovered, auto-rewrite the file with proper escaping via save_jobs()
and log a warning. Only fall back to empty list if the JSON is truly
unrecoverable.
Co-authored-by: Sebastian Bochna <sbochna@SB-MBP-M2-2.local>
Root cause: Anthropic buffers entire tool call arguments and goes silent
for minutes while thinking (verified: 167s gap with zero SSE events on
direct API). OpenRouter's upstream proxy times out after ~125s of
inactivity and drops the connection with 'Network connection lost'.
Fix: Send the x-anthropic-beta: fine-grained-tool-streaming-2025-05-14
header for Claude models on OpenRouter. This makes Anthropic stream
tool call arguments token-by-token instead of buffering them, keeping
the connection alive through OpenRouter's proxy.
Live-tested: the exact prompt that consistently failed at ~128s now
completes successfully — 2,972 lines written, 49K tokens, 8 minutes.
Additional improvements:
1. Send explicit max_tokens for Claude through OpenRouter. Without it,
OpenRouter defaults to 65,536 (confirmed via echo_upstream_body) —
only half of Opus 4.6's 128K limit.
2. Classify SSE 'Network connection lost' as retryable in the streaming
inner retry loop. The OpenAI SDK raises APIError from SSE error
events, which was bypassing our transient error retry logic.
3. Actionable diagnostic guidance when stream-drop retries exhaust.
Add ~/.local/bin, ~/.cargo/bin, ~/go/bin, ~/.npm-global/bin to the
systemd unit PATH so tools installed via uv/pipx/cargo/go are
discoverable by MCP servers and terminal commands.
Uses a _build_user_local_paths() helper that checks exists() before
adding, and correctly resolves home dir for both user and system
service types.
Co-authored-by: Kal Sze <ksze@users.noreply.github.com>
Cherry-pick of feat/gpt-tool-steering with modifications:
1. Tool-use enforcement prompt (refactored from GPT-specific):
- Renamed GPT_TOOL_USE_GUIDANCE -> TOOL_USE_ENFORCEMENT_GUIDANCE
- Added TOOL_USE_ENFORCEMENT_MODELS tuple: ('gpt', 'codex')
- Injection logic now checks against the tuple instead of hardcoding
'gpt' — adding new model families is a one-line change
- Addresses models describing actions instead of making tool calls
2. Budget warning history stripping:
- _strip_budget_warnings_from_history() strips _budget_warning JSON
keys and [BUDGET WARNING: ...] text from tool results at the start
of run_conversation()
- Prevents old budget warnings from poisoning subsequent turns
Based on PR #3479 by teknium1.
* fix: harden `hermes update` against diverged history, non-main branches, and gateway edge cases
The self-update command (`hermes update` / gateway `/update`) could fail
or silently corrupt state in several scenarios:
1. **Diverged history** — `git pull --ff-only` aborts with a cryptic
subprocess error when upstream has force-pushed or rebased. Now falls
back to `git reset --hard origin/main` since local changes are already
stashed.
2. **User on a feature branch / detached HEAD** — the old code would
either clobber the feature branch HEAD to point at origin/main, or
silently pull against a non-existent remote branch. Now auto-checkouts
main before pulling, with a clear warning.
3. **Fetch failures** — network or auth errors produced raw subprocess
tracebacks. Now shows user-friendly messages ("Network error",
"Authentication failed") with actionable hints.
4. **reset --hard failure** — if the fallback reset itself fails (disk
full, permissions), the old code would still attempt stash restore on
a broken working tree. Now skips restore and tells the user their
changes are safe in stash.
5. **Gateway /update stash conflicts** — non-interactive mode (Telegram
`/update`) called sys.exit(1) when stash restore had conflicts, making
the entire update report as failed even though the code update itself
succeeded. Now treats stash conflicts as non-fatal in non-interactive
mode (returns False instead of exiting).
* fix: restore stash and branch on 'already up to date' early return
The PR moved stash creation before the commit-count check (needed for
the branch-switching feature), but the 'already up to date' early return
didn't restore the stash or switch back to the original branch — leaving
the user stranded on main with changes trapped in a stash.
Now the early-return path restores the stash and checks out the original
branch when applicable.
---------
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
EmailAdapter._seen_uids accumulates every IMAP UID ever seen but
never removes any. A long-running gateway processing a high-volume
inbox would leak memory indefinitely — thousands of integers per day.
IMAP UIDs are monotonically increasing integers, so old UIDs are safe
to drop: new messages always have higher UIDs, and the IMAP UNSEEN
flag already prevents re-delivery regardless of our local tracking.
Fix adds _trim_seen_uids() which keeps only the most recent 1000 UIDs
(half of the 2000-entry cap) when the set grows too large. Called
automatically during connect() and after each fetch cycle.
Co-authored-by: memosr.eth <96793918+memosr@users.noreply.github.com>
- Change default inference_base_url from dashscope-intl Anthropic-compat
endpoint to coding-intl OpenAI-compat /v1 endpoint. The old Anthropic
endpoint 404'd when used with the OpenAI SDK (which appends
/chat/completions to a /apps/anthropic base URL).
- Update curated model list: remove models unavailable on coding-intl
(qwen3-max, qwen-plus-latest, qwen3.5-flash, qwen-vl-max), add
third-party models available on the platform (glm-5, glm-4.7,
kimi-k2.5, MiniMax-M2.5).
- URL-based api_mode auto-detection still works: overriding
DASHSCOPE_BASE_URL to an /apps/anthropic endpoint automatically
switches to anthropic_messages mode.
- Update provider description and env var descriptions to reflect the
coding-intl multi-provider platform.
- Update tests to match new default URL and test the anthropic override
path instead.
* fix: cap context pressure percentage at 100% in display
The forward-looking token estimate can overshoot the compaction threshold
(e.g. a large tool result pushes it from 70% to 109% in one step). The
progress bar was already capped via min(), but pct_int was not — causing
the user to see '109% to compaction' which is confusing.
Cap pct_int at 100 in both CLI and gateway display functions.
Reported by @JoshExile82.
* refactor: use real API token counts for compression decisions
Replace the rough chars/3 estimation with actual prompt_tokens +
completion_tokens from the API response. The estimation was needed to
predict whether tool results would push context past the threshold, but
the default 50% threshold leaves ample headroom — if tool results push
past it, the next API call reports real usage and triggers compression
then.
This removes all estimation from the compression and context pressure
paths, making both 100% data-driven from provider-reported token counts.
Also removes the dead _msg_count_before_tools variable.
* Fix#3409: Add fallback to session_search to prevent false negatives on summarization failure
Fixes#3409. When the auxiliary summarizer fails or returns None, the tool now returns a raw fallback preview of the matched session instead of silently dropping it and returning an empty list
* fix: clean up fallback logic — separate exception handling from preview
Restructure the loop: handle exceptions first (log + nullify), build
entry dict once, then branch on result truthiness. Removes duplicated
field assignments and makes the control flow linear.
---------
Co-authored-by: devorun <130918800+devorun@users.noreply.github.com>
Matrix platform was missing from the PLATFORMS config, causing a
KeyError in _get_platform_tools() when handling Matrix messages.
Every other platform (telegram, discord, slack, etc.) was present
but matrix was overlooked.
Co-authored-by: williamtwomey <williamtwomey@users.noreply.github.com>
_expand_git_reference() and _rg_files() called subprocess.run()
without a timeout. On a large repository, @diff, @staged, or
@git:N references could hang the agent indefinitely while git
or ripgrep processes slow output.
- Add timeout=30 to git subprocess in _expand_git_reference()
with a user-friendly error message on TimeoutExpired
- Add timeout=10 to rg subprocess in _rg_files() returning
None on timeout (falls back to os.walk folder listing)
Co-authored-by: memosr.eth <96793918+memosr@users.noreply.github.com>
hermes tools and _get_platform_tools() call get_plugin_toolsets() /
_get_plugin_toolset_keys() without first ensuring plugins have been
discovered. discover_plugins() only runs as a side effect of importing
model_tools.py, which hermes tools never does. This means:
- hermes tools TUI never shows plugin toolsets (invisible to users)
- _get_platform_tools() in standalone processes misses plugin toolsets
Fix: call discover_plugins() (idempotent) in both
_get_plugin_toolset_keys() and _get_effective_configurable_toolsets()
before accessing plugin state. In the gateway/CLI where model_tools.py
is already imported, the call is a no-op (discover_and_load checks
_discovered flag).
When finish_reason='length' and the response contains only reasoning
(think blocks or empty content), the model exhausted its output token
budget on thinking with nothing left for the actual response.
Previously, this fell into either:
- chat_completions: 3 useless continuation retries (model hits same limit)
- anthropic/codex: generic 'Response truncated' error with rollback
Now: detect the think-only + length condition early and return immediately
with a targeted error message: 'Model used all output tokens on reasoning
with none left for the response. Try lowering reasoning effort or
increasing max_tokens.'
This saves 2 wasted API calls on the chat_completions path and gives
users actionable guidance instead of a cryptic error.
The existing think-only retry logic (finish_reason='stop') is unchanged —
that's a genuine model glitch where retrying can help.
Salvage of #3389 by @binhnt92 with reasoning fallback and retry logic added on top.
All 7 auxiliary LLM call sites now use extract_content_or_reasoning() which mirrors the main agent loop's behavior: extract content, strip think blocks, fall back to structured reasoning fields, retry on empty.
Closes#3389.
Show only agentic models that map to OpenRouter defaults:
Qwen/Qwen3.5-397B-A17B ↔ qwen/qwen3.5-plus
Qwen/Qwen3.5-35B-A3B ↔ qwen/qwen3.5-35b-a3b
deepseek-ai/DeepSeek-V3.2 ↔ deepseek/deepseek-chat
moonshotai/Kimi-K2.5 ↔ moonshotai/kimi-k2.5
MiniMaxAI/MiniMax-M2.5 ↔ minimax/minimax-m2.5
zai-org/GLM-5 ↔ z-ai/glm-5
XiaomiMiMo/MiMo-V2-Flash ↔ xiaomi/mimo-v2-pro
moonshotai/Kimi-K2-Thinking ↔ moonshotai/kimi-k2-thinking
Users can still pick any HF model via Enter custom model name.
Previously, tirith exit code 1 (block) immediately rejected the command
with no approval prompt — users saw 'BLOCKED: Command blocked by
security scan' and the agent moved on. This prevented gateway/CLI users
from approving pipe-to-shell installs like 'curl ... | sh' even when
they understood the risk.
Changes:
- Tirith 'block' and 'warn' now both go through the approval flow.
Users see the full tirith findings (severity, title, description,
safer alternatives) and can choose to approve or deny.
- New _format_tirith_description() builds rich descriptions from tirith
findings JSON so the approval prompt is informative.
- CLI startup now warns when tirith is enabled but not available, so
users know command scanning is degraded to pattern matching only.
The default approval choice is still deny, so the security posture is
unchanged for unattended/timeout scenarios.
Reported via Discord by pistrie — 'curl -fsSL https://mandex.dev/install.sh | sh'
was hard-blocked with no way to approve.