mistralai v2.x is a namespace package — `Mistral` class lives at
`mistralai.client`, not at the top-level `mistralai` module. The
previous `from mistralai import Mistral` raises ImportError at runtime.
Update both production code and test fixture to use the correct path.
Based on #6079 by @tunamitom with critical fixes and comprehensive tests.
Changes from #6079:
- Fix: sanitization overwrite bug — Qwen message prep now runs AFTER codex
field sanitization, not before (was silently discarding Qwen transforms)
- Fix: missing try/except AuthError in runtime_provider.py — stale Qwen
credentials now fall through to next provider on auto-detect
- Fix: 'qwen' alias conflict — bare 'qwen' stays mapped to 'alibaba'
(DashScope); use 'qwen-portal' or 'qwen-cli' for the OAuth provider
- Fix: hardcoded ['coder-model'] replaced with live API fetch + curated
fallback list (qwen3-coder-plus, qwen3-coder)
- Fix: extract _is_qwen_portal() helper + _qwen_portal_headers() to replace
5 inline 'portal.qwen.ai' string checks and share headers between init
and credential swap
- Fix: add Qwen branch to _apply_client_headers_for_base_url for mid-session
credential swaps
- Fix: remove suspicious TypeError catch blocks around _prompt_provider_choice
- Fix: handle bare string items in content lists (were silently dropped)
- Fix: remove redundant dict() copies after deepcopy in message prep
- Revert: unrelated ai-gateway test mock removal and model_switch.py comment deletion
New tests (30 test functions):
- _qwen_cli_auth_path, _read_qwen_cli_tokens (success + 3 error paths)
- _save_qwen_cli_tokens (roundtrip, parent creation, permissions)
- _qwen_access_token_is_expiring (5 edge cases: fresh, expired, within skew,
None, non-numeric)
- _refresh_qwen_cli_tokens (success, preserve old refresh, 4 error paths,
default expires_in, disk persistence)
- resolve_qwen_runtime_credentials (fresh, auto-refresh, force-refresh,
missing token, env override)
- get_qwen_auth_status (logged in, not logged in)
- Runtime provider resolution (direct, pool entry, alias)
- _build_api_kwargs (metadata, vl_high_resolution_images, message formatting,
max_tokens suppression)
* fix(tools): skip camofox auto-cleanup when managed persistence is enabled
When managed_persistence is enabled, cleanup_browser() was calling
camofox_close() which destroys the server-side browser context via
DELETE /sessions/{userId}, killing login sessions across cron runs.
Add camofox_soft_cleanup() — a public wrapper that drops only the
in-memory session entry when managed persistence is on, returning True.
When persistence is off it returns False so the caller falls back to
the full camofox_close(). The inactivity reaper still handles idle
resource cleanup.
Also surface a logger.warning() when _managed_persistence_enabled()
fails to load config, replacing a silent except-and-return-False.
Salvaged from #6182 by el-analista (Eduardo Perea Fernandez).
Added public API wrapper to avoid cross-module private imports,
and test coverage for both persistence paths.
Co-authored-by: Eduardo Perea Fernandez <el-analista@users.noreply.github.com>
* fix(doctor): only check the active memory provider, not all providers unconditionally
hermes doctor had hardcoded Honcho Memory and Mem0 Memory sections that
always ran regardless of the user's memory.provider config setting. After
the swappable memory provider update (#4623), users with leftover Honcho
config but no active provider saw false 'broken' errors.
Replaced both sections with a single Memory Provider section that reads
memory.provider from config.yaml and only checks the configured provider.
Users with no external provider see a green 'Built-in memory active' check.
Reported by community user michaelruiz001, confirmed by Eri (Honcho).
---------
Co-authored-by: Eduardo Perea Fernandez <el-analista@users.noreply.github.com>
Problem: hermes -w sessions accumulated 37+ worktrees and 1200+ orphaned
branches because:
- _cleanup_worktree bailed on any dirty working tree, but agent sessions
almost always leave untracked files/artifacts behind
- _prune_stale_worktrees had the same dirty-check, so stale worktrees
survived indefinitely
- pr-* and hermes/* branches from PR review had zero cleanup mechanism
Changes:
- _cleanup_worktree: check for unpushed commits instead of dirty state.
Agent work lives in pushed commits/PRs — dirty working tree without
unpushed commits is just artifacts, safe to remove.
- _prune_stale_worktrees: three-tier age system:
- Under 24h: skip (session may be active)
- 24h-72h: remove if no unpushed commits
- Over 72h: force remove regardless
- New _prune_orphaned_branches: on each -w startup, deletes local
hermes/hermes-* and pr-* branches with no corresponding worktree.
Protects main, checked-out branch, and active worktree branches.
Tests: 42 pass (6 new covering unpushed-commit logic, force-prune
tier, and orphaned branch cleanup).
- Fire on_session_finalize and on_session_reset in gateway _handle_reset_command()
- Fire on_session_finalize during gateway stop() for each active agent
- Move CLI test from tests/ root to tests/cli/ (matches recent restructure)
- Add 5 gateway tests covering reset hooks, ordering, shutdown, and error handling
- Place on_session_reset after new session is guaranteed to exist (covers
the get_or_create_session fallback path)
Plugins can now subscribe to session boundary events via
ctx.register_hook('on_session_finalize', ...) and
ctx.register_hook('on_session_reset', ...).
on_session_finalize — fires during CLI exit (/quit, Ctrl-C) and
before /new or /reset, giving plugins a chance to flush or clean up.
on_session_reset — fires after a new session is created via
/new or /reset, so plugins can initialize per-session state.
Closes#5592
Anthropic signs thinking blocks against the full turn content. Any
upstream mutation (context compression, session truncation, orphan
stripping, message merging) invalidates the signature, causing HTTP 400
'Invalid signature in thinking block' — especially in long-lived
gateway sessions.
Strategy (following clawdbot/OpenClaw pattern):
1. Strip thinking/redacted_thinking from all assistant messages EXCEPT
the last one — preserves reasoning continuity on the current
tool-use chain while avoiding stale signature errors on older turns.
2. Downgrade unsigned thinking blocks to plain text — Anthropic can't
validate them, but the reasoning content is preserved.
3. Strip cache_control from thinking/redacted_thinking blocks to
prevent cache markers from interfering with signature validation.
4. Drop thinking blocks from the second message when merging
consecutive assistant messages (role alternation enforcement).
5. Error recovery: on HTTP 400 mentioning 'signature' and 'thinking',
strip all reasoning_details from the conversation and retry once.
This is the safety net for edge cases the proactive stripping
misses.
Addresses the issue reported in PR #6086 by @mingginwan while
preserving reasoning continuity (their PR stripped ALL thinking
blocks unconditionally).
Files changed:
- agent/anthropic_adapter.py: thinking block management in
convert_messages_to_anthropic (strip old turns, downgrade unsigned,
strip cache_control, merge-time strip)
- run_agent.py: one-shot signature error recovery in retry loop
- tests/test_anthropic_adapter.py: 10 new tests covering all cases
Gateway and cron had inconsistent reasoning_effort resolution:
- CLI: config.yaml only (correct)
- Gateway: config.yaml first, env var fallback
- Cron: env var first, config.yaml fallback
All three now read exclusively from agent.reasoning_effort in config.yaml.
Removed HERMES_REASONING_EFFORT env var support entirely — .env is for
secrets only, not behavioral config.
Previously crash recovery recreated detached sessions as if they were
fully managed, so polls and kills could lie about liveness and the
checkpoint could forget recovered jobs after the next restart.
This commit refreshes recovered host-backed sessions from real PID
state, keeps checkpoint data durable, and preserves notify watcher
metadata while treating sandbox-only PIDs as non-recoverable.
- Persist `pid_scope` in `tools/process_registry.py` and skip
recovering sandbox-backed entries without a host-visible PID handle
- Refresh detached sessions on access so `get`/`poll`/`wait` and active
session queries observe exited processes instead of hanging forever
- Allow recovered host PIDs to be terminated honestly and requeue
`notify_on_complete` watchers during checkpoint recovery
- Add regression tests for durable checkpoints, detached exit/kill
behavior, sandbox skip logic, and recovered notify watchers
- DEFAULT_RESULT_SIZE_CHARS: 50K -> 100K (match current _LARGE_RESULT_CHARS)
- DEFAULT_PREVIEW_SIZE_CHARS: 2K -> 1.5K (match current _LARGE_RESULT_PREVIEW_CHARS)
- Per-tool overrides all set to 100K (terminal, execute_code, search_files)
- Remove pre-read byte guard (no behavioral regression vs current main)
- Revert limit signature change to int=500 (match current default)
- Restore original read_file schema description
- Update test assertions to match 100K thresholds
Simplify the vision auto-detection chain from 5 backends (openrouter,
nous, codex, anthropic, custom) down to 3:
1. OpenRouter (known vision-capable default model)
2. Nous Portal (known vision-capable default model)
3. Active provider + model (whatever the user is running)
4. Stop
This is simpler and more predictable. The active provider step uses
resolve_provider_client() which handles all provider types including
named custom providers (from #5978).
Removed the complex preferred-provider promotion logic and API-level
fallback — the chain is short enough that it doesn't need them.
Based on PR #5376 by Mibay. Closes#5366.
_deliver_result() now returns Optional[str] — None on success, error
message on failure. All failure paths (unknown platform, platform
disabled, config load error, send failure, unresolvable target)
return descriptive error strings.
mark_job_run() gains delivery_error param, tracked as
last_delivery_error on the job — separate from agent execution errors.
A job where the agent succeeded but delivery failed shows
last_status='ok' + last_delivery_error='...'.
The cronjob list tool now surfaces last_delivery_error so agents and
users can see when cron outputs aren't arriving.
Inspired by PR #5863 (oxngon) — reimplemented with proper wiring.
Tests: 3 new mark_job_run tests + 6 new _deliver_result return tests.
Add button-based exec approval to the Feishu adapter, matching the
existing Discord, Telegram, and Slack implementations.
When the agent encounters a dangerous command, Feishu users now see
an interactive card with four buttons instead of text instructions:
- Allow Once (primary)
- Allow Session
- Always Allow
- Deny (danger)
Implementation:
- send_exec_approval() sends an interactive card via the Feishu
message API with buttons carrying hermes_action in their value dict
- _handle_card_action_event() intercepts approval button clicks
before routing them as synthetic commands, directly calling
resolve_gateway_approval() to unblock the agent thread
- _update_approval_card() replaces the orange approval card with a
green (approved) or red (denied) status card showing who acted
- _approval_state dict tracks pending approval_id → session_key
mappings; cleaned up on resolution
The gateway's existing routing in _approval_notify_sync already checks
getattr(type(adapter), 'send_exec_approval', None) and will
automatically use the button-based flow for Feishu.
Tests: 16 new tests covering send, callback resolution, state
management, card updates, and non-interference with existing card
actions.
Salvaged fixes from community PRs:
- fix(model_switch): _read_auth_store → _load_auth_store + fix auth store
key lookup (was checking top-level dict instead of store['providers']).
OAuth providers now correctly detected in /model picker.
Cherry-picked from PR #5911 by Xule Lin (linxule).
- fix(ollama): pass num_ctx to override 2048 default context window.
Ollama defaults to 2048 context regardless of model capabilities. Now
auto-detects from /api/show metadata and injects num_ctx into every
request. Config override via model.ollama_num_ctx. Fixes#2708.
Cherry-picked from PR #5929 by kshitij (kshitijk4poor).
- fix(aux): normalize provider aliases for vision/auxiliary routing.
Adds _normalize_aux_provider() with 17 aliases (google→gemini,
claude→anthropic, glm→zai, etc). Fixes vision routing failure when
provider is set to 'google' instead of 'gemini'.
Cherry-picked from PR #5793 by e11i (Elizabeth1979).
- fix(aux): rewrite MiniMax /anthropic base URLs to /v1 for OpenAI SDK.
MiniMax's inference_base_url ends in /anthropic (Anthropic Messages API),
but auxiliary client uses OpenAI SDK which appends /chat/completions →
404 at /anthropic/chat/completions. Generic _to_openai_base_url() helper
rewrites terminal /anthropic to /v1 for OpenAI-compatible endpoint.
Inspired by PR #5786 by Lempkey.
Added debug logging to silent exception blocks across all fixes.
Co-authored-by: Hermes Agent <hermes@nousresearch.com>
Currently, MCP servers are included on all platforms by default. If a
platform's toolset list does not explicitly name any MCP servers, every
globally enabled MCP server is injected. There is no way to opt a
platform out of MCP servers entirely.
This matters for the API server platform when used as an execution
backend — each spawned agent session gets the full MCP tool schema
injected into its system prompt, dramatically inflating token usage
(e.g. 57K tokens vs 9K without MCP tools) and slowing response times.
Add a "no_mcp" sentinel value for platform_toolsets. When present in a
platform's toolset list, all MCP servers are excluded for that platform.
Other platforms are unaffected.
Usage in config.yaml:
platform_toolsets:
api_server:
- terminal
- file
- web
- no_mcp # exclude all MCP servers
The sentinel is filtered out of the final toolset — it does not appear
as an actual toolset name.
- The MCP SDK Pydantic model uses camelCase (structuredContent), not
snake_case (structured_content). The original getattr was a silent no-op.
- When structuredContent is present, return it AS the result instead of
alongside text — the structured payload is the machine-readable data.
- Move test file to tests/tools/ and fix fake class to use camelCase.
- Patch _run_on_mcp_loop in tests so the handler actually executes.
* fix(telegram): replace substring caption check with exact line-by-line match
Captions in photo bursts and media group albums were silently dropped when
a shorter caption happened to be a substring of an existing one (e.g.
"Meeting" lost inside "Meeting agenda"). Extract a shared _merge_caption
static helper that splits on "\n\n" and uses exact match with whitespace
normalisation, then use it in both _enqueue_photo_event and
_queue_media_group_event.
Adds 13 unit tests covering the fixed bug scenarios.
Cherry-picked from PR #2671 by Dilee.
* fix: extend caption substring fix to all platforms
Move _merge_caption helper from TelegramAdapter to BasePlatformAdapter
so all adapters inherit it. Fix the same substring-containment bug in:
- gateway/platforms/base.py (photo burst merging)
- gateway/run.py (priority photo follow-up merging)
- gateway/platforms/feishu.py (media batch merging)
The original fix only covered telegram.py. The same bug existed in base.py
and run.py (pure substring check) and feishu.py (list membership without
whitespace normalization).
* fix(auxiliary): resolve named custom providers and 'main' alias in auxiliary routing
Two bugs caused auxiliary tasks (vision, compression, etc.) to fail when
using named custom providers defined in config.yaml:
1. 'provider: main' was hardcoded to 'custom', which only checks legacy
OPENAI_BASE_URL env vars. Now reads _read_main_provider() to resolve
to the actual provider (e.g., 'custom:beans', 'openrouter', 'deepseek').
2. Named custom provider names (e.g., 'beans') fell through to
PROVIDER_REGISTRY which doesn't know about config.yaml entries.
Now checks _get_named_custom_provider() before the registry fallback.
Fixes both resolve_provider_client() and _normalize_vision_provider()
so the fix covers all auxiliary tasks (vision, compression, web_extract,
session_search, etc.).
Adds 13 unit tests. Reported by Laura via Discord.
---------
Co-authored-by: Dilee <uzmpsk.dilekakbas@gmail.com>
* fix(cli): route error messages through ChatConsole inside patch_stdout
Cherry-pick of PR #5798 by @icn5381.
Replace self.console.print() with ChatConsole().print() for 11 error/status
messages reachable during the interactive session. Inside patch_stdout,
self.console (plain Rich Console) writes raw ANSI escapes that StdoutProxy
mangles into garbled text. ChatConsole uses prompt_toolkit's native
print_formatted_text which renders correctly.
Same class of bug as #2262 — that fix covered agent output but missed
these error paths in _ensure_runtime_credentials, _init_agent, quick
commands, skill loading, and plan mode.
* fix(model-picker): add scrolling viewport to curses provider menu
Cherry-pick of PR #5790 by @Lempkey. Fixes#5755.
_curses_prompt_choice rendered items starting unconditionally from index 0
with no scroll offset. The 'More providers' submenu has 13 entries. On
terminals shorter than ~16 rows, items past the fold were never drawn.
When UP-arrow wrapped cursor from 0 to the last item (Cancel, index 12),
the highlight rendered off-screen — appearing as if only Cancel existed.
Adds scroll_offset tracking that adjusts each frame to keep the cursor
inside the visible window.
* feat(cli): skin-aware compact banner + git state in startup banner
Combined salvage of PR #5922 by @ASRagab and PR #5877 by @xinbenlv.
Compact banner changes (from #5922):
- Read active skin colors and branding instead of hardcoding gold/NOUS HERMES
- Default skin preserves backward-compatible legacy branding
- Non-default skins use their own agent_name and colors
Git state in banner (from #5877):
- New format_banner_version_label() shows upstream/local git hashes
- Full banner title now includes git state (upstream hash, carried commits)
- Compact banner line2 shows the version label with git state
- Widen compact banner max width from 64 to 88 to fit version info
Both the full Rich banner and compact fallback are now skin-aware
and show git state.
- Add 7 unit tests for _profile_arg: default home, named profile,
hash path, nested path, invalid name, systemd integration, launchd integration
- Add stt.local.language to config.yaml (empty = auto-detect)
- Both STT code paths now read config.yaml first, env var fallback,
then default (auto-detect for faster-whisper, 'en' for CLI command)
- HERMES_LOCAL_STT_LANGUAGE env var still works as backward-compat fallback
Follow-up to sroecker's PR #5918 — test mocks were using the old 3-arg
callback signature (name, preview, args) instead of the new
(event_type, name, preview, args, **kwargs).
The response validation stage unconditionally marked Codex Responses API
replies as invalid when response.output was empty, triggering unnecessary
retries and fallback chains. However, _normalize_codex_response can
recover from this state by synthesizing output from response.output_text.
Now the validation stage checks for output_text before marking the
response invalid, matching the normalization logic. Also fixes
logging.warning → logger.warning for consistency with the rest of the
file.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The webhook adapter stored per-request `deliver`/`deliver_extra` config in
`_delivery_info[chat_id]` during POST handling and consumed it via `.pop()`
inside `send()`. That worked for routes whose agent run produced exactly
one outbound message — the final response — but it broke whenever the
agent emitted any interim status message before the final response.
Status messages flow through the same `send(chat_id, ...)` path as the
final response (see `gateway/run.py::_status_callback_sync` →
`adapter.send(...)`). Common triggers include:
- "🔄 Primary model failed — switching to fallback: ..."
(run_agent.py::_emit_status when `fallback_providers` activates)
- context-pressure / compression notices
- any other lifecycle event routed through `status_callback`
When any of those fired, the first `send()` call popped the entry, so the
subsequent final-response `send()` saw an empty dict and silently
downgraded `deliver_type` from `"telegram"` (or `discord`/`slack`/etc.) to
the default `"log"`. The agent's response was logged to the gateway log
instead of being delivered to the configured cross-platform target — no
warning, no error, just a missing message.
This was easy to hit in practice. Any user with `fallback_providers`
configured saw it the first time their primary provider hiccuped on a
webhook-triggered run. Routes that worked perfectly in dev (where the
primary stays healthy) silently dropped responses in prod.
Fix: read `_delivery_info` with `.get()` so multiple `send()` calls for
the same `chat_id` all see the same delivery config. To keep the dict
bounded without relying on per-send cleanup, add a parallel
`_delivery_info_created` timestamp dict and a `_prune_delivery_info()`
helper that drops entries older than `_idempotency_ttl` (1h, same window
already used by `_seen_deliveries`). Pruning runs on each POST, mirroring
the existing `_seen_deliveries` cleanup pattern.
Worst-case memory footprint is now `rate_limit * TTL = 30/min * 60min =
1800` entries, each ~1KB → under 2 MB. In practice it'll be far smaller
because most webhooks complete in seconds, not the full hour.
Test changes:
- `test_delivery_info_cleaned_after_send` is replaced with
`test_delivery_info_survives_multiple_sends`, which is now the
regression test for this bug — it asserts that two consecutive
`send()` calls both see the delivery config.
- A new `test_delivery_info_pruned_via_ttl` covers the TTL cleanup
behavior.
- The two integration tests that asserted `chat_id not in
adapter._delivery_info` after `send()` now assert the opposite, with
a comment explaining why.
All 40 tests in `tests/gateway/test_webhook_adapter.py` and
`tests/gateway/test_webhook_integration.py` pass. Verified end-to-end
locally against a dynamic `hermes webhook subscribe` route configured
with `--deliver telegram --deliver-chat-id <user>`: with `gpt-5.4` as
the primary (currently flaky) and `claude-opus-4.6` as the fallback,
the fallback notification fires, the agent finishes, and the final
response is delivered to Telegram as expected.
* refactor: re-architect tests to mirror the codebase
* Update tests.yml
* fix: add missing tool_error imports after registry refactor
* fix(tests): replace patch.dict with monkeypatch to prevent env var leaks under xdist
patch.dict(os.environ) can leak TERMINAL_ENV across xdist workers,
causing test_code_execution tests to hit the Modal remote path.
* fix(tests): fix update_check and telegram xdist failures
- test_update_check: replace patch("hermes_cli.banner.os.getenv") with
monkeypatch.setenv("HERMES_HOME") — banner.py no longer imports os
directly, it uses get_hermes_home() from hermes_constants.
- test_telegram_conflict/approval_buttons: provide real exception classes
for telegram.error mock (NetworkError, TimedOut, BadRequest) so the
except clause in connect() doesn't fail with "catching classes that do
not inherit from BaseException" when xdist pollutes sys.modules.
* fix(tests): accept unavailable_models kwarg in _prompt_model_selection mock
Previously, all/new tool progress modes always hard-truncated previews
to 40 chars, ignoring the display.tool_preview_length config. This made
it impossible for gateway users to see meaningful command/path info
without switching to verbose mode (which shows too much detail).
Now all/new modes read tool_preview_length from config:
- tool_preview_length: 0 (default/unset) → 40 chars (no regression)
- tool_preview_length: 120 → 120-char previews in all/new mode
- verbose mode: unchanged (already respected the config)
Users who want longer previews can set:
display:
tool_preview_length: 120
Reported by demontut_ on Discord.
Captions in photo bursts and media group albums were silently dropped when
a shorter caption happened to be a substring of an existing one (e.g.
"Meeting" lost inside "Meeting agenda"). Extract a shared _merge_caption
static helper that splits on "\n\n" and uses exact match with whitespace
normalisation, then use it in both _enqueue_photo_event and
_queue_media_group_event.
Adds 13 unit tests covering the fixed bug scenarios.
Cherry-picked from PR #2671 by Dilee.
Based on PR #5413 spec by MaheshtheDev (Mahesh Sanikommu).
Changes:
- Add search_mode config (hybrid/memories/documents) passed to SDK
- Add {identity} template support in container_tag for profile-scoped containers
- Add SUPERMEMORY_CONTAINER_TAG env var override (priority over config)
- Add multi-container mode: enable_custom_container_tags, custom_containers,
custom_container_instructions in supermemory.json
- Dynamic tool schemas when multi-container enabled (optional container_tag param)
- Whitelist validation for custom container tags in tool calls
- Simplify get_config_schema() to only prompt for API key during setup
- Defer container_tag sanitization to initialize() (after template resolution)
- Add custom_id support to documents.add calls
- Update README with multi-container docs, search_mode, identity template,
support links (Discord, email)
- Update memory-providers.md with new features and multi-container example
- Update memory-provider-plugin.md with minimal vs full schema guidance
- Add 12 new tests covering identity template, search_mode, multi-container,
config schema, and env var override
When the model produces structured reasoning (via API fields like .reasoning,
.reasoning_content, .reasoning_details) but no visible text content, append
the assistant message as prefill and continue the loop. The model sees its own
reasoning context on the next turn and produces the text portion.
Inspired by clawdbot's 'incomplete-text' recovery pattern. Up to 2 prefill
attempts before falling through to the existing '(empty)' terminal.
Key design decisions:
- Only triggers for structured reasoning (API fields), NOT inline <think> tags
- Prefill messages are popped on success to maintain strict role alternation
- _thinking_prefill marker stripped from all API message building paths
- Works across all providers: OpenAI (continuation), Anthropic (native prefill)
Verified with E2E tests: simulated thinking-only → real OpenRouter continuation
produces correct content. Also confirmed Qwen models consistently produce
structured-reasoning-only responses under token pressure.
Add win32 platform branch to clipboard.py so Ctrl+V image paste
works on native Windows (PowerShell / Windows Terminal), not just
WSL2.
Uses the same .NET System.Windows.Forms.Clipboard approach as the
WSL path but calls PowerShell directly instead of powershell.exe
(the WSL cross-call path). Tries 'powershell' first (Windows
PowerShell 5.1, always available), then 'pwsh' (PowerShell 7+).
PowerShell executable is discovered once and cached for the process
lifetime.
Includes 14 new tests covering:
- Platform dispatch (save_clipboard_image + has_clipboard_image)
- Image detection via PowerShell .NET check
- Base64 PNG extraction and decode
- Edge cases: no PowerShell, empty output, invalid base64, timeout
Adapted from PR #5679 (0xbyt4) to cover edge cases not in the integration tests:
video routing, unknown extension fallback to send_document, multi-file delivery,
and single-failure isolation.
The cron delivery path sent raw 'MEDIA:/path/to/file' text instead
of uploading the file as a native attachment. The standalone path
(via _send_to_platform) already extracted MEDIA tags and forwarded
them as media_files, but the live adapter path passed the unprocessed
delivery_content directly to adapter.send().
Two bugs fixed:
1. Live adapter path now sends cleaned text (MEDIA tags stripped)
instead of raw content — prevents 'MEDIA:/path' from appearing
as literal text in Discord/Telegram/etc.
2. Live adapter path now sends each extracted media file via the
adapter's native method (send_voice for audio, send_image_file
for images, send_video for video, send_document as fallback) —
files are uploaded as proper platform attachments.
The file-type routing mirrors BasePlatformAdapter._process_message_background
to ensure consistent behavior between normal gateway responses and
cron-delivered responses.
Adds 2 tests:
- test_live_adapter_sends_media_as_attachments: verifies Discord
adapter receives send_voice call for .mp3 file
- test_live_adapter_sends_cleaned_text_not_raw: verifies MEDIA tag
stripped from text sent via live adapter
Memory plugins (Mem0, Honcho) used static identifiers ('hermes-user',
config peerName) meaning all gateway users shared the same memory bucket.
Changes:
- AIAgent.__init__: add user_id parameter, store as self._user_id
- run_agent.py: include user_id in _init_kwargs passed to memory providers
- gateway/run.py: pass source.user_id to AIAgent in primary + background paths
- Mem0 plugin: prefer kwargs user_id over config default
- Honcho plugin: override cfg.peer_name with gateway user_id when present
CLI sessions (user_id=None) preserve existing defaults. Only gateway
sessions with a real platform user_id get per-user memory scoping.
Reported by plev333.
When the bot sends a message in a thread, track its ts in _bot_message_ts.
When the bot is @mentioned in a thread, register it in _mentioned_threads.
Both sets enable auto-responding to future messages in those threads
without requiring repeated @mentions — making the bot behave like a
team member that stays engaged once a conversation starts.
Channel message gating now checks 4 signals (in order):
1. @mention in this message
2. Reply in a thread the bot started/participated in (_bot_message_ts)
3. Message in a thread where the bot was previously @mentioned (_mentioned_threads)
4. Existing session for this thread (_has_active_session_for_thread — survives restarts)
Thread context fetching now triggers on ANY first-entry path (not just
@mention), so the agent gets context whether it's entering via a mention,
a bot-thread reply, or a mentioned-thread auto-trigger.
Both tracking sets are bounded (5000 cap with prune-oldest-half) to prevent
unbounded memory growth in long-running deployments.
Salvaged from PR #5754 by @hhhonzik. Preserves our existing approval buttons,
thread context fetching, and session key fix. Does NOT include the
edit_message format_message() removal (that was a regression in the original PR).
Tests: 4 new tests for bot-ts tracking and mentioned-thread bounds.
Slack:
- Add Block Kit interactive buttons for command approval (Allow Once,
Allow Session, Always Allow, Deny) via send_exec_approval()
- Register @app.action handlers for each approval button
- Add _fetch_thread_context() — fetches thread history via
conversations.replies when bot is first @mentioned mid-thread
- Fix _has_active_session_for_thread() to use build_session_key()
instead of manual key construction (fixes session key mismatch bug
where thread_sessions_per_user flag was ignored, ref PR #5833)
Telegram:
- Add InlineKeyboard approval buttons via send_exec_approval()
- Add ea:* callback handling in _handle_callback_query()
- Uses monotonic counter + _approval_state dict to map button clicks
back to session keys (avoids 64-byte callback_data limit)
Both platforms now auto-detected by the gateway runner's
_approval_notify_sync() — any adapter with send_exec_approval() on
its class gets button-based approval instead of text fallback.
Inspired by community PRs #3898 (LevSky22), #2953 (ygd58), #5833
(heathley). Implemented fresh on current main.
Tests: 24 new tests covering button rendering, action handling,
thread context fetching, session key fix, double-click prevention.
* fix: repair 57 failing CI tests across 14 files
Categories of fixes:
**Test isolation under xdist (-n auto):**
- test_hermes_logging: Strip ALL RotatingFileHandlers before each test
to prevent handlers leaked from other xdist workers from polluting counts
- test_code_execution: Force TERMINAL_ENV=local in setUp — prevents Modal
AuthError when another test leaks TERMINAL_ENV=modal
- test_timezone: Same TERMINAL_ENV fix for execute_code timezone tests
- test_codex_execution_paths: Mock _resolve_turn_agent_config to ensure
model resolution works regardless of xdist worker state
**Matrix adapter tests (nio not installed in CI):**
- Add _make_fake_nio() helper with real response classes for isinstance()
checks in production code
- Replace MagicMock(spec=nio.XxxResponse) with fake_nio instances
- Wrap production method calls with patch.dict('sys.modules', {'nio': ...})
so import nio succeeds in method bodies
- Use try/except instead of pytest.importorskip for nio.crypto imports
(importorskip can be fooled by MagicMock in sys.modules)
- test_matrix_voice: Skip entire file if nio is a mock, not just missing
**Stale test expectations:**
- test_cli_provider_resolution: _prompt_provider_choice now takes **kwargs
(default param added); mock getpass.getpass alongside input
- test_anthropic_oauth_flow: Mock getpass.getpass (code switched from input)
- test_gemini_provider: Mock models.dev + OpenRouter API lookups to test
hardcoded defaults without external API variance
- test_code_execution: Add notify_on_complete to blocked terminal params
- test_setup_openclaw_migration: Mock prompt_choice to select 'Full setup'
(new quick-setup path leads to _require_tty → sys.exit in CI)
- test_skill_manager_tool: Patch get_all_skills_dirs alongside SKILLS_DIR
so _find_skill searches tmp_path, not real ~/.hermes/skills/
**Missing attributes in object.__new__ test runners:**
- test_platform_reconnect: Add session_store to _make_runner()
- test_session_race_guard: Add hooks, _running_agents_ts, session_store,
delivery_router to _make_runner()
**Production bug fix (gateway/run.py):**
- Fix sentinel eviction race: _AGENT_PENDING_SENTINEL was immediately
evicted by the stale-detection logic because sentinels have no
get_activity_summary() method, causing _stale_idle=inf >= timeout.
Guard _should_evict with 'is not _AGENT_PENDING_SENTINEL'.
* fix: address remaining CI failures
- test_setup_openclaw_migration: Also mock _offer_launch_chat (called at
end of both quick and full setup paths)
- test_code_execution: Move TERMINAL_ENV=local to module level to protect
ALL test classes (TestEnvVarFiltering, TestExecuteCodeEdgeCases,
TestInterruptHandling, TestHeadTailTruncation) from xdist env leaks
- test_matrix: Use try/except for nio.crypto imports (importorskip can be
fooled by MagicMock in sys.modules under xdist)
check_nous_free_tier() now caches its result for 180 seconds to avoid
redundant Portal API calls during a session (auxiliary client init,
model selection, login flow all call it independently).
The TTL is short enough that an account upgrade from free to paid is
reflected within 3 minutes. clear_nous_free_tier_cache() is exposed
for explicit invalidation on login/logout.
Adds 4 tests for cache hit, TTL expiry, explicit clear, and TTL bound.
- Show pricing during initial Nous Portal login (was missing from
_login_nous, only shown in the already-logged-in hermes model path)
- Filter free models for paid subscribers: non-allowlisted free models
are hidden; allowlisted models (xiaomi/mimo-v2-pro, xiaomi/mimo-v2-omni)
only appear when actually priced as free
- Detect free-tier accounts via portal api/oauth/account endpoint
(monthly_charge == 0); free-tier users see only free models as
selectable, with paid models shown dimmed and unselectable
- Use xiaomi/mimo-v2-omni as the auxiliary vision model for free-tier
Nous users so vision_analyze and browser_vision work without paid
model access (replaces the default google/gemini-3-flash-preview)
- Unavailable models rendered via print() before TerminalMenu to avoid
simple_term_menu line-width padding artifacts; upgrade URL resolved
from auth state portal_base_url (supports staging/custom portals)
- Add 21 tests covering filter_nous_free_models, is_nous_free_tier,
and partition_nous_models_by_tier
* feat: switch managed browser provider from Browserbase to Browser Use
The Nous subscription tool gateway now routes browser automation through
Browser Use instead of Browserbase. This commit:
- Adds managed Nous gateway support to BrowserUseProvider (idempotency
keys, X-BB-API-Key auth header, external_call_id persistence)
- Removes managed gateway support from BrowserbaseProvider (now
direct-only via BROWSERBASE_API_KEY/BROWSERBASE_PROJECT_ID)
- Updates browser_tool.py fallback: prefers Browser Use over Browserbase
- Updates nous_subscription.py: gateway vendor 'browser-use', auto-config
sets cloud_provider='browser-use' for new subscribers
- Updates tools_config.py: Nous Subscription entry now uses Browser Use
- Updates setup.py, cli.py, status.py, prompt_builder.py display strings
- Updates all affected tests to match new behavior
Browserbase remains fully functional for users with direct API credentials.
The change only affects the managed/subscription path.
* chore: remove redundant Browser Use hint from system prompt
* fix: upgrade Browser Use provider to v3 API
- Base URL: api/v2 -> api/v3 (v2 is legacy)
- Unified all endpoints to use native Browser Use paths:
- POST /browsers (create session, returns cdpUrl)
- PATCH /browsers/{id} with {action: stop} (close session)
- Removed managed-mode branching that used Browserbase-style
/v1/sessions paths — v3 gateway now supports /browsers directly
- Removed unused managed_mode variable in close_session
* fix(browser-use): use X-Browser-Use-API-Key header for managed mode
The managed gateway expects X-Browser-Use-API-Key, not X-BB-API-Key
(which is a Browserbase-specific header). Using the wrong header caused
a 401 AUTH_ERROR on every managed-mode browser session create.
Simplified _headers() to always use X-Browser-Use-API-Key regardless
of direct vs managed mode.
* fix(nous_subscription): browserbase explicit provider is direct-only
Since managed Nous gateway now routes through Browser Use, the
browserbase explicit provider path should not check managed_browser_available
(which resolves against the browser-use gateway). Simplified to direct-only
with managed=False.
* fix(browser-use): port missing improvements from PR #5605
- CDP URL normalization: resolve HTTP discovery URLs to websocket after
cloud provider create_session() (prevents agent-browser failures)
- Managed session payload: send timeout=5 and proxyCountryCode=us for
gateway-backed sessions (prevents billing overruns)
- Update prompt builder, browser_close schema, and module docstring to
replace remaining Browserbase references with Browser Use
- Dynamic /browser status detection via _get_cloud_provider() instead
of hardcoded env var checks (future-proof for new providers)
- Rename post_setup key from 'browserbase' to 'agent_browser'
- Update setup hint to mention Browser Use alongside Browserbase
- Add tests: CDP normalization, browserbase direct-only guard,
managed browser-use gateway, direct browserbase fallback
---------
Co-authored-by: rob-maron <132852777+rob-maron@users.noreply.github.com>
* refactor: remove browser_close tool — auto-cleanup handles it
The browser_close tool was called in only 9% of browser sessions (13/144
navigations across 66 sessions), always redundantly — cleanup_browser()
already runs via _cleanup_task_resources() at conversation end, and the
background inactivity reaper catches anything else.
Removing it saves one tool schema slot in every browser-enabled API call.
Also fixes a latent bug: cleanup_browser() now handles Camofox sessions
too (previously only Browserbase). Camofox sessions were never auto-cleaned
per-task because they live in a separate dict from _active_sessions.
Files changed (13):
- tools/browser_tool.py: remove function, schema, registry entry; add
camofox cleanup to cleanup_browser()
- toolsets.py, model_tools.py, prompt_builder.py, display.py,
acp_adapter/tools.py: remove browser_close from all tool lists
- tests/: remove browser_close test, update toolset assertion
- docs/skills: remove all browser_close references
* fix: repeat browser_scroll 5x per call for meaningful page movement
Most backends scroll ~100px per call — barely visible on a typical
viewport. Repeating 5x gives ~500px (~half a viewport), making each
scroll tool call actually useful.
Backend-agnostic approach: works across all 7+ browser backends without
needing to configure each one's scroll amount individually. Breaks
early on error for the agent-browser path.
* feat: auto-return compact snapshot from browser_navigate
Every browser session starts with navigate → snapshot. Now navigate
returns the compact accessibility tree snapshot inline, saving one
tool call per browser task.
The snapshot captures the full page DOM (not viewport-limited), so
scroll position doesn't affect it. browser_snapshot remains available
for refreshing after interactions or getting full=true content.
Both Browserbase and Camofox paths auto-snapshot. If the snapshot
fails for any reason, navigation still succeeds — the snapshot is
a bonus, not a requirement.
Schema descriptions updated to guide models: navigate mentions it
returns a snapshot, snapshot mentions it's for refresh/full content.
* refactor: slim cronjob tool schema — consolidate model/provider, drop unused params
Session data (151 calls across 67 sessions) showed several schema
properties were never used by models. Consolidated and cleaned up:
Removed from schema (still work via backend/CLI):
- skill (singular): use skills array instead
- reason: pause-only, unnecessary
- include_disabled: now defaults to true
- base_url: extreme edge case, zero usage
- provider (standalone): merged into model object
Consolidated:
- model + provider → single 'model' object with {model, provider} fields.
If provider is omitted, the current main provider is pinned at creation
time so the job stays stable even if the user changes their default.
Kept:
- script: useful data collection feature
- skills array: standard interface for skill loading
Schema shrinks from 14 to 10 properties. All backend functionality
preserved — the Python function signature and handler lambda still
accept every parameter.
* fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools
MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli,
hermes-messaging, safe), which meant it appeared in every session
for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS
gate only works after running 'hermes tools' explicitly.
Now MoA only appears when a user explicitly enables it via
'hermes tools'. The moa toolset definition and check_fn remain
unchanged — it just needs to be opted into.