Reverts the sanitizer addition from PR #2466 (originally #2129).
We already have _empty_content_retries handling for reasoning-only
responses. The trailing strip risks silently eating valid messages
and is redundant with existing empty-content handling.
Replace asyncio.run() with thread-local persistent event loops for
worker threads (e.g., delegate_task's ThreadPoolExecutor). asyncio.run()
creates and closes a fresh loop on every call, leaving cached
httpx/AsyncOpenAI clients bound to a dead loop — causing 'Event loop is
closed' errors during GC when parallel subagents clean up connections.
The fix mirrors the main thread's _get_tool_loop() pattern but uses
threading.local() so each worker thread gets its own long-lived loop,
avoiding both cross-thread contention and the create-destroy lifecycle.
Added 4 regression tests covering worker loop persistence, reuse,
per-thread isolation, and separation from the main thread's loop.
When the model returns multiple tool calls, run_agent.py executes them
concurrently in a ThreadPoolExecutor. Each thread called _run_async()
which used a shared persistent event loop (_get_tool_loop()). If two
async tools (like web_extract) ran in parallel, the second thread would
hit 'This event loop is already running' on the shared loop.
Fix: detect worker threads (not main thread) and use asyncio.run() with
a per-thread fresh loop instead of the shared persistent one. The shared
loop is still used for the main thread (CLI sequential path) to keep
cached async clients (httpx/AsyncOpenAI) alive.
Co-authored-by: Test <test@test.com>
Cherry-picked from PR #2146 by @crazywriter1. Fixes#2104.
asyncio.run() creates and closes a fresh event loop each call. Cached
httpx/AsyncOpenAI clients bound to the dead loop crash on GC with
'Event loop is closed'. This hit vision_analyze on first use in CLI.
Two-layer fix:
- model_tools._run_async(): replace asyncio.run() with persistent
loop via _get_tool_loop() + run_until_complete()
- auxiliary_client._get_cached_client(): track which loop created
each async client, discard stale entries if loop is closed
6 regression tests covering loop lifecycle, reuse, and full vision
dispatch chain.
Co-authored-by: Test <test@test.com>
* fix: prevent unavailable tool names from leaking into model schemas
When web_search/web_extract fail check_fn (no API key configured), their
names were still leaking into tool descriptions via two paths:
1. execute_code schema: sandbox_enabled was computed from tools_to_include
(pre-filter) instead of the actual available tools (post-filter), so
the execute_code description listed web_search/web_extract as available
sandbox imports even when they weren't.
2. browser_navigate schema: hardcoded description said 'prefer web_search
or web_extract' regardless of whether those tools existed.
The model saw these references, assumed the tools existed, and tried
calling them directly — triggering 'Unknown tool' errors.
Fix: compute available_tool_names from the filtered result set and use
that for both execute_code sandbox listing and browser_navigate description
patching.
* docs: add pitfall about cross-tool references in schema descriptions
---------
Co-authored-by: Test <test@test.com>
* perf: cache base_url.lower() via property, consolidate triple load_config(), hoist set constant
run_agent.py:
- Add base_url property that auto-caches _base_url_lower on every
assignment, eliminating 12+ redundant .lower() calls per API cycle
across __init__, _build_api_kwargs, _supports_reasoning_extra_body,
and the main conversation loop
- Consolidate three separate load_config() disk reads in __init__
(memory, skills, compression) into a single call, reusing the
result dict for all three config sections
model_tools.py:
- Hoist _READ_SEARCH_TOOLS set to module level (was rebuilt inside
handle_function_call on every tool invocation)
* Use endpoint metadata for custom model context and pricing
---------
Co-authored-by: kshitij <82637225+kshitijk4poor@users.noreply.github.com>
Four small fixes:
1. model_tools.py: Tool import failures logged at WARNING instead of
DEBUG. If a tool module fails to import (syntax error, missing dep),
the user now sees a warning instead of the tool silently vanishing.
2. hermes_cli/config.py: Remove duplicate 'import sys' (lines 19, 21).
3. agent/model_metadata.py: Remove 6 duplicate entries in
DEFAULT_CONTEXT_LENGTHS dict. Python keeps the last value, so no
functional change, but removes maintenance confusion.
4. hermes_state.py: Add missing self._lock to the LIKE query in
resolve_session_id(). The exact-match path used get_session()
(which locks internally), but the prefix fallback queried _conn
without the lock.
Add inference.sh CLI (infsh) as a tool integration, giving agents
access to 150+ AI apps through a single CLI — image gen (FLUX, Reve,
Seedream), video (Veo, Wan, Seedance), LLMs, search (Tavily, Exa),
3D, avatar/lipsync, and more. One API key manages all services.
Tools:
- infsh: run any infsh CLI command (app list, app run, etc.)
- infsh_install: install the CLI if not present
Registered as an 'inference' toolset (opt-in, not in core tools).
Includes comprehensive skill docs with examples for all app categories.
Changes from original PR:
- NOT added to _HERMES_CORE_TOOLS (available via --toolsets inference)
- Added 12 tests covering tool registration, command execution,
error handling, timeout, JSON parsing, and install flow
Inspired by PR #1021 by @okaris.
Co-authored-by: okaris <okaris@users.noreply.github.com>
browser_console was registered in the tool registry but missing from
all toolset definitions (TOOLSETS, _HERMES_CORE_TOOLS, _LEGACY_TOOLSET_MAP),
so the agent could never discover or use it.
Added to all 4 locations + 4 wiring tests.
Cherry-picked from PR #1084 by @0xbyt4 (authorship preserved in tests).
Salvaged from PR #1470 by adavyas.
Core fix: Honcho tool calls in a multi-session gateway could route to
the wrong session because honcho_tools.py relied on process-global
state. Now threads session context through the call chain:
AIAgent._invoke_tool() → handle_function_call() → registry.dispatch()
→ handler **kw → _resolve_session_context()
Changes:
- Add _resolve_session_context() to prefer per-call context over globals
- Plumb honcho_manager + honcho_session_key through handle_function_call
- Add sync_honcho=False to run_conversation() for synthetic flush turns
- Pass honcho_session_key through gateway memory flush lifecycle
- Harden gateway PID detection when /proc cmdline is unreadable
- Make interrupt test scripts import-safe for pytest-xdist
- Wrap BibTeX examples in Jekyll raw blocks for docs build
- Fix thread-order-dependent assertion in client lifecycle test
- Expand Honcho docs: session isolation, lifecycle, routing internals
Dropped from original PR:
- Indentation change in _create_request_openai_client that would move
client creation inside the lock (causes unnecessary contention)
Co-authored-by: adavyas <adavyas@users.noreply.github.com>
Follow-up to PR #705 (merged from 0xbyt4). Addresses several issues:
1. CONSECUTIVE-ONLY TRACKING: Redesigned the read/search tracker to only
warn/block on truly consecutive identical calls. Any other tool call
in between (write, patch, terminal, etc.) resets the counter via
notify_other_tool_call(), called from handle_function_call() in
model_tools.py. This prevents false blocks in read→edit→verify flows.
2. THRESHOLD ADJUSTMENT: Warn on 3rd consecutive (was 2nd), block on
4th+ consecutive (was 3rd+). Gives the model more room before
intervening.
3. TUPLE UNPACKING BUG: Fixed get_read_files_summary() which crashed on
search keys (5-tuple) when trying to unpack as 3-tuple. Now uses a
separate read_history set that only tracks file reads.
4. WEB_EXTRACT DOCSTRING: Reverted incorrect removal of 'title' from
web_extract return docs in code_execution_tool.py — the field IS
returned by web_tools.py.
5. TESTS: Rewrote test_read_loop_detection.py (35 tests) to cover
consecutive-only behavior, notify_other_tool_call, interleaved
read/search, and summary-unaffected-by-searches.
The process-global _last_resolved_tool_names gets overwritten when
subagents resolve their own toolsets, causing execute_code in the
parent agent to generate imports for the wrong set of tools.
Fix: handle_function_call() now accepts an enabled_tools parameter.
run_agent.py already passes self.valid_tool_names at both call sites.
This change makes model_tools.py actually use it, falling back to the
global only when the caller doesn't provide a list (backward compat).
When a user disables the web toolset via 'hermes tools', the execute_code
schema description still hardcoded web_search/web_extract as available,
causing the model to keep trying to use them. Similarly, delegate_task
always defaulted to ['terminal', 'file', 'web'] for subagents regardless
of the parent's config.
Changes:
- execute_code schema is now built dynamically via build_execute_code_schema()
based on which sandbox tools are actually enabled
- model_tools.py rebuilds the execute_code schema at definition time using
the intersection of sandbox-allowed and session-enabled tools
- delegate_task now inherits the parent agent's enabled_toolsets instead of
hardcoding DEFAULT_TOOLSETS when no explicit toolsets are specified
- delegate_task description updated to say 'inherits your enabled toolsets'
Reported by kotyKD on Discord.
Connect to external MCP servers via stdio transport, discover their tools
at startup, and register them into the hermes-agent tool registry.
- New tools/mcp_tool.py: config loading, server connection via background
event loop, tool handler factories, discovery, and graceful shutdown
- model_tools.py: trigger MCP discovery after built-in tool imports
- cli.py: call shutdown_mcp_servers in _run_cleanup
- pyproject.toml: add mcp>=1.2.0 as optional dependency
- 27 unit tests covering config, schema conversion, handlers, registration,
SDK interaction, toolset injection, graceful fallback, and shutdown
Config format (in ~/.hermes/config.yaml):
mcp_servers:
filesystem:
command: "npx"
args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
- Add ha_list_entities, ha_get_state, ha_call_service tools via REST API
- Add WebSocket gateway adapter for real-time state_changed event monitoring
- Support domain/entity filtering, cooldown, and auto-reconnect with backoff
- Use REST API for outbound notifications to avoid WS race condition
- Gate tool availability on HASS_TOKEN env var
- Add 82 unit tests covering real logic (filtering, payload building, event pipeline)
Opt-in persistent cross-session user modeling via Honcho. Reads
~/.honcho/config.json as single source of truth (shared with
Claude Code, Cursor, and other Honcho-enabled tools). Zero impact
when disabled or unconfigured.
- honcho_integration/ package (client, session manager, peer resolution)
- Host-based config resolution matching claude-honcho/cursor-honcho pattern
- Prefetch user context into system prompt per conversation turn
- Sync user/assistant messages to Honcho after each exchange
- query_user_context tool for mid-conversation dialectic reasoning
- Gated activation: requires ~/.honcho/config.json with enabled=true
- Revised descriptions for various tools in model_tools.py, browser_tool.py, code_execution_tool.py, delegate_tool.py, and terminal_tool.py to enhance clarity and reduce verbosity.
- Improved consistency in terminology and formatting across tool descriptions, ensuring users have a clearer understanding of tool functionalities and usage.
- Introduced the `delegate_task` tool, allowing the main agent to spawn child AIAgent instances with isolated context for complex tasks.
- Supported both single-task and batch processing (up to 3 concurrent tasks) to enhance task management capabilities.
- Updated configuration options for delegation, including maximum iterations and default toolsets for subagents.
- Enhanced documentation to provide clear guidance on using the delegation feature and its configuration.
- Added comprehensive tests to ensure the functionality and reliability of the delegation logic.
- Changed the target parameter from "content" and "files" to "grep" and "find" to better represent their functionality.
- Revised descriptions in the tool definitions and execution code schema to enhance understanding of search modes and output formats.
- Ensured consistency in the handling of search operations across the codebase.
- Updated the tool name from "search" to "search_files" across multiple files to better reflect its functionality.
- Adjusted related documentation and descriptions to ensure clarity in usage and expected behavior.
- Enhanced the toolset definitions and mappings to incorporate the new naming convention, improving overall consistency in the codebase.
- Introduced a new `execute_code` tool that allows the agent to run Python scripts that call Hermes tools via RPC, reducing the number of round trips required for tool interactions.
- Added configuration options for timeout and maximum tool calls in the sandbox environment.
- Updated the toolset definitions to include the new code execution capabilities, ensuring integration across platforms.
- Implemented comprehensive tests for the code execution sandbox, covering various scenarios including tool call limits and error handling.
- Enhanced the CLI and documentation to reflect the new functionality, providing users with clear guidance on using the code execution tool.
- Added a new `clarify_tool` to enable the agent to ask structured multiple-choice or open-ended questions to users.
- Implemented callback functionality for user interaction, allowing the platform to handle UI presentation.
- Updated the CLI and agent to support clarify questions, including timeout handling and response management.
- Enhanced toolset definitions and requirements to include the clarify tool, ensuring availability across platforms.
- Added a new `skill_manager_tool` to enable agents to create, update, and delete their own skills, enhancing procedural memory capabilities.
- Updated the skills directory structure to support user-created skills in `~/.hermes/skills/`, allowing for better organization and management.
- Enhanced the CLI and documentation to reflect the new skill management functionalities, including detailed instructions on creating and modifying skills.
- Implemented a manifest-based syncing mechanism for bundled skills to ensure user modifications are preserved during updates.
- Enhanced the `handle_send_message_function_call` to support sending messages to multiple platforms (Telegram, Discord, Slack, WhatsApp) using their respective APIs.
- Added error handling for missing parameters and platform configuration issues.
- Introduced asynchronous message sending with helper functions for each platform, improving responsiveness and reliability.
- Updated documentation within the function to clarify usage and requirements.
Two-part implementation:
Part A - Curated Bounded Memory:
- New memory tool (tools/memory_tool.py) with MEMORY.md + USER.md stores
- Character-limited (2200/1375 chars), § delimited entries
- Frozen snapshot injected into system prompt at session start
- Model manages pruning via replace/remove with substring matching
- Usage indicator shown in system prompt header
Part B - SQLite Session Store:
- New hermes_state.py with SessionDB class, FTS5 full-text search
- Gateway session.py rewritten to dual-write SQLite + legacy JSONL
- Compression-triggered session splitting with parent_session_id chains
- New session_search tool with Gemini Flash summarization of matched sessions
- CLI session lifecycle (create on launch, close on exit)
Also:
- System prompt now cached per session, only rebuilt on compression
(fixes prefix cache invalidation from date/time changes every turn)
- Config version bumped to 3, hermes doctor checks for new artifacts
- Disabled in batch_runner and RL environments
Single `todo` tool that reads (no params) or writes (provide todos array
with merge flag). In-memory TodoStore on AIAgent, no system prompt
mutation, behavioral guidance in tool description only. State re-injected
after context compression events. Gateway sessions hydrate from
conversation history. Added to all platform toolsets.
Also wired into RL agent_loop.py with per-run TodoStore and fixed
browser_snapshot user_task passthrough from first user message.
New process registry and tool for managing long-running background processes
across all terminal backends (local, Docker, Singularity, Modal, SSH).
Process Registry (tools/process_registry.py):
- ProcessSession tracking with rolling 200KB output buffer
- spawn_local() with optional PTY via ptyprocess for interactive CLIs
- spawn_via_env() for non-local backends (runs inside sandbox, never on host)
- Background reader threads per process (Popen stdout or PTY)
- wait() with timeout clamping, interrupt support, and transparent limit reporting
- JSON checkpoint to ~/.hermes/processes.json for gateway crash recovery
- Module-level singleton shared across agent loop, gateway, and RL
Process Tool (model_tools.py):
- 7 actions: list, poll, log, wait, kill, write, submit
- Paired with terminal in all toolsets (CLI, messaging, RL)
- Timeout clamping with transparent notes in response
Terminal Tool Updates (tools/terminal_tool.py):
- Replaced nohup background mode with registry spawn (returns session_id)
- Added workdir parameter for per-command working directory
- Added check_interval parameter for gateway auto-check watchers
- Added pty parameter for interactive CLI tools (Codex, Claude Code)
- Updated TERMINAL_TOOL_DESCRIPTION with full background workflow docs
- Cleanup thread now respects active background processes (won't reap sandbox)
Gateway Integration (gateway/run.py, session.py, config.py):
- Session reset protection: sessions with active processes exempt from reset
- Default idle timeout increased from 2 hours to 24 hours
- from_dict fallback aligned to match (was 120, now 1440)
- session_key env var propagated to process registry for session mapping
- Crash recovery on gateway startup via checkpoint probe
- check_interval watcher: asyncio task polls process, delivers updates to platform
RL Safety (environments/):
- tool_context.py cleanup() kills background processes on episode end
- hermes_base_env.py warns when enabled_toolsets is None (loads all tools)
- Process tool safe in RL via wait() blocking the agent loop
Also:
- Added ptyprocess as optional dependency (in pyproject.toml [pty] extra + [all])
- Fixed pre-existing bug: rl_test_inference missing from TOOL_TO_TOOLSET_MAP
- Updated AGENTS.md with process management docs and project structure
- Updated README.md terminal section with process management overview
- Updated the vision tool to accept both HTTP/HTTPS URLs and local file paths for image analysis.
- Implemented caching of user-uploaded images in local directories to ensure reliable access for the vision tool, addressing issues with ephemeral URLs.
- Enhanced platform adapters (Discord, Telegram, WhatsApp) to download and cache images, allowing for immediate analysis and enriched message context.
- Added a new method to auto-analyze images attached by users, enriching the conversation with detailed descriptions.
- Improved documentation for image handling processes and updated related functions for clarity and efficiency.
- Updated the image generation function description to clarify usage with markdown.
- Added `send_image` method to `BasePlatformAdapter` for native image sending across platforms.
- Implemented `send_image` in `DiscordAdapter` and `TelegramAdapter` to handle image attachments directly.
- Introduced `extract_images` method to extract image URLs from markdown and HTML, improving content processing.
- Enhanced message handling to support sending images as attachments while maintaining text content.
- Removed the skills_categories tool from the skills toolset, streamlining the skills functionality to focus on skills_list and skill_view.
- Updated the system prompt to dynamically build a compact skills index, allowing the model to quickly reference available skills without additional tool calls.
- Cleaned up related code and documentation to reflect the removal of skills_categories, ensuring clarity and consistency across the codebase.
- Updated `ALL_POSSIBLE_TOOLS` to auto-derive from `TOOL_TO_TOOLSET_MAP` for consistent schema.
- Introduced `_extract_reasoning_stats` function to track reasoning coverage in assistant turns.
- Enhanced `_process_batch_worker` to discard prompts with no reasoning and aggregate reasoning statistics.
- Updated documentation and comments for clarity on new features and changes.
- Updated `.gitignore` to exclude `testlogs` directory.
- Refactored `handle_web_function_call` in `model_tools.py` to support running async functions in existing event loops, improving compatibility with Atropos.
- Introduced a thread pool executor in `agent_loop.py` for running synchronous tool calls that internally use `asyncio.run()`, preventing deadlocks.
- Added `ToolError` class to track tool execution errors, enhancing error reporting during agent loops.
- Updated `wandb_log` method in `hermes_base_env.py` to log tool error statistics for better monitoring.
- Implemented patches in `patches.py` to ensure async-safe operation of tools within Atropos's event loop.
- Enhanced `ToolContext` and `terminal_tool.py` to utilize the new async handling, improving overall tool execution reliability.
- Introduced file manipulation capabilities in `model_tools.py`, including functions for reading, writing, patching, and searching files.
- Added a new `file` toolset in `toolsets.py` and updated distributions to include file tools.
- Enhanced `setup-hermes.sh` and `install.sh` scripts to check for and optionally install `ripgrep` for faster file searching.
- Implemented a new `file_operations.py` module to encapsulate file operations using shell commands.
- Updated `doctor.py` and `install.ps1` to check for `ripgrep` and provide installation guidance if not found.
- Added fuzzy matching and patch parsing capabilities to improve file manipulation accuracy and flexibility.
- Modified `model_tools.py` to update default model IDs and add new RL function `rl_test_inference`.
- Enhanced `README.md` with installation instructions for submodules and updated API key usage.
- Improved `rl_cli.py` to load configuration from `~/.hermes/config.yaml` and set terminal working directory for RL tools.
- Updated `run_agent.py` to handle empty string arguments as empty objects for better JSON validation.
- Refined installation scripts to ensure submodules are cloned and installed correctly, enhancing setup experience.
- Added the tinker-atropos submodule for enhanced RL training capabilities.
- Updated model_tools.py to reorder RL function definitions and improve descriptions.
- Modified rl_cli.py to include checks for the tinker-atropos setup and provide user guidance.
- Adjusted toolsets.py and __init__.py to reflect changes in RL function availability.
- Enhanced rl_training_tool.py to manage training processes directly without a separate API server.
- Updated `.env.example` to include Tinker and WandB API keys for reinforcement learning training.
- Enhanced `model_tools.py` to clarify configuration options and streamline the RL training process.
- Expanded `README.md` with detailed instructions for setting up RL training using Tinker and WandB.
- Modified `hermes_cli` files to integrate RL training tools and ensure proper configuration checks.
- Improved `rl_training_tool.py` to reflect changes in training parameters and configuration management.
- Updated the `skills_categories` function to include a `verbose` parameter, allowing users to request skill counts per category.
- Modified the `handle_skills_function_call` method to pass the `verbose` argument to `skills_categories`.
- Improved error handling in the `AIAgent` class by injecting a recovery message when invalid JSON arguments are detected, guiding users on how to correct their tool calls.
- Enhanced the `GatewayRunner` to return a user-friendly error message if the agent fails to generate a final response, improving overall user experience.
- Updated the terminal tool's command approval flow to improve user interaction when executing potentially dangerous commands, replacing the previous confirmation method with a clear explanation and instructions for adding commands to the allowlist.
- Removed the internal `force` parameter from the model API, ensuring that dangerous command approvals are handled solely through user prompts.
- Enhanced the CLI to provide better feedback regarding tool availability, including improved messaging for enabled and disabled toolsets.
- Updated AGENTS.md to reflect changes in the command approval process and configuration instructions.