Fixes discovered while running TBLite baseline evaluation:
1. ephemeral_disk param not supported in modal 1.3.5 - check before passing
2. Modal legacy image builder requires working pip - add ensurepip fix via
setup_dockerfile_commands to handle task images with broken pip
3. Host cwd leaked into Modal sandbox - add /home/ to host prefix check
4. Tilde ~ not expanded by subprocess.run(cwd=) in sandboxes - use /root
5. install_pipx must stay True for swerex-remote to be available
Dependencies also needed (not in this commit):
- git submodule update --init mini-swe-agent
- uv pip install swe-rex boto3
On macOS, Docker Desktop installs the CLI to /usr/local/bin/docker, but
when Hermes runs as a gateway service (launchd) or in other non-login
contexts, /usr/local/bin is often not in PATH. This causes the Docker
requirements check to fail with 'No such file or directory: docker' even
though docker works fine from the user's terminal.
Add find_docker() helper that uses shutil.which() first, then probes
common Docker Desktop install paths on macOS (/usr/local/bin,
/opt/homebrew/bin, Docker.app bundle). The resolved path is cached and
passed to mini-swe-agent via its 'executable' parameter.
- tools/environments/docker.py: add find_docker(), use it in
_storage_opt_supported() and pass to _Docker(executable=...)
- tools/terminal_tool.py: use find_docker() in requirements check
- tests/tools/test_docker_find.py: 4 tests (PATH, fallback, not found, cache)
2877 tests pass.
Authored by voteblake.
- Semaphore limits concurrent Modal sandbox creations to 8 (configurable)
to prevent thread pool deadlocks when 86+ tasks fire simultaneously
- Modal cleanup guard for failed init (prevents AttributeError)
- CWD override to /app for TB2 containers
- Add /home/ to host path validation for container backends
Authored by shitcoinsherpa. Adds encoding='utf-8' to all text-mode
open() calls in gateway/run.py, gateway/config.py, hermes_cli/config.py,
hermes_cli/main.py, and hermes_cli/status.py. Prevents encoding errors
on Windows where the default locale is not UTF-8.
Also fixed 4 additional open() calls in gateway/run.py that were added
after the PR branch was created.
The existing password prompt uses /dev/tty and termios to read input
with echo disabled. Neither exists on Windows.
On Windows, msvcrt.getwch() reads a single character from the console
without echoing it. This adds a Windows code path that uses getwch()
in a loop, collecting characters until Enter is pressed.
The Unix path using termios and /dev/tty is unchanged.
- Add max_concurrent_tasks config (default 8) with semaphore in TB2 eval
- Pass cwd: /app via register_task_env_overrides for TB2 tasks
- Add /home/ to host path prefixes as safety net for container backends
When all 86 TerminalBench2 tasks fire simultaneously, each creates a Modal sandbox
via asyncio.run() inside a thread pool worker. Modal's blocking calls deadlock
when too many are created at once. The semaphore ensures max 8 concurrent creations.
Co-Authored-By: hermes-agent[bot] <hermes-agent[bot]@users.noreply.github.com>
Add Daytona to image selection, container_config guards, environment
factory, requirements check, and diagnostics in terminal_tool.py and
file_tools.py. Also add to sandboxed-backend approval bypass.
Signed-off-by: rovle <lovre.pesut@gmail.com>
Updated documentation within terminal_tool.py to emphasize the appropriate use of foreground and background processes. Enhanced descriptions for the timeout setting and background execution to guide users towards optimal configurations for scripts, builds, and long-running tasks. Adjusted the default timeout value from 60 to 180 seconds for improved handling of longer operations.
- Introduce a new test suite for the `redact_sensitive_text` function, covering various sensitive data formats including API keys, tokens, and environment variables.
- Ensure that sensitive information is properly masked in logs and outputs while non-sensitive data remains unchanged.
- Add tests for different scenarios including JSON fields, authorization headers, and environment variable assignments.
- Implement a redacting formatter for logging to enhance security during log output.
The sudo password was embedded in shell commands via single-quote
interpolation: echo '{password}' | sudo -S
If the password contained shell metacharacters (single quotes,
$(), backticks), they would be interpreted by the shell, enabling
arbitrary command execution.
Fix: use shlex.quote() which properly escapes all shell-special
characters, ensuring the password is always treated as a literal
string argument to echo.
The SSH backend was missing from check_terminal_requirements(), causing
it to fall through to `return False`. This silently disabled both the
terminal and file tools when TERMINAL_ENV=ssh was configured.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The browser_tool signal handler calls sys.exit(130) which raises
SystemExit. When this fires during terminal_tool's atexit cleanup
(specifically during _cleanup_thread.join()), it produces an unhandled
traceback. Wrapping the join in a try/except suppresses the race
without changing shutdown behavior.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Added a new section in the README for Inference Providers, detailing setup instructions for Nous Portal, OpenRouter, and Custom Endpoints, improving user guidance for LLM connections.
- Updated messaging platform setup instructions to include Slack and WhatsApp, providing clearer steps for configuration.
- Introduced a new environment variable, TERMINAL_SANDBOX_DIR, to allow users to customize the sandbox storage location for Docker and Singularity environments.
- Refactored the Docker and Singularity environment classes to utilize the new sandbox directory for persistent workspaces, enhancing organization and usability.
- Improved handling of working directories across various environments, ensuring compatibility and clarity in execution paths.
- Introduced a shared interrupt signaling mechanism to allow tools to check for user interrupts during long-running operations.
- Updated the AIAgent to handle interrupts more effectively, ensuring in-progress tool calls are canceled and multiple interrupt messages are combined into one prompt.
- Enhanced the CLI configuration to include container resource limits (CPU, memory, disk) and persistence options for Docker, Singularity, and Modal environments.
- Improved documentation to clarify interrupt behaviors and container resource settings, providing users with better guidance on configuration and usage.
- Updated the authorization logic to include a per-platform allow-all flag for improved flexibility.
- Revised the order of checks to prioritize platform-specific allow-all settings, followed by environment variable allowlists and DM pairing approvals.
- Added global allow-all configuration for broader access control.
- Improved handling of allowlists by stripping whitespace and ensuring valid entries are processed.
- Updated the environment creation condition to specifically check for "singularity" instead of allowing "local", ensuring more precise handling of environment types during task execution.
- Added functionality to suppress logging noise from specific modules when in quiet mode, improving user experience in CLI.
- Updated terminal_tool.py to change the log level for fallback directory usage from warning to debug, providing clearer context without cluttering logs.
- Added methods for handling sudo password and dangerous command approval prompts using a callback mechanism in cli.py.
- Integrated these prompts with the prompt_toolkit UI for improved user experience.
- Updated terminal_tool.py to support callback registration for interactive prompts, enhancing the CLI's interactivity.
- Introduced a background thread for API calls in run_agent.py to allow for interrupt handling during long-running operations.
- Enhanced error handling for interrupted API calls, ensuring graceful degradation of user experience.
- Introduced logging functionality in cli.py, run_agent.py, scheduler.py, and various tool modules to replace print statements with structured logging.
- Enhanced error handling and informational messages to improve debugging and monitoring capabilities.
- Ensured consistent logging practices across the codebase, facilitating better traceability and maintenance.
- Revised descriptions for various tools in model_tools.py, browser_tool.py, code_execution_tool.py, delegate_tool.py, and terminal_tool.py to enhance clarity and reduce verbosity.
- Improved consistency in terminology and formatting across tool descriptions, ensuring users have a clearer understanding of tool functionalities and usage.
- Replaced getpass with direct reading from /dev/tty to enhance password input handling without echoing.
- Updated threading logic for password input to ensure proper cleanup and error handling.
- Improved visual feedback during password prompt, including clearer separation and timeout messaging.
- Enhanced user experience by providing immediate feedback on password input status.
- Modified the `_exec` method in `ShellFileOperations` to accept `stdin_data`, allowing large content to be piped directly to commands, bypassing ARG_MAX limitations.
- Updated the `execute` method in various environment classes (`_LocalEnvironment`, `_SingularityEnvironment`, `_SSHEnvironment`, `_DockerEnvironment`) to support `stdin_data`, improving command execution flexibility.
- Removed the unique marker generation for heredoc in favor of direct stdin piping, simplifying file writing operations and enhancing performance for large files.
New process registry and tool for managing long-running background processes
across all terminal backends (local, Docker, Singularity, Modal, SSH).
Process Registry (tools/process_registry.py):
- ProcessSession tracking with rolling 200KB output buffer
- spawn_local() with optional PTY via ptyprocess for interactive CLIs
- spawn_via_env() for non-local backends (runs inside sandbox, never on host)
- Background reader threads per process (Popen stdout or PTY)
- wait() with timeout clamping, interrupt support, and transparent limit reporting
- JSON checkpoint to ~/.hermes/processes.json for gateway crash recovery
- Module-level singleton shared across agent loop, gateway, and RL
Process Tool (model_tools.py):
- 7 actions: list, poll, log, wait, kill, write, submit
- Paired with terminal in all toolsets (CLI, messaging, RL)
- Timeout clamping with transparent notes in response
Terminal Tool Updates (tools/terminal_tool.py):
- Replaced nohup background mode with registry spawn (returns session_id)
- Added workdir parameter for per-command working directory
- Added check_interval parameter for gateway auto-check watchers
- Added pty parameter for interactive CLI tools (Codex, Claude Code)
- Updated TERMINAL_TOOL_DESCRIPTION with full background workflow docs
- Cleanup thread now respects active background processes (won't reap sandbox)
Gateway Integration (gateway/run.py, session.py, config.py):
- Session reset protection: sessions with active processes exempt from reset
- Default idle timeout increased from 2 hours to 24 hours
- from_dict fallback aligned to match (was 120, now 1440)
- session_key env var propagated to process registry for session mapping
- Crash recovery on gateway startup via checkpoint probe
- check_interval watcher: asyncio task polls process, delivers updates to platform
RL Safety (environments/):
- tool_context.py cleanup() kills background processes on episode end
- hermes_base_env.py warns when enabled_toolsets is None (loads all tools)
- Process tool safe in RL via wait() blocking the agent loop
Also:
- Added ptyprocess as optional dependency (in pyproject.toml [pty] extra + [all])
- Fixed pre-existing bug: rl_test_inference missing from TOOL_TO_TOOLSET_MAP
- Updated AGENTS.md with process management docs and project structure
- Updated README.md terminal section with process management overview
When using Modal, Docker, SSH, or Singularity as the terminal backend
from the CLI, the agent resolved cwd: "." to the host machine's local
path (e.g. /Users/rewbs/code/hermes-agent) and passed it to the remote
sandbox, where it doesn't exist. All commands failed with "No such file
or directory".
Root cause: cli.py unconditionally resolved "." to os.getcwd() and wrote
it to TERMINAL_CWD regardless of backend type. Every tool then used that
host-local path as the working directory inside the remote environment.
Fixes:
- cli.py: only resolve "." to os.getcwd() for the local backend. For all
remote backends (ssh, docker, modal, singularity), leave TERMINAL_CWD
unset so the tool layer uses per-backend defaults (/root, /, ~, etc.)
- terminal_tool.py: added sanity check -- if TERMINAL_CWD contains a
host-local prefix (/Users/, /home/, C:\) for a non-local backend, log
a warning and fall back to the backend's default
- terminal_tool.py: SSH default CWD is now ~ instead of os.getcwd()
- file_operations.py: last-resort CWD fallback changed from os.getcwd()
to "/" so host paths never leak into remote file operations
- Improved the caching mechanism for ShellFileOperations to ensure stale entries are invalidated when environments are cleaned up.
- Enhanced thread safety by refining the use of locks during environment creation and cleanup processes.
- Streamlined the cleanup of inactive environments to prevent blocking other tool calls, ensuring efficient resource management.
- Added error handling and messaging improvements for better user feedback during environment cleanup.
- Introduced a new cleanup function that ensures terminal and browser sessions are cleaned up only once during application exit.
- Updated atexit registration to use the new cleanup function, enhancing resource management and preventing potential issues from multiple cleanup calls.
- Modified terminal cleanup messaging to only display when environments are cleaned, improving user feedback.
- Introduced a new script, `kill_modal.sh`, to facilitate stopping running Modal apps, including the ability to stop all apps or specific swe-rex sandboxes.
- Enhanced user experience with clear usage instructions and feedback during the stopping process.
- Improved error handling to ensure smooth execution even if some apps fail to stop.
- Added functionality to signal and terminate long-running terminal commands when a new user message is received, allowing for immediate agent response.
- Introduced a global interrupt event in the terminal tool to facilitate early termination of subprocesses.
- Updated the AIAgent class to handle interrupts gracefully, ensuring that remaining tool calls are skipped and appropriate messages are returned to maintain valid message sequences.
- Added a new `_atexit_cleanup` function to handle cleanup of active environments and stop the cleanup thread upon program exit.
- Enhanced logging to inform users about the number of remaining sandboxes being shut down during cleanup.
- Introduced new environments: Terminal Test Environment and SWE Environment, each with default configurations for testing and software engineering tasks.
- Added TerminalBench 2.0 evaluation environment with comprehensive setup for agentic LLMs, including task execution and verification.
- Enhanced ToolContext with methods for uploading and downloading files, ensuring binary-safe operations.
- Updated documentation across environments to reflect new features and usage instructions.
- Refactored existing environment configurations for consistency and clarity.
- Updated the _SingularityEnvironment class to utilize a persistent Apptainer instance, allowing state (files, installs, environment changes) to persist across commands.
- Enhanced the initialization process to start a background instance with full isolation and writable filesystem.
- Modified the execute method to connect to the running instance, ensuring commands run within the same container context.
- Implemented cleanup functionality to stop the persistent instance on cleanup or destruction, improving resource management.
- Updated class documentation to reflect new features and usage of the persistent environment.
- Added checks for local installation of the agent-browser CLI in the `_find_agent_browser` function, improving installation guidance.
- Implemented per-task socket directory management in `_run_browser_command` to prevent concurrency issues.
- Updated `cleanup_browser` to remove per-task socket directories, ensuring proper resource cleanup after task completion.
- Refactored comments for clarity and improved documentation throughout the browser tool code.