New process registry and tool for managing long-running background processes
across all terminal backends (local, Docker, Singularity, Modal, SSH).
Process Registry (tools/process_registry.py):
- ProcessSession tracking with rolling 200KB output buffer
- spawn_local() with optional PTY via ptyprocess for interactive CLIs
- spawn_via_env() for non-local backends (runs inside sandbox, never on host)
- Background reader threads per process (Popen stdout or PTY)
- wait() with timeout clamping, interrupt support, and transparent limit reporting
- JSON checkpoint to ~/.hermes/processes.json for gateway crash recovery
- Module-level singleton shared across agent loop, gateway, and RL
Process Tool (model_tools.py):
- 7 actions: list, poll, log, wait, kill, write, submit
- Paired with terminal in all toolsets (CLI, messaging, RL)
- Timeout clamping with transparent notes in response
Terminal Tool Updates (tools/terminal_tool.py):
- Replaced nohup background mode with registry spawn (returns session_id)
- Added workdir parameter for per-command working directory
- Added check_interval parameter for gateway auto-check watchers
- Added pty parameter for interactive CLI tools (Codex, Claude Code)
- Updated TERMINAL_TOOL_DESCRIPTION with full background workflow docs
- Cleanup thread now respects active background processes (won't reap sandbox)
Gateway Integration (gateway/run.py, session.py, config.py):
- Session reset protection: sessions with active processes exempt from reset
- Default idle timeout increased from 2 hours to 24 hours
- from_dict fallback aligned to match (was 120, now 1440)
- session_key env var propagated to process registry for session mapping
- Crash recovery on gateway startup via checkpoint probe
- check_interval watcher: asyncio task polls process, delivers updates to platform
RL Safety (environments/):
- tool_context.py cleanup() kills background processes on episode end
- hermes_base_env.py warns when enabled_toolsets is None (loads all tools)
- Process tool safe in RL via wait() blocking the agent loop
Also:
- Added ptyprocess as optional dependency (in pyproject.toml [pty] extra + [all])
- Fixed pre-existing bug: rl_test_inference missing from TOOL_TO_TOOLSET_MAP
- Updated AGENTS.md with process management docs and project structure
- Updated README.md terminal section with process management overview
When using Modal, Docker, SSH, or Singularity as the terminal backend
from the CLI, the agent resolved cwd: "." to the host machine's local
path (e.g. /Users/rewbs/code/hermes-agent) and passed it to the remote
sandbox, where it doesn't exist. All commands failed with "No such file
or directory".
Root cause: cli.py unconditionally resolved "." to os.getcwd() and wrote
it to TERMINAL_CWD regardless of backend type. Every tool then used that
host-local path as the working directory inside the remote environment.
Fixes:
- cli.py: only resolve "." to os.getcwd() for the local backend. For all
remote backends (ssh, docker, modal, singularity), leave TERMINAL_CWD
unset so the tool layer uses per-backend defaults (/root, /, ~, etc.)
- terminal_tool.py: added sanity check -- if TERMINAL_CWD contains a
host-local prefix (/Users/, /home/, C:\) for a non-local backend, log
a warning and fall back to the backend's default
- terminal_tool.py: SSH default CWD is now ~ instead of os.getcwd()
- file_operations.py: last-resort CWD fallback changed from os.getcwd()
to "/" so host paths never leak into remote file operations
- Improved the caching mechanism for ShellFileOperations to ensure stale entries are invalidated when environments are cleaned up.
- Enhanced thread safety by refining the use of locks during environment creation and cleanup processes.
- Streamlined the cleanup of inactive environments to prevent blocking other tool calls, ensuring efficient resource management.
- Added error handling and messaging improvements for better user feedback during environment cleanup.
- Introduced a new cleanup function that ensures terminal and browser sessions are cleaned up only once during application exit.
- Updated atexit registration to use the new cleanup function, enhancing resource management and preventing potential issues from multiple cleanup calls.
- Modified terminal cleanup messaging to only display when environments are cleaned, improving user feedback.
- Introduced a new script, `kill_modal.sh`, to facilitate stopping running Modal apps, including the ability to stop all apps or specific swe-rex sandboxes.
- Enhanced user experience with clear usage instructions and feedback during the stopping process.
- Improved error handling to ensure smooth execution even if some apps fail to stop.
- Added functionality to signal and terminate long-running terminal commands when a new user message is received, allowing for immediate agent response.
- Introduced a global interrupt event in the terminal tool to facilitate early termination of subprocesses.
- Updated the AIAgent class to handle interrupts gracefully, ensuring that remaining tool calls are skipped and appropriate messages are returned to maintain valid message sequences.
- Added a new `_atexit_cleanup` function to handle cleanup of active environments and stop the cleanup thread upon program exit.
- Enhanced logging to inform users about the number of remaining sandboxes being shut down during cleanup.
- Introduced new environments: Terminal Test Environment and SWE Environment, each with default configurations for testing and software engineering tasks.
- Added TerminalBench 2.0 evaluation environment with comprehensive setup for agentic LLMs, including task execution and verification.
- Enhanced ToolContext with methods for uploading and downloading files, ensuring binary-safe operations.
- Updated documentation across environments to reflect new features and usage instructions.
- Refactored existing environment configurations for consistency and clarity.
- Updated the _SingularityEnvironment class to utilize a persistent Apptainer instance, allowing state (files, installs, environment changes) to persist across commands.
- Enhanced the initialization process to start a background instance with full isolation and writable filesystem.
- Modified the execute method to connect to the running instance, ensuring commands run within the same container context.
- Implemented cleanup functionality to stop the persistent instance on cleanup or destruction, improving resource management.
- Updated class documentation to reflect new features and usage of the persistent environment.
- Added checks for local installation of the agent-browser CLI in the `_find_agent_browser` function, improving installation guidance.
- Implemented per-task socket directory management in `_run_browser_command` to prevent concurrency issues.
- Updated `cleanup_browser` to remove per-task socket directories, ensuring proper resource cleanup after task completion.
- Refactored comments for clarity and improved documentation throughout the browser tool code.
- Changed the default LLM model in the setup wizard and example environment file to 'anthropic/claude-opus-4.6'.
- Updated terminal working directory settings in CLI and related files to use the current directory ('.') instead of '/tmp'.
- Enhanced documentation comments for clarity on terminal configuration and working directory behavior.
- Updated `.gitignore` to exclude `testlogs` directory.
- Refactored `handle_web_function_call` in `model_tools.py` to support running async functions in existing event loops, improving compatibility with Atropos.
- Introduced a thread pool executor in `agent_loop.py` for running synchronous tool calls that internally use `asyncio.run()`, preventing deadlocks.
- Added `ToolError` class to track tool execution errors, enhancing error reporting during agent loops.
- Updated `wandb_log` method in `hermes_base_env.py` to log tool error statistics for better monitoring.
- Implemented patches in `patches.py` to ensure async-safe operation of tools within Atropos's event loop.
- Enhanced `ToolContext` and `terminal_tool.py` to utilize the new async handling, improving overall tool execution reliability.
- Updated the terminal tool's command approval flow to improve user interaction when executing potentially dangerous commands, replacing the previous confirmation method with a clear explanation and instructions for adding commands to the allowlist.
- Removed the internal `force` parameter from the model API, ensuring that dangerous command approvals are handled solely through user prompts.
- Enhanced the CLI to provide better feedback regarding tool availability, including improved messaging for enabled and disabled toolsets.
- Updated AGENTS.md to reflect changes in the command approval process and configuration instructions.
- Added a safety mechanism to detect and approve potentially dangerous commands (e.g., `rm -rf`, `DROP TABLE`).
- Introduced an approval flow for local/SSH backends, prompting users for confirmation with options to allow once, for the session, or permanently.
- Updated configuration to include a `command_allowlist` for storing approved patterns.
- Enhanced messaging for sudo failures in messaging contexts.
- Updated relevant documentation in AGENTS.md and TODO.md to reflect these changes.
- Changed default Docker, Singularity, and Modal images in configuration files to use "nikolaik/python-nodejs:python3.11-nodejs20" for improved compatibility.
- Updated the default model in the configuration to "anthropic/claude-sonnet-4.5" and adjusted related setup prompts for API provider configuration.
- Introduced a new CLI option for selecting a custom OpenAI-compatible endpoint, enhancing flexibility in model provider setup.
- Enhanced the prompt choice functionality to support arrow key navigation for better user experience in CLI interactions.
- Updated documentation in relevant files to reflect these changes and improve user guidance.
- Implemented automatic session logging, saving conversation trajectories to the `logs/` directory in JSON format, with each session having a unique identifier.
- Updated the CLI to display the session ID in the welcome banner for easy reference.
- Introduced an interactive sudo password prompt in CLI mode, allowing users to enter their password with a 45-second timeout, enhancing user experience during command execution.
- Documented session logging and interactive sudo features in `README.md`, `cli.md`, and `cli-config.yaml.example` for better user guidance.
- Added support for sudo commands in local, Docker, Singularity, and SSH environments by introducing the `SUDO_PASSWORD` environment variable.
- Updated terminal tool configurations in `.env.example` and `cli-config.yaml.example` to document the new sudo functionality.
- Enhanced the command execution process to handle sudo commands gracefully, preventing hangs on interactive prompts and providing clear error messages when no password is configured.
- Updated `README.md` to include instructions for using sudo support and SSH backend configuration.
- Revised `TODO.md` to reflect the completion of the sudo feature and outline future enhancements.
- Introduced `cli-config.yaml.example` to provide a template for configuring the CLI behavior, including model settings, terminal tool configurations, agent behavior, and toolsets.
- Created `cli.py` for an interactive terminal interface, allowing users to start the Hermes Agent with various options and toolsets.
- Added `hermes` launcher script for convenient CLI access.
- Updated `model_tools.py` to support quiet mode for suppressing output during tool initialization and execution.
- Enhanced logging in various tools to respect quiet mode, improving user experience by reducing unnecessary output.
- Added `prompt_toolkit` to `requirements.txt` for improved CLI interaction capabilities.
- Created `TODO.md` for future improvements and enhancements to the Hermes Agent framework.
- Modified `.env.example` to set the default terminal environment to 'singularity' and updated Docker and Singularity image references for better compatibility.
- Enhanced `run_mixed_tasks.sh` and `run_terminal_tasks.sh` scripts to utilize the new Singularity setup, including improved logging and cache directory management.
- Introduced functionality in `terminal_tool.py` to automatically build and cache SIF images from Docker URLs, streamlining the execution environment setup.
- Updated logging messages for clarity on image usage and cache directory paths.
- Introduced new browser automation tools in `browser_tool.py` for navigating, interacting with, and extracting content from web pages using the agent-browser CLI and Browserbase cloud execution.
- Updated `.env.example` to include new configuration options for Browserbase API keys and session settings.
- Enhanced `model_tools.py` and `toolsets.py` to integrate browser tools into the existing tool framework, ensuring consistent access across toolsets.
- Updated `README.md` with setup instructions for browser tools and their usage examples.
- Added new test script `test_modal_terminal.py` to validate Modal terminal backend functionality.
- Improved `run_agent.py` to support browser tool integration and logging enhancements for better tracking of API responses.
- Updated `.env.example` to include new API keys and configuration options for the mini-swe-agent backend, including support for local, Docker, and Modal environments.
- Added `.gitmodules` to include mini-swe-agent as a submodule for easier integration.
- Refactored `mini_swe_runner.py` to use the updated model format and default to OpenRouter for API calls.
- Enhanced `model_tools.py` to support the new terminal tool definitions and ensure compatibility with the mini-swe-agent backend.
- Updated `README.md` to reflect changes in setup instructions and environment variable configurations.
- Improved `terminal_tool.py` to manage execution environments and lifecycle, ensuring proper cleanup and error handling.
- Introduced `terminal_hecate.py` for executing commands on MorphCloud VMs, providing an alternative backend for terminal operations.
- Added support for tracking partial results and tool error counts in batch processing.
- Implemented filtering of corrupted entries during batch file combination based on valid tool names.
- Updated terminal tool to improve command execution and error handling, including retry logic for transient failures.
- Refactored model tools to use a simple terminal tool with no session persistence.
- Improved logging and error messages for invalid API responses and tool calls.
- Introduced chunked processing for large content in web tools to manage size limitations effectively.