- Introduced new environments: Terminal Test Environment and SWE Environment, each with default configurations for testing and software engineering tasks.
- Added TerminalBench 2.0 evaluation environment with comprehensive setup for agentic LLMs, including task execution and verification.
- Enhanced ToolContext with methods for uploading and downloading files, ensuring binary-safe operations.
- Updated documentation across environments to reflect new features and usage instructions.
- Refactored existing environment configurations for consistency and clarity.
- Updated the _SingularityEnvironment class to utilize a persistent Apptainer instance, allowing state (files, installs, environment changes) to persist across commands.
- Enhanced the initialization process to start a background instance with full isolation and writable filesystem.
- Modified the execute method to connect to the running instance, ensuring commands run within the same container context.
- Implemented cleanup functionality to stop the persistent instance on cleanup or destruction, improving resource management.
- Updated class documentation to reflect new features and usage of the persistent environment.
- Added checks for local installation of the agent-browser CLI in the `_find_agent_browser` function, improving installation guidance.
- Implemented per-task socket directory management in `_run_browser_command` to prevent concurrency issues.
- Updated `cleanup_browser` to remove per-task socket directories, ensuring proper resource cleanup after task completion.
- Refactored comments for clarity and improved documentation throughout the browser tool code.
- Changed the default LLM model in the setup wizard and example environment file to 'anthropic/claude-opus-4.6'.
- Updated terminal working directory settings in CLI and related files to use the current directory ('.') instead of '/tmp'.
- Enhanced documentation comments for clarity on terminal configuration and working directory behavior.
- Updated `.gitignore` to exclude `testlogs` directory.
- Refactored `handle_web_function_call` in `model_tools.py` to support running async functions in existing event loops, improving compatibility with Atropos.
- Introduced a thread pool executor in `agent_loop.py` for running synchronous tool calls that internally use `asyncio.run()`, preventing deadlocks.
- Added `ToolError` class to track tool execution errors, enhancing error reporting during agent loops.
- Updated `wandb_log` method in `hermes_base_env.py` to log tool error statistics for better monitoring.
- Implemented patches in `patches.py` to ensure async-safe operation of tools within Atropos's event loop.
- Enhanced `ToolContext` and `terminal_tool.py` to utilize the new async handling, improving overall tool execution reliability.
- Updated the terminal tool's command approval flow to improve user interaction when executing potentially dangerous commands, replacing the previous confirmation method with a clear explanation and instructions for adding commands to the allowlist.
- Removed the internal `force` parameter from the model API, ensuring that dangerous command approvals are handled solely through user prompts.
- Enhanced the CLI to provide better feedback regarding tool availability, including improved messaging for enabled and disabled toolsets.
- Updated AGENTS.md to reflect changes in the command approval process and configuration instructions.
- Added a safety mechanism to detect and approve potentially dangerous commands (e.g., `rm -rf`, `DROP TABLE`).
- Introduced an approval flow for local/SSH backends, prompting users for confirmation with options to allow once, for the session, or permanently.
- Updated configuration to include a `command_allowlist` for storing approved patterns.
- Enhanced messaging for sudo failures in messaging contexts.
- Updated relevant documentation in AGENTS.md and TODO.md to reflect these changes.
- Changed default Docker, Singularity, and Modal images in configuration files to use "nikolaik/python-nodejs:python3.11-nodejs20" for improved compatibility.
- Updated the default model in the configuration to "anthropic/claude-sonnet-4.5" and adjusted related setup prompts for API provider configuration.
- Introduced a new CLI option for selecting a custom OpenAI-compatible endpoint, enhancing flexibility in model provider setup.
- Enhanced the prompt choice functionality to support arrow key navigation for better user experience in CLI interactions.
- Updated documentation in relevant files to reflect these changes and improve user guidance.
- Implemented automatic session logging, saving conversation trajectories to the `logs/` directory in JSON format, with each session having a unique identifier.
- Updated the CLI to display the session ID in the welcome banner for easy reference.
- Introduced an interactive sudo password prompt in CLI mode, allowing users to enter their password with a 45-second timeout, enhancing user experience during command execution.
- Documented session logging and interactive sudo features in `README.md`, `cli.md`, and `cli-config.yaml.example` for better user guidance.
- Added support for sudo commands in local, Docker, Singularity, and SSH environments by introducing the `SUDO_PASSWORD` environment variable.
- Updated terminal tool configurations in `.env.example` and `cli-config.yaml.example` to document the new sudo functionality.
- Enhanced the command execution process to handle sudo commands gracefully, preventing hangs on interactive prompts and providing clear error messages when no password is configured.
- Updated `README.md` to include instructions for using sudo support and SSH backend configuration.
- Revised `TODO.md` to reflect the completion of the sudo feature and outline future enhancements.
- Introduced `cli-config.yaml.example` to provide a template for configuring the CLI behavior, including model settings, terminal tool configurations, agent behavior, and toolsets.
- Created `cli.py` for an interactive terminal interface, allowing users to start the Hermes Agent with various options and toolsets.
- Added `hermes` launcher script for convenient CLI access.
- Updated `model_tools.py` to support quiet mode for suppressing output during tool initialization and execution.
- Enhanced logging in various tools to respect quiet mode, improving user experience by reducing unnecessary output.
- Added `prompt_toolkit` to `requirements.txt` for improved CLI interaction capabilities.
- Created `TODO.md` for future improvements and enhancements to the Hermes Agent framework.
- Modified `.env.example` to set the default terminal environment to 'singularity' and updated Docker and Singularity image references for better compatibility.
- Enhanced `run_mixed_tasks.sh` and `run_terminal_tasks.sh` scripts to utilize the new Singularity setup, including improved logging and cache directory management.
- Introduced functionality in `terminal_tool.py` to automatically build and cache SIF images from Docker URLs, streamlining the execution environment setup.
- Updated logging messages for clarity on image usage and cache directory paths.
- Introduced new browser automation tools in `browser_tool.py` for navigating, interacting with, and extracting content from web pages using the agent-browser CLI and Browserbase cloud execution.
- Updated `.env.example` to include new configuration options for Browserbase API keys and session settings.
- Enhanced `model_tools.py` and `toolsets.py` to integrate browser tools into the existing tool framework, ensuring consistent access across toolsets.
- Updated `README.md` with setup instructions for browser tools and their usage examples.
- Added new test script `test_modal_terminal.py` to validate Modal terminal backend functionality.
- Improved `run_agent.py` to support browser tool integration and logging enhancements for better tracking of API responses.
- Updated `.env.example` to include new API keys and configuration options for the mini-swe-agent backend, including support for local, Docker, and Modal environments.
- Added `.gitmodules` to include mini-swe-agent as a submodule for easier integration.
- Refactored `mini_swe_runner.py` to use the updated model format and default to OpenRouter for API calls.
- Enhanced `model_tools.py` to support the new terminal tool definitions and ensure compatibility with the mini-swe-agent backend.
- Updated `README.md` to reflect changes in setup instructions and environment variable configurations.
- Improved `terminal_tool.py` to manage execution environments and lifecycle, ensuring proper cleanup and error handling.
- Introduced `terminal_hecate.py` for executing commands on MorphCloud VMs, providing an alternative backend for terminal operations.
- Added support for tracking partial results and tool error counts in batch processing.
- Implemented filtering of corrupted entries during batch file combination based on valid tool names.
- Updated terminal tool to improve command execution and error handling, including retry logic for transient failures.
- Refactored model tools to use a simple terminal tool with no session persistence.
- Improved logging and error messages for invalid API responses and tool calls.
- Introduced chunked processing for large content in web tools to manage size limitations effectively.