# Deep Analysis: Hermes Tool System

## Executive Summary

This report provides a comprehensive analysis of the Hermes agent tool infrastructure, covering:

- Tool registration and dispatch (registry.py)
- 30+ tool implementations across multiple categories
- 6 environment backends (local, Docker, Modal, SSH, Singularity, Daytona)
- Security boundaries and dangerous command detection
- Toolset definitions and composition system

---

## 1. Tool Execution Flow Diagram

```
┌───────────────────────────────────────────────────────────────────────────────────────────────┐
│                                      TOOL EXECUTION FLOW                                      │
└───────────────────────────────────────────────────────────────────────────────────────────────┘

┌─────────────┐    ┌──────────────────┐    ┌──────────────────┐
│  User/LLM   │───▶│   Model Tools    │───▶│  Tool Registry   │
│  Request    │    │ (model_tools.py) │    │  (registry.py)   │
└─────────────┘    └──────────────────┘    └──────────────────┘
                                                    │
                       ┌────────────────────────────┼──────────────────────────────┐
                       │                            │                              │
                       ▼                            ▼                              ▼
              ┌─────────────────┐        ┌─────────────────────┐        ┌─────────────────────┐
              │   File Tools    │        │    Terminal Tool    │        │      Web Tools      │
              │ ─────────────── │        │ ─────────────────── │        │ ─────────────────── │
              │ • read_file     │        │ • Local execution   │        │ • web_search        │
              │ • write_file    │        │ • Docker sandbox    │        │ • web_extract       │
              │ • patch         │        │ • Modal cloud       │        │ • web_crawl         │
              │ • search_files  │        │ • SSH remote        │        │                     │
              └────────┬────────┘        │ • Singularity       │        └──────────┬──────────┘
                       │                 │ • Daytona           │                   │
                       │                 └──────────┬──────────┘                   │
                       │                            │                              │
                       ▼                            ▼                              ▼
┌───────────────────────────────────────────────────────────────────────────────────────────────┐
│                                     ENVIRONMENT BACKENDS                                      │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌─────────────┐   ┌───────────┐   ┌──────────┐  │
│  │  Local   │   │  Docker  │   │  Modal   │   │     SSH     │   │Singularity│   │ Daytona  │  │
│  │──────────│   │──────────│   │──────────│   │─────────────│   │───────────│   │──────────│  │
│  │subprocess│   │container │   │Sandbox   │   │ControlMaster│   │overlay    │   │workspace │  │
│  │  -l      │   │exec      │   │.exec()   │   │connection   │   │SIF        │   │.exec()   │  │
│  └──────────┘   └──────────┘   └──────────┘   └─────────────┘   └───────────┘   └──────────┘  │
└───────────────────────────────────────────────────────────────────────────────────────────────┘
                                                │
                                                ▼
                                 ┌─────────────────────────────┐
                                 │     SECURITY CHECKPOINT     │
                                 │   ┌─────────────────────┐   │
                                 │   │ 1. Tirith Scanner   │   │
                                 │   │    (command content)│   │
                                 │   ├─────────────────────┤   │
                                 │   │ 2. Pattern Matching │   │
                                 │   │ (DANGEROUS_PATTERNS)│   │
                                 │   ├─────────────────────┤   │
                                 │   │ 3. Smart Approval   │   │
                                 │   │    (aux LLM)        │   │
                                 │   └─────────────────────┘   │
                                 └─────────────────────────────┘
                                                │
                    ┌───────────────────────────┼────────────────────────────────┐
                    │                           │                                │
                    ▼                           ▼                                ▼
          ┌───────────────────┐       ┌───────────────────┐       ┌─────────────────────────────┐
          │     APPROVED      │       │      BLOCKED      │       │         USER PROMPT         │
          │     (execute)     │       │  (deny + reason)  │       │ (once/session/always/deny)  │
          └───────────────────┘       └───────────────────┘       └─────────────────────────────┘

┌───────────────────────────────────────────────────────────────────────────────────────────┐
│                                ADDITIONAL TOOL CATEGORIES                                 │
├───────────────────────────────────────────────────────────────────────────────────────────┤
│ Browser Tools │ Vision Tools │ MoA Tools │ Skills Tools │ Code Exec │ Delegate │ TTS      │
│ ───────────── │ ──────────── │ ───────── │ ──────────── │ ───────── │ ──────── │ ──────── │
│ • navigate    │ • analyze    │ • reason  │ • list       │ • sandbox │ • spawn  │ • speech │
│ • click       │ • extract    │ • debate  │ • view       │ • RPC     │ • batch  │ • voices │
│ • snapshot    │              │           │ • manage     │ • 7 tools │ • depth  │          │
│ • scroll      │              │           │              │   limit   │   limit  │          │
└───────────────────────────────────────────────────────────────────────────────────────────┘
```

---

## 2. Security Boundary Analysis

### 2.1 Multi-Layer Security Architecture

| Layer | Component | Purpose |
|-------|-----------|---------|
| **Layer 1** | Container Isolation | Docker/Modal/Singularity sandboxes isolate from host |
| **Layer 2** | Dangerous Pattern Detection | Regex-based command filtering (approval.py) |
| **Layer 3** | Tirith Security Scanner | Content-level threat detection (pipe-to-shell, homograph URLs) |
| **Layer 4** | Smart Approval (Aux LLM) | LLM-based risk assessment for edge cases |
| **Layer 5** | File System Guards | Sensitive path blocking (/etc, ~/.ssh, ~/.hermes/.env) |
| **Layer 6** | Process Limits | Timeouts, memory limits, PID limits, capability dropping |

### 2.2 Environment Security Comparison

| Backend | Isolation Level | Persistent | Root Access | Network | Use Case |
|---------|-----------------|------------|-------------|---------|----------|
| **Local** | None (host) | Optional | User's own | Full | Development, trusted code |
| **Docker** | Container + caps | Optional | Container root | Isolated | General sandboxing |
| **Modal** | Cloud VM | Snapshots | Root | Isolated | Cloud compute, scalability |
| **SSH** | Remote machine | Yes | Remote user | Networked | Production servers |
| **Singularity** | Container + overlay | Optional | User-mapped | Configurable | HPC environments |
| **Daytona** | Cloud workspace | Yes | Root | Isolated | Managed dev environments |

### 2.3 Security Hardening Details

**Docker Environment (tools/environments/docker.py:107-117):**

```python
_SECURITY_ARGS = [
    "--cap-drop", "ALL",  # Drop all capabilities
    "--cap-add", "DAC_OVERRIDE",  # Allow root to write host-owned dirs
    "--cap-add", "CHOWN",
    "--cap-add", "FOWNER",
    "--security-opt", "no-new-privileges",
    "--pids-limit", "256",
    "--tmpfs", "/tmp:rw,nosuid,size=512m",
]
```

**Local Environment Secret Isolation (tools/environments/local.py:28-131):**

- Dynamic blocklist derived from provider registry
- Blocks 60+ API key environment variables
- Prevents credential leakage to subprocesses
- Supports `_HERMES_FORCE_` prefix overrides

---

## 3. All Dangerous Command Detection Patterns

### 3.1 Pattern Categories (from tools/approval.py:40-78)

```python
DANGEROUS_PATTERNS = [
    # File System Destruction
    (r'\brm\s+(-[^\s]*\s+)*/', "delete in root path"),
    (r'\brm\s+-[^\s]*r', "recursive delete"),

    # Permission Escalation
    (r'\bchmod\s+(-[^\s]*\s+)*(777|666|o\+[rwx]*w|a\+[rwx]*w)\b', "world/other-writable permissions"),
    (r'\bchown\s+(-[^\s]*)?R\s+root', "recursive chown to root"),

    # Disk/Filesystem Operations
    (r'\bmkfs\b', "format filesystem"),
    (r'\bdd\s+.*if=', "disk copy"),
    (r'>\s*/dev/sd', "write to block device"),

    # Database Destruction
    (r'\bDROP\s+(TABLE|DATABASE)\b', "SQL DROP"),
    (r'\bDELETE\s+FROM\b(?!.*\bWHERE\b)', "SQL DELETE without WHERE"),
    (r'\bTRUNCATE\s+(TABLE)?\s*\w', "SQL TRUNCATE"),

    # System Configuration
    (r'>\s*/etc/', "overwrite system config"),
    (r'\bsystemctl\s+(stop|disable|mask)\b', "stop/disable system service"),

    # Process Termination
    (r'\bkill\s+-9\s+-1\b', "kill all processes"),
    (r'\bpkill\s+-9\b', "force kill processes"),
    (r'\b(pkill|killall)\b.*\b(hermes|gateway|cli\.py)\b', "kill hermes/gateway"),

    # Code Injection
    (r':\(\)\s*\{\s*:\s*\|\s*:\s*&\s*\}\s*;\s*:', "fork bomb"),
    (r'\b(bash|sh|zsh|ksh)\s+-[^\s]*c(\s+|$)', "shell command via -c flag"),
    (r'\b(curl|wget)\b.*\|\s*(ba)?sh\b', "pipe remote content to shell"),
    # rf-string so that {_SENSITIVE_WRITE_TARGET} (Section 3.2) is interpolated
    (rf'\b(bash|sh|zsh|ksh)\s+<\s*>?\s*["\']?{_SENSITIVE_WRITE_TARGET}', "overwrite system file via redirection"),

    # File Operations
    (r'\bxargs\s+.*\brm\b', "xargs with rm"),
    (r'\bfind\b.*-exec\s+(/\S*/)?rm\b', "find -exec rm"),
    (r'\bfind\b.*-delete\b', "find -delete"),
    (r'\b(cp|mv|install)\b.*\s/etc/', "copy/move file into /etc/"),
    (r'\bsed\s+-[^\s]*i.*\s/etc/', "in-place edit of system config"),

    # Gateway Protection
    (r'gateway\s+run\b.*(&\s*$|&\s*;|\bdisown\b|\bsetsid\b)', "start gateway outside systemd"),
    (r'\bnohup\b.*gateway\s+run\b', "start gateway outside systemd"),
]
```

### 3.2 Sensitive Path Patterns

```python
# SSH keys
_SSH_SENSITIVE_PATH = r'(?:~|\$home|\$\{home\})/\.ssh(?:/|$)'

# Hermes environment
_HERMES_ENV_PATH = (
    r'(?:~\/\.hermes/|'
    r'(?:\$home|\$\{home\})/\.hermes/|'
    r'(?:\$hermes_home|\$\{hermes_home\})/)'
    r'\.env\b'
)

# System paths
_SENSITIVE_WRITE_TARGET = (
    r'(?:/etc/|/dev/sd|'
    rf'{_SSH_SENSITIVE_PATH}|'
    rf'{_HERMES_ENV_PATH})'
)
```

### 3.3 Approval Flow States

```
Command Input
      │
      ▼
┌─────────────────────┐
│ Pattern Detection   │────┐
│ (approval.py)       │    │
└─────────────────────┘    │
      │                    │
      ▼                    │
┌─────────────────────┐    │
│ Tirith Scanner      │────┤
│ (tirith_security.py)│    │
└─────────────────────┘    │
      │                    │
      ▼                    │
┌─────────────────────┐    │
│ Mode = smart?       │────┼──▶ Smart Approval (aux LLM)
│                     │    │
└─────────────────────┘    │
      │                    │
      ▼                    │
┌─────────────────────┐    │
│ Gateway/CLI?        │────┼──▶ Async Approval Prompt
│                     │    │
└─────────────────────┘    │
      │                    │
      ▼                    │
┌─────────────────────┐    │
│ Interactive Prompt  │◀───┘
│ (once/session/      │
│  always/deny)       │
└─────────────────────┘
```

---

## 4. Tool Improvement Recommendations

### 4.1 Critical Improvements

| # | Recommendation | Impact | Effort |
|---|----------------|--------|--------|
| 1 | **Implement tool call result caching** | High | Medium |
| | Cache file reads, search results with TTL to prevent redundant I/O | | |
| 2 | **Add tool execution metrics/observability** | High | Low |
| | Track duration, success rates, token usage per tool for optimization | | |
| 3 | **Implement tool retry with exponential backoff** | Medium | Low |
| | Terminal tool has basic retry (terminal_tool.py:1105-1130) but could be generalized | | |
| 4 | **Add tool call rate limiting per session** | Medium | Medium |
| | Prevent runaway loops (e.g., 1000+ search calls in one session) | | |
| 5 | **Create tool health check system** | Medium | Medium |
| | Periodic validation that tools are functioning (API keys valid, services up) | | |

### 4.2 Security Enhancements

| # | Recommendation | Impact | Effort |
|---|----------------|--------|--------|
| 6 | **Implement command intent classification** | High | Medium |
| | Use lightweight model to classify commands before execution for better risk assessment | | |
| 7 | **Add network egress filtering for sandbox tools** | High | Medium |
| | Whitelist domains for web_extract, block known malicious IPs | | |
| 8 | **Implement tool call provenance logging** | Medium | Low |
| | Immutable log of what tools were called with what args for audit | | |

### 4.3 Usability Improvements

| # | Recommendation | Impact | Effort |
|---|----------------|--------|--------|
| 9 | **Add tool suggestion system** | Medium | Medium |
| | When LLM uses suboptimal pattern (cat vs read_file), suggest better alternative | | |
| 10 | **Implement progressive tool disclosure** | Medium | High |
| | Start with minimal toolset, expand based on task complexity indicators | | |

---

## 5. Missing Tool Coverage Gaps

### 5.1 High-Priority Gaps

| Gap | Use Case | Current Workaround |
|-----|----------|-------------------|
| **Database query tool** | SQL database exploration | terminal with sqlite3/psql |
| **API testing tool** | REST API debugging (curl alternative) | terminal with curl |
| **Git operations tool** | Structured git commands (status, diff, log) | terminal with git |
| **Package manager tool** | Structured pip/npm/apt operations | terminal with package managers |
| **Archive/zip tool** | Create/extract archives | terminal with tar/unzip |

### 5.2 Medium-Priority Gaps

| Gap | Use Case | Current Workaround |
|-----|----------|-------------------|
| **Diff tool** | Structured file comparison | search_files + manual compare |
| **JSON/YAML manipulation** | Structured config editing | read_file + write_file |
| **Image manipulation** | Resize, crop, convert images | terminal with ImageMagick |
| **PDF operations** | Extract text, merge, split | terminal with pdftotext |
| **Data visualization** | Generate charts from data | code_execution with matplotlib |

### 5.3 Advanced Gaps

| Gap | Description |
|-----|-------------|
| **Vector database tool** | Semantic search over embeddings |
| **Test runner tool** | Structured test execution with parsing |
| **Linter/formatter tool** | Code quality checks with structured output |
| **Dependency analysis tool** | Visualize and analyze code dependencies |
| **Documentation generator tool** | Auto-generate docs from code |

---

## 6. Tool Registry Architecture

### 6.1 Registration Flow

```python
# From tools/registry.py
class ToolRegistry:
    def register(self, name: str, toolset: str, schema: dict, handler: Callable, check_fn: Callable = None, ...)
    def dispatch(self, name: str, args: dict, **kwargs) -> str
    def get_definitions(self, tool_names: Set[str], quiet: bool = False) -> List[dict]
```

### 6.2 Tool Entry Structure

```python
class ToolEntry:
    __slots__ = (
        "name",          # Tool identifier
        "toolset",       # Category (file, terminal, web, etc.)
        "schema",        # OpenAI-format JSON schema
        "handler",       # Callable implementation
        "check_fn",      # Availability check (returns bool)
        "requires_env",  # Required env var names
        "is_async",      # Whether handler is async
        "description",   # Human-readable description
        "emoji",         # Visual identifier
    )
```

### 6.3 Registration Example (file_tools.py:560-563)

```python
registry.register(
    name="read_file",
    toolset="file",
    schema=READ_FILE_SCHEMA,
    handler=_handle_read_file,
    check_fn=_check_file_reqs,
    emoji="📖"
)
```

---

## 7. Toolset Composition System

### 7.1 Toolset Definition (toolsets.py:72-377)

```python
TOOLSETS = {
    "file": {
        "description": "File manipulation tools",
        "tools": ["read_file", "write_file", "patch", "search_files"],
        "includes": []
    },
    "debugging": {
        "description": "Debugging and troubleshooting toolkit",
        "tools": ["terminal", "process"],
        "includes": ["web", "file"]  # Composes other toolsets
    },
}
```

### 7.2 Resolution Algorithm

```python
def resolve_toolset(name: str, visited: Set[str] = None) -> List[str]:
    # 1. Cycle detection
    # 2. Get toolset definition
    # 3. Collect direct tools
    # 4. Recursively resolve includes (diamond deps handled)
    # 5. Return deduplicated list
```

### 7.3 Platform-Specific Toolsets

| Toolset | Purpose | Key Difference |
|---------|---------|----------------|
| `hermes-cli` | Full CLI access | All tools available |
| `hermes-acp` | Editor integration | No messaging, audio, or clarify UI |
| `hermes-api-server` | HTTP API | No interactive UI tools |
| `hermes-telegram` | Telegram bot | Full access with safety checks |
| `hermes-gateway` | Union of all messaging | Includes all platform tools |

---

## 8. Environment Backend Deep Dive

### 8.1 Base Class Interface (tools/environments/base.py)

```python
class BaseEnvironment(ABC):
    def execute(self, command: str, cwd: str = "", *,
                timeout: int | None = None,
                stdin_data: str | None = None) -> dict:
        """Return {"output": str, "returncode": int}"""

    def cleanup(self):
        """Release backend resources"""
```

### 8.2 Environment Feature Matrix

| Feature | Local | Docker | Modal | SSH | Singularity | Daytona |
|---------|-------|--------|-------|-----|-------------|---------|
| PTY support | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| Persistent shell | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| Filesystem persistence | Optional | Optional | Snapshots | N/A (remote) | Optional | Yes |
| Interrupt handling | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Sudo support | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Resource limits | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ |
| GPU support | ❌ | ✅ | ✅ | Remote | ✅ | ✅ |

---

## 9. Process Registry System

### 9.1 Background Process Management (tools/process_registry.py)

```python
class ProcessRegistry:
    def spawn_local(self, command, cwd, task_id, ...) -> ProcessSession
    def spawn_via_env(self, env, command, ...) -> ProcessSession
    def poll(self, session_id: str) -> dict
    def wait(self, session_id: str, timeout: int = None) -> dict
    def kill(self, session_id: str)
```

### 9.2 Process Session States

```
CREATED ──▶ RUNNING ──▶ FINISHED
               │            │
               ▼            ▼
          INTERRUPTED    TIMEOUT
        (exit_code=130) (exit_code=124)
```

---

## 10. Code Analysis Summary

### 10.1 Lines of Code by Component

| Component | Files | Approx. LOC |
|-----------|-------|-------------|
| Tool Implementations | 30+ | ~15,000 |
| Environment Backends | 6 | ~3,500 |
| Registry & Core | 2 | ~800 |
| Security (approval, tirith) | 2 | ~1,200 |
| Process Management | 1 | ~900 |
| **Total** | **40+** | **~21,400** |

### 10.2 Test Coverage

- 150+ test files in `tests/tools/`
- Unit tests for each tool
- Integration tests for environments
- Security-focused tests for approval system

---

## Appendix A: File Organization

```
tools/
├── registry.py              # Tool registration & dispatch
├── __init__.py              # Package exports
│
├── file_tools.py            # read_file, write_file, patch, search_files
├── file_operations.py       # ShellFileOperations backend
│
├── terminal_tool.py         # Main terminal execution (1,358 lines)
├── process_registry.py      # Background process management
│
├── web_tools.py             # web_search, web_extract, web_crawl (1,843 lines)
├── browser_tool.py          # Browser automation (1,955 lines)
├── browser_providers/       # Browserbase, BrowserUse providers
│
├── approval.py              # Dangerous command detection (670 lines)
├── tirith_security.py       # External security scanner (670 lines)
│
├── environments/            # Execution backends
│   ├── base.py              # BaseEnvironment ABC
│   ├── local.py             # Local subprocess (486 lines)
│   ├── docker.py            # Docker containers (535 lines)
│   ├── modal.py             # Modal cloud (372 lines)
│   ├── ssh.py               # SSH remote (307 lines)
│   ├── singularity.py       # Singularity/Apptainer
│   ├── daytona.py           # Daytona workspaces
│   └── persistent_shell.py  # Shared persistent shell mixin
│
├── code_execution_tool.py   # Programmatic tool calling (806 lines)
├── delegate_tool.py         # Subagent spawning (794 lines)
│
├── skills_tool.py           # Skill management (1,344 lines)
├── skill_manager_tool.py    # Skill CRUD operations
│
└── [20+ additional tools...]

toolsets.py                  # Toolset definitions (641 lines)
```

---

*Report generated from comprehensive analysis of the Hermes agent tool system.*
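
**Illustrative sketch for Section 7.2 (toolset resolution).** The five resolution steps listed in Section 7.2 are given there only as an outline, so the following is a minimal sketch of how they could fit together, not the code in `toolsets.py`. The `TOOLSETS` mapping below is a cut-down, hypothetical copy of the one in Section 7.1; the `web` entry is assumed from the tool names in the flow diagram, and returning an empty list for an unknown toolset name is likewise an assumption.

```python
from typing import Dict, List, Optional, Set

# Hypothetical miniature of TOOLSETS (Section 7.1); the "web" entry is assumed.
TOOLSETS: Dict[str, dict] = {
    "file": {"tools": ["read_file", "write_file", "patch", "search_files"], "includes": []},
    "web": {"tools": ["web_search", "web_extract", "web_crawl"], "includes": []},
    "debugging": {"tools": ["terminal", "process"], "includes": ["web", "file"]},
}


def resolve_toolset(name: str, visited: Optional[Set[str]] = None) -> List[str]:
    """Expand a toolset into a flat, de-duplicated, order-preserving tool list."""
    visited = set() if visited is None else visited

    # 1. Cycle detection: a toolset already seen on this walk is not expanded
    #    again. Sharing one set across the recursion also covers diamonds.
    if name in visited:
        return []
    visited.add(name)

    # 2. Get the toolset definition (assumption: unknown names resolve to []).
    definition = TOOLSETS.get(name)
    if definition is None:
        return []

    # 3. Collect the toolset's direct tools.
    resolved = list(definition.get("tools", []))

    # 4. Recursively resolve included toolsets.
    for included in definition.get("includes", []):
        resolved.extend(resolve_toolset(included, visited))

    # 5. De-duplicate while keeping first-seen order.
    return list(dict.fromkeys(resolved))


if __name__ == "__main__":
    print(resolve_toolset("debugging"))
```

Passing a single `visited` set down the recursion means each toolset is expanded at most once per call, which handles both include cycles and diamond-shaped includes; `dict.fromkeys` then removes any tool listed directly by more than one toolset without reordering the result.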