hermes-agent/tools_analysis_report.md

Deep Analysis: Hermes Tool System

Executive Summary

This report analyzes the Hermes agent tool infrastructure. It covers:

  • Tool registration and dispatch (registry.py)
  • 30+ tool implementations across multiple categories
  • 6 environment backends (local, Docker, Modal, SSH, Singularity, Daytona)
  • Security boundaries and dangerous command detection
  • Toolset definitions and composition system

1. Tool Execution Flow Diagram

┌─────────────────────────────────────────────────────────────────────────────────┐
│                              TOOL EXECUTION FLOW                                 │
└─────────────────────────────────────────────────────────────────────────────────┘

┌─────────────┐    ┌──────────────────┐    ┌──────────────────┐
│   User/LLM  │───▶│  Model Tools     │───▶│ Tool Registry    │
│  Request    │    │  (model_tools.py)│    │ (registry.py)    │
└─────────────┘    └──────────────────┘    └──────────────────┘
                                                    │
              ┌─────────────────────────────────────┼─────────────────────────────────────┐
              │                                     │                                     │
              ▼                                     ▼                                     ▼
     ┌─────────────────┐              ┌────────────────────┐              ┌─────────────────────┐
     │ File Tools      │              │ Terminal Tool      │              │ Web Tools           │
     │ ─────────────── │              │ ────────────────── │              │ ─────────────────── │
     │ • read_file     │              │ • Local execution  │              │ • web_search        │
     │ • write_file    │              │ • Docker sandbox   │              │ • web_extract       │
     │ • patch         │              │ • Modal cloud      │              │ • web_crawl         │
     │ • search_files  │              │ • SSH remote       │              │                     │
     └────────┬────────┘              │ • Singularity      │              └─────────────────────┘
              │                       │ • Daytona          │                       │
              │                       └─────────┬──────────┘                       │
              │                                 │                                  │
              ▼                                 ▼                                  ▼
     ┌─────────────────────────────────────────────────────────────────────────────────────────┐
     │                              ENVIRONMENT BACKENDS                                       │
     │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐   │
     │  │  Local   │  │  Docker  │  │  Modal   │  │   SSH    │  │Singularity│ │ Daytona  │   │
     │  │──────────│  │──────────│  │──────────│  │──────────│  │───────────│  │──────────│   │
     │  │subprocess│  │container │  │Sandbox   │  │ControlMaster│ │overlay   │  │workspace │   │
     │  │   -l     │  │exec      │  │.exec()   │  │connection │  │SIF       │  │.exec()   │   │
     │  └──────────┘  └──────────┘  └──────────┘  └──────────┘  └───────────┘  └──────────┘   │
     └─────────────────────────────────────────────────────────────────────────────────────────┘
                                              │
                                              ▼
                              ┌─────────────────────────────┐
                              │    SECURITY CHECKPOINT      │
                              │  ┌─────────────────────┐    │
                              │  │ 1. Tirith Scanner   │    │
                              │  │    (command content)│    │
                              │  ├─────────────────────┤    │
                              │  │ 2. Pattern Matching │    │
                              │  │ (DANGEROUS_PATTERNS)│    │
                              │  ├─────────────────────┤    │
                              │  │ 3. Smart Approval   │    │
                              │  │    (aux LLM)        │    │
                              │  └─────────────────────┘    │
                              └─────────────────────────────┘
                                              │
              ┌─────────────────────────────────┼─────────────────────────────────┐
              │                                 │                                 │
              ▼                                 ▼                                 ▼
     ┌──────────────────┐           ┌──────────────────┐               ┌──────────────────┐
     │  APPROVED        │           │  BLOCKED         │               │  USER PROMPT     │
     │  (execute)       │           │  (deny + reason) │               │  (once/session/  │
     │                  │           │                  │               │   always/deny)   │
     └──────────────────┘           └──────────────────┘               └──────────────────┘

┌──────────────────────────────────────────────────────────────────────────────────────────────┐
│                              ADDITIONAL TOOL CATEGORIES                                      │
├──────────────────────────────────────────────────────────────────────────────────────────────┤
│  Browser Tools │ Vision Tools │ MoA Tools │ Skills Tools │ Code Exec │ Delegate │ TTS       │
│  ───────────── │ ──────────── │ ───────── │ ──────────── │ ───────── │ ──────── │ ──────────│
│  • navigate    │ • analyze    │ • reason  │ • list       │ • sandbox │ • spawn  │ • speech  │
│  • click       │ • extract    │ • debate  │ • view       │ • RPC     │ • batch  │ • voices  │
│  • snapshot    │              │           │ • manage     │ • 7 tools │ • depth  │           │
│  • scroll      │              │           │              │   limit   │   limit  │           │
└──────────────────────────────────────────────────────────────────────────────────────────────┘

2. Security Boundary Analysis

2.1 Multi-Layer Security Architecture

| Layer | Component | Purpose |
|---|---|---|
| Layer 1 | Container Isolation | Docker/Modal/Singularity sandboxes isolate from host |
| Layer 2 | Dangerous Pattern Detection | Regex-based command filtering (approval.py) |
| Layer 3 | Tirith Security Scanner | Content-level threat detection (pipe-to-shell, homograph URLs) |
| Layer 4 | Smart Approval (Aux LLM) | LLM-based risk assessment for edge cases |
| Layer 5 | File System Guards | Sensitive path blocking (/etc, ~/.ssh, ~/.hermes/.env) |
| Layer 6 | Process Limits | Timeouts, memory limits, PID limits, capability dropping |

2.2 Environment Security Comparison

| Backend | Isolation Level | Persistent | Root Access | Network | Use Case |
|---|---|---|---|---|---|
| Local | None (host) | Optional | User's own | Full | Development, trusted code |
| Docker | Container + caps | Optional | Container root | Isolated | General sandboxing |
| Modal | Cloud VM | Snapshots | Root | Isolated | Cloud compute, scalability |
| SSH | Remote machine | Yes | Remote user | Networked | Production servers |
| Singularity | Container + overlay | Optional | User-mapped | Configurable | HPC environments |
| Daytona | Cloud workspace | Yes | Root | Isolated | Managed dev environments |

2.3 Security Hardening Details

Docker Environment (tools/environments/docker.py:107-117):

_SECURITY_ARGS = [
    "--cap-drop", "ALL",          # Drop all capabilities
    "--cap-add", "DAC_OVERRIDE",   # Allow root to write host-owned dirs
    "--cap-add", "CHOWN",
    "--cap-add", "FOWNER",
    "--security-opt", "no-new-privileges",
    "--pids-limit", "256",
    "--tmpfs", "/tmp:rw,nosuid,size=512m",
]

Local Environment Secret Isolation (tools/environments/local.py:28-131):

  • Dynamic blocklist derived from provider registry
  • Blocks 60+ API key environment variables
  • Prevents credential leakage to subprocesses
  • Supports _HERMES_FORCE_ prefix overrides
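
The filtering described in this list could look roughly like the sketch below. The blocklist contents, the function name `build_subprocess_env`, and the exact meaning of the `_HERMES_FORCE_` prefix (assumed here to re-export a variable under its real name) are all assumptions; the report does not spell them out.

```python
# Hypothetical blocklist; the report says the real one is derived from the
# provider registry and covers 60+ variables.
BLOCKED_ENV_VARS = {"OPENAI_API_KEY", "ANTHROPIC_API_KEY"}
FORCE_PREFIX = "_HERMES_FORCE_"


def build_subprocess_env(parent_env: dict[str, str]) -> dict[str, str]:
    """Return a copy of parent_env that is safe to pass to a subprocess."""
    # First pass: copy everything that is neither blocked nor an override.
    child_env = {
        name: value
        for name, value in parent_env.items()
        if name not in BLOCKED_ENV_VARS and not name.startswith(FORCE_PREFIX)
    }
    # Second pass (assumed semantics): _HERMES_FORCE_<NAME> deliberately
    # passes <NAME> through, even if <NAME> is on the blocklist.
    for name, value in parent_env.items():
        if name.startswith(FORCE_PREFIX):
            child_env[name[len(FORCE_PREFIX):]] = value
    return child_env


filtered = build_subprocess_env({
    "PATH": "/usr/bin",
    "OPENAI_API_KEY": "sk-parent",
    "ANTHROPIC_API_KEY": "parent-value",
    "_HERMES_FORCE_ANTHROPIC_API_KEY": "forced-value",
})
```

The returned mapping would be passed as `env=` to `subprocess.run` or `subprocess.Popen`.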

3. All Dangerous Command Detection Patterns

3.1 Pattern Categories (from tools/approval.py:40-78)

DANGEROUS_PATTERNS = [
    # File System Destruction
    (r'\brm\s+(-[^\s]*\s+)*/', "delete in root path"),
    (r'\brm\s+-[^\s]*r', "recursive delete"),
    
    # Permission Escalation
    (r'\bchmod\s+(-[^\s]*\s+)*(777|666|o\+[rwx]*w|a\+[rwx]*w)\b', "world/other-writable permissions"),
    (r'\bchown\s+(-[^\s]*)?R\s+root', "recursive chown to root"),
    
    # Disk/Filesystem Operations
    (r'\bmkfs\b', "format filesystem"),
    (r'\bdd\s+.*if=', "disk copy"),
    (r'>\s*/dev/sd', "write to block device"),
    
    # Database Destruction
    (r'\bDROP\s+(TABLE|DATABASE)\b', "SQL DROP"),
    (r'\bDELETE\s+FROM\b(?!.*\bWHERE\b)', "SQL DELETE without WHERE"),
    (r'\bTRUNCATE\s+(TABLE)?\s*\w', "SQL TRUNCATE"),
    
    # System Configuration
    (r'>\s*/etc/', "overwrite system config"),
    (r'\bsystemctl\s+(stop|disable|mask)\b', "stop/disable system service"),
    
    # Process Termination
    (r'\bkill\s+-9\s+-1\b', "kill all processes"),
    (r'\bpkill\s+-9\b', "force kill processes"),
    (r'\b(pkill|killall)\b.*\b(hermes|gateway|cli\.py)\b', "kill hermes/gateway"),
    
    # Code Injection
    (r':\(\)\s*\{\s*:\s*\|\s*:\s*&\s*\}\s*;\s*:', "fork bomb"),
    (r'\b(bash|sh|zsh|ksh)\s+-[^\s]*c(\s+|$)', "shell command via -c flag"),
    (r'\b(curl|wget)\b.*\|\s*(ba)?sh\b', "pipe remote content to shell"),
    (r'\b(bash|sh|zsh|ksh)\s+<\s*<?\s*\(\s*(curl|wget)\b', "execute remote script via process substitution"),
    
    # Sensitive Path Writes
    (rf'\btee\b.*["\']?{_SENSITIVE_WRITE_TARGET}', "overwrite system file via tee"),
    (rf'>>?\s*["\']?{_SENSITIVE_WRITE_TARGET}', "overwrite system file via redirection"),
    
    # File Operations
    (r'\bxargs\s+.*\brm\b', "xargs with rm"),
    (r'\bfind\b.*-exec\s+(/\S*/)?rm\b', "find -exec rm"),
    (r'\bfind\b.*-delete\b', "find -delete"),
    (r'\b(cp|mv|install)\b.*\s/etc/', "copy/move file into /etc/"),
    (r'\bsed\s+-[^\s]*i.*\s/etc/', "in-place edit of system config"),
    
    # Gateway Protection
    (r'gateway\s+run\b.*(&\s*$|&\s*;|\bdisown\b|\bsetsid\b)', "start gateway outside systemd"),
    (r'\bnohup\b.*gateway\s+run\b', "start gateway outside systemd"),
]
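
To show how a list of this shape can be applied, here is a minimal sketch that scans a command against a few of the patterns above. The function name `detect_dangerous`, the first-match-wins policy, and the use of `re.IGNORECASE` are assumptions rather than the behavior of approval.py.

```python
import re
from typing import Optional

# A small subset of the patterns listed above, copied verbatim.
PATTERN_SUBSET = [
    (r'\brm\s+(-[^\s]*\s+)*/', "delete in root path"),
    (r'\brm\s+-[^\s]*r', "recursive delete"),
    (r'\bDROP\s+(TABLE|DATABASE)\b', "SQL DROP"),
    (r'\b(curl|wget)\b.*\|\s*(ba)?sh\b', "pipe remote content to shell"),
]


def detect_dangerous(command: str) -> Optional[str]:
    """Return the description of the first matching pattern, or None."""
    for pattern, description in PATTERN_SUBSET:
        # Case-insensitive matching is an assumption made for this sketch.
        if re.search(pattern, command, re.IGNORECASE):
            return description
    return None
```

Because the list is checked in order, a command that matches several patterns is reported under the earliest one; `rm -rf /tmp/build`, for example, is reported as a root-path delete rather than a recursive delete.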

3.2 Sensitive Path Patterns

# SSH keys
_SSH_SENSITIVE_PATH = r'(?:~|\$home|\$\{home\})/\.ssh(?:/|$)'

# Hermes environment
_HERMES_ENV_PATH = (
    r'(?:~\/\.hermes/|'
    r'(?:\$home|\$\{home\})/\.hermes/|'
    r'(?:\$hermes_home|\$\{hermes_home\})/)'
    r'\.env\b'
)

# System paths
_SENSITIVE_WRITE_TARGET = (
    r'(?:/etc/|/dev/sd|'
    rf'{_SSH_SENSITIVE_PATH}|'
    rf'{_HERMES_ENV_PATH})'
)
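
The composition of these fragments can be checked with a short sketch. The three pattern definitions are copied from above; the helper `writes_sensitive_path` and the decision to lowercase the command before matching are assumptions (the lowercase `$home` and `$hermes_home` spellings suggest some form of case normalization, but the report does not say so).

```python
import re

_SSH_SENSITIVE_PATH = r'(?:~|\$home|\$\{home\})/\.ssh(?:/|$)'
_HERMES_ENV_PATH = (
    r'(?:~\/\.hermes/|'
    r'(?:\$home|\$\{home\})/\.hermes/|'
    r'(?:\$hermes_home|\$\{hermes_home\})/)'
    r'\.env\b'
)
_SENSITIVE_WRITE_TARGET = (
    r'(?:/etc/|/dev/sd|'
    rf'{_SSH_SENSITIVE_PATH}|'
    rf'{_HERMES_ENV_PATH})'
)

# Redirection pattern from section 3.1, built by concatenation.
_REDIRECT = re.compile(r'>>?\s*["\']?' + _SENSITIVE_WRITE_TARGET)


def writes_sensitive_path(command: str) -> bool:
    """Hypothetical helper: True if the command redirects into a guarded path."""
    # Lowercasing is an assumption; see the note above.
    return _REDIRECT.search(command.lower()) is not None
```

With these definitions, `echo key >> ~/.ssh/authorized_keys` and `echo X=1 > $HOME/.hermes/.env` are flagged, while `echo hi > /tmp/out.txt` is not.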

3.3 Approval Flow States

Command Input
      │
      ▼
┌─────────────────────┐
│ Pattern Detection   │────┐
│ (approval.py)       │    │
└─────────────────────┘    │
      │                    │
      ▼                    │
┌─────────────────────┐    │
│ Tirith Scanner      │────┤
│ (tirith_security.py)│    │
└─────────────────────┘    │
      │                    │
      ▼                    │
┌─────────────────────┐    │
│ Mode = smart?       │────┼──▶ Smart Approval (aux LLM)
│                     │    │
└─────────────────────┘    │
      │                    │
      ▼                    │
┌─────────────────────┐    │
│ Gateway/CLI?        │────┼──▶ Async Approval Prompt
│                     │    │
└─────────────────────┘    │
      │                    │
      ▼                    │
┌─────────────────────┐    │
│ Interactive Prompt  │◀───┘
│ (once/session/     │
│  always/deny)      │
└─────────────────────┘
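
The four outcomes of the interactive prompt imply some memory of earlier decisions. The sketch below is one way that bookkeeping might work; the class `ApprovalMemory`, its method names, and the idea of keying approvals by pattern description are hypothetical and not taken from approval.py.

```python
from dataclasses import dataclass, field


@dataclass
class ApprovalMemory:
    """Hypothetical store for the prompt outcomes once/session/always/deny."""

    session_approved: set[str] = field(default_factory=set)
    always_approved: set[str] = field(default_factory=set)

    def is_preapproved(self, pattern_key: str) -> bool:
        """True if an earlier 'session' or 'always' answer covers this pattern."""
        return pattern_key in self.session_approved or pattern_key in self.always_approved

    def record(self, pattern_key: str, choice: str) -> bool:
        """Store the user's choice and return whether the command may run."""
        if choice == "deny":
            return False
        if choice == "session":
            self.session_approved.add(pattern_key)
        elif choice == "always":
            self.always_approved.add(pattern_key)
        elif choice != "once":
            raise ValueError(f"unknown choice: {choice!r}")
        return True
```

In this sketch "once" approves a single execution without storing anything, so the next matching command prompts again. Persisting `always_approved` across sessions is left out.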

4. Tool Improvement Recommendations

4.1 Critical Improvements

| # | Recommendation | Impact | Effort |
|---|---|---|---|
| 1 | Implement tool call result caching. Cache file reads and search results with a TTL to prevent redundant I/O. | High | Medium |
| 2 | Add tool execution metrics/observability. Track duration, success rates, and token usage per tool for optimization. | High | Low |
| 3 | Implement tool retry with exponential backoff. The terminal tool has basic retry (terminal_tool.py:1105-1130), but it could be generalized. | Medium | Low |
| 4 | Add tool call rate limiting per session. Prevent runaway loops (e.g., 1000+ search calls in one session). | Medium | Medium |
| 5 | Create tool health check system. Periodically validate that tools are functioning (API keys valid, services up). | Medium | Medium |
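
Recommendation 1 could be prototyped with a small time-to-live cache such as the one below. This is a sketch under stated assumptions: the class `TTLCache`, its `get_or_compute` method, and the injectable clock are hypothetical, and nothing like it is claimed to exist in the codebase.

```python
import time
from typing import Any, Callable, Hashable


class TTLCache:
    """Hypothetical in-memory cache for tool results with a time-to-live."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict[Hashable, tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        """Return the cached value for key, recomputing it once it has expired."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = compute()
        self._entries[key] = (now, value)
        return value
```

A key such as `("read_file", path)` would identify a call. A real implementation would also need to invalidate entries when `write_file` or `patch` touches the same path, which this sketch does not attempt.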

4.2 Security Enhancements

| # | Recommendation | Impact | Effort |
|---|---|---|---|
| 6 | Implement command intent classification. Use a lightweight model to classify commands before execution for better risk assessment. | High | Medium |
| 7 | Add network egress filtering for sandbox tools. Whitelist domains for web_extract and block known malicious IPs. | High | Medium |
| 8 | Implement tool call provenance logging. Keep an immutable log of which tools were called with which arguments, for audit. | Medium | Low |
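
For recommendation 8, one common way to approximate an immutable log in application code is a hash chain, where each record commits to the one before it. The sketch below illustrates that technique only; the class `ProvenanceLog` and its record layout are hypothetical.

```python
import hashlib
import json

_GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(tool: str, args: dict, prev_hash: str) -> str:
    payload = json.dumps({"tool": tool, "args": args, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ProvenanceLog:
    """Hypothetical append-only log in which each record hashes its predecessor."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def append(self, tool: str, args: dict) -> dict:
        prev_hash = self.records[-1]["hash"] if self.records else _GENESIS
        record = {"tool": tool, "args": args, "prev": prev_hash,
                  "hash": _digest(tool, args, prev_hash)}
        self.records.append(record)
        return record

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered record breaks it."""
        prev_hash = _GENESIS
        for record in self.records:
            expected = _digest(record["tool"], record["args"], prev_hash)
            if record["prev"] != prev_hash or record["hash"] != expected:
                return False
            prev_hash = record["hash"]
        return True
```

This makes tampering detectable rather than impossible; true immutability would need storage that the agent process cannot rewrite.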

4.3 Usability Improvements

| # | Recommendation | Impact | Effort |
|---|---|---|---|
| 9 | Add tool suggestion system. When the LLM uses a suboptimal pattern (cat vs read_file), suggest a better alternative. | Medium | Medium |
| 10 | Implement progressive tool disclosure. Start with a minimal toolset and expand based on task complexity indicators. | Medium | High |
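
Recommendation 9 might start as a simple lookup on the first word of a terminal command. In the sketch below only the cat/read_file pair comes from the report; the other mappings and the function name `suggest_tool` are assumptions for illustration.

```python
import shlex
from typing import Optional

# Hypothetical mapping from a shell program to a structured tool.
# Only cat -> read_file is named in the report; the rest are assumptions.
SUGGESTIONS = {"cat": "read_file", "grep": "search_files", "rg": "search_files"}


def suggest_tool(command: str) -> Optional[str]:
    """Return the name of a structured tool that could replace the command."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes: do not guess.
        return None
    if not tokens:
        return None
    return SUGGESTIONS.get(tokens[0])
```

For example, `suggest_tool("cat README.md")` returns `"read_file"`, while pipelines and compound commands would need more careful parsing than this sketch provides.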

5. Missing Tool Coverage Gaps

5.1 High-Priority Gaps

| Gap | Use Case | Current Workaround |
|---|---|---|
| Database query tool | SQL database exploration | terminal with sqlite3/psql |
| API testing tool | REST API debugging (curl alternative) | terminal with curl |
| Git operations tool | Structured git commands (status, diff, log) | terminal with git |
| Package manager tool | Structured pip/npm/apt operations | terminal with package managers |
| Archive/zip tool | Create/extract archives | terminal with tar/unzip |

5.2 Medium-Priority Gaps

| Gap | Use Case | Current Workaround |
|---|---|---|
| Diff tool | Structured file comparison | search_files + manual compare |
| JSON/YAML manipulation | Structured config editing | read_file + write_file |
| Image manipulation | Resize, crop, convert images | terminal with ImageMagick |
| PDF operations | Extract text, merge, split | terminal with pdftotext |
| Data visualization | Generate charts from data | code_execution with matplotlib |

5.3 Advanced Gaps

| Gap | Description |
|---|---|
| Vector database tool | Semantic search over embeddings |
| Test runner tool | Structured test execution with parsing |
| Linter/formatter tool | Code quality checks with structured output |
| Dependency analysis tool | Visualize and analyze code dependencies |
| Documentation generator tool | Auto-generate docs from code |

6. Tool Registry Architecture

6.1 Registration Flow

# From tools/registry.py (signatures only; method bodies are elided)
from typing import Callable, List, Set

class ToolRegistry:
    def register(self, name: str, toolset: str, schema: dict,
                 handler: Callable, check_fn: Callable = None,
                 # ... further parameters elided in this excerpt
                 ): ...

    def dispatch(self, name: str, args: dict, **kwargs) -> str: ...

    def get_definitions(self, tool_names: Set[str], quiet: bool = False) -> List[dict]: ...

6.2 Tool Entry Structure

class ToolEntry:
    __slots__ = (
        "name",        # Tool identifier
        "toolset",     # Category (file, terminal, web, etc.)
        "schema",      # OpenAI-format JSON schema
        "handler",     # Callable implementation
        "check_fn",    # Availability check (returns bool)
        "requires_env",# Required env var names
        "is_async",    # Whether handler is async
        "description", # Human-readable description
        "emoji",       # Visual identifier
    )

6.3 Registration Example (file_tools.py:560-563)

registry.register(
    name="read_file",
    toolset="file",
    schema=READ_FILE_SCHEMA,
    handler=_handle_read_file,
    check_fn=_check_file_reqs,
    emoji="📖"
)
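
The register/dispatch cycle can be illustrated with a stripped-down registry. This is a simplified stand-in rather than the real `ToolRegistry`: the class `MiniRegistry`, the demo `echo` tool, and the convention of returning errors as JSON strings are all assumptions.

```python
import json
from typing import Callable, Dict, Optional


class MiniRegistry:
    """Simplified stand-in for ToolRegistry; not the real implementation."""

    def __init__(self) -> None:
        self._tools: Dict[str, dict] = {}

    def register(self, name: str, toolset: str, schema: dict, handler: Callable,
                 check_fn: Optional[Callable[[], bool]] = None) -> None:
        self._tools[name] = {"toolset": toolset, "schema": schema,
                             "handler": handler, "check_fn": check_fn}

    def dispatch(self, name: str, args: dict, **kwargs) -> str:
        entry = self._tools.get(name)
        if entry is None:
            return json.dumps({"error": f"unknown tool: {name}"})
        # check_fn gates availability, e.g. a missing API key or binary.
        if entry["check_fn"] is not None and not entry["check_fn"]():
            return json.dumps({"error": f"tool unavailable: {name}"})
        return entry["handler"](args, **kwargs)


registry = MiniRegistry()
registry.register(
    name="echo",
    toolset="demo",
    schema={"name": "echo", "parameters": {"type": "object"}},
    handler=lambda args, **kwargs: json.dumps({"echo": args.get("text", "")}),
)
result = registry.dispatch("echo", {"text": "hi"})
```

Keeping `dispatch` string-in, string-out matches the `-> str` return type shown in section 6.1 and lets the caller forward results to the model without further serialization.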

7. Toolset Composition System

7.1 Toolset Definition (toolsets.py:72-377)

TOOLSETS = {
    "file": {
        "description": "File manipulation tools",
        "tools": ["read_file", "write_file", "patch", "search_files"],
        "includes": []
    },
    "debugging": {
        "description": "Debugging and troubleshooting toolkit",
        "tools": ["terminal", "process"],
        "includes": ["web", "file"]  # Composes other toolsets
    },
}

7.2 Resolution Algorithm

def resolve_toolset(name: str, visited: Set[str] = None) -> List[str]:
    # 1. Cycle detection
    # 2. Get toolset definition
    # 3. Collect direct tools
    # 4. Recursively resolve includes (diamond deps handled)
    # 5. Return deduplicated list
    ...
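
The five steps above can be sketched as follows. The body is a reconstruction from the step list, not the code in toolsets.py, and the contents of the `web` toolset are assumed from the tool names shown in the flow diagram.

```python
from typing import Dict, List, Optional, Set

TOOLSETS: Dict[str, dict] = {
    "file": {"tools": ["read_file", "write_file", "patch", "search_files"], "includes": []},
    # Assumed contents, taken from the Web Tools box in the flow diagram.
    "web": {"tools": ["web_search", "web_extract", "web_crawl"], "includes": []},
    "debugging": {"tools": ["terminal", "process"], "includes": ["web", "file"]},
}


def resolve_toolset(name: str, visited: Optional[Set[str]] = None) -> List[str]:
    visited = set() if visited is None else visited
    # 1. Cycle detection: a toolset already expanded contributes nothing more.
    #    The same check covers diamond dependencies.
    if name in visited:
        return []
    visited.add(name)
    # 2. Get toolset definition (unknown names resolve to an empty list here).
    definition = TOOLSETS.get(name)
    if definition is None:
        return []
    # 3. Collect direct tools.
    tools = list(definition.get("tools", []))
    # 4. Recursively resolve includes, sharing the visited set.
    for included in definition.get("includes", []):
        tools.extend(resolve_toolset(included, visited))
    # 5. Return a deduplicated list, keeping first-seen order.
    return list(dict.fromkeys(tools))
```

Sharing one `visited` set across the recursion is what makes both cycles and diamond dependencies terminate with each toolset expanded once.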

7.3 Platform-Specific Toolsets

| Toolset | Purpose | Key Difference |
|---|---|---|
| hermes-cli | Full CLI access | All tools available |
| hermes-acp | Editor integration | No messaging, audio, or clarify UI |
| hermes-api-server | HTTP API | No interactive UI tools |
| hermes-telegram | Telegram bot | Full access with safety checks |
| hermes-gateway | Union of all messaging | Includes all platform tools |

8. Environment Backend Deep Dive

8.1 Base Class Interface (tools/environments/base.py)

from abc import ABC

class BaseEnvironment(ABC):
    def execute(self, command: str, cwd: str = "", *,
                timeout: int | None = None,
                stdin_data: str | None = None) -> dict:
        """Return {"output": str, "returncode": int}"""

    def cleanup(self):
        """Release backend resources"""

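To make the contract concrete, here is a minimal backend that satisfies the interface above. It is a sketch, not one of the six real backends: the class `ShEnvironment` is hypothetical, it assumes a POSIX system with `/bin/sh`, marking `execute` as abstract is an assumption, and returning 124 on timeout simply mirrors the exit code shown in section 9.2.

```python
import subprocess
from abc import ABC, abstractmethod
from typing import Optional


class BaseEnvironment(ABC):
    @abstractmethod  # assumption: the report's excerpt does not show decorators
    def execute(self, command: str, cwd: str = "", *,
                timeout: Optional[int] = None,
                stdin_data: Optional[str] = None) -> dict:
        """Return {"output": str, "returncode": int}"""

    def cleanup(self) -> None:
        """Release backend resources"""


class ShEnvironment(BaseEnvironment):
    """Hypothetical minimal backend that runs commands through /bin/sh."""

    def execute(self, command, cwd="", *, timeout=None, stdin_data=None):
        try:
            proc = subprocess.run(
                ["/bin/sh", "-c", command],  # argv list, so no shell=True
                cwd=cwd or None,
                input=stdin_data,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,    # merge stderr into the one output string
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return {"output": "command timed out", "returncode": 124}
        return {"output": proc.stdout, "returncode": proc.returncode}


result = ShEnvironment().execute("echo hi")
```

Callers only ever see the `{"output", "returncode"}` dictionary, which is what lets the terminal tool treat local, container, and remote backends interchangeably.
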
8.2 Environment Feature Matrix

Feature Local Docker Modal SSH Singularity Daytona
PTY support
Persistent shell
Filesystem persistence Optional Optional Snapshots N/A (remote) Optional Yes
Interrupt handling
Sudo support
Resource limits
GPU support Remote

9. Process Registry System

9.1 Background Process Management (tools/process_registry.py)

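class ProcessRegistry:
    # Signatures only; bodies and some parameters are elided in this excerpt.
    def spawn_local(self, command, cwd, task_id,
                    # ... further parameters elided
                    ) -> "ProcessSession": ...
    def spawn_via_env(self, env, command,
                      # ... further parameters elided
                      ) -> "ProcessSession": ...
    def poll(self, session_id: str) -> dict: ...
    def wait(self, session_id: str, timeout: int = None) -> dict: ...
    def kill(self, session_id: str): ...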

9.2 Process Session States

CREATED ──▶ RUNNING ──▶ FINISHED
               │            │
               ▼            ▼
          INTERRUPTED   TIMEOUT
          (exit_code=130) (exit_code=124)
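
The states and exit codes in this diagram could be modeled as an enum plus a small classifier. The names `SessionState` and `state_from_exit_code` are hypothetical, and treating a missing exit code as "still running" is an assumption.

```python
from enum import Enum
from typing import Optional


class SessionState(Enum):
    CREATED = "created"
    RUNNING = "running"
    FINISHED = "finished"
    INTERRUPTED = "interrupted"
    TIMEOUT = "timeout"


def state_from_exit_code(exit_code: Optional[int]) -> SessionState:
    """Map an exit code to a state, using the codes shown in the diagram."""
    if exit_code is None:
        # Assumption: no exit code yet means the process has not ended.
        return SessionState.RUNNING
    if exit_code == 130:  # 128 + SIGINT
        return SessionState.INTERRUPTED
    if exit_code == 124:  # conventional timeout status, as used by timeout(1)
        return SessionState.TIMEOUT
    return SessionState.FINISHED
```

Note that a non-zero exit code other than 130 or 124 still maps to `FINISHED` here; whether the real registry distinguishes failure from success at the state level is not stated in the report.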

10. Code Analysis Summary

10.1 Lines of Code by Component

| Component | Files | Approx. LOC |
|---|---|---|
| Tool Implementations | 30+ | ~15,000 |
| Environment Backends | 6 | ~3,500 |
| Registry & Core | 2 | ~800 |
| Security (approval, tirith) | 2 | ~1,200 |
| Process Management | 1 | ~900 |
| Total | 40+ | ~21,400 |

10.2 Test Coverage

  • 150+ test files in tests/tools/
  • Unit tests for each tool
  • Integration tests for environments
  • Security-focused tests for approval system

Appendix A: File Organization

tools/
├── registry.py              # Tool registration & dispatch
├── __init__.py              # Package exports
│
├── file_tools.py            # read_file, write_file, patch, search_files
├── file_operations.py       # ShellFileOperations backend
│
├── terminal_tool.py         # Main terminal execution (1,358 lines)
├── process_registry.py      # Background process management
│
├── web_tools.py             # web_search, web_extract, web_crawl (1,843 lines)
├── browser_tool.py          # Browser automation (1,955 lines)
├── browser_providers/       # Browserbase, BrowserUse providers
│
├── approval.py              # Dangerous command detection (670 lines)
├── tirith_security.py       # External security scanner (670 lines)
│
├── environments/            # Execution backends
│   ├── base.py              # BaseEnvironment ABC
│   ├── local.py             # Local subprocess (486 lines)
│   ├── docker.py            # Docker containers (535 lines)
│   ├── modal.py             # Modal cloud (372 lines)
│   ├── ssh.py               # SSH remote (307 lines)
│   ├── singularity.py       # Singularity/Apptainer
│   ├── daytona.py           # Daytona workspaces
│   └── persistent_shell.py  # Shared persistent shell mixin
│
├── code_execution_tool.py   # Programmatic tool calling (806 lines)
├── delegate_tool.py         # Subagent spawning (794 lines)
│
├── skills_tool.py           # Skill management (1,344 lines)
├── skill_manager_tool.py    # Skill CRUD operations
│
└── [20+ additional tools...]

toolsets.py                  # Toolset definitions (641 lines)

Report generated from comprehensive analysis of the Hermes agent tool system.