tools_analysis_report.md

# Deep Analysis: Hermes Tool System

## Executive Summary

This report provides a comprehensive analysis of the Hermes agent tool infrastructure, covering:
- Tool registration and dispatch (registry.py)
- 30+ tool implementations across multiple categories
- 6 environment backends (local, Docker, Modal, SSH, Singularity, Daytona)
- Security boundaries and dangerous command detection
- Toolset definitions and composition system

---

## 1. Tool Execution Flow Diagram

```
┌─────────────────────────────────────────────────────────────────────────────────┐
│                              TOOL EXECUTION FLOW                                 │
└─────────────────────────────────────────────────────────────────────────────────┘

┌─────────────┐    ┌──────────────────┐    ┌──────────────────┐
│   User/LLM  │───▶│  Model Tools     │───▶│ Tool Registry    │
│  Request    │    │  (model_tools.py)│    │ (registry.py)    │
└─────────────┘    └──────────────────┘    └──────────────────┘
                                                    │
              ┌─────────────────────────────────────┼─────────────────────────────────────┐
              │                                     │                                     │
              ▼                                     ▼                                     ▼
     ┌─────────────────┐              ┌────────────────────┐              ┌─────────────────────┐
     │ File Tools      │              │ Terminal Tool      │              │ Web Tools           │
     │ ─────────────── │              │ ────────────────── │              │ ─────────────────── │
     │ • read_file     │              │ • Local execution  │              │ • web_search        │
     │ • write_file    │              │ • Docker sandbox   │              │ • web_extract       │
     │ • patch         │              │ • Modal cloud      │              │ • web_crawl         │
     │ • search_files  │              │ • SSH remote       │              │                     │
     └────────┬────────┘              │ • Singularity      │              └─────────────────────┘
              │                       │ • Daytona          │                       │
              │                       └─────────┬──────────┘                       │
              │                                 │                                  │
              ▼                                 ▼                                  ▼
     ┌─────────────────────────────────────────────────────────────────────────────────────────────────┐
     │                                      ENVIRONMENT BACKENDS                                       │
     │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
     │ │    Local    │ │   Docker    │ │    Modal    │ │     SSH     │ │ Singularity │ │   Daytona   │ │
     │ │─────────────│ │─────────────│ │─────────────│ │─────────────│ │─────────────│ │─────────────│ │
     │ │ subprocess  │ │  container  │ │   Sandbox   │ │ControlMaster│ │   overlay   │ │  workspace  │ │
     │ │     -l      │ │    exec     │ │   .exec()   │ │ connection  │ │     SIF     │ │   .exec()   │ │
     │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
     └─────────────────────────────────────────────────────────────────────────────────────────────────┘
                                              │
                                              ▼
                              ┌─────────────────────────────┐
                              │    SECURITY CHECKPOINT      │
                              │  ┌─────────────────────┐    │
                              │  │ 1. Tirith Scanner   │    │
                              │  │    (command content)│    │
                              │  ├─────────────────────┤    │
                              │  │ 2. Pattern Matching │    │
                              │  │ (DANGEROUS_PATTERNS)│    │
                              │  ├─────────────────────┤    │
                              │  │ 3. Smart Approval   │    │
                              │  │    (aux LLM)        │    │
                              │  └─────────────────────┘    │
                              └─────────────────────────────┘
                                              │
              ┌─────────────────────────────────┼─────────────────────────────────┐
              │                                 │                                 │
              ▼                                 ▼                                 ▼
     ┌──────────────────┐           ┌──────────────────┐               ┌──────────────────────────────┐
     │  APPROVED        │           │  BLOCKED         │               │  USER PROMPT                 │
     │  (execute)       │           │  (deny + reason) │               │  (once/session/always/deny)  │
     └──────────────────┘           └──────────────────┘               └──────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────────────────────────────┐
│                              ADDITIONAL TOOL CATEGORIES                                      │
├──────────────────────────────────────────────────────────────────────────────────────────────┤
│  Browser Tools │ Vision Tools │ MoA Tools │ Skills Tools │ Code Exec │ Delegate │ TTS       │
│  ───────────── │ ──────────── │ ───────── │ ──────────── │ ───────── │ ──────── │ ──────────│
│  • navigate    │ • analyze    │ • reason  │ • list       │ • sandbox │ • spawn  │ • speech  │
│  • click       │ • extract    │ • debate  │ • view       │ • RPC     │ • batch  │ • voices  │
│  • snapshot    │              │           │ • manage     │ • 7 tools │ • depth  │           │
│  • scroll      │              │           │              │   limit   │   limit  │           │
└──────────────────────────────────────────────────────────────────────────────────────────────┘
```

---

## 2. Security Boundary Analysis

### 2.1 Multi-Layer Security Architecture

| Layer | Component | Purpose |
|-------|-----------|---------|
| **Layer 1** | Container Isolation | Docker/Modal/Singularity sandboxes isolate from host |
| **Layer 2** | Dangerous Pattern Detection | Regex-based command filtering (approval.py) |
| **Layer 3** | Tirith Security Scanner | Content-level threat detection (pipe-to-shell, homograph URLs) |
| **Layer 4** | Smart Approval (Aux LLM) | LLM-based risk assessment for edge cases |
| **Layer 5** | File System Guards | Sensitive path blocking (/etc, ~/.ssh, ~/.hermes/.env) |
| **Layer 6** | Process Limits | Timeouts, memory limits, PID limits, capability dropping |

### 2.2 Environment Security Comparison

| Backend | Isolation Level | Persistent | Root Access | Network | Use Case |
|---------|-----------------|------------|-------------|---------|----------|
| **Local** | None (host) | Optional | User's own | Full | Development, trusted code |
| **Docker** | Container + caps | Optional | Container root | Isolated | General sandboxing |
| **Modal** | Cloud VM | Snapshots | Root | Isolated | Cloud compute, scalability |
| **SSH** | Remote machine | Yes | Remote user | Networked | Production servers |
| **Singularity** | Container + overlay | Optional | User-mapped | Configurable | HPC environments |
| **Daytona** | Cloud workspace | Yes | Root | Isolated | Managed dev environments |

### 2.3 Security Hardening Details

**Docker Environment (tools/environments/docker.py:107-117):**
```python
_SECURITY_ARGS = [
    "--cap-drop", "ALL",          # Drop all capabilities
    "--cap-add", "DAC_OVERRIDE",   # Allow root to write host-owned dirs
    "--cap-add", "CHOWN",
    "--cap-add", "FOWNER",
    "--security-opt", "no-new-privileges",
    "--pids-limit", "256",
    "--tmpfs", "/tmp:rw,nosuid,size=512m",
]
```
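
As a rough illustration of how a flag list like this might be consumed, the sketch below splices it into a `docker run` argument vector. The helper name `build_docker_run_cmd`, the image, and the use of `--rm` with `sh -c` are assumptions made for the example; they are not taken from docker.py.

```python
import shlex
from typing import List

# Flags copied from the excerpt above.
_SECURITY_ARGS = [
    "--cap-drop", "ALL",
    "--cap-add", "DAC_OVERRIDE",
    "--cap-add", "CHOWN",
    "--cap-add", "FOWNER",
    "--security-opt", "no-new-privileges",
    "--pids-limit", "256",
    "--tmpfs", "/tmp:rw,nosuid,size=512m",
]


def build_docker_run_cmd(image: str, command: str) -> List[str]:
    """Hypothetical helper: build an argv list for a hardened `docker run`."""
    # Options must precede the image name; everything after it is the command.
    return ["docker", "run", "--rm", *_SECURITY_ARGS, image, "sh", "-c", command]


argv = build_docker_run_cmd("debian:stable-slim", "id")
print(shlex.join(argv))  # shell-quoted form, handy for logging
```

Passing the command as a list rather than a single shell string avoids a second layer of quoting on the host side.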

**Local Environment Secret Isolation (tools/environments/local.py:28-131):**
- Dynamic blocklist derived from provider registry
- Blocks 60+ API key environment variables
- Prevents credential leakage to subprocesses
- Support for `_HERMES_FORCE_` prefix overrides
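
The filtering idea can be sketched as follows. This is a minimal illustration, not the code in local.py: the two-entry blocklist is a stand-in for the registry-derived one, and the override semantics (`_HERMES_FORCE_FOO=bar` is passed to the child as `FOO=bar`) are an assumption about how the prefix behaves.

```python
from typing import Dict

# Stand-in blocklist; the report says the real one is derived dynamically.
BLOCKED_ENV_VARS = {"OPENAI_API_KEY", "ANTHROPIC_API_KEY"}
FORCE_PREFIX = "_HERMES_FORCE_"


def build_subprocess_env(parent_env: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of parent_env that is safe to hand to a subprocess."""
    env = {
        key: value
        for key, value in parent_env.items()
        if key not in BLOCKED_ENV_VARS and not key.startswith(FORCE_PREFIX)
    }
    # Assumed semantics: a prefixed variable is an explicit opt-in that
    # re-exposes the unprefixed name, even if that name is blocked.
    for key, value in parent_env.items():
        if key.startswith(FORCE_PREFIX):
            env[key[len(FORCE_PREFIX):]] = value
    return env


if __name__ == "__main__":
    import os
    print(sorted(build_subprocess_env(dict(os.environ)))[:5])
```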

---

## 3. All Dangerous Command Detection Patterns

### 3.1 Pattern Categories (from tools/approval.py:40-78)

```python
DANGEROUS_PATTERNS = [
    # File System Destruction
    (r'\brm\s+(-[^\s]*\s+)*/', "delete in root path"),
    (r'\brm\s+-[^\s]*r', "recursive delete"),
    
    # Permission Escalation
    (r'\bchmod\s+(-[^\s]*\s+)*(777|666|o\+[rwx]*w|a\+[rwx]*w)\b', "world/other-writable permissions"),
    (r'\bchown\s+(-[^\s]*)?R\s+root', "recursive chown to root"),
    
    # Disk/Filesystem Operations
    (r'\bmkfs\b', "format filesystem"),
    (r'\bdd\s+.*if=', "disk copy"),
    (r'>\s*/dev/sd', "write to block device"),
    
    # Database Destruction
    (r'\bDROP\s+(TABLE|DATABASE)\b', "SQL DROP"),
    (r'\bDELETE\s+FROM\b(?!.*\bWHERE\b)', "SQL DELETE without WHERE"),
    (r'\bTRUNCATE\s+(TABLE)?\s*\w', "SQL TRUNCATE"),
    
    # System Configuration
    (r'>\s*/etc/', "overwrite system config"),
    (r'\bsystemctl\s+(stop|disable|mask)\b', "stop/disable system service"),
    
    # Process Termination
    (r'\bkill\s+-9\s+-1\b', "kill all processes"),
    (r'\bpkill\s+-9\b', "force kill processes"),
    (r'\b(pkill|killall)\b.*\b(hermes|gateway|cli\.py)\b', "kill hermes/gateway"),
    
    # Code Injection
    (r':\(\)\s*\{\s*:\s*\|\s*:\s*&\s*\}\s*;\s*:', "fork bomb"),
    (r'\b(bash|sh|zsh|ksh)\s+-[^\s]*c(\s+|$)', "shell command via -c flag"),
    (r'\b(curl|wget)\b.*\|\s*(ba)?sh\b', "pipe remote content to shell"),
    (r'\b(bash|sh|zsh|ksh)\s+<\s*<?\s*\(\s*(curl|wget)\b', "execute remote script via process substitution"),
    
    # Sensitive Path Writes
    (rf'\btee\b.*["\']?{_SENSITIVE_WRITE_TARGET}', "overwrite system file via tee"),
    (rf'>>?\s*["\']?{_SENSITIVE_WRITE_TARGET}', "overwrite system file via redirection"),
    
    # File Operations
    (r'\bxargs\s+.*\brm\b', "xargs with rm"),
    (r'\bfind\b.*-exec\s+(/\S*/)?rm\b', "find -exec rm"),
    (r'\bfind\b.*-delete\b', "find -delete"),
    (r'\b(cp|mv|install)\b.*\s/etc/', "copy/move file into /etc/"),
    (r'\bsed\s+-[^\s]*i.*\s/etc/', "in-place edit of system config"),
    
    # Gateway Protection
    (r'gateway\s+run\b.*(&\s*$|&\s*;|\bdisown\b|\bsetsid\b)', "start gateway outside systemd"),
    (r'\bnohup\b.*gateway\s+run\b', "start gateway outside systemd"),
]
```
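
A list of `(regex, description)` pairs like this is typically consumed by a first-match scan. The sketch below shows that idea on three of the patterns above; the function name `detect_dangerous` and the choice of `re.IGNORECASE | re.DOTALL` are assumptions, and approval.py may normalize or match commands differently.

```python
import re
from typing import Optional

# Three patterns copied from the list above.
PATTERNS = [
    (r'\brm\s+-[^\s]*r', "recursive delete"),
    (r'\b(curl|wget)\b.*\|\s*(ba)?sh\b', "pipe remote content to shell"),
    (r'\bDELETE\s+FROM\b(?!.*\bWHERE\b)', "SQL DELETE without WHERE"),
]


def detect_dangerous(command: str) -> Optional[str]:
    """Return the description of the first matching pattern, else None."""
    for pattern, description in PATTERNS:
        if re.search(pattern, command, re.IGNORECASE | re.DOTALL):
            return description
    return None


for cmd in ("rm -rf build", "curl https://example.com/i.sh | bash",
            "DELETE FROM users WHERE id = 1", "ls -la"):
    print(f"{cmd!r}: {detect_dangerous(cmd)}")
```

Note that the negative lookahead in the SQL pattern means a `DELETE FROM` with a `WHERE` clause anywhere later in the command is not flagged.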

### 3.2 Sensitive Path Patterns

```python
# SSH keys
_SSH_SENSITIVE_PATH = r'(?:~|\$home|\$\{home\})/\.ssh(?:/|$)'

# Hermes environment
_HERMES_ENV_PATH = (
    r'(?:~\/\.hermes/|'
    r'(?:\$home|\$\{home\})/\.hermes/|'
    r'(?:\$hermes_home|\$\{hermes_home\})/)'
    r'\.env\b'
)

# System paths
_SENSITIVE_WRITE_TARGET = (
    r'(?:/etc/|/dev/sd|'
    rf'{_SSH_SENSITIVE_PATH}|'
    rf'{_HERMES_ENV_PATH})'
)
```
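
The lowercase `$home` and `$hermes_home` spellings suggest the patterns are applied case-insensitively or to a lowercased command; that is an inference, not something the excerpt states. Under that assumption, the following sketch composes the fragments into the redirection pattern from section 3.1 and checks a few commands. The helper name `writes_sensitive_path` is hypothetical.

```python
import re

_SSH_SENSITIVE_PATH = r'(?:~|\$home|\$\{home\})/\.ssh(?:/|$)'
_HERMES_ENV_PATH = (
    r'(?:~\/\.hermes/|'
    r'(?:\$home|\$\{home\})/\.hermes/|'
    r'(?:\$hermes_home|\$\{hermes_home\})/)'
    r'\.env\b'
)
_SENSITIVE_WRITE_TARGET = (
    r'(?:/etc/|/dev/sd|'
    rf'{_SSH_SENSITIVE_PATH}|'
    rf'{_HERMES_ENV_PATH})'
)

# Redirection pattern from section 3.1; IGNORECASE is an assumption.
_REDIRECT = re.compile(rf'>>?\s*["\']?{_SENSITIVE_WRITE_TARGET}', re.IGNORECASE)


def writes_sensitive_path(command: str) -> bool:
    """True if the command redirects output into a sensitive location."""
    return _REDIRECT.search(command) is not None


for cmd in ("echo key >> ~/.ssh/authorized_keys",
            "echo X=1 > $HOME/.hermes/.env",
            "echo hi > /tmp/out.txt"):
    print(f"{cmd!r}: {writes_sensitive_path(cmd)}")
```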

### 3.3 Approval Flow States

```
Command Input
      │
      ▼
┌─────────────────────┐
│ Pattern Detection   │────┐
│ (approval.py)       │    │
└─────────────────────┘    │
      │                    │
      ▼                    │
┌─────────────────────┐    │
│ Tirith Scanner      │────┤
│ (tirith_security.py)│    │
└─────────────────────┘    │
      │                    │
      ▼                    │
┌─────────────────────┐    │
│ Mode = smart?       │────┼──▶ Smart Approval (aux LLM)
│                     │    │
└─────────────────────┘    │
      │                    │
      ▼                    │
┌─────────────────────┐    │
│ Gateway/CLI?        │────┼──▶ Async Approval Prompt
│                     │    │
└─────────────────────┘    │
      │                    │
      ▼                    │
┌─────────────────────┐    │
│ Interactive Prompt  │◀───┘
│ (once/session/      │
│  always/deny)       │
└─────────────────────┘
```
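
One way to read the flow above is as a small decision function. The sketch below is an interpretation under stated assumptions: unflagged commands are approved outright, the smart reviewer returns `"approve"`, `"deny"`, or anything else to escalate, and gateway sessions receive the asynchronous prompt. All names (`Decision`, `decide`, the callback signatures) are hypothetical.

```python
from enum import Enum
from typing import Callable, Optional


class Decision(Enum):
    APPROVED = "approved"
    BLOCKED = "blocked"
    ASYNC_PROMPT = "async_prompt"
    INTERACTIVE_PROMPT = "interactive_prompt"


# A check returns a short finding description, or None if nothing was flagged.
Check = Callable[[str], Optional[str]]


def decide(command: str, pattern_check: Check, tirith_check: Check, *,
           mode: str = "manual", is_gateway: bool = False,
           smart_review: Optional[Callable[[str, str], str]] = None) -> Decision:
    # Steps 1 and 2: pattern detection, then the content scanner.
    finding = pattern_check(command) or tirith_check(command)
    if finding is None:
        return Decision.APPROVED
    # Step 3: in smart mode an auxiliary reviewer may settle the case.
    if mode == "smart" and smart_review is not None:
        verdict = smart_review(command, finding)
        if verdict == "approve":
            return Decision.APPROVED
        if verdict == "deny":
            return Decision.BLOCKED
        # Any other verdict falls through to a human prompt.
    # Step 4: choose the prompt style for the current front end.
    return Decision.ASYNC_PROMPT if is_gateway else Decision.INTERACTIVE_PROMPT
```

Keeping the checks as injected callables makes each branch testable without a real scanner or model.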

---

## 4. Tool Improvement Recommendations

### 4.1 Critical Improvements

| # | Recommendation | Impact | Effort |
|---|----------------|--------|--------|
| 1 | **Implement tool call result caching** | High | Medium |
|   | Cache file reads, search results with TTL to prevent redundant I/O | | |
| 2 | **Add tool execution metrics/observability** | High | Low |
|   | Track duration, success rates, token usage per tool for optimization | | |
| 3 | **Implement tool retry with exponential backoff** | Medium | Low |
|   | Terminal tool has basic retry (terminal_tool.py:1105-1130) but could be generalized | | |
| 4 | **Add tool call rate limiting per session** | Medium | Medium |
|   | Prevent runaway loops (e.g., 1000+ search calls in one session) | | |
| 5 | **Create tool health check system** | Medium | Medium |
|   | Periodic validation that tools are functioning (API keys valid, services up) | | |
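
To make recommendation 3 concrete, a generalized retry helper could look roughly like the sketch below. It is not derived from the existing retry logic in terminal_tool.py; the function name, parameters, and defaults are all hypothetical.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(fn: Callable[[], T], *, attempts: int = 3,
                       base_delay: float = 0.5, max_delay: float = 8.0,
                       retry_on: Tuple[Type[BaseException], ...] = (Exception,),
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn until it succeeds, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delays grow as base_delay * 2**attempt, capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the helper testable without real waiting; production callers would usually also add jitter so that parallel tool calls do not retry in lockstep.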

### 4.2 Security Enhancements

| # | Recommendation | Impact | Effort |
|---|----------------|--------|--------|
| 6 | **Implement command intent classification** | High | Medium |
|   | Use lightweight model to classify commands before execution for better risk assessment | | |
| 7 | **Add network egress filtering for sandbox tools** | High | Medium |
|   | Whitelist domains for web_extract, block known malicious IPs | | |
| 8 | **Implement tool call provenance logging** | Medium | Low |
|   | Immutable log of what tools were called with what args for audit | | |

### 4.3 Usability Improvements

| # | Recommendation | Impact | Effort |
|---|----------------|--------|--------|
| 9 | **Add tool suggestion system** | Medium | Medium |
|   | When LLM uses suboptimal pattern (cat vs read_file), suggest better alternative | | |
| 10 | **Implement progressive tool disclosure** | Medium | High |
|   | Start with minimal toolset, expand based on task complexity indicators | | |

---

## 5. Missing Tool Coverage Gaps

### 5.1 High-Priority Gaps

| Gap | Use Case | Current Workaround |
|-----|----------|-------------------|
| **Database query tool** | SQL database exploration | terminal with sqlite3/psql |
| **API testing tool** | REST API debugging (curl alternative) | terminal with curl |
| **Git operations tool** | Structured git commands (status, diff, log) | terminal with git |
| **Package manager tool** | Structured pip/npm/apt operations | terminal with package managers |
| **Archive/zip tool** | Create/extract archives | terminal with tar/unzip |

### 5.2 Medium-Priority Gaps

| Gap | Use Case | Current Workaround |
|-----|----------|-------------------|
| **Diff tool** | Structured file comparison | search_files + manual compare |
| **JSON/YAML manipulation** | Structured config editing | read_file + write_file |
| **Image manipulation** | Resize, crop, convert images | terminal with ImageMagick |
| **PDF operations** | Extract text, merge, split | terminal with pdftotext |
| **Data visualization** | Generate charts from data | code_execution with matplotlib |

### 5.3 Advanced Gaps

| Gap | Description |
|-----|-------------|
| **Vector database tool** | Semantic search over embeddings |
| **Test runner tool** | Structured test execution with parsing |
| **Linter/formatter tool** | Code quality checks with structured output |
| **Dependency analysis tool** | Visualize and analyze code dependencies |
| **Documentation generator tool** | Auto-generate docs from code |

---

## 6. Tool Registry Architecture

### 6.1 Registration Flow

```python
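# From tools/registry.py (signatures only; bodies omitted)
from typing import Callable, List, Optional, Set

class ToolRegistry:
    def register(self, name: str, toolset: str, schema: dict,
                 handler: Callable, check_fn: Optional[Callable] = None,
                 # ... (further parameters not shown in this report)
                 ): ...

    def dispatch(self, name: str, args: dict, **kwargs) -> str: ...

    def get_definitions(self, tool_names: Set[str], quiet: bool = False) -> List[dict]: ...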
```
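
For readers who want to see the register/dispatch pattern end to end, here is a minimal stand-alone sketch. It mirrors the method names above but is not the Hermes implementation: the class `MiniRegistry`, the JSON error envelope, and the OpenAI-style wrapper in `get_definitions` are assumptions.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass
class _Entry:
    name: str
    toolset: str
    schema: dict
    handler: Callable[..., str]
    check_fn: Optional[Callable[[], bool]] = None


class MiniRegistry:
    def __init__(self) -> None:
        self._tools: Dict[str, _Entry] = {}

    def register(self, name, toolset, schema, handler, check_fn=None) -> None:
        self._tools[name] = _Entry(name, toolset, schema, handler, check_fn)

    def dispatch(self, name: str, args: dict, **kwargs) -> str:
        entry = self._tools.get(name)
        if entry is None:
            return json.dumps({"error": f"Unknown tool: {name}"})
        try:
            return entry.handler(args, **kwargs)
        except Exception as exc:  # report handler failures as data, not crashes
            return json.dumps({"error": f"{type(exc).__name__}: {exc}"})

    def get_definitions(self, tool_names: Set[str]) -> List[dict]:
        # Only expose requested tools whose availability check passes.
        return [
            {"type": "function", "function": entry.schema}
            for name, entry in sorted(self._tools.items())
            if name in tool_names and (entry.check_fn is None or entry.check_fn())
        ]
```

Returning errors as JSON strings keeps `dispatch` total, so the calling model always receives a tool result it can react to.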

### 6.2 Tool Entry Structure

```python
class ToolEntry:
    __slots__ = (
        "name",        # Tool identifier
        "toolset",     # Category (file, terminal, web, etc.)
        "schema",      # OpenAI-format JSON schema
        "handler",     # Callable implementation
        "check_fn",    # Availability check (returns bool)
        "requires_env",# Required env var names
        "is_async",    # Whether handler is async
        "description", # Human-readable description
        "emoji",       # Visual identifier
    )
```

### 6.3 Registration Example (file_tools.py:560-563)

```python
registry.register(
    name="read_file",
    toolset="file",
    schema=READ_FILE_SCHEMA,
    handler=_handle_read_file,
    check_fn=_check_file_reqs,
    emoji="📖"
)
```

---

## 7. Toolset Composition System

### 7.1 Toolset Definition (toolsets.py:72-377)

```python
TOOLSETS = {
    "file": {
        "description": "File manipulation tools",
        "tools": ["read_file", "write_file", "patch", "search_files"],
        "includes": []
    },
    "debugging": {
        "description": "Debugging and troubleshooting toolkit",
        "tools": ["terminal", "process"],
        "includes": ["web", "file"]  # Composes other toolsets
    },
}
```

### 7.2 Resolution Algorithm

```python
from typing import List, Optional, Set

def resolve_toolset(name: str, visited: Optional[Set[str]] = None) -> List[str]:
    # Outline only; the body is omitted in this report.
    # 1. Cycle detection
    # 2. Get toolset definition
    # 3. Collect direct tools
    # 4. Recursively resolve includes (diamond deps handled)
    # 5. Return deduplicated list
    ...
```
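
The five steps can be sketched as a short recursive function. This is one plausible reading of the outline, not the code in toolsets.py: the `TOOLSETS` table here is a trimmed stand-in, the `web` entry is assumed from the flow diagram, and unknown names are silently resolved to an empty list.

```python
from typing import Dict, List, Optional, Set

# Trimmed stand-in for the real table.
TOOLSETS: Dict[str, dict] = {
    "file": {"tools": ["read_file", "write_file", "patch", "search_files"],
             "includes": []},
    "web": {"tools": ["web_search", "web_extract", "web_crawl"], "includes": []},
    "debugging": {"tools": ["terminal", "process"], "includes": ["web", "file"]},
}


def resolve_toolset(name: str, visited: Optional[Set[str]] = None) -> List[str]:
    if visited is None:
        visited = set()
    # 1. Cycle detection: a toolset already on record contributes nothing new.
    if name in visited:
        return []
    visited.add(name)
    # 2. Get the toolset definition (assumed: unknown names resolve to nothing).
    definition = TOOLSETS.get(name)
    if definition is None:
        return []
    # 3. Collect direct tools.
    tools = list(definition.get("tools", []))
    # 4. Recursively resolve includes; the shared visited set also covers
    #    diamond dependencies, which are expanded only once.
    for included in definition.get("includes", []):
        tools.extend(resolve_toolset(included, visited))
    # 5. Deduplicate while preserving first-seen order.
    return list(dict.fromkeys(tools))


print(resolve_toolset("debugging"))
```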

### 7.3 Platform-Specific Toolsets

| Toolset | Purpose | Key Difference |
|---------|---------|----------------|
| `hermes-cli` | Full CLI access | All tools available |
| `hermes-acp` | Editor integration | No messaging, audio, or clarify UI |
| `hermes-api-server` | HTTP API | No interactive UI tools |
| `hermes-telegram` | Telegram bot | Full access with safety checks |
| `hermes-gateway` | Union of all messaging | Includes all platform tools |

---

## 8. Environment Backend Deep Dive

### 8.1 Base Class Interface (tools/environments/base.py)

```python
from abc import ABC

class BaseEnvironment(ABC):
    def execute(self, command: str, cwd: str = "", *,
                timeout: int | None = None,
                stdin_data: str | None = None) -> dict:
        """Return {"output": str, "returncode": int}"""
    
    def cleanup(self):
        """Release backend resources"""
```
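
To show what a concrete backend has to provide, the following sketch implements the interface with `subprocess`. It is illustrative only and much simpler than local.py: the class name `SketchLocalEnvironment`, the use of `sh -c`, merging stderr into the output, and mapping timeouts to return code 124 are assumptions (the last one chosen to match the TIMEOUT exit code mentioned in section 9.2). It needs a POSIX shell to run.

```python
import subprocess
from abc import ABC, abstractmethod
from typing import Optional


class BaseEnvironment(ABC):
    @abstractmethod
    def execute(self, command: str, cwd: str = "", *,
                timeout: Optional[int] = None,
                stdin_data: Optional[str] = None) -> dict:
        """Return {"output": str, "returncode": int}."""

    def cleanup(self) -> None:
        """Release backend resources (nothing to do by default)."""


class SketchLocalEnvironment(BaseEnvironment):
    def execute(self, command: str, cwd: str = "", *,
                timeout: Optional[int] = None,
                stdin_data: Optional[str] = None) -> dict:
        try:
            proc = subprocess.run(
                ["sh", "-c", command],
                cwd=cwd or None,          # empty string means "inherit cwd"
                input=stdin_data,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return {"output": f"Command timed out after {timeout}s",
                    "returncode": 124}
        return {"output": proc.stdout + proc.stderr,
                "returncode": proc.returncode}


if __name__ == "__main__":
    print(SketchLocalEnvironment().execute("echo hello"))
```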

### 8.2 Environment Feature Matrix

| Feature | Local | Docker | Modal | SSH | Singularity | Daytona |
|---------|-------|--------|-------|-----|-------------|---------|
| PTY support | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| Persistent shell | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| Filesystem persistence | Optional | Optional | Snapshots | N/A (remote) | Optional | Yes |
| Interrupt handling | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Sudo support | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Resource limits | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ |
| GPU support | ❌ | ✅ | ✅ | Remote | ✅ | ✅ |

---

## 9. Process Registry System

### 9.1 Background Process Management (tools/process_registry.py)

```python
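from typing import Optional

class ProcessRegistry:  # signatures only; bodies omitted
    def spawn_local(self, command, cwd, task_id,  # further parameters not shown
                    ) -> "ProcessSession": ...
    def spawn_via_env(self, env, command,  # further parameters not shown
                      ) -> "ProcessSession": ...
    def poll(self, session_id: str) -> dict: ...
    def wait(self, session_id: str, timeout: Optional[int] = None) -> dict: ...
    def kill(self, session_id: str): ...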
```

### 9.2 Process Session States

```
CREATED ──▶ RUNNING ──▶ FINISHED
               │            │
               ▼            ▼
          INTERRUPTED   TIMEOUT
          (exit_code=130) (exit_code=124)
```
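
The two exit codes follow common shell conventions: 130 is 128 plus the SIGINT signal number (2), and 124 is the status the `timeout` utility reports when a command runs out of time. A minimal sketch of mapping an exit code to the states in the diagram, with hypothetical names `SessionState` and `classify_exit`:

```python
from enum import Enum
from typing import Optional


class SessionState(Enum):
    CREATED = "created"
    RUNNING = "running"
    FINISHED = "finished"
    INTERRUPTED = "interrupted"
    TIMEOUT = "timeout"


def classify_exit(exit_code: Optional[int]) -> SessionState:
    """Map a process exit code to a terminal state (None means still running)."""
    if exit_code is None:
        return SessionState.RUNNING
    if exit_code == 130:   # 128 + SIGINT
        return SessionState.INTERRUPTED
    if exit_code == 124:   # timeout convention
        return SessionState.TIMEOUT
    return SessionState.FINISHED


for code in (None, 0, 1, 124, 130):
    print(code, classify_exit(code).value)
```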

---

## 10. Code Analysis Summary

### 10.1 Lines of Code by Component

| Component | Files | Approx. LOC |
|-----------|-------|-------------|
| Tool Implementations | 30+ | ~15,000 |
| Environment Backends | 6 | ~3,500 |
| Registry & Core | 2 | ~800 |
| Security (approval, tirith) | 2 | ~1,200 |
| Process Management | 1 | ~900 |
| **Total** | **40+** | **~21,400** |

### 10.2 Test Coverage

- 150+ test files in `tests/tools/`
- Unit tests for each tool
- Integration tests for environments
- Security-focused tests for approval system

---

## Appendix A: File Organization

```
tools/
├── registry.py              # Tool registration & dispatch
├── __init__.py              # Package exports
│
├── file_tools.py            # read_file, write_file, patch, search_files
├── file_operations.py       # ShellFileOperations backend
│
├── terminal_tool.py         # Main terminal execution (1,358 lines)
├── process_registry.py      # Background process management
│
├── web_tools.py             # web_search, web_extract, web_crawl (1,843 lines)
├── browser_tool.py          # Browser automation (1,955 lines)
├── browser_providers/       # Browserbase, BrowserUse providers
│
├── approval.py              # Dangerous command detection (670 lines)
├── tirith_security.py       # External security scanner (670 lines)
│
├── environments/            # Execution backends
│   ├── base.py              # BaseEnvironment ABC
│   ├── local.py             # Local subprocess (486 lines)
│   ├── docker.py            # Docker containers (535 lines)
│   ├── modal.py             # Modal cloud (372 lines)
│   ├── ssh.py               # SSH remote (307 lines)
│   ├── singularity.py       # Singularity/Apptainer
│   ├── daytona.py           # Daytona workspaces
│   └── persistent_shell.py  # Shared persistent shell mixin
│
├── code_execution_tool.py   # Programmatic tool calling (806 lines)
├── delegate_tool.py         # Subagent spawning (794 lines)
│
├── skills_tool.py           # Skill management (1,344 lines)
├── skill_manager_tool.py    # Skill CRUD operations
│
└── [20+ additional tools...]

toolsets.py                  # Toolset definitions (641 lines)
```

---

*Report generated from comprehensive analysis of the Hermes agent tool system.*