# Deep Analysis: Hermes Tool System
## Executive Summary
This report provides a comprehensive analysis of the Hermes agent tool infrastructure, covering:
- Tool registration and dispatch (registry.py)
- 30+ tool implementations across multiple categories
- 6 environment backends (local, Docker, Modal, SSH, Singularity, Daytona)
- Security boundaries and dangerous command detection
- Toolset definitions and composition system
---
## 1. Tool Execution Flow Diagram
```
┌─────────────────────────────────────────────────────────────┐
│                     TOOL EXECUTION FLOW                     │
└─────────────────────────────────────────────────────────────┘
┌─────────────┐    ┌──────────────────┐    ┌──────────────────┐
│  User/LLM   │───▶│   Model Tools    │───▶│  Tool Registry   │
│   Request   │    │ (model_tools.py) │    │  (registry.py)   │
└─────────────┘    └──────────────────┘    └────────┬─────────┘
                                                    │
          ┌───────────────────────┬─────────────────┴─────┐
          ▼                       ▼                       ▼
┌───────────────────┐   ┌───────────────────┐   ┌───────────────────┐
│    File Tools     │   │   Terminal Tool   │   │     Web Tools     │
│ ───────────────── │   │ ───────────────── │   │ ───────────────── │
│ • read_file       │   │ • Local execution │   │ • web_search      │
│ • write_file      │   │ • Docker sandbox  │   │ • web_extract     │
│ • patch           │   │ • Modal cloud     │   │ • web_crawl       │
│ • search_files    │   │ • SSH remote      │   └─────────┬─────────┘
└─────────┬─────────┘   │ • Singularity     │             │
          │             │ • Daytona         │             │
          │             └─────────┬─────────┘             │
          │                       │                       │
          ▼                       ▼                       ▼
┌─────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                      ENVIRONMENT BACKENDS                                       │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │    Local    │ │   Docker    │ │    Modal    │ │     SSH     │ │ Singularity │ │   Daytona   │ │
│ ├─────────────┤ ├─────────────┤ ├─────────────┤ ├─────────────┤ ├─────────────┤ ├─────────────┤ │
│ │subprocess -l│ │  container  │ │   Sandbox   │ │ControlMaster│ │   overlay   │ │  workspace  │ │
│ │             │ │    exec     │ │   .exec()   │ │ connection  │ │     SIF     │ │   .exec()   │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
└───────────────────────────────────────────────┬─────────────────────────────────────────────────┘
                                                ▼
                                 ┌─────────────────────────────┐
                                 │     SECURITY CHECKPOINT     │
                                 │ ┌─────────────────────────┐ │
                                 │ │ 1. Tirith Scanner       │ │
                                 │ │    (command content)    │ │
                                 │ ├─────────────────────────┤ │
                                 │ │ 2. Pattern Matching     │ │
                                 │ │    (DANGEROUS_PATTERNS) │ │
                                 │ ├─────────────────────────┤ │
                                 │ │ 3. Smart Approval       │ │
                                 │ │    (aux LLM)            │ │
                                 │ └─────────────────────────┘ │
                                 └──────────────┬──────────────┘
               ┌────────────────────────────────┼────────────────────────────────┐
               ▼                                ▼                                ▼
┌────────────────────────────┐   ┌────────────────────────────┐   ┌────────────────────────────┐
│          APPROVED          │   │          BLOCKED           │   │        USER PROMPT         │
│         (execute)          │   │      (deny + reason)       │   │ (once/session/always/deny) │
└────────────────────────────┘   └────────────────────────────┘   └────────────────────────────┘

┌───────────────────────────────────────────────────────────────────────────────────────────┐
│                                ADDITIONAL TOOL CATEGORIES                                 │
├───────────────┬──────────────┬───────────┬──────────────┬───────────┬──────────┬──────────┤
│ Browser Tools │ Vision Tools │ MoA Tools │ Skills Tools │ Code Exec │ Delegate │ TTS      │
├───────────────┼──────────────┼───────────┼──────────────┼───────────┼──────────┼──────────┤
│ • navigate    │ • analyze    │ • reason  │ • list       │ • sandbox │ • spawn  │ • speech │
│ • click       │ • extract    │ • debate  │ • view       │ • RPC     │ • batch  │ • voices │
│ • snapshot    │              │           │ • manage     │ • 7 tools │ • depth  │          │
│ • scroll      │              │           │              │   limit   │   limit  │          │
└───────────────┴──────────────┴───────────┴──────────────┴───────────┴──────────┴──────────┘
```
---
## 2. Security Boundary Analysis
### 2.1 Multi-Layer Security Architecture
| Layer | Component | Purpose |
|-------|-----------|---------|
| **Layer 1** | Container Isolation | Docker/Modal/Singularity sandboxes isolate from host |
| **Layer 2** | Dangerous Pattern Detection | Regex-based command filtering (approval.py) |
| **Layer 3** | Tirith Security Scanner | Content-level threat detection (pipe-to-shell, homograph URLs) |
| **Layer 4** | Smart Approval (Aux LLM) | LLM-based risk assessment for edge cases |
| **Layer 5** | File System Guards | Sensitive path blocking (/etc, ~/.ssh, ~/.hermes/.env) |
| **Layer 6** | Process Limits | Timeouts, memory limits, PID limits, capability dropping |
### 2.2 Environment Security Comparison
| Backend | Isolation Level | Persistent | Root Access | Network | Use Case |
|---------|-----------------|------------|-------------|---------|----------|
| **Local** | None (host) | Optional | User's own | Full | Development, trusted code |
| **Docker** | Container + caps | Optional | Container root | Isolated | General sandboxing |
| **Modal** | Cloud VM | Snapshots | Root | Isolated | Cloud compute, scalability |
| **SSH** | Remote machine | Yes | Remote user | Networked | Production servers |
| **Singularity** | Container + overlay | Optional | User-mapped | Configurable | HPC environments |
| **Daytona** | Cloud workspace | Yes | Root | Isolated | Managed dev environments |
### 2.3 Security Hardening Details
**Docker Environment (tools/environments/docker.py:107-117):**
```python
_SECURITY_ARGS = [
    "--cap-drop", "ALL",          # Drop all capabilities
    "--cap-add", "DAC_OVERRIDE",  # Allow root to write host-owned dirs
    "--cap-add", "CHOWN",
    "--cap-add", "FOWNER",
    "--security-opt", "no-new-privileges",
    "--pids-limit", "256",
    "--tmpfs", "/tmp:rw,nosuid,size=512m",
]
```
**Local Environment Secret Isolation (tools/environments/local.py:28-131):**
- Dynamic blocklist derived from provider registry
- Blocks 60+ API key environment variables
- Prevents credential leakage to subprocesses
- Support for `_HERMES_FORCE_` prefix overrides
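The filtering idea can be sketched in a few lines. This is a minimal illustration, not the code in `local.py`: the `PROVIDERS` dict, its `api_key_env` field, and both helper names are hypothetical, and the sketch assumes that `_HERMES_FORCE_<NAME>=value` means "pass `<NAME>=value` through to the child process". The real override semantics may differ.

```python
from typing import Dict, Iterable, Mapping, Set

# Hypothetical provider registry: only the env-var names matter for the blocklist.
PROVIDERS: Dict[str, Dict[str, str]] = {
    "openai": {"api_key_env": "OPENAI_API_KEY"},
    "anthropic": {"api_key_env": "ANTHROPIC_API_KEY"},
}
FORCE_PREFIX = "_HERMES_FORCE_"


def blocklist_from_providers(providers: Mapping[str, Mapping[str, str]]) -> Set[str]:
    """Derive the blocklist dynamically so newly added providers are covered automatically."""
    return {cfg["api_key_env"] for cfg in providers.values()}


def build_child_env(parent: Mapping[str, str], blocked: Iterable[str]) -> Dict[str, str]:
    blocked = set(blocked)
    # Drop blocked secrets and the override variables themselves.
    child = {k: v for k, v in parent.items()
             if k not in blocked and not k.startswith(FORCE_PREFIX)}
    # Assumed override semantics: _HERMES_FORCE_<NAME>=value re-admits <NAME>=value.
    for key, value in parent.items():
        if key.startswith(FORCE_PREFIX):
            child[key[len(FORCE_PREFIX):]] = value
    return child
```

Deriving the blocklist from the provider table, rather than hard-coding names, is what keeps the list current as providers are added.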
---
## 3. All Dangerous Command Detection Patterns
### 3.1 Pattern Categories (from tools/approval.py:40-78)
```python
DANGEROUS_PATTERNS = [
    # File System Destruction
    (r'\brm\s+(-[^\s]*\s+)*/', "delete in root path"),
    (r'\brm\s+-[^\s]*r', "recursive delete"),
    # Permission Escalation
    (r'\bchmod\s+(-[^\s]*\s+)*(777|666|o\+[rwx]*w|a\+[rwx]*w)\b', "world/other-writable permissions"),
    (r'\bchown\s+(-[^\s]*)?R\s+root', "recursive chown to root"),
    # Disk/Filesystem Operations
    (r'\bmkfs\b', "format filesystem"),
    (r'\bdd\s+.*if=', "disk copy"),
    (r'>\s*/dev/sd', "write to block device"),
    # Database Destruction
    (r'\bDROP\s+(TABLE|DATABASE)\b', "SQL DROP"),
    (r'\bDELETE\s+FROM\b(?!.*\bWHERE\b)', "SQL DELETE without WHERE"),
    (r'\bTRUNCATE\s+(TABLE)?\s*\w', "SQL TRUNCATE"),
    # System Configuration
    (r'>\s*/etc/', "overwrite system config"),
    (r'\bsystemctl\s+(stop|disable|mask)\b', "stop/disable system service"),
    # Process Termination
    (r'\bkill\s+-9\s+-1\b', "kill all processes"),
    (r'\bpkill\s+-9\b', "force kill processes"),
    (r'\b(pkill|killall)\b.*\b(hermes|gateway|cli\.py)\b', "kill hermes/gateway"),
    # Code Injection
    (r':\(\)\s*\{\s*:\s*\|\s*:\s*&\s*\}\s*;\s*:', "fork bomb"),
    (r'\b(bash|sh|zsh|ksh)\s+-[^\s]*c(\s+|$)', "shell command via -c flag"),
    (r'\b(curl|wget)\b.*\|\s*(ba)?sh\b', "pipe remote content to shell"),
    (r'\b(bash|sh|zsh|ksh)\s+<\s*<?\s*\(\s*(curl|wget)\b', "execute remote script via process substitution"),
    # Sensitive Path Writes
    (rf'\btee\b.*["\']?{_SENSITIVE_WRITE_TARGET}', "overwrite system file via tee"),
    (rf'>>?\s*["\']?{_SENSITIVE_WRITE_TARGET}', "overwrite system file via redirection"),
    # File Operations
    (r'\bxargs\s+.*\brm\b', "xargs with rm"),
    (r'\bfind\b.*-exec\s+(/\S*/)?rm\b', "find -exec rm"),
    (r'\bfind\b.*-delete\b', "find -delete"),
    (r'\b(cp|mv|install)\b.*\s/etc/', "copy/move file into /etc/"),
    (r'\bsed\s+-[^\s]*i.*\s/etc/', "in-place edit of system config"),
    # Gateway Protection
    (r'gateway\s+run\b.*(&\s*$|&\s*;|\bdisown\b|\bsetsid\b)', "start gateway outside systemd"),
    (r'\bnohup\b.*gateway\s+run\b', "start gateway outside systemd"),
]
```
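A list like this is typically applied by compiling each regex once, scanning the command, and reporting the first description that matches. The sketch below uses a five-pattern subset copied from the excerpt above; `detect_dangerous` is a hypothetical name, and case-insensitive matching is an assumption (the lowercase `$home` spellings in the path patterns suggest commands are normalized, but this report does not confirm how).

```python
import re
from typing import List, Optional, Tuple

# Five-pattern subset of DANGEROUS_PATTERNS, copied verbatim from the excerpt above.
PATTERNS: List[Tuple[str, str]] = [
    (r'\brm\s+-[^\s]*r', "recursive delete"),
    (r'\bDROP\s+(TABLE|DATABASE)\b', "SQL DROP"),
    (r'\bDELETE\s+FROM\b(?!.*\bWHERE\b)', "SQL DELETE without WHERE"),
    (r'\b(curl|wget)\b.*\|\s*(ba)?sh\b', "pipe remote content to shell"),
    (r':\(\)\s*\{\s*:\s*\|\s*:\s*&\s*\}\s*;\s*:', "fork bomb"),
]

# Assumption: matching is case-insensitive. Compile once, reuse for every command.
_COMPILED = [(re.compile(p, re.IGNORECASE), desc) for p, desc in PATTERNS]


def detect_dangerous(command: str) -> Optional[str]:
    """Return the description of the first matching pattern, or None if nothing matches."""
    for regex, description in _COMPILED:
        if regex.search(command):
            return description
    return None
```

Because the first match wins, list order decides which reason the user sees when a command trips several patterns.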
### 3.2 Sensitive Path Patterns
```python
# SSH keys
_SSH_SENSITIVE_PATH = r'(?:~|\$home|\$\{home\})/\.ssh(?:/|$)'
# Hermes environment
_HERMES_ENV_PATH = (
    r'(?:~\/\.hermes/|'
    r'(?:\$home|\$\{home\})/\.hermes/|'
    r'(?:\$hermes_home|\$\{hermes_home\})/)'
    r'\.env\b'
)
# Combined write targets: system paths plus the SSH and Hermes .env patterns above
_SENSITIVE_WRITE_TARGET = (
    r'(?:/etc/|/dev/sd|'
    rf'{_SSH_SENSITIVE_PATH}|'
    rf'{_HERMES_ENV_PATH})'
)
```
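To show how these pieces compose, the sketch below rebuilds `_SENSITIVE_WRITE_TARGET` from the definitions above and compiles the two "Sensitive Path Writes" patterns from section 3.1 on their own. Lowercasing the command before matching is an assumption inferred from the lowercase `$home`/`$hermes_home` spellings; `writes_sensitive_path` is a hypothetical helper.

```python
import re

# Definitions copied from the listing above.
_SSH_SENSITIVE_PATH = r'(?:~|\$home|\$\{home\})/\.ssh(?:/|$)'
_HERMES_ENV_PATH = (
    r'(?:~\/\.hermes/|'
    r'(?:\$home|\$\{home\})/\.hermes/|'
    r'(?:\$hermes_home|\$\{hermes_home\})/)'
    r'\.env\b'
)
_SENSITIVE_WRITE_TARGET = (
    r'(?:/etc/|/dev/sd|'
    rf'{_SSH_SENSITIVE_PATH}|'
    rf'{_HERMES_ENV_PATH})'
)

# The two "Sensitive Path Writes" entries from DANGEROUS_PATTERNS, compiled standalone.
_TEE = re.compile(rf'\btee\b.*["\']?{_SENSITIVE_WRITE_TARGET}')
_REDIRECT = re.compile(rf'>>?\s*["\']?{_SENSITIVE_WRITE_TARGET}')


def writes_sensitive_path(command: str) -> bool:
    # Assumption: commands are lowercased first, matching the lowercase $home spellings.
    lowered = command.lower()
    return bool(_TEE.search(lowered) or _REDIRECT.search(lowered))
```

Reading a sensitive file (for example `cat ~/.ssh/config`) is not flagged by these two patterns; only writes via `tee` or `>`/`>>` are.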
### 3.3 Approval Flow States
```
     Command Input
           │
           ▼
┌──────────────────────┐
│  Pattern Detection   │────┐
│    (approval.py)     │    │
└──────────┬───────────┘    │
           ▼                │
┌──────────────────────┐    │
│    Tirith Scanner    │────┤
│ (tirith_security.py) │    │
└──────────┬───────────┘    │
           ▼                │
┌──────────────────────┐    │
│    Mode = smart?     │────┼──▶ Smart Approval (aux LLM)
└──────────┬───────────┘    │
           ▼                │
┌──────────────────────┐    │
│     Gateway/CLI?     │────┼──▶ Async Approval Prompt
└──────────┬───────────┘    │
           ▼                │
┌──────────────────────┐    │
│  Interactive Prompt  │◀───┘
│    (once/session/    │
│     always/deny)     │
└──────────────────────┘
```
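The final prompt offers four answers, and only two of them need state. A minimal sketch of that bookkeeping, assuming approvals are keyed by the matched pattern's description; the `Choice` enum and `ApprovalMemory` class are hypothetical, and a real implementation would persist `always` answers rather than keep them in memory.

```python
from enum import Enum
from typing import Set


class Choice(Enum):
    ONCE = "once"
    SESSION = "session"
    ALWAYS = "always"
    DENY = "deny"


class ApprovalMemory:
    """Hypothetical store for answers to the once/session/always/deny prompt."""

    def __init__(self) -> None:
        self.session_allowed: Set[str] = set()  # forgotten when the session ends
        self.always_allowed: Set[str] = set()   # would be persisted in a real system

    def is_preapproved(self, pattern_key: str) -> bool:
        """True if an earlier answer lets this pattern run without prompting."""
        return pattern_key in self.session_allowed or pattern_key in self.always_allowed

    def record(self, pattern_key: str, choice: Choice) -> bool:
        """Store the answer and return whether the current command may run."""
        if choice is Choice.DENY:
            return False
        if choice is Choice.SESSION:
            self.session_allowed.add(pattern_key)
        elif choice is Choice.ALWAYS:
            self.always_allowed.add(pattern_key)
        return True  # ONCE approves this run without remembering it

    def end_session(self) -> None:
        self.session_allowed.clear()
```

Checking `is_preapproved` before prompting is what keeps a `session` answer from re-prompting on every similar command.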
---
## 4. Tool Improvement Recommendations
### 4.1 Critical Improvements
| # | Recommendation | Details | Impact | Effort |
|---|----------------|---------|--------|--------|
| 1 | **Implement tool call result caching** | Cache file reads and search results with a TTL to avoid redundant I/O | High | Medium |
| 2 | **Add tool execution metrics/observability** | Track duration, success rate, and token usage per tool to guide optimization | High | Low |
| 3 | **Implement tool retry with exponential backoff** | The terminal tool has basic retry (terminal_tool.py:1105-1130) that could be generalized | Medium | Low |
| 4 | **Add per-session tool call rate limiting** | Prevent runaway loops (e.g., 1000+ search calls in one session) | Medium | Medium |
| 5 | **Create a tool health check system** | Periodically validate that tools work (API keys valid, services reachable) | Medium | Medium |
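As an illustration of recommendation 1, here is a minimal TTL cache keyed on the tool name plus canonicalized arguments. `ToolResultCache` is hypothetical; the injectable `clock` exists only so expiry can be tested deterministically. A real version would also have to invalidate entries when `write_file` or `patch` touches a cached path, which this sketch does not do.

```python
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ToolResultCache:
    """Hypothetical TTL cache for read-only tool results."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    @staticmethod
    def _key(tool_name: str, args: Dict[str, Any]) -> str:
        # sort_keys makes argument order irrelevant: {"a": 1, "b": 2} == {"b": 2, "a": 1}
        return tool_name + ":" + json.dumps(args, sort_keys=True)

    def get(self, tool_name: str, args: Dict[str, Any]) -> Optional[str]:
        key = self._key(tool_name, args)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, result = entry
        if self.clock() - stored_at > self.ttl:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return result

    def put(self, tool_name: str, args: Dict[str, Any], result: str) -> None:
        self._entries[self._key(tool_name, args)] = (self.clock(), result)
```

A monotonic clock is the default because wall-clock adjustments should not make entries expire early or live forever.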
### 4.2 Security Enhancements
| # | Recommendation | Details | Impact | Effort |
|---|----------------|---------|--------|--------|
| 6 | **Implement command intent classification** | Use a lightweight model to classify commands before execution for better risk assessment | High | Medium |
| 7 | **Add network egress filtering for sandbox tools** | Allowlist domains for web_extract and block known malicious IPs | High | Medium |
| 8 | **Implement tool call provenance logging** | Keep an immutable audit log of which tools were called with which arguments | Medium | Low |
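Recommendation 8 asks for an immutable log. One common way to approximate that in-process is a hash chain: each record stores the SHA-256 of its content plus the previous record's hash, so editing or reordering any earlier record breaks verification. The sketch below is hypothetical and only tamper-evident; true immutability still needs append-only or write-once storage, and timestamps are omitted for brevity.

```python
import copy
import hashlib
import json
from typing import Any, Dict, List


def _digest(tool: str, args: Dict[str, Any], prev: str) -> str:
    # Canonical JSON (sorted keys) so the same call always hashes the same way.
    body = json.dumps({"tool": tool, "args": args, "prev": prev}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class ProvenanceLog:
    """Append-only, hash-chained record of tool calls: tamper-evident, not tamper-proof."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.records: List[Dict[str, Any]] = []

    def append(self, tool: str, args: Dict[str, Any]) -> str:
        prev = self.records[-1]["hash"] if self.records else self.GENESIS
        args = copy.deepcopy(args)  # later mutation by the caller must not alter history
        digest = _digest(tool, args, prev)
        self.records.append({"tool": tool, "args": args, "prev": prev, "hash": digest})
        return digest

    def verify(self) -> bool:
        prev = self.GENESIS
        for rec in self.records:
            if rec["prev"] != prev or _digest(rec["tool"], rec["args"], prev) != rec["hash"]:
                return False
            prev = rec["hash"]
        return True
```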
### 4.3 Usability Improvements
| # | Recommendation | Details | Impact | Effort |
|---|----------------|---------|--------|--------|
| 9 | **Add tool suggestion system** | When the LLM uses a suboptimal pattern (e.g. `cat` instead of `read_file`), suggest the better alternative | Medium | Medium |
| 10 | **Implement progressive tool disclosure** | Start with a minimal toolset and expand it based on task-complexity indicators | Medium | High |
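For recommendation 9, a first cut could be a table of shell idioms mapped to the structured tools this report already lists. The mapping and the `suggest_tool` helper below are hypothetical and deliberately conservative: they inspect only the leading command word, so pipelines are judged by their first command.

```python
import re
from typing import Optional

# Hypothetical idiom -> tool table; the tool names are the ones listed in this report.
_SUGGESTIONS = [
    (re.compile(r'^\s*(cat|head|tail)\s+\S'), "read_file"),
    (re.compile(r'^\s*(grep|rg|find)\b'), "search_files"),
    (re.compile(r'^\s*sed\s+-i\b'), "patch"),
]


def suggest_tool(command: str) -> Optional[str]:
    """Return a structured tool to suggest instead of this terminal command, if any."""
    for pattern, tool in _SUGGESTIONS:
        if pattern.search(command):
            return tool
    return None
```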
---
## 5. Missing Tool Coverage Gaps
### 5.1 High-Priority Gaps
| Gap | Use Case | Current Workaround |
|-----|----------|-------------------|
| **Database query tool** | SQL database exploration | terminal with sqlite3/psql |
| **API testing tool** | REST API debugging (curl alternative) | terminal with curl |
| **Git operations tool** | Structured git commands (status, diff, log) | terminal with git |
| **Package manager tool** | Structured pip/npm/apt operations | terminal with package managers |
| **Archive/zip tool** | Create/extract archives | terminal with tar/unzip |
### 5.2 Medium-Priority Gaps
| Gap | Use Case | Current Workaround |
|-----|----------|-------------------|
| **Diff tool** | Structured file comparison | search_files + manual compare |
| **JSON/YAML manipulation** | Structured config editing | read_file + write_file |
| **Image manipulation** | Resize, crop, convert images | terminal with ImageMagick |
| **PDF operations** | Extract text, merge, split | terminal with pdftotext |
| **Data visualization** | Generate charts from data | code_execution with matplotlib |
### 5.3 Advanced Gaps
| Gap | Description |
|-----|-------------|
| **Vector database tool** | Semantic search over embeddings |
| **Test runner tool** | Structured test execution with parsing |
| **Linter/formatter tool** | Code quality checks with structured output |
| **Dependency analysis tool** | Visualize and analyze code dependencies |
| **Documentation generator tool** | Auto-generate docs from code |
---
## 6. Tool Registry Architecture
### 6.1 Registration Flow
```python
# From tools/registry.py (interface only; method bodies omitted)
from typing import Callable, List, Set

class ToolRegistry:
    def register(self, name: str, toolset: str, schema: dict,
                 handler: Callable, check_fn: Callable = None):  # further params elided
        ...

    def dispatch(self, name: str, args: dict, **kwargs) -> str: ...

    def get_definitions(self, tool_names: Set[str], quiet: bool = False) -> List[dict]: ...
```
### 6.2 Tool Entry Structure
```python
class ToolEntry:
    __slots__ = (
        "name",          # Tool identifier
        "toolset",       # Category (file, terminal, web, etc.)
        "schema",        # OpenAI-format JSON schema
        "handler",       # Callable implementation
        "check_fn",      # Availability check (returns bool)
        "requires_env",  # Required env var names
        "is_async",      # Whether handler is async
        "description",   # Human-readable description
        "emoji",         # Visual identifier
    )
```
### 6.3 Registration Example (file_tools.py:560-563)
```python
registry.register(
    name="read_file",
    toolset="file",
    schema=READ_FILE_SCHEMA,
    handler=_handle_read_file,
    check_fn=_check_file_reqs,
    emoji="📖"
)
```
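The three operations in 6.1 fit together as in this self-contained sketch. It is not the Hermes implementation: `MiniRegistry`, `MiniToolEntry` (a subset of the fields in 6.2), the JSON result/error envelope returned by `dispatch`, and the filtering in `get_definitions` are assumptions chosen to mirror the documented signatures and the `check_fn` field.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Set


@dataclass
class MiniToolEntry:
    # Subset of the ToolEntry fields listed in 6.2
    name: str
    toolset: str
    schema: dict
    handler: Callable[..., Any]
    check_fn: Optional[Callable[[], bool]] = None


class MiniRegistry:
    """Illustrative registry with the same three operations as ToolRegistry."""

    def __init__(self) -> None:
        self._tools: Dict[str, MiniToolEntry] = {}

    def register(self, name: str, toolset: str, schema: dict,
                 handler: Callable[..., Any],
                 check_fn: Optional[Callable[[], bool]] = None) -> None:
        self._tools[name] = MiniToolEntry(name, toolset, schema, handler, check_fn)

    def get_definitions(self, tool_names: Set[str]) -> List[dict]:
        # Advertise only requested tools whose availability check passes (or that have none).
        return [
            entry.schema
            for name, entry in sorted(self._tools.items())
            if name in tool_names and (entry.check_fn is None or entry.check_fn())
        ]

    def dispatch(self, name: str, args: dict) -> str:
        # Always return a JSON string so the model sees failures as data, not exceptions.
        entry = self._tools.get(name)
        if entry is None:
            return json.dumps({"error": f"unknown tool: {name}"})
        try:
            return json.dumps({"result": entry.handler(**args)})
        except Exception as exc:
            return json.dumps({"error": str(exc)})
```

Filtering in `get_definitions` means a tool whose `check_fn` fails (for example, a missing API key) is never offered to the model in the first place.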
---
## 7. Toolset Composition System
### 7.1 Toolset Definition (toolsets.py:72-377)
```python
TOOLSETS = {
    "file": {
        "description": "File manipulation tools",
        "tools": ["read_file", "write_file", "patch", "search_files"],
        "includes": []
    },
    "debugging": {
        "description": "Debugging and troubleshooting toolkit",
        "tools": ["terminal", "process"],
        "includes": ["web", "file"]  # Composes other toolsets
    },
}
```
### 7.2 Resolution Algorithm
```python
from typing import List, Set

def resolve_toolset(name: str, visited: Set[str] = None) -> List[str]:
    # 1. Cycle detection
    # 2. Get toolset definition
    # 3. Collect direct tools
    # 4. Recursively resolve includes (diamond deps handled)
    # 5. Return deduplicated list
    ...
```
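The five resolution steps can be made concrete with a small recursive function. This sketch runs against a toy `TOOLSETS` dict (the `web`, `full`, and `loop_*` entries are invented for the example). It uses a single shared `visited` set, so a cycle contributes nothing the second time round and a toolset reachable along two paths (a diamond) is expanded only once.

```python
from typing import Dict, List, Optional, Set

# Toy definitions in the shape of TOOLSETS; "web", "full" and "loop_*" are invented.
TOOLSETS: Dict[str, dict] = {
    "file": {"tools": ["read_file", "write_file", "patch", "search_files"], "includes": []},
    "web": {"tools": ["web_search", "web_extract"], "includes": []},
    "debugging": {"tools": ["terminal", "process"], "includes": ["web", "file"]},
    "full": {"tools": [], "includes": ["file", "debugging"]},  # diamond: file reached twice
    "loop_a": {"tools": ["a"], "includes": ["loop_b"]},       # cycle: loop_a <-> loop_b
    "loop_b": {"tools": ["b"], "includes": ["loop_a"]},
}


def resolve_toolset(name: str, visited: Optional[Set[str]] = None) -> List[str]:
    if visited is None:
        visited = set()
    # 1. Cycle detection: a toolset already in `visited` contributes nothing again.
    if name in visited:
        return []
    visited.add(name)
    # 2. Get the definition (unknown names resolve to an empty list in this sketch).
    definition = TOOLSETS.get(name)
    if definition is None:
        return []
    # 3. Collect direct tools.
    tools = list(definition["tools"])
    # 4. Recursively resolve includes, sharing `visited` so diamonds expand once.
    for included in definition["includes"]:
        tools.extend(resolve_toolset(included, visited))
    # 5. Deduplicate while keeping first-seen order.
    return list(dict.fromkeys(tools))
```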
### 7.3 Platform-Specific Toolsets
| Toolset | Purpose | Key Difference |
|---------|---------|----------------|
| `hermes-cli` | Full CLI access | All tools available |
| `hermes-acp` | Editor integration | No messaging, audio, or clarify UI |
| `hermes-api-server` | HTTP API | No interactive UI tools |
| `hermes-telegram` | Telegram bot | Full access with safety checks |
| `hermes-gateway` | Union of all messaging platforms | Includes all platform tools |
---
## 8. Environment Backend Deep Dive
### 8.1 Base Class Interface (tools/environments/base.py)
```python
from abc import ABC


class BaseEnvironment(ABC):
    def execute(self, command: str, cwd: str = "", *,
                timeout: int | None = None,
                stdin_data: str | None = None) -> dict:
        """Return {"output": str, "returncode": int}"""

    def cleanup(self):
        """Release backend resources"""
```
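A minimal concrete backend makes the contract clear: run the command, merge stdout and stderr into `output`, and report `returncode`. The sketch below is hypothetical and much simpler than the real local backend (no login shell, PTY, or persistent shell). Making `execute` abstract is this sketch's choice, and mapping a timeout to exit code 124 is an assumption borrowed from the convention shown in section 9.2.

```python
from __future__ import annotations

import subprocess
from abc import ABC, abstractmethod


class BaseEnvironment(ABC):
    """Same contract as the interface above; @abstractmethod is this sketch's choice."""

    @abstractmethod
    def execute(self, command: str, cwd: str = "", *,
                timeout: int | None = None,
                stdin_data: str | None = None) -> dict:
        """Return {"output": str, "returncode": int}"""

    def cleanup(self) -> None:
        """Release backend resources (nothing to release in this sketch)."""


class MiniLocalEnvironment(BaseEnvironment):
    """Hypothetical backend: one /bin/sh subprocess per call, no PTY or persistent shell."""

    def execute(self, command: str, cwd: str = "", *,
                timeout: int | None = None,
                stdin_data: str | None = None) -> dict:
        try:
            proc = subprocess.run(
                command, shell=True, cwd=cwd or None, input=stdin_data,
                capture_output=True, text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # Assumed convention: report timeouts with exit code 124.
            return {"output": f"command timed out after {timeout}s", "returncode": 124}
        return {"output": proc.stdout + proc.stderr, "returncode": proc.returncode}
```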
### 8.2 Environment Feature Matrix
| Feature | Local | Docker | Modal | SSH | Singularity | Daytona |
|---------|-------|--------|-------|-----|-------------|---------|
| PTY support | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| Persistent shell | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| Filesystem persistence | Optional | Optional | Snapshots | N/A (remote) | Optional | Yes |
| Interrupt handling | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Sudo support | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Resource limits | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ |
| GPU support | ❌ | ✅ | ✅ | Remote | ✅ | ✅ |
---
## 9. Process Registry System
### 9.1 Background Process Management (tools/process_registry.py)
```python
from __future__ import annotations


class ProcessRegistry:
    def spawn_local(self, command, cwd, task_id) -> ProcessSession: ...  # further params elided
    def spawn_via_env(self, env, command) -> ProcessSession: ...  # further params elided
    def poll(self, session_id: str) -> dict: ...
    def wait(self, session_id: str, timeout: int = None) -> dict: ...
    def kill(self, session_id: str): ...
```
### 9.2 Process Session States
```
CREATED ──▶ RUNNING ──▶ FINISHED
               │
               ├──▶ INTERRUPTED (exit_code=130)
               │
               └──▶ TIMEOUT     (exit_code=124)
```
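The state diagram can be encoded as a transition table. This sketch assumes both INTERRUPTED and TIMEOUT branch from RUNNING; the enum and class names are hypothetical. The two fixed exit codes follow common shell conventions: 130 is 128 + SIGINT (signal 2), and 124 is the status GNU `timeout` returns when its time limit expires.

```python
from __future__ import annotations

from enum import Enum


class SessionState(Enum):
    CREATED = "created"
    RUNNING = "running"
    FINISHED = "finished"
    INTERRUPTED = "interrupted"
    TIMEOUT = "timeout"


# Assumed edges: every terminal state is reached from RUNNING.
TRANSITIONS = {
    SessionState.CREATED: {SessionState.RUNNING},
    SessionState.RUNNING: {SessionState.FINISHED, SessionState.INTERRUPTED, SessionState.TIMEOUT},
}

# Fixed exit codes for abnormal endings: 130 = 128 + SIGINT(2); 124 = GNU `timeout` status.
FIXED_EXIT_CODES = {SessionState.INTERRUPTED: 130, SessionState.TIMEOUT: 124}


class SessionSketch:
    """Hypothetical stand-in for ProcessSession state tracking."""

    def __init__(self) -> None:
        self.state = SessionState.CREATED
        self.exit_code: int | None = None

    def transition(self, new_state: SessionState, exit_code: int | None = None) -> None:
        if new_state not in TRANSITIONS.get(self.state, set()):
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state
        # FINISHED keeps the process's real exit code; the other terminal states use fixed codes.
        self.exit_code = FIXED_EXIT_CODES.get(new_state, exit_code)
```

Terminal states have no outgoing edges in the table, so any further transition from them raises.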
---
## 10. Code Analysis Summary
### 10.1 Lines of Code by Component
| Component | Files | Approx. LOC |
|-----------|-------|-------------|
| Tool Implementations | 30+ | ~15,000 |
| Environment Backends | 6 | ~3,500 |
| Registry & Core | 2 | ~800 |
| Security (approval, tirith) | 2 | ~1,200 |
| Process Management | 1 | ~900 |
| **Total** | **40+** | **~21,400** |
### 10.2 Test Coverage
- 150+ test files in `tests/tools/`
- Unit tests for each tool
- Integration tests for environments
- Security-focused tests for approval system
---
## Appendix A: File Organization
```
tools/
├── registry.py               # Tool registration & dispatch
├── __init__.py               # Package exports
├── file_tools.py             # read_file, write_file, patch, search_files
├── file_operations.py        # ShellFileOperations backend
├── terminal_tool.py          # Main terminal execution (1,358 lines)
├── process_registry.py       # Background process management
├── web_tools.py              # web_search, web_extract, web_crawl (1,843 lines)
├── browser_tool.py           # Browser automation (1,955 lines)
├── browser_providers/        # Browserbase, BrowserUse providers
├── approval.py               # Dangerous command detection (670 lines)
├── tirith_security.py        # External security scanner (670 lines)
├── environments/             # Execution backends
│   ├── base.py               # BaseEnvironment ABC
│   ├── local.py              # Local subprocess (486 lines)
│   ├── docker.py             # Docker containers (535 lines)
│   ├── modal.py              # Modal cloud (372 lines)
│   ├── ssh.py                # SSH remote (307 lines)
│   ├── singularity.py        # Singularity/Apptainer
│   ├── daytona.py            # Daytona workspaces
│   └── persistent_shell.py   # Shared persistent shell mixin
├── code_execution_tool.py    # Programmatic tool calling (806 lines)
├── delegate_tool.py          # Subagent spawning (794 lines)
├── skills_tool.py            # Skill management (1,344 lines)
├── skill_manager_tool.py     # Skill CRUD operations
└── [20+ additional tools...]
toolsets.py                   # Toolset definitions (641 lines)
```
---
*Report generated from comprehensive analysis of the Hermes agent tool system.*