Deep Analysis: Hermes Tool System
Executive Summary
This report provides a comprehensive analysis of the Hermes agent tool infrastructure, covering:
- Tool registration and dispatch (registry.py)
- 30+ tool implementations across multiple categories
- 6 environment backends (local, Docker, Modal, SSH, Singularity, Daytona)
- Security boundaries and dangerous command detection
- Toolset definitions and composition system
1. Tool Execution Flow Diagram
┌─────────────────────────────────────────────────────────────────────────────────┐
│ TOOL EXECUTION FLOW │
└─────────────────────────────────────────────────────────────────────────────────┘
┌─────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ User/LLM │───▶│ Model Tools │───▶│ Tool Registry │
│ Request │ │ (model_tools.py)│ │ (registry.py) │
└─────────────┘ └──────────────────┘ └──────────────────┘
│
┌─────────────────────────────────────┼─────────────────────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌────────────────────┐ ┌─────────────────────┐
│ File Tools │ │ Terminal Tool │ │ Web Tools │
│ ─────────────── │ │ ────────────────── │ │ ─────────────────── │
│ • read_file │ │ • Local execution │ │ • web_search │
│ • write_file │ │ • Docker sandbox │ │ • web_extract │
│ • patch │ │ • Modal cloud │ │ • web_crawl │
│ • search_files │ │ • SSH remote │ │ │
└────────┬────────┘ │ • Singularity │ └─────────────────────┘
│ │ • Daytona │ │
│ └─────────┬──────────┘ │
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ ENVIRONMENT BACKENDS │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Local │ │ Docker │ │ Modal │ │ SSH │ │Singularity│ │ Daytona │ │
│ │──────────│ │──────────│ │──────────│ │──────────│ │───────────│ │──────────│ │
│ │subprocess│ │container │ │Sandbox │ │ControlMaster│ │overlay │ │workspace │ │
│ │ -l │ │exec │ │.exec() │ │connection │ │SIF │ │.exec() │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ └───────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────┐
│ SECURITY CHECKPOINT │
│ ┌─────────────────────┐ │
│ │ 1. Tirith Scanner │ │
│ │ (command content)│ │
│ ├─────────────────────┤ │
│ │ 2. Pattern Matching │ │
│ │ (DANGEROUS_PATTERNS)│ │
│ ├─────────────────────┤ │
│ │ 3. Smart Approval │ │
│ │ (aux LLM) │ │
│ └─────────────────────┘ │
└─────────────────────────────┘
│
┌─────────────────────────────────┼─────────────────────────────────┐
│ │ │
▼ ▼ ▼
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ APPROVED │ │ BLOCKED │ │ USER PROMPT │
│ (execute) │ │ (deny + reason) │ │ (once/session/always/deny)
└──────────────────┘ └──────────────────┘ └──────────────────┘
┌──────────────────────────────────────────────────────────────────────────────────────────────┐
│ ADDITIONAL TOOL CATEGORIES │
├──────────────────────────────────────────────────────────────────────────────────────────────┤
│ Browser Tools │ Vision Tools │ MoA Tools │ Skills Tools │ Code Exec │ Delegate │ TTS │
│ ───────────── │ ──────────── │ ───────── │ ──────────── │ ───────── │ ──────── │ ──────────│
│ • navigate │ • analyze │ • reason │ • list │ • sandbox │ • spawn │ • speech │
│ • click │ • extract │ • debate │ • view │ • RPC │ • batch │ • voices │
│ • snapshot │ │ │ • manage │ • 7 tools │ • depth │ │
│ • scroll │ │ │ │ limit │ limit │ │
└──────────────────────────────────────────────────────────────────────────────────────────────┘
2. Security Boundary Analysis
2.1 Multi-Layer Security Architecture
| Layer | Component | Purpose |
|---|---|---|
| Layer 1 | Container Isolation | Docker/Modal/Singularity sandboxes isolate from host |
| Layer 2 | Dangerous Pattern Detection | Regex-based command filtering (approval.py) |
| Layer 3 | Tirith Security Scanner | Content-level threat detection (pipe-to-shell, homograph URLs) |
| Layer 4 | Smart Approval (Aux LLM) | LLM-based risk assessment for edge cases |
| Layer 5 | File System Guards | Sensitive path blocking (/etc, ~/.ssh, ~/.hermes/.env) |
| Layer 6 | Process Limits | Timeouts, memory limits, PID limits, capability dropping |
2.2 Environment Security Comparison
| Backend | Isolation Level | Persistent | Root Access | Network | Use Case |
|---|---|---|---|---|---|
| Local | None (host) | Optional | User's own | Full | Development, trusted code |
| Docker | Container + caps | Optional | Container root | Isolated | General sandboxing |
| Modal | Cloud VM | Snapshots | Root | Isolated | Cloud compute, scalability |
| SSH | Remote machine | Yes | Remote user | Networked | Production servers |
| Singularity | Container + overlay | Optional | User-mapped | Configurable | HPC environments |
| Daytona | Cloud workspace | Yes | Root | Isolated | Managed dev environments |
2.3 Security Hardening Details
Docker Environment (tools/environments/docker.py:107-117):
_SECURITY_ARGS = [
    "--cap-drop", "ALL",            # Drop all capabilities
    "--cap-add", "DAC_OVERRIDE",    # Allow root to write host-owned dirs
    "--cap-add", "CHOWN",
    "--cap-add", "FOWNER",
    "--security-opt", "no-new-privileges",
    "--pids-limit", "256",
    "--tmpfs", "/tmp:rw,nosuid,size=512m",
]
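As a rough illustration of how these flags might be combined into a list-based `docker run` invocation (no shell involved), consider the sketch below. The helper `build_run_command`, the image name, and the container name are hypothetical and are not taken from docker.py.

```python
import shlex

_SECURITY_ARGS = [
    "--cap-drop", "ALL",
    "--cap-add", "DAC_OVERRIDE",
    "--cap-add", "CHOWN",
    "--cap-add", "FOWNER",
    "--security-opt", "no-new-privileges",
    "--pids-limit", "256",
    "--tmpfs", "/tmp:rw,nosuid,size=512m",
]


def build_run_command(image: str, name: str) -> list[str]:
    """Hypothetical helper: compose an argv list for `docker run`.

    Passing a list (rather than one shell string) means the image and
    container names are never interpreted by a shell.
    """
    return [
        "docker", "run", "-d", "--name", name,
        *_SECURITY_ARGS,
        image, "sleep", "infinity",
    ]


cmd = build_run_command("python:3.11-slim", "hermes-sandbox")
# shlex.join is only used here to display the command for humans.
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd)` as-is. The printed string is for inspection only.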
Local Environment Secret Isolation (tools/environments/local.py:28-131):
- Dynamic blocklist derived from provider registry
- Blocks 60+ API key environment variables
- Prevents credential leakage to subprocesses
- Support for _HERMES_FORCE_ prefix overrides
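The filtering described above could look roughly like the following. This is a sketch under stated assumptions: the two blocklist entries are placeholders (the real list is derived from the provider registry), and the override semantics shown here (a variable named `_HERMES_FORCE_X` is passed to the subprocess as `X`) are a guess, not confirmed by local.py.

```python
# Placeholder blocklist; the real one is built dynamically and covers 60+ names.
BLOCKED_ENV_VARS = frozenset({"OPENAI_API_KEY", "ANTHROPIC_API_KEY"})
FORCE_PREFIX = "_HERMES_FORCE_"


def build_subprocess_env(parent_env: dict[str, str]) -> dict[str, str]:
    """Return a copy of parent_env that is safe to hand to a subprocess."""
    env: dict[str, str] = {}
    forced: dict[str, str] = {}
    for key, value in parent_env.items():
        if key.startswith(FORCE_PREFIX):
            # Assumed semantics: strip the prefix and pass the value through.
            forced[key[len(FORCE_PREFIX):]] = value
        elif key not in BLOCKED_ENV_VARS:
            env[key] = value
    env.update(forced)  # explicit overrides win over the blocklist
    return env


parent = {
    "PATH": "/usr/bin",
    "OPENAI_API_KEY": "sk-secret",
    "_HERMES_FORCE_ANTHROPIC_API_KEY": "explicit",
}
print(build_subprocess_env(parent))
```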
3. All Dangerous Command Detection Patterns
3.1 Pattern Categories (from tools/approval.py:40-78)
DANGEROUS_PATTERNS = [
    # File System Destruction
    (r'\brm\s+(-[^\s]*\s+)*/', "delete in root path"),
    (r'\brm\s+-[^\s]*r', "recursive delete"),
    # Permission Escalation
    (r'\bchmod\s+(-[^\s]*\s+)*(777|666|o\+[rwx]*w|a\+[rwx]*w)\b', "world/other-writable permissions"),
    (r'\bchown\s+(-[^\s]*)?R\s+root', "recursive chown to root"),
    # Disk/Filesystem Operations
    (r'\bmkfs\b', "format filesystem"),
    (r'\bdd\s+.*if=', "disk copy"),
    (r'>\s*/dev/sd', "write to block device"),
    # Database Destruction
    (r'\bDROP\s+(TABLE|DATABASE)\b', "SQL DROP"),
    (r'\bDELETE\s+FROM\b(?!.*\bWHERE\b)', "SQL DELETE without WHERE"),
    (r'\bTRUNCATE\s+(TABLE)?\s*\w', "SQL TRUNCATE"),
    # System Configuration
    (r'>\s*/etc/', "overwrite system config"),
    (r'\bsystemctl\s+(stop|disable|mask)\b', "stop/disable system service"),
    # Process Termination
    (r'\bkill\s+-9\s+-1\b', "kill all processes"),
    (r'\bpkill\s+-9\b', "force kill processes"),
    (r'\b(pkill|killall)\b.*\b(hermes|gateway|cli\.py)\b', "kill hermes/gateway"),
    # Code Injection
    (r':\(\)\s*\{\s*:\s*\|\s*:\s*&\s*\}\s*;\s*:', "fork bomb"),
    (r'\b(bash|sh|zsh|ksh)\s+-[^\s]*c(\s+|$)', "shell command via -c flag"),
    (r'\b(curl|wget)\b.*\|\s*(ba)?sh\b', "pipe remote content to shell"),
    (r'\b(bash|sh|zsh|ksh)\s+<\s*<?\s*\(\s*(curl|wget)\b', "execute remote script via process substitution"),
    # Sensitive Path Writes
    (rf'\btee\b.*["\']?{_SENSITIVE_WRITE_TARGET}', "overwrite system file via tee"),
    (rf'>>?\s*["\']?{_SENSITIVE_WRITE_TARGET}', "overwrite system file via redirection"),
    # File Operations
    (r'\bxargs\s+.*\brm\b', "xargs with rm"),
    (r'\bfind\b.*-exec\s+(/\S*/)?rm\b', "find -exec rm"),
    (r'\bfind\b.*-delete\b', "find -delete"),
    (r'\b(cp|mv|install)\b.*\s/etc/', "copy/move file into /etc/"),
    (r'\bsed\s+-[^\s]*i.*\s/etc/', "in-place edit of system config"),
    # Gateway Protection
    (r'gateway\s+run\b.*(&\s*$|&\s*;|\bdisown\b|\bsetsid\b)', "start gateway outside systemd"),
    (r'\bnohup\b.*gateway\s+run\b', "start gateway outside systemd"),
]
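To make the matching step concrete, here is a minimal sketch that checks a command against three of the patterns listed above. The function name `detect_dangerous` and the use of `re.IGNORECASE` are assumptions for illustration; approval.py may normalize and match commands differently.

```python
import re
from typing import Optional

# A small subset of the patterns above, copied verbatim.
DANGEROUS_PATTERNS = [
    (r'\brm\s+-[^\s]*r', "recursive delete"),
    (r'\b(curl|wget)\b.*\|\s*(ba)?sh\b', "pipe remote content to shell"),
    (r'\bDELETE\s+FROM\b(?!.*\bWHERE\b)', "SQL DELETE without WHERE"),
]


def detect_dangerous(command: str) -> Optional[str]:
    """Return the description of the first matching pattern, or None."""
    for pattern, description in DANGEROUS_PATTERNS:
        if re.search(pattern, command, re.IGNORECASE):
            return description
    return None


for cmd in ("rm -rf build", "DELETE FROM users WHERE id = 1", "ls -la"):
    print(cmd, "->", detect_dangerous(cmd))
```

Because these are plain regexes, they can over-match: the "recursive delete" pattern fires on any `rm` flag containing the letter `r`, such as `rm --force`. That is one reason the later review layers exist.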
3.2 Sensitive Path Patterns
# SSH keys
_SSH_SENSITIVE_PATH = r'(?:~|\$home|\$\{home\})/\.ssh(?:/|$)'

# Hermes environment
_HERMES_ENV_PATH = (
    r'(?:~\/\.hermes/|'
    r'(?:\$home|\$\{home\})/\.hermes/|'
    r'(?:\$hermes_home|\$\{hermes_home\})/)'
    r'\.env\b'
)

# System paths
_SENSITIVE_WRITE_TARGET = (
    r'(?:/etc/|/dev/sd|'
    rf'{_SSH_SENSITIVE_PATH}|'
    rf'{_HERMES_ENV_PATH})'
)
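The fragments above spell the variables in lower case (`$home`, `$hermes_home`), which suggests the command is lowercased or matched case-insensitively before comparison. The sketch below assumes case-insensitive matching and wires the fragments into the redirection pattern from section 3.1. The helper name `writes_sensitive_path` is hypothetical.

```python
import re

_SSH_SENSITIVE_PATH = r'(?:~|\$home|\$\{home\})/\.ssh(?:/|$)'
_HERMES_ENV_PATH = (
    r'(?:~\/\.hermes/|'
    r'(?:\$home|\$\{home\})/\.hermes/|'
    r'(?:\$hermes_home|\$\{hermes_home\})/)'
    r'\.env\b'
)
_SENSITIVE_WRITE_TARGET = (
    r'(?:/etc/|/dev/sd|'
    rf'{_SSH_SENSITIVE_PATH}|'
    rf'{_HERMES_ENV_PATH})'
)

# Same regex as the "overwrite system file via redirection" entry;
# IGNORECASE is an assumption so that $HOME matches the lowercase fragment.
REDIRECT_TO_SENSITIVE = re.compile(
    r'>>?\s*["\']?' + _SENSITIVE_WRITE_TARGET, re.IGNORECASE
)


def writes_sensitive_path(command: str) -> bool:
    """True if the command redirects output into a protected location."""
    return REDIRECT_TO_SENSITIVE.search(command) is not None


for cmd in ("echo key >> $HOME/.ssh/authorized_keys", "echo hi > notes.txt"):
    print(cmd, "->", writes_sensitive_path(cmd))
```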
3.3 Approval Flow States
Command Input
│
▼
┌─────────────────────┐
│ Pattern Detection │────┐
│ (approval.py) │ │
└─────────────────────┘ │
│ │
▼ │
┌─────────────────────┐ │
│ Tirith Scanner │────┤
│ (tirith_security.py)│ │
└─────────────────────┘ │
│ │
▼ │
┌─────────────────────┐ │
│ Mode = smart? │────┼──▶ Smart Approval (aux LLM)
│ │ │
└─────────────────────┘ │
│ │
▼ │
┌─────────────────────┐ │
│ Gateway/CLI? │────┼──▶ Async Approval Prompt
│ │ │
└─────────────────────┘ │
│ │
▼ │
┌─────────────────────┐ │
│ Interactive Prompt │◀───┘
│ (once/session/ │
│ always/deny) │
└─────────────────────┘
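The four prompt answers at the bottom of this flow (once/session/always/deny) imply some bookkeeping about what has already been approved. A minimal sketch of that bookkeeping follows. The class names and the exact semantics of each scope are assumptions for illustration, not a description of approval.py.

```python
from enum import Enum


class Choice(Enum):
    ONCE = "once"
    SESSION = "session"
    ALWAYS = "always"
    DENY = "deny"


class ApprovalState:
    """Hypothetical store of approvals keyed by matched pattern description."""

    def __init__(self) -> None:
        self.session_approved: set[str] = set()
        self.always_approved: set[str] = set()  # a real system would persist this

    def is_preapproved(self, pattern_key: str) -> bool:
        return pattern_key in self.session_approved or pattern_key in self.always_approved

    def record(self, pattern_key: str, choice: Choice) -> bool:
        """Apply the user's answer; return True if the command may run now."""
        if choice is Choice.DENY:
            return False
        if choice is Choice.SESSION:
            self.session_approved.add(pattern_key)
        elif choice is Choice.ALWAYS:
            self.always_approved.add(pattern_key)
        return True  # ONCE approves this call only and stores nothing

    def end_session(self) -> None:
        self.session_approved.clear()


state = ApprovalState()
state.record("recursive delete", Choice.SESSION)
print(state.is_preapproved("recursive delete"))
```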
4. Tool Improvement Recommendations
4.1 Critical Improvements
| # | Recommendation | Description | Impact | Effort |
|---|---|---|---|---|
| 1 | Implement tool call result caching | Cache file reads, search results with TTL to prevent redundant I/O | High | Medium |
| 2 | Add tool execution metrics/observability | Track duration, success rates, token usage per tool for optimization | High | Low |
| 3 | Implement tool retry with exponential backoff | Terminal tool has basic retry (terminal_tool.py:1105-1130) but could be generalized | Medium | Low |
| 4 | Add tool call rate limiting per session | Prevent runaway loops (e.g., 1000+ search calls in one session) | Medium | Medium |
| 5 | Create tool health check system | Periodic validation that tools are functioning (API keys valid, services up) | Medium | Medium |
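Recommendation 1 could be prototyped with a small time-to-live cache. The following is a sketch only: `TTLCache` and its `get_or_compute` method are hypothetical names, and the clock is injectable purely so the expiry behavior can be demonstrated without sleeping.

```python
import time
from typing import Any, Callable, Hashable


class TTLCache:
    """Hypothetical result cache: entries expire ttl_seconds after being stored."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[Hashable, tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: skip the expensive call
        value = compute()
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        self._entries.pop(key, None)


# Demonstration with a fake clock.
fake_now = [0.0]
cache = TTLCache(ttl_seconds=10, clock=lambda: fake_now[0])
reads = []


def slow_read() -> str:
    reads.append(1)
    return "file contents"


cache.get_or_compute(("read_file", "a.txt"), slow_read)
cache.get_or_compute(("read_file", "a.txt"), slow_read)  # served from cache
fake_now[0] = 11.0
cache.get_or_compute(("read_file", "a.txt"), slow_read)  # expired, recomputed
print(len(reads))
```

For file reads, a TTL alone is not enough: a write to the same path through write_file or patch would also need to call `invalidate`, otherwise stale contents could be returned.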
4.2 Security Enhancements
| # | Recommendation | Description | Impact | Effort |
|---|---|---|---|---|
| 6 | Implement command intent classification | Use lightweight model to classify commands before execution for better risk assessment | High | Medium |
| 7 | Add network egress filtering for sandbox tools | Whitelist domains for web_extract, block known malicious IPs | High | Medium |
| 8 | Implement tool call provenance logging | Immutable log of what tools were called with what args for audit | Medium | Low |
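One way to approach recommendation 8 is a hash-chained log, where each record commits to the previous one so that later edits become detectable. This is a sketch of that general technique rather than anything present in Hermes. `ProvenanceLog` is a hypothetical name, and the result is tamper-evident, not truly immutable.

```python
import hashlib
import json


class ProvenanceLog:
    """Hypothetical append-only log; each record hashes its predecessor."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.records: list[dict] = []

    @staticmethod
    def _digest(tool: str, args: dict, prev: str) -> str:
        payload = json.dumps({"tool": tool, "args": args, "prev": prev}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, tool: str, args: dict) -> dict:
        prev = self.records[-1]["hash"] if self.records else self.GENESIS
        args_copy = json.loads(json.dumps(args))  # detach from the caller's dict
        record = {"tool": tool, "args": args_copy, "prev": prev,
                  "hash": self._digest(tool, args_copy, prev)}
        self.records.append(record)
        return record

    def verify(self) -> bool:
        """Recompute the chain; any edited record breaks it."""
        prev = self.GENESIS
        for record in self.records:
            expected = self._digest(record["tool"], record["args"], prev)
            if record["prev"] != prev or record["hash"] != expected:
                return False
            prev = record["hash"]
        return True


log = ProvenanceLog()
log.append("read_file", {"path": "README.md"})
log.append("terminal", {"command": "ls"})
print(log.verify())
```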
4.3 Usability Improvements
| # | Recommendation | Description | Impact | Effort |
|---|---|---|---|---|
| 9 | Add tool suggestion system | When LLM uses suboptimal pattern (cat vs read_file), suggest better alternative | Medium | Medium |
| 10 | Implement progressive tool disclosure | Start with minimal toolset, expand based on task complexity indicators | Medium | High |
5. Missing Tool Coverage Gaps
5.1 High-Priority Gaps
| Gap | Use Case | Current Workaround |
|---|---|---|
| Database query tool | SQL database exploration | terminal with sqlite3/psql |
| API testing tool | REST API debugging (curl alternative) | terminal with curl |
| Git operations tool | Structured git commands (status, diff, log) | terminal with git |
| Package manager tool | Structured pip/npm/apt operations | terminal with package managers |
| Archive/zip tool | Create/extract archives | terminal with tar/unzip |
5.2 Medium-Priority Gaps
| Gap | Use Case | Current Workaround |
|---|---|---|
| Diff tool | Structured file comparison | search_files + manual compare |
| JSON/YAML manipulation | Structured config editing | read_file + write_file |
| Image manipulation | Resize, crop, convert images | terminal with ImageMagick |
| PDF operations | Extract text, merge, split | terminal with pdftotext |
| Data visualization | Generate charts from data | code_execution with matplotlib |
5.3 Advanced Gaps
| Gap | Description |
|---|---|
| Vector database tool | Semantic search over embeddings |
| Test runner tool | Structured test execution with parsing |
| Linter/formatter tool | Code quality checks with structured output |
| Dependency analysis tool | Visualize and analyze code dependencies |
| Documentation generator tool | Auto-generate docs from code |
6. Tool Registry Architecture
6.1 Registration Flow
# From tools/registry.py
class ToolRegistry:
    def register(self, name: str, toolset: str, schema: dict,
                 handler: Callable, check_fn: Callable = None, ...)
    def dispatch(self, name: str, args: dict, **kwargs) -> str
    def get_definitions(self, tool_names: Set[str], quiet: bool = False) -> List[dict]
6.2 Tool Entry Structure
class ToolEntry:
    __slots__ = (
        "name",          # Tool identifier
        "toolset",       # Category (file, terminal, web, etc.)
        "schema",        # OpenAI-format JSON schema
        "handler",       # Callable implementation
        "check_fn",      # Availability check (returns bool)
        "requires_env",  # Required env var names
        "is_async",      # Whether handler is async
        "description",   # Human-readable description
        "emoji",         # Visual identifier
    )
6.3 Registration Example (file_tools.py:560-563)
registry.register(
    name="read_file",
    toolset="file",
    schema=READ_FILE_SCHEMA,
    handler=_handle_read_file,
    check_fn=_check_file_reqs,
    emoji="📖",
)
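To show how registration and dispatch fit together, here is a deliberately reduced registry. It is a sketch, not the code in registry.py: the handler signature `handler(args, **kwargs) -> str`, the JSON error payloads, and the trimmed `ToolEntry` are all assumptions.

```python
import json
from typing import Callable, Optional


class ToolEntry:
    # Reduced to the fields this sketch needs.
    __slots__ = ("name", "toolset", "schema", "handler", "check_fn")

    def __init__(self, name: str, toolset: str, schema: dict,
                 handler: Callable, check_fn: Optional[Callable] = None) -> None:
        self.name = name
        self.toolset = toolset
        self.schema = schema
        self.handler = handler
        self.check_fn = check_fn


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, ToolEntry] = {}

    def register(self, name: str, toolset: str, schema: dict,
                 handler: Callable, check_fn: Optional[Callable] = None) -> None:
        self._tools[name] = ToolEntry(name, toolset, schema, handler, check_fn)

    def dispatch(self, name: str, args: dict, **kwargs) -> str:
        entry = self._tools.get(name)
        if entry is None:
            return json.dumps({"error": f"Unknown tool: {name}"})
        # check_fn gates tools whose requirements (keys, binaries) are missing.
        if entry.check_fn is not None and not entry.check_fn():
            return json.dumps({"error": f"Tool unavailable: {name}"})
        return entry.handler(args, **kwargs)


registry = ToolRegistry()
registry.register(
    name="echo",
    toolset="demo",
    schema={"name": "echo", "parameters": {"type": "object"}},
    handler=lambda args, **kw: json.dumps({"echo": args.get("text", "")}),
)
print(registry.dispatch("echo", {"text": "hi"}))
```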
7. Toolset Composition System
7.1 Toolset Definition (toolsets.py:72-377)
TOOLSETS = {
    "file": {
        "description": "File manipulation tools",
        "tools": ["read_file", "write_file", "patch", "search_files"],
        "includes": []
    },
    "debugging": {
        "description": "Debugging and troubleshooting toolkit",
        "tools": ["terminal", "process"],
        "includes": ["web", "file"]  # Composes other toolsets
    },
}
7.2 Resolution Algorithm
def resolve_toolset(name: str, visited: Set[str] = None) -> List[str]:
    # 1. Cycle detection
    # 2. Get toolset definition
    # 3. Collect direct tools
    # 4. Recursively resolve includes (diamond deps handled)
    # 5. Return deduplicated list
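The five steps outlined above can be sketched as a short recursive function. This is an illustrative implementation, not the one in toolsets.py: the `web` toolset contents are inferred from the flow diagram, and returning an empty list for unknown names is an assumption.

```python
from typing import List, Optional, Set

TOOLSETS = {
    "file": {"tools": ["read_file", "write_file", "patch", "search_files"], "includes": []},
    # Assumed definition; only the tool names appear in the report.
    "web": {"tools": ["web_search", "web_extract", "web_crawl"], "includes": []},
    "debugging": {"tools": ["terminal", "process"], "includes": ["web", "file"]},
}


def resolve_toolset(name: str, visited: Optional[Set[str]] = None) -> List[str]:
    if visited is None:
        visited = set()
    # 1. Cycle detection: a toolset already seen contributes nothing new.
    #    The same check makes diamond dependencies resolve only once.
    if name in visited:
        return []
    visited.add(name)
    # 2. Get toolset definition (unknown names resolve to nothing here).
    definition = TOOLSETS.get(name)
    if definition is None:
        return []
    # 3. Collect direct tools.
    tools = list(definition.get("tools", []))
    # 4. Recursively resolve includes, sharing the visited set.
    for included in definition.get("includes", []):
        tools.extend(resolve_toolset(included, visited))
    # 5. Deduplicate while preserving first-seen order.
    return list(dict.fromkeys(tools))


print(resolve_toolset("debugging"))
```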
7.3 Platform-Specific Toolsets
| Toolset | Purpose | Key Difference |
|---|---|---|
| hermes-cli | Full CLI access | All tools available |
| hermes-acp | Editor integration | No messaging, audio, or clarify UI |
| hermes-api-server | HTTP API | No interactive UI tools |
| hermes-telegram | Telegram bot | Full access with safety checks |
| hermes-gateway | Union of all messaging | Includes all platform tools |
8. Environment Backend Deep Dive
8.1 Base Class Interface (tools/environments/base.py)
class BaseEnvironment(ABC):
    def execute(self, command: str, cwd: str = "", *,
                timeout: int | None = None,
                stdin_data: str | None = None) -> dict:
        """Return {"output": str, "returncode": int}"""

    def cleanup(self):
        """Release backend resources"""
8.2 Environment Feature Matrix
| Feature | Local | Docker | Modal | SSH | Singularity | Daytona |
|---|---|---|---|---|---|---|
| PTY support | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| Persistent shell | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ |
| Filesystem persistence | Optional | Optional | Snapshots | N/A (remote) | Optional | Yes |
| Interrupt handling | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Sudo support | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Resource limits | ❌ | ✅ | ✅ | ❌ | ✅ | ✅ |
| GPU support | ❌ | ✅ | ✅ | Remote | ✅ | ✅ |
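To illustrate the `execute` contract from section 8.1, the sketch below implements a bare-bones local backend on top of `subprocess.run`. It is not the code in local.py: `MinimalLocalEnvironment` is a hypothetical class, the `@abstractmethod` decorator, the merging of stderr into the output, and the mapping of a timeout to return code 124 are assumptions, and the command is run through `/bin/sh -c` passed as an argument list.

```python
import subprocess
from abc import ABC, abstractmethod
from typing import Optional


class BaseEnvironment(ABC):
    @abstractmethod
    def execute(self, command: str, cwd: str = "", *,
                timeout: Optional[int] = None,
                stdin_data: Optional[str] = None) -> dict:
        """Return {"output": str, "returncode": int}"""

    def cleanup(self) -> None:
        """Release backend resources"""


class MinimalLocalEnvironment(BaseEnvironment):
    def execute(self, command: str, cwd: str = "", *,
                timeout: Optional[int] = None,
                stdin_data: Optional[str] = None) -> dict:
        try:
            proc = subprocess.run(
                ["/bin/sh", "-c", command],  # argv list, no shell=True
                cwd=cwd or None,
                input=stdin_data,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,    # assumption: one merged stream
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # Assumption: report timeouts with the conventional code 124.
            return {"output": "", "returncode": 124}
        return {"output": proc.stdout, "returncode": proc.returncode}


env = MinimalLocalEnvironment()
print(env.execute("printf hello"))
env.cleanup()
```

A real backend would add the features in the matrix above (PTY, persistent shell, interrupt handling). This sketch covers only the request/response shape.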
9. Process Registry System
9.1 Background Process Management (tools/process_registry.py)
class ProcessRegistry:
    def spawn_local(self, command, cwd, task_id, ...) -> ProcessSession
    def spawn_via_env(self, env, command, ...) -> ProcessSession
    def poll(self, session_id: str) -> dict
    def wait(self, session_id: str, timeout: int = None) -> dict
    def kill(self, session_id: str)
9.2 Process Session States
CREATED ──▶ RUNNING ──▶ FINISHED
│ │
▼ ▼
INTERRUPTED TIMEOUT
(exit_code=130) (exit_code=124)
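The two special exit codes follow common Unix conventions: 130 is 128 plus SIGINT (signal 2), and 124 is the code GNU `timeout` returns when the time limit is hit. A minimal sketch of how a finished session might be classified from its exit code follows. The enum and function names are hypothetical and not taken from process_registry.py.

```python
from enum import Enum


class SessionState(Enum):
    CREATED = "created"
    RUNNING = "running"
    FINISHED = "finished"
    INTERRUPTED = "interrupted"
    TIMEOUT = "timeout"


EXIT_CODE_INTERRUPTED = 130  # 128 + SIGINT
EXIT_CODE_TIMEOUT = 124      # same convention as GNU timeout


def classify_exit(exit_code: int) -> SessionState:
    """Map a completed process's exit code onto a terminal session state."""
    if exit_code == EXIT_CODE_INTERRUPTED:
        return SessionState.INTERRUPTED
    if exit_code == EXIT_CODE_TIMEOUT:
        return SessionState.TIMEOUT
    # Any other code, zero or not, is an ordinary completion.
    return SessionState.FINISHED


for code in (0, 1, 124, 130):
    print(code, classify_exit(code).value)
```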
10. Code Analysis Summary
10.1 Lines of Code by Component
| Component | Files | Approx. LOC |
|---|---|---|
| Tool Implementations | 30+ | ~15,000 |
| Environment Backends | 6 | ~3,500 |
| Registry & Core | 2 | ~800 |
| Security (approval, tirith) | 2 | ~1,200 |
| Process Management | 1 | ~900 |
| Total | 40+ | ~21,400 |
10.2 Test Coverage
- 150+ test files in tests/tools/
- Unit tests for each tool
- Integration tests for environments
- Security-focused tests for approval system
Appendix A: File Organization
tools/
├── registry.py # Tool registration & dispatch
├── __init__.py # Package exports
│
├── file_tools.py # read_file, write_file, patch, search_files
├── file_operations.py # ShellFileOperations backend
│
├── terminal_tool.py # Main terminal execution (1,358 lines)
├── process_registry.py # Background process management
│
├── web_tools.py # web_search, web_extract, web_crawl (1,843 lines)
├── browser_tool.py # Browser automation (1,955 lines)
├── browser_providers/ # Browserbase, BrowserUse providers
│
├── approval.py # Dangerous command detection (670 lines)
├── tirith_security.py # External security scanner (670 lines)
│
├── environments/ # Execution backends
│ ├── base.py # BaseEnvironment ABC
│ ├── local.py # Local subprocess (486 lines)
│ ├── docker.py # Docker containers (535 lines)
│ ├── modal.py # Modal cloud (372 lines)
│ ├── ssh.py # SSH remote (307 lines)
│ ├── singularity.py # Singularity/Apptainer
│ ├── daytona.py # Daytona workspaces
│ └── persistent_shell.py # Shared persistent shell mixin
│
├── code_execution_tool.py # Programmatic tool calling (806 lines)
├── delegate_tool.py # Subagent spawning (794 lines)
│
├── skills_tool.py # Skill management (1,344 lines)
├── skill_manager_tool.py # Skill CRUD operations
│
└── [20+ additional tools...]
toolsets.py # Toolset definitions (641 lines)
Report generated from comprehensive analysis of the Hermes agent tool system.