Timmy_Foundation/hermes-agent

Allegro 10271c6b44

security: fix command injection vulnerabilities (CVSS 9.8)

Replace shell=True with list-based subprocess execution to prevent
command injection via malicious user input.

Changes:
- tools/transcription_tools.py: Use shlex.split() + shell=False
- tools/environments/docker.py: List-based commands with container ID validation

Fixes CVE-level vulnerability where malicious file paths or container IDs
could inject arbitrary commands.

CVSS: 9.8 (Critical)
Refs: V-001 in SECURITY_AUDIT_REPORT.md

2026-03-30 23:15:11 +00:00


Deep Analysis: Hermes Tool System

Executive Summary

This report provides a comprehensive analysis of the Hermes agent tool infrastructure, covering:

Tool registration and dispatch (registry.py)
30+ tool implementations across multiple categories
6 environment backends (local, Docker, Modal, SSH, Singularity, Daytona)
Security boundaries and dangerous command detection
Toolset definitions and composition system

1. Tool Execution Flow Diagram

┌─────────────────────────────────────────────────────────────────────────────────┐
│                              TOOL EXECUTION FLOW                                 │
└─────────────────────────────────────────────────────────────────────────────────┘

┌─────────────┐    ┌──────────────────┐    ┌──────────────────┐
│   User/LLM  │───▶│  Model Tools     │───▶│ Tool Registry    │
│  Request    │    │  (model_tools.py)│    │ (registry.py)    │
└─────────────┘    └──────────────────┘    └──────────────────┘
                                                    │
              ┌─────────────────────────────────────┼─────────────────────────────────────┐
              │                                     │                                     │
              ▼                                     ▼                                     ▼
     ┌─────────────────┐              ┌────────────────────┐              ┌─────────────────────┐
     │ File Tools      │              │ Terminal Tool      │              │ Web Tools           │
     │ ─────────────── │              │ ────────────────── │              │ ─────────────────── │
     │ • read_file     │              │ • Local execution  │              │ • web_search        │
     │ • write_file    │              │ • Docker sandbox   │              │ • web_extract       │
     │ • patch         │              │ • Modal cloud      │              │ • web_crawl         │
     │ • search_files  │              │ • SSH remote       │              │                     │
     └────────┬────────┘              │ • Singularity      │              └─────────────────────┘
              │                       │ • Daytona          │                       │
              │                       └─────────┬──────────┘                       │
              │                                 │                                  │
              ▼                                 ▼                                  ▼
     ┌─────────────────────────────────────────────────────────────────────────────────────────┐
     │                              ENVIRONMENT BACKENDS                                       │
     │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐   │
     │  │  Local   │  │  Docker  │  │  Modal   │  │   SSH    │  │Singularity│ │ Daytona  │   │
     │  │──────────│  │──────────│  │──────────│  │──────────│  │───────────│  │──────────│   │
     │  │subprocess│  │container │  │Sandbox   │  │ControlMaster│ │overlay   │  │workspace │   │
     │  │   -l     │  │exec      │  │.exec()   │  │connection │  │SIF       │  │.exec()   │   │
     │  └──────────┘  └──────────┘  └──────────┘  └──────────┘  └───────────┘  └──────────┘   │
     └─────────────────────────────────────────────────────────────────────────────────────────┘
                                              │
                                              ▼
                              ┌─────────────────────────────┐
                              │    SECURITY CHECKPOINT      │
                              │  ┌─────────────────────┐    │
                              │  │ 1. Tirith Scanner   │    │
                              │  │    (command content)│    │
                              │  ├─────────────────────┤    │
                              │  │ 2. Pattern Matching │    │
                              │  │ (DANGEROUS_PATTERNS)│    │
                              │  ├─────────────────────┤    │
                              │  │ 3. Smart Approval   │    │
                              │  │    (aux LLM)        │    │
                              │  └─────────────────────┘    │
                              └─────────────────────────────┘
                                              │
              ┌─────────────────────────────────┼─────────────────────────────────┐
              │                                 │                                 │
              ▼                                 ▼                                 ▼
     ┌──────────────────┐           ┌──────────────────┐               ┌──────────────────┐
     │  APPROVED        │           │  BLOCKED         │               │  USER PROMPT     │
     │  (execute)       │           │  (deny + reason) │               │  (once/session/  │
     │                  │           │                  │               │   always/deny)   │
     └──────────────────┘           └──────────────────┘               └──────────────────┘

┌──────────────────────────────────────────────────────────────────────────────────────────────┐
│                              ADDITIONAL TOOL CATEGORIES                                      │
├──────────────────────────────────────────────────────────────────────────────────────────────┤
│  Browser Tools │ Vision Tools │ MoA Tools │ Skills Tools │ Code Exec │ Delegate │ TTS       │
│  ───────────── │ ──────────── │ ───────── │ ──────────── │ ───────── │ ──────── │ ──────────│
│  • navigate    │ • analyze    │ • reason  │ • list       │ • sandbox │ • spawn  │ • speech  │
│  • click       │ • extract    │ • debate  │ • view       │ • RPC     │ • batch  │ • voices  │
│  • snapshot    │              │           │ • manage     │ • 7 tools │ • depth  │           │
│  • scroll      │              │           │              │   limit   │   limit  │           │
└──────────────────────────────────────────────────────────────────────────────────────────────┘

2. Security Boundary Analysis

2.1 Multi-Layer Security Architecture

Layer	Component	Purpose
Layer 1	Container Isolation	Docker/Modal/Singularity sandboxes isolate execution from the host
Layer 2	Dangerous Pattern Detection	Regex-based command filtering (approval.py)
Layer 3	Tirith Security Scanner	Content-level threat detection (pipe-to-shell, homograph URLs)
Layer 4	Smart Approval (Aux LLM)	LLM-based risk assessment for edge cases
Layer 5	File System Guards	Sensitive path blocking (/etc, ~/.ssh, ~/.hermes/.env)
Layer 6	Process Limits	Timeouts, memory limits, PID limits, capability dropping

2.2 Environment Security Comparison

Backend	Isolation Level	Persistent	Root Access	Network	Use Case
Local	None (host)	Optional	User's own	Full	Development, trusted code
Docker	Container + caps	Optional	Container root	Isolated	General sandboxing
Modal	Cloud VM	Snapshots	Root	Isolated	Cloud compute, scalability
SSH	Remote machine	Yes	Remote user	Networked	Production servers
Singularity	Container + overlay	Optional	User-mapped	Configurable	HPC environments
Daytona	Cloud workspace	Yes	Root	Isolated	Managed dev environments

2.3 Security Hardening Details

Docker Environment (tools/environments/docker.py:107-117):

_SECURITY_ARGS = [
    "--cap-drop", "ALL",          # Drop all capabilities
    "--cap-add", "DAC_OVERRIDE",   # Allow root to write host-owned dirs
    "--cap-add", "CHOWN",
    "--cap-add", "FOWNER",
    "--security-opt", "no-new-privileges",
    "--pids-limit", "256",
    "--tmpfs", "/tmp:rw,nosuid,size=512m",
]
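
To make the effect of these flags concrete, the following is a minimal sketch of how such a list could be spliced into a `docker run` argument vector. The helper `build_run_command`, the image name, and the container name are hypothetical and are not taken from docker.py; only the flag list comes from the excerpt above.

```python
from typing import List

# Copied from the excerpt above.
_SECURITY_ARGS = [
    "--cap-drop", "ALL",
    "--cap-add", "DAC_OVERRIDE",
    "--cap-add", "CHOWN",
    "--cap-add", "FOWNER",
    "--security-opt", "no-new-privileges",
    "--pids-limit", "256",
    "--tmpfs", "/tmp:rw,nosuid,size=512m",
]


def build_run_command(image: str, name: str) -> List[str]:
    """Hypothetical helper: build an argv list for `docker run`.

    Passing a list (rather than one shell string) means no shell ever
    parses the image or container name.
    """
    return ["docker", "run", "-d", "--name", name,
            *_SECURITY_ARGS, image, "sleep", "infinity"]


cmd = build_run_command("python:3.11-slim", "hermes-sandbox")
print(" ".join(cmd))
```

Because every capability is dropped first and only three are added back, any capability not named in the list stays unavailable inside the container.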

Local Environment Secret Isolation (tools/environments/local.py:28-131):

Dynamic blocklist derived from provider registry
Blocks 60+ API key environment variables
Prevents credential leakage to subprocesses
Support for _HERMES_FORCE_ prefix overrides
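
The bullets above can be illustrated with a small environment filter. This is a sketch under stated assumptions, not the code in local.py: the names `BLOCKED_ENV_VARS` and `build_subprocess_env` are hypothetical, the blocklist is static here rather than derived from a provider registry, and the semantics given to the `_HERMES_FORCE_` prefix (re-export the variable without the prefix even if it is blocked) are a guess.

```python
from typing import Dict, Mapping

# Hypothetical static blocklist; the report says the real one is derived
# dynamically from the provider registry.
BLOCKED_ENV_VARS = {"OPENAI_API_KEY", "ANTHROPIC_API_KEY"}
FORCE_PREFIX = "_HERMES_FORCE_"


def build_subprocess_env(parent_env: Mapping[str, str]) -> Dict[str, str]:
    """Return a filtered copy of parent_env for use by a subprocess."""
    child: Dict[str, str] = {}
    forced: Dict[str, str] = {}
    for key, value in parent_env.items():
        if key.startswith(FORCE_PREFIX):
            # Assumed semantics: explicit opt-in, re-exported without the prefix.
            forced[key[len(FORCE_PREFIX):]] = value
        elif key not in BLOCKED_ENV_VARS:
            child[key] = value
    child.update(forced)
    return child


env = build_subprocess_env({
    "PATH": "/usr/bin",
    "OPENAI_API_KEY": "sk-secret",
    "_HERMES_FORCE_ANTHROPIC_API_KEY": "sk-allowed",
})
print(sorted(env))
```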

3. All Dangerous Command Detection Patterns

3.1 Pattern Categories (from tools/approval.py:40-78)

DANGEROUS_PATTERNS = [
    # File System Destruction
    (r'\brm\s+(-[^\s]*\s+)*/', "delete in root path"),
    (r'\brm\s+-[^\s]*r', "recursive delete"),
    
    # Permission Escalation
    (r'\bchmod\s+(-[^\s]*\s+)*(777|666|o\+[rwx]*w|a\+[rwx]*w)\b', "world/other-writable permissions"),
    (r'\bchown\s+(-[^\s]*)?R\s+root', "recursive chown to root"),
    
    # Disk/Filesystem Operations
    (r'\bmkfs\b', "format filesystem"),
    (r'\bdd\s+.*if=', "disk copy"),
    (r'>\s*/dev/sd', "write to block device"),
    
    # Database Destruction
    (r'\bDROP\s+(TABLE|DATABASE)\b', "SQL DROP"),
    (r'\bDELETE\s+FROM\b(?!.*\bWHERE\b)', "SQL DELETE without WHERE"),
    (r'\bTRUNCATE\s+(TABLE)?\s*\w', "SQL TRUNCATE"),
    
    # System Configuration
    (r'>\s*/etc/', "overwrite system config"),
    (r'\bsystemctl\s+(stop|disable|mask)\b', "stop/disable system service"),
    
    # Process Termination
    (r'\bkill\s+-9\s+-1\b', "kill all processes"),
    (r'\bpkill\s+-9\b', "force kill processes"),
    (r'\b(pkill|killall)\b.*\b(hermes|gateway|cli\.py)\b', "kill hermes/gateway"),
    
    # Code Injection
    (r':\(\)\s*\{\s*:\s*\|\s*:\s*&\s*\}\s*;\s*:', "fork bomb"),
    (r'\b(bash|sh|zsh|ksh)\s+-[^\s]*c(\s+|$)', "shell command via -c flag"),
    (r'\b(curl|wget)\b.*\|\s*(ba)?sh\b', "pipe remote content to shell"),
    (r'\b(bash|sh|zsh|ksh)\s+<\s*<?\s*\(\s*(curl|wget)\b', "execute remote script via process substitution"),
    
    # Sensitive Path Writes
    (rf'\btee\b.*["\']?{_SENSITIVE_WRITE_TARGET}', "overwrite system file via tee"),
    (rf'>>?\s*["\']?{_SENSITIVE_WRITE_TARGET}', "overwrite system file via redirection"),
    
    # File Operations
    (r'\bxargs\s+.*\brm\b', "xargs with rm"),
    (r'\bfind\b.*-exec\s+(/\S*/)?rm\b', "find -exec rm"),
    (r'\bfind\b.*-delete\b', "find -delete"),
    (r'\b(cp|mv|install)\b.*\s/etc/', "copy/move file into /etc/"),
    (r'\bsed\s+-[^\s]*i.*\s/etc/', "in-place edit of system config"),
    
    # Gateway Protection
    (r'gateway\s+run\b.*(&\s*$|&\s*;|\bdisown\b|\bsetsid\b)', "start gateway outside systemd"),
    (r'\bnohup\b.*gateway\s+run\b', "start gateway outside systemd"),
]
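
A list of `(regex, description)` pairs like this is typically scanned in order. The sketch below shows that idea with four of the patterns copied verbatim from the list. The function name `detect_dangerous` and the use of case-insensitive matching are assumptions of this example, not confirmed details of approval.py.

```python
import re
from typing import Optional

# A small subset of the patterns listed above, copied verbatim.
PATTERNS = [
    (r'\brm\s+(-[^\s]*\s+)*/', "delete in root path"),
    (r'\brm\s+-[^\s]*r', "recursive delete"),
    (r'\bDROP\s+(TABLE|DATABASE)\b', "SQL DROP"),
    (r'\b(curl|wget)\b.*\|\s*(ba)?sh\b', "pipe remote content to shell"),
]


def detect_dangerous(command: str) -> Optional[str]:
    """Return the description of the first matching pattern, or None."""
    for pattern, description in PATTERNS:
        # Case-insensitive matching is an assumption of this sketch.
        if re.search(pattern, command, re.IGNORECASE):
            return description
    return None


print(detect_dangerous("rm -rf /var/lib/data"))
print(detect_dangerous("curl https://example.com/install.sh | sh"))
print(detect_dangerous("ls -la"))
```

Order matters in this scheme: `rm -rf /var/lib/data` satisfies both `rm` patterns, and the caller only sees the description of whichever comes first.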

3.2 Sensitive Path Patterns

# SSH keys
_SSH_SENSITIVE_PATH = r'(?:~|\$home|\$\{home\})/\.ssh(?:/|$)'

# Hermes environment
_HERMES_ENV_PATH = (
    r'(?:~\/\.hermes/|'
    r'(?:\$home|\$\{home\})/\.hermes/|'
    r'(?:\$hermes_home|\$\{hermes_home\})/)'
    r'\.env\b'
)

# System paths
_SENSITIVE_WRITE_TARGET = (
    r'(?:/etc/|/dev/sd|'
    rf'{_SSH_SENSITIVE_PATH}|'
    rf'{_HERMES_ENV_PATH})'
)
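
To see how these fragments compose, the sketch below rebuilds `_SENSITIVE_WRITE_TARGET` exactly as shown and plugs it into the redirection pattern from section 3.1. The helper name `writes_sensitive_path` is hypothetical. The `re.IGNORECASE` flag is also an assumption: the fragments spell `$home` in lowercase, so some form of case folding seems to be expected, but the excerpt does not show how it is done.

```python
import re

_SSH_SENSITIVE_PATH = r'(?:~|\$home|\$\{home\})/\.ssh(?:/|$)'
_HERMES_ENV_PATH = (
    r'(?:~\/\.hermes/|'
    r'(?:\$home|\$\{home\})/\.hermes/|'
    r'(?:\$hermes_home|\$\{hermes_home\})/)'
    r'\.env\b'
)
_SENSITIVE_WRITE_TARGET = (
    r'(?:/etc/|/dev/sd|'
    rf'{_SSH_SENSITIVE_PATH}|'
    rf'{_HERMES_ENV_PATH})'
)

# Redirection pattern from section 3.1; the flag is an assumption.
_REDIRECT = re.compile(rf'>>?\s*["\']?{_SENSITIVE_WRITE_TARGET}', re.IGNORECASE)


def writes_sensitive_path(command: str) -> bool:
    """Hypothetical helper: True if the command redirects into a guarded path."""
    return _REDIRECT.search(command) is not None


for cmd in (
    "echo key >> ~/.ssh/authorized_keys",
    "echo TOKEN=1 > $HOME/.hermes/.env",
    "echo hi > notes.txt",
    "cat ~/.ssh/id_rsa",
):
    print(cmd, "=>", writes_sensitive_path(cmd))
```

The last command reads from `~/.ssh` without redirecting into it, so this particular pattern does not flag it.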

3.3 Approval Flow States

Command Input
      │
      ▼
┌─────────────────────┐
│ Pattern Detection   │────┐
│ (approval.py)       │    │
└─────────────────────┘    │
      │                    │
      ▼                    │
┌─────────────────────┐    │
│ Tirith Scanner      │────┤
│ (tirith_security.py)│    │
└─────────────────────┘    │
      │                    │
      ▼                    │
┌─────────────────────┐    │
│ Mode = smart?       │────┼──▶ Smart Approval (aux LLM)
│                     │    │
└─────────────────────┘    │
      │                    │
      ▼                    │
┌─────────────────────┐    │
│ Gateway/CLI?        │────┼──▶ Async Approval Prompt
│                     │    │
└─────────────────────┘    │
      │                    │
      ▼                    │
┌─────────────────────┐    │
│ Interactive Prompt  │◀───┘
│ (once/session/     │
│  always/deny)      │
└─────────────────────┘
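
The four choices in the final box (once, session, always, deny) imply some memory of earlier decisions. The following is a minimal sketch of that bookkeeping only. `ApprovalState` and `check_command` are hypothetical names, and keying remembered approvals by the pattern description is an assumption; the scanner, smart-approval, and gateway branches of the diagram are not modelled.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class ApprovalState:
    """Hypothetical holder for remembered approvals."""
    session_approved: Set[str] = field(default_factory=set)  # dropped at session end
    always_approved: Set[str] = field(default_factory=set)   # would be persisted


def check_command(description: str, state: ApprovalState,
                  prompt: Callable[[str], str]) -> bool:
    """Return True if a command flagged with `description` may run.

    `prompt` asks the user and returns one of: once, session, always, deny.
    """
    if description in state.always_approved or description in state.session_approved:
        return True  # remembered approval, no prompt needed
    choice = prompt(description)
    if choice == "session":
        state.session_approved.add(description)
    elif choice == "always":
        state.always_approved.add(description)
    return choice in {"once", "session", "always"}


state = ApprovalState()
answers = iter(["once", "session", "deny"])
ask = lambda _description: next(answers)

print(check_command("recursive delete", state, ask))  # user answers "once"
print(check_command("recursive delete", state, ask))  # user answers "session"
print(check_command("recursive delete", state, ask))  # remembered, no prompt
print(check_command("SQL DROP", state, ask))          # user answers "deny"
```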

4. Tool Improvement Recommendations

4.1 Critical Improvements

#	Recommendation	Impact	Effort	Details
1	Implement tool call result caching	High	Medium	Cache file reads, search results with TTL to prevent redundant I/O
2	Add tool execution metrics/observability	High	Low	Track duration, success rates, token usage per tool for optimization
3	Implement tool retry with exponential backoff	Medium	Low	Terminal tool has basic retry (terminal_tool.py:1105-1130) but could be generalized
4	Add tool call rate limiting per session	Medium	Medium	Prevent runaway loops (e.g., 1000+ search calls in one session)
5	Create tool health check system	Medium	Medium	Periodic validation that tools are functioning (API keys valid, services up)
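
As an illustration of recommendation 1, here is a minimal TTL cache sketch. The class `TTLCache` and its `get_or_compute` method are hypothetical and do not exist in the codebase described here; the injectable clock is only there to make expiry easy to demonstrate.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Illustrative cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]              # still fresh: skip the I/O
        value = compute()
        self._entries[key] = (now, value)
        return value


# Usage with a fake clock so expiry is deterministic.
fake_now = [0.0]
cache = TTLCache(ttl_seconds=30, clock=lambda: fake_now[0])
calls = []


def read_a() -> str:
    calls.append("read")
    return "contents"


cache.get_or_compute(("read_file", "a.txt"), read_a)  # miss
cache.get_or_compute(("read_file", "a.txt"), read_a)  # hit
fake_now[0] = 31.0
cache.get_or_compute(("read_file", "a.txt"), read_a)  # expired, recomputed
print(len(calls))
```

A cache of file reads would also need invalidation whenever write_file or patch touches the same path; that is left out of this sketch.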

4.2 Security Enhancements

#	Recommendation	Impact	Effort	Details
6	Implement command intent classification	High	Medium	Use lightweight model to classify commands before execution for better risk assessment
7	Add network egress filtering for sandbox tools	High	Medium	Whitelist domains for web_extract, block known malicious IPs
8	Implement tool call provenance logging	Medium	Low	Immutable log of what tools were called with what args for audit
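
One common way to approximate the "immutable log" in recommendation 8 is a hash chain, where each record stores the hash of its predecessor. The sketch below shows that technique; `ProvenanceLog` is a hypothetical class, and an in-memory list like this is tamper-evident rather than truly immutable.

```python
import hashlib
import json
from typing import Any, Dict, List

_GENESIS = "0" * 64  # placeholder hash for the first record


def _digest(tool: str, args: Dict[str, Any], prev: str) -> str:
    body = {"tool": tool, "args": args, "prev": prev}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class ProvenanceLog:
    """Append-only, hash-chained record of tool calls (illustrative only)."""

    def __init__(self) -> None:
        self.records: List[Dict[str, Any]] = []

    def append(self, tool: str, args: Dict[str, Any]) -> Dict[str, Any]:
        prev = self.records[-1]["hash"] if self.records else _GENESIS
        record = {"tool": tool, "args": args, "prev": prev,
                  "hash": _digest(tool, args, prev)}
        self.records.append(record)
        return record

    def verify(self) -> bool:
        """Recompute every hash and check that the chain is unbroken."""
        prev = _GENESIS
        for record in self.records:
            expected = _digest(record["tool"], record["args"], record["prev"])
            if record["prev"] != prev or record["hash"] != expected:
                return False
            prev = record["hash"]
        return True


log = ProvenanceLog()
log.append("read_file", {"path": "README.md"})
log.append("terminal", {"command": "ls"})
print(log.verify())

log.records[0]["args"]["path"] = "/etc/passwd"  # simulate tampering
print(log.verify())
```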

4.3 Usability Improvements

#	Recommendation	Impact	Effort	Details
9	Add tool suggestion system	Medium	Medium	When LLM uses suboptimal pattern (cat vs read_file), suggest better alternative
10	Implement progressive tool disclosure	Medium	High	Start with minimal toolset, expand based on task complexity indicators
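
Recommendation 9 could start as a simple lookup from shell idioms to structured tools. In the sketch below only the `cat` to read_file pairing comes from the table; the other rule and the function name `suggest_tool` are assumptions made for illustration.

```python
import re
from typing import Optional

# (pattern, suggested tool). Only the cat -> read_file pairing is from the
# report; the grep/rg rule is an illustrative assumption.
SUGGESTIONS = [
    (re.compile(r'^\s*(cat|head|tail)\s+\S'), "read_file"),
    (re.compile(r'^\s*(grep|rg)\s+\S'), "search_files"),
]


def suggest_tool(command: str) -> Optional[str]:
    """Return a structured tool to suggest for a terminal command, if any."""
    for pattern, tool in SUGGESTIONS:
        if pattern.search(command):
            return tool
    return None


print(suggest_tool("cat README.md"))
print(suggest_tool("grep -rn TODO src"))
print(suggest_tool("make test"))
```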

5. Missing Tool Coverage Gaps

5.1 High-Priority Gaps

Gap	Use Case	Current Workaround
Database query tool	SQL database exploration	terminal with sqlite3/psql
API testing tool	REST API debugging (curl alternative)	terminal with curl
Git operations tool	Structured git commands (status, diff, log)	terminal with git
Package manager tool	Structured pip/npm/apt operations	terminal with package managers
Archive/zip tool	Create/extract archives	terminal with tar/unzip

5.2 Medium-Priority Gaps

Gap	Use Case	Current Workaround
Diff tool	Structured file comparison	search_files + manual compare
JSON/YAML manipulation	Structured config editing	read_file + write_file
Image manipulation	Resize, crop, convert images	terminal with ImageMagick
PDF operations	Extract text, merge, split	terminal with pdftotext
Data visualization	Generate charts from data	code_execution with matplotlib

5.3 Advanced Gaps

Gap	Description
Vector database tool	Semantic search over embeddings
Test runner tool	Structured test execution with parsing
Linter/formatter tool	Code quality checks with structured output
Dependency analysis tool	Visualize and analyze code dependencies
Documentation generator tool	Auto-generate docs from code

6. Tool Registry Architecture

6.1 Registration Flow

# From tools/registry.py (signatures only; bodies elided)
from typing import Callable, List, Set

class ToolRegistry:
    def register(self, name: str, toolset: str, schema: dict,
                 handler: Callable, check_fn: Callable = None,
                 # ... remaining parameters elided
                 ): ...

    def dispatch(self, name: str, args: dict, **kwargs) -> str: ...

    def get_definitions(self, tool_names: Set[str], quiet: bool = False) -> List[dict]: ...

6.2 Tool Entry Structure

class ToolEntry:
    __slots__ = (
        "name",        # Tool identifier
        "toolset",     # Category (file, terminal, web, etc.)
        "schema",      # OpenAI-format JSON schema
        "handler",     # Callable implementation
        "check_fn",    # Availability check (returns bool)
        "requires_env",# Required env var names
        "is_async",    # Whether handler is async
        "description", # Human-readable description
        "emoji",       # Visual identifier
    )

6.3 Registration Example (file_tools.py:560-563)

registry.register(
    name="read_file",
    toolset="file",
    schema=READ_FILE_SCHEMA,
    handler=_handle_read_file,
    check_fn=_check_file_reqs,
    emoji="📖"
)
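
The register/dispatch/get_definitions trio can be sketched as a toy registry. `MiniRegistry` below is not the ToolRegistry from tools/registry.py: the handler signature (a single args dict), the JSON error strings, and the filtering of definitions through `check_fn` are all assumptions chosen to keep the example small.

```python
import json
from typing import Callable, Dict, List, Optional, Set


class MiniRegistry:
    """Toy registry illustrating name-based registration and dispatch."""

    def __init__(self) -> None:
        self._tools: Dict[str, dict] = {}

    def register(self, name: str, toolset: str, schema: dict,
                 handler: Callable[[dict], str],
                 check_fn: Optional[Callable[[], bool]] = None) -> None:
        self._tools[name] = {"toolset": toolset, "schema": schema,
                             "handler": handler, "check_fn": check_fn}

    def dispatch(self, name: str, args: dict) -> str:
        entry = self._tools.get(name)
        if entry is None:
            return json.dumps({"error": f"unknown tool: {name}"})
        try:
            return entry["handler"](args)
        except Exception as exc:  # report handler failures as data
            return json.dumps({"error": str(exc)})

    def get_definitions(self, tool_names: Set[str]) -> List[dict]:
        # Only return schemas for requested tools whose availability check passes.
        return [entry["schema"]
                for name, entry in sorted(self._tools.items())
                if name in tool_names
                and (entry["check_fn"] is None or entry["check_fn"]())]


registry = MiniRegistry()
registry.register(
    name="echo",
    toolset="demo",
    schema={"name": "echo", "parameters": {"type": "object"}},
    handler=lambda args: args["text"],
)
print(registry.dispatch("echo", {"text": "hi"}))
print(registry.dispatch("missing", {}))
```

Returning errors as strings keeps the dispatch return type uniform, so a caller can always forward the result to the model without a separate error path.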

7. Toolset Composition System

7.1 Toolset Definition (toolsets.py:72-377)

TOOLSETS = {
    "file": {
        "description": "File manipulation tools",
        "tools": ["read_file", "write_file", "patch", "search_files"],
        "includes": []
    },
    "debugging": {
        "description": "Debugging and troubleshooting toolkit",
        "tools": ["terminal", "process"],
        "includes": ["web", "file"]  # Composes other toolsets
    },
}

7.2 Resolution Algorithm

from typing import List, Set

def resolve_toolset(name: str, visited: Set[str] = None) -> List[str]:
    # 1. Cycle detection
    # 2. Get toolset definition
    # 3. Collect direct tools
    # 4. Recursively resolve includes (diamond deps handled)
    # 5. Return deduplicated list
    ...  # body elided; only the steps are outlined here
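
The five steps can be turned into a short runnable sketch. This is one possible reading of the outline, not the code in toolsets.py. The "web" entry is filled in from the tool names in the flow diagram, the "everything" toolset is a hypothetical diamond added for the demonstration, and returning an empty list for unknown names is an assumption.

```python
from typing import Dict, List, Optional, Set

TOOLSETS: Dict[str, dict] = {
    "file": {"tools": ["read_file", "write_file", "patch", "search_files"],
             "includes": []},
    # Not shown in the excerpt; tool names taken from the flow diagram.
    "web": {"tools": ["web_search", "web_extract", "web_crawl"], "includes": []},
    "debugging": {"tools": ["terminal", "process"], "includes": ["web", "file"]},
    # Hypothetical diamond: reaches "file" directly and again via "debugging".
    "everything": {"tools": [], "includes": ["file", "debugging"]},
}


def resolve_toolset(name: str, visited: Optional[Set[str]] = None) -> List[str]:
    if visited is None:
        visited = set()
    if name in visited:                   # 1. already expanded: cycle or diamond
        return []
    visited.add(name)
    definition = TOOLSETS.get(name)       # 2. look up the definition
    if definition is None:
        return []
    tools = list(definition["tools"])     # 3. direct tools first
    for included in definition["includes"]:
        tools.extend(resolve_toolset(included, visited))  # 4. recurse
    return list(dict.fromkeys(tools))     # 5. order-preserving dedupe


print(resolve_toolset("debugging"))
print(resolve_toolset("everything"))
```

Sharing one `visited` set across the whole recursion handles cycles and diamonds with the same check: a toolset that has already been expanded contributes nothing the second time.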

7.3 Platform-Specific Toolsets

Toolset	Purpose	Key Difference
hermes-cli	Full CLI access	All tools available
hermes-acp	Editor integration	No messaging, audio, or clarify UI
hermes-api-server	HTTP API	No interactive UI tools
hermes-telegram	Telegram bot	Full access with safety checks
hermes-gateway	Union of all messaging	Includes all platform tools

8. Environment Backend Deep Dive

8.1 Base Class Interface (tools/environments/base.py)

from abc import ABC

class BaseEnvironment(ABC):
    def execute(self, command: str, cwd: str = "", *,
                timeout: int | None = None,
                stdin_data: str | None = None) -> dict:
        """Return {"output": str, "returncode": int}"""

    def cleanup(self):
        """Release backend resources"""
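
A backend only has to honour this two-method contract. The sketch below restates the interface in trimmed form and adds a hypothetical `ShEnvironment` that runs each command through `/bin/sh -c`. Marking `execute` as abstract, merging stderr into the output, and returning 124 on timeout are assumptions of this example (the 124 code mirrors the TIMEOUT state in section 9.2); none of it is taken from the real backends.

```python
import subprocess
from abc import ABC, abstractmethod
from typing import Optional


class BaseEnvironment(ABC):
    """Trimmed restatement of the interface above."""

    @abstractmethod
    def execute(self, command: str, cwd: str = "", *,
                timeout: Optional[int] = None,
                stdin_data: Optional[str] = None) -> dict:
        ...

    def cleanup(self) -> None:
        pass


class ShEnvironment(BaseEnvironment):
    """Hypothetical backend: run each command with /bin/sh -c."""

    def execute(self, command, cwd="", *, timeout=None, stdin_data=None):
        try:
            proc = subprocess.run(
                ["/bin/sh", "-c", command],
                cwd=cwd or None,
                input=stdin_data,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,  # single combined output field
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return {"output": "command timed out", "returncode": 124}
        return {"output": proc.stdout, "returncode": proc.returncode}


env = ShEnvironment()
print(env.execute("echo hello"))
print(env.execute("cat", stdin_data="from stdin"))
env.cleanup()
```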

8.2 Environment Feature Matrix

Feature	Local	Docker	Modal	SSH	Singularity	Daytona
PTY support	✅	❌	❌	✅	❌	❌
Persistent shell	✅	❌	❌	✅	❌	❌
Filesystem persistence	Optional	Optional	Snapshots	N/A (remote)	Optional	Yes
Interrupt handling	✅	✅	✅	✅	✅	✅
Sudo support	✅	✅	✅	✅	✅	✅
Resource limits	❌	✅	✅	❌	✅	✅
GPU support	❌	✅	✅	Remote	✅	✅

9. Process Registry System

9.1 Background Process Management (tools/process_registry.py)

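# Signatures only; bodies and the parameters marked "..." are elided.
class ProcessRegistry:
    def spawn_local(self, command, cwd, task_id,
                    # ... remaining parameters elided
                    ) -> "ProcessSession": ...
    def spawn_via_env(self, env, command,
                      # ... remaining parameters elided
                      ) -> "ProcessSession": ...
    def poll(self, session_id: str) -> dict: ...
    def wait(self, session_id: str, timeout: int = None) -> dict: ...
    def kill(self, session_id: str): ...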
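
The spawn/poll/wait/kill lifecycle can be sketched on top of `subprocess.Popen`. `MiniProcessRegistry` is a hypothetical stand-in with simplified signatures: it takes an argv list, returns a session id string instead of a ProcessSession, and the keys of the returned dicts are assumptions.

```python
import subprocess
import sys
import uuid
from typing import Dict, List, Optional


class MiniProcessRegistry:
    """Toy background-process registry keyed by session id."""

    def __init__(self) -> None:
        self._procs: Dict[str, subprocess.Popen] = {}

    def spawn_local(self, argv: List[str], cwd: Optional[str] = None) -> str:
        session_id = uuid.uuid4().hex
        self._procs[session_id] = subprocess.Popen(
            argv, cwd=cwd, stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT, text=True)
        return session_id

    def poll(self, session_id: str) -> dict:
        code = self._procs[session_id].poll()
        return {"running": code is None, "exit_code": code}

    def wait(self, session_id: str, timeout: Optional[int] = None) -> dict:
        proc = self._procs[session_id]
        try:
            output, _ = proc.communicate(timeout=timeout)
        except subprocess.TimeoutExpired:
            return {"running": True, "exit_code": None, "output": ""}
        return {"running": False, "exit_code": proc.returncode, "output": output}

    def kill(self, session_id: str) -> None:
        proc = self._procs[session_id]
        if proc.poll() is None:
            proc.kill()
            proc.wait()  # reap the child so poll() reports an exit code


procs = MiniProcessRegistry()
sid = procs.spawn_local([sys.executable, "-c", "print('done')"])
result = procs.wait(sid, timeout=30)
print(result)
```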

9.2 Process Session States

CREATED ──▶ RUNNING ──▶ FINISHED
               │            │
               ▼            ▼
          INTERRUPTED   TIMEOUT
          (exit_code=130) (exit_code=124)
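
The diagram's exit codes follow common shell conventions: 130 is 128 plus SIGINT (signal 2), and 124 is the status GNU `timeout` returns when the time limit is hit. A minimal sketch of mapping an exit code to a terminal state follows; the enum and function names are hypothetical, and treating every other code as FINISHED is an assumption.

```python
from enum import Enum


class SessionState(Enum):
    CREATED = "created"
    RUNNING = "running"
    FINISHED = "finished"
    INTERRUPTED = "interrupted"
    TIMEOUT = "timeout"


def terminal_state(exit_code: int) -> SessionState:
    """Map an exit code to a terminal state using the codes in the diagram."""
    if exit_code == 130:   # 128 + SIGINT
        return SessionState.INTERRUPTED
    if exit_code == 124:   # timeout convention
        return SessionState.TIMEOUT
    return SessionState.FINISHED


for code in (0, 1, 124, 130):
    print(code, terminal_state(code).name)
```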

10. Code Analysis Summary

10.1 Lines of Code by Component

Component	Files	Approx. LOC
Tool Implementations	30+	~15,000
Environment Backends	6	~3,500
Registry & Core	2	~800
Security (approval, tirith)	2	~1,200
Process Management	1	~900
Total	40+	~21,400

10.2 Test Coverage

150+ test files in tests/tools/
Unit tests for each tool
Integration tests for environments
Security-focused tests for approval system

Appendix A: File Organization

tools/
├── registry.py              # Tool registration & dispatch
├── __init__.py              # Package exports
│
├── file_tools.py            # read_file, write_file, patch, search_files
├── file_operations.py       # ShellFileOperations backend
│
├── terminal_tool.py         # Main terminal execution (1,358 lines)
├── process_registry.py      # Background process management
│
├── web_tools.py             # web_search, web_extract, web_crawl (1,843 lines)
├── browser_tool.py          # Browser automation (1,955 lines)
├── browser_providers/       # Browserbase, BrowserUse providers
│
├── approval.py              # Dangerous command detection (670 lines)
├── tirith_security.py       # External security scanner (670 lines)
│
├── environments/            # Execution backends
│   ├── base.py              # BaseEnvironment ABC
│   ├── local.py             # Local subprocess (486 lines)
│   ├── docker.py            # Docker containers (535 lines)
│   ├── modal.py             # Modal cloud (372 lines)
│   ├── ssh.py               # SSH remote (307 lines)
│   ├── singularity.py       # Singularity/Apptainer
│   ├── daytona.py           # Daytona workspaces
│   └── persistent_shell.py  # Shared persistent shell mixin
│
├── code_execution_tool.py   # Programmatic tool calling (806 lines)
├── delegate_tool.py         # Subagent spawning (794 lines)
│
├── skills_tool.py           # Skill management (1,344 lines)
├── skill_manager_tool.py    # Skill CRUD operations
│
└── [20+ additional tools...]

toolsets.py                  # Toolset definitions (641 lines)

Report generated from comprehensive analysis of the Hermes agent tool system.
