New browser capabilities and a built-in skill for agent-driven web QA.

## New tool: browser_console

Returns console messages (log/warn/error/info) AND uncaught JavaScript exceptions in a single call. Uses agent-browser's 'console' and 'errors' commands through the existing session plumbing. Supports --clear to reset buffers. Verified working in both local and Browserbase cloud modes.

## Enhanced tool: browser_vision(annotate=True)

New boolean parameter on browser_vision. When true, agent-browser overlays numbered [N] labels on interactive elements; each [N] maps to ref @eN. Annotation data (element name, role, bounding box) returned alongside the vision analysis. Useful for QA reports and spatial reasoning.

## Config: browser.record_sessions

Auto-record browser sessions as WebM video files when enabled:

- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default

## Built-in skill: dogfood

Systematic exploratory QA testing for web applications. Teaches the agent a 5-phase workflow:

1. Plan: accept URL, create output dirs, set scope
2. Explore: systematic crawl with annotated screenshots
3. Collect Evidence: screenshots, console errors, JS exceptions
4. Categorize: severity (Critical/High/Medium/Low) and category (Functional/Visual/Accessibility/Console/UX/Content)
5. Report: structured markdown with per-issue evidence

Includes:

- skills/dogfood/SKILL.md: full workflow instructions
- skills/dogfood/references/issue-taxonomy.md: severity/category defs
- skills/dogfood/templates/dogfood-report-template.md: report template

## Tests

21 new tests covering:

- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation

Addresses #315.
| sidebar_position | title | description |
|---|---|---|
| 1 | Tools & Toolsets | Overview of Hermes Agent's tools — what's available, how toolsets work, and terminal backends |
Tools & Toolsets
Tools are functions that extend the agent's capabilities. They're organized into logical toolsets that can be enabled or disabled per platform.
Available Tools
| Category | Tools | Description |
|---|---|---|
| Web | web_search, web_extract | Search the web, extract page content |
| Terminal | terminal, process | Execute commands (local/docker/singularity/modal/daytona/ssh backends), manage background processes |
| File | read_file, write_file, patch, search_files | Read, write, edit, and search files |
| Browser | browser_navigate, browser_click, browser_type, browser_console, etc. | Full browser automation via Browserbase |
| Vision | vision_analyze | Image analysis via multimodal models |
| Image Gen | image_generate | Generate images (FLUX via FAL) |
| TTS | text_to_speech | Text-to-speech (Edge TTS / ElevenLabs / OpenAI) |
| Reasoning | mixture_of_agents | Multi-model reasoning |
| Skills | skills_list, skill_view, skill_manage | Find, view, create, and manage skills |
| Todo | todo | Read/write task list for multi-step planning |
| Memory | memory | Persistent notes + user profile across sessions |
| Session Search | session_search | Search + summarize past conversations (FTS5) |
| Cronjob | schedule_cronjob, list_cronjobs, remove_cronjob | Scheduled task management |
| Code Execution | execute_code | Run Python scripts that call tools via RPC sandbox |
| Delegation | delegate_task | Spawn subagents with isolated context |
| Clarify | clarify | Ask the user multiple-choice or open-ended questions |
| MCP | Auto-discovered | External tools from MCP servers |
Using Toolsets
# Use specific toolsets
hermes chat --toolsets "web,terminal"
# See all available tools
hermes tools
# Configure tools per platform (interactive)
hermes tools
Available toolsets: web, terminal, file, browser, vision, image_gen, moa, skills, tts, todo, memory, session_search, cronjob, code_execution, delegation, clarify, and more.
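As a rough illustration of how a comma-separated `--toolsets` value could be split and checked, here is a minimal Python sketch. The helper `parse_toolsets` and the `KNOWN_TOOLSETS` set are hypothetical names for this example, not part of Hermes, and the set only holds the names listed above (the real list is longer and also includes MCP-discovered tools).

```python
# Toolset names copied from the list above; the docs note there are more,
# so a real check would consult the live registry instead of this set.
KNOWN_TOOLSETS = {
    "web", "terminal", "file", "browser", "vision", "image_gen", "moa",
    "skills", "tts", "todo", "memory", "session_search", "cronjob",
    "code_execution", "delegation", "clarify",
}


def parse_toolsets(value: str) -> list[str]:
    """Split a comma-separated toolsets string and reject unknown names.

    Hypothetical helper for illustration only.
    """
    names = [part.strip() for part in value.split(",") if part.strip()]
    unknown = [name for name in names if name not in KNOWN_TOOLSETS]
    if unknown:
        raise ValueError(f"unknown toolsets: {', '.join(unknown)}")
    return names


print(parse_toolsets("web,terminal"))
```

Failing early on an unknown name keeps a typo such as `termnal` from silently disabling a toolset.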
Terminal Backends
The terminal tool can execute commands in different environments:
| Backend | Description | Use Case |
|---|---|---|
| local | Run on your machine (default) | Development, trusted tasks |
| docker | Isolated containers | Security, reproducibility |
| ssh | Remote server | Sandboxing, keep agent away from its own code |
| singularity | HPC containers | Cluster computing, rootless |
| modal | Cloud execution | Serverless, scale |
Configuration
# In ~/.hermes/config.yaml
terminal:
  backend: local    # or: docker, ssh, singularity, modal, daytona
  cwd: "."          # Working directory
  timeout: 180      # Command timeout in seconds
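To show how the `terminal` settings above could be merged with fallbacks and validated, here is a small Python sketch. `resolve_terminal_config` is a hypothetical helper, and treating `"."` and `180` as defaults is an assumption based on the example values; only `local` is documented as the default backend.

```python
# Backend names taken from the comment in the config example above.
BACKENDS = {"local", "docker", "ssh", "singularity", "modal", "daytona"}

# Assumed fallbacks: only "local" is documented as a default; the other two
# values are simply the ones shown in the example config.
DEFAULTS = {"backend": "local", "cwd": ".", "timeout": 180}


def resolve_terminal_config(user: dict) -> dict:
    """Overlay user settings on the assumed defaults and validate them."""
    config = {**DEFAULTS, **user}
    if config["backend"] not in BACKENDS:
        raise ValueError(f"unsupported backend: {config['backend']!r}")
    if not isinstance(config["timeout"], int) or config["timeout"] <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return config


print(resolve_terminal_config({"backend": "docker"}))
```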
Docker Backend
terminal:
  backend: docker
  docker_image: python:3.11-slim
SSH Backend
Recommended for security, since the agent can't modify its own code:
terminal:
  backend: ssh
# Set credentials in ~/.hermes/.env
TERMINAL_SSH_HOST=my-server.example.com
TERMINAL_SSH_USER=myuser
TERMINAL_SSH_KEY=~/.ssh/id_rsa
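The three variables above carry everything an `ssh` invocation needs. The following Python sketch shows one way they might be turned into a command line. `build_ssh_argv` is a hypothetical helper, and how Hermes actually connects is not described here, so treat this as an illustration only.

```python
import os

REQUIRED = ("TERMINAL_SSH_HOST", "TERMINAL_SSH_USER", "TERMINAL_SSH_KEY")


def build_ssh_argv(env: dict, remote_command: str) -> list[str]:
    """Build an `ssh -i KEY USER@HOST COMMAND` argument list from env values.

    Hypothetical helper; it only assembles arguments and never connects.
    """
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError(f"missing settings: {', '.join(missing)}")
    key_path = os.path.expanduser(env["TERMINAL_SSH_KEY"])  # resolve a leading ~
    target = f"{env['TERMINAL_SSH_USER']}@{env['TERMINAL_SSH_HOST']}"
    return ["ssh", "-i", key_path, target, remote_command]


example_env = {
    "TERMINAL_SSH_HOST": "my-server.example.com",
    "TERMINAL_SSH_USER": "myuser",
    "TERMINAL_SSH_KEY": "~/.ssh/id_rsa",
}
print(build_ssh_argv(example_env, "uname -a"))
```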
Singularity/Apptainer
# Pre-build SIF for parallel workers
apptainer build ~/python.sif docker://python:3.11-slim
# Configure
hermes config set terminal.backend singularity
hermes config set terminal.singularity_image ~/python.sif
Modal (Serverless Cloud)
uv pip install "swe-rex[modal]"
modal setup
hermes config set terminal.backend modal
Container Resources
Configure CPU, memory, disk, and persistence for all container backends:
terminal:
  backend: docker              # or singularity, modal, daytona
  container_cpu: 1             # CPU cores (default: 1)
  container_memory: 5120       # Memory in MB (default: 5GB)
  container_disk: 51200        # Disk in MB (default: 50GB)
  container_persistent: true   # Persist filesystem across sessions (default: true)
When container_persistent: true, installed packages, files, and config survive across sessions.
Container Security
All container backends run with security hardening:
- Read-only root filesystem (Docker)
- All Linux capabilities dropped
- No privilege escalation
- PID limits (256 processes)
- Full namespace isolation
- Persistent workspace via volumes, not writable root layer
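For the Docker backend, the resource limits and hardening properties listed above correspond to standard `docker run` flags. The Python sketch below assembles such a command line. It is an assumption about how the properties could be expressed, not Hermes's actual invocation; `docker_run_argv`, the volume name, and the `/workspace` mount point are hypothetical, and the disk limit is left out because it depends on the storage driver.

```python
import shlex


def docker_run_argv(
    image: str,
    cpu: int = 1,
    memory_mb: int = 5120,
    pids_limit: int = 256,
    workspace_volume: str = "hermes-workspace",  # hypothetical volume name
) -> list[str]:
    """Assemble a hardened `docker run` argument list (illustrative only)."""
    return [
        "docker", "run",
        "--cpus", str(cpu),                       # CPU cores
        "--memory", f"{memory_mb}m",              # memory limit in MB
        "--read-only",                            # read-only root filesystem
        "--cap-drop", "ALL",                      # drop all Linux capabilities
        "--security-opt", "no-new-privileges",    # block privilege escalation
        "--pids-limit", str(pids_limit),          # cap the number of processes
        "-v", f"{workspace_volume}:/workspace",   # persistent data lives in a volume
        image,
    ]


print(shlex.join(docker_run_argv("python:3.11-slim")))
```

Keeping persistent state in a named volume is what lets the root filesystem stay read-only while installed files still survive across sessions.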
Background Process Management
Start background processes and manage them:
terminal(command="pytest -v tests/", background=true)
# Returns: {"session_id": "proc_abc123", "pid": 12345}
# Then manage with the process tool:
process(action="list") # Show all running processes
process(action="poll", session_id="proc_abc123") # Check status
process(action="wait", session_id="proc_abc123") # Block until done
process(action="log", session_id="proc_abc123") # Full output
process(action="kill", session_id="proc_abc123") # Terminate
process(action="write", session_id="proc_abc123", data="y") # Send input
PTY mode (pty=true) enables interactive CLI tools like Codex and Claude Code.
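The lifecycle above (start in the background, poll, send input, wait, read output) has a close analogue in Python's standard `subprocess` module. The sketch below is only that analogy, run against a tiny child script; it does not call the `process` tool and says nothing about how that tool is implemented.

```python
import subprocess
import sys

# Start a child without blocking, similar in spirit to background=true.
child = subprocess.Popen(
    [sys.executable, "-c", "name = input(); print('hello ' + name)"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

# Like action="poll": None means the child is still running (it waits for input).
still_running = child.poll() is None

# Like action="write" followed by action="wait": send input, then block until exit.
output, _ = child.communicate("y\n", timeout=30)

print(still_running)       # True
print(output.strip())      # hello y
print(child.returncode)    # 0
```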
Sudo Support
If a command needs sudo, you'll be prompted for your password (cached for the session). Alternatively, set SUDO_PASSWORD in ~/.hermes/.env.
:::warning
On messaging platforms, if sudo fails, the output includes a tip to add SUDO_PASSWORD to ~/.hermes/.env.
:::