hermes-agent/website/docs/user-guide/features/tools.md
teknium1 a8bf414f4a feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.

## New tool: browser_console

Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.

## Enhanced tool: browser_vision(annotate=True)

New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
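The [N] to @eN mapping described above can be sketched in a few lines. The helper names `label_to_ref` and `refs_in_text` are hypothetical and not part of agent-browser; the sketch assumes only that labels appear in text as bracketed integers.

```python
import re


def label_to_ref(label: int) -> str:
    """Map an annotation label N to its element ref, following the [N] -> @eN rule."""
    if label < 0:
        raise ValueError("label must be non-negative")
    return f"@e{label}"


def refs_in_text(text: str) -> list[str]:
    """Return the ref for every bracketed label such as [3], in order of appearance."""
    return [label_to_ref(int(n)) for n in re.findall(r"\[(\d+)\]", text)]
```

For example, a vision answer that mentions "[2]" could be turned into the ref "@e2" before it is passed to a click or type tool.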

## Config: browser.record_sessions

Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
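The lifecycle in the list above (start on the first navigate, stop on close, off by default) can be modeled as a small state tracker. This is an illustrative sketch only: `RecordingLifecycle` and its methods are hypothetical names, and it records events in memory instead of driving a real browser.

```python
from pathlib import Path


class RecordingLifecycle:
    """Hypothetical model: start on first navigate, stop and save on close."""

    def __init__(self, enabled: bool = False,
                 out_dir: str = "~/.hermes/browser_recordings"):
        self.enabled = enabled                      # disabled by default
        self.out_dir = Path(out_dir).expanduser()   # where recordings are saved
        self.recording = False
        self.events: list[str] = []

    def on_navigate(self, url: str) -> None:
        # Only the first navigate of a session starts a recording.
        if self.enabled and not self.recording:
            self.recording = True
            self.events.append("start")

    def on_close(self) -> None:
        # Closing the browser stops the recording if one is active.
        if self.recording:
            self.recording = False
            self.events.append("stop")
```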

## Built-in skill: dogfood

Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
   (Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
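Phases 4 and 5 above can be sketched as a tiny data model. The severity and category names come from the list; the `Issue` class, the sort order, and the one-line report format are assumptions made for illustration and do not reproduce the skill's actual report template.

```python
from dataclasses import dataclass

# Ordered from most to least severe, as listed in phase 4.
SEVERITIES = ["Critical", "High", "Medium", "Low"]
CATEGORIES = {"Functional", "Visual", "Accessibility", "Console", "UX", "Content"}


@dataclass
class Issue:
    title: str
    severity: str
    category: str

    def __post_init__(self):
        # Reject values outside the taxonomy.
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity}")
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


def render_report(issues: list[Issue]) -> str:
    """Render one markdown bullet per issue, most severe first."""
    ordered = sorted(issues, key=lambda i: SEVERITIES.index(i.severity))
    return "\n".join(f"- [{i.severity}/{i.category}] {i.title}" for i in ordered)
```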

Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template

## Tests

21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation

Addresses #315.
2026-03-08 21:28:12 -07:00


| sidebar_position | title | description |
| --- | --- | --- |
| 1 | Tools & Toolsets | Overview of Hermes Agent's tools — what's available, how toolsets work, and terminal backends |

Tools & Toolsets

Tools are functions that extend the agent's capabilities. They're organized into logical toolsets that can be enabled or disabled per platform.

Available Tools

| Category | Tools | Description |
| --- | --- | --- |
| Web | web_search, web_extract | Search the web, extract page content |
| Terminal | terminal, process | Execute commands (local/docker/singularity/modal/daytona/ssh backends), manage background processes |
| File | read_file, write_file, patch, search_files | Read, write, edit, and search files |
| Browser | browser_navigate, browser_click, browser_type, browser_console, etc. | Full browser automation via Browserbase |
| Vision | vision_analyze | Image analysis via multimodal models |
| Image Gen | image_generate | Generate images (FLUX via FAL) |
| TTS | text_to_speech | Text-to-speech (Edge TTS / ElevenLabs / OpenAI) |
| Reasoning | mixture_of_agents | Multi-model reasoning |
| Skills | skills_list, skill_view, skill_manage | Find, view, create, and manage skills |
| Todo | todo | Read/write task list for multi-step planning |
| Memory | memory | Persistent notes + user profile across sessions |
| Session Search | session_search | Search + summarize past conversations (FTS5) |
| Cronjob | schedule_cronjob, list_cronjobs, remove_cronjob | Scheduled task management |
| Code Execution | execute_code | Run Python scripts that call tools via RPC sandbox |
| Delegation | delegate_task | Spawn subagents with isolated context |
| Clarify | clarify | Ask the user multiple-choice or open-ended questions |
| MCP | Auto-discovered | External tools from MCP servers |

Using Toolsets

# Use specific toolsets
hermes chat --toolsets "web,terminal"

# See all available tools
hermes tools

# Configure tools per platform (interactive)
hermes tools

Available toolsets: web, terminal, file, browser, vision, image_gen, moa, skills, tts, todo, memory, session_search, cronjob, code_execution, delegation, clarify, and more.
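As a rough illustration of how a comma-separated `--toolsets` value might be split and checked, here is a minimal sketch. The function `split_toolsets` is hypothetical and is not Hermes's parser; because the list above ends with "and more", unlisted names are reported separately instead of being rejected.

```python
# Toolset names listed in this section; the real set is larger ("and more").
LISTED_TOOLSETS = {
    "web", "terminal", "file", "browser", "vision", "image_gen", "moa",
    "skills", "tts", "todo", "memory", "session_search", "cronjob",
    "code_execution", "delegation", "clarify",
}


def split_toolsets(value: str) -> tuple[list[str], list[str]]:
    """Split "web,terminal" into (listed, unlisted) toolset names."""
    names = [name.strip() for name in value.split(",") if name.strip()]
    listed = [n for n in names if n in LISTED_TOOLSETS]
    unlisted = [n for n in names if n not in LISTED_TOOLSETS]
    return listed, unlisted
```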

Terminal Backends

The terminal tool can execute commands in different environments:

| Backend | Description | Use Case |
| --- | --- | --- |
| local | Run on your machine (default) | Development, trusted tasks |
| docker | Isolated containers | Security, reproducibility |
| ssh | Remote server | Sandboxing, keep agent away from its own code |
| singularity | HPC containers | Cluster computing, rootless |
| modal | Cloud execution | Serverless, scale |

Configuration

# In ~/.hermes/config.yaml
terminal:
  backend: local    # or: docker, ssh, singularity, modal, daytona
  cwd: "."          # Working directory
  timeout: 180      # Command timeout in seconds

Docker Backend

terminal:
  backend: docker
  docker_image: python:3.11-slim

SSH Backend

Recommended for security, because the agent runs on a remote host and can't modify its own code:

# In ~/.hermes/config.yaml
terminal:
  backend: ssh

# In ~/.hermes/.env (credentials)
TERMINAL_SSH_HOST=my-server.example.com
TERMINAL_SSH_USER=myuser
TERMINAL_SSH_KEY=~/.ssh/id_rsa
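Before switching to the SSH backend, it can help to confirm that the credentials are present in the .env file. The sketch below is a simplified KEY=VALUE parser written for illustration; `parse_env` and `missing_ssh_keys` are hypothetical helpers, and treating all three variables as required is an assumption.

```python
# Variables shown in the SSH example; assumed here to all be required.
SSH_KEYS = ("TERMINAL_SSH_HOST", "TERMINAL_SSH_USER", "TERMINAL_SSH_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def missing_ssh_keys(env: dict[str, str]) -> list[str]:
    """Return the SSH variables that are absent or empty."""
    return [key for key in SSH_KEYS if not env.get(key)]
```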

Singularity/Apptainer

# Pre-build SIF for parallel workers
apptainer build ~/python.sif docker://python:3.11-slim

# Configure
hermes config set terminal.backend singularity
hermes config set terminal.singularity_image ~/python.sif

Modal (Serverless Cloud)

uv pip install "swe-rex[modal]"
modal setup
hermes config set terminal.backend modal

Container Resources

Configure CPU, memory, disk, and persistence for all container backends:

terminal:
  backend: docker  # or singularity, modal, daytona
  container_cpu: 1              # CPU cores (default: 1)
  container_memory: 5120        # Memory in MB (default: 5GB)
  container_disk: 51200         # Disk in MB (default: 50GB)
  container_persistent: true    # Persist filesystem across sessions (default: true)

When container_persistent: true, installed packages, files, and config survive across sessions.
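The defaults in the comments above can be checked with a little arithmetic: 5120 MB is 5 GB and 51200 MB is 50 GB. The following sketch merges user overrides onto those defaults; `resolve_resources` and `mb_to_gb` are hypothetical helpers, not part of Hermes.

```python
# Defaults as documented in the config comments above.
DEFAULTS = {
    "container_cpu": 1,
    "container_memory": 5120,      # MB
    "container_disk": 51200,       # MB
    "container_persistent": True,
}


def resolve_resources(overrides=None) -> dict:
    """Merge user overrides onto the defaults, rejecting unknown keys."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown container settings: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


def mb_to_gb(mb: int) -> float:
    """Convert megabytes to gigabytes using 1 GB = 1024 MB."""
    return mb / 1024
```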

Container Security

All container backends run with security hardening:

  • Read-only root filesystem (Docker)
  • All Linux capabilities dropped
  • No privilege escalation
  • PID limits (256 processes)
  • Full namespace isolation
  • Persistent workspace via volumes, not writable root layer
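For the Docker backend, several of the measures above correspond to standard `docker run` flags. The sketch below builds such an argument list; it is an assumption about how the hardening could be expressed and does not show how Hermes actually invokes Docker. `hardened_docker_args` is a hypothetical helper.

```python
def hardened_docker_args(image: str, pids_limit: int = 256) -> list[str]:
    """Build a docker run argument list with common hardening flags."""
    return [
        "docker", "run",
        "--read-only",                          # read-only root filesystem
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation
        "--pids-limit", str(pids_limit),        # cap the number of processes
        image,
    ]
```

Passing the list to `subprocess.run` avoids shell quoting issues, since no shell is involved.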

Background Process Management

Start background processes and manage them:

terminal(command="pytest -v tests/", background=true)
# Returns: {"session_id": "proc_abc123", "pid": 12345}

# Then manage with the process tool:
process(action="list")       # Show all running processes
process(action="poll", session_id="proc_abc123")   # Check status
process(action="wait", session_id="proc_abc123")   # Block until done
process(action="log", session_id="proc_abc123")    # Full output
process(action="kill", session_id="proc_abc123")   # Terminate
process(action="write", session_id="proc_abc123", data="y")  # Send input

PTY mode (pty=true) enables interactive CLI tools like Codex and Claude Code.
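When blocking with action="wait" is not desirable, the same tool calls can be combined into a polling loop. In this sketch, `process` is any callable with the call shape shown above; the `{"status": "exited"}` response field is an assumption, since the poll result format is not documented here.

```python
import time


def wait_for_exit(process, session_id: str, interval: float = 1.0, max_polls: int = 10):
    """Poll until the process exits, then return its log; kill it on timeout."""
    for _ in range(max_polls):
        status = process(action="poll", session_id=session_id)
        # Assumed response shape: {"status": "running" | "exited"}.
        if status.get("status") == "exited":
            return process(action="log", session_id=session_id)
        time.sleep(interval)
    # Give up: terminate the process and signal the timeout to the caller.
    process(action="kill", session_id=session_id)
    raise TimeoutError(f"{session_id} did not exit after {max_polls} polls")
```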

Sudo Support

If a command needs sudo, you'll be prompted for your password, which is cached for the session. Alternatively, set SUDO_PASSWORD in ~/.hermes/.env.

:::warning
On messaging platforms, if sudo fails, the output includes a tip to add SUDO_PASSWORD to ~/.hermes/.env.
:::