
teknium1 a8bf414f4a feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill

New browser capabilities and a built-in skill for agent-driven web QA.

## New tool: browser_console

Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.

## Enhanced tool: browser_vision(annotate=True)

New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.

## Config: browser.record_sessions

Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default

## Built-in skill: dogfood

Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
   (Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence

Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template

## Tests

21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation

Addresses #315.

2026-03-08 21:28:12 -07:00

5.5 KiB



sidebar_position	title	description
1	Tools & Toolsets	Overview of Hermes Agent's tools — what's available, how toolsets work, and terminal backends

Tools & Toolsets

Tools are functions that extend the agent's capabilities. They're organized into logical toolsets that can be enabled or disabled per platform.

Available Tools

Category	Tools	Description
Web	`web_search`, `web_extract`	Search the web, extract page content
Terminal	`terminal`, `process`	Execute commands (local/docker/singularity/modal/daytona/ssh backends), manage background processes
File	`read_file`, `write_file`, `patch`, `search_files`	Read, write, edit, and search files
Browser	`browser_navigate`, `browser_click`, `browser_type`, `browser_console`, etc.	Full browser automation, locally or via Browserbase
Vision	`vision_analyze`	Image analysis via multimodal models
Image Gen	`image_generate`	Generate images (FLUX via FAL)
TTS	`text_to_speech`	Text-to-speech (Edge TTS / ElevenLabs / OpenAI)
Reasoning	`mixture_of_agents`	Multi-model reasoning
Skills	`skills_list`, `skill_view`, `skill_manage`	Find, view, create, and manage skills
Todo	`todo`	Read/write task list for multi-step planning
Memory	`memory`	Persistent notes + user profile across sessions
Session Search	`session_search`	Search + summarize past conversations (FTS5)
Cronjob	`schedule_cronjob`, `list_cronjobs`, `remove_cronjob`	Scheduled task management
Code Execution	`execute_code`	Run Python scripts that call tools via RPC sandbox
Delegation	`delegate_task`	Spawn subagents with isolated context
Clarify	`clarify`	Ask the user multiple-choice or open-ended questions
MCP	Auto-discovered	External tools from MCP servers

Using Toolsets

# Use specific toolsets
hermes chat --toolsets "web,terminal"

# See all available tools and configure them per platform (interactive)
hermes tools

Available toolsets: web, terminal, file, browser, vision, image_gen, moa, skills, tts, todo, memory, session_search, cronjob, code_execution, delegation, clarify, and more.
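
As a rough illustration of how a comma-separated `--toolsets` value could be split and checked, here is a minimal sketch. `parse_toolsets` and `KNOWN_TOOLSETS` are hypothetical names, not part of Hermes, and the set only contains the names listed above (the real list is longer).

```python
# Hypothetical helper, not Hermes code. The names come from the
# "Available toolsets" line above, which ends with "and more".
KNOWN_TOOLSETS = {
    "web", "terminal", "file", "browser", "vision", "image_gen", "moa",
    "skills", "tts", "todo", "memory", "session_search", "cronjob",
    "code_execution", "delegation", "clarify",
}


def parse_toolsets(value: str) -> list:
    """Split a comma-separated toolsets string and reject unknown names."""
    names = [part.strip() for part in value.split(",") if part.strip()]
    unknown = [name for name in names if name not in KNOWN_TOOLSETS]
    if unknown:
        raise ValueError("unknown toolsets: " + ", ".join(unknown))
    return names


print(parse_toolsets("web,terminal"))
```

Rejecting unknown names early is a design choice for this sketch; the actual CLI may handle them differently.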

Terminal Backends

The terminal tool can execute commands in different environments:

Backend	Description	Use Case
`local`	Run on your machine (default)	Development, trusted tasks
`docker`	Isolated containers	Security, reproducibility
`ssh`	Remote server	Sandboxing, keep agent away from its own code
`singularity`	HPC containers	Cluster computing, rootless
`modal`	Cloud execution	Serverless, scale

Configuration

# In ~/.hermes/config.yaml
terminal:
  backend: local    # or: docker, ssh, singularity, modal, daytona
  cwd: "."          # Working directory
  timeout: 180      # Command timeout in seconds
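
To make the shape of this config concrete, the sketch below overlays user settings on a set of defaults and validates them. It is an illustration only: `resolve_terminal_config` is a hypothetical function, and treating the `cwd` and `timeout` values from the example as defaults is an assumption.

```python
# Assumed defaults, copied from the example above; not read from Hermes itself.
DEFAULTS = {"backend": "local", "cwd": ".", "timeout": 180}
BACKENDS = {"local", "docker", "ssh", "singularity", "modal", "daytona"}


def resolve_terminal_config(user: dict) -> dict:
    """Overlay user settings on the defaults and sanity-check the result."""
    config = {**DEFAULTS, **user}
    if config["backend"] not in BACKENDS:
        raise ValueError("unsupported backend: %r" % (config["backend"],))
    if not isinstance(config["timeout"], int) or config["timeout"] <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return config


print(resolve_terminal_config({"backend": "docker"}))
```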

Docker Backend

terminal:
  backend: docker
  docker_image: python:3.11-slim

SSH Backend

Recommended for security, because the agent can't modify its own code:

terminal:
  backend: ssh

# Set credentials in ~/.hermes/.env
TERMINAL_SSH_HOST=my-server.example.com
TERMINAL_SSH_USER=myuser
TERMINAL_SSH_KEY=~/.ssh/id_rsa
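
A quick way to check that the SSH variables are present before switching backends might look like the sketch below. `parse_env` and `missing_ssh_keys` are hypothetical helpers, and treating all three variables as required is an assumption; the parser handles only plain `KEY=VALUE` lines.

```python
REQUIRED_SSH_KEYS = ("TERMINAL_SSH_HOST", "TERMINAL_SSH_USER", "TERMINAL_SSH_KEY")


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_ssh_keys(env: dict) -> list:
    """Return the assumed-required SSH variables that are absent or empty."""
    return [key for key in REQUIRED_SSH_KEYS if not env.get(key)]


sample = """
# Set credentials
TERMINAL_SSH_HOST=my-server.example.com
TERMINAL_SSH_USER=myuser
"""
print(missing_ssh_keys(parse_env(sample)))
```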

Singularity/Apptainer

# Pre-build SIF for parallel workers
apptainer build ~/python.sif docker://python:3.11-slim

# Configure
hermes config set terminal.backend singularity
hermes config set terminal.singularity_image ~/python.sif

Modal Backend

uv pip install "swe-rex[modal]"
modal setup
hermes config set terminal.backend modal

Container Resources

Configure CPU, memory, disk, and persistence for all container backends:

terminal:
  backend: docker  # or singularity, modal, daytona
  container_cpu: 1              # CPU cores (default: 1)
  container_memory: 5120        # Memory in MB (default: 5GB)
  container_disk: 51200         # Disk in MB (default: 50GB)
  container_persistent: true    # Persist filesystem across sessions (default: true)

When container_persistent is true, installed packages, files, and config persist across sessions.
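
The memory and disk values are given in MB, so the stated defaults work out to 5 GB and 50 GB. The sketch below mirrors the four `container_*` keys in a small dataclass to show the conversion; `ContainerResources` is a hypothetical name used for illustration, not a Hermes type.

```python
from dataclasses import dataclass


@dataclass
class ContainerResources:
    """Mirrors the container_* keys; defaults copied from the config comments."""
    cpu: float = 1
    memory_mb: int = 5120
    disk_mb: int = 51200
    persistent: bool = True

    def memory_gb(self) -> float:
        """Memory limit in GB (1 GB = 1024 MB)."""
        return self.memory_mb / 1024

    def disk_gb(self) -> float:
        """Disk limit in GB (1 GB = 1024 MB)."""
        return self.disk_mb / 1024


resources = ContainerResources()
print(resources.memory_gb(), resources.disk_gb())
```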

Container Security

All container backends run with security hardening:

- Read-only root filesystem (Docker)
- All Linux capabilities dropped
- No privilege escalation
- PID limits (256 processes)
- Full namespace isolation
- Persistent workspace via volumes, not writable root layer
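
For the Docker backend, most of the hardening items above correspond to standard `docker run` flags. The sketch below builds such a command line; it is an assumption about how the list could map onto Docker, not the exact invocation Hermes uses, and the `hermes-workspace` volume name and `/workspace` mount point are made up for illustration.

```python
import shlex


def hardened_docker_args(image: str, pids_limit: int = 256) -> list:
    """Build a `docker run` argv that applies the hardening items listed above."""
    return [
        "docker", "run", "--rm",
        "--read-only",                          # read-only root filesystem
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation
        "--pids-limit", str(pids_limit),        # cap the number of processes
        "--volume", "hermes-workspace:/workspace",  # hypothetical named volume
        image,
    ]


print(shlex.join(hardened_docker_args("python:3.11-slim")))
```

Namespace isolation needs no flag here because Docker gives each container its own namespaces unless options such as `--pid=host` or `--network=host` are passed.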

Background Process Management

Start background processes and manage them:

terminal(command="pytest -v tests/", background=true)
# Returns: {"session_id": "proc_abc123", "pid": 12345}

# Then manage with the process tool:
process(action="list")       # Show all running processes
process(action="poll", session_id="proc_abc123")   # Check status
process(action="wait", session_id="proc_abc123")   # Block until done
process(action="log", session_id="proc_abc123")    # Full output
process(action="kill", session_id="proc_abc123")   # Terminate
process(action="write", session_id="proc_abc123", data="y")  # Send input

PTY mode (pty=true) enables interactive CLI tools like Codex and Claude Code.
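
When a blocking `wait` is not suitable, a caller could poll with an upper bound instead. The sketch below shows that pattern against a stand-in for the process tool. The `status` field and its `"running"`/`"exited"` values are assumptions, since the response shape of `poll` is not documented here, and `wait_for_exit` and `make_fake_process` are hypothetical helpers.

```python
import time
from typing import Callable


def wait_for_exit(process: Callable[..., dict], session_id: str,
                  interval: float = 1.0, max_polls: int = 60) -> dict:
    """Poll until the (assumed) "status" field reports "exited"."""
    for _ in range(max_polls):
        result = process(action="poll", session_id=session_id)
        if result.get("status") == "exited":
            return result
        time.sleep(interval)
    raise TimeoutError("%s still running after %d polls" % (session_id, max_polls))


def make_fake_process(polls_until_done: int) -> Callable[..., dict]:
    """Stand-in for the real tool so the sketch runs on its own."""
    calls = {"n": 0}

    def fake(action: str, session_id: str) -> dict:
        calls["n"] += 1
        done = calls["n"] >= polls_until_done
        return {"status": "exited" if done else "running"}

    return fake


print(wait_for_exit(make_fake_process(3), "proc_abc123", interval=0))
```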

Sudo Support

If a command needs sudo, you'll be prompted for your password, which is cached for the session. Alternatively, set SUDO_PASSWORD in ~/.hermes/.env.

:::warning
On messaging platforms, if sudo fails, the output includes a tip to add SUDO_PASSWORD to ~/.hermes/.env.
:::
