Merge pull request #745 from NousResearch/hermes/hermes-f8d56335
feat: browser console tool, annotated screenshots, auto-recording, and dogfood QA skill
@@ -69,7 +69,7 @@ hermes-agent/
│ ├── file_tools.py # File read/write/search/patch tools
│ ├── file_operations.py # File operations helpers
│ ├── web_tools.py # Firecrawl search/extract
│ ├── browser_tool.py # Browserbase browser automation
│ ├── browser_tool.py # Browserbase browser automation (browser_console, session recording)
│ ├── vision_tools.py # Image analysis via auxiliary LLM
│ ├── image_generation_tool.py # FLUX image generation via fal.ai
│ ├── tts_tool.py # Text-to-speech
@@ -113,7 +113,7 @@ hermes-agent/
├── cron/ # Scheduler implementation
├── environments/ # RL training environments (Atropos integration)
├── honcho_integration/ # Honcho client & session management
├── skills/ # Bundled skill sources
├── skills/ # Bundled skill sources (includes dogfood QA testing)
├── optional-skills/ # Official optional skills (not activated by default)
├── scripts/ # Install scripts, utilities
├── tests/ # Full pytest suite (~2300+ tests)
1
cli.py
@@ -161,6 +161,7 @@ def load_cli_config() -> Dict[str, Any]:
    },
    "browser": {
        "inactivity_timeout": 120,  # Auto-cleanup inactive browser sessions after 2 min
        "record_sessions": False,  # Auto-record browser sessions as WebM videos
    },
    "compression": {
        "enabled": True,  # Auto-compress when approaching context limit
@@ -81,6 +81,7 @@ DEFAULT_CONFIG = {
    "browser": {
        "inactivity_timeout": 120,
        "record_sessions": False,  # Auto-record browser sessions as WebM videos
    },

    "compression": {
162
skills/dogfood/SKILL.md
Normal file
@@ -0,0 +1,162 @@
---
name: dogfood
description: Systematic exploratory QA testing of web applications — find bugs, capture evidence, and generate structured reports
version: 1.0.0
metadata:
  hermes:
    tags: [qa, testing, browser, web, dogfood]
    related_skills: []
---

# Dogfood: Systematic Web Application QA Testing

## Overview

This skill guides you through systematic exploratory QA testing of web applications using the browser toolset. You will navigate the application, interact with elements, capture evidence of issues, and produce a structured bug report.

## Prerequisites

- Browser toolset must be available (`browser_navigate`, `browser_snapshot`, `browser_click`, `browser_type`, `browser_vision`, `browser_console`, `browser_scroll`, `browser_back`, `browser_press`, `browser_close`)
- A target URL and testing scope from the user

## Inputs

The user provides:
1. **Target URL** — the entry point for testing
2. **Scope** — what areas/features to focus on (or "full site" for comprehensive testing)
3. **Output directory** (optional) — where to save screenshots and the report (default: `./dogfood-output`)

## Workflow

Follow this 5-phase systematic workflow:

### Phase 1: Plan

1. Create the output directory structure:
   ```
   {output_dir}/
   ├── screenshots/   # Evidence screenshots
   └── report.md      # Final report (generated in Phase 5)
   ```
2. Identify the testing scope based on user input.
3. Build a rough sitemap by planning which pages and features to test:
   - Landing/home page
   - Navigation links (header, footer, sidebar)
   - Key user flows (sign up, login, search, checkout, etc.)
   - Forms and interactive elements
   - Edge cases (empty states, error pages, 404s)

### Phase 2: Explore

For each page or feature in your plan:

1. **Navigate** to the page:
   ```
   browser_navigate(url="https://example.com/page")
   ```

2. **Take a snapshot** to understand the DOM structure:
   ```
   browser_snapshot()
   ```

3. **Check the console** for JavaScript errors:
   ```
   browser_console(clear=true)
   ```
   Do this after every navigation and after every significant interaction. Silent JS errors are high-value findings.
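   The tool returns a JSON payload that combines both streams; an illustrative (abbreviated) shape, with placeholder values:
   ```
   {
     "success": true,
     "console_messages": [{"type": "error", "text": "...", "source": "console"}],
     "js_errors": [{"message": "Uncaught TypeError: ...", "source": "exception"}],
     "total_messages": 1,
     "total_errors": 1
   }
   ```
   Treat any entry in `js_errors` as a candidate finding and record it alongside the page URL.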

4. **Take an annotated screenshot** to visually assess the page and identify interactive elements:
   ```
   browser_vision(question="Describe the page layout, identify any visual issues, broken elements, or accessibility concerns", annotate=true)
   ```
   The `annotate=true` flag overlays numbered `[N]` labels on interactive elements. Each `[N]` maps to ref `@eN` for subsequent browser commands.

5. **Test interactive elements** systematically:
   - Click buttons and links: `browser_click(ref="@eN")`
   - Fill forms: `browser_type(ref="@eN", text="test input")`
   - Test keyboard navigation: `browser_press(key="Tab")`, `browser_press(key="Enter")`
   - Scroll through content: `browser_scroll(direction="down")`
   - Test form validation with invalid inputs
   - Test empty submissions

6. **After each interaction**, check for:
   - Console errors: `browser_console()`
   - Visual changes: `browser_vision(question="What changed after the interaction?")`
   - Expected vs actual behavior

### Phase 3: Collect Evidence

For every issue found:

1. **Take a screenshot** showing the issue:
   ```
   browser_vision(question="Capture and describe the issue visible on this page", annotate=false)
   ```
   Save the `screenshot_path` from the response — you will reference it in the report.

2. **Record the details**:
   - URL where the issue occurs
   - Steps to reproduce
   - Expected behavior
   - Actual behavior
   - Console errors (if any)
   - Screenshot path

3. **Classify the issue** using the issue taxonomy (see `references/issue-taxonomy.md`):
   - Severity: Critical / High / Medium / Low
   - Category: Functional / Visual / Accessibility / Console / UX / Content

### Phase 4: Categorize

1. Review all collected issues.
2. De-duplicate — merge issues that are the same bug manifesting in different places.
3. Assign final severity and category to each issue.
4. Sort by severity (Critical first, then High, Medium, Low).
5. Count issues by severity and category for the executive summary.

### Phase 5: Report

Generate the final report using the template at `templates/dogfood-report-template.md`.

The report must include:
1. **Executive summary** with total issue count, breakdown by severity, and testing scope
2. **Per-issue sections** with:
   - Issue number and title
   - Severity and category badges
   - URL where observed
   - Description of the issue
   - Steps to reproduce
   - Expected vs actual behavior
   - Screenshot references (use `MEDIA:<screenshot_path>` for inline images)
   - Console errors if relevant
3. **Summary table** of all issues
4. **Testing notes** — what was tested, what was not, any blockers

Save the report to `{output_dir}/report.md`.

## Tools Reference

| Tool | Purpose |
|------|---------|
| `browser_navigate` | Go to a URL |
| `browser_snapshot` | Get DOM text snapshot (accessibility tree) |
| `browser_click` | Click an element by ref (`@eN`) or text |
| `browser_type` | Type into an input field |
| `browser_scroll` | Scroll up/down on the page |
| `browser_back` | Go back in browser history |
| `browser_press` | Press a keyboard key |
| `browser_vision` | Screenshot + AI analysis; use `annotate=true` for element labels |
| `browser_console` | Get JS console output and errors |
| `browser_close` | Close the browser session |

## Tips

- **Always check `browser_console()` after navigating and after significant interactions.** Silent JS errors are among the most valuable findings.
- **Use `annotate=true` with `browser_vision`** when you need to reason about interactive element positions or when the snapshot refs are unclear.
- **Test with both valid and invalid inputs** — form validation bugs are common.
- **Scroll through long pages** — content below the fold may have rendering issues.
- **Test navigation flows** — click through multi-step processes end-to-end.
- **Check responsive behavior** by noting any layout issues visible in screenshots.
- **Don't forget edge cases**: empty states, very long text, special characters, rapid clicking.
- When reporting screenshots to the user, include `MEDIA:<screenshot_path>` so they can see the evidence inline.
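
A minimal per-page pass combining these tips (the URL and element ref are illustrative):

```
browser_navigate(url="https://example.com/pricing")
browser_snapshot()
browser_console(clear=true)    # baseline: catch load-time errors
browser_vision(question="Identify visual issues and interactive elements", annotate=true)
browser_click(ref="@e3")       # exercise an element labeled [3]
browser_console()              # re-check after the interaction
```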
109
skills/dogfood/references/issue-taxonomy.md
Normal file
@@ -0,0 +1,109 @@
# Issue Taxonomy

Use this taxonomy to classify issues found during dogfood QA testing.

## Severity Levels

### Critical
The issue makes a core feature completely unusable or causes data loss.

**Examples:**
- Application crashes or shows a blank white page
- Form submission silently loses user data
- Authentication is completely broken (can't log in at all)
- Payment flow fails and charges the user without completing the order
- Security vulnerability (e.g., XSS, exposed credentials in console)

### High
The issue significantly impairs functionality but a workaround may exist.

**Examples:**
- A key button does nothing when clicked (but refreshing fixes it)
- Search returns no results for valid queries
- Form validation rejects valid input
- Page loads but critical content is missing or garbled
- Navigation link leads to a 404 or wrong page
- Uncaught JavaScript exceptions in the console on core pages

### Medium
The issue is noticeable and affects user experience but doesn't block core functionality.

**Examples:**
- Layout is misaligned or overlapping on certain screen sections
- Images fail to load (broken image icons)
- Slow performance (visible loading delays > 3 seconds)
- Form field lacks proper validation feedback (no error message on bad input)
- Console warnings that suggest deprecated or misconfigured features
- Inconsistent styling between similar pages

### Low
Minor polish issues that don't affect functionality.

**Examples:**
- Typos or grammatical errors in text content
- Minor spacing or alignment inconsistencies
- Placeholder text left in production ("Lorem ipsum")
- Favicon missing
- Console info/debug messages that shouldn't be in production
- Subtle color contrast issues that don't fail WCAG requirements

## Categories

### Functional
Issues where features don't work as expected.

- Buttons/links that don't respond
- Forms that don't submit or submit incorrectly
- Broken user flows (can't complete a multi-step process)
- Incorrect data displayed
- Features that work partially

### Visual
Issues with the visual presentation of the page.

- Layout problems (overlapping elements, broken grids)
- Broken images or missing media
- Styling inconsistencies
- Responsive design failures
- Z-index issues (elements hidden behind others)
- Text overflow or truncation

### Accessibility
Issues that prevent or hinder access for users with disabilities.

- Missing alt text on meaningful images
- Poor color contrast (fails WCAG AA)
- Elements not reachable via keyboard navigation
- Missing form labels or ARIA attributes
- Focus indicators missing or unclear
- Screen reader incompatible content

### Console
Issues detected through JavaScript console output.

- Uncaught exceptions and unhandled promise rejections
- Failed network requests (4xx, 5xx errors in console)
- Deprecation warnings
- CORS errors
- Mixed content warnings (HTTP resources on HTTPS page)
- Excessive console.log output left from development

### UX (User Experience)
Issues where functionality works but the experience is poor.

- Confusing navigation or information architecture
- Missing loading indicators (user doesn't know something is happening)
- No feedback after user actions (e.g., button click with no visible result)
- Inconsistent interaction patterns
- Missing confirmation dialogs for destructive actions
- Poor error messages that don't help the user recover

### Content
Issues with the text, media, or information on the page.

- Typos and grammatical errors
- Placeholder/dummy content in production
- Outdated information
- Missing content (empty sections)
- Broken or dead links to external resources
- Incorrect or misleading labels
86
skills/dogfood/templates/dogfood-report-template.md
Normal file
@@ -0,0 +1,86 @@
# Dogfood QA Report

**Target:** {target_url}
**Date:** {date}
**Scope:** {scope_description}
**Tester:** Hermes Agent (automated exploratory QA)

---

## Executive Summary

| Severity | Count |
|----------|-------|
| 🔴 Critical | {critical_count} |
| 🟠 High | {high_count} |
| 🟡 Medium | {medium_count} |
| 🔵 Low | {low_count} |
| **Total** | **{total_count}** |

**Overall Assessment:** {one_sentence_assessment}

---

## Issues

<!-- Repeat this section for each issue found, sorted by severity (Critical first) -->

### Issue #{issue_number}: {issue_title}

| Field | Value |
|-------|-------|
| **Severity** | {severity} |
| **Category** | {category} |
| **URL** | {url_where_found} |

**Description:**
{detailed_description_of_the_issue}

**Steps to Reproduce:**
1. {step_1}
2. {step_2}
3. {step_3}

**Expected Behavior:**
{what_should_happen}

**Actual Behavior:**
{what_actually_happens}

**Screenshot:**
MEDIA:{screenshot_path}

**Console Errors** (if applicable):
```
{console_error_output}
```

---

<!-- End of per-issue section -->

## Issues Summary Table

| # | Title | Severity | Category | URL |
|---|-------|----------|----------|-----|
| {n} | {title} | {severity} | {category} | {url} |

## Testing Coverage

### Pages Tested
- {list_of_pages_visited}

### Features Tested
- {list_of_features_exercised}

### Not Tested / Out of Scope
- {areas_not_covered_and_why}

### Blockers
- {any_issues_that_prevented_testing_certain_areas}

---

## Notes

{any_additional_observations_or_recommendations}
276
tests/tools/test_browser_console.py
Normal file
@@ -0,0 +1,276 @@
"""Tests for browser_console tool and browser_vision annotate param."""

import json
import os
import sys
from unittest.mock import patch, MagicMock

import pytest

sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", ".."))


# ── browser_console ──────────────────────────────────────────────────


class TestBrowserConsole:
    """browser_console() returns console messages + JS errors in one call."""

    def test_returns_console_messages_and_errors(self):
        from tools.browser_tool import browser_console

        console_response = {
            "success": True,
            "data": {
                "messages": [
                    {"text": "hello", "type": "log", "timestamp": 1},
                    {"text": "oops", "type": "error", "timestamp": 2},
                ]
            },
        }
        errors_response = {
            "success": True,
            "data": {
                "errors": [
                    {"message": "Uncaught TypeError", "timestamp": 3},
                ]
            },
        }

        with patch("tools.browser_tool._run_browser_command") as mock_cmd:
            mock_cmd.side_effect = [console_response, errors_response]
            result = json.loads(browser_console(task_id="test"))

        assert result["success"] is True
        assert result["total_messages"] == 2
        assert result["total_errors"] == 1
        assert result["console_messages"][0]["text"] == "hello"
        assert result["console_messages"][1]["text"] == "oops"
        assert result["js_errors"][0]["message"] == "Uncaught TypeError"

    def test_passes_clear_flag(self):
        from tools.browser_tool import browser_console

        empty = {"success": True, "data": {"messages": [], "errors": []}}
        with patch("tools.browser_tool._run_browser_command", return_value=empty) as mock_cmd:
            browser_console(clear=True, task_id="test")

        calls = mock_cmd.call_args_list
        # Both console and errors should get --clear
        assert calls[0][0] == ("test", "console", ["--clear"])
        assert calls[1][0] == ("test", "errors", ["--clear"])

    def test_no_clear_by_default(self):
        from tools.browser_tool import browser_console

        empty = {"success": True, "data": {"messages": [], "errors": []}}
        with patch("tools.browser_tool._run_browser_command", return_value=empty) as mock_cmd:
            browser_console(task_id="test")

        calls = mock_cmd.call_args_list
        assert calls[0][0] == ("test", "console", [])
        assert calls[1][0] == ("test", "errors", [])

    def test_empty_console_and_errors(self):
        from tools.browser_tool import browser_console

        empty = {"success": True, "data": {"messages": [], "errors": []}}
        with patch("tools.browser_tool._run_browser_command", return_value=empty):
            result = json.loads(browser_console(task_id="test"))

        assert result["total_messages"] == 0
        assert result["total_errors"] == 0
        assert result["console_messages"] == []
        assert result["js_errors"] == []

    def test_handles_failed_commands(self):
        from tools.browser_tool import browser_console

        failed = {"success": False, "error": "No session"}
        with patch("tools.browser_tool._run_browser_command", return_value=failed):
            result = json.loads(browser_console(task_id="test"))

        # Should still return success with empty data
        assert result["success"] is True
        assert result["total_messages"] == 0
        assert result["total_errors"] == 0


# ── browser_console schema ───────────────────────────────────────────


class TestBrowserConsoleSchema:
    """browser_console is properly registered in the tool registry."""

    def test_schema_in_browser_schemas(self):
        from tools.browser_tool import BROWSER_TOOL_SCHEMAS

        names = [s["name"] for s in BROWSER_TOOL_SCHEMAS]
        assert "browser_console" in names

    def test_schema_has_clear_param(self):
        from tools.browser_tool import BROWSER_TOOL_SCHEMAS

        schema = next(s for s in BROWSER_TOOL_SCHEMAS if s["name"] == "browser_console")
        props = schema["parameters"]["properties"]
        assert "clear" in props
        assert props["clear"]["type"] == "boolean"


# ── browser_vision annotate ──────────────────────────────────────────


class TestBrowserVisionAnnotate:
    """browser_vision supports annotate parameter."""

    def test_schema_has_annotate_param(self):
        from tools.browser_tool import BROWSER_TOOL_SCHEMAS

        schema = next(s for s in BROWSER_TOOL_SCHEMAS if s["name"] == "browser_vision")
        props = schema["parameters"]["properties"]
        assert "annotate" in props
        assert props["annotate"]["type"] == "boolean"

    def test_annotate_false_no_flag(self):
        """Without annotate, screenshot command has no --annotate flag."""
        from tools.browser_tool import browser_vision

        with (
            patch("tools.browser_tool._run_browser_command") as mock_cmd,
            patch("tools.browser_tool._aux_vision_client") as mock_client,
            patch("tools.browser_tool._DEFAULT_VISION_MODEL", "test-model"),
            patch("tools.browser_tool._get_vision_model", return_value="test-model"),
        ):
            mock_cmd.return_value = {"success": True, "data": {}}
            # Will fail at screenshot file read, but we can check the command
            try:
                browser_vision("test", annotate=False, task_id="test")
            except Exception:
                pass

            if mock_cmd.called:
                args = mock_cmd.call_args[0]
                cmd_args = args[2] if len(args) > 2 else []
                assert "--annotate" not in cmd_args

    def test_annotate_true_adds_flag(self):
        """With annotate=True, screenshot command includes --annotate."""
        from tools.browser_tool import browser_vision

        with (
            patch("tools.browser_tool._run_browser_command") as mock_cmd,
            patch("tools.browser_tool._aux_vision_client") as mock_client,
            patch("tools.browser_tool._DEFAULT_VISION_MODEL", "test-model"),
            patch("tools.browser_tool._get_vision_model", return_value="test-model"),
        ):
            mock_cmd.return_value = {"success": True, "data": {}}
            try:
                browser_vision("test", annotate=True, task_id="test")
            except Exception:
                pass

            if mock_cmd.called:
                args = mock_cmd.call_args[0]
                cmd_args = args[2] if len(args) > 2 else []
                assert "--annotate" in cmd_args


# ── auto-recording config ────────────────────────────────────────────


class TestRecordSessionsConfig:
    """browser.record_sessions config option."""

    def test_default_config_has_record_sessions(self):
        from hermes_cli.config import DEFAULT_CONFIG

        browser_cfg = DEFAULT_CONFIG.get("browser", {})
        assert "record_sessions" in browser_cfg
        assert browser_cfg["record_sessions"] is False

    def test_maybe_start_recording_disabled(self):
        """Recording doesn't start when config says record_sessions: false."""
        from tools.browser_tool import _maybe_start_recording, _recording_sessions

        with (
            patch("tools.browser_tool._run_browser_command") as mock_cmd,
            patch("builtins.open", side_effect=FileNotFoundError),
        ):
            _maybe_start_recording("test-task")

        mock_cmd.assert_not_called()
        assert "test-task" not in _recording_sessions

    def test_maybe_stop_recording_noop_when_not_recording(self):
        """Stopping when not recording is a no-op."""
        from tools.browser_tool import _maybe_stop_recording, _recording_sessions

        _recording_sessions.discard("test-task")  # ensure not in set
        with patch("tools.browser_tool._run_browser_command") as mock_cmd:
            _maybe_stop_recording("test-task")

        mock_cmd.assert_not_called()


# ── dogfood skill files ──────────────────────────────────────────────


class TestDogfoodSkill:
    """Dogfood skill files exist and have correct structure."""

    @pytest.fixture(autouse=True)
    def _skill_dir(self):
        # Use the actual repo skills dir (not temp)
        self.skill_dir = os.path.join(
            os.path.dirname(__file__), "..", "..", "skills", "dogfood"
        )

    def test_skill_md_exists(self):
        assert os.path.exists(os.path.join(self.skill_dir, "SKILL.md"))

    def test_taxonomy_exists(self):
        assert os.path.exists(
            os.path.join(self.skill_dir, "references", "issue-taxonomy.md")
        )

    def test_report_template_exists(self):
        assert os.path.exists(
            os.path.join(self.skill_dir, "templates", "dogfood-report-template.md")
        )

    def test_skill_md_has_frontmatter(self):
        with open(os.path.join(self.skill_dir, "SKILL.md")) as f:
            content = f.read()
        assert content.startswith("---")
        assert "name: dogfood" in content
        assert "description:" in content

    def test_skill_references_browser_console(self):
        with open(os.path.join(self.skill_dir, "SKILL.md")) as f:
            content = f.read()
        assert "browser_console" in content

    def test_skill_references_annotate(self):
        with open(os.path.join(self.skill_dir, "SKILL.md")) as f:
            content = f.read()
        assert "annotate" in content

    def test_taxonomy_has_severity_levels(self):
        with open(
            os.path.join(self.skill_dir, "references", "issue-taxonomy.md")
        ) as f:
            content = f.read()
        assert "Critical" in content
        assert "High" in content
        assert "Medium" in content
        assert "Low" in content

    def test_taxonomy_has_categories(self):
        with open(
            os.path.join(self.skill_dir, "references", "issue-taxonomy.md")
        ) as f:
            content = f.read()
        assert "Functional" in content
        assert "Visual" in content
        assert "Accessibility" in content
        assert "Console" in content
@@ -144,6 +144,7 @@ def _socket_safe_tmpdir() -> str:
# Track active sessions per task
# Stores: session_name (always), bb_session_id + cdp_url (cloud mode only)
_active_sessions: Dict[str, Dict[str, str]] = {}  # task_id -> {session_name, ...}
_recording_sessions: set = set()  # task_ids with active recordings

# Flag to track if cleanup has been done
_cleanup_done = False
@@ -478,11 +479,31 @@ BROWSER_TOOL_SCHEMAS = [
            "question": {
                "type": "string",
                "description": "What you want to know about the page visually. Be specific about what you're looking for."
            },
            "annotate": {
                "type": "boolean",
                "default": False,
                "description": "If true, overlay numbered [N] labels on interactive elements. Each [N] maps to ref @eN for subsequent browser commands. Useful for QA and spatial reasoning about page layout."
            }
        },
        "required": ["question"]
    }
},
{
    "name": "browser_console",
    "description": "Get browser console output and JavaScript errors from the current page. Returns console.log/warn/error/info messages and uncaught JS exceptions. Use this to detect silent JavaScript errors, failed API calls, and application warnings. Requires browser_navigate to be called first.",
    "parameters": {
        "type": "object",
        "properties": {
            "clear": {
                "type": "boolean",
                "default": False,
                "description": "If true, clear the message buffers after reading"
            }
        },
        "required": []
    }
},
]
@@ -998,9 +1019,10 @@ def browser_navigate(url: str, task_id: Optional[str] = None) -> str:
    session_info = _get_session_info(effective_task_id)
    is_first_nav = session_info.get("_first_nav", True)

    # Mark that we've done at least one navigation
    # Auto-start recording if configured and this is first navigation
    if is_first_nav:
        session_info["_first_nav"] = False
        _maybe_start_recording(effective_task_id)

    result = _run_browser_command(effective_task_id, "open", [url], timeout=60)
@@ -1264,6 +1286,10 @@ def browser_close(task_id: Optional[str] = None) -> str:
        JSON string with close result
    """
    effective_task_id = task_id or "default"

    # Stop auto-recording before closing
    _maybe_stop_recording(effective_task_id)

    result = _run_browser_command(effective_task_id, "close", [])

    # Close the backend session (Browserbase API in cloud mode, nothing extra in local mode)
@@ -1294,6 +1320,103 @@ def browser_close(task_id: Optional[str] = None) -> str:
|
||||
}, ensure_ascii=False)
|
||||
|
||||
|
||||
def browser_console(clear: bool = False, task_id: Optional[str] = None) -> str:
|
||||
"""Get browser console messages and JavaScript errors.
|
||||
|
||||
Returns both console output (log/warn/error/info from the page's JS)
|
||||
and uncaught exceptions (crashes, unhandled promise rejections).
|
||||
|
||||
Args:
|
||||
clear: If True, clear the message/error buffers after reading
|
||||
task_id: Task identifier for session isolation
|
||||
|
||||
Returns:
|
||||
JSON string with console messages and JS errors
|
||||
"""
|
||||
effective_task_id = task_id or "default"
|
||||
|
||||
console_args = ["--clear"] if clear else []
|
||||
error_args = ["--clear"] if clear else []
|
||||
|
||||
console_result = _run_browser_command(effective_task_id, "console", console_args)
|
||||
errors_result = _run_browser_command(effective_task_id, "errors", error_args)
|
||||
|
||||
messages = []
|
||||
if console_result.get("success"):
|
||||
for msg in console_result.get("data", {}).get("messages", []):
|
||||
messages.append({
|
||||
"type": msg.get("type", "log"),
|
||||
"text": msg.get("text", ""),
|
||||
"source": "console",
|
||||
})
|
||||
|
||||
errors = []
|
||||
if errors_result.get("success"):
|
||||
for err in errors_result.get("data", {}).get("errors", []):
|
||||
errors.append({
|
||||
"message": err.get("message", ""),
|
||||
"source": "exception",
|
||||
})
|
||||
|
||||
return json.dumps({
|
||||
"success": True,
|
||||
"console_messages": messages,
|
||||
"js_errors": errors,
|
||||
"total_messages": len(messages),
|
||||
"total_errors": len(errors),
|
||||
}, ensure_ascii=False)
|
||||

def _maybe_start_recording(task_id: str):
    """Start recording if browser.record_sessions is enabled in config."""
    if task_id in _recording_sessions:
        return
    try:
        hermes_home = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))
        config_path = hermes_home / "config.yaml"
        record_enabled = False
        if config_path.exists():
            import yaml
            with open(config_path) as f:
                cfg = yaml.safe_load(f) or {}
            record_enabled = cfg.get("browser", {}).get("record_sessions", False)

        if not record_enabled:
            return

        recordings_dir = hermes_home / "browser_recordings"
        recordings_dir.mkdir(parents=True, exist_ok=True)
        _cleanup_old_recordings(max_age_hours=72)

        import time
        timestamp = time.strftime("%Y%m%d_%H%M%S")
        recording_path = recordings_dir / f"session_{timestamp}_{task_id[:16]}.webm"

        result = _run_browser_command(task_id, "record", ["start", str(recording_path)])
        if result.get("success"):
            _recording_sessions.add(task_id)
            logger.info("Auto-recording browser session %s to %s", task_id, recording_path)
        else:
            logger.debug("Could not start auto-recording: %s", result.get("error"))
    except Exception as e:
        logger.debug("Auto-recording setup failed: %s", e)

def _maybe_stop_recording(task_id: str):
    """Stop recording if one is active for this session."""
    if task_id not in _recording_sessions:
        return
    try:
        result = _run_browser_command(task_id, "record", ["stop"])
        if result.get("success"):
            path = result.get("data", {}).get("path", "")
            logger.info("Saved browser recording for session %s: %s", task_id, path)
    except Exception as e:
        logger.debug("Could not stop recording for %s: %s", task_id, e)
    finally:
        _recording_sessions.discard(task_id)

def browser_get_images(task_id: Optional[str] = None) -> str:
    """
    Get all images on the current page.
@@ -1348,7 +1471,7 @@ def browser_get_images(task_id: Optional[str] = None) -> str:
    }, ensure_ascii=False)

def browser_vision(question: str, task_id: Optional[str] = None) -> str:
def browser_vision(question: str, annotate: bool = False, task_id: Optional[str] = None) -> str:
    """
    Take a screenshot of the current page and analyze it with vision AI.

@@ -1362,6 +1485,7 @@ def browser_vision(question: str, task_id: Optional[str] = None) -> str:

    Args:
        question: What you want to know about the page visually
        annotate: If True, overlay numbered [N] labels on interactive elements
        task_id: Task identifier for session isolation

    Returns:
@@ -1393,10 +1517,13 @@ def browser_vision(question: str, task_id: Optional[str] = None) -> str:
        _cleanup_old_screenshots(screenshots_dir, max_age_hours=24)

        # Take screenshot using agent-browser
        screenshot_args = [str(screenshot_path)]
        if annotate:
            screenshot_args.insert(0, "--annotate")
        result = _run_browser_command(
            effective_task_id,
            "screenshot",
            [str(screenshot_path)],
            screenshot_args,
            timeout=30
        )

@@ -1456,11 +1583,15 @@
        )

        analysis = response.choices[0].message.content
        return json.dumps({
        response_data = {
            "success": True,
            "analysis": analysis,
            "screenshot_path": str(screenshot_path),
        }, ensure_ascii=False)
        }
        # Include annotation data if annotated screenshot was taken
        if annotate and result.get("data", {}).get("annotations"):
            response_data["annotations"] = result["data"]["annotations"]
        return json.dumps(response_data, ensure_ascii=False)

    except Exception as e:
        # Keep the screenshot if it was captured successfully — the failure is
@@ -1490,6 +1621,25 @@ def _cleanup_old_screenshots(screenshots_dir, max_age_hours=24):
        pass  # Non-critical — don't fail the screenshot operation

def _cleanup_old_recordings(max_age_hours=72):
    """Remove browser recordings older than max_age_hours to prevent disk bloat."""
    import time
    try:
        hermes_home = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))
        recordings_dir = hermes_home / "browser_recordings"
        if not recordings_dir.exists():
            return
        cutoff = time.time() - (max_age_hours * 3600)
        for f in recordings_dir.glob("session_*.webm"):
            try:
                if f.stat().st_mtime < cutoff:
                    f.unlink()
            except Exception:
                pass
    except Exception:
        pass

# ============================================================================
# Cleanup and Management Functions
# ============================================================================
@@ -1561,6 +1711,9 @@ def cleanup_browser(task_id: Optional[str] = None) -> None:
        bb_session_id = session_info.get("bb_session_id", "unknown")
        logger.debug("Found session for task %s: bb_session_id=%s", task_id, bb_session_id)

        # Stop auto-recording before closing (saves the file)
        _maybe_stop_recording(task_id)

        # Try to close via agent-browser first (needs session in _active_sessions)
        try:
            _run_browser_command(task_id, "close", [], timeout=10)
@@ -1776,6 +1929,13 @@ registry.register(
    name="browser_vision",
    toolset="browser",
    schema=_BROWSER_SCHEMA_MAP["browser_vision"],
    handler=lambda args, **kw: browser_vision(question=args.get("question", ""), task_id=kw.get("task_id")),
    handler=lambda args, **kw: browser_vision(question=args.get("question", ""), annotate=args.get("annotate", False), task_id=kw.get("task_id")),
    check_fn=check_browser_requirements,
)
registry.register(
    name="browser_console",
    toolset="browser",
    schema=_BROWSER_SCHEMA_MAP["browser_console"],
    handler=lambda args, **kw: browser_console(clear=args.get("clear", False), task_id=kw.get("task_id")),
    check_fn=check_browser_requirements,
)

@@ -620,6 +620,16 @@ code_execution:
  max_tool_calls: 50  # Max tool calls within code execution
```

## Browser

Configure browser automation behavior:

```yaml
browser:
  inactivity_timeout: 120  # Seconds before auto-closing idle sessions
  record_sessions: false   # Auto-record browser sessions as WebM videos to ~/.hermes/browser_recordings/
```

## Delegation

Configure subagent behavior for the delegate tool:

@@ -142,6 +142,16 @@ What does the chart on this page show?

Screenshots are stored in `~/.hermes/browser_screenshots/` and automatically cleaned up after 24 hours.

### `browser_console`

Get browser console output (log/warn/error messages) and uncaught JavaScript exceptions from the current page. Essential for detecting silent JS errors that don't appear in the accessibility tree.

```
Check the browser console for any JavaScript errors
```

Use `clear=True` to clear the console after reading, so subsequent calls only show new messages.
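For scripted checks, the tool returns a JSON string in the shape built above (`console_messages`, `js_errors`, and counts). A minimal sketch of consuming it — the payload here is illustrative, not captured from a real session:

```python
import json

# Illustrative payload in the shape browser_console returns.
raw = json.dumps({
    "success": True,
    "console_messages": [
        {"type": "warn", "text": "Deprecated API", "source": "console"},
        {"type": "error", "text": "Failed to fetch", "source": "console"},
    ],
    "js_errors": [
        {"message": "TypeError: x is undefined", "source": "exception"},
    ],
    "total_messages": 2,
    "total_errors": 1,
})

report = json.loads(raw)
# Collect everything that indicates a real problem: console errors plus
# uncaught exceptions.
problems = [m["text"] for m in report["console_messages"] if m["type"] == "error"]
problems += [e["message"] for e in report["js_errors"]]
print(problems)  # -> ['Failed to fetch', 'TypeError: x is undefined']
```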

### `browser_close`

Close the browser session and release resources. Call this when done to free up Browserbase session quota.
@@ -175,6 +185,17 @@ Agent workflow:
4. browser_close()
```

## Session Recording

Automatically record browser sessions as WebM video files:

```yaml
browser:
  record_sessions: true  # default: false
```

When enabled, recording starts automatically on the first `browser_navigate` and saves to `~/.hermes/browser_recordings/` when the session closes. Works in both local and cloud (Browserbase) modes. Recordings older than 72 hours are automatically cleaned up.
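The retention rule can be sketched as a stand-alone helper (an approximation of the internal `_cleanup_old_recordings`; the directory is passed in explicitly rather than derived from `HERMES_HOME`, and this version only lists candidates instead of deleting them):

```python
import time
from pathlib import Path

def stale_recordings(recordings_dir: Path, max_age_hours: int = 72) -> list:
    """Return session_*.webm files older than the cutoff, i.e. the files
    the auto-cleanup would remove on the next recording start."""
    cutoff = time.time() - max_age_hours * 3600
    return [f for f in recordings_dir.glob("session_*.webm")
            if f.stat().st_mtime < cutoff]
```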

## Stealth Features

Browserbase provides automatic stealth capabilities:

@@ -15,7 +15,7 @@ Tools are functions that extend the agent's capabilities. They're organized into
| **Web** | `web_search`, `web_extract` | Search the web, extract page content |
| **Terminal** | `terminal`, `process` | Execute commands (local/docker/singularity/modal/daytona/ssh backends), manage background processes |
| **File** | `read_file`, `write_file`, `patch`, `search_files` | Read, write, edit, and search files |
| **Browser** | `browser_navigate`, `browser_click`, `browser_type`, etc. | Full browser automation via Browserbase |
| **Browser** | `browser_navigate`, `browser_click`, `browser_type`, `browser_console`, etc. | Full browser automation via Browserbase |
| **Vision** | `vision_analyze` | Image analysis via multimodal models |
| **Image Gen** | `image_generate` | Generate images (FLUX via FAL) |
| **TTS** | `text_to_speech` | Text-to-speech (Edge TTS / ElevenLabs / OpenAI) |