diff --git a/AGENTS.md b/AGENTS.md
index 2a183cf6b..f6b6c6926 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -69,7 +69,7 @@ hermes-agent/
 │   ├── file_tools.py            # File read/write/search/patch tools
 │   ├── file_operations.py       # File operations helpers
 │   ├── web_tools.py             # Firecrawl search/extract
-│   ├── browser_tool.py          # Browserbase browser automation
+│   ├── browser_tool.py          # Browserbase browser automation (browser_console, session recording)
 │   ├── vision_tools.py          # Image analysis via auxiliary LLM
 │   ├── image_generation_tool.py # FLUX image generation via fal.ai
 │   ├── tts_tool.py              # Text-to-speech
@@ -113,7 +113,7 @@ hermes-agent/
 ├── cron/               # Scheduler implementation
 ├── environments/       # RL training environments (Atropos integration)
 ├── honcho_integration/ # Honcho client & session management
-├── skills/             # Bundled skill sources
+├── skills/             # Bundled skill sources (includes dogfood QA testing)
 ├── optional-skills/    # Official optional skills (not activated by default)
 ├── scripts/            # Install scripts, utilities
 ├── tests/              # Full pytest suite (~2300+ tests)
diff --git a/cli.py b/cli.py
index fdb34e68d..362fc6f2d 100755
--- a/cli.py
+++ b/cli.py
@@ -161,6 +161,7 @@ def load_cli_config() -> Dict[str, Any]:
         },
         "browser": {
             "inactivity_timeout": 120,  # Auto-cleanup inactive browser sessions after 2 min
+            "record_sessions": False,  # Auto-record browser sessions as WebM videos
         },
         "compression": {
             "enabled": True,  # Auto-compress when approaching context limit
diff --git a/hermes_cli/config.py b/hermes_cli/config.py
index 27a9dc9e6..184440a5a 100644
--- a/hermes_cli/config.py
+++ b/hermes_cli/config.py
@@ -81,6 +81,7 @@ DEFAULT_CONFIG = {
     "browser": {
         "inactivity_timeout": 120,
+        "record_sessions": False,  # Auto-record browser sessions as WebM videos
     },
 
     "compression": {
diff --git a/skills/dogfood/SKILL.md b/skills/dogfood/SKILL.md
new file mode 100644
index 000000000..81a4ebfde
--- /dev/null
+++ b/skills/dogfood/SKILL.md
@@ -0,0 +1,162 @@
+---
+name: dogfood
+description: Systematic exploratory QA testing of web applications — find bugs, capture evidence, and generate structured reports
+version: 1.0.0
+metadata:
+  hermes:
+    tags: [qa, testing, browser, web, dogfood]
+    related_skills: []
+---
+
+# Dogfood: Systematic Web Application QA Testing
+
+## Overview
+
+This skill guides you through systematic exploratory QA testing of web applications using the browser toolset. You will navigate the application, interact with elements, capture evidence of issues, and produce a structured bug report.
+
+## Prerequisites
+
+- Browser toolset must be available (`browser_navigate`, `browser_snapshot`, `browser_click`, `browser_type`, `browser_vision`, `browser_console`, `browser_scroll`, `browser_back`, `browser_press`, `browser_close`)
+- A target URL and testing scope from the user
+
+## Inputs
+
+The user provides:
+1. **Target URL** — the entry point for testing
+2. **Scope** — what areas/features to focus on (or "full site" for comprehensive testing)
+3. **Output directory** (optional) — where to save screenshots and the report (default: `./dogfood-output`)
+
+## Workflow
+
+Follow this 5-phase systematic workflow:
+
+### Phase 1: Plan
+
+1. Create the output directory structure:
+   ```
+   {output_dir}/
+   ├── screenshots/   # Evidence screenshots
+   └── report.md      # Final report (generated in Phase 5)
+   ```
+2. Identify the testing scope based on user input.
+3. Build a rough sitemap by planning which pages and features to test:
+   - Landing/home page
+   - Navigation links (header, footer, sidebar)
+   - Key user flows (sign up, login, search, checkout, etc.)
+   - Forms and interactive elements
+   - Edge cases (empty states, error pages, 404s)
+
+### Phase 2: Explore
+
+For each page or feature in your plan:
+
+1. **Navigate** to the page:
+   ```
+   browser_navigate(url="https://example.com/page")
+   ```
+
+2. **Take a snapshot** to understand the DOM structure:
+   ```
+   browser_snapshot()
+   ```
+
+3. **Check the console** for JavaScript errors:
+   ```
+   browser_console(clear=true)
+   ```
+   Do this after every navigation and after every significant interaction. Silent JS errors are high-value findings.
+
+4. **Take an annotated screenshot** to visually assess the page and identify interactive elements:
+   ```
+   browser_vision(question="Describe the page layout, identify any visual issues, broken elements, or accessibility concerns", annotate=true)
+   ```
+   The `annotate=true` flag overlays numbered `[N]` labels on interactive elements. Each `[N]` maps to ref `@eN` for subsequent browser commands.
+
+5. **Test interactive elements** systematically:
+   - Click buttons and links: `browser_click(ref="@eN")`
+   - Fill forms: `browser_type(ref="@eN", text="test input")`
+   - Test keyboard navigation: `browser_press(key="Tab")`, `browser_press(key="Enter")`
+   - Scroll through content: `browser_scroll(direction="down")`
+   - Test form validation with invalid inputs
+   - Test empty submissions
+
+6. **After each interaction**, check for:
+   - Console errors: `browser_console()`
+   - Visual changes: `browser_vision(question="What changed after the interaction?")`
+   - Expected vs actual behavior
+
+### Phase 3: Collect Evidence
+
+For every issue found:
+
+1. **Take a screenshot** showing the issue:
+   ```
+   browser_vision(question="Capture and describe the issue visible on this page", annotate=false)
+   ```
+   Save the `screenshot_path` from the response — you will reference it in the report.
+
+2. **Record the details**:
+   - URL where the issue occurs
+   - Steps to reproduce
+   - Expected behavior
+   - Actual behavior
+   - Console errors (if any)
+   - Screenshot path
+
+3. **Classify the issue** using the issue taxonomy (see `references/issue-taxonomy.md`):
+   - Severity: Critical / High / Medium / Low
+   - Category: Functional / Visual / Accessibility / Console / UX / Content
+
+### Phase 4: Categorize
+
+1. Review all collected issues.
+2. De-duplicate — merge issues that are the same bug manifesting in different places.
+3. Assign final severity and category to each issue.
+4. Sort by severity (Critical first, then High, Medium, Low).
+5. Count issues by severity and category for the executive summary.
+
+### Phase 5: Report
+
+Generate the final report using the template at `templates/dogfood-report-template.md`.
+
+The report must include:
+1. **Executive summary** with total issue count, breakdown by severity, and testing scope
+2. **Per-issue sections** with:
+   - Issue number and title
+   - Severity and category badges
+   - URL where observed
+   - Description of the issue
+   - Steps to reproduce
+   - Expected vs actual behavior
+   - Screenshot references (use `MEDIA:` for inline images)
+   - Console errors if relevant
+3. **Summary table** of all issues
+4. **Testing notes** — what was tested, what was not, any blockers
+
+Save the report to `{output_dir}/report.md`.
+
+## Tools Reference
+
+| Tool | Purpose |
+|------|---------|
+| `browser_navigate` | Go to a URL |
+| `browser_snapshot` | Get DOM text snapshot (accessibility tree) |
+| `browser_click` | Click an element by ref (`@eN`) or text |
+| `browser_type` | Type into an input field |
+| `browser_scroll` | Scroll up/down on the page |
+| `browser_back` | Go back in browser history |
+| `browser_press` | Press a keyboard key |
+| `browser_vision` | Screenshot + AI analysis; use `annotate=true` for element labels |
+| `browser_console` | Get JS console output and errors |
+| `browser_close` | Close the browser session |
+
+## Tips
+
+- **Always check `browser_console()` after navigating and after significant interactions.** Silent JS errors are among the most valuable findings.
+- **Use `annotate=true` with `browser_vision`** when you need to reason about interactive element positions or when the snapshot refs are unclear.
+- **Test with both valid and invalid inputs** — form validation bugs are common.
+- **Scroll through long pages** — content below the fold may have rendering issues.
+- **Test navigation flows** — click through multi-step processes end-to-end.
+- **Check responsive behavior** by noting any layout issues visible in screenshots.
+- **Don't forget edge cases**: empty states, very long text, special characters, rapid clicking.
+- When reporting screenshots to the user, include `MEDIA:` so they can see the evidence inline.
diff --git a/skills/dogfood/references/issue-taxonomy.md b/skills/dogfood/references/issue-taxonomy.md
new file mode 100644
index 000000000..59489929a
--- /dev/null
+++ b/skills/dogfood/references/issue-taxonomy.md
@@ -0,0 +1,109 @@
+# Issue Taxonomy
+
+Use this taxonomy to classify issues found during dogfood QA testing.
+
+## Severity Levels
+
+### Critical
+The issue makes a core feature completely unusable or causes data loss.
+
+**Examples:**
+- Application crashes or shows a blank white page
+- Form submission silently loses user data
+- Authentication is completely broken (can't log in at all)
+- Payment flow fails and charges the user without completing the order
+- Security vulnerability (e.g., XSS, exposed credentials in console)
+
+### High
+The issue significantly impairs functionality but a workaround may exist.
+
+**Examples:**
+- A key button does nothing when clicked (but refreshing fixes it)
+- Search returns no results for valid queries
+- Form validation rejects valid input
+- Page loads but critical content is missing or garbled
+- Navigation link leads to a 404 or wrong page
+- Uncaught JavaScript exceptions in the console on core pages
+
+### Medium
+The issue is noticeable and affects user experience but doesn't block core functionality.
+
+**Examples:**
+- Layout is misaligned or overlapping on certain screen sections
+- Images fail to load (broken image icons)
+- Slow performance (visible loading delays > 3 seconds)
+- Form field lacks proper validation feedback (no error message on bad input)
+- Console warnings that suggest deprecated or misconfigured features
+- Inconsistent styling between similar pages
+
+### Low
+Minor polish issues that don't affect functionality.
+
+**Examples:**
+- Typos or grammatical errors in text content
+- Minor spacing or alignment inconsistencies
+- Placeholder text left in production ("Lorem ipsum")
+- Favicon missing
+- Console info/debug messages that shouldn't be in production
+- Subtle color contrast issues that don't fail WCAG requirements
+
+## Categories
+
+### Functional
+Issues where features don't work as expected.
+
+- Buttons/links that don't respond
+- Forms that don't submit or submit incorrectly
+- Broken user flows (can't complete a multi-step process)
+- Incorrect data displayed
+- Features that work partially
+
+### Visual
+Issues with the visual presentation of the page.
+
+- Layout problems (overlapping elements, broken grids)
+- Broken images or missing media
+- Styling inconsistencies
+- Responsive design failures
+- Z-index issues (elements hidden behind others)
+- Text overflow or truncation
+
+### Accessibility
+Issues that prevent or hinder access for users with disabilities.
+
+- Missing alt text on meaningful images
+- Poor color contrast (fails WCAG AA)
+- Elements not reachable via keyboard navigation
+- Missing form labels or ARIA attributes
+- Focus indicators missing or unclear
+- Screen reader incompatible content
+
+### Console
+Issues detected through JavaScript console output.
+
+- Uncaught exceptions and unhandled promise rejections
+- Failed network requests (4xx, 5xx errors in console)
+- Deprecation warnings
+- CORS errors
+- Mixed content warnings (HTTP resources on HTTPS page)
+- Excessive console.log output left from development
+
+### UX (User Experience)
+Issues where functionality works but the experience is poor.
+
+- Confusing navigation or information architecture
+- Missing loading indicators (user doesn't know something is happening)
+- No feedback after user actions (e.g., button click with no visible result)
+- Inconsistent interaction patterns
+- Missing confirmation dialogs for destructive actions
+- Poor error messages that don't help the user recover
+
+### Content
+Issues with the text, media, or information on the page.
+
+- Typos and grammatical errors
+- Placeholder/dummy content in production
+- Outdated information
+- Missing content (empty sections)
+- Broken or dead links to external resources
+- Incorrect or misleading labels
diff --git a/skills/dogfood/templates/dogfood-report-template.md b/skills/dogfood/templates/dogfood-report-template.md
new file mode 100644
index 000000000..9a500c5c8
--- /dev/null
+++ b/skills/dogfood/templates/dogfood-report-template.md
@@ -0,0 +1,86 @@
+# Dogfood QA Report
+
+**Target:** {target_url}
+**Date:** {date}
+**Scope:** {scope_description}
+**Tester:** Hermes Agent (automated exploratory QA)
+
+---
+
+## Executive Summary
+
+| Severity | Count |
+|----------|-------|
+| 🔴 Critical | {critical_count} |
+| 🟠 High | {high_count} |
+| 🟡 Medium | {medium_count} |
+| 🔵 Low | {low_count} |
+| **Total** | **{total_count}** |
+
+**Overall Assessment:** {one_sentence_assessment}
+
+---
+
+## Issues
+
+
+
+### Issue #{issue_number}: {issue_title}
+
+| Field | Value |
+|-------|-------|
+| **Severity** | {severity} |
+| **Category** | {category} |
+| **URL** | {url_where_found} |
+
+**Description:**
+{detailed_description_of_the_issue}
+
+**Steps to Reproduce:**
+1. {step_1}
+2. {step_2}
+3. {step_3}
+
+**Expected Behavior:**
+{what_should_happen}
+
+**Actual Behavior:**
+{what_actually_happens}
+
+**Screenshot:**
+MEDIA:{screenshot_path}
+
+**Console Errors** (if applicable):
+```
+{console_error_output}
+```
+
+---
+
+
+
+## Issues Summary Table
+
+| # | Title | Severity | Category | URL |
+|---|-------|----------|----------|-----|
+| {n} | {title} | {severity} | {category} | {url} |
+
+## Testing Coverage
+
+### Pages Tested
+- {list_of_pages_visited}
+
+### Features Tested
+- {list_of_features_exercised}
+
+### Not Tested / Out of Scope
+- {areas_not_covered_and_why}
+
+### Blockers
+- {any_issues_that_prevented_testing_certain_areas}
+
+---
+
+## Notes
+
+{any_additional_observations_or_recommendations}
diff --git a/tests/tools/test_browser_console.py b/tests/tools/test_browser_console.py
new file mode 100644
index 000000000..962b49f02
--- /dev/null
+++ b/tests/tools/test_browser_console.py
@@ -0,0 +1,276 @@
+"""Tests for browser_console tool and browser_vision annotate param."""
+
+import json
+import os
+import sys
+from unittest.mock import patch, MagicMock
+
+import pytest
+
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", ".."))
+
+
+# ── browser_console ──────────────────────────────────────────────────
+
+
+class TestBrowserConsole:
+    """browser_console() returns console messages + JS errors in one call."""
+
+    def test_returns_console_messages_and_errors(self):
+        from tools.browser_tool import browser_console
+
+        console_response = {
+            "success": True,
+            "data": {
+                "messages": [
+                    {"text": "hello", "type": "log", "timestamp": 1},
+                    {"text": "oops", "type": "error", "timestamp": 2},
+                ]
+            },
+        }
+        errors_response = {
+            "success": True,
+            "data": {
+                "errors": [
+                    {"message": "Uncaught TypeError", "timestamp": 3},
+                ]
+            },
+        }
+
+        with patch("tools.browser_tool._run_browser_command") as mock_cmd:
+            mock_cmd.side_effect = [console_response, errors_response]
+            result = json.loads(browser_console(task_id="test"))
+
+        assert result["success"] is True
+        assert result["total_messages"] == 2
+        assert result["total_errors"] == 1
+        assert result["console_messages"][0]["text"] == "hello"
+        assert result["console_messages"][1]["text"] == "oops"
+        assert result["js_errors"][0]["message"] == "Uncaught TypeError"
+
+    def test_passes_clear_flag(self):
+        from tools.browser_tool import browser_console
+
+        empty = {"success": True, "data": {"messages": [], "errors": []}}
+        with patch("tools.browser_tool._run_browser_command", return_value=empty) as mock_cmd:
+            browser_console(clear=True, task_id="test")
+
+        calls = mock_cmd.call_args_list
+        # Both console and errors should get --clear
+        assert calls[0][0] == ("test", "console", ["--clear"])
+        assert calls[1][0] == ("test", "errors", ["--clear"])
+
+    def test_no_clear_by_default(self):
+        from tools.browser_tool import browser_console
+
+        empty = {"success": True, "data": {"messages": [], "errors": []}}
+        with patch("tools.browser_tool._run_browser_command", return_value=empty) as mock_cmd:
+            browser_console(task_id="test")
+
+        calls = mock_cmd.call_args_list
+        assert calls[0][0] == ("test", "console", [])
+        assert calls[1][0] == ("test", "errors", [])
+
+    def test_empty_console_and_errors(self):
+        from tools.browser_tool import browser_console
+
+        empty = {"success": True, "data": {"messages": [], "errors": []}}
+        with patch("tools.browser_tool._run_browser_command", return_value=empty):
+            result = json.loads(browser_console(task_id="test"))
+
+        assert result["total_messages"] == 0
+        assert result["total_errors"] == 0
+        assert result["console_messages"] == []
+        assert result["js_errors"] == []
+
+    def test_handles_failed_commands(self):
+        from tools.browser_tool import browser_console
+
+        failed = {"success": False, "error": "No session"}
+        with patch("tools.browser_tool._run_browser_command", return_value=failed):
+            result = json.loads(browser_console(task_id="test"))
+
+        # Should still return success with empty data
+        assert result["success"] is True
+        assert result["total_messages"] == 0
+        assert result["total_errors"] == 0
+
+
+# ── browser_console schema ───────────────────────────────────────────
+
+
+class TestBrowserConsoleSchema:
+    """browser_console is properly registered in the tool registry."""
+
+    def test_schema_in_browser_schemas(self):
+        from tools.browser_tool import BROWSER_TOOL_SCHEMAS
+
+        names = [s["name"] for s in BROWSER_TOOL_SCHEMAS]
+        assert "browser_console" in names
+
+    def test_schema_has_clear_param(self):
+        from tools.browser_tool import BROWSER_TOOL_SCHEMAS
+
+        schema = next(s for s in BROWSER_TOOL_SCHEMAS if s["name"] == "browser_console")
+        props = schema["parameters"]["properties"]
+        assert "clear" in props
+        assert props["clear"]["type"] == "boolean"
+
+
+# ── browser_vision annotate ──────────────────────────────────────────
+
+
+class TestBrowserVisionAnnotate:
+    """browser_vision supports annotate parameter."""
+
+    def test_schema_has_annotate_param(self):
+        from tools.browser_tool import BROWSER_TOOL_SCHEMAS
+
+        schema = next(s for s in BROWSER_TOOL_SCHEMAS if s["name"] == "browser_vision")
+        props = schema["parameters"]["properties"]
+        assert "annotate" in props
+        assert props["annotate"]["type"] == "boolean"
+
+    def test_annotate_false_no_flag(self):
+        """Without annotate, screenshot command has no --annotate flag."""
+        from tools.browser_tool import browser_vision
+
+        with (
+            patch("tools.browser_tool._run_browser_command") as mock_cmd,
+            patch("tools.browser_tool._aux_vision_client") as mock_client,
+            patch("tools.browser_tool._DEFAULT_VISION_MODEL", "test-model"),
+            patch("tools.browser_tool._get_vision_model", return_value="test-model"),
+        ):
+            mock_cmd.return_value = {"success": True, "data": {}}
+            # Will fail at screenshot file read, but we can check the command
+            try:
+                browser_vision("test", annotate=False, task_id="test")
+            except Exception:
+                pass
+
+            if mock_cmd.called:
+                args = mock_cmd.call_args[0]
+                cmd_args = args[2] if len(args) > 2 else []
+                assert "--annotate" not in cmd_args
+
+    def test_annotate_true_adds_flag(self):
+        """With annotate=True, screenshot command includes --annotate."""
+        from tools.browser_tool import browser_vision
+
+        with (
+            patch("tools.browser_tool._run_browser_command") as mock_cmd,
+            patch("tools.browser_tool._aux_vision_client") as mock_client,
+            patch("tools.browser_tool._DEFAULT_VISION_MODEL", "test-model"),
+            patch("tools.browser_tool._get_vision_model", return_value="test-model"),
+        ):
+            mock_cmd.return_value = {"success": True, "data": {}}
+            try:
+                browser_vision("test", annotate=True, task_id="test")
+            except Exception:
+                pass
+
+            if mock_cmd.called:
+                args = mock_cmd.call_args[0]
+                cmd_args = args[2] if len(args) > 2 else []
+                assert "--annotate" in cmd_args
+
+
+# ── auto-recording config ────────────────────────────────────────────
+
+
+class TestRecordSessionsConfig:
+    """browser.record_sessions config option."""
+
+    def test_default_config_has_record_sessions(self):
+        from hermes_cli.config import DEFAULT_CONFIG
+
+        browser_cfg = DEFAULT_CONFIG.get("browser", {})
+        assert "record_sessions" in browser_cfg
+        assert browser_cfg["record_sessions"] is False
+
+    def test_maybe_start_recording_disabled(self):
+        """Recording doesn't start when config says record_sessions: false."""
+        from tools.browser_tool import _maybe_start_recording, _recording_sessions
+
+        with (
+            patch("tools.browser_tool._run_browser_command") as mock_cmd,
+            patch("builtins.open", side_effect=FileNotFoundError),
+        ):
+            _maybe_start_recording("test-task")
+
+        mock_cmd.assert_not_called()
+        assert "test-task" not in _recording_sessions
+
+    def test_maybe_stop_recording_noop_when_not_recording(self):
+        """Stopping when not recording is a no-op."""
+        from tools.browser_tool import _maybe_stop_recording, _recording_sessions
+
+        _recording_sessions.discard("test-task")  # ensure not in set
+        with patch("tools.browser_tool._run_browser_command") as mock_cmd:
+            _maybe_stop_recording("test-task")
+
+        mock_cmd.assert_not_called()
+
+
+# ── dogfood skill files ──────────────────────────────────────────────
+
+
+class TestDogfoodSkill:
+    """Dogfood skill files exist and have correct structure."""
+
+    @pytest.fixture(autouse=True)
+    def _skill_dir(self):
+        # Use the actual repo skills dir (not temp)
+        self.skill_dir = os.path.join(
+            os.path.dirname(__file__), "..", "..", "skills", "dogfood"
+        )
+
+    def test_skill_md_exists(self):
+        assert os.path.exists(os.path.join(self.skill_dir, "SKILL.md"))
+
+    def test_taxonomy_exists(self):
+        assert os.path.exists(
+            os.path.join(self.skill_dir, "references", "issue-taxonomy.md")
+        )
+
+    def test_report_template_exists(self):
+        assert os.path.exists(
+            os.path.join(self.skill_dir, "templates", "dogfood-report-template.md")
+        )
+
+    def test_skill_md_has_frontmatter(self):
+        with open(os.path.join(self.skill_dir, "SKILL.md")) as f:
+            content = f.read()
+        assert content.startswith("---")
+        assert "name: dogfood" in content
+        assert "description:" in content
+
+    def test_skill_references_browser_console(self):
+        with open(os.path.join(self.skill_dir, "SKILL.md")) as f:
+            content = f.read()
+        assert "browser_console" in content
+
+    def test_skill_references_annotate(self):
+        with open(os.path.join(self.skill_dir, "SKILL.md")) as f:
+            content = f.read()
+        assert "annotate" in content
+
+    def test_taxonomy_has_severity_levels(self):
+        with open(
+            os.path.join(self.skill_dir, "references", "issue-taxonomy.md")
+        ) as f:
+            content = f.read()
+        assert "Critical" in content
+        assert "High" in content
+        assert "Medium" in content
+        assert "Low" in content
+
+    def test_taxonomy_has_categories(self):
+        with open(
+            os.path.join(self.skill_dir, "references", "issue-taxonomy.md")
+        ) as f:
+            content = f.read()
+        assert "Functional" in content
+        assert "Visual" in content
+        assert "Accessibility" in content
+        assert "Console" in content
diff --git a/tools/browser_tool.py b/tools/browser_tool.py
index d238d1435..480093eaa 100644
--- a/tools/browser_tool.py
+++ b/tools/browser_tool.py
@@ -144,6 +144,7 @@ def _socket_safe_tmpdir() -> str:
 # Track active sessions per task
 # Stores: session_name (always), bb_session_id + cdp_url (cloud mode only)
 _active_sessions: Dict[str, Dict[str, str]] = {}  # task_id -> {session_name, ...}
+_recording_sessions: set = set()  # task_ids with active recordings
 
 # Flag to track if cleanup has been done
 _cleanup_done = False
@@ -478,11 +479,31 @@ BROWSER_TOOL_SCHEMAS = [
                 "question": {
                     "type": "string",
                     "description": "What you want to know about the page visually. Be specific about what you're looking for."
+                },
+                "annotate": {
+                    "type": "boolean",
+                    "default": False,
+                    "description": "If true, overlay numbered [N] labels on interactive elements. Each [N] maps to ref @eN for subsequent browser commands. Useful for QA and spatial reasoning about page layout."
                 }
             },
             "required": ["question"]
         }
    },
+    {
+        "name": "browser_console",
+        "description": "Get browser console output and JavaScript errors from the current page. Returns console.log/warn/error/info messages and uncaught JS exceptions. Use this to detect silent JavaScript errors, failed API calls, and application warnings. Requires browser_navigate to be called first.",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "clear": {
+                    "type": "boolean",
+                    "default": False,
+                    "description": "If true, clear the message buffers after reading"
+                }
+            },
+            "required": []
+        }
+    },
 ]
@@ -998,9 +1019,10 @@ def browser_navigate(url: str, task_id: Optional[str] = None) -> str:
     session_info = _get_session_info(effective_task_id)
     is_first_nav = session_info.get("_first_nav", True)
 
-    # Mark that we've done at least one navigation
+    # Auto-start recording if configured and this is first navigation
     if is_first_nav:
         session_info["_first_nav"] = False
+        _maybe_start_recording(effective_task_id)
 
     result = _run_browser_command(effective_task_id, "open", [url], timeout=60)
@@ -1264,6 +1286,10 @@ def browser_close(task_id: Optional[str] = None) -> str:
         JSON string with close result
     """
     effective_task_id = task_id or "default"
+
+    # Stop auto-recording before closing
+    _maybe_stop_recording(effective_task_id)
+
     result = _run_browser_command(effective_task_id, "close", [])
 
     # Close the backend session (Browserbase API in cloud mode, nothing extra in local mode)
@@ -1294,6 +1320,103 @@
     }, ensure_ascii=False)
 
 
+def browser_console(clear: bool = False, task_id: Optional[str] = None) -> str:
+    """Get browser console messages and JavaScript errors.
+
+    Returns both console output (log/warn/error/info from the page's JS)
+    and uncaught exceptions (crashes, unhandled promise rejections).
+
+    Args:
+        clear: If True, clear the message/error buffers after reading
+        task_id: Task identifier for session isolation
+
+    Returns:
+        JSON string with console messages and JS errors
+    """
+    effective_task_id = task_id or "default"
+
+    console_args = ["--clear"] if clear else []
+    error_args = ["--clear"] if clear else []
+
+    console_result = _run_browser_command(effective_task_id, "console", console_args)
+    errors_result = _run_browser_command(effective_task_id, "errors", error_args)
+
+    messages = []
+    if console_result.get("success"):
+        for msg in console_result.get("data", {}).get("messages", []):
+            messages.append({
+                "type": msg.get("type", "log"),
+                "text": msg.get("text", ""),
+                "source": "console",
+            })
+
+    errors = []
+    if errors_result.get("success"):
+        for err in errors_result.get("data", {}).get("errors", []):
+            errors.append({
+                "message": err.get("message", ""),
+                "source": "exception",
+            })
+
+    return json.dumps({
+        "success": True,
+        "console_messages": messages,
+        "js_errors": errors,
+        "total_messages": len(messages),
+        "total_errors": len(errors),
+    }, ensure_ascii=False)
+
+
+def _maybe_start_recording(task_id: str):
+    """Start recording if browser.record_sessions is enabled in config."""
+    if task_id in _recording_sessions:
+        return
+    try:
+        hermes_home = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))
+        config_path = hermes_home / "config.yaml"
+        record_enabled = False
+        if config_path.exists():
+            import yaml
+            with open(config_path) as f:
+                cfg = yaml.safe_load(f) or {}
+            record_enabled = cfg.get("browser", {}).get("record_sessions", False)
+
+        if not record_enabled:
+            return
+
+        recordings_dir = hermes_home / "browser_recordings"
+        recordings_dir.mkdir(parents=True, exist_ok=True)
+        _cleanup_old_recordings(max_age_hours=72)
+
+        import time
+        timestamp = time.strftime("%Y%m%d_%H%M%S")
+        recording_path = recordings_dir / f"session_{timestamp}_{task_id[:16]}.webm"
+
+        result = _run_browser_command(task_id, "record", ["start", str(recording_path)])
+        if result.get("success"):
+            _recording_sessions.add(task_id)
+            logger.info("Auto-recording browser session %s to %s", task_id, recording_path)
+        else:
+            logger.debug("Could not start auto-recording: %s", result.get("error"))
+    except Exception as e:
+        logger.debug("Auto-recording setup failed: %s", e)
+
+
+def _maybe_stop_recording(task_id: str):
+    """Stop recording if one is active for this session."""
+    if task_id not in _recording_sessions:
+        return
+    try:
+        result = _run_browser_command(task_id, "record", ["stop"])
+        if result.get("success"):
+            path = result.get("data", {}).get("path", "")
+            logger.info("Saved browser recording for session %s: %s", task_id, path)
+    except Exception as e:
+        logger.debug("Could not stop recording for %s: %s", task_id, e)
+    finally:
+        _recording_sessions.discard(task_id)
+
+
 def browser_get_images(task_id: Optional[str] = None) -> str:
     """
     Get all images on the current page.
@@ -1348,7 +1471,7 @@
     }, ensure_ascii=False)
 
 
-def browser_vision(question: str, task_id: Optional[str] = None) -> str:
+def browser_vision(question: str, annotate: bool = False, task_id: Optional[str] = None) -> str:
     """
     Take a screenshot of the current page and analyze it with vision AI.
@@ -1362,6 +1485,7 @@
     Args:
         question: What you want to know about the page visually
+        annotate: If True, overlay numbered [N] labels on interactive elements
         task_id: Task identifier for session isolation
 
     Returns:
@@ -1393,10 +1517,13 @@
     _cleanup_old_screenshots(screenshots_dir, max_age_hours=24)
 
     # Take screenshot using agent-browser
+    screenshot_args = [str(screenshot_path)]
+    if annotate:
+        screenshot_args.insert(0, "--annotate")
     result = _run_browser_command(
         effective_task_id,
         "screenshot",
-        [str(screenshot_path)],
+        screenshot_args,
         timeout=30
     )
@@ -1456,11 +1583,15 @@
         )
         analysis = response.choices[0].message.content
 
-        return json.dumps({
+        response_data = {
             "success": True,
             "analysis": analysis,
             "screenshot_path": str(screenshot_path),
-        }, ensure_ascii=False)
+        }
+        # Include annotation data if annotated screenshot was taken
+        if annotate and result.get("data", {}).get("annotations"):
+            response_data["annotations"] = result["data"]["annotations"]
+        return json.dumps(response_data, ensure_ascii=False)
 
     except Exception as e:
         # Keep the screenshot if it was captured successfully — the failure is
@@ -1490,6 +1621,25 @@ def _cleanup_old_screenshots(screenshots_dir, max_age_hours=24):
         pass  # Non-critical — don't fail the screenshot operation
 
 
+def _cleanup_old_recordings(max_age_hours=72):
+    """Remove browser recordings older than max_age_hours to prevent disk bloat."""
+    import time
+    try:
+        hermes_home = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))
+        recordings_dir = hermes_home / "browser_recordings"
+        if not recordings_dir.exists():
+            return
+        cutoff = time.time() - (max_age_hours * 3600)
+        for f in recordings_dir.glob("session_*.webm"):
+            try:
+                if f.stat().st_mtime < cutoff:
+                    f.unlink()
+            except Exception:
+                pass
+    except Exception:
+        pass
+
+
 # ============================================================================
 # Cleanup and Management Functions
 # ============================================================================
@@ -1561,6 +1711,9 @@ def cleanup_browser(task_id: Optional[str] = None) -> None:
         bb_session_id = session_info.get("bb_session_id", "unknown")
         logger.debug("Found session for task %s: bb_session_id=%s", task_id, bb_session_id)
 
+        # Stop auto-recording before closing (saves the file)
+        _maybe_stop_recording(task_id)
+
         # Try to close via agent-browser first (needs session in _active_sessions)
         try:
             _run_browser_command(task_id, "close", [], timeout=10)
@@ -1776,6 +1929,13 @@
 registry.register(
     name="browser_vision",
     toolset="browser",
     schema=_BROWSER_SCHEMA_MAP["browser_vision"],
-    handler=lambda args, **kw: browser_vision(question=args.get("question", ""), task_id=kw.get("task_id")),
+    handler=lambda args, **kw: browser_vision(question=args.get("question", ""), annotate=args.get("annotate", False), task_id=kw.get("task_id")),
+    check_fn=check_browser_requirements,
+)
+registry.register(
+    name="browser_console",
+    toolset="browser",
+    schema=_BROWSER_SCHEMA_MAP["browser_console"],
+    handler=lambda args, **kw: browser_console(clear=args.get("clear", False), task_id=kw.get("task_id")),
     check_fn=check_browser_requirements,
 )
diff --git a/website/docs/user-guide/configuration.md b/website/docs/user-guide/configuration.md
index f2abd16ca..0420b435f 100644
--- a/website/docs/user-guide/configuration.md
+++ b/website/docs/user-guide/configuration.md
@@ -620,6 +620,16 @@ code_execution:
   max_tool_calls: 50        # Max tool calls within code execution
 ```
 
+## Browser
+
+Configure browser automation behavior:
+
+```yaml
+browser:
+  inactivity_timeout: 120   # Seconds before auto-closing idle sessions
+  record_sessions: false    # Auto-record browser sessions as WebM videos to ~/.hermes/browser_recordings/
+```
+
 ## Delegation
 
 Configure subagent behavior for the delegate tool:
diff --git a/website/docs/user-guide/features/browser.md b/website/docs/user-guide/features/browser.md
index 70201100b..f7822c884 100644
--- a/website/docs/user-guide/features/browser.md
+++ b/website/docs/user-guide/features/browser.md
@@ -142,6 +142,16 @@
 What does the chart on this page show?
 ```
 
 Screenshots are stored in `~/.hermes/browser_screenshots/` and automatically cleaned up after 24 hours.
 
+### `browser_console`
+
+Get browser console output (log/warn/error messages) and uncaught JavaScript exceptions from the current page. Essential for detecting silent JS errors that don't appear in the accessibility tree.
+
+```
+Check the browser console for any JavaScript errors
+```
+
+Use `clear=True` to clear the console after reading, so subsequent calls only show new messages.
+
 ### `browser_close`
 
 Close the browser session and release resources. Call this when done to free up Browserbase session quota.
@@ -175,6 +185,17 @@
 Agent workflow:
 4. browser_close()
 ```
 
+## Session Recording
+
+Automatically record browser sessions as WebM video files:
+
+```yaml
+browser:
+  record_sessions: true  # default: false
+```
+
+When enabled, recording starts automatically on the first `browser_navigate` and saves to `~/.hermes/browser_recordings/` when the session closes. Works in both local and cloud (Browserbase) modes. Recordings older than 72 hours are automatically cleaned up.
+
 ## Stealth Features
 
 Browserbase provides automatic stealth capabilities:
diff --git a/website/docs/user-guide/features/tools.md b/website/docs/user-guide/features/tools.md
index daf982fea..e054adf14 100644
--- a/website/docs/user-guide/features/tools.md
+++ b/website/docs/user-guide/features/tools.md
@@ -15,7 +15,7 @@ Tools are functions that extend the agent's capabilities. They're organized into
 | **Web** | `web_search`, `web_extract` | Search the web, extract page content |
 | **Terminal** | `terminal`, `process` | Execute commands (local/docker/singularity/modal/daytona/ssh backends), manage background processes |
 | **File** | `read_file`, `write_file`, `patch`, `search_files` | Read, write, edit, and search files |
-| **Browser** | `browser_navigate`, `browser_click`, `browser_type`, etc. | Full browser automation via Browserbase |
+| **Browser** | `browser_navigate`, `browser_click`, `browser_type`, `browser_console`, etc. | Full browser automation via Browserbase |
 | **Vision** | `vision_analyze` | Image analysis via multimodal models |
 | **Image Gen** | `image_generate` | Generate images (FLUX via FAL) |
 | **TTS** | `text_to_speech` | Text-to-speech (Edge TTS / ElevenLabs / OpenAI) |
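A note on consuming the new tool's output: `browser_console` returns a JSON envelope with `console_messages`, `js_errors`, and count fields. A minimal sketch of how a QA loop might triage that payload into taxonomy findings — the `triage_console` helper and its severity mapping (uncaught exception → High, console `error` → Medium) are illustrative assumptions, not part of the PR:

```python
import json


def triage_console(payload: str) -> list:
    """Turn a browser_console JSON payload into candidate QA findings.

    Assumes the field names shown in the diff: js_errors entries carry
    'message', console_messages entries carry 'type' and 'text'.
    """
    data = json.loads(payload)
    findings = []
    # Uncaught exceptions are the highest-value console findings
    for err in data.get("js_errors", []):
        findings.append({"severity": "High", "category": "Console", "detail": err["message"]})
    # console.error() calls are worth reporting too, at lower severity
    for msg in data.get("console_messages", []):
        if msg.get("type") == "error":
            findings.append({"severity": "Medium", "category": "Console", "detail": msg["text"]})
    return findings


# Example payload in the shape browser_console emits
sample = json.dumps({
    "success": True,
    "console_messages": [
        {"type": "log", "text": "ready", "source": "console"},
        {"type": "error", "text": "POST /api/cart returned 500", "source": "console"},
    ],
    "js_errors": [{"message": "Uncaught TypeError: x is undefined", "source": "exception"}],
    "total_messages": 2,
    "total_errors": 1,
})

findings = triage_console(sample)
# → one High finding (uncaught exception) and one Medium finding (console error)
```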
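Phase 4 of the skill asks the agent to sort issues Critical-first and count them for the executive summary. A small sketch of that bookkeeping, using hypothetical issue dicts (the skill itself does not prescribe a data structure):

```python
from collections import Counter

# Severity rank for Critical-first ordering, matching the taxonomy
SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}

# Hypothetical issues collected during Phase 3
issues = [
    {"title": "Login button dead", "severity": "High"},
    {"title": "Favicon missing", "severity": "Low"},
    {"title": "Checkout charges without completing order", "severity": "Critical"},
    {"title": "Broken image on /about", "severity": "Medium"},
]

# Phase 4: sort by severity (Critical first, then High, Medium, Low)
issues.sort(key=lambda i: SEVERITY_ORDER[i["severity"]])

# Phase 5: counts for the executive-summary table
counts = Counter(i["severity"] for i in issues)
total = sum(counts.values())
```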
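Because the report template uses `{placeholder}` fields throughout, the agent (or a helper script) can fill a section with plain `str.format` — a sketch using just the executive-summary table from the template, with made-up counts:

```python
# Executive-summary fragment copied from the report template;
# all braces are format placeholders, so str.format applies cleanly.
summary_template = (
    "| Severity | Count |\n"
    "|----------|-------|\n"
    "| 🔴 Critical | {critical_count} |\n"
    "| 🟠 High | {high_count} |\n"
    "| 🟡 Medium | {medium_count} |\n"
    "| 🔵 Low | {low_count} |\n"
    "| **Total** | **{total_count}** |\n"
)

table = summary_template.format(
    critical_count=1, high_count=1, medium_count=1, low_count=1, total_count=4
)
```

This only works because the template contains no literal braces besides placeholders; a template with literal `{` would need them doubled or a different substitution mechanism.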