Merge pull request #745 from NousResearch/hermes/hermes-f8d56335

feat: browser console tool, annotated screenshots, auto-recording, and dogfood QA skill
2026-03-08 21:29:52 -07:00
parent 3b312d45c5 a8bf414f4a
commit 816a3ef6f1
11 changed files with 835 additions and 9 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -69,7 +69,7 @@ hermes-agent/
 │   ├── file_tools.py          # File read/write/search/patch tools
 │   ├── file_operations.py     # File operations helpers
 │   ├── web_tools.py           # Firecrawl search/extract
-│   ├── browser_tool.py        # Browserbase browser automation
+│   ├── browser_tool.py        # Browserbase browser automation (browser_console, session recording)
 │   ├── vision_tools.py        # Image analysis via auxiliary LLM
 │   ├── image_generation_tool.py # FLUX image generation via fal.ai
 │   ├── tts_tool.py            # Text-to-speech
@@ -113,7 +113,7 @@ hermes-agent/
 ├── cron/                 # Scheduler implementation
 ├── environments/         # RL training environments (Atropos integration)
 ├── honcho_integration/   # Honcho client & session management
-├── skills/               # Bundled skill sources
+├── skills/               # Bundled skill sources (includes dogfood QA testing)
 ├── optional-skills/      # Official optional skills (not activated by default)
 ├── scripts/              # Install scripts, utilities
 ├── tests/                # Full pytest suite (~2300+ tests)
--- a/cli.py
+++ b/cli.py
@@ -161,6 +161,7 @@ def load_cli_config() -> Dict[str, Any]:
        },
        "browser": {
            "inactivity_timeout": 120,  # Auto-cleanup inactive browser sessions after 2 min
+            "record_sessions": False,  # Auto-record browser sessions as WebM videos
        },
        "compression": {
            "enabled": True,      # Auto-compress when approaching context limit
--- a/hermes_cli/config.py
+++ b/hermes_cli/config.py
@@ -81,6 +81,7 @@ DEFAULT_CONFIG = {
    
    "browser": {
        "inactivity_timeout": 120,
+        "record_sessions": False,  # Auto-record browser sessions as WebM videos
    },
    
    "compression": {
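The new `browser.record_sessions` key defaults to `False` in both config dictionaries above. As a minimal sketch (not the project's code), the lookup could be written against a plain dict as shown here. The helper name `recording_enabled` is hypothetical, and treating an empty or null `browser` section as "disabled" is an assumption.

```python
from typing import Any, Dict, Optional


def recording_enabled(cfg: Optional[Dict[str, Any]]) -> bool:
    """Return True only when browser.record_sessions is explicitly truthy.

    Hypothetical helper for illustration; not part of the commit.
    """
    # An empty config file can parse to None, so fall back to an empty dict.
    browser_cfg = (cfg or {}).get("browser") or {}
    # Assumption: a malformed (non-mapping) browser section means "disabled".
    if not isinstance(browser_cfg, dict):
        return False
    return bool(browser_cfg.get("record_sessions", False))
```

Falling back with `or {}` rather than a `.get` default also covers a `browser:` key that is present but null, which a plain `.get("browser", {})` would return as `None`.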
--- a/skills/dogfood/SKILL.md
+++ b/skills/dogfood/SKILL.md
@@ -0,0 +1,162 @@
+---
+name: dogfood
+description: Systematic exploratory QA testing of web applications — find bugs, capture evidence, and generate structured reports
+version: 1.0.0
+metadata:
+  hermes:
+    tags: [qa, testing, browser, web, dogfood]
+    related_skills: []
+---
+
+# Dogfood: Systematic Web Application QA Testing
+
+## Overview
+
+This skill guides you through systematic exploratory QA testing of web applications using the browser toolset. You will navigate the application, interact with elements, capture evidence of issues, and produce a structured bug report.
+
+## Prerequisites
+
+- Browser toolset must be available (`browser_navigate`, `browser_snapshot`, `browser_click`, `browser_type`, `browser_vision`, `browser_console`, `browser_scroll`, `browser_back`, `browser_press`, `browser_close`)
+- A target URL and testing scope from the user
+
+## Inputs
+
+The user provides:
+1. **Target URL** — the entry point for testing
+2. **Scope** — what areas/features to focus on (or "full site" for comprehensive testing)
+3. **Output directory** (optional) — where to save screenshots and the report (default: `./dogfood-output`)
+
+## Workflow
+
+Follow this 5-phase systematic workflow:
+
+### Phase 1: Plan
+
+1. Create the output directory structure:
+   ```
+   {output_dir}/
+   ├── screenshots/       # Evidence screenshots
+   └── report.md          # Final report (generated in Phase 5)
+   ```
+2. Identify the testing scope based on user input.
+3. Build a rough sitemap by planning which pages and features to test:
+   - Landing/home page
+   - Navigation links (header, footer, sidebar)
+   - Key user flows (sign up, login, search, checkout, etc.)
+   - Forms and interactive elements
+   - Edge cases (empty states, error pages, 404s)
+
+### Phase 2: Explore
+
+For each page or feature in your plan:
+
+1. **Navigate** to the page:
+   ```
+   browser_navigate(url="https://example.com/page")
+   ```
+
+2. **Take a snapshot** to understand the DOM structure:
+   ```
+   browser_snapshot()
+   ```
+
+3. **Check the console** for JavaScript errors:
+   ```
+   browser_console(clear=true)
+   ```
+   Do this after every navigation and after every significant interaction. Silent JS errors are high-value findings.
+
+4. **Take an annotated screenshot** to visually assess the page and identify interactive elements:
+   ```
+   browser_vision(question="Describe the page layout, identify any visual issues, broken elements, or accessibility concerns", annotate=true)
+   ```
+   The `annotate=true` flag overlays numbered `[N]` labels on interactive elements. Each `[N]` maps to ref `@eN` for subsequent browser commands.
+
+5. **Test interactive elements** systematically:
+   - Click buttons and links: `browser_click(ref="@eN")`
+   - Fill forms: `browser_type(ref="@eN", text="test input")`
+   - Test keyboard navigation: `browser_press(key="Tab")`, `browser_press(key="Enter")`
+   - Scroll through content: `browser_scroll(direction="down")`
+   - Test form validation with invalid inputs
+   - Test empty submissions
+
+6. **After each interaction**, check for:
+   - Console errors: `browser_console()`
+   - Visual changes: `browser_vision(question="What changed after the interaction?")`
+   - Expected vs actual behavior
+
+### Phase 3: Collect Evidence
+
+For every issue found:
+
+1. **Take a screenshot** showing the issue:
+   ```
+   browser_vision(question="Capture and describe the issue visible on this page", annotate=false)
+   ```
+   Save the `screenshot_path` from the response — you will reference it in the report.
+
+2. **Record the details**:
+   - URL where the issue occurs
+   - Steps to reproduce
+   - Expected behavior
+   - Actual behavior
+   - Console errors (if any)
+   - Screenshot path
+
+3. **Classify the issue** using the issue taxonomy (see `references/issue-taxonomy.md`):
+   - Severity: Critical / High / Medium / Low
+   - Category: Functional / Visual / Accessibility / Console / UX / Content
+
+### Phase 4: Categorize
+
+1. Review all collected issues.
+2. De-duplicate — merge issues that are the same bug manifesting in different places.
+3. Assign final severity and category to each issue.
+4. Sort by severity (Critical first, then High, Medium, Low).
+5. Count issues by severity and category for the executive summary.
+
+### Phase 5: Report
+
+Generate the final report using the template at `templates/dogfood-report-template.md`.
+
+The report must include:
+1. **Executive summary** with total issue count, breakdown by severity, and testing scope
+2. **Per-issue sections** with:
+   - Issue number and title
+   - Severity and category badges
+   - URL where observed
+   - Description of the issue
+   - Steps to reproduce
+   - Expected vs actual behavior
+   - Screenshot references (use `MEDIA:<screenshot_path>` for inline images)
+   - Console errors if relevant
+3. **Summary table** of all issues
+4. **Testing notes** — what was tested, what was not, any blockers
+
+Save the report to `{output_dir}/report.md`.
+
+## Tools Reference
+
+| Tool | Purpose |
+|------|---------|
+| `browser_navigate` | Go to a URL |
+| `browser_snapshot` | Get DOM text snapshot (accessibility tree) |
+| `browser_click` | Click an element by ref (`@eN`) or text |
+| `browser_type` | Type into an input field |
+| `browser_scroll` | Scroll up/down on the page |
+| `browser_back` | Go back in browser history |
+| `browser_press` | Press a keyboard key |
+| `browser_vision` | Screenshot + AI analysis; use `annotate=true` for element labels |
+| `browser_console` | Get JS console output and errors |
+| `browser_close` | Close the browser session |
+
+## Tips
+
+- **Always check `browser_console()` after navigating and after significant interactions.** Silent JS errors are among the most valuable findings.
+- **Use `annotate=true` with `browser_vision`** when you need to reason about interactive element positions or when the snapshot refs are unclear.
+- **Test with both valid and invalid inputs** — form validation bugs are common.
+- **Scroll through long pages** — content below the fold may have rendering issues.
+- **Test navigation flows** — click through multi-step processes end-to-end.
+- **Check responsive behavior** by noting any layout issues visible in screenshots.
+- **Don't forget edge cases**: empty states, very long text, special characters, rapid clicking.
+- When reporting screenshots to the user, include `MEDIA:<screenshot_path>` so they can see the evidence inline.
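Phase 4 of the skill above (de-duplicate, sort by severity, count) can be sketched in a few lines of Python. This is an illustration only: the `Issue` dataclass, the `categorize` function, and the rule that two issues with the same title and category are the same bug are all assumptions, not something the skill defines.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Severity names come from the skill; the numeric ranks are an assumption.
SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


@dataclass(frozen=True)
class Issue:
    """Hypothetical record for one finding."""
    title: str
    severity: str
    category: str
    url: str


def categorize(issues: List[Issue]) -> Tuple[List[Issue], Dict[str, int]]:
    """De-duplicate, sort by severity, and count issues per severity."""
    best: Dict[Tuple[str, str], Issue] = {}
    for issue in issues:
        # Assumed duplicate rule: same normalized title and same category.
        key = (issue.title.strip().lower(), issue.category)
        kept = best.get(key)
        # Keep the most severe report of each duplicate group.
        if kept is None or SEVERITY_ORDER[issue.severity] < SEVERITY_ORDER[kept.severity]:
            best[key] = issue
    # sorted() is stable, so equally severe issues keep discovery order.
    ordered = sorted(best.values(), key=lambda i: SEVERITY_ORDER[i.severity])
    counts = Counter(i.severity for i in ordered)
    return ordered, {name: counts.get(name, 0) for name in SEVERITY_ORDER}
```

The returned counts always contain all four severities, so they can feed the executive summary directly even when a level has no findings.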
--- a/skills/dogfood/references/issue-taxonomy.md
+++ b/skills/dogfood/references/issue-taxonomy.md
@@ -0,0 +1,109 @@
+# Issue Taxonomy
+
+Use this taxonomy to classify issues found during dogfood QA testing.
+
+## Severity Levels
+
+### Critical
+The issue makes a core feature completely unusable or causes data loss.
+
+**Examples:**
+- Application crashes or shows a blank white page
+- Form submission silently loses user data
+- Authentication is completely broken (can't log in at all)
+- Payment flow fails and charges the user without completing the order
+- Security vulnerability (e.g., XSS, exposed credentials in console)
+
+### High
+The issue significantly impairs functionality but a workaround may exist.
+
+**Examples:**
+- A key button does nothing when clicked (but refreshing fixes it)
+- Search returns no results for valid queries
+- Form validation rejects valid input
+- Page loads but critical content is missing or garbled
+- Navigation link leads to a 404 or wrong page
+- Uncaught JavaScript exceptions in the console on core pages
+
+### Medium
+The issue is noticeable and affects user experience but doesn't block core functionality.
+
+**Examples:**
+- Layout is misaligned or overlapping on certain screen sections
+- Images fail to load (broken image icons)
+- Slow performance (visible loading delays > 3 seconds)
+- Form field lacks proper validation feedback (no error message on bad input)
+- Console warnings that suggest deprecated or misconfigured features
+- Inconsistent styling between similar pages
+
+### Low
+Minor polish issues that don't affect functionality.
+
+**Examples:**
+- Typos or grammatical errors in text content
+- Minor spacing or alignment inconsistencies
+- Placeholder text left in production ("Lorem ipsum")
+- Favicon missing
+- Console info/debug messages that shouldn't be in production
+- Subtle color contrast issues that don't fail WCAG requirements
+
+## Categories
+
+### Functional
+Issues where features don't work as expected.
+
+- Buttons/links that don't respond
+- Forms that don't submit or submit incorrectly
+- Broken user flows (can't complete a multi-step process)
+- Incorrect data displayed
+- Features that work partially
+
+### Visual
+Issues with the visual presentation of the page.
+
+- Layout problems (overlapping elements, broken grids)
+- Broken images or missing media
+- Styling inconsistencies
+- Responsive design failures
+- Z-index issues (elements hidden behind others)
+- Text overflow or truncation
+
+### Accessibility
+Issues that prevent or hinder access for users with disabilities.
+
+- Missing alt text on meaningful images
+- Poor color contrast (fails WCAG AA)
+- Elements not reachable via keyboard navigation
+- Missing form labels or ARIA attributes
+- Focus indicators missing or unclear
+- Content that is incompatible with screen readers
+
+### Console
+Issues detected through JavaScript console output.
+
+- Uncaught exceptions and unhandled promise rejections
+- Failed network requests (4xx, 5xx errors in console)
+- Deprecation warnings
+- CORS errors
+- Mixed content warnings (HTTP resources on HTTPS page)
+- Excessive console.log output left from development
+
+### UX (User Experience)
+Issues where functionality works but the experience is poor.
+
+- Confusing navigation or information architecture
+- Missing loading indicators (user doesn't know something is happening)
+- No feedback after user actions (e.g., button click with no visible result)
+- Inconsistent interaction patterns
+- Missing confirmation dialogs for destructive actions
+- Poor error messages that don't help the user recover
+
+### Content
+Issues with the text, media, or information on the page.
+
+- Typos and grammatical errors
+- Placeholder/dummy content in production
+- Outdated information
+- Missing content (empty sections)
+- Broken or dead links to external resources
+- Incorrect or misleading labels
--- a/skills/dogfood/templates/dogfood-report-template.md
+++ b/skills/dogfood/templates/dogfood-report-template.md
@@ -0,0 +1,86 @@
+# Dogfood QA Report
+
+**Target:** {target_url}
+**Date:** {date}
+**Scope:** {scope_description}
+**Tester:** Hermes Agent (automated exploratory QA)
+
+---
+
+## Executive Summary
+
+| Severity | Count |
+|----------|-------|
+| 🔴 Critical | {critical_count} |
+| 🟠 High | {high_count} |
+| 🟡 Medium | {medium_count} |
+| 🔵 Low | {low_count} |
+| **Total** | **{total_count}** |
+
+**Overall Assessment:** {one_sentence_assessment}
+
+---
+
+## Issues
+
+<!-- Repeat this section for each issue found, sorted by severity (Critical first) -->
+
+### Issue #{issue_number}: {issue_title}
+
+| Field | Value |
+|-------|-------|
+| **Severity** | {severity} |
+| **Category** | {category} |
+| **URL** | {url_where_found} |
+
+**Description:**
+{detailed_description_of_the_issue}
+
+**Steps to Reproduce:**
+1. {step_1}
+2. {step_2}
+3. {step_3}
+
+**Expected Behavior:**
+{what_should_happen}
+
+**Actual Behavior:**
+{what_actually_happens}
+
+**Screenshot:**
+MEDIA:{screenshot_path}
+
+**Console Errors** (if applicable):
+```
+{console_error_output}
+```
+
+---
+
+<!-- End of per-issue section -->
+
+## Issues Summary Table
+
+| # | Title | Severity | Category | URL |
+|---|-------|----------|----------|-----|
+| {n} | {title} | {severity} | {category} | {url} |
+
+## Testing Coverage
+
+### Pages Tested
+- {list_of_pages_visited}
+
+### Features Tested
+- {list_of_features_exercised}
+
+### Not Tested / Out of Scope
+- {areas_not_covered_and_why}
+
+### Blockers
+- {any_issues_that_prevented_testing_certain_areas}
+
+---
+
+## Notes
+
+{any_additional_observations_or_recommendations}
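To show how the Executive Summary placeholders in the template above might be filled, here is a small sketch that renders the severity table from a dict of counts. The function name `render_summary_table` and the input shape are assumptions; the row labels and icons are copied from the template.

```python
from typing import Dict

# Row order and icons mirror the template's Executive Summary table.
SEVERITY_ROWS = [("🔴", "Critical"), ("🟠", "High"), ("🟡", "Medium"), ("🔵", "Low")]


def render_summary_table(counts: Dict[str, int]) -> str:
    """Render the severity/count table as Markdown (hypothetical helper)."""
    lines = ["| Severity | Count |", "|----------|-------|"]
    total = 0
    for icon, name in SEVERITY_ROWS:
        n = counts.get(name, 0)  # severities with no findings render as 0
        total += n
        lines.append(f"| {icon} {name} | {n} |")
    lines.append(f"| **Total** | **{total}** |")
    return "\n".join(lines)
```

Computing the total from the rows, rather than accepting it as an input, keeps the last line consistent with the per-severity counts.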
--- a/tests/tools/test_browser_console.py
+++ b/tests/tools/test_browser_console.py
@@ -0,0 +1,276 @@
+"""Tests for browser_console tool and browser_vision annotate param."""
+
+import json
+import os
+import sys
+from unittest.mock import patch, MagicMock
+
+import pytest
+
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", ".."))
+
+
+# ── browser_console ──────────────────────────────────────────────────
+
+
+class TestBrowserConsole:
+    """browser_console() returns console messages + JS errors in one call."""
+
+    def test_returns_console_messages_and_errors(self):
+        from tools.browser_tool import browser_console
+
+        console_response = {
+            "success": True,
+            "data": {
+                "messages": [
+                    {"text": "hello", "type": "log", "timestamp": 1},
+                    {"text": "oops", "type": "error", "timestamp": 2},
+                ]
+            },
+        }
+        errors_response = {
+            "success": True,
+            "data": {
+                "errors": [
+                    {"message": "Uncaught TypeError", "timestamp": 3},
+                ]
+            },
+        }
+
+        with patch("tools.browser_tool._run_browser_command") as mock_cmd:
+            mock_cmd.side_effect = [console_response, errors_response]
+            result = json.loads(browser_console(task_id="test"))
+
+        assert result["success"] is True
+        assert result["total_messages"] == 2
+        assert result["total_errors"] == 1
+        assert result["console_messages"][0]["text"] == "hello"
+        assert result["console_messages"][1]["text"] == "oops"
+        assert result["js_errors"][0]["message"] == "Uncaught TypeError"
+
+    def test_passes_clear_flag(self):
+        from tools.browser_tool import browser_console
+
+        empty = {"success": True, "data": {"messages": [], "errors": []}}
+        with patch("tools.browser_tool._run_browser_command", return_value=empty) as mock_cmd:
+            browser_console(clear=True, task_id="test")
+
+        calls = mock_cmd.call_args_list
+        # Both console and errors should get --clear
+        assert calls[0][0] == ("test", "console", ["--clear"])
+        assert calls[1][0] == ("test", "errors", ["--clear"])
+
+    def test_no_clear_by_default(self):
+        from tools.browser_tool import browser_console
+
+        empty = {"success": True, "data": {"messages": [], "errors": []}}
+        with patch("tools.browser_tool._run_browser_command", return_value=empty) as mock_cmd:
+            browser_console(task_id="test")
+
+        calls = mock_cmd.call_args_list
+        assert calls[0][0] == ("test", "console", [])
+        assert calls[1][0] == ("test", "errors", [])
+
+    def test_empty_console_and_errors(self):
+        from tools.browser_tool import browser_console
+
+        empty = {"success": True, "data": {"messages": [], "errors": []}}
+        with patch("tools.browser_tool._run_browser_command", return_value=empty):
+            result = json.loads(browser_console(task_id="test"))
+
+        assert result["total_messages"] == 0
+        assert result["total_errors"] == 0
+        assert result["console_messages"] == []
+        assert result["js_errors"] == []
+
+    def test_handles_failed_commands(self):
+        from tools.browser_tool import browser_console
+
+        failed = {"success": False, "error": "No session"}
+        with patch("tools.browser_tool._run_browser_command", return_value=failed):
+            result = json.loads(browser_console(task_id="test"))
+
+        # Should still return success with empty data
+        assert result["success"] is True
+        assert result["total_messages"] == 0
+        assert result["total_errors"] == 0
+
+
+# ── browser_console schema ───────────────────────────────────────────
+
+
+class TestBrowserConsoleSchema:
+    """browser_console is properly registered in the tool registry."""
+
+    def test_schema_in_browser_schemas(self):
+        from tools.browser_tool import BROWSER_TOOL_SCHEMAS
+
+        names = [s["name"] for s in BROWSER_TOOL_SCHEMAS]
+        assert "browser_console" in names
+
+    def test_schema_has_clear_param(self):
+        from tools.browser_tool import BROWSER_TOOL_SCHEMAS
+
+        schema = next(s for s in BROWSER_TOOL_SCHEMAS if s["name"] == "browser_console")
+        props = schema["parameters"]["properties"]
+        assert "clear" in props
+        assert props["clear"]["type"] == "boolean"
+
+
+# ── browser_vision annotate ──────────────────────────────────────────
+
+
+class TestBrowserVisionAnnotate:
+    """browser_vision supports annotate parameter."""
+
+    def test_schema_has_annotate_param(self):
+        from tools.browser_tool import BROWSER_TOOL_SCHEMAS
+
+        schema = next(s for s in BROWSER_TOOL_SCHEMAS if s["name"] == "browser_vision")
+        props = schema["parameters"]["properties"]
+        assert "annotate" in props
+        assert props["annotate"]["type"] == "boolean"
+
+    def test_annotate_false_no_flag(self):
+        """Without annotate, screenshot command has no --annotate flag."""
+        from tools.browser_tool import browser_vision
+
+        with (
+            patch("tools.browser_tool._run_browser_command") as mock_cmd,
+            patch("tools.browser_tool._aux_vision_client") as mock_client,
+            patch("tools.browser_tool._DEFAULT_VISION_MODEL", "test-model"),
+            patch("tools.browser_tool._get_vision_model", return_value="test-model"),
+        ):
+            mock_cmd.return_value = {"success": True, "data": {}}
+            # Will fail at screenshot file read, but we can check the command
+            try:
+                browser_vision("test", annotate=False, task_id="test")
+            except Exception:
+                pass
+
+            if mock_cmd.called:
+                args = mock_cmd.call_args[0]
+                cmd_args = args[2] if len(args) > 2 else []
+                assert "--annotate" not in cmd_args
+
+    def test_annotate_true_adds_flag(self):
+        """With annotate=True, screenshot command includes --annotate."""
+        from tools.browser_tool import browser_vision
+
+        with (
+            patch("tools.browser_tool._run_browser_command") as mock_cmd,
+            patch("tools.browser_tool._aux_vision_client") as mock_client,
+            patch("tools.browser_tool._DEFAULT_VISION_MODEL", "test-model"),
+            patch("tools.browser_tool._get_vision_model", return_value="test-model"),
+        ):
+            mock_cmd.return_value = {"success": True, "data": {}}
+            try:
+                browser_vision("test", annotate=True, task_id="test")
+            except Exception:
+                pass
+
+            if mock_cmd.called:
+                args = mock_cmd.call_args[0]
+                cmd_args = args[2] if len(args) > 2 else []
+                assert "--annotate" in cmd_args
+
+
+# ── auto-recording config ────────────────────────────────────────────
+
+
+class TestRecordSessionsConfig:
+    """browser.record_sessions config option."""
+
+    def test_default_config_has_record_sessions(self):
+        from hermes_cli.config import DEFAULT_CONFIG
+
+        browser_cfg = DEFAULT_CONFIG.get("browser", {})
+        assert "record_sessions" in browser_cfg
+        assert browser_cfg["record_sessions"] is False
+
+    def test_maybe_start_recording_disabled(self):
+        """Recording doesn't start when the config file is missing or unreadable (record_sessions defaults to false)."""
+        from tools.browser_tool import _maybe_start_recording, _recording_sessions
+
+        with (
+            patch("tools.browser_tool._run_browser_command") as mock_cmd,
+            patch("builtins.open", side_effect=FileNotFoundError),
+        ):
+            _maybe_start_recording("test-task")
+
+        mock_cmd.assert_not_called()
+        assert "test-task" not in _recording_sessions
+
+    def test_maybe_stop_recording_noop_when_not_recording(self):
+        """Stopping when not recording is a no-op."""
+        from tools.browser_tool import _maybe_stop_recording, _recording_sessions
+
+        _recording_sessions.discard("test-task")  # ensure not in set
+        with patch("tools.browser_tool._run_browser_command") as mock_cmd:
+            _maybe_stop_recording("test-task")
+
+        mock_cmd.assert_not_called()
+
+
+# ── dogfood skill files ──────────────────────────────────────────────
+
+
+class TestDogfoodSkill:
+    """Dogfood skill files exist and have correct structure."""
+
+    @pytest.fixture(autouse=True)
+    def _skill_dir(self):
+        # Use the actual repo skills dir (not temp)
+        self.skill_dir = os.path.join(
+            os.path.dirname(__file__), "..", "..", "skills", "dogfood"
+        )
+
+    def test_skill_md_exists(self):
+        assert os.path.exists(os.path.join(self.skill_dir, "SKILL.md"))
+
+    def test_taxonomy_exists(self):
+        assert os.path.exists(
+            os.path.join(self.skill_dir, "references", "issue-taxonomy.md")
+        )
+
+    def test_report_template_exists(self):
+        assert os.path.exists(
+            os.path.join(self.skill_dir, "templates", "dogfood-report-template.md")
+        )
+
+    def test_skill_md_has_frontmatter(self):
+        with open(os.path.join(self.skill_dir, "SKILL.md")) as f:
+            content = f.read()
+        assert content.startswith("---")
+        assert "name: dogfood" in content
+        assert "description:" in content
+
+    def test_skill_references_browser_console(self):
+        with open(os.path.join(self.skill_dir, "SKILL.md")) as f:
+            content = f.read()
+        assert "browser_console" in content
+
+    def test_skill_references_annotate(self):
+        with open(os.path.join(self.skill_dir, "SKILL.md")) as f:
+            content = f.read()
+        assert "annotate" in content
+
+    def test_taxonomy_has_severity_levels(self):
+        with open(
+            os.path.join(self.skill_dir, "references", "issue-taxonomy.md")
+        ) as f:
+            content = f.read()
+        assert "Critical" in content
+        assert "High" in content
+        assert "Medium" in content
+        assert "Low" in content
+
+    def test_taxonomy_has_categories(self):
+        with open(
+            os.path.join(self.skill_dir, "references", "issue-taxonomy.md")
+        ) as f:
+            content = f.read()
+        assert "Functional" in content
+        assert "Visual" in content
+        assert "Accessibility" in content
+        assert "Console" in content
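The tests above pin down the JSON shape that `browser_console` returns: `console_messages`, `js_errors`, `total_messages`, and `total_errors`. As a hedged sketch of how a caller might turn that payload into report-ready lines, the helper below keeps only error-level console messages and uncaught exceptions. The name `console_findings` and the output format are assumptions; only the `"error"` message type is taken from the tests.

```python
import json
from typing import List


def console_findings(payload: str) -> List[str]:
    """Extract error-level lines from a browser_console JSON string.

    Hypothetical consumer for illustration; not part of the commit.
    """
    data = json.loads(payload)
    findings: List[str] = []
    for msg in data.get("console_messages", []):
        # Only the "error" type is known from the tests; other levels are skipped.
        if msg.get("type") == "error":
            findings.append(f"console.error: {msg.get('text', '')}")
    for err in data.get("js_errors", []):
        findings.append(f"exception: {err.get('message', '')}")
    return findings
```

An empty list means the page produced no error-level output, which is the common case after a clean navigation.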
--- a/tools/browser_tool.py
+++ b/tools/browser_tool.py
@@ -144,6 +144,7 @@ def _socket_safe_tmpdir() -> str:
 # Track active sessions per task
 # Stores: session_name (always), bb_session_id + cdp_url (cloud mode only)
 _active_sessions: Dict[str, Dict[str, str]] = {}  # task_id -> {session_name, ...}
+_recording_sessions: set = set()  # task_ids with active recordings

 # Flag to track if cleanup has been done
 _cleanup_done = False
@@ -478,11 +479,31 @@ BROWSER_TOOL_SCHEMAS = [
                "question": {
                    "type": "string",
                    "description": "What you want to know about the page visually. Be specific about what you're looking for."
+                },
+                "annotate": {
+                    "type": "boolean",
+                    "default": False,
+                    "description": "If true, overlay numbered [N] labels on interactive elements. Each [N] maps to ref @eN for subsequent browser commands. Useful for QA and spatial reasoning about page layout."
                }
            },
            "required": ["question"]
        }
    },
+    {
+        "name": "browser_console",
+        "description": "Get browser console output and JavaScript errors from the current page. Returns console.log/warn/error/info messages and uncaught JS exceptions. Use this to detect silent JavaScript errors, failed API calls, and application warnings. Requires browser_navigate to be called first.",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "clear": {
+                    "type": "boolean",
+                    "default": False,
+                    "description": "If true, clear the message buffers after reading"
+                }
+            },
+            "required": []
+        }
+    },
 ]


@@ -998,9 +1019,10 @@ def browser_navigate(url: str, task_id: Optional[str] = None) -> str:
    session_info = _get_session_info(effective_task_id)
    is_first_nav = session_info.get("_first_nav", True)
    
-    # Mark that we've done at least one navigation
+    # Auto-start recording if configured and this is the first navigation
    if is_first_nav:
        session_info["_first_nav"] = False
+        _maybe_start_recording(effective_task_id)
    
    result = _run_browser_command(effective_task_id, "open", [url], timeout=60)
    
@@ -1264,6 +1286,10 @@ def browser_close(task_id: Optional[str] = None) -> str:
        JSON string with close result
    """
    effective_task_id = task_id or "default"
+    
+    # Stop auto-recording before closing
+    _maybe_stop_recording(effective_task_id)
+    
    result = _run_browser_command(effective_task_id, "close", [])
    
    # Close the backend session (Browserbase API in cloud mode, nothing extra in local mode)
@@ -1294,6 +1320,103 @@ def browser_close(task_id: Optional[str] = None) -> str:
        }, ensure_ascii=False)


+def browser_console(clear: bool = False, task_id: Optional[str] = None) -> str:
+    """Get browser console messages and JavaScript errors.
+    
+    Returns both console output (log/warn/error/info from the page's JS)
+    and uncaught exceptions (crashes, unhandled promise rejections).
+    
+    Args:
+        clear: If True, clear the message/error buffers after reading
+        task_id: Task identifier for session isolation
+        
+    Returns:
+        JSON string with console messages and JS errors
+    """
+    effective_task_id = task_id or "default"
+    
+    console_args = ["--clear"] if clear else []
+    error_args = ["--clear"] if clear else []
+    
+    console_result = _run_browser_command(effective_task_id, "console", console_args)
+    errors_result = _run_browser_command(effective_task_id, "errors", error_args)
+    
+    messages = []
+    if console_result.get("success"):
+        for msg in console_result.get("data", {}).get("messages", []):
+            messages.append({
+                "type": msg.get("type", "log"),
+                "text": msg.get("text", ""),
+                "source": "console",
+            })
+    
+    errors = []
+    if errors_result.get("success"):
+        for err in errors_result.get("data", {}).get("errors", []):
+            errors.append({
+                "message": err.get("message", ""),
+                "source": "exception",
+            })
+    
+    return json.dumps({
+        "success": True,
+        "console_messages": messages,
+        "js_errors": errors,
+        "total_messages": len(messages),
+        "total_errors": len(errors),
+    }, ensure_ascii=False)
+
+
+def _maybe_start_recording(task_id: str):
+    """Start recording if browser.record_sessions is enabled in config."""
+    if task_id in _recording_sessions:
+        return
+    try:
+        hermes_home = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))
+        config_path = hermes_home / "config.yaml"
+        record_enabled = False
+        if config_path.exists():
+            import yaml
+            with open(config_path) as f:
+                cfg = yaml.safe_load(f) or {}
+            record_enabled = cfg.get("browser", {}).get("record_sessions", False)
+        
+        if not record_enabled:
+            return
+        
+        recordings_dir = hermes_home / "browser_recordings"
+        recordings_dir.mkdir(parents=True, exist_ok=True)
+        _cleanup_old_recordings(max_age_hours=72)
+        
+        import time
+        timestamp = time.strftime("%Y%m%d_%H%M%S")
+        recording_path = recordings_dir / f"session_{timestamp}_{task_id[:16]}.webm"
+        
+        result = _run_browser_command(task_id, "record", ["start", str(recording_path)])
+        if result.get("success"):
+            _recording_sessions.add(task_id)
+            logger.info("Auto-recording browser session %s to %s", task_id, recording_path)
+        else:
+            logger.debug("Could not start auto-recording: %s", result.get("error"))
+    except Exception as e:
+        logger.debug("Auto-recording setup failed: %s", e)
+
+
+def _maybe_stop_recording(task_id: str):
+    """Stop recording if one is active for this session."""
+    if task_id not in _recording_sessions:
+        return
+    try:
+        result = _run_browser_command(task_id, "record", ["stop"])
+        if result.get("success"):
+            path = result.get("data", {}).get("path", "")
+            logger.info("Saved browser recording for session %s: %s", task_id, path)
+    except Exception as e:
+        logger.debug("Could not stop recording for %s: %s", task_id, e)
+    finally:
+        _recording_sessions.discard(task_id)
+
+
 def browser_get_images(task_id: Optional[str] = None) -> str:
    """
    Get all images on the current page.
@@ -1348,7 +1471,7 @@ def browser_get_images(task_id: Optional[str] = None) -> str:
        }, ensure_ascii=False)


-def browser_vision(question: str, task_id: Optional[str] = None) -> str:
+def browser_vision(question: str, annotate: bool = False, task_id: Optional[str] = None) -> str:
    """
    Take a screenshot of the current page and analyze it with vision AI.
    
@@ -1362,6 +1485,7 @@ def browser_vision(question: str, task_id: Optional[str] = None) -> str:
    
    Args:
        question: What you want to know about the page visually
+        annotate: If True, overlay numbered [N] labels on interactive elements
        task_id: Task identifier for session isolation
        
    Returns:
@@ -1393,10 +1517,13 @@ def browser_vision(question: str, task_id: Optional[str] = None) -> str:
        _cleanup_old_screenshots(screenshots_dir, max_age_hours=24)
        
        # Take screenshot using agent-browser
+        screenshot_args = [str(screenshot_path)]
+        if annotate:
+            screenshot_args.insert(0, "--annotate")
        result = _run_browser_command(
            effective_task_id, 
            "screenshot", 
-            [str(screenshot_path)],
+            screenshot_args,
            timeout=30
        )
        
@@ -1456,11 +1583,15 @@ def browser_vision(question: str, task_id: Optional[str] = None) -> str:
        )
        
        analysis = response.choices[0].message.content
-        return json.dumps({
+        response_data = {
            "success": True,
            "analysis": analysis,
            "screenshot_path": str(screenshot_path),
-        }, ensure_ascii=False)
+        }
+        # Include annotation data if annotated screenshot was taken
+        if annotate and (result.get("data") or {}).get("annotations"):
+            response_data["annotations"] = result["data"]["annotations"]
+        return json.dumps(response_data, ensure_ascii=False)
    
    except Exception as e:
        # Keep the screenshot if it was captured successfully — the failure is
@@ -1490,6 +1621,25 @@ def _cleanup_old_screenshots(screenshots_dir, max_age_hours=24):
        pass  # Non-critical — don't fail the screenshot operation


+def _cleanup_old_recordings(max_age_hours=72):
+    """Remove browser recordings older than max_age_hours to prevent disk bloat."""
+    import time
+    try:
+        hermes_home = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))
+        recordings_dir = hermes_home / "browser_recordings"
+        if not recordings_dir.exists():
+            return
+        cutoff = time.time() - (max_age_hours * 3600)
+        for f in recordings_dir.glob("session_*.webm"):
+            try:
+                if f.stat().st_mtime < cutoff:
+                    f.unlink()
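+            except Exception:
+                pass  # Skip files that vanished or cannot be removed
+    except Exception:
+        pass  # Non-critical: cleanup must never break a browser session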
+
+
 # ============================================================================
 # Cleanup and Management Functions
 # ============================================================================
@@ -1561,6 +1711,9 @@ def cleanup_browser(task_id: Optional[str] = None) -> None:
        bb_session_id = session_info.get("bb_session_id", "unknown")
        logger.debug("Found session for task %s: bb_session_id=%s", task_id, bb_session_id)
        
+        # Stop auto-recording before closing (saves the file)
+        _maybe_stop_recording(task_id)
+        
        # Try to close via agent-browser first (needs session in _active_sessions)
        try:
            _run_browser_command(task_id, "close", [], timeout=10)
@@ -1776,6 +1929,13 @@ registry.register(
    name="browser_vision",
    toolset="browser",
    schema=_BROWSER_SCHEMA_MAP["browser_vision"],
-    handler=lambda args, **kw: browser_vision(question=args.get("question", ""), task_id=kw.get("task_id")),
+    handler=lambda args, **kw: browser_vision(question=args.get("question", ""), annotate=args.get("annotate", False), task_id=kw.get("task_id")),
+    check_fn=check_browser_requirements,
+)
+registry.register(
+    name="browser_console",
+    toolset="browser",
+    schema=_BROWSER_SCHEMA_MAP["browser_console"],
+    handler=lambda args, **kw: browser_console(clear=args.get("clear", False), task_id=kw.get("task_id")),
    check_fn=check_browser_requirements,
 )
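The retention logic added in `_cleanup_old_recordings` can be exercised in isolation. The following is a minimal sketch, not the code from this PR: `cleanup_old_recordings` is a hypothetical variant that takes the directory as a parameter (instead of reading `HERMES_HOME`) and returns a count, which makes it easier to test. The `session_*.webm` glob and the 72-hour default are taken from the diff above.

```python
import os
import tempfile
import time
from pathlib import Path


def cleanup_old_recordings(recordings_dir: Path, max_age_hours: float = 72) -> int:
    """Delete session_*.webm files older than max_age_hours.

    Hypothetical helper for illustration; returns the number of files removed.
    """
    if not recordings_dir.exists():
        return 0
    cutoff = time.time() - max_age_hours * 3600
    removed = 0
    for f in recordings_dir.glob("session_*.webm"):
        try:
            if f.stat().st_mtime < cutoff:
                f.unlink()
                removed += 1
        except OSError:
            # A file may vanish or be locked; skip it rather than fail.
            pass
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        old = root / "session_old.webm"
        old.write_bytes(b"")
        # Backdate the file by 100 hours so it falls outside the window.
        stale = time.time() - 100 * 3600
        os.utime(old, (stale, stale))
        print("removed:", cleanup_old_recordings(root))
```

Files that do not match the `session_*.webm` pattern are left alone, so anything else stored in the directory survives cleanup.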
--- a/website/docs/user-guide/configuration.md
+++ b/website/docs/user-guide/configuration.md
@@ -620,6 +620,16 @@ code_execution:
  max_tool_calls: 50           # Max tool calls within code execution
 ```

+## Browser
+
+Configure browser automation behavior:
+
+```yaml
+browser:
+  inactivity_timeout: 120        # Seconds before auto-closing idle sessions
+  record_sessions: false         # Auto-record browser sessions as WebM videos to ~/.hermes/browser_recordings/
+```
+
 ## Delegation

 Configure subagent behavior for the delegate tool:
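How the `record_sessions` flag might be read and turned into an output path can be sketched as below. This is an illustration under assumptions, not the PR's implementation: `recording_enabled` and `recording_path` are hypothetical helper names, while the config keys, the `browser_recordings` directory, and the `session_<timestamp>_<task_id[:16]>.webm` naming follow the diff to `browser_tool.py`.

```python
import time
from pathlib import Path
from typing import Any, Dict


def recording_enabled(cfg: Dict[str, Any]) -> bool:
    """Return True only when browser.record_sessions is set; default is off."""
    browser_cfg = cfg.get("browser") or {}  # tolerate a missing or null section
    return bool(browser_cfg.get("record_sessions", False))


def recording_path(hermes_home: Path, task_id: str) -> Path:
    """Build the WebM path for a session, truncating the task id to 16 chars."""
    timestamp = time.strftime("%Y%m%d_%H%M%S")
    name = f"session_{timestamp}_{task_id[:16]}.webm"
    return hermes_home / "browser_recordings" / name


if __name__ == "__main__":
    cfg = {"browser": {"inactivity_timeout": 120, "record_sessions": True}}
    if recording_enabled(cfg):
        print(recording_path(Path.home() / ".hermes", "example-task-id"))
```

Defaulting to `False` when the section or key is absent matches the opt-in behavior described in the configuration docs.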
--- a/website/docs/user-guide/features/browser.md
+++ b/website/docs/user-guide/features/browser.md
@@ -142,6 +142,16 @@ What does the chart on this page show?

 Screenshots are stored in `~/.hermes/browser_screenshots/` and automatically cleaned up after 24 hours.

+### `browser_console`
+
+Get browser console output (log/warn/error messages) and uncaught JavaScript exceptions from the current page. Essential for detecting silent JS errors that don't appear in the accessibility tree.
+
+```
+Check the browser console for any JavaScript errors
+```
+
+Use `clear=True` to clear the console after reading, so subsequent calls only show new messages.
+
 ### `browser_close`

 Close the browser session and release resources. Call this when done to free up Browserbase session quota.
@@ -175,6 +185,17 @@ Agent workflow:
 4. browser_close()
 ```

+## Session Recording
+
+Automatically record browser sessions as WebM video files:
+
+```yaml
+browser:
+  record_sessions: true  # default: false
+```
+
+When enabled, recording starts automatically on the first `browser_navigate` and saves to `~/.hermes/browser_recordings/` when the session closes. Works in both local and cloud (Browserbase) modes. Recordings older than 72 hours are deleted the next time a recording starts.
+
 ## Stealth Features

 Browserbase provides automatic stealth capabilities:
--- a/website/docs/user-guide/features/tools.md
+++ b/website/docs/user-guide/features/tools.md
@@ -15,7 +15,7 @@ Tools are functions that extend the agent's capabilities. They're organized into
 | **Web** | `web_search`, `web_extract` | Search the web, extract page content |
 | **Terminal** | `terminal`, `process` | Execute commands (local/docker/singularity/modal/daytona/ssh backends), manage background processes |
 | **File** | `read_file`, `write_file`, `patch`, `search_files` | Read, write, edit, and search files |
-| **Browser** | `browser_navigate`, `browser_click`, `browser_type`, etc. | Full browser automation via Browserbase |
+| **Browser** | `browser_navigate`, `browser_click`, `browser_type`, `browser_console`, etc. | Full browser automation via Browserbase |
 | **Vision** | `vision_analyze` | Image analysis via multimodal models |
 | **Image Gen** | `image_generate` | Generate images (FLUX via FAL) |
 | **TTS** | `text_to_speech` | Text-to-speech (Edge TTS / ElevenLabs / OpenAI) |