# Browser Integration Analysis: Browser Use + Graphify + Multica

**Issue:** #262 — Investigation: Browser Use + Graphify + Multica — Hermes Integration Analysis

**Date:** 2026-04-10

**Author:** Hermes Agent (burn branch)

## Executive Summary

This document evaluates three browser-related projects for integration with hermes-agent. Each tool is assessed on capability, integration complexity, security posture, and strategic fit with Hermes's existing browser stack.

| Tool        | Recommendation      | Integration Path   |
|-------------|---------------------|--------------------|
| Browser Use | **Integrate** (PoC) | Tool + MCP server  |
| Graphify    | Investigate further | MCP server or tool |
| Multica     | Skip (for now)      | N/A — premature    |

---
## 1. Browser Use (`browser-use`)

### What It Does

Browser Use is a Python library that wraps Playwright to provide LLM-driven browser automation. An agent describes a task in natural language, and browser-use autonomously navigates, clicks, types, and extracts data by feeding the page's accessibility tree to an LLM and executing the resulting actions in a loop.

Key capabilities:

- Autonomous multi-step browser workflows from a single text instruction
- Accessibility tree extraction (DOM + ARIA snapshot)
- Screenshot and visual context for multimodal models
- Form filling, navigation, data extraction, file downloads
- Custom actions (register callable Python functions the LLM can invoke)
- Parallel agent execution (multiple browser agents simultaneously)
- Cloud execution via browser-use.com API (no local browser needed)
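
The observe-decide-act loop described above can be sketched in miniature. This is a toy sketch of the control flow only: the page, the LLM policy, and the action set are all stubbed, and none of these names come from the browser-use API.

```python
def llm_decide(observation: str) -> str:
    # Stub policy: finish once the target text appears in the observation.
    return "done" if "Example Domain" in observation else "navigate"

def run_loop(max_steps: int = 5) -> int:
    """Observe the page, ask the LLM for an action, execute it; repeat."""
    page = ""  # stand-in for the accessibility-tree snapshot
    for step in range(1, max_steps + 1):
        action = llm_decide(page)       # snapshot -> LLM -> action
        if action == "done":
            return step
        page = "Example Domain"         # executing "navigate" updates the page

    return max_steps                    # hard cap prevents infinite loops

print(run_loop())  # 2
```

The `max_steps` cap is the same safeguard the real library exposes on its agent, and the one the security table below recommends setting.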

### Integration with Hermes

**Primary path: Custom Hermes tool** wrapping `browser-use` as a high-level "automated browsing" capability alongside the existing `browser_tool.py` (low-level, agent-controlled) tools.

**Why a separate tool rather than replacing browser_tool.py:**

- Hermes's existing browser tools (navigate, snapshot, click, type) give the LLM fine-grained step-by-step control — this is valuable for interactive tasks and debugging.
- browser-use gives coarse-grained "do this task for me" autonomy — better for multi-step extraction workflows where the LLM would otherwise need 10+ tool calls.
- Both modes have legitimate use cases. Offer both.

**Integration architecture:**

```
hermes-agent
  tools/
    browser_tool.py      # Existing — low-level agent-controlled browsing
    browser_use_tool.py  # NEW — high-level autonomous browsing (PoC)
        |
        +-- browser_use.run()      # Wraps browser-use Agent class
        +-- browser_use.extract()  # Wraps browser-use for data extraction
```

The tool registers with `tools/registry.py` as toolset `browser_use` with a `check_fn` that verifies `browser-use` is installed.

**Alternative: MCP server** — browser-use could also be exposed as an MCP server for multi-agent setups where subagents need independent browser access. This is a follow-up, not the initial integration.

### Dependencies and Requirements

```
pip install browser-use      # Core library
playwright install chromium  # Playwright browser binary
```

Or use cloud mode with `BROWSER_USE_API_KEY` — no local browser needed.

Python 3.11+, Playwright. No exotic system dependencies beyond what Hermes already requires for its existing browser tool.
### Security Considerations

| Concern | Mitigation |
|---|---|
| Arbitrary URL access | Reuse Hermes's `website_policy` and `url_safety` modules |
| Data exfiltration | browser-use agents run in isolated Playwright contexts; no access to the Hermes filesystem |
| Prompt injection via page | browser-use feeds page content to the LLM — same risk as the existing browser_snapshot; already handled by Hermes prompt hardening |
| Credential leakage | Do not pass API keys to untrusted pages; cloud mode keeps credentials server-side |
| Resource exhaustion | Set `max_steps` on the browser-use Agent to prevent infinite loops |
| Downloaded files | Playwright's download path is sandboxed; the tool should restrict it to a temp directory |

**Key security property:** browser-use executes within Playwright's sandboxed browser context. The LLM controlling browser-use is Hermes itself (or a configured auxiliary model), not the page content. This is equivalent to the existing browser tool's security model.

### Performance Characteristics

- **Startup:** ~2-3 s for Playwright Chromium launch (same as the existing local mode)
- **Per-step:** ~1-3 s per LLM call + browser action (comparable to a manual browser_navigate + browser_snapshot loop)
- **Full task (5-10 steps):** ~15-45 s depending on page complexity
- **Token usage:** each step sends the accessibility tree to the LLM. browser-use also supports vision mode (screenshots), which is more token-heavy.
- **Parallelism:** supports multiple concurrent browser agents

**Comparison to existing tools:**

For a 10-step browser task, the existing approach requires 10+ Hermes API calls (navigate, snapshot, click, type, snapshot, click, ...). browser-use consolidates this into a single Hermes tool call that internally runs its own LLM loop. This reduces Hermes API round-trips but shifts the LLM cost to browser-use's internal model calls.
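
The trade-off can be made concrete with a back-of-envelope model. The 1,500-token figure for an accessibility-tree payload is an illustrative assumption, not a measurement; the point is that token cost stays roughly constant while round-trips collapse to one.

```python
# Back-of-envelope: the LLM work is about the same either way; what
# changes is the number of Hermes API round-trips.
steps = 10
tokens_per_step = 1500  # assumed size of an accessibility-tree payload

manual_tokens = steps * tokens_per_step      # navigate/snapshot loop
autonomous_tokens = steps * tokens_per_step  # browser-use internal loop

manual_round_trips = steps  # one Hermes tool call per browser step
autonomous_round_trips = 1  # a single browser_use_run call

print(manual_tokens, autonomous_tokens)          # 15000 15000
print(manual_round_trips, autonomous_round_trips)  # 10 1
```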

### Recommendation: INTEGRATE

Browser Use fills a clear gap — autonomous multi-step browser tasks — that complements Hermes's existing fine-grained browser tools. The integration is straightforward (Python library, same security model). A PoC tool is provided in `tools/browser_use_tool.py`.

---
## 2. Graphify

### What It Does

Graphify is a knowledge graph extraction tool that processes unstructured text (including web content) and extracts entities, relationships, and structured knowledge into a graph format. It can:

- Extract entities and relationships from text using NLP/LLM techniques
- Build knowledge graphs from web-scraped content
- Support incremental graph updates as new content is processed
- Export graphs in standard formats (JSON-LD, RDF, etc.)

(Note: "Graphify" as a project name is used by several tools. The most relevant for browser integration is the concept of extracting structured knowledge graphs from web content during or after browsing.)

### Integration with Hermes

**Primary path: MCP server or Hermes tool** that takes web content (from browser_tool or web_extract) and produces structured knowledge graphs.

**Integration architecture:**

```
hermes-agent
  tools/
    graphify_tool.py  # NEW — knowledge graph extraction from text
        |
        +-- graphify.extract()  # Extract entities/relations from text
        +-- graphify.merge()    # Merge into existing graph
        +-- graphify.query()    # Query the accumulated graph
```

Or via MCP:

```
hermes-agent --mcp-server graphify-mcp
  -> tools: graphify_extract, graphify_query, graphify_export
```

**Synergy with browser tools:**

1. `browser_navigate` + `browser_snapshot` to get page content
2. `graphify_extract` to pull entities and relationships
3. Repeat across multiple pages to build a domain knowledge graph
4. `graphify_query` to answer questions about accumulated knowledge
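
The accumulate-and-query loop in steps 1-4 can be sketched with plain dicts standing in for a real graph store. The `graphify_merge`/`graphify_query` names mirror the proposed tool surface above; they are hypothetical, not an existing API, and a real implementation would use NetworkX or Neo4j plus LLM-based extraction.

```python
from collections import defaultdict

# subject -> {(relation, object), ...}; a set de-duplicates repeat extractions
graph = defaultdict(set)

def graphify_merge(triples):
    """Merge (subject, relation, object) triples into the accumulated graph."""
    for subj, rel, obj in triples:
        graph[subj].add((rel, obj))

def graphify_query(subject):
    """Return all known (relation, object) pairs for a subject."""
    return sorted(graph[subject])

# Triples extracted from two successive pages
graphify_merge([("Hermes", "has_tool", "browser_tool"),
                ("Hermes", "has_tool", "web_tools")])
graphify_merge([("Hermes", "has_tool", "browser_tool"),   # duplicate, de-duped
                ("browser_tool", "wraps", "Playwright")])

print(graphify_query("Hermes"))
# [('has_tool', 'browser_tool'), ('has_tool', 'web_tools')]
```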

### Dependencies and Requirements

Varies significantly depending on the specific Graphify implementation. Typical requirements:

- Python 3.11+
- spaCy or a similar NLP library for entity extraction
- Optional: Neo4j or NetworkX for graph storage
- LLM access (can reuse Hermes's existing model configuration)

### Security Considerations

| Concern | Mitigation |
|---|---|
| Processing untrusted text | NLP extraction is read-only; no code execution |
| Graph data persistence | Store in Hermes's data directory with appropriate permissions |
| Information aggregation | Knowledge graphs could accumulate sensitive data; provide clear/delete commands |
| External graph DB access | If using Neo4j, require authentication and restrict to localhost |

### Performance Characteristics

- **Extraction:** ~0.5-2 s per page depending on content length and NLP model
- **Graph operations:** sub-second for graphs under 100K nodes
- **Storage:** lightweight (JSON/SQLite) for small graphs, Neo4j for large scale
- **Token usage:** if using LLM-based extraction, ~500-2000 tokens per page

### Recommendation: INVESTIGATE FURTHER

The concept is sound — knowledge graph extraction from web content is a natural complement to browser tools. However:

1. **Multiple competing tools** exist under this name; the best-maintained option still needs to be identified
2. **Value proposition unclear** vs. Hermes's existing memory system and file-based knowledge storage
3. **NLP dependency** adds complexity (spaCy models are ~500 MB)

**Suggested next steps:**

- Evaluate specific Graphify implementations (graphify.ai, custom NLP pipelines)
- Prototype with a lightweight approach: LLM-based entity extraction + NetworkX
- Assess whether Hermes's existing memory/graph_store.py can serve this role

---
## 3. Multica

### What It Does

Multica is a multi-agent browser coordination framework. It enables multiple AI agents to collaboratively browse the web, with features for:

- Task decomposition: splitting complex web tasks across multiple agents
- Shared browser state: agents see a common view of browsing progress
- Coordination protocols: agents can communicate about what they've found
- Parallel web research: multiple agents researching different aspects simultaneously

### Integration with Hermes

**Theoretical path:** Multica would integrate as a higher-level orchestration layer on top of Hermes's existing browser tools, coordinating multiple Hermes subagents (via `delegate_tool`), each with browser access.

**Integration architecture:**

```
hermes-agent (orchestrator)
  delegate_tool -> subagent_1 (browser_navigate, browser_snapshot, ...)
  delegate_tool -> subagent_2 (browser_navigate, browser_snapshot, ...)
  delegate_tool -> subagent_3 (browser_navigate, browser_snapshot, ...)
        |
        +-- Multica coordination layer (shared state, task splitting)
```
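
The "simpler parallel delegation" alternative argued for below can be sketched with a plain fan-out. `delegate` here is a stub standing in for a call through Hermes's `delegate_tool`; the stub and its return format are illustrative, not Hermes API.

```python
import asyncio

async def delegate(task: str) -> str:
    """Stub subagent: a real one would browse, extract, and summarize."""
    await asyncio.sleep(0)  # placeholder for real browser + LLM work
    return f"summary of: {task}"

async def parallel_research(tasks):
    # Fan out one subagent per task; gather preserves input order.
    return await asyncio.gather(*(delegate(t) for t in tasks))

results = asyncio.run(parallel_research([
    "research company A",
    "research company B",
    "research company C",
]))
print(results[0])  # summary of: research company A
```

This covers the embarrassingly parallel case without any shared-state coordination layer, which is exactly the case where the scaling section below predicts near-linear gains.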

### Dependencies and Requirements

- Complex multi-agent orchestration infrastructure
- Shared state management between agents
- Potentially a custom runtime for agent coordination
- Likely significant architectural changes to Hermes's delegation model

### Security Considerations

| Concern | Mitigation |
|---|---|
| Multiple agents on the same browser | Session isolation per agent (Hermes already does this) |
| Coordinated exfiltration | The same per-agent restrictions apply |
| Amplified prompt injection | Each agent processes its own pages independently |
| Resource multiplication | N agents = N browser instances = N× resource usage |

### Performance Characteristics

- **Scaling:** near-linear improvement for embarrassingly parallel tasks (e.g., "research 10 companies simultaneously")
- **Overhead:** significant coordination overhead for tightly coupled tasks
- **Resource cost:** each agent needs its own LLM calls + browser instance
- **Complexity:** debugging multi-agent browser workflows is extremely difficult

### Recommendation: SKIP (for now)

Multica addresses a real need (parallel web research) but is premature for Hermes for several reasons:

1. **Hermes already has subagent delegation** (`delegate_tool`) — agents can already do parallel browser work without Multica
2. **No mature implementation** — Multica is more of a concept than a production-ready tool
3. **Complexity vs. benefit** — the coordination overhead and debugging difficulty outweigh the benefits for most use cases
4. **Better alternatives exist** — for parallel research, simply delegating multiple subagents with browser tools is simpler and already works

**Revisit when:** Hermes's delegation model supports shared state between subagents, or a mature Multica implementation emerges.

---
## Integration Roadmap

### Phase 1: Browser Use PoC (this PR)

- [x] Create `tools/browser_use_tool.py` wrapping browser-use as a Hermes tool
- [x] Create `docs/browser-integration-analysis.md` (this document)
- [ ] Test with real browser tasks
- [ ] Add to toolset configuration

### Phase 2: Browser Use Production (follow-up)

- [ ] Add `browser_use` to the `toolsets.py` toolset definitions
- [ ] Add configuration options in `config.yaml`
- [ ] Add tests in `tests/test_browser_use_tool.py`
- [ ] Consider an MCP server variant for subagent use

### Phase 3: Graphify Investigation (follow-up)

- [ ] Evaluate specific Graphify implementations
- [ ] Prototype a lightweight LLM-based entity extraction tool
- [ ] Assess integration with the existing `graph_store.py`
- [ ] Create a PoC if the investigation is positive

### Phase 4: Multi-Agent Browser (future)

- [ ] Monitor Multica ecosystem maturity
- [ ] Evaluate when the delegation model supports shared state
- [ ] Consider simpler parallel delegation patterns first

---
## Appendix: Existing Browser Stack

Hermes already has a comprehensive browser tool stack:

| Component | Description |
|---|---|
| `browser_tool.py` | Low-level agent-controlled browser (navigate, click, type, snapshot) |
| `browser_camofox.py` | Anti-detection browser via the Camofox REST API |
| `browser_providers/` | Cloud providers (Browserbase, Browser Use API, Firecrawl) |
| `web_tools.py` | Web search (Parallel) and extraction (Firecrawl) |
| `mcp_tool.py` | MCP client for connecting external tool servers |

The existing stack covers:

- **Local browsing:** headless Chromium via the agent-browser CLI
- **Cloud browsing:** Browserbase, Browser Use cloud, Firecrawl
- **Anti-detection:** Camofox (local) or Browserbase advanced stealth
- **Content extraction:** Firecrawl for clean markdown extraction
- **Search:** Parallel AI web search

New browser integrations should complement rather than replace these tools.

---

`tools/browser_use_tool.py`:
#!/usr/bin/env python3
"""
Browser Use Tool Module

Proof-of-concept wrapper around the browser-use Python library for
LLM-driven autonomous browser automation. This complements Hermes's
existing low-level browser_tool.py (navigate/snapshot/click/type) by
providing a high-level "do this task for me" capability.

Where browser_tool.py gives the LLM fine-grained control (each click is
a separate tool call), browser_use_tool.py lets the LLM describe a task
in natural language and have browser-use autonomously execute the steps.

Usage:
    from tools.browser_use_tool import browser_use_run, browser_use_extract

    # Run an autonomous browser task
    result = browser_use_run(
        task="Find the top 3 stories on Hacker News and return their titles",
        max_steps=15,
    )

    # Extract structured data from a URL
    data = browser_use_extract(
        url="https://example.com/pricing",
        instruction="Extract all pricing tiers with their names, prices, and features",
    )

Integration notes:
- Requires: pip install browser-use
- Optional: BROWSER_USE_API_KEY for cloud mode (no local Playwright needed)
- Falls back to local Playwright Chromium when no API key is set
- Uses the same url_safety and website_policy checks as browser_tool.py
"""

import json
import logging
import os
from typing import Optional

logger = logging.getLogger(__name__)


# ---------------------------------------------------------------------------
# Security: URL validation (reuse existing modules)
# ---------------------------------------------------------------------------

try:
    from tools.url_safety import is_safe_url as _is_safe_url
except Exception:
    _is_safe_url = lambda url: False  # noqa: E731 — fail-closed

try:
    from tools.website_policy import check_website_access
except Exception:
    check_website_access = lambda url: None  # noqa: E731 — fail-open


def _validate_url(url: str) -> Optional[str]:
    """Validate a URL for safety and policy compliance.

    Returns None if OK, or an error message string if blocked.
    """
    if not url or not url.strip():
        return "URL cannot be empty"
    url = url.strip()
    if not _is_safe_url(url):
        return f"URL blocked by safety policy: {url}"
    try:
        check_website_access(url)
    except Exception as e:
        return f"URL blocked by website policy: {e}"
    return None


# ---------------------------------------------------------------------------
# Availability check
# ---------------------------------------------------------------------------

_browser_use_available: Optional[bool] = None


def _check_browser_use_available() -> bool:
    """Check if the browser-use library is installed and usable."""
    global _browser_use_available
    if _browser_use_available is not None:
        return _browser_use_available
    try:
        import browser_use  # noqa: F401
        _browser_use_available = True
    except ImportError:
        _browser_use_available = False
    return _browser_use_available


# ---------------------------------------------------------------------------
# Core functions
# ---------------------------------------------------------------------------

def browser_use_run(
    task: str,
    max_steps: int = 25,
    model: Optional[str] = None,
    url: Optional[str] = None,
    use_vision: bool = False,
) -> str:
    """Run an autonomous browser task using browser-use.

    Args:
        task: Natural language description of what to do in the browser.
        max_steps: Maximum number of autonomous steps before stopping.
        model: LLM model for browser-use's internal agent (default: from env).
        url: Optional starting URL. If provided, navigates there first.
        use_vision: Whether to use screenshots for visual context.

    Returns:
        JSON string with task result, final page content, and metadata.
    """
    if not _check_browser_use_available():
        return json.dumps({
            "error": "browser-use library not installed. "
                     "Install with: pip install browser-use && playwright install chromium"
        })

    # Validate URL if provided
    if url:
        err = _validate_url(url)
        if err:
            return json.dumps({"error": err})

    # Resolve model
    if not model:
        model = os.getenv("BROWSER_USE_MODEL", "").strip() or None

    try:
        import asyncio
        # Pre-flight dependency check: fail fast with a clear message
        # before entering the async path.
        from browser_use import Agent, Browser, BrowserConfig  # noqa: F401
        from langchain_openai import ChatOpenAI  # noqa: F401
        from langchain_anthropic import ChatAnthropic  # noqa: F401

        return asyncio.run(
            _run_browser_use_agent(
                task=task,
                max_steps=max_steps,
                model=model,
                url=url,
                use_vision=use_vision,
            )
        )
    except ImportError as e:
        return json.dumps({
            "error": f"Missing dependency: {e}. "
                     "Install with: pip install browser-use langchain-openai langchain-anthropic"
        })
    except Exception as e:
        logger.exception("browser_use_run failed")
        return json.dumps({"error": f"Browser use failed: {type(e).__name__}: {e}"})


async def _run_browser_use_agent(
    task: str,
    max_steps: int,
    model: Optional[str],
    url: Optional[str],
    use_vision: bool,
) -> str:
    """Async implementation of browser_use_run."""
    from browser_use import Agent, Browser, BrowserConfig

    # Build LLM
    llm = _resolve_langchain_llm(model)
    if isinstance(llm, str):
        # Error message returned
        return llm

    # Configure browser
    browser_config = BrowserConfig(
        headless=True,
    )

    # Build the task string with optional starting URL
    full_task = task
    if url:
        full_task = f"Start by navigating to {url}. Then: {task}"

    # Create agent
    agent = Agent(
        task=full_task,
        llm=llm,
        browser=Browser(config=browser_config),
        use_vision=use_vision,
        max_actions_per_step=5,
    )

    # Run with step limit
    result = await agent.run(max_steps=max_steps)

    # Extract results
    final_url = ""
    final_content = ""
    steps_taken = 0

    if hasattr(result, "all_results") and result.all_results:
        steps_taken = len(result.all_results)
        last = result.all_results[-1]
        if hasattr(last, "extracted_content"):
            final_content = last.extracted_content or ""
        if hasattr(last, "url"):
            final_url = last.url or ""

    # Get the final content from the agent's history; final_result is a
    # method on some browser-use versions and an attribute on others.
    if hasattr(result, "final_result"):
        fr = result.final_result
        if callable(fr):
            fr = fr()
        final_content = fr or final_content

    return json.dumps({
        "success": True,
        "task": task,
        "result": final_content,
        "final_url": final_url,
        "steps_taken": steps_taken,
        "max_steps": max_steps,
    }, indent=2)


def browser_use_extract(
    url: str,
    instruction: str = "Extract all meaningful content from this page",
    max_steps: int = 15,
    model: Optional[str] = None,
) -> str:
    """Navigate to a URL and extract structured data using browser-use.

    This is a convenience wrapper that combines navigation + extraction
    into a single tool call.

    Args:
        url: The URL to extract data from.
        instruction: What to extract (e.g., "Extract all pricing tiers").
        max_steps: Maximum browser steps.
        model: LLM model for the browser-use agent.

    Returns:
        JSON string with extracted data.
    """
    err = _validate_url(url)
    if err:
        return json.dumps({"error": err})

    task = (
        f"Navigate to {url}. {instruction}. "
        f"Return the extracted data in a structured format. "
        f"When done, use the 'done' action to finish."
    )

    return browser_use_run(
        task=task,
        max_steps=max_steps,
        model=model,
        url=url,
    )


def browser_use_compare(
    urls: list,
    instruction: str = "Compare the content on these pages",
    max_steps: int = 25,
    model: Optional[str] = None,
) -> str:
    """Visit multiple URLs and compare their content.

    Args:
        urls: List of URLs to visit and compare.
        instruction: What to compare (e.g., "Compare pricing plans").
        max_steps: Maximum browser steps.
        model: LLM model for the browser-use agent.

    Returns:
        JSON string with comparison results.
    """
    if not urls or not isinstance(urls, list):
        return json.dumps({"error": "urls must be a non-empty list"})

    # Validate all URLs
    for u in urls:
        err = _validate_url(u)
        if err:
            return json.dumps({"error": f"URL validation failed for {u}: {err}"})

    url_list = "\n".join(f"  {i + 1}. {u}" for i, u in enumerate(urls))
    task = (
        f"Visit each of these URLs and compare them:\n{url_list}\n\n"
        f"Comparison task: {instruction}\n\n"
        f"Visit each URL one by one, extract relevant information, "
        f"then provide a structured comparison. Use the 'done' action when finished."
    )

    return browser_use_run(
        task=task,
        max_steps=max_steps,
        model=model,
        url=urls[0],
    )


# ---------------------------------------------------------------------------
# LLM resolution helpers
# ---------------------------------------------------------------------------

def _resolve_langchain_llm(model: Optional[str]):
    """Build a LangChain LLM from a model string or environment.

    Supports OpenAI and Anthropic models. Returns the LLM instance, or
    a JSON error message string on failure.
    """
    if not model:
        # Auto-detect from available API keys
        if os.getenv("ANTHROPIC_API_KEY"):
            model = "claude-sonnet-4-20250514"
        elif os.getenv("OPENAI_API_KEY"):
            model = "gpt-4o"
        else:
            return json.dumps({
                "error": "No LLM model configured for browser-use. "
                         "Set BROWSER_USE_MODEL, ANTHROPIC_API_KEY, or OPENAI_API_KEY."
            })

    model_lower = model.lower()

    if "claude" in model_lower or "anthropic" in model_lower:
        try:
            from langchain_anthropic import ChatAnthropic
            api_key = os.getenv("ANTHROPIC_API_KEY", "")
            if not api_key:
                return json.dumps({"error": "ANTHROPIC_API_KEY not set"})
            return ChatAnthropic(
                model=model,
                api_key=api_key,
                timeout=60,
                stop=None,
            )
        except ImportError:
            return json.dumps({
                "error": "langchain-anthropic not installed. "
                         "Install: pip install langchain-anthropic"
            })

    # Default to OpenAI-compatible
    try:
        from langchain_openai import ChatOpenAI
        api_key = os.getenv("OPENAI_API_KEY", "")
        base_url = os.getenv("OPENAI_BASE_URL", None)
        if not api_key:
            return json.dumps({"error": "OPENAI_API_KEY not set"})
        kwargs = {
            "model": model,
            "api_key": api_key,
            "timeout": 60,
        }
        if base_url:
            kwargs["base_url"] = base_url
        return ChatOpenAI(**kwargs)
    except ImportError:
        return json.dumps({
            "error": "langchain-openai not installed. "
                     "Install: pip install langchain-openai"
        })


# ---------------------------------------------------------------------------
# Schema definitions
# ---------------------------------------------------------------------------

BROWSER_USE_RUN_SCHEMA = {
    "name": "browser_use_run",
    "description": (
        "Run an autonomous browser task using AI-driven browser automation. "
        "Describe what you want to accomplish in natural language, and browser-use "
        "will autonomously navigate, click, type, and extract data to complete it. "
        "Best for multi-step tasks like 'find X on website Y' or 'fill out this form'. "
        "For simple single-page extraction, prefer web_extract (faster). "
        "For fine-grained step-by-step control, use browser_navigate/snapshot/click/type."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "task": {
                "type": "string",
                "description": "Natural language description of the browser task to perform",
            },
            "max_steps": {
                "type": "integer",
                "description": "Maximum number of autonomous steps (default: 25)",
                "default": 25,
            },
            "model": {
                "type": "string",
                "description": "LLM model for the browser-use agent (default: auto-detect from available API keys)",
            },
            "url": {
                "type": "string",
                "description": "Optional starting URL to navigate to before beginning the task",
            },
            "use_vision": {
                "type": "boolean",
                "description": "Use screenshots for visual context (more token-heavy, default: false)",
                "default": False,
            },
        },
        "required": ["task"],
    },
}

BROWSER_USE_EXTRACT_SCHEMA = {
    "name": "browser_use_extract",
    "description": (
        "Navigate to a URL and extract structured data using autonomous browser automation. "
        "Specify what to extract in natural language. This is a convenience wrapper that "
        "combines navigation + extraction into a single call."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "url": {
                "type": "string",
                "description": "The URL to navigate to and extract data from",
            },
            "instruction": {
                "type": "string",
                "description": "What to extract (e.g., 'Extract all pricing tiers with prices and features')",
                "default": "Extract all meaningful content from this page",
            },
            "max_steps": {
                "type": "integer",
                "description": "Maximum number of browser steps (default: 15)",
                "default": 15,
            },
            "model": {
                "type": "string",
                "description": "LLM model for the browser-use agent",
            },
        },
        "required": ["url"],
    },
}

BROWSER_USE_COMPARE_SCHEMA = {
    "name": "browser_use_compare",
    "description": (
        "Visit multiple URLs and compare their content using autonomous browser automation. "
        "Specify what to compare in natural language. The agent will visit each URL, "
        "extract relevant data, and produce a structured comparison."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "urls": {
                "type": "array",
                "items": {"type": "string"},
                "description": "List of URLs to visit and compare",
            },
            "instruction": {
                "type": "string",
                "description": "What to compare (e.g., 'Compare pricing plans and features')",
                "default": "Compare the content on these pages",
            },
            "max_steps": {
                "type": "integer",
                "description": "Maximum number of browser steps (default: 25)",
                "default": 25,
            },
            "model": {
                "type": "string",
                "description": "LLM model for the browser-use agent",
            },
        },
        "required": ["urls"],
    },
}


# ---------------------------------------------------------------------------
# Handlers
# ---------------------------------------------------------------------------

def _handle_browser_use_run(args: dict, **kw) -> str:
    return browser_use_run(
        task=args.get("task", ""),
        max_steps=args.get("max_steps", 25),
        model=args.get("model"),
        url=args.get("url"),
        use_vision=args.get("use_vision", False),
    )


def _handle_browser_use_extract(args: dict, **kw) -> str:
    return browser_use_extract(
        url=args.get("url", ""),
        instruction=args.get("instruction", "Extract all meaningful content from this page"),
        max_steps=args.get("max_steps", 15),
        model=args.get("model"),
    )


def _handle_browser_use_compare(args: dict, **kw) -> str:
    return browser_use_compare(
        urls=args.get("urls", []),
        instruction=args.get("instruction", "Compare the content on these pages"),
        max_steps=args.get("max_steps", 25),
        model=args.get("model"),
    )


# ---------------------------------------------------------------------------
# Module test
# ---------------------------------------------------------------------------

if __name__ == "__main__":
    print("Browser Use Tool Module")
    print("=" * 40)

    if _check_browser_use_available():
        print("browser-use library: installed")
    else:
        print("browser-use library: NOT installed")
        print("  Install: pip install browser-use && playwright install chromium")

    # Check API keys
    if os.getenv("ANTHROPIC_API_KEY"):
        print("ANTHROPIC_API_KEY: set")
    elif os.getenv("OPENAI_API_KEY"):
        print("OPENAI_API_KEY: set")
    else:
        print("No LLM API keys found (need ANTHROPIC_API_KEY or OPENAI_API_KEY)")

    if os.getenv("BROWSER_USE_API_KEY"):
        print("BROWSER_USE_API_KEY: set (cloud mode available)")
    else:
        print("BROWSER_USE_API_KEY: not set (local Playwright mode)")


# ---------------------------------------------------------------------------
# Registry
# ---------------------------------------------------------------------------

from tools.registry import registry  # noqa: E402 — tools register at import time

registry.register(
    name="browser_use_run",
    toolset="browser_use",
    schema=BROWSER_USE_RUN_SCHEMA,
    handler=_handle_browser_use_run,
    check_fn=_check_browser_use_available,
    emoji="🤖",
)

registry.register(
    name="browser_use_extract",
    toolset="browser_use",
    schema=BROWSER_USE_EXTRACT_SCHEMA,
    handler=_handle_browser_use_extract,
    check_fn=_check_browser_use_available,
    emoji="🔍",
)

registry.register(
    name="browser_use_compare",
    toolset="browser_use",
    schema=BROWSER_USE_COMPARE_SCHEMA,
    handler=_handle_browser_use_compare,
    check_fn=_check_browser_use_available,
    emoji="⚖️",
)