# Browser Integration Analysis: Browser Use + Graphify + Multica

**Issue:** #262 — Investigation: Browser Use + Graphify + Multica — Hermes Integration Analysis
**Date:** 2026-04-10
**Author:** Hermes Agent (burn branch)

## Executive Summary

This document evaluates three browser-related projects for integration with hermes-agent. Each tool is assessed on capability, integration complexity, security posture, and strategic fit with Hermes's existing browser stack.

| Tool        | Recommendation      | Integration Path   |
|-------------|---------------------|--------------------|
| Browser Use | **Integrate** (PoC) | Tool + MCP server  |
| Graphify    | Investigate further | MCP server or tool |
| Multica     | Skip (for now)      | N/A — premature    |

---

## 1. Browser Use (`browser-use`)

### What It Does

Browser Use is a Python library that wraps Playwright to provide LLM-driven browser automation. An agent describes a task in natural language, and browser-use autonomously navigates, clicks, types, and extracts data by feeding the page's accessibility tree to an LLM and executing the resulting actions in a loop.

Key capabilities:

- Autonomous multi-step browser workflows from a single text instruction
- Accessibility tree extraction (DOM + ARIA snapshot)
- Screenshot and visual context for multimodal models
- Form filling, navigation, data extraction, file downloads
- Custom actions (register callable Python functions the LLM can invoke)
- Parallel agent execution (multiple browser agents simultaneously)
- Cloud execution via the browser-use.com API (no local browser needed)

### Integration with Hermes

**Primary path: custom Hermes tool** wrapping `browser-use` as a high-level "automated browsing" capability alongside the existing `browser_tool.py` (low-level, agent-controlled) tools.
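A minimal sketch of what such a wrapper could look like. Everything here is illustrative: `BrowserUseResult` and `run_browser_task` are hypothetical names, and the actual hand-off to a browser-use `Agent` is stubbed out rather than implemented.

```python
from dataclasses import dataclass, field

@dataclass
class BrowserUseResult:
    """Normalized result a Hermes tool could return to the agent."""
    success: bool
    steps_taken: int
    extracted: str = ""
    errors: list[str] = field(default_factory=list)

def run_browser_task(task: str, max_steps: int = 25) -> BrowserUseResult:
    """Validate inputs, then hand off to a browser-use Agent.

    The hand-off itself is stubbed here; a real tool would construct
    a browser-use Agent with the task string and run it to completion.
    """
    if not task.strip():
        return BrowserUseResult(success=False, steps_taken=0,
                                errors=["empty task description"])
    if max_steps < 1:
        return BrowserUseResult(success=False, steps_taken=0,
                                errors=["max_steps must be >= 1"])
    # Placeholder for the actual browser-use Agent invocation.
    return BrowserUseResult(success=True, steps_taken=0,
                            extracted=f"(stub) would run: {task!r}")
```

The point of normalizing to a single result type is that the Hermes agent sees one tool call with one structured return, regardless of how many internal steps browser-use took.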
**Why a separate tool rather than replacing `browser_tool.py`:**

- Hermes's existing browser tools (navigate, snapshot, click, type) give the LLM fine-grained, step-by-step control — valuable for interactive tasks and debugging.
- browser-use gives coarse-grained "do this task for me" autonomy — better for multi-step extraction workflows where the LLM would otherwise need 10+ tool calls.
- Both modes have legitimate use cases. Offer both.

**Integration architecture:**

```
hermes-agent
  tools/
    browser_tool.py        # Existing — low-level agent-controlled browsing
    browser_use_tool.py    # NEW — high-level autonomous browsing (PoC)
      |
      +-- browser_use.run()      # Wraps browser-use Agent class
      +-- browser_use.extract()  # Wraps browser-use for data extraction
```

The tool registers with `tools/registry.py` as toolset `browser_use`, with a `check_fn` that verifies `browser-use` is installed.

**Alternative: MCP server** — browser-use could also be exposed as an MCP server for multi-agent setups where subagents need independent browser access. This is a follow-up, not the initial integration.

### Dependencies and Requirements

```
pip install browser-use        # Core library
playwright install chromium    # Playwright browser binary
```

Or use cloud mode with `BROWSER_USE_API_KEY` — no local browser needed.

Python 3.11+, Playwright. No exotic system dependencies beyond what Hermes already requires for its existing browser tool.
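The availability check itself needs nothing beyond the stdlib. In the sketch below, only `browser_use_available` is substantive; the `TOOLSETS` dict is a hypothetical stand-in for whatever shape `tools/registry.py` actually expects:

```python
import importlib.util

def browser_use_available() -> bool:
    """check_fn: offer the toolset only when browser-use is importable."""
    return importlib.util.find_spec("browser_use") is not None

# Hypothetical registry entry; the real registry API may differ.
TOOLSETS = {
    "browser_use": {
        "tools": ["browser_use.run", "browser_use.extract"],
        "check_fn": browser_use_available,
    }
}
```

Using `find_spec` rather than a bare `import` keeps the check cheap: it consults the import machinery without actually loading the package.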
### Security Considerations

| Concern | Mitigation |
|----------------------------|---------------------------------------------------------|
| Arbitrary URL access | Reuse Hermes's `website_policy` and `url_safety` modules |
| Data exfiltration | Browser-use agents run in isolated Playwright contexts; no access to the Hermes filesystem |
| Prompt injection via page | browser-use feeds page content to the LLM — same risk as the existing `browser_snapshot`; already handled by Hermes prompt hardening |
| Credential leakage | Do not pass API keys to untrusted pages; cloud mode keeps credentials server-side |
| Resource exhaustion | Set `max_steps` on the browser-use Agent to prevent infinite loops |
| Downloaded files | Playwright's download path is sandboxed; the tool should restrict downloads to a temp directory |

**Key security property:** browser-use executes within Playwright's sandboxed browser context. The LLM controlling browser-use is Hermes itself (or a configured auxiliary model), not the page content. This is equivalent to the existing browser tool's security model.

### Performance Characteristics

- **Startup:** ~2-3s for the Playwright Chromium launch (same as the existing local mode)
- **Per-step:** ~1-3s per LLM call + browser action (comparable to a manual browser_navigate + browser_snapshot loop)
- **Full task (5-10 steps):** ~15-45s depending on page complexity
- **Token usage:** Each step sends the accessibility tree to the LLM. browser-use also supports a vision mode (screenshots), which is more token-heavy.
- **Parallelism:** Supports multiple concurrent browser agents

**Comparison to existing tools:** For a 10-step browser task, the existing approach requires 10+ Hermes API calls (navigate, snapshot, click, type, snapshot, click, ...). browser-use consolidates this into a single Hermes tool call that internally runs its own LLM loop. This reduces Hermes API round-trips but shifts the LLM cost to browser-use's internal model calls.
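The `max_steps` mitigation from the table above is worth making concrete. The sketch below is a generic illustration of a hard-capped agent loop, not browser-use's actual implementation; `plan_next_action` stands in for one LLM call plus one browser action.

```python
def run_agent_loop(plan_next_action, max_steps: int = 25):
    """Drive an LLM-in-the-loop agent, hard-capped at max_steps.

    `plan_next_action(step, history)` returns the next action,
    or None when the agent reports the task is complete.
    """
    history = []
    for step in range(max_steps):
        action = plan_next_action(step, history)
        if action is None:  # agent reports completion
            return {"done": True, "steps": step, "history": history}
        history.append(action)
    # Cap reached: fail closed rather than looping forever.
    return {"done": False, "steps": max_steps, "history": history}

# Example: a simulated task that finishes after 3 actions.
result = run_agent_loop(lambda i, h: None if i == 3 else f"action-{i}",
                        max_steps=10)
```

Failing closed at the cap (returning a partial result rather than raising) lets the calling tool report what was accomplished before the budget ran out.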
### Recommendation: INTEGRATE

Browser Use fills a clear gap — autonomous multi-step browser tasks — that complements Hermes's existing fine-grained browser tools. The integration is straightforward (Python library, same security model). A PoC tool is provided in `tools/browser_use_tool.py`.

---

## 2. Graphify

### What It Does

Graphify is a knowledge graph extraction tool that processes unstructured text (including web content) and extracts entities, relationships, and structured knowledge into a graph format. It can:

- Extract entities and relationships from text using NLP/LLM techniques
- Build knowledge graphs from web-scraped content
- Support incremental graph updates as new content is processed
- Export graphs in standard formats (JSON-LD, RDF, etc.)

(Note: "Graphify" as a project name is used by several tools. The most relevant for browser integration is the concept of extracting structured knowledge graphs from web content during or after browsing.)

### Integration with Hermes

**Primary path: MCP server or Hermes tool** that takes web content (from browser_tool or web_extract) and produces structured knowledge graphs.

**Integration architecture:**

```
hermes-agent
  tools/
    graphify_tool.py   # NEW — knowledge graph extraction from text
      |
      +-- graphify.extract()  # Extract entities/relations from text
      +-- graphify.merge()    # Merge into existing graph
      +-- graphify.query()    # Query the accumulated graph
```

Or via MCP:

```
hermes-agent --mcp-server graphify-mcp
  -> tools: graphify_extract, graphify_query, graphify_export
```

**Synergy with browser tools:**

1. `browser_navigate` + `browser_snapshot` to get page content
2. `graphify_extract` to pull entities and relationships
3. Repeat across multiple pages to build a domain knowledge graph
4. `graphify_query` to answer questions about accumulated knowledge

### Dependencies and Requirements

Dependencies vary significantly depending on the specific Graphify implementation.
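One way to keep those dependencies minimal during prototyping is to accumulate the graph in plain dictionaries and sets rather than a graph database. The sketch below is purely illustrative: `graphify_merge` is a hypothetical helper mirroring the synergy loop above, not an API of any specific Graphify implementation.

```python
def graphify_merge(graph: dict, entities: list, relations: list) -> dict:
    """Merge one page's extraction into an accumulating knowledge graph.

    In a real prototype, `entities` and `relations` would come from an
    LLM or spaCy pass over browser_snapshot output; here they are inputs.
    """
    graph.setdefault("nodes", set()).update(entities)
    graph.setdefault("edges", set()).update(relations)
    return graph

# Two simulated pages; sets deduplicate repeated entities across pages.
kg: dict = {}
graphify_merge(kg, ["Hermes", "Playwright"],
               [("Hermes", "uses", "Playwright")])
graphify_merge(kg, ["Playwright", "Chromium"],
               [("Playwright", "drives", "Chromium")])
```

A structure like this converts losslessly into NetworkX or Neo4j later, so the storage decision can be deferred until the investigation settles on an implementation.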
Typical requirements:

- Python 3.11+
- spaCy or a similar NLP library for entity extraction
- Optional: Neo4j or NetworkX for graph storage
- LLM access (can reuse Hermes's existing model configuration)

### Security Considerations

| Concern | Mitigation |
|----------------------------|---------------------------------------------------------|
| Processing untrusted text | NLP extraction is read-only; no code execution |
| Graph data persistence | Store in Hermes's data directory with appropriate permissions |
| Information aggregation | Knowledge graphs could accumulate sensitive data; provide clear/delete commands |
| External graph DB access | If using Neo4j, require authentication and restrict it to localhost |

### Performance Characteristics

- **Extraction:** ~0.5-2s per page depending on content length and NLP model
- **Graph operations:** Sub-second for graphs under 100K nodes
- **Storage:** Lightweight (JSON/SQLite) for small graphs, Neo4j for large-scale use
- **Token usage:** If using LLM-based extraction, ~500-2000 tokens per page

### Recommendation: INVESTIGATE FURTHER

The concept is sound — knowledge graph extraction from web content is a natural complement to browser tools. However:

1. **Multiple competing tools** exist under this name; the best-maintained option still needs to be identified
2. **Value proposition unclear** vs. Hermes's existing memory system and file-based knowledge storage
3. **NLP dependency** adds complexity (spaCy models are ~500MB)

**Suggested next steps:**

- Evaluate specific Graphify implementations (graphify.ai, custom NLP pipelines)
- Prototype with a lightweight approach: LLM-based entity extraction + NetworkX
- Assess whether Hermes's existing memory/graph_store.py can serve this role

---

## 3. Multica

### What It Does

Multica is a multi-agent browser coordination framework.
It enables multiple AI agents to collaboratively browse the web, with features for:

- Task decomposition: splitting complex web tasks across multiple agents
- Shared browser state: agents see a common view of browsing progress
- Coordination protocols: agents can communicate about what they've found
- Parallel web research: multiple agents researching different aspects simultaneously

### Integration with Hermes

**Theoretical path:** Multica would integrate as a higher-level orchestration layer on top of Hermes's existing browser tools, coordinating multiple Hermes subagents (via `delegate_tool`), each with browser access.

**Integration architecture:**

```
hermes-agent (orchestrator)
  delegate_tool -> subagent_1 (browser_navigate, browser_snapshot, ...)
  delegate_tool -> subagent_2 (browser_navigate, browser_snapshot, ...)
  delegate_tool -> subagent_3 (browser_navigate, browser_snapshot, ...)
        |
        +-- Multica coordination layer (shared state, task splitting)
```

### Dependencies and Requirements

- Complex multi-agent orchestration infrastructure
- Shared state management between agents
- Potentially a custom runtime for agent coordination
- Likely requires significant architectural changes to Hermes's delegation model

### Security Considerations

| Concern | Mitigation |
|----------------------------|---------------------------------------------------------|
| Multiple agents on the same browser | Session isolation per agent (Hermes already does this) |
| Coordinated exfiltration | The same per-agent restrictions apply |
| Amplified prompt injection | Each agent processes its own pages independently |
| Resource multiplication | N agents = N browser instances = Nx resource usage |

### Performance Characteristics

- **Scaling:** Near-linear improvement for embarrassingly parallel tasks (e.g., "research 10 companies simultaneously")
- **Overhead:** Significant coordination overhead for tightly coupled tasks
- **Resource cost:** Each agent needs its own LLM calls + browser instance
- **Complexity:** Debugging multi-agent browser workflows is extremely difficult

### Recommendation: SKIP (for now)

Multica addresses a real need (parallel web research) but is premature for Hermes for several reasons:

1. **Hermes already has subagent delegation** (`delegate_tool`) — agents can already do parallel browser work without Multica
2. **No mature implementation** — Multica is more of a concept than a production-ready tool
3. **Complexity vs. benefit** — the coordination overhead and debugging difficulty outweigh the benefits for most use cases
4. **Better alternatives exist** — for parallel research, simply delegating multiple subagents with browser tools is simpler and already works

**Revisit when:** Hermes's delegation model supports shared state between subagents, or a mature Multica implementation emerges.

---

## Integration Roadmap

### Phase 1: Browser Use PoC (this PR)

- [x] Create `tools/browser_use_tool.py` wrapping browser-use as a Hermes tool
- [x] Create `docs/browser-integration-analysis.md` (this document)
- [ ] Test with real browser tasks
- [ ] Add to toolset configuration

### Phase 2: Browser Use Production (follow-up)

- [ ] Add `browser_use` to `toolsets.py` toolset definitions
- [ ] Add configuration options in `config.yaml`
- [ ] Add tests in `tests/test_browser_use_tool.py`
- [ ] Consider an MCP server variant for subagent use

### Phase 3: Graphify Investigation (follow-up)

- [ ] Evaluate specific Graphify implementations
- [ ] Prototype a lightweight LLM-based entity extraction tool
- [ ] Assess integration with the existing `graph_store.py`
- [ ] Create a PoC if the investigation is positive

### Phase 4: Multi-Agent Browser (future)

- [ ] Monitor Multica ecosystem maturity
- [ ] Evaluate when the delegation model supports shared state
- [ ] Consider simpler parallel delegation patterns first

---

## Appendix: Existing Browser Stack

Hermes already has a comprehensive browser tool stack:

| Component | Description |
|-----------------------|--------------------------------------------------|
| `browser_tool.py` | Low-level agent-controlled browser (navigate, click, type, snapshot) |
| `browser_camofox.py` | Anti-detection browser via the Camofox REST API |
| `browser_providers/` | Cloud providers (Browserbase, Browser Use API, Firecrawl) |
| `web_tools.py` | Web search (Parallel) and extraction (Firecrawl) |
| `mcp_tool.py` | MCP client for connecting external tool servers |

The existing stack covers:

- **Local browsing:** Headless Chromium via the agent-browser CLI
- **Cloud browsing:** Browserbase, Browser Use cloud, Firecrawl
- **Anti-detection:** Camofox (local) or Browserbase advanced stealth
- **Content extraction:** Firecrawl for clean markdown extraction
- **Search:** Parallel AI web search

New browser integrations should complement rather than replace these tools.
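As a closing illustration of the "simpler parallel delegation" pattern recommended over Multica in Section 3, fan-out research can be sketched with stdlib concurrency. `research_subagent` is a hypothetical stand-in for `delegate_tool` spawning a browser-equipped subagent; nothing here is Hermes's actual delegation API.

```python
from concurrent.futures import ThreadPoolExecutor

def research_subagent(topic: str) -> str:
    """Stand-in for delegate_tool: a real subagent would browse and summarize."""
    return f"summary of {topic}"

topics = ["company A", "company B", "company C"]

# Embarrassingly parallel fan-out: one subagent per topic, no shared state.
with ThreadPoolExecutor(max_workers=len(topics)) as pool:
    summaries = list(pool.map(research_subagent, topics))
```

Because each subagent gets its own browser session and its own pages, this pattern needs none of Multica's shared-state machinery, which is exactly why the report recommends it as the interim approach.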