Add docs/browser-integration-analysis.md:
- Technical analysis of Browser Use, Graphify, and Multica for Hermes
- Integration paths, security considerations, performance characteristics
- Clear recommendations: Browser Use (integrate), Graphify (investigate), Multica (skip)
- Phased integration roadmap

Add tools/browser_use_tool.py:
- Wraps browser-use library as Hermes tool (toolset: browser_use)
- Three tools: browser_use_run, browser_use_extract, browser_use_compare
- Autonomous multi-step browser automation from natural language tasks
- Integrates with existing url_safety and website_policy security modules
- Supports both local Playwright and cloud execution modes
- Follows existing tool registration pattern (registry.register)

Refs: #262
# Browser Integration Analysis: Browser Use + Graphify + Multica

- Issue: #262 — Investigation: Browser Use + Graphify + Multica — Hermes Integration Analysis
- Date: 2026-04-10
- Author: Hermes Agent (burn branch)
## Executive Summary
This document evaluates three browser-related projects for integration with hermes-agent. Each tool is assessed on capability, integration complexity, security posture, and strategic fit with Hermes's existing browser stack.
| Tool | Recommendation | Integration Path |
|---|---|---|
| Browser Use | Integrate (PoC) | Tool + MCP server |
| Graphify | Investigate further | MCP server or tool |
| Multica | Skip (for now) | N/A — premature |
## 1. Browser Use (browser-use)

### What It Does
Browser Use is a Python library that wraps Playwright to provide LLM-driven browser automation. An agent describes a task in natural language, and browser-use autonomously navigates, clicks, types, and extracts data by feeding the page's accessibility tree to an LLM and executing the resulting actions in a loop.
Key capabilities:
- Autonomous multi-step browser workflows from a single text instruction
- Accessibility tree extraction (DOM + ARIA snapshot)
- Screenshot and visual context for multimodal models
- Form filling, navigation, data extraction, file downloads
- Custom actions (register callable Python functions the LLM can invoke)
- Parallel agent execution (multiple browser agents simultaneously)
- Cloud execution via browser-use.com API (no local browser needed)
### Integration with Hermes
Primary path: Custom Hermes tool wrapping browser-use as a high-level
"automated browsing" capability alongside the existing browser_tool.py
(low-level, agent-controlled) tools.
Why a separate tool rather than replacing browser_tool.py:
- Hermes's existing browser tools (navigate, snapshot, click, type) give the LLM fine-grained step-by-step control — this is valuable for interactive tasks and debugging.
- browser-use gives coarse-grained "do this task for me" autonomy — better for multi-step extraction workflows where the LLM would otherwise need 10+ tool calls.
- Both modes have legitimate use cases. Offer both.
Integration architecture:
```
hermes-agent
  tools/
    browser_tool.py       # Existing — low-level agent-controlled browsing
    browser_use_tool.py   # NEW — high-level autonomous browsing (PoC)
      |
      +-- browser_use.run()      # Wraps browser-use Agent class
      +-- browser_use.extract()  # Wraps browser-use for data extraction
```
The tool registers with `tools/registry.py` as toolset `browser_use` with a `check_fn` that verifies browser-use is installed.
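The registration pattern can be sketched as follows (hedged: only the availability check is concrete; `registry.register`'s exact signature is assumed from the description above):

```python
import importlib.util


def browser_use_available() -> bool:
    """check_fn for the browser_use toolset: True only when the
    optional browser-use package is importable."""
    return importlib.util.find_spec("browser_use") is not None


# Hypothetical registration call mirroring the registry.register pattern:
# registry.register(
#     "browser_use_run",
#     browser_use_run_fn,          # the tool entry point
#     toolset="browser_use",
#     check_fn=browser_use_available,
# )
```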
Alternative: MCP server — browser-use could also be exposed as an MCP server for multi-agent setups where subagents need independent browser access. This is a follow-up, not the initial integration.
### Dependencies and Requirements

```shell
pip install browser-use        # Core library
playwright install chromium    # Playwright browser binary
```

Or use cloud mode with `BROWSER_USE_API_KEY` — no local browser needed.
Python 3.11+, Playwright. No exotic system dependencies beyond what Hermes already requires for its existing browser tool.
### Security Considerations

| Concern | Mitigation |
|---|---|
| Arbitrary URL access | Reuse Hermes's `website_policy` and `url_safety` modules |
| Data exfiltration | browser-use agents run in isolated Playwright contexts; no access to Hermes filesystem |
| Prompt injection via page | browser-use feeds page content to the LLM — same risk as the existing `browser_snapshot`; already handled by Hermes prompt hardening |
| Credential leakage | Do not pass API keys to untrusted pages; cloud mode keeps credentials server-side |
| Resource exhaustion | Set `max_steps` on the browser-use `Agent` to prevent infinite loops |
| Downloaded files | Playwright download path is sandboxed; the tool should restrict it to a temp directory |
Key security property: browser-use executes within Playwright's sandboxed browser context. The LLM controlling browser-use is Hermes itself (or a configured auxiliary model), not the page content. This is equivalent to the existing browser tool's security model.
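The pre-flight guard for the first two mitigations can be sketched as follows (a minimal stand-in for Hermes's `url_safety` check — the real module's API is not shown in this document, so the function name and scheme list here are assumptions):

```python
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}
MAX_STEPS = 25  # cap passed to the browser-use Agent to bound runaway loops


def check_seed_urls(urls: list[str]) -> list[str]:
    """Reject non-http(s) seed URLs before handing a task to the
    autonomous agent; a stand-in for the url_safety module."""
    blocked = [u for u in urls if urlparse(u).scheme not in ALLOWED_SCHEMES]
    if blocked:
        raise ValueError(f"blocked URL schemes: {blocked}")
    return urls
```

The real implementation would additionally consult the `website_policy` allow/deny lists before launching the browser.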
### Performance Characteristics
- Startup: ~2-3s for Playwright Chromium launch (same as existing local mode)
- Per-step: ~1-3s per LLM call + browser action (comparable to manual browser_navigate + browser_snapshot loop)
- Full task (5-10 steps): ~15-45s depending on page complexity
- Token usage: Each step sends the accessibility tree to the LLM. Browser-use supports vision mode (screenshots) which is more token-heavy.
- Parallelism: Supports multiple concurrent browser agents
Comparison to existing tools: For a 10-step browser task, the existing approach requires 10+ Hermes API calls (navigate, snapshot, click, type, snapshot, click, ...). Browser-use consolidates this into a single Hermes tool call that internally runs its own LLM loop. This reduces Hermes API round-trips but shifts the LLM cost to browser-use's internal model calls.
### Recommendation: INTEGRATE

Browser Use fills a clear gap — autonomous multi-step browser tasks — that complements Hermes's existing fine-grained browser tools. The integration is straightforward (Python library, same security model). A PoC tool is provided in `tools/browser_use_tool.py`.
## 2. Graphify

### What It Does
Graphify is a knowledge graph extraction tool that processes unstructured text (including web content) and extracts entities, relationships, and structured knowledge into a graph format. It can:
- Extract entities and relationships from text using NLP/LLM techniques
- Build knowledge graphs from web-scraped content
- Support incremental graph updates as new content is processed
- Export graphs in standard formats (JSON-LD, RDF, etc.)
(Note: "Graphify" as a project name is used by several tools. The most relevant for browser integration is the concept of extracting structured knowledge graphs from web content during or after browsing.)
### Integration with Hermes

Primary path: An MCP server or Hermes tool that takes web content (from `browser_tool` or `web_extract`) and produces structured knowledge graphs.
Integration architecture:
```
hermes-agent
  tools/
    graphify_tool.py   # NEW — knowledge graph extraction from text
      |
      +-- graphify.extract()  # Extract entities/relations from text
      +-- graphify.merge()    # Merge into existing graph
      +-- graphify.query()    # Query the accumulated graph
```

Or via MCP:

```
hermes-agent --mcp-server graphify-mcp
  -> tools: graphify_extract, graphify_query, graphify_export
```
Synergy with browser tools:
1. `browser_navigate` + `browser_snapshot` to get page content
2. `graphify_extract` to pull entities and relationships
3. Repeat across multiple pages to build a domain knowledge graph
4. `graphify_query` to answer questions about accumulated knowledge
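That accumulation loop can be prototyped with nothing more than a triple store (a sketch: the `graphify_*` tools and their output format are assumptions here, and NetworkX or Neo4j would replace the plain dict at scale):

```python
from collections import defaultdict


def merge_triples(graph: dict, triples) -> dict:
    """Merge (subject, relation, object) triples — e.g. extracted by an
    LLM pass over a browser_snapshot — into an adjacency map."""
    for subj, rel, obj in triples:
        graph[subj].add((rel, obj))
    return graph


def query(graph: dict, subject: str) -> set:
    """All (relation, object) edges known for a subject."""
    return graph.get(subject, set())


kg = defaultdict(set)
merge_triples(kg, [("Hermes", "uses", "Playwright"),
                   ("Playwright", "drives", "Chromium")])
# query(kg, "Hermes") -> {("uses", "Playwright")}
```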
### Dependencies and Requirements
Varies significantly depending on the specific Graphify implementation. Typical requirements:
- Python 3.11+
- spaCy or similar NLP library for entity extraction
- Optional: Neo4j or NetworkX for graph storage
- LLM access (can reuse Hermes's existing model configuration)
### Security Considerations
| Concern | Mitigation |
|---|---|
| Processing untrusted text | NLP extraction is read-only; no code execution |
| Graph data persistence | Store in Hermes's data directory with appropriate permissions |
| Information aggregation | Knowledge graphs could accumulate sensitive data; provide clear/delete commands |
| External graph DB access | If using Neo4j, require authentication and restrict to localhost |
### Performance Characteristics
- Extraction: ~0.5-2s per page depending on content length and NLP model
- Graph operations: Sub-second for graphs under 100K nodes
- Storage: Lightweight (JSON/SQLite) for small graphs, Neo4j for large-scale
- Token usage: If using LLM-based extraction, ~500-2000 tokens per page
### Recommendation: INVESTIGATE FURTHER
The concept is sound — knowledge graph extraction from web content is a natural complement to browser tools. However:
- Multiple competing tools exist under this name; need to identify the best-maintained option
- Value proposition unclear vs. Hermes's existing memory system and file-based knowledge storage
- NLP dependency adds complexity (spaCy models are ~500MB)
Suggested next steps:
- Evaluate specific Graphify implementations (graphify.ai, custom NLP pipelines)
- Prototype with a lightweight approach: LLM-based entity extraction + NetworkX
- Assess whether Hermes's existing `memory/graph_store.py` can serve this role
## 3. Multica

### What It Does
Multica is a multi-agent browser coordination framework. It enables multiple AI agents to collaboratively browse the web, with features for:
- Task decomposition: splitting complex web tasks across multiple agents
- Shared browser state: agents see a common view of browsing progress
- Coordination protocols: agents can communicate about what they've found
- Parallel web research: multiple agents researching different aspects simultaneously
### Integration with Hermes

Theoretical path: Multica would integrate as a higher-level orchestration layer on top of Hermes's existing browser tools, coordinating multiple Hermes subagents (via `delegate_tool`), each with browser access.
Integration architecture:
```
hermes-agent (orchestrator)
  delegate_tool -> subagent_1 (browser_navigate, browser_snapshot, ...)
  delegate_tool -> subagent_2 (browser_navigate, browser_snapshot, ...)
  delegate_tool -> subagent_3 (browser_navigate, browser_snapshot, ...)
    |
    +-- Multica coordination layer (shared state, task splitting)
```
### Dependencies and Requirements
- Complex multi-agent orchestration infrastructure
- Shared state management between agents
- Potentially a custom runtime for agent coordination
- Likely requires significant architectural changes to Hermes's delegation model
### Security Considerations
| Concern | Mitigation |
|---|---|
| Multiple agents on same browser | Session isolation per agent (Hermes already does this) |
| Coordinated exfiltration | Same per-agent restrictions apply |
| Amplified prompt injection | Each agent processes its own pages independently |
| Resource multiplication | N agents = N browser instances = Nx resource usage |
### Performance Characteristics
- Scaling: Near-linear improvement for embarrassingly parallel tasks (e.g., "research 10 companies simultaneously")
- Overhead: Significant coordination overhead for tightly coupled tasks
- Resource cost: Each agent needs its own LLM calls + browser instance
- Complexity: Debugging multi-agent browser workflows is extremely difficult
### Recommendation: SKIP (for now)
Multica addresses a real need (parallel web research) but is premature for Hermes for several reasons:
- Hermes already has subagent delegation (`delegate_tool`) — agents can already do parallel browser work without Multica
- No mature implementation — Multica is more of a concept than a production-ready tool
- Complexity vs. benefit — the coordination overhead and debugging difficulty outweigh the benefits for most use cases
- Better alternatives exist — for parallel research, simply delegating multiple subagents with browser tools is simpler and already works
Revisit when: Hermes's delegation model supports shared state between subagents, or a mature Multica implementation emerges.
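The "simpler alternative" above — plain parallel delegation — can be sketched with asyncio (`research_one` is a hypothetical stand-in for a `delegate_tool` call handing one subagent a browser-equipped task):

```python
import asyncio


async def research_one(topic: str) -> str:
    """Hypothetical stand-in for delegating one browser-equipped
    subagent; real work would go through delegate_tool."""
    await asyncio.sleep(0)  # placeholder for the subagent's browsing
    return f"findings: {topic}"


async def parallel_research(topics: list[str]) -> list[str]:
    # Fan out one subagent per topic; no shared-state coordination needed.
    return await asyncio.gather(*(research_one(t) for t in topics))


results = asyncio.run(parallel_research(["company A", "company B"]))
# results == ["findings: company A", "findings: company B"]
```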
## Integration Roadmap

### Phase 1: Browser Use PoC (this PR)

- Create `tools/browser_use_tool.py` wrapping browser-use as a Hermes tool
- Create `docs/browser-integration-analysis.md` (this document)
- Test with real browser tasks
- Add to toolset configuration
### Phase 2: Browser Use Production (follow-up)

- Add `browser_use` to `toolsets.py` toolset definitions
- Add configuration options in `config.yaml`
- Add tests in `tests/test_browser_use_tool.py`
- Consider MCP server variant for subagent use
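The Phase 2 configuration surface might look like the following `config.yaml` fragment (hypothetical — the key names are illustrative, not an existing Hermes schema):

```yaml
# Hypothetical — illustrative keys only, not an existing Hermes schema
browser_use:
  enabled: true
  mode: local            # local (Playwright) or cloud (BROWSER_USE_API_KEY)
  max_steps: 25          # hard cap on autonomous agent steps
  download_dir: /tmp/hermes-browser-use   # sandboxed download location
```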
### Phase 3: Graphify Investigation (follow-up)

- Evaluate specific Graphify implementations
- Prototype a lightweight LLM-based entity extraction tool
- Assess integration with the existing `graph_store.py`
- Create a PoC if the investigation is positive
### Phase 4: Multi-Agent Browser (future)
- Monitor Multica ecosystem maturity
- Evaluate when delegation model supports shared state
- Consider simpler parallel delegation patterns first
## Appendix: Existing Browser Stack
Hermes already has a comprehensive browser tool stack:
| Component | Description |
|---|---|
| `browser_tool.py` | Low-level agent-controlled browser (navigate, click, type, snapshot) |
| `browser_camofox.py` | Anti-detection browser via Camofox REST API |
| `browser_providers/` | Cloud providers (Browserbase, Browser Use API, Firecrawl) |
| `web_tools.py` | Web search (Parallel) and extraction (Firecrawl) |
| `mcp_tool.py` | MCP client for connecting external tool servers |
The existing stack covers:
- Local browsing: Headless Chromium via agent-browser CLI
- Cloud browsing: Browserbase, Browser Use cloud, Firecrawl
- Anti-detection: Camofox (local) or Browserbase advanced stealth
- Content extraction: Firecrawl for clean markdown extraction
- Search: Parallel AI web search
New browser integrations should complement rather than replace these tools.