# Browser Integration Analysis: Browser Use + Graphify + Multica

**Issue:** #262 — Investigation: Browser Use + Graphify + Multica — Hermes Integration Analysis
**Date:** 2026-04-10
**Author:** Hermes Agent (burn branch)

## Executive Summary

This document evaluates three browser-related projects for integration with hermes-agent. Each tool is assessed on capability, integration complexity, security posture, and strategic fit with Hermes's existing browser stack.

| Tool        | Recommendation      | Integration Path   |
|-------------|---------------------|--------------------|
| Browser Use | **Integrate** (PoC) | Tool + MCP server  |
| Graphify    | Investigate further | MCP server or tool |
| Multica     | Skip (for now)      | N/A — premature    |

---

## 1. Browser Use (`browser-use`)

### What It Does

Browser Use is a Python library that wraps Playwright to provide LLM-driven browser automation. An agent describes a task in natural language, and browser-use autonomously navigates, clicks, types, and extracts data by feeding the page's accessibility tree to an LLM and executing the resulting actions in a loop.

Key capabilities:

- Autonomous multi-step browser workflows from a single text instruction
- Accessibility tree extraction (DOM + ARIA snapshot)
- Screenshot and visual context for multimodal models
- Form filling, navigation, data extraction, file downloads
- Custom actions (register callable Python functions the LLM can invoke)
- Parallel agent execution (multiple browser agents simultaneously)
- Cloud execution via the browser-use.com API (no local browser needed)

### Integration with Hermes

**Primary path: custom Hermes tool** wrapping `browser-use` as a high-level "automated browsing" capability alongside the existing `browser_tool.py` (low-level, agent-controlled) tools.
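A minimal sketch of what such a wrapper could look like. Everything here is illustrative: `BrowserUseResult` and `run_browser_task` are hypothetical names, and the actual hand-off to a browser-use `Agent` is stubbed out rather than implemented.

```python
from dataclasses import dataclass, field

@dataclass
class BrowserUseResult:
    """Normalized result a Hermes tool could return to the agent."""
    success: bool
    steps_taken: int
    extracted: str = ""
    errors: list[str] = field(default_factory=list)

def run_browser_task(task: str, max_steps: int = 25) -> BrowserUseResult:
    """Validate inputs, then hand off to a browser-use Agent.

    The hand-off itself is stubbed here; a real tool would construct
    a browser-use Agent with the task string and run it to completion.
    """
    if not task.strip():
        return BrowserUseResult(success=False, steps_taken=0,
                                errors=["empty task description"])
    if max_steps < 1:
        return BrowserUseResult(success=False, steps_taken=0,
                                errors=["max_steps must be >= 1"])
    # Placeholder for the actual browser-use Agent invocation.
    return BrowserUseResult(success=True, steps_taken=0,
                            extracted=f"(stub) would run: {task!r}")
```

The point of normalizing to a single result type is that the Hermes agent sees one tool call with one structured return, regardless of how many internal steps browser-use took.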
**Why a separate tool rather than replacing `browser_tool.py`:**

- Hermes's existing browser tools (navigate, snapshot, click, type) give the LLM fine-grained, step-by-step control — valuable for interactive tasks and debugging.
- browser-use gives coarse-grained "do this task for me" autonomy — better for multi-step extraction workflows where the LLM would otherwise need 10+ tool calls.
- Both modes have legitimate use cases. Offer both.

**Integration architecture:**

```
hermes-agent
  tools/
    browser_tool.py        # Existing — low-level agent-controlled browsing
    browser_use_tool.py    # NEW — high-level autonomous browsing (PoC)
      |
      +-- browser_use.run()      # Wraps browser-use Agent class
      +-- browser_use.extract()  # Wraps browser-use for data extraction
```

The tool registers with `tools/registry.py` as toolset `browser_use`, with a `check_fn` that verifies `browser-use` is installed.

**Alternative: MCP server** — browser-use could also be exposed as an MCP server for multi-agent setups where subagents need independent browser access. This is a follow-up, not the initial integration.

### Dependencies and Requirements

```
pip install browser-use        # Core library
playwright install chromium    # Playwright browser binary
```

Or use cloud mode with `BROWSER_USE_API_KEY` — no local browser needed.

Python 3.11+, Playwright. No exotic system dependencies beyond what Hermes already requires for its existing browser tool.
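The availability check itself needs nothing beyond the stdlib. In the sketch below, only `browser_use_available` is substantive; the `TOOLSETS` dict is a hypothetical stand-in for whatever shape `tools/registry.py` actually expects:

```python
import importlib.util

def browser_use_available() -> bool:
    """check_fn: offer the toolset only when browser-use is importable."""
    return importlib.util.find_spec("browser_use") is not None

# Hypothetical registry entry; the real registry API may differ.
TOOLSETS = {
    "browser_use": {
        "tools": ["browser_use.run", "browser_use.extract"],
        "check_fn": browser_use_available,
    }
}
```

Using `find_spec` rather than a bare `import` keeps the check cheap: it consults the import machinery without actually loading the package.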
### Security Considerations

| Concern | Mitigation |
|----------------------------|---------------------------------------------------------|
| Arbitrary URL access | Reuse Hermes's `website_policy` and `url_safety` modules |
| Data exfiltration | Browser-use agents run in isolated Playwright contexts; no access to the Hermes filesystem |
| Prompt injection via page | browser-use feeds page content to the LLM — same risk as the existing `browser_snapshot`; already handled by Hermes prompt hardening |
| Credential leakage | Do not pass API keys to untrusted pages; cloud mode keeps credentials server-side |
| Resource exhaustion | Set `max_steps` on the browser-use Agent to prevent infinite loops |
| Downloaded files | Playwright's download path is sandboxed; the tool should restrict downloads to a temp directory |

**Key security property:** browser-use executes within Playwright's sandboxed browser context. The LLM controlling browser-use is Hermes itself (or a configured auxiliary model), not the page content. This is equivalent to the existing browser tool's security model.

### Performance Characteristics

- **Startup:** ~2-3s for the Playwright Chromium launch (same as the existing local mode)
- **Per-step:** ~1-3s per LLM call + browser action (comparable to a manual browser_navigate + browser_snapshot loop)
- **Full task (5-10 steps):** ~15-45s depending on page complexity
- **Token usage:** Each step sends the accessibility tree to the LLM. browser-use also supports a vision mode (screenshots), which is more token-heavy.
- **Parallelism:** Supports multiple concurrent browser agents

**Comparison to existing tools:** For a 10-step browser task, the existing approach requires 10+ Hermes API calls (navigate, snapshot, click, type, snapshot, click, ...). browser-use consolidates this into a single Hermes tool call that internally runs its own LLM loop. This reduces Hermes API round-trips but shifts the LLM cost to browser-use's internal model calls.
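The `max_steps` mitigation from the table above is worth making concrete. The sketch below is a generic illustration of a hard-capped agent loop, not browser-use's actual implementation; `plan_next_action` stands in for one LLM call plus one browser action.

```python
def run_agent_loop(plan_next_action, max_steps: int = 25):
    """Drive an LLM-in-the-loop agent, hard-capped at max_steps.

    `plan_next_action(step, history)` returns the next action,
    or None when the agent reports the task is complete.
    """
    history = []
    for step in range(max_steps):
        action = plan_next_action(step, history)
        if action is None:  # agent reports completion
            return {"done": True, "steps": step, "history": history}
        history.append(action)
    # Cap reached: fail closed rather than looping forever.
    return {"done": False, "steps": max_steps, "history": history}

# Example: a simulated task that finishes after 3 actions.
result = run_agent_loop(lambda i, h: None if i == 3 else f"action-{i}",
                        max_steps=10)
```

Failing closed at the cap (returning a partial result rather than raising) lets the calling tool report what was accomplished before the budget ran out.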
### Recommendation: INTEGRATE

Browser Use fills a clear gap — autonomous multi-step browser tasks — that complements Hermes's existing fine-grained browser tools. The integration is straightforward (Python library, same security model). A PoC tool is provided in `tools/browser_use_tool.py`.

---

## 2. Graphify

### What It Does

Graphify is a knowledge graph extraction tool that processes unstructured text (including web content) and extracts entities, relationships, and structured knowledge into a graph format. It can:

- Extract entities and relationships from text using NLP/LLM techniques
- Build knowledge graphs from web-scraped content
- Support incremental graph updates as new content is processed
- Export graphs in standard formats (JSON-LD, RDF, etc.)

(Note: "Graphify" as a project name is used by several tools. The most relevant for browser integration is the concept of extracting structured knowledge graphs from web content during or after browsing.)

### Integration with Hermes

**Primary path: MCP server or Hermes tool** that takes web content (from browser_tool or web_extract) and produces structured knowledge graphs.

**Integration architecture:**

```
hermes-agent
  tools/
    graphify_tool.py   # NEW — knowledge graph extraction from text
      |
      +-- graphify.extract()  # Extract entities/relations from text
      +-- graphify.merge()    # Merge into existing graph
      +-- graphify.query()    # Query the accumulated graph
```

Or via MCP:

```
hermes-agent --mcp-server graphify-mcp
  -> tools: graphify_extract, graphify_query, graphify_export
```

**Synergy with browser tools:**

1. `browser_navigate` + `browser_snapshot` to get page content
2. `graphify_extract` to pull entities and relationships
3. Repeat across multiple pages to build a domain knowledge graph
4. `graphify_query` to answer questions about accumulated knowledge

### Dependencies and Requirements

Dependencies vary significantly depending on the specific Graphify implementation.
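One way to keep those dependencies minimal during prototyping is to accumulate the graph in plain dictionaries and sets rather than a graph database. The sketch below is purely illustrative: `graphify_merge` is a hypothetical helper mirroring the synergy loop above, not an API of any specific Graphify implementation.

```python
def graphify_merge(graph: dict, entities: list, relations: list) -> dict:
    """Merge one page's extraction into an accumulating knowledge graph.

    In a real prototype, `entities` and `relations` would come from an
    LLM or spaCy pass over browser_snapshot output; here they are inputs.
    """
    graph.setdefault("nodes", set()).update(entities)
    graph.setdefault("edges", set()).update(relations)
    return graph

# Two simulated pages; sets deduplicate repeated entities across pages.
kg: dict = {}
graphify_merge(kg, ["Hermes", "Playwright"],
               [("Hermes", "uses", "Playwright")])
graphify_merge(kg, ["Playwright", "Chromium"],
               [("Playwright", "drives", "Chromium")])
```

A structure like this converts losslessly into NetworkX or Neo4j later, so the storage decision can be deferred until the investigation settles on an implementation.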
Typical requirements:

- Python 3.11+
- spaCy or a similar NLP library for entity extraction
- Optional: Neo4j or NetworkX for graph storage
- LLM access (can reuse Hermes's existing model configuration)

### Security Considerations

| Concern | Mitigation |
|----------------------------|---------------------------------------------------------|
| Processing untrusted text | NLP extraction is read-only; no code execution |
| Graph data persistence | Store in Hermes's data directory with appropriate permissions |
| Information aggregation | Knowledge graphs could accumulate sensitive data; provide clear/delete commands |
| External graph DB access | If using Neo4j, require authentication and restrict it to localhost |

### Performance Characteristics

- **Extraction:** ~0.5-2s per page depending on content length and NLP model
- **Graph operations:** Sub-second for graphs under 100K nodes
- **Storage:** Lightweight (JSON/SQLite) for small graphs, Neo4j for large-scale use
- **Token usage:** If using LLM-based extraction, ~500-2000 tokens per page

### Recommendation: INVESTIGATE FURTHER

The concept is sound — knowledge graph extraction from web content is a natural complement to browser tools. However:

1. **Multiple competing tools** exist under this name; the best-maintained option still needs to be identified
2. **Value proposition unclear** vs. Hermes's existing memory system and file-based knowledge storage
3. **NLP dependency** adds complexity (spaCy models are ~500MB)

**Suggested next steps:**

- Evaluate specific Graphify implementations (graphify.ai, custom NLP pipelines)
- Prototype with a lightweight approach: LLM-based entity extraction + NetworkX
- Assess whether Hermes's existing memory/graph_store.py can serve this role

---

## 3. Multica

### What It Does

Multica is a multi-agent browser coordination framework.
It enables multiple AI agents to collaboratively browse the web, with features for:

- Task decomposition: splitting complex web tasks across multiple agents
- Shared browser state: agents see a common view of browsing progress
- Coordination protocols: agents can communicate about what they've found
- Parallel web research: multiple agents researching different aspects simultaneously

### Integration with Hermes

**Theoretical path:** Multica would integrate as a higher-level orchestration layer on top of Hermes's existing browser tools, coordinating multiple Hermes subagents (via `delegate_tool`), each with browser access.

**Integration architecture:**

```
hermes-agent (orchestrator)
  delegate_tool -> subagent_1 (browser_navigate, browser_snapshot, ...)
  delegate_tool -> subagent_2 (browser_navigate, browser_snapshot, ...)
  delegate_tool -> subagent_3 (browser_navigate, browser_snapshot, ...)
        |
        +-- Multica coordination layer (shared state, task splitting)
```

### Dependencies and Requirements

- Complex multi-agent orchestration infrastructure
- Shared state management between agents
- Potentially a custom runtime for agent coordination
- Likely requires significant architectural changes to Hermes's delegation model

### Security Considerations

| Concern | Mitigation |
|----------------------------|---------------------------------------------------------|
| Multiple agents on the same browser | Session isolation per agent (Hermes already does this) |
| Coordinated exfiltration | The same per-agent restrictions apply |
| Amplified prompt injection | Each agent processes its own pages independently |
| Resource multiplication | N agents = N browser instances = Nx resource usage |

### Performance Characteristics

- **Scaling:** Near-linear improvement for embarrassingly parallel tasks (e.g., "research 10 companies simultaneously")
- **Overhead:** Significant coordination overhead for tightly coupled tasks
- **Resource cost:** Each agent needs its own LLM calls + browser instance
- **Complexity:** Debugging multi-agent browser workflows is extremely difficult

### Recommendation: SKIP (for now)

Multica addresses a real need (parallel web research) but is premature for Hermes for several reasons:

1. **Hermes already has subagent delegation** (`delegate_tool`) — agents can already do parallel browser work without Multica
2. **No mature implementation** — Multica is more of a concept than a production-ready tool
3. **Complexity vs. benefit** — the coordination overhead and debugging difficulty outweigh the benefits for most use cases
4. **Better alternatives exist** — for parallel research, simply delegating multiple subagents with browser tools is simpler and already works

**Revisit when:** Hermes's delegation model supports shared state between subagents, or a mature Multica implementation emerges.

---

## Integration Roadmap

### Phase 1: Browser Use PoC (this PR)

- [x] Create `tools/browser_use_tool.py` wrapping browser-use as a Hermes tool
- [x] Create `docs/browser-integration-analysis.md` (this document)
- [ ] Test with real browser tasks
- [ ] Add to toolset configuration

### Phase 2: Browser Use Production (follow-up)

- [ ] Add `browser_use` to `toolsets.py` toolset definitions
- [ ] Add configuration options in `config.yaml`
- [ ] Add tests in `tests/test_browser_use_tool.py`
- [ ] Consider an MCP server variant for subagent use

### Phase 3: Graphify Investigation (follow-up)

- [ ] Evaluate specific Graphify implementations
- [ ] Prototype a lightweight LLM-based entity extraction tool
- [ ] Assess integration with the existing `graph_store.py`
- [ ] Create a PoC if the investigation is positive

### Phase 4: Multi-Agent Browser (future)

- [ ] Monitor Multica ecosystem maturity
- [ ] Evaluate when the delegation model supports shared state
- [ ] Consider simpler parallel delegation patterns first

---

## Appendix: Existing Browser Stack

Hermes already has a comprehensive browser tool stack:

| Component | Description |
|-----------------------|--------------------------------------------------|
| `browser_tool.py` | Low-level agent-controlled browser (navigate, click, type, snapshot) |
| `browser_camofox.py` | Anti-detection browser via the Camofox REST API |
| `browser_providers/` | Cloud providers (Browserbase, Browser Use API, Firecrawl) |
| `web_tools.py` | Web search (Parallel) and extraction (Firecrawl) |
| `mcp_tool.py` | MCP client for connecting external tool servers |

The existing stack covers:

- **Local browsing:** Headless Chromium via the agent-browser CLI
- **Cloud browsing:** Browserbase, Browser Use cloud, Firecrawl
- **Anti-detection:** Camofox (local) or Browserbase advanced stealth
- **Content extraction:** Firecrawl for clean markdown extraction
- **Search:** Parallel AI web search

New browser integrations should complement rather than replace these tools.
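As a closing illustration of the "simpler parallel delegation" pattern recommended over Multica in Section 3, fan-out research can be sketched with stdlib concurrency. `research_subagent` is a hypothetical stand-in for `delegate_tool` spawning a browser-equipped subagent; nothing here is Hermes's actual delegation API.

```python
from concurrent.futures import ThreadPoolExecutor

def research_subagent(topic: str) -> str:
    """Stand-in for delegate_tool: a real subagent would browse and summarize."""
    return f"summary of {topic}"

topics = ["company A", "company B", "company C"]

# Embarrassingly parallel fan-out: one subagent per topic, no shared state.
with ThreadPoolExecutor(max_workers=len(topics)) as pool:
    summaries = list(pool.map(research_subagent, topics))
```

Because each subagent gets its own browser session and its own pages, this pattern needs none of Multica's shared-state machinery, which is exactly why the report recommends it as the interim approach.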