Compare commits

...

6 Commits

Author SHA1 Message Date
Alexander Whitestone
6e8631fdc0 burn: add remove action to on_memory_write bridge
Some checks failed
Forge CI / smoke-and-build (pull_request) Failing after 34s
Extends the memory bridge to fire on_memory_write for the 'remove'
action in addition to 'add' and 'replace'. The holographic provider
now searches for matching facts and lowers trust by 0.4 on remove,
allowing orphaned facts to decay naturally.

Fixes #277
2026-04-10 16:49:48 -04:00
f5f028d981 auto-merge PR #276
Some checks failed
Forge CI / smoke-and-build (push) Failing after 42s
2026-04-10 19:03:02 +00:00
Alexander Whitestone
a703fb823c docs: add Matrix integration setup guide and interactive script
Some checks failed
Forge CI / smoke-and-build (pull_request) Failing after 36s
Phase 2 of Matrix integration — wires Hermes to any Matrix homeserver.

- docs/matrix-setup.md: step-by-step guide covering matrix.org (testing)
  and self-hosted (sovereignty) options, auth methods, E2EE setup, room
  config, and troubleshooting
- scripts/setup_matrix.py: interactive wizard that prompts for homeserver,
  supports token/password auth, generates MATRIX_DEVICE_ID, writes
  ~/.hermes/.env and config.yaml, and optionally creates a test room +
  sends a test message

No config.py changes needed — all Matrix env vars (MATRIX_HOMESERVER,
MATRIX_ACCESS_TOKEN, MATRIX_USER_ID, MATRIX_PASSWORD, MATRIX_ENCRYPTION,
MATRIX_DEVICE_ID, MATRIX_ALLOWED_USERS, MATRIX_HOME_ROOM, etc.) are
already registered in OPTIONAL_ENV_VARS and _EXTRA_ENV_KEYS.

Closes #271
2026-04-10 07:46:42 -04:00
a89dae9942 [auto-merge] browser integration PoC
Some checks failed
Forge CI / smoke-and-build (push) Failing after 38s
Notebook CI / notebook-smoke (push) Failing after 7s
Auto-merged by PR review bot: browser integration PoC
2026-04-10 11:44:56 +00:00
Alexander Whitestone
f85c07551a feat: browser integration analysis + PoC tool (#262)
Some checks failed
Forge CI / smoke-and-build (pull_request) Failing after 36s
Add docs/browser-integration-analysis.md:
- Technical analysis of Browser Use, Graphify, and Multica for Hermes
- Integration paths, security considerations, performance characteristics
- Clear recommendations: Browser Use (integrate), Graphify (investigate),
  Multica (skip)
- Phased integration roadmap

Add tools/browser_use_tool.py:
- Wraps browser-use library as Hermes tool (toolset: browser_use)
- Three tools: browser_use_run, browser_use_extract, browser_use_compare
- Autonomous multi-step browser automation from natural language tasks
- Integrates with existing url_safety and website_policy security modules
- Supports both local Playwright and cloud execution modes
- Follows existing tool registration pattern (registry.register)

Refs: #262
2026-04-10 07:10:29 -04:00
f81c60a5b3 Merge pull request 'docs: Improve KNOWN_VIOLATIONS justifications for SOUL.md alignment' (#267) from feature/improve-sovereignty-justification into main
Some checks failed
Forge CI / smoke-and-build (push) Failing after 41s
Merge PR #267: docs: Improve KNOWN_VIOLATIONS justifications for SOUL.md alignment
2026-04-10 09:35:51 +00:00
6 changed files with 1633 additions and 2 deletions


@@ -0,0 +1,335 @@
# Browser Integration Analysis: Browser Use + Graphify + Multica
**Issue:** #262 — Investigation: Browser Use + Graphify + Multica — Hermes Integration Analysis
**Date:** 2026-04-10
**Author:** Hermes Agent (burn branch)
## Executive Summary
This document evaluates three browser-related projects for integration with
hermes-agent. Each tool is assessed on capability, integration complexity,
security posture, and strategic fit with Hermes's existing browser stack.
| Tool | Recommendation | Integration Path |
|-------------------|-------------------------|-------------------------|
| Browser Use | **Integrate** (PoC) | Tool + MCP server |
| Graphify | Investigate further | MCP server or tool |
| Multica | Skip (for now) | N/A — premature |
---
## 1. Browser Use (`browser-use`)
### What It Does
Browser Use is a Python library that wraps Playwright to provide LLM-driven
browser automation. An agent describes a task in natural language, and
browser-use autonomously navigates, clicks, types, and extracts data by
feeding the page's accessibility tree to an LLM and executing the resulting
actions in a loop.
Key capabilities:
- Autonomous multi-step browser workflows from a single text instruction
- Accessibility tree extraction (DOM + ARIA snapshot)
- Screenshot and visual context for multimodal models
- Form filling, navigation, data extraction, file downloads
- Custom actions (register callable Python functions the LLM can invoke)
- Parallel agent execution (multiple browser agents simultaneously)
- Cloud execution via browser-use.com API (no local browser needed)
### Integration with Hermes
**Primary path: Custom Hermes tool** wrapping `browser-use` as a high-level
"automated browsing" capability alongside the existing `browser_tool.py`
(low-level, agent-controlled) tools.
**Why a separate tool rather than replacing browser_tool.py:**
- Hermes's existing browser tools (navigate, snapshot, click, type) give the
LLM fine-grained step-by-step control — this is valuable for interactive
tasks and debugging.
- browser-use gives coarse-grained "do this task for me" autonomy — better
for multi-step extraction workflows where the LLM would otherwise need
10+ tool calls.
- Both modes have legitimate use cases. Offer both.
**Integration architecture:**
```
hermes-agent
  tools/
    browser_tool.py        # Existing: low-level agent-controlled browsing
    browser_use_tool.py    # NEW: high-level autonomous browsing (PoC)
      |
      +-- browser_use.run()      # Wraps browser-use Agent class
      +-- browser_use.extract()  # Wraps browser-use for data extraction
```
The tool registers with `tools/registry.py` as toolset `browser_use` with
a `check_fn` that verifies `browser-use` is installed.
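A `check_fn` of this kind can be quite small. The sketch below assumes only that the registry calls a zero-argument function returning a bool; the names `module_available` and `browser_use_available` are hypothetical and not taken from the PoC. It uses `importlib.util.find_spec`, which locates a module without importing it, so the check stays cheap.

```python
import importlib.util


def module_available(module_name: str) -> bool:
    """Return True if `module_name` can be located, without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised for a missing parent package or a module with no spec.
        return False


def browser_use_available() -> bool:
    """Hypothetical check_fn: is the browser-use package installed?"""
    # The PyPI package `browser-use` is imported as `browser_use`.
    return module_available("browser_use")
```

Returning a plain bool rather than raising lets a registry hide the toolset when the dependency is absent instead of failing at startup.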
**Alternative: MCP server** — browser-use could also be exposed as an MCP
server for multi-agent setups where subagents need independent browser
access. This is a follow-up, not the initial integration.
### Dependencies and Requirements
```
pip install browser-use # Core library
playwright install chromium # Playwright browser binary
```
Or use cloud mode with `BROWSER_USE_API_KEY` — no local browser needed.
Requires Python 3.11+ and Playwright, with no system dependencies beyond
what Hermes already requires for its existing browser tool.
### Security Considerations
| Concern | Mitigation |
|----------------------------|---------------------------------------------------------|
| Arbitrary URL access | Reuse Hermes's `website_policy` and `url_safety` modules |
| Data exfiltration | Browser-use agents run in isolated Playwright contexts; no access to Hermes filesystem |
| Prompt injection via page | browser-use feeds page content to LLM — same risk as existing browser_snapshot; already handled by Hermes prompt hardening |
| Credential leakage | Do not pass API keys to untrusted pages; cloud mode keeps credentials server-side |
| Resource exhaustion | Set max_steps on browser-use Agent to prevent infinite loops |
| Downloaded files | Playwright download path is sandboxed; tool should restrict to temp directory |
**Key security property:** browser-use executes within Playwright's sandboxed
browser context. The LLM controlling browser-use is Hermes itself (or a
configured auxiliary model), not the page content. This is equivalent to the
existing browser tool's security model.
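To illustrate the "arbitrary URL access" mitigation, here is a minimal pre-flight check. It is a sketch under stated assumptions, not the API of Hermes's `url_safety` or `website_policy` modules: the function name `is_url_allowed` and the `blocked_hosts` parameter are hypothetical, and a production check would also resolve DNS names and re-check the resulting addresses.

```python
import ipaddress
from urllib.parse import urlsplit


def is_url_allowed(url: str, blocked_hosts: frozenset[str] = frozenset({"localhost"})) -> bool:
    """Illustrative pre-flight check; not Hermes's actual url_safety API."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    if not host or host in blocked_hosts:
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a DNS name; a real check would also resolve it
    # Reject literal addresses that point at internal networks.
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)
```

Running such a check before handing a task to the autonomous agent keeps obviously internal targets out of reach even when the task text is untrusted.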
### Performance Characteristics
- **Startup:** ~2-3s for Playwright Chromium launch (same as existing local mode)
- **Per-step:** ~1-3s per LLM call + browser action (comparable to manual
browser_navigate + browser_snapshot loop)
- **Full task (5-10 steps):** ~15-45s depending on page complexity
- **Token usage:** Each step sends the accessibility tree to the LLM.
Browser-use supports vision mode (screenshots) which is more token-heavy.
- **Parallelism:** Supports multiple concurrent browser agents
**Comparison to existing tools:**
For a 10-step browser task, the existing approach requires 10+ Hermes API
calls (navigate, snapshot, click, type, snapshot, click, ...). Browser-use
consolidates this into a single Hermes tool call that internally runs its
own LLM loop. This reduces Hermes API round-trips but shifts the LLM cost
to browser-use's internal model calls.
### Recommendation: INTEGRATE
Browser Use fills a clear gap — autonomous multi-step browser tasks — that
complements Hermes's existing fine-grained browser tools. The integration
is straightforward (Python library, same security model). A PoC tool is
provided in `tools/browser_use_tool.py`.
---
## 2. Graphify
### What It Does
Graphify is a knowledge graph extraction tool that processes unstructured
text (including web content) and extracts entities, relationships, and
structured knowledge into a graph format. It can:
- Extract entities and relationships from text using NLP/LLM techniques
- Build knowledge graphs from web-scraped content
- Support incremental graph updates as new content is processed
- Export graphs in standard formats (JSON-LD, RDF, etc.)
(Note: "Graphify" as a project name is used by several tools. The most
relevant for browser integration is the concept of extracting structured
knowledge graphs from web content during or after browsing.)
### Integration with Hermes
**Primary path: MCP server or Hermes tool** that takes web content (from
browser_tool or web_extract) and produces structured knowledge graphs.
**Integration architecture:**
```
hermes-agent
  tools/
    graphify_tool.py    # NEW: knowledge graph extraction from text
      |
      +-- graphify.extract()  # Extract entities/relations from text
      +-- graphify.merge()    # Merge into existing graph
      +-- graphify.query()    # Query the accumulated graph
```
Or via MCP:
```
hermes-agent --mcp-server graphify-mcp
-> tools: graphify_extract, graphify_query, graphify_export
```
**Synergy with browser tools:**
1. `browser_navigate` + `browser_snapshot` to get page content
2. `graphify_extract` to pull entities and relationships
3. Repeat across multiple pages to build a domain knowledge graph
4. `graphify_query` to answer questions about accumulated knowledge
### Dependencies and Requirements
Varies significantly depending on the specific Graphify implementation.
Typical requirements:
- Python 3.11+
- spaCy or similar NLP library for entity extraction
- Optional: Neo4j or NetworkX for graph storage
- LLM access (can reuse Hermes's existing model configuration)
### Security Considerations
| Concern | Mitigation |
|----------------------------|---------------------------------------------------------|
| Processing untrusted text | NLP extraction is read-only; no code execution |
| Graph data persistence | Store in Hermes's data directory with appropriate permissions |
| Information aggregation | Knowledge graphs could accumulate sensitive data; provide clear/delete commands |
| External graph DB access | If using Neo4j, require authentication and restrict to localhost |
### Performance Characteristics
- **Extraction:** ~0.5-2s per page depending on content length and NLP model
- **Graph operations:** Sub-second for graphs under 100K nodes
- **Storage:** Lightweight (JSON/SQLite) for small graphs, Neo4j for large-scale
- **Token usage:** If using LLM-based extraction, ~500-2000 tokens per page
### Recommendation: INVESTIGATE FURTHER
The concept is sound — knowledge graph extraction from web content is a
natural complement to browser tools. However:
1. **Multiple competing tools** exist under this name; need to identify the
best-maintained option
2. **Value proposition unclear** vs. Hermes's existing memory system and
file-based knowledge storage
3. **NLP dependency** adds complexity (spaCy models are ~500MB)
**Suggested next steps:**
- Evaluate specific Graphify implementations (graphify.ai, custom NLP pipelines)
- Prototype with a lightweight approach: LLM-based entity extraction + NetworkX
- Assess whether Hermes's existing memory/graph_store.py can serve this role
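
The lightweight prototype suggested above could start from something like the following sketch. It is dependency-free on purpose: a plain dictionary stands in for NetworkX, and the triples are hard-coded where an LLM extraction step would normally produce them. The class name `TinyGraph` and its methods are hypothetical, loosely mirroring the `merge` and `query` operations in the architecture diagram.

```python
from collections import defaultdict

Triple = tuple[str, str, str]  # (subject, relation, object)


class TinyGraph:
    """Minimal in-memory knowledge graph keyed by subject entity."""

    def __init__(self) -> None:
        self._edges: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def merge(self, triples: list[Triple]) -> None:
        """Add triples; duplicates are ignored, so repeated pages are safe."""
        for subject, relation, obj in triples:
            self._edges[subject].add((relation, obj))

    def query(self, subject: str, relation: str) -> list[str]:
        """Return all objects linked to `subject` by `relation`, sorted."""
        return sorted(o for r, o in self._edges.get(subject, ()) if r == relation)


graph = TinyGraph()
# In practice these triples would come from an LLM extraction prompt.
graph.merge([("Hermes", "uses", "Playwright"), ("Hermes", "uses", "matrix-nio")])
graph.merge([("Hermes", "uses", "Playwright")])  # a second page repeats a fact
print(graph.query("Hermes", "uses"))
```

Storing edges in sets makes merging idempotent, which matters when the same page is visited more than once during a browsing session.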
---
## 3. Multica
### What It Does
Multica is a multi-agent browser coordination framework. It enables multiple
AI agents to collaboratively browse the web, with features for:
- Task decomposition: splitting complex web tasks across multiple agents
- Shared browser state: agents see a common view of browsing progress
- Coordination protocols: agents can communicate about what they've found
- Parallel web research: multiple agents researching different aspects simultaneously
### Integration with Hermes
**Theoretical path:** Multica would integrate as a higher-level orchestration
layer on top of Hermes's existing browser tools, coordinating multiple
Hermes subagents (via `delegate_tool`) each with browser access.
**Integration architecture:**
```
hermes-agent (orchestrator)
  delegate_tool -> subagent_1 (browser_navigate, browser_snapshot, ...)
  delegate_tool -> subagent_2 (browser_navigate, browser_snapshot, ...)
  delegate_tool -> subagent_3 (browser_navigate, browser_snapshot, ...)
    |
    +-- Multica coordination layer (shared state, task splitting)
```
### Dependencies and Requirements
- Complex multi-agent orchestration infrastructure
- Shared state management between agents
- Potentially a custom runtime for agent coordination
- Likely requires significant architectural changes to Hermes's delegation model
### Security Considerations
| Concern | Mitigation |
|----------------------------|---------------------------------------------------------|
| Multiple agents on same browser | Session isolation per agent (Hermes already does this) |
| Coordinated exfiltration | Same per-agent restrictions apply |
| Amplified prompt injection | Each agent processes its own pages independently |
| Resource multiplication | N agents = N browser instances = Nx resource usage |
### Performance Characteristics
- **Scaling:** Near-linear improvement for embarrassingly parallel tasks
(e.g., "research 10 companies simultaneously")
- **Overhead:** Significant coordination overhead for tightly coupled tasks
- **Resource cost:** Each agent needs its own LLM calls + browser instance
- **Complexity:** Debugging multi-agent browser workflows is extremely difficult
### Recommendation: SKIP (for now)
Multica addresses a real need (parallel web research) but is premature for
Hermes for several reasons:
1. **Hermes already has subagent delegation** (`delegate_tool`) — agents can
already do parallel browser work without Multica
2. **No mature implementation** — Multica is more of a concept than a
production-ready tool
3. **Complexity vs. benefit** — the coordination overhead and debugging
difficulty outweigh the benefits for most use cases
4. **Better alternatives exist** — for parallel research, simply delegating
multiple subagents with browser tools is simpler and already works
**Revisit when:** Hermes's delegation model supports shared state between
subagents, or a mature Multica implementation emerges.
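
The simpler alternative in point 4, independent subagents with no shared state, can be sketched with the standard library. This is an illustration only: `research` is a hypothetical stand-in for one delegated browser task, and nothing here reflects the actual `delegate_tool` interface.

```python
from concurrent.futures import ThreadPoolExecutor


def research(topic: str) -> str:
    """Stand-in for one subagent's browser task (hypothetical)."""
    return f"summary of {topic}"


def delegate_in_parallel(topics: list[str], max_workers: int = 3) -> dict[str, str]:
    """Run one independent task per topic and collect results by topic."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with topics.
        return dict(zip(topics, pool.map(research, topics)))
```

Because the tasks never communicate, there is no coordination layer to debug; the orchestrator only merges the returned summaries.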
---
## Integration Roadmap
### Phase 1: Browser Use PoC (this PR)
- [x] Create `tools/browser_use_tool.py` wrapping browser-use as Hermes tool
- [x] Create `docs/browser-integration-analysis.md` (this document)
- [ ] Test with real browser tasks
- [ ] Add to toolset configuration
### Phase 2: Browser Use Production (follow-up)
- [ ] Add `browser_use` to `toolsets.py` toolset definitions
- [ ] Add configuration options in `config.yaml`
- [ ] Add tests in `tests/test_browser_use_tool.py`
- [ ] Consider MCP server variant for subagent use
### Phase 3: Graphify Investigation (follow-up)
- [ ] Evaluate specific Graphify implementations
- [ ] Prototype lightweight LLM-based entity extraction tool
- [ ] Assess integration with existing `graph_store.py`
- [ ] Create PoC if investigation is positive
### Phase 4: Multi-Agent Browser (future)
- [ ] Monitor Multica ecosystem maturity
- [ ] Evaluate when delegation model supports shared state
- [ ] Consider simpler parallel delegation patterns first
---
## Appendix: Existing Browser Stack
Hermes already has a comprehensive browser tool stack:
| Component | Description |
|-----------------------|--------------------------------------------------|
| `browser_tool.py` | Low-level agent-controlled browser (navigate, click, type, snapshot) |
| `browser_camofox.py` | Anti-detection browser via Camofox REST API |
| `browser_providers/` | Cloud providers (Browserbase, Browser Use API, Firecrawl) |
| `web_tools.py` | Web search (Parallel) and extraction (Firecrawl) |
| `mcp_tool.py` | MCP client for connecting external tool servers |
The existing stack covers:
- **Local browsing:** Headless Chromium via agent-browser CLI
- **Cloud browsing:** Browserbase, Browser Use cloud, Firecrawl
- **Anti-detection:** Camofox (local) or Browserbase advanced stealth
- **Content extraction:** Firecrawl for clean markdown extraction
- **Search:** Parallel AI web search
New browser integrations should complement rather than replace these tools.

271
docs/matrix-setup.md Normal file

@@ -0,0 +1,271 @@
# Matrix Integration Setup Guide
Connect Hermes Agent to any Matrix homeserver for sovereign, encrypted messaging.
## Prerequisites
- Python 3.10+
- matrix-nio SDK: `pip install "matrix-nio[e2e]"`
- For E2EE: libolm C library (see below)
## Option A: matrix.org Public Homeserver (Testing)
Best for quick evaluation. No server to run.
### 1. Create a Matrix Account
Go to https://app.element.io and create an account on matrix.org.
Choose a username like `@hermes-bot:matrix.org`.
### 2. Get an Access Token
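Token auth is the recommended method because it avoids storing the account
password. Note that a password change may invalidate existing tokens if the
homeserver logs out other devices at the same time.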
```bash
# Using curl (replace the username and password):
curl -X POST 'https://matrix-client.matrix.org/_matrix/client/v3/login' \
  -H 'Content-Type: application/json' \
  -d '{
    "type": "m.login.password",
    "identifier": {"type": "m.id.user", "user": "your-bot-username"},
    "password": "your-password"
  }'
```
Look for `access_token` and `device_id` in the response.
Alternatively, in Element: Settings -> Help & About -> Advanced -> Access Token.
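
For scripted setups, the same login exchange can be prepared in Python. The sketch below only builds the request body and parses a response; it does not contact a server. The helper names are hypothetical, and the sample response in the usage note is made up for illustration.

```python
import json


def build_login_body(username: str, password: str) -> bytes:
    """Encode an m.login.password body for POST /_matrix/client/v3/login."""
    return json.dumps({
        "type": "m.login.password",
        "identifier": {"type": "m.id.user", "user": username},
        "password": password,
    }).encode()


def extract_credentials(response_text: str) -> tuple[str, str]:
    """Pull (access_token, device_id) out of a login response body."""
    data = json.loads(response_text)
    return data["access_token"], data["device_id"]
```

Given a response such as `{"access_token": "syt_example", "device_id": "ABCDEFGHIJ"}`, `extract_credentials` returns the two values to place in `MATRIX_ACCESS_TOKEN` and `MATRIX_DEVICE_ID`.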
### 3. Set Environment Variables
Add to `~/.hermes/.env`:
```bash
MATRIX_HOMESERVER=https://matrix-client.matrix.org
MATRIX_ACCESS_TOKEN=syt_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
MATRIX_USER_ID=@hermes-bot:matrix.org
MATRIX_DEVICE_ID=HERMES_BOT
```
### 4. Install Dependencies
```bash
pip install "matrix-nio[e2e]"
```
### 5. Start Hermes Gateway
```bash
hermes gateway
```
## Option B: Self-Hosted Homeserver (Sovereignty)
For full control over your data and encryption keys.
### Popular Homeservers
- **Synapse** (reference impl): https://github.com/element-hq/synapse
- **Conduit** (lightweight, Rust): https://conduit.rs
- **Dendrite** (Go): https://github.com/matrix-org/dendrite
### 1. Deploy Your Homeserver
Follow your chosen server's documentation. Common setup with Docker:
```bash
# Synapse example:
docker run -d --name synapse \
  -v /opt/synapse/data:/data \
  -e SYNAPSE_SERVER_NAME=your.domain.com \
  -e SYNAPSE_REPORT_STATS=no \
  matrixdotorg/synapse:latest
```
### 2. Create Bot Account
Register on your homeserver:
```bash
# Synapse: register new user (run inside container)
docker exec -it synapse register_new_matrix_user http://localhost:8008 \
  -c /data/homeserver.yaml -u hermes-bot -p 'secure-password' --admin
```
### 3. Configure Hermes
Set in `~/.hermes/.env`:
```bash
MATRIX_HOMESERVER=https://matrix.your.domain.com
MATRIX_ACCESS_TOKEN=<obtain via login API>
MATRIX_USER_ID=@hermes-bot:your.domain.com
MATRIX_DEVICE_ID=HERMES_BOT
```
## Environment Variables Reference
| Variable | Required | Description |
|----------|----------|-------------|
| `MATRIX_HOMESERVER` | Yes | Homeserver URL (e.g. `https://matrix.org`) |
| `MATRIX_ACCESS_TOKEN` | Yes* | Access token (preferred over password) |
| `MATRIX_USER_ID` | With password | Full user ID (`@user:server`) |
| `MATRIX_PASSWORD` | Alt* | Password (alternative to token) |
| `MATRIX_DEVICE_ID` | Recommended | Stable device ID for E2EE persistence |
| `MATRIX_ENCRYPTION` | No | Set `true` to enable E2EE |
| `MATRIX_ALLOWED_USERS` | No | Comma-separated allowed user IDs |
| `MATRIX_HOME_ROOM` | No | Room ID for cron/notifications |
| `MATRIX_REACTIONS` | No | Enable processing reactions (default: true) |
| `MATRIX_REQUIRE_MENTION` | No | Require @mention in rooms (default: true) |
| `MATRIX_FREE_RESPONSE_ROOMS` | No | Room IDs exempt from mention requirement |
| `MATRIX_AUTO_THREAD` | No | Auto-create threads (default: true) |
\* Either `MATRIX_ACCESS_TOKEN` or `MATRIX_USER_ID` + `MATRIX_PASSWORD` is required.
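
The "either token, or user ID plus password" rule can be expressed as a small validation step. This is a sketch, not the gateway's actual code: the function name `matrix_auth_mode` is hypothetical, and giving the token precedence when both methods are configured is an assumption based on the table marking it as preferred.

```python
def matrix_auth_mode(env: dict[str, str]) -> str:
    """Return "token" or "password", or raise if neither method is configured."""
    if env.get("MATRIX_ACCESS_TOKEN"):
        return "token"  # preferred whenever a token is present
    if env.get("MATRIX_USER_ID") and env.get("MATRIX_PASSWORD"):
        return "password"
    raise ValueError("need MATRIX_ACCESS_TOKEN or MATRIX_USER_ID + MATRIX_PASSWORD")
```

Passing `os.environ` (or any mapping loaded from `~/.hermes/.env`) lets the same check run before the gateway starts.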
## Config YAML Entries
Add to `~/.hermes/config.yaml` under a `matrix:` key for declarative settings:
```yaml
matrix:
  require_mention: true
  free_response_rooms:
    - "!roomid1:matrix.org"
    - "!roomid2:matrix.org"
  auto_thread: true
```
These values are applied only when the corresponding env var is not already set; an env var always takes precedence.
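
The precedence between env vars and `config.yaml` can be sketched as a lookup that consults the environment first. The helper name `resolve_bool` and the accepted truthy spellings (`1`, `true`, `yes`) are assumptions for illustration, not Hermes's actual parsing rules.

```python
import os


def resolve_bool(env_name: str, config: dict, key: str, default: bool) -> bool:
    """Env var wins; otherwise use config.yaml's matrix section, then the default."""
    raw = os.environ.get(env_name)
    if raw is not None:
        return raw.strip().lower() in ("1", "true", "yes")
    return bool(config.get("matrix", {}).get(key, default))
```

With `MATRIX_REQUIRE_MENTION` unset, `resolve_bool("MATRIX_REQUIRE_MENTION", config, "require_mention", True)` returns whatever the YAML file says; once the env var is set, the YAML value is ignored.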
## End-to-End Encryption (E2EE)
E2EE protects messages so only participants can read them. Hermes uses
matrix-nio's Olm/Megolm implementation.
### 1. Install E2EE Dependencies
```bash
# macOS
brew install libolm
# Ubuntu/Debian
sudo apt install libolm-dev
# Then install matrix-nio with E2EE support:
pip install "matrix-nio[e2e]"
```
### 2. Enable Encryption
Set in `~/.hermes/.env`:
```bash
MATRIX_ENCRYPTION=true
MATRIX_DEVICE_ID=HERMES_BOT
```
### 3. How It Works
- On first connect, Hermes creates a device and uploads encryption keys.
- Keys are stored in `~/.hermes/platforms/matrix/store/`.
- On shutdown, Megolm session keys are exported to `exported_keys.txt`.
- On next startup, keys are imported so the bot can decrypt old messages.
- The `MATRIX_DEVICE_ID` ensures the bot reuses the same device identity
across restarts. Without it, each restart creates a new "device" in
Matrix and old keys become unusable.
### 4. Verifying E2EE
1. Create an encrypted room in Element.
2. Invite your bot user.
3. Send a message — the bot should respond.
4. Check logs: `grep -i "e2ee\|crypto\|encrypt" ~/.hermes/logs/gateway.log`
## Room Configuration
### Inviting the Bot
1. Create a room in Element or any Matrix client.
2. Invite the bot: `/invite @hermes-bot:your.domain.com`
3. The bot auto-accepts invites (controlled by `MATRIX_ALLOWED_USERS`).
### Home Room
Set `MATRIX_HOME_ROOM` to a room ID for cron jobs and notifications:
```bash
MATRIX_HOME_ROOM=!abcde12345:matrix.org
```
### Free-Response Rooms
Rooms where the bot responds to all messages without @mention:
```bash
MATRIX_FREE_RESPONSE_ROOMS=!room1:matrix.org,!room2:matrix.org
```
Or in config.yaml:
```yaml
matrix:
  free_response_rooms:
    - "!room1:matrix.org"
```
## Troubleshooting
### "Matrix: need MATRIX_ACCESS_TOKEN or MATRIX_USER_ID + MATRIX_PASSWORD"
Neither auth method is configured. Set `MATRIX_ACCESS_TOKEN` in `~/.hermes/.env`
or provide `MATRIX_USER_ID` + `MATRIX_PASSWORD`.
### "Matrix: whoami failed"
The access token is invalid or expired. Generate a new one via the login API.
### "Matrix: E2EE dependencies are missing"
Install libolm and matrix-nio with E2EE support:
```bash
brew install libolm # macOS
pip install "matrix-nio[e2e]"
```
### "Matrix: login failed"
- Check username and password.
- Ensure the account exists on the target homeserver.
- Some homeservers require admin approval for new registrations.
### Bot Not Responding in Rooms
1. Check `MATRIX_REQUIRE_MENTION` — if `true` (default), messages must
@mention the bot.
2. Check `MATRIX_ALLOWED_USERS` — if set, only listed users can interact.
3. Check logs: `tail -f ~/.hermes/logs/gateway.log`
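
The first two checks interact, which the following sketch makes explicit. It is an illustration of the documented settings, not the gateway's implementation: the function name and parameters are hypothetical, and treating a mention as "the bot's user ID appears in the message body" is a simplification.

```python
def should_respond(sender: str, body: str, room_id: str, *, bot_user_id: str,
                   require_mention: bool = True,
                   allowed_users: frozenset[str] = frozenset(),
                   free_response_rooms: frozenset[str] = frozenset()) -> bool:
    """Illustrative gate combining the allow-list and mention settings."""
    # An empty allow-list means everyone may interact.
    if allowed_users and sender not in allowed_users:
        return False
    if not require_mention or room_id in free_response_rooms:
        return True
    return bot_user_id in body  # simplified mention detection
```

A message is dropped silently when either gate fails, so a bot that appears unresponsive is often just filtered by one of these two settings.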
### E2EE Rooms Show "Unable to Decrypt"
1. Ensure `MATRIX_DEVICE_ID` is set to a stable value.
2. Check that `~/.hermes/platforms/matrix/store/` has read/write permissions.
3. Verify libolm is installed: `python -c "from nio.crypto import ENCRYPTION_ENABLED; print(ENCRYPTION_ENABLED)"`
### Slow Message Delivery
Matrix federation can add latency. For faster responses:
- Use the same homeserver for the bot and users.
- Set `MATRIX_HOME_ROOM` to a local room.
- Check network connectivity between Hermes and the homeserver.
## Quick Start (Automated)
Run the interactive setup script:
```bash
python scripts/setup_matrix.py
```
This guides you through homeserver selection, authentication, and verification.


@@ -248,6 +248,25 @@ class HolographicMemoryProvider(MemoryProvider):
                 self._store.add_fact(content, category=category)
             except Exception as e:
                 logger.debug("Holographic memory_write mirror failed: %s", e)
+        elif action == "remove" and self._store and content:
+            try:
+                # Search for matching facts and lower trust so they decay naturally
+                facts = self._store.search_facts(content, limit=5)
+                for fact in facts:
+                    self._store.update_fact(fact["fact_id"], trust_delta=-0.4)
+                    logger.debug(
+                        "Holographic remove: decayed trust for fact %s: %s",
+                        fact["fact_id"], fact["content"][:60],
+                    )
+            except Exception as e:
+                logger.debug("Holographic memory_write remove failed: %s", e)
+        elif action == "replace" and self._store and content:
+            try:
+                # Re-add the new content as a fresh fact
+                category = "user_pref" if target == "user" else "general"
+                self._store.add_fact(content, category=category)
+            except Exception as e:
+                logger.debug("Holographic memory_write replace failed: %s", e)
 
     def shutdown(self) -> None:
         self._store = None
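
The effect of repeated `trust_delta=-0.4` updates can be illustrated with a toy store. This is a sketch under stated assumptions, not the holographic store's real behavior: the class `FactStore`, the default trust of 0.5, and the clamping to the range 0.0 to 1.0 are all hypothetical.

```python
class FactStore:
    """Toy stand-in for the holographic store; not the real implementation."""

    def __init__(self) -> None:
        self.trust: dict[int, float] = {}

    def update_fact(self, fact_id: int, trust_delta: float) -> float:
        """Apply a trust delta, clamped to the range [0.0, 1.0]."""
        current = self.trust.get(fact_id, 0.5)
        self.trust[fact_id] = min(1.0, max(0.0, current + trust_delta))
        return self.trust[fact_id]
```

Under these assumptions a fact starting at 0.5 drops to roughly 0.1 after one remove and bottoms out at 0.0 after a second, so it decays rather than being deleted outright.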


@@ -6086,12 +6086,16 @@ class AIAgent:
                     store=self._memory_store,
                 )
                 # Bridge: notify external memory provider of built-in memory writes
-                if self._memory_manager and function_args.get("action") in ("add", "replace"):
+                if self._memory_manager and function_args.get("action") in ("add", "replace", "remove"):
                     try:
+                        # For remove, use old_text as the searchable content
+                        bridge_content = function_args.get("content", "")
+                        if not bridge_content and function_args.get("action") == "remove":
+                            bridge_content = function_args.get("old_text", "")
                         self._memory_manager.on_memory_write(
                             function_args.get("action", ""),
                             target,
-                            function_args.get("content", ""),
+                            bridge_content,
                         )
                     except Exception:
                         pass

430
scripts/setup_matrix.py Executable file

@@ -0,0 +1,430 @@
#!/usr/bin/env python3
"""Interactive Matrix setup wizard for Hermes Agent.

Guides you through configuring Matrix integration:
- Homeserver URL
- Token auth or password auth
- Device ID generation
- Config/env file writing
- Optional: test room creation and message send
- E2EE verification

Usage:
    python scripts/setup_matrix.py
"""

import getpass
import json
import os
import secrets
import sys
import urllib.error
import urllib.request
from pathlib import Path


# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------


def _hermes_home() -> Path:
    """Resolve ~/.hermes (or HERMES_HOME override)."""
    return Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))


def _prompt(msg: str, default: str = "") -> str:
    """Prompt with optional default. Returns stripped input or default."""
    suffix = f" [{default}]" if default else ""
    val = input(f"{msg}{suffix}: ").strip()
    return val or default


def _prompt_bool(msg: str, default: bool = True) -> bool:
    """Yes/no prompt."""
    d = "Y/n" if default else "y/N"
    val = input(f"{msg} [{d}]: ").strip().lower()
    if not val:
        return default
    return val in ("y", "yes")


def _http_post_json(url: str, data: dict, timeout: int = 15) -> dict:
    """POST JSON and return parsed response. Raises on HTTP errors."""
    body = json.dumps(data).encode()
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read())
    except urllib.error.HTTPError as exc:
        detail = exc.read().decode(errors="replace")
        raise RuntimeError(f"HTTP {exc.code}: {detail}") from exc
    except urllib.error.URLError as exc:
        raise RuntimeError(f"Connection error: {exc.reason}") from exc


def _http_get_json(url: str, token: str = "", timeout: int = 15) -> dict:
    """GET JSON, optionally with Bearer auth."""
    req = urllib.request.Request(url, method="GET")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read())
    except urllib.error.HTTPError as exc:
        detail = exc.read().decode(errors="replace")
        raise RuntimeError(f"HTTP {exc.code}: {detail}") from exc
    except urllib.error.URLError as exc:
        raise RuntimeError(f"Connection error: {exc.reason}") from exc


def _write_env_file(env_path: Path, vars: dict) -> None:
    """Write/update ~/.hermes/.env with given variables."""
    existing: dict[str, str] = {}
    if env_path.exists():
        for line in env_path.read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                k, v = line.split("=", 1)
                existing[k.strip()] = v.strip().strip("'\"")
    existing.update(vars)
    lines = ["# Hermes Agent environment variables"]
    for k, v in sorted(existing.items()):
        # Quote values with spaces or special chars
        if any(c in v for c in " \t#\"'$"):
            lines.append(f'{k}="{v}"')
        else:
            lines.append(f"{k}={v}")
    env_path.parent.mkdir(parents=True, exist_ok=True)
    env_path.write_text("\n".join(lines) + "\n")
    try:
        os.chmod(str(env_path), 0o600)
    except (OSError, NotImplementedError):
        pass
    print(f" -> Wrote {len(vars)} vars to {env_path}")


def _write_config_yaml(config_path: Path, matrix_section: dict) -> None:
    """Add/update matrix: section in config.yaml (creates file if needed)."""
    try:
        import yaml
    except ImportError:
        print(" [!] PyYAML not installed, skipping config.yaml update.")
        print(" Add manually under 'matrix:' key.")
        return
    config: dict = {}
    if config_path.exists():
        try:
            config = yaml.safe_load(config_path.read_text()) or {}
        except Exception:
            config = {}
    config["matrix"] = matrix_section
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(yaml.dump(config, default_flow_style=False, sort_keys=False))
    try:
        os.chmod(str(config_path), 0o600)
    except (OSError, NotImplementedError):
        pass
    print(f" -> Updated matrix section in {config_path}")


def _generate_device_id() -> str:
    """Generate a stable, human-readable device ID."""
    return f"HERMES_{secrets.token_hex(4).upper()}"


# ---------------------------------------------------------------------------
# Login flows
# ---------------------------------------------------------------------------


def login_with_token(homeserver: str) -> dict:
    """Validate an existing access token via whoami."""
    token = getpass.getpass("Access token (hidden): ").strip()
    if not token:
        print(" [!] Token cannot be empty.")
        sys.exit(1)
    whoami_url = f"{homeserver}/_matrix/client/v3/account/whoami"
    print(" Validating token...")
    resp = _http_get_json(whoami_url, token=token)
    user_id = resp.get("user_id", "")
    device_id = resp.get("device_id", "")
    print(f" Authenticated as: {user_id}")
    if device_id:
        print(f" Server device ID: {device_id}")
    return {
        "MATRIX_ACCESS_TOKEN": token,
        "MATRIX_USER_ID": user_id,
    }


def login_with_password(homeserver: str) -> dict:
    """Login with username + password, get access token."""
    user_id = _prompt("Full user ID (e.g. @bot:matrix.org)")
    if not user_id:
        print(" [!] User ID cannot be empty.")
        sys.exit(1)
    password = getpass.getpass("Password (hidden): ").strip()
    if not password:
        print(" [!] Password cannot be empty.")
        sys.exit(1)
    login_url = f"{homeserver}/_matrix/client/v3/login"
    print(" Logging in...")
    resp = _http_post_json(login_url, {
        "type": "m.login.password",
        "identifier": {
            "type": "m.id.user",
            "user": user_id,
        },
        "password": password,
        "device_name": "Hermes Agent",
    })
    access_token = resp.get("access_token", "")
    device_id = resp.get("device_id", "")
    resolved_user = resp.get("user_id", user_id)
    if not access_token:
        print(" [!] Login succeeded but no access_token in response.")
        sys.exit(1)
    print(f" Authenticated as: {resolved_user}")
    if device_id:
        print(f" Device ID: {device_id}")
    return {
        "MATRIX_ACCESS_TOKEN": access_token,
        "MATRIX_USER_ID": resolved_user,
        "_server_device_id": device_id,
    }

# ---------------------------------------------------------------------------
# Test room + message
# ---------------------------------------------------------------------------
def create_test_room(homeserver: str, token: str) -> str | None:
    """Create a private test room and return the room ID, or None on failure."""
    create_url = f"{homeserver}/_matrix/client/v3/createRoom"
    # createRoom requires authentication, so build the request directly and
    # attach the bearer token instead of first trying an unauthenticated call
    # that can only fail.
    req = urllib.request.Request(
        create_url,
        data=json.dumps({
            "name": "Hermes Test Room",
            "topic": "Auto-created by hermes setup_matrix.py - safe to delete",
            "preset": "private_chat",
            "visibility": "private",
        }).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            data = json.loads(resp.read())
    except Exception as exc:
        print(f" [!] Room creation failed: {exc}")
        return None
    room_id = data.get("room_id", "")
    if room_id:
        print(f" Created test room: {room_id}")
        return room_id
    print(" [!] Room creation failed: no room_id in response.")
    return None
def send_test_message(homeserver: str, token: str, room_id: str) -> bool:
"""Send a test message to a room. Returns True on success."""
txn_id = secrets.token_hex(8)
url = (
f"{homeserver}/_matrix/client/v3/rooms/"
f"{urllib.request.quote(room_id, safe='')}/send/m.room.message/{txn_id}"
)
req = urllib.request.Request(
url,
data=json.dumps({
"msgtype": "m.text",
"body": "Hermes Agent setup verified successfully!",
}).encode(),
headers={
"Content-Type": "application/json",
"Authorization": f"Bearer {token}",
},
method="PUT",
)
try:
with urllib.request.urlopen(req, timeout=15) as resp:
data = json.loads(resp.read())
event_id = data.get("event_id", "")
if event_id:
print(f" Test message sent: {event_id}")
return True
except Exception as exc:
print(f" [!] Test message failed: {exc}")
return False
def check_e2ee_support() -> bool:
"""Check if E2EE dependencies are available."""
try:
import nio
from nio.crypto import ENCRYPTION_ENABLED
return bool(ENCRYPTION_ENABLED)
except (ImportError, AttributeError):
return False
# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
def main():
print("=" * 60)
print(" Hermes Agent — Matrix Setup Wizard")
print("=" * 60)
print()
# -- Homeserver --
print("Step 1: Homeserver")
print(" A) matrix.org (public, for testing)")
print(" B) Custom homeserver (self-hosted)")
choice = _prompt("Choose [A/B]", "A").upper()
if choice == "B":
homeserver = _prompt("Homeserver URL (e.g. https://matrix.example.com)")
if not homeserver:
print(" [!] Homeserver URL is required.")
sys.exit(1)
else:
homeserver = "https://matrix-client.matrix.org"
homeserver = homeserver.rstrip("/")
print(f" Using: {homeserver}")
print()
# -- Authentication --
print("Step 2: Authentication")
print(" A) Access token (recommended)")
print(" B) Username + password")
auth_choice = _prompt("Choose [A/B]", "A").upper()
if auth_choice == "B":
auth_vars = login_with_password(homeserver)
else:
auth_vars = login_with_token(homeserver)
print()
# -- Device ID --
print("Step 3: Device ID (for E2EE persistence)")
server_device = auth_vars.pop("_server_device_id", "")
default_device = server_device or _generate_device_id()
device_id = _prompt("Device ID", default_device)
auth_vars["MATRIX_DEVICE_ID"] = device_id
print()
# -- E2EE --
print("Step 4: End-to-End Encryption")
e2ee_available = check_e2ee_support()
if e2ee_available:
enable_e2ee = _prompt_bool("Enable E2EE?", default=False)
if enable_e2ee:
auth_vars["MATRIX_ENCRYPTION"] = "true"
print(" E2EE enabled. Keys will be stored in:")
print(" ~/.hermes/platforms/matrix/store/")
else:
print(" E2EE dependencies not found. Skipping.")
print(" To enable later: pip install 'matrix-nio[e2e]'")
print()
# -- Optional settings --
print("Step 5: Optional Settings")
allowed = _prompt("Allowed user IDs (comma-separated, or empty for all)")
if allowed:
auth_vars["MATRIX_ALLOWED_USERS"] = allowed
home_room = _prompt("Home room ID for notifications (or empty)")
if home_room:
auth_vars["MATRIX_HOME_ROOM"] = home_room
require_mention = _prompt_bool("Require @mention in rooms?", default=True)
auto_thread = _prompt_bool("Auto-create threads?", default=True)
print()
# -- Write files --
print("Step 6: Writing Configuration")
hermes_home = _hermes_home()
env_path = hermes_home / ".env"
_write_env_file(env_path, auth_vars)
config_path = hermes_home / "config.yaml"
matrix_cfg = {
"require_mention": require_mention,
"auto_thread": auto_thread,
}
_write_config_yaml(config_path, matrix_cfg)
print()
# -- Verify connection --
print("Step 7: Verification")
token = auth_vars.get("MATRIX_ACCESS_TOKEN", "")
do_test = _prompt_bool("Create test room and send message?", default=True)
if do_test and token:
room_id = create_test_room(homeserver, token)
if room_id:
send_test_message(homeserver, token, room_id)
print()
# -- Summary --
print("=" * 60)
print(" Setup Complete!")
print("=" * 60)
print()
print(" Config written to:")
print(f" {env_path}")
print(f" {config_path}")
print()
print(" To start the Matrix gateway:")
print(" hermes gateway --platform matrix")
print()
if not e2ee_available:
print(" To enable E2EE later:")
print(" pip install 'matrix-nio[e2e]'")
print(" Then set MATRIX_ENCRYPTION=true in .env")
print()
print(" Docs: docs/matrix-setup.md")
print()
if __name__ == "__main__":
main()
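
The room-creation and test-message steps above both build an authenticated `urllib` request by hand. A minimal sketch of how that could be factored into one place follows. The names `build_matrix_request` and `room_send_url` are hypothetical and are not part of `setup_matrix.py`; the sketch only assumes the standard bearer-token header and the `/_matrix/client/v3/...` paths already used in the script.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional


def build_matrix_request(
    method: str,
    url: str,
    token: Optional[str] = None,
    payload: Optional[dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a JSON request, optionally with a bearer token.

    Hypothetical helper: illustrates one way to avoid repeating header setup.
    """
    headers = {"Accept": "application/json"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def room_send_url(homeserver: str, room_id: str, txn_id: str) -> str:
    """Return the send-message URL with the room ID percent-encoded."""
    # Room IDs contain '!' and ':', so nothing is treated as safe here.
    quoted = urllib.parse.quote(room_id, safe="")
    base = homeserver.rstrip("/")
    return f"{base}/_matrix/client/v3/rooms/{quoted}/send/m.room.message/{txn_id}"
```

Building the request separately from sending it keeps the header logic testable without a homeserver; the caller would still pass the result to `urllib.request.urlopen` with a timeout, as the script does.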

tools/browser_use_tool.py Normal file

@@ -0,0 +1,572 @@
#!/usr/bin/env python3
"""
Browser Use Tool Module
Proof-of-concept wrapper around the browser-use Python library for
LLM-driven autonomous browser automation. This complements Hermes's
existing low-level browser_tool.py (navigate/snapshot/click/type) by
providing a high-level "do this task for me" capability.
Where browser_tool.py gives the LLM fine-grained control (each click is
a separate tool call), browser_use_tool.py lets the LLM describe a task
in natural language and have browser-use autonomously execute the steps.
Usage:
from tools.browser_use_tool import browser_use_run, browser_use_extract
# Run an autonomous browser task
result = browser_use_run(
task="Find the top 3 stories on Hacker News and return their titles",
max_steps=15,
)
# Extract structured data from a URL
data = browser_use_extract(
url="https://example.com/pricing",
instruction="Extract all pricing tiers with their names, prices, and features",
)
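Integration notes:
- Requires: pip install browser-use, plus langchain-openai or langchain-anthropic for the agent's LLM
- Runs a local headless Playwright Chromium; BROWSER_USE_API_KEY is only reported by the module self-test, and cloud mode is not wired up in this PoC
- Applies the same url_safety and website_policy checks as browser_tool.py to caller-supplied URLs; pages the agent reaches on its own are not re-checked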
"""
import json
import logging
import os
import tempfile
from typing import Any, Dict, Optional
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Security: URL validation (reuse existing modules)
# ---------------------------------------------------------------------------
try:
from tools.url_safety import is_safe_url as _is_safe_url
except Exception:
_is_safe_url = lambda url: False # noqa: E731 — fail-closed
try:
from tools.website_policy import check_website_access
except Exception:
check_website_access = lambda url: None # noqa: E731 — fail-open
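def _validate_url(url: str) -> Optional[str]:
    """Validate a URL for safety and policy compliance.

    Returns None if OK, or an error message string if blocked.
    """
    if not isinstance(url, str) or not url.strip():
        return "URL cannot be empty"
    url = url.strip()
    if not _is_safe_url(url):
        return f"URL blocked by safety policy: {url}"
    try:
        # Assumption: check_website_access returns None when access is allowed
        # (as the fail-open fallback above does) and a truthy value describing
        # the block otherwise, so the return value must not be discarded.
        blocked = check_website_access(url)
    except Exception as e:
        return f"URL blocked by website policy: {e}"
    if blocked:
        message = blocked.get("message", blocked) if isinstance(blocked, dict) else blocked
        return f"URL blocked by website policy: {message}"
    return None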
# ---------------------------------------------------------------------------
# Availability check
# ---------------------------------------------------------------------------
_browser_use_available: Optional[bool] = None
def _check_browser_use_available() -> bool:
"""Check if browser-use library is installed and usable."""
global _browser_use_available
if _browser_use_available is not None:
return _browser_use_available
try:
import browser_use # noqa: F401
_browser_use_available = True
except ImportError:
_browser_use_available = False
return _browser_use_available
# ---------------------------------------------------------------------------
# Core functions
# ---------------------------------------------------------------------------
def browser_use_run(
    task: str,
    max_steps: int = 25,
    model: Optional[str] = None,
    url: Optional[str] = None,
    use_vision: bool = False,
) -> str:
    """Run an autonomous browser task using browser-use.

    Args:
        task: Natural language description of what to do in the browser.
        max_steps: Maximum number of autonomous steps before stopping.
        model: LLM model for browser-use's internal agent (default: from env).
        url: Optional starting URL. If provided, navigates there first.
        use_vision: Whether to use screenshots for visual context.

    Returns:
        JSON string with task result, final page content, and metadata.
    """
    if not task or not task.strip():
        return json.dumps({"error": "task cannot be empty"})
    if not _check_browser_use_available():
        return json.dumps({
            "error": "browser-use library not installed. "
            "Install with: pip install browser-use && playwright install chromium"
        })
    # Validate URL if provided
    if url:
        err = _validate_url(url)
        if err:
            return json.dumps({"error": err})
    # Resolve model
    if not model:
        model = os.getenv("BROWSER_USE_MODEL", "").strip() or None
    try:
        import asyncio
        # The LangChain clients are imported lazily in _resolve_langchain_llm(),
        # so only the provider actually in use needs to be installed.
        # Note: asyncio.run() raises RuntimeError when called from a thread
        # that already has a running event loop.
        return asyncio.run(
            _run_browser_use_agent(
                task=task,
                max_steps=max_steps,
                model=model,
                url=url,
                use_vision=use_vision,
            )
        )
    except ImportError as e:
        return json.dumps({
            "error": f"Missing dependency: {e}. "
            "Install with: pip install browser-use langchain-openai langchain-anthropic"
        })
    except Exception as e:
        logger.exception("browser_use_run failed")
        return json.dumps({"error": f"Browser use failed: {type(e).__name__}: {e}"})
async def _run_browser_use_agent(
    task: str,
    max_steps: int,
    model: Optional[str],
    url: Optional[str],
    use_vision: bool,
) -> str:
    """Async implementation of browser_use_run."""
    from browser_use import Agent, Browser, BrowserConfig
    # Build LLM
    llm = _resolve_langchain_llm(model)
    if isinstance(llm, str):
        # _resolve_langchain_llm returned a JSON error string
        return llm
    # Configure browser
    browser_config = BrowserConfig(
        headless=True,
    )
    # Build the task string with optional starting URL
    full_task = task
    if url:
        full_task = f"Start by navigating to {url}. Then: {task}"
    # Create agent
    browser = Browser(config=browser_config)
    agent = Agent(
        task=full_task,
        llm=llm,
        browser=browser,
        use_vision=use_vision,
        max_actions_per_step=5,
    )
    # Run with step limit
    try:
        result = await agent.run(max_steps=max_steps)
    finally:
        # The browser was created here and passed in, so release it here too.
        await browser.close()
    # Extract results
    final_url = ""
    final_content = ""
    steps_taken = 0
    if hasattr(result, "all_results") and result.all_results:
        steps_taken = len(result.all_results)
        last = result.all_results[-1]
        if hasattr(last, "extracted_content"):
            final_content = last.extracted_content or ""
        if hasattr(last, "url"):
            final_url = last.url or ""
    # Prefer the agent's final result. Depending on the browser-use release
    # this is a method rather than an attribute, and a bound method is not
    # JSON-serializable, so call it when it is callable.
    final = getattr(result, "final_result", None)
    if callable(final):
        final = final()
    if final:
        final_content = str(final)
    return json.dumps({
        "success": True,
        "task": task,
        "result": final_content,
        "final_url": final_url,
        "steps_taken": steps_taken,
        "max_steps": max_steps,
    }, indent=2)
def browser_use_extract(
    url: str,
    instruction: str = "Extract all meaningful content from this page",
    max_steps: int = 15,
    model: Optional[str] = None,
) -> str:
    """Navigate to a URL and extract structured data using browser-use.

    This is a convenience wrapper that combines navigation + extraction
    into a single tool call.

    Args:
        url: The URL to extract data from.
        instruction: What to extract (e.g., "Extract all pricing tiers").
        max_steps: Maximum browser steps.
        model: LLM model for browser-use agent.

    Returns:
        JSON string with extracted data.
    """
    err = _validate_url(url)
    if err:
        return json.dumps({"error": err})
    # browser_use_run() prepends the navigation step when url is passed, so
    # the task only needs to describe the extraction itself.
    task = (
        f"{instruction.rstrip('. ')}. "
        "Return the extracted data in a structured format. "
        "When done, use the 'done' action to finish."
    )
    return browser_use_run(
        task=task,
        max_steps=max_steps,
        model=model,
        url=url,
    )
def browser_use_compare(
urls: list,
instruction: str = "Compare the content on these pages",
max_steps: int = 25,
model: str = None,
) -> str:
"""Visit multiple URLs and compare their content.
Args:
urls: List of URLs to visit and compare.
instruction: What to compare (e.g., "Compare pricing plans").
max_steps: Maximum browser steps.
model: LLM model for browser-use agent.
Returns:
JSON string with comparison results.
"""
if not urls or not isinstance(urls, list):
return json.dumps({"error": "urls must be a non-empty list"})
# Validate all URLs
for u in urls:
err = _validate_url(u)
if err:
return json.dumps({"error": f"URL validation failed for {u}: {err}"})
url_list = "\n".join(f" {i+1}. {u}" for i, u in enumerate(urls))
task = (
f"Visit each of these URLs and compare them:\n{url_list}\n\n"
f"Comparison task: {instruction}\n\n"
f"Visit each URL one by one, extract relevant information, "
f"then provide a structured comparison. Use the 'done' action when finished."
)
return browser_use_run(
task=task,
max_steps=max_steps,
model=model,
url=urls[0],
)
# ---------------------------------------------------------------------------
# LLM resolution helpers
# ---------------------------------------------------------------------------
def _resolve_langchain_llm(model: Optional[str]):
"""Build a LangChain LLM from a model string or environment.
Supports OpenAI and Anthropic models. Returns the LLM instance or
an error message string on failure.
"""
if not model:
# Auto-detect from available API keys
if os.getenv("ANTHROPIC_API_KEY"):
model = "claude-sonnet-4-20250514"
elif os.getenv("OPENAI_API_KEY"):
model = "gpt-4o"
else:
return json.dumps({
"error": "No LLM model configured for browser-use. "
"Set BROWSER_USE_MODEL, ANTHROPIC_API_KEY, or OPENAI_API_KEY."
})
model_lower = model.lower()
if "claude" in model_lower or "anthropic" in model_lower:
try:
from langchain_anthropic import ChatAnthropic
api_key = os.getenv("ANTHROPIC_API_KEY", "")
if not api_key:
return json.dumps({"error": "ANTHROPIC_API_KEY not set"})
return ChatAnthropic(
model=model,
api_key=api_key,
timeout=60,
stop=None,
)
except ImportError:
return json.dumps({
"error": "langchain-anthropic not installed. "
"Install: pip install langchain-anthropic"
})
# Default to OpenAI-compatible
try:
from langchain_openai import ChatOpenAI
api_key = os.getenv("OPENAI_API_KEY", "")
base_url = os.getenv("OPENAI_BASE_URL", None)
if not api_key:
return json.dumps({"error": "OPENAI_API_KEY not set"})
kwargs = {
"model": model,
"api_key": api_key,
"timeout": 60,
}
if base_url:
kwargs["base_url"] = base_url
return ChatOpenAI(**kwargs)
except ImportError:
return json.dumps({
"error": "langchain-openai not installed. "
"Install: pip install langchain-openai"
})
# ---------------------------------------------------------------------------
# Schema definitions
# ---------------------------------------------------------------------------
BROWSER_USE_RUN_SCHEMA = {
"name": "browser_use_run",
"description": (
"Run an autonomous browser task using AI-driven browser automation. "
"Describe what you want to accomplish in natural language, and browser-use "
"will autonomously navigate, click, type, and extract data to complete it. "
"Best for multi-step tasks like 'find X on website Y' or 'fill out this form'. "
"For simple single-page extraction, prefer web_extract (faster). "
"For fine-grained step-by-step control, use browser_navigate/snapshot/click/type."
),
"parameters": {
"type": "object",
"properties": {
"task": {
"type": "string",
"description": "Natural language description of the browser task to perform"
},
"max_steps": {
"type": "integer",
"description": "Maximum number of autonomous steps (default: 25)",
"default": 25,
},
"model": {
"type": "string",
"description": "LLM model for the browser-use agent (default: auto-detect from available API keys)",
},
"url": {
"type": "string",
"description": "Optional starting URL to navigate to before beginning the task",
},
"use_vision": {
"type": "boolean",
"description": "Use screenshots for visual context (more token-heavy, default: false)",
"default": False,
},
},
"required": ["task"],
},
}
BROWSER_USE_EXTRACT_SCHEMA = {
"name": "browser_use_extract",
"description": (
"Navigate to a URL and extract structured data using autonomous browser automation. "
"Specify what to extract in natural language. This is a convenience wrapper that "
"combines navigation + extraction into a single call."
),
"parameters": {
"type": "object",
"properties": {
"url": {
"type": "string",
"description": "The URL to navigate to and extract data from"
},
"instruction": {
"type": "string",
"description": "What to extract (e.g., 'Extract all pricing tiers with prices and features')",
"default": "Extract all meaningful content from this page",
},
"max_steps": {
"type": "integer",
"description": "Maximum number of browser steps (default: 15)",
"default": 15,
},
"model": {
"type": "string",
"description": "LLM model for the browser-use agent",
},
},
"required": ["url"],
},
}
BROWSER_USE_COMPARE_SCHEMA = {
"name": "browser_use_compare",
"description": (
"Visit multiple URLs and compare their content using autonomous browser automation. "
"Specify what to compare in natural language. The agent will visit each URL, "
"extract relevant data, and produce a structured comparison."
),
"parameters": {
"type": "object",
"properties": {
"urls": {
"type": "array",
"items": {"type": "string"},
"description": "List of URLs to visit and compare"
},
"instruction": {
"type": "string",
"description": "What to compare (e.g., 'Compare pricing plans and features')",
"default": "Compare the content on these pages",
},
"max_steps": {
"type": "integer",
"description": "Maximum number of browser steps (default: 25)",
"default": 25,
},
"model": {
"type": "string",
"description": "LLM model for the browser-use agent",
},
},
"required": ["urls"],
},
}
# ---------------------------------------------------------------------------
# Handlers
# ---------------------------------------------------------------------------
def _handle_browser_use_run(args: dict, **kw) -> str:
return browser_use_run(
task=args.get("task", ""),
max_steps=args.get("max_steps", 25),
model=args.get("model"),
url=args.get("url"),
use_vision=args.get("use_vision", False),
)
def _handle_browser_use_extract(args: dict, **kw) -> str:
return browser_use_extract(
url=args.get("url", ""),
instruction=args.get("instruction", "Extract all meaningful content from this page"),
max_steps=args.get("max_steps", 15),
model=args.get("model"),
)
def _handle_browser_use_compare(args: dict, **kw) -> str:
return browser_use_compare(
urls=args.get("urls", []),
instruction=args.get("instruction", "Compare the content on these pages"),
max_steps=args.get("max_steps", 25),
model=args.get("model"),
)
# ---------------------------------------------------------------------------
# Module test
# ---------------------------------------------------------------------------
if __name__ == "__main__":
    print("Browser Use Tool Module")
    print("=" * 40)
    if _check_browser_use_available():
        print("browser-use library: installed")
    else:
        print("browser-use library: NOT installed")
        print(" Install: pip install browser-use && playwright install chromium")
    # Check API keys; report each key independently so that having both set
    # does not hide the OpenAI one.
    has_anthropic = bool(os.getenv("ANTHROPIC_API_KEY"))
    has_openai = bool(os.getenv("OPENAI_API_KEY"))
    if has_anthropic:
        print("ANTHROPIC_API_KEY: set")
    if has_openai:
        print("OPENAI_API_KEY: set")
    if not (has_anthropic or has_openai):
        print("No LLM API keys found (need ANTHROPIC_API_KEY or OPENAI_API_KEY)")
    if os.getenv("BROWSER_USE_API_KEY"):
        print("BROWSER_USE_API_KEY: set (cloud mode available)")
    else:
        print("BROWSER_USE_API_KEY: not set (local Playwright mode)")
# ---------------------------------------------------------------------------
# Registry
# ---------------------------------------------------------------------------
from tools.registry import registry
registry.register(
name="browser_use_run",
toolset="browser_use",
schema=BROWSER_USE_RUN_SCHEMA,
handler=_handle_browser_use_run,
check_fn=_check_browser_use_available,
emoji="🤖",
)
registry.register(
name="browser_use_extract",
toolset="browser_use",
schema=BROWSER_USE_EXTRACT_SCHEMA,
handler=_handle_browser_use_extract,
check_fn=_check_browser_use_available,
emoji="🔍",
)
registry.register(
name="browser_use_compare",
toolset="browser_use",
schema=BROWSER_USE_COMPARE_SCHEMA,
handler=_handle_browser_use_compare,
check_fn=_check_browser_use_available,
emoji="⚖️",
)