feat: add documentation website (Docusaurus)

- 25 documentation pages covering Getting Started, User Guide, Developer Guide, and Reference
- Docusaurus with custom amber/gold theme matching the landing page branding
- GitHub Actions workflow to deploy landing page + docs to GitHub Pages
- Landing page at root, docs at /docs/ on hermes-agent.nousresearch.com
- Content extracted and restructured from existing repo docs (README, AGENTS.md, CONTRIBUTING.md, docs/)
- Auto-deploy on push to main when website/ or landingpage/ changes
teknium1
2026-03-05 05:24:55 -08:00
parent 1708dcd2b2
commit ada3713e77
45 changed files with 22822 additions and 0 deletions


@@ -0,0 +1,8 @@
{
"label": "User Guide",
"position": 2,
"link": {
"type": "generated-index",
"description": "Learn how to use Hermes Agent effectively."
}
}


@@ -0,0 +1,268 @@
---
sidebar_position: 1
title: "CLI Interface"
description: "Master the Hermes Agent terminal interface — commands, keybindings, personalities, and more"
---
# CLI Interface
Hermes Agent's CLI is a full terminal user interface (TUI) — not a web UI. It features multiline editing, slash-command autocomplete, conversation history, interrupt-and-redirect, and streaming tool output. Built for people who live in the terminal.
## Running the CLI
```bash
# Start an interactive session (default)
hermes
# Single query mode (non-interactive)
hermes chat -q "Hello"
# With a specific model
hermes --model "anthropic/claude-sonnet-4"
# With a specific provider
hermes --provider nous # Use Nous Portal
hermes --provider openrouter # Force OpenRouter
# With specific toolsets
hermes --toolsets "web,terminal,skills"
# Resume previous sessions
hermes --continue # Resume the most recent CLI session (-c)
hermes --resume <session_id> # Resume a specific session by ID (-r)
# Verbose mode (debug output)
hermes --verbose
```
## Interface Layout
```text
┌─────────────────────────────────────────────────┐
│ HERMES-AGENT ASCII Logo │
│ ┌─────────────┐ ┌────────────────────────────┐ │
│ │ Caduceus │ │ Model: claude-sonnet-4 │ │
│ │ ASCII Art │ │ Terminal: local │ │
│ │ │ │ Working Dir: /home/user │ │
│ │ │ │ Available Tools: 19 │ │
│ │ │ │ Available Skills: 12 │ │
│ └─────────────┘ └────────────────────────────┘ │
├─────────────────────────────────────────────────┤
│ Conversation output scrolls here... │
│ │
│ (◕‿◕✿) 🧠 pondering... (2.3s) │
│ ✧٩(ˊᗜˋ*)و✧ got it! (2.3s) │
│ │
│ Assistant: Hello! How can I help you today? │
├─────────────────────────────────────────────────┤
│ [Fixed input area at bottom] │
└─────────────────────────────────────────────────┘
```
The welcome banner shows your model, terminal backend, working directory, available tools, and installed skills at a glance.
## Keybindings
| Key | Action |
|-----|--------|
| `Enter` | Send message |
| `Alt+Enter` or `Ctrl+J` | New line (multi-line input) |
| `Ctrl+C` | Interrupt agent (double-press within 2s to force exit) |
| `Ctrl+D` | Exit |
| `Tab` | Autocomplete slash commands |
## Slash Commands
Type `/` to see an autocomplete dropdown of all available commands.
### Navigation & Control
| Command | Description |
|---------|-------------|
| `/help` | Show available commands |
| `/quit` | Exit the CLI (also: `/exit`, `/q`) |
| `/clear` | Clear screen and reset conversation |
| `/new` | Start a new conversation |
| `/reset` | Reset conversation only (keep screen) |
### Tools & Configuration
| Command | Description |
|---------|-------------|
| `/tools` | List all available tools grouped by toolset |
| `/toolsets` | List available toolsets with descriptions |
| `/model [name]` | Show or change the current model |
| `/config` | Show current configuration |
| `/prompt [text]` | View/set/clear custom system prompt |
| `/personality [name]` | Set a predefined personality |
### Conversation Management
| Command | Description |
|---------|-------------|
| `/history` | Show conversation history |
| `/retry` | Retry the last message |
| `/undo` | Remove the last user/assistant exchange |
| `/save` | Save the current conversation |
| `/compress` | Manually compress conversation context |
| `/usage` | Show token usage for this session |
### Skills & Scheduling
| Command | Description |
|---------|-------------|
| `/cron` | Manage scheduled tasks |
| `/skills` | Search, install, inspect, or manage skills |
| `/platforms` | Show gateway/messaging platform status |
| `/verbose` | Cycle tool progress display: off → new → all → verbose |
| `/<skill-name>` | Invoke any installed skill (e.g., `/axolotl`, `/gif-search`) |
:::tip
Commands are case-insensitive — `/HELP` works the same as `/help`. Most commands work mid-conversation.
:::
## Skill Slash Commands
Every installed skill in `~/.hermes/skills/` is automatically registered as a slash command. The skill name becomes the command:
```
/gif-search funny cats
/axolotl help me fine-tune Llama 3 on my dataset
/github-pr-workflow create a PR for the auth refactor
# Just the skill name loads it and lets the agent ask what you need:
/excalidraw
```
## Personalities
Set a predefined personality to change the agent's tone:
```
/personality pirate
/personality kawaii
/personality concise
```
Built-in personalities include: `helpful`, `concise`, `technical`, `creative`, `teacher`, `kawaii`, `catgirl`, `pirate`, `shakespeare`, `surfer`, `noir`, `uwu`, `philosopher`, `hype`.
You can also define custom personalities in `~/.hermes/config.yaml`:
```yaml
agent:
  personalities:
    helpful: "You are a helpful, friendly AI assistant."
    kawaii: "You are a kawaii assistant! Use cute expressions..."
    pirate: "Arrr! Ye be talkin' to Captain Hermes..."
    # Add your own!
```
## Multi-line Input
There are two ways to enter multi-line messages:
1. **`Alt+Enter` or `Ctrl+J`** — inserts a new line
2. **Backslash continuation** — end a line with `\` to continue:
```
Write a function that:\
1. Takes a list of numbers\
2. Returns the sum
```
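As a rough illustration of backslash continuation, the sketch below joins lines that end in `\` into one message. The function name `join_continued_lines` is hypothetical, and the CLI's actual input handling is not shown in this document, so treat this as a model of the behavior rather than the implementation.

```python
def join_continued_lines(lines):
    """Merge lines ending in a backslash into one multi-line message."""
    parts = []
    for line in lines:
        if line.endswith("\\"):
            # Drop the trailing backslash and keep reading the next line.
            parts.append(line[:-1])
        else:
            # The first line without a trailing backslash ends the message.
            parts.append(line)
            break
    return "\n".join(parts)


message = join_continued_lines([
    "Write a function that:\\",
    "1. Takes a list of numbers\\",
    "2. Returns the sum",
])
```

Here `message` holds the three lines separated by newlines, with the backslashes removed.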
:::info
Pasting 5 or more lines automatically saves the text to `~/.hermes/pastes/` and collapses it to a reference, keeping your prompt clean.
:::
## Interrupting the Agent
You can interrupt the agent at any point:
- **Type a new message + Enter** while the agent is working — it interrupts and processes your new instructions
- **`Ctrl+C`** — interrupt the current operation (press twice within 2s to force exit)
- In-progress terminal commands are killed immediately (SIGTERM, then SIGKILL after 1s)
- Multiple messages typed during interrupt are combined into one prompt
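The SIGTERM-then-SIGKILL pattern described above can be sketched with the standard `subprocess` module. The helper name `stop_process` and the 1-second default are illustrative assumptions based on the text; this is not Hermes's actual code.

```python
import subprocess
import sys


def stop_process(proc, grace_seconds=1.0):
    """Ask a child process to exit, then force-kill it if it does not comply."""
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX
        proc.wait()
    return proc.returncode


# Example: stop a long-running child process.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
exit_code = stop_process(child)
```

Giving the process a short grace period lets well-behaved commands clean up, while the forced kill guarantees the interrupt completes.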
## Tool Progress Display
The CLI shows animated feedback as the agent works:
**Thinking animation** (during API calls):
```
◜ (。•́︿•̀。) pondering... (1.2s)
◠ (⊙_⊙) contemplating... (2.4s)
✧٩(ˊᗜˋ*)و✧ got it! (3.1s)
```
**Tool execution feed:**
```
┊ 💻 terminal `ls -la` (0.3s)
┊ 🔍 web_search (1.2s)
┊ 📄 web_extract (2.1s)
```
Cycle through display modes with `/verbose`: `off → new → all → verbose`.
## Session Management
### Resuming Sessions
When you exit a CLI session, a resume command is printed:
```
Resume this session with:
hermes --resume 20260225_143052_a1b2c3
Session: 20260225_143052_a1b2c3
Duration: 12m 34s
Messages: 28 (5 user, 18 tool calls)
```
Resume options:
```bash
hermes --continue # Resume the most recent CLI session
hermes -c # Short form
hermes --resume 20260225_143052_a1b2c3 # Resume a specific session by ID
hermes -r 20260225_143052_a1b2c3 # Short form
```
Resuming restores the full conversation history from SQLite. The agent sees all previous messages, tool calls, and responses — just as if you never left.
Use `hermes sessions list` to browse past sessions.
### Session Logging
Sessions are automatically logged to `~/.hermes/sessions/`:
```
sessions/
├── session_20260201_143052_a1b2c3.json
├── session_20260201_150217_d4e5f6.json
└── ...
```
### Context Compression
Long conversations are automatically summarized when approaching context limits:
```yaml
# In ~/.hermes/config.yaml
compression:
  enabled: true
  threshold: 0.85 # Compress at 85% of context limit
```
When compression triggers, middle turns are summarized while the first 3 and last 4 turns are always preserved.
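A minimal sketch of the rule above, assuming a conversation is a plain list of turns. The function names are hypothetical, and how Hermes counts tokens or produces the summary is not covered here.

```python
def should_compress(used_tokens, context_limit, threshold=0.85):
    """True once usage reaches the configured fraction of the context limit."""
    return used_tokens >= threshold * context_limit


def split_for_compression(turns, keep_first=3, keep_last=4):
    """Return (head, middle, tail); only `middle` would be summarized."""
    if len(turns) <= keep_first + keep_last:
        # Too short to compress: everything is preserved.
        return list(turns), [], []
    cut = len(turns) - keep_last
    return turns[:keep_first], turns[keep_first:cut], turns[cut:]


head, middle, tail = split_for_compression(list(range(10)))
```

With ten turns numbered 0 to 9, turns 0-2 and 6-9 are preserved and turns 3-5 are the candidates for summarization.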
## Quiet Mode
By default, the CLI runs in quiet mode, which:
- Suppresses verbose logging from tools
- Enables kawaii-style animated feedback
- Keeps output clean and user-friendly
For debug output:
```bash
hermes --verbose
```


@@ -0,0 +1,204 @@
---
sidebar_position: 2
title: "Configuration"
description: "Configure Hermes Agent — config.yaml, providers, models, API keys, and more"
---
# Configuration
All settings are stored in the `~/.hermes/` directory for easy access.
## Directory Structure
```text
~/.hermes/
├── config.yaml # Settings (model, terminal, TTS, compression, etc.)
├── .env # API keys and secrets
├── auth.json # OAuth provider credentials (Nous Portal, etc.)
├── SOUL.md # Optional: global persona (agent embodies this personality)
├── memories/ # Persistent memory (MEMORY.md, USER.md)
├── skills/ # Agent-created skills (managed via skill_manage tool)
├── cron/ # Scheduled jobs
├── sessions/ # Gateway sessions
└── logs/ # Logs (errors.log, gateway.log — secrets auto-redacted)
```
## Managing Configuration
```bash
hermes config # View current configuration
hermes config edit # Open config.yaml in your editor
hermes config set KEY VAL # Set a specific value
hermes config check # Check for missing options (after updates)
hermes config migrate # Interactively add missing options
# Examples:
hermes config set model anthropic/claude-opus-4
hermes config set terminal.backend docker
hermes config set OPENROUTER_API_KEY sk-or-... # Saves to .env
```
:::tip
The `hermes config set` command automatically routes values to the right file — API keys are saved to `.env`, everything else to `config.yaml`.
:::
## Configuration Precedence
Settings are resolved in this order (highest priority first):
1. **CLI arguments**, e.g. `hermes chat --max-turns 100` (per-invocation override)
2. **`~/.hermes/config.yaml`** — the primary config file for all non-secret settings
3. **`~/.hermes/.env`** — fallback for env vars; **required** for secrets (API keys, tokens, passwords)
4. **Built-in defaults** — hardcoded safe defaults when nothing else is set
:::info Rule of Thumb
Secrets (API keys, bot tokens, passwords) go in `.env`. Everything else (model, terminal backend, compression settings, memory limits, toolsets) goes in `config.yaml`. When both are set, `config.yaml` wins for non-secret settings.
:::
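The precedence order can be pictured as a lookup over four layers, highest priority first. This is only a sketch: `resolve_setting` and the dictionary-based layers are assumptions for illustration, not the loader Hermes uses.

```python
def resolve_setting(key, cli_args, config_yaml, env_file, defaults):
    """Return the first value found, checking the highest-priority source first."""
    for source in (cli_args, config_yaml, env_file, defaults):
        if source.get(key) is not None:
            return source[key]
    raise KeyError(key)


cli = {"max_turns": 100}
cfg = {"max_turns": 60, "model": "anthropic/claude-opus-4"}
env = {"model": "some/other-model", "terminal.backend": "docker"}
builtin = {"max_turns": 30, "terminal.backend": "local", "timeout": 180}

max_turns = resolve_setting("max_turns", cli, cfg, env, builtin)  # CLI wins
model = resolve_setting("model", cli, cfg, env, builtin)          # config.yaml beats .env
```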
## Inference Providers
You need at least one way to connect to an LLM. Use `hermes model` to switch providers and models interactively, or configure directly:
| Provider | Setup |
|----------|-------|
| **Nous Portal** | `hermes model` (OAuth, subscription-based) |
| **OpenAI Codex** | `hermes model` (ChatGPT OAuth, uses Codex models) |
| **OpenRouter** | `OPENROUTER_API_KEY` in `~/.hermes/.env` |
| **Custom Endpoint** | `OPENAI_BASE_URL` + `OPENAI_API_KEY` in `~/.hermes/.env` |
:::info Codex Note
The OpenAI Codex provider authenticates via device code (open a URL, enter a code). Credentials are stored at `~/.codex/auth.json` and auto-refresh. No Codex CLI installation required.
:::
:::warning
Even when using Nous Portal, Codex, or a custom endpoint, some tools (vision, web summarization, MoA) use OpenRouter independently. An `OPENROUTER_API_KEY` enables these tools.
:::
## Optional API Keys
| Feature | Provider | Env Variable |
|---------|----------|--------------|
| Web scraping | [Firecrawl](https://firecrawl.dev/) | `FIRECRAWL_API_KEY` |
| Browser automation | [Browserbase](https://browserbase.com/) | `BROWSERBASE_API_KEY`, `BROWSERBASE_PROJECT_ID` |
| Image generation | [FAL](https://fal.ai/) | `FAL_KEY` |
| Premium TTS voices | [ElevenLabs](https://elevenlabs.io/) | `ELEVENLABS_API_KEY` |
| OpenAI TTS + voice transcription | [OpenAI](https://platform.openai.com/api-keys) | `VOICE_TOOLS_OPENAI_KEY` |
| RL Training | [Tinker](https://tinker-console.thinkingmachines.ai/) + [WandB](https://wandb.ai/) | `TINKER_API_KEY`, `WANDB_API_KEY` |
| Cross-session user modeling | [Honcho](https://honcho.dev/) | `HONCHO_API_KEY` |
## OpenRouter Provider Routing
When using OpenRouter, you can control how requests are routed across providers. Add a `provider_routing` section to `~/.hermes/config.yaml`:
```yaml
provider_routing:
  sort: "throughput" # "price" (default), "throughput", or "latency"
  # only: ["anthropic"] # Only use these providers
  # ignore: ["deepinfra"] # Skip these providers
  # order: ["anthropic", "google"] # Try providers in this order
  # require_parameters: true # Only use providers that support all request params
  # data_collection: "deny" # Exclude providers that may store/train on data
```
**Shortcuts:** Append `:nitro` to any model name for throughput sorting (e.g., `anthropic/claude-sonnet-4:nitro`), or `:floor` for price sorting.
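To make the equivalence between the suffix shortcuts and the `sort` option concrete, here is a small sketch. The helper `split_model_shortcut` is hypothetical; in practice the suffix is passed through as part of the model name, so nothing like this needs to run on your side.

```python
SUFFIX_TO_SORT = {":nitro": "throughput", ":floor": "price"}


def split_model_shortcut(model):
    """Split 'vendor/model:suffix' into (model, equivalent sort value or None)."""
    for suffix, sort in SUFFIX_TO_SORT.items():
        if model.endswith(suffix):
            return model[: -len(suffix)], sort
    return model, None


base_model, sort_mode = split_model_shortcut("anthropic/claude-sonnet-4:nitro")
```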
## Terminal Backend Configuration
Configure which environment the agent uses for terminal commands:
```yaml
terminal:
  backend: local # or: docker, ssh, singularity, modal
  cwd: "." # Working directory ("." = current dir)
  timeout: 180 # Command timeout in seconds
```
See [Code Execution](features/code-execution.md) and the [tools documentation](features/tools.md) for details on each backend.
## Memory Configuration
```yaml
memory:
  memory_enabled: true
  user_profile_enabled: true
  memory_char_limit: 2200 # ~800 tokens
  user_char_limit: 1375 # ~500 tokens
```
## Context Compression
```yaml
compression:
  enabled: true
  threshold: 0.85 # Compress at 85% of context limit
```
## Reasoning Effort
Control how much "thinking" the model does before responding:
```yaml
agent:
  reasoning_effort: "xhigh" # xhigh (max), high, medium, low, minimal, none
```
Higher reasoning effort gives better results on complex tasks at the cost of more tokens and latency.
## TTS Configuration
```yaml
tts:
  provider: "edge" # "edge" | "elevenlabs" | "openai"
  edge:
    voice: "en-US-AriaNeural" # 322 voices, 74 languages
  elevenlabs:
    voice_id: "pNInz6obpgDQGcFmaJgB"
    model_id: "eleven_multilingual_v2"
  openai:
    model: "gpt-4o-mini-tts"
    voice: "alloy" # alloy, echo, fable, onyx, nova, shimmer
```
## Display Settings
```yaml
display:
  tool_progress: all # off | new | all | verbose
```
| Mode | What you see |
|------|-------------|
| `off` | Silent — just the final response |
| `new` | Tool indicator only when the tool changes |
| `all` | Every tool call with a short preview (default) |
| `verbose` | Full args, results, and debug logs |
## Context Files (SOUL.md, AGENTS.md)
Drop these files in your project directory and the agent automatically picks them up:
| File | Purpose |
|------|---------|
| `AGENTS.md` | Project-specific instructions, coding conventions |
| `SOUL.md` | Persona definition — the agent embodies this personality |
| `.cursorrules` | Cursor IDE rules (also detected) |
| `.cursor/rules/*.mdc` | Cursor rule files (also detected) |
- **AGENTS.md** is hierarchical: if subdirectories also have AGENTS.md, all are combined.
- **SOUL.md** checks cwd first, then `~/.hermes/SOUL.md` as a global fallback.
- All context files are capped at 20,000 characters with smart truncation.
## Working Directory
| Context | Default |
|---------|---------|
| **CLI (`hermes`)** | Current directory where you run the command |
| **Messaging gateway** | Home directory `~` (override with `MESSAGING_CWD`) |
| **Docker / Singularity / Modal / SSH** | User's home directory inside the container or remote machine |
Override the working directory:
```bash
# In ~/.hermes/.env or ~/.hermes/config.yaml:
MESSAGING_CWD=/home/myuser/projects # Gateway sessions
TERMINAL_CWD=/workspace # All terminal sessions
```


@@ -0,0 +1,8 @@
{
"label": "Features",
"position": 4,
"link": {
"type": "generated-index",
"description": "Explore the powerful features of Hermes Agent."
}
}


@@ -0,0 +1,51 @@
---
sidebar_position: 8
title: "Code Execution"
description: "Sandboxed Python execution with RPC tool access — collapse multi-step workflows into a single turn"
---
# Code Execution (Programmatic Tool Calling)
The `execute_code` tool lets the agent write Python scripts that call Hermes tools programmatically, collapsing multi-step workflows into a single LLM turn. The script runs in a sandboxed child process on the agent host, communicating via Unix domain socket RPC.
## How It Works
```python
# The agent can write scripts like:
from hermes_tools import web_search, web_extract

results = web_search("Python 3.13 features", limit=5)
summary = []  # collected findings; only the final print() reaches the context
for r in results["data"]["web"]:
    content = web_extract([r["url"]])
    # ... filter and process, appending findings to summary ...
print(summary)
```
**Available tools in sandbox:** `web_search`, `web_extract`, `read_file`, `write_file`, `search`, `patch`, `terminal` (foreground only).
## When the Agent Uses This
The agent uses `execute_code` when there are:
- **3+ tool calls** with processing logic between them
- Bulk data filtering or conditional branching
- Loops over results
The key benefit: intermediate tool results never enter the context window — only the final `print()` output comes back, dramatically reducing token usage.
## Security
:::danger Security Model
The child process runs with a **minimal environment**. API keys, tokens, and credentials are stripped entirely. The script accesses tools exclusively via the RPC channel — it cannot read secrets from environment variables.
:::
Only safe system variables (`PATH`, `HOME`, `LANG`, etc.) are passed through.
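One way to picture the minimal environment is an allow-list applied to the parent's variables. The names `SAFE_VARS` and `minimal_env` are hypothetical, and the list below contains only the three variables named above, since the full list is not given in this document.

```python
import os

# Only the variables named in the text; the real allow-list may be longer.
SAFE_VARS = ("PATH", "HOME", "LANG")


def minimal_env(environ=None, allowed=SAFE_VARS):
    """Copy only allow-listed variables; everything else is dropped."""
    source = os.environ if environ is None else environ
    return {name: source[name] for name in allowed if name in source}


child_env = minimal_env({
    "PATH": "/usr/bin",
    "HOME": "/home/user",
    "OPENROUTER_API_KEY": "sk-or-example",
})
```

An allow-list fails closed: a newly added secret is excluded by default, whereas a deny-list would have to be updated for every new key.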
## Configuration
```yaml
# In ~/.hermes/config.yaml
code_execution:
  timeout: 300 # Max seconds per script (default: 300)
  max_tool_calls: 50 # Max tool calls per execution (default: 50)
```


@@ -0,0 +1,87 @@
---
sidebar_position: 5
title: "Scheduled Tasks (Cron)"
description: "Schedule automated tasks with natural language — cron jobs, delivery options, and the gateway scheduler"
---
# Scheduled Tasks (Cron)
Schedule tasks to run automatically with natural language or cron expressions. The agent can self-schedule using the `schedule_cronjob` tool from any platform.
## Creating Scheduled Tasks
### In the CLI
Use the `/cron` slash command:
```
/cron add 30m "Remind me to check the build"
/cron add "every 2h" "Check server status"
/cron add "0 9 * * *" "Morning briefing"
/cron list
/cron remove <job_id>
```
### Through Natural Conversation
Simply ask the agent on any platform:
```
Every morning at 9am, check Hacker News for AI news and send me a summary on Telegram.
```
The agent will use the `schedule_cronjob` tool to set it up.
## How It Works
**Cron execution is handled by the gateway daemon.** The gateway ticks the scheduler every 60 seconds, running any due jobs in isolated agent sessions:
```bash
hermes gateway install # Install as system service (recommended)
hermes gateway # Or run in foreground
hermes cron list # View scheduled jobs
hermes cron status # Check if gateway is running
```
:::info
Even if no messaging platforms are configured, the gateway stays running for cron. A file lock prevents duplicate execution if multiple processes overlap.
:::
## Delivery Options
When scheduling jobs, you specify where the output goes:
| Option | Description |
|--------|-------------|
| `"origin"` | Back to where the job was created |
| `"local"` | Save to local files only |
| `"telegram"` | Telegram home channel |
| `"discord"` | Discord home channel |
| `"telegram:123456"` | Specific Telegram chat |
The agent knows your connected platforms and home channels — it'll choose sensible defaults.
## Schedule Formats
- **Relative:** `30m`, `2h`, `1d`
- **Human-readable:** `"every 2 hours"`, `"daily at 9am"`
- **Cron expressions:** `"0 9 * * *"` (standard 5-field cron syntax)
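The three formats can be told apart with a few simple checks. The sketch below is an assumption about how such a classifier might look; `parse_relative` and `classify_schedule` are hypothetical names, and the cron check only validates the field count and characters, not the value ranges.

```python
import re
from datetime import timedelta

_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_relative(spec):
    """Parse '30m', '2h', or '1d' into a timedelta, or return None."""
    match = re.fullmatch(r"(\d+)([mhd])", spec.strip())
    if match is None:
        return None
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def classify_schedule(spec):
    """Return 'relative', 'cron', or 'human' for a schedule string."""
    if parse_relative(spec) is not None:
        return "relative"
    fields = spec.split()
    if len(fields) == 5 and all(re.fullmatch(r"[\d*/,\-]+", f) for f in fields):
        return "cron"
    return "human"


kinds = [classify_schedule(s) for s in ("30m", "0 9 * * *", "every 2 hours")]
```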
## Managing Jobs
```bash
# CLI commands
hermes cron list # View all scheduled jobs
hermes cron status # Check if the scheduler is running
# Slash commands (inside chat)
/cron list
/cron remove <job_id>
```
## Security
:::warning
Scheduled task prompts are scanned for instruction-override patterns (prompt injection). Jobs with suspicious content are blocked.
:::


@@ -0,0 +1,60 @@
---
sidebar_position: 7
title: "Subagent Delegation"
description: "Spawn isolated child agents for parallel workstreams with delegate_task"
---
# Subagent Delegation
The `delegate_task` tool spawns child AIAgent instances with isolated context, restricted toolsets, and their own terminal sessions. Each child gets a fresh conversation and works independently — only its final summary enters the parent's context.
## Single Task
```python
delegate_task(
    goal="Debug why tests fail",
    context="Error: assertion in test_foo.py line 42",
    toolsets=["terminal", "file"]
)
```
## Parallel Batch
Up to 3 concurrent subagents:
```python
delegate_task(tasks=[
    {"goal": "Research topic A", "toolsets": ["web"]},
    {"goal": "Research topic B", "toolsets": ["web"]},
    {"goal": "Fix the build", "toolsets": ["terminal", "file"]}
])
```
## Key Properties
- Each subagent gets its **own terminal session** (separate from the parent)
- **Depth limit of 2** — no grandchildren
- Subagents **cannot** call: `delegate_task`, `clarify`, `memory`, `send_message`, `execute_code`
- **Interrupt propagation** — interrupting the parent interrupts all active children
- Only the final summary enters the parent's context, keeping token usage efficient
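The tool restriction and batch limit above can be expressed as two small checks. This is an illustrative sketch only: `child_tools` and `validate_batch` are hypothetical helpers, and the real enforcement inside `delegate_task` is not shown in this document.

```python
# Tools the text says subagents cannot call.
BLOCKED_TOOLS = {"delegate_task", "clarify", "memory", "send_message", "execute_code"}
MAX_CONCURRENT = 3  # "Up to 3 concurrent subagents"


def child_tools(parent_tools):
    """Remove tools a subagent is not allowed to call, preserving order."""
    return [tool for tool in parent_tools if tool not in BLOCKED_TOOLS]


def validate_batch(tasks):
    """Reject batches that would exceed the concurrency limit."""
    if len(tasks) > MAX_CONCURRENT:
        raise ValueError(f"at most {MAX_CONCURRENT} tasks per batch, got {len(tasks)}")
    return tasks


allowed = child_tools(["terminal", "delegate_task", "read_file", "memory"])
```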
## Configuration
```yaml
# In ~/.hermes/config.yaml
delegation:
  max_iterations: 50 # Max turns per child (default: 50)
  default_toolsets: ["terminal", "file", "web"] # Default toolsets
```
## When to Use Delegation
Delegation is most useful when:
- You have **independent workstreams** that can run in parallel
- A subtask needs a **clean context** (e.g., debugging a long error trace without polluting the main conversation)
- You want to **fan out** research across multiple topics and collect summaries
:::tip
The agent handles delegation automatically based on the task complexity. You don't need to explicitly ask it to delegate — it will do so when it makes sense.
:::


@@ -0,0 +1,182 @@
---
sidebar_position: 6
title: "Event Hooks"
description: "Run custom code at key lifecycle points — log activity, send alerts, post to webhooks"
---
# Event Hooks
The hooks system lets you run custom code at key points in the agent lifecycle — session creation, slash commands, each tool-calling step, and more. Hooks fire automatically during gateway operation without blocking the main agent pipeline.
## Creating a Hook
Each hook is a directory under `~/.hermes/hooks/` containing two files:
```
~/.hermes/hooks/
└── my-hook/
    ├── HOOK.yaml # Declares which events to listen for
    └── handler.py # Python handler function
```
### HOOK.yaml
```yaml
name: my-hook
description: Log all agent activity to a file
events:
- agent:start
- agent:end
- agent:step
```
The `events` list determines which events trigger your handler. You can subscribe to any combination of events, including wildcards like `command:*`.
### handler.py
```python
import json
from datetime import datetime
from pathlib import Path

LOG_FILE = Path.home() / ".hermes" / "hooks" / "my-hook" / "activity.log"

async def handle(event_type: str, context: dict):
    """Called for each subscribed event. Must be named 'handle'."""
    entry = {
        "timestamp": datetime.now().isoformat(),
        "event": event_type,
        **context,
    }
    with open(LOG_FILE, "a") as f:
        f.write(json.dumps(entry) + "\n")
```
**Handler rules:**
- Must be named `handle`
- Receives `event_type` (string) and `context` (dict)
- Can be `async def` or regular `def` — both work
- Errors are caught and logged, never crashing the agent
## Available Events
| Event | When it fires | Context keys |
|-------|---------------|--------------|
| `gateway:startup` | Gateway process starts | `platforms` (list of active platform names) |
| `session:start` | New messaging session created | `platform`, `user_id`, `session_id`, `session_key` |
| `session:reset` | User ran `/new` or `/reset` | `platform`, `user_id`, `session_key` |
| `agent:start` | Agent begins processing a message | `platform`, `user_id`, `session_id`, `message` |
| `agent:step` | Each iteration of the tool-calling loop | `platform`, `user_id`, `session_id`, `iteration`, `tool_names` |
| `agent:end` | Agent finishes processing | `platform`, `user_id`, `session_id`, `message`, `response` |
| `command:*` | Any slash command executed | `platform`, `user_id`, `command`, `args` |
### Wildcard Matching
Handlers registered for `command:*` fire for any `command:` event (`command:model`, `command:reset`, etc.). Monitor all slash commands with a single subscription.
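Wildcard matching of this kind amounts to a prefix check. The function below is a sketch under that assumption; `event_matches` is a hypothetical name, and the registry's real matching code is not shown here.

```python
def event_matches(pattern, event_type):
    """Match an exact event name, or a 'prefix:*' wildcard against 'prefix:...'."""
    if pattern.endswith(":*"):
        # 'command:*' becomes the prefix 'command:'.
        return event_type.startswith(pattern[:-1])
    return pattern == event_type


matched = [e for e in ("command:model", "command:reset", "agent:step")
           if event_matches("command:*", e)]
```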
## Examples
### Telegram Alert on Long Tasks
Send yourself a message when the agent reaches 10 steps:
```yaml
# ~/.hermes/hooks/long-task-alert/HOOK.yaml
name: long-task-alert
description: Alert when agent is taking many steps
events:
- agent:step
```
```python
# ~/.hermes/hooks/long-task-alert/handler.py
import os
import httpx

THRESHOLD = 10
BOT_TOKEN = os.getenv("TELEGRAM_BOT_TOKEN")
CHAT_ID = os.getenv("TELEGRAM_HOME_CHANNEL")

async def handle(event_type: str, context: dict):
    iteration = context.get("iteration", 0)
    if iteration == THRESHOLD and BOT_TOKEN and CHAT_ID:
        tools = ", ".join(context.get("tool_names", []))
        text = f"⚠️ Agent has been running for {iteration} steps. Last tools: {tools}"
        async with httpx.AsyncClient() as client:
            await client.post(
                f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage",
                json={"chat_id": CHAT_ID, "text": text},
            )
```
### Command Usage Logger
Track which slash commands are used:
```yaml
# ~/.hermes/hooks/command-logger/HOOK.yaml
name: command-logger
description: Log slash command usage
events:
- command:*
```
```python
# ~/.hermes/hooks/command-logger/handler.py
import json
from datetime import datetime
from pathlib import Path

LOG = Path.home() / ".hermes" / "logs" / "command_usage.jsonl"

def handle(event_type: str, context: dict):
    LOG.parent.mkdir(parents=True, exist_ok=True)
    entry = {
        "ts": datetime.now().isoformat(),
        "command": context.get("command"),
        "args": context.get("args"),
        "platform": context.get("platform"),
        "user": context.get("user_id"),
    }
    with open(LOG, "a") as f:
        f.write(json.dumps(entry) + "\n")
```
### Session Start Webhook
POST to an external service on new sessions:
```yaml
# ~/.hermes/hooks/session-webhook/HOOK.yaml
name: session-webhook
description: Notify external service on new sessions
events:
- session:start
- session:reset
```
```python
# ~/.hermes/hooks/session-webhook/handler.py
import httpx

WEBHOOK_URL = "https://your-service.example.com/hermes-events"

async def handle(event_type: str, context: dict):
    async with httpx.AsyncClient() as client:
        await client.post(WEBHOOK_URL, json={
            "event": event_type,
            **context,
        }, timeout=5)
```
## How It Works
1. On gateway startup, `HookRegistry.discover_and_load()` scans `~/.hermes/hooks/`
2. Each subdirectory with `HOOK.yaml` + `handler.py` is loaded dynamically
3. Handlers are registered for their declared events
4. At each lifecycle point, `hooks.emit()` fires all matching handlers
5. Errors in any handler are caught and logged — a broken hook never crashes the agent
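Steps 4 and 5 can be sketched as an emit loop that accepts both `def` and `async def` handlers and isolates failures. This is a simplified model, not the actual `hooks.emit()` code: the standalone `emit` function, its `handlers` argument, and the returned failure count are assumptions for illustration.

```python
import asyncio
import inspect
import logging

logger = logging.getLogger("hooks_sketch")


async def emit(handlers, event_type, context):
    """Call every handler; log failures instead of letting them propagate."""
    failures = 0
    for handler in handlers:
        try:
            result = handler(event_type, context)
            if inspect.isawaitable(result):
                await result  # async handlers return a coroutine
        except Exception:
            failures += 1
            logger.exception("hook handler failed for %s", event_type)
    return failures


calls = []

def sync_handler(event_type, context):
    calls.append(("sync", event_type))

async def async_handler(event_type, context):
    calls.append(("async", event_type))

def broken_handler(event_type, context):
    raise RuntimeError("boom")

failed = asyncio.run(emit([sync_handler, broken_handler, async_handler], "agent:start", {}))
```

The broken handler is logged and counted, and the handler after it still runs.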
:::info
Hooks only fire in the **gateway** (Telegram, Discord, Slack, WhatsApp). The CLI does not currently load hooks.
:::


@@ -0,0 +1,269 @@
---
sidebar_position: 4
title: "MCP (Model Context Protocol)"
description: "Connect Hermes Agent to external tool servers via MCP — databases, APIs, filesystems, and more"
---
# MCP (Model Context Protocol)
MCP lets Hermes Agent connect to external tool servers — giving the agent access to databases, APIs, filesystems, and more without any code changes.
## Overview
The [Model Context Protocol](https://modelcontextprotocol.io/) (MCP) is an open standard for connecting AI agents to external tools and data sources. MCP servers expose tools over a lightweight RPC protocol, and Hermes Agent can connect to any compliant server automatically.
What this means for you:
- **Thousands of ready-made tools** — browse the [MCP server directory](https://github.com/modelcontextprotocol/servers) for servers covering GitHub, Slack, databases, file systems, web scraping, and more
- **No code changes needed** — add a few lines to `~/.hermes/config.yaml` and the tools appear alongside built-in ones
- **Mix and match** — run multiple MCP servers simultaneously, combining stdio-based and HTTP-based servers
- **Secure by default** — environment variables are filtered and credentials are stripped from error messages
## Prerequisites
```bash
pip install hermes-agent[mcp]
```
| Server Type | Runtime Needed | Example |
|-------------|---------------|---------|
| HTTP/remote | Nothing extra | `url: "https://mcp.example.com"` |
| npm-based (npx) | Node.js 18+ | `command: "npx"` |
| Python-based | uv (recommended) | `command: "uvx"` |
## Configuration
MCP servers are configured in `~/.hermes/config.yaml` under the `mcp_servers` key.
### Stdio Servers
Stdio servers run as local subprocesses, communicating over stdin/stdout:
```yaml
mcp_servers:
  filesystem:
    command: "npx"
    args: ["-y", "@modelcontextprotocol/server-filesystem", "/home/user/projects"]
    env: {}
  github:
    command: "npx"
    args: ["-y", "@modelcontextprotocol/server-github"]
    env:
      GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_xxxxxxxxxxxx"
```
| Key | Required | Description |
|-----|----------|-------------|
| `command` | Yes | Executable to run (`npx`, `uvx`, `python`) |
| `args` | No | Command-line arguments |
| `env` | No | Environment variables for the subprocess |
:::info Security
Only explicitly listed `env` variables plus a safe baseline (`PATH`, `HOME`, `USER`, `LANG`, `SHELL`, `TMPDIR`, `XDG_*`) are passed to the subprocess. Your API keys and secrets are **not** leaked.
:::
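The filtering rule in the note can be modeled as a baseline allow-list, an `XDG_` prefix check, and the server's explicit `env` block layered on top. The function `build_subprocess_env` is a hypothetical sketch of that rule, not the code Hermes runs.

```python
BASELINE = {"PATH", "HOME", "USER", "LANG", "SHELL", "TMPDIR"}


def build_subprocess_env(parent_env, explicit_env):
    """Keep baseline and XDG_* variables, then apply the server's own env block."""
    env = {
        name: value
        for name, value in parent_env.items()
        if name in BASELINE or name.startswith("XDG_")
    }
    env.update(explicit_env)  # explicitly listed variables always pass through
    return env


server_env = build_subprocess_env(
    {"PATH": "/usr/bin", "XDG_CONFIG_HOME": "/home/user/.config", "OPENROUTER_API_KEY": "sk-or-example"},
    {"GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_xxxxxxxxxxxx"},
)
```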
### HTTP Servers
```yaml
mcp_servers:
  remote_api:
    url: "https://my-mcp-server.example.com/mcp"
    headers:
      Authorization: "Bearer sk-xxxxxxxxxxxx"
```
### Per-Server Timeouts
```yaml
mcp_servers:
  slow_database:
    command: "npx"
    args: ["-y", "@modelcontextprotocol/server-postgres"]
    env:
      DATABASE_URL: "postgres://user:pass@localhost/mydb"
    timeout: 300 # Tool call timeout (default: 120s)
    connect_timeout: 90 # Initial connection timeout (default: 60s)
```
### Mixed Configuration Example
```yaml
mcp_servers:
  # Local filesystem via stdio
  filesystem:
    command: "npx"
    args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
  # GitHub API via stdio with auth
  github:
    command: "npx"
    args: ["-y", "@modelcontextprotocol/server-github"]
    env:
      GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_xxxxxxxxxxxx"
  # Remote database via HTTP
  company_db:
    url: "https://mcp.internal.company.com/db"
    headers:
      Authorization: "Bearer sk-xxxxxxxxxxxx"
    timeout: 180
  # Python-based server via uvx
  memory:
    command: "uvx"
    args: ["mcp-server-memory"]
```
## Translating from Claude Desktop Config
Many MCP server docs show Claude Desktop JSON format. Here's the translation:
**Claude Desktop JSON:**
```json
{
"mcpServers": {
"filesystem": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
}
}
}
```
**Hermes YAML:**
```yaml
mcp_servers: # mcpServers → mcp_servers (snake_case)
  filesystem:
    command: "npx"
    args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
```
Rules: `mcpServers` → `mcp_servers` (snake_case), JSON → YAML. Keys like `command`, `args`, `env` are identical.
## How It Works
### Tool Registration
Each MCP tool is registered with a prefixed name:
```
mcp_{server_name}_{tool_name}
```
| Server Name | MCP Tool Name | Registered As |
|-------------|--------------|---------------|
| `filesystem` | `read_file` | `mcp_filesystem_read_file` |
| `github` | `create-issue` | `mcp_github_create_issue` |
| `my-api` | `query.data` | `mcp_my_api_query_data` |
Tools appear alongside built-in tools — the agent calls them like any other tool.
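The naming scheme in the table is consistent with replacing every character outside letters, digits, and underscores with `_`. That rule is inferred from the three examples, not confirmed by the source, and `registered_tool_name` is a hypothetical helper.

```python
import re


def registered_tool_name(server_name, tool_name):
    """Build 'mcp_{server}_{tool}', replacing unsupported characters with '_'."""
    def clean(part):
        return re.sub(r"[^A-Za-z0-9_]", "_", part)
    return f"mcp_{clean(server_name)}_{clean(tool_name)}"


names = [
    registered_tool_name("filesystem", "read_file"),
    registered_tool_name("github", "create-issue"),
    registered_tool_name("my-api", "query.data"),
]
```

The prefix keeps tools from different servers from colliding with each other or with built-in tools of the same name.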
### Reconnection
If an MCP server disconnects, Hermes automatically reconnects with exponential backoff (1s, 2s, 4s, 8s, 16s — max 5 attempts). Initial connection failures are reported immediately.
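The backoff schedule above doubles the wait before each attempt. The sketch below reproduces that schedule with an injectable `sleep` so it can be exercised without waiting; the `reconnect` function and its `connect` callback are hypothetical, not Hermes's API.

```python
import time


def reconnect(connect, max_attempts=5, base=1.0, sleep=time.sleep):
    """Retry `connect` with delays of base * 2**n; return its result or None."""
    for attempt in range(max_attempts):
        sleep(base * 2 ** attempt)  # 1s, 2s, 4s, 8s, 16s with the defaults
        try:
            return connect()
        except ConnectionError:
            continue
    return None  # gave up after max_attempts


# Example: a server that comes back on the third attempt.
delays = []
state = {"calls": 0}

def flaky_connect():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("server not ready")
    return "connected"

outcome = reconnect(flaky_connect, sleep=delays.append)
```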
### Shutdown
On agent exit, all MCP server connections are cleanly shut down.
## Popular MCP Servers
| Server | Package | Description |
|--------|---------|-------------|
| Filesystem | `@modelcontextprotocol/server-filesystem` | Read/write/search local files |
| GitHub | `@modelcontextprotocol/server-github` | Issues, PRs, repos, code search |
| Git | `mcp-server-git` (Python, run with `uvx`) | Git operations on local repos |
| Fetch | `mcp-server-fetch` (Python, run with `uvx`) | HTTP fetching and web content |
| Memory | `@modelcontextprotocol/server-memory` | Persistent key-value memory |
| SQLite | `mcp-server-sqlite` (Python, run with `uvx`) | Query SQLite databases |
| PostgreSQL | `@modelcontextprotocol/server-postgres` | Query PostgreSQL databases |
| Brave Search | `@modelcontextprotocol/server-brave-search` | Web search via Brave API |
| Puppeteer | `@modelcontextprotocol/server-puppeteer` | Browser automation |
### Example Configs
```yaml
mcp_servers:
  # No API key needed
  filesystem:
    command: "npx"
    args: ["-y", "@modelcontextprotocol/server-filesystem", "/home/user/projects"]
  git:
    command: "uvx"
    args: ["mcp-server-git", "--repository", "/home/user/my-repo"]
  fetch:
    command: "uvx"
    args: ["mcp-server-fetch"]
  sqlite:
    command: "uvx"
    args: ["mcp-server-sqlite", "--db-path", "/home/user/data.db"]
  # Requires API key
  github:
    command: "npx"
    args: ["-y", "@modelcontextprotocol/server-github"]
    env:
      GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_xxxxxxxxxxxx"
  brave_search:
    command: "npx"
    args: ["-y", "@modelcontextprotocol/server-brave-search"]
    env:
      BRAVE_API_KEY: "BSA_xxxxxxxxxxxx"
```
## Troubleshooting
### "MCP SDK not available"
```bash
pip install "hermes-agent[mcp]"  # quotes keep shells like zsh from expanding the brackets
```
### Server fails to start
This usually means the MCP server command (`npx`, `uvx`) is not on your PATH. Install the required runtime:
```bash
# For npm-based servers: npx ships with npm, so install Node.js 18+ and verify
npx --version
# For Python-based servers
pip install uv # then use "uvx" as the command
```
### Server connects but tools fail with auth errors
Ensure the key is in the server's `env` block:
```yaml
mcp_servers:
  github:
    command: "npx"
    args: ["-y", "@modelcontextprotocol/server-github"]
    env:
      GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_your_actual_token" # Check this
```
### Connection timeout
Increase `connect_timeout` for slow-starting servers:
```yaml
mcp_servers:
  slow_server:
    command: "npx"
    args: ["-y", "heavy-server-package"]
    connect_timeout: 120 # default is 60
```
### Reload MCP Servers
You can reload MCP servers without restarting Hermes:
- In the CLI: the agent reconnects automatically
- In messaging: send `/reload-mcp`

@@ -0,0 +1,97 @@
---
sidebar_position: 3
title: "Persistent Memory"
description: "How Hermes Agent remembers across sessions — MEMORY.md, USER.md, and session search"
---
# Persistent Memory
Hermes Agent has bounded, curated memory that persists across sessions. This lets it remember your preferences, your projects, your environment, and things it has learned.
## How It Works
Two files make up the agent's memory:
| File | Purpose | Token Budget |
|------|---------|-------------|
| **MEMORY.md** | Agent's personal notes — environment facts, conventions, things learned | ~800 tokens (~2200 chars) |
| **USER.md** | User profile — your preferences, communication style, expectations | ~500 tokens (~1375 chars) |
Both are stored in `~/.hermes/memories/` and are injected into the system prompt as a frozen snapshot at session start. The agent manages its own memory via the `memory` tool — it can add, replace, or remove entries.
:::info
Character limits keep memory focused. When memory is full, the agent consolidates or replaces entries to make room for new information.
:::
## Configuration
```yaml
# In ~/.hermes/config.yaml
memory:
  memory_enabled: true
  user_profile_enabled: true
  memory_char_limit: 2200 # ~800 tokens
  user_char_limit: 1375 # ~500 tokens
```
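To make the character budget concrete, here is a minimal sketch of a bounded store that refuses an entry once the limit would be exceeded. The class `BoundedMemory` is hypothetical and only counts entry characters; how Hermes actually measures the file (separators, headers) and how it consolidates entries is not shown in this page.

```python
from dataclasses import dataclass, field


@dataclass
class BoundedMemory:
    """Hypothetical store that enforces a character budget on its entries."""

    char_limit: int = 2200  # mirrors memory_char_limit above
    entries: list = field(default_factory=list)

    def used(self) -> int:
        # Assumption: only the entry text counts toward the budget
        return sum(len(entry) for entry in self.entries)

    def add(self, entry: str) -> bool:
        """Append the entry if it fits; return False when memory is full."""
        if self.used() + len(entry) > self.char_limit:
            return False  # Caller would consolidate or replace entries first
        self.entries.append(entry)
        return True
```

A `False` return corresponds to the "memory is full" case described above, where the agent has to consolidate or replace before adding more.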
## Memory Tool Actions
The agent uses the `memory` tool with these actions:
- **add** — Add a new memory entry
- **replace** — Replace an existing entry with updated content
- **remove** — Remove an entry that's no longer relevant
- **read** — Read current memory contents
## Session Search
Beyond MEMORY.md and USER.md, the agent can search its past conversations using the `session_search` tool:
- All CLI and messaging sessions are stored in SQLite (`~/.hermes/state.db`) with FTS5 full-text search
- Search returns the relevant past conversations, summarized with Gemini Flash
- The agent can find things it discussed weeks ago, even if they're not in its active memory
```bash
hermes sessions list # Browse past sessions
```
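For readers unfamiliar with FTS5, the following sketch shows the general technique using Python's built-in `sqlite3` module. The table name `messages` and its columns are invented for illustration and are not the actual schema of `~/.hermes/state.db`; it also assumes the local SQLite build includes the FTS5 extension, which is common but not guaranteed.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory FTS5 table and load (session_id, content) rows."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: session_id is stored but not tokenized for search
    conn.execute(
        "CREATE VIRTUAL TABLE messages USING fts5(session_id UNINDEXED, content)"
    )
    conn.executemany(
        "INSERT INTO messages (session_id, content) VALUES (?, ?)", rows
    )
    return conn


def search_sessions(conn, query):
    """Return session IDs whose content matches the query, best match first."""
    cursor = conn.execute(
        "SELECT session_id FROM messages WHERE messages MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in cursor]


if __name__ == "__main__":
    demo = build_index([
        ("s1", "Configured the nginx reverse proxy for the staging box"),
        ("s2", "Discussed learning rate schedules for the training run"),
    ])
    print(search_sessions(demo, "nginx"))
```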
## Honcho Integration (Cross-Session User Modeling)
For deeper, AI-generated user understanding that works across tools, you can optionally enable [Honcho](https://honcho.dev/) by Plastic Labs. Honcho runs alongside existing memory — USER.md stays as-is, and Honcho adds an additional layer of context.
When enabled:
- **Prefetch**: Each turn, Honcho's user representation is injected into the system prompt
- **Sync**: After each conversation, messages are synced to Honcho
- **Query tool**: The agent can actively query its understanding of you via `query_user_context`
**Setup:**
```bash
# 1. Install the optional dependency
uv pip install honcho-ai
# 2. Get an API key from https://app.honcho.dev
# 3. Create ~/.honcho/config.json (make sure the directory exists first)
mkdir -p ~/.honcho
cat > ~/.honcho/config.json << 'EOF'
{
  "enabled": true,
  "apiKey": "your-honcho-api-key",
  "peerName": "your-name",
  "hosts": {
    "hermes": {
      "workspace": "hermes"
    }
  }
}
EOF
```
Or via environment variable:
```bash
hermes config set HONCHO_API_KEY your-key
```
:::tip
Honcho is fully opt-in — zero behavior change when disabled or unconfigured. All Honcho calls are non-fatal; if the service is unreachable, the agent continues normally.
:::

@@ -0,0 +1,159 @@
---
sidebar_position: 2
title: "Skills System"
description: "On-demand knowledge documents — progressive disclosure, agent-managed skills, and the Skills Hub"
---
# Skills System
Skills are on-demand knowledge documents the agent can load when needed. They follow a **progressive disclosure** pattern to minimize token usage and are compatible with the [agentskills.io](https://agentskills.io/specification) open standard.
All skills live in **`~/.hermes/skills/`** — a single directory that serves as the source of truth. On fresh install, bundled skills are copied from the repo. Hub-installed and agent-created skills also go here. The agent can modify or delete any skill.
## Using Skills
Every installed skill is automatically available as a slash command:
```bash
# In the CLI or any messaging platform:
/gif-search funny cats
/axolotl help me fine-tune Llama 3 on my dataset
/github-pr-workflow create a PR for the auth refactor
# Just the skill name loads it and lets the agent ask what you need:
/excalidraw
```
You can also interact with skills through natural conversation:
```bash
hermes --toolsets skills -q "What skills do you have?"
hermes --toolsets skills -q "Show me the axolotl skill"
```
## Progressive Disclosure
Skills use a token-efficient loading pattern:
```
Level 0: skills_categories() → ["mlops", "devops"] (~50 tokens)
Level 1: skills_list(category) → [{name, description}, ...] (~3k tokens)
Level 2: skill_view(name) → Full content + metadata (varies)
Level 3: skill_view(name, path) → Specific reference file (varies)
```
The agent only loads the full skill content when it actually needs it.
## SKILL.md Format
```markdown
---
name: my-skill
description: Brief description of what this skill does
version: 1.0.0
metadata:
  hermes:
    tags: [python, automation]
    category: devops
---
# Skill Title
## When to Use
Trigger conditions for this skill.
## Procedure
1. Step one
2. Step two
## Pitfalls
- Known failure modes and fixes
## Verification
How to confirm it worked.
```
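As a rough illustration of how a file in this format can be read, the sketch below separates the frontmatter from the Markdown body and pulls out the top-level keys. Both function names are hypothetical, and the key parsing is deliberately simplistic (it ignores nested keys such as `metadata.hermes`); a real loader would more likely use a YAML parser.

```python
def split_frontmatter(text: str):
    """Return (frontmatter, body) for a document that starts with '---'."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text  # No frontmatter block present
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return "", text  # Opening marker without a closing one


def top_level_fields(frontmatter: str) -> dict:
    """Collect unindented 'key: value' pairs; nested keys are skipped."""
    fields = {}
    for line in frontmatter.splitlines():
        if not line or line[0] in " \t#" or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    sample = (
        "---\nname: my-skill\nversion: 1.0.0\nmetadata:\n  hermes:\n"
        "    category: devops\n---\n# Skill Title\n"
    )
    front, body = split_frontmatter(sample)
    print(top_level_fields(front)["name"], "|", body.strip())
```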
## Skill Directory Structure
```
~/.hermes/skills/ # Single source of truth
├── mlops/ # Category directory
│ ├── axolotl/
│ │ ├── SKILL.md # Main instructions (required)
│ │ ├── references/ # Additional docs
│ │ ├── templates/ # Output formats
│ │ └── assets/ # Supplementary files
│ └── vllm/
│ └── SKILL.md
├── devops/
│ └── deploy-k8s/ # Agent-created skill
│ ├── SKILL.md
│ └── references/
├── .hub/ # Skills Hub state
│ ├── lock.json
│ ├── quarantine/
│ └── audit.log
└── .bundled_manifest # Tracks seeded bundled skills
```
## Agent-Managed Skills (skill_manage tool)
The agent can create, update, and delete its own skills via the `skill_manage` tool. This is the agent's **procedural memory** — when it figures out a non-trivial workflow, it saves the approach as a skill for future reuse.
### When the Agent Creates Skills
- After completing a complex task (5+ tool calls) successfully
- When it hit errors or dead ends and found the working path
- When the user corrected its approach
- When it discovered a non-trivial workflow
### Actions
| Action | Use for | Key params |
|--------|---------|------------|
| `create` | New skill from scratch | `name`, `content` (full SKILL.md), optional `category` |
| `patch` | Targeted fixes (preferred) | `name`, `old_string`, `new_string` |
| `edit` | Major structural rewrites | `name`, `content` (full SKILL.md replacement) |
| `delete` | Remove a skill entirely | `name` |
| `write_file` | Add/update supporting files | `name`, `file_path`, `file_content` |
| `remove_file` | Remove a supporting file | `name`, `file_path` |
:::tip
The `patch` action is preferred for updates — it's more token-efficient than `edit` because only the changed text appears in the tool call.
:::
## Skills Hub
Search, install, and manage skills from online registries:
```bash
hermes skills search kubernetes # Search all sources
hermes skills install openai/skills/k8s # Install with security scan
hermes skills inspect openai/skills/k8s # Preview before installing
hermes skills list --source hub # List hub-installed skills
hermes skills audit # Re-scan all hub skills
hermes skills uninstall k8s # Remove a hub skill
hermes skills publish skills/my-skill --to github --repo owner/repo
hermes skills snapshot export setup.json # Export skill config
hermes skills tap add myorg/skills-repo # Add a custom source
```
All hub-installed skills go through a **security scanner** that checks for data exfiltration, prompt injection, destructive commands, and other threats.
### Trust Levels
| Level | Source | Policy |
|-------|--------|--------|
| `builtin` | Ships with Hermes | Always trusted |
| `trusted` | openai/skills, anthropics/skills | Trusted sources |
| `community` | Everything else | Any findings = blocked unless `--force` |
### Slash Commands (Inside Chat)
All the same commands work with the `/skills` prefix:
```
/skills search kubernetes
/skills install openai/skills/skill-creator
/skills list
```

@@ -0,0 +1,163 @@
---
sidebar_position: 1
title: "Tools & Toolsets"
description: "Overview of Hermes Agent's tools — what's available, how toolsets work, and terminal backends"
---
# Tools & Toolsets
Tools are functions that extend the agent's capabilities. They're organized into logical **toolsets** that can be enabled or disabled per platform.
## Available Tools
| Category | Tools | Description |
|----------|-------|-------------|
| **Web** | `web_search`, `web_extract`, `web_crawl` | Search the web, extract page content, crawl sites |
| **Terminal** | `terminal` | Execute commands (local/docker/singularity/modal/ssh backends) |
| **File** | `read_file`, `write_file`, `patch`, `search` | Read, write, edit, and search files |
| **Browser** | `browser_navigate`, `browser_click`, `browser_type`, etc. | Full browser automation via Browserbase |
| **Vision** | `vision_analyze` | Image analysis via multimodal models |
| **Image Gen** | `image_generate` | Generate images (FLUX via FAL) |
| **TTS** | `text_to_speech` | Text-to-speech (Edge TTS / ElevenLabs / OpenAI) |
| **Reasoning** | `mixture_of_agents` | Multi-model reasoning |
| **Skills** | `skills_list`, `skill_view`, `skill_manage` | Find, view, create, and manage skills |
| **Todo** | `todo` | Read/write task list for multi-step planning |
| **Memory** | `memory` | Persistent notes + user profile across sessions |
| **Session Search** | `session_search` | Search + summarize past conversations (FTS5) |
| **Cronjob** | `schedule_cronjob`, `list_cronjobs`, `remove_cronjob` | Scheduled task management |
| **Code Execution** | `execute_code` | Run Python scripts that call tools via RPC sandbox |
| **Delegation** | `delegate_task` | Spawn subagents with isolated context |
| **Clarify** | `clarify` | Ask the user multiple-choice or open-ended questions (CLI-only) |
| **MCP** | Auto-discovered | External tools from MCP servers |
## Using Toolsets
```bash
# Use specific toolsets
hermes --toolsets "web,terminal"
# See all available tools and configure them per platform (interactive)
hermes tools
```
**Available toolsets:** `web`, `terminal`, `file`, `browser`, `vision`, `image_gen`, `moa`, `skills`, `tts`, `todo`, `memory`, `session_search`, `cronjob`, `code_execution`, `delegation`, `clarify`, and more.
## Terminal Backends
The terminal tool can execute commands in different environments:
| Backend | Description | Use Case |
|---------|-------------|----------|
| `local` | Run on your machine (default) | Development, trusted tasks |
| `docker` | Isolated containers | Security, reproducibility |
| `ssh` | Remote server | Sandboxing, keep agent away from its own code |
| `singularity` | HPC containers | Cluster computing, rootless |
| `modal` | Cloud execution | Serverless, scale |
### Configuration
```yaml
# In ~/.hermes/config.yaml
terminal:
  backend: local # or: docker, ssh, singularity, modal
  cwd: "." # Working directory
  timeout: 180 # Command timeout in seconds
```
### Docker Backend
```yaml
terminal:
  backend: docker
  docker_image: python:3.11-slim
```
### SSH Backend
Recommended for security — agent can't modify its own code:
```yaml
terminal:
  backend: ssh
```
```bash
# Set credentials in ~/.hermes/.env
TERMINAL_SSH_HOST=my-server.example.com
TERMINAL_SSH_USER=myuser
TERMINAL_SSH_KEY=~/.ssh/id_rsa
```
### Singularity/Apptainer
```bash
# Pre-build SIF for parallel workers
apptainer build ~/python.sif docker://python:3.11-slim
# Configure
hermes config set terminal.backend singularity
hermes config set terminal.singularity_image ~/python.sif
```
### Modal (Serverless Cloud)
```bash
uv pip install "swe-rex[modal]"
modal setup
hermes config set terminal.backend modal
```
### Container Resources
Configure CPU, memory, disk, and persistence for all container backends:
```yaml
terminal:
  backend: docker # or singularity, modal
  container_cpu: 1 # CPU cores (default: 1)
  container_memory: 5120 # Memory in MB (default: 5GB)
  container_disk: 51200 # Disk in MB (default: 50GB)
  container_persistent: true # Persist filesystem across sessions (default: true)
```
When `container_persistent: true`, installed packages, files, and config survive across sessions.
### Container Security
All container backends run with security hardening:
- Read-only root filesystem (Docker)
- All Linux capabilities dropped
- No privilege escalation
- PID limits (256 processes)
- Full namespace isolation
- Persistent workspace via volumes, not writable root layer
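For the Docker backend, the hardening items above correspond to standard `docker run` flags. The sketch below assembles such a command line; the flags are real Docker CLI options, but whether Hermes passes exactly these, and the volume name `hermes-workspace` and mount point `/workspace`, are assumptions made for illustration.

```python
def hardened_docker_args(image: str, cpus: int = 1, memory_mb: int = 5120,
                         pids_limit: int = 256) -> list:
    """Build a docker run argument list with common hardening flags."""
    return [
        "docker", "run",
        "--read-only",                        # Read-only root filesystem
        "--cap-drop=ALL",                     # Drop all Linux capabilities
        "--security-opt=no-new-privileges",   # Block privilege escalation
        f"--pids-limit={pids_limit}",         # Cap the number of processes
        f"--cpus={cpus}",
        f"--memory={memory_mb}m",
        # Hypothetical named volume so the workspace persists outside the root layer
        "-v", "hermes-workspace:/workspace",
        image,
    ]


if __name__ == "__main__":
    print(" ".join(hardened_docker_args("python:3.11-slim")))
```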
## Background Process Management
Start background processes and manage them:
```python
terminal(command="pytest -v tests/", background=true)
# Returns: {"session_id": "proc_abc123", "pid": 12345}
# Then manage with the process tool:
process(action="list") # Show all running processes
process(action="poll", session_id="proc_abc123") # Check status
process(action="wait", session_id="proc_abc123") # Block until done
process(action="log", session_id="proc_abc123") # Full output
process(action="kill", session_id="proc_abc123") # Terminate
process(action="write", session_id="proc_abc123", data="y") # Send input
```
PTY mode (`pty=true`) enables interactive CLI tools like Codex and Claude Code.
## Sudo Support
If a command needs sudo, you'll be prompted for your password (cached for the session). Or set `SUDO_PASSWORD` in `~/.hermes/.env`.
:::warning
On messaging platforms, if sudo fails, the output includes a tip to add `SUDO_PASSWORD` to `~/.hermes/.env`.
:::

@@ -0,0 +1,89 @@
---
sidebar_position: 9
title: "Voice & TTS"
description: "Text-to-speech and voice message transcription across all platforms"
---
# Voice & TTS
Hermes Agent supports both text-to-speech output and voice message transcription across all messaging platforms.
## Text-to-Speech
Convert text to speech with three providers:
| Provider | Quality | Cost | API Key |
|----------|---------|------|---------|
| **Edge TTS** (default) | Good | Free | None needed |
| **ElevenLabs** | Excellent | Paid | `ELEVENLABS_API_KEY` |
| **OpenAI TTS** | Good | Paid | `VOICE_TOOLS_OPENAI_KEY` |
### Platform Delivery
| Platform | Delivery | Format |
|----------|----------|--------|
| Telegram | Voice bubble (plays inline) | Opus `.ogg` |
| Discord | Audio file attachment | MP3 |
| WhatsApp | Audio file attachment | MP3 |
| CLI | Saved to `~/voice-memos/` | MP3 |
### Configuration
```yaml
# In ~/.hermes/config.yaml
tts:
  provider: "edge" # "edge" | "elevenlabs" | "openai"
  edge:
    voice: "en-US-AriaNeural" # 322 voices, 74 languages
  elevenlabs:
    voice_id: "pNInz6obpgDQGcFmaJgB" # Adam
    model_id: "eleven_multilingual_v2"
  openai:
    model: "gpt-4o-mini-tts"
    voice: "alloy" # alloy, echo, fable, onyx, nova, shimmer
```
### Telegram Voice Bubbles & ffmpeg
Telegram voice bubbles require Opus/OGG audio format:
- **OpenAI and ElevenLabs** produce Opus natively — no extra setup
- **Edge TTS** (default) outputs MP3 and needs **ffmpeg** to convert:
```bash
# Ubuntu/Debian
sudo apt install ffmpeg
# macOS
brew install ffmpeg
# Fedora
sudo dnf install ffmpeg
```
Without ffmpeg, Edge TTS audio is sent as a regular audio file (playable, but shows as a rectangular player instead of a voice bubble).
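The MP3-to-Opus step and its fallback can be sketched as follows. The function names are hypothetical and the exact ffmpeg options Hermes uses are not stated in this page; the command shown is simply a standard way to transcode to Opus in an OGG container.

```python
import shutil


def ffmpeg_available() -> bool:
    """Check whether an ffmpeg binary can be found on PATH."""
    return shutil.which("ffmpeg") is not None


def opus_convert_command(mp3_path: str, ogg_path: str) -> list:
    """Build an ffmpeg command that transcodes MP3 audio to Opus in OGG."""
    return ["ffmpeg", "-y", "-i", mp3_path, "-c:a", "libopus", ogg_path]


def delivery_plan(mp3_path: str, ogg_path: str, have_ffmpeg: bool) -> dict:
    """Decide between a voice bubble (Opus) and a plain audio file (MP3)."""
    if have_ffmpeg:
        return {"kind": "voice", "command": opus_convert_command(mp3_path, ogg_path)}
    return {"kind": "audio", "command": None}  # Fall back to sending the MP3


if __name__ == "__main__":
    print(delivery_plan("reply.mp3", "reply.ogg", ffmpeg_available()))
```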
:::tip
If you want voice bubbles without installing ffmpeg, switch to the OpenAI or ElevenLabs provider.
:::
## Voice Message Transcription
Voice messages sent on Telegram, Discord, WhatsApp, or Slack are automatically transcribed and injected as text into the conversation. The agent sees the transcript as normal text.
| Provider | Model | Quality | Cost |
|----------|-------|---------|------|
| **OpenAI Whisper** | `whisper-1` (default) | Good | Low |
| **OpenAI GPT-4o** | `gpt-4o-mini-transcribe` | Better | Medium |
| **OpenAI GPT-4o** | `gpt-4o-transcribe` | Best | Higher |
Requires `VOICE_TOOLS_OPENAI_KEY` in `~/.hermes/.env`.
### Configuration
```yaml
# In ~/.hermes/config.yaml
stt:
  enabled: true
  model: "whisper-1"
```

@@ -0,0 +1,8 @@
{
"label": "Messaging Gateway",
"position": 3,
"link": {
"type": "doc",
"id": "user-guide/messaging/index"
}
}

@@ -0,0 +1,57 @@
---
sidebar_position: 3
title: "Discord"
description: "Set up Hermes Agent as a Discord bot"
---
# Discord Setup
Connect Hermes Agent to Discord to chat with it in DMs or server channels.
## Setup Steps
1. **Create a bot:** Go to the [Discord Developer Portal](https://discord.com/developers/applications), create an application, add a bot on the **Bot** page, and copy its token
2. **Enable intents:** Bot → Privileged Gateway Intents → enable **Message Content Intent**
3. **Get your user ID:** Enable Developer Mode in Discord settings, right-click your name → Copy User ID
4. **Invite to your server:** OAuth2 → URL Generator → scopes: `bot`, `applications.commands` → permissions: Send Messages, Read Message History, Attach Files
5. **Configure:** Run `hermes gateway setup` and select Discord, or add to `~/.hermes/.env` manually:
```bash
DISCORD_BOT_TOKEN=MTIz...
DISCORD_ALLOWED_USERS=YOUR_USER_ID
```
6. **Start the gateway:**
```bash
hermes gateway
```
## Optional: Home Channel
Set a default channel for cron job delivery:
```bash
DISCORD_HOME_CHANNEL=123456789012345678
DISCORD_HOME_CHANNEL_NAME="#bot-updates"
```
Or use `/sethome` in any Discord channel.
## Required Bot Permissions
When generating the invite URL, make sure to include:
- **Send Messages** — bot needs to reply
- **Read Message History** — for context
- **Attach Files** — for audio, images, and file outputs
## Voice Messages
Voice messages on Discord are automatically transcribed (requires `VOICE_TOOLS_OPENAI_KEY`). TTS audio is sent as MP3 file attachments.
## Security
:::warning
Set `DISCORD_ALLOWED_USERS` to the user IDs that may use the bot. Without an allowlist or DM pairing, the gateway denies all users by default.
:::

@@ -0,0 +1,204 @@
---
sidebar_position: 1
title: "Messaging Gateway"
description: "Chat with Hermes from Telegram, Discord, Slack, or WhatsApp — architecture and setup overview"
---
# Messaging Gateway
Chat with Hermes from Telegram, Discord, Slack, or WhatsApp. The gateway is a single background process that connects to all your configured platforms, handles sessions, runs cron jobs, and delivers voice messages.
## Architecture
```text
┌─────────────────────────────────────────────────────────────────┐
│ Hermes Gateway │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Telegram │ │ Discord │ │ WhatsApp │ │ Slack │ │
│ │ Adapter │ │ Adapter │ │ Adapter │ │ Adapter │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │ │
│ └─────────────┼────────────┼─────────────┘ │
│ │ │
│ ┌────────▼────────┐ │
│ │ Session Store │ │
│ │ (per-chat) │ │
│ └────────┬────────┘ │
│ │ │
│ ┌────────▼────────┐ │
│ │ AIAgent │ │
│ │ (run_agent) │ │
│ └─────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
Each platform adapter receives messages, routes them through a per-chat session store, and dispatches them to the AIAgent for processing. The gateway also runs the cron scheduler, ticking every 60 seconds to execute any due jobs.
## Quick Setup
The easiest way to configure messaging platforms is the interactive wizard:
```bash
hermes gateway setup # Interactive setup for all messaging platforms
```
This walks you through configuring each platform with arrow-key selection, shows which platforms are already configured, and offers to start/restart the gateway when done.
## Gateway Commands
```bash
hermes gateway # Run in foreground
hermes gateway setup # Configure messaging platforms interactively
hermes gateway install # Install as systemd service (Linux) / launchd (macOS)
hermes gateway start # Start the service
hermes gateway stop # Stop the service
hermes gateway status # Check service status
```
## Chat Commands (Inside Messaging)
| Command | Description |
|---------|-------------|
| `/new` or `/reset` | Start fresh conversation |
| `/model [name]` | Show or change the model |
| `/personality [name]` | Set a personality |
| `/retry` | Retry the last message |
| `/undo` | Remove the last exchange |
| `/status` | Show session info |
| `/stop` | Stop the running agent |
| `/sethome` | Set this chat as the home channel |
| `/compress` | Manually compress conversation context |
| `/usage` | Show token usage for this session |
| `/reload-mcp` | Reload MCP servers from config |
| `/update` | Update Hermes Agent to the latest version |
| `/help` | Show available commands |
| `/<skill-name>` | Invoke any installed skill |
## Session Management
### Session Persistence
Sessions persist across messages until they reset. The agent remembers your conversation context.
### Reset Policies
Sessions reset based on configurable policies:
| Policy | Default | Description |
|--------|---------|-------------|
| Daily | 4:00 AM | Reset at a specific hour each day |
| Idle | 120 min | Reset after N minutes of inactivity |
| Both | (combined) | Whichever triggers first |
Configure per-platform overrides in `~/.hermes/gateway.json`:
```json
{
  "reset_by_platform": {
    "telegram": { "mode": "idle", "idle_minutes": 240 },
    "discord": { "mode": "idle", "idle_minutes": 60 }
  }
}
```
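The three policies in the table can be expressed as a small decision function. This is a sketch under stated assumptions: only the `"idle"` mode string appears in the config above, so `"daily"` and `"both"` are guessed names, and the function `should_reset` with its parameters is hypothetical.

```python
from datetime import datetime, timedelta


def should_reset(mode: str, last_activity: datetime, now: datetime,
                 idle_minutes: int = 120, at_hour: int = 4) -> bool:
    """Return True when the session should start fresh under the given policy."""
    idle_expired = now - last_activity >= timedelta(minutes=idle_minutes)

    # Most recent daily reset boundary at or before "now"
    boundary = now.replace(hour=at_hour, minute=0, second=0, microsecond=0)
    if boundary > now:
        boundary -= timedelta(days=1)
    daily_expired = last_activity < boundary

    if mode == "idle":
        return idle_expired
    if mode == "daily":   # Assumed mode name
        return daily_expired
    if mode == "both":    # Assumed mode name: whichever triggers first
        return idle_expired or daily_expired
    raise ValueError(f"unknown reset mode: {mode}")
```

For example, with the defaults a session last used at 03:30 resets under `"daily"` when the next message arrives at 04:30, even though only an hour of idle time has passed.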
## Security
**By default, the gateway denies all users who are not in an allowlist or paired via DM.** This is the safe default for a bot with terminal access.
```bash
# Restrict to specific users (recommended):
TELEGRAM_ALLOWED_USERS=123456789,987654321
DISCORD_ALLOWED_USERS=123456789012345678
# Or explicitly allow all users (NOT recommended for bots with terminal access):
GATEWAY_ALLOW_ALL_USERS=true
```
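A deny-by-default check over these variables might look like the sketch below. The function names are hypothetical, the `{PLATFORM}_ALLOWED_USERS` lookup is inferred from the variable names above, and DM pairing is left out for brevity.

```python
from typing import Mapping, Optional


def parse_allowed_users(raw: Optional[str]) -> set:
    """Split a comma-separated allowlist, ignoring blanks and whitespace."""
    if not raw:
        return set()
    return {part.strip() for part in raw.split(",") if part.strip()}


def is_authorized(user_id: str, platform: str, env: Mapping[str, str]) -> bool:
    """Deny by default; allow only listed users or an explicit allow-all."""
    if env.get("GATEWAY_ALLOW_ALL_USERS", "").strip().lower() == "true":
        return True
    # Assumption: each platform reads {PLATFORM}_ALLOWED_USERS
    allowed = parse_allowed_users(env.get(f"{platform.upper()}_ALLOWED_USERS"))
    return user_id in allowed


if __name__ == "__main__":
    env = {"TELEGRAM_ALLOWED_USERS": "123456789, 987654321"}
    print(is_authorized("123456789", "telegram", env))
```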
### DM Pairing (Alternative to Allowlists)
Instead of manually configuring user IDs, unknown users receive a one-time pairing code when they DM the bot:
```bash
# The user sees: "Pairing code: XKGH5N7P"
# You approve them with:
hermes pairing approve telegram XKGH5N7P
# Other pairing commands:
hermes pairing list # View pending + approved users
hermes pairing revoke telegram 123456789 # Remove access
```
Pairing codes expire after 1 hour, are rate-limited, and use cryptographic randomness.
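As an illustration of the properties listed above, the following sketch generates a code with Python's `secrets` module and checks the one-hour expiry. The 8-character uppercase alphanumeric format is inferred from the example code `XKGH5N7P`; the helper names and the exact alphabet are assumptions, and rate limiting is not shown.

```python
import secrets
import string

# Assumed alphabet, inferred from the example code shown above
ALPHABET = string.ascii_uppercase + string.digits


def new_pairing_code(length: int = 8) -> str:
    """Generate a code using a cryptographically secure random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def is_expired(issued_at: float, now: float, ttl_seconds: int = 3600) -> bool:
    """Codes stop being valid once the TTL (1 hour by default) has passed."""
    return now - issued_at >= ttl_seconds


if __name__ == "__main__":
    print(new_pairing_code())
```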
## Interrupting the Agent
Send any message while the agent is working to interrupt it. Key behaviors:
- **In-progress terminal commands are killed immediately** (SIGTERM, then SIGKILL after 1s)
- **Tool calls are cancelled** — only the currently-executing one runs, the rest are skipped
- **Multiple messages are combined** — messages sent during interruption are joined into one prompt
- **`/stop` command** — interrupts without queuing a follow-up message
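The SIGTERM-then-SIGKILL behavior in the first bullet follows a common pattern, sketched here with Python's `subprocess` module. The function name `stop_process` is hypothetical and this is not the gateway's actual code; on POSIX systems `terminate()` sends SIGTERM and `kill()` sends SIGKILL.

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen, grace_seconds: float = 1.0) -> int:
    """Ask the process to exit, then force-kill it if it ignores the request."""
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        return proc.wait()


if __name__ == "__main__":
    # Start a long-running child process and interrupt it
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    stop_process(child)
    print("child exited:", child.poll() is not None)
```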
## Tool Progress Notifications
Control how much tool activity is displayed in `~/.hermes/config.yaml`:
```yaml
display:
  tool_progress: all # off | new | all | verbose
```
When enabled, the bot sends status messages as it works:
```text
💻 `ls -la`...
🔍 web_search...
📄 web_extract...
🐍 execute_code...
```
## Service Management
### Linux (systemd)
```bash
hermes gateway install # Install as user service
systemctl --user start hermes-gateway
systemctl --user stop hermes-gateway
systemctl --user status hermes-gateway
journalctl --user -u hermes-gateway -f
# Enable lingering (keeps running after logout)
sudo loginctl enable-linger $USER
```
### macOS (launchd)
```bash
hermes gateway install
launchctl start ai.hermes.gateway
launchctl stop ai.hermes.gateway
tail -f ~/.hermes/logs/gateway.log
```
## Platform-Specific Toolsets
Each platform has its own toolset:
| Platform | Toolset | Capabilities |
|----------|---------|--------------|
| CLI | `hermes-cli` | Full access |
| Telegram | `hermes-telegram` | Full tools including terminal |
| Discord | `hermes-discord` | Full tools including terminal |
| WhatsApp | `hermes-whatsapp` | Full tools including terminal |
| Slack | `hermes-slack` | Full tools including terminal |
## Next Steps
- [Telegram Setup](telegram.md)
- [Discord Setup](discord.md)
- [Slack Setup](slack.md)
- [WhatsApp Setup](whatsapp.md)

@@ -0,0 +1,57 @@
---
sidebar_position: 4
title: "Slack"
description: "Set up Hermes Agent as a Slack bot"
---
# Slack Setup
Connect Hermes Agent to Slack using Socket Mode for real-time communication.
## Setup Steps
1. **Create an app:** Go to [Slack API](https://api.slack.com/apps), create a new app
2. **Enable Socket Mode:** In app settings → Socket Mode → Enable
3. **Get tokens:**
- Bot Token (`xoxb-...`): OAuth & Permissions → Install to Workspace
- App Token (`xapp-...`): Basic Information → App-Level Tokens → Generate (with `connections:write` scope)
4. **Configure:** Run `hermes gateway setup` and select Slack, or add to `~/.hermes/.env` manually:
```bash
SLACK_BOT_TOKEN=xoxb-...
SLACK_APP_TOKEN=xapp-...
SLACK_ALLOWED_USERS=U01234ABCDE # Comma-separated Slack user IDs
```
5. **Start the gateway:**
```bash
hermes gateway
```
## Optional: Home Channel
Set a default channel for cron job delivery:
```bash
SLACK_HOME_CHANNEL=C01234567890
```
## Required Bot Scopes
Make sure your Slack app has these OAuth scopes:
- `chat:write` — Send messages
- `channels:history` — Read channel messages
- `im:history` — Read DM messages
- `files:write` — Upload files (audio, images)
## Voice Messages
Voice messages on Slack are automatically transcribed (requires `VOICE_TOOLS_OPENAI_KEY`). TTS audio is sent as file attachments.
## Security
:::warning
Set `SLACK_ALLOWED_USERS` to the user IDs that may use the bot. Without an allowlist or DM pairing, the gateway denies all users by default.
:::

@@ -0,0 +1,74 @@
---
sidebar_position: 2
title: "Telegram"
description: "Set up Hermes Agent as a Telegram bot"
---
# Telegram Setup
Connect Hermes Agent to Telegram so you can chat from your phone, send voice memos, and receive scheduled task results.
## Setup Steps
1. **Create a bot:** Message [@BotFather](https://t.me/BotFather) on Telegram, use `/newbot`
2. **Get your user ID:** Message [@userinfobot](https://t.me/userinfobot) — it replies with your numeric ID
3. **Configure:** Run `hermes gateway setup` and select Telegram, or add to `~/.hermes/.env` manually:
```bash
TELEGRAM_BOT_TOKEN=123456:ABC-DEF...
TELEGRAM_ALLOWED_USERS=YOUR_USER_ID # Comma-separated for multiple users
```
4. **Start the gateway:**
```bash
hermes gateway
```
## Optional: Home Channel
Set a home channel for cron job delivery:
```bash
TELEGRAM_HOME_CHANNEL=-1001234567890
TELEGRAM_HOME_CHANNEL_NAME="My Notes"
```
Or use the `/sethome` command in any Telegram chat to set it dynamically.
## Voice Messages
Voice messages sent on Telegram are automatically transcribed using OpenAI's Whisper API and injected as text into the conversation. Requires `VOICE_TOOLS_OPENAI_KEY` in `~/.hermes/.env`.
### Voice Bubbles (TTS)
When the agent generates audio via text-to-speech, it's delivered as native Telegram voice bubbles (the round, inline-playable kind).
- **OpenAI and ElevenLabs** produce Opus natively — no extra setup needed
- **Edge TTS** (the default free provider) outputs MP3 and needs **ffmpeg** to convert to Opus:
```bash
# Ubuntu/Debian
sudo apt install ffmpeg
# macOS
brew install ffmpeg
```
Without ffmpeg, Edge TTS audio is sent as a regular audio file (still playable, but rectangular player instead of voice bubble).
## Exec Approval
When the agent tries to run a potentially dangerous command, it asks you for approval in the chat:
> ⚠️ This command is potentially dangerous (recursive delete). Reply "yes" to approve.
Reply "yes"/"y" to approve or "no"/"n" to deny.
## Security
:::warning
Set `TELEGRAM_ALLOWED_USERS` to the user IDs that may use the bot. Without an allowlist or DM pairing, the gateway denies all users by default.
:::
You can also use [DM pairing](/user-guide/messaging#dm-pairing-alternative-to-allowlists) for a more dynamic approach.

@@ -0,0 +1,77 @@
---
sidebar_position: 5
title: "WhatsApp"
description: "Set up Hermes Agent as a WhatsApp bot via the built-in Baileys bridge"
---
# WhatsApp Setup
WhatsApp doesn't have a simple bot API like Telegram or Discord. Hermes includes a built-in bridge using [Baileys](https://github.com/WhiskeySockets/Baileys) that connects via WhatsApp Web.
## Two Modes
| Mode | How it works | Best for |
|------|-------------|----------|
| **Separate bot number** (recommended) | Dedicate a phone number to the bot. People message that number directly. | Clean UX, multiple users |
| **Personal self-chat** | Use your own WhatsApp. You message yourself to talk to the agent. | Quick setup, single user |
## Setup
```bash
hermes whatsapp
```
The wizard will:
1. Ask which mode you want
2. For **bot mode**: guide you through getting a second number
3. Configure the allowlist
4. Install bridge dependencies (Node.js required)
5. Display a QR code — scan from WhatsApp → Settings → Linked Devices → Link a Device
6. Exit once paired
## Getting a Second Number (Bot Mode)
| Option | Cost | Notes |
|--------|------|-------|
| WhatsApp Business app + dual-SIM | Free (if you have dual-SIM) | Install alongside personal WhatsApp, no second phone needed |
| Google Voice | Free (US only) | voice.google.com, verify WhatsApp via the Google Voice app |
| Prepaid SIM | $3-10/month | Any carrier; verify once, phone can go in a drawer on WiFi |
## Starting the Gateway
```bash
hermes gateway # Foreground
hermes gateway install # Or install as a system service
```
The gateway starts the WhatsApp bridge automatically using the saved session.
## Environment Variables
```bash
WHATSAPP_ENABLED=true
WHATSAPP_MODE=bot # "bot" or "self-chat"
WHATSAPP_ALLOWED_USERS=15551234567 # Comma-separated phone numbers with country code
```
## Important Notes
- Agent responses are prefixed with "⚕ **Hermes Agent**" for easy identification
- WhatsApp Web sessions can disconnect when WhatsApp updates its protocol; the gateway reconnects automatically after temporary drops
- If failures persist, re-pair with `hermes whatsapp`
:::info Re-pairing
If WhatsApp Web sessions disconnect (protocol updates, phone reset), re-pair with `hermes whatsapp`. The gateway handles temporary disconnections automatically.
:::
## Voice Messages
Voice messages sent on WhatsApp are automatically transcribed (requires `VOICE_TOOLS_OPENAI_KEY`). TTS audio is sent as MP3 file attachments.
## Security
:::warning
Always set `WHATSAPP_ALLOWED_USERS` with phone numbers (including country code) to restrict who can use the bot.
:::