From 7e0c2c3ce3afa8c80467609edd9084431391a33c Mon Sep 17 00:00:00 2001
From: Teknium <127238744+teknium1@users.noreply.github.com>
Date: Mon, 30 Mar 2026 17:15:21 -0700
Subject: [PATCH] docs: comprehensive documentation audit — fix 9 HIGH, 20+ MEDIUM gaps (#4087)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Reference docs fixes:
- cli-commands.md: remove non-existent --provider alibaba, add hermes profile/completion/plugins/mcp to top-level table, add --profile/-p global flag, add --source chat option
- slash-commands.md: add /yolo and /commands, fix /q alias conflict (resolves to /queue not /quit), add missing aliases (/bg, /set-home, /reload_mcp, /gateway)
- toolsets-reference.md: fix hermes-api-server (not same as hermes-cli, omits clarify/send_message/text_to_speech)
- profile-commands.md: fix show name required not optional, --clone-from not --from, add --remove/--name to alias, fix alias path, fix export/import arg types, remove non-existent fish completion
- tools-reference.md: add EXA_API_KEY to web tools requires_env
- mcp-config-reference.md: add auth key for OAuth, tool name sanitization
- environment-variables.md: add EXA_API_KEY, update provider values
- plugins.md: remove non-existent ctx.register_command(), add ctx.inject_message()

Feature docs additions:
- security.md: add /yolo mode, approval modes (manual/smart/off), configurable timeout, expanded dangerous patterns table
- cron.md: add wrap_response config, [SILENT] suppression
- mcp.md: add dynamic tool discovery, MCP sampling support
- cli.md: add Ctrl+Z suspend, busy_input_mode, tool_preview_length
- docker.md: add skills/credential file mounting

Messaging platform docs:
- telegram.md: add webhook mode, DoH fallback IPs
- slack.md: add multi-workspace OAuth support
- discord.md: add DISCORD_IGNORE_NO_MENTION
- matrix.md: add MSC3245 native voice messages
- feishu.md: expand from 129 to 365 lines (encrypt key, verification token, group policy, card actions, media, rate limiting, markdown, troubleshooting)
- wecom.md: expand from 86 to 264 lines (per-group allowlists, media, AES decryption, stream replies, reconnection, troubleshooting)

Configuration docs:
- quickstart.md: add DeepSeek, Copilot, Copilot ACP providers
- configuration.md: add DeepSeek provider, Exa web backend, terminal env_passthrough/images, browser.command_timeout, compression params, discord config, security/tirith config, timezone, auxiliary models

21 files changed, ~1000 lines added
---
 website/docs/getting-started/quickstart.md | 3 +
 website/docs/reference/cli-commands.md | 58 ++++-
 .../docs/reference/environment-variables.md | 3 +-
 .../docs/reference/mcp-config-reference.md | 32 +++
 website/docs/reference/profile-commands.md | 55 ++--
 website/docs/reference/slash-commands.md | 22 +-
 website/docs/reference/tools-reference.md | 4 +-
 website/docs/reference/toolsets-reference.md | 2 +-
 website/docs/user-guide/cli.md | 40 +++
 website/docs/user-guide/configuration.md | 96 ++++++-
 website/docs/user-guide/docker.md | 6 +
 website/docs/user-guide/features/cron.md | 34 +++
 website/docs/user-guide/features/mcp.md | 43 ++-
 website/docs/user-guide/features/plugins.md | 55 ++--
 website/docs/user-guide/messaging/discord.md | 4 +
 website/docs/user-guide/messaging/feishu.md | 246 +++++++++++++++++-
 website/docs/user-guide/messaging/matrix.md | 1 +
 website/docs/user-guide/messaging/slack.md | 54 ++++
 website/docs/user-guide/messaging/telegram.md | 67 +++++
 website/docs/user-guide/messaging/wecom.md | 186 ++++++++++++-
 website/docs/user-guide/security.md | 76 +++++-
 21 files changed, 1004 insertions(+), 83 deletions(-)

diff --git a/website/docs/getting-started/quickstart.md b/website/docs/getting-started/quickstart.md index 27cee7084..bc182f655 100644 --- a/website/docs/getting-started/quickstart.md +++ b/website/docs/getting-started/quickstart.md @@ -54,6 +54,9 @@ hermes setup # Or configure
everything at once | **Kilo Code** | KiloCode-hosted models | Set `KILOCODE_API_KEY` | | **OpenCode Zen** | Pay-as-you-go access to curated models | Set `OPENCODE_ZEN_API_KEY` | | **OpenCode Go** | $10/month subscription for open models | Set `OPENCODE_GO_API_KEY` | +| **DeepSeek** | Direct DeepSeek API access | Set `DEEPSEEK_API_KEY` | +| **GitHub Copilot** | GitHub Copilot subscription (GPT-5.x, Claude, Gemini, etc.) | OAuth via `hermes model`, or `COPILOT_GITHUB_TOKEN` / `GH_TOKEN` | +| **GitHub Copilot ACP** | Copilot ACP agent backend (spawns local `copilot` CLI) | `hermes model` (requires `copilot` CLI + `copilot login`) | | **Vercel AI Gateway** | Vercel AI Gateway routing | Set `AI_GATEWAY_API_KEY` | | **Custom Endpoint** | VLLM, SGLang, Ollama, or any OpenAI-compatible API | Set base URL + API key | diff --git a/website/docs/reference/cli-commands.md b/website/docs/reference/cli-commands.md index a9f12d76b..cd0cff39c 100644 --- a/website/docs/reference/cli-commands.md +++ b/website/docs/reference/cli-commands.md @@ -21,6 +21,7 @@ hermes [global-options] [subcommand/options] | Option | Description | |--------|-------------| | `--version`, `-V` | Show version and exit. | +| `--profile `, `-p ` | Select which Hermes profile to use for this invocation. Overrides the sticky default set by `hermes profile use`. | | `--resume `, `-r ` | Resume a previous session by ID or title. | | `--continue [name]`, `-c [name]` | Resume the most recent session, or the most recent session matching a title. | | `--worktree`, `-w` | Start in an isolated git worktree for parallel-agent workflows. | @@ -46,10 +47,14 @@ hermes [global-options] [subcommand/options] | `hermes skills` | Browse, install, publish, audit, and configure skills. | | `hermes honcho` | Manage Honcho cross-session memory integration. | | `hermes acp` | Run Hermes as an ACP server for editor integration. | +| `hermes mcp` | Manage MCP server configurations and run Hermes as an MCP server. 
| +| `hermes plugins` | Manage Hermes Agent plugins (install, enable, disable, remove). | | `hermes tools` | Configure enabled tools per platform. | | `hermes sessions` | Browse, export, prune, rename, and delete sessions. | | `hermes insights` | Show token/cost/activity analytics. | | `hermes claw` | OpenClaw migration helpers. | +| `hermes profile` | Manage profiles — multiple isolated Hermes instances. | +| `hermes completion` | Print shell completion scripts (bash/zsh). | | `hermes version` | Show version information. | | `hermes update` | Pull latest code and reinstall dependencies. | | `hermes uninstall` | Remove Hermes from the system. | @@ -67,7 +72,7 @@ Common options: | `-q`, `--query "..."` | One-shot, non-interactive prompt. | | `-m`, `--model ` | Override the model for this run. | | `-t`, `--toolsets ` | Enable a comma-separated set of toolsets. | -| `--provider ` | Force a provider: `auto`, `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, `anthropic`, `huggingface`, `alibaba`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `kilocode`. | +| `--provider ` | Force a provider: `auto`, `openrouter`, `nous`, `openai-codex`, `copilot-acp`, `copilot`, `anthropic`, `huggingface`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `kilocode`. | | `-s`, `--skills ` | Preload one or more skills for the session (can be repeated or comma-separated). | | `-v`, `--verbose` | Verbose output. | | `-Q`, `--quiet` | Programmatic mode: suppress banner/spinner/tool previews. | @@ -76,6 +81,7 @@ Common options: | `--checkpoints` | Enable filesystem checkpoints before destructive file changes. | | `--yolo` | Skip approval prompts. | | `--pass-session-id` | Pass the session ID into the system prompt. | +| `--source ` | Session source tag for filtering (default: `cli`). Use `tool` for third-party integrations that should not appear in user session lists. 
| Examples: @@ -507,6 +513,56 @@ hermes claw migrate --preset user-data --overwrite hermes claw migrate --source /home/user/old-openclaw ``` +## `hermes profile` + +```bash +hermes profile +``` + +Manage profiles — multiple isolated Hermes instances, each with its own config, sessions, skills, and home directory. + +| Subcommand | Description | +|------------|-------------| +| `list` | List all profiles. | +| `use ` | Set a sticky default profile. | +| `create [--clone] [--no-alias]` | Create a new profile. `--clone` copies config, `.env`, and `SOUL.md` from the active profile. | +| `delete [-y]` | Delete a profile. | +| `show ` | Show profile details (home directory, config, etc.). | +| `alias [--remove] [--name NAME]` | Manage wrapper scripts for quick profile access. | +| `rename ` | Rename a profile. | +| `export [-o FILE]` | Export a profile to a `.tar.gz` archive. | +| `import [--name NAME]` | Import a profile from a `.tar.gz` archive. | + +Examples: + +```bash +hermes profile list +hermes profile create work --clone +hermes profile use work +hermes profile alias work --name h-work +hermes profile export work -o work-backup.tar.gz +hermes profile import work-backup.tar.gz --name restored +hermes -p work chat -q "Hello from work profile" +``` + +## `hermes completion` + +```bash +hermes completion [bash|zsh] +``` + +Print a shell completion script to stdout. Source the output in your shell profile for tab-completion of Hermes commands, subcommands, and profile names. 
+ +Examples: + +```bash +# Bash +hermes completion bash >> ~/.bashrc + +# Zsh +hermes completion zsh >> ~/.zshrc +``` + ## Maintenance commands | Command | Description | diff --git a/website/docs/reference/environment-variables.md b/website/docs/reference/environment-variables.md index 715c9fbc1..d94121481 100644 --- a/website/docs/reference/environment-variables.md +++ b/website/docs/reference/environment-variables.md @@ -63,7 +63,7 @@ For native Anthropic auth, Hermes prefers Claude Code's own credential files whe | Variable | Description | |----------|-------------| -| `HERMES_INFERENCE_PROVIDER` | Override provider selection: `auto`, `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, `anthropic`, `huggingface`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `kilocode`, `alibaba` (default: `auto`) | +| `HERMES_INFERENCE_PROVIDER` | Override provider selection: `auto`, `openrouter`, `nous`, `openai-codex`, `copilot`, `copilot-acp`, `anthropic`, `huggingface`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `kilocode`, `alibaba`, `deepseek`, `opencode-zen`, `opencode-go`, `ai-gateway` (default: `auto`) | | `HERMES_PORTAL_BASE_URL` | Override Nous Portal URL (for development/testing) | | `NOUS_INFERENCE_BASE_URL` | Override Nous inference API URL | | `HERMES_NOUS_MIN_KEY_TTL_SECONDS` | Min agent key TTL before re-mint (default: 1800 = 30min) | @@ -80,6 +80,7 @@ For native Anthropic auth, Hermes prefers Claude Code's own credential files whe | `FIRECRAWL_API_KEY` | Web scraping ([firecrawl.dev](https://firecrawl.dev/)) | | `FIRECRAWL_API_URL` | Custom Firecrawl API endpoint for self-hosted instances (optional) | | `TAVILY_API_KEY` | Tavily API key for AI-native web search, extract, and crawl ([app.tavily.com](https://app.tavily.com/home)) | +| `EXA_API_KEY` | Exa API key for AI-native web search and contents ([exa.ai](https://exa.ai/)) | | `BROWSERBASE_API_KEY` | Browser automation ([browserbase.com](https://browserbase.com/)) | | 
`BROWSERBASE_PROJECT_ID` | Browserbase project ID | | `BROWSER_USE_API_KEY` | Browser Use cloud browser API key ([browser-use.com](https://browser-use.com/)) | diff --git a/website/docs/reference/mcp-config-reference.md b/website/docs/reference/mcp-config-reference.md index 5f78185b9..a87478f91 100644 --- a/website/docs/reference/mcp-config-reference.md +++ b/website/docs/reference/mcp-config-reference.md @@ -48,6 +48,8 @@ mcp_servers: | `timeout` | number | both | Tool call timeout | | `connect_timeout` | number | both | Initial connection timeout | | `tools` | mapping | both | Filtering and utility-tool policy | +| `auth` | string | HTTP | Authentication method. Set to `oauth` to enable OAuth 2.1 with PKCE | +| `sampling` | mapping | both | Server-initiated LLM request policy (see MCP guide) | ## `tools` policy keys @@ -213,3 +215,33 @@ Utility tools follow the same prefixing pattern: - `mcp__read_resource` - `mcp__list_prompts` - `mcp__get_prompt` + +### Name sanitization + +Hyphens (`-`) and dots (`.`) in both server names and tool names are replaced with underscores before registration. This ensures tool names are valid identifiers for LLM function-calling APIs. + +For example, a server named `my-api` exposing a tool called `list-items.v2` becomes: + +```text +mcp_my_api_list_items_v2 +``` + +Keep this in mind when writing `include` / `exclude` filters — use the **original** MCP tool name (with hyphens/dots), not the sanitized version. 
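A minimal Python sketch of the sanitization rule above, for illustration only. The helper names `sanitize_mcp_tool_name` and `select_tools` are hypothetical (not Hermes functions), and the sketch assumes the `mcp_{server}_{tool}` prefix shown in the `my-api` example and that an `include` filter is a plain list of original tool names.

```python
from typing import List, Optional


def sanitize_mcp_tool_name(server: str, tool: str) -> str:
    """Hypothetical helper: build the registered name for an MCP tool.

    Mirrors the documented rule: hyphens and dots in both the server name
    and the tool name become underscores, under an ``mcp_`` prefix.
    """

    def clean(part: str) -> str:
        return part.replace("-", "_").replace(".", "_")

    return f"mcp_{clean(server)}_{clean(tool)}"


def select_tools(server: str, tools: List[str],
                 include: Optional[List[str]] = None) -> List[str]:
    """Hypothetical helper: apply an include filter, then sanitize.

    The filter is matched against the ORIGINAL tool names (assumed to be a
    plain list here); sanitization happens only after filtering.
    """
    selected = [t for t in tools if include is None or t in include]
    return [sanitize_mcp_tool_name(server, t) for t in selected]


if __name__ == "__main__":
    tools = ["list-items.v2", "delete-item"]
    # Filtering on the original name keeps the tool.
    print(select_tools("my-api", tools, include=["list-items.v2"]))  # ['mcp_my_api_list_items_v2']
    # Filtering on the sanitized name matches nothing.
    print(select_tools("my-api", tools, include=["list_items_v2"]))  # []
```

The ordering is the point of the sketch: filters see the original names, and only the tools that survive filtering are rewritten into underscore form.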
+ +## OAuth 2.1 authentication + +For HTTP servers that require OAuth, set `auth: oauth` on the server entry: + +```yaml +mcp_servers: + protected_api: + url: "https://mcp.example.com/mcp" + auth: oauth +``` + +Behavior: +- Hermes uses the MCP SDK's OAuth 2.1 PKCE flow (metadata discovery, dynamic client registration, token exchange, and refresh) +- On first connect, a browser window opens for authorization +- Tokens are persisted to `~/.hermes/mcp-tokens/.json` and reused across sessions +- Token refresh is automatic; re-authorization only happens when refresh fails +- Only applies to HTTP/StreamableHTTP transport (`url`-based servers) diff --git a/website/docs/reference/profile-commands.md b/website/docs/reference/profile-commands.md index a59e27574..d2d7adb8f 100644 --- a/website/docs/reference/profile-commands.md +++ b/website/docs/reference/profile-commands.md @@ -78,7 +78,7 @@ Creates a new profile. | `` | Name for the new profile. Must be a valid directory name (alphanumeric, hyphens, underscores). | | `--clone` | Copy `config.yaml`, `.env`, and `SOUL.md` from the current profile. | | `--clone-all` | Copy everything (config, memories, skills, sessions, state) from the current profile. | -| `--from ` | Clone from a specific profile instead of the current one. Used with `--clone` or `--clone-all`. | +| `--clone-from ` | Clone from a specific profile instead of the current one. Used with `--clone` or `--clone-all`. 
| **Examples:** @@ -93,7 +93,7 @@ hermes profile create work --clone hermes profile create backup --clone-all # Clone config from a specific profile -hermes profile create work2 --clone --from work +hermes profile create work2 --clone --clone-from work ``` ## `hermes profile delete` @@ -123,14 +123,14 @@ This permanently deletes the profile's entire directory including all config, me ## `hermes profile show` ```bash -hermes profile show [name] +hermes profile show ``` Displays details about a profile including its home directory, configured model, active platforms, and disk usage. | Argument | Description | |----------|-------------| -| `[name]` | Profile to inspect. Defaults to the current active profile if omitted. | +| `` | Profile to inspect. | **Example:** @@ -147,20 +147,28 @@ Disk: 48 MB ## `hermes profile alias` ```bash -hermes profile alias +hermes profile alias [options] ``` -Regenerates the shell alias script at `~/.local/bin/hermes-`. Useful if the alias was accidentally deleted or if you need to update it after moving your Hermes installation. +Regenerates the shell alias script at `~/.local/bin/`. Useful if the alias was accidentally deleted or if you need to update it after moving your Hermes installation. -| Argument | Description | -|----------|-------------| +| Argument / Option | Description | +|-------------------|-------------| | `` | Profile to create/update the alias for. | +| `--remove` | Remove the wrapper script instead of creating it. | +| `--name ` | Custom alias name (default: profile name). 
| **Example:** ```bash hermes profile alias work # Creates/updates ~/.local/bin/work + +hermes profile alias work --name mywork +# Creates ~/.local/bin/mywork + +hermes profile alias work --remove +# Removes the wrapper script ``` ## `hermes profile rename` @@ -187,39 +195,45 @@ hermes profile rename mybot assistant ## `hermes profile export` ```bash -hermes profile export +hermes profile export [options] ``` Exports a profile as a compressed tar.gz archive. -| Argument | Description | -|----------|-------------| +| Argument / Option | Description | +|-------------------|-------------| | `` | Profile to export. | -| `` | Path for the output archive (e.g., `./work-backup.tar.gz`). | +| `-o`, `--output ` | Output file path (default: `.tar.gz`). | **Example:** ```bash -hermes profile export work ./work-2026-03-29.tar.gz +hermes profile export work +# Creates work.tar.gz in the current directory + +hermes profile export work -o ./work-2026-03-29.tar.gz ``` ## `hermes profile import` ```bash -hermes profile import [name] +hermes profile import [options] ``` Imports a profile from a tar.gz archive. -| Argument | Description | -|----------|-------------| -| `` | Path to the tar.gz archive to import. | -| `[name]` | Name for the imported profile. Defaults to the original profile name from the archive. | +| Argument / Option | Description | +|-------------------|-------------| +| `` | Path to the tar.gz archive to import. | +| `--name ` | Name for the imported profile (default: inferred from archive). | **Example:** ```bash -hermes profile import ./work-2026-03-29.tar.gz work-restored +hermes profile import ./work-2026-03-29.tar.gz +# Infers profile name from the archive + +hermes profile import ./work-2026-03-29.tar.gz --name work-restored ``` ## `hermes -p` / `hermes --profile` @@ -254,7 +268,7 @@ Generates shell completion scripts. 
Includes completions for profile names and p | Argument | Description | |----------|-------------| -| `` | Shell to generate completions for: `bash`, `zsh`, or `fish`. | +| `` | Shell to generate completions for: `bash` or `zsh`. | **Examples:** @@ -262,7 +276,6 @@ Generates shell completion scripts. Includes completions for profile names and p # Install completions hermes completion bash >> ~/.bashrc hermes completion zsh >> ~/.zshrc -hermes completion fish > ~/.config/fish/completions/hermes.fish # Reload shell source ~/.bashrc diff --git a/website/docs/reference/slash-commands.md b/website/docs/reference/slash-commands.md index 70b15efa9..94e413445 100644 --- a/website/docs/reference/slash-commands.md +++ b/website/docs/reference/slash-commands.md @@ -31,10 +31,10 @@ Type `/` in the CLI to open the autocomplete menu. Built-in commands are case-in | `/compress` | Manually compress conversation context (flush memories + summarize) | | `/rollback` | List or restore filesystem checkpoints (usage: /rollback [number]) | | `/stop` | Kill all running background processes | -| `/queue ` (alias: `/q`) | Queue a prompt for the next turn (doesn't interrupt the current agent response) | +| `/queue ` (alias: `/q`) | Queue a prompt for the next turn (doesn't interrupt the current agent response). **Note:** `/q` is claimed by both `/queue` and `/quit`; the last registration wins, so `/q` resolves to `/quit` in practice. Use `/queue` explicitly. | | `/resume [name]` | Resume a previously-named session | | `/statusbar` (alias: `/sb`) | Toggle the context/model status bar on or off | -| `/background ` | Run a prompt in a separate background session. The agent processes your prompt independently — your current session stays free for other work. Results appear as a panel when the task finishes. See [CLI Background Sessions](/docs/user-guide/cli#background-sessions). | +| `/background ` (alias: `/bg`) | Run a prompt in a separate background session. 
The agent processes your prompt independently — your current session stays free for other work. Results appear as a panel when the task finishes. See [CLI Background Sessions](/docs/user-guide/cli#background-sessions). | | `/plan [request]` | Load the bundled `plan` skill to write a markdown plan instead of executing the work. Plans are saved under `.hermes/plans/` relative to the active workspace/backend working directory. | ### Configuration @@ -50,6 +50,7 @@ Type `/` in the CLI to open the autocomplete menu. Built-in commands are case-in | `/reasoning` | Manage reasoning effort and display (usage: /reasoning [level\|show\|hide]) | | `/skin` | Show or change the display skin/theme | | `/voice [on\|off\|tts\|status]` | Toggle CLI voice mode and spoken playback. Recording uses `voice.record_key` (default: `Ctrl+B`). | +| `/yolo` | Toggle YOLO mode — skip all dangerous command approval prompts. | ### Tools & Skills @@ -60,7 +61,7 @@ Type `/` in the CLI to open the autocomplete menu. Built-in commands are case-in | `/browser [connect\|disconnect\|status]` | Manage local Chrome CDP connection. `connect` attaches browser tools to a running Chrome instance (default: `ws://localhost:9222`). `disconnect` detaches. `status` shows current connection. Auto-launches Chrome if no debugger is detected. | | `/skills` | Search, install, inspect, or manage skills from online registries | | `/cron` | Manage scheduled tasks (list, add/create, edit, pause, resume, run, remove) | -| `/reload-mcp` | Reload MCP servers from config.yaml | +| `/reload-mcp` (alias: `/reload_mcp`) | Reload MCP servers from config.yaml | | `/plugins` | List installed plugins and their status | ### Info @@ -70,14 +71,15 @@ Type `/` in the CLI to open the autocomplete menu. 
Built-in commands are case-in | `/help` | Show this help message | | `/usage` | Show token usage, cost breakdown, and session duration | | `/insights` | Show usage insights and analytics (last 30 days) | -| `/platforms` | Show gateway/messaging platform status | +| `/platforms` (alias: `/gateway`) | Show gateway/messaging platform status | | `/paste` | Check clipboard for an image and attach it | +| `/profile` | Show active profile name and home directory | ### Exit | Command | Description | |---------|-------------| -| `/quit` | Exit the CLI (also: /exit, /q) | +| `/quit` | Exit the CLI (also: `/exit`). See note on `/q` under `/queue` above. | ### Dynamic CLI slash commands @@ -105,7 +107,7 @@ The messaging gateway supports the following built-in commands inside Telegram, | `/personality [name]` | Set a personality overlay for the session. | | `/retry` | Retry the last message. | | `/undo` | Remove the last exchange. | -| `/sethome` | Mark the current chat as the platform home channel for deliveries. | +| `/sethome` (alias: `/set-home`) | Mark the current chat as the platform home channel for deliveries. | | `/compress` | Manually compress conversation context. | | `/title [name]` | Set or show the session title. | | `/resume [name]` | Resume a previously named session. | @@ -116,7 +118,9 @@ The messaging gateway supports the following built-in commands inside Telegram, | `/rollback [number]` | List or restore filesystem checkpoints. | | `/background ` | Run a prompt in a separate background session. Results are delivered back to the same chat when the task finishes. See [Messaging Background Sessions](/docs/user-guide/messaging/#background-sessions). | | `/plan [request]` | Load the bundled `plan` skill to write a markdown plan instead of executing the work. Plans are saved under `.hermes/plans/` relative to the active workspace/backend working directory. | -| `/reload-mcp` | Reload MCP servers from config. 
| +| `/reload-mcp` (alias: `/reload_mcp`) | Reload MCP servers from config. | +| `/yolo` | Toggle YOLO mode — skip all dangerous command approval prompts. | +| `/commands [page]` | Browse all commands and skills (paginated). | | `/approve [session\|always]` | Approve and execute a pending dangerous command. `session` approves for this session only; `always` adds to permanent allowlist. | | `/deny` | Reject a pending dangerous command. | | `/update` | Update Hermes Agent to the latest version. | @@ -127,6 +131,6 @@ The messaging gateway supports the following built-in commands inside Telegram, - `/skin`, `/tools`, `/toolsets`, `/browser`, `/config`, `/prompt`, `/cron`, `/skills`, `/platforms`, `/paste`, `/statusbar`, and `/plugins` are **CLI-only** commands. - `/verbose` is **CLI-only by default**, but can be enabled for messaging platforms by setting `display.tool_progress_command: true` in `config.yaml`. When enabled, it cycles the `display.tool_progress` mode and saves to config. -- `/status`, `/sethome`, `/update`, `/approve`, and `/deny` are **messaging-only** commands. -- `/background`, `/voice`, `/reload-mcp`, and `/rollback` work in **both** the CLI and the messaging gateway. +- `/status`, `/sethome`, `/update`, `/approve`, `/deny`, and `/commands` are **messaging-only** commands. +- `/background`, `/voice`, `/reload-mcp`, `/rollback`, and `/yolo` work in **both** the CLI and the messaging gateway. - `/voice join`, `/voice channel`, and `/voice leave` are only meaningful on Discord. diff --git a/website/docs/reference/tools-reference.md b/website/docs/reference/tools-reference.md index 9a30bab33..275dea4fe 100644 --- a/website/docs/reference/tools-reference.md +++ b/website/docs/reference/tools-reference.md @@ -151,8 +151,8 @@ This page documents the built-in Hermes tool registry as it exists in code. Avai | Tool | Description | Requires environment | |------|-------------|----------------------| -| `web_search` | Search the web for information on any topic. 
Returns up to 5 relevant results with titles, URLs, and descriptions. | PARALLEL_API_KEY or FIRECRAWL_API_KEY or TAVILY_API_KEY | -| `web_extract` | Extract content from web page URLs. Returns page content in markdown format. Also works with PDF URLs — pass the PDF link directly and it converts to markdown text. Pages under 5000 chars return full markdown; larger pages are LLM-summarized. | PARALLEL_API_KEY or FIRECRAWL_API_KEY or TAVILY_API_KEY | +| `web_search` | Search the web for information on any topic. Returns up to 5 relevant results with titles, URLs, and descriptions. | EXA_API_KEY or PARALLEL_API_KEY or FIRECRAWL_API_KEY or TAVILY_API_KEY | +| `web_extract` | Extract content from web page URLs. Returns page content in markdown format. Also works with PDF URLs — pass the PDF link directly and it converts to markdown text. Pages under 5000 chars return full markdown; larger pages are LLM-summarized. | EXA_API_KEY or PARALLEL_API_KEY or FIRECRAWL_API_KEY or TAVILY_API_KEY | ## `tts` toolset diff --git a/website/docs/reference/toolsets-reference.md b/website/docs/reference/toolsets-reference.md index 83cf92e4c..7999acc01 100644 --- a/website/docs/reference/toolsets-reference.md +++ b/website/docs/reference/toolsets-reference.md @@ -19,7 +19,7 @@ Toolsets are named bundles of tools that you can enable with `hermes chat --tool | `file` | core | `patch`, `read_file`, `search_files`, `write_file` | | `hermes-acp` | platform | `browser_back`, `browser_click`, `browser_close`, `browser_console`, `browser_get_images`, `browser_navigate`, `browser_press`, `browser_scroll`, `browser_snapshot`, `browser_type`, `browser_vision`, `delegate_task`, `execute_code`, `memory`, `patch`, `process`, `read_file`, `search_files`, `session_search`, `skill_manage`, `skill_view`, `skills_list`, `terminal`, `todo`, `vision_analyze`, `web_extract`, `web_search`, `write_file` | | `hermes-cli` | platform | `browser_back`, `browser_click`, `browser_close`, `browser_console`, 
`browser_get_images`, `browser_navigate`, `browser_press`, `browser_scroll`, `browser_snapshot`, `browser_type`, `browser_vision`, `clarify`, `cronjob`, `delegate_task`, `execute_code`, `ha_call_service`, `ha_get_state`, `ha_list_entities`, `ha_list_services`, `honcho_conclude`, `honcho_context`, `honcho_profile`, `honcho_search`, `image_generate`, `memory`, `mixture_of_agents`, `patch`, `process`, `read_file`, `search_files`, `send_message`, `session_search`, `skill_manage`, `skill_view`, `skills_list`, `terminal`, `text_to_speech`, `todo`, `vision_analyze`, `web_extract`, `web_search`, `write_file` | -| `hermes-api-server` | platform | _(same as hermes-cli)_ | +| `hermes-api-server` | platform | `browser_back`, `browser_click`, `browser_close`, `browser_console`, `browser_get_images`, `browser_navigate`, `browser_press`, `browser_scroll`, `browser_snapshot`, `browser_type`, `browser_vision`, `cronjob`, `delegate_task`, `execute_code`, `ha_call_service`, `ha_get_state`, `ha_list_entities`, `ha_list_services`, `honcho_conclude`, `honcho_context`, `honcho_profile`, `honcho_search`, `image_generate`, `memory`, `mixture_of_agents`, `patch`, `process`, `read_file`, `search_files`, `session_search`, `skill_manage`, `skill_view`, `skills_list`, `terminal`, `todo`, `vision_analyze`, `web_extract`, `web_search`, `write_file` | | `hermes-dingtalk` | platform | _(same as hermes-cli)_ | | `hermes-feishu` | platform | _(same as hermes-cli)_ | | `hermes-wecom` | platform | _(same as hermes-cli)_ | diff --git a/website/docs/user-guide/cli.md b/website/docs/user-guide/cli.md index 1c4857d71..e37b1ddba 100644 --- a/website/docs/user-guide/cli.md +++ b/website/docs/user-guide/cli.md @@ -94,6 +94,7 @@ When resuming a previous session (`hermes -c` or `hermes --resume `), a "Pre | `Ctrl+B` | Start/stop voice recording when voice mode is enabled (`voice.record_key`, default: `ctrl+b`) | | `Ctrl+C` | Interrupt agent (double-press within 2s to force exit) | | `Ctrl+D` | Exit | +| 
`Ctrl+Z` | Suspend Hermes to background (Unix only). Run `fg` in the shell to resume. | | `Tab` | Accept auto-suggestion (ghost text) or autocomplete slash commands | ## Slash Commands @@ -212,6 +213,33 @@ You can interrupt the agent at any point: - In-progress terminal commands are killed immediately (SIGTERM, then SIGKILL after 1s) - Multiple messages typed during interrupt are combined into one prompt +### Busy Input Mode + +The `display.busy_input_mode` config key controls what happens when you press Enter while the agent is working: + +| Mode | Behavior | +|------|----------| +| `"interrupt"` (default) | Your message interrupts the current operation and is processed immediately | +| `"queue"` | Your message is silently queued and sent as the next turn after the agent finishes | + +```yaml +# ~/.hermes/config.yaml +display: + busy_input_mode: "queue" # or "interrupt" (default) +``` + +Queue mode is useful when you want to prepare follow-up messages without accidentally canceling in-flight work. Unknown values fall back to `"interrupt"`. + +### Suspending to Background + +On Unix systems, press **`Ctrl+Z`** to suspend Hermes to the background — just like any terminal process. Hermes prints a confirmation before it suspends: + +``` +Hermes Agent has been suspended. Run `fg` to bring Hermes Agent back. +``` + +Type `fg` in your shell to resume the session exactly where you left off. This is not supported on Windows. + ## Tool Progress Display The CLI shows animated feedback as the agent works: @@ -232,6 +260,18 @@ The CLI shows animated feedback as the agent works: Cycle through display modes with `/verbose`: `off → new → all → verbose`. This command can also be enabled for messaging platforms — see [configuration](/docs/user-guide/configuration#display-settings). +### Tool Preview Length + +The `display.tool_preview_length` config key controls the maximum number of characters shown in tool call preview lines (e.g. file paths, terminal commands).
The default is `0`, which means no limit — full paths and commands are shown. + +```yaml +# ~/.hermes/config.yaml +display: + tool_preview_length: 80 # Truncate tool previews to 80 chars (0 = no limit) +``` + +This is useful on narrow terminals or when tool arguments contain very long file paths. + ## Session Management ### Resuming Sessions diff --git a/website/docs/user-guide/configuration.md b/website/docs/user-guide/configuration.md index c3aa96f53..b0ea0482d 100644 --- a/website/docs/user-guide/configuration.md +++ b/website/docs/user-guide/configuration.md @@ -92,6 +92,7 @@ You need at least one way to connect to an LLM. Use `hermes model` to switch pro | **Kilo Code** | `KILOCODE_API_KEY` in `~/.hermes/.env` (provider: `kilocode`) | | **OpenCode Zen** | `OPENCODE_ZEN_API_KEY` in `~/.hermes/.env` (provider: `opencode-zen`) | | **OpenCode Go** | `OPENCODE_GO_API_KEY` in `~/.hermes/.env` (provider: `opencode-go`) | +| **DeepSeek** | `DEEPSEEK_API_KEY` in `~/.hermes/.env` (provider: `deepseek`) | | **Hugging Face** | `HF_TOKEN` in `~/.hermes/.env` (provider: `huggingface`, aliases: `hf`) | | **Custom Endpoint** | `hermes model` (saved in `config.yaml`) or `OPENAI_BASE_URL` + `OPENAI_API_KEY` in `~/.hermes/.env` | @@ -706,6 +707,10 @@ terminal: backend: local # local | docker | ssh | modal | daytona | singularity cwd: "." # Working directory ("." = current dir for local, "/root" for containers) timeout: 180 # Per-command timeout in seconds + env_passthrough: [] # Env var names to forward to sandboxed execution (terminal + execute_code) + singularity_image: "docker://nikolaik/python-nodejs:python3.11-nodejs20" # Container image for Singularity backend + modal_image: "nikolaik/python-nodejs:python3.11-nodejs20" # Container image for Modal backend + daytona_image: "nikolaik/python-nodejs:python3.11-nodejs20" # Container image for Daytona backend ``` ### Backend Overview @@ -1012,6 +1017,8 @@ All compression settings live in `config.yaml` (no environment variables). 
compression: enabled: true # Toggle compression on/off threshold: 0.50 # Compress at this % of context limit + target_ratio: 0.20 # Fraction of threshold to preserve as recent tail + protect_last_n: 20 # Min recent messages to keep uncompressed summary_model: "google/gemini-3-flash-preview" # Model for summarization summary_provider: "auto" # Provider: "auto", "openrouter", "nous", "codex", "main", etc. summary_base_url: null # Custom OpenAI-compatible endpoint (overrides provider) @@ -1146,6 +1153,38 @@ auxiliary: # Context compression timeout (separate from compression.* config) compression: timeout: 120 # seconds — compression summarizes long conversations, needs more time + + # Session search — summarizes past session matches + session_search: + provider: "auto" + model: "" + base_url: "" + api_key: "" + timeout: 30 + + # Skills hub — skill matching and search + skills_hub: + provider: "auto" + model: "" + base_url: "" + api_key: "" + timeout: 30 + + # MCP tool dispatch + mcp: + provider: "auto" + model: "" + base_url: "" + api_key: "" + timeout: 30 + + # Memory flush — summarizes conversation for persistent memory + flush_memories: + provider: "auto" + model: "" + base_url: "" + api_key: "" + timeout: 30 ``` :::tip @@ -1340,6 +1379,7 @@ display: streaming: false # Stream tokens to terminal as they arrive (real-time output) background_process_notifications: all # all | result | error | off (gateway only) show_cost: false # Show estimated $ cost in the CLI status bar + tool_preview_length: 0 # Max chars for tool call previews (0 = no limit, show full paths/commands) ``` ### Theme mode @@ -1554,11 +1594,11 @@ code_execution: ## Web Search Backends -The `web_search`, `web_extract`, and `web_crawl` tools support three backend providers. Configure the backend in `config.yaml` or via `hermes tools`: +The `web_search`, `web_extract`, and `web_crawl` tools support four backend providers. 
Configure the backend in `config.yaml` or via `hermes tools`: ```yaml web: - backend: firecrawl # firecrawl | parallel | tavily + backend: firecrawl # firecrawl | parallel | tavily | exa ``` | Backend | Env Var | Search | Extract | Crawl | @@ -1566,8 +1606,9 @@ web: | **Firecrawl** (default) | `FIRECRAWL_API_KEY` | ✔ | ✔ | ✔ | | **Parallel** | `PARALLEL_API_KEY` | ✔ | ✔ | — | | **Tavily** | `TAVILY_API_KEY` | ✔ | ✔ | ✔ | +| **Exa** | `EXA_API_KEY` | ✔ | ✔ | — | -**Backend selection:** If `web.backend` is not set, the backend is auto-detected from available API keys. If only `TAVILY_API_KEY` is set, Tavily is used. If only `PARALLEL_API_KEY` is set, Parallel is used. Otherwise Firecrawl is the default. +**Backend selection:** If `web.backend` is not set, the backend is auto-detected from available API keys. If only `EXA_API_KEY` is set, Exa is used. If only `TAVILY_API_KEY` is set, Tavily is used. If only `PARALLEL_API_KEY` is set, Parallel is used. Otherwise Firecrawl is the default. **Self-hosted Firecrawl:** Set `FIRECRAWL_API_URL` to point at your own instance. When a custom URL is set, the API key becomes optional (set `USE_DB_AUTHENTICATION=false` on the server to disable auth). @@ -1580,11 +1621,60 @@ Configure browser automation behavior: ```yaml browser: inactivity_timeout: 120 # Seconds before auto-closing idle sessions + command_timeout: 30 # Timeout in seconds for browser commands (screenshot, navigate, etc.) record_sessions: false # Auto-record browser sessions as WebM videos to ~/.hermes/browser_recordings/ ``` The browser toolset supports multiple providers. See the [Browser feature page](/docs/user-guide/features/browser) for details on Browserbase, Browser Use, and local Chrome CDP setup. +## Timezone + +Override the server-local timezone with an IANA timezone string. Affects timestamps in logs, cron scheduling, and system prompt time injection. 
+ +```yaml +timezone: "America/New_York" # IANA timezone (default: "" = server-local time) +``` + +Supported values: any IANA timezone identifier (e.g. `America/New_York`, `Europe/London`, `Asia/Kolkata`, `UTC`). Leave empty or omit for server-local time. + +## Discord + +Configure Discord-specific behavior for the messaging gateway: + +```yaml +discord: + require_mention: true # Require @mention to respond in server channels + free_response_channels: "" # Comma-separated channel IDs where bot responds without @mention + auto_thread: true # Auto-create threads on @mention in channels +``` + +- `require_mention` — when `true` (default), the bot only responds in server channels when mentioned with `@BotName`. DMs always work without mention. +- `free_response_channels` — comma-separated list of channel IDs where the bot responds to every message without requiring a mention. +- `auto_thread` — when `true` (default), mentions in channels automatically create a thread for the conversation, keeping channels clean (similar to Slack threading). + +## Security + +Pre-execution security scanning and secret redaction: + +```yaml +security: + redact_secrets: true # Redact API key patterns in tool output and logs + tirith_enabled: true # Enable Tirith security scanning for terminal commands + tirith_path: "tirith" # Path to tirith binary (default: "tirith" in $PATH) + tirith_timeout: 5 # Seconds to wait for tirith scan before timing out + tirith_fail_open: true # Allow command execution if tirith is unavailable + website_blocklist: # See Website Blocklist section below + enabled: false + domains: [] + shared_files: [] +``` + +- `redact_secrets` — automatically detects and redacts patterns that look like API keys, tokens, and passwords in tool output before it enters the conversation context and logs. +- `tirith_enabled` — when `true`, terminal commands are scanned by [Tirith](https://github.com/StackGuardian/tirith) before execution to detect potentially dangerous operations. 
+- `tirith_path` — path to the tirith binary. Set this if tirith is installed in a non-standard location. +- `tirith_timeout` — maximum seconds to wait for a tirith scan. Commands proceed if the scan times out. +- `tirith_fail_open` — when `true` (default), commands are allowed to execute if tirith is unavailable or fails. Set to `false` to block commands when tirith cannot verify them. + ## Website Blocklist Block specific domains from being accessed by the agent's web and browser tools: diff --git a/website/docs/user-guide/docker.md b/website/docs/user-guide/docker.md index 229919774..3fb33a93f 100644 --- a/website/docs/user-guide/docker.md +++ b/website/docs/user-guide/docker.md @@ -54,3 +54,9 @@ docker run -d \ -v ~/.hermes:/opt/data \ nousresearch/hermes-agent ``` + +## Skills and credential files + +When Docker is the terminal execution backend (`terminal.backend: docker`, meaning the agent runs its commands inside a Docker sandbox, as opposed to running Hermes itself in a container as shown above), Hermes automatically bind-mounts the skills directory (`~/.hermes/skills/`) and any credential files declared by skills into the container as read-only volumes. This means skill scripts, templates, and references are available inside the sandbox without manual configuration. + +The SSH and Modal backends receive the same files by a different route: skills and credential files are uploaded via rsync or the Modal mount API before each command. diff --git a/website/docs/user-guide/features/cron.md b/website/docs/user-guide/features/cron.md index 2d0a4c836..f8b1d2c5a 100644 --- a/website/docs/user-guide/features/cron.md +++ b/website/docs/user-guide/features/cron.md @@ -193,6 +193,40 @@ When scheduling jobs, you specify where the output goes: The agent's final response is automatically delivered. You do not need to call `send_message` in the cron prompt.
+### Response wrapping + +By default, delivered cron output is wrapped with a header and footer so the recipient knows it came from a scheduled task: + +``` +Cronjob Response: Morning feeds +------------- + + + +Note: The agent cannot see this message, and therefore cannot respond to it. +``` + +To deliver the raw agent output without the wrapper, set `cron.wrap_response` to `false`: + +```yaml +# ~/.hermes/config.yaml +cron: + wrap_response: false +``` + +### Silent suppression + +If the agent's final response starts with `[SILENT]`, delivery is suppressed entirely. The output is still saved locally for audit (in `~/.hermes/cron/output/`), but no message is sent to the delivery target. + +This is useful for monitoring jobs that should only report when something is wrong: + +```text +Check if nginx is running. If everything is healthy, respond with only [SILENT]. +Otherwise, report the issue. +``` + +Failed jobs always deliver regardless of the `[SILENT]` marker — only successful runs can be silenced. + ## Schedule formats The agent's final response is automatically delivered — you do **not** need to include `send_message` in the cron prompt for that same destination. If a cron run calls `send_message` to the exact target the scheduler will already deliver to, Hermes skips that duplicate send and tells the model to put the user-facing content in the final response instead. Use `send_message` only for additional or different targets. diff --git a/website/docs/user-guide/features/mcp.md b/website/docs/user-guide/features/mcp.md index 9b8326d46..b48f4f656 100644 --- a/website/docs/user-guide/features/mcp.md +++ b/website/docs/user-guide/features/mcp.md @@ -277,6 +277,14 @@ That keeps the tool list clean. Hermes discovers MCP servers at startup and registers their tools into the normal tool registry. +### Dynamic Tool Discovery + +MCP servers can notify Hermes when their available tools change at runtime by sending a `notifications/tools/list_changed` notification. 
When Hermes receives this notification, it automatically re-fetches the server's tool list and updates the registry — no manual `/reload-mcp` required. + +This is useful for MCP servers whose capabilities change dynamically (e.g. a server that adds tools when a new database schema is loaded, or removes tools when a service goes offline). + +The refresh is lock-protected so rapid-fire notifications from the same server don't cause overlapping refreshes. Prompt and resource change notifications (`notifications/prompts/list_changed`, `notifications/resources/list_changed`) are received but not yet acted on. + ### Reloading If you change MCP config, use: @@ -285,7 +293,7 @@ If you change MCP config, use: /reload-mcp ``` -This reloads MCP servers from config and refreshes the available tool list. +This reloads MCP servers from config and refreshes the available tool list. For runtime tool changes pushed by the server itself, see [Dynamic Tool Discovery](#dynamic-tool-discovery) above. ### Toolsets @@ -403,6 +411,39 @@ Because Hermes now only registers those wrappers when both are true: This is intentional and keeps the tool list honest. +## MCP Sampling Support + +MCP servers can request LLM inference from Hermes by sending a `sampling/createMessage` request. This allows an MCP server to ask Hermes to generate text on its behalf — useful for servers that need LLM capabilities but don't have their own model access. + +Sampling is **enabled by default** for all MCP servers (when the MCP SDK supports it).
Configure it per-server under the `sampling` key: + +```yaml +mcp_servers: + my_server: + command: "my-mcp-server" + sampling: + enabled: true # Enable sampling (default: true) + model: "openai/gpt-4o" # Override model for sampling requests (optional) + max_tokens_cap: 4096 # Max tokens per sampling response (default: 4096) + timeout: 30 # Timeout in seconds per request (default: 30) + max_rpm: 10 # Rate limit: max requests per minute (default: 10) + max_tool_rounds: 5 # Max tool-use rounds in sampling loops (default: 5) + allowed_models: [] # Allowlist of model names the server may request (empty = any) + log_level: "info" # Audit log level: debug, info, or warning (default: info) +``` + +The sampling handler includes a sliding-window rate limiter, per-request timeouts, and tool-loop depth limits to prevent runaway usage. Metrics (request count, errors, tokens used) are tracked per server instance. + +To disable sampling for a specific server: + +```yaml +mcp_servers: + untrusted_server: + url: "https://mcp.example.com" + sampling: + enabled: false +``` + ## Running Hermes as an MCP server In addition to connecting **to** MCP servers, Hermes can also **be** an MCP server. This lets other MCP-capable agents (Claude Code, Cursor, Codex, or any MCP client) use Hermes's messaging capabilities — list conversations, read message history, and send messages across all your connected platforms. diff --git a/website/docs/user-guide/features/plugins.md b/website/docs/user-guide/features/plugins.md index 0f2e20f17..28fc8041e 100644 --- a/website/docs/user-guide/features/plugins.md +++ b/website/docs/user-guide/features/plugins.md @@ -4,7 +4,7 @@ sidebar_position: 20 # Plugins -Hermes has a plugin system for adding custom tools, hooks, slash commands, and integrations without modifying core code. +Hermes has a plugin system for adding custom tools, hooks, and integrations without modifying core code. 
**→ [Build a Hermes Plugin](/docs/guides/build-a-hermes-plugin)** — step-by-step guide with a complete working example. @@ -30,7 +30,7 @@ Project-local plugins under `./.hermes/plugins/` are disabled by default. Enable |-----------|-----| | Add tools | `ctx.register_tool(name, schema, handler)` | | Add hooks | `ctx.register_hook("post_tool_call", callback)` | -| Add slash commands | `ctx.register_command("mycommand", handler)` | +| Inject messages | `ctx.inject_message(content, role="user")` — see [Injecting Messages](#injecting-messages) | | Ship data files | `Path(__file__).parent / "data" / "file.yaml"` | | Bundle skills | Copy `skill.md` to `~/.hermes/skills/` at load time | | Gate on env vars | `requires_env: [API_KEY]` in plugin.yaml | @@ -57,34 +57,6 @@ Plugins can register callbacks for these lifecycle events. See the **[Event Hook | `on_session_start` | New session created (first turn only) | | `on_session_end` | End of every `run_conversation` call | -## Slash commands - -Plugins can register slash commands that work in both CLI and messaging platforms: - -```python -def register(ctx): - ctx.register_command( - name="greet", - handler=lambda args: f"Hello, {args or 'world'}!", - description="Greet someone", - args_hint="[name]", - aliases=("hi",), - ) -``` - -The handler receives the argument string (everything after `/greet`) and returns a string to display. Registered commands automatically appear in `/help`, tab autocomplete, Telegram bot menu, and Slack subcommand mapping. - -| Parameter | Description | -|-----------|-------------| -| `name` | Command name without slash | -| `handler` | Callable that takes `args: str` and returns `str | None` | -| `description` | Shown in `/help` | -| `args_hint` | Usage hint, e.g. `"[name]"` | -| `aliases` | Tuple of alternative names | -| `cli_only` | Only available in CLI | -| `gateway_only` | Only available in messaging platforms | -| `gateway_config_gate` | Config dotpath (e.g. `"display.my_option"`). 
When set on a `cli_only` command, the command becomes available in the gateway if the config value is truthy. | - ## Managing plugins ```bash @@ -109,4 +81,27 @@ plugins: In a running session, `/plugins` shows which plugins are currently loaded. +## Injecting Messages + +Plugins can inject messages into the active conversation using `ctx.inject_message()`: + +```python +ctx.inject_message("New data arrived from the webhook", role="user") +``` + +**Signature:** `ctx.inject_message(content: str, role: str = "user") -> bool` + +How it works: + +- If the agent is **idle** (waiting for user input), the message is queued as the next input and starts a new turn. +- If the agent is **mid-turn** (actively running), the message interrupts the current operation — the same as a user typing a new message and pressing Enter. +- For non-`"user"` roles, the content is prefixed with `[role]` (e.g. `[system] ...`). +- Returns `True` if the message was queued successfully, `False` if no CLI reference is available (e.g. in gateway mode). + +This enables plugins like remote control viewers, messaging bridges, or webhook receivers to feed messages into the conversation from external sources. + +:::note +`inject_message` is only available in CLI mode. In gateway mode, there is no CLI reference and the method returns `False`. +::: + See the **[full guide](/docs/guides/build-a-hermes-plugin)** for handler contracts, schema format, hook behavior, error handling, and common mistakes. 
diff --git a/website/docs/user-guide/messaging/discord.md b/website/docs/user-guide/messaging/discord.md index df97930a6..2f40283ec 100644 --- a/website/docs/user-guide/messaging/discord.md +++ b/website/docs/user-guide/messaging/discord.md @@ -19,6 +19,7 @@ Before setup, here's the part most people want to know: how Hermes behaves once | **Free-response channels** | You can make specific channels mention-free with `DISCORD_FREE_RESPONSE_CHANNELS`, or disable mentions globally with `DISCORD_REQUIRE_MENTION=false`. | | **Threads** | Hermes replies in the same thread. Mention rules still apply unless that thread or its parent channel is configured as free-response. Threads stay isolated from the parent channel for session history. | | **Shared channels with multiple users** | By default, Hermes isolates session history per user inside the channel for safety and clarity. Two people talking in the same channel do not share one transcript unless you explicitly disable that. | +| **Messages mentioning other users** | When `DISCORD_IGNORE_NO_MENTION` is `true` (the default), Hermes stays silent if a message @mentions other users but does **not** mention the bot. This prevents the bot from jumping into conversations directed at other people. Set to `false` if you want the bot to respond to all messages regardless of who is mentioned. This only applies in server channels, not DMs. | :::tip If you want a normal bot-help channel where people can talk to Hermes without tagging it every time, add that channel to `DISCORD_FREE_RESPONSE_CHANNELS`. 
@@ -253,6 +254,9 @@ DISCORD_ALLOWED_USERS=284102345871466496 # Optional: channels where bot responds without @mention (comma-separated channel IDs) # DISCORD_FREE_RESPONSE_CHANNELS=1234567890,9876543210 + +# Optional: ignore messages that @mention other users but NOT the bot (default: true) +# DISCORD_IGNORE_NO_MENTION=true ``` Optional behavior settings in `~/.hermes/config.yaml`: diff --git a/website/docs/user-guide/messaging/feishu.md b/website/docs/user-guide/messaging/feishu.md index 1b7141e78..47901e353 100644 --- a/website/docs/user-guide/messaging/feishu.md +++ b/website/docs/user-guide/messaging/feishu.md @@ -18,7 +18,7 @@ The integration supports both connection modes: | Context | Behavior | |---------|----------| | Direct messages | Hermes responds to every message. | -| Group chats | Hermes responds when the bot is addressed in the chat. | +| Group chats | Hermes responds only when the bot is @mentioned in the chat. | | Shared group chats | By default, session history is isolated per user inside a shared chat. | This shared-chat behavior is controlled by `config.yaml`: @@ -46,12 +46,16 @@ Keep the App Secret private. Anyone with it can impersonate your app. ### Recommended: WebSocket mode -Use WebSocket mode when Hermes runs on your laptop, workstation, or a private server. No public URL is required. +Use WebSocket mode when Hermes runs on your laptop, workstation, or a private server. No public URL is required. The official Lark SDK opens and maintains a persistent outbound WebSocket connection with automatic reconnection. ```bash FEISHU_CONNECTION_MODE=websocket ``` +**Requirements:** The `websockets` Python package must be installed. The SDK handles connection lifecycle, heartbeats, and auto-reconnection internally. + +**How it works:** The adapter runs the Lark SDK's WebSocket client in a background executor thread. Inbound events (messages, reactions, card actions) are dispatched to the main asyncio loop. 
On disconnect, the SDK will attempt to reconnect automatically. + ### Optional: Webhook mode Use webhook mode only when you already run Hermes behind a reachable HTTP endpoint. @@ -60,12 +64,24 @@ Use webhook mode only when you already run Hermes behind a reachable HTTP endpoi FEISHU_CONNECTION_MODE=webhook ``` -In webhook mode, Hermes serves a Feishu endpoint at: +In webhook mode, Hermes starts an HTTP server (via `aiohttp`) and serves a Feishu endpoint at: ```text /feishu/webhook ``` +**Requirements:** The `aiohttp` Python package must be installed. + +You can customize the webhook server bind address and path: + +```bash +FEISHU_WEBHOOK_HOST=127.0.0.1 # default: 127.0.0.1 +FEISHU_WEBHOOK_PORT=8765 # default: 8765 +FEISHU_WEBHOOK_PATH=/feishu/webhook # default: /feishu/webhook +``` + +When Feishu sends a URL verification challenge (`type: url_verification`), the webhook responds automatically so you can complete the subscription setup in the Feishu developer console. + ## Step 3: Configure Hermes ### Option A: Interactive Setup @@ -116,13 +132,233 @@ FEISHU_HOME_CHANNEL=oc_xxx ## Security -For production use, set an allowlist: +### User Allowlist + +For production use, set an allowlist of Feishu Open IDs: ```bash FEISHU_ALLOWED_USERS=ou_xxx,ou_yyy ``` -If you leave the allowlist empty, anyone who can reach the bot may be able to use it. +If you leave the allowlist empty, anyone who can reach the bot may be able to use it. In group chats, the allowlist is checked against the sender's open_id before the message is processed. + +### Webhook Encryption Key + +When running in webhook mode, set an encryption key to enable signature verification of inbound webhook payloads: + +```bash +FEISHU_ENCRYPT_KEY=your-encrypt-key +``` + +This key is found in the **Event Subscriptions** section of your Feishu app configuration. 
When set, the adapter verifies every webhook request using the signature algorithm: + +``` +SHA256(timestamp + nonce + encrypt_key + body) +``` + +The computed hash is compared against the `x-lark-signature` header using timing-safe comparison. Requests with invalid or missing signatures are rejected with HTTP 401. + +:::tip +In WebSocket mode, signature verification is handled by the SDK itself, so `FEISHU_ENCRYPT_KEY` is optional. In webhook mode, it is strongly recommended for production. +::: + +### Verification Token + +An additional layer of authentication that checks the `token` field inside webhook payloads: + +```bash +FEISHU_VERIFICATION_TOKEN=your-verification-token +``` + +This token is also found in the **Event Subscriptions** section of your Feishu app. When set, every inbound webhook payload must contain a matching `token` in its `header` object. Mismatched tokens are rejected with HTTP 401. + +Both `FEISHU_ENCRYPT_KEY` and `FEISHU_VERIFICATION_TOKEN` can be used together for defense in depth. + +## Group Message Policy + +The `FEISHU_GROUP_POLICY` environment variable controls whether and how Hermes responds in group chats: + +```bash +FEISHU_GROUP_POLICY=allowlist # default +``` + +| Value | Behavior | +|-------|----------| +| `open` | Hermes responds to @mentions from any user in any group. | +| `allowlist` | Hermes only responds to @mentions from users listed in `FEISHU_ALLOWED_USERS`. | +| `disabled` | Hermes ignores all group messages entirely. | + +In all modes, the bot must be explicitly @mentioned (or @all) in the group before the message is processed. Direct messages bypass this gate. + +### Bot Identity for @Mention Gating + +For precise @mention detection in groups, the adapter needs to know the bot's identity. 
It can be provided explicitly: + +```bash +FEISHU_BOT_OPEN_ID=ou_xxx +FEISHU_BOT_USER_ID=xxx +FEISHU_BOT_NAME=MyBot +``` + +If none of these are set, the adapter will attempt to auto-discover the bot name via the Application Info API on startup. For this to work, grant the `admin:app.info:readonly` or `application:application:self_manage` permission scope. + +## Interactive Card Actions + +When users click buttons or interact with interactive cards sent by the bot, the adapter routes these as synthetic `/card` command events: + +- Button clicks become: `/card button {"key": "value", ...}` +- The action's `value` payload from the card definition is included as JSON. +- Card actions are deduplicated with a 15-minute window to prevent double processing. + +Card action events are dispatched with `MessageType.COMMAND`, so they flow through the normal command processing pipeline. + +To use this feature, enable the **Interactive Card** event in your Feishu app's event subscriptions (`card.action.trigger`). + +## Media Support + +### Inbound (receiving) + +The adapter receives and caches the following media types from users: + +| Type | Extensions | How it's processed | +|------|-----------|-------------------| +| **Images** | .jpg, .jpeg, .png, .gif, .webp, .bmp | Downloaded via Feishu API and cached locally | +| **Audio** | .ogg, .mp3, .wav, .m4a, .aac, .flac, .opus, .webm | Downloaded and cached; small text files are auto-extracted | +| **Video** | .mp4, .mov, .avi, .mkv, .webm, .m4v, .3gp | Downloaded and cached as documents | +| **Files** | .pdf, .doc, .docx, .xls, .xlsx, .ppt, .pptx, and more | Downloaded and cached as documents | + +Media from rich-text (post) messages, including inline images and file attachments, is also extracted and cached. + +For small text-based documents (.txt, .md), the file content is automatically injected into the message text so the agent can read it directly without needing tools. 
+ +### Outbound (sending) + +| Method | What it sends | +|--------|--------------| +| `send` | Text or rich post messages (auto-detected based on markdown content) | +| `send_image` / `send_image_file` | Uploads image to Feishu, then sends as native image bubble (with optional caption) | +| `send_document` | Uploads file to Feishu API, then sends as file attachment | +| `send_voice` | Uploads audio file as a Feishu file attachment | +| `send_video` | Uploads video and sends as native media message | +| `send_animation` | GIFs are downgraded to file attachments (Feishu has no native GIF bubble) | + +File upload routing is automatic based on extension: + +- `.ogg`, `.opus` → uploaded as `opus` audio +- `.mp4`, `.mov`, `.avi`, `.m4v` → uploaded as `mp4` media +- `.pdf`, `.doc(x)`, `.xls(x)`, `.ppt(x)` → uploaded with their document type +- Everything else → uploaded as a generic stream file + +## Markdown Rendering and Post Fallback + +When outbound text contains markdown formatting (headings, bold, lists, code blocks, links, etc.), the adapter automatically sends it as a Feishu **post** message with an embedded `md` tag rather than as plain text. This enables rich rendering in the Feishu client. + +If the Feishu API rejects the post payload (e.g., due to unsupported markdown constructs), the adapter automatically falls back to sending as plain text with markdown stripped. This two-stage fallback ensures messages are always delivered. + +Plain text messages (no markdown detected) are sent as the simple `text` message type. + +## ACK Emoji Reactions + +When the adapter receives an inbound message, it immediately adds an ✅ (OK) emoji reaction to signal that the message was received and is being processed. This provides visual feedback before the agent completes its response. + +The reaction is persistent — it remains on the message after the response is sent, serving as a receipt marker. + +User reactions on bot messages are also tracked. 
If a user adds or removes an emoji reaction on a message sent by the bot, it is routed as a synthetic text event (`reaction:added:EMOJI_TYPE` or `reaction:removed:EMOJI_TYPE`) so the agent can respond to feedback. + +## Burst Protection and Batching + +The adapter includes debouncing for rapid message bursts to avoid overwhelming the agent: + +### Text Batching + +When a user sends multiple text messages in quick succession, they are merged into a single event before being dispatched: + +| Setting | Env Var | Default | +|---------|---------|---------| +| Quiet period | `HERMES_FEISHU_TEXT_BATCH_DELAY_SECONDS` | 0.6s | +| Max messages per batch | `HERMES_FEISHU_TEXT_BATCH_MAX_MESSAGES` | 8 | +| Max characters per batch | `HERMES_FEISHU_TEXT_BATCH_MAX_CHARS` | 4000 | + +### Media Batching + +Multiple media attachments sent in quick succession (e.g., dragging several images) are merged into a single event: + +| Setting | Env Var | Default | +|---------|---------|---------| +| Quiet period | `HERMES_FEISHU_MEDIA_BATCH_DELAY_SECONDS` | 0.8s | + +### Per-Chat Serialization + +Messages within the same chat are processed serially (one at a time) to maintain conversation coherence. Each chat has its own lock, so messages in different chats are processed concurrently. + +## Rate Limiting (Webhook Mode) + +In webhook mode, the adapter enforces per-IP rate limiting to protect against abuse: + +- **Window:** 60-second sliding window +- **Limit:** 120 requests per window per (app_id, path, IP) triple +- **Tracking cap:** Up to 4096 unique keys tracked (prevents unbounded memory growth) + +Requests that exceed the limit receive HTTP 429 (Too Many Requests). + +### Webhook Anomaly Tracking + +The adapter tracks consecutive error responses per IP address. After 25 consecutive errors from the same IP within a 6-hour window, a warning is logged. This helps detect misconfigured clients or probing attempts. 
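The webhook rate limit described in the Rate Limiting section (a 60-second sliding window, 120 requests per (app_id, path, IP) key, at most 4096 tracked keys, HTTP 429 when exceeded) can be illustrated with a short Python sketch. This is not the adapter's actual code. The `SlidingWindowRateLimiter` class is hypothetical, and two behaviors are assumptions not stated in the docs: rejected requests do not count toward the window, and once the key cap is reached the least recently seen key is evicted.

```python
import time
from collections import OrderedDict, deque
from typing import Callable, Deque, Tuple

# (app_id, path, client_ip), matching the key triple described above
Key = Tuple[str, str, str]


class SlidingWindowRateLimiter:
    """Hypothetical sketch: at most `limit` requests per `window` seconds per key."""

    def __init__(
        self,
        limit: int = 120,
        window: float = 60.0,
        max_keys: int = 4096,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window
        self.max_keys = max_keys
        self.clock = clock
        # Ordered by recency so the least recently seen key can be evicted first.
        self._hits: "OrderedDict[Key, Deque[float]]" = OrderedDict()

    def allow(self, key: Key) -> bool:
        """Return True if the request may proceed, False if it should get HTTP 429."""
        now = self.clock()
        hits = self._hits.get(key)
        if hits is None:
            if len(self._hits) >= self.max_keys:
                self._hits.popitem(last=False)  # assumed eviction policy: LRU
            hits = deque()
            self._hits[key] = hits
        else:
            self._hits.move_to_end(key)  # mark key as most recently seen
        # Discard timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit; rejected requests are not recorded
        hits.append(now)
        return True


# Usage with a fake clock so the example is deterministic.
fake_now = [0.0]
limiter = SlidingWindowRateLimiter(limit=3, window=60.0, clock=lambda: fake_now[0])
key = ("cli_app", "/feishu/webhook", "203.0.113.7")
print([limiter.allow(key) for _ in range(4)])  # [True, True, True, False]
fake_now[0] = 61.0
print(limiter.allow(key))  # True: the earlier hits have left the window
```

Keeping a deque of timestamps per key gives an exact sliding window at the cost of storing up to `limit` timestamps per key; the key cap bounds total memory to roughly `max_keys * limit` entries.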
+
+Additional webhook protections:
+- **Body size limit:** 1 MB maximum
+- **Body read timeout:** 30 seconds
+- **Content-Type enforcement:** Only `application/json` is accepted
+
+## Deduplication
+
+Inbound messages are deduplicated by message ID with a 24-hour TTL. The dedup state is persisted to `~/.hermes/feishu_seen_message_ids.json`, so it survives restarts.
+
+| Setting | Env Var | Default |
+|---------|---------|---------|
+| Cache size | `HERMES_FEISHU_DEDUP_CACHE_SIZE` | 2048 entries |
+
+## All Environment Variables
+
+| Variable | Required | Default | Description |
+|----------|----------|---------|-------------|
+| `FEISHU_APP_ID` | ✅ | — | Feishu/Lark App ID |
+| `FEISHU_APP_SECRET` | ✅ | — | Feishu/Lark App Secret |
+| `FEISHU_DOMAIN` | — | `feishu` | `feishu` (China) or `lark` (international) |
+| `FEISHU_CONNECTION_MODE` | — | `websocket` | `websocket` or `webhook` |
+| `FEISHU_ALLOWED_USERS` | — | _(empty)_ | Comma-separated list of open_ids for the user allowlist |
+| `FEISHU_HOME_CHANNEL` | — | — | Chat ID for cron/notification output |
+| `FEISHU_ENCRYPT_KEY` | — | _(empty)_ | Encrypt key for webhook signature verification |
+| `FEISHU_VERIFICATION_TOKEN` | — | _(empty)_ | Verification token for webhook payload authentication |
+| `FEISHU_GROUP_POLICY` | — | `allowlist` | Group message policy: `open`, `allowlist`, or `disabled` |
+| `FEISHU_BOT_OPEN_ID` | — | _(empty)_ | Bot's open_id (for @mention detection) |
+| `FEISHU_BOT_USER_ID` | — | _(empty)_ | Bot's user_id (for @mention detection) |
+| `FEISHU_BOT_NAME` | — | _(empty)_ | Bot's display name (for @mention detection) |
+| `FEISHU_WEBHOOK_HOST` | — | `127.0.0.1` | Webhook server bind address |
+| `FEISHU_WEBHOOK_PORT` | — | `8765` | Webhook server port |
+| `FEISHU_WEBHOOK_PATH` | — | `/feishu/webhook` | Webhook endpoint path |
+| `HERMES_FEISHU_DEDUP_CACHE_SIZE` | — | `2048` | Maximum number of message IDs tracked for deduplication |
+| `HERMES_FEISHU_TEXT_BATCH_DELAY_SECONDS` | — | `0.6` | Debounce quiet period (seconds) for text bursts |
+| `HERMES_FEISHU_TEXT_BATCH_MAX_MESSAGES` | — | `8` | Maximum messages merged per text batch |
+| `HERMES_FEISHU_TEXT_BATCH_MAX_CHARS` | — | `4000` | Maximum characters merged per text batch |
+| `HERMES_FEISHU_MEDIA_BATCH_DELAY_SECONDS` | — | `0.8` | Debounce quiet period (seconds) for media bursts |
+
+## Troubleshooting
+
+| Problem | Fix |
+|---------|-----|
+| `lark-oapi not installed` | Install the SDK: `pip install lark-oapi` |
+| `websockets not installed; websocket mode unavailable` | Install websockets: `pip install websockets` |
+| `aiohttp not installed; webhook mode unavailable` | Install aiohttp: `pip install aiohttp` |
+| `FEISHU_APP_ID or FEISHU_APP_SECRET not set` | Set both env vars or configure via `hermes gateway setup` |
+| `Another local Hermes gateway is already using this Feishu app_id` | Only one Hermes instance can use a given app_id at a time. Stop the other gateway first. |
+| Bot doesn't respond in groups | Ensure the bot is @mentioned, check `FEISHU_GROUP_POLICY`, and, if the policy is `allowlist`, verify the sender is in `FEISHU_ALLOWED_USERS` |
+| `Webhook rejected: invalid verification token` | Ensure `FEISHU_VERIFICATION_TOKEN` matches the token in your Feishu app's Event Subscriptions config |
+| `Webhook rejected: invalid signature` | Ensure `FEISHU_ENCRYPT_KEY` matches the encrypt key in your Feishu app config |
+| Post messages show as plain text | The Feishu API rejected the rich-text post payload, so Hermes fell back to plain text. This fallback is expected; check logs for details. |
+| Images/files not received by bot | Grant the `im:message` and `im:resource` permission scopes to your Feishu app |
+| Bot identity not auto-detected | Grant the `admin:app.info:readonly` scope, or set `FEISHU_BOT_OPEN_ID` / `FEISHU_BOT_NAME` manually |
+| `Webhook rate limit exceeded` | More than 120 requests/minute arrived from the same IP. This usually indicates a misconfiguration or a request loop. |
 
 ## Toolset
 
diff --git a/website/docs/user-guide/messaging/matrix.md b/website/docs/user-guide/messaging/matrix.md
index 020e15bd6..70b8855a2 100644
--- a/website/docs/user-guide/messaging/matrix.md
+++ b/website/docs/user-guide/messaging/matrix.md
@@ -352,3 +352,4 @@ For more information on securing your Hermes Agent deployment, see the [Security
 - **Federation**: If you're on a federated homeserver, the bot can communicate with users from other servers — just add their full `@user:server` IDs to `MATRIX_ALLOWED_USERS`.
 - **Auto-join**: The bot automatically accepts room invites and joins. It starts responding immediately after joining.
 - **Media support**: Hermes can send and receive images, audio, video, and file attachments. Media is uploaded to your homeserver using the Matrix content repository API.
+- **Native voice messages (MSC3245)**: The Matrix adapter automatically tags outgoing voice messages with the `org.matrix.msc3245.voice` flag, so TTS responses and voice audio render as **native voice bubbles** in Element and other MSC3245-capable clients instead of as generic audio file attachments. Incoming voice messages carrying the MSC3245 flag are also recognized and routed to speech-to-text transcription. This works automatically; no configuration is needed.
diff --git a/website/docs/user-guide/messaging/slack.md b/website/docs/user-guide/messaging/slack.md
index f011dcd78..21511f77d 100644
--- a/website/docs/user-guide/messaging/slack.md
+++ b/website/docs/user-guide/messaging/slack.md
@@ -237,6 +237,60 @@ Make sure the bot has been **invited to the channel** (`/invite @Hermes Agent`).
 
 ---
 
+## Multi-Workspace Support
+
+Hermes can connect to **multiple Slack workspaces** simultaneously from a single gateway instance. Each workspace is authenticated independently and has its own bot user ID.
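Here is a minimal Python sketch of how tokens for several workspaces could be gathered. It assumes the token sources this Slack section documents: a comma-separated `SLACK_BOT_TOKEN` value, plus a `slack_tokens.json` file that maps team IDs to `{"token": ...}` entries. `collect_slack_tokens` is a hypothetical helper and not Hermes's actual implementation. The helper preserves order so that the first environment token stays the primary one, and it drops duplicates.

```python
import json
from pathlib import Path
from typing import List, Optional


def collect_slack_tokens(env_value: str, token_file: Optional[Path] = None) -> List[str]:
    """Hypothetical helper: merge env tokens with an OAuth token file, primary first."""
    # Split the comma-separated SLACK_BOT_TOKEN value, ignoring blanks and stray spaces
    tokens = [t.strip() for t in env_value.split(",") if t.strip()]
    if token_file is not None and token_file.exists():
        # Assumed shape: {"T01ABC2DEF3": {"token": "xoxb-...", "team_name": "..."}}
        entries = json.loads(token_file.read_text())
        tokens.extend(e["token"] for e in entries.values() if e.get("token"))
    # dict.fromkeys removes duplicates while keeping first-seen order,
    # so tokens[0] remains the primary (Socket Mode) token
    return list(dict.fromkeys(tokens))


if __name__ == "__main__":
    tokens = collect_slack_tokens(
        "xoxb-a,xoxb-b, xoxb-a",
        Path.home() / ".hermes/platforms/slack/slack_tokens.json",
    )
    print("primary:", tokens[0] if tokens else None, "total:", len(tokens))
```

Using `dict.fromkeys` rather than a `set` is deliberate. A set would lose ordering, and the primary token is defined by position.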
+
+### Configuration
+
+Provide multiple bot tokens as a **comma-separated list** in `SLACK_BOT_TOKEN`:
+
+```bash
+# Multiple bot tokens, one per workspace
+SLACK_BOT_TOKEN=xoxb-workspace1-token,xoxb-workspace2-token,xoxb-workspace3-token
+
+# A single app-level token is still used for Socket Mode
+SLACK_APP_TOKEN=xapp-your-app-token
+```
+
+Or in `~/.hermes/config.yaml`:
+
+```yaml
+platforms:
+  slack:
+    token: "xoxb-workspace1-token,xoxb-workspace2-token"
+```
+
+### OAuth Token File
+
+In addition to tokens from the environment or config, Hermes loads tokens from an **OAuth token file** at:
+
+```
+~/.hermes/platforms/slack/slack_tokens.json
+```
+
+This file is a JSON object mapping team IDs to token entries:
+
+```json
+{
+  "T01ABC2DEF3": {
+    "token": "xoxb-workspace-token-here",
+    "team_name": "My Workspace"
+  }
+}
+```
+
+Tokens from this file are merged with any tokens specified via `SLACK_BOT_TOKEN`, and duplicates are removed.
+
+### How it works
+
+- The **first token** in the list is the primary token, used for the Socket Mode connection (AsyncApp).
+- Each token is authenticated via `auth.test` on startup. The gateway maps each `team_id` to its own `WebClient` and `bot_user_id`.
+- When a message arrives, Hermes responds using the client for the workspace the message came from.
+- The primary `bot_user_id` (from the first token) is kept for backward compatibility with features that expect a single bot identity.
+
+---
+
 ## Voice Messages
 
 Hermes supports voice on Slack:
diff --git a/website/docs/user-guide/messaging/telegram.md b/website/docs/user-guide/messaging/telegram.md
index be99eaa75..c984ecdbc 100644
--- a/website/docs/user-guide/messaging/telegram.md
+++ b/website/docs/user-guide/messaging/telegram.md
@@ -258,6 +258,73 @@ Topics created outside of the config (e.g., by manually calling the Telegram API
 - **Privacy policy:** Telegram now requires bots to have a privacy policy. Set one via BotFather with `/setprivacy_policy`, or Telegram may auto-generate a placeholder. This is particularly important if your bot is public-facing.
 - **Message streaming:** Bot API 9.x added support for streaming long responses, which can improve perceived latency for lengthy agent replies.
 
+## Webhook Mode
+
+By default, the Telegram adapter connects via **long polling**: the gateway makes outbound requests to Telegram's servers. This works everywhere but keeps a persistent connection open.
+
+**Webhook mode** is an alternative in which Telegram pushes updates to your server over HTTPS. It is ideal for **serverless and cloud deployments** (Fly.io, Railway, etc.), where an inbound HTTP request can wake a suspended machine.
+
+### Configuration
+
+Set the `TELEGRAM_WEBHOOK_URL` environment variable to enable webhook mode:
+
+```bash
+# Required: your public HTTPS endpoint
+TELEGRAM_WEBHOOK_URL=https://app.fly.dev/telegram
+
+# Optional: local listen port (default: 8443)
+TELEGRAM_WEBHOOK_PORT=8443
+
+# Optional: secret token for update verification (auto-generated if not set)
+TELEGRAM_WEBHOOK_SECRET=my-secret-token
+```
+
+Or in `~/.hermes/config.yaml`:
+
+```yaml
+telegram:
+  webhook_mode: true
+```
+
+When `TELEGRAM_WEBHOOK_URL` is set, the gateway starts an HTTP server on `0.0.0.0` at the port given by `TELEGRAM_WEBHOOK_PORT` and registers the webhook URL with Telegram. The server's listen path is taken from the path of the webhook URL (default: `/telegram`).
+
+:::warning
+Telegram requires a **valid TLS certificate** on the webhook endpoint; self-signed certificates are rejected. Use a reverse proxy (nginx, Caddy) or a platform that provides TLS termination (Fly.io, Railway, Cloudflare Tunnel).
+:::
+
+## DNS-over-HTTPS Fallback IPs
+
+In some restricted networks, `api.telegram.org` may resolve to an unreachable IP. The Telegram adapter includes a **fallback IP** mechanism that transparently retries the connection against alternative IPs while keeping `api.telegram.org` as the TLS hostname and SNI.
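The core of this fallback idea can be sketched with Python's standard `socket` and `ssl` modules: connect to an alternative IP, but still present `api.telegram.org` for SNI and certificate checks. This is a hedged illustration and not the adapter's code. `open_tls`, `StickyFallback`, and `connect` are hypothetical names, the DoH discovery step is omitted (the fallback list is simply passed in), and "sticky" is modeled as "try the last working fallback first".

```python
import socket
import ssl
from typing import Callable, List, Optional

TELEGRAM_HOST = "api.telegram.org"


def open_tls(ip: str, hostname: str = TELEGRAM_HOST, port: int = 443,
             timeout: float = 10.0) -> ssl.SSLSocket:
    """Open TCP to a specific IP, but use `hostname` for SNI and certificate checks."""
    ctx = ssl.create_default_context()
    raw = socket.create_connection((ip, port), timeout=timeout)
    try:
        return ctx.wrap_socket(raw, server_hostname=hostname)
    except Exception:
        raw.close()
        raise


class StickyFallback:
    """Order candidate IPs: primary first, then fallbacks; a fallback that worked goes first."""

    def __init__(self, fallback_ips: List[str]):
        self.fallback_ips = fallback_ips
        self.sticky_ip: Optional[str] = None

    def candidates(self, primary_ip: str) -> List[str]:
        order = [primary_ip] + [ip for ip in self.fallback_ips if ip != primary_ip]
        if self.sticky_ip in order:
            # Skip straight to the IP that worked last time
            order.remove(self.sticky_ip)
            order.insert(0, self.sticky_ip)
        return order

    def record_success(self, ip: str, primary_ip: str) -> None:
        if ip != primary_ip:
            self.sticky_ip = ip


def connect(selector: StickyFallback, primary_ip: str,
            opener: Callable[[str], object] = open_tls):
    """Try each candidate IP in order; socket and TLS errors are both OSError subclasses."""
    last_error: Optional[OSError] = None
    for ip in selector.candidates(primary_ip):
        try:
            conn = opener(ip)
        except OSError as exc:
            last_error = exc
            continue
        selector.record_success(ip, primary_ip)
        return conn
    raise ConnectionError(f"all candidate IPs failed for {TELEGRAM_HOST}") from last_error
```

The key detail is the `server_hostname` argument to `SSLContext.wrap_socket`. It keeps SNI and hostname verification bound to `api.telegram.org` even though the TCP connection targets a raw IP.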
+
+### How it works
+
+1. If `TELEGRAM_FALLBACK_IPS` is set, those IPs are used directly.
+2. Otherwise, the adapter queries **Google DNS** and **Cloudflare DNS** via DNS-over-HTTPS (DoH) to discover alternative IPs for `api.telegram.org`.
+3. Any DoH-returned IPs that differ from the system DNS result are used as fallbacks.
+4. If DoH is also blocked, a hardcoded seed IP (`149.154.167.220`) is used as a last resort.
+5. Once a fallback IP succeeds, it becomes "sticky": subsequent requests use it directly instead of trying the primary path first.
+
+### Configuration
+
+```bash
+# Explicit fallback IPs (comma-separated)
+TELEGRAM_FALLBACK_IPS=149.154.167.220,149.154.167.221
+```
+
+Or in `~/.hermes/config.yaml`:
+
+```yaml
+platforms:
+  telegram:
+    extra:
+      fallback_ips:
+        - "149.154.167.220"
+```
+
+:::tip
+You usually don't need to configure this manually: DoH auto-discovery handles most restricted-network scenarios. Set `TELEGRAM_FALLBACK_IPS` only if DoH is also blocked on your network.
+:::
 
 ## Troubleshooting
 
 | Problem | Solution |
diff --git a/website/docs/user-guide/messaging/wecom.md b/website/docs/user-guide/messaging/wecom.md
index e5a551b8f..1a078a892 100644
--- a/website/docs/user-guide/messaging/wecom.md
+++ b/website/docs/user-guide/messaging/wecom.md
@@ -13,6 +13,7 @@ Connect Hermes to [WeCom](https://work.weixin.qq.com/) (企业微信), Tencent's
 - A WeCom organization account
 - An AI Bot created in the WeCom Admin Console
 - The Bot ID and Secret from the bot's credentials page
+- Python packages: `aiohttp` and `httpx`
 
 ## Setup
 
@@ -56,10 +57,12 @@ hermes gateway start
 - **WebSocket transport** — persistent connection, no public endpoint needed
 - **DM and group messaging** — configurable access policies
+- **Per-group sender allowlists** — fine-grained control over who can interact in each group
 - **Media support** — images, files, voice, video upload and download
 - **AES-encrypted media** — automatic decryption for inbound attachments
 - **Quote context** — preserves reply threading
 - **Markdown rendering** — rich text responses
+- **Reply-mode streaming** — ties each response to the inbound message it answers
 - **Auto-reconnect** — exponential backoff on connection drops
 
 ## Configuration Options
 
@@ -75,12 +78,187 @@ Set these in `config.yaml` under `platforms.wecom.extra`:
 | `group_policy` | `open` | Group access: `open`, `allowlist`, `disabled` |
 | `allow_from` | `[]` | User IDs allowed for DMs (when dm_policy=allowlist) |
 | `group_allow_from` | `[]` | Group IDs allowed (when group_policy=allowlist) |
+| `groups` | `{}` | Per-group configuration (see below) |
+
+## Access Policies
+
+### DM Policy
+
+Controls who can send direct messages to the bot:
+
+| Value | Behavior |
+|-------|----------|
+| `open` | Anyone can DM the bot (default) |
+| `allowlist` | Only user IDs in `allow_from` can DM |
+| `disabled` | All DMs are ignored |
+| `pairing` | Pairing mode (for initial setup) |
+
+```bash
+WECOM_DM_POLICY=allowlist
+```
+
+### Group Policy
+
+Controls which groups the bot responds in:
+
+| Value | Behavior |
+|-------|----------|
+| `open` | Bot responds in all groups (default) |
+| `allowlist` | Bot only responds in group IDs listed in `group_allow_from` |
+| `disabled` | All group messages are ignored |
+
+```bash
+WECOM_GROUP_POLICY=allowlist
+```
+
+### Per-Group Sender Allowlists
+
+For fine-grained control, you can restrict which users may interact with the bot within specific groups. This is configured in `config.yaml`:
+
+```yaml
+platforms:
+  wecom:
+    enabled: true
+    extra:
+      bot_id: "your-bot-id"
+      secret: "your-secret"
+      group_policy: "allowlist"
+      group_allow_from:
+        - "group_id_1"
+        - "group_id_2"
+      groups:
+        group_id_1:
+          allow_from:
+            - "user_alice"
+            - "user_bob"
+        group_id_2:
+          allow_from:
+            - "user_charlie"
+        "*":
+          allow_from:
+            - "user_admin"
+```
+
+**How it works:**
+
+1. The `group_policy` and `group_allow_from` settings determine whether a group is allowed at all.
+2. If a group passes this top-level check, the `allow_from` list under that group's entry in `groups` (if present) further restricts which senders in the group can interact with the bot.
+3. A wildcard `"*"` group entry serves as the default for groups that are not listed explicitly.
+4. Allowlist entries may use the `*` wildcard to allow all users, and entries are matched case-insensitively.
+5. Entries may optionally use the `wecom:user:` or `wecom:group:` prefix; the prefix is stripped automatically.
+
+If no `allow_from` is configured for a group, all users in that group are allowed (provided the group itself passes the top-level policy check).
+
+## Media Support
+
+### Inbound (receiving)
+
+The adapter receives media attachments from users and caches them locally for agent processing:
+
+| Type | How it's handled |
+|------|-----------------|
+| **Images** | Downloaded and cached locally. Both URL-based and base64-encoded images are supported. |
+| **Files** | Downloaded and cached. The original filename is preserved. |
+| **Voice** | The text transcription of the voice message is extracted, if available. |
+| **Mixed messages** | WeCom mixed-type messages (text + images) are parsed and all components extracted. |
+
+**Quoted messages:** Media from quoted (replied-to) messages is also extracted, so the agent has context about what the user is replying to.
+
+### AES-Encrypted Media Decryption
+
+WeCom encrypts some inbound media attachments with AES-256-CBC. The adapter handles this automatically:
+
+- When an inbound media item includes an `aeskey` field, the adapter downloads the encrypted bytes and decrypts them using AES-256-CBC with PKCS#7 padding.
+- The AES key is the base64-decoded value of the `aeskey` field (must be exactly 32 bytes).
+- The IV is the first 16 bytes of the key.
+- Decryption requires the `cryptography` Python package (`pip install cryptography`).
+
+No configuration is needed; decryption happens transparently when encrypted media is received.
+
+### Outbound (sending)
+
+| Method | What it sends | Size limit |
+|--------|--------------|------------|
+| `send` | Markdown text messages | 4000 chars |
+| `send_image` / `send_image_file` | Native image messages | 10 MB |
+| `send_document` | File attachments | 20 MB |
+| `send_voice` | Voice messages (native voice requires AMR format) | 2 MB |
+| `send_video` | Video messages | 10 MB |
+
+**Chunked upload:** Files are uploaded in 512 KB chunks through a three-step protocol (init → chunks → finish). The adapter handles this automatically.
+
+**Automatic downgrade:** When media exceeds its native type's size limit but is under the absolute 20 MB file limit, it is sent as a generic file attachment instead:
+
+- Images > 10 MB → sent as file
+- Videos > 10 MB → sent as file
+- Voice > 2 MB → sent as file
+- Non-AMR audio → sent as file (WeCom only supports AMR for native voice)
+
+Files exceeding the absolute 20 MB limit are rejected, and an informational message is sent to the chat instead.
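The outbound size rules above fit into a small decision function. This is a sketch under stated assumptions and not the adapter's code. `choose_send_type` and `chunk_count` are hypothetical names. "MB" is assumed to mean 1024 × 1024 bytes, because the limits table does not say which definition applies.

```python
MB = 1024 * 1024  # assumption: binary megabytes
ABSOLUTE_FILE_LIMIT = 20 * MB  # hard cap on any upload
NATIVE_LIMITS = {"image": 10 * MB, "video": 10 * MB, "voice": 2 * MB}
CHUNK_SIZE = 512 * 1024  # 512 KB upload chunks


def choose_send_type(kind: str, size_bytes: int, audio_format: str = "") -> str:
    """Return the message type to send ("image", "video", "voice", "file") or "reject"."""
    if size_bytes > ABSOLUTE_FILE_LIMIT:
        return "reject"  # too large even as a generic file
    if kind == "voice" and audio_format.lower() != "amr":
        return "file"  # only AMR can be sent as native voice
    limit = NATIVE_LIMITS.get(kind)
    if limit is not None and size_bytes > limit:
        return "file"  # over the native limit: downgrade to a file attachment
    return kind


def chunk_count(size_bytes: int) -> int:
    """Number of 512 KB chunks for the chunked-upload step (at least one)."""
    return max(1, -(-size_bytes // CHUNK_SIZE))  # ceiling division
```

For example, a 12 MB image resolves to `"file"`. A 1 MB MP3 sent as voice also becomes `"file"`, because it is not AMR.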
+
+## Reply-Mode Stream Responses
+
+When the bot receives a message via the WeCom callback, the adapter remembers the inbound request ID. If a response is sent while that request context is still active, the adapter uses WeCom's reply mode (`aibot_respond_msg`) with streaming, so the response is tied directly to the inbound message. This gives a more natural conversation experience in the WeCom client.
+
+If the inbound request context has expired or is unavailable, the adapter falls back to proactive message sending via `aibot_send_msg`.
+
+Reply mode also works for media: uploaded media can be sent as a reply to the originating message.
+
+## Connection and Reconnection
+
+The adapter maintains a persistent WebSocket connection to WeCom's gateway at `wss://openws.work.weixin.qq.com`.
+
+### Connection Lifecycle
+
+1. **Connect:** Opens a WebSocket connection and sends an `aibot_subscribe` authentication frame with the bot_id and secret.
+2. **Heartbeat:** Sends application-level ping frames every 30 seconds to keep the connection alive.
+3. **Listen:** Continuously reads inbound frames and dispatches message callbacks.
+
+### Reconnection Behavior
+
+On connection loss, the adapter reconnects with an increasing backoff delay:
+
+| Attempt | Delay |
+|---------|-------|
+| 1st retry | 2 seconds |
+| 2nd retry | 5 seconds |
+| 3rd retry | 10 seconds |
+| 4th retry | 30 seconds |
+| 5th+ retry | 60 seconds |
+
+After each successful reconnection, the backoff counter resets to zero. All pending request futures are failed on disconnect so that callers don't hang indefinitely.
+
+### Deduplication
+
+Inbound messages are deduplicated by message ID with a 5-minute window and a cache of at most 1000 entries. This prevents double-processing of messages during reconnections or network hiccups.
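The retry schedule and the deduplication window described above can be sketched as follows. `reconnect_delay` and `MessageDeduper` are hypothetical names. The sketch assumes that an ID expires 5 minutes after it is first seen and that the oldest ID is evicted once 1000 are cached. This is one reasonable reading of the text, not the adapter's confirmed behavior.

```python
import time
from collections import OrderedDict
from typing import Callable

BACKOFF_SECONDS = [2, 5, 10, 30, 60]  # 1st-4th retry, then 60 s for every later retry


def reconnect_delay(attempt: int) -> int:
    """Delay in seconds before retry number `attempt` (1-based); the last value repeats."""
    index = min(max(attempt, 1), len(BACKOFF_SECONDS)) - 1
    return BACKOFF_SECONDS[index]


class MessageDeduper:
    """Remember message IDs for a time window, capped at a maximum number of entries."""

    def __init__(self, window_seconds: float = 300.0, max_entries: int = 1000,
                 clock: Callable[[], float] = time.monotonic):
        self.window = window_seconds
        self.max_entries = max_entries
        self.clock = clock
        self._seen: "OrderedDict[str, float]" = OrderedDict()

    def is_duplicate(self, message_id: str) -> bool:
        now = self.clock()
        # Entries are inserted in arrival order, so expired IDs sit at the front
        while self._seen and next(iter(self._seen.values())) < now - self.window:
            self._seen.popitem(last=False)
        if message_id in self._seen:
            return True
        self._seen[message_id] = now
        if len(self._seen) > self.max_entries:
            self._seen.popitem(last=False)  # evict the oldest ID
        return False
```

A reconnect loop would sleep for `reconnect_delay(attempt)` before each retry and reset `attempt` to zero after a successful reconnect, which matches the reset rule above.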
+
+## All Environment Variables
+
+| Variable | Required | Default | Description |
+|----------|----------|---------|-------------|
+| `WECOM_BOT_ID` | ✅ | — | WeCom AI Bot ID |
+| `WECOM_SECRET` | ✅ | — | WeCom AI Bot Secret |
+| `WECOM_ALLOWED_USERS` | — | _(empty)_ | Comma-separated user IDs for the gateway-level allowlist |
+| `WECOM_HOME_CHANNEL` | — | — | Chat ID for cron/notification output |
+| `WECOM_WEBSOCKET_URL` | — | `wss://openws.work.weixin.qq.com` | WebSocket gateway URL |
+| `WECOM_DM_POLICY` | — | `open` | DM access policy |
+| `WECOM_GROUP_POLICY` | — | `open` | Group access policy |
 
 ## Troubleshooting
 
 | Problem | Fix |
 |---------|-----|
-| "WECOM_BOT_ID and WECOM_SECRET are required" | Set both env vars or configure in setup wizard |
-| "invalid secret (errcode=40013)" | Verify the secret matches your bot's credentials |
-| "Timed out waiting for subscribe acknowledgement" | Check network connectivity to `openws.work.weixin.qq.com` |
-| Bot doesn't respond in groups | Check `group_policy` setting and group allowlist |
+| `WECOM_BOT_ID and WECOM_SECRET are required` | Set both env vars or configure them in the setup wizard |
+| `WeCom startup failed: aiohttp not installed` | Install aiohttp: `pip install aiohttp` |
+| `WeCom startup failed: httpx not installed` | Install httpx: `pip install httpx` |
+| `invalid secret (errcode=40013)` | Verify the secret matches your bot's credentials |
+| `Timed out waiting for subscribe acknowledgement` | Check network connectivity to `openws.work.weixin.qq.com` |
+| Bot doesn't respond in groups | Check the `group_policy` setting and ensure the group ID is in `group_allow_from` |
+| Bot ignores certain users in a group | Check the per-group `allow_from` lists in the `groups` config section |
+| Media decryption fails | Install `cryptography`: `pip install cryptography` |
+| `cryptography is required for WeCom media decryption` | The inbound media is AES-encrypted. Install: `pip install cryptography` |
+| Voice messages sent as files | WeCom only supports AMR format for native voice; other formats are automatically downgraded to file attachments. |
+| `File too large` error | WeCom has an absolute 20 MB limit on all file uploads. Compress or split the file. |
+| Images sent as files | Images larger than 10 MB exceed the native image limit and are automatically downgraded to file attachments. |
+| `Timeout sending message to WeCom` | The WebSocket may have disconnected. Check logs for reconnection messages. |
+| `WeCom websocket closed during authentication` | Network issue or incorrect credentials. Verify bot_id and secret. |
diff --git a/website/docs/user-guide/security.md b/website/docs/user-guide/security.md
index 4d51161e1..195583639 100644
--- a/website/docs/user-guide/security.md
+++ b/website/docs/user-guide/security.md
@@ -22,6 +22,61 @@ The security model has five layers:
 
 Before executing any command, Hermes checks it against a curated list of dangerous patterns. If a match is found, the user must explicitly approve it.
 
+### Approval Modes
+
+The approval system supports three modes, configured via `approvals.mode` in `~/.hermes/config.yaml`:
+
+```yaml
+approvals:
+  mode: manual    # manual | smart | off
+  timeout: 60     # seconds to wait for user response (default: 60)
+```
+
+| Mode | Behavior |
+|------|----------|
+| **manual** (default) | Always prompt the user for approval on dangerous commands |
+| **smart** | Use an auxiliary LLM to assess risk. Low-risk commands (e.g., `python -c "print('hello')"`) are auto-approved. Genuinely dangerous commands are auto-denied. Uncertain cases escalate to a manual prompt. |
+| **off** | Disable all approval checks (equivalent to running with `--yolo`). All commands execute without prompts. |
+
+:::warning
+Setting `approvals.mode: off` disables all safety prompts. Use it only in trusted environments (CI/CD, containers, etc.).
+:::
+
+### YOLO Mode
+
+YOLO mode bypasses **all** dangerous command approval prompts for the current session. It can be activated in three ways:
+
+1. **CLI flag**: Start a session with `hermes --yolo` or `hermes chat --yolo`
+2. **Slash command**: Type `/yolo` during a session to toggle it on/off
+3. **Environment variable**: Set `HERMES_YOLO_MODE=1`
+
+The `/yolo` command is a **toggle**, so each use flips the mode on or off:
+
+```
+> /yolo
+  ⚡ YOLO mode ON — all commands auto-approved. Use with caution.
+
+> /yolo
+  ⚠ YOLO mode OFF — dangerous commands will require approval.
+```
+
+YOLO mode is available in both CLI and gateway sessions. Internally, it sets the `HERMES_YOLO_MODE` environment variable, which is checked before every command is executed.
+
+:::danger
+YOLO mode disables **all** dangerous command safety checks for the session. Use it only when you fully trust the commands being generated (e.g., well-tested automation scripts in disposable environments).
+:::
+
+### Approval Timeout
+
+When a dangerous command prompt appears, the user has a configurable amount of time to respond. If no response arrives within the timeout, the command is **denied** (fail-closed).
+
+Configure the timeout in `~/.hermes/config.yaml`:
+
+```yaml
+approvals:
+  timeout: 60  # seconds (default: 60)
+```
+
 ### What Triggers Approval
 
 The following patterns trigger approval prompts (defined in `tools/approval.py`):
@@ -30,21 +85,32 @@ The following patterns trigger approval prompts (defined in `tools/approval.py`)
 |---------|-------------|
 | `rm -r` / `rm --recursive` | Recursive delete |
 | `rm ... /` | Delete in root path |
-| `chmod 777` | World-writable permissions |
+| `chmod 777/666` / `o+w` / `a+w` | World/other-writable permissions |
+| `chmod --recursive` with unsafe permissions | Recursive world/other-writable (long flag) |
+| `chown -R root` / `chown --recursive root` | Recursive chown to root |
 | `mkfs` | Format filesystem |
 | `dd if=` | Disk copy |
+| `> /dev/sd` | Write to block device |
 | `DROP TABLE/DATABASE` | SQL DROP |
 | `DELETE FROM` (without WHERE) | SQL DELETE without WHERE |
 | `TRUNCATE TABLE` | SQL TRUNCATE |
 | `> /etc/` | Overwrite system config |
 | `systemctl stop/disable/mask` | Stop/disable system services |
 | `kill -9 -1` | Kill all processes |
-| `curl ... \| sh` | Pipe remote content to shell |
-| `bash -c`, `python -e` | Shell/script execution via flags |
-| `find -exec rm`, `find -delete` | Find with destructive actions |
+| `pkill -9` | Force kill processes |
 | Fork bomb patterns | Fork bombs |
+| `bash -c` / `sh -c` / `zsh -c` / `ksh -c` | Shell command execution via `-c` flag (including combined flags like `-lc`) |
+| `python -e` / `perl -e` / `ruby -e` / `node -c` | Script execution via `-e`/`-c` flag |
+| `curl ... \| sh` / `wget ... \| sh` | Pipe remote content to shell |
+| `bash <(curl ...)` / `sh <(wget ...)` | Execute remote script via process substitution |
+| `tee` to `/etc/`, `~/.ssh/`, `~/.hermes/.env` | Overwrite sensitive file via tee |
+| `>` / `>>` to `/etc/`, `~/.ssh/`, `~/.hermes/.env` | Overwrite sensitive file via redirection |
+| `xargs rm` | Delete files piped to `rm` via xargs |
+| `find -exec rm` / `find -delete` | Find with destructive actions |
+| `cp`/`mv`/`install` to `/etc/` | Copy/move file into system config |
+| `sed -i` / `sed --in-place` on `/etc/` | In-place edit of system config |
 | `pkill`/`killall` hermes/gateway | Self-termination prevention |
-| `gateway run` with `&`/`disown`/`nohup` | Prevents starting gateway outside service manager |
+| `gateway run` with `&`/`disown`/`nohup`/`setsid` | Prevents starting gateway outside service manager |
 
 :::info
 **Container bypass**: When running in `docker`, `singularity`, `modal`, or `daytona` backends, dangerous command checks are **skipped** because the container itself is the security boundary. Destructive commands inside a container can't harm the host.