docs: comprehensive docs audit — cover 13 features from last week's PRs (#5815)

Cover documentation gaps found by auditing all 50+ merged PRs from the past week:

tools-reference.md:
- Fix stale tool count (48→47, 11→10 browser tools) after browser_close removal
- Document notify_on_complete parameter in terminal tool description

telegram.md:
- Add Interactive Model Picker section (inline keyboard, provider/model drill-down)

discord.md:
- Add Interactive Model Picker section (Select dropdowns, 120s timeout)
- Add Native Slash Commands for Skills section (auto-registration at startup)

signal.md:
- Expand Attachments section with outgoing media delivery (send_image_file,
  send_voice, send_video, send_document via MEDIA: tags)

webhooks.md:
- Document {__raw__} special template token for full payload access
- Document Forum Topic Delivery via message_thread_id in deliver_extra

slack.md:
- Fix stale/misleading thread reply docs — thread replies no longer require
  @mention when bot has active session (3 locations updated)

security.md:
- Add cross-session isolation (layer 6) and input sanitization (layer 7)
  to security layers overview

feishu.md:
- Add WebSocket Tuning section (ws_reconnect_interval, ws_ping_interval)
- Add Per-Group Access Control section (group_rules with 5 policy types)

credential-pools.md:
- Add Delegation & Subagent Sharing section

delegation.md:
- Update key properties to mention credential pool inheritance

providers.md:
- Add Z.AI Endpoint Auto-Detection note
- Add xAI (Grok) Prompt Caching section

skills-catalog.md:
- Add p5js to creative skills category
This commit is contained in:
Teknium
2026-04-07 10:21:03 -07:00
committed by GitHub
parent c58e16757a
commit afe6c63c52
12 changed files with 158 additions and 11 deletions


@@ -168,6 +168,16 @@ model:
Base URLs can be overridden with `GLM_BASE_URL`, `KIMI_BASE_URL`, `MINIMAX_BASE_URL`, `MINIMAX_CN_BASE_URL`, or `DASHSCOPE_BASE_URL` environment variables.
:::note Z.AI Endpoint Auto-Detection
When using the Z.AI / GLM provider, Hermes automatically probes multiple endpoints (global, China, coding variants) to find one that accepts your API key. You don't need to set `GLM_BASE_URL` manually — the working endpoint is detected and cached automatically.
:::
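The probing described in the note can be pictured roughly as follows. This is a minimal sketch rather than Hermes's actual implementation: `detect_endpoint`, the `probe` callback, the in-memory cache, and the placeholder URLs are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical in-memory cache: API key -> first endpoint that accepted it.
_endpoint_cache: Dict[str, str] = {}


def detect_endpoint(
    api_key: str,
    candidates: Iterable[str],
    probe: Callable[[str, str], bool],
) -> Optional[str]:
    """Return the first candidate URL for which probe(url, api_key) is true."""
    if api_key in _endpoint_cache:
        return _endpoint_cache[api_key]  # cached: no probing on later calls
    for url in candidates:
        if probe(url, api_key):
            _endpoint_cache[api_key] = url
            return url
    return None  # no candidate accepted the key


# Placeholder URLs for illustration only, not the real Z.AI endpoints.
CANDIDATES = [
    "https://global.example.invalid/v1",
    "https://china.example.invalid/v1",
    "https://coding.example.invalid/v1",
]

probed = []


def fake_probe(url: str, key: str) -> bool:
    """Stand-in for a real HTTP check; accepts only the second candidate."""
    probed.append(url)
    return "china" in url


first = detect_endpoint("sk-test", CANDIDATES, fake_probe)
second = detect_endpoint("sk-test", CANDIDATES, fake_probe)  # served from cache
```

Passing the probe in as a callback keeps the sketch runnable without network access; a real probe would issue an authenticated request and check the response status.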
### xAI (Grok) Prompt Caching
When using xAI as a provider (any base URL containing `x.ai`), Hermes automatically enables prompt caching by sending the `x-grok-conv-id` header with every API request. This routes requests to the same server within a conversation session, allowing xAI's infrastructure to reuse cached system prompts and conversation history.
No configuration is needed — caching activates automatically when an xAI endpoint is detected and a session ID is available. This reduces latency and cost for multi-turn conversations.
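The activation rule above (an xAI base URL plus an available session ID) can be sketched as a small helper. The function name `xai_cache_headers` is hypothetical; only the `x-grok-conv-id` header name and the `x.ai` substring check come from the text above.

```python
from typing import Dict, Optional


def xai_cache_headers(base_url: str, session_id: Optional[str]) -> Dict[str, str]:
    """Return the extra request header for xAI endpoints, or an empty dict."""
    # Both conditions must hold: an xAI base URL and a known session ID.
    if "x.ai" in base_url and session_id:
        return {"x-grok-conv-id": session_id}
    return {}


headers = xai_cache_headers("https://api.x.ai/v1", "session-123")
```

Reusing the same session ID for every request in a conversation is what lets the provider route those requests to the same server.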
### Hugging Face Inference Providers
[Hugging Face Inference Providers](https://huggingface.co/docs/inference-providers) routes to 20+ open models through a unified OpenAI-compatible endpoint (`router.huggingface.co/v1`). Requests are automatically routed to the fastest available backend (Groq, Together, SambaNova, etc.) with automatic failover.


@@ -47,6 +47,7 @@ Creative content generation — ASCII art, hand-drawn style diagrams, and visual
| `ascii-art` | Generate ASCII art using pyfiglet (571 fonts), cowsay, boxes, toilet, image-to-ascii, remote APIs (asciified, ascii.co.uk), and LLM fallback. No API keys required. | `creative/ascii-art` |
| `ascii-video` | "Production pipeline for ASCII art video — any format. Converts video/audio/images/generative input into colored ASCII character video output (MP4, GIF, image sequence). Covers: video-to-ASCII conversion, audio-reactive music visualizers, generative ASCII art animations, hybrid… | `creative/ascii-video` |
| `excalidraw` | Create hand-drawn style diagrams using Excalidraw JSON format. Generate .excalidraw files for architecture diagrams, flowcharts, sequence diagrams, concept maps, and more. Files can be opened at excalidraw.com or uploaded for shareable links. | `creative/excalidraw` |
| `p5js` | Production pipeline for interactive and generative visual art using p5.js. Create sketches, render them to images/video via headless browser, and serve live previews. Supports canvas animations, data visualizations, and creative coding experiments. | `creative/p5js` |
## devops


@@ -6,9 +6,9 @@ description: "Authoritative reference for Hermes built-in tools, grouped by tool
# Built-in Tools Reference
This page documents all 48 built-in tools in the Hermes tool registry, grouped by toolset. Availability varies by platform, credentials, and enabled toolsets.
This page documents all 47 built-in tools in the Hermes tool registry, grouped by toolset. Availability varies by platform, credentials, and enabled toolsets.
**Quick counts:** 11 browser tools, 4 file tools, 10 RL tools, 4 Home Assistant tools, 2 terminal tools, 2 web tools, and 15 standalone tools across other toolsets.
**Quick counts:** 10 browser tools, 4 file tools, 10 RL tools, 4 Home Assistant tools, 2 terminal tools, 2 web tools, and 15 standalone tools across other toolsets.
:::tip MCP Tools
In addition to built-in tools, Hermes can load tools dynamically from MCP servers. MCP tools appear with a server-name prefix (e.g., `github_create_issue` for the `github` MCP server). See [MCP Integration](/docs/user-guide/features/mcp) for configuration.
@@ -133,7 +133,7 @@ In addition to built-in tools, Hermes can load tools dynamically from MCP server
| Tool | Description | Requires environment |
|------|-------------|----------------------|
| `process` | Manage background processes started with terminal(background=true). Actions: 'list' (show all), 'poll' (check status + new output), 'log' (full output with pagination), 'wait' (block until done or timeout), 'kill' (terminate), 'write' (sen… | — |
| `terminal` | Execute shell commands on a Linux environment. Filesystem persists between calls. Do NOT use cat/head/tail to read files — use read_file instead. Do NOT use grep/rg/find to search — use search_files instead. Do NOT use ls to list directori… | — |
| `terminal` | Execute shell commands on a Linux environment. Filesystem persists between calls. Set `background=true` for long-running servers. Set `notify_on_complete=true` (with `background=true`) to get an automatic notification when the process finishes — no polling needed. Do NOT use cat/head/tail — use read_file. Do NOT use grep/rg/find — use search_files. | — |
## `todo` toolset


@@ -179,6 +179,16 @@ Hermes automatically discovers credentials from multiple sources and seeds the p
Auto-seeded entries are updated on each pool load — if you remove an env var, its pool entry is automatically pruned. Manual entries (added via `hermes auth add`) are never auto-pruned.
## Delegation & Subagent Sharing
When the agent spawns subagents via `delegate_task`, the parent's credential pool is automatically shared with children:
- **Same provider** — the child receives the parent's full pool, enabling key rotation on rate limits
- **Different provider** — the child loads that provider's own pool (if configured)
- **No pool configured** — the child falls back to the inherited single API key
This means subagents benefit from the same rate-limit resilience as the parent, with no extra configuration needed. Per-task credential leasing ensures children don't conflict with each other when rotating keys concurrently.
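The three sharing rules can be expressed as a short decision function. This is an illustrative sketch only: `credentials_for_child` is a hypothetical name, a pool is modeled as a plain list of keys, and per-task leasing is not shown.

```python
from typing import Dict, List, Union

Pool = List[str]  # assumption: a pool is modeled here as a list of API keys


def credentials_for_child(
    parent_provider: str,
    child_provider: str,
    pools: Dict[str, Pool],
    parent_api_key: str,
) -> Union[Pool, str]:
    """Pick what a subagent receives, following the three rules above."""
    if child_provider == parent_provider and parent_provider in pools:
        return pools[parent_provider]  # same provider: parent's full pool
    if child_provider != parent_provider and child_provider in pools:
        return pools[child_provider]  # different provider: that provider's pool
    return parent_api_key  # no pool configured: inherited single key


pools = {"openai": ["key-1", "key-2"], "anthropic": ["key-a"]}
same = credentials_for_child("openai", "openai", pools, "key-1")
other = credentials_for_child("openai", "anthropic", pools, "key-1")
fallback = credentials_for_child("openai", "mistral", pools, "key-1")
```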
## Thread Safety
The credential pool uses a threading lock for all state mutations (`select()`, `mark_exhausted_and_rotate()`, `try_refresh_current()`, `mark_used()`). This ensures safe concurrent access when the gateway handles multiple chat sessions simultaneously.


@@ -184,7 +184,7 @@ Delegation has a **depth limit of 2** — a parent (depth 0) can spawn children
- Subagents **cannot** call: `delegate_task`, `clarify`, `memory`, `send_message`, `execute_code`
- **Interrupt propagation** — interrupting the parent interrupts all active children
- Only the final summary enters the parent's context, keeping token usage efficient
- Subagents inherit the parent's **API key and provider configuration**
- Subagents inherit the parent's **API key, provider configuration, and credential pool** (enabling key rotation on rate limits)
## Delegation vs execute_code


@@ -383,6 +383,26 @@ display:
  tool_progress_command: true
```
## Interactive Model Picker
Send `/model` with no arguments in a Discord channel to open a dropdown-based model picker:
1. **Provider selection** — a Select dropdown showing available providers (up to 25).
2. **Model selection** — a second dropdown with models for the chosen provider (up to 25).
The picker times out after 120 seconds. Only authorized users (those in `DISCORD_ALLOWED_USERS`) can interact with it. If you know the model name, type `/model <name>` directly.
## Native Slash Commands for Skills
Hermes automatically registers installed skills as **native Discord Application Commands**. This means skills appear in Discord's autocomplete `/` menu alongside built-in commands.
- Each skill becomes a Discord slash command (e.g., `/code-review`, `/ascii-art`)
- Skills accept an optional `args` string parameter
- Discord has a limit of 100 application commands per bot — if you have more skills than available slots, extra skills are skipped with a warning in the logs
- Skills are registered during bot startup alongside built-in commands like `/model`, `/reset`, and `/background`
No extra configuration is needed — any skill installed via `hermes skills install` is automatically registered as a Discord slash command on the next gateway restart.
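The slot-limit behavior can be sketched as below. The helper `plan_skill_commands` and the logger name are hypothetical; the sketch assumes built-in commands count toward the same 100-command limit and that skills are registered in list order.

```python
import logging
from typing import List, Tuple

DISCORD_COMMAND_LIMIT = 100  # per-bot limit stated above

log = logging.getLogger("discord_skills_sketch")


def plan_skill_commands(
    builtin: List[str], skills: List[str]
) -> Tuple[List[str], List[str]]:
    """Split skills into (registered, skipped) given the remaining slots."""
    free_slots = max(0, DISCORD_COMMAND_LIMIT - len(builtin))
    registered, skipped = skills[:free_slots], skills[free_slots:]
    for name in skipped:
        # Mirrors the documented behavior: extras are skipped with a warning.
        log.warning("Skipping skill %r: no free Discord command slot", name)
    return registered, skipped


builtin = ["model", "reset", "background"]
skills = [f"skill-{i}" for i in range(100)]
registered, skipped = plan_skill_commands(builtin, skills)
```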
## Home Channel
You can designate a "home channel" where the bot sends proactive messages (such as cron job output, reminders, and notifications). There are two ways to set it:


@@ -310,6 +310,58 @@ Additional webhook protections:
- **Body read timeout:** 30 seconds
- **Content-Type enforcement:** Only `application/json` is accepted
## WebSocket Tuning
When using `websocket` mode, you can customize reconnect and ping behavior:
```yaml
platforms:
  feishu:
    extra:
      ws_reconnect_interval: 120   # Seconds between reconnect attempts (default: 120)
      ws_ping_interval: 30         # Seconds between WebSocket pings (optional; SDK default if unset)
```
| Setting | Config key | Default | Description |
|---------|-----------|---------|-------------|
| Reconnect interval | `ws_reconnect_interval` | 120s | How long to wait between reconnection attempts |
| Ping interval | `ws_ping_interval` | _(SDK default)_ | Frequency of WebSocket keepalive pings |
## Per-Group Access Control
Beyond the global `FEISHU_GROUP_POLICY`, you can set fine-grained rules per group chat using `group_rules` in config.yaml:
```yaml
platforms:
  feishu:
    extra:
      default_group_policy: "open"     # Default for groups not in group_rules
      admins:                          # Users who can manage bot settings
        - "ou_admin_open_id"
      group_rules:
        "oc_group_chat_id_1":
          policy: "allowlist"          # open | allowlist | blacklist | admin_only | disabled
          allowlist:
            - "ou_user_open_id_1"
            - "ou_user_open_id_2"
        "oc_group_chat_id_2":
          policy: "admin_only"
        "oc_group_chat_id_3":
          policy: "blacklist"
          blacklist:
            - "ou_blocked_user"
```
| Policy | Description |
|--------|-------------|
| `open` | Anyone in the group can use the bot |
| `allowlist` | Only users in the group's `allowlist` can use the bot |
| `blacklist` | Everyone except users in the group's `blacklist` can use the bot |
| `admin_only` | Only users in the global `admins` list can use the bot in this group |
| `disabled` | Bot ignores all messages in this group |
Groups not listed in `group_rules` fall back to `default_group_policy` (defaults to the value of `FEISHU_GROUP_POLICY`).
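To make the policy table concrete, here is a minimal sketch of how the five policies could be evaluated against the `extra` block shown above. The function `is_user_allowed` is hypothetical, and the sketch falls back to `"open"` when `default_group_policy` is absent instead of reading `FEISHU_GROUP_POLICY`.

```python
from typing import Any, Dict


def is_user_allowed(extra: Dict[str, Any], group_id: str, user_id: str) -> bool:
    """Evaluate the per-group policy for one user in one group chat."""
    rule = extra.get("group_rules", {}).get(group_id, {})
    # Groups without a rule use default_group_policy ("open" is assumed here).
    policy = rule.get("policy", extra.get("default_group_policy", "open"))
    if policy == "open":
        return True
    if policy == "allowlist":
        return user_id in rule.get("allowlist", [])
    if policy == "blacklist":
        return user_id not in rule.get("blacklist", [])
    if policy == "admin_only":
        return user_id in extra.get("admins", [])
    return False  # "disabled", or an unrecognized policy value


extra = {
    "default_group_policy": "open",
    "admins": ["ou_admin_open_id"],
    "group_rules": {
        "oc_group_chat_id_1": {
            "policy": "allowlist",
            "allowlist": ["ou_user_open_id_1", "ou_user_open_id_2"],
        },
        "oc_group_chat_id_2": {"policy": "admin_only"},
        "oc_group_chat_id_3": {"policy": "blacklist", "blacklist": ["ou_blocked_user"]},
        "oc_group_chat_id_4": {"policy": "disabled"},
    },
}
```

Treating unknown policy values as denied is a design choice of this sketch, not documented behavior.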
## Deduplication
Inbound messages are deduplicated using message IDs with a 24-hour TTL. The dedup state is persisted across restarts to `~/.hermes/feishu_seen_message_ids.json`.
@@ -343,6 +395,8 @@ Inbound messages are deduplicated using message IDs with a 24-hour TTL. The dedu
| `HERMES_FEISHU_TEXT_BATCH_MAX_CHARS` | — | `4000` | Max characters merged per text batch |
| `HERMES_FEISHU_MEDIA_BATCH_DELAY_SECONDS` | — | `0.8` | Media burst debounce quiet period |
WebSocket and per-group ACL settings are configured via `config.yaml` under `platforms.feishu.extra` (see [WebSocket Tuning](#websocket-tuning) and [Per-Group Access Control](#per-group-access-control) above).
## Troubleshooting
| Problem | Fix |


@@ -147,13 +147,26 @@ Group access is controlled by the `SIGNAL_GROUP_ALLOWED_USERS` env var:
### Attachments
The adapter supports sending and receiving:
The adapter supports sending and receiving media in both directions.
**Incoming** (user → agent):
- **Images** — PNG, JPEG, GIF, WebP (auto-detected via magic bytes)
- **Audio** — MP3, OGG, WAV, M4A (voice messages transcribed if Whisper is configured)
- **Documents** — PDF, ZIP, and other file types
Attachment size limit: **100 MB**.
**Outgoing** (agent → user):
The agent can send media files via `MEDIA:` tags in responses. The following delivery methods are supported:
- **Images** — `send_image_file` sends PNG, JPEG, GIF, WebP as native Signal attachments
- **Voice** — `send_voice` sends audio files (OGG, MP3, WAV, M4A, AAC) as attachments
- **Video** — `send_video` sends MP4 video files
- **Documents** — `send_document` sends any file type (PDF, ZIP, etc.)
All outgoing media goes through Signal's standard attachment API. Unlike some platforms, Signal does not distinguish between voice messages and file attachments at the protocol level.
Attachment size limit: **100 MB** (both directions).
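The mapping from file type to delivery method listed above could look roughly like this. It is a sketch under assumptions: `pick_send_method` is a hypothetical helper, selection is done purely by file extension, and the parsing of `MEDIA:` tags is not shown.

```python
from pathlib import Path

# Extensions taken from the outgoing-media list above.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}
VOICE_EXTS = {".ogg", ".mp3", ".wav", ".m4a", ".aac"}
VIDEO_EXTS = {".mp4"}


def pick_send_method(path: str) -> str:
    """Return the name of the delivery method for a media file path."""
    ext = Path(path).suffix.lower()
    if ext in IMAGE_EXTS:
        return "send_image_file"
    if ext in VOICE_EXTS:
        return "send_voice"
    if ext in VIDEO_EXTS:
        return "send_video"
    return "send_document"  # any other file type (PDF, ZIP, etc.)


method = pick_send_method("/tmp/report.pdf")
```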
### Typing Indicators


@@ -210,11 +210,10 @@ Understanding how Hermes behaves in different contexts:
|---------|----------|
| **DMs** | Bot responds to every message — no @mention needed |
| **Channels** | Bot **only responds when @mentioned** (e.g., `@Hermes Agent what time is it?`). In channels, Hermes replies in a thread attached to that message. |
| **Threads** | If you @mention Hermes inside an existing thread, it replies in that same thread. |
| **Threads** | If you @mention Hermes inside an existing thread, it replies in that same thread. Once the bot has an active session in a thread, **subsequent replies in that thread do not require @mention** — the bot follows the conversation naturally. |
:::tip
In channels, always @mention the bot. Simply typing a message without mentioning it will be ignored.
This is intentional — it prevents the bot from responding to every message in busy channels.
In channels, always @mention the bot to start a conversation. Once the bot is active in a thread, you can reply in that thread without mentioning it. Outside of threads, messages without @mention are ignored to prevent noise in busy channels.
:::
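The response rules in the table and tip above reduce to a small predicate. This is a sketch, not the adapter's code: `should_respond` and the `active_threads` set are hypothetical, while `thread_ts` is the field Slack uses to identify a thread.

```python
from typing import Optional, Set


def should_respond(
    is_dm: bool,
    mentioned: bool,
    thread_ts: Optional[str],
    active_threads: Set[str],
) -> bool:
    """Decide whether the bot answers a message, per the rules above."""
    if is_dm:
        return True  # DMs never need a mention
    if mentioned:
        return True  # an @mention always starts or continues a conversation
    # Without a mention, only replies in a thread with an active session count.
    return thread_ts is not None and thread_ts in active_threads


active = {"1712500000.000100"}
follow_up = should_respond(False, False, "1712500000.000100", active)
```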
---
@@ -283,7 +282,7 @@ slack:
```
:::info
Unlike Discord and Telegram, Slack does not have a `free_response_channels` equivalent. The Slack adapter always requires `@mention` in channels — this is hardcoded behavior. In DMs, the bot always responds without needing a mention.
Unlike Discord and Telegram, Slack does not have a `free_response_channels` equivalent. The Slack adapter requires `@mention` to start a conversation in channels. However, once the bot has an active session in a thread, subsequent thread replies do not require a mention. In DMs, the bot always responds without needing a mention.
:::
### Unauthorized User Handling


@@ -383,6 +383,19 @@ To find a topic's `thread_id`, open the topic in Telegram Web or Desktop and loo
- **Privacy policy:** Telegram now requires bots to have a privacy policy. Set one via BotFather with `/setprivacy_policy`, or Telegram may auto-generate a placeholder. This is particularly important if your bot is public-facing.
- **Message streaming:** Bot API 9.x added support for streaming long responses, which can improve perceived latency for lengthy agent replies.
## Interactive Model Picker
When you send `/model` with no arguments in a Telegram chat, Hermes shows an interactive inline keyboard for switching models:
1. **Provider selection** — buttons showing each available provider with model counts (e.g., "OpenAI (15)", "✓ Anthropic (12)" for the current provider).
2. **Model selection** — paginated model list with **Prev**/**Next** navigation, a **Back** button to return to providers, and **Cancel**.
The current model and provider are displayed at the top. All navigation happens by editing the same message in-place (no chat clutter).
:::tip
If you know the exact model name, type `/model <name>` directly to skip the picker. You can also type `/model <name> --global` to persist the change across sessions.
:::
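The paginated model list can be illustrated with a small helper that returns one page of models plus the navigation buttons to show. The function `model_page` and the page size of 8 are assumptions made for this sketch; the button labels come from the description above.

```python
from typing import List, Tuple

PAGE_SIZE = 8  # assumed value; the real page size is not stated above


def model_page(
    models: List[str], page: int, page_size: int = PAGE_SIZE
) -> Tuple[List[str], List[str]]:
    """Return (models on this page, navigation button labels)."""
    last_page = max(0, (len(models) - 1) // page_size)
    page = min(max(page, 0), last_page)  # clamp out-of-range page numbers
    start = page * page_size
    nav: List[str] = []
    if page > 0:
        nav.append("Prev")
    if page < last_page:
        nav.append("Next")
    nav += ["Back", "Cancel"]  # always available
    return models[start:start + page_size], nav


models = [f"model-{i}" for i in range(12)]
first_items, first_nav = model_page(models, 0)
second_items, second_nav = model_page(models, 1)
```

Because navigation edits the same message in place, each button press only needs to recompute the page and replace the keyboard.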
## Webhook Mode
By default, the Telegram adapter connects via **long polling** — the gateway makes outbound connections to Telegram's servers. This works everywhere but keeps a persistent connection open.


@@ -112,13 +112,38 @@ Prompts use dot-notation to access nested fields in the webhook payload:
- `{pull_request.title}` resolves to `payload["pull_request"]["title"]`
- `{repository.full_name}` resolves to `payload["repository"]["full_name"]`
- `{__raw__}` — special token that dumps the **entire payload** as indented JSON (truncated at 4000 characters). Useful for monitoring alerts or generic webhooks where the agent needs the full context.
- Missing keys are left as the literal `{key}` string (no error)
- Nested dicts and lists are JSON-serialized and truncated at 2000 characters
You can mix `{__raw__}` with regular template variables:
```yaml
prompt: "PR #{pull_request.number} by {pull_request.user.login}: {__raw__}"
```
If no `prompt` template is configured for a route, the entire payload is dumped as indented JSON (truncated at 4000 characters).
The same dot-notation templates work in `deliver_extra` values.
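The template rules above (dot-notation lookup, `{__raw__}`, literal fallback for missing keys, and truncation limits) can be sketched in a few lines. The function `render_prompt` and the token regex are hypothetical; the 4000 and 2000 character limits are taken from the text.

```python
import json
import re
from typing import Any, Dict

RAW_LIMIT = 4000     # {__raw__} truncation, per the text above
NESTED_LIMIT = 2000  # nested dict/list truncation, per the text above
_TOKEN = re.compile(r"\{([A-Za-z0-9_.]+)\}")


def render_prompt(template: str, payload: Dict[str, Any]) -> str:
    """Substitute {a.b.c} tokens from payload; {__raw__} dumps everything."""

    def resolve(match: re.Match) -> str:
        key = match.group(1)
        if key == "__raw__":
            return json.dumps(payload, indent=2)[:RAW_LIMIT]
        value: Any = payload
        for part in key.split("."):
            if not isinstance(value, dict) or part not in value:
                return match.group(0)  # missing key: keep the literal token
            value = value[part]
        if isinstance(value, (dict, list)):
            return json.dumps(value)[:NESTED_LIMIT]
        return str(value)

    return _TOKEN.sub(resolve, template)


payload = {"pull_request": {"number": 7, "user": {"login": "octo"}}}
text = render_prompt("PR #{pull_request.number} by {pull_request.user.login}", payload)
```

With the payload above, `text` is `PR #7 by octo`, and a token such as `{missing.key}` would pass through unchanged.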
### Forum Topic Delivery
When delivering webhook responses to Telegram, you can target a specific forum topic by including `message_thread_id` (or `thread_id`) in `deliver_extra`:
```yaml
webhooks:
  routes:
    alerts:
      events: ["alert"]
      prompt: "Alert: {__raw__}"
      deliver: "telegram"
      deliver_extra:
        chat_id: "-1001234567890"
        message_thread_id: "42"
```
If `chat_id` is not provided in `deliver_extra`, the delivery falls back to the home channel configured for the target platform.
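The target resolution described above can be sketched as follows. The helper `resolve_target` is hypothetical, and the precedence of `message_thread_id` over `thread_id` when both are present is an assumption of this sketch.

```python
from typing import Any, Dict, Optional


def resolve_target(
    deliver_extra: Dict[str, Any], home_channel: Optional[str]
) -> Dict[str, Optional[str]]:
    """Work out the chat and optional forum topic for a delivery."""
    # Fall back to the platform's home channel when chat_id is absent.
    chat_id = deliver_extra.get("chat_id") or home_channel
    # Either key may carry the topic; message_thread_id is checked first here.
    thread_id = deliver_extra.get("message_thread_id") or deliver_extra.get("thread_id")
    return {"chat_id": chat_id, "message_thread_id": thread_id}


target = resolve_target(
    {"chat_id": "-1001234567890", "message_thread_id": "42"},
    home_channel="-1009999999999",
)
```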
---
## GitHub PR Review (Step by Step) {#github-pr-review}


@@ -10,13 +10,15 @@ Hermes Agent is designed with a defense-in-depth security model. This page cover
## Overview
The security model has five layers:
The security model has seven layers:
1. **User authorization** — who can talk to the agent (allowlists, DM pairing)
2. **Dangerous command approval** — human-in-the-loop for destructive operations
3. **Container isolation** — Docker/Singularity/Modal sandboxing with hardened settings
4. **MCP credential filtering** — environment variable isolation for MCP subprocesses
5. **Context file scanning** — prompt injection detection in project files
6. **Cross-session isolation** — sessions cannot access each other's data or state; cron job storage paths are hardened against path traversal attacks
7. **Input sanitization** — working directory parameters in terminal tool backends are validated against an allowlist to prevent shell injection
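Layers 6 and 7 describe two common hardening patterns, sketched below in generic form. Neither function is taken from Hermes: `resolve_inside` and `is_safe_workdir` are hypothetical names, and interpreting the allowlist as a set of permitted characters is an assumption of this sketch.

```python
import re
from pathlib import Path

# Assumed allowlist: letters, digits, and a few path characters only.
_SAFE_WORKDIR = re.compile(r"[A-Za-z0-9_./~-]+")


def is_safe_workdir(workdir: str) -> bool:
    """Accept a working directory only if every character is allowlisted."""
    return _SAFE_WORKDIR.fullmatch(workdir) is not None


def resolve_inside(base: Path, relative: str) -> Path:
    """Resolve relative under base, rejecting paths that escape base."""
    base_resolved = base.resolve()
    candidate = (base_resolved / relative).resolve()
    if candidate != base_resolved and base_resolved not in candidate.parents:
        raise ValueError(f"path escapes storage root: {relative!r}")
    return candidate


ok = is_safe_workdir("/home/user/project")
rejected = is_safe_workdir("/tmp; rm -rf /")
```

An allowlist rejects shell metacharacters such as `;`, spaces, and backticks by construction, which is generally easier to reason about than trying to enumerate dangerous characters.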
## Dangerous Command Approval