Files
hermes-agent/website/docs/user-guide/configuration.md

1608 lines
67 KiB
Markdown
Raw Permalink Normal View History

---
sidebar_position: 2
title: "Configuration"
description: "Configure Hermes Agent — config.yaml, providers, models, API keys, and more"
---
# Configuration
All settings are stored in the `~/.hermes/` directory for easy access.
## Directory Structure
```text
~/.hermes/
├── config.yaml # Settings (model, terminal, TTS, compression, etc.)
├── .env # API keys and secrets
├── auth.json # OAuth provider credentials (Nous Portal, etc.)
├── SOUL.md # Primary agent identity (slot #1 in system prompt)
├── memories/ # Persistent memory (MEMORY.md, USER.md)
├── skills/ # Agent-created skills (managed via skill_manage tool)
├── cron/ # Scheduled jobs
├── sessions/ # Gateway sessions
└── logs/ # Logs (errors.log, gateway.log — secrets auto-redacted)
```
## Managing Configuration
```bash
hermes config # View current configuration
hermes config edit # Open config.yaml in your editor
hermes config set KEY VAL # Set a specific value
hermes config check # Check for missing options (after updates)
hermes config migrate # Interactively add missing options
# Examples:
hermes config set model anthropic/claude-opus-4
hermes config set terminal.backend docker
hermes config set OPENROUTER_API_KEY sk-or-... # Saves to .env
```
:::tip
The `hermes config set` command automatically routes values to the right file — API keys are saved to `.env`, everything else to `config.yaml`.
:::
## Configuration Precedence
Settings are resolved in this order (highest priority first):
1. **CLI arguments** — e.g., `hermes chat --model anthropic/claude-sonnet-4` (per-invocation override)
2. **`~/.hermes/config.yaml`** — the primary config file for all non-secret settings
3. **`~/.hermes/.env`** — fallback for env vars; **required** for secrets (API keys, tokens, passwords)
4. **Built-in defaults** — hardcoded safe defaults when nothing else is set
:::info Rule of Thumb
Secrets (API keys, bot tokens, passwords) go in `.env`. Everything else (model, terminal backend, compression settings, memory limits, toolsets) goes in `config.yaml`. When both are set, `config.yaml` wins for non-secret settings.
:::
docs: document 9 previously undocumented features New documentation for features that existed in code but had no docs: New page: - context-references.md: Full docs for @-syntax inline context injection (@file:, @folder:, @diff, @staged, @git:, @url:) with line ranges, CLI autocomplete, size limits, sensitive path blocking, and error handling configuration.md additions: - Environment variable substitution: ${VAR_NAME} syntax in config.yaml with expansion, fallback, and multi-reference support - Gateway streaming: Progressive token delivery on messaging platforms via message editing (StreamingConfig: enabled, transport, edit_interval, buffer_threshold, cursor) with platform support matrix - Web search backends: Three providers (Firecrawl, Parallel, Tavily) with web.backend config key, capability matrix, auto-detection from API keys, self-hosted Firecrawl, and Parallel search modes security.md additions: - SSRF protection: Always-on URL validation blocking private networks, loopback, link-local, CGNAT, cloud metadata hostnames, with fail-closed DNS and redirect chain re-validation - Tirith pre-exec security scanning: Content-level command scanning for homograph URLs, pipe-to-interpreter, terminal injection with auto-install, SHA-256/cosign verification, config options, and fail-open/fail-closed modes sessions.md addition: - Auto-generated session titles: Background LLM-powered title generation after first exchange creating-skills.md additions: - Conditional skill activation: requires_toolsets, requires_tools, fallback_for_toolsets, fallback_for_tools frontmatter fields with matching logic and use cases - Environment variable requirements: required_environment_variables frontmatter for automatic env passthrough to sandboxed execution, plus terminal.env_passthrough user config
2026-03-24 08:56:21 -07:00
## Environment Variable Substitution
You can reference environment variables in `config.yaml` using `${VAR_NAME}` syntax:
```yaml
auxiliary:
vision:
api_key: ${GOOGLE_API_KEY}
base_url: ${CUSTOM_VISION_URL}
delegation:
api_key: ${DELEGATION_KEY}
```
Multiple references in a single value work: `url: "${HOST}:${PORT}"`. If a referenced variable is not set, the placeholder is kept verbatim (`${UNDEFINED_VAR}` stays as-is). Only the `${VAR}` syntax is supported — bare `$VAR` is not expanded.
## Inference Providers
You need at least one way to connect to an LLM. Use `hermes model` to switch providers and models interactively, or configure directly:
| Provider | Setup |
|----------|-------|
| **Nous Portal** | `hermes model` (OAuth, subscription-based) |
| **OpenAI Codex** | `hermes model` (ChatGPT OAuth, uses Codex models) |
| **GitHub Copilot** | `hermes model` (OAuth device code flow, `COPILOT_GITHUB_TOKEN`, `GH_TOKEN`, or `gh auth token`) |
| **GitHub Copilot ACP** | `hermes model` (spawns local `copilot --acp --stdio`) |
| **Anthropic** | `hermes model` (Claude Pro/Max via Claude Code auth, Anthropic API key, or manual setup-token) |
| **OpenRouter** | `OPENROUTER_API_KEY` in `~/.hermes/.env` |
| **AI Gateway** | `AI_GATEWAY_API_KEY` in `~/.hermes/.env` (provider: `ai-gateway`) |
feat: add z.ai/GLM, Kimi/Moonshot, MiniMax as first-class providers Adds 4 new direct API-key providers (zai, kimi-coding, minimax, minimax-cn) to the inference provider system. All use standard OpenAI-compatible chat/completions endpoints with Bearer token auth. Core changes: - auth.py: Extended ProviderConfig with api_key_env_vars and base_url_env_var fields. Added providers to PROVIDER_REGISTRY. Added provider aliases (glm, z-ai, zhipu, kimi, moonshot). Added auto-detection of API-key providers in resolve_provider(). Added resolve_api_key_provider_credentials() and get_api_key_provider_status() helpers. - runtime_provider.py: Added generic API-key provider branch in resolve_runtime_provider() — any provider with auth_type='api_key' is automatically handled. - main.py: Added providers to hermes model menu with generic _model_flow_api_key_provider() flow. Updated _has_any_provider_configured() to check all provider env vars. Updated argparse --provider choices. - setup.py: Added providers to setup wizard with API key prompts and curated model lists. - config.py: Added env vars (GLM_API_KEY, KIMI_API_KEY, MINIMAX_API_KEY, etc.) to OPTIONAL_ENV_VARS. - status.py: Added API key display and provider status section. - doctor.py: Added connectivity checks for each provider endpoint. - cli.py: Updated provider docstrings. Docs: Updated README.md, .env.example, cli-config.yaml.example, cli-commands.md, environment-variables.md, configuration.md. Tests: 50 new tests covering registry, aliases, resolution, auto-detection, credential resolution, and runtime provider dispatch. Inspired by PR #33 (numman-ali) which proposed a provider registry approach. Credit to tars90percent (PR #473) and manuelschipper (PR #420) for related provider improvements merged earlier in this changeset.
2026-03-06 18:55:12 -08:00
| **z.ai / GLM** | `GLM_API_KEY` in `~/.hermes/.env` (provider: `zai`) |
| **Kimi / Moonshot** | `KIMI_API_KEY` in `~/.hermes/.env` (provider: `kimi-coding`) |
| **MiniMax** | `MINIMAX_API_KEY` in `~/.hermes/.env` (provider: `minimax`) |
| **MiniMax China** | `MINIMAX_CN_API_KEY` in `~/.hermes/.env` (provider: `minimax-cn`) |
| **Alibaba Cloud** | `DASHSCOPE_API_KEY` in `~/.hermes/.env` (provider: `alibaba`, aliases: `dashscope`, `qwen`) |
| **Kilo Code** | `KILOCODE_API_KEY` in `~/.hermes/.env` (provider: `kilocode`) |
| **OpenCode Zen** | `OPENCODE_ZEN_API_KEY` in `~/.hermes/.env` (provider: `opencode-zen`) |
| **OpenCode Go** | `OPENCODE_GO_API_KEY` in `~/.hermes/.env` (provider: `opencode-go`) |
| **Hugging Face** | `HF_TOKEN` in `~/.hermes/.env` (provider: `huggingface`, aliases: `hf`) |
| **Custom Endpoint** | `hermes model` (saved in `config.yaml`) or `OPENAI_BASE_URL` + `OPENAI_API_KEY` in `~/.hermes/.env` |
:::tip Model key alias
In the `model:` config section, you can use either `default:` or `model:` as the key name for your model ID. Both `model: { default: my-model }` and `model: { model: my-model }` work identically.
:::
:::info Codex Note
The OpenAI Codex provider authenticates via device code (open a URL, enter a code). Hermes stores the resulting credentials in its own auth store under `~/.hermes/auth.json` and can import existing Codex CLI credentials from `~/.codex/auth.json` when present. No Codex CLI installation is required.
:::
:::warning
Even when using Nous Portal, Codex, or a custom endpoint, some tools (vision, web summarization, MoA) use a separate "auxiliary" model — by default Gemini Flash via OpenRouter. An `OPENROUTER_API_KEY` enables these tools automatically. You can also configure which model and provider these tools use — see [Auxiliary Models](#auxiliary-models) below.
:::
### Anthropic (Native)
Use Claude models directly through the Anthropic API — no OpenRouter proxy needed. Supports three auth methods:
```bash
# With an API key (pay-per-token)
export ANTHROPIC_API_KEY=***
hermes chat --provider anthropic --model claude-sonnet-4-6
# Preferred: authenticate through `hermes model`
# Hermes will use Claude Code's credential store directly when available
hermes model
# Manual override with a setup-token (fallback / legacy)
export ANTHROPIC_TOKEN=*** # setup-token or manual OAuth token
hermes chat --provider anthropic
# Auto-detect Claude Code credentials (if you already use Claude Code)
hermes chat --provider anthropic # reads Claude Code credential files automatically
```
When you choose Anthropic OAuth through `hermes model`, Hermes prefers Claude Code's own credential store over copying the token into `~/.hermes/.env`. That keeps refreshable Claude credentials refreshable.
Or set it permanently:
```yaml
model:
provider: "anthropic"
default: "claude-sonnet-4-6"
```
:::tip Aliases
`--provider claude` and `--provider claude-code` also work as shorthand for `--provider anthropic`.
:::
### GitHub Copilot
feat: add z.ai/GLM, Kimi/Moonshot, MiniMax as first-class providers Adds 4 new direct API-key providers (zai, kimi-coding, minimax, minimax-cn) to the inference provider system. All use standard OpenAI-compatible chat/completions endpoints with Bearer token auth. Core changes: - auth.py: Extended ProviderConfig with api_key_env_vars and base_url_env_var fields. Added providers to PROVIDER_REGISTRY. Added provider aliases (glm, z-ai, zhipu, kimi, moonshot). Added auto-detection of API-key providers in resolve_provider(). Added resolve_api_key_provider_credentials() and get_api_key_provider_status() helpers. - runtime_provider.py: Added generic API-key provider branch in resolve_runtime_provider() — any provider with auth_type='api_key' is automatically handled. - main.py: Added providers to hermes model menu with generic _model_flow_api_key_provider() flow. Updated _has_any_provider_configured() to check all provider env vars. Updated argparse --provider choices. - setup.py: Added providers to setup wizard with API key prompts and curated model lists. - config.py: Added env vars (GLM_API_KEY, KIMI_API_KEY, MINIMAX_API_KEY, etc.) to OPTIONAL_ENV_VARS. - status.py: Added API key display and provider status section. - doctor.py: Added connectivity checks for each provider endpoint. - cli.py: Updated provider docstrings. Docs: Updated README.md, .env.example, cli-config.yaml.example, cli-commands.md, environment-variables.md, configuration.md. Tests: 50 new tests covering registry, aliases, resolution, auto-detection, credential resolution, and runtime provider dispatch. Inspired by PR #33 (numman-ali) which proposed a provider registry approach. Credit to tars90percent (PR #473) and manuelschipper (PR #420) for related provider improvements merged earlier in this changeset.
2026-03-06 18:55:12 -08:00
Hermes supports GitHub Copilot as a first-class provider with two modes:
**`copilot` — Direct Copilot API** (recommended). Uses your GitHub Copilot subscription to access GPT-5.x, Claude, Gemini, and other models through the Copilot API.
```bash
hermes chat --provider copilot --model gpt-5.4
```
**Authentication options** (checked in this order):
1. `COPILOT_GITHUB_TOKEN` environment variable
2. `GH_TOKEN` environment variable
3. `GITHUB_TOKEN` environment variable
4. `gh auth token` CLI fallback
If no token is found, `hermes model` offers an **OAuth device code login** — the same flow used by the Copilot CLI and opencode.
:::warning Token types
The Copilot API does **not** support classic Personal Access Tokens (`ghp_*`). Supported token types:
| Type | Prefix | How to get |
|------|--------|------------|
| OAuth token | `gho_` | `hermes model` → GitHub Copilot → Login with GitHub |
| Fine-grained PAT | `github_pat_` | GitHub Settings → Developer settings → Fine-grained tokens (needs **Copilot Requests** permission) |
| GitHub App token | `ghu_` | Via GitHub App installation |
If your `gh auth token` returns a `ghp_*` token, use `hermes model` to authenticate via OAuth instead.
:::
**API routing**: GPT-5+ models (except `gpt-5-mini`) automatically use the Responses API. All other models (GPT-4o, Claude, Gemini, etc.) use Chat Completions. Models are auto-detected from the live Copilot catalog.
**`copilot-acp` — Copilot ACP agent backend**. Spawns the local Copilot CLI as a subprocess:
feat: add z.ai/GLM, Kimi/Moonshot, MiniMax as first-class providers Adds 4 new direct API-key providers (zai, kimi-coding, minimax, minimax-cn) to the inference provider system. All use standard OpenAI-compatible chat/completions endpoints with Bearer token auth. Core changes: - auth.py: Extended ProviderConfig with api_key_env_vars and base_url_env_var fields. Added providers to PROVIDER_REGISTRY. Added provider aliases (glm, z-ai, zhipu, kimi, moonshot). Added auto-detection of API-key providers in resolve_provider(). Added resolve_api_key_provider_credentials() and get_api_key_provider_status() helpers. - runtime_provider.py: Added generic API-key provider branch in resolve_runtime_provider() — any provider with auth_type='api_key' is automatically handled. - main.py: Added providers to hermes model menu with generic _model_flow_api_key_provider() flow. Updated _has_any_provider_configured() to check all provider env vars. Updated argparse --provider choices. - setup.py: Added providers to setup wizard with API key prompts and curated model lists. - config.py: Added env vars (GLM_API_KEY, KIMI_API_KEY, MINIMAX_API_KEY, etc.) to OPTIONAL_ENV_VARS. - status.py: Added API key display and provider status section. - doctor.py: Added connectivity checks for each provider endpoint. - cli.py: Updated provider docstrings. Docs: Updated README.md, .env.example, cli-config.yaml.example, cli-commands.md, environment-variables.md, configuration.md. Tests: 50 new tests covering registry, aliases, resolution, auto-detection, credential resolution, and runtime provider dispatch. Inspired by PR #33 (numman-ali) which proposed a provider registry approach. Credit to tars90percent (PR #473) and manuelschipper (PR #420) for related provider improvements merged earlier in this changeset.
2026-03-06 18:55:12 -08:00
```bash
hermes chat --provider copilot-acp --model copilot-acp
# Requires the GitHub Copilot CLI in PATH and an existing `copilot login` session
```
**Permanent config:**
```yaml
model:
provider: "copilot"
default: "gpt-5.4"
```
| Environment variable | Description |
|---------------------|-------------|
| `COPILOT_GITHUB_TOKEN` | GitHub token for Copilot API (first priority) |
| `HERMES_COPILOT_ACP_COMMAND` | Override the Copilot CLI binary path (default: `copilot`) |
| `HERMES_COPILOT_ACP_ARGS` | Override ACP args (default: `--acp --stdio`) |
### First-Class Chinese AI Providers
These providers have built-in support with dedicated provider IDs. Set the API key and use `--provider` to select:
```bash
feat: add z.ai/GLM, Kimi/Moonshot, MiniMax as first-class providers Adds 4 new direct API-key providers (zai, kimi-coding, minimax, minimax-cn) to the inference provider system. All use standard OpenAI-compatible chat/completions endpoints with Bearer token auth. Core changes: - auth.py: Extended ProviderConfig with api_key_env_vars and base_url_env_var fields. Added providers to PROVIDER_REGISTRY. Added provider aliases (glm, z-ai, zhipu, kimi, moonshot). Added auto-detection of API-key providers in resolve_provider(). Added resolve_api_key_provider_credentials() and get_api_key_provider_status() helpers. - runtime_provider.py: Added generic API-key provider branch in resolve_runtime_provider() — any provider with auth_type='api_key' is automatically handled. - main.py: Added providers to hermes model menu with generic _model_flow_api_key_provider() flow. Updated _has_any_provider_configured() to check all provider env vars. Updated argparse --provider choices. - setup.py: Added providers to setup wizard with API key prompts and curated model lists. - config.py: Added env vars (GLM_API_KEY, KIMI_API_KEY, MINIMAX_API_KEY, etc.) to OPTIONAL_ENV_VARS. - status.py: Added API key display and provider status section. - doctor.py: Added connectivity checks for each provider endpoint. - cli.py: Updated provider docstrings. Docs: Updated README.md, .env.example, cli-config.yaml.example, cli-commands.md, environment-variables.md, configuration.md. Tests: 50 new tests covering registry, aliases, resolution, auto-detection, credential resolution, and runtime provider dispatch. Inspired by PR #33 (numman-ali) which proposed a provider registry approach. Credit to tars90percent (PR #473) and manuelschipper (PR #420) for related provider improvements merged earlier in this changeset.
2026-03-06 18:55:12 -08:00
# z.ai / ZhipuAI GLM
hermes chat --provider zai --model glm-4-plus
# Requires: GLM_API_KEY in ~/.hermes/.env
# Kimi / Moonshot AI
hermes chat --provider kimi-coding --model moonshot-v1-auto
# Requires: KIMI_API_KEY in ~/.hermes/.env
# MiniMax (global endpoint)
hermes chat --provider minimax --model MiniMax-M2.7
feat: add z.ai/GLM, Kimi/Moonshot, MiniMax as first-class providers Adds 4 new direct API-key providers (zai, kimi-coding, minimax, minimax-cn) to the inference provider system. All use standard OpenAI-compatible chat/completions endpoints with Bearer token auth. Core changes: - auth.py: Extended ProviderConfig with api_key_env_vars and base_url_env_var fields. Added providers to PROVIDER_REGISTRY. Added provider aliases (glm, z-ai, zhipu, kimi, moonshot). Added auto-detection of API-key providers in resolve_provider(). Added resolve_api_key_provider_credentials() and get_api_key_provider_status() helpers. - runtime_provider.py: Added generic API-key provider branch in resolve_runtime_provider() — any provider with auth_type='api_key' is automatically handled. - main.py: Added providers to hermes model menu with generic _model_flow_api_key_provider() flow. Updated _has_any_provider_configured() to check all provider env vars. Updated argparse --provider choices. - setup.py: Added providers to setup wizard with API key prompts and curated model lists. - config.py: Added env vars (GLM_API_KEY, KIMI_API_KEY, MINIMAX_API_KEY, etc.) to OPTIONAL_ENV_VARS. - status.py: Added API key display and provider status section. - doctor.py: Added connectivity checks for each provider endpoint. - cli.py: Updated provider docstrings. Docs: Updated README.md, .env.example, cli-config.yaml.example, cli-commands.md, environment-variables.md, configuration.md. Tests: 50 new tests covering registry, aliases, resolution, auto-detection, credential resolution, and runtime provider dispatch. Inspired by PR #33 (numman-ali) which proposed a provider registry approach. Credit to tars90percent (PR #473) and manuelschipper (PR #420) for related provider improvements merged earlier in this changeset.
2026-03-06 18:55:12 -08:00
# Requires: MINIMAX_API_KEY in ~/.hermes/.env
# MiniMax (China endpoint)
hermes chat --provider minimax-cn --model MiniMax-M2.7
feat: add z.ai/GLM, Kimi/Moonshot, MiniMax as first-class providers Adds 4 new direct API-key providers (zai, kimi-coding, minimax, minimax-cn) to the inference provider system. All use standard OpenAI-compatible chat/completions endpoints with Bearer token auth. Core changes: - auth.py: Extended ProviderConfig with api_key_env_vars and base_url_env_var fields. Added providers to PROVIDER_REGISTRY. Added provider aliases (glm, z-ai, zhipu, kimi, moonshot). Added auto-detection of API-key providers in resolve_provider(). Added resolve_api_key_provider_credentials() and get_api_key_provider_status() helpers. - runtime_provider.py: Added generic API-key provider branch in resolve_runtime_provider() — any provider with auth_type='api_key' is automatically handled. - main.py: Added providers to hermes model menu with generic _model_flow_api_key_provider() flow. Updated _has_any_provider_configured() to check all provider env vars. Updated argparse --provider choices. - setup.py: Added providers to setup wizard with API key prompts and curated model lists. - config.py: Added env vars (GLM_API_KEY, KIMI_API_KEY, MINIMAX_API_KEY, etc.) to OPTIONAL_ENV_VARS. - status.py: Added API key display and provider status section. - doctor.py: Added connectivity checks for each provider endpoint. - cli.py: Updated provider docstrings. Docs: Updated README.md, .env.example, cli-config.yaml.example, cli-commands.md, environment-variables.md, configuration.md. Tests: 50 new tests covering registry, aliases, resolution, auto-detection, credential resolution, and runtime provider dispatch. Inspired by PR #33 (numman-ali) which proposed a provider registry approach. Credit to tars90percent (PR #473) and manuelschipper (PR #420) for related provider improvements merged earlier in this changeset.
2026-03-06 18:55:12 -08:00
# Requires: MINIMAX_CN_API_KEY in ~/.hermes/.env
# Alibaba Cloud / DashScope (Qwen models)
hermes chat --provider alibaba --model qwen3.5-plus
# Requires: DASHSCOPE_API_KEY in ~/.hermes/.env
feat: add z.ai/GLM, Kimi/Moonshot, MiniMax as first-class providers Adds 4 new direct API-key providers (zai, kimi-coding, minimax, minimax-cn) to the inference provider system. All use standard OpenAI-compatible chat/completions endpoints with Bearer token auth. Core changes: - auth.py: Extended ProviderConfig with api_key_env_vars and base_url_env_var fields. Added providers to PROVIDER_REGISTRY. Added provider aliases (glm, z-ai, zhipu, kimi, moonshot). Added auto-detection of API-key providers in resolve_provider(). Added resolve_api_key_provider_credentials() and get_api_key_provider_status() helpers. - runtime_provider.py: Added generic API-key provider branch in resolve_runtime_provider() — any provider with auth_type='api_key' is automatically handled. - main.py: Added providers to hermes model menu with generic _model_flow_api_key_provider() flow. Updated _has_any_provider_configured() to check all provider env vars. Updated argparse --provider choices. - setup.py: Added providers to setup wizard with API key prompts and curated model lists. - config.py: Added env vars (GLM_API_KEY, KIMI_API_KEY, MINIMAX_API_KEY, etc.) to OPTIONAL_ENV_VARS. - status.py: Added API key display and provider status section. - doctor.py: Added connectivity checks for each provider endpoint. - cli.py: Updated provider docstrings. Docs: Updated README.md, .env.example, cli-config.yaml.example, cli-commands.md, environment-variables.md, configuration.md. Tests: 50 new tests covering registry, aliases, resolution, auto-detection, credential resolution, and runtime provider dispatch. Inspired by PR #33 (numman-ali) which proposed a provider registry approach. Credit to tars90percent (PR #473) and manuelschipper (PR #420) for related provider improvements merged earlier in this changeset.
2026-03-06 18:55:12 -08:00
```
Or set the provider permanently in `config.yaml`:
```yaml
model:
provider: "zai" # or: kimi-coding, minimax, minimax-cn, alibaba
default: "glm-4-plus"
feat: add z.ai/GLM, Kimi/Moonshot, MiniMax as first-class providers Adds 4 new direct API-key providers (zai, kimi-coding, minimax, minimax-cn) to the inference provider system. All use standard OpenAI-compatible chat/completions endpoints with Bearer token auth. Core changes: - auth.py: Extended ProviderConfig with api_key_env_vars and base_url_env_var fields. Added providers to PROVIDER_REGISTRY. Added provider aliases (glm, z-ai, zhipu, kimi, moonshot). Added auto-detection of API-key providers in resolve_provider(). Added resolve_api_key_provider_credentials() and get_api_key_provider_status() helpers. - runtime_provider.py: Added generic API-key provider branch in resolve_runtime_provider() — any provider with auth_type='api_key' is automatically handled. - main.py: Added providers to hermes model menu with generic _model_flow_api_key_provider() flow. Updated _has_any_provider_configured() to check all provider env vars. Updated argparse --provider choices. - setup.py: Added providers to setup wizard with API key prompts and curated model lists. - config.py: Added env vars (GLM_API_KEY, KIMI_API_KEY, MINIMAX_API_KEY, etc.) to OPTIONAL_ENV_VARS. - status.py: Added API key display and provider status section. - doctor.py: Added connectivity checks for each provider endpoint. - cli.py: Updated provider docstrings. Docs: Updated README.md, .env.example, cli-config.yaml.example, cli-commands.md, environment-variables.md, configuration.md. Tests: 50 new tests covering registry, aliases, resolution, auto-detection, credential resolution, and runtime provider dispatch. Inspired by PR #33 (numman-ali) which proposed a provider registry approach. Credit to tars90percent (PR #473) and manuelschipper (PR #420) for related provider improvements merged earlier in this changeset.
2026-03-06 18:55:12 -08:00
```
Base URLs can be overridden with `GLM_BASE_URL`, `KIMI_BASE_URL`, `MINIMAX_BASE_URL`, `MINIMAX_CN_BASE_URL`, or `DASHSCOPE_BASE_URL` environment variables.
feat: add z.ai/GLM, Kimi/Moonshot, MiniMax as first-class providers Adds 4 new direct API-key providers (zai, kimi-coding, minimax, minimax-cn) to the inference provider system. All use standard OpenAI-compatible chat/completions endpoints with Bearer token auth. Core changes: - auth.py: Extended ProviderConfig with api_key_env_vars and base_url_env_var fields. Added providers to PROVIDER_REGISTRY. Added provider aliases (glm, z-ai, zhipu, kimi, moonshot). Added auto-detection of API-key providers in resolve_provider(). Added resolve_api_key_provider_credentials() and get_api_key_provider_status() helpers. - runtime_provider.py: Added generic API-key provider branch in resolve_runtime_provider() — any provider with auth_type='api_key' is automatically handled. - main.py: Added providers to hermes model menu with generic _model_flow_api_key_provider() flow. Updated _has_any_provider_configured() to check all provider env vars. Updated argparse --provider choices. - setup.py: Added providers to setup wizard with API key prompts and curated model lists. - config.py: Added env vars (GLM_API_KEY, KIMI_API_KEY, MINIMAX_API_KEY, etc.) to OPTIONAL_ENV_VARS. - status.py: Added API key display and provider status section. - doctor.py: Added connectivity checks for each provider endpoint. - cli.py: Updated provider docstrings. Docs: Updated README.md, .env.example, cli-config.yaml.example, cli-commands.md, environment-variables.md, configuration.md. Tests: 50 new tests covering registry, aliases, resolution, auto-detection, credential resolution, and runtime provider dispatch. Inspired by PR #33 (numman-ali) which proposed a provider registry approach. Credit to tars90percent (PR #473) and manuelschipper (PR #420) for related provider improvements merged earlier in this changeset.
2026-03-06 18:55:12 -08:00
### Hugging Face Inference Providers
[Hugging Face Inference Providers](https://huggingface.co/docs/inference-providers) routes to 20+ open models through a unified OpenAI-compatible endpoint (`router.huggingface.co/v1`). Requests are automatically routed to the fastest available backend (Groq, Together, SambaNova, etc.) with automatic failover.
```bash
# Use any available model
hermes chat --provider huggingface --model Qwen/Qwen3-235B-A22B-Thinking-2507
# Requires: HF_TOKEN in ~/.hermes/.env
# Short alias
hermes chat --provider hf --model deepseek-ai/DeepSeek-V3.2
```
Or set it permanently in `config.yaml`:
```yaml
model:
provider: "huggingface"
default: "Qwen/Qwen3-235B-A22B-Thinking-2507"
```
Get your token at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens) — make sure to enable the "Make calls to Inference Providers" permission. Free tier included ($0.10/month credit, no markup on provider rates).
You can append routing suffixes to model names: `:fastest` (default), `:cheapest`, or `:provider_name` to force a specific backend.
The base URL can be overridden with `HF_BASE_URL`.
## Custom & Self-Hosted LLM Providers
Hermes Agent works with **any OpenAI-compatible API endpoint**. If a server implements `/v1/chat/completions`, you can point Hermes at it. This means you can use local models, GPU inference servers, multi-provider routers, or any third-party API.
### General Setup
Three ways to configure a custom endpoint:
**Interactive setup (recommended):**
```bash
hermes model
# Select "Custom endpoint (self-hosted / VLLM / etc.)"
# Enter: API base URL, API key, Model name
```
**Manual config (`config.yaml`):**
```yaml
# In ~/.hermes/config.yaml
model:
default: your-model-name
provider: custom
base_url: http://localhost:8000/v1
api_key: your-key-or-leave-empty-for-local
```
**Environment variables (`.env` file):**
```bash
# Add to ~/.hermes/.env
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_API_KEY=your-key # Any non-empty string for local servers
LLM_MODEL=your-model-name
```
All three approaches end up in the same runtime path. `hermes model` persists provider, model, and base URL to `config.yaml` so later sessions keep using that endpoint even if env vars are not set.
### Switching Models with `/model`
Once a custom endpoint is configured, you can switch models mid-session:
```
/model custom:qwen-2.5 # Switch to a model on your custom endpoint
/model custom # Auto-detect the model from the endpoint
/model openrouter:claude-sonnet-4 # Switch back to a cloud provider
```
If you have **named custom providers** configured (see below), use the triple syntax:
```
/model custom:local:qwen-2.5 # Use the "local" custom provider with model qwen-2.5
/model custom:work:llama3 # Use the "work" custom provider with llama3
```
When switching providers, Hermes persists the base URL and provider to config so the change survives restarts. When switching away from a custom endpoint to a built-in provider, the stale base URL is automatically cleared.
:::tip
`/model custom` (bare, no model name) queries your endpoint's `/models` API and auto-selects the model if exactly one is loaded. Useful for local servers running a single model.
:::
Everything below follows this same pattern — just change the URL, key, and model name.
---
### Ollama — Local Models, Zero Config
[Ollama](https://ollama.com/) runs open-weight models locally with one command. Best for: quick local experimentation, privacy-sensitive work, offline use.
```bash
# Install and run a model
ollama pull llama3.1:70b
ollama serve # Starts on port 11434
# Configure Hermes
OPENAI_BASE_URL=http://localhost:11434/v1
OPENAI_API_KEY=ollama # Any non-empty string
LLM_MODEL=llama3.1:70b
```
Ollama's OpenAI-compatible endpoint supports chat completions, streaming, and tool calling (for supported models). No GPU required for smaller models — Ollama handles CPU inference automatically.
:::tip
List available models with `ollama list`. Pull any model from the [Ollama library](https://ollama.com/library) with `ollama pull <model>`.
:::
---
### vLLM — High-Performance GPU Inference
[vLLM](https://docs.vllm.ai/) is the standard for production LLM serving. Best for: maximum throughput on GPU hardware, serving large models, continuous batching.
```bash
# Start vLLM server
pip install vllm
vllm serve meta-llama/Llama-3.1-70B-Instruct \
--port 8000 \
--tensor-parallel-size 2 # Multi-GPU
# Configure Hermes
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_API_KEY=dummy
LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct
```
vLLM supports tool calling, structured output, and multi-modal models. Use `--enable-auto-tool-choice` and `--tool-call-parser hermes` for Hermes-format tool calling with NousResearch models.
---
### SGLang — Fast Serving with RadixAttention
[SGLang](https://github.com/sgl-project/sglang) is an alternative to vLLM with RadixAttention for KV cache reuse. Best for: multi-turn conversations (prefix caching), constrained decoding, structured output.
```bash
# Start SGLang server
pip install "sglang[all]"
python -m sglang.launch_server \
--model meta-llama/Llama-3.1-70B-Instruct \
--port 8000 \
--tp 2
# Configure Hermes
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_API_KEY=dummy
LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct
```
---
### llama.cpp / llama-server — CPU & Metal Inference
[llama.cpp](https://github.com/ggml-org/llama.cpp) runs quantized models on CPU, Apple Silicon (Metal), and consumer GPUs. Best for: running models without a datacenter GPU, Mac users, edge deployment.
```bash
# Build and start llama-server
cmake -B build && cmake --build build --config Release
./build/bin/llama-server \
-m models/llama-3.1-8b-instruct-Q4_K_M.gguf \
--port 8080 --host 0.0.0.0
# Configure Hermes
OPENAI_BASE_URL=http://localhost:8080/v1
OPENAI_API_KEY=dummy
LLM_MODEL=llama-3.1-8b-instruct
```
:::tip
Download GGUF models from [Hugging Face](https://huggingface.co/models?library=gguf). Q4_K_M quantization offers the best balance of quality vs. memory usage.
:::
---
### LiteLLM Proxy — Multi-Provider Gateway
[LiteLLM](https://docs.litellm.ai/) is an OpenAI-compatible proxy that unifies 100+ LLM providers behind a single API. Best for: switching between providers without config changes, load balancing, fallback chains, budget controls.
```bash
# Install and start
pip install "litellm[proxy]"
litellm --model anthropic/claude-sonnet-4 --port 4000
# Or with a config file for multiple models:
litellm --config litellm_config.yaml --port 4000
# Configure Hermes
OPENAI_BASE_URL=http://localhost:4000/v1
OPENAI_API_KEY=sk-your-litellm-key
LLM_MODEL=anthropic/claude-sonnet-4
```
Example `litellm_config.yaml` with fallback:
```yaml
model_list:
- model_name: "best"
litellm_params:
model: anthropic/claude-sonnet-4
api_key: sk-ant-...
- model_name: "best"
litellm_params:
model: openai/gpt-4o
api_key: sk-...
router_settings:
routing_strategy: "latency-based-routing"
```
---
### ClawRouter — Cost-Optimized Routing
[ClawRouter](https://github.com/BlockRunAI/ClawRouter) by BlockRunAI is a local routing proxy that auto-selects models based on query complexity. It classifies requests across 14 dimensions and routes to the cheapest model that can handle the task. Payment is via USDC cryptocurrency (no API keys).
```bash
# Install and start
npx @blockrun/clawrouter # Starts on port 8402
# Configure Hermes
OPENAI_BASE_URL=http://localhost:8402/v1
OPENAI_API_KEY=dummy
LLM_MODEL=blockrun/auto # or: blockrun/eco, blockrun/premium, blockrun/agentic
```
Routing profiles:
| Profile | Strategy | Savings |
|---------|----------|---------|
| `blockrun/auto` | Balanced quality/cost | 74-100% |
| `blockrun/eco` | Cheapest possible | 95-100% |
| `blockrun/premium` | Best quality models | 0% |
| `blockrun/free` | Free models only | 100% |
| `blockrun/agentic` | Optimized for tool use | varies |
:::note
ClawRouter requires a USDC-funded wallet on Base or Solana for payment. All requests route through BlockRun's backend API. Run `npx @blockrun/clawrouter doctor` to check wallet status.
:::
---
### Other Compatible Providers
Any service with an OpenAI-compatible API works. Some popular options:
| Provider | Base URL | Notes |
|----------|----------|-------|
| [Together AI](https://together.ai) | `https://api.together.xyz/v1` | Cloud-hosted open models |
| [Groq](https://groq.com) | `https://api.groq.com/openai/v1` | Ultra-fast inference |
| [DeepSeek](https://deepseek.com) | `https://api.deepseek.com/v1` | DeepSeek models |
| [Fireworks AI](https://fireworks.ai) | `https://api.fireworks.ai/inference/v1` | Fast open model hosting |
| [Cerebras](https://cerebras.ai) | `https://api.cerebras.ai/v1` | Wafer-scale chip inference |
| [Mistral AI](https://mistral.ai) | `https://api.mistral.ai/v1` | Mistral models |
| [OpenAI](https://openai.com) | `https://api.openai.com/v1` | Direct OpenAI access |
| [Azure OpenAI](https://azure.microsoft.com) | `https://YOUR.openai.azure.com/` | Enterprise OpenAI |
| [LocalAI](https://localai.io) | `http://localhost:8080/v1` | Self-hosted, multi-model |
| [Jan](https://jan.ai) | `http://localhost:1337/v1` | Desktop app with local models |
```bash
# Example: Together AI
OPENAI_BASE_URL=https://api.together.xyz/v1
OPENAI_API_KEY=your-together-key
LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct-Turbo
```
---
fix: detect context length for custom model endpoints via fuzzy matching + config override (#2051) * fix: detect context length for custom model endpoints via fuzzy matching + config override Custom model endpoints (non-OpenRouter, non-known-provider) were silently falling back to 2M tokens when the model name didn't exactly match what the endpoint's /v1/models reported. This happened because: 1. Endpoint metadata lookup used exact match only — model name mismatches (e.g. 'qwen3.5:9b' vs 'Qwen3.5-9B-Q4_K_M.gguf') caused a miss 2. Single-model servers (common for local inference) required exact name match even though only one model was loaded 3. No user escape hatch to manually set context length Changes: - Add fuzzy matching for endpoint model metadata: single-model servers use the only available model regardless of name; multi-model servers try substring matching in both directions - Add model.context_length config override (highest priority) so users can explicitly set their model's context length in config.yaml - Log an informative message when falling back to 2M probe, telling users about the config override option - Thread config_context_length through ContextCompressor and AIAgent init Tests: 6 new tests covering fuzzy match, single-model fallback, config override (including zero/None edge cases). * fix: auto-detect local model name and context length for local servers Cherry-picked from PR #2043 by sudoingX. - Auto-detect model name from local server's /v1/models when only one model is loaded (no manual model name config needed) - Add n_ctx_train and n_ctx to context length detection keys for llama.cpp - Query llama.cpp /props endpoint for actual allocated context (not just training context from GGUF metadata) - Strip .gguf suffix from display in banner and status bar - _auto_detect_local_model() in runtime_provider.py for CLI init Co-authored-by: sudo <sudoingx@users.noreply.github.com> * fix: revert accidental summary_target_tokens change + add docs for context_length config - Revert summary_target_tokens from 2500 back to 500 (accidental change during patching) - Add 'Context Length Detection' section to Custom & Self-Hosted docs explaining model.context_length config override --------- Co-authored-by: Test <test@test.com> Co-authored-by: sudo <sudoingx@users.noreply.github.com>
2026-03-19 06:01:16 -07:00
### Context Length Detection
feat: overhaul context length detection with models.dev and provider-aware resolution (#2158) Replace the fragile hardcoded context length system with a multi-source resolution chain that correctly identifies context windows per provider. Key changes: - New agent/models_dev.py: Fetches and caches the models.dev registry (3800+ models across 100+ providers with per-provider context windows). In-memory cache (1hr TTL) + disk cache for cold starts. - Rewritten get_model_context_length() resolution chain: 0. Config override (model.context_length) 1. Custom providers per-model context_length 2. Persistent disk cache 3. Endpoint /models (local servers) 4. Anthropic /v1/models API (max_input_tokens, API-key only) 5. OpenRouter live API (existing, unchanged) 6. Nous suffix-match via OpenRouter (dot/dash normalization) 7. models.dev registry lookup (provider-aware) 8. Thin hardcoded defaults (broad family patterns) 9. 128K fallback (was 2M) - Provider-aware context: same model now correctly resolves to different context windows per provider (e.g. claude-opus-4.6: 1M on Anthropic, 128K on GitHub Copilot). Provider name flows through ContextCompressor. - DEFAULT_CONTEXT_LENGTHS shrunk from 80+ entries to ~16 broad patterns. models.dev replaces the per-model hardcoding. - CONTEXT_PROBE_TIERS changed from [2M, 1M, 512K, 200K, 128K, 64K, 32K] to [128K, 64K, 32K, 16K, 8K]. Unknown models no longer start at 2M. - hermes model: prompts for context_length when configuring custom endpoints. Supports shorthand (32k, 128K). Saved to custom_providers per-model config. - custom_providers schema extended with optional models dict for per-model context_length (backward compatible). - Nous Portal: suffix-matches bare IDs (claude-opus-4-6) against OpenRouter's prefixed IDs (anthropic/claude-opus-4.6) with dot/dash normalization. Handles all 15 current Nous models. - Anthropic direct: queries /v1/models for max_input_tokens. Only works with regular API keys (sk-ant-api*), not OAuth tokens. Falls through to models.dev for OAuth users. Tests: 5574 passed (18 new tests for models_dev + updated probe tiers) Docs: Updated configuration.md context length section, AGENTS.md Co-authored-by: Test <test@test.com>
2026-03-20 06:04:33 -07:00
Hermes uses a multi-source resolution chain to detect the correct context window for your model and provider:
1. **Config override**`model.context_length` in config.yaml (highest priority)
2. **Custom provider per-model**`custom_providers[].models.<id>.context_length`
3. **Persistent cache** — previously discovered values (survives restarts)
4. **Endpoint `/models`** — queries your server's API (local/custom endpoints)
5. **Anthropic `/v1/models`** — queries Anthropic's API for `max_input_tokens` (API-key users only)
6. **OpenRouter API** — live model metadata from OpenRouter
7. **Nous Portal** — suffix-matches Nous model IDs against OpenRouter metadata
8. **[models.dev](https://models.dev)** — community-maintained registry with provider-specific context lengths for 3800+ models across 100+ providers
9. **Fallback defaults** — broad model family patterns (128K default)
For most setups this works out of the box. The system is provider-aware — the same model can have different context limits depending on who serves it (e.g., `claude-opus-4.6` is 1M on Anthropic direct but 128K on GitHub Copilot).
fix: detect context length for custom model endpoints via fuzzy matching + config override (#2051) * fix: detect context length for custom model endpoints via fuzzy matching + config override Custom model endpoints (non-OpenRouter, non-known-provider) were silently falling back to 2M tokens when the model name didn't exactly match what the endpoint's /v1/models reported. This happened because: 1. Endpoint metadata lookup used exact match only — model name mismatches (e.g. 'qwen3.5:9b' vs 'Qwen3.5-9B-Q4_K_M.gguf') caused a miss 2. Single-model servers (common for local inference) required exact name match even though only one model was loaded 3. No user escape hatch to manually set context length Changes: - Add fuzzy matching for endpoint model metadata: single-model servers use the only available model regardless of name; multi-model servers try substring matching in both directions - Add model.context_length config override (highest priority) so users can explicitly set their model's context length in config.yaml - Log an informative message when falling back to 2M probe, telling users about the config override option - Thread config_context_length through ContextCompressor and AIAgent init Tests: 6 new tests covering fuzzy match, single-model fallback, config override (including zero/None edge cases). * fix: auto-detect local model name and context length for local servers Cherry-picked from PR #2043 by sudoingX. - Auto-detect model name from local server's /v1/models when only one model is loaded (no manual model name config needed) - Add n_ctx_train and n_ctx to context length detection keys for llama.cpp - Query llama.cpp /props endpoint for actual allocated context (not just training context from GGUF metadata) - Strip .gguf suffix from display in banner and status bar - _auto_detect_local_model() in runtime_provider.py for CLI init Co-authored-by: sudo <sudoingx@users.noreply.github.com> * fix: revert accidental summary_target_tokens change + add docs for context_length config - Revert summary_target_tokens from 2500 back to 500 (accidental change during patching) - Add 'Context Length Detection' section to Custom & Self-Hosted docs explaining model.context_length config override --------- Co-authored-by: Test <test@test.com> Co-authored-by: sudo <sudoingx@users.noreply.github.com>
2026-03-19 06:01:16 -07:00
To set the context length explicitly, add `context_length` to your model config:
```yaml
model:
default: "qwen3.5:9b"
base_url: "http://localhost:8080/v1"
context_length: 131072 # tokens
```
feat: overhaul context length detection with models.dev and provider-aware resolution (#2158) Replace the fragile hardcoded context length system with a multi-source resolution chain that correctly identifies context windows per provider. Key changes: - New agent/models_dev.py: Fetches and caches the models.dev registry (3800+ models across 100+ providers with per-provider context windows). In-memory cache (1hr TTL) + disk cache for cold starts. - Rewritten get_model_context_length() resolution chain: 0. Config override (model.context_length) 1. Custom providers per-model context_length 2. Persistent disk cache 3. Endpoint /models (local servers) 4. Anthropic /v1/models API (max_input_tokens, API-key only) 5. OpenRouter live API (existing, unchanged) 6. Nous suffix-match via OpenRouter (dot/dash normalization) 7. models.dev registry lookup (provider-aware) 8. Thin hardcoded defaults (broad family patterns) 9. 128K fallback (was 2M) - Provider-aware context: same model now correctly resolves to different context windows per provider (e.g. claude-opus-4.6: 1M on Anthropic, 128K on GitHub Copilot). Provider name flows through ContextCompressor. - DEFAULT_CONTEXT_LENGTHS shrunk from 80+ entries to ~16 broad patterns. models.dev replaces the per-model hardcoding. - CONTEXT_PROBE_TIERS changed from [2M, 1M, 512K, 200K, 128K, 64K, 32K] to [128K, 64K, 32K, 16K, 8K]. Unknown models no longer start at 2M. - hermes model: prompts for context_length when configuring custom endpoints. Supports shorthand (32k, 128K). Saved to custom_providers per-model config. - custom_providers schema extended with optional models dict for per-model context_length (backward compatible). - Nous Portal: suffix-matches bare IDs (claude-opus-4-6) against OpenRouter's prefixed IDs (anthropic/claude-opus-4.6) with dot/dash normalization. Handles all 15 current Nous models. - Anthropic direct: queries /v1/models for max_input_tokens. Only works with regular API keys (sk-ant-api*), not OAuth tokens. Falls through to models.dev for OAuth users. Tests: 5574 passed (18 new tests for models_dev + updated probe tiers) Docs: Updated configuration.md context length section, AGENTS.md Co-authored-by: Test <test@test.com>
2026-03-20 06:04:33 -07:00
For custom endpoints, you can also set context length per model:
```yaml
custom_providers:
- name: "My Local LLM"
base_url: "http://localhost:11434/v1"
models:
qwen3.5:27b:
context_length: 32768
deepseek-r1:70b:
context_length: 65536
```
`hermes model` will prompt for context length when configuring a custom endpoint. Leave it blank for auto-detection.
fix: detect context length for custom model endpoints via fuzzy matching + config override (#2051) * fix: detect context length for custom model endpoints via fuzzy matching + config override Custom model endpoints (non-OpenRouter, non-known-provider) were silently falling back to 2M tokens when the model name didn't exactly match what the endpoint's /v1/models reported. This happened because: 1. Endpoint metadata lookup used exact match only — model name mismatches (e.g. 'qwen3.5:9b' vs 'Qwen3.5-9B-Q4_K_M.gguf') caused a miss 2. Single-model servers (common for local inference) required exact name match even though only one model was loaded 3. No user escape hatch to manually set context length Changes: - Add fuzzy matching for endpoint model metadata: single-model servers use the only available model regardless of name; multi-model servers try substring matching in both directions - Add model.context_length config override (highest priority) so users can explicitly set their model's context length in config.yaml - Log an informative message when falling back to 2M probe, telling users about the config override option - Thread config_context_length through ContextCompressor and AIAgent init Tests: 6 new tests covering fuzzy match, single-model fallback, config override (including zero/None edge cases). * fix: auto-detect local model name and context length for local servers Cherry-picked from PR #2043 by sudoingX. - Auto-detect model name from local server's /v1/models when only one model is loaded (no manual model name config needed) - Add n_ctx_train and n_ctx to context length detection keys for llama.cpp - Query llama.cpp /props endpoint for actual allocated context (not just training context from GGUF metadata) - Strip .gguf suffix from display in banner and status bar - _auto_detect_local_model() in runtime_provider.py for CLI init Co-authored-by: sudo <sudoingx@users.noreply.github.com> * fix: revert accidental summary_target_tokens change + add docs for context_length config - Revert summary_target_tokens from 2500 back to 500 (accidental change during patching) - Add 'Context Length Detection' section to Custom & Self-Hosted docs explaining model.context_length config override --------- Co-authored-by: Test <test@test.com> Co-authored-by: sudo <sudoingx@users.noreply.github.com>
2026-03-19 06:01:16 -07:00
:::tip When to set this manually
feat: overhaul context length detection with models.dev and provider-aware resolution (#2158) Replace the fragile hardcoded context length system with a multi-source resolution chain that correctly identifies context windows per provider. Key changes: - New agent/models_dev.py: Fetches and caches the models.dev registry (3800+ models across 100+ providers with per-provider context windows). In-memory cache (1hr TTL) + disk cache for cold starts. - Rewritten get_model_context_length() resolution chain: 0. Config override (model.context_length) 1. Custom providers per-model context_length 2. Persistent disk cache 3. Endpoint /models (local servers) 4. Anthropic /v1/models API (max_input_tokens, API-key only) 5. OpenRouter live API (existing, unchanged) 6. Nous suffix-match via OpenRouter (dot/dash normalization) 7. models.dev registry lookup (provider-aware) 8. Thin hardcoded defaults (broad family patterns) 9. 128K fallback (was 2M) - Provider-aware context: same model now correctly resolves to different context windows per provider (e.g. claude-opus-4.6: 1M on Anthropic, 128K on GitHub Copilot). Provider name flows through ContextCompressor. - DEFAULT_CONTEXT_LENGTHS shrunk from 80+ entries to ~16 broad patterns. models.dev replaces the per-model hardcoding. - CONTEXT_PROBE_TIERS changed from [2M, 1M, 512K, 200K, 128K, 64K, 32K] to [128K, 64K, 32K, 16K, 8K]. Unknown models no longer start at 2M. - hermes model: prompts for context_length when configuring custom endpoints. Supports shorthand (32k, 128K). Saved to custom_providers per-model config. - custom_providers schema extended with optional models dict for per-model context_length (backward compatible). - Nous Portal: suffix-matches bare IDs (claude-opus-4-6) against OpenRouter's prefixed IDs (anthropic/claude-opus-4.6) with dot/dash normalization. Handles all 15 current Nous models. - Anthropic direct: queries /v1/models for max_input_tokens. Only works with regular API keys (sk-ant-api*), not OAuth tokens. Falls through to models.dev for OAuth users. Tests: 5574 passed (18 new tests for models_dev + updated probe tiers) Docs: Updated configuration.md context length section, AGENTS.md Co-authored-by: Test <test@test.com>
2026-03-20 06:04:33 -07:00
- You're using Ollama with a custom `num_ctx` that's lower than the model's maximum
fix: detect context length for custom model endpoints via fuzzy matching + config override (#2051) * fix: detect context length for custom model endpoints via fuzzy matching + config override Custom model endpoints (non-OpenRouter, non-known-provider) were silently falling back to 2M tokens when the model name didn't exactly match what the endpoint's /v1/models reported. This happened because: 1. Endpoint metadata lookup used exact match only — model name mismatches (e.g. 'qwen3.5:9b' vs 'Qwen3.5-9B-Q4_K_M.gguf') caused a miss 2. Single-model servers (common for local inference) required exact name match even though only one model was loaded 3. No user escape hatch to manually set context length Changes: - Add fuzzy matching for endpoint model metadata: single-model servers use the only available model regardless of name; multi-model servers try substring matching in both directions - Add model.context_length config override (highest priority) so users can explicitly set their model's context length in config.yaml - Log an informative message when falling back to 2M probe, telling users about the config override option - Thread config_context_length through ContextCompressor and AIAgent init Tests: 6 new tests covering fuzzy match, single-model fallback, config override (including zero/None edge cases). * fix: auto-detect local model name and context length for local servers Cherry-picked from PR #2043 by sudoingX. - Auto-detect model name from local server's /v1/models when only one model is loaded (no manual model name config needed) - Add n_ctx_train and n_ctx to context length detection keys for llama.cpp - Query llama.cpp /props endpoint for actual allocated context (not just training context from GGUF metadata) - Strip .gguf suffix from display in banner and status bar - _auto_detect_local_model() in runtime_provider.py for CLI init Co-authored-by: sudo <sudoingx@users.noreply.github.com> * fix: revert accidental summary_target_tokens change + add docs for context_length config - Revert summary_target_tokens from 2500 back to 500 (accidental change during patching) - Add 'Context Length Detection' section to Custom & Self-Hosted docs explaining model.context_length config override --------- Co-authored-by: Test <test@test.com> Co-authored-by: sudo <sudoingx@users.noreply.github.com>
2026-03-19 06:01:16 -07:00
- You want to limit context below the model's maximum (e.g., 8k on a 128k model to save VRAM)
- You're running behind a proxy that doesn't expose `/v1/models`
:::
---
### Named Custom Providers
If you work with multiple custom endpoints (e.g., a local dev server and a remote GPU server), you can define them as named custom providers in `config.yaml`:
```yaml
custom_providers:
- name: local
base_url: http://localhost:8080/v1
# api_key omitted — Hermes uses "no-key-required" for keyless local servers
- name: work
base_url: https://gpu-server.internal.corp/v1
api_key: corp-api-key
api_mode: chat_completions # optional, auto-detected from URL
- name: anthropic-proxy
base_url: https://proxy.example.com/anthropic
api_key: proxy-key
api_mode: anthropic_messages # for Anthropic-compatible proxies
```
Switch between them mid-session with the triple syntax:
```
/model custom:local:qwen-2.5 # Use the "local" endpoint with qwen-2.5
/model custom:work:llama3-70b # Use the "work" endpoint with llama3-70b
/model custom:anthropic-proxy:claude-sonnet-4 # Use the proxy
```
You can also select named custom providers from the interactive `hermes model` menu.
---
### Choosing the Right Setup
| Use Case | Recommended |
|----------|-------------|
| **Just want it to work** | OpenRouter (default) or Nous Portal |
| **Local models, easy setup** | Ollama |
| **Production GPU serving** | vLLM or SGLang |
| **Mac / no GPU** | Ollama or llama.cpp |
| **Multi-provider routing** | LiteLLM Proxy or OpenRouter |
| **Cost optimization** | ClawRouter or OpenRouter with `sort: "price"` |
| **Maximum privacy** | Ollama, vLLM, or llama.cpp (fully local) |
| **Enterprise / Azure** | Azure OpenAI with custom endpoint |
feat: add z.ai/GLM, Kimi/Moonshot, MiniMax as first-class providers Adds 4 new direct API-key providers (zai, kimi-coding, minimax, minimax-cn) to the inference provider system. All use standard OpenAI-compatible chat/completions endpoints with Bearer token auth. Core changes: - auth.py: Extended ProviderConfig with api_key_env_vars and base_url_env_var fields. Added providers to PROVIDER_REGISTRY. Added provider aliases (glm, z-ai, zhipu, kimi, moonshot). Added auto-detection of API-key providers in resolve_provider(). Added resolve_api_key_provider_credentials() and get_api_key_provider_status() helpers. - runtime_provider.py: Added generic API-key provider branch in resolve_runtime_provider() — any provider with auth_type='api_key' is automatically handled. - main.py: Added providers to hermes model menu with generic _model_flow_api_key_provider() flow. Updated _has_any_provider_configured() to check all provider env vars. Updated argparse --provider choices. - setup.py: Added providers to setup wizard with API key prompts and curated model lists. - config.py: Added env vars (GLM_API_KEY, KIMI_API_KEY, MINIMAX_API_KEY, etc.) to OPTIONAL_ENV_VARS. - status.py: Added API key display and provider status section. - doctor.py: Added connectivity checks for each provider endpoint. - cli.py: Updated provider docstrings. Docs: Updated README.md, .env.example, cli-config.yaml.example, cli-commands.md, environment-variables.md, configuration.md. Tests: 50 new tests covering registry, aliases, resolution, auto-detection, credential resolution, and runtime provider dispatch. Inspired by PR #33 (numman-ali) which proposed a provider registry approach. Credit to tars90percent (PR #473) and manuelschipper (PR #420) for related provider improvements merged earlier in this changeset.
2026-03-06 18:55:12 -08:00
| **Chinese AI models** | z.ai (GLM), Kimi/Moonshot, or MiniMax (first-class providers) |
:::tip
You can switch between providers at any time with `hermes model` — no restart required. Your conversation history, memory, and skills carry over regardless of which provider you use.
:::
## Optional API Keys
| Feature | Provider | Env Variable |
|---------|----------|--------------|
| Web scraping | [Firecrawl](https://firecrawl.dev/) | `FIRECRAWL_API_KEY`, `FIRECRAWL_API_URL` |
| Browser automation | [Browserbase](https://browserbase.com/) | `BROWSERBASE_API_KEY`, `BROWSERBASE_PROJECT_ID` |
| Image generation | [FAL](https://fal.ai/) | `FAL_KEY` |
| Premium TTS voices | [ElevenLabs](https://elevenlabs.io/) | `ELEVENLABS_API_KEY` |
| OpenAI TTS + voice transcription | [OpenAI](https://platform.openai.com/api-keys) | `VOICE_TOOLS_OPENAI_KEY` |
| RL Training | [Tinker](https://tinker-console.thinkingmachines.ai/) + [WandB](https://wandb.ai/) | `TINKER_API_KEY`, `WANDB_API_KEY` |
| Cross-session user modeling | [Honcho](https://honcho.dev/) | `HONCHO_API_KEY` |
### Self-Hosting Firecrawl
By default, Hermes uses the [Firecrawl cloud API](https://firecrawl.dev/) for web search and scraping. If you prefer to run Firecrawl locally, you can point Hermes at a self-hosted instance instead. See Firecrawl's [SELF_HOST.md](https://github.com/firecrawl/firecrawl/blob/main/SELF_HOST.md) for complete setup instructions.
**What you get:** No API key required, no rate limits, no per-page costs, full data sovereignty.
**What you lose:** The cloud version uses Firecrawl's proprietary "Fire-engine" for advanced anti-bot bypassing (Cloudflare, CAPTCHAs, IP rotation). Self-hosted uses basic fetch + Playwright, so some protected sites may fail. Search uses DuckDuckGo instead of Google.
**Setup:**
1. Clone and start the Firecrawl Docker stack (5 containers: API, Playwright, Redis, RabbitMQ, PostgreSQL — requires ~4-8 GB RAM):
```bash
git clone https://github.com/firecrawl/firecrawl
cd firecrawl
# In .env, set: USE_DB_AUTHENTICATION=false, HOST=0.0.0.0, PORT=3002
docker compose up -d
```
2. Point Hermes at your instance (no API key needed):
```bash
hermes config set FIRECRAWL_API_URL http://localhost:3002
```
You can also set both `FIRECRAWL_API_KEY` and `FIRECRAWL_API_URL` if your self-hosted instance has authentication enabled.
## OpenRouter Provider Routing
When using OpenRouter, you can control how requests are routed across providers. Add a `provider_routing` section to `~/.hermes/config.yaml`:
```yaml
provider_routing:
sort: "throughput" # "price" (default), "throughput", or "latency"
# only: ["anthropic"] # Only use these providers
# ignore: ["deepinfra"] # Skip these providers
# order: ["anthropic", "google"] # Try providers in this order
# require_parameters: true # Only use providers that support all request params
# data_collection: "deny" # Exclude providers that may store/train on data
```
**Shortcuts:** Append `:nitro` to any model name for throughput sorting (e.g., `anthropic/claude-sonnet-4:nitro`), or `:floor` for price sorting.
docs: fallback providers + /background command documentation * docs: comprehensive fallback providers documentation - New dedicated page: user-guide/features/fallback-providers.md covering both primary model fallback and auxiliary task fallback systems - Updated configuration.md with fallback_model config section - Updated environment-variables.md noting fallback is config-only - Fleshed out developer-guide/provider-runtime.md fallback section with internal architecture details (trigger points, activation flow, config flow) - Added cross-reference from provider-routing.md distinguishing OpenRouter sub-provider routing from Hermes-level model fallback - Added new page to sidebar under Integrations * docs: comprehensive /background command documentation - Added Background Sessions section to cli.md covering how it works (daemon threads, isolated sessions, config inheritance, Rich panel output, bell notification, concurrent tasks) - Added Background Sessions section to messaging/index.md covering messaging-specific behavior (async execution, result delivery back to same chat, fire-and-forget pattern) - Documented background_process_notifications config (all/result/error/off) in messaging docs and configuration.md - Added HERMES_BACKGROUND_NOTIFICATIONS env var to reference page - Fixed inconsistency in slash-commands.md: /background was listed as messaging-only but works in both CLI and messaging. Moved it to the 'both surfaces' note. - Expanded one-liner table descriptions with detail and cross-references
2026-03-15 06:24:28 -07:00
## Fallback Model
Configure a backup provider:model that Hermes switches to automatically when your primary model fails (rate limits, server errors, auth failures):
```yaml
fallback_model:
provider: openrouter # required
model: anthropic/claude-sonnet-4 # required
# base_url: http://localhost:8000/v1 # optional, for custom endpoints
# api_key_env: MY_CUSTOM_KEY # optional, env var name for custom endpoint API key
```
When activated, the fallback swaps the model and provider mid-session without losing your conversation. It fires **at most once** per session.
Supported providers: `openrouter`, `nous`, `openai-codex`, `copilot`, `anthropic`, `huggingface`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `custom`.
docs: fallback providers + /background command documentation * docs: comprehensive fallback providers documentation - New dedicated page: user-guide/features/fallback-providers.md covering both primary model fallback and auxiliary task fallback systems - Updated configuration.md with fallback_model config section - Updated environment-variables.md noting fallback is config-only - Fleshed out developer-guide/provider-runtime.md fallback section with internal architecture details (trigger points, activation flow, config flow) - Added cross-reference from provider-routing.md distinguishing OpenRouter sub-provider routing from Hermes-level model fallback - Added new page to sidebar under Integrations * docs: comprehensive /background command documentation - Added Background Sessions section to cli.md covering how it works (daemon threads, isolated sessions, config inheritance, Rich panel output, bell notification, concurrent tasks) - Added Background Sessions section to messaging/index.md covering messaging-specific behavior (async execution, result delivery back to same chat, fire-and-forget pattern) - Documented background_process_notifications config (all/result/error/off) in messaging docs and configuration.md - Added HERMES_BACKGROUND_NOTIFICATIONS env var to reference page - Fixed inconsistency in slash-commands.md: /background was listed as messaging-only but works in both CLI and messaging. Moved it to the 'both surfaces' note. - Expanded one-liner table descriptions with detail and cross-references
2026-03-15 06:24:28 -07:00
:::tip
Fallback is configured exclusively through `config.yaml` — there are no environment variables for it. For full details on when it triggers, supported providers, and how it interacts with auxiliary tasks and delegation, see [Fallback Providers](/docs/user-guide/features/fallback-providers).
:::
fix: hermes update causes dual gateways on macOS (launchd) (#1567) * feat: add optional smart model routing Add a conservative cheap-vs-strong routing option that can send very short/simple turns to a cheaper model across providers while keeping the primary model for complex work. Wire it through CLI, gateway, and cron, and document the config.yaml workflow. * fix(gateway): remove recursive ExecStop from systemd units, extend TimeoutStopSec to 60s * fix(gateway): avoid recursive ExecStop in user systemd unit * fix: extend ExecStop removal and TimeoutStopSec=60 to system unit The cherry-picked PR #1448 fix only covered the user systemd unit. The system unit had the same TimeoutStopSec=15 and could benefit from the same 60s timeout for clean shutdown. Also adds a regression test for the system unit. --------- Co-authored-by: Ninja <ninja@local> * feat(skills): add blender-mcp optional skill for 3D modeling Control a running Blender instance from Hermes via socket connection to the blender-mcp addon (port 9876). Supports creating 3D objects, materials, animations, and running arbitrary bpy code. Placed in optional-skills/ since it requires Blender 4.3+ desktop with a third-party addon manually started each session. * feat(acp): support slash commands in ACP adapter (#1532) Adds /help, /model, /tools, /context, /reset, /compact, /version to the ACP adapter (VS Code, Zed, JetBrains). Commands are handled directly in the server without instantiating the TUI — each command queries agent/session state and returns plain text. Unrecognized /commands fall through to the LLM as normal messages. /model uses detect_provider_for_model() for auto-detection when switching models, matching the CLI and gateway behavior. Fixes #1402 * fix(logging): improve error logging in session search tool (#1533) * fix(gateway): restart on retryable startup failures (#1517) * feat(email): add skip_attachments option via config.yaml * feat(email): add skip_attachments option via config.yaml Adds a config.yaml-driven option to skip email attachments in the gateway email adapter. Useful for malware protection and bandwidth savings. Configure in config.yaml: platforms: email: skip_attachments: true Based on PR #1521 by @an420eth, changed from env var to config.yaml (via PlatformConfig.extra) to match the project's config-first pattern. * docs: document skip_attachments option for email adapter * fix(telegram): retry on transient TLS failures during connect and send Add exponential-backoff retry (3 attempts) around initialize() to handle transient TLS resets during gateway startup. Also catches TimedOut and OSError in addition to NetworkError. Add exponential-backoff retry (3 attempts) around send_message() for NetworkError during message delivery, wrapping the existing Markdown fallback logic. Both imports are guarded with try/except ImportError for test environments where telegram is mocked. Based on PR #1527 by cmd8. Closes #1526. * feat: permissive block_anchor thresholds and unicode normalization (#1539) Salvaged from PR #1528 by an420eth. Closes #517. Improves _strategy_block_anchor in fuzzy_match.py: - Add unicode normalization (smart quotes, em/en-dashes, ellipsis, non-breaking spaces → ASCII) so LLM-produced unicode artifacts don't break anchor line matching - Lower thresholds: 0.10 for unique matches (was 0.70), 0.30 for multiple candidates — if first/last lines match exactly, the block is almost certainly correct - Use original (non-normalized) content for offset calculation to preserve correct character positions Tested: 3 new scenarios fixed (em-dash anchors, non-breaking space anchors, very-low-similarity unique matches), zero regressions on all 9 existing fuzzy match tests. Co-authored-by: an420eth <an420eth@users.noreply.github.com> * feat(cli): add file path autocomplete in the input prompt (#1545) When typing a path-like token (./ ../ ~/ / or containing /), the CLI now shows filesystem completions in the dropdown menu. Directories show a trailing slash and 'dir' label; files show their size. Completions are case-insensitive and capped at 30 entries. Triggered by tokens like: edit ./src/ma → shows ./src/main.py, ./src/manifest.json, ... check ~/doc → shows ~/docs/, ~/documents/, ... read /etc/hos → shows /etc/hosts, /etc/hostname, ... open tools/reg → shows tools/registry.py Slash command autocomplete (/help, /model, etc.) is unaffected — it still triggers when the input starts with /. Inspired by OpenCode PR #145 (file path completion menu). Implementation: - hermes_cli/commands.py: _extract_path_word() detects path-like tokens, _path_completions() yields filesystem Completions with size labels, get_completions() routes to paths vs slash commands - tests/hermes_cli/test_path_completion.py: 26 tests covering path extraction, prefix filtering, directory markers, home expansion, case-insensitivity, integration with slash commands * feat(privacy): redact PII from LLM context when privacy.redact_pii is enabled Add privacy.redact_pii config option (boolean, default false). When enabled, the gateway redacts personally identifiable information from the system prompt before sending it to the LLM provider: - Phone numbers (user IDs on WhatsApp/Signal) → hashed to user_<sha256> - User IDs → hashed to user_<sha256> - Chat IDs → numeric portion hashed, platform prefix preserved - Home channel IDs → hashed - Names/usernames → NOT affected (user-chosen, publicly visible) Hashes are deterministic (same user → same hash) so the model can still distinguish users in group chats. Routing and delivery use the original values internally — redaction only affects LLM context. Inspired by OpenClaw PR #47959. * fix(privacy): skip PII redaction on Discord/Slack (mentions need real IDs) Discord uses <@user_id> for mentions and Slack uses <@U12345> — the LLM needs the real ID to tag users. Redaction now only applies to WhatsApp, Signal, and Telegram where IDs are pure routing metadata. Add 4 platform-specific tests covering Discord, WhatsApp, Signal, Slack. * feat: smart approvals + /stop command (inspired by OpenAI Codex) * feat: smart approvals — LLM-based risk assessment for dangerous commands Adds a 'smart' approval mode that uses the auxiliary LLM to assess whether a flagged command is genuinely dangerous or a false positive, auto-approving low-risk commands without prompting the user. Inspired by OpenAI Codex's Smart Approvals guardian subagent (openai/codex#13860). Config (config.yaml): approvals: mode: manual # manual (default), smart, off Modes: - manual — current behavior, always prompt the user - smart — aux LLM evaluates risk: APPROVE (auto-allow), DENY (block), or ESCALATE (fall through to manual prompt) - off — skip all approval prompts (equivalent to --yolo) When smart mode auto-approves, the pattern gets session-level approval so subsequent uses of the same pattern don't trigger another LLM call. When it denies, the command is blocked without user prompt. When uncertain, it escalates to the normal manual approval flow. The LLM prompt is carefully scoped: it sees only the command text and the flagged reason, assesses actual risk vs false positive, and returns a single-word verdict. * feat: make smart approval model configurable via config.yaml Adds auxiliary.approval section to config.yaml with the same provider/model/base_url/api_key pattern as other aux tasks (vision, web_extract, compression, etc.). Config: auxiliary: approval: provider: auto model: '' # fast/cheap model recommended base_url: '' api_key: '' Bridged to env vars in both CLI and gateway paths so the aux client picks them up automatically. * feat: add /stop command to kill all background processes Adds a /stop slash command that kills all running background processes at once. Currently users have to process(list) then process(kill) for each one individually. Inspired by OpenAI Codex's separation of interrupt (Ctrl+C stops current turn) from /stop (cleans up background processes). See openai/codex#14602. Ctrl+C continues to only interrupt the active agent turn — background dev servers, watchers, etc. are preserved. /stop is the explicit way to clean them all up. * feat: first-class plugin architecture + hide status bar cost by default (#1544) The persistent status bar now shows context %, token counts, and duration but NOT $ cost by default. Cost display is opt-in via: display: show_cost: true in config.yaml, or: hermes config set display.show_cost true The /usage command still shows full cost breakdown since the user explicitly asked for it — this only affects the always-visible bar. Status bar without cost: ⚕ claude-sonnet-4 │ 12K/200K │ 6% │ 15m Status bar with show_cost: true: ⚕ claude-sonnet-4 │ 12K/200K │ 6% │ $0.06 │ 15m * feat: improve memory prioritization + aggressive skill updates (inspired by OpenAI Codex) * feat: improve memory prioritization — user preferences over procedural knowledge Inspired by OpenAI Codex's memory prompt improvements (openai/codex#14493) which focus memory writes on user preferences and recurring patterns rather than procedural task details. Key insight: 'Optimize for reducing future user steering — the most valuable memory prevents the user from having to repeat themselves.' Changes: - MEMORY_GUIDANCE (prompt_builder.py): added prioritization hierarchy and the core principle about reducing user steering - MEMORY_SCHEMA (memory_tool.py): reordered WHEN TO SAVE list to put corrections first, added explicit PRIORITY guidance - Memory nudge (run_agent.py): now asks specifically about preferences, corrections, and workflow patterns instead of generic 'anything' - Memory flush (run_agent.py): now instructs to prioritize user preferences and corrections over task-specific details * feat: more aggressive skill creation and update prompting Press harder on skill updates — the agent should proactively patch skills when it encounters issues during use, not wait to be asked. Changes: - SKILLS_GUIDANCE: 'consider saving' → 'save'; added explicit instruction to patch skills immediately when found outdated/wrong - Skills header: added instruction to update loaded skills before finishing if they had missing steps or wrong commands - Skill nudge: more assertive ('save the approach' not 'consider saving'), now also prompts for updating existing skills used in the task - Skill nudge interval: lowered default from 15 to 10 iterations - skill_manage schema: added 'patch it immediately' to update triggers * feat: first-class plugin architecture (#1555) Plugin system for extending Hermes with custom tools, hooks, and integrations — no source code changes required. Core system (hermes_cli/plugins.py): - Plugin discovery from ~/.hermes/plugins/, .hermes/plugins/, and pip entry_points (hermes_agent.plugins group) - PluginContext with register_tool() and register_hook() - 6 lifecycle hooks: pre/post tool_call, pre/post llm_call, on_session_start/end - Namespace package handling for relative imports in plugins - Graceful error isolation — broken plugins never crash the agent Integration (model_tools.py): - Plugin discovery runs after built-in + MCP tools - Plugin tools bypass toolset filter via get_plugin_tool_names() - Pre/post tool call hooks fire in handle_function_call() CLI: - /plugins command shows loaded plugins, tool counts, status - Added to COMMANDS dict for autocomplete Docs: - Getting started guide (build-a-hermes-plugin.md) — full tutorial building a calculator plugin step by step - Reference page (features/plugins.md) — quick overview + tables - Covers: file structure, schemas, handlers, hooks, data files, bundled skills, env var gating, pip distribution, common mistakes Tests: 16 tests covering discovery, loading, hooks, tool visibility. * fix: hermes update causes dual gateways on macOS (launchd) Three bugs worked together to create the dual-gateway problem: 1. cmd_update only checked systemd for gateway restart, completely ignoring launchd on macOS. After killing the PID it would print 'Restart it with: hermes gateway run' even when launchd was about to auto-respawn the process. 2. launchd's KeepAlive.SuccessfulExit=false respawns the gateway after SIGTERM (non-zero exit), so the user's manual restart created a second instance. 3. The launchd plist lacked --replace (systemd had it), so the respawned gateway didn't kill stale instances on startup. Fixes: - Add --replace to launchd ProgramArguments (matches systemd) - Add launchd detection to cmd_update's auto-restart logic - Print 'auto-restart via launchd' instead of manual restart hint * fix: add launchd plist auto-refresh + explicit restart in cmd_update Two integration issues with the initial fix: 1. Existing macOS users with old plist (no --replace) would never get the fix until manual uninstall/reinstall. Added refresh_launchd_plist_if_needed() — mirrors the existing refresh_systemd_unit_if_needed(). Called from launchd_start(), launchd_restart(), and cmd_update. 2. cmd_update relied on KeepAlive respawn after SIGTERM rather than explicit launchctl stop/start. This caused races: launchd would respawn the old process before the PID file was cleaned up. Now does explicit stop+start (matching how systemd gets an explicit systemctl restart), with plist refresh first so the new --replace flag is picked up. --------- Co-authored-by: Ninja <ninja@local> Co-authored-by: alireza78a <alireza78a@users.noreply.github.com> Co-authored-by: Oktay Aydin <113846926+aydnOktay@users.noreply.github.com> Co-authored-by: JP Lew <polydegen@protonmail.com> Co-authored-by: an420eth <an420eth@users.noreply.github.com>
2026-03-16 12:36:29 -07:00
## Smart Model Routing
Optional cheap-vs-strong routing lets Hermes keep your main model for complex work while sending very short/simple turns to a cheaper model.
```yaml
smart_model_routing:
enabled: true
max_simple_chars: 160
max_simple_words: 28
cheap_model:
provider: openrouter
model: google/gemini-2.5-flash
# base_url: http://localhost:8000/v1 # optional custom endpoint
# api_key_env: MY_CUSTOM_KEY # optional env var name for that endpoint's API key
```
How it works:
- If a turn is short, single-line, and does not look code/tool/debug heavy, Hermes may route it to `cheap_model`
- If the turn looks complex, Hermes stays on your primary model/provider
- If the cheap route cannot be resolved cleanly, Hermes falls back to the primary model automatically
This is intentionally conservative. It is meant for quick, low-stakes turns like:
- short factual questions
- quick rewrites
- lightweight summaries
It will avoid routing prompts that look like:
- coding/debugging work
- tool-heavy requests
- long or multi-line analysis asks
Use this when you want lower latency or cost without fully changing your default model.
## Terminal Backend Configuration
Configure which environment the agent uses for terminal commands:
```yaml
terminal:
backend: local # or: docker, ssh, singularity, modal, daytona
cwd: "." # Working directory ("." = current dir)
timeout: 180 # Command timeout in seconds
# Docker-specific settings
docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"
docker_mount_cwd_to_workspace: false # SECURITY: off by default. Opt in to mount the launch cwd into /workspace.
docker_forward_env: # Optional explicit allowlist for env passthrough
- "GITHUB_TOKEN"
docker_volumes: # Additional explicit host mounts
- "/home/user/projects:/workspace/projects"
- "/home/user/data:/data:ro" # :ro for read-only
# Container resource limits (docker, singularity, modal, daytona)
container_cpu: 1 # CPU cores
container_memory: 5120 # MB (default 5GB)
container_disk: 51200 # MB (default 50GB)
container_persistent: true # Persist filesystem across sessions
# Persistent shell — keep a long-lived bash process across commands
persistent_shell: true # Enabled by default for SSH backend
```
### Common Terminal Backend Issues
If terminal commands fail immediately or the terminal tool is reported as disabled, check the following:
- **Local backend**
- No special requirements. This is the safest default when you are just getting started.
- **Docker backend**
- Ensure Docker Desktop (or the Docker daemon) is installed and running.
- Hermes needs to be able to find the `docker` CLI. It checks your `$PATH` first and also probes common Docker Desktop install locations on macOS. Run:
```bash
docker version
```
If this fails, fix your Docker installation or switch back to the local backend:
```bash
hermes config set terminal.backend local
```
- **SSH backend**
- Both `TERMINAL_SSH_HOST` and `TERMINAL_SSH_USER` must be set, for example:
```bash
export TERMINAL_ENV=ssh
export TERMINAL_SSH_HOST=my-server.example.com
export TERMINAL_SSH_USER=ubuntu
```
- If either value is missing, Hermes will log a clear error and refuse to use the SSH backend.
- **Modal backend**
- You need either a `MODAL_TOKEN_ID` environment variable or a `~/.modal.toml` config file.
- If neither is present, the backend check fails and Hermes will report that the Modal backend is not available.
When in doubt, set `terminal.backend` back to `local` and verify that commands run there first.
### Docker Volume Mounts
When using the Docker backend, `docker_volumes` lets you share host directories with the container. Each entry uses standard Docker `-v` syntax: `host_path:container_path[:options]`.
```yaml
terminal:
backend: docker
docker_volumes:
- "/home/user/projects:/workspace/projects" # Read-write (default)
- "/home/user/datasets:/data:ro" # Read-only
- "/home/user/outputs:/outputs" # Agent writes, you read
```
This is useful for:
- **Providing files** to the agent (datasets, configs, reference code)
- **Receiving files** from the agent (generated code, reports, exports)
- **Shared workspaces** where both you and the agent access the same files
Can also be set via environment variable: `TERMINAL_DOCKER_VOLUMES='["/host:/container"]'` (JSON array).
### Docker Credential Forwarding
By default, Docker terminal sessions do not inherit arbitrary host credentials. If you need a specific token inside the container, add it to `terminal.docker_forward_env`.
```yaml
terminal:
backend: docker
docker_forward_env:
- "GITHUB_TOKEN"
- "NPM_TOKEN"
```
Hermes resolves each listed variable from your current shell first, then falls back to `~/.hermes/.env` if it was saved with `hermes config set`.
:::warning
Anything listed in `docker_forward_env` becomes visible to commands run inside the container. Only forward credentials you are comfortable exposing to the terminal session.
:::
### Optional: Mount the Launch Directory into `/workspace`
Docker sandboxes stay isolated by default. Hermes does **not** pass your current host working directory into the container unless you explicitly opt in.
Enable it in `config.yaml`:
```yaml
terminal:
backend: docker
docker_mount_cwd_to_workspace: true
```
When enabled:
- if you launch Hermes from `~/projects/my-app`, that host directory is bind-mounted to `/workspace`
- the Docker backend starts in `/workspace`
- file tools and terminal commands both see the same mounted project
When disabled, `/workspace` stays sandbox-owned unless you explicitly mount something via `docker_volumes`.
Security tradeoff:
- `false` preserves the sandbox boundary
- `true` gives the sandbox direct access to the directory you launched Hermes from
Use the opt-in only when you intentionally want the container to work on live host files.
### Persistent Shell
By default, each terminal command runs in its own subprocess — working directory, environment variables, and shell variables reset between commands. When **persistent shell** is enabled, a single long-lived bash process is kept alive across `execute()` calls so that state survives between commands.
This is most useful for the **SSH backend**, where it also eliminates per-command connection overhead. Persistent shell is **enabled by default for SSH** and disabled for the local backend.
```yaml
terminal:
persistent_shell: true # default — enables persistent shell for SSH
```
To disable:
```bash
hermes config set terminal.persistent_shell false
```
**What persists across commands:**
- Working directory (`cd /tmp` sticks for the next command)
- Exported environment variables (`export FOO=bar`)
- Shell variables (`MY_VAR=hello`)
**Precedence:**
| Level | Variable | Default |
|-------|----------|---------|
| Config | `terminal.persistent_shell` | `true` |
| SSH override | `TERMINAL_SSH_PERSISTENT` | follows config |
| Local override | `TERMINAL_LOCAL_PERSISTENT` | `false` |
Per-backend environment variables take highest precedence. If you want persistent shell on the local backend too:
```bash
export TERMINAL_LOCAL_PERSISTENT=true
```
:::note
Commands that require `stdin_data` or sudo automatically fall back to one-shot mode, since the persistent shell's stdin is already occupied by the IPC protocol.
:::
See [Code Execution](features/code-execution.md) and the [Terminal section of the README](features/tools.md) for details on each backend.
## Memory Configuration
```yaml
memory:
memory_enabled: true
user_profile_enabled: true
memory_char_limit: 2200 # ~800 tokens
user_char_limit: 1375 # ~500 tokens
```
## Git Worktree Isolation
Enable isolated git worktrees for running multiple agents in parallel on the same repo:
```yaml
worktree: true # Always create a worktree (same as hermes -w)
# worktree: false # Default — only when -w flag is passed
```
When enabled, each CLI session creates a fresh worktree under `.worktrees/` with its own branch. Agents can edit files, commit, push, and create PRs without interfering with each other. Clean worktrees are removed on exit; dirty ones are kept for manual recovery.
You can also list gitignored files to copy into worktrees via `.worktreeinclude` in your repo root:
```
# .worktreeinclude
.env
.venv/
node_modules/
```
## Context Compression
Hermes automatically compresses long conversations to stay within your model's context window. The compression summarizer is a separate LLM call — you can point it at any provider or endpoint.
All compression settings live in `config.yaml` (no environment variables).
### Full reference
```yaml
compression:
enabled: true # Toggle compression on/off
threshold: 0.50 # Compress at this % of context limit
summary_model: "google/gemini-3-flash-preview" # Model for summarization
summary_provider: "auto" # Provider: "auto", "openrouter", "nous", "codex", "main", etc.
summary_base_url: null # Custom OpenAI-compatible endpoint (overrides provider)
```
### Common setups
**Default (auto-detect) — no configuration needed:**
```yaml
compression:
enabled: true
threshold: 0.50
```
Uses the first available provider (OpenRouter → Nous → Codex) with Gemini Flash.
**Force a specific provider** (OAuth or API-key based):
```yaml
compression:
summary_provider: nous
summary_model: gemini-3-flash
```
Works with any provider: `nous`, `openrouter`, `codex`, `anthropic`, `main`, etc.
**Custom endpoint** (self-hosted, Ollama, zai, DeepSeek, etc.):
```yaml
compression:
summary_model: glm-4.7
summary_base_url: https://api.z.ai/api/coding/paas/v4
```
Points at a custom OpenAI-compatible endpoint. Uses `OPENAI_API_KEY` for auth.
### How the three knobs interact
| `summary_provider` | `summary_base_url` | Result |
|---------------------|---------------------|--------|
| `auto` (default) | not set | Auto-detect best available provider |
| `nous` / `openrouter` / etc. | not set | Force that provider, use its auth |
| any | set | Use the custom endpoint directly (provider ignored) |
The `summary_model` must support a context length at least as large as your main model's, since it receives the full middle section of the conversation for compression.
## Iteration Budget Pressure
When the agent is working on a complex task with many tool calls, it can burn through its iteration budget (default: 90 turns) without realizing it's running low. Budget pressure automatically warns the model as it approaches the limit:
| Threshold | Level | What the model sees |
|-----------|-------|---------------------|
| **70%** | Caution | `[BUDGET: 63/90. 27 iterations left. Start consolidating.]` |
| **90%** | Warning | `[BUDGET WARNING: 81/90. Only 9 left. Respond NOW.]` |
Warnings are injected into the last tool result's JSON (as a `_budget_warning` field) rather than as separate messages — this preserves prompt caching and doesn't disrupt the conversation structure.
```yaml
agent:
max_turns: 90 # Max iterations per conversation turn (default: 90)
```
Budget pressure is enabled by default. The agent sees warnings naturally as part of tool results, encouraging it to consolidate its work and deliver a response before running out of iterations.
## Context Pressure Warnings
Separate from iteration budget pressure, context pressure tracks how close the conversation is to the **compaction threshold** — the point where context compression fires to summarize older messages. This helps both you and the agent understand when the conversation is getting long.
| Progress | Level | What happens |
|----------|-------|-------------|
| **≥ 60%** to threshold | Info | CLI shows a cyan progress bar; gateway sends an informational notice |
| **≥ 85%** to threshold | Warning | CLI shows a bold yellow bar; gateway warns compaction is imminent |
In the CLI, context pressure appears as a progress bar in the tool output feed:
```
◐ context ████████████░░░░░░░░ 62% to compaction 48k threshold (50%) · approaching compaction
```
On messaging platforms, a plain-text notification is sent:
```
◐ Context: ████████████░░░░░░░░ 62% to compaction (threshold: 50% of window).
```
If auto-compression is disabled, the warning tells you context may be truncated instead.
Context pressure is automatic — no configuration needed. It fires purely as a user-facing notification and does not modify the message stream or inject anything into the model's context.
## Auxiliary Models
Hermes uses lightweight "auxiliary" models for side tasks like image analysis, web page summarization, and browser screenshot analysis. By default, these use **Gemini Flash** via auto-detection — you don't need to configure anything.
### The universal config pattern
Every model slot in Hermes — auxiliary tasks, compression, fallback — uses the same three knobs:
| Key | What it does | Default |
|-----|-------------|---------|
| `provider` | Which provider to use for auth and routing | `"auto"` |
| `model` | Which model to request | provider's default |
| `base_url` | Custom OpenAI-compatible endpoint (overrides provider) | not set |
When `base_url` is set, Hermes ignores the provider and calls that endpoint directly (using `api_key` or `OPENAI_API_KEY` for auth). When only `provider` is set, Hermes uses that provider's built-in auth and base URL.
Available providers: `auto`, `openrouter`, `nous`, `codex`, `copilot`, `anthropic`, `main`, `zai`, `kimi-coding`, `minimax`, and any provider registered in the [provider registry](/docs/reference/environment-variables).
### Full auxiliary config reference
```yaml
auxiliary:
# Image analysis (vision_analyze tool + browser screenshots)
vision:
provider: "auto" # "auto", "openrouter", "nous", "codex", "main", etc.
model: "" # e.g. "openai/gpt-4o", "google/gemini-2.5-flash"
base_url: "" # Custom OpenAI-compatible endpoint (overrides provider)
api_key: "" # API key for base_url (falls back to OPENAI_API_KEY)
fix: background task media delivery + vision download timeout (#3919) * feat(telegram): add webhook mode as alternative to polling When TELEGRAM_WEBHOOK_URL is set, the adapter starts an HTTP webhook server (via python-telegram-bot's start_webhook()) instead of long polling. This enables cloud platforms like Fly.io and Railway to auto-wake suspended machines on inbound HTTP traffic. Polling remains the default — no behavior change unless the env var is set. Env vars: TELEGRAM_WEBHOOK_URL Public HTTPS URL for Telegram to push to TELEGRAM_WEBHOOK_PORT Local listen port (default 8443) TELEGRAM_WEBHOOK_SECRET Secret token for update verification Cherry-picked and adapted from PR #2022 by SHL0MS. Preserved all current main enhancements (network error recovery, polling conflict detection, DM topics setup). Co-authored-by: SHL0MS <SHL0MS@users.noreply.github.com> * fix: send_document call in background task delivery + vision download timeout Two fixes salvaged from PR #2269 by amethystani: 1. gateway/run.py: adapter.send_file() → adapter.send_document() send_file() doesn't exist on BasePlatformAdapter. Background task media files were silently never delivered (AttributeError swallowed by except Exception: pass). 2. tools/vision_tools.py: configurable image download timeout via HERMES_VISION_DOWNLOAD_TIMEOUT env var (default 30s), plus guard against raise None when max_retries=0. The third fix in #2269 (opencode-go auth config) was already resolved on main. Co-authored-by: amethystani <amethystani@users.noreply.github.com> --------- Co-authored-by: SHL0MS <SHL0MS@users.noreply.github.com> Co-authored-by: amethystani <amethystani@users.noreply.github.com>
2026-03-30 02:59:39 -07:00
timeout: 30 # seconds — LLM API call; increase for slow local vision models
download_timeout: 30 # seconds — image HTTP download; increase for slow connections
# Web page summarization + browser page text extraction
web_extract:
provider: "auto"
model: "" # e.g. "google/gemini-2.5-flash"
base_url: ""
api_key: ""
timeout: 30 # seconds
# Dangerous command approval classifier
approval:
provider: "auto"
model: ""
base_url: ""
api_key: ""
timeout: 30 # seconds
# Context compression timeout (separate from compression.* config)
compression:
timeout: 120 # seconds — compression summarizes long conversations, needs more time
```
:::tip
fix: background task media delivery + vision download timeout (#3919) * feat(telegram): add webhook mode as alternative to polling When TELEGRAM_WEBHOOK_URL is set, the adapter starts an HTTP webhook server (via python-telegram-bot's start_webhook()) instead of long polling. This enables cloud platforms like Fly.io and Railway to auto-wake suspended machines on inbound HTTP traffic. Polling remains the default — no behavior change unless the env var is set. Env vars: TELEGRAM_WEBHOOK_URL Public HTTPS URL for Telegram to push to TELEGRAM_WEBHOOK_PORT Local listen port (default 8443) TELEGRAM_WEBHOOK_SECRET Secret token for update verification Cherry-picked and adapted from PR #2022 by SHL0MS. Preserved all current main enhancements (network error recovery, polling conflict detection, DM topics setup). Co-authored-by: SHL0MS <SHL0MS@users.noreply.github.com> * fix: send_document call in background task delivery + vision download timeout Two fixes salvaged from PR #2269 by amethystani: 1. gateway/run.py: adapter.send_file() → adapter.send_document() send_file() doesn't exist on BasePlatformAdapter. Background task media files were silently never delivered (AttributeError swallowed by except Exception: pass). 2. tools/vision_tools.py: configurable image download timeout via HERMES_VISION_DOWNLOAD_TIMEOUT env var (default 30s), plus guard against raise None when max_retries=0. The third fix in #2269 (opencode-go auth config) was already resolved on main. Co-authored-by: amethystani <amethystani@users.noreply.github.com> --------- Co-authored-by: SHL0MS <SHL0MS@users.noreply.github.com> Co-authored-by: amethystani <amethystani@users.noreply.github.com>
2026-03-30 02:59:39 -07:00
Each auxiliary task has a configurable `timeout` (in seconds). Defaults: vision 30s, web_extract 30s, approval 30s, compression 120s. Increase these if you use slow local models for auxiliary tasks. Vision also has a separate `download_timeout` (default 30s) for the HTTP image download — increase this for slow connections or self-hosted image servers.
:::
:::info
Context compression has its own top-level `compression:` block with `summary_provider`, `summary_model`, and `summary_base_url` — see [Context Compression](#context-compression) above. The fallback model uses a `fallback_model:` block — see [Fallback Model](#fallback-model) above. All three follow the same provider/model/base_url pattern.
:::
### Changing the Vision Model
To use GPT-4o instead of Gemini Flash for image analysis:
```yaml
auxiliary:
vision:
model: "openai/gpt-4o"
```
Or via environment variable (in `~/.hermes/.env`):
```bash
AUXILIARY_VISION_MODEL=openai/gpt-4o
```
### Provider Options
| Provider | Description | Requirements |
|----------|-------------|-------------|
| `"auto"` | Best available (default). Vision tries OpenRouter → Nous → Codex. | — |
| `"openrouter"` | Force OpenRouter — routes to any model (Gemini, GPT-4o, Claude, etc.) | `OPENROUTER_API_KEY` |
| `"nous"` | Force Nous Portal | `hermes login` |
| `"codex"` | Force Codex OAuth (ChatGPT account). Supports vision (gpt-5.3-codex). | `hermes model` → Codex |
| `"main"` | Use your active custom/main endpoint. This can come from `OPENAI_BASE_URL` + `OPENAI_API_KEY` or from a custom endpoint saved via `hermes model` / `config.yaml`. Works with OpenAI, local models, or any OpenAI-compatible API. | Custom endpoint credentials + base URL |
### Common Setups
**Using a direct custom endpoint** (clearer than `provider: "main"` for local/self-hosted APIs):
```yaml
auxiliary:
vision:
base_url: "http://localhost:1234/v1"
api_key: "local-key"
model: "qwen2.5-vl"
```
`base_url` takes precedence over `provider`, so this is the most explicit way to route an auxiliary task to a specific endpoint. For direct endpoint overrides, Hermes uses the configured `api_key` or falls back to `OPENAI_API_KEY`; it does not reuse `OPENROUTER_API_KEY` for that custom endpoint.
**Using OpenAI API key for vision:**
```yaml
# In ~/.hermes/.env:
# OPENAI_BASE_URL=https://api.openai.com/v1
# OPENAI_API_KEY=sk-...
auxiliary:
vision:
provider: "main"
model: "gpt-4o" # or "gpt-4o-mini" for cheaper
```
**Using OpenRouter for vision** (route to any model):
```yaml
auxiliary:
vision:
provider: "openrouter"
model: "openai/gpt-4o" # or "google/gemini-2.5-flash", etc.
```
**Using Codex OAuth** (ChatGPT Pro/Plus account — no API key needed):
```yaml
auxiliary:
vision:
provider: "codex" # uses your ChatGPT OAuth token
# model defaults to gpt-5.3-codex (supports vision)
```
**Using a local/self-hosted model:**
```yaml
auxiliary:
vision:
provider: "main" # uses your active custom endpoint
model: "my-local-model"
```
`provider: "main"` follows the same custom endpoint Hermes uses for normal chat. That endpoint can be set directly with `OPENAI_BASE_URL`, or saved once through `hermes model` and persisted in `config.yaml`.
:::tip
If you use Codex OAuth as your main model provider, vision works automatically — no extra configuration needed. Codex is included in the auto-detection chain for vision.
:::
:::warning
**Vision requires a multimodal model.** If you set `provider: "main"`, make sure your endpoint supports multimodal/vision — otherwise image analysis will fail.
:::
### Environment Variables (legacy)
Auxiliary models can also be configured via environment variables. However, `config.yaml` is the preferred method — it's easier to manage and supports all options including `base_url` and `api_key`.
| Setting | Environment Variable |
|---------|---------------------|
| Vision provider | `AUXILIARY_VISION_PROVIDER` |
| Vision model | `AUXILIARY_VISION_MODEL` |
| Vision endpoint | `AUXILIARY_VISION_BASE_URL` |
| Vision API key | `AUXILIARY_VISION_API_KEY` |
| Web extract provider | `AUXILIARY_WEB_EXTRACT_PROVIDER` |
| Web extract model | `AUXILIARY_WEB_EXTRACT_MODEL` |
| Web extract endpoint | `AUXILIARY_WEB_EXTRACT_BASE_URL` |
| Web extract API key | `AUXILIARY_WEB_EXTRACT_API_KEY` |
Compression and fallback model settings are config.yaml-only.
:::tip
Run `hermes config` to see your current auxiliary model settings. Overrides only show up when they differ from the defaults.
:::
## Reasoning Effort
Control how much "thinking" the model does before responding:
```yaml
agent:
reasoning_effort: "" # empty = medium (default). Options: xhigh (max), high, medium, low, minimal, none
```
When unset (default), reasoning effort defaults to "medium" — a balanced level that works well for most tasks. Setting a value overrides it — higher reasoning effort gives better results on complex tasks at the cost of more tokens and latency.
You can also change the reasoning effort at runtime with the `/reasoning` command:
```
/reasoning # Show current effort level and display state
/reasoning high # Set reasoning effort to high
/reasoning none # Disable reasoning
/reasoning show # Show model thinking above each response
/reasoning hide # Hide model thinking
```
## Tool-Use Enforcement
Some models (especially GPT-family) occasionally describe intended actions as text instead of making tool calls. Tool-use enforcement injects guidance that steers the model back to actually calling tools.
```yaml
agent:
tool_use_enforcement: "auto" # "auto" | true | false | ["model-substring", ...]
```
| Value | Behavior |
|-------|----------|
| `"auto"` (default) | Enabled for GPT models (`gpt-`, `openai/gpt-`) and disabled for all others. |
| `true` | Always enabled for all models. |
| `false` | Always disabled. |
| `["gpt-", "o1-", "custom-model"]` | Enabled only for models whose name contains one of the listed substrings. |
When enabled, the system prompt includes guidance reminding the model to make actual tool calls rather than describing what it would do. This is transparent to the user and has no effect on models that already use tools reliably.
## TTS Configuration
```yaml
tts:
provider: "edge" # "edge" | "elevenlabs" | "openai" | "neutts"
edge:
voice: "en-US-AriaNeural" # 322 voices, 74 languages
elevenlabs:
voice_id: "pNInz6obpgDQGcFmaJgB"
model_id: "eleven_multilingual_v2"
openai:
model: "gpt-4o-mini-tts"
voice: "alloy" # alloy, echo, fable, onyx, nova, shimmer
base_url: "https://api.openai.com/v1" # Override for OpenAI-compatible TTS endpoints
neutts:
ref_audio: ''
ref_text: ''
model: neuphonic/neutts-air-q4-gguf
device: cpu
```
2026-03-14 19:29:01 -07:00
This controls both the `text_to_speech` tool and spoken replies in voice mode (`/voice tts` in the CLI or messaging gateway).
## Display Settings
```yaml
display:
tool_progress: all # off | new | all | verbose
feat: config-gated /verbose command for messaging gateway (#3262) * feat: config-gated /verbose command for messaging gateway Add gateway_config_gate field to CommandDef, allowing cli_only commands to be conditionally available in the gateway based on a config value. - CommandDef gains gateway_config_gate: str | None — a config dotpath that, when truthy, overrides cli_only for gateway surfaces - /verbose uses gateway_config_gate='display.tool_progress_command' - Default is off (cli_only behavior preserved) - When enabled, /verbose cycles tool_progress mode (off/new/all/verbose) in the gateway, saving to config.yaml — same cycle as the CLI - Gateway helpers (help, telegram menus, slack mapping) dynamically check config to include/exclude config-gated commands - GATEWAY_KNOWN_COMMANDS always includes config-gated commands so the gateway recognizes them and can respond appropriately - Handles YAML 1.1 bool coercion (bare 'off' parses as False) - 8 new tests for the config gate mechanism + gateway handler * docs: document gateway_config_gate and /verbose messaging support - AGENTS.md: add gateway_config_gate to CommandDef fields - slash-commands.md: note /verbose can be enabled for messaging, update Notes - configuration.md: add tool_progress_command to display section + usage note - cli.md: cross-link to config docs for messaging enablement - messaging/index.md: show tool_progress_command in config snippet - plugins.md: add gateway_config_gate to register_command parameter table
2026-03-26 14:41:04 -07:00
tool_progress_command: false # Enable /verbose slash command in messaging gateway
skin: default # Built-in or custom CLI skin (see user-guide/features/skins)
theme_mode: auto # auto | light | dark — color scheme for skin-aware rendering
personality: "kawaii" # Legacy cosmetic field still surfaced in some summaries
compact: false # Compact output mode (less whitespace)
resume_display: full # full (show previous messages on resume) | minimal (one-liner only)
bell_on_complete: false # Play terminal bell when agent finishes (great for long tasks)
show_reasoning: false # Show model reasoning/thinking above each response (toggle with /reasoning show|hide)
streaming: false # Stream tokens to terminal as they arrive (real-time output)
docs: fallback providers + /background command documentation * docs: comprehensive fallback providers documentation - New dedicated page: user-guide/features/fallback-providers.md covering both primary model fallback and auxiliary task fallback systems - Updated configuration.md with fallback_model config section - Updated environment-variables.md noting fallback is config-only - Fleshed out developer-guide/provider-runtime.md fallback section with internal architecture details (trigger points, activation flow, config flow) - Added cross-reference from provider-routing.md distinguishing OpenRouter sub-provider routing from Hermes-level model fallback - Added new page to sidebar under Integrations * docs: comprehensive /background command documentation - Added Background Sessions section to cli.md covering how it works (daemon threads, isolated sessions, config inheritance, Rich panel output, bell notification, concurrent tasks) - Added Background Sessions section to messaging/index.md covering messaging-specific behavior (async execution, result delivery back to same chat, fire-and-forget pattern) - Documented background_process_notifications config (all/result/error/off) in messaging docs and configuration.md - Added HERMES_BACKGROUND_NOTIFICATIONS env var to reference page - Fixed inconsistency in slash-commands.md: /background was listed as messaging-only but works in both CLI and messaging. Moved it to the 'both surfaces' note. - Expanded one-liner table descriptions with detail and cross-references
2026-03-15 06:24:28 -07:00
background_process_notifications: all # all | result | error | off (gateway only)
show_cost: false # Show estimated $ cost in the CLI status bar
```
### Theme mode
The `theme_mode` setting controls whether skins render in light or dark mode:
| Mode | Behavior |
|------|----------|
| `auto` (default) | Detects your terminal's background color automatically. Falls back to `dark` if detection fails. |
| `light` | Forces light-mode skin colors. Skins that define a `colors_light` override use those colors instead of the default dark-mode palette. |
| `dark` | Forces dark-mode skin colors. |
This works with any skin — built-in or custom. Skin authors can provide `colors_light` in their skin definition for optimal light-terminal appearance.
| Mode | What you see |
|------|-------------|
| `off` | Silent — just the final response |
| `new` | Tool indicator only when the tool changes |
| `all` | Every tool call with a short preview (default) |
| `verbose` | Full args, results, and debug logs |
feat: config-gated /verbose command for messaging gateway (#3262) * feat: config-gated /verbose command for messaging gateway Add gateway_config_gate field to CommandDef, allowing cli_only commands to be conditionally available in the gateway based on a config value. - CommandDef gains gateway_config_gate: str | None — a config dotpath that, when truthy, overrides cli_only for gateway surfaces - /verbose uses gateway_config_gate='display.tool_progress_command' - Default is off (cli_only behavior preserved) - When enabled, /verbose cycles tool_progress mode (off/new/all/verbose) in the gateway, saving to config.yaml — same cycle as the CLI - Gateway helpers (help, telegram menus, slack mapping) dynamically check config to include/exclude config-gated commands - GATEWAY_KNOWN_COMMANDS always includes config-gated commands so the gateway recognizes them and can respond appropriately - Handles YAML 1.1 bool coercion (bare 'off' parses as False) - 8 new tests for the config gate mechanism + gateway handler * docs: document gateway_config_gate and /verbose messaging support - AGENTS.md: add gateway_config_gate to CommandDef fields - slash-commands.md: note /verbose can be enabled for messaging, update Notes - configuration.md: add tool_progress_command to display section + usage note - cli.md: cross-link to config docs for messaging enablement - messaging/index.md: show tool_progress_command in config snippet - plugins.md: add gateway_config_gate to register_command parameter table
2026-03-26 14:41:04 -07:00
In the CLI, cycle through these modes with `/verbose`. To use `/verbose` in messaging platforms (Telegram, Discord, Slack, etc.), set `tool_progress_command: true` in the `display` section above. The command will then cycle the mode and save to config.
## Privacy
```yaml
privacy:
redact_pii: false # Strip PII from LLM context (gateway only)
```
When `redact_pii` is `true`, the gateway redacts personally identifiable information from the system prompt before sending it to the LLM on supported platforms:
| Field | Treatment |
|-------|-----------|
| Phone numbers (user ID on WhatsApp/Signal) | Hashed to `user_<12-char-sha256>` |
| User IDs | Hashed to `user_<12-char-sha256>` |
| Chat IDs | Numeric portion hashed, platform prefix preserved (`telegram:<hash>`) |
| Home channel IDs | Numeric portion hashed |
| User names / usernames | **Not affected** (user-chosen, publicly visible) |
**Platform support:** Redaction applies to WhatsApp, Signal, and Telegram. Discord and Slack are excluded because their mention systems (`<@user_id>`) require the real ID in the LLM context.
Hashes are deterministic — the same user always maps to the same hash, so the model can still distinguish between users in group chats. Routing and delivery use the original values internally.
## Speech-to-Text (STT)
```yaml
stt:
2026-03-14 19:29:01 -07:00
provider: "local" # "local" | "groq" | "openai"
local:
model: "base" # tiny, base, small, medium, large-v3
openai:
model: "whisper-1" # whisper-1 | gpt-4o-mini-transcribe | gpt-4o-transcribe
# model: "whisper-1" # Legacy fallback key still respected
```
Provider behavior:
- `local` uses `faster-whisper` running on your machine. Install it separately with `pip install faster-whisper`.
- `groq` uses Groq's Whisper-compatible endpoint and reads `GROQ_API_KEY`.
- `openai` uses the OpenAI speech API and reads `VOICE_TOOLS_OPENAI_KEY`.
If the requested provider is unavailable, Hermes falls back automatically in this order: `local``groq``openai`.
Groq and OpenAI model overrides are environment-driven:
```bash
STT_GROQ_MODEL=whisper-large-v3-turbo
STT_OPENAI_MODEL=whisper-1
GROQ_BASE_URL=https://api.groq.com/openai/v1
STT_OPENAI_BASE_URL=https://api.openai.com/v1
```
## Voice Mode (CLI)
```yaml
voice:
record_key: "ctrl+b" # Push-to-talk key inside the CLI
max_recording_seconds: 120 # Hard stop for long recordings
auto_tts: false # Enable spoken replies automatically when /voice on
silence_threshold: 200 # RMS threshold for speech detection
silence_duration: 3.0 # Seconds of silence before auto-stop
```
2026-03-14 19:29:01 -07:00
Use `/voice on` in the CLI to enable microphone mode, `record_key` to start/stop recording, and `/voice tts` to toggle spoken replies. See [Voice Mode](/docs/user-guide/features/voice-mode) for end-to-end setup and platform-specific behavior.
## Streaming
Stream tokens to the terminal or messaging platforms as they arrive, instead of waiting for the full response.
### CLI Streaming
```yaml
display:
streaming: true # Stream tokens to terminal in real-time
show_reasoning: true # Also stream reasoning/thinking tokens (optional)
```
When enabled, responses appear token-by-token inside a streaming box. Tool calls are still captured silently. If the provider doesn't support streaming, it falls back to the normal display automatically.
### Gateway Streaming (Telegram, Discord, Slack)
```yaml
streaming:
enabled: true # Enable progressive message editing
edit_interval: 0.3 # Seconds between message edits
buffer_threshold: 40 # Characters before forcing an edit flush
cursor: " ▉" # Cursor shown during streaming
```
When enabled, the bot sends a message on the first token, then progressively edits it as more tokens arrive. Platforms that don't support message editing (Signal, Email) gracefully skip streaming and deliver the final response normally.
:::note
Streaming is disabled by default. Enable it in `~/.hermes/config.yaml` to try the streaming UX.
:::
## Group Chat Session Isolation
Control whether shared chats keep one conversation per room or one conversation per participant:
```yaml
group_sessions_per_user: true # true = per-user isolation in groups/channels, false = one shared session per chat
```
- `true` is the default and recommended setting. In Discord channels, Telegram groups, Slack channels, and similar shared contexts, each sender gets their own session when the platform provides a user ID.
- `false` reverts to the old shared-room behavior. That can be useful if you explicitly want Hermes to treat a channel like one collaborative conversation, but it also means users share context, token costs, and interrupt state.
- Direct messages are unaffected. Hermes still keys DMs by chat/DM ID as usual.
- Threads stay isolated from their parent channel either way; with `true`, each participant also gets their own session inside the thread.
For the behavior details and examples, see [Sessions](/docs/user-guide/sessions) and the [Discord guide](/docs/user-guide/messaging/discord).
## Unauthorized DM Behavior
Control what Hermes does when an unknown user sends a direct message:
```yaml
unauthorized_dm_behavior: pair
whatsapp:
unauthorized_dm_behavior: ignore
```
- `pair` is the default. Hermes denies access, but replies with a one-time pairing code in DMs.
- `ignore` silently drops unauthorized DMs.
- Platform sections override the global default, so you can keep pairing enabled broadly while making one platform quieter.
## Quick Commands
Define custom commands that run shell commands without invoking the LLM — zero token usage, instant execution. Especially useful from messaging platforms (Telegram, Discord, etc.) for quick server checks or utility scripts.
```yaml
quick_commands:
status:
type: exec
command: systemctl status hermes-agent
disk:
type: exec
command: df -h /
update:
type: exec
command: cd ~/.hermes/hermes-agent && git pull && pip install -e .
gpu:
type: exec
command: nvidia-smi --query-gpu=name,utilization.gpu,memory.used,memory.total --format=csv,noheader
```
Usage: type `/status`, `/disk`, `/update`, or `/gpu` in the CLI or any messaging platform. The command runs locally on the host and returns the output directly — no LLM call, no tokens consumed.
- **30-second timeout** — long-running commands are killed with an error message
- **Priority** — quick commands are checked before skill commands, so you can override skill names
- **Autocomplete** — quick commands are resolved at dispatch time and are not shown in the built-in slash-command autocomplete tables
- **Type** — only `exec` is supported (runs a shell command); other types show an error
- **Works everywhere** — CLI, Telegram, Discord, Slack, WhatsApp, Signal, Email, Home Assistant
docs: document 9 previously undocumented features New documentation for features that existed in code but had no docs: New page: - context-references.md: Full docs for @-syntax inline context injection (@file:, @folder:, @diff, @staged, @git:, @url:) with line ranges, CLI autocomplete, size limits, sensitive path blocking, and error handling configuration.md additions: - Environment variable substitution: ${VAR_NAME} syntax in config.yaml with expansion, fallback, and multi-reference support - Gateway streaming: Progressive token delivery on messaging platforms via message editing (StreamingConfig: enabled, transport, edit_interval, buffer_threshold, cursor) with platform support matrix - Web search backends: Three providers (Firecrawl, Parallel, Tavily) with web.backend config key, capability matrix, auto-detection from API keys, self-hosted Firecrawl, and Parallel search modes security.md additions: - SSRF protection: Always-on URL validation blocking private networks, loopback, link-local, CGNAT, cloud metadata hostnames, with fail-closed DNS and redirect chain re-validation - Tirith pre-exec security scanning: Content-level command scanning for homograph URLs, pipe-to-interpreter, terminal injection with auto-install, SHA-256/cosign verification, config options, and fail-open/fail-closed modes sessions.md addition: - Auto-generated session titles: Background LLM-powered title generation after first exchange creating-skills.md additions: - Conditional skill activation: requires_toolsets, requires_tools, fallback_for_toolsets, fallback_for_tools frontmatter fields with matching logic and use cases - Environment variable requirements: required_environment_variables frontmatter for automatic env passthrough to sandboxed execution, plus terminal.env_passthrough user config
2026-03-24 08:56:21 -07:00
## Gateway Streaming
Enable progressive token delivery on messaging platforms. When streaming is enabled, responses appear character-by-character in Telegram, Discord, and Slack via message editing, rather than waiting for the full response.
```yaml
streaming:
enabled: false # Enable streaming token delivery (default: off)
transport: edit # "edit" (progressive message editing) or "off"
edit_interval: 0.3 # Min seconds between message edits
buffer_threshold: 40 # Characters accumulated before forcing an edit
cursor: " ▉" # Cursor character shown during streaming
```
**Platform support:** Telegram, Discord, and Slack support edit-based streaming. Platforms that don't support message editing (Signal, Email, Home Assistant) are auto-detected on the first attempt — streaming is gracefully disabled for that session with no flood of messages.
**Overflow handling:** If the streamed text exceeds the platform's message length limit (~4096 chars), the current message is finalized and a new one starts automatically.
## Human Delay
Simulate human-like response pacing in messaging platforms:
```yaml
human_delay:
mode: "off" # off | natural | custom
docs: fix stale and incorrect documentation across 18 files Cross-referenced all 84 docs pages against the actual codebase and corrected every discrepancy found. Reference docs: - faq.md: Fix non-existent commands (/stats→/usage, /context→/usage, hermes models→hermes model, hermes config get→hermes config show, hermes gateway logs→cat gateway.log, async→sync chat() call) - cli-commands.md: Fix --provider choices list (remove providers not in argparse), add undocumented -s/--skills flag - slash-commands.md: Add missing /queue and /resume commands, fix /approve args_hint to show [session|always] - tools-reference.md: Remove duplicate vision and web toolset sections - environment-variables.md: Fix HERMES_INFERENCE_PROVIDER list (add copilot-acp, remove alibaba to match actual argparse choices) Configuration & user guide: - configuration.md: Fix approval_mode→approvals.mode (manual not ask), checkpoints.enabled default true not false, human_delay defaults (500/2000→800/2500), remove non-existent delegation.max_iterations and delegation.default_toolsets, fix website_blocklist nesting under security:, add .hermes.md and CLAUDE.md to context files table with priority system explanation - security.md: Fix website_blocklist nesting under security: - context-files.md: Add .hermes.md/HERMES.md and CLAUDE.md support, document priority-based first-match-wins loading behavior - cli.md: Fix personalities config nesting (top-level, not under agent:) - delegation.md: Fix model override docs (config-level, not per-call tool parameter) - rl-training.md: Fix log directory (tinker-atropos/logs/→ ~/.hermes/logs/rl_training/) - tts.md: Fix Discord delivery format (voice bubble with fallback, not just file attachment) - git-worktrees.md: Remove outdated v0.2.0 version reference Developer guide: - prompt-assembly.md: Add .hermes.md, CLAUDE.md, document priority system for context files - agent-loop.md: Fix callback list (remove non-existent message_callback, add stream_delta_callback, tool_gen_callback, status_callback) Messaging & guides: - webhooks.md: Fix command (hermes setup gateway→hermes gateway setup) - tips.md: Fix session idle timeout (120min→24h), config file (gateway.json→config.yaml) - build-a-hermes-plugin.md: Fix plugin.yaml provides: format (provides_tools/provides_hooks as lists), note register_command() as not yet implemented
2026-03-24 07:53:07 -07:00
min_ms: 800 # Minimum delay (custom mode)
max_ms: 2500 # Maximum delay (custom mode)
```
## Code Execution
Configure the sandboxed Python code execution tool:
```yaml
code_execution:
timeout: 300 # Max execution time in seconds
max_tool_calls: 50 # Max tool calls within code execution
```
docs: document 9 previously undocumented features New documentation for features that existed in code but had no docs: New page: - context-references.md: Full docs for @-syntax inline context injection (@file:, @folder:, @diff, @staged, @git:, @url:) with line ranges, CLI autocomplete, size limits, sensitive path blocking, and error handling configuration.md additions: - Environment variable substitution: ${VAR_NAME} syntax in config.yaml with expansion, fallback, and multi-reference support - Gateway streaming: Progressive token delivery on messaging platforms via message editing (StreamingConfig: enabled, transport, edit_interval, buffer_threshold, cursor) with platform support matrix - Web search backends: Three providers (Firecrawl, Parallel, Tavily) with web.backend config key, capability matrix, auto-detection from API keys, self-hosted Firecrawl, and Parallel search modes security.md additions: - SSRF protection: Always-on URL validation blocking private networks, loopback, link-local, CGNAT, cloud metadata hostnames, with fail-closed DNS and redirect chain re-validation - Tirith pre-exec security scanning: Content-level command scanning for homograph URLs, pipe-to-interpreter, terminal injection with auto-install, SHA-256/cosign verification, config options, and fail-open/fail-closed modes sessions.md addition: - Auto-generated session titles: Background LLM-powered title generation after first exchange creating-skills.md additions: - Conditional skill activation: requires_toolsets, requires_tools, fallback_for_toolsets, fallback_for_tools frontmatter fields with matching logic and use cases - Environment variable requirements: required_environment_variables frontmatter for automatic env passthrough to sandboxed execution, plus terminal.env_passthrough user config
2026-03-24 08:56:21 -07:00
## Web Search Backends
The `web_search`, `web_extract`, and `web_crawl` tools support three backend providers. Configure the backend in `config.yaml` or via `hermes tools`:
```yaml
web:
backend: firecrawl # firecrawl | parallel | tavily
```
| Backend | Env Var | Search | Extract | Crawl |
|---------|---------|--------|---------|-------|
| **Firecrawl** (default) | `FIRECRAWL_API_KEY` | ✔ | ✔ | ✔ |
| **Parallel** | `PARALLEL_API_KEY` | ✔ | ✔ | — |
| **Tavily** | `TAVILY_API_KEY` | ✔ | ✔ | ✔ |
**Backend selection:** If `web.backend` is not set, the backend is auto-detected from available API keys. If only `TAVILY_API_KEY` is set, Tavily is used. If only `PARALLEL_API_KEY` is set, Parallel is used. Otherwise Firecrawl is the default.
**Self-hosted Firecrawl:** Set `FIRECRAWL_API_URL` to point at your own instance. When a custom URL is set, the API key becomes optional (set `USE_DB_AUTHENTICATION=false` on the server to disable auth).
**Parallel search modes:** Set `PARALLEL_SEARCH_MODE` to control search behavior — `fast`, `one-shot`, or `agentic` (default: `agentic`).
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill New browser capabilities and a built-in skill for agent-driven web QA. ## New tool: browser_console Returns console messages (log/warn/error/info) AND uncaught JavaScript exceptions in a single call. Uses agent-browser's 'console' and 'errors' commands through the existing session plumbing. Supports --clear to reset buffers. Verified working in both local and Browserbase cloud modes. ## Enhanced tool: browser_vision(annotate=True) New boolean parameter on browser_vision. When true, agent-browser overlays numbered [N] labels on interactive elements — each [N] maps to ref @eN. Annotation data (element name, role, bounding box) returned alongside the vision analysis. Useful for QA reports and spatial reasoning. ## Config: browser.record_sessions Auto-record browser sessions as WebM video files when enabled: - Starts recording on first browser_navigate - Stops and saves on browser_close - Saves to ~/.hermes/browser_recordings/ - Works in both local and cloud modes (verified) - Disabled by default ## Built-in skill: dogfood Systematic exploratory QA testing for web applications. Teaches the agent a 5-phase workflow: 1. Plan — accept URL, create output dirs, set scope 2. Explore — systematic crawl with annotated screenshots 3. Collect Evidence — screenshots, console errors, JS exceptions 4. Categorize — severity (Critical/High/Medium/Low) and category (Functional/Visual/Accessibility/Console/UX/Content) 5. Report — structured markdown with per-issue evidence Includes: - skills/dogfood/SKILL.md — full workflow instructions - skills/dogfood/references/issue-taxonomy.md — severity/category defs - skills/dogfood/templates/dogfood-report-template.md — report template ## Tests 21 new tests covering: - browser_console message/error parsing, clear flag, empty/failed states - browser_console schema registration - browser_vision annotate schema and flag passing - record_sessions config defaults and recording lifecycle - Dogfood skill file existence and content validation Addresses #315.
2026-03-08 21:02:14 -07:00
## Browser
Configure browser automation behavior:
```yaml
browser:
inactivity_timeout: 120 # Seconds before auto-closing idle sessions
record_sessions: false # Auto-record browser sessions as WebM videos to ~/.hermes/browser_recordings/
```
The browser toolset supports multiple providers. See the [Browser feature page](/docs/user-guide/features/browser) for details on Browserbase, Browser Use, and local Chrome CDP setup.
## Website Blocklist
Block specific domains from being accessed by the agent's web and browser tools:
```yaml
docs: fix stale and incorrect documentation across 18 files Cross-referenced all 84 docs pages against the actual codebase and corrected every discrepancy found. Reference docs: - faq.md: Fix non-existent commands (/stats→/usage, /context→/usage, hermes models→hermes model, hermes config get→hermes config show, hermes gateway logs→cat gateway.log, async→sync chat() call) - cli-commands.md: Fix --provider choices list (remove providers not in argparse), add undocumented -s/--skills flag - slash-commands.md: Add missing /queue and /resume commands, fix /approve args_hint to show [session|always] - tools-reference.md: Remove duplicate vision and web toolset sections - environment-variables.md: Fix HERMES_INFERENCE_PROVIDER list (add copilot-acp, remove alibaba to match actual argparse choices) Configuration & user guide: - configuration.md: Fix approval_mode→approvals.mode (manual not ask), checkpoints.enabled default true not false, human_delay defaults (500/2000→800/2500), remove non-existent delegation.max_iterations and delegation.default_toolsets, fix website_blocklist nesting under security:, add .hermes.md and CLAUDE.md to context files table with priority system explanation - security.md: Fix website_blocklist nesting under security: - context-files.md: Add .hermes.md/HERMES.md and CLAUDE.md support, document priority-based first-match-wins loading behavior - cli.md: Fix personalities config nesting (top-level, not under agent:) - delegation.md: Fix model override docs (config-level, not per-call tool parameter) - rl-training.md: Fix log directory (tinker-atropos/logs/→ ~/.hermes/logs/rl_training/) - tts.md: Fix Discord delivery format (voice bubble with fallback, not just file attachment) - git-worktrees.md: Remove outdated v0.2.0 version reference Developer guide: - prompt-assembly.md: Add .hermes.md, CLAUDE.md, document priority system for context files - agent-loop.md: Fix callback list (remove non-existent message_callback, add stream_delta_callback, tool_gen_callback, status_callback) Messaging & guides: - webhooks.md: Fix command (hermes setup gateway→hermes gateway setup) - tips.md: Fix session idle timeout (120min→24h), config file (gateway.json→config.yaml) - build-a-hermes-plugin.md: Fix plugin.yaml provides: format (provides_tools/provides_hooks as lists), note register_command() as not yet implemented
2026-03-24 07:53:07 -07:00
security:
website_blocklist:
enabled: false # Enable URL blocking (default: false)
domains: # List of blocked domain patterns
- "*.internal.company.com"
- "admin.example.com"
- "*.local"
shared_files: # Load additional rules from external files
- "/etc/hermes/blocked-sites.txt"
```
When enabled, any URL matching a blocked domain pattern is rejected before the web or browser tool executes. This applies to `web_search`, `web_extract`, `browser_navigate`, and any tool that accesses URLs.
Domain rules support:
- Exact domains: `admin.example.com`
- Wildcard subdomains: `*.internal.company.com` (blocks all subdomains)
- TLD wildcards: `*.local`
Shared files contain one domain rule per line (blank lines and `#` comments are ignored). Missing or unreadable files log a warning but don't disable other web tools.
The policy is cached for 30 seconds, so config changes take effect quickly without restart.
## Smart Approvals
Control how Hermes handles potentially dangerous commands:
```yaml
docs: fix stale and incorrect documentation across 18 files Cross-referenced all 84 docs pages against the actual codebase and corrected every discrepancy found. Reference docs: - faq.md: Fix non-existent commands (/stats→/usage, /context→/usage, hermes models→hermes model, hermes config get→hermes config show, hermes gateway logs→cat gateway.log, async→sync chat() call) - cli-commands.md: Fix --provider choices list (remove providers not in argparse), add undocumented -s/--skills flag - slash-commands.md: Add missing /queue and /resume commands, fix /approve args_hint to show [session|always] - tools-reference.md: Remove duplicate vision and web toolset sections - environment-variables.md: Fix HERMES_INFERENCE_PROVIDER list (add copilot-acp, remove alibaba to match actual argparse choices) Configuration & user guide: - configuration.md: Fix approval_mode→approvals.mode (manual not ask), checkpoints.enabled default true not false, human_delay defaults (500/2000→800/2500), remove non-existent delegation.max_iterations and delegation.default_toolsets, fix website_blocklist nesting under security:, add .hermes.md and CLAUDE.md to context files table with priority system explanation - security.md: Fix website_blocklist nesting under security: - context-files.md: Add .hermes.md/HERMES.md and CLAUDE.md support, document priority-based first-match-wins loading behavior - cli.md: Fix personalities config nesting (top-level, not under agent:) - delegation.md: Fix model override docs (config-level, not per-call tool parameter) - rl-training.md: Fix log directory (tinker-atropos/logs/→ ~/.hermes/logs/rl_training/) - tts.md: Fix Discord delivery format (voice bubble with fallback, not just file attachment) - git-worktrees.md: Remove outdated v0.2.0 version reference Developer guide: - prompt-assembly.md: Add .hermes.md, CLAUDE.md, document priority system for context files - agent-loop.md: Fix callback list (remove non-existent message_callback, add stream_delta_callback, tool_gen_callback, status_callback) Messaging & guides: - webhooks.md: Fix command (hermes setup gateway→hermes gateway setup) - tips.md: Fix session idle timeout (120min→24h), config file (gateway.json→config.yaml) - build-a-hermes-plugin.md: Fix plugin.yaml provides: format (provides_tools/provides_hooks as lists), note register_command() as not yet implemented
2026-03-24 07:53:07 -07:00
approvals:
mode: manual # manual | smart | off
```
| Mode | Behavior |
|------|----------|
docs: fix stale and incorrect documentation across 18 files Cross-referenced all 84 docs pages against the actual codebase and corrected every discrepancy found. Reference docs: - faq.md: Fix non-existent commands (/stats→/usage, /context→/usage, hermes models→hermes model, hermes config get→hermes config show, hermes gateway logs→cat gateway.log, async→sync chat() call) - cli-commands.md: Fix --provider choices list (remove providers not in argparse), add undocumented -s/--skills flag - slash-commands.md: Add missing /queue and /resume commands, fix /approve args_hint to show [session|always] - tools-reference.md: Remove duplicate vision and web toolset sections - environment-variables.md: Fix HERMES_INFERENCE_PROVIDER list (add copilot-acp, remove alibaba to match actual argparse choices) Configuration & user guide: - configuration.md: Fix approval_mode→approvals.mode (manual not ask), checkpoints.enabled default true not false, human_delay defaults (500/2000→800/2500), remove non-existent delegation.max_iterations and delegation.default_toolsets, fix website_blocklist nesting under security:, add .hermes.md and CLAUDE.md to context files table with priority system explanation - security.md: Fix website_blocklist nesting under security: - context-files.md: Add .hermes.md/HERMES.md and CLAUDE.md support, document priority-based first-match-wins loading behavior - cli.md: Fix personalities config nesting (top-level, not under agent:) - delegation.md: Fix model override docs (config-level, not per-call tool parameter) - rl-training.md: Fix log directory (tinker-atropos/logs/→ ~/.hermes/logs/rl_training/) - tts.md: Fix Discord delivery format (voice bubble with fallback, not just file attachment) - git-worktrees.md: Remove outdated v0.2.0 version reference Developer guide: - prompt-assembly.md: Add .hermes.md, CLAUDE.md, document priority system for context files - agent-loop.md: Fix callback list (remove non-existent message_callback, add stream_delta_callback, tool_gen_callback, status_callback) Messaging & guides: - webhooks.md: Fix command (hermes setup gateway→hermes gateway setup) - tips.md: Fix session idle timeout (120min→24h), config file (gateway.json→config.yaml) - build-a-hermes-plugin.md: Fix plugin.yaml provides: format (provides_tools/provides_hooks as lists), note register_command() as not yet implemented
2026-03-24 07:53:07 -07:00
| `manual` (default) | Prompt the user before executing any flagged command. In the CLI, shows an interactive approval dialog. In messaging, queues a pending approval request. |
| `smart` | Use an auxiliary LLM to assess whether a flagged command is actually dangerous. Low-risk commands are auto-approved with session-level persistence. Genuinely risky commands are escalated to the user. |
| `off` | Skip all approval checks. Equivalent to `HERMES_YOLO_MODE=true`. **Use with caution.** |
Smart mode is particularly useful for reducing approval fatigue — it lets the agent work more autonomously on safe operations while still catching genuinely destructive commands.
:::warning
docs: fix stale and incorrect documentation across 18 files Cross-referenced all 84 docs pages against the actual codebase and corrected every discrepancy found. Reference docs: - faq.md: Fix non-existent commands (/stats→/usage, /context→/usage, hermes models→hermes model, hermes config get→hermes config show, hermes gateway logs→cat gateway.log, async→sync chat() call) - cli-commands.md: Fix --provider choices list (remove providers not in argparse), add undocumented -s/--skills flag - slash-commands.md: Add missing /queue and /resume commands, fix /approve args_hint to show [session|always] - tools-reference.md: Remove duplicate vision and web toolset sections - environment-variables.md: Fix HERMES_INFERENCE_PROVIDER list (add copilot-acp, remove alibaba to match actual argparse choices) Configuration & user guide: - configuration.md: Fix approval_mode→approvals.mode (manual not ask), checkpoints.enabled default true not false, human_delay defaults (500/2000→800/2500), remove non-existent delegation.max_iterations and delegation.default_toolsets, fix website_blocklist nesting under security:, add .hermes.md and CLAUDE.md to context files table with priority system explanation - security.md: Fix website_blocklist nesting under security: - context-files.md: Add .hermes.md/HERMES.md and CLAUDE.md support, document priority-based first-match-wins loading behavior - cli.md: Fix personalities config nesting (top-level, not under agent:) - delegation.md: Fix model override docs (config-level, not per-call tool parameter) - rl-training.md: Fix log directory (tinker-atropos/logs/→ ~/.hermes/logs/rl_training/) - tts.md: Fix Discord delivery format (voice bubble with fallback, not just file attachment) - git-worktrees.md: Remove outdated v0.2.0 version reference Developer guide: - prompt-assembly.md: Add .hermes.md, CLAUDE.md, document priority system for context files - agent-loop.md: Fix callback list (remove non-existent message_callback, add stream_delta_callback, tool_gen_callback, status_callback) Messaging & guides: - webhooks.md: Fix command (hermes setup gateway→hermes gateway setup) - tips.md: Fix session idle timeout (120min→24h), config file (gateway.json→config.yaml) - build-a-hermes-plugin.md: Fix plugin.yaml provides: format (provides_tools/provides_hooks as lists), note register_command() as not yet implemented
2026-03-24 07:53:07 -07:00
Setting `approvals.mode: off` disables all safety checks for terminal commands. Only use this in trusted, sandboxed environments.
:::
## Checkpoints
Automatic filesystem snapshots before destructive file operations. See the [Checkpoints feature page](/docs/user-guide/features/checkpoints) for details.
```yaml
checkpoints:
docs: fix stale and incorrect documentation across 18 files Cross-referenced all 84 docs pages against the actual codebase and corrected every discrepancy found. Reference docs: - faq.md: Fix non-existent commands (/stats→/usage, /context→/usage, hermes models→hermes model, hermes config get→hermes config show, hermes gateway logs→cat gateway.log, async→sync chat() call) - cli-commands.md: Fix --provider choices list (remove providers not in argparse), add undocumented -s/--skills flag - slash-commands.md: Add missing /queue and /resume commands, fix /approve args_hint to show [session|always] - tools-reference.md: Remove duplicate vision and web toolset sections - environment-variables.md: Fix HERMES_INFERENCE_PROVIDER list (add copilot-acp, remove alibaba to match actual argparse choices) Configuration & user guide: - configuration.md: Fix approval_mode→approvals.mode (manual not ask), checkpoints.enabled default true not false, human_delay defaults (500/2000→800/2500), remove non-existent delegation.max_iterations and delegation.default_toolsets, fix website_blocklist nesting under security:, add .hermes.md and CLAUDE.md to context files table with priority system explanation - security.md: Fix website_blocklist nesting under security: - context-files.md: Add .hermes.md/HERMES.md and CLAUDE.md support, document priority-based first-match-wins loading behavior - cli.md: Fix personalities config nesting (top-level, not under agent:) - delegation.md: Fix model override docs (config-level, not per-call tool parameter) - rl-training.md: Fix log directory (tinker-atropos/logs/→ ~/.hermes/logs/rl_training/) - tts.md: Fix Discord delivery format (voice bubble with fallback, not just file attachment) - git-worktrees.md: Remove outdated v0.2.0 version reference Developer guide: - prompt-assembly.md: Add .hermes.md, CLAUDE.md, document priority system for context files - agent-loop.md: Fix callback list (remove non-existent message_callback, add stream_delta_callback, tool_gen_callback, status_callback) Messaging & guides: - webhooks.md: Fix command (hermes setup gateway→hermes gateway setup) - tips.md: Fix session idle timeout (120min→24h), config file (gateway.json→config.yaml) - build-a-hermes-plugin.md: Fix plugin.yaml provides: format (provides_tools/provides_hooks as lists), note register_command() as not yet implemented
2026-03-24 07:53:07 -07:00
enabled: true # Enable automatic checkpoints (also: hermes --checkpoints)
max_snapshots: 50 # Max checkpoints to keep per directory
```
## Delegation
Configure subagent behavior for the delegate tool:
```yaml
delegation:
# model: "google/gemini-3-flash-preview" # Override model (empty = inherit parent)
# provider: "openrouter" # Override provider (empty = inherit parent)
# base_url: "http://localhost:1234/v1" # Direct OpenAI-compatible endpoint (takes precedence over provider)
# api_key: "local-key" # API key for base_url (falls back to OPENAI_API_KEY)
```
**Subagent provider:model override:** By default, subagents inherit the parent agent's provider and model. Set `delegation.provider` and `delegation.model` to route subagents to a different provider:model pair — e.g., use a cheap/fast model for narrowly-scoped subtasks while your primary agent runs an expensive reasoning model.
**Direct endpoint override:** If you want the obvious custom-endpoint path, set `delegation.base_url`, `delegation.api_key`, and `delegation.model`. That sends subagents directly to that OpenAI-compatible endpoint and takes precedence over `delegation.provider`. If `delegation.api_key` is omitted, Hermes falls back to `OPENAI_API_KEY` only.
The delegation provider uses the same credential resolution as CLI/gateway startup. All configured providers are supported: `openrouter`, `nous`, `copilot`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`. When a provider is set, the system automatically resolves the correct base URL, API key, and API mode — no manual credential wiring needed.
**Precedence:** `delegation.base_url` in config → `delegation.provider` in config → parent provider (inherited). `delegation.model` in config → parent model (inherited). Setting just `model` without `provider` changes only the model name while keeping the parent's credentials (useful for switching models within the same provider like OpenRouter).
## Clarify
Configure the clarification prompt behavior:
```yaml
clarify:
timeout: 120 # Seconds to wait for user clarification response
```
## Context Files (SOUL.md, AGENTS.md)
Hermes uses two different context scopes:
| File | Purpose | Scope |
|------|---------|-------|
| `SOUL.md` | **Primary agent identity** — defines who the agent is (slot #1 in the system prompt) | `~/.hermes/SOUL.md` or `$HERMES_HOME/SOUL.md` |
docs: fix stale and incorrect documentation across 18 files Cross-referenced all 84 docs pages against the actual codebase and corrected every discrepancy found. Reference docs: - faq.md: Fix non-existent commands (/stats→/usage, /context→/usage, hermes models→hermes model, hermes config get→hermes config show, hermes gateway logs→cat gateway.log, async→sync chat() call) - cli-commands.md: Fix --provider choices list (remove providers not in argparse), add undocumented -s/--skills flag - slash-commands.md: Add missing /queue and /resume commands, fix /approve args_hint to show [session|always] - tools-reference.md: Remove duplicate vision and web toolset sections - environment-variables.md: Fix HERMES_INFERENCE_PROVIDER list (add copilot-acp, remove alibaba to match actual argparse choices) Configuration & user guide: - configuration.md: Fix approval_mode→approvals.mode (manual not ask), checkpoints.enabled default true not false, human_delay defaults (500/2000→800/2500), remove non-existent delegation.max_iterations and delegation.default_toolsets, fix website_blocklist nesting under security:, add .hermes.md and CLAUDE.md to context files table with priority system explanation - security.md: Fix website_blocklist nesting under security: - context-files.md: Add .hermes.md/HERMES.md and CLAUDE.md support, document priority-based first-match-wins loading behavior - cli.md: Fix personalities config nesting (top-level, not under agent:) - delegation.md: Fix model override docs (config-level, not per-call tool parameter) - rl-training.md: Fix log directory (tinker-atropos/logs/→ ~/.hermes/logs/rl_training/) - tts.md: Fix Discord delivery format (voice bubble with fallback, not just file attachment) - git-worktrees.md: Remove outdated v0.2.0 version reference Developer guide: - prompt-assembly.md: Add .hermes.md, CLAUDE.md, document priority system for context files - agent-loop.md: Fix callback list (remove non-existent message_callback, add stream_delta_callback, tool_gen_callback, status_callback) Messaging & guides: - webhooks.md: Fix command (hermes setup gateway→hermes gateway setup) - tips.md: Fix session idle timeout (120min→24h), config file (gateway.json→config.yaml) - build-a-hermes-plugin.md: Fix plugin.yaml provides: format (provides_tools/provides_hooks as lists), note register_command() as not yet implemented
2026-03-24 07:53:07 -07:00
| `.hermes.md` / `HERMES.md` | Project-specific instructions (highest priority) | Walks to git root |
| `AGENTS.md` | Project-specific instructions, coding conventions | Recursive directory walk |
| `CLAUDE.md` | Claude Code context files (also detected) | Working directory only |
| `.cursorrules` | Cursor IDE rules (also detected) | Working directory only |
| `.cursor/rules/*.mdc` | Cursor rule files (also detected) | Working directory only |
- **SOUL.md** is the agent's primary identity. It occupies slot #1 in the system prompt, completely replacing the built-in default identity. Edit it to fully customize who the agent is.
- If SOUL.md is missing, empty, or cannot be loaded, Hermes falls back to a built-in default identity.
docs: fix stale and incorrect documentation across 18 files Cross-referenced all 84 docs pages against the actual codebase and corrected every discrepancy found. Reference docs: - faq.md: Fix non-existent commands (/stats→/usage, /context→/usage, hermes models→hermes model, hermes config get→hermes config show, hermes gateway logs→cat gateway.log, async→sync chat() call) - cli-commands.md: Fix --provider choices list (remove providers not in argparse), add undocumented -s/--skills flag - slash-commands.md: Add missing /queue and /resume commands, fix /approve args_hint to show [session|always] - tools-reference.md: Remove duplicate vision and web toolset sections - environment-variables.md: Fix HERMES_INFERENCE_PROVIDER list (add copilot-acp, remove alibaba to match actual argparse choices) Configuration & user guide: - configuration.md: Fix approval_mode→approvals.mode (manual not ask), checkpoints.enabled default true not false, human_delay defaults (500/2000→800/2500), remove non-existent delegation.max_iterations and delegation.default_toolsets, fix website_blocklist nesting under security:, add .hermes.md and CLAUDE.md to context files table with priority system explanation - security.md: Fix website_blocklist nesting under security: - context-files.md: Add .hermes.md/HERMES.md and CLAUDE.md support, document priority-based first-match-wins loading behavior - cli.md: Fix personalities config nesting (top-level, not under agent:) - delegation.md: Fix model override docs (config-level, not per-call tool parameter) - rl-training.md: Fix log directory (tinker-atropos/logs/→ ~/.hermes/logs/rl_training/) - tts.md: Fix Discord delivery format (voice bubble with fallback, not just file attachment) - git-worktrees.md: Remove outdated v0.2.0 version reference Developer guide: - prompt-assembly.md: Add .hermes.md, CLAUDE.md, document priority system for context files - agent-loop.md: Fix callback list (remove non-existent message_callback, add stream_delta_callback, tool_gen_callback, status_callback) Messaging & guides: - webhooks.md: Fix command (hermes setup gateway→hermes gateway setup) - tips.md: Fix session idle timeout (120min→24h), config file (gateway.json→config.yaml) - build-a-hermes-plugin.md: Fix plugin.yaml provides: format (provides_tools/provides_hooks as lists), note register_command() as not yet implemented
2026-03-24 07:53:07 -07:00
- **Project context files use a priority system** — only ONE type is loaded (first match wins): `.hermes.md``AGENTS.md``CLAUDE.md``.cursorrules`. SOUL.md is always loaded independently.
- **AGENTS.md** is hierarchical: if subdirectories also have AGENTS.md, all are combined.
- Hermes automatically seeds a default `SOUL.md` if one does not already exist.
- All loaded context files are capped at 20,000 characters with smart truncation.
See also:
- [Personality & SOUL.md](/docs/user-guide/features/personality)
- [Context Files](/docs/user-guide/features/context-files)
## Working Directory
| Context | Default |
|---------|---------|
| **CLI (`hermes`)** | Current directory where you run the command |
| **Messaging gateway** | Home directory `~` (override with `MESSAGING_CWD`) |
| **Docker / Singularity / Modal / SSH** | User's home directory inside the container or remote machine |
Override the working directory:
```bash
# In ~/.hermes/.env or ~/.hermes/config.yaml:
MESSAGING_CWD=/home/myuser/projects # Gateway sessions
TERMINAL_CWD=/workspace # All terminal sessions
```