---
sidebar_position: 2
title: Configuration
description: Configure Hermes Agent — config.yaml, providers, models, API keys, and more
---
Configuration
All settings are stored in the ~/.hermes/ directory for easy access.
Directory Structure
~/.hermes/
├── config.yaml # Settings (model, terminal, TTS, compression, etc.)
├── .env # API keys and secrets
├── auth.json # OAuth provider credentials (Nous Portal, etc.)
├── SOUL.md # Primary agent identity (slot #1 in system prompt)
├── memories/ # Persistent memory (MEMORY.md, USER.md)
├── skills/ # Agent-created skills (managed via skill_manage tool)
├── cron/ # Scheduled jobs
├── sessions/ # Gateway sessions
└── logs/ # Logs (errors.log, gateway.log — secrets auto-redacted)
Managing Configuration
hermes config # View current configuration
hermes config edit # Open config.yaml in your editor
hermes config set KEY VAL # Set a specific value
hermes config check # Check for missing options (after updates)
hermes config migrate # Interactively add missing options
# Examples:
hermes config set model anthropic/claude-opus-4
hermes config set terminal.backend docker
hermes config set OPENROUTER_API_KEY sk-or-... # Saves to .env
:::tip
The hermes config set command automatically routes values to the right file — API keys are saved to .env, everything else to config.yaml.
:::
Configuration Precedence
Settings are resolved in this order (highest priority first):
- CLI arguments — e.g., hermes chat --model anthropic/claude-sonnet-4 (per-invocation override)
- ~/.hermes/config.yaml — the primary config file for all non-secret settings
- ~/.hermes/.env — fallback for env vars; required for secrets (API keys, tokens, passwords)
- Built-in defaults — hardcoded safe defaults when nothing else is set
:::info Rule of Thumb
Secrets (API keys, bot tokens, passwords) go in .env. Everything else (model, terminal backend, compression settings, memory limits, toolsets) goes in config.yaml. When both are set, config.yaml wins for non-secret settings.
:::
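As a quick illustration of the precedence order (the model names here are only examples): a default in config.yaml applies to every session, while a CLI flag wins for a single invocation.
# ~/.hermes/config.yaml holds the everyday default:
#   model:
#     default: "anthropic/claude-sonnet-4"
# A CLI flag outranks it for this run only:
hermes chat --model google/gemini-2.5-flash
# The next plain `hermes chat` falls back to the config.yaml default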
Inference Providers
You need at least one way to connect to an LLM. Use hermes model to switch providers and models interactively, or configure directly:
| Provider | Setup |
|---|---|
| Nous Portal | hermes model (OAuth, subscription-based) |
| OpenAI Codex | hermes model (ChatGPT OAuth, uses Codex models) |
| GitHub Copilot | hermes model (OAuth device code flow, COPILOT_GITHUB_TOKEN, GH_TOKEN, or gh auth token) |
| GitHub Copilot ACP | hermes model (spawns local copilot --acp --stdio) |
| Anthropic | hermes model (Claude Pro/Max via Claude Code auth, Anthropic API key, or manual setup-token) |
| OpenRouter | OPENROUTER_API_KEY in ~/.hermes/.env |
| AI Gateway | AI_GATEWAY_API_KEY in ~/.hermes/.env (provider: ai-gateway) |
| z.ai / GLM | GLM_API_KEY in ~/.hermes/.env (provider: zai) |
| Kimi / Moonshot | KIMI_API_KEY in ~/.hermes/.env (provider: kimi-coding) |
| MiniMax | MINIMAX_API_KEY in ~/.hermes/.env (provider: minimax) |
| MiniMax China | MINIMAX_CN_API_KEY in ~/.hermes/.env (provider: minimax-cn) |
| Alibaba Cloud | DASHSCOPE_API_KEY in ~/.hermes/.env (provider: alibaba, aliases: dashscope, qwen) |
| Kilo Code | KILOCODE_API_KEY in ~/.hermes/.env (provider: kilocode) |
| OpenCode Zen | OPENCODE_ZEN_API_KEY in ~/.hermes/.env (provider: opencode-zen) |
| OpenCode Go | OPENCODE_GO_API_KEY in ~/.hermes/.env (provider: opencode-go) |
| Custom Endpoint | hermes model (saved in config.yaml) or OPENAI_BASE_URL + OPENAI_API_KEY in ~/.hermes/.env |
:::info Codex Note
The OpenAI Codex provider authenticates via device code (open a URL, enter a code). Hermes stores the resulting credentials in its own auth store under ~/.hermes/auth.json and can import existing Codex CLI credentials from ~/.codex/auth.json when present. No Codex CLI installation is required.
:::
:::warning
Even when using Nous Portal, Codex, or a custom endpoint, some tools (vision, web summarization, MoA) use a separate "auxiliary" model — by default Gemini Flash via OpenRouter. An OPENROUTER_API_KEY enables these tools automatically. You can also configure which model and provider these tools use — see Auxiliary Models below.
:::
Anthropic (Native)
Use Claude models directly through the Anthropic API — no OpenRouter proxy needed. Supports three auth methods:
# With an API key (pay-per-token)
export ANTHROPIC_API_KEY=***
hermes chat --provider anthropic --model claude-sonnet-4-6
# Preferred: authenticate through `hermes model`
# Hermes will use Claude Code's credential store directly when available
hermes model
# Manual override with a setup-token (fallback / legacy)
export ANTHROPIC_TOKEN=*** # setup-token or manual OAuth token
hermes chat --provider anthropic
# Auto-detect Claude Code credentials (if you already use Claude Code)
hermes chat --provider anthropic # reads Claude Code credential files automatically
When you choose Anthropic OAuth through hermes model, Hermes prefers Claude Code's own credential store over copying the token into ~/.hermes/.env, so credentials that Claude Code refreshes stay current instead of going stale as a one-time copy.
Or set it permanently:
model:
provider: "anthropic"
default: "claude-sonnet-4-6"
:::tip Aliases
--provider claude and --provider claude-code also work as shorthand for --provider anthropic.
:::
GitHub Copilot
Hermes supports GitHub Copilot as a first-class provider with two modes:
copilot — Direct Copilot API (recommended). Uses your GitHub Copilot subscription to access GPT-5.x, Claude, Gemini, and other models through the Copilot API.
hermes chat --provider copilot --model gpt-5.4
Authentication options (checked in this order):
- COPILOT_GITHUB_TOKEN environment variable
- GH_TOKEN environment variable
- GITHUB_TOKEN environment variable
- gh auth token CLI fallback
If no token is found, hermes model offers an OAuth device code login — the same flow used by the Copilot CLI and opencode.
:::warning Token types
The Copilot API does not support classic Personal Access Tokens (ghp_*). Supported token types:
| Type | Prefix | How to get |
|---|---|---|
| OAuth token | gho_ | hermes model → GitHub Copilot → Login with GitHub |
| Fine-grained PAT | github_pat_ | GitHub Settings → Developer settings → Fine-grained tokens (needs Copilot Requests permission) |
| GitHub App token | ghu_ | Via GitHub App installation |
If your gh auth token returns a ghp_* token, use hermes model to authenticate via OAuth instead.
:::
API routing: GPT-5+ models (except gpt-5-mini) automatically use the Responses API. All other models (GPT-4o, Claude, Gemini, etc.) use Chat Completions. Models are auto-detected from the live Copilot catalog.
copilot-acp — Copilot ACP agent backend. Spawns the local Copilot CLI as a subprocess:
hermes chat --provider copilot-acp --model copilot-acp
# Requires the GitHub Copilot CLI in PATH and an existing `copilot login` session
Permanent config:
model:
provider: "copilot"
default: "gpt-5.4"
| Environment variable | Description |
|---|---|
| COPILOT_GITHUB_TOKEN | GitHub token for Copilot API (first priority) |
| HERMES_COPILOT_ACP_COMMAND | Override the Copilot CLI binary path (default: copilot) |
| HERMES_COPILOT_ACP_ARGS | Override ACP args (default: --acp --stdio) |
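For example, if the Copilot CLI binary lives outside your PATH, you could point Hermes at it explicitly (the paths below are illustrative):
# In your shell or ~/.hermes/.env
HERMES_COPILOT_ACP_COMMAND=/opt/copilot/bin/copilot
HERMES_COPILOT_ACP_ARGS=--acp --stdio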
First-Class Chinese AI Providers
These providers have built-in support with dedicated provider IDs. Set the API key and use --provider to select:
# z.ai / ZhipuAI GLM
hermes chat --provider zai --model glm-4-plus
# Requires: GLM_API_KEY in ~/.hermes/.env
# Kimi / Moonshot AI
hermes chat --provider kimi-coding --model moonshot-v1-auto
# Requires: KIMI_API_KEY in ~/.hermes/.env
# MiniMax (global endpoint)
hermes chat --provider minimax --model MiniMax-M2.7
# Requires: MINIMAX_API_KEY in ~/.hermes/.env
# MiniMax (China endpoint)
hermes chat --provider minimax-cn --model MiniMax-M2.7
# Requires: MINIMAX_CN_API_KEY in ~/.hermes/.env
# Alibaba Cloud / DashScope (Qwen models)
hermes chat --provider alibaba --model qwen-plus
# Requires: DASHSCOPE_API_KEY in ~/.hermes/.env
Or set the provider permanently in config.yaml:
model:
provider: "zai" # or: kimi-coding, minimax, minimax-cn, alibaba
default: "glm-4-plus"
Base URLs can be overridden with GLM_BASE_URL, KIMI_BASE_URL, MINIMAX_BASE_URL, MINIMAX_CN_BASE_URL, or DASHSCOPE_BASE_URL environment variables.
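For example, to route GLM traffic through your own gateway instead of the default endpoint (the URL is illustrative):
# In ~/.hermes/.env
GLM_BASE_URL=https://glm-proxy.example.com/api/paas/v4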
Custom & Self-Hosted LLM Providers
Hermes Agent works with any OpenAI-compatible API endpoint. If a server implements /v1/chat/completions, you can point Hermes at it. This means you can use local models, GPU inference servers, multi-provider routers, or any third-party API.
General Setup
Three ways to configure a custom endpoint:
Interactive setup (recommended):
hermes model
# Select "Custom endpoint (self-hosted / VLLM / etc.)"
# Enter: API base URL, API key, Model name
Manual config (config.yaml):
# In ~/.hermes/config.yaml
model:
default: your-model-name
provider: custom
base_url: http://localhost:8000/v1
api_key: your-key-or-leave-empty-for-local
Environment variables (.env file):
# Add to ~/.hermes/.env
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_API_KEY=your-key # Any non-empty string for local servers
LLM_MODEL=your-model-name
All three approaches end up in the same runtime path. hermes model persists provider, model, and base URL to config.yaml so later sessions keep using that endpoint even if env vars are not set.
Switching Models with /model
Once a custom endpoint is configured, you can switch models mid-session:
/model custom:qwen-2.5 # Switch to a model on your custom endpoint
/model custom # Auto-detect the model from the endpoint
/model openrouter:claude-sonnet-4 # Switch back to a cloud provider
If you have named custom providers configured (see below), use the triple syntax:
/model custom:local:qwen-2.5 # Use the "local" custom provider with model qwen-2.5
/model custom:work:llama3 # Use the "work" custom provider with llama3
When switching providers, Hermes persists the base URL and provider to config so the change survives restarts. When switching away from a custom endpoint to a built-in provider, the stale base URL is automatically cleared.
:::tip
/model custom (bare, no model name) queries your endpoint's /models API and auto-selects the model if exactly one is loaded. Useful for local servers running a single model.
:::
Everything below follows this same pattern — just change the URL, key, and model name.
Ollama — Local Models, Zero Config
Ollama runs open-weight models locally with one command. Best for: quick local experimentation, privacy-sensitive work, offline use.
# Install and run a model
ollama pull llama3.1:70b
ollama serve # Starts on port 11434
# Configure Hermes
OPENAI_BASE_URL=http://localhost:11434/v1
OPENAI_API_KEY=ollama # Any non-empty string
LLM_MODEL=llama3.1:70b
Ollama's OpenAI-compatible endpoint supports chat completions, streaming, and tool calling (for supported models). No GPU required for smaller models — Ollama handles CPU inference automatically.
:::tip
List available models with ollama list. Pull any model from the Ollama library with ollama pull <model>.
:::
vLLM — High-Performance GPU Inference
vLLM is the standard for production LLM serving. Best for: maximum throughput on GPU hardware, serving large models, continuous batching.
# Start vLLM server
pip install vllm
vllm serve meta-llama/Llama-3.1-70B-Instruct \
--port 8000 \
--tensor-parallel-size 2 # Multi-GPU
# Configure Hermes
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_API_KEY=dummy
LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct
vLLM supports tool calling, structured output, and multi-modal models. Use --enable-auto-tool-choice and --tool-call-parser hermes for Hermes-format tool calling with NousResearch models.
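For instance, serving a NousResearch model with Hermes-format tool calling enabled might look like this (the model name and GPU count are illustrative):
vllm serve NousResearch/Hermes-3-Llama-3.1-70B \
  --port 8000 \
  --tensor-parallel-size 2 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes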
SGLang — Fast Serving with RadixAttention
SGLang is an alternative to vLLM with RadixAttention for KV cache reuse. Best for: multi-turn conversations (prefix caching), constrained decoding, structured output.
# Start SGLang server
pip install sglang[all]
python -m sglang.launch_server \
--model meta-llama/Llama-3.1-70B-Instruct \
--port 8000 \
--tp 2
# Configure Hermes
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_API_KEY=dummy
LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct
llama.cpp / llama-server — CPU & Metal Inference
llama.cpp runs quantized models on CPU, Apple Silicon (Metal), and consumer GPUs. Best for: running models without a datacenter GPU, Mac users, edge deployment.
# Build and start llama-server
cmake -B build && cmake --build build --config Release
./build/bin/llama-server \
-m models/llama-3.1-8b-instruct-Q4_K_M.gguf \
--port 8080 --host 0.0.0.0
# Configure Hermes
OPENAI_BASE_URL=http://localhost:8080/v1
OPENAI_API_KEY=dummy
LLM_MODEL=llama-3.1-8b-instruct
:::tip
Download GGUF models from Hugging Face. Q4_K_M quantization offers the best balance of quality vs. memory usage.
:::
LiteLLM Proxy — Multi-Provider Gateway
LiteLLM is an OpenAI-compatible proxy that unifies 100+ LLM providers behind a single API. Best for: switching between providers without config changes, load balancing, fallback chains, budget controls.
# Install and start
pip install litellm[proxy]
litellm --model anthropic/claude-sonnet-4 --port 4000
# Or with a config file for multiple models:
litellm --config litellm_config.yaml --port 4000
# Configure Hermes
OPENAI_BASE_URL=http://localhost:4000/v1
OPENAI_API_KEY=sk-your-litellm-key
LLM_MODEL=anthropic/claude-sonnet-4
Example litellm_config.yaml with fallback:
model_list:
- model_name: "best"
litellm_params:
model: anthropic/claude-sonnet-4
api_key: sk-ant-...
- model_name: "best"
litellm_params:
model: openai/gpt-4o
api_key: sk-...
router_settings:
routing_strategy: "latency-based-routing"
ClawRouter — Cost-Optimized Routing
ClawRouter by BlockRunAI is a local routing proxy that auto-selects models based on query complexity. It classifies requests across 14 dimensions and routes to the cheapest model that can handle the task. Payment is via USDC cryptocurrency (no API keys).
# Install and start
npx @blockrun/clawrouter # Starts on port 8402
# Configure Hermes
OPENAI_BASE_URL=http://localhost:8402/v1
OPENAI_API_KEY=dummy
LLM_MODEL=blockrun/auto # or: blockrun/eco, blockrun/premium, blockrun/agentic
Routing profiles:
| Profile | Strategy | Savings |
|---|---|---|
| blockrun/auto | Balanced quality/cost | 74-100% |
| blockrun/eco | Cheapest possible | 95-100% |
| blockrun/premium | Best quality models | 0% |
| blockrun/free | Free models only | 100% |
| blockrun/agentic | Optimized for tool use | varies |
:::note
ClawRouter requires a USDC-funded wallet on Base or Solana for payment. All requests route through BlockRun's backend API. Run npx @blockrun/clawrouter doctor to check wallet status.
:::
Other Compatible Providers
Any service with an OpenAI-compatible API works. Some popular options:
| Provider | Base URL | Notes |
|---|---|---|
| Together AI | https://api.together.xyz/v1 | Cloud-hosted open models |
| Groq | https://api.groq.com/openai/v1 | Ultra-fast inference |
| DeepSeek | https://api.deepseek.com/v1 | DeepSeek models |
| Fireworks AI | https://api.fireworks.ai/inference/v1 | Fast open model hosting |
| Cerebras | https://api.cerebras.ai/v1 | Wafer-scale chip inference |
| Mistral AI | https://api.mistral.ai/v1 | Mistral models |
| OpenAI | https://api.openai.com/v1 | Direct OpenAI access |
| Azure OpenAI | https://YOUR.openai.azure.com/ | Enterprise OpenAI |
| LocalAI | http://localhost:8080/v1 | Self-hosted, multi-model |
| Jan | http://localhost:1337/v1 | Desktop app with local models |
# Example: Together AI
OPENAI_BASE_URL=https://api.together.xyz/v1
OPENAI_API_KEY=your-together-key
LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct-Turbo
Context Length Detection
Hermes uses a multi-source resolution chain to detect the correct context window for your model and provider:
- Config override — model.context_length in config.yaml (highest priority)
- Custom provider per-model — custom_providers[].models.<id>.context_length
- Persistent cache — previously discovered values (survives restarts)
- Endpoint /models — queries your server's API (local/custom endpoints)
- Anthropic /v1/models — queries Anthropic's API for max_input_tokens (API-key users only)
- OpenRouter API — live model metadata from OpenRouter
- Nous Portal — suffix-matches Nous model IDs against OpenRouter metadata
- models.dev — community-maintained registry with provider-specific context lengths for 3800+ models across 100+ providers
- Fallback defaults — broad model family patterns (128K default)
For most setups this works out of the box. The system is provider-aware — the same model can have different context limits depending on who serves it (e.g., claude-opus-4.6 is 1M on Anthropic direct but 128K on GitHub Copilot).
To set the context length explicitly, add context_length to your model config:
model:
default: "qwen3.5:9b"
base_url: "http://localhost:8080/v1"
context_length: 131072 # tokens
For custom endpoints, you can also set context length per model:
custom_providers:
- name: "My Local LLM"
base_url: "http://localhost:11434/v1"
models:
qwen3.5:27b:
context_length: 32768
deepseek-r1:70b:
context_length: 65536
hermes model will prompt for context length when configuring a custom endpoint. Leave it blank for auto-detection.
:::tip When to set this manually
- You're using Ollama with a custom num_ctx that's lower than the model's maximum
- You want to limit context below the model's maximum (e.g., 8k on a 128k model to save VRAM)
- You're running behind a proxy that doesn't expose /v1/models
:::
Named Custom Providers
If you work with multiple custom endpoints (e.g., a local dev server and a remote GPU server), you can define them as named custom providers in config.yaml:
custom_providers:
- name: local
base_url: http://localhost:8080/v1
# api_key omitted — Hermes uses "no-key-required" for keyless local servers
- name: work
base_url: https://gpu-server.internal.corp/v1
api_key: corp-api-key
api_mode: chat_completions # optional, auto-detected from URL
- name: anthropic-proxy
base_url: https://proxy.example.com/anthropic
api_key: proxy-key
api_mode: anthropic_messages # for Anthropic-compatible proxies
Switch between them mid-session with the triple syntax:
/model custom:local:qwen-2.5 # Use the "local" endpoint with qwen-2.5
/model custom:work:llama3-70b # Use the "work" endpoint with llama3-70b
/model custom:anthropic-proxy:claude-sonnet-4 # Use the proxy
You can also select named custom providers from the interactive hermes model menu.
Choosing the Right Setup
| Use Case | Recommended |
|---|---|
| Just want it to work | OpenRouter (default) or Nous Portal |
| Local models, easy setup | Ollama |
| Production GPU serving | vLLM or SGLang |
| Mac / no GPU | Ollama or llama.cpp |
| Multi-provider routing | LiteLLM Proxy or OpenRouter |
| Cost optimization | ClawRouter or OpenRouter with sort: "price" |
| Maximum privacy | Ollama, vLLM, or llama.cpp (fully local) |
| Enterprise / Azure | Azure OpenAI with custom endpoint |
| Chinese AI models | z.ai (GLM), Kimi/Moonshot, or MiniMax (first-class providers) |
:::tip
You can switch between providers at any time with hermes model — no restart required. Your conversation history, memory, and skills carry over regardless of which provider you use.
:::
Optional API Keys
| Feature | Provider | Env Variable |
|---|---|---|
| Web scraping | Firecrawl | FIRECRAWL_API_KEY, FIRECRAWL_API_URL |
| Browser automation | Browserbase | BROWSERBASE_API_KEY, BROWSERBASE_PROJECT_ID |
| Image generation | FAL | FAL_KEY |
| Premium TTS voices | ElevenLabs | ELEVENLABS_API_KEY |
| OpenAI TTS + voice transcription | OpenAI | VOICE_TOOLS_OPENAI_KEY |
| RL Training | Tinker + WandB | TINKER_API_KEY, WANDB_API_KEY |
| Cross-session user modeling | Honcho | HONCHO_API_KEY |
Self-Hosting Firecrawl
By default, Hermes uses the Firecrawl cloud API for web search and scraping. If you prefer to run Firecrawl locally, you can point Hermes at a self-hosted instance instead. See Firecrawl's SELF_HOST.md for complete setup instructions.
What you get: No API key required, no rate limits, no per-page costs, full data sovereignty.
What you lose: The cloud version uses Firecrawl's proprietary "Fire-engine" for advanced anti-bot bypassing (Cloudflare, CAPTCHAs, IP rotation). Self-hosted uses basic fetch + Playwright, so some protected sites may fail. Search uses DuckDuckGo instead of Google.
Setup:
- Clone and start the Firecrawl Docker stack (5 containers: API, Playwright, Redis, RabbitMQ, PostgreSQL — requires ~4-8 GB RAM):
git clone https://github.com/firecrawl/firecrawl
cd firecrawl
# In .env, set: USE_DB_AUTHENTICATION=false, HOST=0.0.0.0, PORT=3002
docker compose up -d
- Point Hermes at your instance (no API key needed):
hermes config set FIRECRAWL_API_URL http://localhost:3002
You can also set both FIRECRAWL_API_KEY and FIRECRAWL_API_URL if your self-hosted instance has authentication enabled.
OpenRouter Provider Routing
When using OpenRouter, you can control how requests are routed across providers. Add a provider_routing section to ~/.hermes/config.yaml:
provider_routing:
sort: "throughput" # "price" (default), "throughput", or "latency"
# only: ["anthropic"] # Only use these providers
# ignore: ["deepinfra"] # Skip these providers
# order: ["anthropic", "google"] # Try providers in this order
# require_parameters: true # Only use providers that support all request params
# data_collection: "deny" # Exclude providers that may store/train on data
Shortcuts: Append :nitro to any model name for throughput sorting (e.g., anthropic/claude-sonnet-4:nitro), or :floor for price sorting.
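For example, to make throughput-sorted routing your everyday default rather than a per-request choice (the model name is illustrative):
model:
  provider: "openrouter"
  default: "anthropic/claude-sonnet-4:nitro"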
Fallback Model
Configure a backup provider:model that Hermes switches to automatically when your primary model fails (rate limits, server errors, auth failures):
fallback_model:
provider: openrouter # required
model: anthropic/claude-sonnet-4 # required
# base_url: http://localhost:8000/v1 # optional, for custom endpoints
# api_key_env: MY_CUSTOM_KEY # optional, env var name for custom endpoint API key
When activated, the fallback swaps the model and provider mid-session without losing your conversation. It fires at most once per session.
Supported providers: openrouter, nous, openai-codex, copilot, anthropic, zai, kimi-coding, minimax, minimax-cn, custom.
:::tip
Fallback is configured exclusively through config.yaml — there are no environment variables for it. For full details on when it triggers, supported providers, and how it interacts with auxiliary tasks and delegation, see Fallback Providers.
:::
Smart Model Routing
Optional cheap-vs-strong routing lets Hermes keep your main model for complex work while sending very short/simple turns to a cheaper model.
smart_model_routing:
enabled: true
max_simple_chars: 160
max_simple_words: 28
cheap_model:
provider: openrouter
model: google/gemini-2.5-flash
# base_url: http://localhost:8000/v1 # optional custom endpoint
# api_key_env: MY_CUSTOM_KEY # optional env var name for that endpoint's API key
How it works:
- If a turn is short, single-line, and does not look code/tool/debug heavy, Hermes may route it to cheap_model
- If the cheap route cannot be resolved cleanly, Hermes falls back to the primary model automatically
This is intentionally conservative. It is meant for quick, low-stakes turns like:
- short factual questions
- quick rewrites
- lightweight summaries
It will avoid routing prompts that look like:
- coding/debugging work
- tool-heavy requests
- long or multi-line analysis asks
Use this when you want lower latency or cost without fully changing your default model.
Terminal Backend Configuration
Configure which environment the agent uses for terminal commands:
terminal:
backend: local # or: docker, ssh, singularity, modal, daytona
cwd: "." # Working directory ("." = current dir)
timeout: 180 # Command timeout in seconds
# Docker-specific settings
docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"
docker_mount_cwd_to_workspace: false # SECURITY: off by default. Opt in to mount the launch cwd into /workspace.
docker_forward_env: # Optional explicit allowlist for env passthrough
- "GITHUB_TOKEN"
docker_volumes: # Additional explicit host mounts
- "/home/user/projects:/workspace/projects"
- "/home/user/data:/data:ro" # :ro for read-only
# Container resource limits (docker, singularity, modal, daytona)
container_cpu: 1 # CPU cores
container_memory: 5120 # MB (default 5GB)
container_disk: 51200 # MB (default 50GB)
container_persistent: true # Persist filesystem across sessions
# Persistent shell — keep a long-lived bash process across commands
persistent_shell: true # Enabled by default for SSH backend
Common Terminal Backend Issues
If terminal commands fail immediately or the terminal tool is reported as disabled, check the following:
- Local backend
  - No special requirements. This is the safest default when you are just getting started.
- Docker backend
  - Ensure Docker Desktop (or the Docker daemon) is installed and running.
  - Hermes needs to be able to find the docker CLI. It checks your $PATH first and also probes common Docker Desktop install locations on macOS. Run:
    docker version
    If this fails, fix your Docker installation or switch back to the local backend:
    hermes config set terminal.backend local
- SSH backend
  - Both TERMINAL_SSH_HOST and TERMINAL_SSH_USER must be set, for example:
    export TERMINAL_ENV=ssh
    export TERMINAL_SSH_HOST=my-server.example.com
    export TERMINAL_SSH_USER=ubuntu
  - If either value is missing, Hermes will log a clear error and refuse to use the SSH backend.
- Modal backend
  - You need either a MODAL_TOKEN_ID environment variable or a ~/.modal.toml config file.
  - If neither is present, the backend check fails and Hermes will report that the Modal backend is not available.
When in doubt, set terminal.backend back to local and verify that commands run there first.
Docker Volume Mounts
When using the Docker backend, docker_volumes lets you share host directories with the container. Each entry uses standard Docker -v syntax: host_path:container_path[:options].
terminal:
backend: docker
docker_volumes:
- "/home/user/projects:/workspace/projects" # Read-write (default)
- "/home/user/datasets:/data:ro" # Read-only
- "/home/user/outputs:/outputs" # Agent writes, you read
This is useful for:
- Providing files to the agent (datasets, configs, reference code)
- Receiving files from the agent (generated code, reports, exports)
- Shared workspaces where both you and the agent access the same files
Can also be set via environment variable: TERMINAL_DOCKER_VOLUMES='["/host:/container"]' (JSON array).
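For example, the environment-variable equivalent of the YAML above (paths are illustrative):
# In your shell or ~/.hermes/.env
TERMINAL_DOCKER_VOLUMES='["/home/user/projects:/workspace/projects", "/home/user/data:/data:ro"]'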
Docker Credential Forwarding
By default, Docker terminal sessions do not inherit arbitrary host credentials. If you need a specific token inside the container, add it to terminal.docker_forward_env.
terminal:
backend: docker
docker_forward_env:
- "GITHUB_TOKEN"
- "NPM_TOKEN"
Hermes resolves each listed variable from your current shell first, then falls back to ~/.hermes/.env if it was saved with hermes config set.
:::warning
Anything listed in docker_forward_env becomes visible to commands run inside the container. Only forward credentials you are comfortable exposing to the terminal session.
:::
Optional: Mount the Launch Directory into /workspace
Docker sandboxes stay isolated by default. Hermes does not pass your current host working directory into the container unless you explicitly opt in.
Enable it in config.yaml:
terminal:
backend: docker
docker_mount_cwd_to_workspace: true
When enabled:
- if you launch Hermes from ~/projects/my-app, that host directory is bind-mounted to /workspace
- the Docker backend starts in /workspace
- file tools and terminal commands both see the same mounted project
When disabled, /workspace stays sandbox-owned unless you explicitly mount something via docker_volumes.
Security tradeoff:
- false preserves the sandbox boundary
- true gives the sandbox direct access to the directory you launched Hermes from
Use the opt-in only when you intentionally want the container to work on live host files.
Persistent Shell
By default, each terminal command runs in its own subprocess — working directory, environment variables, and shell variables reset between commands. When persistent shell is enabled, a single long-lived bash process is kept alive across execute() calls so that state survives between commands.
This is most useful for the SSH backend, where it also eliminates per-command connection overhead. Persistent shell is enabled by default for SSH and disabled for the local backend.
terminal:
persistent_shell: true # default — enables persistent shell for SSH
To disable:
hermes config set terminal.persistent_shell false
What persists across commands:
- Working directory (cd /tmp sticks for the next command)
- Exported environment variables (export FOO=bar)
- Shell variables (MY_VAR=hello)
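A quick illustration: with persistent shell enabled, state set by one command is still visible to the next one.
# First command
cd /tmp && export FOO=bar
# A later, separate command still sees that state
pwd         # /tmp
echo $FOO   # bar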
Precedence:
| Level | Variable | Default |
|---|---|---|
| Config | terminal.persistent_shell | true |
| SSH override | TERMINAL_SSH_PERSISTENT | follows config |
| Local override | TERMINAL_LOCAL_PERSISTENT | false |
Per-backend environment variables take highest precedence. If you want persistent shell on the local backend too:
export TERMINAL_LOCAL_PERSISTENT=true
:::note
Commands that require stdin_data or sudo automatically fall back to one-shot mode, since the persistent shell's stdin is already occupied by the IPC protocol.
:::
See Code Execution and the Terminal section of the README for details on each backend.
Memory Configuration
memory:
memory_enabled: true
user_profile_enabled: true
memory_char_limit: 2200 # ~800 tokens
user_char_limit: 1375 # ~500 tokens
Git Worktree Isolation
Enable isolated git worktrees for running multiple agents in parallel on the same repo:
worktree: true # Always create a worktree (same as hermes -w)
# worktree: false # Default — only when -w flag is passed
When enabled, each CLI session creates a fresh worktree under .worktrees/ with its own branch. Agents can edit files, commit, push, and create PRs without interfering with each other. Clean worktrees are removed on exit; dirty ones are kept for manual recovery.
You can also list gitignored files to copy into worktrees via .worktreeinclude in your repo root:
# .worktreeinclude
.env
.venv/
node_modules/
Context Compression
Hermes automatically compresses long conversations to stay within your model's context window. The compression summarizer is a separate LLM call — you can point it at any provider or endpoint.
All compression settings live in config.yaml (no environment variables).
Full reference
compression:
enabled: true # Toggle compression on/off
threshold: 0.50 # Compress at this % of context limit
summary_model: "google/gemini-3-flash-preview" # Model for summarization
summary_provider: "auto" # Provider: "auto", "openrouter", "nous", "codex", "main", etc.
summary_base_url: null # Custom OpenAI-compatible endpoint (overrides provider)
Common setups
Default (auto-detect) — no configuration needed:
compression:
enabled: true
threshold: 0.50
Uses the first available provider (OpenRouter → Nous → Codex) with Gemini Flash.
Force a specific provider (OAuth or API-key based):
compression:
summary_provider: nous
summary_model: gemini-3-flash
Works with any provider: nous, openrouter, codex, anthropic, main, etc.
Custom endpoint (self-hosted, Ollama, zai, DeepSeek, etc.):
compression:
summary_model: glm-4.7
summary_base_url: https://api.z.ai/api/coding/paas/v4
Points at a custom OpenAI-compatible endpoint. Uses OPENAI_API_KEY for auth.
How the three knobs interact
| summary_provider | summary_base_url | Result |
|---|---|---|
| auto (default) | not set | Auto-detect best available provider |
| nous / openrouter / etc. | not set | Force that provider, use its auth |
| any | set | Use the custom endpoint directly (provider ignored) |
The summary_model must support a context length at least as large as your main model's, since it receives the full middle section of the conversation for compression.
Iteration Budget Pressure
When the agent is working on a complex task with many tool calls, it can burn through its iteration budget (default: 90 turns) without realizing it's running low. Budget pressure automatically warns the model as it approaches the limit:
| Threshold | Level | What the model sees |
|---|---|---|
| 70% | Caution | [BUDGET: 63/90. 27 iterations left. Start consolidating.] |
| 90% | Warning | [BUDGET WARNING: 81/90. Only 9 left. Respond NOW.] |
Warnings are injected into the last tool result's JSON (as a _budget_warning field) rather than as separate messages — this preserves prompt caching and doesn't disrupt the conversation structure.
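For illustration, a tool result carrying the warning might look roughly like this (every field except _budget_warning is hypothetical):
{
  "stdout": "build finished",
  "exit_code": 0,
  "_budget_warning": "[BUDGET: 63/90. 27 iterations left. Start consolidating.]"
}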
agent:
max_turns: 90 # Max iterations per conversation turn (default: 90)
Budget pressure is enabled by default. The agent sees warnings naturally as part of tool results, encouraging it to consolidate its work and deliver a response before running out of iterations.
Context Pressure Warnings
Separate from iteration budget pressure, context pressure tracks how close the conversation is to the compaction threshold — the point where context compression fires to summarize older messages. This helps both you and the agent understand when the conversation is getting long.
| Progress | Level | What happens |
|---|---|---|
| ≥ 60% to threshold | Info | CLI shows a cyan progress bar; gateway sends an informational notice |
| ≥ 85% to threshold | Warning | CLI shows a bold yellow bar; gateway warns compaction is imminent |
In the CLI, context pressure appears as a progress bar in the tool output feed:
◐ context ████████████░░░░░░░░ 62% to compaction 48k threshold (50%) · approaching compaction
On messaging platforms, a plain-text notification is sent:
◐ Context: ████████████░░░░░░░░ 62% to compaction (threshold: 50% of window).
If auto-compression is disabled, the warning tells you context may be truncated instead.
Context pressure is automatic — no configuration needed. It fires purely as a user-facing notification and does not modify the message stream or inject anything into the model's context.
Auxiliary Models
Hermes uses lightweight "auxiliary" models for side tasks like image analysis, web page summarization, and browser screenshot analysis. By default, these use Gemini Flash via auto-detection — you don't need to configure anything.
The universal config pattern
Every model slot in Hermes — auxiliary tasks, compression, fallback — uses the same three knobs:
| Key | What it does | Default |
|---|---|---|
| provider | Which provider to use for auth and routing | "auto" |
| model | Which model to request | provider's default |
| base_url | Custom OpenAI-compatible endpoint (overrides provider) | not set |
When base_url is set, Hermes ignores the provider and calls that endpoint directly (using api_key or OPENAI_API_KEY for auth). When only provider is set, Hermes uses that provider's built-in auth and base URL.
Available providers: auto, openrouter, nous, codex, copilot, anthropic, main, zai, kimi-coding, minimax, and any provider registered in the provider registry.
Full auxiliary config reference
auxiliary:
# Image analysis (vision_analyze tool + browser screenshots)
vision:
provider: "auto" # "auto", "openrouter", "nous", "codex", "main", etc.
model: "" # e.g. "openai/gpt-4o", "google/gemini-2.5-flash"
base_url: "" # Custom OpenAI-compatible endpoint (overrides provider)
api_key: "" # API key for base_url (falls back to OPENAI_API_KEY)
timeout: 30 # seconds — increase for slow local vision models
# Web page summarization + browser page text extraction
web_extract:
provider: "auto"
model: "" # e.g. "google/gemini-2.5-flash"
base_url: ""
api_key: ""
# Dangerous command approval classifier
approval:
provider: "auto"
model: ""
base_url: ""
api_key: ""
:::info
Context compression has its own top-level compression: block with summary_provider, summary_model, and summary_base_url — see Context Compression above. The fallback model uses a fallback_model: block — see Fallback Model above. All three follow the same provider/model/base_url pattern.
:::
Changing the Vision Model
To use GPT-4o instead of Gemini Flash for image analysis:
auxiliary:
vision:
model: "openai/gpt-4o"
Or via environment variable (in ~/.hermes/.env):
AUXILIARY_VISION_MODEL=openai/gpt-4o
Provider Options
| Provider | Description | Requirements |
|---|---|---|
| "auto" | Best available (default). Vision tries OpenRouter → Nous → Codex. | — |
| "openrouter" | Force OpenRouter — routes to any model (Gemini, GPT-4o, Claude, etc.) | OPENROUTER_API_KEY |
| "nous" | Force Nous Portal | hermes login |
| "codex" | Force Codex OAuth (ChatGPT account). Supports vision (gpt-5.3-codex). | hermes model → Codex |
| "main" | Use your active custom/main endpoint. This can come from OPENAI_BASE_URL + OPENAI_API_KEY or from a custom endpoint saved via hermes model / config.yaml. Works with OpenAI, local models, or any OpenAI-compatible API. | Custom endpoint credentials + base URL |
Common Setups
Using a direct custom endpoint (clearer than provider: "main" for local/self-hosted APIs):
auxiliary:
vision:
base_url: "http://localhost:1234/v1"
api_key: "local-key"
model: "qwen2.5-vl"
base_url takes precedence over provider, so this is the most explicit way to route an auxiliary task to a specific endpoint. For direct endpoint overrides, Hermes uses the configured api_key or falls back to OPENAI_API_KEY; it does not reuse OPENROUTER_API_KEY for that custom endpoint.
Using OpenAI API key for vision:
# In ~/.hermes/.env:
# OPENAI_BASE_URL=https://api.openai.com/v1
# OPENAI_API_KEY=sk-...
auxiliary:
vision:
provider: "main"
model: "gpt-4o" # or "gpt-4o-mini" for cheaper
Using OpenRouter for vision (route to any model):
auxiliary:
vision:
provider: "openrouter"
model: "openai/gpt-4o" # or "google/gemini-2.5-flash", etc.
Using Codex OAuth (ChatGPT Pro/Plus account — no API key needed):
auxiliary:
vision:
provider: "codex" # uses your ChatGPT OAuth token
# model defaults to gpt-5.3-codex (supports vision)
Using a local/self-hosted model:
auxiliary:
vision:
provider: "main" # uses your active custom endpoint
model: "my-local-model"
provider: "main" follows the same custom endpoint Hermes uses for normal chat. That endpoint can be set directly with OPENAI_BASE_URL, or saved once through hermes model and persisted in config.yaml.
:::tip
If you use Codex OAuth as your main model provider, vision works automatically — no extra configuration needed. Codex is included in the auto-detection chain for vision.
:::
:::warning
Vision requires a multimodal model. If you set provider: "main", make sure your endpoint supports multimodal/vision — otherwise image analysis will fail.
:::
Environment Variables (legacy)
Auxiliary models can also be configured via environment variables. However, config.yaml is the preferred method — it's easier to manage and supports all options including base_url and api_key.
| Setting | Environment Variable |
|---|---|
| Vision provider | AUXILIARY_VISION_PROVIDER |
| Vision model | AUXILIARY_VISION_MODEL |
| Vision endpoint | AUXILIARY_VISION_BASE_URL |
| Vision API key | AUXILIARY_VISION_API_KEY |
| Web extract provider | AUXILIARY_WEB_EXTRACT_PROVIDER |
| Web extract model | AUXILIARY_WEB_EXTRACT_MODEL |
| Web extract endpoint | AUXILIARY_WEB_EXTRACT_BASE_URL |
| Web extract API key | AUXILIARY_WEB_EXTRACT_API_KEY |
Compression and fallback model settings are config.yaml-only.
:::tip
Run hermes config to see your current auxiliary model settings. Overrides only show up when they differ from the defaults.
:::
Reasoning Effort
Control how much "thinking" the model does before responding:
agent:
reasoning_effort: "" # empty = medium (default). Options: xhigh (max), high, medium, low, minimal, none
When unset, reasoning effort defaults to "medium", a balanced level that works well for most tasks. Setting an explicit value overrides it; higher reasoning effort gives better results on complex tasks at the cost of more tokens and latency.
You can also change the reasoning effort at runtime with the /reasoning command:
/reasoning # Show current effort level and display state
/reasoning high # Set reasoning effort to high
/reasoning none # Disable reasoning
/reasoning show # Show model thinking above each response
/reasoning hide # Hide model thinking
TTS Configuration
tts:
provider: "edge" # "edge" | "elevenlabs" | "openai" | "neutts"
edge:
voice: "en-US-AriaNeural" # 322 voices, 74 languages
elevenlabs:
voice_id: "pNInz6obpgDQGcFmaJgB"
model_id: "eleven_multilingual_v2"
openai:
model: "gpt-4o-mini-tts"
voice: "alloy" # alloy, echo, fable, onyx, nova, shimmer
base_url: "https://api.openai.com/v1" # Override for OpenAI-compatible TTS endpoints
neutts:
ref_audio: ''
ref_text: ''
model: neuphonic/neutts-air-q4-gguf
device: cpu
This controls both the text_to_speech tool and spoken replies in voice mode (/voice tts in the CLI or messaging gateway).
Display Settings
display:
tool_progress: all # off | new | all | verbose
skin: default # Built-in or custom CLI skin (see user-guide/features/skins)
theme_mode: auto # auto | light | dark — color scheme for skin-aware rendering
personality: "kawaii" # Legacy cosmetic field still surfaced in some summaries
compact: false # Compact output mode (less whitespace)
resume_display: full # full (show previous messages on resume) | minimal (one-liner only)
bell_on_complete: false # Play terminal bell when agent finishes (great for long tasks)
show_reasoning: false # Show model reasoning/thinking above each response (toggle with /reasoning show|hide)
streaming: false # Stream tokens to terminal as they arrive (real-time output)
background_process_notifications: all # all | result | error | off (gateway only)
show_cost: false # Show estimated $ cost in the CLI status bar
Theme mode
The theme_mode setting controls whether skins render in light or dark mode:
| Mode | Behavior |
|---|---|
| auto (default) | Detects your terminal's background color automatically. Falls back to dark if detection fails. |
| light | Forces light-mode skin colors. Skins that define a colors_light override use those colors instead of the default dark-mode palette. |
| dark | Forces dark-mode skin colors. |
This works with any skin — built-in or custom. Skin authors can provide colors_light in their skin definition for optimal light-terminal appearance.
The tool_progress setting controls how much tool activity the CLI shows:
| Mode | What you see |
|---|---|
| off | Silent — just the final response |
| new | Tool indicator only when the tool changes |
| all | Every tool call with a short preview (default) |
| verbose | Full args, results, and debug logs |
Privacy
privacy:
redact_pii: false # Strip PII from LLM context (gateway only)
When redact_pii is true, the gateway redacts personally identifiable information from the system prompt before sending it to the LLM on supported platforms:
| Field | Treatment |
|---|---|
| Phone numbers (user ID on WhatsApp/Signal) | Hashed to user_<12-char-sha256> |
| User IDs | Hashed to user_<12-char-sha256> |
| Chat IDs | Numeric portion hashed, platform prefix preserved (telegram:<hash>) |
| Home channel IDs | Numeric portion hashed |
| User names / usernames | Not affected (user-chosen, publicly visible) |
Platform support: Redaction applies to WhatsApp, Signal, and Telegram. Discord and Slack are excluded because their mention systems (<@user_id>) require the real ID in the LLM context.
Hashes are deterministic — the same user always maps to the same hash, so the model can still distinguish between users in group chats. Routing and delivery use the original values internally.
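The exact derivation is internal to Hermes, but as a sketch of the shape, a deterministic 12-character SHA-256 prefix can be produced like this (the phone number is made up):
echo -n "15551234567" | sha256sum | cut -c1-12   # e.g. 3f9c2a7b81d4, surfaced as user_3f9c2a7b81d4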
Speech-to-Text (STT)
stt:
provider: "local" # "local" | "groq" | "openai"
local:
model: "base" # tiny, base, small, medium, large-v3
openai:
model: "whisper-1" # whisper-1 | gpt-4o-mini-transcribe | gpt-4o-transcribe
# model: "whisper-1" # Legacy fallback key still respected
Provider behavior:
- local uses faster-whisper running on your machine. Install it separately with pip install faster-whisper.
- groq uses Groq's Whisper-compatible endpoint and reads GROQ_API_KEY.
- openai uses the OpenAI speech API and reads VOICE_TOOLS_OPENAI_KEY.
If the requested provider is unavailable, Hermes falls back automatically in this order: local → groq → openai.
Groq and OpenAI model overrides are environment-driven:
STT_GROQ_MODEL=whisper-large-v3-turbo
STT_OPENAI_MODEL=whisper-1
GROQ_BASE_URL=https://api.groq.com/openai/v1
STT_OPENAI_BASE_URL=https://api.openai.com/v1
Voice Mode (CLI)
voice:
record_key: "ctrl+b" # Push-to-talk key inside the CLI
max_recording_seconds: 120 # Hard stop for long recordings
auto_tts: false # Enable spoken replies automatically when /voice on
silence_threshold: 200 # RMS threshold for speech detection
silence_duration: 3.0 # Seconds of silence before auto-stop
Use /voice on in the CLI to enable microphone mode, record_key to start/stop recording, and /voice tts to toggle spoken replies. See Voice Mode for end-to-end setup and platform-specific behavior.
Streaming
Stream tokens to the terminal or messaging platforms as they arrive, instead of waiting for the full response.
CLI Streaming
display:
streaming: true # Stream tokens to terminal in real-time
show_reasoning: true # Also stream reasoning/thinking tokens (optional)
When enabled, responses appear token-by-token inside a streaming box. Tool calls are still captured silently. If the provider doesn't support streaming, it falls back to the normal display automatically.
Gateway Streaming (Telegram, Discord, Slack)
streaming:
enabled: true # Enable progressive message editing
edit_interval: 0.3 # Seconds between message edits
buffer_threshold: 40 # Characters before forcing an edit flush
cursor: " ▉" # Cursor shown during streaming
When enabled, the bot sends a message on the first token, then progressively edits it as more tokens arrive. Platforms that don't support message editing (Signal, Email) gracefully skip streaming and deliver the final response normally.
:::note
Streaming is disabled by default. Enable it in ~/.hermes/config.yaml to try the streaming UX.
:::
Group Chat Session Isolation
Control whether shared chats keep one conversation per room or one conversation per participant:
group_sessions_per_user: true # true = per-user isolation in groups/channels, false = one shared session per chat
- true is the default and recommended setting. In Discord channels, Telegram groups, Slack channels, and similar shared contexts, each sender gets their own session when the platform provides a user ID.
- false reverts to the old shared-room behavior. That can be useful if you explicitly want Hermes to treat a channel like one collaborative conversation, but it also means users share context, token costs, and interrupt state.
- Direct messages are unaffected. Hermes still keys DMs by chat/DM ID as usual.
- Threads stay isolated from their parent channel either way; with true, each participant also gets their own session inside the thread.
For the behavior details and examples, see Sessions and the Discord guide.
Unauthorized DM Behavior
Control what Hermes does when an unknown user sends a direct message:
unauthorized_dm_behavior: pair
whatsapp:
unauthorized_dm_behavior: ignore
- pair is the default. Hermes denies access, but replies with a one-time pairing code in DMs.
- ignore silently drops unauthorized DMs.
- Platform sections override the global default, so you can keep pairing enabled broadly while making one platform quieter.
Quick Commands
Define custom commands that run shell commands without invoking the LLM — zero token usage, instant execution. Especially useful from messaging platforms (Telegram, Discord, etc.) for quick server checks or utility scripts.
quick_commands:
status:
type: exec
command: systemctl status hermes-agent
disk:
type: exec
command: df -h /
update:
type: exec
command: cd ~/.hermes/hermes-agent && git pull && pip install -e .
gpu:
type: exec
command: nvidia-smi --query-gpu=name,utilization.gpu,memory.used,memory.total --format=csv,noheader
Usage: type /status, /disk, /update, or /gpu in the CLI or any messaging platform. The command runs locally on the host and returns the output directly — no LLM call, no tokens consumed.
- 30-second timeout — long-running commands are killed with an error message
- Priority — quick commands are checked before skill commands, so you can override skill names
- Autocomplete — quick commands are resolved at dispatch time and are not shown in the built-in slash-command autocomplete tables
- Type — only exec is supported (runs a shell command); other types show an error
- Works everywhere — CLI, Telegram, Discord, Slack, WhatsApp, Signal, Email, Home Assistant
Human Delay
Simulate human-like response pacing in messaging platforms:
human_delay:
mode: "off" # off | natural | custom
min_ms: 500 # Minimum delay (custom mode)
max_ms: 2000 # Maximum delay (custom mode)
Code Execution
Configure the sandboxed Python code execution tool:
code_execution:
timeout: 300 # Max execution time in seconds
max_tool_calls: 50 # Max tool calls within code execution
Browser
Configure browser automation behavior:
browser:
inactivity_timeout: 120 # Seconds before auto-closing idle sessions
record_sessions: false # Auto-record browser sessions as WebM videos to ~/.hermes/browser_recordings/
The browser toolset supports multiple providers. See the Browser feature page for details on Browserbase, Browser Use, and local Chrome CDP setup.
Website Blocklist
Block specific domains from being accessed by the agent's web and browser tools:
website_blocklist:
enabled: false # Enable URL blocking (default: false)
domains: # List of blocked domain patterns
- "*.internal.company.com"
- "admin.example.com"
- "*.local"
shared_files: # Load additional rules from external files
- "/etc/hermes/blocked-sites.txt"
When enabled, any URL matching a blocked domain pattern is rejected before the web or browser tool executes. This applies to web_search, web_extract, browser_navigate, and any tool that accesses URLs.
Domain rules support:
- Exact domains: admin.example.com
- Wildcard subdomains: *.internal.company.com (blocks all subdomains)
- TLD wildcards: *.local
Shared files contain one domain rule per line (blank lines and # comments are ignored). Missing or unreadable files log a warning but don't disable other web tools.
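A shared rules file is plain text, one rule per line, for example (domains are illustrative):
# /etc/hermes/blocked-sites.txt
# internal infrastructure
*.internal.company.com
admin.example.com

# local-only hosts
*.local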
The policy is cached for 30 seconds, so config changes take effect quickly without restart.
Smart Approvals
Control how Hermes handles potentially dangerous commands:
approval_mode: ask # ask | smart | off
| Mode | Behavior |
|---|---|
| ask (default) | Prompt the user before executing any flagged command. In the CLI, shows an interactive approval dialog. In messaging, queues a pending approval request. |
| smart | Use an auxiliary LLM to assess whether a flagged command is actually dangerous. Low-risk commands are auto-approved with session-level persistence. Genuinely risky commands are escalated to the user. |
| off | Skip all approval checks. Equivalent to HERMES_YOLO_MODE=true. Use with caution. |
Smart mode is particularly useful for reducing approval fatigue — it lets the agent work more autonomously on safe operations while still catching genuinely destructive commands.
:::warning
Setting approval_mode: off disables all safety checks for terminal commands. Only use this in trusted, sandboxed environments.
:::
Checkpoints
Automatic filesystem snapshots before destructive file operations. See the Checkpoints feature page for details.
checkpoints:
enabled: false # Enable automatic checkpoints (also: hermes --checkpoints)
max_snapshots: 50 # Max checkpoints to keep per directory
Delegation
Configure subagent behavior for the delegate tool:
delegation:
max_iterations: 50 # Max iterations per subagent
default_toolsets: # Toolsets available to subagents
- terminal
- file
- web
# model: "google/gemini-3-flash-preview" # Override model (empty = inherit parent)
# provider: "openrouter" # Override provider (empty = inherit parent)
# base_url: "http://localhost:1234/v1" # Direct OpenAI-compatible endpoint (takes precedence over provider)
# api_key: "local-key" # API key for base_url (falls back to OPENAI_API_KEY)
Subagent provider:model override: By default, subagents inherit the parent agent's provider and model. Set delegation.provider and delegation.model to route subagents to a different provider:model pair — e.g., use a cheap/fast model for narrowly-scoped subtasks while your primary agent runs an expensive reasoning model.
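A minimal sketch of that override, assuming OpenRouter credentials are already configured (the model name is illustrative):
delegation:
  provider: "openrouter"
  model: "google/gemini-2.5-flash"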
Direct endpoint override: If you want the obvious custom-endpoint path, set delegation.base_url, delegation.api_key, and delegation.model. That sends subagents directly to that OpenAI-compatible endpoint and takes precedence over delegation.provider. If delegation.api_key is omitted, Hermes falls back to OPENAI_API_KEY only.
The delegation provider uses the same credential resolution as CLI/gateway startup. All configured providers are supported: openrouter, nous, copilot, zai, kimi-coding, minimax, minimax-cn. When a provider is set, the system automatically resolves the correct base URL, API key, and API mode — no manual credential wiring needed.
Precedence: delegation.base_url in config → delegation.provider in config → parent provider (inherited). delegation.model in config → parent model (inherited). Setting just model without provider changes only the model name while keeping the parent's credentials (useful for switching models within the same provider like OpenRouter).
Clarify
Configure the clarification prompt behavior:
clarify:
timeout: 120 # Seconds to wait for user clarification response
Context Files (SOUL.md, AGENTS.md)
Hermes uses two different context scopes:
| File | Purpose | Scope |
|---|---|---|
| SOUL.md | Primary agent identity — defines who the agent is (slot #1 in the system prompt) | ~/.hermes/SOUL.md or $HERMES_HOME/SOUL.md |
| AGENTS.md | Project-specific instructions, coding conventions | Working directory / project tree |
| .cursorrules | Cursor IDE rules (also detected) | Working directory |
| .cursor/rules/*.mdc | Cursor rule files (also detected) | Working directory |
- SOUL.md is the agent's primary identity. It occupies slot #1 in the system prompt, completely replacing the built-in default identity. Edit it to fully customize who the agent is.
- If SOUL.md is missing, empty, or cannot be loaded, Hermes falls back to a built-in default identity.
- AGENTS.md is hierarchical: if subdirectories also have AGENTS.md, all are combined.
- Hermes automatically seeds a default SOUL.md if one does not already exist.
- All loaded context files are capped at 20,000 characters with smart truncation.
See also:
Working Directory
| Context | Default |
|---|---|
| CLI (hermes) | Current directory where you run the command |
| Messaging gateway | Home directory ~ (override with MESSAGING_CWD) |
| Docker / Singularity / Modal / SSH | User's home directory inside the container or remote machine |
Override the working directory:
# In ~/.hermes/.env or ~/.hermes/config.yaml:
MESSAGING_CWD=/home/myuser/projects # Gateway sessions
TERMINAL_CWD=/workspace # All terminal sessions