2026-03-05 05:24:55 -08:00
---
sidebar_position: 2
title: "Configuration"
description: "Configure Hermes Agent — config.yaml, providers, models, API keys, and more"
---
# Configuration
All settings are stored in the `~/.hermes/` directory for easy access.
## Directory Structure
```text
~/.hermes/
├── config.yaml # Settings (model, terminal, TTS, compression, etc.)
├── .env # API keys and secrets
├── auth.json # OAuth provider credentials (Nous Portal, etc.)
├── SOUL.md # Optional: global persona (agent embodies this personality)
├── memories/ # Persistent memory (MEMORY.md, USER.md)
├── skills/ # Agent-created skills (managed via skill_manage tool)
├── cron/ # Scheduled jobs
├── sessions/ # Gateway sessions
└── logs/ # Logs (errors.log, gateway.log — secrets auto-redacted)
```
## Managing Configuration
```bash
hermes config # View current configuration
hermes config edit # Open config.yaml in your editor
hermes config set KEY VAL # Set a specific value
hermes config check # Check for missing options (after updates)
hermes config migrate # Interactively add missing options
# Examples:
hermes config set model anthropic/claude-opus-4
hermes config set terminal.backend docker
hermes config set OPENROUTER_API_KEY sk-or-... # Saves to .env
```
:::tip
The `hermes config set` command automatically routes values to the right file — API keys are saved to `.env` , everything else to `config.yaml` .
:::
## Configuration Precedence
Settings are resolved in this order (highest priority first):
docs: comprehensive accuracy audit fixes (35+ corrections)
CRITICAL fixes:
- Installation: Remove false prerequisites (installer auto-installs everything except git)
- Tools: Remove non-existent 'web_crawl' tool from tools table
- Memory: Remove non-existent 'read' action (only add/replace/remove exist)
- Code execution: Fix 'search' to 'search_files' in sandbox tools list
- CLI commands: Fix --model/--provider/--toolsets/--verbose as chat subcommand flags
IMPORTANT fixes:
- Installation: Add missing installer features (Node.js, ripgrep, ffmpeg, skills seeding)
- Installation: Add 6 missing package extras to table (mcp, honcho, tts-premium, etc)
- Installation: Fix mkdir to include all directories the installer creates
- Quickstart: Add OpenAI Codex to provider table
- CLI: Fix all 'hermes --flag' to 'hermes chat --flag' across all docs
- Configuration: Remove non-existent --max-turns CLI flag
- Tools: Fix 'search' to 'search_files', add missing 'process' tool
- Skills: Remove skills_categories() (not a registered tool)
- Cron: Remove unsupported 'daily at 9am' schedule format
- TTS: Fix output directory to ~/.hermes/audio_cache/
- Delegation: Clarify depth limit wording
- Architecture: Fix default model, chat() signature, file names
- Contributing: Fix Python requirement from 3.11+ to 3.10+
- CLI reference: Add missing commands (login, tools, sessions subcommands)
- Env vars: Fix TERMINAL_DOCKER_IMAGE default, add HERMES_MODEL
2026-03-05 06:50:22 -08:00
1. **CLI arguments ** — e.g., `hermes chat --model anthropic/claude-sonnet-4` (per-invocation override)
2026-03-05 05:24:55 -08:00
2. * * `~/.hermes/config.yaml` ** — the primary config file for all non-secret settings
3. * * `~/.hermes/.env` ** — fallback for env vars; **required ** for secrets (API keys, tokens, passwords)
4. **Built-in defaults ** — hardcoded safe defaults when nothing else is set
:::info Rule of Thumb
Secrets (API keys, bot tokens, passwords) go in `.env` . Everything else (model, terminal backend, compression settings, memory limits, toolsets) goes in `config.yaml` . When both are set, `config.yaml` wins for non-secret settings.
:::
## Inference Providers
You need at least one way to connect to an LLM. Use `hermes model` to switch providers and models interactively, or configure directly:
| Provider | Setup |
|----------|-------|
| **Nous Portal ** | `hermes model` (OAuth, subscription-based) |
| **OpenAI Codex ** | `hermes model` (ChatGPT OAuth, uses Codex models) |
| **OpenRouter ** | `OPENROUTER_API_KEY` in `~/.hermes/.env` |
feat: add z.ai/GLM, Kimi/Moonshot, MiniMax as first-class providers
Adds 4 new direct API-key providers (zai, kimi-coding, minimax, minimax-cn)
to the inference provider system. All use standard OpenAI-compatible
chat/completions endpoints with Bearer token auth.
Core changes:
- auth.py: Extended ProviderConfig with api_key_env_vars and base_url_env_var
fields. Added providers to PROVIDER_REGISTRY. Added provider aliases
(glm, z-ai, zhipu, kimi, moonshot). Added auto-detection of API-key
providers in resolve_provider(). Added resolve_api_key_provider_credentials()
and get_api_key_provider_status() helpers.
- runtime_provider.py: Added generic API-key provider branch in
resolve_runtime_provider() — any provider with auth_type='api_key'
is automatically handled.
- main.py: Added providers to hermes model menu with generic
_model_flow_api_key_provider() flow. Updated _has_any_provider_configured()
to check all provider env vars. Updated argparse --provider choices.
- setup.py: Added providers to setup wizard with API key prompts and
curated model lists.
- config.py: Added env vars (GLM_API_KEY, KIMI_API_KEY, MINIMAX_API_KEY,
etc.) to OPTIONAL_ENV_VARS.
- status.py: Added API key display and provider status section.
- doctor.py: Added connectivity checks for each provider endpoint.
- cli.py: Updated provider docstrings.
Docs: Updated README.md, .env.example, cli-config.yaml.example,
cli-commands.md, environment-variables.md, configuration.md.
Tests: 50 new tests covering registry, aliases, resolution, auto-detection,
credential resolution, and runtime provider dispatch.
Inspired by PR #33 (numman-ali) which proposed a provider registry approach.
Credit to tars90percent (PR #473) and manuelschipper (PR #420) for related
provider improvements merged earlier in this changeset.
2026-03-06 18:55:12 -08:00
| **z.ai / GLM ** | `GLM_API_KEY` in `~/.hermes/.env` (provider: `zai` ) |
| **Kimi / Moonshot ** | `KIMI_API_KEY` in `~/.hermes/.env` (provider: `kimi-coding` ) |
| **MiniMax ** | `MINIMAX_API_KEY` in `~/.hermes/.env` (provider: `minimax` ) |
| **MiniMax China ** | `MINIMAX_CN_API_KEY` in `~/.hermes/.env` (provider: `minimax-cn` ) |
2026-03-05 05:24:55 -08:00
| **Custom Endpoint ** | `OPENAI_BASE_URL` + `OPENAI_API_KEY` in `~/.hermes/.env` |
:::info Codex Note
The OpenAI Codex provider authenticates via device code (open a URL, enter a code). Credentials are stored at `~/.codex/auth.json` and auto-refresh. No Codex CLI installation required.
:::
:::warning
2026-03-08 18:10:55 -07:00
Even when using Nous Portal, Codex, or a custom endpoint, some tools (vision, web summarization, MoA) use a separate "auxiliary" model — by default Gemini Flash via OpenRouter. An `OPENROUTER_API_KEY` enables these tools automatically. You can also configure which model and provider these tools use — see [Auxiliary Models ](#auxiliary-models ) below.
2026-03-05 05:24:55 -08:00
:::
feat: add z.ai/GLM, Kimi/Moonshot, MiniMax as first-class providers
Adds 4 new direct API-key providers (zai, kimi-coding, minimax, minimax-cn)
to the inference provider system. All use standard OpenAI-compatible
chat/completions endpoints with Bearer token auth.
Core changes:
- auth.py: Extended ProviderConfig with api_key_env_vars and base_url_env_var
fields. Added providers to PROVIDER_REGISTRY. Added provider aliases
(glm, z-ai, zhipu, kimi, moonshot). Added auto-detection of API-key
providers in resolve_provider(). Added resolve_api_key_provider_credentials()
and get_api_key_provider_status() helpers.
- runtime_provider.py: Added generic API-key provider branch in
resolve_runtime_provider() — any provider with auth_type='api_key'
is automatically handled.
- main.py: Added providers to hermes model menu with generic
_model_flow_api_key_provider() flow. Updated _has_any_provider_configured()
to check all provider env vars. Updated argparse --provider choices.
- setup.py: Added providers to setup wizard with API key prompts and
curated model lists.
- config.py: Added env vars (GLM_API_KEY, KIMI_API_KEY, MINIMAX_API_KEY,
etc.) to OPTIONAL_ENV_VARS.
- status.py: Added API key display and provider status section.
- doctor.py: Added connectivity checks for each provider endpoint.
- cli.py: Updated provider docstrings.
Docs: Updated README.md, .env.example, cli-config.yaml.example,
cli-commands.md, environment-variables.md, configuration.md.
Tests: 50 new tests covering registry, aliases, resolution, auto-detection,
credential resolution, and runtime provider dispatch.
Inspired by PR #33 (numman-ali) which proposed a provider registry approach.
Credit to tars90percent (PR #473) and manuelschipper (PR #420) for related
provider improvements merged earlier in this changeset.
2026-03-06 18:55:12 -08:00
### First-Class Chinese AI Providers
These providers have built-in support with dedicated provider IDs. Set the API key and use `--provider` to select:
```bash
# z.ai / ZhipuAI GLM
hermes chat --provider zai --model glm-4-plus
# Requires: GLM_API_KEY in ~/.hermes/.env
# Kimi / Moonshot AI
hermes chat --provider kimi-coding --model moonshot-v1-auto
# Requires: KIMI_API_KEY in ~/.hermes/.env
# MiniMax (global endpoint)
hermes chat --provider minimax --model MiniMax-Text-01
# Requires: MINIMAX_API_KEY in ~/.hermes/.env
# MiniMax (China endpoint)
hermes chat --provider minimax-cn --model MiniMax-Text-01
# Requires: MINIMAX_CN_API_KEY in ~/.hermes/.env
```
Or set the provider permanently in `config.yaml` :
```yaml
model:
provider: "zai" # or: kimi-coding, minimax, minimax-cn
default: "glm-4-plus"
```
Base URLs can be overridden with `GLM_BASE_URL` , `KIMI_BASE_URL` , `MINIMAX_BASE_URL` , or `MINIMAX_CN_BASE_URL` environment variables.
2026-03-06 14:15:57 -08:00
## Custom & Self-Hosted LLM Providers
Hermes Agent works with **any OpenAI-compatible API endpoint ** . If a server implements `/v1/chat/completions` , you can point Hermes at it. This means you can use local models, GPU inference servers, multi-provider routers, or any third-party API.
### General Setup
Two ways to configure a custom endpoint:
**Interactive (recommended):**
```bash
hermes model
# Select "Custom endpoint (self-hosted / VLLM / etc.)"
# Enter: API base URL, API key, Model name
```
**Manual (`.env` file):**
```bash
# Add to ~/.hermes/.env
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_API_KEY=your-key-or-dummy
LLM_MODEL=your-model-name
```
Everything below follows this same pattern — just change the URL, key, and model name.
---
### Ollama — Local Models, Zero Config
[Ollama ](https://ollama.com/ ) runs open-weight models locally with one command. Best for: quick local experimentation, privacy-sensitive work, offline use.
```bash
# Install and run a model
ollama pull llama3.1:70b
ollama serve # Starts on port 11434
# Configure Hermes
OPENAI_BASE_URL=http://localhost:11434/v1
OPENAI_API_KEY=ollama # Any non-empty string
LLM_MODEL=llama3.1:70b
```
Ollama's OpenAI-compatible endpoint supports chat completions, streaming, and tool calling (for supported models). No GPU required for smaller models — Ollama handles CPU inference automatically.
:::tip
List available models with `ollama list` . Pull any model from the [Ollama library ](https://ollama.com/library ) with `ollama pull <model>` .
:::
---
### vLLM — High-Performance GPU Inference
[vLLM ](https://docs.vllm.ai/ ) is the standard for production LLM serving. Best for: maximum throughput on GPU hardware, serving large models, continuous batching.
```bash
# Start vLLM server
pip install vllm
vllm serve meta-llama/Llama-3.1-70B-Instruct \
--port 8000 \
--tensor-parallel-size 2 # Multi-GPU
# Configure Hermes
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_API_KEY=dummy
LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct
```
vLLM supports tool calling, structured output, and multi-modal models. Use `--enable-auto-tool-choice` and `--tool-call-parser hermes` for Hermes-format tool calling with NousResearch models.
---
### SGLang — Fast Serving with RadixAttention
[SGLang ](https://github.com/sgl-project/sglang ) is an alternative to vLLM with RadixAttention for KV cache reuse. Best for: multi-turn conversations (prefix caching), constrained decoding, structured output.
```bash
# Start SGLang server
pip install sglang[all]
python -m sglang.launch_server \
--model meta-llama/Llama-3.1-70B-Instruct \
--port 8000 \
--tp 2
# Configure Hermes
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_API_KEY=dummy
LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct
```
---
### llama.cpp / llama-server — CPU & Metal Inference
[llama.cpp ](https://github.com/ggml-org/llama.cpp ) runs quantized models on CPU, Apple Silicon (Metal), and consumer GPUs. Best for: running models without a datacenter GPU, Mac users, edge deployment.
```bash
# Build and start llama-server
cmake -B build && cmake --build build --config Release
./build/bin/llama-server \
-m models/llama-3.1-8b-instruct-Q4_K_M.gguf \
--port 8080 --host 0.0.0.0
# Configure Hermes
OPENAI_BASE_URL=http://localhost:8080/v1
OPENAI_API_KEY=dummy
LLM_MODEL=llama-3.1-8b-instruct
```
:::tip
Download GGUF models from [Hugging Face ](https://huggingface.co/models?library=gguf ). Q4_K_M quantization offers the best balance of quality vs. memory usage.
:::
---
### LiteLLM Proxy — Multi-Provider Gateway
[LiteLLM ](https://docs.litellm.ai/ ) is an OpenAI-compatible proxy that unifies 100+ LLM providers behind a single API. Best for: switching between providers without config changes, load balancing, fallback chains, budget controls.
```bash
# Install and start
pip install litellm[proxy]
litellm --model anthropic/claude-sonnet-4 --port 4000
# Or with a config file for multiple models:
litellm --config litellm_config.yaml --port 4000
# Configure Hermes
OPENAI_BASE_URL=http://localhost:4000/v1
OPENAI_API_KEY=sk-your-litellm-key
LLM_MODEL=anthropic/claude-sonnet-4
```
Example `litellm_config.yaml` with fallback:
```yaml
model_list:
- model_name: "best"
litellm_params:
model: anthropic/claude-sonnet-4
api_key: sk-ant-...
- model_name: "best"
litellm_params:
model: openai/gpt-4o
api_key: sk-...
router_settings:
routing_strategy: "latency-based-routing"
```
---
### ClawRouter — Cost-Optimized Routing
[ClawRouter ](https://github.com/BlockRunAI/ClawRouter ) by BlockRunAI is a local routing proxy that auto-selects models based on query complexity. It classifies requests across 14 dimensions and routes to the cheapest model that can handle the task. Payment is via USDC cryptocurrency (no API keys).
```bash
# Install and start
npx @blockrun/clawrouter # Starts on port 8402
# Configure Hermes
OPENAI_BASE_URL=http://localhost:8402/v1
OPENAI_API_KEY=dummy
LLM_MODEL=blockrun/auto # or: blockrun/eco, blockrun/premium, blockrun/agentic
```
Routing profiles:
| Profile | Strategy | Savings |
|---------|----------|---------|
| `blockrun/auto` | Balanced quality/cost | 74-100% |
| `blockrun/eco` | Cheapest possible | 95-100% |
| `blockrun/premium` | Best quality models | 0% |
| `blockrun/free` | Free models only | 100% |
| `blockrun/agentic` | Optimized for tool use | varies |
:::note
ClawRouter requires a USDC-funded wallet on Base or Solana for payment. All requests route through BlockRun's backend API. Run `npx @blockrun/clawrouter doctor` to check wallet status.
:::
---
### Other Compatible Providers
Any service with an OpenAI-compatible API works. Some popular options:
| Provider | Base URL | Notes |
|----------|----------|-------|
| [Together AI ](https://together.ai ) | `https://api.together.xyz/v1` | Cloud-hosted open models |
| [Groq ](https://groq.com ) | `https://api.groq.com/openai/v1` | Ultra-fast inference |
| [DeepSeek ](https://deepseek.com ) | `https://api.deepseek.com/v1` | DeepSeek models |
| [Fireworks AI ](https://fireworks.ai ) | `https://api.fireworks.ai/inference/v1` | Fast open model hosting |
| [Cerebras ](https://cerebras.ai ) | `https://api.cerebras.ai/v1` | Wafer-scale chip inference |
| [Mistral AI ](https://mistral.ai ) | `https://api.mistral.ai/v1` | Mistral models |
| [OpenAI ](https://openai.com ) | `https://api.openai.com/v1` | Direct OpenAI access |
| [Azure OpenAI ](https://azure.microsoft.com ) | `https://YOUR.openai.azure.com/` | Enterprise OpenAI |
| [LocalAI ](https://localai.io ) | `http://localhost:8080/v1` | Self-hosted, multi-model |
| [Jan ](https://jan.ai ) | `http://localhost:1337/v1` | Desktop app with local models |
```bash
# Example: Together AI
OPENAI_BASE_URL=https://api.together.xyz/v1
OPENAI_API_KEY=your-together-key
LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct-Turbo
```
---
### Choosing the Right Setup
| Use Case | Recommended |
|----------|-------------|
| **Just want it to work ** | OpenRouter (default) or Nous Portal |
| **Local models, easy setup ** | Ollama |
| **Production GPU serving ** | vLLM or SGLang |
| **Mac / no GPU ** | Ollama or llama.cpp |
| **Multi-provider routing ** | LiteLLM Proxy or OpenRouter |
| **Cost optimization ** | ClawRouter or OpenRouter with `sort: "price"` |
| **Maximum privacy ** | Ollama, vLLM, or llama.cpp (fully local) |
| **Enterprise / Azure ** | Azure OpenAI with custom endpoint |
feat: add z.ai/GLM, Kimi/Moonshot, MiniMax as first-class providers
Adds 4 new direct API-key providers (zai, kimi-coding, minimax, minimax-cn)
to the inference provider system. All use standard OpenAI-compatible
chat/completions endpoints with Bearer token auth.
Core changes:
- auth.py: Extended ProviderConfig with api_key_env_vars and base_url_env_var
fields. Added providers to PROVIDER_REGISTRY. Added provider aliases
(glm, z-ai, zhipu, kimi, moonshot). Added auto-detection of API-key
providers in resolve_provider(). Added resolve_api_key_provider_credentials()
and get_api_key_provider_status() helpers.
- runtime_provider.py: Added generic API-key provider branch in
resolve_runtime_provider() — any provider with auth_type='api_key'
is automatically handled.
- main.py: Added providers to hermes model menu with generic
_model_flow_api_key_provider() flow. Updated _has_any_provider_configured()
to check all provider env vars. Updated argparse --provider choices.
- setup.py: Added providers to setup wizard with API key prompts and
curated model lists.
- config.py: Added env vars (GLM_API_KEY, KIMI_API_KEY, MINIMAX_API_KEY,
etc.) to OPTIONAL_ENV_VARS.
- status.py: Added API key display and provider status section.
- doctor.py: Added connectivity checks for each provider endpoint.
- cli.py: Updated provider docstrings.
Docs: Updated README.md, .env.example, cli-config.yaml.example,
cli-commands.md, environment-variables.md, configuration.md.
Tests: 50 new tests covering registry, aliases, resolution, auto-detection,
credential resolution, and runtime provider dispatch.
Inspired by PR #33 (numman-ali) which proposed a provider registry approach.
Credit to tars90percent (PR #473) and manuelschipper (PR #420) for related
provider improvements merged earlier in this changeset.
2026-03-06 18:55:12 -08:00
| **Chinese AI models ** | z.ai (GLM), Kimi/Moonshot, or MiniMax (first-class providers) |
2026-03-06 14:15:57 -08:00
:::tip
You can switch between providers at any time with `hermes model` — no restart required. Your conversation history, memory, and skills carry over regardless of which provider you use.
:::
2026-03-05 05:24:55 -08:00
## Optional API Keys
| Feature | Provider | Env Variable |
|---------|----------|--------------|
| Web scraping | [Firecrawl ](https://firecrawl.dev/ ) | `FIRECRAWL_API_KEY` |
| Browser automation | [Browserbase ](https://browserbase.com/ ) | `BROWSERBASE_API_KEY` , `BROWSERBASE_PROJECT_ID` |
| Image generation | [FAL ](https://fal.ai/ ) | `FAL_KEY` |
| Premium TTS voices | [ElevenLabs ](https://elevenlabs.io/ ) | `ELEVENLABS_API_KEY` |
| OpenAI TTS + voice transcription | [OpenAI ](https://platform.openai.com/api-keys ) | `VOICE_TOOLS_OPENAI_KEY` |
| RL Training | [Tinker ](https://tinker-console.thinkingmachines.ai/ ) + [WandB ](https://wandb.ai/ ) | `TINKER_API_KEY` , `WANDB_API_KEY` |
| Cross-session user modeling | [Honcho ](https://honcho.dev/ ) | `HONCHO_API_KEY` |
2026-03-05 16:44:21 -08:00
### Self-Hosting Firecrawl
By default, Hermes uses the [Firecrawl cloud API ](https://firecrawl.dev/ ) for web search and scraping. If you prefer to run Firecrawl locally, you can point Hermes at a self-hosted instance instead.
**What you get:** No API key required, no rate limits, no per-page costs, full data sovereignty.
**What you lose:** The cloud version uses Firecrawl's proprietary "Fire-engine" for advanced anti-bot bypassing (Cloudflare, CAPTCHAs, IP rotation). Self-hosted uses basic fetch + Playwright, so some protected sites may fail. Search uses DuckDuckGo instead of Google.
**Setup:**
1. Clone and start the Firecrawl Docker stack (5 containers: API, Playwright, Redis, RabbitMQ, PostgreSQL — requires ~4-8 GB RAM):
```bash
git clone https://github.com/mendableai/firecrawl
cd firecrawl
# In .env, set: USE_DB_AUTHENTICATION=false
docker compose up -d
```
2. Point Hermes at your instance (no API key needed):
```bash
hermes config set FIRECRAWL_API_URL http://localhost:3002
```
You can also set both `FIRECRAWL_API_KEY` and `FIRECRAWL_API_URL` if your self-hosted instance has authentication enabled.
2026-03-05 05:24:55 -08:00
## OpenRouter Provider Routing
When using OpenRouter, you can control how requests are routed across providers. Add a `provider_routing` section to `~/.hermes/config.yaml` :
```yaml
provider_routing:
sort: "throughput" # "price" (default), "throughput", or "latency"
# only: ["anthropic"] # Only use these providers
# ignore: ["deepinfra"] # Skip these providers
# order: ["anthropic", "google"] # Try providers in this order
# require_parameters: true # Only use providers that support all request params
# data_collection: "deny" # Exclude providers that may store/train on data
```
**Shortcuts:** Append `:nitro` to any model name for throughput sorting (e.g., `anthropic/claude-sonnet-4:nitro` ), or `:floor` for price sorting.
## Terminal Backend Configuration
Configure which environment the agent uses for terminal commands:
```yaml
terminal:
2026-03-05 11:55:41 -08:00
backend: local # or: docker, ssh, singularity, modal, daytona
2026-03-05 05:24:55 -08:00
cwd: "." # Working directory ("." = current dir)
timeout: 180 # Command timeout in seconds
2026-03-09 15:29:34 -07:00
# Docker-specific settings
docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"
docker_volumes: # Share host directories with the container
- "/home/user/projects:/workspace/projects"
- "/home/user/data:/data:ro" # :ro for read-only
# Container resource limits (docker, singularity, modal, daytona)
container_cpu: 1 # CPU cores
container_memory: 5120 # MB (default 5GB)
container_disk: 51200 # MB (default 50GB)
container_persistent: true # Persist filesystem across sessions
2026-03-05 05:24:55 -08:00
```
2026-03-09 15:29:34 -07:00
### Docker Volume Mounts
When using the Docker backend, `docker_volumes` lets you share host directories with the container. Each entry uses standard Docker `-v` syntax: `host_path:container_path[:options]` .
```yaml
terminal:
backend: docker
docker_volumes:
- "/home/user/projects:/workspace/projects" # Read-write (default)
- "/home/user/datasets:/data:ro" # Read-only
- "/home/user/outputs:/outputs" # Agent writes, you read
```
This is useful for:
- **Providing files** to the agent (datasets, configs, reference code)
- **Receiving files** from the agent (generated code, reports, exports)
- **Shared workspaces** where both you and the agent access the same files
Can also be set via environment variable: `TERMINAL_DOCKER_VOLUMES='["/host:/container"]'` (JSON array).
2026-03-05 05:24:55 -08:00
See [Code Execution ](features/code-execution.md ) and the [Terminal section of the README ](features/tools.md ) for details on each backend.
## Memory Configuration
```yaml
memory:
memory_enabled: true
user_profile_enabled: true
memory_char_limit: 2200 # ~800 tokens
user_char_limit: 1375 # ~500 tokens
```
2026-03-07 21:05:40 -08:00
## Git Worktree Isolation
Enable isolated git worktrees for running multiple agents in parallel on the same repo:
```yaml
worktree: true # Always create a worktree (same as hermes -w)
# worktree: false # Default — only when -w flag is passed
```
When enabled, each CLI session creates a fresh worktree under `.worktrees/` with its own branch. Agents can edit files, commit, push, and create PRs without interfering with each other. Clean worktrees are removed on exit; dirty ones are kept for manual recovery.
You can also list gitignored files to copy into worktrees via `.worktreeinclude` in your repo root:
```
# .worktreeinclude
.env
.venv/
node_modules/
```
2026-03-05 05:24:55 -08:00
## Context Compression
```yaml
compression:
enabled: true
2026-03-08 18:10:55 -07:00
threshold: 0.85 # Compress at 85% of context limit
summary_model: "google/gemini-3-flash-preview" # Model for summarization
# summary_provider: "auto" # "auto", "openrouter", "nous", "main"
2026-03-05 05:24:55 -08:00
```
2026-03-08 18:10:55 -07:00
The `summary_model` must support a context length at least as large as your main model's, since it receives the full middle section of the conversation for compression.
## Auxiliary Models
Hermes uses lightweight "auxiliary" models for side tasks like image analysis, web page summarization, and browser screenshot analysis. By default, these use **Gemini Flash ** via OpenRouter or Nous Portal — you don't need to configure anything.
To use a different model, add an `auxiliary` section to `~/.hermes/config.yaml` :
```yaml
auxiliary:
# Image analysis (vision_analyze tool + browser screenshots)
vision:
provider: "auto" # "auto", "openrouter", "nous", "main"
model: "" # e.g. "openai/gpt-4o", "google/gemini-2.5-flash"
# Web page summarization + browser page text extraction
web_extract:
provider: "auto"
model: "" # e.g. "google/gemini-2.5-flash"
```
### Changing the Vision Model
To use GPT-4o instead of Gemini Flash for image analysis:
```yaml
auxiliary:
vision:
model: "openai/gpt-4o"
```
Or via environment variable (in `~/.hermes/.env` ):
```bash
AUXILIARY_VISION_MODEL=openai/gpt-4o
```
### Provider Options
feat: add 'openai' as auxiliary provider option
Users can now set provider: "openai" for auxiliary tasks (vision, web
extract, compression) to use OpenAI's API directly with their
OPENAI_API_KEY. This hits api.openai.com/v1 with gpt-4o-mini as the
default model — supports vision since GPT-4o handles image input.
Provider options are now: auto, openrouter, nous, openai, main.
Changes:
- agent/auxiliary_client.py: added _try_openai(), "openai" case in
_resolve_forced_provider(), updated auxiliary_max_tokens_param()
to use max_completion_tokens for OpenAI
- Updated docs: cli-config.yaml.example, AGENTS.md, and user-facing
configuration.md with Common Setups section showing OpenAI,
OpenRouter, and local model examples
- 3 new tests for OpenAI provider resolution
Tests: 2459 passed (was 2429).
2026-03-08 18:25:30 -07:00
| Provider | Description | Requirements |
|----------|-------------|-------------|
2026-03-08 18:44:25 -07:00
| `"auto"` | Best available (default). Vision tries OpenRouter → Nous → Codex. | — |
feat: add 'openai' as auxiliary provider option
Users can now set provider: "openai" for auxiliary tasks (vision, web
extract, compression) to use OpenAI's API directly with their
OPENAI_API_KEY. This hits api.openai.com/v1 with gpt-4o-mini as the
default model — supports vision since GPT-4o handles image input.
Provider options are now: auto, openrouter, nous, openai, main.
Changes:
- agent/auxiliary_client.py: added _try_openai(), "openai" case in
_resolve_forced_provider(), updated auxiliary_max_tokens_param()
to use max_completion_tokens for OpenAI
- Updated docs: cli-config.yaml.example, AGENTS.md, and user-facing
configuration.md with Common Setups section showing OpenAI,
OpenRouter, and local model examples
- 3 new tests for OpenAI provider resolution
Tests: 2459 passed (was 2429).
2026-03-08 18:25:30 -07:00
| `"openrouter"` | Force OpenRouter — routes to any model (Gemini, GPT-4o, Claude, etc.) | `OPENROUTER_API_KEY` |
| `"nous"` | Force Nous Portal | `hermes login` |
2026-03-08 18:44:25 -07:00
| `"codex"` | Force Codex OAuth (ChatGPT account). Supports vision (gpt-5.3-codex). | `hermes model` → Codex |
2026-03-08 18:50:26 -07:00
| `"main"` | Use your custom endpoint (`OPENAI_BASE_URL` + `OPENAI_API_KEY` ). Works with OpenAI, local models, or any OpenAI-compatible API. | `OPENAI_BASE_URL` + `OPENAI_API_KEY` |
feat: add 'openai' as auxiliary provider option
Users can now set provider: "openai" for auxiliary tasks (vision, web
extract, compression) to use OpenAI's API directly with their
OPENAI_API_KEY. This hits api.openai.com/v1 with gpt-4o-mini as the
default model — supports vision since GPT-4o handles image input.
Provider options are now: auto, openrouter, nous, openai, main.
Changes:
- agent/auxiliary_client.py: added _try_openai(), "openai" case in
_resolve_forced_provider(), updated auxiliary_max_tokens_param()
to use max_completion_tokens for OpenAI
- Updated docs: cli-config.yaml.example, AGENTS.md, and user-facing
configuration.md with Common Setups section showing OpenAI,
OpenRouter, and local model examples
- 3 new tests for OpenAI provider resolution
Tests: 2459 passed (was 2429).
2026-03-08 18:25:30 -07:00
### Common Setups
2026-03-08 18:50:26 -07:00
**Using OpenAI API key for vision:**
feat: add 'openai' as auxiliary provider option
Users can now set provider: "openai" for auxiliary tasks (vision, web
extract, compression) to use OpenAI's API directly with their
OPENAI_API_KEY. This hits api.openai.com/v1 with gpt-4o-mini as the
default model — supports vision since GPT-4o handles image input.
Provider options are now: auto, openrouter, nous, openai, main.
Changes:
- agent/auxiliary_client.py: added _try_openai(), "openai" case in
_resolve_forced_provider(), updated auxiliary_max_tokens_param()
to use max_completion_tokens for OpenAI
- Updated docs: cli-config.yaml.example, AGENTS.md, and user-facing
configuration.md with Common Setups section showing OpenAI,
OpenRouter, and local model examples
- 3 new tests for OpenAI provider resolution
Tests: 2459 passed (was 2429).
2026-03-08 18:25:30 -07:00
```yaml
2026-03-08 18:50:26 -07:00
# In ~/.hermes/.env:
# OPENAI_BASE_URL=https://api.openai.com/v1
# OPENAI_API_KEY=sk-...
feat: add 'openai' as auxiliary provider option
Users can now set provider: "openai" for auxiliary tasks (vision, web
extract, compression) to use OpenAI's API directly with their
OPENAI_API_KEY. This hits api.openai.com/v1 with gpt-4o-mini as the
default model — supports vision since GPT-4o handles image input.
Provider options are now: auto, openrouter, nous, openai, main.
Changes:
- agent/auxiliary_client.py: added _try_openai(), "openai" case in
_resolve_forced_provider(), updated auxiliary_max_tokens_param()
to use max_completion_tokens for OpenAI
- Updated docs: cli-config.yaml.example, AGENTS.md, and user-facing
configuration.md with Common Setups section showing OpenAI,
OpenRouter, and local model examples
- 3 new tests for OpenAI provider resolution
Tests: 2459 passed (was 2429).
2026-03-08 18:25:30 -07:00
auxiliary:
vision:
2026-03-08 18:50:26 -07:00
provider: "main"
feat: add 'openai' as auxiliary provider option
Users can now set provider: "openai" for auxiliary tasks (vision, web
extract, compression) to use OpenAI's API directly with their
OPENAI_API_KEY. This hits api.openai.com/v1 with gpt-4o-mini as the
default model — supports vision since GPT-4o handles image input.
Provider options are now: auto, openrouter, nous, openai, main.
Changes:
- agent/auxiliary_client.py: added _try_openai(), "openai" case in
_resolve_forced_provider(), updated auxiliary_max_tokens_param()
to use max_completion_tokens for OpenAI
- Updated docs: cli-config.yaml.example, AGENTS.md, and user-facing
configuration.md with Common Setups section showing OpenAI,
OpenRouter, and local model examples
- 3 new tests for OpenAI provider resolution
Tests: 2459 passed (was 2429).
2026-03-08 18:25:30 -07:00
model: "gpt-4o" # or "gpt-4o-mini" for cheaper
```
**Using OpenRouter for vision** (route to any model):
```yaml
auxiliary:
vision:
provider: "openrouter"
model: "openai/gpt-4o" # or "google/gemini-2.5-flash", etc.
```
2026-03-08 18:44:25 -07:00
**Using Codex OAuth** (ChatGPT Pro/Plus account — no API key needed):
```yaml
auxiliary:
vision:
provider: "codex" # uses your ChatGPT OAuth token
# model defaults to gpt-5.3-codex (supports vision)
```
feat: add 'openai' as auxiliary provider option
Users can now set provider: "openai" for auxiliary tasks (vision, web
extract, compression) to use OpenAI's API directly with their
OPENAI_API_KEY. This hits api.openai.com/v1 with gpt-4o-mini as the
default model — supports vision since GPT-4o handles image input.
Provider options are now: auto, openrouter, nous, openai, main.
Changes:
- agent/auxiliary_client.py: added _try_openai(), "openai" case in
_resolve_forced_provider(), updated auxiliary_max_tokens_param()
to use max_completion_tokens for OpenAI
- Updated docs: cli-config.yaml.example, AGENTS.md, and user-facing
configuration.md with Common Setups section showing OpenAI,
OpenRouter, and local model examples
- 3 new tests for OpenAI provider resolution
Tests: 2459 passed (was 2429).
2026-03-08 18:25:30 -07:00
**Using a local/self-hosted model:**
```yaml
auxiliary:
vision:
provider: "main" # uses your OPENAI_BASE_URL endpoint
model: "my-local-model"
```
2026-03-08 18:10:55 -07:00
2026-03-08 18:44:25 -07:00
:::tip
If you use Codex OAuth as your main model provider, vision works automatically — no extra configuration needed. Codex is included in the auto-detection chain for vision.
:::
2026-03-08 18:10:55 -07:00
:::warning
2026-03-08 18:44:25 -07:00
**Vision requires a multimodal model.** If you set `provider: "main"` , make sure your endpoint supports multimodal/vision — otherwise image analysis will fail.
2026-03-08 18:10:55 -07:00
:::
### Environment Variables
You can also configure auxiliary models via environment variables instead of `config.yaml` :
| Setting | Environment Variable |
|---------|---------------------|
| Vision provider | `AUXILIARY_VISION_PROVIDER` |
| Vision model | `AUXILIARY_VISION_MODEL` |
| Web extract provider | `AUXILIARY_WEB_EXTRACT_PROVIDER` |
| Web extract model | `AUXILIARY_WEB_EXTRACT_MODEL` |
| Compression provider | `CONTEXT_COMPRESSION_PROVIDER` |
| Compression model | `CONTEXT_COMPRESSION_MODEL` |
:::tip
Run `hermes config` to see your current auxiliary model settings. Overrides only show up when they differ from the defaults.
:::
2026-03-05 05:24:55 -08:00
## Reasoning Effort
Control how much "thinking" the model does before responding:
```yaml
agent:
2026-03-07 10:14:19 -08:00
reasoning_effort: "" # empty = medium (default). Options: xhigh (max), high, medium, low, minimal, none
2026-03-05 05:24:55 -08:00
```
2026-03-07 10:14:19 -08:00
When unset (default), reasoning effort defaults to "medium" — a balanced level that works well for most tasks. Setting a value overrides it — higher reasoning effort gives better results on complex tasks at the cost of more tokens and latency.
2026-03-05 05:24:55 -08:00
## TTS Configuration
```yaml
tts:
provider: "edge" # "edge" | "elevenlabs" | "openai"
edge:
voice: "en-US-AriaNeural" # 322 voices, 74 languages
elevenlabs:
voice_id: "pNInz6obpgDQGcFmaJgB"
model_id: "eleven_multilingual_v2"
openai:
model: "gpt-4o-mini-tts"
voice: "alloy" # alloy, echo, fable, onyx, nova, shimmer
```
## Display Settings
```yaml
display:
tool_progress: all # off | new | all | verbose
docs: fix all remaining minor accuracy issues
- updating.md: Note that 'hermes update' auto-handles config migration
- cli.md: Add summary_model to compression config, fix display config
(add personality/compact), remove unverified pastes/ claim
- configuration.md: Add 5 missing config sections (stt, human_delay,
code_execution, delegation, clarify), fix display defaults,
fix reasoning_effort default to empty/unset
- messaging/index.md: Add GATEWAY_ALLOWED_USERS to security section
- skills.md: Add category field to skills_list return value
- mcp.md: Document auto-registered utility tools (resources/prompts)
- architecture.md: Fix file_tools.py reference, base_url default to None,
synchronous agent loop pseudocode
- cli-commands.md: Fix hermes logout description
- environment-variables.md: Add HERMES_QUIET, HERMES_EXEC_ASK,
BROWSER_INACTIVITY_TIMEOUT, GATEWAY_ALLOWED_USERS
Verification scan: 27/27 checks passed, zero issues remaining.
2026-03-05 07:00:51 -08:00
personality: "kawaii" # Default personality for the CLI
compact: false # Compact output mode (less whitespace)
2026-03-08 17:55:14 -07:00
resume_display: full # full (show previous messages on resume) | minimal (one-liner only)
2026-03-08 19:41:17 -07:00
bell_on_complete: false # Play terminal bell when agent finishes (great for long tasks)
2026-03-05 05:24:55 -08:00
```
| Mode | What you see |
|------|-------------|
| `off` | Silent — just the final response |
| `new` | Tool indicator only when the tool changes |
| `all` | Every tool call with a short preview (default) |
| `verbose` | Full args, results, and debug logs |
docs: fix all remaining minor accuracy issues
- updating.md: Note that 'hermes update' auto-handles config migration
- cli.md: Add summary_model to compression config, fix display config
(add personality/compact), remove unverified pastes/ claim
- configuration.md: Add 5 missing config sections (stt, human_delay,
code_execution, delegation, clarify), fix display defaults,
fix reasoning_effort default to empty/unset
- messaging/index.md: Add GATEWAY_ALLOWED_USERS to security section
- skills.md: Add category field to skills_list return value
- mcp.md: Document auto-registered utility tools (resources/prompts)
- architecture.md: Fix file_tools.py reference, base_url default to None,
synchronous agent loop pseudocode
- cli-commands.md: Fix hermes logout description
- environment-variables.md: Add HERMES_QUIET, HERMES_EXEC_ASK,
BROWSER_INACTIVITY_TIMEOUT, GATEWAY_ALLOWED_USERS
Verification scan: 27/27 checks passed, zero issues remaining.
2026-03-05 07:00:51 -08:00
## Speech-to-Text (STT)
```yaml
stt:
provider: "openai" # STT provider
```
Requires `VOICE_TOOLS_OPENAI_KEY` in `.env` for OpenAI STT.
## Human Delay
Simulate human-like response pacing in messaging platforms:
```yaml
human_delay:
mode: "off" # off | natural | custom
min_ms: 500 # Minimum delay (custom mode)
max_ms: 2000 # Maximum delay (custom mode)
```
## Code Execution
Configure the sandboxed Python code execution tool:
```yaml
code_execution:
timeout: 300 # Max execution time in seconds
max_tool_calls: 50 # Max tool calls within code execution
```
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
## Browser
Configure browser automation behavior:
```yaml
browser:
inactivity_timeout: 120 # Seconds before auto-closing idle sessions
record_sessions: false # Auto-record browser sessions as WebM videos to ~/.hermes/browser_recordings/
```
feat: filesystem checkpoints and /rollback command
Automatic filesystem snapshots before destructive file operations,
with user-facing rollback. Inspired by PR #559 (by @alireza78a).
Architecture:
- Shadow git repos at ~/.hermes/checkpoints/{hash}/ via GIT_DIR
- CheckpointManager: take/list/restore, turn-scoped dedup, pruning
- Transparent — the LLM never sees it, no tool schema, no tokens
- Once per turn — only first write_file/patch triggers a snapshot
Integration:
- Config: checkpoints.enabled + checkpoints.max_snapshots
- CLI flag: hermes --checkpoints
- Trigger: run_agent.py _execute_tool_calls() before write_file/patch
- /rollback slash command in CLI + gateway (list, restore by number)
- Pre-rollback snapshot auto-created on restore (undo the undo)
Safety:
- Never blocks file operations — all errors silently logged
- Skips root dir, home dir, dirs >50K files
- Disables gracefully when git not installed
- Shadow repo completely isolated from project git
Tests: 35 new tests, all passing (2798 total suite)
Docs: feature page, config reference, CLI commands reference
2026-03-10 00:49:15 -07:00
## Checkpoints
Automatic filesystem snapshots before destructive file operations. See the [Checkpoints feature page ](/docs/user-guide/features/checkpoints ) for details.
```yaml
checkpoints:
enabled: false # Enable automatic checkpoints (also: hermes --checkpoints)
max_snapshots: 50 # Max checkpoints to keep per directory
```
docs: fix all remaining minor accuracy issues
- updating.md: Note that 'hermes update' auto-handles config migration
- cli.md: Add summary_model to compression config, fix display config
(add personality/compact), remove unverified pastes/ claim
- configuration.md: Add 5 missing config sections (stt, human_delay,
code_execution, delegation, clarify), fix display defaults,
fix reasoning_effort default to empty/unset
- messaging/index.md: Add GATEWAY_ALLOWED_USERS to security section
- skills.md: Add category field to skills_list return value
- mcp.md: Document auto-registered utility tools (resources/prompts)
- architecture.md: Fix file_tools.py reference, base_url default to None,
synchronous agent loop pseudocode
- cli-commands.md: Fix hermes logout description
- environment-variables.md: Add HERMES_QUIET, HERMES_EXEC_ASK,
BROWSER_INACTIVITY_TIMEOUT, GATEWAY_ALLOWED_USERS
Verification scan: 27/27 checks passed, zero issues remaining.
2026-03-05 07:00:51 -08:00
## Delegation
Configure subagent behavior for the delegate tool:
```yaml
delegation:
max_iterations: 50 # Max iterations per subagent
default_toolsets: # Toolsets available to subagents
- terminal
- file
- web
```
## Clarify
Configure the clarification prompt behavior:
```yaml
clarify:
timeout: 120 # Seconds to wait for user clarification response
```
2026-03-05 05:24:55 -08:00
## Context Files (SOUL.md, AGENTS.md)
Drop these files in your project directory and the agent automatically picks them up:
| File | Purpose |
|------|---------|
| `AGENTS.md` | Project-specific instructions, coding conventions |
| `SOUL.md` | Persona definition — the agent embodies this personality |
| `.cursorrules` | Cursor IDE rules (also detected) |
| `.cursor/rules/*.mdc` | Cursor rule files (also detected) |
- **AGENTS.md** is hierarchical: if subdirectories also have AGENTS.md, all are combined.
- **SOUL.md** checks cwd first, then `~/.hermes/SOUL.md` as a global fallback.
- All context files are capped at 20,000 characters with smart truncation.
## Working Directory
| Context | Default |
|---------|---------|
| **CLI (`hermes`) ** | Current directory where you run the command |
| **Messaging gateway ** | Home directory `~` (override with `MESSAGING_CWD` ) |
| **Docker / Singularity / Modal / SSH ** | User's home directory inside the container or remote machine |
Override the working directory:
```bash
# In ~/.hermes/.env or ~/.hermes/config.yaml:
MESSAGING_CWD=/home/myuser/projects # Gateway sessions
TERMINAL_CWD=/workspace # All terminal sessions
```