Files

Teknium 5e5c92663d fix: hermes update causes dual gateways on macOS (launchd) (#1567 )

* feat: add optional smart model routing

Add a conservative cheap-vs-strong routing option that can send very short/simple turns to a cheaper model across providers while keeping the primary model for complex work. Wire it through CLI, gateway, and cron, and document the config.yaml workflow.

* fix(gateway): remove recursive ExecStop from systemd units, extend TimeoutStopSec to 60s

* fix(gateway): avoid recursive ExecStop in user systemd unit

* fix: extend ExecStop removal and TimeoutStopSec=60 to system unit

The cherry-picked PR #1448 fix only covered the user systemd unit.
The system unit had the same TimeoutStopSec=15 and could benefit
from the same 60s timeout for clean shutdown. Also adds a regression
test for the system unit.

---------

Co-authored-by: Ninja <ninja@local>

* feat(skills): add blender-mcp optional skill for 3D modeling

Control a running Blender instance from Hermes via socket connection
to the blender-mcp addon (port 9876). Supports creating 3D objects,
materials, animations, and running arbitrary bpy code.

Placed in optional-skills/ since it requires Blender 4.3+ desktop
with a third-party addon manually started each session.

* feat(acp): support slash commands in ACP adapter (#1532)

Adds /help, /model, /tools, /context, /reset, /compact, /version
to the ACP adapter (VS Code, Zed, JetBrains). Commands are handled
directly in the server without instantiating the TUI — each command
queries agent/session state and returns plain text.

Unrecognized /commands fall through to the LLM as normal messages.

/model uses detect_provider_for_model() for auto-detection when
switching models, matching the CLI and gateway behavior.

Fixes #1402

* fix(logging): improve error logging in session search tool (#1533)

* fix(gateway): restart on retryable startup failures (#1517)

* feat(email): add skip_attachments option via config.yaml

* feat(email): add skip_attachments option via config.yaml

Adds a config.yaml-driven option to skip email attachments in the
gateway email adapter. Useful for malware protection and bandwidth
savings.

Configure in config.yaml:
  platforms:
    email:
      skip_attachments: true

Based on PR #1521 by @an420eth, changed from env var to config.yaml
(via PlatformConfig.extra) to match the project's config-first pattern.

* docs: document skip_attachments option for email adapter

* fix(telegram): retry on transient TLS failures during connect and send

Add exponential-backoff retry (3 attempts) around initialize() to
handle transient TLS resets during gateway startup. Also catches
TimedOut and OSError in addition to NetworkError.

Add exponential-backoff retry (3 attempts) around send_message() for
NetworkError during message delivery, wrapping the existing Markdown
fallback logic.

Both imports are guarded with try/except ImportError for test
environments where telegram is mocked.

Based on PR #1527 by cmd8. Closes #1526.

* feat: permissive block_anchor thresholds and unicode normalization (#1539)

Salvaged from PR #1528 by an420eth. Closes #517.

Improves _strategy_block_anchor in fuzzy_match.py:
- Add unicode normalization (smart quotes, em/en-dashes, ellipsis,
  non-breaking spaces → ASCII) so LLM-produced unicode artifacts
  don't break anchor line matching
- Lower thresholds: 0.10 for unique matches (was 0.70), 0.30 for
  multiple candidates — if first/last lines match exactly, the
  block is almost certainly correct
- Use original (non-normalized) content for offset calculation to
  preserve correct character positions

Tested: 3 new scenarios fixed (em-dash anchors, non-breaking space
anchors, very-low-similarity unique matches), zero regressions on
all 9 existing fuzzy match tests.

Co-authored-by: an420eth <an420eth@users.noreply.github.com>

* feat(cli): add file path autocomplete in the input prompt (#1545)

When typing a path-like token (./  ../  ~/  /  or containing /),
the CLI now shows filesystem completions in the dropdown menu.
Directories show a trailing slash and 'dir' label; files show
their size. Completions are case-insensitive and capped at 30
entries.

Triggered by tokens like:
  edit ./src/ma     → shows ./src/main.py, ./src/manifest.json, ...
  check ~/doc       → shows ~/docs/, ~/documents/, ...
  read /etc/hos     → shows /etc/hosts, /etc/hostname, ...
  open tools/reg    → shows tools/registry.py

Slash command autocomplete (/help, /model, etc.) is unaffected —
it still triggers when the input starts with /.

Inspired by OpenCode PR #145 (file path completion menu).

Implementation:
- hermes_cli/commands.py: _extract_path_word() detects path-like
  tokens, _path_completions() yields filesystem Completions with
  size labels, get_completions() routes to paths vs slash commands
- tests/hermes_cli/test_path_completion.py: 26 tests covering
  path extraction, prefix filtering, directory markers, home
  expansion, case-insensitivity, integration with slash commands

* feat(privacy): redact PII from LLM context when privacy.redact_pii is enabled

Add privacy.redact_pii config option (boolean, default false). When
enabled, the gateway redacts personally identifiable information from
the system prompt before sending it to the LLM provider:

- Phone numbers (user IDs on WhatsApp/Signal) → hashed to user_<sha256>
- User IDs → hashed to user_<sha256>
- Chat IDs → numeric portion hashed, platform prefix preserved
- Home channel IDs → hashed
- Names/usernames → NOT affected (user-chosen, publicly visible)

Hashes are deterministic (same user → same hash) so the model can
still distinguish users in group chats. Routing and delivery use
the original values internally — redaction only affects LLM context.

Inspired by OpenClaw PR #47959.

* fix(privacy): skip PII redaction on Discord/Slack (mentions need real IDs)

Discord uses <@user_id> for mentions and Slack uses <@U12345> — the LLM
needs the real ID to tag users. Redaction now only applies to WhatsApp,
Signal, and Telegram where IDs are pure routing metadata.

Add 4 platform-specific tests covering Discord, WhatsApp, Signal, Slack.

* feat: smart approvals + /stop command (inspired by OpenAI Codex)

* feat: smart approvals — LLM-based risk assessment for dangerous commands

Adds a 'smart' approval mode that uses the auxiliary LLM to assess
whether a flagged command is genuinely dangerous or a false positive,
auto-approving low-risk commands without prompting the user.

Inspired by OpenAI Codex's Smart Approvals guardian subagent
(openai/codex#13860).

Config (config.yaml):
  approvals:
    mode: manual   # manual (default), smart, off

Modes:
- manual — current behavior, always prompt the user
- smart  — aux LLM evaluates risk: APPROVE (auto-allow), DENY (block),
           or ESCALATE (fall through to manual prompt)
- off    — skip all approval prompts (equivalent to --yolo)

When smart mode auto-approves, the pattern gets session-level approval
so subsequent uses of the same pattern don't trigger another LLM call.
When it denies, the command is blocked without user prompt. When
uncertain, it escalates to the normal manual approval flow.

The LLM prompt is carefully scoped: it sees only the command text and
the flagged reason, assesses actual risk vs false positive, and returns
a single-word verdict.

* feat: make smart approval model configurable via config.yaml

Adds auxiliary.approval section to config.yaml with the same
provider/model/base_url/api_key pattern as other aux tasks (vision,
web_extract, compression, etc.).

Config:
  auxiliary:
    approval:
      provider: auto
      model: ''        # fast/cheap model recommended
      base_url: ''
      api_key: ''

Bridged to env vars in both CLI and gateway paths so the aux client
picks them up automatically.

* feat: add /stop command to kill all background processes

Adds a /stop slash command that kills all running background processes
at once. Currently users have to process(list) then process(kill) for
each one individually.

Inspired by OpenAI Codex's separation of interrupt (Ctrl+C stops current
turn) from /stop (cleans up background processes). See openai/codex#14602.

Ctrl+C continues to only interrupt the active agent turn — background
dev servers, watchers, etc. are preserved. /stop is the explicit way
to clean them all up.

* feat: first-class plugin architecture + hide status bar cost by default (#1544)

The persistent status bar now shows context %, token counts, and
duration but NOT $ cost by default. Cost display is opt-in via:

  display:
    show_cost: true

in config.yaml, or: hermes config set display.show_cost true

The /usage command still shows full cost breakdown since the user
explicitly asked for it — this only affects the always-visible bar.

Status bar without cost:
  ⚕ claude-sonnet-4 │ 12K/200K │ 6% │ 15m

Status bar with show_cost: true:
  ⚕ claude-sonnet-4 │ 12K/200K │ 6% │ $0.06 │ 15m

* feat: improve memory prioritization + aggressive skill updates (inspired by OpenAI Codex)

* feat: improve memory prioritization — user preferences over procedural knowledge

Inspired by OpenAI Codex's memory prompt improvements (openai/codex#14493)
which focus memory writes on user preferences and recurring patterns
rather than procedural task details.

Key insight: 'Optimize for reducing future user steering — the most
valuable memory prevents the user from having to repeat themselves.'

Changes:
- MEMORY_GUIDANCE (prompt_builder.py): added prioritization hierarchy
  and the core principle about reducing user steering
- MEMORY_SCHEMA (memory_tool.py): reordered WHEN TO SAVE list to put
  corrections first, added explicit PRIORITY guidance
- Memory nudge (run_agent.py): now asks specifically about preferences,
  corrections, and workflow patterns instead of generic 'anything'
- Memory flush (run_agent.py): now instructs to prioritize user
  preferences and corrections over task-specific details

* feat: more aggressive skill creation and update prompting

Press harder on skill updates — the agent should proactively patch
skills when it encounters issues during use, not wait to be asked.

Changes:
- SKILLS_GUIDANCE: 'consider saving' → 'save'; added explicit instruction
  to patch skills immediately when found outdated/wrong
- Skills header: added instruction to update loaded skills before finishing
  if they had missing steps or wrong commands
- Skill nudge: more assertive ('save the approach' not 'consider saving'),
  now also prompts for updating existing skills used in the task
- Skill nudge interval: lowered default from 15 to 10 iterations
- skill_manage schema: added 'patch it immediately' to update triggers

* feat: first-class plugin architecture (#1555)

Plugin system for extending Hermes with custom tools, hooks, and
integrations — no source code changes required.

Core system (hermes_cli/plugins.py):
  - Plugin discovery from ~/.hermes/plugins/, .hermes/plugins/, and
    pip entry_points (hermes_agent.plugins group)
  - PluginContext with register_tool() and register_hook()
  - 6 lifecycle hooks: pre/post tool_call, pre/post llm_call,
    on_session_start/end
  - Namespace package handling for relative imports in plugins
  - Graceful error isolation — broken plugins never crash the agent

Integration (model_tools.py):
  - Plugin discovery runs after built-in + MCP tools
  - Plugin tools bypass toolset filter via get_plugin_tool_names()
  - Pre/post tool call hooks fire in handle_function_call()

CLI:
  - /plugins command shows loaded plugins, tool counts, status
  - Added to COMMANDS dict for autocomplete

Docs:
  - Getting started guide (build-a-hermes-plugin.md) — full tutorial
    building a calculator plugin step by step
  - Reference page (features/plugins.md) — quick overview + tables
  - Covers: file structure, schemas, handlers, hooks, data files,
    bundled skills, env var gating, pip distribution, common mistakes

Tests: 16 tests covering discovery, loading, hooks, tool visibility.

* fix: hermes update causes dual gateways on macOS (launchd)

Three bugs worked together to create the dual-gateway problem:

1. cmd_update only checked systemd for gateway restart, completely
   ignoring launchd on macOS. After killing the PID it would print
   'Restart it with: hermes gateway run' even when launchd was about
   to auto-respawn the process.

2. launchd's KeepAlive.SuccessfulExit=false respawns the gateway
   after SIGTERM (non-zero exit), so the user's manual restart
   created a second instance.

3. The launchd plist lacked --replace (systemd had it), so the
   respawned gateway didn't kill stale instances on startup.

Fixes:
- Add --replace to launchd ProgramArguments (matches systemd)
- Add launchd detection to cmd_update's auto-restart logic
- Print 'auto-restart via launchd' instead of manual restart hint

* fix: add launchd plist auto-refresh + explicit restart in cmd_update

Two integration issues with the initial fix:

1. Existing macOS users with old plist (no --replace) would never
   get the fix until manual uninstall/reinstall. Added
   refresh_launchd_plist_if_needed() — mirrors the existing
   refresh_systemd_unit_if_needed(). Called from launchd_start(),
   launchd_restart(), and cmd_update.

2. cmd_update relied on KeepAlive respawn after SIGTERM rather than
   explicit launchctl stop/start. This caused races: launchd would
   respawn the old process before the PID file was cleaned up.
   Now does explicit stop+start (matching how systemd gets an
   explicit systemctl restart), with plist refresh first so the
   new --replace flag is picked up.

---------

Co-authored-by: Ninja <ninja@local>
Co-authored-by: alireza78a <alireza78a@users.noreply.github.com>
Co-authored-by: Oktay Aydin <113846926+aydnOktay@users.noreply.github.com>
Co-authored-by: JP Lew <polydegen@protonmail.com>
Co-authored-by: an420eth <an420eth@users.noreply.github.com>

2026-03-16 12:36:29 -07:00

43 KiB

Raw Blame History

sidebar_position, title, description

sidebar_position	title	description
2	Configuration	Configure Hermes Agent — config.yaml, providers, models, API keys, and more

Configuration

All settings are stored in the ~/.hermes/ directory for easy access.

Directory Structure

~/.hermes/
├── config.yaml     # Settings (model, terminal, TTS, compression, etc.)
├── .env            # API keys and secrets
├── auth.json       # OAuth provider credentials (Nous Portal, etc.)
├── SOUL.md         # Optional: global persona (agent embodies this personality)
├── memories/       # Persistent memory (MEMORY.md, USER.md)
├── skills/         # Agent-created skills (managed via skill_manage tool)
├── cron/           # Scheduled jobs
├── sessions/       # Gateway sessions
└── logs/           # Logs (errors.log, gateway.log — secrets auto-redacted)

Managing Configuration

hermes config              # View current configuration
hermes config edit         # Open config.yaml in your editor
hermes config set KEY VAL  # Set a specific value
hermes config check        # Check for missing options (after updates)
hermes config migrate      # Interactively add missing options

# Examples:
hermes config set model anthropic/claude-opus-4
hermes config set terminal.backend docker
hermes config set OPENROUTER_API_KEY sk-or-...  # Saves to .env

:::tip The hermes config set command automatically routes values to the right file — API keys are saved to .env, everything else to config.yaml. :::

Configuration Precedence

Settings are resolved in this order (highest priority first):

CLI arguments — e.g., hermes chat --model anthropic/claude-sonnet-4 (per-invocation override)
~/.hermes/config.yaml — the primary config file for all non-secret settings
~/.hermes/.env — fallback for env vars; required for secrets (API keys, tokens, passwords)
Built-in defaults — hardcoded safe defaults when nothing else is set

:::info Rule of Thumb Secrets (API keys, bot tokens, passwords) go in .env. Everything else (model, terminal backend, compression settings, memory limits, toolsets) goes in config.yaml. When both are set, config.yaml wins for non-secret settings. :::

Inference Providers

You need at least one way to connect to an LLM. Use hermes model to switch providers and models interactively, or configure directly:

Provider	Setup
Nous Portal	`hermes model` (OAuth, subscription-based)
OpenAI Codex	`hermes model` (ChatGPT OAuth, uses Codex models)
Anthropic	`hermes model` (Claude Pro/Max via Claude Code auth, Anthropic API key, or manual setup-token)
OpenRouter	`OPENROUTER_API_KEY` in `~/.hermes/.env`
z.ai / GLM	`GLM_API_KEY` in `~/.hermes/.env` (provider: `zai`)
Kimi / Moonshot	`KIMI_API_KEY` in `~/.hermes/.env` (provider: `kimi-coding`)
MiniMax	`MINIMAX_API_KEY` in `~/.hermes/.env` (provider: `minimax`)
MiniMax China	`MINIMAX_CN_API_KEY` in `~/.hermes/.env` (provider: `minimax-cn`)
Custom Endpoint	`hermes model` (saved in `config.yaml`) or `OPENAI_BASE_URL` + `OPENAI_API_KEY` in `~/.hermes/.env`

:::info Codex Note The OpenAI Codex provider authenticates via device code (open a URL, enter a code). Hermes stores the resulting credentials in its own auth store under ~/.hermes/auth.json and can import existing Codex CLI credentials from ~/.codex/auth.json when present. No Codex CLI installation is required. :::

:::warning Even when using Nous Portal, Codex, or a custom endpoint, some tools (vision, web summarization, MoA) use a separate "auxiliary" model — by default Gemini Flash via OpenRouter. An OPENROUTER_API_KEY enables these tools automatically. You can also configure which model and provider these tools use — see Auxiliary Models below. :::

Anthropic (Native)

Use Claude models directly through the Anthropic API — no OpenRouter proxy needed. Supports three auth methods:

# With an API key (pay-per-token)
export ANTHROPIC_API_KEY=***
hermes chat --provider anthropic --model claude-sonnet-4-6

# Preferred: authenticate through `hermes model`
# Hermes will use Claude Code's credential store directly when available
hermes model

# Manual override with a setup-token (fallback / legacy)
export ANTHROPIC_TOKEN=***  # setup-token or manual OAuth token
hermes chat --provider anthropic

# Auto-detect Claude Code credentials (if you already use Claude Code)
hermes chat --provider anthropic  # reads Claude Code credential files automatically

When you choose Anthropic OAuth through hermes model, Hermes prefers Claude Code's own credential store over copying the token into ~/.hermes/.env. That keeps refreshable Claude credentials refreshable.

Or set it permanently:

model:
  provider: "anthropic"
  default: "claude-sonnet-4-6"

:::tip Aliases --provider claude and --provider claude-code also work as shorthand for --provider anthropic. :::

First-Class Chinese AI Providers

These providers have built-in support with dedicated provider IDs. Set the API key and use --provider to select:

# z.ai / ZhipuAI GLM
hermes chat --provider zai --model glm-4-plus
# Requires: GLM_API_KEY in ~/.hermes/.env

# Kimi / Moonshot AI
hermes chat --provider kimi-coding --model moonshot-v1-auto
# Requires: KIMI_API_KEY in ~/.hermes/.env

# MiniMax (global endpoint)
hermes chat --provider minimax --model MiniMax-Text-01
# Requires: MINIMAX_API_KEY in ~/.hermes/.env

# MiniMax (China endpoint)
hermes chat --provider minimax-cn --model MiniMax-Text-01
# Requires: MINIMAX_CN_API_KEY in ~/.hermes/.env

Or set the provider permanently in config.yaml:

model:
  provider: "zai"       # or: kimi-coding, minimax, minimax-cn
  default: "glm-4-plus"

Base URLs can be overridden with GLM_BASE_URL, KIMI_BASE_URL, MINIMAX_BASE_URL, or MINIMAX_CN_BASE_URL environment variables.

Custom & Self-Hosted LLM Providers

Hermes Agent works with any OpenAI-compatible API endpoint. If a server implements /v1/chat/completions, you can point Hermes at it. This means you can use local models, GPU inference servers, multi-provider routers, or any third-party API.

General Setup

Two ways to configure a custom endpoint:

Interactive (recommended):

hermes model
# Select "Custom endpoint (self-hosted / VLLM / etc.)"
# Enter: API base URL, API key, Model name

Manual (.env file):

# Add to ~/.hermes/.env
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_API_KEY=***
LLM_MODEL=your-model-name

hermes model and the manual .env approach end up in the same runtime path. If you save a custom endpoint through hermes model, Hermes persists the provider + base URL in config.yaml so later sessions keep using that endpoint even if OPENAI_BASE_URL is not exported in your current shell.

Everything below follows this same pattern — just change the URL, key, and model name.

Ollama — Local Models, Zero Config

Ollama runs open-weight models locally with one command. Best for: quick local experimentation, privacy-sensitive work, offline use.

# Install and run a model
ollama pull llama3.1:70b
ollama serve   # Starts on port 11434

# Configure Hermes
OPENAI_BASE_URL=http://localhost:11434/v1
OPENAI_API_KEY=ollama           # Any non-empty string
LLM_MODEL=llama3.1:70b

Ollama's OpenAI-compatible endpoint supports chat completions, streaming, and tool calling (for supported models). No GPU required for smaller models — Ollama handles CPU inference automatically.

:::tip List available models with ollama list. Pull any model from the Ollama library with ollama pull <model>. :::

vLLM — High-Performance GPU Inference

vLLM is the standard for production LLM serving. Best for: maximum throughput on GPU hardware, serving large models, continuous batching.

# Start vLLM server
pip install vllm
vllm serve meta-llama/Llama-3.1-70B-Instruct \
  --port 8000 \
  --tensor-parallel-size 2    # Multi-GPU

# Configure Hermes
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_API_KEY=dummy
LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct

vLLM supports tool calling, structured output, and multi-modal models. Use --enable-auto-tool-choice and --tool-call-parser hermes for Hermes-format tool calling with NousResearch models.

SGLang — Fast Serving with RadixAttention

SGLang is an alternative to vLLM with RadixAttention for KV cache reuse. Best for: multi-turn conversations (prefix caching), constrained decoding, structured output.

# Start SGLang server
pip install sglang[all]
python -m sglang.launch_server \
  --model meta-llama/Llama-3.1-70B-Instruct \
  --port 8000 \
  --tp 2

# Configure Hermes
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_API_KEY=dummy
LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct

llama.cpp / llama-server — CPU & Metal Inference

llama.cpp runs quantized models on CPU, Apple Silicon (Metal), and consumer GPUs. Best for: running models without a datacenter GPU, Mac users, edge deployment.

# Build and start llama-server
cmake -B build && cmake --build build --config Release
./build/bin/llama-server \
  -m models/llama-3.1-8b-instruct-Q4_K_M.gguf \
  --port 8080 --host 0.0.0.0

# Configure Hermes
OPENAI_BASE_URL=http://localhost:8080/v1
OPENAI_API_KEY=dummy
LLM_MODEL=llama-3.1-8b-instruct

:::tip Download GGUF models from Hugging Face. Q4_K_M quantization offers the best balance of quality vs. memory usage. :::

LiteLLM Proxy — Multi-Provider Gateway

LiteLLM is an OpenAI-compatible proxy that unifies 100+ LLM providers behind a single API. Best for: switching between providers without config changes, load balancing, fallback chains, budget controls.

# Install and start
pip install litellm[proxy]
litellm --model anthropic/claude-sonnet-4 --port 4000

# Or with a config file for multiple models:
litellm --config litellm_config.yaml --port 4000

# Configure Hermes
OPENAI_BASE_URL=http://localhost:4000/v1
OPENAI_API_KEY=sk-your-litellm-key
LLM_MODEL=anthropic/claude-sonnet-4

Example litellm_config.yaml with fallback:

model_list:
  - model_name: "best"
    litellm_params:
      model: anthropic/claude-sonnet-4
      api_key: sk-ant-...
  - model_name: "best"
    litellm_params:
      model: openai/gpt-4o
      api_key: sk-...
router_settings:
  routing_strategy: "latency-based-routing"

ClawRouter — Cost-Optimized Routing

ClawRouter by BlockRunAI is a local routing proxy that auto-selects models based on query complexity. It classifies requests across 14 dimensions and routes to the cheapest model that can handle the task. Payment is via USDC cryptocurrency (no API keys).

# Install and start
npx @blockrun/clawrouter    # Starts on port 8402

# Configure Hermes
OPENAI_BASE_URL=http://localhost:8402/v1
OPENAI_API_KEY=dummy
LLM_MODEL=blockrun/auto     # or: blockrun/eco, blockrun/premium, blockrun/agentic

Routing profiles:

Profile	Strategy	Savings
`blockrun/auto`	Balanced quality/cost	74-100%
`blockrun/eco`	Cheapest possible	95-100%
`blockrun/premium`	Best quality models	0%
`blockrun/free`	Free models only	100%
`blockrun/agentic`	Optimized for tool use	varies

:::note ClawRouter requires a USDC-funded wallet on Base or Solana for payment. All requests route through BlockRun's backend API. Run npx @blockrun/clawrouter doctor to check wallet status. :::

Other Compatible Providers

Any service with an OpenAI-compatible API works. Some popular options:

Provider	Base URL	Notes
Together AI	`https://api.together.xyz/v1`	Cloud-hosted open models
Groq	`https://api.groq.com/openai/v1`	Ultra-fast inference
DeepSeek	`https://api.deepseek.com/v1`	DeepSeek models
Fireworks AI	`https://api.fireworks.ai/inference/v1`	Fast open model hosting
Cerebras	`https://api.cerebras.ai/v1`	Wafer-scale chip inference
Mistral AI	`https://api.mistral.ai/v1`	Mistral models
OpenAI	`https://api.openai.com/v1`	Direct OpenAI access
Azure OpenAI	`https://YOUR.openai.azure.com/`	Enterprise OpenAI
LocalAI	`http://localhost:8080/v1`	Self-hosted, multi-model
Jan	`http://localhost:1337/v1`	Desktop app with local models

# Example: Together AI
OPENAI_BASE_URL=https://api.together.xyz/v1
OPENAI_API_KEY=your-together-key
LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct-Turbo

Choosing the Right Setup

Use Case	Recommended
Just want it to work	OpenRouter (default) or Nous Portal
Local models, easy setup	Ollama
Production GPU serving	vLLM or SGLang
Mac / no GPU	Ollama or llama.cpp
Multi-provider routing	LiteLLM Proxy or OpenRouter
Cost optimization	ClawRouter or OpenRouter with `sort: "price"`
Maximum privacy	Ollama, vLLM, or llama.cpp (fully local)
Enterprise / Azure	Azure OpenAI with custom endpoint
Chinese AI models	z.ai (GLM), Kimi/Moonshot, or MiniMax (first-class providers)

:::tip You can switch between providers at any time with hermes model — no restart required. Your conversation history, memory, and skills carry over regardless of which provider you use. :::

Optional API Keys

Feature	Provider	Env Variable
Web scraping	Firecrawl	`FIRECRAWL_API_KEY`
Browser automation	Browserbase	`BROWSERBASE_API_KEY`, `BROWSERBASE_PROJECT_ID`
Image generation	FAL	`FAL_KEY`
Premium TTS voices	ElevenLabs	`ELEVENLABS_API_KEY`
OpenAI TTS + voice transcription	OpenAI	`VOICE_TOOLS_OPENAI_KEY`
RL Training	Tinker + WandB	`TINKER_API_KEY`, `WANDB_API_KEY`
Cross-session user modeling	Honcho	`HONCHO_API_KEY`

Self-Hosting Firecrawl

By default, Hermes uses the Firecrawl cloud API for web search and scraping. If you prefer to run Firecrawl locally, you can point Hermes at a self-hosted instance instead.

What you get: No API key required, no rate limits, no per-page costs, full data sovereignty.

What you lose: The cloud version uses Firecrawl's proprietary "Fire-engine" for advanced anti-bot bypassing (Cloudflare, CAPTCHAs, IP rotation). Self-hosted uses basic fetch + Playwright, so some protected sites may fail. Search uses DuckDuckGo instead of Google.

Setup:

Clone and start the Firecrawl Docker stack (5 containers: API, Playwright, Redis, RabbitMQ, PostgreSQL — requires ~4-8 GB RAM):

git clone https://github.com/mendableai/firecrawl
cd firecrawl
# In .env, set: USE_DB_AUTHENTICATION=false
docker compose up -d

Point Hermes at your instance (no API key needed):

hermes config set FIRECRAWL_API_URL http://localhost:3002

You can also set both FIRECRAWL_API_KEY and FIRECRAWL_API_URL if your self-hosted instance has authentication enabled.

OpenRouter Provider Routing

When using OpenRouter, you can control how requests are routed across providers. Add a provider_routing section to ~/.hermes/config.yaml:

provider_routing:
  sort: "throughput"          # "price" (default), "throughput", or "latency"
  # only: ["anthropic"]      # Only use these providers
  # ignore: ["deepinfra"]    # Skip these providers
  # order: ["anthropic", "google"]  # Try providers in this order
  # require_parameters: true  # Only use providers that support all request params
  # data_collection: "deny"   # Exclude providers that may store/train on data

Shortcuts: Append :nitro to any model name for throughput sorting (e.g., anthropic/claude-sonnet-4:nitro), or :floor for price sorting.

Fallback Model

Configure a backup provider:model that Hermes switches to automatically when your primary model fails (rate limits, server errors, auth failures):

fallback_model:
  provider: openrouter                    # required
  model: anthropic/claude-sonnet-4        # required
  # base_url: http://localhost:8000/v1    # optional, for custom endpoints
  # api_key_env: MY_CUSTOM_KEY           # optional, env var name for custom endpoint API key

When activated, the fallback swaps the model and provider mid-session without losing your conversation. It fires at most once per session.

Supported providers: openrouter, nous, openai-codex, anthropic, zai, kimi-coding, minimax, minimax-cn, custom.

:::tip Fallback is configured exclusively through config.yaml — there are no environment variables for it. For full details on when it triggers, supported providers, and how it interacts with auxiliary tasks and delegation, see Fallback Providers. :::

Smart Model Routing

Optional cheap-vs-strong routing lets Hermes keep your main model for complex work while sending very short/simple turns to a cheaper model.

smart_model_routing:
  enabled: true
  max_simple_chars: 160
  max_simple_words: 28
  cheap_model:
    provider: openrouter
    model: google/gemini-2.5-flash
    # base_url: http://localhost:8000/v1  # optional custom endpoint
    # api_key_env: MY_CUSTOM_KEY          # optional env var name for that endpoint's API key

How it works:

If a turn is short, single-line, and does not look code/tool/debug heavy, Hermes may route it to cheap_model
If the turn looks complex, Hermes stays on your primary model/provider
If the cheap route cannot be resolved cleanly, Hermes falls back to the primary model automatically

This is intentionally conservative. It is meant for quick, low-stakes turns like:

short factual questions
quick rewrites
lightweight summaries

It will avoid routing prompts that look like:

coding/debugging work
tool-heavy requests
long or multi-line analysis asks

Use this when you want lower latency or cost without fully changing your default model.

Terminal Backend Configuration

Configure which environment the agent uses for terminal commands:

terminal:
  backend: local    # or: docker, ssh, singularity, modal, daytona
  cwd: "."          # Working directory ("." = current dir)
  timeout: 180      # Command timeout in seconds

  # Docker-specific settings
  docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"
  docker_mount_cwd_to_workspace: false  # SECURITY: off by default. Opt in to mount the launch cwd into /workspace.
  docker_volumes:                    # Additional explicit host mounts
    - "/home/user/projects:/workspace/projects"
    - "/home/user/data:/data:ro"     # :ro for read-only

  # Container resource limits (docker, singularity, modal, daytona)
  container_cpu: 1                   # CPU cores
  container_memory: 5120             # MB (default 5GB)
  container_disk: 51200              # MB (default 50GB)
  container_persistent: true         # Persist filesystem across sessions

  # Persistent shell — keep a long-lived bash process across commands
  persistent_shell: true             # Enabled by default for SSH backend

Common Terminal Backend Issues

If terminal commands fail immediately or the terminal tool is reported as disabled, check the following:

Local backend
- No special requirements. This is the safest default when you are just getting started.
Docker backend
- Ensure Docker Desktop (or the Docker daemon) is installed and running.
- Hermes needs to be able to find the docker CLI. It checks your $PATH first and also probes common Docker Desktop install locations on macOS. Run:
```
docker version
```
  If this fails, fix your Docker installation or switch back to the local backend:
```
hermes config set terminal.backend local
```
SSH backend
- Both TERMINAL_SSH_HOST and TERMINAL_SSH_USER must be set, for example:
```
export TERMINAL_ENV=ssh
export TERMINAL_SSH_HOST=my-server.example.com
export TERMINAL_SSH_USER=ubuntu
```
- If either value is missing, Hermes will log a clear error and refuse to use the SSH backend.
Modal backend
- You need either a MODAL_TOKEN_ID environment variable or a ~/.modal.toml config file.
- If neither is present, the backend check fails and Hermes will report that the Modal backend is not available.

When in doubt, set terminal.backend back to local and verify that commands run there first.

Docker Volume Mounts

When using the Docker backend, docker_volumes lets you share host directories with the container. Each entry uses standard Docker -v syntax: host_path:container_path[:options].

terminal:
  backend: docker
  docker_volumes:
    - "/home/user/projects:/workspace/projects"   # Read-write (default)
    - "/home/user/datasets:/data:ro"              # Read-only
    - "/home/user/outputs:/outputs"               # Agent writes, you read

This is useful for:

Providing files to the agent (datasets, configs, reference code)
Receiving files from the agent (generated code, reports, exports)
Shared workspaces where both you and the agent access the same files

Can also be set via environment variable: TERMINAL_DOCKER_VOLUMES='["/host:/container"]' (JSON array).

Optional: Mount the Launch Directory into `/workspace`

Docker sandboxes stay isolated by default. Hermes does not pass your current host working directory into the container unless you explicitly opt in.

Enable it in config.yaml:

terminal:
  backend: docker
  docker_mount_cwd_to_workspace: true

When enabled:

if you launch Hermes from ~/projects/my-app, that host directory is bind-mounted to /workspace
the Docker backend starts in /workspace
file tools and terminal commands both see the same mounted project

When disabled, /workspace stays sandbox-owned unless you explicitly mount something via docker_volumes.

Security tradeoff:

false preserves the sandbox boundary
true gives the sandbox direct access to the directory you launched Hermes from

Use the opt-in only when you intentionally want the container to work on live host files.

Persistent Shell

By default, each terminal command runs in its own subprocess — working directory, environment variables, and shell variables reset between commands. When persistent shell is enabled, a single long-lived bash process is kept alive across execute() calls so that state survives between commands.

This is most useful for the SSH backend, where it also eliminates per-command connection overhead. Persistent shell is enabled by default for SSH and disabled for the local backend.

terminal:
  persistent_shell: true   # default — enables persistent shell for SSH

To disable:

hermes config set terminal.persistent_shell false

What persists across commands:

Working directory (cd /tmp sticks for the next command)
Exported environment variables (export FOO=bar)
Shell variables (MY_VAR=hello)

Precedence:

Level	Variable	Default
Config	`terminal.persistent_shell`	`true`
SSH override	`TERMINAL_SSH_PERSISTENT`	follows config
Local override	`TERMINAL_LOCAL_PERSISTENT`	`false`

Per-backend environment variables take highest precedence. If you want persistent shell on the local backend too:

export TERMINAL_LOCAL_PERSISTENT=true

:::note Commands that require stdin_data or sudo automatically fall back to one-shot mode, since the persistent shell's stdin is already occupied by the IPC protocol. :::

See Code Execution and the Terminal section of the README for details on each backend.

Memory Configuration

memory:
  memory_enabled: true
  user_profile_enabled: true
  memory_char_limit: 2200   # ~800 tokens
  user_char_limit: 1375     # ~500 tokens

Git Worktree Isolation

Enable isolated git worktrees for running multiple agents in parallel on the same repo:

worktree: true    # Always create a worktree (same as hermes -w)
# worktree: false # Default — only when -w flag is passed

When enabled, each CLI session creates a fresh worktree under .worktrees/ with its own branch. Agents can edit files, commit, push, and create PRs without interfering with each other. Clean worktrees are removed on exit; dirty ones are kept for manual recovery.

You can also list gitignored files to copy into worktrees via .worktreeinclude in your repo root:

# .worktreeinclude
.env
.venv/
node_modules/

Context Compression

compression:
  enabled: true
  threshold: 0.50              # Compress at 50% of context limit by default
  summary_model: "google/gemini-3-flash-preview"   # Model for summarization
  # summary_provider: "auto"   # "auto", "openrouter", "nous", "main"

The summary_model must support a context length at least as large as your main model's, since it receives the full middle section of the conversation for compression.

Iteration Budget Pressure

When the agent is working on a complex task with many tool calls, it can burn through its iteration budget (default: 90 turns) without realizing it's running low. Budget pressure automatically warns the model as it approaches the limit:

Threshold	Level	What the model sees
70%	Caution	`[BUDGET: 63/90. 27 iterations left. Start consolidating.]`
90%	Warning	`[BUDGET WARNING: 81/90. Only 9 left. Respond NOW.]`

Warnings are injected into the last tool result's JSON (as a _budget_warning field) rather than as separate messages — this preserves prompt caching and doesn't disrupt the conversation structure.

agent:
  max_turns: 90                # Max iterations per conversation turn (default: 90)

Budget pressure is enabled by default. The agent sees warnings naturally as part of tool results, encouraging it to consolidate its work and deliver a response before running out of iterations.

Auxiliary Models

Hermes uses lightweight "auxiliary" models for side tasks like image analysis, web page summarization, and browser screenshot analysis. By default, these use Gemini Flash via OpenRouter or Nous Portal — you don't need to configure anything.

To use a different model, add an auxiliary section to ~/.hermes/config.yaml:

auxiliary:
  # Image analysis (vision_analyze tool + browser screenshots)
  vision:
    provider: "auto"           # "auto", "openrouter", "nous", "main"
    model: ""                  # e.g. "openai/gpt-4o", "google/gemini-2.5-flash"
    base_url: ""               # direct OpenAI-compatible endpoint (takes precedence over provider)
    api_key: ""                # API key for base_url (falls back to OPENAI_API_KEY)

  # Web page summarization + browser page text extraction
  web_extract:
    provider: "auto"
    model: ""                  # e.g. "google/gemini-2.5-flash"
    base_url: ""
    api_key: ""

Changing the Vision Model

To use GPT-4o instead of Gemini Flash for image analysis:

auxiliary:
  vision:
    model: "openai/gpt-4o"

Or via environment variable (in ~/.hermes/.env):

AUXILIARY_VISION_MODEL=openai/gpt-4o

Provider Options

Provider	Description	Requirements
`"auto"`	Best available (default). Vision tries OpenRouter → Nous → Codex.	—
`"openrouter"`	Force OpenRouter — routes to any model (Gemini, GPT-4o, Claude, etc.)	`OPENROUTER_API_KEY`
`"nous"`	Force Nous Portal	`hermes login`
`"codex"`	Force Codex OAuth (ChatGPT account). Supports vision (gpt-5.3-codex).	`hermes model` → Codex
`"main"`	Use your active custom/main endpoint. This can come from `OPENAI_BASE_URL` + `OPENAI_API_KEY` or from a custom endpoint saved via `hermes model` / `config.yaml`. Works with OpenAI, local models, or any OpenAI-compatible API.	Custom endpoint credentials + base URL

Common Setups

Using a direct custom endpoint (clearer than provider: "main" for local/self-hosted APIs):

auxiliary:
  vision:
    base_url: "http://localhost:1234/v1"
    api_key: "local-key"
    model: "qwen2.5-vl"

base_url takes precedence over provider, so this is the most explicit way to route an auxiliary task to a specific endpoint. For direct endpoint overrides, Hermes uses the configured api_key or falls back to OPENAI_API_KEY; it does not reuse OPENROUTER_API_KEY for that custom endpoint.

Using OpenAI API key for vision:

# In ~/.hermes/.env:
# OPENAI_BASE_URL=https://api.openai.com/v1
# OPENAI_API_KEY=sk-...

auxiliary:
  vision:
    provider: "main"
    model: "gpt-4o"       # or "gpt-4o-mini" for cheaper

Using OpenRouter for vision (route to any model):

auxiliary:
  vision:
    provider: "openrouter"
    model: "openai/gpt-4o"      # or "google/gemini-2.5-flash", etc.

Using Codex OAuth (ChatGPT Pro/Plus account — no API key needed):

auxiliary:
  vision:
    provider: "codex"     # uses your ChatGPT OAuth token
    # model defaults to gpt-5.3-codex (supports vision)

Using a local/self-hosted model:

auxiliary:
  vision:
    provider: "main"      # uses your active custom endpoint
    model: "my-local-model"

provider: "main" follows the same custom endpoint Hermes uses for normal chat. That endpoint can be set directly with OPENAI_BASE_URL, or saved once through hermes model and persisted in config.yaml.

:::tip If you use Codex OAuth as your main model provider, vision works automatically — no extra configuration needed. Codex is included in the auto-detection chain for vision. :::

:::warning Vision requires a multimodal model. If you set provider: "main", make sure your endpoint supports multimodal/vision — otherwise image analysis will fail. :::

Environment Variables

You can also configure auxiliary models via environment variables instead of config.yaml:

Setting	Environment Variable
Vision provider	`AUXILIARY_VISION_PROVIDER`
Vision model	`AUXILIARY_VISION_MODEL`
Web extract provider	`AUXILIARY_WEB_EXTRACT_PROVIDER`
Web extract model	`AUXILIARY_WEB_EXTRACT_MODEL`
Compression provider	`CONTEXT_COMPRESSION_PROVIDER`
Compression model	`CONTEXT_COMPRESSION_MODEL`

:::tip Run hermes config to see your current auxiliary model settings. Overrides only show up when they differ from the defaults. :::

Reasoning Effort

Control how much "thinking" the model does before responding:

agent:
  reasoning_effort: ""   # empty = medium (default). Options: xhigh (max), high, medium, low, minimal, none

When unset (default), reasoning effort defaults to "medium" — a balanced level that works well for most tasks. Setting a value overrides it — higher reasoning effort gives better results on complex tasks at the cost of more tokens and latency.

You can also change the reasoning effort at runtime with the /reasoning command:

/reasoning           # Show current effort level and display state
/reasoning high      # Set reasoning effort to high
/reasoning none      # Disable reasoning
/reasoning show      # Show model thinking above each response
/reasoning hide      # Hide model thinking

TTS Configuration

tts:
  provider: "edge"              # "edge" | "elevenlabs" | "openai"
  edge:
    voice: "en-US-AriaNeural"   # 322 voices, 74 languages
  elevenlabs:
    voice_id: "pNInz6obpgDQGcFmaJgB"
    model_id: "eleven_multilingual_v2"
  openai:
    model: "gpt-4o-mini-tts"
    voice: "alloy"              # alloy, echo, fable, onyx, nova, shimmer

This controls both the text_to_speech tool and spoken replies in voice mode (/voice tts in the CLI or messaging gateway).

Display Settings

display:
  tool_progress: all      # off | new | all | verbose
  skin: default           # Built-in or custom CLI skin (see user-guide/features/skins)
  personality: "kawaii"  # Legacy cosmetic field still surfaced in some summaries
  compact: false          # Compact output mode (less whitespace)
  resume_display: full    # full (show previous messages on resume) | minimal (one-liner only)
  bell_on_complete: false # Play terminal bell when agent finishes (great for long tasks)
  show_reasoning: false   # Show model reasoning/thinking above each response (toggle with /reasoning show|hide)
  background_process_notifications: all  # all | result | error | off (gateway only)

Mode	What you see
`off`	Silent — just the final response
`new`	Tool indicator only when the tool changes
`all`	Every tool call with a short preview (default)
`verbose`	Full args, results, and debug logs

Privacy

privacy:
  redact_pii: false  # Strip PII from LLM context (gateway only)

When redact_pii is true, the gateway redacts personally identifiable information from the system prompt before sending it to the LLM on supported platforms:

Field	Treatment
Phone numbers (user ID on WhatsApp/Signal)	Hashed to `user_<12-char-sha256>`
User IDs	Hashed to `user_<12-char-sha256>`
Chat IDs	Numeric portion hashed, platform prefix preserved (`telegram:<hash>`)
Home channel IDs	Numeric portion hashed
User names / usernames	Not affected (user-chosen, publicly visible)

Platform support: Redaction applies to WhatsApp, Signal, and Telegram. Discord and Slack are excluded because their mention systems (<@user_id>) require the real ID in the LLM context.

Hashes are deterministic — the same user always maps to the same hash, so the model can still distinguish between users in group chats. Routing and delivery use the original values internally.

Speech-to-Text (STT)

stt:
  provider: "local"            # "local" | "groq" | "openai"
  local:
    model: "base"              # tiny, base, small, medium, large-v3
  openai:
    model: "whisper-1"         # whisper-1 | gpt-4o-mini-transcribe | gpt-4o-transcribe
  # model: "whisper-1"         # Legacy fallback key still respected

Provider behavior:

local uses faster-whisper running on your machine. Install it separately with pip install faster-whisper.
groq uses Groq's Whisper-compatible endpoint and reads GROQ_API_KEY.
openai uses the OpenAI speech API and reads VOICE_TOOLS_OPENAI_KEY.

If the requested provider is unavailable, Hermes falls back automatically in this order: local → groq → openai.

Groq and OpenAI model overrides are environment-driven:

STT_GROQ_MODEL=whisper-large-v3-turbo
STT_OPENAI_MODEL=whisper-1
GROQ_BASE_URL=https://api.groq.com/openai/v1
STT_OPENAI_BASE_URL=https://api.openai.com/v1

Voice Mode (CLI)

voice:
  record_key: "ctrl+b"         # Push-to-talk key inside the CLI
  max_recording_seconds: 120    # Hard stop for long recordings
  auto_tts: false               # Enable spoken replies automatically when /voice on
  silence_threshold: 200        # RMS threshold for speech detection
  silence_duration: 3.0         # Seconds of silence before auto-stop

Use /voice on in the CLI to enable microphone mode, record_key to start/stop recording, and /voice tts to toggle spoken replies. See Voice Mode for end-to-end setup and platform-specific behavior.

Group Chat Session Isolation

Control whether shared chats keep one conversation per room or one conversation per participant:

group_sessions_per_user: true  # true = per-user isolation in groups/channels, false = one shared session per chat

true is the default and recommended setting. In Discord channels, Telegram groups, Slack channels, and similar shared contexts, each sender gets their own session when the platform provides a user ID.
false reverts to the old shared-room behavior. That can be useful if you explicitly want Hermes to treat a channel like one collaborative conversation, but it also means users share context, token costs, and interrupt state.
Direct messages are unaffected. Hermes still keys DMs by chat/DM ID as usual.
Threads stay isolated from their parent channel either way; with true, each participant also gets their own session inside the thread.

For the behavior details and examples, see Sessions and the Discord guide.

Quick Commands

Define custom commands that run shell commands without invoking the LLM — zero token usage, instant execution. Especially useful from messaging platforms (Telegram, Discord, etc.) for quick server checks or utility scripts.

quick_commands:
  status:
    type: exec
    command: systemctl status hermes-agent
  disk:
    type: exec
    command: df -h /
  update:
    type: exec
    command: cd ~/.hermes/hermes-agent && git pull && pip install -e .
  gpu:
    type: exec
    command: nvidia-smi --query-gpu=name,utilization.gpu,memory.used,memory.total --format=csv,noheader

Usage: type /status, /disk, /update, or /gpu in the CLI or any messaging platform. The command runs locally on the host and returns the output directly — no LLM call, no tokens consumed.

30-second timeout — long-running commands are killed with an error message
Priority — quick commands are checked before skill commands, so you can override skill names
Autocomplete — quick commands are resolved at dispatch time and are not shown in the built-in slash-command autocomplete tables
Type — only exec is supported (runs a shell command); other types show an error
Works everywhere — CLI, Telegram, Discord, Slack, WhatsApp, Signal, Email, Home Assistant

Human Delay

Simulate human-like response pacing in messaging platforms:

human_delay:
  mode: "off"                  # off | natural | custom
  min_ms: 500                  # Minimum delay (custom mode)
  max_ms: 2000                 # Maximum delay (custom mode)

Code Execution

Configure the sandboxed Python code execution tool:

code_execution:
  timeout: 300                 # Max execution time in seconds
  max_tool_calls: 50           # Max tool calls within code execution

Browser

Configure browser automation behavior:

browser:
  inactivity_timeout: 120        # Seconds before auto-closing idle sessions
  record_sessions: false         # Auto-record browser sessions as WebM videos to ~/.hermes/browser_recordings/

Checkpoints

Automatic filesystem snapshots before destructive file operations. See the Checkpoints feature page for details.

checkpoints:
  enabled: false                 # Enable automatic checkpoints (also: hermes --checkpoints)
  max_snapshots: 50              # Max checkpoints to keep per directory

Delegation

Configure subagent behavior for the delegate tool:

delegation:
  max_iterations: 50           # Max iterations per subagent
  default_toolsets:             # Toolsets available to subagents
    - terminal
    - file
    - web
  # model: "google/gemini-3-flash-preview"  # Override model (empty = inherit parent)
  # provider: "openrouter"                  # Override provider (empty = inherit parent)
  # base_url: "http://localhost:1234/v1"    # Direct OpenAI-compatible endpoint (takes precedence over provider)
  # api_key: "local-key"                    # API key for base_url (falls back to OPENAI_API_KEY)

Subagent provider:model override: By default, subagents inherit the parent agent's provider and model. Set delegation.provider and delegation.model to route subagents to a different provider:model pair — e.g., use a cheap/fast model for narrowly-scoped subtasks while your primary agent runs an expensive reasoning model.

Direct endpoint override: If you want the obvious custom-endpoint path, set delegation.base_url, delegation.api_key, and delegation.model. That sends subagents directly to that OpenAI-compatible endpoint and takes precedence over delegation.provider. If delegation.api_key is omitted, Hermes falls back to OPENAI_API_KEY only.

The delegation provider uses the same credential resolution as CLI/gateway startup. All configured providers are supported: openrouter, nous, zai, kimi-coding, minimax, minimax-cn. When a provider is set, the system automatically resolves the correct base URL, API key, and API mode — no manual credential wiring needed.

Precedence: delegation.base_url in config → delegation.provider in config → parent provider (inherited). delegation.model in config → parent model (inherited). Setting just model without provider changes only the model name while keeping the parent's credentials (useful for switching models within the same provider like OpenRouter).

Clarify

Configure the clarification prompt behavior:

clarify:
  timeout: 120                 # Seconds to wait for user clarification response

Context Files (SOUL.md, AGENTS.md)

Hermes uses two different context scopes:

File	Purpose	Scope
`AGENTS.md`	Project-specific instructions, coding conventions	Working directory / project tree
`SOUL.md`	Default persona for this Hermes instance	`~/.hermes/SOUL.md` or `$HERMES_HOME/SOUL.md`
`.cursorrules`	Cursor IDE rules (also detected)	Working directory
`.cursor/rules/*.mdc`	Cursor rule files (also detected)	Working directory

AGENTS.md is hierarchical: if subdirectories also have AGENTS.md, all are combined.
SOUL.md is now global to the Hermes instance and is loaded only from HERMES_HOME.
Hermes automatically seeds a default SOUL.md if one does not already exist.
An empty SOUL.md contributes nothing to the system prompt.
All loaded context files are capped at 20,000 characters with smart truncation.

Working Directory

Context	Default
CLI (`hermes`)	Current directory where you run the command
Messaging gateway	Home directory `~` (override with `MESSAGING_CWD`)
Docker / Singularity / Modal / SSH	User's home directory inside the container or remote machine

Override the working directory:

# In ~/.hermes/.env or ~/.hermes/config.yaml:
MESSAGING_CWD=/home/myuser/projects    # Gateway sessions
TERMINAL_CWD=/workspace                # All terminal sessions

43 KiB Raw Blame History