docs: fallback providers + /background command documentation
* docs: comprehensive fallback providers documentation
  - New dedicated page: user-guide/features/fallback-providers.md covering both primary model fallback and auxiliary task fallback systems
  - Updated configuration.md with fallback_model config section
  - Updated environment-variables.md noting fallback is config-only
  - Fleshed out developer-guide/provider-runtime.md fallback section with internal architecture details (trigger points, activation flow, config flow)
  - Added cross-reference from provider-routing.md distinguishing OpenRouter sub-provider routing from Hermes-level model fallback
  - Added new page to sidebar under Integrations
* docs: comprehensive /background command documentation
  - Added Background Sessions section to cli.md covering how it works (daemon threads, isolated sessions, config inheritance, Rich panel output, bell notification, concurrent tasks)
  - Added Background Sessions section to messaging/index.md covering messaging-specific behavior (async execution, result delivery back to the same chat, fire-and-forget pattern)
  - Documented background_process_notifications config (all/result/error/off) in messaging docs and configuration.md
  - Added HERMES_BACKGROUND_NOTIFICATIONS env var to the reference page
  - Fixed inconsistency in slash-commands.md: /background was listed as messaging-only but works in both CLI and messaging. Moved it to the 'both surfaces' note.
  - Expanded one-liner table descriptions with detail and cross-references
@@ -130,7 +130,41 @@ When an auxiliary task is configured with provider `main`, Hermes resolves that
## Fallback models

Hermes supports a configured fallback model/provider pair, allowing runtime failover when the primary model encounters errors.

### How it works internally

1. **Storage**: `AIAgent.__init__` stores the `fallback_model` dict and sets `_fallback_activated = False`.
2. **Trigger points**: `_try_activate_fallback()` is called from three places in the main retry loop in `run_agent.py`:
   - After max retries on invalid API responses (None choices, missing content)
   - On non-retryable client errors (HTTP 401, 403, 404)
   - After max retries on transient errors (HTTP 429, 500, 502, 503)
3. **Activation flow** (`_try_activate_fallback`):
   - Returns `False` immediately if already activated or not configured
   - Calls `resolve_provider_client()` from `auxiliary_client.py` to build a new client with the proper auth
   - Determines `api_mode`: `codex_responses` for openai-codex, `anthropic_messages` for anthropic, `chat_completions` for everything else
   - Swaps in-place: `self.model`, `self.provider`, `self.base_url`, `self.api_mode`, `self.client`, `self._client_kwargs`
   - For an anthropic fallback: builds a native Anthropic client instead of an OpenAI-compatible one
   - Re-evaluates prompt caching (enabled for Claude models on OpenRouter)
   - Sets `_fallback_activated = True`, which prevents it from firing again
   - Resets the retry count to 0 and continues the loop
4. **Config flow**:
   - CLI: `cli.py` reads `CLI_CONFIG["fallback_model"]` → passes it to `AIAgent(fallback_model=...)`
   - Gateway: `gateway/run.py._load_fallback_model()` reads `config.yaml` → passes it to `AIAgent`
   - Validation: both `provider` and `model` keys must be non-empty, or the fallback is disabled
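As a rough illustration of the storage, validation, and one-shot activation described above, the flow can be sketched like this. This is a simplified stand-in, not the real `run_agent.py` source: the `AgentSketch` class is invented for the sketch, and the real code additionally builds an authenticated client via `resolve_provider_client()`.

```python
# Illustrative sketch of the one-shot fallback activation flow.
# Mirrors the bullets above: bail out if already activated or
# unconfigured, resolve the new provider, swap state in place,
# and latch the one-shot flag.

API_MODES = {
    "openai-codex": "codex_responses",
    "anthropic": "anthropic_messages",
}

class AgentSketch:
    def __init__(self, fallback_model=None):
        self.fallback_model = fallback_model or {}
        self._fallback_activated = False
        self.provider = "openrouter"
        self.model = "primary-model"
        self.api_mode = "chat_completions"

    def try_activate_fallback(self):
        # Returns False immediately if already activated or not configured
        if self._fallback_activated:
            return False
        provider = self.fallback_model.get("provider")
        model = self.fallback_model.get("model")
        if not provider or not model:  # both keys must be non-empty
            return False
        # The real code builds a new authenticated client here; the
        # sketch only swaps the provider metadata in place.
        self.provider = provider
        self.model = model
        self.api_mode = API_MODES.get(provider, "chat_completions")
        self._fallback_activated = True  # one-shot: never fires again
        return True
```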
### What does NOT support fallback

- **Subagent delegation** (`tools/delegate_tool.py`): subagents inherit the parent's provider but not the fallback config
- **Cron jobs** (`cron/`): run with a fixed provider, no fallback mechanism
- **Auxiliary tasks**: use their own independent provider auto-detection chain (see Auxiliary model routing above)

### Test coverage

See `tests/test_fallback_model.py` for comprehensive tests covering all supported providers, one-shot semantics, and edge cases.

## Related docs
@@ -164,6 +164,7 @@ For native Anthropic auth, Hermes prefers Claude Code's own credential files whe
| `HERMES_QUIET` | Suppress non-essential output (`true`/`false`) |
| `HERMES_API_TIMEOUT` | LLM API call timeout in seconds (default: `900`) |
| `HERMES_EXEC_ASK` | Enable execution approval prompts in gateway mode (`true`/`false`) |
| `HERMES_BACKGROUND_NOTIFICATIONS` | Background process notification mode in the gateway: `all` (default), `result`, `error`, `off` |

## Session Settings
@@ -197,6 +198,18 @@ For native Anthropic auth, Hermes prefers Claude Code's own credential files whe
For task-specific direct endpoints, Hermes uses the task's configured API key or `OPENAI_API_KEY`. It does not reuse `OPENROUTER_API_KEY` for those custom endpoints.

## Fallback Model (config.yaml only)

The primary model fallback is configured exclusively through `config.yaml` — there are no environment variables for it. Add a `fallback_model` section with `provider` and `model` keys to enable automatic failover when your main model encounters errors.

```yaml
fallback_model:
  provider: openrouter
  model: anthropic/claude-sonnet-4
```

See [Fallback Providers](/docs/user-guide/features/fallback-providers) for full details.

## Provider Routing (config.yaml only)

These go in `~/.hermes/config.yaml` under the `provider_routing` section:
@@ -31,7 +31,7 @@ Type `/` in the CLI to open the autocomplete menu. Built-in commands are case-in
| `/title` | Set a title for the current session (usage: `/title My Session Name`) |
| `/compress` | Manually compress conversation context (flush memories + summarize) |
| `/rollback` | List or restore filesystem checkpoints (usage: `/rollback [number]`) |
| `/background <prompt>` | Run a prompt in a separate background session. The agent processes your prompt independently — your current session stays free for other work. Results appear as a panel when the task finishes. See [CLI Background Sessions](/docs/user-guide/cli#background-sessions). |
| `/plan [request]` | Load the bundled `plan` skill to write a markdown plan instead of executing the work. Plans are saved under `.hermes/plans/` relative to the active workspace/backend working directory. |

### Configuration
@@ -109,7 +109,7 @@ The messaging gateway supports the following built-in commands inside Telegram,
| `/reasoning [level\|show\|hide]` | Change reasoning effort or toggle reasoning display. |
| `/voice [on\|off\|tts\|join\|channel\|leave\|status]` | Control spoken replies in chat. `join`/`channel`/`leave` manage Discord voice-channel mode. |
| `/rollback [number]` | List or restore filesystem checkpoints. |
| `/background <prompt>` | Run a prompt in a separate background session. Results are delivered back to the same chat when the task finishes. See [Messaging Background Sessions](/docs/user-guide/messaging/#background-sessions). |
| `/plan [request]` | Load the bundled `plan` skill to write a markdown plan instead of executing the work. Plans are saved under `.hermes/plans/` relative to the active workspace/backend working directory. |
| `/reload-mcp` | Reload MCP servers from config. |
| `/update` | Update Hermes Agent to the latest version. |
@@ -119,6 +119,6 @@ The messaging gateway supports the following built-in commands inside Telegram,
## Notes

- `/skin`, `/tools`, `/toolsets`, `/config`, `/prompt`, `/cron`, `/skills`, `/platforms`, `/paste`, and `/verbose` are **CLI-only** commands.
- `/status`, `/stop`, `/sethome`, `/resume`, and `/update` are **messaging-only** commands.
- `/background`, `/voice`, `/reload-mcp`, and `/rollback` work in **both** the CLI and the messaging gateway.
- `/voice join`, `/voice channel`, and `/voice leave` are only meaningful on Discord.
||||
@@ -259,6 +259,55 @@ compression:
When compression triggers, middle turns are summarized while the first 3 and last 4 turns are always preserved.

## Background Sessions

Run a prompt in a separate background session while continuing to use the CLI for other work:

```
/background Analyze the logs in /var/log and summarize any errors from today
```

Hermes immediately confirms the task and returns you to the prompt:

```
🔄 Background task #1 started: "Analyze the logs in /var/log and summarize..."
   Task ID: bg_143022_a1b2c3
```

### How It Works

Each `/background` prompt spawns a **completely separate agent session** in a daemon thread:

- **Isolated conversation** — the background agent has no knowledge of your current session's history. It receives only the prompt you provide.
- **Same configuration** — the background agent inherits your model, provider, toolsets, reasoning settings, and fallback model from the current session.
- **Non-blocking** — your foreground session stays fully interactive. You can chat, run commands, or even start more background tasks.
- **Multiple tasks** — you can run several background tasks simultaneously. Each gets a numbered ID.
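The daemon-thread pattern described above can be sketched in a few lines. This is an illustrative simplification, not Hermes source code; `run_agent` and `on_done` are hypothetical stand-ins for the real session entry point and the Rich-panel/bell rendering.

```python
# Sketch of the background-task pattern: each task runs a fresh,
# isolated session in a daemon thread while the foreground stays free.
import itertools
import threading

_task_counter = itertools.count(1)  # numbered IDs: #1, #2, ...

def start_background_task(prompt, run_agent, on_done):
    task_num = next(_task_counter)

    def worker():
        result = run_agent(prompt)   # fresh session: sees only this prompt
        on_done(task_num, result)    # e.g. render a result panel, ring bell

    t = threading.Thread(target=worker, daemon=True)
    t.start()                        # returns immediately; non-blocking
    return task_num, t
```

Because the thread is a daemon, unfinished background tasks do not keep the process alive when you exit the CLI.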
### Results

When a background task finishes, the result appears as a panel in your terminal:

```
╭─ ⚕ Hermes (background #1) ──────────────────────────────────╮
│ Found 3 errors in syslog from today:                        │
│ 1. OOM killer invoked at 03:22 — killed process nginx       │
│ 2. Disk I/O error on /dev/sda1 at 07:15                     │
│ 3. Failed SSH login attempts from 192.168.1.50 at 14:30     │
╰──────────────────────────────────────────────────────────────╯
```

If the task fails, you'll see an error notification instead. If `display.bell_on_complete` is enabled in your config, the terminal bell rings when the task finishes.

### Use Cases

- **Long-running research** — `/background research the latest developments in quantum error correction` while you work on code
- **File processing** — `/background analyze all Python files in this repo and list any security issues` while you continue a conversation
- **Parallel investigations** — start multiple background tasks to explore different angles simultaneously

:::info
Background sessions do not appear in your main conversation history. They are standalone sessions with their own task ID (e.g., `bg_143022_a1b2c3`).
:::

## Quiet Mode

By default, the CLI runs in quiet mode, which:
@@ -421,6 +421,26 @@ provider_routing:
**Shortcuts:** Append `:nitro` to any model name for throughput sorting (e.g., `anthropic/claude-sonnet-4:nitro`), or `:floor` for price sorting.

## Fallback Model

Configure a backup provider:model that Hermes switches to automatically when your primary model fails (rate limits, server errors, auth failures):

```yaml
fallback_model:
  provider: openrouter                   # required
  model: anthropic/claude-sonnet-4       # required
  # base_url: http://localhost:8000/v1   # optional, for custom endpoints
  # api_key_env: MY_CUSTOM_KEY           # optional, env var name for custom endpoint API key
```

When activated, the fallback swaps the model and provider mid-session without losing your conversation. It fires **at most once** per session.

Supported providers: `openrouter`, `nous`, `openai-codex`, `anthropic`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `custom`.

:::tip
Fallback is configured exclusively through `config.yaml` — there are no environment variables for it. For full details on when it triggers, supported providers, and how it interacts with auxiliary tasks and delegation, see [Fallback Providers](/docs/user-guide/features/fallback-providers).
:::

## Terminal Backend Configuration

Configure which environment the agent uses for terminal commands:
@@ -733,6 +753,7 @@ display:
  resume_display: full                    # full (show previous messages on resume) | minimal (one-liner only)
  bell_on_complete: false                 # Play terminal bell when agent finishes (great for long tasks)
  show_reasoning: false                   # Show model reasoning/thinking above each response (toggle with /reasoning show|hide)
  background_process_notifications: all   # all | result | error | off (gateway only)
```

| Mode | What you see |
311
website/docs/user-guide/features/fallback-providers.md
Normal file
@@ -0,0 +1,311 @@
---
title: Fallback Providers
description: Configure automatic failover to backup LLM providers when your primary model is unavailable.
sidebar_label: Fallback Providers
sidebar_position: 8
---

# Fallback Providers

Hermes Agent has two separate fallback systems that keep your sessions running when providers hit issues:

1. **Primary model fallback** — automatically switches to a backup provider:model when your main model fails
2. **Auxiliary task fallback** — independent provider resolution for side tasks like vision, compression, and web extraction

Both are optional and work independently.

## Primary Model Fallback

When your main LLM provider encounters errors — rate limits, server overload, auth failures, connection drops — Hermes can automatically switch to a backup provider:model pair mid-session without losing your conversation.

### Configuration

Add a `fallback_model` section to `~/.hermes/config.yaml`:

```yaml
fallback_model:
  provider: openrouter
  model: anthropic/claude-sonnet-4
```

Both `provider` and `model` are **required**. If either is missing, the fallback is disabled.

### Supported Providers

| Provider | Value | Requirements |
|----------|-------|-------------|
| OpenRouter | `openrouter` | `OPENROUTER_API_KEY` |
| Nous Portal | `nous` | `hermes login` (OAuth) |
| OpenAI Codex | `openai-codex` | `hermes model` (ChatGPT OAuth) |
| Anthropic | `anthropic` | `ANTHROPIC_API_KEY` or Claude Code credentials |
| z.ai / GLM | `zai` | `GLM_API_KEY` |
| Kimi / Moonshot | `kimi-coding` | `KIMI_API_KEY` |
| MiniMax | `minimax` | `MINIMAX_API_KEY` |
| MiniMax (China) | `minimax-cn` | `MINIMAX_CN_API_KEY` |
| Custom endpoint | `custom` | `base_url` + `api_key_env` (see below) |

### Custom Endpoint Fallback

For a custom OpenAI-compatible endpoint, add `base_url` and optionally `api_key_env`:

```yaml
fallback_model:
  provider: custom
  model: my-local-model
  base_url: http://localhost:8000/v1
  api_key_env: MY_LOCAL_KEY  # env var name containing the API key
```

### When Fallback Triggers

The fallback activates automatically when the primary model fails with:

- **Rate limits** (HTTP 429) — after exhausting retry attempts
- **Server errors** (HTTP 500, 502, 503) — after exhausting retry attempts
- **Auth failures** (HTTP 401, 403) — immediately (no point retrying)
- **Not found** (HTTP 404) — immediately
- **Invalid responses** — when the API returns malformed or empty responses repeatedly

When triggered, Hermes:

1. Resolves credentials for the fallback provider
2. Builds a new API client
3. Swaps the model, provider, and client in-place
4. Resets the retry counter and continues the conversation
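The trigger conditions above can be condensed into a small policy function. This is a hypothetical sketch (the function name and signature are invented, not Hermes API), but the status-code groupings follow the list above: auth and not-found errors fail over immediately, while rate limits and server errors fail over only once retries are exhausted.

```python
# Sketch of the trigger policy: which HTTP failures fire the fallback
# immediately, and which only after the retry budget is exhausted.
IMMEDIATE = {401, 403, 404}          # auth failures / not found: no point retrying
RETRY_FIRST = {429, 500, 502, 503}   # rate limits / server errors: retry first

def should_try_fallback(status, attempts, max_retries):
    if status in IMMEDIATE:
        return True
    if status in RETRY_FIRST:
        return attempts >= max_retries
    return False  # other errors go through normal error handling
```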
The switch is seamless — your conversation history, tool calls, and context are preserved. The agent continues from exactly where it left off, just using a different model.

:::info One-Shot
Fallback activates **at most once** per session. If the fallback provider also fails, normal error handling takes over (retries, then an error message). This prevents cascading failover loops.
:::

### Examples

**OpenRouter as fallback for native Anthropic:**

```yaml
model:
  provider: anthropic
  default: claude-sonnet-4-6

fallback_model:
  provider: openrouter
  model: anthropic/claude-sonnet-4
```

**Nous Portal as fallback for OpenRouter:**

```yaml
model:
  provider: openrouter
  default: anthropic/claude-opus-4

fallback_model:
  provider: nous
  model: nous-hermes-3
```

**Local model as fallback for cloud:**

```yaml
fallback_model:
  provider: custom
  model: llama-3.1-70b
  base_url: http://localhost:8000/v1
  api_key_env: LOCAL_API_KEY
```

**Codex OAuth as fallback:**

```yaml
fallback_model:
  provider: openai-codex
  model: gpt-5.3-codex
```

### Where Fallback Works

| Context | Fallback Supported |
|---------|-------------------|
| CLI sessions | ✔ |
| Messaging gateway (Telegram, Discord, etc.) | ✔ |
| Subagent delegation | ✘ (subagents do not inherit the fallback config) |
| Cron jobs | ✘ (run with a fixed provider) |
| Auxiliary tasks (vision, compression) | ✘ (use their own provider chain — see below) |

:::tip
There are no environment variables for `fallback_model` — it is configured exclusively through `config.yaml`. This is intentional: fallback configuration is a deliberate choice, not something a stale shell export should override.
:::

---

## Auxiliary Task Fallback

Hermes uses separate lightweight models for side tasks. Each task has its own provider resolution chain that acts as a built-in fallback system.

### Tasks with Independent Provider Resolution

| Task | What It Does | Config Key |
|------|-------------|-----------|
| Vision | Image analysis, browser screenshots | `auxiliary.vision` |
| Web Extract | Web page summarization | `auxiliary.web_extract` |
| Compression | Context compression summaries | `auxiliary.compression` or `compression.summary_provider` |
| Session Search | Past session summarization | `auxiliary.session_search` |
| Skills Hub | Skill search and discovery | `auxiliary.skills_hub` |
| MCP | MCP helper operations | `auxiliary.mcp` |
| Memory Flush | Memory consolidation | `auxiliary.flush_memories` |

### Auto-Detection Chain

When a task's provider is set to `"auto"` (the default), Hermes tries providers in order until one works:

**For text tasks (compression, web extract, etc.):**

```text
OpenRouter → Nous Portal → Custom endpoint → Codex OAuth →
API-key providers (z.ai, Kimi, MiniMax, Anthropic) → give up
```

**For vision tasks:**

```text
Main provider (if vision-capable) → OpenRouter → Nous Portal →
Codex OAuth → Anthropic → Custom endpoint → give up
```

If the resolved provider fails at call time, Hermes also has an internal retry: when the provider is not OpenRouter and no explicit `base_url` is set, it tries OpenRouter as a last-resort fallback.
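A minimal sketch of what "try providers in order until one works" means, assuming a set of provider names whose credentials resolved successfully. The chain contents mirror the text-task diagram above; the function name and the availability-set representation are invented for illustration, not Hermes' real detection code.

```python
# Walk a candidate chain and return the first provider with working
# credentials; return None when the whole chain is exhausted ("give up").
TEXT_CHAIN = ["openrouter", "nous", "custom", "codex",
              "zai", "kimi-coding", "minimax", "anthropic"]

def resolve_auto(available, chain=TEXT_CHAIN):
    """available: set of provider names that have usable credentials."""
    for provider in chain:
        if provider in available:
            return provider
    return None
```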
### Configuring Auxiliary Providers

Each task can be configured independently in `config.yaml`:

```yaml
auxiliary:
  vision:
    provider: "auto"   # auto | openrouter | nous | codex | main | anthropic
    model: ""          # e.g. "openai/gpt-4o"
    base_url: ""       # direct endpoint (takes precedence over provider)
    api_key: ""        # API key for base_url

  web_extract:
    provider: "auto"
    model: ""

  compression:
    provider: "auto"
    model: ""

  session_search:
    provider: "auto"
    model: ""

  skills_hub:
    provider: "auto"
    model: ""

  mcp:
    provider: "auto"
    model: ""

  flush_memories:
    provider: "auto"
    model: ""
```

Or via environment variables:

```bash
AUXILIARY_VISION_PROVIDER=openrouter
AUXILIARY_VISION_MODEL=openai/gpt-4o
AUXILIARY_WEB_EXTRACT_PROVIDER=nous
CONTEXT_COMPRESSION_PROVIDER=main
CONTEXT_COMPRESSION_MODEL=google/gemini-3-flash-preview
```

### Provider Options for Auxiliary Tasks

| Provider | Description | Requirements |
|----------|-------------|-------------|
| `"auto"` | Try providers in order until one works (default) | At least one provider configured |
| `"openrouter"` | Force OpenRouter | `OPENROUTER_API_KEY` |
| `"nous"` | Force Nous Portal | `hermes login` |
| `"codex"` | Force Codex OAuth | `hermes model` → Codex |
| `"main"` | Use whatever provider the main agent uses | Active main provider configured |
| `"anthropic"` | Force native Anthropic | `ANTHROPIC_API_KEY` or Claude Code credentials |

### Direct Endpoint Override

For any auxiliary task, setting `base_url` bypasses provider resolution entirely and sends requests directly to that endpoint:

```yaml
auxiliary:
  vision:
    base_url: "http://localhost:1234/v1"
    api_key: "local-key"
    model: "qwen2.5-vl"
```

`base_url` takes precedence over `provider`. Hermes uses the configured `api_key` for authentication, falling back to `OPENAI_API_KEY` if it is not set. It does **not** reuse `OPENROUTER_API_KEY` for custom endpoints.

---

## Context Compression Fallback

Context compression has a legacy configuration path in addition to the auxiliary system:

```yaml
compression:
  summary_provider: "auto"   # auto | openrouter | nous | main
  summary_model: "google/gemini-3-flash-preview"
```

This is equivalent to configuring `auxiliary.compression.provider` and `auxiliary.compression.model`. If both are set, the `auxiliary.compression` values take precedence.

If no provider is available for compression, Hermes drops middle conversation turns without generating a summary rather than failing the session.
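The precedence rule above can be expressed as a short resolution helper. This is an illustrative sketch, not Hermes source; it only encodes the stated rule that `auxiliary.compression` values win over the legacy `compression.*` keys when both are present.

```python
# Resolve the compression provider/model with auxiliary.compression
# taking precedence over the legacy compression.summary_* keys.
def resolve_compression(config):
    aux = config.get("auxiliary", {}).get("compression", {})
    legacy = config.get("compression", {})
    provider = aux.get("provider") or legacy.get("summary_provider") or "auto"
    model = aux.get("model") or legacy.get("summary_model") or ""
    return provider, model
```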
---

## Delegation Provider Override

Subagents spawned by `delegate_task` do **not** use the primary fallback model. However, they can be routed to a different provider:model pair for cost optimization:

```yaml
delegation:
  provider: "openrouter"                   # override provider for all subagents
  model: "google/gemini-3-flash-preview"   # override model
  # base_url: "http://localhost:1234/v1"   # or use a direct endpoint
  # api_key: "local-key"
```

See [Subagent Delegation](/docs/user-guide/features/delegation) for full configuration details.

---

## Cron Job Providers

Cron jobs run with whatever provider is configured at execution time. They do not support a fallback model. To use a different provider for cron jobs, configure `provider` and `model` overrides on the cron job itself:

```python
cronjob(
    action="create",
    schedule="every 2h",
    prompt="Check server status",
    provider="openrouter",
    model="google/gemini-3-flash-preview"
)
```

See [Scheduled Tasks (Cron)](/docs/user-guide/features/cron) for full configuration details.

---

## Summary

| Feature | Fallback Mechanism | Config Location |
|---------|-------------------|----------------|
| Main agent model | `fallback_model` in config.yaml — one-shot failover on errors | `fallback_model:` (top-level) |
| Vision | Auto-detection chain + internal OpenRouter retry | `auxiliary.vision` |
| Web extraction | Auto-detection chain + internal OpenRouter retry | `auxiliary.web_extract` |
| Context compression | Auto-detection chain, degrades to no-summary if unavailable | `auxiliary.compression` or `compression.summary_provider` |
| Session search | Auto-detection chain | `auxiliary.session_search` |
| Skills hub | Auto-detection chain | `auxiliary.skills_hub` |
| MCP helpers | Auto-detection chain | `auxiliary.mcp` |
| Memory flush | Auto-detection chain | `auxiliary.flush_memories` |
| Delegation | Provider override only (no automatic fallback) | `delegation.provider` / `delegation.model` |
| Cron jobs | Per-job provider override only (no automatic fallback) | Per-job `provider` / `model` |
@@ -194,3 +194,7 @@ provider_routing:
## Default Behavior

When no `provider_routing` section is configured (the default), OpenRouter uses its own default routing logic, which generally balances cost and availability automatically.

:::tip Provider Routing vs. Fallback Models
Provider routing controls which **sub-providers within OpenRouter** handle your requests. For automatic failover to an entirely different provider when your primary model fails, see [Fallback Providers](/docs/user-guide/features/fallback-providers).
:::
@@ -181,6 +181,63 @@ When enabled, the bot sends status messages as it works:
🐍 execute_code...
```

## Background Sessions

Run a prompt in a separate background session so the agent works on it independently while your main chat stays responsive:

```
/background Check all servers in the cluster and report any that are down
```

Hermes confirms immediately:

```
🔄 Background task started: "Check all servers in the cluster..."
   Task ID: bg_143022_a1b2c3
```

### How It Works

Each `/background` prompt spawns a **separate agent instance** that runs asynchronously:

- **Isolated session** — the background agent has its own session with its own conversation history. It has no knowledge of your current chat context and receives only the prompt you provide.
- **Same configuration** — inherits your model, provider, toolsets, reasoning settings, and provider routing from the current gateway setup.
- **Non-blocking** — your main chat stays fully interactive. Send messages, run other commands, or start more background tasks while it works.
- **Result delivery** — when the task finishes, the result is sent back to the **same chat or channel** where you issued the command, prefixed with "✅ Background task complete". If it fails, you'll see "❌ Background task failed" with the error.

### Background Process Notifications

When the agent running a background session uses `terminal(background=true)` to start long-running processes (servers, builds, etc.), the gateway can push status updates to your chat. Control this with `display.background_process_notifications` in `~/.hermes/config.yaml`:

```yaml
display:
  background_process_notifications: all   # all | result | error | off
```

| Mode | What you receive |
|------|-----------------|
| `all` | Running-output updates **and** the final completion message (default) |
| `result` | Only the final completion message (regardless of exit code) |
| `error` | Only the final message when the exit code is non-zero |
| `off` | No process watcher messages at all |

You can also set this via environment variable:

```bash
HERMES_BACKGROUND_NOTIFICATIONS=result
```
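The mode table above can be read as a simple filter. A hypothetical sketch, not the gateway's actual code; the function name and parameters are invented, but the semantics follow the table row by row.

```python
# Decide whether a process-watcher message should reach the chat,
# given the notification mode, whether this is the final message,
# and the process exit code.
def should_notify(mode, is_final, exit_code=0):
    if mode == "off":
        return False              # no watcher messages at all
    if mode == "all":
        return True               # running updates and final message
    if mode == "result":
        return is_final           # only the final message, any exit code
    if mode == "error":
        return is_final and exit_code != 0  # only failing final messages
    return False
```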
### Use Cases

- **Server monitoring** — `/background Check the health of all services and alert me if anything is down`
- **Long builds** — `/background Build and deploy the staging environment` while you continue chatting
- **Research tasks** — `/background Research competitor pricing and summarize in a table`
- **File operations** — `/background Organize the photos in ~/Downloads by date into folders`

:::tip
Background tasks on messaging platforms are fire-and-forget — you don't need to wait or check on them. Results arrive in the same chat automatically when the task finishes.
:::

## Service Management

### Linux (systemd)
@@ -91,6 +91,7 @@ const sidebars: SidebarsConfig = {
'user-guide/features/mcp',
'user-guide/features/honcho',
'user-guide/features/provider-routing',
'user-guide/features/fallback-providers',
],
},
{