hermes-agent/website/docs/developer-guide/provider-runtime.md
Teknium 463239ed85 docs: fallback providers + /background command documentation (2026-03-15)

---
sidebar_position: 4
title: Provider Runtime Resolution
description: How Hermes resolves providers, credentials, API modes, and auxiliary models at runtime
---

# Provider Runtime Resolution

Hermes has a shared provider runtime resolver used across:

- the CLI
- the gateway
- cron jobs
- ACP
- auxiliary model calls

Primary implementation:

- `hermes_cli/runtime_provider.py`
- `hermes_cli/auth.py`
- `agent/auxiliary_client.py`

If you are adding a new first-class inference provider, read Adding Providers alongside this page.

## Resolution precedence

At a high level, provider resolution uses:

1. explicit CLI/runtime request
2. `config.yaml` model/provider config
3. environment variables
4. provider-specific defaults or auto resolution

That ordering matters because Hermes treats the saved model/provider choice as the source of truth for normal runs. This prevents a stale shell export from silently overriding the endpoint a user last selected via `hermes model`.
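The precedence chain can be illustrated with a minimal sketch. This is not Hermes' actual API; `resolve_provider` and its parameters are hypothetical, and the real resolver returns richer data:

```python
def resolve_provider(explicit=None, config=None, env=None):
    """Illustrative precedence chain (hypothetical names, not Hermes' API):
    explicit request > config.yaml > environment > provider default."""
    env = env or {}
    # 1. An explicit CLI/runtime request wins outright.
    if explicit:
        return explicit, "explicit"
    # 2. The saved config.yaml selection is the source of truth for normal
    #    runs, so it is consulted before any environment variable.
    if config and config.get("provider"):
        return config["provider"], "config"
    # 3. Environment variables come after the saved config; a stale shell
    #    export therefore cannot shadow the user's last `hermes model` choice.
    if env.get("OPENROUTER_API_KEY"):
        return "openrouter", "env"
    # 4. Provider-specific default / auto resolution.
    return "openrouter", "default"
```

Note that step 2 beating step 3 is exactly what keeps a leftover `OPENROUTER_API_KEY` export from overriding a saved provider choice.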

## Providers

Current provider families include:

- OpenRouter
- Nous Portal
- OpenAI Codex
- Anthropic (native)
- Z.AI
- Kimi / Moonshot
- MiniMax
- MiniMax China
- custom OpenAI-compatible endpoints

## Output of runtime resolution

The runtime resolver returns data such as:

- `provider`
- `api_mode`
- `base_url`
- `api_key`
- `source`
- provider-specific metadata such as expiry/refresh info
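The fields above can be pictured as a small record. The class name and exact field types here are assumptions for illustration, not the resolver's real return type:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ResolvedRuntime:
    """Hypothetical shape of the resolver's result (names assumed)."""
    provider: str            # e.g. "openrouter", "anthropic", "openai-codex"
    api_mode: str            # "chat_completions", "anthropic_messages", "codex_responses"
    base_url: str
    api_key: Optional[str]   # may be absent for token-refresh flows
    source: str              # where credentials came from, e.g. "config", "env"
    metadata: dict = field(default_factory=dict)  # provider-specific, e.g. expiry/refresh info

# Example: a native Anthropic resolution backed by refreshable credentials.
rt = ResolvedRuntime(
    provider="anthropic",
    api_mode="anthropic_messages",
    base_url="https://api.anthropic.com",
    api_key=None,
    source="claude_code_credentials",
)
```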

## Why this matters

This resolver is the main reason Hermes can share auth/runtime logic between:

- `hermes chat`
- gateway message handling
- cron jobs running in fresh sessions
- ACP editor sessions
- auxiliary model tasks

## OpenRouter vs custom OpenAI-compatible base URLs

Hermes contains logic to avoid leaking the wrong API key to a custom endpoint when both `OPENROUTER_API_KEY` and `OPENAI_API_KEY` exist.

It also distinguishes between:

- a real custom endpoint selected by the user
- the OpenRouter fallback path used when no custom endpoint is configured

That distinction is especially important for:

- local model servers
- non-OpenRouter OpenAI-compatible APIs
- switching providers without re-running setup
- config-saved custom endpoints that should keep working even when `OPENAI_BASE_URL` is not exported in the current shell
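A simplified sketch of that distinction, with a hypothetical `select_endpoint_and_key` helper (the real logic lives in the runtime resolver and is more involved):

```python
def select_endpoint_and_key(saved_base_url, env):
    """Sketch (hypothetical helper): a custom endpoint saved in config keeps
    working even when OPENAI_BASE_URL is not exported, and the OpenRouter
    key is never sent to a custom endpoint."""
    if saved_base_url:
        # Real custom endpoint chosen by the user and saved in config.yaml;
        # only the generic OpenAI-style key is ever sent there.
        return saved_base_url, env.get("OPENAI_API_KEY"), "custom"
    if env.get("OPENAI_BASE_URL"):
        # Env-driven custom endpoint for the current shell session.
        return env["OPENAI_BASE_URL"], env.get("OPENAI_API_KEY"), "custom"
    # OpenRouter fallback path: no custom endpoint configured anywhere.
    return "https://openrouter.ai/api/v1", env.get("OPENROUTER_API_KEY"), "openrouter"
```

The key point is that both branches that return `"custom"` deliberately ignore `OPENROUTER_API_KEY`, even when it is set.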

## Native Anthropic path

Anthropic is no longer reachable only "via OpenRouter".

When provider resolution selects `anthropic`, Hermes uses:

- `api_mode = anthropic_messages`
- the native Anthropic Messages API
- `agent/anthropic_adapter.py` for request/response translation

Credential resolution for native Anthropic prefers refreshable Claude Code credentials over copied env tokens when both are present. In practice that means:

- Claude Code credential files are treated as the preferred source when they include refreshable auth
- manual `ANTHROPIC_TOKEN` / `CLAUDE_CODE_OAUTH_TOKEN` values still work as explicit overrides
- Hermes preflights an Anthropic credential refresh before native Messages API calls
- Hermes still retries once on a 401 after rebuilding the Anthropic client, as a fallback path
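The preference order described above can be sketched as follows. The function name and the credential-dict shape are assumptions for illustration only:

```python
def resolve_anthropic_credentials(claude_code_creds, env):
    """Sketch (hypothetical names): refreshable Claude Code credentials are
    preferred over copied env tokens when both are present; env tokens still
    resolve when they are the only source available."""
    if claude_code_creds and claude_code_creds.get("refresh_token"):
        # Preferred source: a credential file that carries refreshable auth,
        # so Hermes can preflight a refresh before Messages API calls.
        return claude_code_creds["access_token"], "claude_code"
    token = env.get("ANTHROPIC_TOKEN") or env.get("CLAUDE_CODE_OAUTH_TOKEN")
    if token:
        return token, "env"
    return None, "unresolved"
```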

## OpenAI Codex path

Codex uses a separate Responses API path:

- `api_mode = codex_responses`
- dedicated credential resolution and auth store support

## Auxiliary model routing

Auxiliary tasks such as:

- vision
- web extraction summarization
- context compression summaries
- session search summarization
- skills hub operations
- MCP helper operations
- memory flushes

can use their own provider/model routing rather than the main conversational model.

When an auxiliary task is configured with provider `main`, Hermes resolves it through the same shared runtime path as normal chat. In practice that means:

- env-driven custom endpoints still work
- custom endpoints saved via `hermes model` / `config.yaml` also work
- auxiliary routing can tell the difference between a real saved custom endpoint and the OpenRouter fallback
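A minimal sketch of the `main` delegation, with hypothetical names (`resolve_auxiliary`, the model string) that do not correspond to Hermes' actual identifiers:

```python
def resolve_auxiliary(task_cfg, resolve_main):
    """Sketch (hypothetical names): an auxiliary task whose provider is
    'main' defers to the shared runtime resolver used by normal chat, so
    saved custom endpoints behave identically in both paths."""
    if task_cfg.get("provider") == "main":
        return resolve_main()  # same shared resolution path as `hermes chat`
    return task_cfg["provider"], task_cfg.get("model")

# Pretend the main chat resolution yielded this provider/model pair:
main_runtime = ("openrouter", "hermes-4-405b")
resolved = resolve_auxiliary({"provider": "main"}, lambda: main_runtime)
```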

## Fallback models

Hermes supports a configured fallback model/provider pair, allowing runtime failover when the primary model encounters errors.

### How it works internally

1. Storage: `AIAgent.__init__` stores the `fallback_model` dict and sets `_fallback_activated = False`.

2. Trigger points: `_try_activate_fallback()` is called from three places in the main retry loop in `run_agent.py`:

   - after max retries on invalid API responses (`None` choices, missing content)
   - on non-retryable client errors (HTTP 401, 403, 404)
   - after max retries on transient errors (HTTP 429, 500, 502, 503)

3. Activation flow (`_try_activate_fallback`):

   - returns `False` immediately if already activated or not configured
   - calls `resolve_provider_client()` from `auxiliary_client.py` to build a new client with proper auth
   - determines `api_mode`: `codex_responses` for `openai-codex`, `anthropic_messages` for `anthropic`, `chat_completions` for everything else
   - swaps in place: `self.model`, `self.provider`, `self.base_url`, `self.api_mode`, `self.client`, `self._client_kwargs`
   - for an `anthropic` fallback, builds a native Anthropic client instead of an OpenAI-compatible one
   - re-evaluates prompt caching (enabled for Claude models on OpenRouter)
   - sets `_fallback_activated = True`, preventing it from firing again
   - resets the retry count to 0 and continues the loop

4. Config flow:

   - CLI: `cli.py` reads `CLI_CONFIG["fallback_model"]` and passes it to `AIAgent(fallback_model=...)`
   - Gateway: `_load_fallback_model()` in `gateway/run.py` reads `config.yaml` and passes it to `AIAgent`
   - Validation: both `provider` and `model` keys must be non-empty, or fallback is disabled
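The one-shot semantics above can be condensed into a sketch. This is a simplification, not the real implementation (which lives in `run_agent.py` and builds clients via `resolve_provider_client()`); the `FallbackState` class is invented for illustration:

```python
class FallbackState:
    """Sketch of one-shot fallback activation (hypothetical class)."""

    def __init__(self, fallback_model):
        # Validation: disabled unless both keys are present and non-empty.
        ok = bool(fallback_model
                  and fallback_model.get("provider")
                  and fallback_model.get("model"))
        self.fallback_model = fallback_model if ok else None
        self.activated = False

    def api_mode_for(self, provider):
        # Mirrors the api_mode mapping described in step 3 above.
        if provider == "openai-codex":
            return "codex_responses"
        if provider == "anthropic":
            return "anthropic_messages"
        return "chat_completions"

    def try_activate(self):
        # Fires at most once; False if already used or not configured.
        if self.activated or not self.fallback_model:
            return False
        # Swap in place (the real code also rebuilds the client and
        # re-evaluates prompt caching here).
        self.provider = self.fallback_model["provider"]
        self.model = self.fallback_model["model"]
        self.api_mode = self.api_mode_for(self.provider)
        self.activated = True   # prevents firing again
        self.retry_count = 0    # retry budget resets for the new model
        return True
```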

## What does NOT support fallback

- Subagent delegation (`tools/delegate_tool.py`): subagents inherit the parent's provider but not the fallback config
- Cron jobs (`cron/`): run with a fixed provider and no fallback mechanism
- Auxiliary tasks: use their own independent provider auto-detection chain (see Auxiliary model routing above)

## Test coverage

See `tests/test_fallback_model.py` for comprehensive tests covering all supported providers, one-shot semantics, and edge cases.