---
sidebar_position: 4
title: Provider Runtime Resolution
description: How Hermes resolves providers, credentials, API modes, and auxiliary models at runtime
---
# Provider Runtime Resolution
Hermes has a shared provider runtime resolver used across:
- CLI
- gateway
- cron jobs
- ACP
- auxiliary model calls
Primary implementation:
- `hermes_cli/runtime_provider.py`
- `hermes_cli/auth.py`
- `agent/auxiliary_client.py`
If you are trying to add a new first-class inference provider, read Adding Providers alongside this page.
## Resolution precedence
At a high level, provider resolution follows this precedence, highest first:
- explicit CLI/runtime request
- `config.yaml` model/provider config
- environment variables
- provider-specific defaults or auto resolution
That ordering matters because Hermes treats the saved model/provider choice as the source of truth for normal runs. This prevents a stale shell export from silently overriding the endpoint a user last selected in `hermes model`.
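The precedence above can be sketched roughly as follows. This is an illustrative sketch, not the actual Hermes API: the function name, the `HERMES_PROVIDER` variable, and the return shape are all assumptions for demonstration.

```python
import os

def resolve_provider(cli_request=None, config=None, env=None, default="openrouter"):
    """Illustrative precedence: CLI request > config.yaml > environment > default.

    The saved config wins over the environment, so a stale shell export
    cannot override the provider last chosen via `hermes model`.
    """
    env = os.environ if env is None else env
    if cli_request:  # explicit CLI/runtime request
        return cli_request, "cli"
    if config and config.get("provider"):  # saved config.yaml choice
        return config["provider"], "config"
    if env.get("HERMES_PROVIDER"):  # hypothetical env override, lowest user-set tier
        return env["HERMES_PROVIDER"], "env"
    return default, "default"
```

Note how a saved config choice beats an environment variable: `resolve_provider(config={"provider": "anthropic"}, env={"HERMES_PROVIDER": "openrouter"})` returns the Anthropic choice with source `"config"`.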
## Providers
Current provider families include:
- OpenRouter
- Nous Portal
- OpenAI Codex
- Anthropic (native)
- Z.AI
- Kimi / Moonshot
- MiniMax
- MiniMax China
- custom OpenAI-compatible endpoints
## Output of runtime resolution
The runtime resolver returns data such as:
- `provider`
- `api_mode`
- `base_url`
- `api_key`
- `source`
- provider-specific metadata like expiry/refresh info
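The resolver's result can be pictured as a small record. This is a sketch only; the real fields and their types live in `hermes_cli/runtime_provider.py`, and the example values (URL, key, expiry) are made up.

```python
from dataclasses import dataclass, field

@dataclass
class ResolvedRuntime:
    """Illustrative shape of a provider runtime resolution result."""
    provider: str   # e.g. "anthropic", "openrouter"
    api_mode: str   # e.g. "anthropic_messages", "codex_responses"
    base_url: str   # endpoint the client should talk to
    api_key: str    # credential selected for that endpoint
    source: str     # where the choice came from: cli / config / env / default
    metadata: dict = field(default_factory=dict)  # e.g. token expiry/refresh info

# Hypothetical resolved runtime for a native Anthropic session:
rt = ResolvedRuntime(
    provider="anthropic",
    api_mode="anthropic_messages",
    base_url="https://api.anthropic.com",
    api_key="sk-...",
    source="config",
    metadata={"expires_at": 1735689600},
)
```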
## Why this matters
This resolver is the main reason Hermes can share auth/runtime logic between:
- `hermes chat`
- gateway message handling
- cron jobs running in fresh sessions
- ACP editor sessions
- auxiliary model tasks
## OpenRouter vs custom OpenAI-compatible base URLs
Hermes contains logic to avoid leaking the wrong API key to a custom endpoint when both `OPENROUTER_API_KEY` and `OPENAI_API_KEY` exist.
It also distinguishes between:
- a real custom endpoint selected by the user
- the OpenRouter fallback path used when no custom endpoint is configured
That distinction is especially important for:
- local model servers
- non-OpenRouter OpenAI-compatible APIs
- switching providers without re-running setup
- config-saved custom endpoints that should keep working even when `OPENAI_BASE_URL` is not exported in the current shell
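The key-selection rule can be sketched like this. The function and its `custom_endpoint_saved` flag are illustrative assumptions; the point is that a real custom endpoint should never receive the OpenRouter key.

```python
OPENROUTER_URL = "https://openrouter.ai/api/v1"

def select_openai_compat_key(base_url, env, custom_endpoint_saved):
    """Illustrative sketch: never send OPENROUTER_API_KEY to a custom endpoint.

    `custom_endpoint_saved` marks a real user-selected custom endpoint
    (saved via `hermes model` / config.yaml), as opposed to the OpenRouter
    fallback path used when no custom endpoint is configured.
    """
    if custom_endpoint_saved or (base_url and base_url != OPENROUTER_URL):
        # Real custom endpoint: only the generic OpenAI-compatible key applies.
        return env.get("OPENAI_API_KEY")
    # OpenRouter fallback path: prefer the OpenRouter key.
    return env.get("OPENROUTER_API_KEY") or env.get("OPENAI_API_KEY")
```

Because the decision keys off the saved-endpoint flag rather than only the shell environment, a config-saved local server keeps getting `OPENAI_API_KEY` even when `OPENAI_BASE_URL` is not exported in the current shell.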
## Native Anthropic path
Anthropic is not just "via OpenRouter" anymore.
When provider resolution selects `anthropic`, Hermes uses:
- `api_mode = anthropic_messages`, the native Anthropic Messages API
- `agent/anthropic_adapter.py` for translation
Credential resolution for native Anthropic now prefers refreshable Claude Code credentials over copied env tokens when both are present. In practice that means:
- Claude Code credential files are treated as the preferred source when they include refreshable auth
- manual `ANTHROPIC_TOKEN` / `CLAUDE_CODE_OAUTH_TOKEN` values still work as explicit overrides
- Hermes preflights Anthropic credential refresh before native Messages API calls
- Hermes still retries once on a 401 after rebuilding the Anthropic client, as a fallback path
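The two behaviors above, credential preference and the single 401 retry, can be sketched as follows. Everything here is illustrative: the function names, the credential dict shape, and the use of `PermissionError` as a stand-in for an HTTP 401 are assumptions, not the real Hermes code.

```python
def resolve_anthropic_credentials(claude_code_creds, env):
    """Illustrative preference order for native Anthropic auth.

    Refreshable Claude Code credential files win over copied env tokens,
    while explicit ANTHROPIC_TOKEN / CLAUDE_CODE_OAUTH_TOKEN values still
    apply when no refreshable file credentials exist.
    """
    if claude_code_creds and claude_code_creds.get("refresh_token"):
        return claude_code_creds, "claude_code_file"
    explicit = env.get("ANTHROPIC_TOKEN") or env.get("CLAUDE_CODE_OAUTH_TOKEN")
    if explicit:
        return {"access_token": explicit}, "env_override"
    return None, "none"

def call_with_retry(send, rebuild_client):
    """Sketch of the fallback path: on a 401, rebuild the client once and retry."""
    try:
        return send()
    except PermissionError:  # stand-in for an HTTP 401 from the Messages API
        rebuild_client()
        return send()
```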
## OpenAI Codex path
Codex uses a separate Responses API path:
- `api_mode = codex_responses`
- dedicated credential resolution and auth store support
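One way to picture how `api_mode` separates these paths is a simple dispatch table. This is purely illustrative; only `anthropic_messages` and `codex_responses` come from this page, and the generic OpenAI-compatible mode name is an assumption.

```python
def transport_for(api_mode):
    """Illustrative dispatch from api_mode to a request path."""
    paths = {
        "openai_chat": "OpenAI-compatible Chat Completions",  # assumed mode name
        "anthropic_messages": "native Anthropic Messages API",
        "codex_responses": "OpenAI Responses API (Codex)",
    }
    if api_mode not in paths:
        raise ValueError(f"unknown api_mode: {api_mode}")
    return paths[api_mode]
```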
## Auxiliary model routing
Auxiliary tasks such as:
- vision
- web extraction summarization
- context compression summaries
- session search summarization
- skills hub operations
- MCP helper operations
- memory flushes
can use their own provider/model routing rather than the main conversational model.
When an auxiliary task is configured with provider `main`, Hermes resolves it through the same shared runtime path as normal chat. In practice that means:
- env-driven custom endpoints still work
- custom endpoints saved via `hermes model` / `config.yaml` also work
- auxiliary routing can tell the difference between a real saved custom endpoint and the OpenRouter fallback
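The `main` sentinel behavior can be sketched like this, assuming a per-task config dict with an optional provider override (the names and shapes are illustrative):

```python
def resolve_aux_provider(task_config, main_runtime, resolve_custom):
    """Illustrative auxiliary routing.

    provider == "main" (also the default) reuses the shared runtime
    resolution of the main conversational model; any other value
    resolves its own provider/model independently.
    """
    provider = task_config.get("provider", "main")
    if provider == "main":
        return main_runtime  # same shared path as normal chat
    return resolve_custom(provider, task_config.get("model"))
```

Because the `main` branch goes through the same shared resolver, env-driven and config-saved custom endpoints behave identically for auxiliary tasks and for chat.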
## Fallback models
Hermes also supports a configured fallback model/provider, allowing runtime failover in supported error paths.
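The failover idea can be sketched as a guarded retry: only error types in a supported set trigger the fallback model, everything else propagates. The error classes used here are illustrative stand-ins, not the actual set Hermes checks.

```python
# Stand-in set of error types where failover is supported (illustrative).
RETRYABLE = (TimeoutError, ConnectionError)

def call_with_fallback(primary_call, fallback_call, retryable=RETRYABLE):
    """Illustrative runtime failover to a configured fallback model.

    The fallback is tried only for supported error paths; other
    exceptions propagate unchanged.
    """
    try:
        return primary_call()
    except retryable:
        return fallback_call()
```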