Co-authored-by: Google Gemini <gemini@hermes.local> Co-committed-by: Google Gemini <gemini@hermes.local>
Hermes Agent Provider Fallback Chain
Hermes Agent includes a provider fallback mechanism that keeps the agent running through inference provider outages. When the primary large language model (LLM) provider fails, the agent switches to an alternative provider. Once the failure clears, it tries to move back to higher-priority providers.
Key Concepts
- Primary Provider (`_primary_snapshot`): The initial, preferred LLM provider configured for the agent. Hermes Agent always tries this provider first and returns to it whenever possible.
- Fallback Chain (`_fallback_chain`): An ordered list of alternative provider configurations. Each entry is a dictionary specifying a backup `provider` and `model` (e.g., `{"provider": "kimi-coding", "model": "kimi-k2.5"}`). The order of the list denotes priority: earlier entries have higher priority.
- Fallback Chain Index (`_fallback_chain_index`): An internal pointer that tracks the currently active provider within the fallback system.
  - `-1`: The primary provider is active (the initial state, or after a successful recovery to primary).
  - `0` to `N-1`: Corresponds to the `N` entries in the `_fallback_chain` list.
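The three pieces of state above fit together as follows. This is a minimal sketch, assuming plain dictionaries for provider configurations; `FallbackState`, `on_primary`, and `active_config` are hypothetical names for illustration, not the actual Hermes Agent class.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Config = Dict[str, str]


@dataclass
class FallbackState:
    """Hypothetical container for the three attributes described above."""

    primary_snapshot: Config
    fallback_chain: List[Config] = field(default_factory=list)
    fallback_chain_index: int = -1  # -1 means the primary provider is active

    def on_primary(self) -> bool:
        return self.fallback_chain_index == -1

    def active_config(self) -> Config:
        """Return the provider config that the index currently points at."""
        if self.on_primary():
            return self.primary_snapshot
        return self.fallback_chain[self.fallback_chain_index]


state = FallbackState(
    primary_snapshot={"provider": "openrouter", "model": "anthropic/claude-sonnet-4.6"},
    fallback_chain=[
        {"provider": "groq", "model": "llama-3.3-70b-versatile"},
        {"provider": "kimi-coding", "model": "kimi-k2.5"},
    ],
)
print(state.active_config()["provider"])  # openrouter
state.fallback_chain_index = 1  # second fallback entry
print(state.active_config()["provider"])  # kimi-coding
```

Using `-1` as a sentinel lets a single integer describe every possible active provider, so moving up or down the chain is just a matter of adjusting the index.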
Mechanism Overview
The provider fallback system operates through two main processes: cascading down the chain upon failure and recovering up the chain when conditions improve.
1. Cascading Down on Failure (_try_activate_fallback)
When the currently active LLM provider consistently fails after a series of retries (e.g., due to rate limits, API errors, or unavailability), the `_try_activate_fallback` method is invoked.
- Process:
  - It iterates sequentially through the `_fallback_chain` list, starting from the entry after the current `_fallback_chain_index`.
  - For each fallback entry, it attempts to activate the provider using the `_activate_provider` helper.
  - If a provider is successfully activated (its credentials can be resolved and a client can be created), that provider becomes the agent's new active inference provider, and the method returns `True`.
  - If every provider in `_fallback_chain` has been tried and none can be activated, a warning is logged and the method returns `False`, indicating that the agent has exhausted all available fallback options.
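The cascade can be sketched as a single loop. This is a minimal sketch, assuming the activation step can be modeled as a callable that returns `True` on success; `try_activate_fallback` here is a hypothetical free function, and its `activate` parameter only stands in for `_activate_provider`. Returning the index unchanged on exhaustion is an assumption of the sketch.

```python
import logging
from typing import Callable, Dict, List, Tuple

Config = Dict[str, str]
logger = logging.getLogger("fallback_sketch")


def try_activate_fallback(
    chain: List[Config],
    current_index: int,
    activate: Callable[[Config], bool],
) -> Tuple[bool, int]:
    """Cascade down the chain, starting after current_index.

    `activate` is a hypothetical stand-in for `_activate_provider`; it
    returns True when credentials resolve and a client can be created.
    Returns (activated, index); on exhaustion the index is left unchanged.
    """
    for index in range(current_index + 1, len(chain)):
        if activate(chain[index]):
            return True, index
    logger.warning("All fallback providers exhausted")
    return False, current_index


chain = [
    {"provider": "groq", "model": "llama-3.3-70b-versatile"},
    {"provider": "kimi-coding", "model": "kimi-k2.5"},
    {"provider": "custom", "model": "qwen3.5:latest"},
]
reachable = {"kimi-coding", "custom"}  # pretend groq credentials are missing
print(try_activate_fallback(chain, -1, lambda fb: fb["provider"] in reachable))
# (True, 1)
print(try_activate_fallback(chain, 2, lambda fb: True))
# (False, 2): nothing is left below the last entry
```

Starting the loop at `current_index + 1` means that a failure on the primary (`-1`) begins at the first fallback, while a failure on a fallback never retries entries that are higher in priority.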
2. Recovering Up the Chain (_try_recover_up)
To keep the agent on the highest-priority provider available, `_try_recover_up` is called periodically, after a configurable number of successful API responses (`_RECOVERY_INTERVAL`).
- Process:
  - If the agent is currently using a fallback provider (i.e., `_fallback_chain_index >= 0`), it attempts to probe the provider one level higher in priority (closer to the primary provider).
  - If that target is the original primary provider (i.e., the current index is `0`), it calls `_try_restore_primary` directly.
  - Otherwise, it uses `_resolve_fallback_client` to perform a lightweight check: can a client be created for the higher-priority provider without fully switching to it?
  - If the probe succeeds, `_activate_provider` is called to switch to the higher-priority provider, `_fallback_chain_index` is updated accordingly, and the method returns `True`.
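The recovery step and the interval that triggers it can be sketched together. This is a minimal sketch, assuming callables for the probe and the primary restore; `try_recover_up`, `RecoveryTimer`, and the interval value `3` are hypothetical, and the sketch treats a passing probe as a successful switch instead of calling a separate activation step.

```python
from typing import Callable, Dict, List

Config = Dict[str, str]


def try_recover_up(
    chain: List[Config],
    index: int,
    probe: Callable[[Config], bool],
    restore_primary: Callable[[], bool],
) -> int:
    """Try to move one level up the chain; return the resulting index."""
    if index < 0:
        return index  # already on the primary provider
    if index == 0:
        # One level up from the first fallback is the primary itself.
        return -1 if restore_primary() else index
    # `probe` stands in for `_resolve_fallback_client`: it checks that a
    # client can be built without switching. This sketch treats a passing
    # probe as a successful switch; a real agent would then call
    # `_activate_provider`.
    return index - 1 if probe(chain[index - 1]) else index


class RecoveryTimer:
    """Signals a recovery attempt once every `interval` successes."""

    def __init__(self, interval: int) -> None:
        self.interval = interval
        self.successes = 0

    def record_success(self) -> bool:
        self.successes += 1
        if self.successes >= self.interval:
            self.successes = 0
            return True
        return False


chain = [
    {"provider": "groq", "model": "llama-3.3-70b-versatile"},
    {"provider": "kimi-coding", "model": "kimi-k2.5"},
    {"provider": "custom", "model": "qwen3.5:latest"},
]
index = 2  # currently on the `custom` endpoint
timer = RecoveryTimer(interval=3)  # hypothetical _RECOVERY_INTERVAL value
for _ in range(3):  # three successful API responses
    if timer.record_success():
        index = try_recover_up(
            chain, index,
            probe=lambda fb: fb["provider"] == "kimi-coding",
            restore_primary=lambda: False,
        )
print(index)  # 1: moved up from custom to kimi-coding
```

Moving up only one level per attempt keeps each recovery cheap and avoids jumping straight to a provider that has not been checked.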
3. Restoring to Primary (_try_restore_primary)
A dedicated method, `_try_restore_primary`, attempts to switch the agent back to its `_primary_snapshot` configuration. It is a special case of recovery that always targets the original, most preferred provider.
- Process:
  - It checks whether a `_primary_snapshot` is available.
  - It probes the primary provider for health.
  - If the primary provider is healthy and can be activated, the agent switches back to it and `_fallback_chain_index` is reset to `-1`.
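The three checks above map onto early returns. This is a minimal sketch, assuming injected `probe` and `activate` callables; `AgentSketch` and `try_restore_primary` are hypothetical stand-ins, and the real method's health probe and activation logic are not shown in this document.

```python
from typing import Callable, Dict, Optional

Config = Dict[str, str]


class AgentSketch:
    """Hypothetical stand-in for the agent's fallback state."""

    def __init__(
        self,
        primary_snapshot: Optional[Config],
        probe: Callable[[Config], bool],
        activate: Callable[[Config], bool],
    ) -> None:
        self._primary_snapshot = primary_snapshot
        self._fallback_chain_index = -1
        self._probe = probe  # health check that does not change state
        self._activate = activate  # performs the actual switch
        self.active: Optional[Config] = primary_snapshot

    def try_restore_primary(self) -> bool:
        # 1. Nothing to restore if no primary snapshot was recorded.
        if self._primary_snapshot is None:
            return False
        # 2. Probe the primary provider's health without switching.
        if not self._probe(self._primary_snapshot):
            return False
        # 3. Switch back; only then reset the index to -1 (primary active).
        if not self._activate(self._primary_snapshot):
            return False
        self.active = self._primary_snapshot
        self._fallback_chain_index = -1
        return True


primary = {"provider": "openrouter", "model": "anthropic/claude-sonnet-4.6"}
agent = AgentSketch(primary, probe=lambda cfg: True, activate=lambda cfg: True)
agent._fallback_chain_index = 1  # simulate running on the second fallback
agent.active = {"provider": "kimi-coding", "model": "kimi-k2.5"}
print(agent.try_restore_primary(), agent._fallback_chain_index)  # True -1
```

Resetting the index only after activation succeeds keeps the pointer consistent with the provider that is actually in use.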
Core Helper Functions
- `_activate_provider(fb: dict, direction: str)`: Performs the actual switch to a new provider. It takes a fallback configuration dictionary (`fb`), resolves credentials, creates the appropriate LLM client (e.g., using the `openai` or `anthropic` client libraries), and updates the agent's internal state (e.g., `self.provider`, `self.model`, `self.api_mode`). It also manages prompt caching and handles any errors raised during activation.
- `_resolve_fallback_client(fb: dict)`: Used by the recovery mechanism to perform a non-committing health check of a fallback provider. It attempts to create a client for the given `fb` configuration through the centralized `agent.auxiliary_client.resolve_provider_client`, without changing the agent's active state.
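The key difference between the two helpers is whether agent state changes. This is a minimal sketch, assuming a single injected `make_client` callable that stands in for credential resolution and client construction; `ProviderSwitcher`, `fake_make_client`, and the `"up"`/`"down"` direction values are hypothetical, and routing activation through the probe is a simplification of this sketch, not a description of the real helpers.

```python
import logging
from typing import Any, Callable, Dict, Optional

Config = Dict[str, str]
logger = logging.getLogger("provider_sketch")


class ProviderSwitcher:
    """Hypothetical sketch contrasting a committing and a non-committing call."""

    def __init__(self, make_client: Callable[[Config], Any]) -> None:
        # Stand-in for credential resolution plus client construction.
        self._make_client = make_client
        self.provider: Optional[str] = None
        self.model: Optional[str] = None
        self.client: Any = None

    def resolve_fallback_client(self, fb: Config) -> Any:
        """Non-committing: try to build a client; never touch agent state."""
        try:
            return self._make_client(fb)
        except Exception as exc:
            logger.info("probe of %s failed: %s", fb["provider"], exc)
            return None

    def activate_provider(self, fb: Config, direction: str) -> bool:
        """Committing: build a client, then update the active state."""
        client = self.resolve_fallback_client(fb)
        if client is None:
            return False
        logger.info("switching %s to %s", direction, fb["provider"])
        self.client = client
        self.provider = fb["provider"]
        self.model = fb["model"]
        return True


def fake_make_client(fb: Config) -> Dict[str, str]:
    if fb["provider"] == "groq":
        raise RuntimeError("missing credentials")  # simulated failure
    return {"client_for": fb["provider"]}


switcher = ProviderSwitcher(fake_make_client)
kimi = {"provider": "kimi-coding", "model": "kimi-k2.5"}
groq = {"provider": "groq", "model": "llama-3.3-70b-versatile"}

print(switcher.resolve_fallback_client(kimi), switcher.provider)
# {'client_for': 'kimi-coding'} None  (probe only, no switch)
print(switcher.activate_provider(kimi, direction="up"), switcher.provider)
# True kimi-coding
print(switcher.activate_provider(groq, direction="down"), switcher.provider)
# False kimi-coding  (a failed activation leaves state unchanged)
```

Because the state fields are assigned only after a client exists, a failed activation cannot leave the agent pointing at a provider it cannot reach.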
Configuration
The fallback chain is typically defined in the `config.yaml` file (within the `hermes-agent` project), under the `model.fallback_chain` section. For example:
```yaml
model:
  default: openrouter/anthropic/claude-sonnet-4.6
  provider: openrouter
  fallback_chain:
    - provider: groq
      model: llama-3.3-70b-versatile
    - provider: kimi-coding
      model: kimi-k2.5
    - provider: custom
      model: qwen3.5:latest
      base_url: http://localhost:8080/v1
```
This configuration would instruct the agent to:
- First use `openrouter` with `anthropic/claude-sonnet-4.6`.
- If `openrouter` fails, fall back to `groq` with `llama-3.3-70b-versatile`.
- If `groq` also fails, try `kimi-coding` with `kimi-k2.5`.
- Finally, if `kimi-coding` fails, use a `custom` endpoint at `http://localhost:8080/v1` with `qwen3.5:latest`.
While a lower-priority provider is active, the agent periodically tries to move back up this chain, one level at a time, as higher-priority providers become available again.
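To see how the example configuration lines up with the index values described under Key Concepts, the `model` section can be flattened into priority order. This is a minimal sketch: `priority_order` is a hypothetical helper, the dictionary literal mimics what a YAML loader such as PyYAML's `yaml.safe_load` would return (a literal keeps the sketch dependency-free), and `default` is used verbatim as the primary model string.

```python
from typing import Any, Dict, List


def priority_order(model_section: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return [primary, fallback 0, fallback 1, ...] in priority order."""
    primary = {"provider": model_section["provider"], "model": model_section["default"]}
    chain = []
    for position, entry in enumerate(model_section.get("fallback_chain", [])):
        if "provider" not in entry or "model" not in entry:
            raise ValueError(f"fallback_chain[{position}] needs provider and model")
        chain.append(dict(entry))  # extra keys such as base_url are kept
    return [primary] + chain


# The `model` section from the example, as a YAML loader would return it.
model_section = {
    "default": "openrouter/anthropic/claude-sonnet-4.6",
    "provider": "openrouter",
    "fallback_chain": [
        {"provider": "groq", "model": "llama-3.3-70b-versatile"},
        {"provider": "kimi-coding", "model": "kimi-k2.5"},
        {"provider": "custom", "model": "qwen3.5:latest",
         "base_url": "http://localhost:8080/v1"},
    ],
}

# Pair each entry with its _fallback_chain_index value (-1 = primary).
for index, cfg in enumerate(priority_order(model_section), start=-1):
    print(index, cfg["provider"], cfg["model"])
```

Starting the enumeration at `-1` reproduces the index convention directly: the primary is `-1`, and the fallback entries are `0` through `2`.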