the-nexus/HERMES_AGENT_PROVIDER_FALLBACK.md
Google Gemini 31b05e3549
[gemini] Document hermes-agent provider fallback chain (#287) (#310)
Co-authored-by: Google Gemini <gemini@hermes.local>
Co-committed-by: Google Gemini <gemini@hermes.local>
2026-03-24 04:52:20 +00:00

Hermes Agent Provider Fallback Chain

Hermes Agent includes a provider fallback mechanism that keeps the agent running through inference provider outages. When the primary large language model (LLM) provider fails, the agent switches to an alternative provider, and it later tries to move back to higher-priority providers once they are usable again.

Key Concepts

  • Primary Provider (_primary_snapshot): The initial, preferred LLM provider configured for the agent. Hermes Agent will always attempt to use this provider first and return to it whenever possible.
  • Fallback Chain (_fallback_chain): An ordered list of alternative provider configurations. Each entry in this list is a dictionary specifying a backup provider and model (e.g., {"provider": "kimi-coding", "model": "kimi-k2.5"}). The order in this list denotes their priority, with earlier entries being higher priority.
  • Fallback Chain Index (_fallback_chain_index): An internal pointer that tracks the currently active provider within the fallback system.
    • -1: Indicates the primary provider is active (initial state, or after successful recovery to primary).
    • 0 to N-1: The index of the active entry in the _fallback_chain list, where N is the number of entries.
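
As an illustration of how these three pieces of state fit together, here is a minimal sketch. The FallbackState class and its active_provider method are hypothetical names invented for this example; only the underscore-prefixed attribute names mirror the ones described above, and the real agent may store them differently.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FallbackState:
    """Hypothetical container for the state described above (not the real class)."""

    _primary_snapshot: Optional[dict] = None
    _fallback_chain: list = field(default_factory=list)
    _fallback_chain_index: int = -1  # -1 means the primary provider is active

    def active_provider(self) -> Optional[dict]:
        """Return the configuration the index currently points at."""
        if self._fallback_chain_index == -1:
            return self._primary_snapshot
        return self._fallback_chain[self._fallback_chain_index]


example = FallbackState(
    _primary_snapshot={"provider": "openrouter", "model": "anthropic/claude-sonnet-4.6"},
    _fallback_chain=[
        {"provider": "groq", "model": "llama-3.3-70b-versatile"},
        {"provider": "kimi-coding", "model": "kimi-k2.5"},
    ],
)
```

With the index at -1 the sketch resolves to the primary snapshot; setting it to 1 resolves to the second fallback entry.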

Mechanism Overview

The provider fallback system has two main processes: cascading down the chain when a provider fails, and recovering up the chain when a higher-priority provider becomes usable again. Restoring the primary provider is handled as a special case of recovery.

1. Cascading Down on Failure (_try_activate_fallback)

When the currently active LLM provider keeps failing after its retries are exhausted (e.g., due to rate limits, API errors, or unavailability), the _try_activate_fallback method is invoked.

  • Process:
    1. It iterates sequentially through the _fallback_chain list, starting from the next available entry after the current _fallback_chain_index.
    2. For each fallback entry, it attempts to activate the provider using the _activate_provider helper function.
    3. If a provider is successfully activated (meaning its credentials can be resolved and a client can be created), that provider becomes the new active inference provider for the agent, and the method returns True.
    4. If all providers in the _fallback_chain are attempted and none can be successfully activated, a warning is logged, and the method returns False, indicating that the agent has exhausted all available fallback options.
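
The steps above can be sketched as a standalone function. This is a simplified illustration, not the agent's actual method: it takes the chain, the current index, and an activate callable (a stand-in for _activate_provider) as explicit arguments, and it returns the new index alongside the boolean, which is an assumption made so the example has no hidden state.

```python
import logging

logger = logging.getLogger(__name__)


def try_activate_fallback(chain, current_index, activate):
    """Walk down the chain from the entry after current_index.

    `activate` is a hypothetical callable returning True when a provider
    could be activated. Returns (activated, new_index).
    """
    for i in range(current_index + 1, len(chain)):
        if activate(chain[i]):
            return True, i
    # Every remaining entry was tried and none could be activated.
    logger.warning("Fallback chain exhausted; no provider could be activated")
    return False, current_index
```

Starting from index -1 (primary active), the loop begins at entry 0. Entries that fail activation are skipped, so the agent can land several positions down the chain in a single call.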

2. Recovering Up the Chain (_try_recover_up)

To keep the agent on the highest-priority provider available, _try_recover_up is called after a configurable number of successful API responses (_RECOVERY_INTERVAL).

  • Process:
    1. If the agent is currently using a fallback provider (i.e., _fallback_chain_index >= 0), it attempts to probe the provider one level higher in priority (closer to the primary provider). From index 0, the next level up is the primary provider itself.
    2. If the target is the original primary provider, it directly calls _try_restore_primary.
    3. Otherwise, it uses _resolve_fallback_client to perform a lightweight check: can a client be successfully created for the higher-priority provider without fully switching?
    4. If the probe is successful, _activate_provider is called to switch to this higher-priority provider, and the _fallback_chain_index is updated accordingly. The method returns True.
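
A rough sketch of this recovery step follows. It is not the agent's code: probe, activate, and restore_primary are hypothetical callables standing in for _resolve_fallback_client, _activate_provider, and _try_restore_primary, the RecoveryTicker class is invented for the example, and the interval value of 3 is an assumption. The sketch treats any index of 0 or above as "on a fallback", so that index 0 recovers to the primary as step 2 describes.

```python
RECOVERY_INTERVAL = 3  # assumed value for illustration only


def try_recover_up(chain, index, probe, activate, restore_primary):
    """Try to move one level up the chain. Returns (recovered, new_index)."""
    if index < 0:
        return False, index  # already on the primary; nothing to recover
    target = index - 1
    if target == -1:
        # One level above the first fallback is the primary provider.
        return (True, -1) if restore_primary() else (False, index)
    candidate = chain[target]
    # Probe first (no state change), then commit only if the probe passed.
    if probe(candidate) and activate(candidate):
        return True, target
    return False, index


class RecoveryTicker:
    """Counts successful responses and signals when a recovery attempt is due."""

    def __init__(self, interval=RECOVERY_INTERVAL):
        self.interval = interval
        self.successes = 0

    def record_success(self):
        self.successes += 1
        if self.successes >= self.interval:
            self.successes = 0
            return True
        return False
```

Because each call moves at most one level, an agent sitting deep in the chain climbs back gradually, one recovery interval at a time, rather than jumping straight to the primary.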

3. Restoring to Primary (_try_restore_primary)

A dedicated method, _try_restore_primary, is responsible for attempting to switch the agent back to its _primary_snapshot configuration. This is a special case of recovery, always aiming for the original, most preferred provider.

  • Process:
    1. It checks if the _primary_snapshot is available.
    2. It probes the primary provider for health.
    3. If the primary provider is healthy and can be activated, the agent switches back to it, and the _fallback_chain_index is reset to -1.
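
The three checks above might look roughly like the following. This is a sketch under stated assumptions: the state is modeled as a plain dictionary, and probe and activate are hypothetical callables, since the document does not specify how the health probe is performed.

```python
def try_restore_primary(state, probe, activate):
    """Switch back to the primary snapshot if it is present and healthy.

    `state` is a hypothetical dict with "primary_snapshot" and
    "fallback_chain_index" keys. Returns True if the switch happened.
    """
    snapshot = state.get("primary_snapshot")
    if snapshot is None:
        return False  # no primary configuration was captured
    if not probe(snapshot):
        return False  # primary is still unhealthy; stay where we are
    if not activate(snapshot):
        return False  # probe passed but the switch itself failed
    state["fallback_chain_index"] = -1  # primary is active again
    return True
```

The index is reset only after activation succeeds, so a failed attempt leaves the agent on its current fallback provider.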

Core Helper Functions

  • _activate_provider(fb: dict, direction: str): This function is responsible for performing the actual switch to a new provider. It takes a fallback configuration dictionary (fb), resolves credentials, creates the appropriate LLM client (e.g., using openai or anthropic client libraries), and updates the agent's internal state (e.g., self.provider, self.model, self.api_mode). It also manages prompt caching and handles any errors during the activation process.
  • _resolve_fallback_client(fb: dict): Used by the recovery mechanism to perform a non-committing check of a fallback provider's health. It attempts to create a client for the given fb configuration using the centralized agent.auxiliary_client.resolve_provider_client without changing the agent's active state.
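
The difference between a non-committing probe and a committing switch can be illustrated with a small sketch. The ProviderSwitcher class and its client_factory argument are hypothetical; the real helpers resolve credentials and build clients through the agent's own code, which is not reproduced here. As a simplification, the sketch also has the activation step reuse the probe to build its client.

```python
import logging

logger = logging.getLogger(__name__)


class ProviderSwitcher:
    """Hypothetical holder for the agent's active-provider state."""

    def __init__(self, client_factory, provider, model):
        self._client_factory = client_factory  # stand-in for real client creation
        self.provider = provider
        self.model = model
        self.client = client_factory({"provider": provider, "model": model})

    def resolve_fallback_client(self, fb):
        """Non-committing probe: build a client without touching agent state."""
        try:
            return self._client_factory(fb)
        except Exception:
            return None

    def activate_provider(self, fb, direction):
        """Committing switch: mutate state only once a client exists."""
        client = self.resolve_fallback_client(fb)
        if client is None:
            return False
        logger.info("Switching %s to provider %s", direction, fb["provider"])
        self.provider, self.model, self.client = fb["provider"], fb["model"], client
        return True
```

Keeping the probe free of side effects means a failed recovery check never disturbs the provider that is currently serving requests.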

Configuration

The fallback chain is typically defined in the config.yaml file (within the hermes-agent project), under the model.fallback_chain section. For example:

model:
  default: openrouter/anthropic/claude-sonnet-4.6
  provider: openrouter
  fallback_chain:
    - provider: groq
      model: llama-3.3-70b-versatile
    - provider: kimi-coding
      model: kimi-k2.5
    - provider: custom
      model: qwen3.5:latest
      base_url: http://localhost:8080/v1

This configuration would instruct the agent to:

  1. First attempt to use openrouter with anthropic/claude-sonnet-4.6.
  2. If openrouter fails, fall back to groq with llama-3.3-70b-versatile.
  3. If groq also fails, try kimi-coding with kimi-k2.5.
  4. Finally, if kimi-coding fails, attempt to use a custom endpoint at http://localhost:8080/v1 with qwen3.5:latest.

The agent will periodically try to move back up this chain if a lower-priority provider is currently active and a higher-priority one becomes available.
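
To make the priority order concrete, the sketch below derives the attempt order from a dictionary shaped like the parsed configuration above. The attempt_order function is a hypothetical helper written for this example, and the dictionary is typed in by hand rather than loaded from config.yaml, so no YAML library is assumed.

```python
# Hand-written equivalent of the parsed config.yaml example above.
config = {
    "model": {
        "default": "openrouter/anthropic/claude-sonnet-4.6",
        "provider": "openrouter",
        "fallback_chain": [
            {"provider": "groq", "model": "llama-3.3-70b-versatile"},
            {"provider": "kimi-coding", "model": "kimi-k2.5"},
            {
                "provider": "custom",
                "model": "qwen3.5:latest",
                "base_url": "http://localhost:8080/v1",
            },
        ],
    }
}


def attempt_order(cfg):
    """Return (chain_index, provider, model) tuples in priority order.

    Index -1 is the primary; 0..N-1 are the fallback entries.
    """
    model_cfg = cfg["model"]
    order = [(-1, model_cfg["provider"], model_cfg["default"])]
    for i, fb in enumerate(model_cfg.get("fallback_chain", [])):
        order.append((i, fb["provider"], fb["model"]))
    return order


for index, provider, model in attempt_order(config):
    print(index, provider, model)
```

The first tuple carries index -1 for the primary provider, matching the convention used by _fallback_chain_index.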