[gemini] Document hermes-agent provider fallback chain (#287) #310
75
HERMES_AGENT_PROVIDER_FALLBACK.md
Normal file
@@ -0,0 +1,75 @@
# Hermes Agent Provider Fallback Chain

Hermes Agent includes a provider fallback mechanism that keeps the agent running through inference provider outages. When the active large language model (LLM) provider fails, the agent switches to the next configured alternative, and it periodically attempts to move back to higher-priority providers once they recover.

## Key Concepts

* **Primary Provider (`_primary_snapshot`)**: The initial, preferred LLM provider configured for the agent. Hermes Agent always tries this provider first and returns to it whenever possible.
* **Fallback Chain (`_fallback_chain`)**: An ordered list of alternative provider configurations. Each entry is a dictionary specifying a backup `provider` and `model` (e.g., `{"provider": "kimi-coding", "model": "kimi-k2.5"}`). Earlier entries have higher priority.
* **Fallback Chain Index (`_fallback_chain_index`)**: An internal pointer that tracks which provider is currently active.
    * `-1`: The primary provider is active (the initial state, or after a successful recovery to primary).
    * `0` to `N-1`: The active provider is the entry at that position in the `_fallback_chain` list, which has `N` entries.
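
The index convention above can be illustrated with a small lookup. This is a minimal sketch, not the agent's actual code: `active_entry` and the sample `primary` and `chain` values are hypothetical names introduced only for illustration.

```python
def active_entry(primary: dict, fallback_chain: list[dict], index: int) -> dict:
    """Return the provider config that a given chain index refers to.

    Hypothetical helper: -1 selects the primary provider, 0..N-1 select
    entries of the fallback chain, anything else is rejected.
    """
    if index == -1:
        return primary
    if 0 <= index < len(fallback_chain):
        return fallback_chain[index]
    raise IndexError(f"index {index} is outside -1..{len(fallback_chain) - 1}")


# Sample values (assumed for illustration only).
primary = {"provider": "openrouter", "model": "anthropic/claude-sonnet-4.6"}
chain = [
    {"provider": "groq", "model": "llama-3.3-70b-versatile"},
    {"provider": "kimi-coding", "model": "kimi-k2.5"},
]

print(active_entry(primary, chain, -1)["provider"])  # primary is active
print(active_entry(primary, chain, 1)["provider"])   # second fallback is active
```
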

## Mechanism Overview

The provider fallback system operates through two main processes: cascading down the chain upon failure and recovering up the chain when conditions improve.

### 1. Cascading Down on Failure (`_try_activate_fallback`)

When the currently active LLM provider keeps failing after a series of retries (e.g., due to rate limits, API errors, or unavailability), the `_try_activate_fallback` method is invoked.

* **Process**:
    1. It iterates sequentially through the `_fallback_chain` list, starting from the entry after the current `_fallback_chain_index`.
    2. For each fallback entry, it attempts to *activate* the provider using the `_activate_provider` helper function.
    3. If a provider is successfully activated (meaning its credentials can be resolved and a client can be created), that provider becomes the agent's active inference provider, and the method returns `True`.
    4. If every remaining provider in the `_fallback_chain` has been tried and none can be activated, a warning is logged and the method returns `False`, indicating that the agent has exhausted its fallback options.
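
The steps above can be sketched as a plain function. This is a simplified stand-in under stated assumptions, not the real method: the `activate` callback plays the role of `_activate_provider`, and returning the new index alongside the boolean is a choice made here so the sketch needs no agent object.

```python
import logging
from typing import Callable

logger = logging.getLogger("fallback_sketch")  # hypothetical logger name


def try_activate_fallback(
    chain: list[dict],
    current_index: int,
    activate: Callable[[dict], bool],
) -> tuple[bool, int]:
    """Walk down the chain from the entry after current_index.

    Returns (True, new_index) for the first entry that activates, or
    (False, current_index) once the chain is exhausted.
    """
    for i in range(current_index + 1, len(chain)):
        if activate(chain[i]):
            return True, i
    logger.warning("fallback chain exhausted after index %d", current_index)
    return False, current_index


chain = [
    {"provider": "groq", "model": "llama-3.3-70b-versatile"},
    {"provider": "kimi-coding", "model": "kimi-k2.5"},
]


def activate_stub(fb: dict) -> bool:
    # Assumed for the demo: only kimi-coding has usable credentials.
    return fb["provider"] == "kimi-coding"


ok, index = try_activate_fallback(chain, -1, activate_stub)
```

Starting from the primary (`-1`), the stub rejects `groq`, so the sketch lands on index `1`. Calling it again from index `1` finds nothing further down and reports exhaustion.
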

### 2. Recovering Up the Chain (`_try_recover_up`)

To keep the agent on the highest-priority provider available, `_try_recover_up` is called periodically, after a configurable number of successful API responses (`_RECOVERY_INTERVAL`).

* **Process**:
    1. If the agent is currently using a fallback provider (i.e., `_fallback_chain_index >= 0`), it probes the provider one level higher in priority (closer to the primary provider).
    2. If the target is the original primary provider, it directly calls `_try_restore_primary`.
    3. Otherwise, it uses `_resolve_fallback_client` to perform a lightweight check: can a client be created for the higher-priority provider without fully switching?
    4. If the probe succeeds, `_activate_provider` is called to switch to this higher-priority provider, and the `_fallback_chain_index` is updated accordingly. The method returns `True`.
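
A rough sketch of this one-step climb follows. The callbacks `probe`, `activate`, and `restore_primary` are hypothetical stand-ins for `_resolve_fallback_client`, `_activate_provider`, and `_try_restore_primary`. Staying on the current provider when a probe fails is an assumption, since the description above only covers the success path.

```python
from typing import Callable


def try_recover_up(
    chain: list[dict],
    index: int,
    probe: Callable[[dict], bool],
    activate: Callable[[dict], bool],
    restore_primary: Callable[[], bool],
) -> tuple[bool, int]:
    """Try to move one level up the chain; return (moved, new_index)."""
    if index < 0:
        return False, index  # already on the primary, nothing to recover
    target = index - 1
    if target == -1:
        # One step above the first fallback is the primary provider.
        return (True, -1) if restore_primary() else (False, index)
    candidate = chain[target]
    # Probe first (non-committing), then commit by activating.
    if probe(candidate) and activate(candidate):
        return True, target
    return False, index  # assumption: stay put when the probe fails


chain = [
    {"provider": "groq", "model": "llama-3.3-70b-versatile"},
    {"provider": "kimi-coding", "model": "kimi-k2.5"},
]

# From the second fallback (index 1), a healthy groq moves the agent to index 0.
moved, new_index = try_recover_up(
    chain, 1, probe=lambda fb: True, activate=lambda fb: True,
    restore_primary=lambda: False,
)
```
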

### 3. Restoring to Primary (`_try_restore_primary`)

A dedicated method, `_try_restore_primary`, attempts to switch the agent back to its `_primary_snapshot` configuration. This is a special case of recovery that always targets the original, most preferred provider.

* **Process**:
    1. It checks whether the `_primary_snapshot` is available.
    2. It probes the primary provider for health.
    3. If the primary provider is healthy and can be activated, the agent switches back to it, and the `_fallback_chain_index` is reset to `-1`.
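
The snapshot-and-restore idea might look like the following. Everything here is illustrative: `AgentState`, its fields, and the `is_healthy` callback are hypothetical, and the real snapshot likely carries more than a provider and a model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AgentState:
    """Hypothetical, pared-down view of the agent's provider state."""
    provider: str
    model: str
    chain_index: int = -1
    primary_snapshot: Optional[dict] = None


def try_restore_primary(state: AgentState, is_healthy: Callable[[dict], bool]) -> bool:
    """Switch back to the snapshotted primary if it exists and probes healthy."""
    snapshot = state.primary_snapshot
    if snapshot is None:
        return False  # step 1: no snapshot to restore from
    if not is_healthy(snapshot):
        return False  # step 2: primary is still failing its probe
    # Step 3: switch back and reset the chain index.
    state.provider = snapshot["provider"]
    state.model = snapshot["model"]
    state.chain_index = -1
    return True


state = AgentState(
    provider="groq",
    model="llama-3.3-70b-versatile",
    chain_index=0,
    primary_snapshot={"provider": "openrouter", "model": "anthropic/claude-sonnet-4.6"},
)
still_down = try_restore_primary(state, is_healthy=lambda snap: False)
restored = try_restore_primary(state, is_healthy=lambda snap: True)
```
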

### Core Helper Functions

* **`_activate_provider(fb: dict, direction: str)`**: Performs the actual switch to a new provider. It takes a fallback configuration dictionary (`fb`), resolves credentials, creates the appropriate LLM client (e.g., using the `openai` or `anthropic` client libraries), and updates the agent's internal state (e.g., `self.provider`, `self.model`, `self.api_mode`). It also manages prompt caching and handles any errors raised during activation.
* **`_resolve_fallback_client(fb: dict)`**: Used by the recovery mechanism to check a fallback provider's health without committing to it. It attempts to create a client for the given `fb` configuration using the centralized `agent.auxiliary_client.resolve_provider_client`, without changing the agent's active state.

## Configuration

The fallback chain is typically defined in the `config.yaml` file (within the `hermes-agent` project), under the `model.fallback_chain` section. For example:

```yaml
model:
  default: openrouter/anthropic/claude-sonnet-4.6
  provider: openrouter
  fallback_chain:
    - provider: groq
      model: llama-3.3-70b-versatile
    - provider: kimi-coding
      model: kimi-k2.5
    - provider: custom
      model: qwen3.5:latest
      base_url: http://localhost:8080/v1
```

This configuration would instruct the agent to:

1. First attempt to use `openrouter` with `anthropic/claude-sonnet-4.6`.
2. If `openrouter` fails, fall back to `groq` with `llama-3.3-70b-versatile`.
3. If `groq` also fails, try `kimi-coding` with `kimi-k2.5`.
4. Finally, if `kimi-coding` fails, attempt to use a `custom` endpoint at `http://localhost:8080/v1` with `qwen3.5:latest`.

The agent will periodically try to move back up this chain if a lower-priority provider is currently active and a higher-priority one becomes available.
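
To make the resulting priority order concrete, the sketch below flattens an already-parsed configuration into an ordered list. `priority_order` is a hypothetical helper, the dictionary mirrors what a YAML parser would produce for the example, and stripping the `openrouter/` prefix from `default` is an assumption made to match the provider and model split described in the list above.

```python
def priority_order(config: dict) -> list[tuple[str, str]]:
    """Return (provider, model) pairs from highest to lowest priority."""
    model_cfg = config["model"]
    provider = model_cfg["provider"]
    default = model_cfg["default"]
    # Assumption: `default` may be prefixed with "<provider>/"; strip it if so.
    prefix = provider + "/"
    model = default[len(prefix):] if default.startswith(prefix) else default
    order = [(provider, model)]
    for fb in model_cfg.get("fallback_chain", []):
        order.append((fb["provider"], fb["model"]))
    return order


# The example configuration, written as the dict a YAML parser would yield.
config = {
    "model": {
        "default": "openrouter/anthropic/claude-sonnet-4.6",
        "provider": "openrouter",
        "fallback_chain": [
            {"provider": "groq", "model": "llama-3.3-70b-versatile"},
            {"provider": "kimi-coding", "model": "kimi-k2.5"},
            {"provider": "custom", "model": "qwen3.5:latest",
             "base_url": "http://localhost:8080/v1"},
        ],
    }
}

for rank, (provider, model) in enumerate(priority_order(config)):
    print(rank - 1, provider, model)  # rank - 1 matches the chain index convention
```
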