feat: simple fallback model for provider resilience
When the primary model/provider fails after retries (rate limit, overload,
auth errors, connection failures), Hermes automatically switches to a
configured fallback model for the remainder of the session.
Config (in ~/.hermes/config.yaml):

    fallback_model:
      provider: openrouter
      model: anthropic/claude-sonnet-4
Supports all major providers: OpenRouter, OpenAI, Nous, DeepSeek, Together,
Groq, Fireworks, Mistral, Gemini — plus custom endpoints via base_url and
api_key_env overrides.
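As a rough illustration of how a provider name plus optional overrides could resolve to an OpenAI-compatible endpoint, here is a minimal sketch. The provider table, the `resolve_fallback` helper, and the `<PROVIDER>_API_KEY` naming convention are assumptions for illustration, not Hermes' actual resolution logic:

```python
# Hypothetical resolution of a fallback_model config block to client settings.
# URLs and the env-var naming convention are illustrative assumptions.
PROVIDER_BASE_URLS = {
    "openrouter": "https://openrouter.ai/api/v1",
    "openai": "https://api.openai.com/v1",
    "deepseek": "https://api.deepseek.com/v1",
}

def resolve_fallback(cfg: dict) -> dict:
    """Return model, base_url, and api_key_env for a fallback_model block."""
    provider = cfg.get("provider", "")
    # Explicit overrides win; otherwise auto-resolve from the provider name.
    base_url = cfg.get("base_url") or PROVIDER_BASE_URLS.get(provider, "")
    api_key_env = cfg.get("api_key_env") or f"{provider.upper()}_API_KEY"
    return {
        "model": cfg.get("model", ""),
        "base_url": base_url,
        "api_key_env": api_key_env,
    }

print(resolve_fallback({"provider": "openrouter",
                        "model": "anthropic/claude-sonnet-4"}))
```

The resolved settings can then be handed straight to the existing OpenAI client, which is what keeps the feature dependency-free.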
Design principles:
- Dead simple: one fallback model, not a chain
- One-shot: switches once, doesn't ping-pong back
- Zero new dependencies: uses existing OpenAI client
- Minimal code: ~100 lines in run_agent.py, ~5 lines in cli.py/gateway
- Three trigger points: max retries exhausted, non-retryable client errors,
and invalid response exhaustion
Does NOT trigger on context overflow or payload-too-large errors (those
are handled by the existing compression system).
Addresses #737.
25 new tests, 2492 total passing.
@@ -103,6 +103,18 @@ DEFAULT_CONFIG = {
         },
     },
 
+    # Fallback model — used when the primary model/provider fails after retries.
+    # When the primary hits rate limits (429), overload (529), or service errors (503),
+    # Hermes will automatically switch to this model for the remainder of the session.
+    # Set to None / omit to disable fallback.
+    "fallback_model": {
+        "provider": "",  # e.g. "openrouter", "openai", "nous", "deepseek", "together", "groq"
+        "model": "",  # e.g. "anthropic/claude-sonnet-4", "gpt-4.1", "deepseek-chat"
+        # Optional overrides (usually auto-resolved from provider):
+        # "base_url": "",  # custom endpoint URL
+        # "api_key_env": "",  # env var name for API key (e.g. "MY_CUSTOM_KEY")
+    },
+
     "display": {
         "compact": False,
         "personality": "kawaii",