forked from Rockachopa/Timmy-time-dashboard
3.5 KiB
3.5 KiB
ADR 020: Cascade Router Integration with Timmy Agent
Status
Proposed
Context
- Automatic failover between multiple LLM providers
- Circuit breaker pattern for failing providers
- Cost and latency tracking per provider
- Priority-based routing (local first, then APIs)
The Cascade Router (src/router/cascade.py) already implements these features but is not integrated with Timmy.
Decision
Integrate the Cascade Router as the primary LLM routing layer for Timmy, replacing the direct backend abstraction.
Architecture
Current Flow
User Request → Timmy Agent → backends.py → Ollama
Proposed Flow
User Request → Timmy Agent → Cascade Router → Provider 1 (Ollama)
↓ (if fail)
↓ (if fail)
Provider 3 (API - optional)
↓
Track metrics per provider
Integration Points
-
Timmy Agent (
src/timmy/agent.py)- Replace
create_timmy()backend initialization - Use
CascadeRouter.complete()instead of directagent.run() - Expose provider status in agent responses
- Replace
-
Cascade Router (
src/router/cascade.py)- Already has: Circuit breakers, metrics, failover logic
- Add: Integration with existing
src/timmy/prompts.py
-
Configuration (
config.yamlorconfig.py)- Provider list with priorities
- API keys (optional, for cloud fallback)
- Circuit breaker thresholds
-
Dashboard (new route)
/router/status- Show provider health, metrics, recent failures- Real-time provider status indicator
Provider Priority Order
- Ollama (local) - Priority 1, always try first
- API providers - Priority 3+, only if configured
Data Flow
# Timmy Agent
async def respond(self, message: str) -> str:
# Get cascade router
router = get_cascade_router()
# Route through cascade with automatic failover
response = await router.complete(
messages=[{"role": "user", "content": message}],
system_prompt=TIMMY_SYSTEM_PROMPT,
)
# Response includes which provider was used
return response.content
Schema Additions
Provider Status Table (new)
CREATE TABLE provider_metrics (
provider_name TEXT PRIMARY KEY,
total_requests INTEGER DEFAULT 0,
successful_requests INTEGER DEFAULT 0,
failed_requests INTEGER DEFAULT 0,
avg_latency_ms REAL DEFAULT 0,
last_error_time TEXT,
circuit_state TEXT DEFAULT 'closed',
updated_at TEXT
);
Consequences
Positive
- Automatic failover improves reliability
- Metrics enable data-driven provider selection
- Circuit breakers prevent cascade failures
- Configurable without code changes
Negative
- Additional complexity in request path
- Potential latency increase from retries
- Requires careful circuit breaker tuning
Mitigations
- Circuit breakers have short recovery timeouts (60s)
- Metrics exposed for monitoring
- Fallback to mock responses if all providers fail
Implementation Plan
- Create
src/timmy/cascade_adapter.py- Adapter between Timmy and Cascade Router - Modify
src/timmy/agent.py- Use adapter instead of direct backends - Create dashboard route
/router/status- Provider health UI - Add provider metrics persistence to SQLite
- Write tests for failover scenarios
Dependencies
- Existing
src/router/cascade.py - Existing
src/timmy/agent.py - New dashboard route