Timmy-time-dashboard/docs/adr/020-cascade-router-integration.md
Alexander Payne d8d976aa60 feat: complete Event Log, Ledger, Memory, Cascade Router, Upgrade Queue, Activity Feed
This commit implements six major features:

1. Event Log System (src/swarm/event_log.py)
   - SQLite-based audit trail for all swarm events
   - Task lifecycle tracking (created, assigned, completed, failed)
   - Agent lifecycle tracking (joined, left, status changes)
   - Integrated with coordinator for automatic logging
   - Dashboard page at /swarm/events

2. Lightning Ledger (src/lightning/ledger.py)
   - Transaction tracking for Lightning Network payments
   - Balance calculations (incoming, outgoing, net, available)
   - Integrated with payment_handler for automatic logging
   - Dashboard page at /lightning/ledger

3. Semantic Memory / Vector Store (src/memory/vector_store.py)
   - Embedding-based similarity search for Echo agent
   - Fallback to keyword matching if sentence-transformers unavailable
   - Personal facts storage and retrieval
   - Dashboard page at /memory

4. Cascade Router Integration (src/timmy/cascade_adapter.py)
   - Automatic LLM failover between providers (Ollama → AirLLM → API)
   - Circuit breaker pattern for failing providers
   - Metrics tracking per provider (latency, error rates)
   - Dashboard status page at /router/status

5. Self-Upgrade Approval Queue (src/upgrades/)
   - State machine for self-modifications: proposed → approved/rejected → applied/failed
   - Human approval required before applying changes
   - Git integration for branch management
   - Dashboard queue at /self-modify/queue

6. Real-Time Activity Feed (src/events/broadcaster.py)
   - WebSocket-based live activity streaming
   - Bridges event_log to dashboard clients
   - Activity panel on /swarm/live

Tests:
- 101 unit tests passing
- 4 new E2E test files for Selenium testing
- Run with: SELENIUM_UI=1 pytest tests/functional/ -v --headed

Documentation:
- 6 ADRs (017-022) documenting architecture decisions
- Implementation summary in docs/IMPLEMENTATION_SUMMARY.md
- Architecture diagram in docs/architecture-v2.md
2026-02-26 08:01:01 -05:00

# ADR 020: Cascade Router Integration with Timmy Agent
## Status
Proposed
## Context
Currently, the Timmy agent (`src/timmy/agent.py`) uses `src/timmy/backends.py`, which provides a simple abstraction over Ollama and AirLLM. This abstraction lacks:
- Automatic failover between multiple LLM providers
- Circuit breaker pattern for failing providers
- Cost and latency tracking per provider
- Priority-based routing (local first, then APIs)
The Cascade Router (`src/router/cascade.py`) already implements these features but is not integrated with Timmy.
## Decision
Integrate the Cascade Router as the primary LLM routing layer for Timmy, replacing the direct backend abstraction.
## Architecture
### Current Flow
```
User Request → Timmy Agent → backends.py → Ollama/AirLLM
```
### Proposed Flow
```
User Request → Timmy Agent → Cascade Router → Provider 1 (Ollama)
                                                  ↓ (if fail)
                                              Provider 2 (Local AirLLM)
                                                  ↓ (if fail)
                                              Provider 3 (API - optional)

                                              Track metrics per provider
```
### Integration Points
1. **Timmy Agent** (`src/timmy/agent.py`)
   - Replace `create_timmy()` backend initialization
   - Use `CascadeRouter.complete()` instead of direct `agent.run()`
   - Expose provider status in agent responses
2. **Cascade Router** (`src/router/cascade.py`)
   - Already supports: Ollama, OpenAI, Anthropic, AirLLM
   - Already has: Circuit breakers, metrics, failover logic
   - Add: Integration with existing `src/timmy/prompts.py`
3. **Configuration** (`config.yaml` or `config.py`)
   - Provider list with priorities
   - API keys (optional, for cloud fallback)
   - Circuit breaker thresholds
4. **Dashboard** (new route)
   - `/router/status` - Show provider health, metrics, recent failures
   - Real-time provider status indicator
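
The configuration item above could take roughly the following shape if expressed in `config.py`. This is a minimal sketch, not the project's actual configuration: every class and field name here (`ProviderConfig`, `CircuitBreakerConfig`, `RouterConfig`, `requires_key`, `failure_threshold`) is hypothetical, and the failure threshold of 3 is an assumed value. Only the 60s recovery timeout comes from this ADR.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProviderConfig:
    """One entry in the provider list (all field names are illustrative)."""
    name: str
    priority: int                    # lower number = tried earlier
    api_key: Optional[str] = None    # only needed for cloud fallback
    requires_key: bool = False       # True for API providers
    enabled: bool = True


@dataclass
class CircuitBreakerConfig:
    failure_threshold: int = 3        # assumed value, not specified in the ADR
    recovery_timeout_s: float = 60.0  # the 60s timeout listed under Mitigations


@dataclass
class RouterConfig:
    providers: list[ProviderConfig] = field(default_factory=list)
    breaker: CircuitBreakerConfig = field(default_factory=CircuitBreakerConfig)

    def active_providers(self) -> list[ProviderConfig]:
        """Enabled providers in priority order; API providers need a key."""
        usable = [
            p for p in self.providers
            if p.enabled and (p.api_key is not None or not p.requires_key)
        ]
        return sorted(usable, key=lambda p: p.priority)


config = RouterConfig(providers=[
    ProviderConfig("openai", priority=3, requires_key=True),  # no key: skipped
    ProviderConfig("airllm", priority=2),
    ProviderConfig("ollama", priority=1),
])
print([p.name for p in config.active_providers()])
```

Keeping API keys optional in the data model means a purely local setup needs no secrets, and an API provider without a key simply drops out of the active list.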
### Provider Priority Order
1. **Ollama (local)** - Priority 1, always try first
2. **AirLLM (local)** - Priority 2, if Ollama unavailable
3. **API providers** - Priority 3+, only if configured
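
The priority order above amounts to a loop that tries each provider in turn and moves on when one fails. The sketch below illustrates that idea with stand-in provider functions. It is synchronous for brevity and does not reflect the real `src/router/cascade.py` API; `complete_with_failover`, `AllProvidersFailed`, and the two fake providers are hypothetical names.

```python
from typing import Callable, Sequence


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the list has been tried and failed."""


def complete_with_failover(
    providers: Sequence[tuple[int, str, Callable[[str], str]]],
    prompt: str,
) -> tuple[str, str]:
    """Try providers in ascending priority order; return (provider_name, reply)."""
    errors: dict[str, str] = {}
    for _priority, name, call in sorted(providers, key=lambda p: p[0]):
        try:
            return name, call(prompt)
        except Exception as exc:  # a real router would catch narrower errors
            errors[name] = str(exc)
    raise AllProvidersFailed(f"all providers failed: {errors}")


# Stand-in providers, for illustration only.
def ollama_down(prompt: str) -> str:
    raise ConnectionError("ollama not reachable")


def airllm_ok(prompt: str) -> str:
    return f"airllm: {prompt}"


used, reply = complete_with_failover(
    [(2, "airllm", airllm_ok), (1, "ollama", ollama_down)],
    "hello",
)
print(used, reply)  # airllm airllm: hello
```

Returning the provider name alongside the reply is what lets the caller report which provider actually served the request.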
### Data Flow
```python
# Timmy Agent
async def respond(self, message: str) -> str:
    # Get cascade router
    router = get_cascade_router()

    # Route through cascade with automatic failover
    response = await router.complete(
        messages=[{"role": "user", "content": message}],
        system_prompt=TIMMY_SYSTEM_PROMPT,
    )

    # Response includes which provider was used
    return response.content
```
```
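
To exercise this flow without a running provider, the router can be replaced by a stub. The following is a sketch under stated assumptions: `StubRouter`, `RouterResponse`, and `TimmyCascadeAdapter` are hypothetical stand-ins, and the `provider` field is an assumed name for wherever the real response records which provider was used.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class RouterResponse:
    content: str
    provider: str  # assumed field name for "which provider was used"


class StubRouter:
    """Stand-in for the Cascade Router, used only to exercise the adapter."""

    async def complete(self, messages: list, system_prompt: str) -> RouterResponse:
        last_user = messages[-1]["content"]
        return RouterResponse(content=f"echo: {last_user}", provider="ollama")


class TimmyCascadeAdapter:
    """Hypothetical adapter: builds the message list and keeps the provider name."""

    def __init__(self, router, system_prompt: str) -> None:
        self.router = router
        self.system_prompt = system_prompt
        self.last_provider: Optional[str] = None

    async def respond(self, message: str) -> str:
        response = await self.router.complete(
            messages=[{"role": "user", "content": message}],
            system_prompt=self.system_prompt,
        )
        self.last_provider = response.provider  # available for status display
        return response.content


adapter = TimmyCascadeAdapter(StubRouter(), system_prompt="You are Timmy.")
print(asyncio.run(adapter.respond("ping")))  # echo: ping
```

Injecting the router through the constructor, rather than fetching it inside `respond`, is one way to make failover behaviour testable with fakes.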
## Schema Additions
### Provider Status Table (new)
```sql
CREATE TABLE provider_metrics (
    provider_name TEXT PRIMARY KEY,
    total_requests INTEGER DEFAULT 0,
    successful_requests INTEGER DEFAULT 0,
    failed_requests INTEGER DEFAULT 0,
    avg_latency_ms REAL DEFAULT 0,
    last_error_time TEXT,
    circuit_state TEXT DEFAULT 'closed',
    updated_at TEXT
);
```
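
One way the table above might be updated per request is sketched below using Python's standard `sqlite3` module. The helper `record_request` is hypothetical. Folding failed requests into `avg_latency_ms` and storing ISO 8601 UTC timestamps in the `TEXT` columns are assumptions, not decisions recorded in this ADR.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS provider_metrics (
    provider_name TEXT PRIMARY KEY,
    total_requests INTEGER DEFAULT 0,
    successful_requests INTEGER DEFAULT 0,
    failed_requests INTEGER DEFAULT 0,
    avg_latency_ms REAL DEFAULT 0,
    last_error_time TEXT,
    circuit_state TEXT DEFAULT 'closed',
    updated_at TEXT
);
"""


def record_request(conn: sqlite3.Connection, provider: str,
                   latency_ms: float, ok: bool) -> None:
    """Update the counters and running mean latency for one request."""
    now = datetime.now(timezone.utc).isoformat()
    # Create the row on first use; the column defaults fill in the zeros.
    conn.execute(
        "INSERT OR IGNORE INTO provider_metrics (provider_name) VALUES (?)",
        (provider,),
    )
    # Right-hand sides in an UPDATE see the row's values before the update,
    # so the mean is computed from the old total_requests.
    conn.execute(
        """
        UPDATE provider_metrics
        SET avg_latency_ms = (avg_latency_ms * total_requests + ?)
                             / (total_requests + 1),
            total_requests = total_requests + 1,
            successful_requests = successful_requests + ?,
            failed_requests = failed_requests + ?,
            last_error_time = CASE WHEN ? THEN last_error_time ELSE ? END,
            updated_at = ?
        WHERE provider_name = ?
        """,
        (latency_ms, int(ok), int(not ok), int(ok), now, now, provider),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
record_request(conn, "ollama", 120.0, ok=True)
record_request(conn, "ollama", 80.0, ok=False)
row = conn.execute(
    "SELECT total_requests, successful_requests, failed_requests, avg_latency_ms "
    "FROM provider_metrics WHERE provider_name = 'ollama'"
).fetchone()
print(row)
```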
## Consequences
### Positive
- Automatic failover improves reliability
- Metrics enable data-driven provider selection
- Circuit breakers prevent cascading failures
- Configurable without code changes
### Negative
- Additional complexity in request path
- Potential latency increase when a request is retried on a lower-priority provider
- Requires careful circuit breaker tuning
### Mitigations
- Circuit breakers have short recovery timeouts (60s)
- Metrics exposed for monitoring
- Fallback to mock responses if all providers fail
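
The circuit breaker and mock fallback mitigations can be illustrated with a small state machine. This is a sketch of the general pattern, not the implementation in `src/router/cascade.py`: the class, the `half_open` state name, the failure threshold, and `MOCK_REPLY` are all assumptions, and only the 60s recovery timeout is taken from the list above.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Closed -> open after N consecutive failures -> half-open after a timeout."""

    def __init__(self, failure_threshold: int = 3, recovery_timeout_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # assumed default
        self.recovery_timeout_s = recovery_timeout_s
        self._clock = clock  # injectable so tests need not sleep
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.recovery_timeout_s:
            return "half_open"  # timeout elapsed: allow a trial request
        return "open"

    def allow_request(self) -> bool:
        return self.state != "open"

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold or self._opened_at is not None:
            self._opened_at = self._clock()  # (re)open and restart the timer


MOCK_REPLY = "All providers are currently unavailable."  # hypothetical fallback text


def call_with_breaker(breaker: CircuitBreaker,
                      call: Callable[[str], str], prompt: str) -> str:
    """Skip the provider while its breaker is open; fall back to a mock reply."""
    if not breaker.allow_request():
        return MOCK_REPLY
    try:
        reply = call(prompt)
    except Exception:
        breaker.record_failure()
        return MOCK_REPLY
    breaker.record_success()
    return reply
```

While the breaker is open, the failing provider is not called at all, which is what keeps repeated failures from adding latency to every request.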
## Implementation Plan
1. Create `src/timmy/cascade_adapter.py` - Adapter between Timmy and Cascade Router
2. Modify `src/timmy/agent.py` - Use adapter instead of direct backends
3. Create dashboard route `/router/status` - Provider health UI
4. Add provider metrics persistence to SQLite
5. Write tests for failover scenarios
## Dependencies
- Existing `src/router/cascade.py`
- Existing `src/timmy/agent.py`
- New dashboard route