This commit implements six major features:

1. Event Log System (src/swarm/event_log.py)
   - SQLite-based audit trail for all swarm events
   - Task lifecycle tracking (created, assigned, completed, failed)
   - Agent lifecycle tracking (joined, left, status changes)
   - Integrated with the coordinator for automatic logging
   - Dashboard page at /swarm/events
2. Lightning Ledger (src/lightning/ledger.py)
   - Transaction tracking for Lightning Network payments
   - Balance calculations (incoming, outgoing, net, available)
   - Integrated with payment_handler for automatic logging
   - Dashboard page at /lightning/ledger
3. Semantic Memory / Vector Store (src/memory/vector_store.py)
   - Embedding-based similarity search for the Echo agent
   - Fallback to keyword matching if sentence-transformers is unavailable
   - Personal facts storage and retrieval
   - Dashboard page at /memory
4. Cascade Router Integration (src/timmy/cascade_adapter.py)
   - Automatic LLM failover between providers (Ollama → AirLLM → API)
   - Circuit breaker pattern for failing providers
   - Metrics tracking per provider (latency, error rates)
   - Dashboard status page at /router/status
5. Self-Upgrade Approval Queue (src/upgrades/)
   - State machine for self-modifications: proposed → approved/rejected → applied/failed
   - Human approval required before applying changes
   - Git integration for branch management
   - Dashboard queue at /self-modify/queue
6. Real-Time Activity Feed (src/events/broadcaster.py)
   - WebSocket-based live activity streaming
   - Bridges the event log to dashboard clients
   - Activity panel on /swarm/live

Tests:
- 101 unit tests passing
- 4 new E2E test files for Selenium testing
- Run with: SELENIUM_UI=1 pytest tests/functional/ -v --headed

Documentation:
- 6 ADRs (017-022) documenting architecture decisions
- Implementation summary in docs/IMPLEMENTATION_SUMMARY.md
- Architecture diagram in docs/architecture-v2.md
ADR 020: Cascade Router Integration with Timmy Agent
Status
Proposed
Context
Currently, the Timmy agent (src/timmy/agent.py) uses src/timmy/backends.py, which provides a simple abstraction over Ollama and AirLLM. However, this approach lacks:
- Automatic failover between multiple LLM providers
- Circuit breaker pattern for failing providers
- Cost and latency tracking per provider
- Priority-based routing (local first, then APIs)
The Cascade Router (src/router/cascade.py) already implements these features but is not integrated with Timmy.
Decision
Integrate the Cascade Router as the primary LLM routing layer for Timmy, replacing the direct backend abstraction.
Architecture
Current Flow
User Request → Timmy Agent → backends.py → Ollama/AirLLM
Proposed Flow
User Request → Timmy Agent → Cascade Router → Provider 1 (Ollama)
                                                  ↓ (if fail)
                                              Provider 2 (local AirLLM)
                                                  ↓ (if fail)
                                              Provider 3 (API, optional)
                                                  ↓
                                              Metrics tracked per provider
Integration Points
- Timmy Agent (src/timmy/agent.py)
  - Replace create_timmy() backend initialization
  - Use CascadeRouter.complete() instead of direct agent.run()
  - Expose provider status in agent responses
- Cascade Router (src/router/cascade.py)
  - Already supports: Ollama, OpenAI, Anthropic, AirLLM
  - Already has: circuit breakers, metrics, failover logic
  - Add: integration with the existing src/timmy/prompts.py
- Configuration (config.yaml or config.py)
  - Provider list with priorities
  - API keys (optional, for cloud fallback)
  - Circuit breaker thresholds
- Dashboard (new route /router/status)
  - Show provider health, metrics, and recent failures
  - Real-time provider status indicator
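One possible shape for the configuration item above, written as a config.yaml fragment. This is a sketch only: every key name here is an assumption for illustration, not the project's actual schema.

```yaml
# Hypothetical cascade router configuration (illustrative key names)
providers:
  - name: ollama        # local, tried first
    priority: 1
  - name: airllm        # local fallback
    priority: 2
  - name: anthropic     # optional cloud fallback
    priority: 3
    api_key_env: ANTHROPIC_API_KEY

circuit_breaker:
  failure_threshold: 5     # consecutive failures before the circuit opens
  recovery_timeout_s: 60   # matches the 60s recovery timeout under Mitigations
```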
Provider Priority Order
- Ollama (local) - Priority 1, always try first
- AirLLM (local) - Priority 2, if Ollama unavailable
- API providers - Priority 3+, only if configured
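The priority order above amounts to a simple cascade loop: try providers in ascending priority, skip any whose circuit breaker is open, and fall through on failure. A minimal sketch follows; the Provider dataclass and cascade_complete() are hypothetical illustrations, not the actual src/router/cascade.py API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Provider:
    # Hypothetical stand-in for one cascade router provider entry
    name: str
    priority: int
    complete: Callable[[str], str]
    circuit_open: bool = False  # set while the provider's circuit breaker is open

def cascade_complete(providers: list[Provider], prompt: str) -> tuple[str, str]:
    """Try providers in priority order, falling through on any failure."""
    errors = []
    for provider in sorted(providers, key=lambda p: p.priority):
        if provider.circuit_open:
            continue  # don't hammer a provider whose circuit is open
        try:
            return provider.name, provider.complete(prompt)
        except Exception as exc:
            errors.append((provider.name, exc))
    raise RuntimeError(f"all providers failed: {errors}")
```

If Ollama (priority 1) raises, the loop simply moves on to AirLLM (priority 2), so the caller never sees the first failure.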
Data Flow
# Timmy Agent
async def respond(self, message: str) -> str:
    # Get the cascade router
    router = get_cascade_router()
    # Route through the cascade with automatic failover
    response = await router.complete(
        messages=[{"role": "user", "content": message}],
        system_prompt=TIMMY_SYSTEM_PROMPT,
    )
    # The response includes which provider was used
    return response.content
Schema Additions
Provider Metrics Table (new)
CREATE TABLE provider_metrics (
provider_name TEXT PRIMARY KEY,
total_requests INTEGER DEFAULT 0,
successful_requests INTEGER DEFAULT 0,
failed_requests INTEGER DEFAULT 0,
avg_latency_ms REAL DEFAULT 0,
last_error_time TEXT,
circuit_state TEXT DEFAULT 'closed',
updated_at TEXT
);
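Writing per-request outcomes into this table can be done with a single SQLite upsert that keeps a running-average latency. The sketch below uses only the stdlib sqlite3 module; record_request() is a hypothetical helper, not existing project code.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS provider_metrics (
    provider_name TEXT PRIMARY KEY,
    total_requests INTEGER DEFAULT 0,
    successful_requests INTEGER DEFAULT 0,
    failed_requests INTEGER DEFAULT 0,
    avg_latency_ms REAL DEFAULT 0,
    last_error_time TEXT,
    circuit_state TEXT DEFAULT 'closed',
    updated_at TEXT
)
"""

def record_request(conn: sqlite3.Connection, provider: str,
                   ok: bool, latency_ms: float) -> None:
    """Upsert one request's outcome for a provider."""
    conn.execute(SCHEMA)
    conn.execute(
        """
        INSERT INTO provider_metrics (provider_name, total_requests,
            successful_requests, failed_requests, avg_latency_ms,
            last_error_time, updated_at)
        VALUES (?, 1, ?, ?, ?,
                CASE WHEN ? THEN NULL ELSE datetime('now') END,
                datetime('now'))
        ON CONFLICT(provider_name) DO UPDATE SET
            total_requests = total_requests + 1,
            successful_requests = successful_requests + excluded.successful_requests,
            failed_requests = failed_requests + excluded.failed_requests,
            -- incremental mean: avg += (x - avg) / new_count
            avg_latency_ms = avg_latency_ms
                + (excluded.avg_latency_ms - avg_latency_ms) / (total_requests + 1),
            last_error_time = COALESCE(excluded.last_error_time, last_error_time),
            updated_at = excluded.updated_at
        """,
        (provider, int(ok), int(not ok), latency_ms, int(ok)),
    )
    conn.commit()
```

The incremental-mean update avoids storing every latency sample while still giving an exact running average, and ON CONFLICT keeps insert-or-update as one atomic statement.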
Consequences
Positive
- Automatic failover improves reliability
- Metrics enable data-driven provider selection
- Circuit breakers prevent cascade failures
- Configurable without code changes
Negative
- Additional complexity in request path
- Potential latency increase from retries
- Requires careful circuit breaker tuning
Mitigations
- Circuit breakers have short recovery timeouts (60s)
- Metrics exposed for monitoring
- Fallback to mock responses if all providers fail
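The breaker behavior these mitigations rely on (open after repeated failures, allow a retry after the recovery timeout) can be sketched in a few lines. The class shape and default values below are assumptions for illustration, not the router's actual implementation.

```python
import time

class CircuitBreaker:
    """Minimal sketch: open after N consecutive failures, half-open after a timeout."""

    def __init__(self, failure_threshold=5, recovery_timeout=60.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout  # seconds until a retry is allowed
        self.failures = 0
        self.opened_at = None

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if time.monotonic() - self.opened_at >= self.recovery_timeout:
            return "half-open"  # let one trial request through
        return "open"

    def allow_request(self):
        return self.state != "open"

    def record_success(self):
        self.failures = 0
        self.opened_at = None  # close the circuit again

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()  # trip the breaker
```

The short 60s recovery window means a provider that comes back is retried quickly, while the open state shields it from traffic in the meantime.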
Implementation Plan
- Create src/timmy/cascade_adapter.py - adapter between Timmy and the Cascade Router
- Modify src/timmy/agent.py - use the adapter instead of direct backends
- Create dashboard route /router/status - provider health UI
- Add provider metrics persistence to SQLite
- Write tests for failover scenarios
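A failover test for the last step might look like the following pytest-style sketch. FakeProvider and the inline cascade() loop are stand-ins for illustration, not the real src/router/cascade.py API.

```python
class FakeProvider:
    # Hypothetical test double for an LLM provider
    def __init__(self, name, reply=None, error=None):
        self.name = name
        self.reply = reply
        self.error = error
        self.calls = 0

    def complete(self, prompt):
        self.calls += 1
        if self.error is not None:
            raise self.error
        return self.reply

def cascade(providers, prompt):
    # Minimal stand-in for the router's failover loop
    for provider in providers:
        try:
            return provider.name, provider.complete(prompt)
        except Exception:
            continue
    raise RuntimeError("all providers failed")

def test_failover_to_second_provider():
    ollama = FakeProvider("ollama", error=ConnectionError("connection refused"))
    airllm = FakeProvider("airllm", reply="hello from airllm")
    name, text = cascade([ollama, airllm], "hi")
    assert (name, text) == ("airllm", "hello from airllm")
    assert ollama.calls == 1  # the failing provider was tried exactly once
```

Asserting on the call count, not just the final reply, catches regressions where the router silently skips or double-calls a provider.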
Dependencies
- Existing src/router/cascade.py
- Existing src/timmy/agent.py
- New dashboard route