This repository has been archived on 2026-03-24. You can view files and clone it. You cannot open issues or pull requests or push a commit.
Files
Timmy-time-dashboard/docs/adr/020-cascade-router-integration.md
Alexander Payne d8d976aa60 feat: complete Event Log, Ledger, Memory, Cascade Router, Upgrade Queue, Activity Feed
This commit implements six major features:

1. Event Log System (src/swarm/event_log.py)
   - SQLite-based audit trail for all swarm events
   - Task lifecycle tracking (created, assigned, completed, failed)
   - Agent lifecycle tracking (joined, left, status changes)
   - Integrated with coordinator for automatic logging
   - Dashboard page at /swarm/events

2. Lightning Ledger (src/lightning/ledger.py)
   - Transaction tracking for Lightning Network payments
   - Balance calculations (incoming, outgoing, net, available)
   - Integrated with payment_handler for automatic logging
   - Dashboard page at /lightning/ledger

3. Semantic Memory / Vector Store (src/memory/vector_store.py)
   - Embedding-based similarity search for Echo agent
   - Fallback to keyword matching if sentence-transformers unavailable
   - Personal facts storage and retrieval
   - Dashboard page at /memory

4. Cascade Router Integration (src/timmy/cascade_adapter.py)
   - Automatic LLM failover between providers (Ollama → AirLLM → API)
   - Circuit breaker pattern for failing providers
   - Metrics tracking per provider (latency, error rates)
   - Dashboard status page at /router/status

5. Self-Upgrade Approval Queue (src/upgrades/)
   - State machine for self-modifications: proposed → approved/rejected → applied/failed
   - Human approval required before applying changes
   - Git integration for branch management
   - Dashboard queue at /self-modify/queue

6. Real-Time Activity Feed (src/events/broadcaster.py)
   - WebSocket-based live activity streaming
   - Bridges event_log to dashboard clients
   - Activity panel on /swarm/live

Tests:
- 101 unit tests passing
- 4 new E2E test files for Selenium testing
- Run with: SELENIUM_UI=1 pytest tests/functional/ -v --headed

Documentation:
- 6 ADRs (017-022) documenting architecture decisions
- Implementation summary in docs/IMPLEMENTATION_SUMMARY.md
- Architecture diagram in docs/architecture-v2.md
2026-02-26 08:01:01 -05:00

3.8 KiB

ADR 020: Cascade Router Integration with Timmy Agent

Status

Proposed

Context

Currently, the Timmy agent (src/timmy/agent.py) uses src/timmy/backends.py which provides a simple abstraction over Ollama and AirLLM. However, this lacks:

  • Automatic failover between multiple LLM providers
  • Circuit breaker pattern for failing providers
  • Cost and latency tracking per provider
  • Priority-based routing (local first, then APIs)

The Cascade Router (src/router/cascade.py) already implements these features but is not integrated with Timmy.

Decision

Integrate the Cascade Router as the primary LLM routing layer for Timmy, replacing the direct backend abstraction.

Architecture

Current Flow

User Request → Timmy Agent → backends.py → Ollama/AirLLM

Proposed Flow

User Request → Timmy Agent → Cascade Router → Provider 1 (Ollama)
                                      ↓ (if fail)
                                 Provider 2 (Local AirLLM)
                                      ↓ (if fail)
                                 Provider 3 (API - optional)
                                      ↓
                                 Track metrics per provider

Integration Points

  1. Timmy Agent (src/timmy/agent.py)

    • Replace create_timmy() backend initialization
    • Use CascadeRouter.complete() instead of direct agent.run()
    • Expose provider status in agent responses
  2. Cascade Router (src/router/cascade.py)

    • Already supports: Ollama, OpenAI, Anthropic, AirLLM
    • Already has: Circuit breakers, metrics, failover logic
    • Add: Integration with existing src/timmy/prompts.py
  3. Configuration (config.yaml or config.py)

    • Provider list with priorities
    • API keys (optional, for cloud fallback)
    • Circuit breaker thresholds
  4. Dashboard (new route)

    • /router/status - Show provider health, metrics, recent failures
    • Real-time provider status indicator

Provider Priority Order

  1. Ollama (local) - Priority 1, always try first
  2. AirLLM (local) - Priority 2, if Ollama unavailable
  3. API providers - Priority 3+, only if configured

Data Flow

# Timmy Agent
async def respond(self, message: str) -> str:
    # Get cascade router
    router = get_cascade_router()
    
    # Route through cascade with automatic failover
    response = await router.complete(
        messages=[{"role": "user", "content": message}],
        system_prompt=TIMMY_SYSTEM_PROMPT,
    )
    
    # Response includes which provider was used
    return response.content

Schema Additions

Provider Status Table (new)

CREATE TABLE provider_metrics (
    provider_name TEXT PRIMARY KEY,
    total_requests INTEGER DEFAULT 0,
    successful_requests INTEGER DEFAULT 0,
    failed_requests INTEGER DEFAULT 0,
    avg_latency_ms REAL DEFAULT 0,
    last_error_time TEXT,
    circuit_state TEXT DEFAULT 'closed',
    updated_at TEXT
);

Consequences

Positive

  • Automatic failover improves reliability
  • Metrics enable data-driven provider selection
  • Circuit breakers prevent cascade failures
  • Configurable without code changes

Negative

  • Additional complexity in request path
  • Potential latency increase from retries
  • Requires careful circuit breaker tuning

Mitigations

  • Circuit breakers have short recovery timeouts (60s)
  • Metrics exposed for monitoring
  • Fallback to mock responses if all providers fail

Implementation Plan

  1. Create src/timmy/cascade_adapter.py - Adapter between Timmy and Cascade Router
  2. Modify src/timmy/agent.py - Use adapter instead of direct backends
  3. Create dashboard route /router/status - Provider health UI
  4. Add provider metrics persistence to SQLite
  5. Write tests for failover scenarios

Dependencies

  • Existing src/router/cascade.py
  • Existing src/timmy/agent.py
  • New dashboard route