Build backend registry and cloud routing layer #95

Open
opened 2026-03-30 15:40:26 +00:00 by Timmy · 2 comments
Owner

Objective

Build the core routing layer that lets Timmy dispatch tasks to cloud LLM backends as blind cognition. This is the heart of the Uniwizard.

Parent Epic

#94 — Grand Timmy: The Uniwizard

Architecture

Backend Registry

```python
BACKENDS = {
    "local": {
        "url": "http://localhost:8081/v1",
        "model": "hermes-4-14b",
        "cost_per_1k": 0.0,
        "strengths": ["privacy", "always-on", "tool-calls"],
        "weaknesses": ["slow", "limited-reasoning"],
        "max_context": 8192,
        "timeout": 120,
    },
    "claude": {
        "url": "https://api.anthropic.com/v1",
        "model": "claude-sonnet-4-20250514",
        "cost_per_1k": 0.003,
        "strengths": ["reasoning", "code-review", "analysis"],
        "weaknesses": ["cost", "rate-limits"],
        "max_context": 200000,
        "timeout": 30,
    },
    "kimi": {
        "url": "...",
        "model": "kimi-2.5",
        "cost_per_1k": 0.001,
        "strengths": ["long-context", "code-generation"],
        # remaining fields elided
    },
    # etc. for GPT, Gemini, Grok
}
```
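As an illustration of how routing code might query the registry, here is a hypothetical helper (not part of the ticket) that picks the cheapest backend advertising a required strength. The trimmed `BACKENDS` copy and the `cheapest_backend_with` name are assumptions for the sketch.

```python
# Hypothetical registry-query helper; trimmed copy of the registry above.
BACKENDS = {
    "local":  {"cost_per_1k": 0.0,   "strengths": ["privacy", "always-on", "tool-calls"]},
    "claude": {"cost_per_1k": 0.003, "strengths": ["reasoning", "code-review", "analysis"]},
    "kimi":   {"cost_per_1k": 0.001, "strengths": ["long-context", "code-generation"]},
}

def cheapest_backend_with(strength: str) -> str:
    """Return the cheapest backend key advertising `strength`, else local."""
    candidates = [
        (cfg["cost_per_1k"], key)
        for key, cfg in BACKENDS.items()
        if strength in cfg["strengths"]
    ]
    if not candidates:
        return "local"  # default: keep work on the always-on local model
    return min(candidates)[1]
```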

Router Interface

```python
class BackendRouter:
    def route(self, task: Task) -> str:
        """Return a backend key based on task properties."""

    def call(self, backend_key: str, prompt: str) -> str:
        """Send a prompt to a backend and return the response."""

    def call_with_fallback(self, task: Task) -> str:
        """Try the primary backend, falling back on failure."""
```
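A minimal sketch of the fallback path, assuming `call` raises on transport failure and the router holds a ranked preference list. The constructor shape and the callable-per-backend stand-ins are assumptions for the sketch, not the ticket's API.

```python
class BackendRouter:
    """Fallback sketch; `backends` maps key -> callable(prompt) -> str."""

    def __init__(self, backends, order):
        self.backends = backends  # key -> callable standing in for an API client
        self.order = order        # ranked preference, e.g. ["claude", "kimi", "local"]

    def call(self, backend_key, prompt):
        return self.backends[backend_key](prompt)

    def call_with_fallback(self, prompt):
        last_error = None
        for key in self.order:
            try:
                return self.call(key, prompt)
            except Exception as exc:  # real code would catch transport errors only
                last_error = exc
        raise RuntimeError("all backends failed") from last_error
```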

Routing Rules (v1 — rule-based)

  1. Always try local first for simple tasks (read, write, status)
  2. Route to Claude for: reasoning, analysis, code review, architecture
  3. Route to Kimi for: long documents, bulk code generation
  4. Route to GPT for: broad knowledge questions, summarization
  5. Route to Gemini for: multimodal (images, audio)
  6. Route to Grok for: speed-critical quick answers
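The six rules above can be sketched as a plain keyword dispatcher. The task-kind strings, the 8192-token threshold, and the function signature are illustrative assumptions, not part of the ticket.

```python
# Rule-based v1 routing sketch; task kinds and thresholds are assumptions.
SIMPLE = {"read", "write", "status"}

def route(kind: str, context_tokens: int = 0, multimodal: bool = False) -> str:
    if kind in SIMPLE:
        return "local"    # rule 1: keep simple tasks local
    if multimodal:
        return "gemini"   # rule 5: images, audio
    if context_tokens > 8192 or kind in {"long-doc", "bulk-codegen"}:
        return "kimi"     # rule 3: long documents, bulk code generation
    if kind in {"reasoning", "analysis", "code-review", "architecture"}:
        return "claude"   # rule 2
    if kind in {"knowledge", "summarize"}:
        return "gpt"      # rule 4
    return "grok"         # rule 6: fast default for quick answers
```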

Key Design Constraint

Backends receive ONLY:

  • A task-scoped system prompt (no soul, no identity, no Timmy context)
  • The user/task message
  • Tool definitions if needed

They return tokens. Nothing else. No memory, no identity, no continuity.

Timmy wraps the result with his own context, evaluates it, and integrates.
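The blind-cognition contract can be made concrete as a payload builder. `build_blind_payload` is a hypothetical name, and the OpenAI-style `messages` shape is an assumption; the point is what the payload must *not* contain.

```python
# Sketch of the blind-cognition contract: the payload carries only a
# task-scoped system prompt, the task message, and optional tool defs.
def build_blind_payload(task_prompt: str, user_message: str, tools=None) -> dict:
    payload = {
        "messages": [
            {"role": "system", "content": task_prompt},  # task-scoped only
            {"role": "user", "content": user_message},
        ]
    }
    if tools:
        payload["tools"] = tools
    return payload  # no soul, no identity, no memory, no continuity
```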

Deliverables

  • `agent/backend_registry.py` — backend configs + health checking
  • `agent/router.py` — routing logic
  • `agent/cloud_caller.py` — unified API caller (handles Claude/OpenAI/etc. formats)
  • `configs/backends.yaml` — backend configuration file
  • `commands/route.py` — Evennia command: `route <task>` shows which backend would be selected
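One possible shape for `configs/backends.yaml`, mirroring the Python registry above; this is only a sketch, with field names taken from the registry.

```yaml
# Hypothetical shape for configs/backends.yaml.
backends:
  local:
    url: http://localhost:8081/v1
    model: hermes-4-14b
    cost_per_1k: 0.0
    max_context: 8192
    timeout: 120
  claude:
    url: https://api.anthropic.com/v1
    model: claude-sonnet-4-20250514
    cost_per_1k: 0.003
    max_context: 200000
    timeout: 30
```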

In Evennia

The Workshop room gets a `BackendRouter` Script that maintains backend health status. Timmy's `think` command uses the router automatically.

Acceptance Criteria

  • All backends registered with health checks
  • Router correctly classifies 10 test tasks
  • Fallback works (disable a backend, task goes to next best)
  • Zero identity leakage (backends never see soul/Timmy context)
  • Cost tracked per request
ezra was assigned by Timmy 2026-03-30 15:40:27 +00:00
Owner

Research input: #101 — LiteLLM's error-type classification and cooldown/circuit-breaker pattern should inform registry health tracking. Portkey's declarative JSON is the cleanest config model. Add `allowed_fails` + `cooldown_time` per backend. Novel: semantic refusal detection for rerouting on 200 OK responses.


Role Transition

Timmy now owns execution — building, coding, implementing.
Ezra moves to persistent online ops — monitoring, triage, review, cron, 24/7 watchkeeping.

Timmy: this is yours. Read the ticket, build it, PR it. Ezra reviews.

Timmy — build the backend registry and routing layer. See #101 for Alexander's research. Key insight: extend Hermes natively, don't add LiteLLM. Include semantic refusal detection.
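Semantic refusal detection could start as a trivial heuristic; a real version would likely use a small classifier or embedding similarity, so the marker list below is only a placeholder illustrating the reroute trigger on a 200 OK response.

```python
# Placeholder for semantic refusal detection; the marker list is
# illustrative, not a real classifier.
REFUSAL_MARKERS = (
    "i can't help with",
    "i cannot assist",
    "i'm unable to",
    "as an ai",
)

def looks_like_refusal(response_text: str) -> bool:
    """True if a 200 OK response reads as a refusal and should reroute."""
    lowered = response_text.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)
```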

ezra was unassigned by Timmy 2026-03-30 16:03:22 +00:00
Timmy self-assigned this 2026-03-30 16:03:22 +00:00
Reference: Timmy_Foundation/timmy-home#95