Build backend registry and cloud routing layer #95

Open
opened 2026-03-30 15:40:26 +00:00 by Timmy · 2 comments
Owner

Objective

Build the core routing layer that lets Timmy dispatch tasks to cloud LLM backends as blind cognition. This is the heart of the Uniwizard.

Parent Epic

#94 — Grand Timmy: The Uniwizard

Architecture

Backend Registry

```python
BACKENDS = {
    "local": {
        "url": "http://localhost:8081/v1",
        "model": "hermes-4-14b",
        "cost_per_1k": 0.0,
        "strengths": ["privacy", "always-on", "tool-calls"],
        "weaknesses": ["slow", "limited-reasoning"],
        "max_context": 8192,
        "timeout": 120,
    },
    "claude": {
        "url": "https://api.anthropic.com/v1",
        "model": "claude-sonnet-4-20250514",
        "cost_per_1k": 0.003,
        "strengths": ["reasoning", "code-review", "analysis"],
        "weaknesses": ["cost", "rate-limits"],
        "max_context": 200000,
        "timeout": 30,
    },
    "kimi": {
        "url": "...",
        "model": "kimi-2.5",
        "cost_per_1k": 0.001,
        "strengths": ["long-context", "code-generation"],
        # remaining fields elided
    },
    # etc. for GPT, Gemini, Grok
}
```
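As an illustration of how routing code might query the registry, here is a hypothetical helper (not part of the ticket) that picks the cheapest backend advertising a required strength. The trimmed `BACKENDS` copy and the `cheapest_backend_with` name are assumptions for the sketch.

```python
# Hypothetical registry-query helper; trimmed copy of the registry above.
BACKENDS = {
    "local":  {"cost_per_1k": 0.0,   "strengths": ["privacy", "always-on", "tool-calls"]},
    "claude": {"cost_per_1k": 0.003, "strengths": ["reasoning", "code-review", "analysis"]},
    "kimi":   {"cost_per_1k": 0.001, "strengths": ["long-context", "code-generation"]},
}

def cheapest_backend_with(strength: str) -> str:
    """Return the cheapest backend key advertising `strength`, else local."""
    candidates = [
        (cfg["cost_per_1k"], key)
        for key, cfg in BACKENDS.items()
        if strength in cfg["strengths"]
    ]
    if not candidates:
        return "local"  # default: keep work on the always-on local model
    return min(candidates)[1]
```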

Router Interface

```python
class BackendRouter:
    def route(self, task: Task) -> str:
        """Return a backend key based on task properties."""

    def call(self, backend_key: str, prompt: str) -> str:
        """Send a prompt to a backend and return the response."""

    def call_with_fallback(self, task: Task) -> str:
        """Try the primary backend, falling back on failure."""
```
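A minimal sketch of the fallback path, assuming `call` raises on transport failure and the router holds a ranked preference list. The constructor shape and the callable-per-backend stand-ins are assumptions for the sketch, not the ticket's API.

```python
class BackendRouter:
    """Fallback sketch; `backends` maps key -> callable(prompt) -> str."""

    def __init__(self, backends, order):
        self.backends = backends  # key -> callable standing in for an API client
        self.order = order        # ranked preference, e.g. ["claude", "kimi", "local"]

    def call(self, backend_key, prompt):
        return self.backends[backend_key](prompt)

    def call_with_fallback(self, prompt):
        last_error = None
        for key in self.order:
            try:
                return self.call(key, prompt)
            except Exception as exc:  # real code would catch transport errors only
                last_error = exc
        raise RuntimeError("all backends failed") from last_error
```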

Routing Rules (v1 — rule-based)

  1. Always try local first for simple tasks (read, write, status)
  2. Route to Claude for: reasoning, analysis, code review, architecture
  3. Route to Kimi for: long documents, bulk code generation
  4. Route to GPT for: broad knowledge questions, summarization
  5. Route to Gemini for: multimodal (images, audio)
  6. Route to Grok for: speed-critical quick answers
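The six rules above can be sketched as a plain keyword dispatcher. The task-kind strings, the 8192-token threshold, and the function signature are illustrative assumptions, not part of the ticket.

```python
# Rule-based v1 routing sketch; task kinds and thresholds are assumptions.
SIMPLE = {"read", "write", "status"}

def route(kind: str, context_tokens: int = 0, multimodal: bool = False) -> str:
    if kind in SIMPLE:
        return "local"    # rule 1: keep simple tasks local
    if multimodal:
        return "gemini"   # rule 5: images, audio
    if context_tokens > 8192 or kind in {"long-doc", "bulk-codegen"}:
        return "kimi"     # rule 3: long documents, bulk code generation
    if kind in {"reasoning", "analysis", "code-review", "architecture"}:
        return "claude"   # rule 2
    if kind in {"knowledge", "summarize"}:
        return "gpt"      # rule 4
    return "grok"         # rule 6: fast default for quick answers
```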

Key Design Constraint

Backends receive ONLY:

  • A task-scoped system prompt (no soul, no identity, no Timmy context)
  • The user/task message
  • Tool definitions if needed

They return tokens. Nothing else. No memory, no identity, no continuity.

Timmy wraps the result with his own context, evaluates it, and integrates.
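The blind-cognition contract can be made concrete as a payload builder. `build_blind_payload` is a hypothetical name, and the OpenAI-style `messages` shape is an assumption; the point is what the payload must *not* contain.

```python
# Sketch of the blind-cognition contract: the payload carries only a
# task-scoped system prompt, the task message, and optional tool defs.
def build_blind_payload(task_prompt: str, user_message: str, tools=None) -> dict:
    payload = {
        "messages": [
            {"role": "system", "content": task_prompt},  # task-scoped only
            {"role": "user", "content": user_message},
        ]
    }
    if tools:
        payload["tools"] = tools
    return payload  # no soul, no identity, no memory, no continuity
```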

Deliverables

  • `agent/backend_registry.py` — backend configs + health checking
  • `agent/router.py` — routing logic
  • `agent/cloud_caller.py` — unified API caller (handles Claude/OpenAI/etc. formats)
  • `configs/backends.yaml` — backend configuration file
  • `commands/route.py` — Evennia command: `route <task>` shows which backend would be selected
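One possible shape for `configs/backends.yaml`, mirroring the Python registry above; this is only a sketch, with field names taken from the registry.

```yaml
# Hypothetical shape for configs/backends.yaml.
backends:
  local:
    url: http://localhost:8081/v1
    model: hermes-4-14b
    cost_per_1k: 0.0
    max_context: 8192
    timeout: 120
  claude:
    url: https://api.anthropic.com/v1
    model: claude-sonnet-4-20250514
    cost_per_1k: 0.003
    max_context: 200000
    timeout: 30
```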

In Evennia

The Workshop room gets a `BackendRouter` Script that maintains backend health status. Timmy's `think` command uses the router automatically.

Acceptance Criteria

  • All backends registered with health checks
  • Router correctly classifies 10 test tasks
  • Fallback works (disable a backend, task goes to next best)
  • Zero identity leakage (backends never see soul/Timmy context)
  • Cost tracked per request
ezra was assigned by Timmy 2026-03-30 15:40:27 +00:00
Owner

Research input: #101 — LiteLLM's error-type classification and cooldown/circuit-breaker pattern should inform registry health tracking. Portkey's declarative JSON is the cleanest config model. Add `allowed_fails` + `cooldown_time` per backend. Novel: semantic refusal detection for rerouting on 200 OK responses.


Role Transition

Timmy now owns execution — building, coding, implementing.
Ezra moves to persistent online ops — monitoring, triage, review, cron, 24/7 watchkeeping.

Timmy: this is yours. Read the ticket, build it, PR it. Ezra reviews.

Timmy — build the backend registry and routing layer. See #101 for Alexander's research. Key insight: extend Hermes natively, don't add LiteLLM. Include semantic refusal detection.
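Semantic refusal detection could start as a trivial heuristic; a real version would likely use a small classifier or embedding similarity, so the marker list below is only a placeholder illustrating the reroute trigger on a 200 OK response.

```python
# Placeholder for semantic refusal detection; the marker list is
# illustrative, not a real classifier.
REFUSAL_MARKERS = (
    "i can't help with",
    "i cannot assist",
    "i'm unable to",
    "as an ai",
)

def looks_like_refusal(response_text: str) -> bool:
    """True if a 200 OK response reads as a refusal and should reroute."""
    lowered = response_text.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)
```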

ezra was unassigned by Timmy 2026-03-30 16:03:22 +00:00
Timmy self-assigned this 2026-03-30 16:03:22 +00:00
Reference: Timmy_Foundation/timmy-home#95