[RESEARCH] Multi-Backend LLM Routing Landscape — Uniwizard Design Input #101
Parent Epic
#94 — Grand Timmy: The Uniwizard
Purpose
Research report on the multi-backend LLM routing landscape. What exists, what works, what doesn't, and what the Hermes harness already has. This informs the implementation of #95 (backend registry + routing) and #96 (task-to-backend classifier).
The Key Insight
Timmy is at the helm, not the inference backend. The LLM backends are brains Timmy borrows. If Claude refuses, route to GPT. If GPT is down, route to Ollama local. The soul stays the same. The brain rotates. This is sovereignty made real.
What Exists in the Wild
Two Categories of Routing
Category A: Infrastructure Gateways — route across providers, handle errors/rate-limits/fallbacks
Category B: Intelligent Model Selectors — ML-driven per-prompt routing for quality/cost optimization
Major Projects Analyzed
Routing Strategies in Production
The Gap Nobody Fills: Semantic Refusal Detection
This is the most important finding.
Every existing gateway handles HTTP-level errors (429, 500, etc.). But when Claude returns 200 OK and says "I can't help with that", nobody catches it and reroutes.
A model can refuse in ways that look like a successful response:
No existing tool detects this and triggers a reroute. This is the uniwizard's unique value proposition.
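As a rough illustration of what catching a 200 OK refusal could look like, here is a minimal pattern-based sketch. Only the phrase "I can't help with that" comes from this report; the other patterns, the `looks_like_refusal` name, and the `head_chars` cutoff are assumptions made for the example, not part of Hermes or any existing gateway.

```python
import re

# Hypothetical phrase list. Only the first pattern is taken from the report;
# a real detector would need tuning per backend and likely a classifier.
REFUSAL_PATTERNS = [
    r"\bI can['’]?t help with that\b",
    r"\bI(?: am|['’]m) (?:not able|unable) to (?:help|assist)\b",
    r"\bI cannot (?:help|assist|comply)\b",
]
_REFUSAL_RE = re.compile("|".join(REFUSAL_PATTERNS), re.IGNORECASE)


def looks_like_refusal(status_code: int, text: str, head_chars: int = 300) -> bool:
    """Return True when an HTTP-successful response reads like a refusal.

    Only the first `head_chars` characters are scanned (an assumed heuristic),
    since a refusal phrase quoted deep inside a long answer is less likely to
    be an actual refusal.
    """
    if status_code != 200:
        # HTTP-level failures (429, 500, ...) are the gateway's job already.
        return False
    return bool(_REFUSAL_RE.search(text[:head_chars]))
```

Phrase matching is cheap but brittle: it will miss novel refusal wording and can misfire on answers that quote a refusal, so it is best read as a first-pass filter.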
What Hermes Already Has (5-Layer Routing)
Layer 1: Primary Model Selection (runtime_provider.py)
Layer 2: Smart Model Routing (smart_model_routing.py)
Layer 3: Primary Fallback Chain (run_agent.py)
Layer 4: Auxiliary Task Routing (auxiliary_client.py)
Layer 5: Auto-Detection Chain
Gap Analysis: What Hermes Needs for Uniwizard
Already Have (just needs wiring)
Need to Build
Implementation Recommendation: Extend Hermes Natively
Why Not External Tools
What to Steal from Each Project
The Refusal Detector (Novel Component)
When a refusal is detected, immediately reroute to the next backend and log the event for quality scoring (#98).
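The reroute-and-log behaviour could be sketched roughly as follows. The `Backend` tuple shape, the `route_with_refusal_fallback` name, and the logger name are hypothetical and do not reflect the actual `run_agent.py` fallback chain; the refusal check is passed in as a callable so any detector can be plugged in.

```python
import logging
from typing import Callable, Optional, Sequence, Tuple

log = logging.getLogger("uniwizard.routing")  # hypothetical logger name

# Assumed shape: (backend name, function taking a prompt and returning text).
Backend = Tuple[str, Callable[[str], str]]


def route_with_refusal_fallback(
    prompt: str,
    backends: Sequence[Backend],
    is_refusal: Callable[[str], bool],
) -> Tuple[Optional[str], Optional[str]]:
    """Try backends in order; skip any that error out or refuse.

    Returns (backend_name, response_text), or (None, None) if every
    backend failed or refused.
    """
    for name, call in backends:
        try:
            text = call(prompt)
        except Exception as exc:  # transport or HTTP-level failure
            log.warning("backend=%s error=%r; rerouting", name, exc)
            continue
        if is_refusal(text):
            # Logged so a quality-scoring pass can count refusals per backend.
            log.info("backend=%s refused; rerouting", name)
            continue
        return name, text
    return None, None
```

Treating errors and refusals through the same loop keeps the ordering policy in one place; whether refusals should also lower a backend's future priority is left open here.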
Related Issues
Cross-Repo References
Addendum: The Routing Philosophy is Inverted
Every project analyzed above optimizes to MINIMIZE spend. The uniwizard must MAXIMIZE utilization of already-purchased quota.
$500/month across backends. Quotas reset. Unused tokens = wasted money. Timmy's job is to burn through quota toward goals, not to conserve it.
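One way to express "maximize utilization" as a selection rule is to prefer the backend whose unused quota is most at risk of expiring. The sketch below is an assumption-laden illustration: the `QuotaState` fields, the `waste_pressure` score, and `pick_backend` are all hypothetical names, and the report does not specify how quota is tracked.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class QuotaState:
    """Hypothetical per-backend quota snapshot."""
    name: str
    tokens_remaining: int
    tokens_total: int
    seconds_until_reset: float
    period_seconds: float  # length of one quota period, assumed > 0


def waste_pressure(q: QuotaState) -> float:
    """Fraction of quota left divided by fraction of the period left.

    A score above 1.0 means the backend is behind on spending and its
    quota is on track to be wasted at reset.
    """
    quota_left = q.tokens_remaining / q.tokens_total
    time_left = max(q.seconds_until_reset, 1.0) / q.period_seconds
    return quota_left / time_left


def pick_backend(states: Iterable[QuotaState]) -> Optional[QuotaState]:
    """Pick the backend with the most quota at risk; None if all are exhausted."""
    usable = [q for q in states if q.tokens_remaining > 0 and q.tokens_total > 0]
    if not usable:
        return None
    return max(usable, key=waste_pressure)
```

A cost-minimizing router would pick the cheapest adequate backend; this rule inverts that by steering work toward whichever paid-for quota is closest to going unused.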
This means the implementation recommendation changes:
GOAP (Goal Oriented Action Planning) is the execution model. Define done, plan backwards, execute aggressively, re-plan on failure. Not step-by-step reactive. Goal-driven. Urgent.
Ezra owns research synthesis. Will distill #101 findings into actionable specs for Timmy's implementation tickets.
Research complete. Findings absorbed into #95 (backend registry), #96 (classifier), and board pass on #94. Semantic refusal detection documented. Closing.