[Core] Model Tiering Router — Local 8B / Hermes 70B / Cloud API Cascade #882

Closed
opened 2026-03-21 23:40:08 +00:00 by perplexity · 1 comment
Collaborator

Why This Is High Leverage

The feasibility guide recommends a three-tier model strategy: local 7-8B for routine actions (free, fast), Hermes 70B for complex reasoning (~8-12 tok/s on M3 Max), and Claude/GPT-4o for difficult decisions ($5-15/hr). Timmy's existing `src/infrastructure/models/registry.py` and `src/infrastructure/router/cascade.py` have stubs for this, but there's no intelligence in the routing. This ticket makes the LLM selection automatic based on task complexity, keeping costs near zero for 90% of operations.

Scope

Build an intelligent model router that selects the right LLM tier for each request.

Tier 1: Local Fast (Free)

  • Llama 3.1 8B or Hermes 3 8B via Ollama
  • For: navigation, basic interactions, perception processing, simple decisions
  • Latency: 0.3-1s

Tier 2: Local Heavy (Free, Slower)

  • Hermes 3/4 70B via Ollama on M3 Max 128GB
  • For: quest planning, dialogue strategy, combat tactics, complex reasoning
  • Latency: 8-12 tok/s (~5-10s for 200 tokens)

Tier 3: Cloud API (Paid, Best)

  • Claude or GPT-4o
  • For: recovery from failures, novel situations, multi-step planning that failed on Tier 2
  • Latency: 1-3s, cost ~$5-15/hr of heavy use
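The three tiers above can be captured as an enum plus per-tier metadata. This is a minimal sketch: the `Tier`/`TierSpec` names, model identifiers, and cost figure are illustrative assumptions, not the actual `registry.py` API.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    LOCAL_FAST = 1   # 8B via Ollama: navigation, simple decisions
    LOCAL_HEAVY = 2  # 70B via Ollama: planning, complex reasoning
    CLOUD_API = 3    # Claude/GPT-4o: recovery, novel situations


@dataclass(frozen=True)
class TierSpec:
    model: str
    cost_per_1k_tokens_usd: float  # 0.0 for the local tiers


# Illustrative defaults matching the tier descriptions above.
TIERS = {
    Tier.LOCAL_FAST: TierSpec("hermes3:8b", 0.0),
    Tier.LOCAL_HEAVY: TierSpec("hermes3:70b", 0.0),
    Tier.CLOUD_API: TierSpec("claude", 0.015),  # rough order of magnitude only
}
```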

Routing Logic

  • Task complexity classifier (keyword/pattern-based initially, learnable later)
  • Automatic escalation: if Tier 1 response quality is low, retry on Tier 2
  • Budget tracking: daily/monthly cloud API spend limits
  • Fallback chain: if Tier 3 unavailable, Tier 2 handles everything
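A keyword-based classifier with automatic escalation could look like the sketch below. The pattern list, `run`, and `quality_ok` callables are hypothetical stand-ins for the real backend and quality-check interfaces.

```python
import re

# Keyword heuristics for "complex reasoning" tasks; a learnable
# classifier can replace this list later.
_HEAVY_PATTERNS = [r"\bplan\b", r"\bstrategy\b", r"\boptimal\b", r"\btactic", r"\bquest\b"]


def classify_tier(task: str) -> str:
    """Return 'local_heavy' for planning/reasoning tasks, else 'local_fast'."""
    lowered = task.lower()
    if any(re.search(p, lowered) for p in _HEAVY_PATTERNS):
        return "local_heavy"
    return "local_fast"


def route(task: str, run, quality_ok) -> str:
    """Run on the classified tier; escalate T1 -> T2 if quality is low."""
    tier = classify_tier(task)
    response = run(tier, task)
    if tier == "local_fast" and not quality_ok(response):
        response = run("local_heavy", task)  # automatic escalation retry
    return response
```

With these patterns, "Walk to the next room" classifies as `local_fast` and "Plan the optimal path to become Hortator" as `local_heavy`, matching the acceptance criteria below.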

Requirements

  • `src/infrastructure/models/router.py` — Task classifier + tier selection
  • `src/infrastructure/models/budget.py` — Cloud API spend tracking + limits
  • Integrate with existing `registry.py` and `cascade.py`
  • Configurable thresholds and tier assignments
  • Logging: which tier handled which request, latency, estimated cost
  • Tests with mock LLM backends

Acceptance Criteria

  • "Walk to the next room" routes to Tier 1
  • "Plan the optimal path to become Hortator" routes to Tier 2
  • Failed Tier 1 response automatically escalates to Tier 2
  • Cloud API spend stays within configured budget
  • Routing decisions logged for analysis
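The mock-backend tests requirement can exercise the escalation criterion without any real LLM. A sketch, assuming a hypothetical backend interface with a `generate()` method:

```python
class MockBackend:
    """Records every prompt it serves and returns a canned reply."""

    def __init__(self, tier_name: str, reply: str = "ok"):
        self.tier = tier_name
        self.reply = reply
        self.calls: list[str] = []

    def generate(self, prompt: str) -> str:
        self.calls.append(prompt)
        return self.reply


def escalating_call(fast: MockBackend, heavy: MockBackend, prompt: str) -> str:
    # Treat an empty reply as a low-quality Tier 1 response.
    response = fast.generate(prompt)
    if not response.strip():
        response = heavy.generate(prompt)  # escalate T1 -> T2
    return response
```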

Assignee: Kimi

claude was assigned by Rockachopa 2026-03-22 23:33:21 +00:00
claude added the harness, inference, p0-critical labels 2026-03-23 13:52:55 +00:00
Collaborator

PR created: http://143.198.27.163:3000/Rockachopa/Timmy-time-dashboard/pulls/1297

Summary of changes:

  • `src/infrastructure/models/router.py` — `TieredModelRouter` + `classify_tier()` heuristic classifier. Tier 1 (LOCAL_FAST, 8B) for navigation/simple actions; Tier 2 (LOCAL_HEAVY, 70B) for planning/complex reasoning; Tier 3 (CLOUD_API) for failures or explicit override. Auto-escalation from T1 on low-quality responses. Budget guard prevents cloud overspend.

  • `src/infrastructure/models/budget.py` — `BudgetTracker` with SQLite persistence and in-memory fallback. Tracks daily/monthly cloud API spend, enforces configurable limits, estimates costs for Claude/GPT-4o/Grok.

  • `src/config.py` — 5 new settings: `tier_local_fast_model`, `tier_local_heavy_model`, `tier_cloud_model`, `tier_cloud_daily_budget_usd` ($5 default), `tier_cloud_monthly_budget_usd` ($50 default).

  • 44 new unit tests. All 544 unit tests pass.

Acceptance criteria verified: "Walk to the next room" → LOCAL_FAST ✓, "Plan the optimal path to become Hortator" → LOCAL_HEAVY ✓, auto-escalation ✓, budget limits ✓, routing logged ✓.
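The five settings named above can be sketched as a plain dataclass. The setting names and dollar defaults come from the PR summary; the field types and model-id defaults are assumptions, since the actual `src/config.py` structure isn't shown here.

```python
from dataclasses import dataclass


@dataclass
class TierSettings:
    # Setting names and budget defaults from the PR summary;
    # model identifiers are illustrative placeholders.
    tier_local_fast_model: str = "hermes3:8b"
    tier_local_heavy_model: str = "hermes3:70b"
    tier_cloud_model: str = "claude"
    tier_cloud_daily_budget_usd: float = 5.0
    tier_cloud_monthly_budget_usd: float = 50.0
```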

Reference: Rockachopa/Timmy-time-dashboard#882