[claude] Three-tier model router: Local 8B / Hermes 70B / Cloud API cascade (#882) #1297

claude · 2026-03-24T01:51:42Z

Fixes #882

## What this does

Implements the intelligent three-tier model router described in issue #882, making LLM backend selection automatic based on task complexity.

### New files

**`src/infrastructure/models/router.py`**: `TieredModelRouter`

- `classify_tier(task, context)`: heuristic classifier, no LLM call required
- Tier 1 (`LOCAL_FAST`): short navigation/action tasks → 8B local model
- Tier 2 (`LOCAL_HEAVY`): planning, strategy, complex reasoning → 70B local model
- Tier 3 (`CLOUD_API`): explicit override or Tier-2 failure → cloud (Claude/GPT-4o)
- Auto-escalation: a low-quality Tier-1 response is retried on Tier 2
- Budget guard: the cloud tier is blocked once the daily or monthly limit is reached
- Per-request logging: tier, model, latency, estimated cost
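
As a rough illustration of the heuristic classification listed above, the following is a minimal sketch and not the code in `router.py`. The keyword stems, the word-count cutoff, and the `force_cloud` context key are all assumptions made for the example; only the tier names and the `classify_tier(task, context)` signature come from this PR.

```python
from enum import Enum
from typing import Optional


class Tier(Enum):
    LOCAL_FAST = 1   # 8B local model
    LOCAL_HEAVY = 2  # 70B local model
    CLOUD_API = 3    # cloud model


# Hypothetical keyword stems and length cutoff, chosen only for illustration.
HEAVY_STEMS = ("plan", "strateg", "optimal", "analy", "compare")
MAX_FAST_WORDS = 12


def classify_tier(task: str, context: Optional[dict] = None) -> Tier:
    """Pick a tier from surface features of the task; no LLM call involved."""
    context = context or {}
    if context.get("force_cloud"):  # hypothetical explicit-override flag
        return Tier.CLOUD_API
    # Lowercase the task and strip simple punctuation before matching stems.
    words = [w.strip(".,!?\"'") for w in task.lower().split()]
    if any(w.startswith(stem) for w in words for stem in HEAVY_STEMS):
        return Tier.LOCAL_HEAVY
    if len(words) > MAX_FAST_WORDS:
        return Tier.LOCAL_HEAVY
    return Tier.LOCAL_FAST
```

Prefix matching on stems is deliberately crude (it would also match "planet"), which is the usual trade-off of a classifier that avoids an LLM call: cheap and predictable, but reliant on escalation to catch misroutes.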

**`src/infrastructure/models/budget.py`**: `BudgetTracker`

- SQLite-backed spend recording with in-memory fallback (graceful degradation)
- `record_spend(provider, model, tokens_in, tokens_out)` with automatic cost estimation
- `cloud_allowed()` checks the daily and monthly limits
- `get_summary()` for dashboards
- Cost estimates for Claude (haiku/sonnet/opus), GPT-4o, Grok
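
The budget behaviour described above could look roughly like the sketch below. It is an assumption-laden illustration, not the contents of `budget.py`: the table schema, the `PRICES` entries (placeholder numbers, not real provider pricing), the constructor arguments, and the optional `now` parameter are all invented for the example.

```python
import sqlite3
from datetime import datetime, timezone

# Placeholder USD prices per 1K tokens as (input, output); not real pricing.
PRICES = {"example-model": (0.001, 0.005)}


def _open(path):
    """Open the database and make sure the spend table exists."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS spend "
        "(ts TEXT, provider TEXT, model TEXT, cost REAL)"
    )
    return db


class BudgetTracker:
    def __init__(self, db_path=":memory:", daily_limit=5.0, monthly_limit=50.0):
        self.daily_limit = daily_limit
        self.monthly_limit = monthly_limit
        try:
            self._db = _open(db_path)
        except sqlite3.Error:
            # Graceful degradation: fall back to a non-persistent database.
            self._db = _open(":memory:")

    def record_spend(self, provider, model, tokens_in, tokens_out, now=None):
        """Estimate the cost of one request, store it, and return it."""
        price_in, price_out = PRICES.get(model, (0.0, 0.0))
        cost = tokens_in / 1000 * price_in + tokens_out / 1000 * price_out
        ts = (now or datetime.now(timezone.utc)).isoformat()
        self._db.execute(
            "INSERT INTO spend VALUES (?, ?, ?, ?)", (ts, provider, model, cost)
        )
        self._db.commit()
        return cost

    def _spent(self, ts_prefix):
        # ISO timestamps start with YYYY-MM-DD, so a prefix selects a day or month.
        row = self._db.execute(
            "SELECT COALESCE(SUM(cost), 0) FROM spend WHERE ts LIKE ?",
            (ts_prefix + "%",),
        ).fetchone()
        return row[0]

    def cloud_allowed(self, now=None):
        """True while both the daily and the monthly spend are under their limits."""
        now = now or datetime.now(timezone.utc)
        return (
            self._spent(now.strftime("%Y-%m-%d")) < self.daily_limit
            and self._spent(now.strftime("%Y-%m")) < self.monthly_limit
        )
```

A router would call `cloud_allowed()` before dispatching to the cloud tier and `record_spend(...)` after each cloud response.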

### Config additions (`src/config.py`)

| Setting | Default | Env var |
|---|---|---|
| `tier_local_fast_model` | `llama3.1:8b` | `TIER_LOCAL_FAST_MODEL` |
| `tier_local_heavy_model` | `hermes3:70b` | `TIER_LOCAL_HEAVY_MODEL` |
| `tier_cloud_model` | `claude-haiku-4-5` | `TIER_CLOUD_MODEL` |
| `tier_cloud_daily_budget_usd` | `5.0` | `TIER_CLOUD_DAILY_BUDGET_USD` |
| `tier_cloud_monthly_budget_usd` | `50.0` | `TIER_CLOUD_MONTHLY_BUDGET_USD` |
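
To show how these settings map onto environment variables, here is a small standalone sketch. It assumes plain `os.environ` lookups; the actual mechanism in `src/config.py` is not shown in this PR and may differ. `TierSettings` and `load_tier_settings` are hypothetical names; the field names, defaults, and env var names are taken from the table.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TierSettings:
    tier_local_fast_model: str = "llama3.1:8b"
    tier_local_heavy_model: str = "hermes3:70b"
    tier_cloud_model: str = "claude-haiku-4-5"
    tier_cloud_daily_budget_usd: float = 5.0
    tier_cloud_monthly_budget_usd: float = 50.0


def load_tier_settings(env=None) -> TierSettings:
    """Build settings from an env mapping, falling back to the defaults."""
    env = os.environ if env is None else env
    d = TierSettings()
    return TierSettings(
        tier_local_fast_model=env.get("TIER_LOCAL_FAST_MODEL", d.tier_local_fast_model),
        tier_local_heavy_model=env.get("TIER_LOCAL_HEAVY_MODEL", d.tier_local_heavy_model),
        tier_cloud_model=env.get("TIER_CLOUD_MODEL", d.tier_cloud_model),
        # Budgets arrive as strings from the environment, so convert to float.
        tier_cloud_daily_budget_usd=float(
            env.get("TIER_CLOUD_DAILY_BUDGET_USD", d.tier_cloud_daily_budget_usd)
        ),
        tier_cloud_monthly_budget_usd=float(
            env.get("TIER_CLOUD_MONTHLY_BUDGET_USD", d.tier_cloud_monthly_budget_usd)
        ),
    )
```

For example, exporting `TIER_CLOUD_DAILY_BUDGET_USD=2.5` before startup would lower the daily cap in this sketch while leaving the other defaults in place.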

## Acceptance criteria

- [x] `"Walk to the next room"` → `LOCAL_FAST` (Tier 1)
- [x] `"Plan the optimal path to become Hortator"` → `LOCAL_HEAVY` (Tier 2)
- [x] Failed Tier-1 response auto-escalates to Tier-2
- [x] Cloud API spend stays within configured budget
- [x] Routing decisions logged for analysis
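
The escalation criterion above can be pictured with the following sketch. It is illustrative only: the commit mentions an `_is_low_quality` check, but its rules are not shown here, so `looks_low_quality` and its thresholds are assumptions, as are `route_with_escalation` and the callables standing in for the two local models.

```python
import logging
from typing import Callable, Tuple

log = logging.getLogger("tier_router_sketch")


def looks_low_quality(response: str) -> bool:
    """Hypothetical quality check: empty, very short, or an explicit refusal."""
    text = response.strip().lower()
    return len(text) < 10 or text.startswith("i don't know")


def route_with_escalation(
    task: str,
    call_fast: Callable[[str], str],
    call_heavy: Callable[[str], str],
) -> Tuple[str, str]:
    """Try the fast tier first; retry on the heavy tier if the answer looks poor."""
    response = call_fast(task)
    if not looks_low_quality(response):
        log.info("tier=LOCAL_FAST task=%r", task)
        return "LOCAL_FAST", response
    log.info("tier=LOCAL_HEAVY (escalated) task=%r", task)
    return "LOCAL_HEAVY", call_heavy(task)
```

Passing the model calls in as functions keeps the sketch testable with stubs, which is one plausible way to unit-test escalation without a running model.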

## Tests

44 new unit tests in:

- `tests/infrastructure/test_tiered_model_router.py` (router + classifier)
- `tests/infrastructure/test_budget_tracker.py` (spend tracking + limits)

All 544 unit tests pass (`tox -e unit`).


claude added 1 commit 2026-03-24 01:51:42 +00:00

feat: three-tier model router — Local 8B / Hermes 70B / Cloud API cascade (#882)

Tests / lint (pull_request) Failing after 16s

Tests / test (pull_request) Has been skipped

80d798a94b

Implements the intelligent model tiering router from issue #882:

- `src/infrastructure/models/router.py` — TieredModelRouter with heuristic
  task classifier (classify_tier), automatic T1→T2 escalation on low-quality
  responses, cloud-tier budget guard, and per-request routing logs.

- `src/infrastructure/models/budget.py` — BudgetTracker with SQLite
  persistence (in-memory fallback), daily/monthly cloud spend limits,
  cost estimates per model, and get_summary() for dashboards.

- `src/config.py` — five new settings: tier_local_fast_model,
  tier_local_heavy_model, tier_cloud_model, tier_cloud_daily_budget_usd
  (default $5), tier_cloud_monthly_budget_usd (default $50).

- Exports added to `src/infrastructure/models/__init__.py`.

- 44 new unit tests covering classify_tier, _is_low_quality, BudgetTracker,
  and TieredModelRouter (including acceptance criteria from the issue).

Acceptance criteria verified:
  "Walk to the next room"                       → LOCAL_FAST (Tier 1) ✓
  "Plan the optimal path to become Hortator"    → LOCAL_HEAVY (Tier 2) ✓
  Failed Tier-1 response auto-escalates to T2   ✓
  Cloud spend stays within configured budget    ✓
  Routing decisions logged                      ✓

Fixes #882

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

claude referenced this pull request

2026-03-24 01:51:53 +00:00

[Core] Model Tiering Router — Local 8B / Hermes 70B / Cloud API Cascade #882

Timmy merged commit 4f80cfcd58 into main

2026-03-24 01:53:26 +00:00

Timmy deleted branch claude/issue-882

2026-03-24 01:53:26 +00:00

Timmy referenced this issue from a commit

2026-03-24 01:53:26 +00:00

[claude] Three-tier model router: Local 8B / Hermes 70B / Cloud API cascade (#882) (#1297)
