[claude] Three-tier model router: Local 8B / Hermes 70B / Cloud API cascade (#882) #1297

Merged
Timmy merged 1 commit from claude/issue-882 into main 2026-03-24 01:53:26 +00:00

Fixes #882

What this does

Implements the intelligent three-tier model router described in issue #882, making LLM backend selection automatic based on task complexity.

New files

src/infrastructure/models/router.py: TieredModelRouter

  • classify_tier(task, context) — heuristic classifier, no LLM call required
  • Tier 1 (LOCAL_FAST): short navigation/action tasks → 8B local model
  • Tier 2 (LOCAL_HEAVY): planning, strategy, complex reasoning → 70B local model
  • Tier 3 (CLOUD_API): explicit override or Tier-2 failure → cloud (Claude/GPT-4o)
  • Auto-escalation: Tier-1 low-quality response → retry on Tier-2
  • Budget guard: cloud tier blocked when daily/monthly limit reached
  • Per-request logging: tier, model, latency, estimated cost
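To make the classifier bullet concrete, here is a minimal sketch of a heuristic tier classifier that needs no LLM call. The keyword list, the word-count cutoff, and the `force_cloud` context key are assumptions for illustration; the PR text does not spell out the actual rules used by `classify_tier`.

```python
from enum import Enum
from typing import Optional


class Tier(Enum):
    LOCAL_FAST = 1   # 8B local model
    LOCAL_HEAVY = 2  # 70B local model
    CLOUD_API = 3    # cloud model


# Hypothetical keyword list and cutoff; the real heuristics may differ.
HEAVY_KEYWORDS = ("plan", "strategy", "optimal", "analyze", "compare")
MAX_FAST_WORDS = 12


def classify_tier(task: str, context: Optional[dict] = None) -> Tier:
    """Pick a tier from surface features of the task text only."""
    context = context or {}
    # Assumed override flag: the caller explicitly asks for the cloud tier.
    if context.get("force_cloud"):
        return Tier.CLOUD_API
    words = [w.strip(".,!?") for w in task.lower().split()]
    # Planning or strategy vocabulary goes to the heavy local model.
    if any(w in HEAVY_KEYWORDS for w in words):
        return Tier.LOCAL_HEAVY
    # Long prompts are treated as complex in this sketch.
    if len(words) > MAX_FAST_WORDS:
        return Tier.LOCAL_HEAVY
    return Tier.LOCAL_FAST
```

Under these assumed rules, the two example prompts from the acceptance criteria land on Tier 1 and Tier 2 respectively.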

src/infrastructure/models/budget.py: BudgetTracker

  • SQLite-backed spend recording with in-memory fallback (graceful degradation)
  • record_spend(provider, model, tokens_in, tokens_out) with auto cost estimation
  • cloud_allowed() checks daily + monthly limits
  • get_summary() for dashboards
  • Cost estimates for Claude (haiku/sonnet/opus), GPT-4o, Grok
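The budget tracker described above could look roughly like the sketch below. It is not the merged implementation: the table schema, the `PRICES` table (placeholder numbers, not real provider rates), the rolling 24-hour and 30-day windows, and the use of SQLite's `:memory:` database as the fallback are all assumptions.

```python
import sqlite3
import time

# Hypothetical USD prices per million tokens (input, output); placeholders only.
PRICES = {"example-model": (1.0, 5.0)}


class BudgetTracker:
    def __init__(self, db_path=":memory:", daily_limit=5.0, monthly_limit=50.0):
        self.daily_limit = daily_limit
        self.monthly_limit = monthly_limit
        try:
            self.conn = sqlite3.connect(db_path)
        except sqlite3.Error:
            # Graceful degradation: keep tracking in memory if the file is unusable.
            self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS spend "
            "(ts REAL, provider TEXT, model TEXT, cost REAL)"
        )

    def record_spend(self, provider, model, tokens_in, tokens_out):
        """Estimate the cost of one call, store it, and return it in USD."""
        price_in, price_out = PRICES.get(model, (0.0, 0.0))
        cost = (tokens_in * price_in + tokens_out * price_out) / 1_000_000
        self.conn.execute(
            "INSERT INTO spend VALUES (?, ?, ?, ?)",
            (time.time(), provider, model, cost),
        )
        self.conn.commit()
        return cost

    def _spent_since(self, seconds):
        cutoff = time.time() - seconds
        row = self.conn.execute(
            "SELECT COALESCE(SUM(cost), 0) FROM spend WHERE ts >= ?", (cutoff,)
        ).fetchone()
        return row[0]

    def cloud_allowed(self):
        """True while both the daily and the monthly window are under their limits."""
        return (
            self._spent_since(86_400) < self.daily_limit
            and self._spent_since(30 * 86_400) < self.monthly_limit
        )
```

A router would call `cloud_allowed()` before dispatching to the cloud tier and `record_spend(...)` after each cloud response.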

Config additions (src/config.py)

| Setting | Default | Env var |
|---|---|---|
| `tier_local_fast_model` | `llama3.1:8b` | `TIER_LOCAL_FAST_MODEL` |
| `tier_local_heavy_model` | `hermes3:70b` | `TIER_LOCAL_HEAVY_MODEL` |
| `tier_cloud_model` | `claude-haiku-4-5` | `TIER_CLOUD_MODEL` |
| `tier_cloud_daily_budget_usd` | `5.0` | `TIER_CLOUD_DAILY_BUDGET_USD` |
| `tier_cloud_monthly_budget_usd` | `50.0` | `TIER_CLOUD_MONTHLY_BUDGET_USD` |
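As an illustration of how these settings map to environment variables, the following standard-library sketch loads them with the listed defaults. The `TierSettings` dataclass and `load_tier_settings` helper are hypothetical names; the PR does not show how `src/config.py` actually defines or parses its settings.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TierSettings:
    # Field names and defaults are taken from the settings listed above.
    tier_local_fast_model: str = "llama3.1:8b"
    tier_local_heavy_model: str = "hermes3:70b"
    tier_cloud_model: str = "claude-haiku-4-5"
    tier_cloud_daily_budget_usd: float = 5.0
    tier_cloud_monthly_budget_usd: float = 50.0


def load_tier_settings(env=None) -> TierSettings:
    """Hypothetical loader: read each setting from its env var, else use the default."""
    env = os.environ if env is None else env
    d = TierSettings()
    return TierSettings(
        tier_local_fast_model=env.get("TIER_LOCAL_FAST_MODEL", d.tier_local_fast_model),
        tier_local_heavy_model=env.get("TIER_LOCAL_HEAVY_MODEL", d.tier_local_heavy_model),
        tier_cloud_model=env.get("TIER_CLOUD_MODEL", d.tier_cloud_model),
        tier_cloud_daily_budget_usd=float(
            env.get("TIER_CLOUD_DAILY_BUDGET_USD", d.tier_cloud_daily_budget_usd)
        ),
        tier_cloud_monthly_budget_usd=float(
            env.get("TIER_CLOUD_MONTHLY_BUDGET_USD", d.tier_cloud_monthly_budget_usd)
        ),
    )
```

For example, exporting `TIER_CLOUD_DAILY_BUDGET_USD=2.5` before start-up would lower the daily cap in this sketch while every other setting keeps its default.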

Acceptance criteria

  • "Walk to the next room" → LOCAL_FAST (Tier 1)
  • "Plan the optimal path to become Hortator" → LOCAL_HEAVY (Tier 2)
  • Failed Tier-1 response auto-escalates to Tier-2
  • Cloud API spend stays within configured budget
  • Routing decisions logged for analysis
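The escalation and budget criteria can be sketched as one cascade function. This is an assumption-heavy illustration: the backends are plain callables, `is_low_quality` uses an invented length threshold, a Tier-2 "failure" is modelled as a raised exception, and the returned list stands in for the routing log.

```python
from typing import Callable, List, Optional, Tuple


def is_low_quality(response: Optional[str]) -> bool:
    # Assumed heuristic: empty or very short answers count as low quality.
    return response is None or len(response.strip()) < 10


def route_with_escalation(
    task: str,
    fast: Callable[[str], Optional[str]],
    heavy: Callable[[str], str],
    cloud: Callable[[str], str],
    cloud_allowed: Callable[[], bool],
) -> Tuple[str, List[str]]:
    """Try Tier 1, escalate to Tier 2 on a weak answer, then to cloud if Tier 2 fails."""
    log: List[str] = []
    response = fast(task)
    log.append("LOCAL_FAST")
    if not is_low_quality(response):
        return response, log
    try:
        response = heavy(task)
        log.append("LOCAL_HEAVY")
        return response, log
    except Exception:
        log.append("LOCAL_HEAVY_FAILED")
    # Budget guard: never reach the cloud tier once the spend limit is hit.
    if not cloud_allowed():
        raise RuntimeError("cloud tier blocked by budget")
    log.append("CLOUD_API")
    return cloud(task), log
```

Passing the backends and the budget check in as callables keeps the cascade testable without a running model server.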

Tests

44 new unit tests in:

  • tests/infrastructure/test_tiered_model_router.py (router + classifier)
  • tests/infrastructure/test_budget_tracker.py (spend tracking + limits)

All 544 unit tests pass (tox -e unit).

claude added 1 commit 2026-03-24 01:51:42 +00:00
feat: three-tier model router — Local 8B / Hermes 70B / Cloud API cascade (#882)
Some checks failed
Tests / lint (pull_request) Failing after 16s
Tests / test (pull_request) Has been skipped
80d798a94b
Implements the intelligent model tiering router from issue #882:

- `src/infrastructure/models/router.py` — TieredModelRouter with heuristic
  task classifier (classify_tier), automatic T1→T2 escalation on low-quality
  responses, cloud-tier budget guard, and per-request routing logs.

- `src/infrastructure/models/budget.py` — BudgetTracker with SQLite
  persistence (in-memory fallback), daily/monthly cloud spend limits,
  cost estimates per model, and get_summary() for dashboards.

- `src/config.py` — five new settings: tier_local_fast_model,
  tier_local_heavy_model, tier_cloud_model, tier_cloud_daily_budget_usd
  (default $5), tier_cloud_monthly_budget_usd (default $50).

- Exports added to `src/infrastructure/models/__init__.py`.

- 44 new unit tests covering classify_tier, _is_low_quality, BudgetTracker,
  and TieredModelRouter (including acceptance criteria from the issue).

Acceptance criteria verified:
  "Walk to the next room"                       → LOCAL_FAST (Tier 1) ✓
  "Plan the optimal path to become Hortator"    → LOCAL_HEAVY (Tier 2) ✓
  Failed Tier-1 response auto-escalates to T2   ✓
  Cloud spend stays within configured budget    ✓
  Routing decisions logged                      ✓

Fixes #882

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Timmy merged commit 4f80cfcd58 into main 2026-03-24 01:53:26 +00:00
Timmy deleted branch claude/issue-882 2026-03-24 01:53:26 +00:00
Reference: Rockachopa/Timmy-time-dashboard#1297