[claude] Three-tier model router: Local 8B / Hermes 70B / Cloud API cascade (#882) #1297

Merged
Timmy merged 1 commit from claude/issue-882 into main 2026-03-24 01:53:26 +00:00

1 Commit

Author SHA1 Message Date
Alexander Whitestone
80d798a94b feat: three-tier model router — Local 8B / Hermes 70B / Cloud API cascade (#882)
Some checks failed
Tests / lint (pull_request) Failing after 16s
Tests / test (pull_request) Has been skipped
Implements the intelligent model tiering router from issue #882:

- `src/infrastructure/models/router.py` — TieredModelRouter with heuristic
  task classifier (classify_tier), automatic T1→T2 escalation on low-quality
  responses, cloud-tier budget guard, and per-request routing logs.

- `src/infrastructure/models/budget.py` — BudgetTracker with SQLite
  persistence (in-memory fallback), daily/monthly cloud spend limits,
  cost estimates per model, and get_summary() for dashboards.

- `src/config.py` — five new settings: tier_local_fast_model,
  tier_local_heavy_model, tier_cloud_model, tier_cloud_daily_budget_usd
  (default $5), tier_cloud_monthly_budget_usd (default $50).

- Exports added to `src/infrastructure/models/__init__.py`.

- 44 new unit tests covering classify_tier, _is_low_quality, BudgetTracker,
  and TieredModelRouter (including acceptance criteria from the issue).
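
For readers without the diff at hand, the budget guard described above can be
sketched roughly as follows. This is a minimal illustration, not the code in
`budget.py`: the class name `BudgetTrackerSketch`, the `spend` table, and the
`record` / `can_spend` methods are assumptions made for the example. Only
`get_summary()`, SQLite persistence with an in-memory fallback, and the $5
daily / $50 monthly defaults come from the commit message. Per-model cost
estimates are left out.

```python
import sqlite3
from datetime import datetime, timezone


class BudgetTrackerSketch:
    """Hypothetical stand-in for BudgetTracker, not the code from this commit."""

    def __init__(self, db_path=":memory:", daily_limit_usd=5.0, monthly_limit_usd=50.0):
        self.daily_limit_usd = daily_limit_usd
        self.monthly_limit_usd = monthly_limit_usd
        try:
            self._conn = self._open(db_path)
        except sqlite3.Error:
            # Assumed fallback: keep tracking in memory if the file is unusable.
            self._conn = self._open(":memory:")

    @staticmethod
    def _open(path):
        conn = sqlite3.connect(path)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS spend "
            "(ts TEXT NOT NULL, model TEXT NOT NULL, cost_usd REAL NOT NULL)"
        )
        return conn

    def record(self, model, cost_usd, now=None):
        """Persist the cost of one cloud call with its timestamp."""
        now = now or datetime.now(timezone.utc)
        self._conn.execute(
            "INSERT INTO spend (ts, model, cost_usd) VALUES (?, ?, ?)",
            (now.isoformat(), model, cost_usd),
        )
        self._conn.commit()

    def _spent(self, ts_prefix):
        # ISO timestamps prefix-match by date, so "2026-03" covers a whole month.
        row = self._conn.execute(
            "SELECT COALESCE(SUM(cost_usd), 0.0) FROM spend WHERE ts LIKE ?",
            (ts_prefix + "%",),
        ).fetchone()
        return row[0]

    def can_spend(self, estimated_cost_usd, now=None):
        """Return True if the estimated cost fits both the daily and monthly limit."""
        now = now or datetime.now(timezone.utc)
        daily = self._spent(now.strftime("%Y-%m-%d"))
        monthly = self._spent(now.strftime("%Y-%m"))
        return (
            daily + estimated_cost_usd <= self.daily_limit_usd
            and monthly + estimated_cost_usd <= self.monthly_limit_usd
        )

    def get_summary(self, now=None):
        """Return spend and limits in a dashboard-friendly dict."""
        now = now or datetime.now(timezone.utc)
        return {
            "daily_spent_usd": self._spent(now.strftime("%Y-%m-%d")),
            "daily_limit_usd": self.daily_limit_usd,
            "monthly_spent_usd": self._spent(now.strftime("%Y-%m")),
            "monthly_limit_usd": self.monthly_limit_usd,
        }
```

In this sketch a router would call `can_spend()` before a cloud request and
`record()` after it, so that a request which would exceed either limit is
refused before any money is spent.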

Acceptance criteria verified:
  "Walk to the next room"                       → LOCAL_FAST (Tier 1) ✓
  "Plan the optimal path to become Hortator"    → LOCAL_HEAVY (Tier 2) ✓
  Failed Tier-1 response auto-escalates to T2   ✓
  Cloud spend stays within configured budget    ✓
  Routing decisions logged                      ✓
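
The criteria above can be illustrated with a small, self-contained sketch.
It is not the router from this commit: the `Tier` enum values, the keyword
list `HEAVY_HINTS`, the low-quality markers, and the `route` signature are
assumptions chosen so the two example prompts classify as listed. The commit
message does not say when the real router moves from Tier 2 to the cloud
tier, so this sketch assumes it does so on a low-quality Tier-2 reply and
only if a budget check allows it.

```python
import logging
import re
from enum import Enum

logger = logging.getLogger("tier_router_sketch")


class Tier(Enum):
    LOCAL_FAST = 1   # Tier 1: local 8B model
    LOCAL_HEAVY = 2  # Tier 2: Hermes 70B
    CLOUD_API = 3    # Tier 3: paid cloud API


# Hypothetical hint words; the real heuristic may look quite different.
HEAVY_HINTS = {"plan", "optimal", "strategy", "analyze", "compare"}


def classify_tier(task):
    """Pick a starting tier from simple surface features of the task text."""
    words = re.findall(r"[a-z']+", task.lower())
    if HEAVY_HINTS.intersection(words) or len(words) > 40:
        return Tier.LOCAL_HEAVY
    return Tier.LOCAL_FAST


def is_low_quality(reply):
    """Treat empty, very short, or explicitly unsure replies as failures."""
    text = reply.strip().lower()
    return len(text) < 10 or "i don't know" in text or "i'm not sure" in text


def route(task, backends, cloud_allowed):
    """Run the cascade and return (final tier, reply, tiers tried in order).

    backends maps each Tier to a callable taking the task text;
    cloud_allowed is a zero-argument budget check (both are assumptions).
    """
    tier = classify_tier(task)
    tried = [tier]
    reply = backends[tier](task)

    # T1 -> T2 escalation on a low-quality response.
    if tier is Tier.LOCAL_FAST and is_low_quality(reply):
        tier = Tier.LOCAL_HEAVY
        tried.append(tier)
        reply = backends[tier](task)

    # Assumed T2 -> cloud step, guarded by the budget check.
    if tier is Tier.LOCAL_HEAVY and is_low_quality(reply) and cloud_allowed():
        tier = Tier.CLOUD_API
        tried.append(tier)
        reply = backends[tier](task)

    logger.info("task=%r tiers=%s", task, [t.name for t in tried])
    return tier, reply, tried
```

With stub backends, `route("Walk to the next room", ...)` starts at
`Tier.LOCAL_FAST` and moves to `Tier.LOCAL_HEAVY` only if the first reply is
judged low quality. Passing the budget check as a callable keeps the router
independent of how spend is tracked.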

Fixes #882

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 21:51:11 -04:00