Alexander Whitestone
|
80d798a94b
|
feat: three-tier model router — Local 8B / Hermes 70B / Cloud API cascade (#882)
Tests / lint (pull_request) Failing after 16s
Tests / test (pull_request) Has been skipped
Implements the intelligent model tiering router from issue #882:
- `src/infrastructure/models/router.py` — TieredModelRouter with heuristic
task classifier (classify_tier), automatic T1→T2 escalation on low-quality
responses, cloud-tier budget guard, and per-request routing logs.
- `src/infrastructure/models/budget.py` — BudgetTracker with SQLite
persistence (in-memory fallback), daily/monthly cloud spend limits,
cost estimates per model, and get_summary() for dashboards.
- `src/config.py` — five new settings: tier_local_fast_model,
tier_local_heavy_model, tier_cloud_model, tier_cloud_daily_budget_usd
(default $5), tier_cloud_monthly_budget_usd (default $50).
- Exports added to `src/infrastructure/models/__init__.py`.
- 44 new unit tests covering classify_tier, _is_low_quality, BudgetTracker,
and TieredModelRouter (including acceptance criteria from the issue).
Acceptance criteria verified:
"Walk to the next room" → LOCAL_FAST (Tier 1) ✓
"Plan the optimal path to become Hortator" → LOCAL_HEAVY (Tier 2) ✓
Failed Tier-1 response auto-escalates to T2 ✓
Cloud spend stays within configured budget ✓
Routing decisions logged ✓
Fixes #882
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-03-23 21:51:11 -04:00 |
|