[claude] Three-tier model router: Local 8B / Hermes 70B / Cloud API cascade (#882) #1297

Merged

Timmy merged 1 commit from claude/issue-882 into main

2026-03-24 01:53:26 +00:00


Author: Alexander Whitestone
SHA1: 80d798a94b
Message:

feat: three-tier model router — Local 8B / Hermes 70B / Cloud API cascade (#882)

Checks:
- Tests / lint (pull_request): failing after 16s
- Tests / test (pull_request): skipped

Implements the intelligent model tiering router from issue #882:

- `src/infrastructure/models/router.py` — TieredModelRouter with heuristic
  task classifier (classify_tier), automatic T1→T2 escalation on low-quality
  responses, cloud-tier budget guard, and per-request routing logs.

- `src/infrastructure/models/budget.py` — BudgetTracker with SQLite
  persistence (in-memory fallback), daily/monthly cloud spend limits,
  cost estimates per model, and get_summary() for dashboards.

- `src/config.py` — five new settings: tier_local_fast_model,
  tier_local_heavy_model, tier_cloud_model, tier_cloud_daily_budget_usd
  (default $5), tier_cloud_monthly_budget_usd (default $50).

- Exports added to `src/infrastructure/models/__init__.py`.

- 44 new unit tests covering classify_tier, _is_low_quality, BudgetTracker,
  and TieredModelRouter (including acceptance criteria from the issue).
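For readers without access to the diff, the budget behaviour described above can be sketched roughly as follows. This is an illustrative stand-in, not the code from this PR: the class name `BudgetTrackerSketch`, the `spend` table layout, and the `record_spend` and `can_spend` methods are assumptions. Only `get_summary()`, the SQLite persistence with in-memory fallback, and the $5 daily / $50 monthly defaults come from the commit message.

```python
import sqlite3
from datetime import datetime, timezone


class BudgetTrackerSketch:
    """Hypothetical stand-in for BudgetTracker; names and schema are assumed."""

    def __init__(self, db_path=":memory:", daily_limit_usd=5.0, monthly_limit_usd=50.0):
        self.daily_limit_usd = daily_limit_usd
        self.monthly_limit_usd = monthly_limit_usd
        try:
            self._conn = sqlite3.connect(db_path)
        except sqlite3.Error:
            # In-memory fallback when the database file cannot be opened.
            self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS spend ("
            "ts TEXT NOT NULL, model TEXT NOT NULL, cost_usd REAL NOT NULL)"
        )

    def record_spend(self, model, cost_usd, now=None):
        """Persist one cloud call's cost with an ISO-8601 UTC timestamp."""
        now = now or datetime.now(timezone.utc)
        with self._conn:  # commits on success
            self._conn.execute(
                "INSERT INTO spend (ts, model, cost_usd) VALUES (?, ?, ?)",
                (now.isoformat(), model, cost_usd),
            )

    def _spent(self, period_prefix):
        # ISO timestamps start with YYYY-MM-DD, so a prefix match selects a day or month.
        row = self._conn.execute(
            "SELECT COALESCE(SUM(cost_usd), 0) FROM spend WHERE ts LIKE ?",
            (period_prefix + "%",),
        ).fetchone()
        return row[0]

    def get_summary(self, now=None):
        """Return current spend and limits in a dashboard-friendly dict."""
        now = now or datetime.now(timezone.utc)
        return {
            "daily_spent_usd": self._spent(now.strftime("%Y-%m-%d")),
            "daily_limit_usd": self.daily_limit_usd,
            "monthly_spent_usd": self._spent(now.strftime("%Y-%m")),
            "monthly_limit_usd": self.monthly_limit_usd,
        }

    def can_spend(self, estimated_cost_usd, now=None):
        """True only if the estimate fits under both the daily and monthly limits."""
        s = self.get_summary(now)
        return (
            s["daily_spent_usd"] + estimated_cost_usd <= s["daily_limit_usd"]
            and s["monthly_spent_usd"] + estimated_cost_usd <= s["monthly_limit_usd"]
        )
```

In this sketch a router would call `can_spend()` with a per-model cost estimate before a cloud request and `record_spend()` afterwards. Whether the real tracker checks before or after the call is not stated in the commit message.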

Acceptance criteria verified:
  "Walk to the next room"                       → LOCAL_FAST (Tier 1) ✓
  "Plan the optimal path to become Hortator"    → LOCAL_HEAVY (Tier 2) ✓
  Failed Tier-1 response auto-escalates to T2   ✓
  Cloud spend stays within configured budget    ✓
  Routing decisions logged                      ✓
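The criteria above could be exercised with a routing loop along these lines. It is a minimal sketch under stated assumptions, not the TieredModelRouter from this PR: the keyword list, the low-quality heuristic, the `route` function, and the rule that the cloud tier is tried only after a poor Tier-2 answer are all hypothetical. Only the tier names, `classify_tier`, the T1→T2 escalation, the budget guard, and the routing logs are taken from the commit message.

```python
import logging
import re
from enum import Enum

logger = logging.getLogger("tier_router_sketch")


class Tier(Enum):
    LOCAL_FAST = 1   # Tier 1
    LOCAL_HEAVY = 2  # Tier 2
    CLOUD = 3        # Tier 3


# Assumed keyword heuristic; the real classifier's rules are not shown in the PR text.
HEAVY_KEYWORDS = {"plan", "optimal", "strategy", "analyze", "compare"}


def classify_tier(task: str) -> Tier:
    """Send planning-style or very long tasks to Tier 2, everything else to Tier 1."""
    words = re.findall(r"[a-z']+", task.lower())
    if HEAVY_KEYWORDS.intersection(words) or len(words) > 30:
        return Tier.LOCAL_HEAVY
    return Tier.LOCAL_FAST


def is_low_quality(response: str) -> bool:
    """Assumed check: empty, very short, or explicitly unsure answers count as failures."""
    text = response.strip().lower()
    return len(text) < 5 or "i don't know" in text


def route(task, models, can_spend_cloud=lambda: False):
    """Run `task` on the classified tier, escalating when the answer looks poor.

    `models` maps each Tier to a callable taking the task and returning a string.
    `can_spend_cloud` is the budget guard; the cloud tier is skipped when it is False.
    """
    tier = classify_tier(task)
    response = models[tier](task)
    logger.info("task=%r routed to %s", task, tier.name)

    if tier is Tier.LOCAL_FAST and is_low_quality(response):
        tier = Tier.LOCAL_HEAVY
        response = models[tier](task)
        logger.info("task=%r escalated to %s", task, tier.name)

    if tier is Tier.LOCAL_HEAVY and is_low_quality(response):
        if can_spend_cloud():
            tier = Tier.CLOUD
            response = models[tier](task)
            logger.info("task=%r escalated to %s", task, tier.name)
        else:
            logger.info("task=%r not sent to CLOUD: budget guard refused", task)

    return tier, response
```

Passing the guard in as a callable keeps the router independent of how spend is tracked, so the same loop can be tested with a stub and run in production against a persistent tracker.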

Fixes #882

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Date: 2026-03-23 21:51:11 -04:00
