1
0

[claude] Three-tier model router: Local 8B / Hermes 70B / Cloud API cascade (#882) (#1297)

Co-authored-by: Claude (Opus 4.6) <claude@hermes.local>
Co-committed-by: Claude (Opus 4.6) <claude@hermes.local>
This commit is contained in:
2026-03-24 01:53:25 +00:00
committed by Timmy Time
parent a7ccfbddc9
commit 4f80cfcd58
6 changed files with 1326 additions and 0 deletions

View File

@@ -128,6 +128,23 @@ class Settings(BaseSettings):
anthropic_api_key: str = ""
claude_model: str = "haiku"
# ── Tiered Model Router (issue #882) ─────────────────────────────────
# Three-tier cascade: Local 8B (free, fast) → Local 70B (free, slower)
# → Cloud API (paid, best). Override model names per tier via env vars.
#
# TIER_LOCAL_FAST_MODEL — Tier-1 model name in Ollama (default: llama3.1:8b)
# TIER_LOCAL_HEAVY_MODEL — Tier-2 model name in Ollama (default: hermes3:70b)
# TIER_CLOUD_MODEL — Tier-3 cloud model name (default: claude-haiku-4-5)
#
# Budget limits for the cloud tier (0 = unlimited):
# TIER_CLOUD_DAILY_BUDGET_USD — daily ceiling in USD (default: 5.0)
# TIER_CLOUD_MONTHLY_BUDGET_USD — monthly ceiling in USD (default: 50.0)
tier_local_fast_model: str = "llama3.1:8b"
tier_local_heavy_model: str = "hermes3:70b"
tier_cloud_model: str = "claude-haiku-4-5"
tier_cloud_daily_budget_usd: float = 5.0
tier_cloud_monthly_budget_usd: float = 50.0
# ── Content Moderation ──────────────────────────────────────────────
# Three-layer moderation pipeline for AI narrator output.
# Uses Llama Guard via Ollama with regex fallback.