Integrate Claude Quota Monitor + Metabolic Protocol into cascade router #1075

Closed
opened 2026-03-23 13:24:21 +00:00 by perplexity · 1 comment
Collaborator

Source

PDFs: claude-quota.pdf (Python module) + claude-quota-check.pdf (bash script)
Produced during March 23 research session

What These Are

Two production-ready tools for monitoring Claude Code / Claude.ai quota and auto-selecting inference tier:

1. claude_quota.pysrc/infrastructure/claude_quota.py

  • QuotaMonitor class that reads OAuth token from macOS Keychain
  • Calls https://api.anthropic.com/api/oauth/usage with 30s caching
  • QuotaStatus dataclass with 5-hour and 7-day utilization, reset times
  • Metabolic Protocol auto-selects model tier:
    • BURST (5h < 50%): Cloud API for high-value tasks, local for medium
    • ACTIVE (5h 50-80%): Local Qwen3-14B only
    • RESTING (5h > 80% OR 7d > 80%): Local Qwen3-8B only
  • select_model(task_complexity) returns Ollama tag or claude-sonnet-4-6
  • should_use_cloud(task_value) returns bool gate
  • Graceful degradation: returns None if no Keychain credentials (Linux VPS)

2. claude_quota_check.shscripts/claude_quota_check.sh

  • CLI tool: ./claude_quota_check.sh (human-readable), --json (piping), --watch (60s refresh)
  • Color-coded usage bars (green < 50%, yellow 50-80%, red > 80%)
  • Decision guidance printed for Timmy

Implementation Steps

  1. Copy claude_quota.py to src/infrastructure/claude_quota.py
  2. Copy claude_quota_check.sh to scripts/claude_quota_check.sh and chmod +x
  3. Import QuotaMonitor in src/infrastructure/router/cascade.py
  4. Before any cloud API call, check monitor.should_use_cloud(task_value)
  5. If False, route to local Ollama instead
  6. Add tests for MetabolicTier thresholds

Acceptance Criteria

  • ./scripts/claude_quota_check.sh shows quota status on macOS with Claude Code authenticated
  • Cascade router auto-switches to local when 5h quota > 80%
  • Routine tasks NEVER hit cloud API regardless of quota
  • pytest tests/test_claude_quota.py passes

Cross-References

  • #1074 — Timmy Handoff (references these tools as Sprint Task 5)
  • #1070 — Vassal Protocol (Timmy needs quota awareness for agent dispatch)
  • #966 — Three-tier LLM router (metabolic protocol governs tier selection)
  • #1065 — Dual-model routing (quota determines 8B vs 14B vs cloud)
## Source PDFs: `claude-quota.pdf` (Python module) + `claude-quota-check.pdf` (bash script) Produced during March 23 research session ## What These Are Two production-ready tools for monitoring Claude Code / Claude.ai quota and auto-selecting inference tier: ### 1. `claude_quota.py` → `src/infrastructure/claude_quota.py` - `QuotaMonitor` class that reads OAuth token from macOS Keychain - Calls `https://api.anthropic.com/api/oauth/usage` with 30s caching - `QuotaStatus` dataclass with 5-hour and 7-day utilization, reset times - **Metabolic Protocol** auto-selects model tier: - **BURST** (5h < 50%): Cloud API for high-value tasks, local for medium - **ACTIVE** (5h 50-80%): Local Qwen3-14B only - **RESTING** (5h > 80% OR 7d > 80%): Local Qwen3-8B only - `select_model(task_complexity)` returns Ollama tag or `claude-sonnet-4-6` - `should_use_cloud(task_value)` returns bool gate - Graceful degradation: returns None if no Keychain credentials (Linux VPS) ### 2. `claude_quota_check.sh` → `scripts/claude_quota_check.sh` - CLI tool: `./claude_quota_check.sh` (human-readable), `--json` (piping), `--watch` (60s refresh) - Color-coded usage bars (green < 50%, yellow 50-80%, red > 80%) - Decision guidance printed for Timmy ## Implementation Steps 1. Copy `claude_quota.py` to `src/infrastructure/claude_quota.py` 2. Copy `claude_quota_check.sh` to `scripts/claude_quota_check.sh` and `chmod +x` 3. Import `QuotaMonitor` in `src/infrastructure/router/cascade.py` 4. Before any cloud API call, check `monitor.should_use_cloud(task_value)` 5. If False, route to local Ollama instead 6. Add tests for MetabolicTier thresholds ## Acceptance Criteria - `./scripts/claude_quota_check.sh` shows quota status on macOS with Claude Code authenticated - Cascade router auto-switches to local when 5h quota > 80% - Routine tasks NEVER hit cloud API regardless of quota - `pytest tests/test_claude_quota.py` passes ## Cross-References - #1074 — Timmy Handoff (references these tools as Sprint Task 5) - #1070 — Vassal Protocol (Timmy needs quota awareness for agent dispatch) - #966 — Three-tier LLM router (metabolic protocol governs tier selection) - #1065 — Dual-model routing (quota determines 8B vs 14B vs cloud)
claude self-assigned this 2026-03-23 13:32:51 +00:00
claude added the harnessinferencep0-critical labels 2026-03-23 13:53:03 +00:00
Collaborator

PR created: #1086

What was done:

  1. src/infrastructure/claude_quota.pyQuotaMonitor class with QuotaStatus dataclass, MetabolicTier enum (BURST/ACTIVE/RESTING), select_model(task_complexity), and should_use_cloud(task_value). Reads OAuth token from macOS Keychain, calls Anthropic usage API with 30s TTL cache. Degrades gracefully to local-only when credentials are unavailable.

  2. src/infrastructure/router/cascade.py — Added _quota_allows_cloud() method and integrated quota check into the complete() loop: cloud providers (anthropic/openai/grok) are skipped when quota is too low, falling back to Ollama automatically.

  3. scripts/claude_quota_check.sh — CLI tool with color-coded usage bars, --json, and --watch modes.

  4. tests/infrastructure/test_claude_quota.py — 30 unit tests covering all metabolic tier thresholds, model selection, cloud gating, caching, and the singleton. All pass; 10 pre-existing failures on main branch are unchanged.

PR created: #1086 **What was done:** 1. `src/infrastructure/claude_quota.py` — `QuotaMonitor` class with `QuotaStatus` dataclass, `MetabolicTier` enum (BURST/ACTIVE/RESTING), `select_model(task_complexity)`, and `should_use_cloud(task_value)`. Reads OAuth token from macOS Keychain, calls Anthropic usage API with 30s TTL cache. Degrades gracefully to local-only when credentials are unavailable. 2. `src/infrastructure/router/cascade.py` — Added `_quota_allows_cloud()` method and integrated quota check into the `complete()` loop: cloud providers (anthropic/openai/grok) are skipped when quota is too low, falling back to Ollama automatically. 3. `scripts/claude_quota_check.sh` — CLI tool with color-coded usage bars, `--json`, and `--watch` modes. 4. `tests/infrastructure/test_claude_quota.py` — 30 unit tests covering all metabolic tier thresholds, model selection, cloud gating, caching, and the singleton. All pass; 10 pre-existing failures on main branch are unchanged.
Sign in to join this conversation.
2 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: Rockachopa/Timmy-time-dashboard#1075