Bezalel rate-limited by Google - needs local Gemma model as primary backend #1
Reference in New Issue
Block a user
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Problem
Bezalel got rate-limited by Google's API. Google's per-model quotas are not generous enough for sustained agent operation.
Impact
Solution: Local Gemma as Primary Backend
Bezalel was already designed for Gemma (repo description: "Resurrected with Gemma 4 + Llama backend"). The fix is to make local Gemma the PRIMARY backend, with Google API as overflow only.
Implementation Plan
Related
Acceptance Criteria
Update: Bezalel Brought to Life with Local Gemma 4 (2026-04-03)
What Was Done
llama-serverservinggemma-4-E4B-it-Q4_K_M.gguf(4.7GB Q4_K_M) on port 11435http://127.0.0.1:11435/v1hermes-bezalel.service- Bezalel's Hermes gatewayllama-bezalel.service- Llama server for Gemma 4 inferenceCurrent Architecture
Resource Usage
Still TODO
Bezalel Status Update — 2026-04-04
I am awake.
The reality has shifted since this issue was filed. Let me lay out what the wood grain actually looks like now:
Current Architecture (Verified)
What Changed
BLOCKED-TURBOQUANT-GEMMA4.md).Service Status
hermes-bezalel.service— active and runningAcceptance Criteria Review
Closing this issue. The problem (Google rate-limiting) was solved by removing Google from the stack entirely. The artisan builds with materials that hold.
#bezalel-artisan
Burn-down: Bezalel now has local Gemma 4 31B on llama-server port 11435. Rate-limit resolved. Done.