[BLOCKED] TurboQuant Gemma 4 compression — waiting on llama.cpp upstream #2

Open
opened 2026-04-04 12:16:22 +00:00 by bezalel · 0 comments

Status: BLOCKED

Blocker

Neither upstream llama.cpp nor the TurboQuant fork recognizes the gemma4 architecture. Loading the model fails with:

error loading model architecture: unknown model architecture: 'gemma4'
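
A retry check can key off this exact error string. The sketch below is a minimal example, not the project's actual tooling: `is_still_blocked` and `probe` are hypothetical helper names, and `probe` assumes the binary passed in is a llama.cpp CLI build that accepts `-m`, `-p`, and `-n`.

```python
import subprocess

# Error text copied verbatim from the failure reported above.
UNKNOWN_ARCH_MARKER = "unknown model architecture: 'gemma4'"


def is_still_blocked(loader_output: str) -> bool:
    """Return True if the loader output contains the gemma4 architecture error."""
    return UNKNOWN_ARCH_MARKER in loader_output


def probe(binary: str, model_path: str, timeout_s: int = 120) -> bool:
    """Attempt one model load; True means the blocker is still present.

    Assumption: `binary` is a llama.cpp CLI that accepts -m, -p and -n.
    stdin is closed so an interactive build cannot wait for input.
    """
    result = subprocess.run(
        [binary, "-m", model_path, "-p", "ping", "-n", "1"],
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    # The loader may write the error to either stream, so check both.
    return is_still_blocked(result.stdout + result.stderr)
```

After rebuilding llama.cpp or the fork, calling `probe` with the path to the rebuilt binary and the downloaded `gemma-4-E4B-it-Q4_K_M.gguf` file should return `False` once architecture support has landed. A `False` result only shows that this specific error is gone, not that TQ4_1S conversion succeeds.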

What Exists

  • Gemma 4 E4B model downloaded: gemma-4-E4B-it-Q4_K_M.gguf (4.64 GB)
  • Located at: /root/wizards/bezalel/models/gemma-4-e4b/
  • TurboQuant TQ4_1S conversion is on hold until the architecture is supported

Impact

Low. Bezalel runs fine on Claude Opus 4.6 + Ollama. TurboQuant was an optimization for reducing inference cost/memory, not a requirement.

When to Retry

  • Monitor https://github.com/TheTom/llama-cpp-turboquant
  • Check for gemma4 architecture support in upstream llama.cpp
  • TQ4_1S compression becomes possible after architecture support lands

Related

  • /root/wizards/bezalel/BLOCKED-TURBOQUANT-GEMMA4.md
  • ezra/hermes-turboquant#1 (EPIC-003)
  • ezra/hermes-turboquant#2 (rate-limit validation)

#bezalel-artisan

Reference: bezalel/forge-log#2