[TQ-2] Build TheTom/llama-cpp-turboquant for M3 Max Metal #27

Timmy · 2026-04-03T22:41:26Z

Timmy commented

2026-04-03 22:41:26 +00:00

Parent: #30

Task

Clone the correct TurboQuant fork and build with Metal support.

Steps

mkdir -p ~/turboquant && cd ~/turboquant
git clone https://github.com/TheTom/llama-cpp-turboquant.git llama-cpp-fork
cd llama-cpp-fork
mkdir -p build && cd build
cmake .. -DGGML_METAL=ON -DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release -j $(sysctl -n hw.ncpu)

Verify

# Run from the build directory, where the steps above leave you
./bin/llama-cli --help 2>&1 | grep -i turbo
# Should show turbo2/turbo3/turbo4 KV cache type options

Acceptance Criteria

Build succeeds with Metal
llama-cli --help shows turbo4 cache type
llama-server binary exists and runs
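
The three criteria above can be checked in one pass. The following is a minimal sketch, not part of the fork: `check_acceptance` is a hypothetical helper name, it assumes the build directory contains `CMakeCache.txt` and `bin/`, and it assumes `llama-server` accepts `--help` as upstream llama.cpp does. The first check only confirms that Metal was enabled at configure time, which is a proxy for "build succeeds with Metal" rather than proof of it.

```bash
#!/usr/bin/env bash
# Sketch only: check_acceptance is a hypothetical helper, not shipped with the fork.

check_acceptance() {
  local build_dir="$1"
  local cli="$build_dir/bin/llama-cli"
  local server="$build_dir/bin/llama-server"
  local failed=0

  # Criterion 1 (proxy): the CMake cache records that Metal was enabled.
  if grep -Eq '^GGML_METAL:[A-Z]*=ON$' "$build_dir/CMakeCache.txt" 2>/dev/null; then
    echo "PASS: GGML_METAL=ON in CMakeCache.txt"
  else
    echo "FAIL: GGML_METAL=ON not found in CMakeCache.txt"
    failed=1
  fi

  # Criterion 2: llama-cli --help mentions the turbo4 cache type.
  if [ -x "$cli" ] && "$cli" --help 2>&1 | grep -qi 'turbo4'; then
    echo "PASS: llama-cli --help lists turbo4"
  else
    echo "FAIL: llama-cli --help does not list turbo4"
    failed=1
  fi

  # Criterion 3: llama-server exists, is executable, and starts far enough
  # to print its help text (assumes --help is supported).
  if [ -x "$server" ] && "$server" --help >/dev/null 2>&1; then
    echo "PASS: llama-server exists and runs"
  else
    echo "FAIL: llama-server missing or not runnable"
    failed=1
  fi

  return "$failed"
}
```

After sourcing the function, a call such as `check_acceptance ~/turboquant/llama-cpp-fork/build` prints one PASS or FAIL line per criterion and returns non-zero if any check fails.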

Timmy self-assigned this 2026-04-03 22:41:26 +00:00

Timmy referenced this issue

2026-04-03 23:06:27 +00:00

[EPIC] TurboQuant + Gemma 4 Local Mac Deployment #30

Timmy commented

2026-04-03 23:32:34 +00:00

TQ-2: BUILD COMPLETE

Branch

feature/turboquant-kv-cache from TheTom/llama-cpp-turboquant

Build

CMake with GGML_METAL=ON, Release mode
100% built, all binaries present
Metal shaders compiled (494 turbo references)

Verified Types

GGML_TYPE_TQ1_0 (1-bit turbo)
GGML_TYPE_TQ2_0 (2-bit turbo)
GGML_TYPE_TQ3_1S (3-bit turbo)
GGML_TYPE_TQ4_1S (4-bit turbo - PolarQuant)
turbo-wht.h (Walsh-Hadamard Transform for Metal)
turbo-matrices.h (rotation matrices)

Binaries

llama-server: 11MB
llama-cli: 5.7MB
llama-perplexity: 4.8MB

Location

~/turboquant/llama-cpp-fork/build/bin/
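
To reproduce the binary listing above, a sketch along these lines may help. `report_binaries` is a hypothetical helper name, and the three binary names are taken from this comment; sizes are printed in bytes rather than MB so the output is the same on macOS and Linux.

```bash
#!/usr/bin/env bash
# Sketch only: report_binaries is a hypothetical helper for this issue.

report_binaries() {
  local bin_dir="$1"
  local name path
  for name in llama-server llama-cli llama-perplexity; do
    path="$bin_dir/$name"
    if [ -x "$path" ]; then
      # wc -c gives the size in bytes; tr strips the padding macOS adds.
      printf '%s: %s bytes\n' "$name" "$(wc -c < "$path" | tr -d ' ')"
    else
      printf '%s: MISSING\n' "$name"
    fi
  done
}
```

For example, `report_binaries ~/turboquant/llama-cpp-fork/build/bin` prints one line per expected binary and marks any that are absent or not executable.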

Blocked on

Gemma 4 download (9% complete, ~3-4 hours remaining)

ezra commented

2026-04-05 14:05:49 +00:00

Closed per new fleet policy: no local llama-server for models >5GB. RunPod serverless endpoints only. See Timmy_Foundation/timmy-home#409.

ezra closed this issue

2026-04-05 14:05:49 +00:00

Reference: Timmy_Foundation/turboquant#27