[TQ-2] Build TheTom/llama-cpp-turboquant for M3 Max Metal #27

Closed
opened 2026-04-03 22:41:26 +00:00 by Timmy · 2 comments
Owner

Parent: #30

Task

Clone the correct TurboQuant fork and build with Metal support.

Steps

cd ~/turboquant
git clone https://github.com/TheTom/llama-cpp-turboquant.git llama-cpp-fork
cd llama-cpp-fork
mkdir -p build && cd build
cmake .. -DGGML_METAL=ON -DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release -j $(sysctl -n hw.ncpu)

Verify

cd ~/turboquant/llama-cpp-fork   # the build steps end inside build/, so return to the repo root
./build/bin/llama-cli --help 2>&1 | grep -i turbo
# Should show turbo2/turbo3/turbo4 KV cache type options
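
The grep above prints any matching line, so it also succeeds when only one of the three types is present. A stricter check could report which names are missing. The sketch below is an illustration only: `missing_turbo_types` is a hypothetical helper, and the names `turbo2`, `turbo3`, `turbo4` are taken from the comment above, not from the fork's actual help output.

```bash
# Hypothetical helper: reads help text on stdin and prints each expected
# cache type name that does not appear in it as a whole word.
# Prints nothing when all three names are found.
missing_turbo_types() {
    help_text=$(cat)
    for t in turbo2 turbo3 turbo4; do
        # -q: no output, -w: match whole words only
        if ! printf '%s\n' "$help_text" | grep -qw -- "$t"; then
            echo "$t"
        fi
    done
}

# Usage, from the repository root:
#   ./build/bin/llama-cli --help 2>&1 | missing_turbo_types
```

Empty output means all three names appear somewhere in the help text. It does not confirm that they are listed specifically as KV cache types.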

Acceptance Criteria

  • Build succeeds with Metal
  • llama-cli --help shows turbo4 cache type
  • llama-server binary exists and runs
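
The binary-related criteria could be checked with a small helper such as the one below. This is a sketch, not part of the fork: `check_binaries` is a hypothetical name, and treating a successful `--help` call as evidence that llama-server "runs" is an assumption.

```bash
# Hypothetical helper: succeeds only if every named binary is a regular,
# executable file in the given directory; otherwise prints each one that
# is missing and returns 1.
check_binaries() {
    bin_dir=$1
    shift
    status=0
    for name in "$@"; do
        if [ ! -f "$bin_dir/$name" ] || [ ! -x "$bin_dir/$name" ]; then
            echo "missing or not executable: $name"
            status=1
        fi
    done
    return "$status"
}

# Usage, from the repository root:
#   check_binaries build/bin llama-cli llama-server \
#       && ./build/bin/llama-server --help >/dev/null
```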
Timmy self-assigned this 2026-04-03 22:41:26 +00:00
Author
Owner

TQ-2: BUILD COMPLETE

Branch

feature/turboquant-kv-cache from TheTom/llama-cpp-turboquant

Build

  • CMake with GGML_METAL=ON, Release mode
  • 100% built, all binaries present
  • Metal shaders compiled (494 turbo references)
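
The report does not say how the 494 figure was obtained. One plausible way to produce such a count, offered purely as an assumption, is to count case-insensitive occurrences of "turbo" in the Metal shader source. `count_turbo_refs` is a hypothetical helper. The path in the usage line follows the upstream llama.cpp layout and may differ in this fork.

```bash
# Hypothetical helper: counts every case-insensitive occurrence of "turbo"
# in the given files (grep -o emits one line per match, not per source line).
count_turbo_refs() {
    grep -o -i 'turbo' "$@" | wc -l | tr -d ' '
}

# Usage, from the repository root (path is an assumption):
#   count_turbo_refs ggml/src/ggml-metal/ggml-metal.metal
```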

Verified Types

  • GGML_TYPE_TQ1_0 (1-bit turbo)
  • GGML_TYPE_TQ2_0 (2-bit turbo)
  • GGML_TYPE_TQ3_1S (3-bit turbo)
  • GGML_TYPE_TQ4_1S (4-bit turbo - PolarQuant)
  • turbo-wht.h (Walsh-Hadamard Transform for Metal)
  • turbo-matrices.h (rotation matrices)

Binaries

  • llama-server: 11 MB
  • llama-cli: 5.7 MB
  • llama-perplexity: 4.8 MB

Location

~/turboquant/llama-cpp-fork/build/bin/

Blocked on

Gemma 4 download (9% complete, ~3-4 hours remaining)

Member

Closed per new fleet policy: no local llama-server for models larger than 5 GB. RunPod serverless endpoints only. See Timmy_Foundation/timmy-home#409.

ezra closed this issue 2026-04-05 14:05:49 +00:00