[P1-S1] Build llama.cpp fork with Metal backend on M4 Max #4

Closed
opened 2026-03-30 17:11:05 +00:00 by Timmy · 1 comment
Owner

Parent: #1 | Depends on: #3 (fork assessment)

Build the llama.cpp TurboQuant fork with Metal backend on MacBook Pro M4 Max.

2-Hour Cap

If the fork doesn't compile and pass the smoke test (load model, generate 10 tokens) within 2 hours, STOP. Pivot to the MLX path and report what broke.

Paths

  • Direct build: If the fork is fresh
  • Cherry-pick: If the fork is 2-4 weeks stale, cherry-pick the TurboQuant commits onto current HEAD
  • Clean-room: If conflicts are extensive, use turboquant_plus as a reference and implement into current HEAD (60-90 min)
  • MLX pivot: If all llama.cpp paths are blocked, switch to rachittshah/mlx-turboquant
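
The path selection above can be sketched as a small decision function. This is only an illustration: the issue does not define "fresh", so the 14-day threshold is an assumption, and the names `BuildPath`, `FRESH_LIMIT_DAYS`, and `choose_build_path` are hypothetical. Forks older than 4 weeks are not covered by the issue; this sketch still routes them to cherry-pick unless conflicts are flagged.

```python
from enum import Enum


class BuildPath(Enum):
    DIRECT = "direct build"
    CHERRY_PICK = "cherry-pick"
    CLEAN_ROOM = "clean-room"
    MLX_PIVOT = "MLX pivot"


# Assumption: "fresh" is not defined in the issue. Under two weeks is used
# here because the cherry-pick path is described as starting at 2 weeks.
FRESH_LIMIT_DAYS = 14


def choose_build_path(fork_age_days: int,
                      conflicts_extensive: bool = False,
                      llama_cpp_blocked: bool = False) -> BuildPath:
    """Pick a build path, checking the most restrictive condition first."""
    if llama_cpp_blocked:
        # Every llama.cpp route failed: fall back to the MLX implementation.
        return BuildPath.MLX_PIVOT
    if conflicts_extensive:
        # Cherry-picking would not apply cleanly: reimplement on current HEAD.
        return BuildPath.CLEAN_ROOM
    if fork_age_days < FRESH_LIMIT_DAYS:
        return BuildPath.DIRECT
    return BuildPath.CHERRY_PICK
```

For example, `choose_build_path(3)` selects the direct build, which matches the path this issue ended up using.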

Smoke Test

  • Load qwen3.5:27b
  • Generate 10 tokens
  • No crashes, no Metal errors
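
A minimal sketch of how the smoke test could be automated, assuming the binary is driven from Python. The helper names are hypothetical, and the rule for spotting a Metal error (a log line containing both "metal" and "error") is an assumption, not a documented llama.cpp log format. The sketch checks for a clean exit and the absence of such lines; it does not count generated tokens.

```python
import subprocess

# The issue caps the whole build-and-test effort at 2 hours; using it as the
# default timeout here is a convenience, not something the issue specifies.
TWO_HOUR_CAP_S = 2 * 60 * 60


def smoke_test_passed(returncode: int, output: str) -> bool:
    """Return True if the run exited cleanly and logged no Metal error."""
    if returncode != 0:
        return False  # crash or abnormal exit
    for line in output.lower().splitlines():
        # Assumption: a Metal failure mentions both words on one line.
        if "metal" in line and "error" in line:
            return False
    return True


def run_smoke_test(cmd: list[str], timeout_s: float = TWO_HOUR_CAP_S) -> bool:
    """Run the given command and apply smoke_test_passed to its output."""
    try:
        proc = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout_s,
            stdin=subprocess.DEVNULL,  # never block waiting for input
        )
    except (subprocess.TimeoutExpired, OSError):
        # Timed out, or the binary could not be started at all.
        return False
    return smoke_test_passed(proc.returncode, proc.stdout + proc.stderr)
```

A call might look like `run_smoke_test(["./build/bin/llama-cli", "-m", "models/model.gguf", "-p", "Hello", "-n", "10"])`, where the binary location and the GGUF path are placeholders for wherever the build output and the qwen3.5:27b weights actually live.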

Acceptance Criteria

  • llama.cpp builds with Metal backend (or MLX pivot documented)
  • Smoke test passes: model loads, generates tokens
  • Build path documented (which approach was used)
  • Build time reported
Timmy added this to the Phase 1 — PolarQuant MVP milestone 2026-03-30 17:11:05 +00:00
Timmy added the build, phase-1, owner:cid labels 2026-03-30 17:11:05 +00:00
Author
Owner

Build Complete

Branch: feature/turboquant-kv-cache (commit adac2c6)
Build: cmake -DGGML_METAL=ON -DCMAKE_BUILD_TYPE=Release → 100% clean build
Metal init output confirms TurboQuant:

ggml_metal_library_init: turbo3 using 4-mag LUT (pre-M5 hardware)

All binaries built: llama-cli, llama-bench, llama-perplexity, llama-server.
Smoke test PASSED: model loads, generates coherent text at ~34 t/s.

Build time: ~3 minutes (cmake configure + make -j14)
Path used: Direct build from feature branch (no cherry-picking needed)
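
One way the build and its reported time could be reproduced is sketched below. The CMake flags are the ones quoted in this comment; the `build` directory name and the helpers `build_commands` and `timed_build` are assumptions. The sketch uses `cmake --build` with `-j` in place of the `make -j14` step mentioned above, which should be equivalent for a Makefile generator.

```python
import subprocess
import time


def build_commands(build_dir: str = "build", jobs: int = 14) -> list[list[str]]:
    """Return the configure and compile commands as argument lists."""
    return [
        # Configure with the Metal backend in Release mode.
        ["cmake", "-B", build_dir, "-DGGML_METAL=ON", "-DCMAKE_BUILD_TYPE=Release"],
        # Compile with the requested number of parallel jobs.
        ["cmake", "--build", build_dir, "-j", str(jobs)],
    ]


def timed_build(commands: list[list[str]]) -> float:
    """Run each command in order and return elapsed wall-clock seconds."""
    start = time.monotonic()
    for cmd in commands:
        # check=True raises CalledProcessError on the first failing step.
        subprocess.run(cmd, check=True)
    return time.monotonic() - start
```

Running `timed_build(build_commands())` from the repository root would configure, compile, and return the elapsed seconds for the "Build time" entry.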

Timmy closed this issue 2026-03-30 20:09:51 +00:00