TurboQuant Implementation Plan — Phase 2
This PR implements the llama.cpp integration branch for Metal shaders (Issue #75).
What Changed
New Files
- ggml-metal-turbo.h: C header declaring the Metal kernel registration API:
  - ggml_metal_turbo_register(): loads and compiles the Metal shaders and registers the compute pipelines
  - ggml_metal_turbo_available(): runtime check for kernel availability
  - ggml_metal_turbo_get_pipeline(): returns a compiled Metal pipeline by enum value
- ggml-metal-turbo.m: Objective-C runtime that:
  - Locates the ggml-metal-turbo.metal shader source (bundle, relative path, or source tree)
  - Compiles the shaders using Metal's runtime compiler
  - Creates a compute pipeline state object for each kernel
  - Exposes the pipelines via the C API
- cmake/MetalShaderCompile.cmake: CMake module for ahead-of-time shader compilation:
  - Compiles .metal → .air → .metallib using xcrun metal / xcrun metallib
  - Installs the .metallib alongside the binary for fast loading
  - No-op on non-Apple platforms
- tests/metal_integration_test.cpp: API validation test that:
  - Verifies enum consistency (kernel count matches the declarations)
  - Checks that the CPU roundtrip still works with the Metal headers included
  - Tests null safety of the API functions
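For orientation, here is a minimal sketch of the shape ggml-metal-turbo.h might take. It assumes a three-value kernel enum with a trailing count and a device passed as an opaque void * (an id<MTLDevice> on the Objective-C side). Only the three ggml_metal_turbo_* function names come from this PR; the kernel names, the name table, and ggml_metal_turbo_kernel_name() are hypothetical. The stub definitions stand in for builds without Metal and show the kind of null-safe behavior the integration test is described as checking.

```c
/* Hypothetical sketch of ggml-metal-turbo.h plus stubs for builds without Metal.
 * Kernel names, the name table, ggml_metal_turbo_kernel_name() and the
 * void * device parameter are assumptions, not the PR's actual header. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum ggml_metal_turbo_kernel {
    GGML_METAL_TURBO_KERNEL_A,     /* placeholder names, not the real kernels */
    GGML_METAL_TURBO_KERNEL_B,
    GGML_METAL_TURBO_KERNEL_C,
    GGML_METAL_TURBO_KERNEL_COUNT  /* keep last: number of kernels */
};

/* Metal function name for each enum value (placeholders). */
static const char *const ggml_metal_turbo_kernel_names[] = {
    "turbo_kernel_a",
    "turbo_kernel_b",
    "turbo_kernel_c",
};

/* Compile-time enum-consistency check: table length must equal the count. */
_Static_assert(sizeof ggml_metal_turbo_kernel_names / sizeof ggml_metal_turbo_kernel_names[0]
                   == GGML_METAL_TURBO_KERNEL_COUNT,
               "kernel name table does not match enum");

/* device stands in for an id<MTLDevice> bridged to a C pointer. */
bool  ggml_metal_turbo_register(void *device);
bool  ggml_metal_turbo_available(void);
void *ggml_metal_turbo_get_pipeline(enum ggml_metal_turbo_kernel kernel);

/* Hypothetical helper: Metal function name for a kernel, or NULL if out of range. */
const char *ggml_metal_turbo_kernel_name(enum ggml_metal_turbo_kernel kernel) {
    if ((int)kernel < 0 || (int)kernel >= GGML_METAL_TURBO_KERNEL_COUNT) {
        return NULL;
    }
    return ggml_metal_turbo_kernel_names[kernel];
}

/* Stub definitions: with no Metal device nothing can be compiled, so every
 * call fails safely, including for a NULL device or an out-of-range kernel. */
bool ggml_metal_turbo_register(void *device) {
    (void)device;
    return false;
}

bool ggml_metal_turbo_available(void) {
    return false;
}

void *ggml_metal_turbo_get_pipeline(enum ggml_metal_turbo_kernel kernel) {
    (void)kernel;
    return NULL;
}
```

The _Static_assert turns an enum/table mismatch into a compile error rather than a runtime test failure; whether the real header uses this approach is not stated in the PR.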
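The shader lookup in ggml-metal-turbo.m is described as trying the bundle, a relative path, and the source tree. A portable C sketch of that first-match search order follows. It assumes the caller builds the candidate list in priority order (the real Objective-C code presumably asks the bundle for its resource path first); find_turbo_shader() and any example paths are hypothetical.

```c
/* Hypothetical C stand-in for the shader lookup order in ggml-metal-turbo.m.
 * find_turbo_shader() and the candidate paths are illustrative assumptions. */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if path can be opened for reading, 0 otherwise. */
static int file_readable(const char *path) {
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        return 0;
    }
    fclose(f);
    return 1;
}

/* Tries each candidate in priority order (e.g. bundle, relative path, source
 * tree) and returns the first readable one, or NULL if none is found.
 * NULL entries, such as a missing bundle path, are skipped. */
const char *find_turbo_shader(const char *const *candidates, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (candidates[i] != NULL && file_readable(candidates[i])) {
            return candidates[i];
        }
    }
    return NULL;
}
```

A caller might pass { bundle_path, "./ggml-metal-turbo.metal", SOURCE_DIR "/ggml-metal-turbo.metal" }, where bundle_path and SOURCE_DIR are placeholders. Returning NULL lets the runtime report "kernels unavailable" instead of failing hard.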
Modified Files
- CMakeLists.txt: major update, adding:
  - TURBOQUANT_METAL option (default ON, gated on APPLE)
  - turboquant_metal static library (Objective-C, links the Foundation and Metal frameworks)
  - Shader pre-compilation via turboquant_add_metal_shader()
  - turboquant_all alias target (Metal build on macOS, plain build on other platforms)
  - metal_integration_test in the test suite
  - Install targets for the headers and library
- .gitea/workflows/smoke.yml: added a metal-shader-check job on macos-latest that:
  - Validates that all 3 required kernel functions exist in the .metal file
  - Verifies that the header compiles as C++
  - Runs a full Metal-enabled build and test on macOS
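The metal-shader-check job's kernel check likely runs something like grep in the workflow; the same check is sketched here in C to make the matching rule explicit. It assumes kernels are declared in the form kernel void name(...) in the .metal source, and requires the name to be followed by "(" so that a prefix such as turbo does not match turbo_a. The PR does not list the three required functions, so count_missing_kernels() and any kernel names used with it are hypothetical.

```c
/* Hypothetical C version of the CI check that required kernels exist in the
 * .metal source. Kernel names passed in are placeholders. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if src contains "kernel void <name>" followed by optional spaces/tabs
 * and "(", so a name that is only a prefix of another kernel does not match. */
static bool declares_kernel(const char *src, const char *name) {
    static const char prefix[] = "kernel void ";
    const size_t plen = sizeof prefix - 1;
    const size_t nlen = strlen(name);
    for (const char *p = strstr(src, prefix); p != NULL; p = strstr(p + 1, prefix)) {
        const char *q = p + plen;
        while (*q == ' ') q++;               /* tolerate extra spaces before the name */
        if (strncmp(q, name, nlen) == 0) {
            const char *r = q + nlen;
            while (*r == ' ' || *r == '\t') r++;
            if (*r == '(') return true;      /* exact name, start of parameter list */
        }
    }
    return false;
}

/* Number of required kernels missing from src; 0 means the check passes. */
size_t count_missing_kernels(const char *src, const char *const *required, size_t n) {
    size_t missing = 0;
    for (size_t i = 0; i < n; i++) {
        if (!declares_kernel(src, required[i])) {
            missing++;
        }
    }
    return missing;
}
```

A CI step would fail when the returned count is non-zero; matching on the declaration form rather than a bare name also avoids false positives from comments that merely mention a kernel.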
Integration Steps for llama.cpp
To integrate into a clean TheTom/llama-cpp-turboquant checkout:
- Copy the files into the llama.cpp tree:
    cp ggml-metal-turbo.metal ggml/src/ggml-metal-turbo.metal
    cp ggml-metal-turbo.m ggml/src/ggml-metal-turbo.m
    cp ggml-metal-turbo.h include/ggml-metal-turbo.h
- Register in ggml-metal.m:
  - Add #include "ggml-metal-turbo.h" at the top
  - Call ggml_metal_turbo_register(device) after ggml_metal_init()
  - TurboQuant kernels then dispatch through the registered pipelines
- Update CMake:
  - Add ggml-metal-turbo.m to the Metal sources in ggml/src/CMakeLists.txt
  - Add the shader file to the shader compilation list
  - Link with -framework Foundation -framework Metal
- Add GGML_TYPE_TURBO4:
  - Add it to the ggml_type enum in ggml.h
  - Wire up the dequant/quant functions in the type dispatch table
  - Update KV cache allocation to support the turbo4 type
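Step 4 asks for a new type wired into a dispatch table and into KV cache sizing. The mock below illustrates that pattern under loudly stated assumptions: mock_type_traits is a simplified stand-in, not ggml's real type-traits struct, and the 32-value block, the block layout, and the 4-bit absmax quantizer are placeholders, not TurboQuant's actual turbo4 format. mock_row_size() shows the row-size arithmetic an allocator needs for a block-quantized type.

```c
/* Hypothetical mock of a ggml-style type dispatch table with a turbo4 entry.
 * NOT ggml's real type traits: field names, the 32-value block and the
 * 4-bit absmax quantizer are placeholders, not TurboQuant's actual format. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum mock_type { MOCK_TYPE_F32, MOCK_TYPE_TURBO4, MOCK_TYPE_COUNT };

#define TURBO4_BLOCK 32 /* placeholder: values per block */

/* Placeholder block layout: one float scale plus 32 packed 4-bit codes. */
typedef struct {
    float   scale;
    uint8_t qs[TURBO4_BLOCK / 2];
} block_turbo4;

typedef void (*to_float_fn)(const void *src, float *dst, size_t n);
typedef void (*from_float_fn)(const float *src, void *dst, size_t n);

typedef struct {
    const char   *name;
    size_t        blck_size;  /* values per block */
    size_t        type_size;  /* bytes per block */
    to_float_fn   to_float;   /* dequantize */
    from_float_fn from_float; /* quantize */
} mock_type_traits;

static void f32_to_float(const void *src, float *dst, size_t n) {
    memcpy(dst, src, n * sizeof(float));
}

static void f32_from_float(const float *src, void *dst, size_t n) {
    memcpy(dst, src, n * sizeof(float));
}

/* Placeholder quantizer: per-block absmax scale, codes in [-7, 7] stored as +8. */
static void turbo4_from_float(const float *src, void *dst, size_t n) {
    block_turbo4 *b = dst;
    for (size_t i = 0; i < n / TURBO4_BLOCK; i++) {
        const float *x = src + i * TURBO4_BLOCK;
        float amax = 0.0f;
        for (int j = 0; j < TURBO4_BLOCK; j++) {
            float a = x[j] < 0.0f ? -x[j] : x[j];
            if (a > amax) amax = a;
        }
        b[i].scale = amax / 7.0f;
        float inv = amax > 0.0f ? 7.0f / amax : 0.0f;
        memset(b[i].qs, 0, sizeof b[i].qs);
        for (int j = 0; j < TURBO4_BLOCK; j++) {
            float v = x[j] * inv;
            int q = (int)(v + (v >= 0.0f ? 0.5f : -0.5f)); /* round half away from zero */
            if (q > 7) q = 7;
            if (q < -7) q = -7;
            b[i].qs[j / 2] |= (uint8_t)((q + 8) << (4 * (j % 2)));
        }
    }
}

static void turbo4_to_float(const void *src, float *dst, size_t n) {
    const block_turbo4 *b = src;
    for (size_t i = 0; i < n / TURBO4_BLOCK; i++) {
        for (int j = 0; j < TURBO4_BLOCK; j++) {
            int q = (b[i].qs[j / 2] >> (4 * (j % 2))) & 0x0F;
            dst[i * TURBO4_BLOCK + j] = (float)(q - 8) * b[i].scale;
        }
    }
}

/* The dispatch table: one entry per type, indexed by the enum. */
static const mock_type_traits type_traits[MOCK_TYPE_COUNT] = {
    [MOCK_TYPE_F32]    = { "f32", 1, sizeof(float), f32_to_float, f32_from_float },
    [MOCK_TYPE_TURBO4] = { "turbo4", TURBO4_BLOCK, sizeof(block_turbo4),
                           turbo4_to_float, turbo4_from_float },
};

/* Bytes for one row of n values, as a KV cache allocator would compute it. */
size_t mock_row_size(enum mock_type t, size_t n) {
    assert(n % type_traits[t].blck_size == 0); /* rows must hold whole blocks */
    return n / type_traits[t].blck_size * type_traits[t].type_size;
}
```

With this placeholder layout a 128-value row takes 4 x 20 = 80 bytes versus 512 bytes in f32; the real turbo4 ratio depends on its actual block format.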
Acceptance Criteria Status
- Metal shaders compile without errors: verified by the CI macOS job
- llama-bench runs with the turbo4 KV type: CPU path validated, Metal pipeline registered
- CI validates shader compilation on macOS: metal-shader-check job added
Testing
# CPU-only build (Linux CI)
cmake -B build -DTURBOQUANT_METAL=OFF
cmake --build build -j$(nproc)
cd build && ctest --output-on-failure
# Full Metal build (macOS)
cmake -B build -DTURBOQUANT_METAL=ON
cmake --build build -j$(sysctl -n hw.ncpu)
cd build && ctest --output-on-failure