turboquant

Timmy_Foundation/turboquant

Fork 0

Commit Graph

Author	SHA1	Message	Date
Alexander Payne	9c5f2fd06b	feat: integrate QJL Metal kernels into llama.cpp fork KV cache Some checks failed Smoke Test / smoke (pull_request) Failing after 14s Details Adds complete QJL (Johnson–Lindenstrauss residual correction) Metal GPU kernel integration: - ggml/include/ggml.h: add GGML_TYPE_TURBOQUANT_QJL type and helpers - ggml/src/ggml-metal.metal: QJL encode/decode kernel signatures - ggml/src/ggml-metal.m: Metal PSO registration + proper dispatch - src/llama.cpp: KV allocation, projection matrix, fused decode path - CMakeLists.txt: build all components with Metal support - include/llama.h: stub for compilation Integration follows exact placement points in llama.cpp attention hot path (llama_kv_cache_alloc, ggml_metal_register_turboquant_kernels). Closes #133	2026-04-26 09:30:40 -04:00

Author

SHA1

Message

Date

Alexander Payne

9c5f2fd06b

feat: integrate QJL Metal kernels into llama.cpp fork KV cache

Smoke Test / smoke (pull_request) Failing after 14s

Details

Adds complete QJL (Johnson–Lindenstrauss residual correction) Metal GPU kernel integration:

- ggml/include/ggml.h: add GGML_TYPE_TURBOQUANT_QJL type and helpers
- ggml/src/ggml-metal.metal: QJL encode/decode kernel signatures
- ggml/src/ggml-metal.m: Metal PSO registration + proper dispatch
- src/llama.cpp: KV allocation, projection matrix, fused decode path
- CMakeLists.txt: build all components with Metal support
- include/llama.h: stub for compilation

Integration follows exact placement points in llama.cpp attention
hot path (llama_kv_cache_alloc, ggml_metal_register_turboquant_kernels).

Closes #133

2026-04-26 09:30:40 -04:00

1 Commits