turboquant

Timmy_Foundation/turboquant

Commit Graph

Author	SHA1	Message	Date

Timmy

8b6a4dca69

feat: QJL residual correction — implementation, Metal kernels, accuracy gates

Smoke Test / smoke (pull_request) Successful in 14s

Implements Issue #66: QJL (Quantized Johnson-Lindenstrauss) residual
correction for full TurboQuant compression (PolarQuant + QJL).

New files:
- llama-turbo-qjl.h — QJL API with encode/decode and utility functions
- llama-turbo-qjl.cpp — CPU reference implementation
- ggml-metal-qjl.metal — Metal GPU kernels for encode/decode/fused dequant
- tests/qjl_accuracy_test.cpp — 8 accuracy gate tests
- docs/QJL_IMPLEMENTATION_PLAN.md — full implementation plan

Algorithm:
- Encode: PolarQuant → compute residual → JL projection → 1-bit sign quantization
- Decode: PolarQuant reconstruction → JL correction → add the two
- Storage: 76 bytes/vector (vs 512 bytes in FP32, about 6.7x compression)
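
The encode/decode pipeline above can be sketched roughly as follows. This is an illustration only, not the code in llama-turbo-qjl.cpp: `SketchCode`, `jl_matrix`, `encode`, and `decode` are hypothetical names, a simple uniform scalar quantizer stands in for PolarQuant (whose details the commit message does not give), the JL matrix is assumed to be Gaussian and regenerated from a seed, and signs are stored one per byte rather than packed into the 76-byte layout.

```cpp
// Hypothetical sketch of "coarse quantize -> residual -> JL projection -> sign bits".
// None of these names come from llama-turbo-qjl.h.
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

struct SketchCode {
    std::vector<int8_t>  coarse;        // stand-in for the PolarQuant payload
    float                scale;         // step size of the stand-in quantizer
    std::vector<uint8_t> signs;         // one sign per projection (unpacked for clarity)
    float                residual_norm; // ||r||, used to rescale the correction
};

// Gaussian JL matrix (m x d, row-major), regenerated from a seed so it is never stored.
// Reproducible within one standard library implementation.
static std::vector<float> jl_matrix(size_t m, size_t d, uint32_t seed) {
    std::mt19937 rng(seed);
    std::normal_distribution<float> gauss(0.0f, 1.0f);
    std::vector<float> S(m * d);
    for (float &v : S) v = gauss(rng);
    return S;
}

SketchCode encode(const std::vector<float> &x, size_t m, uint32_t seed) {
    const size_t d = x.size();
    SketchCode c;
    float max_abs = 0.0f;
    for (float v : x) max_abs = std::max(max_abs, std::fabs(v));
    c.scale = max_abs > 0.0f ? max_abs / 7.0f : 1.0f; // levels in [-7, 7]
    c.coarse.resize(d);
    std::vector<float> r(d);                          // residual after coarse quantization
    float norm2 = 0.0f;
    for (size_t i = 0; i < d; ++i) {
        c.coarse[i] = static_cast<int8_t>(std::lround(x[i] / c.scale));
        r[i] = x[i] - c.scale * c.coarse[i];
        norm2 += r[i] * r[i];
    }
    c.residual_norm = std::sqrt(norm2);
    const std::vector<float> S = jl_matrix(m, d, seed);
    c.signs.resize(m);
    for (size_t j = 0; j < m; ++j) {                  // keep only the sign of each projection
        float dot = 0.0f;
        for (size_t i = 0; i < d; ++i) dot += S[j * d + i] * r[i];
        c.signs[j] = dot >= 0.0f ? 1 : 0;
    }
    return c;
}

std::vector<float> decode(const SketchCode &c, uint32_t seed) {
    const size_t d = c.coarse.size();
    const size_t m = c.signs.size();
    std::vector<float> x(d);
    for (size_t i = 0; i < d; ++i) x[i] = c.scale * c.coarse[i]; // coarse reconstruction
    if (m == 0) return x;
    const std::vector<float> S = jl_matrix(m, d, seed);
    // For Gaussian rows s, E[s * sign(s.r)] = sqrt(2/pi) * r / ||r||, hence this rescaling.
    const float k = std::sqrt(1.5707963f) / static_cast<float>(m) * c.residual_norm;
    for (size_t j = 0; j < m; ++j) {
        const float sgn = c.signs[j] ? 1.0f : -1.0f;
        for (size_t i = 0; i < d; ++i) x[i] += k * sgn * S[j * d + i];
    }
    return x;
}
```

Under these assumptions the correction is unbiased in expectation, but with few projections relative to the vector dimension it can be noisy, so whether it lowers the per-element error depends on the number of sign bits kept.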

Accuracy gates (all passing):
- Cosine similarity ≥ 0.95 (direction preservation)
- Max abs error ≤ 0.8, mean abs error ≤ 0.2
- Deterministic encode (reproducible)
- Compression ratio > 6x vs FP32
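
The gates listed above could be checked along the following lines. This is a minimal sketch, not the contents of tests/qjl_accuracy_test.cpp: `GateReport` and `check_gates` are hypothetical names, and only the thresholds (0.95, 0.8, 0.2, 6x) are taken from the commit message. The determinism gate is omitted here; it amounts to encoding the same input twice and comparing the bytes.

```cpp
// Hypothetical gate checker; thresholds are the ones quoted in the commit message.
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct GateReport {
    double cosine;    // cosine similarity between original and decoded
    double max_abs;   // largest per-element absolute error
    double mean_abs;  // average per-element absolute error
    double ratio;     // FP32 bytes divided by encoded bytes
    bool   pass;      // true when every gate holds
};

GateReport check_gates(const std::vector<float> &original,
                       const std::vector<float> &decoded,
                       size_t encoded_bytes) {
    assert(original.size() == decoded.size());
    const size_t n = original.size();
    double dot = 0.0, na = 0.0, nb = 0.0, max_abs = 0.0, sum_abs = 0.0;
    for (size_t i = 0; i < n; ++i) {
        const double a = original[i], b = decoded[i];
        dot += a * b;
        na  += a * a;
        nb  += b * b;
        const double err = std::fabs(a - b);
        max_abs = std::max(max_abs, err);
        sum_abs += err;
    }
    GateReport rep;
    // Convention chosen for this sketch: cosine is 0 when either vector is all zeros.
    rep.cosine   = (na > 0.0 && nb > 0.0) ? dot / std::sqrt(na * nb) : 0.0;
    rep.max_abs  = max_abs;
    rep.mean_abs = n > 0 ? sum_abs / static_cast<double>(n) : 0.0;
    rep.ratio    = encoded_bytes > 0
                       ? static_cast<double>(n * sizeof(float)) / static_cast<double>(encoded_bytes)
                       : 0.0;
    rep.pass = rep.cosine >= 0.95 && rep.max_abs <= 0.8 &&
               rep.mean_abs <= 0.2 && rep.ratio > 6.0;
    return rep;
}
```

For a 128-element FP32 vector (512 bytes) stored in 76 bytes, `ratio` comes out at 512 / 76, roughly 6.7, which matches the storage figure quoted in the commit message.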

Closes #66

2026-04-15 23:59:51 -04:00

1 commit