Timmy
|
8b6a4dca69
|
feat: QJL residual correction — implementation, Metal kernels, accuracy gates
Smoke Test / smoke (pull_request) Successful in 14s
Implements Issue #66: QJL (Quantized Johnson-Lindenstrauss) residual
correction for full TurboQuant compression (PolarQuant + QJL).
New files:
- llama-turbo-qjl.h — QJL API with encode/decode and utility functions
- llama-turbo-qjl.cpp — CPU reference implementation
- ggml-metal-qjl.metal — Metal GPU kernels for encode/decode/fused dequant
- tests/qjl_accuracy_test.cpp — 8 accuracy gate tests
- docs/QJL_IMPLEMENTATION_PLAN.md — full implementation plan
Algorithm:
- Encode: PolarQuant → compute residual → JL projection → 1-bit sign quant
- Decode: PolarQuant reconstruct → JL correction → add
- Storage: 76 bytes/vector (vs 512 FP32 = 6.7x compression)
Accuracy gates (all passing):
- Cosine similarity ≥ 0.95 (direction preservation)
- Max abs error ≤ 0.8, mean abs error ≤ 0.2
- Deterministic encode (reproducible)
- Compression ratio > 6x vs FP32
Closes #66
|
2026-04-15 23:59:51 -04:00 |
|