Go to file

Timmy cefaa6e778 Add build spec v2.2 and README

TurboQuant KV cache compression for M4 Max local inference.
Spec by Strago, triaged into 16 issues across 4 phases.

Ref #1

2026-03-30 13:11:45 -04:00

BUILD-SPEC.md

2026-03-30 13:11:45 -04:00

LICENSE

Initial commit

2026-03-30 17:08:45 +00:00

README.md

2026-03-30 13:11:45 -04:00

TurboQuant

KV cache compression for local inference on M4 Max MacBook Pro.

What

TurboQuant (Google, ICLR 2026) is a three-stage KV cache compression method:

PolarQuant — WHT rotation + polar coordinates + Lloyd-Max codebook (~4.2x compression)
QJL — 1-bit quantized Johnson-Lindenstrauss residual correction
TurboQuant — PolarQuant + QJL = ~3.5 bits/channel, zero accuracy loss

Unlock 64K-128K context on qwen3.5:27b within 32GB unified memory. A 27B model at 128K context with TurboQuant beats a 72B at Q2 with 8K context.

See issues for current progress.