turbo4 KV: 73% memory savings, -1.1% prompt speed, -11% gen speed. Metal shaders verified. PolarQuant checklist 5/6 PASS. 128K context on 36GB hardware is viable. Closes #4 #5 #6 #7 #8
TurboQuant KV cache compression for M4 Max local inference. Spec by Strago, triaged into 16 issues across 4 phases. Ref #1