Commit Graph

8 Commits

Author SHA1 Message Date
5f9f316f2c Add implementation plan 2026-03-30 21:06:51 +00:00
2bd7354eed Add ggml-metal-turbo.metal implementation 2026-03-30 21:06:50 +00:00
3705c332ac Add llama-turbo.h implementation 2026-03-30 21:06:49 +00:00
2bcd36f7c5 Add llama-turbo.cpp implementation 2026-03-30 21:06:49 +00:00
Timmy
10f720b500 Full KT report: Phase 1-3 complete
12/16 issues resolved. turbo4 validated. Ollama deferred (llama-server
is production path). Per-layer adaptive found built-in. QJL assessed,
not needed at current compression targets.

Ref #1
2026-03-30 17:05:23 -04:00
Timmy
441f4ee765 Phase 1 Report: PolarQuant MVP complete
turbo4 KV: 73% memory savings, -1.1% prompt speed, -11% gen speed.
Metal shaders verified. PolarQuant checklist 5/6 PASS.
128K context on 36GB hardware is viable.

Closes #4 #5 #6 #7 #8
2026-03-30 16:12:01 -04:00
Timmy
cefaa6e778 Add build spec v2.2 and README
TurboQuant KV cache compression for M4 Max local inference.
Spec by Strago, triaged into 16 issues across 4 phases.

Ref #1
2026-03-30 13:11:45 -04:00
0b62c72737 Initial commit 2026-03-30 17:08:45 +00:00