2026-03-06 - 2026-04-06
Overview
1 Release published by 1 user
GoldenRockachopa
3 Pull requests merged by 1 user
#18 PolarQuant Implementation & Phase 2 Integration Plan
#19 Benchmarking Suite: Objective Quality and Performance Testing
#20 feat: Sovereign Evolution Redistribution — turboquant
19 Issues closed from 1 user
#24 [P2-4] Run full quality comparison: turbo4 vs f16 on 10 test prompts
#1 TurboQuant — KV Cache Compression for Local Inference on M4 Max
#23 [P2-3] Fix Ollama install and build custom Ollama with TurboQuant fork
#25 [P2-5] Download qwen3.5:27b and benchmark turbo4 at 64K/128K context
#27 [TQ-2] Build TheTom/llama-cpp-turboquant for M3 Max Metal
#26 [P2-6] Production cutover: swap Timmy's llama-server to TurboQuant
#31 [TQ-1] Download Gemma 4 via Ollama on Mac
#30 [EPIC] TurboQuant + Gemma 4 Local Mac Deployment
#14 [P3] QJL residual correction — Metal port
#13 [P2.5] Per-layer quantization profiles
#10 [P2] Custom Ollama build + MacBook deployment
#9 [P2-S0] Ollama CGo API compatibility check
#6 [P1-S2] Baseline benchmarks — FP16 KV cache (no TurboQuant)
#8 [P1-S2] Peak memory profiling at each context length
#7 [P1-S2] PolarQuant benchmarks — turbo4 KV cache + asymmetric test
#5 [P1-S1] PolarQuant verification checklist
#4 [P1-S1] Build llama.cpp fork with Metal backend on M4 Max
#3 [P1-S0] Fork assessment — age, conflicts, build path estimate
#2 [P1-GATE] Metal kernel check — determines llama.cpp vs MLX path
29 Issues created by 2 users
#1 TurboQuant — KV Cache Compression for Local Inference on M4 Max
#2 [P1-GATE] Metal kernel check — determines llama.cpp vs MLX path
#3 [P1-S0] Fork assessment — age, conflicts, build path estimate
#4 [P1-S1] Build llama.cpp fork with Metal backend on M4 Max
#5 [P1-S1] PolarQuant verification checklist
#6 [P1-S2] Baseline benchmarks — FP16 KV cache (no TurboQuant)
#7 [P1-S2] PolarQuant benchmarks — turbo4 KV cache + asymmetric test
#8 [P1-S2] Peak memory profiling at each context length
#9 [P2-S0] Ollama CGo API compatibility check
#10 [P2] Custom Ollama build + MacBook deployment
#11 [P2] Full test matrix — 10 prompts + quality + performance
#12 [P2] Long-session quality test — 50-turn conversation
#13 [P2.5] Per-layer quantization profiles
#14 [P3] QJL residual correction — Metal port
#15 [P4] Upstream llama.cpp / Ollama TurboQuant watch
#16 [P1-PREP] Write 10 predefined test prompts
#17 TurboQuant Initiative Review & Contributor Feedback
#21 [P2-1] Download wikitext-2-raw and run perplexity quality gate
#22 [P2-2] Write 10 test prompts for quality comparison
#25 [P2-5] Download qwen3.5:27b and benchmark turbo4 at 64K/128K context
#24 [P2-4] Run full quality comparison: turbo4 vs f16 on 10 test prompts
#23 [P2-3] Fix Ollama install and build custom Ollama with TurboQuant fork
#26 [P2-6] Production cutover: swap Timmy's llama-server to TurboQuant
#27 [TQ-2] Build TheTom/llama-cpp-turboquant for M3 Max Metal
#28 [TQ-4] Create Hermes profile for local Gemma 4 + TurboQuant
#29 [TQ-5] Benchmark: latency, memory, quality comparison
#30 [EPIC] TurboQuant + Gemma 4 Local Mac Deployment
#31 [TQ-1] Download Gemma 4 via Ollama on Mac
#32 [TQ-3] Perplexity quality gate: turbo4 vs f16
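Several of the issues above (#6, #7, #21, #32) revolve around a perplexity quality gate comparing the turbo4 KV cache against the f16 baseline. As a rough illustration of what such a gate computes (not the project's actual implementation: the 1% threshold and the per-token log-probability inputs are assumptions for the sketch), the comparison reduces to:

```python
import math

def perplexity(logprobs):
    # Perplexity = exp of the negative mean per-token log-probability.
    return math.exp(-sum(logprobs) / len(logprobs))

def quality_gate(f16_logprobs, turbo4_logprobs, max_rel_increase=0.01):
    # Pass if the quantized-cache run's perplexity is within
    # max_rel_increase (here 1%, an assumed threshold) of the
    # f16 baseline on the same evaluation text.
    ppl_f16 = perplexity(f16_logprobs)
    ppl_turbo4 = perplexity(turbo4_logprobs)
    passed = ppl_turbo4 <= ppl_f16 * (1 + max_rel_increase)
    return passed, ppl_f16, ppl_turbo4
```

In practice the two log-probability streams would come from running the same wikitext-2-raw text through the model twice, once with the f16 KV cache and once with turbo4; the gate then blocks the cutover (#26) if the quantized run regresses past the threshold.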