docs(#17 ): Initiative review and contributor feedback analysis

Reviews Phase 1 status, analyzes feedback from @manus, @Timmy,
@Rockachopa. Identifies 5 blockers and priority order.
Refs #17

2026-04-15 02:14:57 +00:00

TurboQuant Initiative Review — Issue #17

Date: 2026-04-14
Reviewer: Timmy (burn worker)
Issue: #17 (TurboQuant Initiative Review & Contributor Feedback)

Current State

What's Done (Phase 1 — Complete)

PolarQuant MVP: WHT rotation + Lloyd-Max codebook, 4-bit KV cache
Metal shaders: Full flash attention for turbo2/3/4, WHT kernels, codebooks
CPU reference implementation: llama-turbo.h / llama-turbo.cpp
Benchmarks: 73% KV memory savings, 1% prompt overhead, 11% generation overhead
Fork builds clean: cmake + make, all binaries functional
Build spec v2.2 (Strago) aligned with implementation
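
For readers new to the PolarQuant pipeline, the "WHT rotation + codebook" step can be sketched in a few lines. This is a minimal illustration only, not the code in llama-turbo.cpp: the function names are hypothetical, the codebook is an evenly spaced placeholder rather than trained Lloyd-Max values, and block sizes and scaling in the real implementation may differ.

```python
import math

def fwht(values):
    """Orthonormal fast Walsh-Hadamard transform of a power-of-two-length list."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b  # butterfly step
        h *= 2
    scale = 1.0 / math.sqrt(n)  # makes the transform its own inverse
    return [v * scale for v in out]

def quantize(values, codebook):
    """Map each value to the index of its nearest codebook entry."""
    return [min(range(len(codebook)), key=lambda j: abs(v - codebook[j]))
            for v in values]

def dequantize(indices, codebook):
    """Look each index back up in the codebook."""
    return [codebook[j] for j in indices]

# Hypothetical 16-entry (4-bit) codebook: evenly spaced placeholder values,
# NOT the Lloyd-Max centroids used by the project.
CODEBOOK = [-1.5 + 0.2 * j for j in range(16)]

x = [0.9, -0.1, 0.3, 0.0, -0.7, 0.2, 0.1, -0.4]
rotated = fwht(x)                              # rotate
codes = quantize(rotated, CODEBOOK)            # 4-bit indices
restored = fwht(dequantize(codes, CODEBOOK))   # dequantize, rotate back
```

The rotation spreads energy across coordinates so that a single scalar codebook fits every coordinate reasonably well; a trained Lloyd-Max codebook would replace the placeholder spacing with centroids fitted to the rotated distribution.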

What's Not Done (Phase 2 — In Progress)

Integration into main llama.cpp fork (PR not submitted)
QJL residual correction (1-bit Johnson-Lindenstrauss)
Unit tests for encode/decode (#54, #59, #60)
Standalone build system (#51)
CI smoke workflow (#48, #50)
Security: bounds checking in Metal shader (#55, #57)
Ollama integration (the hard part — submodule fork + CGo bindings)

Feedback Analysis

From @manus: "More frequent updates on PolarQuant"

Status: Partially addressed. PROJECT_STATUS.md exists but is out of date (last updated 2026-03-30); it has had no updates since Phase 1 completion.

Action: Create a living status tracker updated on each milestone.

From @Timmy: "Build spec stays aligned with Metal shader benchmarks"

Status: Aligned. Build spec v2.2 matches benchmark results. Hardware note corrected (M3 Max 36GB, not M4 Max 32GB).

Action: Document the alignment explicitly and add a benchmark-to-spec mapping table.

From @Rockachopa: "Oversight on QJL residual correction accuracy"

Status: Not started. QJL is the second stage of TurboQuant (PolarQuant → QJL → TurboQuant). Without QJL, we have PolarQuant only (~4.2x compression), not full TurboQuant (~3.5 bits/channel).
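
The two figures above are in different units (a compression ratio and a bit rate). A small conversion helper makes them comparable. This sketch assumes a 16-bit floating-point baseline for the uncompressed KV cache, which this review does not state, and it ignores per-block metadata such as scales or norms.

```python
# Assumption: the uncompressed KV cache stores 16 bits per channel.
BASELINE_BITS = 16.0

def bits_from_ratio(ratio, baseline_bits=BASELINE_BITS):
    """Effective bits per channel for a given compression ratio."""
    return baseline_bits / ratio

def ratio_from_bits(bits, baseline_bits=BASELINE_BITS):
    """Compression ratio for a given number of bits per channel."""
    return baseline_bits / bits

def savings_from_ratio(ratio):
    """Fraction of memory saved at a given compression ratio."""
    return 1.0 - 1.0 / ratio

print(round(bits_from_ratio(4.2), 2))     # 3.81
print(round(ratio_from_bits(3.5), 2))     # 4.57
print(round(savings_from_ratio(4.2), 3))  # 0.762
```

These conversions are only as good as the baseline assumption; a mapping table in the build spec should state the baseline explicitly.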

Action: File issue for QJL implementation with accuracy gates.
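
One possible shape for an accuracy gate is sketched below. It uses the generic 1-bit Johnson-Lindenstrauss sign estimator for inner products, not the project's planned implementation: every name, the Gaussian projection matrix, and the relative-tolerance criterion are assumptions for illustration. In TurboQuant the sketch would presumably be applied to the residual left by PolarQuant; that wiring is not shown.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def qjl_encode(key, projections):
    """Keep one sign bit per random projection, plus the key's norm."""
    signs = [1 if dot(row, key) >= 0 else -1 for row in projections]
    return signs, math.sqrt(dot(key, key))

def qjl_inner_product(query, signs, norm, projections):
    """Estimate <query, key> from the 1-bit sketch of the key.

    For Gaussian rows s, E[sign(s.k) * (s.q)] = sqrt(2/pi) * <q, k> / ||k||,
    so rescaling by sqrt(pi/2) * ||k|| gives an unbiased estimate.
    """
    acc = sum(sign * dot(row, query) for sign, row in zip(signs, projections))
    return math.sqrt(math.pi / 2.0) * norm * acc / len(projections)

def passes_accuracy_gate(query, key, projections, rel_tolerance):
    """Hypothetical gate: error must stay within rel_tolerance * ||q|| * ||k||."""
    signs, norm = qjl_encode(key, projections)
    estimate = qjl_inner_product(query, signs, norm, projections)
    budget = rel_tolerance * math.sqrt(dot(query, query)) * norm
    return abs(estimate - dot(query, key)) <= budget

rng = random.Random(0)  # fixed seed so the check is repeatable
dim, num_projections = 16, 4096
projections = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
               for _ in range(num_projections)]
query = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
key = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
gate_ok = passes_accuracy_gate(query, key, projections, rel_tolerance=0.1)
```

A real gate would more likely be defined on end-to-end metrics (attention output error or perplexity) across many vectors; the single-pair check here only shows where a threshold would plug in.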

Blockers Identified

No integration PR to llama.cpp — The Metal shaders exist but aren't upstreamed or even in a PR branch of the main fork
No unit tests — encode/decode correctness unverified beyond manual spot checks
No CI — No automated build or quality checks
Security gap — Metal shader lacks bounds checking (#57)
Stale README — Points to old Gitea IP (143.198.27.163:3000), not the Forge URL
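
Blockers 2 and 4 can be closed with small, mechanical checks. The sketch below shows a host-side analogue: a round-trip test for 4-bit packing plus an explicit bounds check before reading. It assumes two codes per byte with the low nibble first, which this review does not specify, and the function names are hypothetical; the actual fix for #57 belongs in the Metal shader itself.

```python
def pack_nibbles(codes):
    """Pack 4-bit codes two per byte (low nibble first), rejecting bad input."""
    if len(codes) % 2:
        raise ValueError("expected an even number of codes")
    for c in codes:
        if not 0 <= c <= 15:
            raise ValueError(f"code {c} does not fit in 4 bits")
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))

def unpack_nibbles(packed, count):
    """Unpack `count` codes, checking the buffer is large enough first."""
    if count < 0 or count > 2 * len(packed):
        raise IndexError("requested more codes than the buffer holds")
    return [(packed[i // 2] >> (4 * (i % 2))) & 0x0F for i in range(count)]

def test_round_trip():
    codes = [0, 15, 7, 8, 3, 12]
    assert unpack_nibbles(pack_nibbles(codes), len(codes)) == codes

def test_out_of_bounds_read_is_rejected():
    packed = pack_nibbles([1, 2])
    try:
        unpack_nibbles(packed, 3)  # buffer holds only 2 codes
    except IndexError:
        return
    raise AssertionError("out-of-bounds read was not rejected")

test_round_trip()
test_out_of_bounds_read_is_rejected()
```

The same pattern (validate index against buffer length, then read) is what the shader-side check would need to enforce per thread.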

Recommendation

The initiative has solid Phase 1 results. The gap is integration engineering — getting from "works on my machine" to "production-ready in llama.cpp."

Priority order:

Security fix (#57 — bounds checking)
Unit tests (#54 — encode/decode)
Integration PR to llama.cpp fork
CI pipeline (#48, #50)
QJL implementation
Ollama integration
