diff --git a/docs/REVIEW_ISSUE_17.md b/docs/REVIEW_ISSUE_17.md
new file mode 100644
index 00000000..0cc76a97
--- /dev/null
+++ b/docs/REVIEW_ISSUE_17.md
@@ -0,0 +1,72 @@
+# TurboQuant Initiative Review — Issue #17
+
+**Date:** 2026-04-14
+**Reviewer:** Timmy (burn worker)
+**Issue:** #17 — TurboQuant Initiative Review & Contributor Feedback
+
+---
+
+## Current State
+
+### What's Done (Phase 1 — Complete)
+- PolarQuant MVP: WHT rotation + Lloyd-Max codebook, 4-bit KV cache
+- Metal shaders: Full flash attention for turbo2/3/4, WHT kernels, codebooks
+- CPU reference implementation: `llama-turbo.h` / `llama-turbo.cpp`
+- Benchmarks: 73% KV memory savings, 1% prompt overhead, 11% generation overhead
+- Fork builds clean: cmake + make, all binaries functional
+- Build spec v2.2 (Strago) aligned with implementation
+
+### What's Not Done (Phase 2 — In Progress)
+- Integration into main llama.cpp fork (PR not submitted)
+- QJL residual correction (1-bit Johnson-Lindenstrauss)
+- Unit tests for encode/decode (#54, #59, #60)
+- Standalone build system (#51)
+- CI smoke workflow (#48, #50)
+- Security: bounds checking in Metal shader (#55, #57)
+- Ollama integration (the hard part — submodule fork + CGo bindings)
+
+---
+
+## Feedback Analysis
+
+### From @manus: "More frequent updates on PolarQuant"
+
+**Status:** Partially addressed. PROJECT_STATUS.md exists but is dated (2026-03-30). No updates since Phase 1 completion.
+
+**Action:** Create a living status tracker updated on each milestone.
+
+### From @Timmy: "Build spec stays aligned with Metal shader benchmarks"
+
+**Status:** Aligned. Build spec v2.2 matches benchmark results. Hardware note corrected (M3 Max 36GB, not M4 Max 32GB).
+
+**Action:** Document the alignment explicitly. Add benchmark-to-spec mapping table.
+
+### From @Rockachopa: "Oversight on QJL residual correction accuracy"
+
+**Status:** Not started. QJL is the second stage of TurboQuant (PolarQuant → QJL → TurboQuant). Without QJL, we have PolarQuant only (~4.2x compression), not full TurboQuant (~3.5 bits/channel).
+
+**Action:** File issue for QJL implementation with accuracy gates.
+
+---
+
+## Blockers Identified
+
+1. **No integration PR to llama.cpp** — The Metal shaders exist but aren't upstreamed or even in a PR branch of the main fork
+2. **No unit tests** — encode/decode correctness unverified beyond manual spot checks
+3. **No CI** — No automated build or quality checks
+4. **Security gap** — Metal shader lacks bounds checking (#57)
+5. **Stale README** — Points to old Gitea IP (143.198.27.163:3000), not the Forge URL
+
+---
+
+## Recommendation
+
+The initiative has solid Phase 1 results. The gap is **integration engineering** — getting from "works on my machine" to "production-ready in llama.cpp."
+
+Priority order:
+1. Security fix (#57 — bounds checking)
+2. Unit tests (#54 — encode/decode)
+3. Integration PR to llama.cpp fork
+4. CI pipeline (#48, #50)
+5. QJL implementation
+6. Ollama integration
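
Reviewer note on the unit-test blocker (#54): the sketch below shows roughly what an encode/decode round-trip check could look like. It is an illustration under stated assumptions, not the project's code. The names `wht`, `encode`, `decode`, `relative_error`, and `test_roundtrip` are hypothetical and do not correspond to anything in `llama-turbo.h`. A uniform 16-level quantizer stands in for the Lloyd-Max codebook, and the 0.25 error threshold is an arbitrary placeholder rather than a measured accuracy gate.

```python
import math
import random

LEVELS = 16  # 4 bits per value


def wht(x):
    """Orthonormal fast Walsh-Hadamard transform (length must be a power of two)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


def encode(x):
    """Rotate with the WHT, then quantize each value to a 4-bit code.

    Assumption: a uniform quantizer replaces the Lloyd-Max codebook here.
    """
    rotated = wht(x)
    peak = max(abs(v) for v in rotated) or 1.0
    step = 2.0 * peak / (LEVELS - 1)
    # Clamp so every code is a valid index in [0, LEVELS - 1].
    codes = [min(LEVELS - 1, max(0, round((v + peak) / step))) for v in rotated]
    return codes, peak


def decode(codes, peak):
    """Dequantize, then undo the rotation (the orthonormal WHT is its own inverse)."""
    step = 2.0 * peak / (LEVELS - 1)
    rotated = [c * step - peak for c in codes]
    return wht(rotated)


def relative_error(x, y):
    """L2 norm of (x - y) divided by the L2 norm of x."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    den = math.sqrt(sum(a * a for a in x)) or 1.0
    return num / den


def test_roundtrip():
    random.seed(0)
    x = [random.gauss(0.0, 1.0) for _ in range(128)]
    codes, peak = encode(x)
    assert all(0 <= c < LEVELS for c in codes)
    # Placeholder threshold; a real accuracy gate would come from the build spec.
    assert relative_error(x, decode(codes, peak)) < 0.25
```

A test of this shape runs under `pytest` without extra setup. The same pattern (fixed seed, range check on the codes, bounded reconstruction error) could be ported to the C++ reference implementation once its actual entry points are confirmed.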