# Compare commits: burn/17-17 ... fix/67-rea

1 commit (c8d1a9c730) · 1 changed file: README.md (12 changed lines)
```diff
@@ -13,14 +13,14 @@ Unlock 64K-128K context on qwen3.5:27b within 32GB unified memory.
 
 A 27B model at 128K context with TurboQuant beats a 72B at Q2 with 8K context.
 
 ## Status
 
-See [issues](http://143.198.27.163:3000/Timmy_Foundation/turboquant/issues) for current progress.
+See [issues](https://forge.alexanderwhitestone.com/Timmy_Foundation/turboquant/issues) for current progress.
 
 ## Roles
 
-- **Strago:** Build spec author
-- **Cid:** Implementation, benchmarks, deployment
-- **Locke:** Research support, upstream watch
-- **John:** Quality review
-- **Frankie:** Coordination
+- **Build Spec:** Architecture and specification
+- **Implementation:** Code, benchmarks, deployment
+- **Research:** Upstream tracking, literature review
+- **Quality:** Testing and review
+- **Coordination:** Project management
 
 ## Source Repos
 
 - [TheTom/llama-cpp-turboquant](https://github.com/TheTom/llama-cpp-turboquant) — llama.cpp fork with Metal
```
@@ -1,72 +0,0 @@

# TurboQuant Initiative Review — Issue #17

**Date:** 2026-04-14
**Reviewer:** Timmy (burn worker)
**Issue:** #17 — TurboQuant Initiative Review & Contributor Feedback

---
## Current State

### What's Done (Phase 1 — Complete)

- PolarQuant MVP: WHT rotation + Lloyd-Max codebook, 4-bit KV cache
- Metal shaders: full flash attention for turbo2/3/4, WHT kernels, codebooks
- CPU reference implementation: `llama-turbo.h` / `llama-turbo.cpp`
- Benchmarks: 73% KV memory savings, 1% prompt overhead, 11% generation overhead
- Fork builds clean: cmake + make, all binaries functional
- Build spec v2.2 (Strago) aligned with implementation
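The PolarQuant item above pairs a Walsh-Hadamard rotation with a low-bit KV cache. As a rough illustration of that pipeline — not the fork's actual code; `wht` and `quantize4` are stand-in names, and the uniform 4-bit quantizer here is a simplification of the MVP's Lloyd-Max codebook — a round trip might look like:

```python
import numpy as np

def wht(x):
    """Orthonormal fast Walsh-Hadamard transform over the last axis.
    Length must be a power of two; the transform is its own inverse."""
    n = x.shape[-1]
    y = x.astype(np.float64).reshape(-1, n).copy()
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = y[:, i:i + h].copy()
            b = y[:, i + h:i + 2 * h].copy()
            y[:, i:i + h] = a + b       # butterfly: sum
            y[:, i + h:i + 2 * h] = a - b  # butterfly: difference
        h *= 2
    return (y / np.sqrt(n)).reshape(x.shape)

def quantize4(v):
    """Signed 4-bit quantization with one shared scale (uniform stand-in
    for the Lloyd-Max codebook in the real implementation)."""
    scale = np.abs(v).max() / 7.0 + 1e-12
    q = np.clip(np.round(v / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize4(q, scale):
    return q.astype(np.float64) * scale

# 4-bit storage vs fp16 is a 75% raw saving; the scales stored alongside
# the codes eat a little of that, in line with the ~73% figure above.
```

Because the orthonormal WHT is self-inverse, decode is just `wht(dequantize4(*quantize4(wht(k))))`; the rotation spreads outlier channels across all dimensions so a shared scale clips less.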
### What's Not Done (Phase 2 — In Progress)

- Integration into main llama.cpp fork (PR not submitted)
- QJL residual correction (1-bit Johnson-Lindenstrauss)
- Unit tests for encode/decode (#54, #59, #60)
- Standalone build system (#51)
- CI smoke workflow (#48, #50)
- Security: bounds checking in Metal shader (#55, #57)
- Ollama integration (the hard part — submodule fork + CGo bindings)
---
## Feedback Analysis

### From @manus: "More frequent updates on PolarQuant"

**Status:** Partially addressed. PROJECT_STATUS.md exists but is dated (2026-03-30), with no updates since Phase 1 completion.

**Action:** Create a living status tracker, updated at each milestone.

### From @Timmy: "Build spec stays aligned with Metal shader benchmarks"

**Status:** Aligned. Build spec v2.2 matches the benchmark results. Hardware note corrected (M3 Max 36GB, not M4 Max 32GB).

**Action:** Document the alignment explicitly; add a benchmark-to-spec mapping table.

### From @Rockachopa: "Oversight on QJL residual correction accuracy"

**Status:** Not started. QJL is the second stage of TurboQuant (PolarQuant → QJL → TurboQuant). Without QJL we have PolarQuant only (~4.2x compression), not full TurboQuant (~3.5 bits/channel).

**Action:** File an issue for QJL implementation with accuracy gates.
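For the QJL stage flagged above, the 1-bit Johnson-Lindenstrauss idea is to sketch the quantization residual through a random Gaussian projection and keep only the sign bits plus the residual's norm. A minimal sketch of the estimator (illustrative only — `qjl_encode`/`qjl_decode` and the sketch matrix `S` are hypothetical names, not the planned API):

```python
import numpy as np

def qjl_encode(r, S):
    """1-bit JL encode: keep only the signs of the m random projections
    of residual r (1 bit each), plus r's norm."""
    return np.sign(S @ r), np.linalg.norm(r)

def qjl_decode(bits, norm, S):
    """Direction estimate from sign bits. For s ~ N(0, I),
    E[sign(s @ r) * s] = sqrt(2/pi) * r / ||r||, so averaging the
    sign-weighted rows of S and rescaling recovers r in expectation."""
    m = S.shape[0]
    return norm * np.sqrt(np.pi / 2) / m * (S.T @ bits)
```

Accuracy grows with the number of projections `m`; the accuracy gate proposed above could, for example, assert a minimum cosine similarity between a residual and its decoded estimate at the target bit budget.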
---
## Blockers Identified

1. **No integration PR to llama.cpp** — the Metal shaders exist but are not upstreamed, nor even on a PR branch of the main fork
2. **No unit tests** — encode/decode correctness is unverified beyond manual spot checks
3. **No CI** — no automated build or quality checks
4. **Security gap** — the Metal shader lacks bounds checking (#57)
5. **Stale README** — points to the old Gitea IP (143.198.27.163:3000), not the Forge URL
---
## Recommendation

The initiative has solid Phase 1 results. The gap is **integration engineering** — getting from "works on my machine" to "production-ready in llama.cpp."

Priority order:

1. Security fix (#57 — bounds checking)
2. Unit tests (#54 — encode/decode)
3. Integration PR to llama.cpp fork
4. CI pipeline (#48, #50)
5. QJL implementation
6. Ollama integration