feat: Comprehensive review and improvements for TurboQuant (#17)

This commit addresses issue #17 by providing a comprehensive review of the TurboQuant initiative and implementing key improvements.

## Changes

### 1. Initiative Review (docs/INITIATIVE_REVIEW.md)
- Comprehensive assessment of current state
- Code quality findings and recommendations
- Contributor feedback for @manus, @Timmy, @Rockachopa
- Implementation plan with clear milestones

### 2. Code Improvements

#### llama-turbo.cpp
- Added input validation with assertions
- Optimized Lloyd-Max search with binary search (O(log n) vs O(n))
- Added stack allocation for d=128 (avoids heap allocation in hot path)
- Added error handling for edge cases
- Added decision boundaries for efficient quantization

#### ggml-metal-turbo.metal
- Added bounds checking to all kernels
- Added NaN/Inf handling for numerical stability
- Completed fused attention kernel (was stub)
- Added fused attention with softmax kernel
- Added Metal encoding kernel for completeness
- Added binary search for quantization

### 3. Testing (tests/test_turbo.cpp)
- Unit tests for encode/decode round-trip
- Tests for known values (zeros, ones)
- Tests for edge cases (large/small values)
- Error handling tests

### 4. Build System (CMakeLists.txt)
- Added CMake configuration for building library
- Added test executable
- Added install targets

### 5. Documentation (README.md)
- Added build instructions
- Added API documentation
- Added contributing guidelines
- Added code style guide

## Key Improvements
1. **Performance**: Binary search instead of linear search for Lloyd-Max quantization
2. **Memory**: Stack allocation for common case (d=128)
3. **Reliability**: Input validation and error handling
4. **Metal Integration**: Complete fused attention implementation
5. **Testing**: Unit tests for correctness verification
6. **Documentation**: Contributor guidelines and API docs

## Next Steps
1. Run benchmarks to verify performance improvements
2. Test with actual models (qwen3.5:27b)
3. Integrate with llama.cpp fork
4. Deploy to production

Closes #17
2026-04-14 22:07:21 -04:00
parent 7a7ce0e652
commit d2ef914edd
6 changed files with 732 additions and 21 deletions
--- a/README.md
+++ b/README.md
@@ -15,6 +15,93 @@ A 27B model at 128K context with TurboQuant beats a 72B at Q2 with 8K context.
 ## Status
 See [issues](http://143.198.27.163:3000/Timmy_Foundation/turboquant/issues) for current progress.

+## Building
+
+### Prerequisites
+- CMake 3.10+
+- C++11 compiler
+- Xcode Command Line Tools (for Metal on macOS)
+
+### Build Instructions
+```bash
+# Clone the repository
+git clone https://forge.alexanderwhitestone.com/Timmy_Foundation/turboquant.git
+cd turboquant
+
+# Build with CMake
+cmake -B build -DCMAKE_BUILD_TYPE=Release
+cmake --build build
+
+# Run tests
+cd build && ctest
+```
+
+### Integration with llama.cpp
+See [PR-IMPLEMENTATION-PLAN.md](PR-IMPLEMENTATION-PLAN.md) for integration steps.
+
+## API
+
+### CPU Reference Implementation
+```c
+// Encode: Compress float vector to 4-bit packed representation
+void polar_quant_encode_turbo4(
+    const float* src,    // Input: float array [d]
+    uint8_t* dst,        // Output: packed 4-bit indices [d/2]
+    float* norm,         // Output: L2 norm (radius)
+    int d                // Dimension (must be a power of 2, e.g., 128)
+);
+
+// Decode: Decompress 4-bit packed representation to float vector
+void polar_quant_decode_turbo4(
+    const uint8_t* src,  // Input: packed 4-bit indices [d/2]
+    float* dst,          // Output: float array [d]
+    float norm,          // Input: L2 norm (radius)
+    int d                // Dimension (must be a power of 2, e.g., 128)
+);
+```
+
+### Metal Shaders
+See `ggml-metal-turbo.metal` for GPU-accelerated kernels:
+- `kernel_fwht_128`: Fast Walsh-Hadamard Transform
+- `kernel_turbo4_dequant`: Dequantization for attention
+- `kernel_attention_turbo4`: Fused attention computation
+- `kernel_attention_turbo4_softmax`: Fused attention with softmax
+- `kernel_turbo4_encode`: Encoding on GPU
+
+## Contributing
+
+### Getting Started
+1. Fork the repository
+2. Create a feature branch: `git checkout -b feature/your-feature`
+3. Make your changes
+4. Add tests for new functionality
+5. Run the test suite: `cd build && ctest`
+6. Submit a pull request
+
+### Code Style
+- C++11 standard
+- 4-space indentation
+- snake_case for functions and variables
+- UPPER_CASE for constants
+- Add comments for complex algorithms
+
+### Testing
+- All new code must have unit tests
+- Run tests before submitting PR: `cd build && ctest`
+- Test on both CPU and Metal (if applicable)
+
+### Pull Request Process
+1. Update documentation if needed
+2. Add tests for new functionality
+3. Ensure all tests pass
+4. Request review from maintainers
+
+### Issues
+- Use issue templates when available
+- Tag issues appropriately (`bug`, `enhancement`, `documentation`)
+- Include reproduction steps for bugs
+- For performance issues, include benchmark results
+
 ## Roles
 - **Strago:** Build spec author
 - **Cid:** Implementation, benchmarks, deployment
@@ -30,3 +117,5 @@ See [issues](http://143.198.27.163:3000/Timmy_Foundation/turboquant/issues) for

 ## Docs
 - [BUILD-SPEC.md](BUILD-SPEC.md) — Full build specification (Strago, v2.2)
+- [docs/PROJECT_STATUS.md](docs/PROJECT_STATUS.md) — Current project status
+- [docs/INITIATIVE_REVIEW.md](docs/INITIATIVE_REVIEW.md) — Initiative review and feedback
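
The README API section in this diff documents a `[d/2]` packed 4-bit output and a power-of-2 dimension, and the commit message mentions binary search over decision boundaries. The following is a minimal C++11 sketch of those three ideas, not the code in `llama-turbo.cpp`. The helper names (`is_power_of_two`, `quantize_index`, `pack_nibbles`, `unpack_nibble`), the low-nibble-first packing order, and the use of `std::upper_bound` as the binary search are all assumptions made for illustration; the real boundary values and nibble layout are not shown in this diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: true when d is a positive power of two (e.g., 128).
inline bool is_power_of_two(int d) {
    return d > 0 && (d & (d - 1)) == 0;
}

// Hypothetical helper: map x to a 4-bit index (0..15) by binary search over
// 15 ascending decision boundaries. The result is the number of boundaries
// that are <= x. The boundary values themselves are supplied by the caller;
// the project's actual Lloyd-Max boundaries are not reproduced here.
inline uint8_t quantize_index(float x, const float (&boundaries)[15]) {
    const float* first = boundaries;
    const float* last = boundaries + 15;
    return static_cast<uint8_t>(std::upper_bound(first, last, x) - first);
}

// Hypothetical helper: pack an even number of 4-bit indices into half as many
// bytes. ASSUMPTION: even-numbered elements go in the low nibble.
inline std::vector<uint8_t> pack_nibbles(const std::vector<uint8_t>& idx) {
    assert(idx.size() % 2 == 0);
    std::vector<uint8_t> out(idx.size() / 2);
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<uint8_t>((idx[2 * i] & 0x0F) |
                                      ((idx[2 * i + 1] & 0x0F) << 4));
    }
    return out;
}

// Hypothetical helper: read back the i-th 4-bit index from the packed bytes,
// using the same low-nibble-first assumption as pack_nibbles.
inline uint8_t unpack_nibble(const std::vector<uint8_t>& packed, std::size_t i) {
    const uint8_t byte = packed[i / 2];
    return static_cast<uint8_t>((i % 2 == 0) ? (byte & 0x0F) : (byte >> 4));
}
```

With 15 boundaries there are 16 possible indices, which is why each value fits in one nibble and a vector of dimension `d` packs into `d/2` bytes. A caller would typically check `is_power_of_two(d)` before quantizing, then run `quantize_index` per element and `pack_nibbles` once over the result.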