All checks were successful
Smoke Test / smoke (pull_request) Successful in 24s
This commit addresses issue #17 by providing a comprehensive review of the TurboQuant initiative and implementing key improvements.

## Changes

### 1. Initiative Review (docs/INITIATIVE_REVIEW.md)
- Comprehensive assessment of current state
- Code quality findings and recommendations
- Contributor feedback for @manus, @Timmy, @Rockachopa
- Implementation plan with clear milestones

### 2. Code Improvements

#### llama-turbo.cpp
- Added input validation with assertions
- Optimized Lloyd-Max search with binary search (O(log n) vs O(n))
- Added stack allocation for d=128 (avoids heap allocation in hot path)
- Added error handling for edge cases
- Added decision boundaries for efficient quantization

#### ggml-metal-turbo.metal
- Added bounds checking to all kernels
- Added NaN/Inf handling for numerical stability
- Completed fused attention kernel (was stub)
- Added fused attention with softmax kernel
- Added Metal encoding kernel for completeness
- Added binary search for quantization

### 3. Testing (tests/test_turbo.cpp)
- Unit tests for encode/decode round-trip
- Tests for known values (zeros, ones)
- Tests for edge cases (large/small values)
- Error handling tests

### 4. Build System (CMakeLists.txt)
- Added CMake configuration for building library
- Added test executable
- Added install targets

### 5. Documentation (README.md)
- Added build instructions
- Added API documentation
- Added contributing guidelines
- Added code style guide

## Key Improvements

1. **Performance**: Binary search instead of linear search for Lloyd-Max quantization
2. **Memory**: Stack allocation for common case (d=128)
3. **Reliability**: Input validation and error handling
4. **Metal Integration**: Complete fused attention implementation
5. **Testing**: Unit tests for correctness verification
6. **Documentation**: Contributor guidelines and API docs

## Next Steps

1. Run benchmarks to verify performance improvements
2. Test with actual models (qwen3.5:27b)
3. Integrate with llama.cpp fork
4. Deploy to production

Closes #17
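
To illustrate the "binary search over decision boundaries" change described above, here is a minimal sketch. It is not taken from llama-turbo.cpp: the 4-level codebook, the boundary values, and the function names `quantize_linear`, `quantize_binary`, and `dequantize` are all hypothetical and chosen only for illustration. The idea is that, for a sorted scalar codebook, the nearest centroid can be found by locating the input among the midpoints between adjacent centroids, which takes O(log n) comparisons instead of an O(n) scan.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical 4-level codebook (illustrative values, not TurboQuant's).
static const float kCentroids[4] = {-1.5f, -0.5f, 0.5f, 1.5f};

// Decision boundaries: midpoints between adjacent centroids.
// An input belongs to level i when it lies between boundary i-1 and boundary i.
static const float kBoundaries[3] = {-1.0f, 0.0f, 1.0f};

// Reference implementation: O(n) scan for the nearest centroid.
inline std::uint8_t quantize_linear(float x) {
    std::uint8_t best = 0;
    float best_dist = std::fabs(x - kCentroids[0]);
    for (std::uint8_t i = 1; i < 4; ++i) {
        const float dist = std::fabs(x - kCentroids[i]);
        if (dist < best_dist) {  // strict '<' keeps the lower index on ties
            best_dist = dist;
            best = i;
        }
    }
    return best;
}

// O(log n) variant: the level index equals the number of boundaries
// strictly below x, which std::lower_bound finds by binary search.
inline std::uint8_t quantize_binary(float x) {
    const float* it = std::lower_bound(kBoundaries, kBoundaries + 3, x);
    return static_cast<std::uint8_t>(it - kBoundaries);
}

// Decoding is a table lookup; the assertion guards against invalid indices.
inline float dequantize(std::uint8_t idx) {
    assert(idx < 4);
    return kCentroids[idx];
}
```

With only four levels the difference is negligible; the logarithmic lookup matters more as the codebook grows, and the precomputed boundaries avoid recomputing distances per element.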
31 lines
661 B
CMake
cmake_minimum_required(VERSION 3.10)
project(turboquant)

set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Source files
set(SOURCES
    llama-turbo.cpp
)

# Header files
set(HEADERS
    llama-turbo.h
)

# Create library
add_library(turboquant STATIC ${SOURCES} ${HEADERS})
target_include_directories(turboquant PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})

# Test executable
add_executable(test_turbo tests/test_turbo.cpp)
target_link_libraries(test_turbo turboquant)

# Install
install(TARGETS turboquant ARCHIVE DESTINATION lib)
install(FILES ${HEADERS} DESTINATION include)

# Tests
enable_testing()
add_test(NAME turboquant_tests COMMAND test_turbo)
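
The `add_test` entry above runs the `test_turbo` executable, and CTest by default treats a nonzero exit code as a failed test. The contents of tests/test_turbo.cpp are not shown here, so the following is only a sketch of what an encode/decode round-trip check could look like. `toy_encode`, `toy_decode`, and the step size are hypothetical stand-ins, not the TurboQuant API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>

// Hypothetical uniform quantizer used as a stand-in for the real encode/decode pair.
static const float kStep = 0.25f;

inline int toy_encode(float x) {
    assert(std::isfinite(x));  // input validation, in the spirit of the commit
    return static_cast<int>(std::lround(x / kStep));
}

inline float toy_decode(int code) {
    return static_cast<float>(code) * kStep;
}

// Runs the round-trip checks and returns the number of failures.
inline int run_round_trip_checks() {
    const float inputs[] = {0.0f, 1.0f, -0.6f, 3.1f};
    const std::size_t count = sizeof(inputs) / sizeof(inputs[0]);
    int failures = 0;
    for (std::size_t i = 0; i < count; ++i) {
        const float restored = toy_decode(toy_encode(inputs[i]));
        // Rounding to the nearest step bounds the error by half a step.
        if (std::fabs(restored - inputs[i]) > kStep / 2 + 1e-6f) {
            std::printf("round-trip failed for %f (got %f)\n", inputs[i], restored);
            ++failures;
        }
    }
    return failures;
}
```

A `main` that returns 1 when `run_round_trip_checks()` reports any failures, and 0 otherwise, would let `ctest` surface the result without an external test framework.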