Closes #54
26 tests across 6 test classes:
- TestEncodeDecodeRoundtrip (8): encode→decode recovers original
within tolerance. Tests zero vectors, unit vectors, random vectors,
various dimensions (16/32/64/128).
- TestInnerProductPreservation (2): Q·K ≈ Q·dequant(quant(K)).
Inner products and self-inner-products preserved through compression.
- TestWHTOrthogonality (3): WHT^T · WHT = I. Double-WHT recovers
original. WHT preserves L2 norm. Identity vector produces equal components.
- TestCodebookCorrectness (5): 16 centroids, symmetric around zero,
ordered ascending, cover the unit range, all inputs quantize to valid
indices in [0,15].
- TestBitPacking (4): 4-bit packing halves byte count. Even indices
in low nibble. Correct nibble extraction. No overflow at 4096 dims.
- TestEdgeCases (4): non-power-of-2 dimensions fail gracefully. All-same values.
Large values don't produce NaN/Inf. Alternating signs.
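
For reviewers, here is a minimal sketch of the properties TestWHTOrthogonality checks. It is not the code in this PR: the helper name `wht` and the 1/sqrt(n) scaling are assumptions, chosen so that the transform is orthonormal and is its own inverse.

```python
import math

def wht(vec):
    """Normalized fast Walsh-Hadamard transform (hypothetical helper)."""
    n = len(vec)
    # The butterfly below only works for power-of-2 lengths.
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(vec)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    # Assumed scaling: 1/sqrt(n) makes the transform orthonormal.
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in out]

def l2_norm(vec):
    return math.sqrt(sum(x * x for x in vec))

x = [0.5, -1.0, 2.0, 0.25, 0.0, 3.0, -0.75, 1.5]
twice = wht(wht(x))  # applying the normalized WHT twice should recover x
assert all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(x, twice))
assert math.isclose(l2_norm(x), l2_norm(wht(x)), rel_tol=1e-9)
```

With this scaling, a single-spike input such as `[1, 0, 0, 0]` spreads into equal components, and a length that is not a power of two raises `ValueError` rather than returning garbage.
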
Pure Python implementation mirrors llama-turbo.cpp algorithms.
No C++ compilation required.
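
As an illustration of the layout TestBitPacking asserts (two 4-bit indices per byte, even index in the low nibble), a pure Python sketch might look like the following. The function names `pack_nibbles` and `unpack_nibbles` are hypothetical and may not match the names used in the test file or in llama-turbo.cpp.

```python
def pack_nibbles(indices):
    """Pack 4-bit codebook indices two per byte (hypothetical helper)."""
    if len(indices) % 2:
        raise ValueError("need an even number of indices")
    if any(not 0 <= v <= 15 for v in indices):
        raise ValueError("indices must be in [0, 15]")
    packed = bytearray(len(indices) // 2)
    for i in range(0, len(indices), 2):
        # Even index goes in the low nibble, odd index in the high nibble.
        packed[i // 2] = indices[i] | (indices[i + 1] << 4)
    return bytes(packed)

def unpack_nibbles(packed):
    """Inverse of pack_nibbles: low nibble first, then high nibble."""
    out = []
    for byte in packed:
        out.append(byte & 0x0F)
        out.append(byte >> 4)
    return out

indices = [i % 16 for i in range(4096)]
packed = pack_nibbles(indices)
assert len(packed) == len(indices) // 2   # packing halves the byte count
assert unpack_nibbles(packed) == indices  # round trip is lossless
```
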