fix: reorder QUANT_LEVELS so q4_0 (4.0b) comes before turbo2 (1.5b)

q4_0 has a higher bits_per_channel than turbo2 (4.0 vs 1.5), which means higher quality. The selection logic iterates QUANT_LEVELS in order and picks the first level that fits in available memory, so q4_0 should be tried before turbo2: it produces better-quality output when memory allows.

Closes #139
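
The first-fit behaviour described in the message can be sketched as follows. This is a minimal illustration, not the actual code in evolution/quant_selector.py: `select_level` is a hypothetical helper, `QuantLevel` is trimmed to three of the fields shown in the diff, and "fits" is assumed to mean that the available headroom is at least `min_memory_headroom_gb`.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class QuantLevel:
    # Trimmed to the fields this sketch needs; the real class has more.
    name: str
    bits_per_channel: float
    min_memory_headroom_gb: float


# Order after this commit. Values are copied from the diff; the levels
# that precede q4_0 in the real list are omitted here.
QUANT_LEVELS = [
    QuantLevel("q4_0", 4.0, 1.5),
    QuantLevel("turbo2", 1.5, 2.0),
]


def select_level(
    headroom_gb: float,
    levels: Sequence[QuantLevel] = QUANT_LEVELS,
) -> Optional[QuantLevel]:
    """Hypothetical first-fit selection: return the first level that fits.

    Assumption: a level fits when headroom_gb >= min_memory_headroom_gb.
    """
    for level in levels:
        if headroom_gb >= level.min_memory_headroom_gb:
            return level
    return None  # nothing fits


# Same two levels in the order used before this commit.
old_order = [QUANT_LEVELS[1], QUANT_LEVELS[0]]

print(select_level(2.5).name)             # q4_0
print(select_level(2.5, old_order).name)  # turbo2
```

With 2.5 GB of headroom the old order stops at turbo2, while the new order reaches q4_0 first. Under the fit rule assumed here, any headroom that satisfies turbo2 (2.0 GB) also satisfies q4_0 (1.5 GB), so turbo2 would only be reached if the real check considers more than `min_memory_headroom_gb`.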
2026-04-21 11:19:54 +00:00
parent 04fa60a53d
commit eea73f1ce5
1 changed file with 10 additions and 10 deletions
--- a/evolution/quant_selector.py
+++ b/evolution/quant_selector.py
@@ -62,16 +62,6 @@ QUANT_LEVELS = [
        min_memory_headroom_gb=3.0,
        description="3-bit TurboQuant. High quality, ~6x KV compression."
    ),
-    QuantLevel(
-        name="turbo2",
-        bits_per_channel=1.5,
-        compression_ratio=10.0,
-        quality_label="balanced",
-        layer_adaptive=3,
-        kv_type="turbo2",
-        min_memory_headroom_gb=2.0,
-        description="2-bit TurboQuant. Balanced, ~10x KV compression."
-    ),
    QuantLevel(
        name="q4_0",
        bits_per_channel=4.0,
@@ -82,6 +72,16 @@ QUANT_LEVELS = [
        min_memory_headroom_gb=1.5,
        description="Standard 4-bit quant. Fast fallback, no TurboQuant."
    ),
+    QuantLevel(
+        name="turbo2",
+        bits_per_channel=1.5,
+        compression_ratio=10.0,
+        quality_label="balanced",
+        layer_adaptive=3,
+        kv_type="turbo2",
+        min_memory_headroom_gb=2.0,
+        description="2-bit TurboQuant. Balanced, ~10x KV compression."
+    ),
 ]