fix: reorder QUANT_LEVELS so q4_0 (4.0b) comes before turbo2 (1.5b)
All checks were successful
Smoke Test / smoke (pull_request) Successful in 33s

q4_0 has higher bits_per_channel (4.0 vs 1.5), meaning higher quality.
The selection logic iterates QUANT_LEVELS in order, picking the first
level that fits available memory. q4_0 should be tried before turbo2
because it produces better quality output when memory allows.

Closes #139
This commit is contained in:
2026-04-21 11:19:54 +00:00
parent 04fa60a53d
commit eea73f1ce5

View File

@@ -62,16 +62,6 @@ QUANT_LEVELS = [
min_memory_headroom_gb=3.0,
description="3-bit TurboQuant. High quality, ~6x KV compression."
),
QuantLevel(
name="turbo2",
bits_per_channel=1.5,
compression_ratio=10.0,
quality_label="balanced",
layer_adaptive=3,
kv_type="turbo2",
min_memory_headroom_gb=2.0,
description="2-bit TurboQuant. Balanced, ~10x KV compression."
),
QuantLevel(
name="q4_0",
bits_per_channel=4.0,
@@ -82,6 +72,16 @@ QUANT_LEVELS = [
min_memory_headroom_gb=1.5,
description="Standard 4-bit quant. Fast fallback, no TurboQuant."
),
QuantLevel(
name="turbo2",
bits_per_channel=1.5,
compression_ratio=10.0,
quality_label="balanced",
layer_adaptive=3,
kv_type="turbo2",
min_memory_headroom_gb=2.0,
description="2-bit TurboQuant. Balanced, ~10x KV compression."
),
]