fix: reorder QUANT_LEVELS so q4_0 (4.0b) comes before turbo2 (1.5b)
All checks were successful
Smoke Test / smoke (pull_request) Successful in 33s
All checks were successful
Smoke Test / smoke (pull_request) Successful in 33s
q4_0 has higher bits_per_channel (4.0 vs 1.5), meaning higher quality. The selection logic iterates QUANT_LEVELS in order, picking the first level that fits available memory. q4_0 should be tried before turbo2 because it produces better quality output when memory allows. Closes #139
This commit is contained in:
@@ -62,16 +62,6 @@ QUANT_LEVELS = [
|
||||
min_memory_headroom_gb=3.0,
|
||||
description="3-bit TurboQuant. High quality, ~6x KV compression."
|
||||
),
|
||||
QuantLevel(
|
||||
name="turbo2",
|
||||
bits_per_channel=1.5,
|
||||
compression_ratio=10.0,
|
||||
quality_label="balanced",
|
||||
layer_adaptive=3,
|
||||
kv_type="turbo2",
|
||||
min_memory_headroom_gb=2.0,
|
||||
description="2-bit TurboQuant. Balanced, ~10x KV compression."
|
||||
),
|
||||
QuantLevel(
|
||||
name="q4_0",
|
||||
bits_per_channel=4.0,
|
||||
@@ -82,6 +72,16 @@ QUANT_LEVELS = [
|
||||
min_memory_headroom_gb=1.5,
|
||||
description="Standard 4-bit quant. Fast fallback, no TurboQuant."
|
||||
),
|
||||
QuantLevel(
|
||||
name="turbo2",
|
||||
bits_per_channel=1.5,
|
||||
compression_ratio=10.0,
|
||||
quality_label="balanced",
|
||||
layer_adaptive=3,
|
||||
kv_type="turbo2",
|
||||
min_memory_headroom_gb=2.0,
|
||||
description="2-bit TurboQuant. Balanced, ~10x KV compression."
|
||||
),
|
||||
]
|
||||
|
||||
|
||||
|
||||
Reference in New Issue
Block a user