[P2.5] Per-layer quantization profiles #13
Parent: #1 | Depends on: Phase 2 stable
Not all layers are equally sensitive to KV cache quantization. The early (first 3) and late (last 3) layers tend to be the most sensitive.
When to Pursue
Only after Phase 2 is stable and baseline quality confirmed. This is tuning, not architecture.
Pre-check (during Phase 1)
Check whether the fork exposes per-layer KV type configuration; note the answer for later.
Acceptance Criteria
Per-Layer Quantization — ALREADY IMPLEMENTED ✅
Found in the fork; no additional work is needed.
Mechanism
Environment variable TURBO_LAYER_ADAPTIVE controls the per-layer strategy.
Location: src/llama-kv-cache.cpp, lines 165-217

Available Modes
Usage
Notes