[P1-S2] Baseline benchmarks — FP16 KV cache (no TurboQuant) #6
Parent: #1 | Depends on: #4 (build)
Establish the FP16 KV cache baseline; these are the numbers we compare TurboQuant against.
Tests
- `llama-perplexity` on WikiText-2 with qwen3.5:27b at 8K context
- `llama-bench` generation speed
- `footprint -p <pid>` or `vmmap --summary` at 8K, 32K, 64K context

Report
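The tests above can be sketched as shell invocations (model path, corpus path, and `<pid>` are placeholders, not taken from the issue; flag values are one reasonable choice, not prescribed here):

```
# Perplexity on WikiText-2 at 8K context (requires the raw test split on disk)
./llama-perplexity -m models/model.gguf -f wikitext-2-raw/wiki.test.raw -c 8192

# Generation throughput; llama-bench averages over repeated runs
./llama-bench -m models/model.gguf -n 128

# Memory footprint of a running inference process (macOS tools), repeated
# at 8K, 32K, and 64K context
footprint -p <pid>
vmmap --summary <pid>
```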
Acceptance Criteria
FP16 KV Baseline Benchmarks
Model: Hermes-4-14B Q4_K_M (8.38 GiB, 14.77B params)
Machine: Apple M3 Max, 36GB unified
KV Cache: f16 (default)
Throughput (llama-bench, 3-run average)
Memory (Metal GPU)
Perplexity
⏭️ Skipped — no wikitext-2-raw corpus bundled. Need to download for proper PPL test.
These are the baseline numbers for TurboQuant comparison.