fix: add perplexity limitation docs to run_benchmarks.py (#63)

2026-04-16 02:53:13 +00:00
parent ea7f89cc2d
commit fa9d4d569b


@@ -5,8 +5,16 @@ TurboQuant Benchmarking Suite — Multi-Backend (Issue #29)
 
 Supports Ollama and llama-server backends with KV cache type configuration.
 Measures: TTFT, tokens/sec, latency, peak memory.
 
+IMPORTANT — Perplexity Limitation (Issue #63):
+Ollama does NOT expose token logprobs. This means:
+- True perplexity (PPL) cannot be measured via the Ollama backend
+- The metrics here (tok/s, latency) are throughput proxies, not quality gates
+- For real perplexity measurement, use benchmarks/run_perplexity.py,
+  which calls llama-perplexity directly (--logprobs support)
+- The pass criterion "PPL delta <= 0.5" cannot be validated via Ollama
+
 Usage:
-# Ollama (default)
+# Ollama (default) — throughput benchmarks only, NOT perplexity
 python3 benchmarks/run_benchmarks.py --backend ollama --model llama3
 
 # llama-server with turbo4 KV
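
Note on the limitation above: perplexity is defined from per-token log-probabilities, PPL = exp(-(1/N) * sum_i log p_i), so without logprobs there is nothing to compute. A minimal standalone Python sketch of the calculation (illustrative only, not code from this repo):

    import math

    def perplexity(token_logprobs: list[float]) -> float:
        # PPL = exp(-(1/N) * sum(log p_i)), from natural-log probabilities.
        # Ollama's API does not return these, hence the limitation.
        if not token_logprobs:
            raise ValueError("need at least one token logprob")
        return math.exp(-sum(token_logprobs) / len(token_logprobs))

    # A model assigning every token probability 0.25 has PPL 4.0:
    assert round(perplexity([math.log(0.25)] * 10), 6) == 4.0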
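
run_perplexity.py itself is not part of this diff; what follows is a hedged sketch of what "calls llama-perplexity directly" might look like, assuming llama.cpp's llama-perplexity binary is on PATH. The -m and -f flags are llama.cpp's; the output-parsing pattern and helper name are assumptions:

    import re
    import subprocess

    def measure_ppl(model_path: str, corpus: str) -> float:
        # Sketch: run llama.cpp's llama-perplexity and parse its final
        # estimate. The printed format can vary across llama.cpp versions.
        proc = subprocess.run(
            ["llama-perplexity", "-m", model_path, "-f", corpus],
            capture_output=True, text=True, check=True,
        )
        match = re.search(r"PPL\s*=\s*([\d.]+)", proc.stdout + proc.stderr)
        if match is None:
            raise RuntimeError("no PPL estimate in llama-perplexity output")
        return float(match.group(1))

A harness like this is also where the Issue #63 pass criterion could actually be enforced, e.g. checking measure_ppl("turbo4.gguf", corpus) - measure_ppl("baseline.gguf", corpus) <= 0.5 (model file names hypothetical).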