From 90b5eddfa1b7c1f77d53f62724dc00cddfe517c9 Mon Sep 17 00:00:00 2001
From: Alexander Whitestone
Date: Tue, 14 Apr 2026 23:23:38 -0400
Subject: [PATCH] =?UTF-8?q?docs:=20Document=20Ollama=20perplexity=20limita?=
 =?UTF-8?q?tion=20=E2=80=94=20no=20logprob=20support=20(closes=20#63)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Ollama lacks a token logprob API, so true perplexity cannot be measured
via the Ollama backend. Added a warning to the run_benchmarks.py
docstring directing users to run_perplexity.py (llama-perplexity
binary) for real PPL measurement with --logprobs support.
---
 benchmarks/run_benchmarks.py | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/benchmarks/run_benchmarks.py b/benchmarks/run_benchmarks.py
index 11367f5c..1bbe331b 100644
--- a/benchmarks/run_benchmarks.py
+++ b/benchmarks/run_benchmarks.py
@@ -5,8 +5,16 @@ TurboQuant Benchmarking Suite — Multi-Backend (Issue #29)
 Supports Ollama and llama-server backends with KV cache type configuration.
 Measures: TTFT, tokens/sec, latency, peak memory.
 
+IMPORTANT — Perplexity Limitation (Issue #63):
+    Ollama does NOT expose token logprobs. This means:
+    - True perplexity (PPL) cannot be measured via the Ollama backend
+    - The metrics here (tok/s, latency) are throughput proxies, not quality gates
+    - For real perplexity measurement, use benchmarks/run_perplexity.py,
+      which calls llama-perplexity directly (--logprobs support)
+    - The pass criterion "PPL delta <= 0.5" cannot be validated via Ollama
+
 Usage:
-    # Ollama (default)
+    # Ollama (default) — throughput benchmarks only, NOT perplexity
     python3 benchmarks/run_benchmarks.py --backend ollama --model llama3
 
     # llama-server with turbo4 KV
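
Background, for reviewers (placed after the diff, so git am ignores it):
perplexity is exp of the average negative log-likelihood of the evaluated
tokens, which is why per-token logprobs are a hard requirement and why the
Ollama backend cannot produce a real PPL number. A minimal sketch of that
computation, assuming per-token natural-log probabilities have already been
obtained from a logprob-capable backend such as llama-perplexity; the
function name and input shape below are illustrative, not the actual
run_perplexity.py API:

    import math

    def perplexity_from_logprobs(token_logprobs):
        """PPL = exp(-(1/N) * sum(log p(token_i | context))).

        token_logprobs: per-token natural-log probabilities (all <= 0).
        """
        if not token_logprobs:
            raise ValueError("need at least one token logprob")
        # Average negative log-likelihood over the evaluated tokens.
        avg_nll = -sum(token_logprobs) / len(token_logprobs)
        return math.exp(avg_nll)

    # Example: three tokens the model found fairly likely.
    print(perplexity_from_logprobs([-0.3, -1.2, -0.7]))  # ~2.08

The "PPL delta <= 0.5" pass criterion would then compare this value between
a baseline run and a quantized-KV run on the same evaluation text; without
logprobs there is simply no number to compare.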