From 90b5eddfa1b7c1f77d53f62724dc00cddfe517c9 Mon Sep 17 00:00:00 2001
From: Alexander Whitestone
Date: Tue, 14 Apr 2026 23:23:38 -0400
Subject: [PATCH] =?UTF-8?q?docs:=20Document=20Ollama=20perplexity=20limita?=
 =?UTF-8?q?tion=20=E2=80=94=20no=20logprob=20support=20(closes=20#63)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Ollama lacks a token logprob API, so true perplexity cannot be measured
via the Ollama backend. Added a warning to the run_benchmarks.py
docstring directing users to run_perplexity.py (llama-perplexity
binary) for real PPL measurement with --logprobs support.
---
 benchmarks/run_benchmarks.py | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/benchmarks/run_benchmarks.py b/benchmarks/run_benchmarks.py
index 11367f5c..1bbe331b 100644
--- a/benchmarks/run_benchmarks.py
+++ b/benchmarks/run_benchmarks.py
@@ -5,8 +5,16 @@ TurboQuant Benchmarking Suite — Multi-Backend (Issue #29)
 Supports Ollama and llama-server backends with KV cache type configuration.
 Measures: TTFT, tokens/sec, latency, peak memory.
 
+IMPORTANT — Perplexity Limitation (Issue #63):
+    Ollama does NOT expose token logprobs. This means:
+    - True perplexity (PPL) cannot be measured via the Ollama backend
+    - The metrics here (tok/s, latency) are throughput proxies, not quality gates
+    - For real perplexity measurement, use benchmarks/run_perplexity.py,
+      which calls llama-perplexity directly (--logprobs support)
+    - The pass criterion "PPL delta <= 0.5" cannot be validated via Ollama
+
 Usage:
-    # Ollama (default)
+    # Ollama (default) — throughput benchmarks only, NOT perplexity
     python3 benchmarks/run_benchmarks.py --backend ollama --model llama3
 
     # llama-server with turbo4 KV
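
Background, for reviewers (placed after the diff, so git am ignores it):
perplexity is exp of the average negative log-likelihood of the evaluated
tokens, which is why per-token logprobs are a hard requirement and why the
Ollama backend cannot produce a real PPL number. A minimal sketch of that
computation, assuming per-token natural-log probabilities have already been
obtained from a logprob-capable backend such as llama-perplexity; the
function name and input shape below are illustrative, not the actual
run_perplexity.py API:

    import math

    def perplexity_from_logprobs(token_logprobs):
        """PPL = exp(-(1/N) * sum(log p(token_i | context))).

        token_logprobs: per-token natural-log probabilities (all <= 0).
        """
        if not token_logprobs:
            raise ValueError("need at least one token logprob")
        # Average negative log-likelihood over the evaluated tokens.
        avg_nll = -sum(token_logprobs) / len(token_logprobs)
        return math.exp(avg_nll)

    # Example: three tokens the model found fairly likely.
    print(perplexity_from_logprobs([-0.3, -1.2, -0.7]))  # ~2.08

The "PPL delta <= 0.5" pass criterion would then compare this value between
a baseline run and a quantized-KV run on the same evaluation text; without
logprobs there is simply no number to compare.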