fix(benchmarks): separate quality measurement from efficiency proxy (issue #63)
All checks were successful
Smoke Test / smoke (pull_request) Successful in 27s
- Add --quality flag to run_benchmarks.py that delegates to llama-perplexity
- Clarify token/sec is an efficiency metric, not perplexity
- Ollama cannot provide true logprob-based PPL (no logprob API)
- Quality gate now runs llama-perplexity binary directly when requested

Closes #63
@@ -1,8 +1,9 @@
 #!/usr/bin/env python3
 """
-TurboQuant Perplexity Quality Gate (Issue #21)
+TurboQuant Perplexity Quality Gate (Issues #21, #63)
 
-Compares text generation quality between f16 KV and turbo4 KV cache
+Measures true perplexity via llama-perplexity binary (logprob-based).
+Ollama cannot provide perplexity due to missing logprob API (issue #63).
-configurations using llama.cpp's perplexity tool on the wikitext-2 corpus.
 
 Usage:
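Since Ollama exposes no logprob API, the quality gate shells out to the llama-perplexity binary and parses its printed result. A minimal sketch of how that delegation could look, assuming llama-perplexity is on PATH, takes `-m`/`-f` for model and corpus, and prints a line like `Final estimate: PPL = ...` (the function names and flags here are illustrative, not the actual run_benchmarks.py code):

```python
# Hypothetical sketch: delegate perplexity measurement to llama-perplexity.
# Assumes the binary prints a line containing "PPL = <number>"; the exact
# output format of llama.cpp tools can vary by version.
import re
import subprocess
import sys


def parse_ppl(output: str) -> float:
    """Extract the perplexity value from llama-perplexity's output text."""
    match = re.search(r"PPL = ([0-9.]+)", output)
    if match is None:
        raise ValueError("could not parse PPL from llama-perplexity output")
    return float(match.group(1))


def measure_perplexity(model_path: str, corpus_path: str,
                       binary: str = "llama-perplexity") -> float:
    """Run the llama-perplexity binary and return the logprob-based PPL."""
    result = subprocess.run(
        [binary, "-m", model_path, "-f", corpus_path],
        capture_output=True, text=True, check=True,
    )
    # Some llama.cpp builds log to stderr, so search both streams.
    return parse_ppl(result.stdout + result.stderr)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit("usage: quality_gate.py <model.gguf> <wikitext-2.txt>")
    print(f"PPL = {measure_perplexity(sys.argv[1], sys.argv[2]):.4f}")
```

Parsing stdout is brittle compared to a real logprob API, which is exactly why the commit routes quality measurement through llama.cpp instead of Ollama.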