Compare commits
1 Commit
fix/63-per...burn/20260
| Author | SHA1 | Date |
|---|---|---|
| | 78352b0a91 | |
```diff
@@ -13,12 +13,12 @@ jobs:
           python-version: '3.11'
       - name: Parse check
         run: |
-          find . -name '*.yml' -o -name '*.yaml' | grep -v .gitea | grep -v llama-cpp-fork | xargs -r python3 -c "import sys,yaml; [yaml.safe_load(open(f)) for f in sys.argv[1:]]"
-          find . -name '*.json' | grep -v llama-cpp-fork | while read f; do python3 -m json.tool "$f" > /dev/null || exit 1; done
-          find . -name '*.py' | grep -v llama-cpp-fork | xargs -r python3 -m py_compile
+          find . -name '*.yml' -o -name '*.yaml' | grep -v .gitea | xargs -r python3 -c "import sys,yaml; [yaml.safe_load(open(f)) for f in sys.argv[1:]]"
+          find . -name '*.json' | xargs -r python3 -m json.tool > /dev/null
+          find . -name '*.py' | xargs -r python3 -m py_compile
           find . -name '*.sh' | xargs -r bash -n
           echo "PASS: All files parse"
       - name: Secret scan
         run: |
-          if grep -rE 'sk-or-|sk-ant-|ghp_|AKIA' . --include='*.yml' --include='*.py' --include='*.sh' 2>/dev/null | grep -v .gitea | grep -v llama-cpp-fork; then exit 1; fi
+          if grep -rE 'sk-or-|sk-ant-|ghp_|AKIA' . --include='*.yml' --include='*.py' --include='*.sh' 2>/dev/null | grep -v .gitea; then exit 1; fi
           echo "PASS: No secrets"
```
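The secret scan is a prefix search for well-known credential formats: `sk-or-` (OpenRouter), `sk-ant-` (Anthropic), `ghp_` (GitHub personal access token), and `AKIA` (AWS access key ID). A Python equivalent for pre-commit use, as a sketch only (script name is illustrative; the skip set matches the updated workflow, which drops the `llama-cpp-fork` exclusion):

```python
# secret_scan.py: sketch of the "Secret scan" step above.
import pathlib
import re
import sys

PATTERN = re.compile(r"sk-or-|sk-ant-|ghp_|AKIA")
SKIP = {".gitea"}

hits = []
for path in pathlib.Path(".").rglob("*"):
    # Same file types the grep's --include flags cover
    if path.suffix not in (".yml", ".py", ".sh") or not path.is_file():
        continue
    if any(part in SKIP for part in path.parts):
        continue
    for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
        if PATTERN.search(line):
            hits.append(f"{path}:{lineno}: {line.strip()}")

if hits:
    print("\n".join(hits))
    sys.exit(1)
print("PASS: No secrets")
```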
```diff
@@ -5,16 +5,8 @@ TurboQuant Benchmarking Suite — Multi-Backend (Issue #29)
 Supports Ollama and llama-server backends with KV cache type configuration.
 Measures: TTFT, tokens/sec, latency, peak memory.
 
-IMPORTANT — Perplexity Limitation (Issue #63):
-Ollama does NOT expose token logprobs. This means:
-- True perplexity (PPL) cannot be measured via the Ollama backend
-- The metrics here (tok/s, latency) are throughput proxies, not quality gates
-- For real perplexity measurement, use benchmarks/run_perplexity.py
-  which calls llama-perplexity directly (--logprobs support)
-- The pass criterion "PPL delta <= 0.5" cannot be validated via Ollama
-
 Usage:
-  # Ollama (default) — throughput benchmarks only, NOT perplexity
+  # Ollama (default)
   python3 benchmarks/run_benchmarks.py --backend ollama --model llama3
 
   # llama-server with turbo4 KV
```
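Background on the note this hunk removes: perplexity is the exponential of the mean negative token log-likelihood, so it cannot be computed without per-token logprobs, which the Ollama API does not return. A short sketch under that assumption (both function names are hypothetical, not the repo's API):

```python
import math

def perplexity(logprobs: list[float]) -> float:
    """PPL = exp(-mean(log p(token_i | context))); lower is better.

    `logprobs` are per-token log-probabilities, e.g. from llama-perplexity
    or any backend that exposes them (Ollama does not).
    """
    return math.exp(-sum(logprobs) / len(logprobs))

def ppl_gate(baseline_logprobs, quant_logprobs, max_delta=0.5):
    """Hypothetical gate matching the 'PPL delta <= 0.5' criterion:
    compare a quantized model's PPL to the baseline on the same text."""
    delta = perplexity(quant_logprobs) - perplexity(baseline_logprobs)
    return delta <= max_delta, delta
```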