Commit Graph

3 Commits

Author SHA1 Message Date
ccbcc8ab7b fix(benchmarks): separate quality measurement from efficiency proxy (issue #63)
Checks: Smoke Test / smoke (pull_request) successful in 27s
- Add --quality flag to run_benchmarks.py that delegates to llama-perplexity
- Clarify token/sec is an efficiency metric, not perplexity
- Ollama cannot provide true logprob-based PPL (no logprob API)
- Quality gate now runs llama-perplexity binary directly when requested

Closes #63
2026-04-26 10:55:40 -04:00
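
The distinction this commit draws between a quality metric and an efficiency proxy can be sketched in a few lines. This is a minimal illustration and not the code in run_benchmarks.py: the function names `perplexity` and `tokens_per_second` are hypothetical, and the sketch assumes per-token natural-log probabilities are already available (for example, from a tool that exposes them).

```python
import math


def perplexity(token_logprobs):
    """Quality metric: exp of the negative mean natural-log probability per token.

    Hypothetical helper for illustration; requires per-token log-probabilities.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def tokens_per_second(n_tokens, elapsed_seconds):
    """Efficiency metric: throughput only; says nothing about output quality.

    Hypothetical helper for illustration.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return n_tokens / elapsed_seconds


if __name__ == "__main__":
    # A model that assigns probability 0.25 to every observed token.
    logprobs = [math.log(0.25)] * 4
    print(f"perplexity: {perplexity(logprobs):.2f}")
    print(f"throughput: {tokens_per_second(100, 4.0):.1f} tokens/sec")
```

The two functions take unrelated inputs: perplexity needs log-probabilities, while throughput needs only a token count and a wall-clock time. A backend that reports no log-probabilities can therefore supply the second number but not the first.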
ab4020cca0 feat: multi-backend benchmark suite with TTFT + memory tracking (#37)
Checks: Smoke Test / smoke (push) failing after 4s
Auto-merged by Timmy overnight cycle
2026-04-13 14:05:17 +00:00
88b8a7c75d feat: add benchmarking script for quality assessment
2026-03-30 21:14:49 +00:00