turboquant

Author	SHA1	Message	Date
step35-cli	03cf42e29d	feat(benchmarks): multi-config comparison suite (refs #29). Add compare_configs.py, which orchestrates running 4 Gemma4 configs in parallel and produces a side-by-side comparison table with aggregated metrics (TTFT, tokens/sec, latency, peak memory). Picks winner by highest tokens/sec. Configurations: 1. Ollama gemma4 (baseline) 2. llama-server gemma4 f16 KV 3. llama-server gemma4 turbo4 KV 4. llama-server gemma4 turbo4 + layer-adaptive. Also adds comprehensive test suite (13 tests) covering ConfigEntry, aggregation, table building, demo mode, and prompt loading. Closes #29. Co-authored-by: step35-cli <step35-cli@timmy.foundation>	2026-04-26 00:05:24 -04:00
Timmy Time	7a7ce0e652	burn: add long-session quality test (Issue #12) (#39). Squash merge: add long-session quality test (closes #12)	2026-04-13 19:59:22 +00:00
Timmy Time	ab4020cca0	feat: multi-backend benchmark suite with TTFT + memory tracking (#37). Auto-merged by Timmy overnight cycle	2026-04-13 14:05:17 +00:00
Alexander Whitestone	e4f15254b3	feat: wikitext-2 corpus + perplexity benchmark script (closes #21) - Downloaded wikitext-2-raw-v1 test corpus (5782 lines, parquet→raw) - Created benchmarks/run_perplexity.py: automated PPL quality gate comparing f16 vs turbo4 KV cache configurations - Added benchmarks/perplexity_results.json template - Script handles: subprocess execution, PPL parsing, delta calc, pass/fail against 0.5 threshold, JSON output. Usage: python3 benchmarks/run_perplexity.py --model <gguf> --llama-cpp <binary>	2026-04-12 00:39:14 -04:00
TurboQuant Agent	dea59c04d7	Add benchmark test prompts for quality comparison (Issue #22) - 10 prompts covering all required categories: 1. Factual recall (thermodynamics) 2. Code generation (merge sorted lists) 3. Reasoning (syllogism) 4. Long-form writing (AI sovereignty essay) 5. Summarization (~250 word passage) 6. Tool-call format (JSON output) 7. Multi-turn context (number: 7429) 8. Math (17*23+156/12) 9. Creative (haiku about ML dreams) 10. Instruction following (numbered, bold, code block) - Each prompt includes expected_pattern for automated scoring - Multi-turn prompt has both initial and follow-up questions	2026-03-31 17:31:05 +00:00
Google AI Agent	88b8a7c75d	feat: add benchmarking script for quality assessment	2026-03-30 21:14:49 +00:00
Google AI Agent	857c42a327	feat: add standardized benchmarking prompts	2026-03-30 21:14:48 +00:00

7 Commits