turboquant

Timmy_Foundation/turboquant


Commit Graph


Author: STEP35 CLI

SHA1: 89bf027780

Message:

4.10: M1 Mac benchmark suite for TurboQuant presets (closes #94)

Smoke Test / smoke (pull_request) Successful in 10s


- Add benchmarks/m1_mac_benchmark.py: orchestrates a benchmark of all three presets (k8v4, 4bit_nc, 3bit_nc) on Apple Silicon via llama-server or vllm; measures throughput (tokens/sec), peak memory (RSS), quality on a GSM8K subset (evaluator), and tool-call accuracy.
- Add benchmarks/m1-mac-template.md: scaffold results markdown to be filled in by the script; includes hardware detection, a results table, and a recommendation.
- Add tests/test_m1_benchmark.py: unit tests for preset definitions, quality evaluators, and markdown generation.
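
A minimal sketch of how two of the metrics listed above could be computed. Only the preset names come from this commit message; the function names, the regular expression, and the scoring rule (compare the last number in the model output with the number after `####` in the GSM8K reference) are assumptions for illustration and may differ from what benchmarks/m1_mac_benchmark.py actually does.

```python
import re
from typing import Optional, Sequence

# Preset names are taken from the commit message; everything else is hypothetical.
PRESETS = ("k8v4", "4bit_nc", "3bit_nc")

# Matches integers and decimals, optionally negative, with thousands separators.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Throughput as generated tokens divided by wall-clock seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return n_tokens / elapsed_s


def last_number(text: str) -> Optional[str]:
    """Return the last number found in text with commas removed, or None."""
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def gsm8k_score(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions whose final number equals the reference answer.

    GSM8K reference solutions end with '#### <answer>', so only the text
    after the last '####' marker is used as the gold answer.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    correct = 0
    for pred, ref in zip(predictions, references):
        gold = last_number(ref.split("####")[-1])
        got = last_number(pred)
        if gold is not None and got is not None and float(got) == float(gold):
            correct += 1
    return correct / len(references)
```

Comparing values as floats rather than strings lets `3.50` match `3.5`; a real evaluator may need stricter answer extraction than taking the last number in the output.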

Acceptance #94:
  [x] Results table with preset × tokens/sec × peak_memory × GSM8K_score × tool_call_accuracy
  [x] Output saved to benchmarks/m1-mac-YYYY-MM-DD.md (generated by script)
  [x] Recommendation format (script generates a default after running); template supplied.
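
The acceptance items above can be sketched as follows: a dated output path matching benchmarks/m1-mac-YYYY-MM-DD.md and a markdown table with the five listed columns. The helper names `results_path` and `render_table` and the row layout (one dict per preset) are hypothetical, not taken from the script.

```python
from datetime import date
from pathlib import Path
from typing import Iterable, Mapping

# Column names follow the acceptance checklist in the commit message.
COLUMNS = ("preset", "tokens/sec", "peak_memory", "GSM8K_score", "tool_call_accuracy")


def results_path(run_date: date, root: str = "benchmarks") -> Path:
    """Build the dated results path, e.g. benchmarks/m1-mac-2026-04-26.md."""
    return Path(root) / f"m1-mac-{run_date.isoformat()}.md"


def render_table(rows: Iterable[Mapping[str, object]]) -> str:
    """Render one markdown table row per preset, in COLUMNS order."""
    lines = [
        "| " + " | ".join(COLUMNS) + " |",
        "|" + "|".join(" --- " for _ in COLUMNS) + "|",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row[col]) for col in COLUMNS) + " |")
    return "\n".join(lines)
```

Keeping the column tuple in one place means the header and every row stay aligned if a metric is added later.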

The benchmark requires a locally running llama-server (or vllm) and a Gemma 4 model. It is not executed during CI; the smoke tests only validate importability and logic.

Date: 2026-04-26 07:13:23 -04:00

1 commit