Alexander Whitestone
5bfb9eb000
feat: multi-config benchmark comparison suite (Issue #29)
Smoke Test / smoke (pull_request) Successful in 14s
benchmarks/compare_configs.py:
- Runs 4 configs (ollama, llama-f16, llama-turbo4, llama-turbo4-adaptive)
- Aggregates TTFT, tok/s, latency, peak memory
- Picks winner by highest tok/s
- Outputs JSON report + human-readable table
- --demo mode for testing without live servers
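
A minimal sketch of the aggregation and winner-selection steps listed above, for illustration only. The names `RunSample`, `aggregate`, `pick_winner`, and `format_table` are hypothetical and are not taken from benchmarks/compare_configs.py. Averaging the timing metrics and taking the maximum of peak memory are assumptions about how aggregation might work. The sample values are placeholders, not measured results.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunSample:
    """One benchmark run (hypothetical shape, not the script's actual type)."""
    ttft_ms: float
    tokens_per_sec: float
    latency_ms: float
    peak_memory_mb: float


def aggregate(samples):
    """Average TTFT, tok/s, and latency; keep the largest peak memory seen."""
    return {
        "ttft_ms": mean(s.ttft_ms for s in samples),
        "tokens_per_sec": mean(s.tokens_per_sec for s in samples),
        "latency_ms": mean(s.latency_ms for s in samples),
        "peak_memory_mb": max(s.peak_memory_mb for s in samples),
    }


def pick_winner(results):
    """Return the config name with the highest aggregated tok/s."""
    return max(results, key=lambda name: results[name]["tokens_per_sec"])


def format_table(results):
    """Render one fixed-width row per config under a header row."""
    lines = [f"{'config':<24}{'TTFT ms':>10}{'tok/s':>10}"
             f"{'latency ms':>12}{'peak MB':>10}"]
    for name, r in results.items():
        lines.append(f"{name:<24}{r['ttft_ms']:>10.1f}"
                     f"{r['tokens_per_sec']:>10.1f}"
                     f"{r['latency_ms']:>12.1f}"
                     f"{r['peak_memory_mb']:>10.1f}")
    return "\n".join(lines)


# Placeholder samples for two of the four configs (not real measurements).
samples = {
    "ollama": [RunSample(120.0, 30.0, 900.0, 5000.0),
               RunSample(100.0, 34.0, 850.0, 5200.0)],
    "llama-turbo4": [RunSample(80.0, 50.0, 600.0, 3000.0),
                     RunSample(90.0, 54.0, 620.0, 3100.0)],
}
results = {name: aggregate(runs) for name, runs in samples.items()}
winner = pick_winner(results)

print(format_table(results))
print(f"winner: {winner}")
```

Selecting on a single metric keeps the winner rule easy to test, at the cost of ignoring memory and TTFT trade-offs between configs.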
tests/test_compare_configs.py (13 tests):
- ConfigEntry, ConfigResult, default configs
- Aggregation logic, winner selection, table format
- Demo mode with and without output file
- Prompt loading from test_prompts.json
Closes #29.
2026-04-13 21:42:29 -04:00