All checks were successful
Smoke Test / smoke (pull_request) Successful in 11s
Adds a comprehensive regression test suite for TurboQuant-compressed models
to verify that hermes tool calling remains intact after quantization.
- New test: tests/tool_call_regression.py
  * Schema contract tests for 5 core tools (read_file, web_search,
    terminal, execute_code, delegate_task)
  * Parallel tool calling validation
  * Profile configuration validation (TurboQuant settings, server flags)
  * Live integration tests (skipped unless TURBOQUANT_SERVER_URL is set)
  * Results matrix generator (writes benchmarks/tool-call-regression.md)
  * Enforces a 95% accuracy threshold via pytest assertion
- New results matrix: benchmarks/tool-call-regression.md
  * Markdown table logging model, preset, accuracy, and per-tool results
  * Auto-updates when tests run with --generate-matrix
- CI gate: .gitea/workflows/smoke.yml
  * Runs the tool call regression suite on every push/PR
  * Live tests fail the pipeline if accuracy drops below 95%
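
For reviewers who want a feel for the schema contract checks and the 95% gate listed above, here is a minimal sketch. It is not the contents of tests/tool_call_regression.py. The argument names in `REQUIRED_ARGS`, the `{"name": ..., "arguments": ...}` call shape, and the helpers `call_matches_contract` and `accuracy` are all assumptions made for illustration. Only the five tool names and the 95% threshold come from this description.

```python
import json

ACCURACY_THRESHOLD = 0.95  # threshold stated in the PR description

# ASSUMPTION: illustrative required arguments per tool. The real contracts
# are defined in the test suite and may differ.
REQUIRED_ARGS = {
    "read_file": {"path"},
    "web_search": {"query"},
    "terminal": {"command"},
    "execute_code": {"code"},
    "delegate_task": {"goal"},
}


def call_matches_contract(raw_call):
    """Return True if raw_call is a JSON object naming a known tool
    and supplying every required argument for that tool."""
    try:
        call = json.loads(raw_call)
    except json.JSONDecodeError:
        return False
    if not isinstance(call, dict):
        return False
    name = call.get("name")
    args = call.get("arguments")
    if not isinstance(name, str) or not isinstance(args, dict):
        return False
    required = REQUIRED_ARGS.get(name)
    if required is None:
        return False  # unknown tool
    return required.issubset(args.keys())


def accuracy(raw_calls):
    """Fraction of raw tool calls that satisfy their contract."""
    if not raw_calls:
        raise ValueError("no tool calls to score")
    passed = sum(call_matches_contract(c) for c in raw_calls)
    return passed / len(raw_calls)


def test_accuracy_meets_threshold():
    # Canned strings stand in for model output here; a live test would
    # collect them from the server at TURBOQUANT_SERVER_URL instead.
    samples = [
        '{"name": "read_file", "arguments": {"path": "notes.txt"}}',
        '{"name": "terminal", "arguments": {"command": "ls"}}',
    ]
    assert accuracy(samples) >= ACCURACY_THRESHOLD
```

pytest collects `test_accuracy_meets_threshold` by its name prefix, so a bare `assert` is enough to fail the run when the score drops below the threshold.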
Closes #96
| Timestamp | Model | Preset | Accuracy | read_file | web_search | terminal | execute_code | delegate_task | Parallel |
|---|---|---|---|---|---|---|---|---|---|
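
A row for this table could be produced along the following lines. This is a sketch under assumptions, not the generator shipped in this PR: the function name `format_row`, the `pass`/`fail` cell values, and the percentage formatting are hypothetical. Only the column order is taken from the header above.

```python
# Column order for the per-tool cells, taken from the table header.
TOOLS = ["read_file", "web_search", "terminal", "execute_code", "delegate_task"]


def format_row(timestamp, model, preset, accuracy, per_tool, parallel):
    """Render one markdown table row.

    accuracy is a fraction in [0, 1]; per_tool maps each tool name to a
    bool; parallel is a bool. (ASSUMPTION: the real generator may record
    richer per-tool values than pass/fail.)
    """
    cells = [timestamp, model, preset, f"{accuracy:.1%}"]
    cells += ["pass" if per_tool[tool] else "fail" for tool in TOOLS]
    cells.append("pass" if parallel else "fail")
    return "| " + " | ".join(cells) + " |"


def append_row(path, row):
    """Append a rendered row to the matrix file at path."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(row + "\n")
```

Keeping `format_row` free of file I/O makes the formatting easy to check in a unit test without touching benchmarks/tool-call-regression.md.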