Timmy (Step35)
|
2fca513e26
|
test: add tool call regression suite with CI gate (issue #96)
Smoke Test / smoke (pull_request) Successful in 11s
Adds a comprehensive regression test suite for TurboQuant-compressed models
to verify that hermes tool calling remains intact after quantization.
- New test: tests/tool_call_regression.py
  * Schema contract tests for 5 core tools (read_file, web_search,
    terminal, execute_code, delegate_task)
  * Parallel tool calling validation
  * Profile configuration validation (TurboQuant settings, server flags)
  * Live integration tests (skipped unless TURBOQUANT_SERVER_URL is set)
  * Results matrix generator (benchmarks/tool-call-regression.md)
  * Enforces 95% accuracy threshold via pytest assertion
- New results matrix: benchmarks/tool-call-regression.md
  * Markdown table logging model/preset/accuracy/per-tool results
  * Auto-updates when tests run with --generate-matrix
- CI gate: .gitea/workflows/smoke.yml
  * Runs tool call regression suite on every push/PR
  * Live tests, when enabled, fail the pipeline if accuracy drops below 95%
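
A minimal sketch of how the schema contract check and the 95% gate described above could fit together. Only the 95% threshold, the read_file tool name, and the TURBOQUANT_SERVER_URL variable come from this commit message. Every helper name, the schema shape, and the "path" argument are hypothetical and may not match tests/tool_call_regression.py.

```python
import os
from typing import List

# From the commit message: the suite enforces a 95% accuracy threshold.
# The constant name itself is an assumption.
ACCURACY_THRESHOLD = 0.95

# Hypothetical schema for one tool; the real contract may look different.
READ_FILE_SCHEMA = {"name": "read_file", "required": ["path"]}


def validate_tool_call(call: dict, schema: dict) -> bool:
    """Return True if the call names the tool and has every required argument."""
    if call.get("name") != schema["name"]:
        return False
    args = call.get("arguments", {})
    return all(key in args for key in schema["required"])


def accuracy(results: List[bool]) -> float:
    """Fraction of tool calls that passed validation (0.0 for an empty run)."""
    return sum(results) / len(results) if results else 0.0


def meets_threshold(results: List[bool]) -> bool:
    """Gate condition: accuracy must be at least ACCURACY_THRESHOLD."""
    return accuracy(results) >= ACCURACY_THRESHOLD


def live_tests_enabled() -> bool:
    """Live tests run only when TURBOQUANT_SERVER_URL is set and non-empty."""
    return bool(os.environ.get("TURBOQUANT_SERVER_URL"))


def test_read_file_schema_contract():
    # Collected by pytest through the test_ prefix; no server required.
    call = {"name": "read_file", "arguments": {"path": "README.md"}}
    assert validate_tool_call(call, READ_FILE_SCHEMA)
```

In a pytest suite, a live test could be guarded with pytest.mark.skipif(not live_tests_enabled(), reason=...) and end with assert meets_threshold(results), so that a run below 95% fails the CI job instead of only being logged.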
Closes #96
|
2026-04-29 00:13:35 -04:00 |
|