turboquant/benchmarks/tool-call-regression.md
Timmy (Step35) 2fca513e26
test: add tool call regression suite with CI gate (issue #96)
Adds a regression test suite for TurboQuant-compressed models to verify
that hermes tool calling still works after quantization.

- New test: tests/tool_call_regression.py
  * Schema contract tests for 5 core tools (read_file, web_search,
    terminal, execute_code, delegate_task)
  * Parallel tool calling validation
  * Profile configuration validation (TurboQuant settings, server flags)
  * Live integration tests (skipped unless TURBOQUANT_SERVER_URL set)
  * Results matrix generator (benchmarks/tool-call-regression.md)
  * Enforces 95% accuracy threshold via pytest assertion

- New results matrix: benchmarks/tool-call-regression.md
  * Markdown table logging model/preset/accuracy/per-tool results
  * Auto-updates when tests run with --generate-matrix

- CI gate: .gitea/workflows/smoke.yml
  * Runs tool call regression suite on every push/PR
  * Live tests fail the pipeline if accuracy drops below 95%
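
As a rough illustration of the gating described above, the sketch below shows one way the live-test switch and the 95% threshold could be expressed. It is not the code in tests/tool_call_regression.py: the helper names (`live_tests_enabled`, `accuracy`, `assert_accuracy_gate`) and the list-of-booleans result format are assumptions made for this example. Only the `TURBOQUANT_SERVER_URL` variable and the 95% figure come from the commit message.

```python
import os

# 95% gate, as stated in the commit message.
ACCURACY_THRESHOLD = 0.95


def live_tests_enabled(env=None):
    """Return True only when TURBOQUANT_SERVER_URL is set and non-empty.

    Hypothetical helper: in a pytest suite this condition would typically
    feed a skip marker so live tests are skipped when no server is given.
    """
    env = os.environ if env is None else env
    return bool(env.get("TURBOQUANT_SERVER_URL"))


def accuracy(outcomes):
    """Fraction of tool-call cases that passed (outcomes: list of bools)."""
    if not outcomes:
        raise ValueError("no outcomes recorded")
    return sum(outcomes) / len(outcomes)


def assert_accuracy_gate(outcomes, threshold=ACCURACY_THRESHOLD):
    """Fail with an AssertionError when accuracy drops below the threshold."""
    acc = accuracy(outcomes)
    assert acc >= threshold, (
        f"tool-call accuracy {acc:.1%} is below the {threshold:.0%} gate"
    )
    return acc
```

With this shape, 19 passing cases out of 20 sit exactly on the threshold and pass, while 18 out of 20 raise an `AssertionError`, which a test runner reports as a failed check.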

Closes #96
2026-04-29 00:13:35 -04:00


| Timestamp | Model | Preset | Accuracy | read_file | web_search | terminal | execute_code | delegate_task | Parallel |
|-----------|-------|--------|----------|-----------|------------|----------|--------------|---------------|----------|
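
The table above has no result rows yet. As a sketch of how a generator might format one row to match these columns, the function below builds a single Markdown table line. The function name, the `pass`/`fail` cell values, the percentage format, and the UTC timestamp format are all assumptions for illustration; the actual output of `--generate-matrix` may differ.

```python
from datetime import datetime, timezone

# Per-tool columns, in the order used by the table header.
TOOLS = ["read_file", "web_search", "terminal", "execute_code", "delegate_task"]


def format_matrix_row(model, preset, accuracy, per_tool, parallel_ok, timestamp=None):
    """Build one Markdown row for the results matrix (hypothetical format).

    per_tool maps each tool name in TOOLS to a bool; accuracy is a
    fraction between 0 and 1.
    """
    ts = timestamp or datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    cells = [ts, model, preset, f"{accuracy:.1%}"]
    cells += ["pass" if per_tool[tool] else "fail" for tool in TOOLS]
    cells.append("pass" if parallel_ok else "fail")
    return "| " + " | ".join(cells) + " |"


if __name__ == "__main__":
    # Placeholder values only; these are not measured results.
    print(
        format_matrix_row(
            model="example-model",
            preset="example-preset",
            accuracy=1.0,
            per_tool={tool: True for tool in TOOLS},
            parallel_ok=True,
        )
    )
```

Keeping the column order in a single `TOOLS` list means the header and every generated row stay aligned if a tool is added later.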