benchmarks/tool-call-regression.md

| Timestamp | Model | Preset | Accuracy | read_file | web_search | terminal | execute_code | delegate_task | Parallel |
|-----------|-------|--------|----------|-----------|------------|----------|--------------|---------------|----------|

test: add tool call regression suite with CI gate (issue #96)

Adds comprehensive regression test suite for TurboQuant-compressed models to verify hermes tool calling functionality remains intact after quantization.

- New test: tests/tool_call_regression.py
  * Schema contract tests for 5 core tools (read_file, web_search, terminal, execute_code, delegate_task)
  * Parallel tool calling validation
  * Profile configuration validation (TurboQuant settings, server flags)
  * Live integration tests (skipped unless TURBOQUANT_SERVER_URL set)
  * Results matrix generator (benchmarks/tool-call-regression.md)
  * Enforces 95% accuracy threshold via pytest assertion
- New results matrix: benchmarks/tool-call-regression.md
  * Markdown table logging model/preset/accuracy/per-tool results
  * Auto-updates when tests run with --generate-matrix
- CI gate: .gitea/workflows/smoke.yml
  * Runs tool call regression suite on every push/PR
  * Live tests will fail pipeline if accuracy drops below 95%

Closes #96

2026-04-29 00:13:35 -04:00
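
The commit message above mentions a results matrix generator and a 95% accuracy gate without showing either. The following is a minimal sketch of how one row of this table might be rendered and how the threshold could be enforced. It is not the code in tests/tool_call_regression.py: the function names, the `pass`/`fail` cell values, the percentage formatting, and the sample model and preset strings are all assumptions made for illustration. Only the column order and the 95% figure come from this file.

```python
# Column order taken from the table header in this file.
TOOLS = ["read_file", "web_search", "terminal", "execute_code", "delegate_task"]

# The 95% gate named in the commit message.
ACCURACY_THRESHOLD = 0.95


def overall_accuracy(passed: int, total: int) -> float:
    """Fraction of tool calls that passed; 0.0 when nothing ran."""
    return passed / total if total else 0.0


def format_matrix_row(timestamp, model, preset, accuracy, per_tool, parallel_ok):
    """Render one Markdown row matching the ten header columns.

    per_tool maps each tool name to a bool. The "pass"/"fail" cell text
    is a hypothetical choice; the real generator may use another format.
    """
    def mark(ok):
        return "pass" if ok else "fail"

    cells = [timestamp, model, preset, f"{accuracy:.1%}"]
    cells += [mark(per_tool[name]) for name in TOOLS]
    cells.append(mark(parallel_ok))
    return "| " + " | ".join(cells) + " |"


def assert_accuracy_gate(accuracy, threshold=ACCURACY_THRESHOLD):
    """Raise AssertionError (which pytest reports as a failure) below the gate."""
    assert accuracy >= threshold, (
        f"tool call accuracy {accuracy:.1%} is below the {threshold:.0%} gate"
    )


if __name__ == "__main__":
    # Hypothetical sample values, not real benchmark results.
    accuracy = overall_accuracy(passed=19, total=20)
    row = format_matrix_row(
        "2026-04-29 00:13:35",
        "example-model",
        "example-preset",
        accuracy,
        {name: True for name in TOOLS},
        parallel_ok=True,
    )
    print(row)
    assert_accuracy_gate(accuracy)
```

A plain `assert` keeps the gate compatible with pytest's failure reporting, which matches the "via pytest assertion" wording in the commit message.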