# Tool Call Regression Results

**Generated:** 2026-04-16T01:56:48.462512+00:00
**Model:** dry-run
**Endpoint:** none
**KV Type:** none

## Summary

| Metric | Value |
|--------|-------|
| Total tests | 10 |
| Passed | 10 |
| Failed | 0 |
| Accuracy | 100.0% |
| Threshold | 100% |
| Verdict | PASS |

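The summary figures above follow from the pass counts. The sketch below shows one way they could be derived. It assumes accuracy is `passed / total` and that the verdict is PASS when accuracy meets or exceeds the threshold. The function name `summarize` and its parameters are hypothetical and not taken from the test harness.

```python
def summarize(passed: int, total: int, threshold_pct: float = 100.0) -> dict:
    """Derive summary metrics from pass counts.

    Assumed rule (not confirmed by the report): the verdict is PASS
    when accuracy, as a percentage, is at least the threshold.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    accuracy_pct = 100.0 * passed / total
    verdict = "PASS" if accuracy_pct >= threshold_pct else "FAIL"
    return {
        "total": total,
        "passed": passed,
        "failed": total - passed,
        "accuracy": f"{accuracy_pct:.1f}%",
        "threshold": f"{threshold_pct:g}%",
        "verdict": verdict,
    }


# Counts taken from the Summary table of this run.
summary = summarize(passed=10, total=10)
print(summary)
```

With a 100% threshold, a single failing test would be enough to flip the verdict under this assumed rule.
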
## Test Matrix

| Test ID | Tool Expected | Tool Called | Schema | Args | Latency | Status |
|---------|--------------|-------------|--------|------|---------|--------|
| read_file_basic | read_file | none | OK | OK | 0ms | PASS |
| read_file_offset | read_file | none | OK | OK | 0ms | PASS |
| web_search_basic | web_search | none | OK | OK | 0ms | PASS |
| terminal_basic | terminal | none | OK | OK | 0ms | PASS |
| terminal_complex | terminal | none | OK | OK | 0ms | PASS |
| code_exec_basic | execute_code | none | OK | OK | 0ms | PASS |
| code_exec_complex | execute_code | none | OK | OK | 0ms | PASS |
| delegate_basic | delegate_task | none | OK | OK | 0ms | PASS |
| delegate_context | delegate_task | none | OK | OK | 0ms | PASS |
| parallel_two | read_file | none | OK | OK | 0ms | PASS |