[claude] Run 5-test benchmark suite against local model candidates (#1066) #1271

Merged: claude merged 2 commits from claude/issue-1066 into main on 2026-03-24 01:39:01 +00:00



Alexander Whitestone

2899a24475

feat: run 5-test benchmark suite and record real results (#1066)

Tests / lint (pull_request): failing after 31s
Tests / test (pull_request): skipped

Run the benchmark suite against hermes3:8b, qwen3.5:latest, qwen2.5:14b,
and llama3.2:latest, substituted for the unavailable qwen3:14b, qwen3:8b, and dolphin3.

Results summary:
- qwen2.5:14b: 4/5 PASS, best performer (100% tool calling, PASS code gen, PASS shell, 100% coherence)
- hermes3:8b: 3/5 PASS (100% tool calling, PASS code gen, PASS shell)
- llama3.2:latest (3B): 3/5 PASS, fast at 45s (PASS code gen, PASS shell, 100% coherence)
- qwen3.5:latest: 1/5 PASS, slow at 310s (fails 4 of 5 tests)

Fixes #1066

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-03-23 21:38:25 -04:00
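The results summary above reports only a pass count per model and, for two models, a duration. A minimal sketch of how such results might be recorded and ranked follows; `BenchResult` and `rank` are hypothetical names (not part of this PR), and treating the 45s/310s figures as total suite wall time is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BenchResult:
    """One model's outcome on the 5-test suite (hypothetical record type)."""
    model: str
    passed: int
    total: int = 5
    # Reported duration, assumed to be total suite wall time; None where
    # the summary gives no figure.
    seconds: Optional[float] = None

    def line(self) -> str:
        """Format a summary line such as 'qwen2.5:14b: 4/5 PASS'."""
        text = f"{self.model}: {self.passed}/{self.total} PASS"
        if self.seconds is not None:
            text += f" ({self.seconds:.0f}s)"
        return text


def rank(results: List[BenchResult]) -> List[BenchResult]:
    # Highest pass count first. sorted() is stable even with reverse=True,
    # so models with equal scores keep their input order (no tie-breaker).
    return sorted(results, key=lambda r: r.passed, reverse=True)


# Pass counts and durations taken from the results summary, in run order.
RESULTS = [
    BenchResult("hermes3:8b", 3),
    BenchResult("qwen3.5:latest", 1, seconds=310),
    BenchResult("qwen2.5:14b", 4),
    BenchResult("llama3.2:latest", 3, seconds=45),
]

if __name__ == "__main__":
    for r in rank(RESULTS):
        print(r.line())
```

Ranking by pass count alone leaves hermes3:8b ahead of llama3.2:latest only because of input order; a real tie-breaker such as duration would need timings for every model, which the summary does not provide.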

Alexander Whitestone

32a8f90933

WIP: Claude Code progress on #1066

Automated salvage commit: the agent session ended with exit code 124 (the status GNU timeout returns when its time limit expires).
Work in progress; may need continuation.

2026-03-23 14:37:10 -04:00
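The second commit was created automatically after the agent session exited with status 124. A rough sketch of how such a salvage step might work follows, assuming the session runs under GNU `timeout` (which exits with 124 when the time limit expires). `run_agent` and `salvage_message` are hypothetical helpers; the project's actual salvage tooling is not shown in this PR.

```python
import subprocess
from typing import List, Optional

# GNU coreutils `timeout` exits with status 124 when the command it wraps
# is stopped for exceeding its time limit.
TIMEOUT_EXIT = 124


def run_agent(cmd: List[str], limit_s: int) -> int:
    """Run a (hypothetical) agent command under `timeout`; return its exit code."""
    return subprocess.run(["timeout", str(limit_s), *cmd]).returncode


def salvage_message(issue: int, exit_code: int) -> Optional[str]:
    """Build a WIP commit message when the session did not exit cleanly."""
    if exit_code == 0:
        return None  # clean exit: commit normally, no salvage needed
    reason = "timed out" if exit_code == TIMEOUT_EXIT else "failed"
    return (
        f"WIP: Claude Code progress on #{issue}\n\n"
        f"Automated salvage commit: agent session {reason} (exit {exit_code}).\n"
        "Work in progress, may need continuation."
    )
```

Distinguishing 124 from other non-zero codes lets a reviewer tell a session that simply ran out of time apart from one that crashed.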
