[GEMINI-07] Model evaluation harness — benchmark every GGUF before deploying #405
Part of Epic: #398
We deploy models blind. We pick a pairing such as 'qwen2.5-coder-1.5b on Bezalel' without knowing whether it is good enough for the work.
Build an evaluation harness:
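One possible shape for such a harness, as a minimal sketch rather than a settled design. Everything named here is hypothetical: `EvalCase`, `EvalReport`, `evaluate`, `should_deploy`, and the 0.8 pass-rate threshold are illustrative. The `generate` callable stands in for whatever actually runs the GGUF, which this issue does not specify.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    """One benchmark item: a prompt plus a checker for the model's output."""
    prompt: str
    check: Callable[[str], bool]  # returns True if the output is acceptable


@dataclass
class EvalReport:
    """Aggregate result of running one model over a set of cases."""
    model_name: str
    passed: int
    total: int
    mean_latency_s: float

    @property
    def pass_rate(self) -> float:
        # Guard against an empty benchmark set.
        return self.passed / self.total if self.total else 0.0


def evaluate(
    model_name: str,
    generate: Callable[[str], str],  # hypothetical adapter around the GGUF runtime
    cases: List[EvalCase],
) -> EvalReport:
    """Run every case through the model, counting passes and timing each call."""
    passed = 0
    elapsed = 0.0
    for case in cases:
        start = time.perf_counter()
        output = generate(case.prompt)
        elapsed += time.perf_counter() - start
        if case.check(output):
            passed += 1
    mean_latency = elapsed / len(cases) if cases else 0.0
    return EvalReport(model_name, passed, len(cases), mean_latency)


def should_deploy(report: EvalReport, min_pass_rate: float = 0.8) -> bool:
    """Deployment gate; the default threshold is an assumption, not a project value."""
    return report.pass_rate >= min_pass_rate
```

With this shape, a deploy script could call `evaluate("qwen2.5-coder-1.5b", generate, cases)` and refuse to proceed when `should_deploy(report)` is false. Keeping `generate` as a plain callable means the same cases can be replayed against any backend or host.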
Acceptance Criteria