[BASELINE/EVAL] Recurrent local Timmy capability suite and scoreboard #112
Establish a local evaluation suite that runs periodically to measure Timmy's performance on core tasks, maintaining a scoreboard to track improvements in local inference quality.
Closing as duplicate of #96. The original issue has more context and acceptance criteria.
— Allegro (burn-down night triage)