Build Wolf v1.0 — Production multi-model evaluation engine #1
This is the Wolf evaluation system: AI models do the work, PRs prove it, CI judges it.
Context
Wolf is a Python evaluation framework that runs coding tasks through multiple AI models (OpenRouter, Groq, Nous, MiniMax, direct APIs), commits their output to Gitea feature branches, opens PRs, scores the results, and ranks models for serverless endpoint deployment readiness.
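As a rough illustration of the flow described above (run every task through every model, score the output, rank the models), here is a minimal sketch. All names in it (`Result`, `run_all`, `leaderboard`, the `ModelFn` and `ScoreFn` aliases) are hypothetical and not taken from the existing scaffold, and the Gitea branch and PR steps are only marked by a comment rather than implemented.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a "model" is any callable mapping a task prompt to output text,
# and a scorer maps (task prompt, model output) to a numeric score.
ModelFn = Callable[[str], str]
ScoreFn = Callable[[str, str], float]


@dataclass(frozen=True)
class Result:
    model: str
    task: str
    score: float


def run_all(tasks: List[str], models: Dict[str, ModelFn], score: ScoreFn) -> List[Result]:
    """Run every task through every model and score each output."""
    results: List[Result] = []
    for name, call in models.items():
        for task in tasks:
            output = call(task)
            # In Wolf, this is where the output would be committed to a
            # feature branch and a PR opened (not implemented in this sketch).
            results.append(Result(model=name, task=task, score=score(task, output)))
    return results


def leaderboard(results: List[Result]) -> List[Tuple[str, float]]:
    """Rank models by mean score, best first; ties are broken by model name."""
    by_model: Dict[str, List[float]] = {}
    for r in results:
        by_model.setdefault(r.model, []).append(r.score)
    ranked = [(model, mean(scores)) for model, scores in by_model.items()]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))
```

Keeping models and scorers as plain callables is one possible design choice: it lets provider-specific clients and test stubs share the same runner without the runner knowing which is which.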
The scaffold exists at main with stub modules. Build the production version.
Current Scaffold
wolf/ package with stub modules (see existing files)
wolf-config.yaml.example (example config)
tests/ (test directory exists)
What to Build (Production Quality)
1. Full Gitea API Client (wolf/gitea.py)
2. Provider Router & Model Abstractions (wolf/models.py)
3. Task System (wolf/task.py)
4. Execution Engine (wolf/runner.py)
5. Scoring System (wolf/evaluator.py)
6. CLI (wolf/cli.py)
wolf run --task <type> --models <list>
wolf evaluate --all
wolf leaderboard [--json]
wolf ready (serverless-ready candidates)
wolf cron --schedule "0 */4 * * *"
7. Tests
pytest tests/ must pass on every PR
Requirements
Acceptance Criteria
python -m wolf evaluate --all runs end-to-end
python -m wolf leaderboard shows ranked models
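The CLI surface listed for wolf/cli.py could be wired up with `argparse` roughly as follows. This is a sketch under assumptions, not the scaffold's actual code: the function names `build_parser` and `main` are hypothetical, `--models` is assumed to be a comma-separated string, and command dispatch is left as a stub that only echoes the parsed arguments.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser covering the subcommands named in the issue."""
    parser = argparse.ArgumentParser(prog="wolf")
    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser("run", help="run a task type against a list of models")
    run.add_argument("--task", required=True)
    # Assumption: models arrive as one comma-separated string, e.g. "a,b".
    run.add_argument(
        "--models",
        required=True,
        type=lambda s: [m for m in s.split(",") if m],
    )

    evaluate = sub.add_parser("evaluate", help="score results")
    evaluate.add_argument("--all", action="store_true")

    board = sub.add_parser("leaderboard", help="show ranked models")
    board.add_argument("--json", action="store_true")

    sub.add_parser("ready", help="list serverless-ready candidates")

    cron = sub.add_parser("cron", help="run on a schedule")
    cron.add_argument("--schedule", required=True)
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    # Stub dispatch: a real implementation would call into the runner/evaluator here.
    print(f"would run: {args.command} {vars(args)}")
    return 0
```

For `python -m wolf ...` to work, the package also needs a `wolf/__main__.py` that invokes the entry point, for example by calling a function like `main()` above.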