# 🐺 Wolf: Model Evaluation Backbone
Wolf is a multi-model evaluation system for sovereign AI fleets. It automates the process of assigning coding tasks to different AI models, executing them, and scoring their performance based on real-world metrics like CI status, code quality, and PR effectiveness.
## 🚀 Key Features
- **Task Generator**: Automatically creates tasks from Gitea issues or task specifications.
- **Agent Runner**: Executes tasks using various model providers (OpenRouter, Groq, Ollama, OpenAI, Anthropic).
- **PR Creator**: Opens Pull Requests on Gitea with the agent's work.
- **Evaluators**: Scores PRs on multiple dimensions:
  - CI test results.
  - Commit message quality.
  - Code quality vs. boilerplate.
  - Functionality and test inclusion.
  - PR description clarity.
- **Leaderboard**: Tracks model performance over time and identifies "serverless-ready" models.
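As a rough sketch of how per-dimension scores could roll up into a single leaderboard number, the snippet below combines the evaluator dimensions listed above with a weighted average. The dimension names and weights here are illustrative placeholders, not Wolf's actual scoring schema:

```python
# Hypothetical weighting of Wolf-style evaluator dimensions.
# Names and weights are illustrative, not Wolf's real schema.
WEIGHTS = {
    "ci": 0.4,              # CI test results
    "commit_quality": 0.15, # commit message quality
    "code_quality": 0.2,    # code quality vs. boilerplate
    "functionality": 0.15,  # functionality and test inclusion
    "pr_description": 0.1,  # PR description clarity
}

def combined_score(dimension_scores: dict) -> float:
    """Weighted average of per-dimension scores, each in [0, 1]."""
    total = sum(WEIGHTS[d] * dimension_scores.get(d, 0.0) for d in WEIGHTS)
    return round(total, 3)

print(combined_score({
    "ci": 1.0,
    "commit_quality": 0.8,
    "code_quality": 0.9,
    "functionality": 0.7,
    "pr_description": 1.0,
}))
# → 0.905
```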
## 📦 Installation
Wolf is designed to be zero-dependency where possible, requiring only the Python standard library plus `requests` and `pyyaml`.
```bash
pip install requests pyyaml
```
## 🛠️ Configuration
Save your configuration to `~/.hermes/wolf-config.yaml`. See `wolf-config.yaml.example` for a template.
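A config file along these lines would cover the pieces described above (Gitea access, model providers, and models to evaluate). The keys below are an illustrative guess at the shape; `wolf-config.yaml.example` remains the authoritative template:

```yaml
# Illustrative shape only — see wolf-config.yaml.example for the real keys.
gitea:
  url: https://git.example.com   # hypothetical Gitea instance
  token: ${GITEA_TOKEN}
providers:
  openrouter:
    api_key: ${OPENROUTER_API_KEY}
  ollama:
    host: http://localhost:11434
models:
  - openrouter/some-coding-model  # placeholder model id
```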
## 🤖 Usage
### Run Pending Tasks
```bash
python3 -m wolf.cli --run
```
### Evaluate Open PRs
```bash
python3 -m wolf.cli --evaluate
```
### Show Leaderboard
```bash
python3 -m wolf.cli --leaderboard
```
### Run as a Cron Job
Add the following to your crontab to run Wolf every hour:
```bash
0 * * * * cd /path/to/wolf && python3 -m wolf.cli --run --evaluate >> ~/.hermes/wolf/cron.log 2>&1
```
## 🧪 Testing
Run the test suite locally:
```bash
python3 -m unittest discover tests
```
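Tests live in `tests/` and follow the standard `unittest` layout that `discover` picks up (files named `test_*.py`). A minimal test in that style might look like the following; the `normalize_score` helper is a stand-in for illustration, not Wolf's real API:

```python
import unittest

def normalize_score(raw: float) -> float:
    """Stand-in helper: clamp a raw score into [0, 1]."""
    return max(0.0, min(1.0, raw))

class TestNormalizeScore(unittest.TestCase):
    def test_clamps_to_range(self):
        self.assertEqual(normalize_score(1.5), 1.0)
        self.assertEqual(normalize_score(-0.2), 0.0)
        self.assertEqual(normalize_score(0.7), 0.7)

if __name__ == "__main__":
    unittest.main()
```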
---
**"The strength of the pack is the wolf, and the strength of the wolf is the pack."**
*— The Wolf Sovereign Core has spoken.*