# 🐺 Wolf: Model Evaluation Backbone

Wolf is a multi-model evaluation system for sovereign AI fleets. It automates assigning coding tasks to different AI models, executing them, and scoring the results against real-world signals such as CI status, code quality, and pull request (PR) effectiveness.

## 🚀 Key Features

- **Task Generator**: Automatically creates tasks from Gitea issues or task specifications.
- **Agent Runner**: Executes tasks using various model providers (OpenRouter, Groq, Ollama, OpenAI, Anthropic).
- **PR Creator**: Opens pull requests on Gitea containing the agent's work.
- **Evaluators**: Scores PRs on multiple dimensions:
  - CI test results.
  - Commit message quality.
  - Code quality vs. boilerplate.
  - Functionality and test inclusion.
  - PR description clarity.
- **Leaderboard**: Tracks model performance over time and identifies "serverless-ready" models.
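The evaluator dimensions above each produce a separate score, and one simple way to combine them is a weighted mean. The sketch below is an illustration under stated assumptions, not Wolf's actual scoring: the dimension keys, the weights, and the `[0, 1]` score range are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical weights: keys mirror the Evaluators list above, but the
# values are illustrative and do not come from Wolf's configuration.
WEIGHTS: dict[str, float] = {
    "ci": 0.30,
    "commit_message": 0.10,
    "code_quality": 0.25,
    "functionality": 0.25,
    "pr_description": 0.10,
}


@dataclass
class PRScores:
    """Per-dimension scores for one PR, each assumed to lie in [0.0, 1.0]."""
    model: str
    scores: dict[str, float]


def overall_score(pr: PRScores, weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted mean over all weighted dimensions; an unscored dimension counts as 0.0."""
    for dim, value in pr.scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{dim} score {value} is outside [0, 1]")
    total_weight = sum(weights.values())
    weighted = sum(w * pr.scores.get(dim, 0.0) for dim, w in weights.items())
    return weighted / total_weight


if __name__ == "__main__":
    pr = PRScores("model-a", {"ci": 1.0, "commit_message": 0.5, "code_quality": 0.8,
                              "functionality": 1.0, "pr_description": 0.6})
    print(round(overall_score(pr), 3))  # → 0.86
```

Counting a missing dimension as zero (rather than skipping it) means a PR cannot score well simply by lacking, say, a CI run.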

## 📦 Installation

Wolf is designed to keep dependencies minimal: beyond the Python standard library, it requires only `requests` and `PyYAML` (used to read the YAML configuration file).

```bash
pip install requests pyyaml
```

## 🛠️ Configuration

Save your configuration to `~/.hermes/wolf-config.yaml`. See `wolf-config.yaml.example` for a template.

## 🤖 Usage

### Run Pending Tasks
```bash
python3 -m wolf.cli --run
```

### Evaluate Open PRs
```bash
python3 -m wolf.cli --evaluate
```
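Evaluation scores each open PR on the dimensions listed under Evaluators. As an illustration of what a single dimension might look like, here is a hypothetical commit-message checker built from common Git conventions (a subject of at most 72 characters, no trailing period, a blank line before any body). The function name, the checks, and the "at least three words" rule are assumptions, not Wolf's actual evaluator.

```python
def commit_message_score(message: str) -> float:
    """Return the fraction of simple commit-message checks that pass (0.0 to 1.0).

    Hypothetical heuristics based on common Git conventions; not Wolf's evaluator.
    """
    lines = message.strip().splitlines()
    if not lines:
        return 0.0  # an empty message fails everything
    subject = lines[0]
    checks = [
        len(subject) <= 72,                 # subject fits the conventional 72-column limit
        not subject.endswith("."),          # no trailing period on the subject line
        len(lines) == 1 or lines[1] == "",  # a body, if present, follows a blank line
        len(subject.split()) >= 3,          # rejects bare placeholders like "wip" or "fix"
    ]
    return sum(checks) / len(checks)


if __name__ == "__main__":
    print(commit_message_score("Add leaderboard export to CLI"))  # → 1.0
    print(commit_message_score("wip."))                           # → 0.5
```

Returning a fraction rather than pass/fail keeps this dimension on the same `[0, 1]` scale as the others, so it can be combined with them directly.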

### Show Leaderboard
```bash
python3 -m wolf.cli --leaderboard
```
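A leaderboard can be derived from per-PR scores by grouping them by model and ranking on the mean. The sketch below assumes each evaluated PR has already been reduced to a single `(model_name, score)` pair; the tie-breaking rules (more evaluated PRs first, then model name) are an illustrative choice, not documented Wolf behavior.

```python
from collections import defaultdict
from statistics import mean


def leaderboard(results: list[tuple[str, float]]) -> list[tuple[str, float, int]]:
    """Rank models by mean PR score.

    `results` holds one (model_name, score) pair per evaluated PR.
    Returns (model_name, mean_score, pr_count) tuples, best first; ties are
    broken by PR count (more PRs first), then alphabetically by model name.
    """
    by_model: dict[str, list[float]] = defaultdict(list)
    for model, score in results:
        by_model[model].append(score)
    rows = [(model, mean(scores), len(scores)) for model, scores in by_model.items()]
    rows.sort(key=lambda row: (-row[1], -row[2], row[0]))
    return rows


if __name__ == "__main__":
    results = [("model-a", 0.9), ("model-b", 0.7), ("model-a", 0.7),
               ("model-b", 0.9), ("model-c", 0.95)]
    print([name for name, _, _ in leaderboard(results)])
    # → ['model-c', 'model-a', 'model-b']
```

Keeping `pr_count` in each row makes it visible when a model tops the board on very few evaluations.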

### Run as a Cron Job
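Add the following line to your crontab (`crontab -e`) to run Wolf at the top of every hour. The log directory `~/.hermes/wolf/` must already exist, or the output redirection will fail: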
```bash
0 * * * * cd /path/to/wolf && python3 -m wolf.cli --run --evaluate >> ~/.hermes/wolf/cron.log 2>&1
```

## 🧪 Testing

From the repository root, run the test suite:
```bash
python3 -m unittest discover tests
```

---

**"The strength of the pack is the wolf, and the strength of the wolf is the pack."**
*— The Wolf Sovereign Core has spoken.*