# 🐺 Wolf: Model Evaluation Backbone
Wolf is a multi-model evaluation system for sovereign AI fleets. It automates the process of assigning coding tasks to different AI models, executing them, and scoring their performance based on real-world metrics like CI status, code quality, and PR effectiveness.
## 🚀 Key Features
- **Task Generator**: Automatically creates tasks from Gitea issues or task specifications.
- **Agent Runner**: Executes tasks using various model providers (OpenRouter, Groq, Ollama, OpenAI, Anthropic).
- **PR Creator**: Opens Pull Requests on Gitea with the agent's work.
- **Evaluators**: Scores PRs on multiple dimensions:
  - CI test results.
  - Commit message quality.
  - Code quality vs. boilerplate.
  - Functionality and test inclusion.
  - PR description clarity.
- **Leaderboard**: Tracks model performance over time and identifies "serverless-ready" models.
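As a rough sketch of how per-dimension scores could roll up into a single leaderboard number, the snippet below combines the evaluator dimensions listed above with a weighted average. The dimension names and weights here are illustrative placeholders, not Wolf's actual scoring schema:

```python
# Hypothetical weighting of Wolf-style evaluator dimensions.
# Names and weights are illustrative, not Wolf's real schema.
WEIGHTS = {
    "ci": 0.4,              # CI test results
    "commit_quality": 0.15, # commit message quality
    "code_quality": 0.2,    # code quality vs. boilerplate
    "functionality": 0.15,  # functionality and test inclusion
    "pr_description": 0.1,  # PR description clarity
}

def combined_score(dimension_scores: dict) -> float:
    """Weighted average of per-dimension scores, each in [0, 1]."""
    total = sum(WEIGHTS[d] * dimension_scores.get(d, 0.0) for d in WEIGHTS)
    return round(total, 3)

print(combined_score({
    "ci": 1.0,
    "commit_quality": 0.8,
    "code_quality": 0.9,
    "functionality": 0.7,
    "pr_description": 1.0,
}))
# → 0.905
```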
## 📦 Installation
Wolf is designed to be zero-dependency where possible, requiring only the Python standard library plus `requests` and `pyyaml`.
```bash
pip install requests pyyaml
```
## 🛠️ Configuration
Save your configuration to `~/.hermes/wolf-config.yaml`. See `wolf-config.yaml.example` for a template.
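A config file along these lines would cover the pieces described above (Gitea access, model providers, and models to evaluate). The keys below are an illustrative guess at the shape; `wolf-config.yaml.example` remains the authoritative template:

```yaml
# Illustrative shape only — see wolf-config.yaml.example for the real keys.
gitea:
  url: https://git.example.com   # hypothetical Gitea instance
  token: ${GITEA_TOKEN}
providers:
  openrouter:
    api_key: ${OPENROUTER_API_KEY}
  ollama:
    host: http://localhost:11434
models:
  - openrouter/some-coding-model  # placeholder model id
```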
## 🤖 Usage
### Run Pending Tasks
```bash
python3 -m wolf.cli --run
```
### Evaluate Open PRs
```bash
python3 -m wolf.cli --evaluate
```
### Show Leaderboard
```bash
python3 -m wolf.cli --leaderboard
```
### Run as a Cron Job
Add the following to your crontab to run Wolf every hour:
```bash
0 * * * * cd /path/to/wolf && python3 -m wolf.cli --run --evaluate >> ~/.hermes/wolf/cron.log 2>&1
```
## 🧪 Testing
Run the test suite locally:
```bash
python3 -m unittest discover tests
```
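Tests live in `tests/` and follow the standard `unittest` layout that `discover` picks up (files named `test_*.py`). A minimal test in that style might look like the following; the `normalize_score` helper is a stand-in for illustration, not Wolf's real API:

```python
import unittest

def normalize_score(raw: float) -> float:
    """Stand-in helper: clamp a raw score into [0, 1]."""
    return max(0.0, min(1.0, raw))

class TestNormalizeScore(unittest.TestCase):
    def test_clamps_to_range(self):
        self.assertEqual(normalize_score(1.5), 1.0)
        self.assertEqual(normalize_score(-0.2), 0.0)
        self.assertEqual(normalize_score(0.7), 0.7)

if __name__ == "__main__":
    unittest.main()
```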
---
**"The strength of the pack is the wolf, and the strength of the wolf is the pack."**
*— The Wolf Sovereign Core has spoken.*