Wolf is a sovereign multi-model evaluation engine with two real operating modes:
**Wolf** is a multi-model evaluation engine for sovereign AI fleets. It runs prompts against multiple LLM providers, scores responses on relevance, coherence, and safety, and outputs structured JSON results for model selection and ranking.
1. Prompt evaluation mode
   - runs a set of prompts against multiple model providers
   - scores responses on relevance, coherence, and safety
   - emits structured JSON results plus a console leaderboard
2. Legacy task / PR mode
   - fetches Gitea issues
   - assigns them to configured models/providers
   - generates output files and opens PRs
   - records task scores in a leaderboard
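As a rough illustration of the prompt evaluation mode, the sketch below combines the three score dimensions into one ranking and prints it as JSON. The unweighted mean, the `rank` helper, the `overall` field, and the sample model names are all assumptions made for this example; the actual weighting and output schema in `wolf/evaluator.py` may differ.

```python
import json


def rank(results):
    """Sort per-model results by the mean of the three dimension scores.

    Assumption: an unweighted mean; Wolf's real aggregation is not shown here.
    """
    ranked = []
    for model, scores in results.items():
        overall = (scores["relevance"] + scores["coherence"] + scores["safety"]) / 3
        ranked.append({"model": model, **scores, "overall": round(overall, 3)})
    # Highest overall score first, as a leaderboard would list it.
    ranked.sort(key=lambda row: row["overall"], reverse=True)
    return ranked


# Hypothetical scores for two placeholder models.
sample = {
    "model-b": {"relevance": 0.6, "coherence": 0.7, "safety": 0.8},
    "model-a": {"relevance": 0.9, "coherence": 0.8, "safety": 1.0},
}
print(json.dumps(rank(sample), indent=2))
```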
**Core principle:** agents work, PRs prove it, CI judges it.
Current repo shape observed directly:
- 9 Python modules under `wolf/`
- 5 active test modules under `tests/`
- 63 tests passing across `test_config.py`, `test_evaluator.py`, `test_gitea.py`, `test_models.py`, `test_runner.py`
- two smoke workflows: `.gitea/workflows/smoke.yml` and `.github/workflows/smoke-test.yml`
- a checked-in `GENOME.md` at repo root
**Status:** v1.0.0 — production-ready for prompt evaluation. Legacy PR evaluation module retained for backward compatibility.
- `wolf/config.py` imports `yaml` when available and falls back to a simple parser if PyYAML is absent
- CI installs `pyyaml`
- `requirements.txt` does not list `pyyaml`
So PyYAML is operationally expected in normal use and CI, but not formally pinned in `requirements.txt`.
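The optional-dependency pattern described above can be sketched as follows. This is a minimal illustration, not the code in `wolf/config.py`: `parse_simple` and `load_config` are hypothetical names, and the fallback here only understands flat `key: value` lines.

```python
try:
    import yaml  # PyYAML, optional
except ImportError:
    yaml = None


def parse_simple(text):
    """Fallback parser: flat `key: value` lines; blanks and comments are skipped."""
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, so values such as URLs stay intact.
        key, _, value = line.partition(":")
        result[key.strip()] = value.strip().strip("'\"")
    return result


def load_config(text):
    """Use PyYAML when it is installed, otherwise the line parser."""
    if yaml is not None:
        return yaml.safe_load(text) or {}
    return parse_simple(text)
```

A fallback of this kind cannot handle nested mappings or lists, which is one reason to list `pyyaml` explicitly in `requirements.txt` if configs rely on them.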
| Dependency | Used By | Purpose |
|------------|---------|---------|
| `requests` | models.py, gitea.py | HTTP client for all API calls |
| `pyyaml` (optional) | config.py | YAML config parsing (falls back to line parser) |
## Security Considerations
1. Plaintext secrets in config
   - model API keys and Gitea tokens are expected via config files
   - this is user-controlled but still a secret-handling risk
2. Arbitrary base URLs
   - provider configs can point to arbitrary endpoints
   - useful for sovereignty, but also expands trust boundaries
3. PR automation blast radius
   - `AgentRunner.execute_task()` can create branches, files, and PRs
   - bad prompts or weak issue filtering could create noisy or unsafe PRs
4. Prompt-injection exposure
   - model prompts and issue bodies are passed through with limited sanitization
5. Leaderboard persistence without locking
   - `leaderboard.json` writes are not protected against concurrent writers
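One possible mitigation for the last item is to serialize writers with an advisory lock and replace the file atomically. The sketch below assumes a POSIX host (it uses `fcntl`) and a hypothetical leaderboard layout of `{model: [scores]}`; `update_leaderboard` is an illustrative name, not a function in `wolf/leaderboard.py`.

```python
import fcntl
import json
import os
import tempfile


def update_leaderboard(path, model, score):
    """Read-modify-write `path` under an exclusive lock, then replace it atomically."""
    lock_path = path + ".lock"  # sidecar lock file (assumption for this sketch)
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # blocks until other writers release
        try:
            try:
                with open(path) as fh:
                    board = json.load(fh)
            except FileNotFoundError:
                board = {}
            board.setdefault(model, []).append(score)
            # Write to a temp file in the same directory, then rename over the
            # target so readers never observe a half-written file.
            directory = os.path.dirname(os.path.abspath(path))
            fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
            with os.fdopen(fd, "w") as out:
                json.dump(board, out, indent=2)
            os.replace(tmp, path)
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
    return board
```

The lock only protects against writers that also take it, so every code path that touches the file would need to go through the same helper.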
## Repository Notes
Notable current-repo facts that the host-repo genome should preserve:
- Wolf already ships its own `GENOME.md` at repo root
- the timmy-home deliverable for issue #683 is therefore a host-repo genome artifact that mirrors and tracks the current wolf repo; it is not the first genome written for wolf
- current smoke workflows exist in both `.gitea/` and `.github/`
1. **API keys in config**: `wolf-config.yaml` stores provider API keys in plaintext. The file should be `chmod 600` and excluded from git (already in .gitignore pattern via ~/.hermes/).
2. **Gitea token**: A full-access token is used for branch creation, file commits, and PR creation. Scoped access is recommended.
3. **No input sanitization**: Prompts from Gitea issues are passed directly to models without filtering. This is a prompt-injection risk for automated workflows.
4. **No rate limiting**: Model API calls are sequential with no backoff or rate limiting, which could exhaust API quotas.
5. **Legacy code reference**: `evaluator.py` defines the alias `Evaluator = PREvaluator`, and `cli.py` imports `Evaluator` expecting the legacy class. This works but is confusing.
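For the rate-limiting gap in item 4, a simple exponential backoff wrapper is one option. This is a generic sketch, not part of Wolf: `call_with_backoff` and its parameters are hypothetical, and a real implementation would likely catch only retryable errors (for example HTTP 429 or timeouts) rather than every exception.

```python
import time


def call_with_backoff(fn, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt seconds and retry.

    The final failure is re-raised so callers still see the error.
    """
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

The `sleep` parameter is injectable so that tests can record the delays instead of actually waiting.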
## File Index
Observed module sizes:
- `wolf/evaluator.py`: 465 lines
- `wolf/runner.py`: 311 lines
- `wolf/models.py`: 120 lines
- `wolf/gitea.py`: 95 lines
- `wolf/cli.py`: 94 lines
- `wolf/leaderboard.py`: 77 lines
- `wolf/task.py`: 63 lines
- `wolf/config.py`: 51 lines
- `wolf/__init__.py`: 12 lines
| File | LOC | Purpose |
|------|-----|---------|
| `wolf/__init__.py` | 12 | Package init, version |
`timmy-config` is a sidecar overlay repository for the Timmy ecosystem. It is **not** a Hermes-agent fork. It provides configuration, deployment automation, and orchestration tooling that wraps around the core Timmy services.
The repo ships its own `GENOME.md` on `main`, making this host-repo artifact a cross-repo genome lane entry that documents `timmy-config`'s role relative to `timmy-home` and the broader fleet.
Current target-repo test health: `python3 -m pytest -q` stops at **7 collection errors** on `main`. This is documented and tracked in upstream issue timmy-config #823.
## Architecture
```mermaid
graph TD
DEPLOY[deploy.sh] --> PLAY[playbooks/]
DEPLOY --> BIN[bin/]
CONFIG[config.yaml] --> ORCH[orchestration.py]
CONFIG --> GITEA[gitea_client.py]
ORCH --> TASKS[tasks.py]
GITEA --> API[Gitea API]
TASKS --> TRAINING[training/]
DOCS[README.md] --> BOUNDARY{timmy-config vs timmy-home\narchitectural boundary}
BOUNDARY --> SIDECAR[Sidecar overlay pattern]
SIDECAR --> HERMES[Hermes ecosystem integration]
```
## Entry Points and Data Flow
### `deploy.sh`
Primary deployment entry point. Orchestrates the rollout of configuration and sidecar services.
### `config.yaml`
Central configuration surface. Feeds into orchestration and task scheduling.
### `gitea_client.py`
Gitea API client. Handles communication with the Forge for issue and PR operations.
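To make the client's role concrete, the sketch below builds (without sending) a request for a repository's open issues using Gitea's `/api/v1/repos/{owner}/{repo}/issues` endpoint and token authorization header. The helper name `build_issues_request` and the example host are assumptions for illustration; the actual interface of `gitea_client.py` is not shown in this document.

```python
import urllib.parse
import urllib.request


def build_issues_request(base_url, owner, repo, token, state="open"):
    """Build a GET request for a repo's issues; the caller decides when to send it."""
    query = urllib.parse.urlencode({"state": state})
    url = f"{base_url.rstrip('/')}/api/v1/repos/{owner}/{repo}/issues?{query}"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"token {token}",  # Gitea access-token scheme
            "Accept": "application/json",
        },
    )


# Hypothetical host, owner, repo, and token, shown for shape only.
req = build_issues_request("https://forge.example/", "owner", "repo", "secret")
print(req.get_method(), req.full_url)
```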
### `orchestration.py`
Orchestration engine. Coordinates task execution and deployment workflows.
### `tasks.py`
Task definitions. Contains the concrete work units dispatched by the orchestrator.
## Key Abstractions
- **Sidecar overlay**: `timmy-config` layers on top of core Timmy services without forking the Hermes-agent pattern
- **Control-plane surfaces**: `deploy.sh`, `config.yaml`, `gitea_client.py`, `orchestration.py`, `tasks.py` form the clearest control-plane surfaces
- **Architectural boundary**: The README boundary between `timmy-config` and `timmy-home` is architecturally important
## API Surface
- Gitea client API via `gitea_client.py`
- Task scheduling via `tasks.py`
- Deployment automation via `deploy.sh` and playbooks
## Test Coverage Gaps
- **7 collection errors** on `main` prevent pytest from running any tests
- Related genome: target repo ships its own `GENOME.md` on main