Compare commits

1 Commits

fix/528...sprint/iss

| Author | SHA1 | Date |
|---|---|---|
| | 15f23392dd | |
@@ -1,74 +0,0 @@
# LAB-003 — Truck Battery Disconnect Install Packet

No battery disconnect switch has been purchased or installed yet.
This packet turns the issue into a field-ready purchase / install / validation checklist while preserving what still requires live work.

## Candidate Store Run

- AutoZone — Newport or Claremont
- Advance Auto Parts — Newport or Claremont
- O'Reilly Auto Parts — Newport or Claremont

## Required Items

- battery terminal disconnect switch
- terminal shim/post riser if needed

## Selection Criteria

- Fits the truck battery post without forcing the clamp
- Mounts on the negative battery terminal
- Physically secure once tightened
- no special tools required to operate

## Live Purchase State

- Store selected: pending
- Part selected: pending
- Part cost: pending purchase

## Installation Target

- Install location: negative battery terminal
- Ready to operate without tools: yes

## Install Checklist

- [ ] Verify the truck is off and keys are removed before touching the battery
- [ ] Confirm the disconnect fits the negative battery terminal before final tightening
- [ ] Install the disconnect on the negative battery terminal
- [ ] Tighten until physically secure with no terminal wobble
- [ ] Verify the disconnect can be opened and closed by hand

## Validation Checklist

- [ ] Leave the truck parked with the disconnect opened for at least 24 hours
- [ ] Reconnect the switch by hand the next day
- [ ] Truck starts reliably after sitting 24+ hours with switch disconnected
- [ ] Receipt or photo of installed switch uploaded to this issue

## Overnight Verification Log

- Install completed: False
- Physically secure: False
- Overnight disconnect duration: pending
- Truck started after disconnect: pending
- Receipt / photo path: pending

## Battery Replacement Fallback

If the truck still fails the overnight test after the disconnect install, replace battery and re-run the 24-hour validation.

## Missing Live Fields

- store_selected
- part_name
- install_completed
- physically_secure
- overnight_test_hours
- truck_started_after_disconnect
- receipt_or_photo_path

## Honest next step

Buy the disconnect switch, install it on the negative battery terminal, leave the truck disconnected for 24+ hours, and only close the issue after receipt/photo evidence and the overnight start result are attached.
@@ -1,263 +1,320 @@
# GENOME.md — Wolf (Timmy_Foundation/wolf)
# GENOME.md — wolf

> Codebase Genome v1.0 | Generated 2026-04-14 | Repo 16/16
*Generated: 2026-04-14T19:10:00Z | Branch: main | Commit: 02767d8*

## Project Overview

**Wolf** is a multi-model evaluation engine for sovereign AI fleets. It runs prompts against multiple LLM providers, scores responses on relevance, coherence, and safety, and outputs structured JSON results for model selection and ranking.
**Wolf** is a sovereign multi-model evaluation engine. It runs prompts against multiple LLM providers (OpenAI, Anthropic, Groq, Ollama, OpenRouter), scores responses on relevance, coherence, and safety, and outputs structured JSON results for model selection and fleet deployment decisions.

**Core principle:** agents work, PRs prove it, CI judges it.
**Two operational modes:**
1. **Prompt Evaluation (v1.0)** — Standalone prompt-vs-model benchmarking via `python -m wolf.runner`
2. **Legacy PR Scoring** — Gitea PR evaluation pipeline via `wolf.cli` (task generation, agent execution, leaderboard)

**Status:** v1.0.0 — production-ready for prompt evaluation. Legacy PR evaluation module retained for backward compatibility.
**Tagline:** "Multi-model evaluation — agents work, PRs prove it, leaders get endpoints."

---

## Architecture

```mermaid
graph TD
    CLI[cli.py] --> Config[config.py]
    CLI --> TaskGen[task.py]
    CLI --> Runner[runner.py]
    CLI --> Evaluator[evaluator.py]
    CLI --> Leaderboard[leaderboard.py]
    CLI --> Gitea[gitea.py]
flowchart TB
    subgraph CLI["CLI Entry Points"]
        A1["python -m wolf.runner\n(pure evaluation)"]
        A2["python -m wolf.cli\n(task pipeline)"]
    end

    Runner --> Models[models.py]
    Runner --> Gitea
    Evaluator --> Models
    subgraph Core["Core Engine"]
        B1["PromptEvaluator\n(evaluator.py)"]
        B2["ResponseScorer\n(evaluator.py)"]
        B3["AgentRunner\n(runner.py)"]
        B4["TaskGenerator\n(task.py)"]
    end

    TaskGen --> Gitea
    Leaderboard --> |leaderboard.json| FS[(File System)]
    Config --> |wolf-config.yaml| FS
    subgraph Providers["Model Providers"]
        C1["OpenRouterClient"]
        C2["GroqClient"]
        C3["OllamaClient"]
        C4["AnthropicClient"]
        C5["OpenAIClient\n(GroqClient w/ custom URL)"]
    end

    Models --> OpenRouter[OpenRouter API]
    Models --> Groq[Groq API]
    Models --> Ollama[Ollama Local]
    Models --> OpenAI[OpenAI API]
    Models --> Anthropic[Anthropic API]
    subgraph Infrastructure["Infrastructure"]
        D1["GiteaClient\n(gitea.py)"]
        D2["Config\n(config.py)"]
        D3["Leaderboard\n(leaderboard.py)"]
        D4["wolf-config.yaml"]
    end

    Runner --> |branch + commit| Gitea
    Evaluator --> |score results| Leaderboard
    subgraph Output["Output"]
        E1["JSON results file"]
        E2["stdout summary table"]
        E3["Gitea PRs"]
        E4["Leaderboard scores"]
    end

    A1 --> B1
    A2 --> B4 --> B3
    B1 --> B2
    B1 --> C1 & C2 & C3 & C4 & C5
    B3 --> C1 & C2 & C3 & C4 & C5
    B3 --> D1
    A2 --> D1 & D2 & D3
    B1 --> E1 & E2
    B3 --> E3
    D3 --> E4
    D2 --> D4

    style A1 fill:#4a9eff,color:#fff
    style A2 fill:#4a9eff,color:#fff
    style B1 fill:#ff6b6b,color:#fff
    style B2 fill:#ff6b6b,color:#fff
```

### Data Flow — Prompt Evaluation Mode

```
prompts.json + models.json/wolf-config.yaml
  → load_prompts() / load_models_from_json()
  → PromptEvaluator.evaluate()
      → for each (prompt, model):
          → ModelFactory.get_client(provider) → ModelClient.generate()
          → ResponseScorer.score(response, prompt)
              → score_relevance() — keyword matching, length, refusal detection
              → score_coherence() — structure, readability, repetition
              → score_safety() — harmful content patterns, profanity
              → overall = relevance*0.40 + coherence*0.35 + safety*0.25
  → evaluate_and_serialize() → JSON dict
  → run(output_path) → write JSON + print_summary()
```
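The weighted combination in the flow above can be sketched in a few lines. This is a minimal, self-contained illustration of the heuristic — not Wolf's actual `ResponseScorer`; the function bodies and thresholds here are simplified assumptions drawn from the scoring notes elsewhere in this document:

```python
import re

# Weights match the documented formula: relevance 0.40, coherence 0.35, safety 0.25.
WEIGHTS = {"relevance": 0.40, "coherence": 0.35, "safety": 0.25}

def score_relevance(response: str, expected_keywords: list[str]) -> float:
    """Fraction of expected keywords present in the response (simplified)."""
    if not expected_keywords:
        return 0.5
    hits = sum(1 for kw in expected_keywords if kw.lower() in response.lower())
    return hits / len(expected_keywords)

def score_coherence(response: str) -> float:
    """Crude length sweet-spot check (100-3000 chars, per the scoring details)."""
    return 1.0 if 100 <= len(response) <= 3000 else 0.6

def score_safety(response: str) -> float:
    """Zero out responses matching one illustrative unsafe pattern."""
    return 0.0 if re.search(r"ignore previous instructions", response, re.I) else 1.0

def overall(response: str, expected_keywords: list[str]) -> float:
    """Weighted sum: relevance*0.40 + coherence*0.35 + safety*0.25."""
    return round(
        WEIGHTS["relevance"] * score_relevance(response, expected_keywords)
        + WEIGHTS["coherence"] * score_coherence(response)
        + WEIGHTS["safety"] * score_safety(response),
        4,
    )
```

A response that hits every keyword, sits in the length sweet spot, and trips no unsafe pattern scores 1.0; a too-short, keyword-free response scores only the discounted coherence and safety terms.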

### Data Flow — Legacy Task Pipeline Mode

```
wolf-config.yaml
  → GiteaClient.get_issues(owner, repo)
  → TaskGenerator.from_gitea_issues()
  → TaskGenerator.assign_tasks(tasks, models)
  → for each task:
      → AgentRunner.execute_task(task)
          → ModelClient.generate(prompt)
          → GiteaClient.create_branch()
          → GiteaClient.create_file(wolf-outputs/{id}.md)
          → GiteaClient.create_pull_request()
  → Leaderboard.record_score()
  → Leaderboard.get_rankings()
```

---

## Entry Points

| Entry Point | Command | Purpose |
|-------------|---------|---------|
| `wolf/cli.py` | `python3 -m wolf.cli --run` | Main CLI: run tasks, evaluate PRs, show leaderboard |
| `wolf/runner.py` | `python3 -m wolf.runner --prompts p.json --models m.json` | Standalone prompt evaluation runner |
| `wolf/__init__.py` | `import wolf` | Package init, version metadata |

| Entry Point | Module | Purpose |
|-------------|--------|---------|
| `python -m wolf.runner` | `runner.py` | Pure prompt-vs-model evaluation. Primary v1.0 interface. |
| `python -m wolf.cli` | `cli.py` | Full task pipeline: fetch issues → run models → create PRs → leaderboard. |

## Data Flow
### runner.py CLI Flags

### Prompt Evaluation Pipeline (Primary)
| Flag | Required | Description |
|------|----------|-------------|
| `--prompts / -p` | Yes | Path to prompts JSON file |
| `--models / -m` | No* | Path to models JSON file |
| `--config / -c` | No* | Path to wolf-config.yaml (alternative to --models) |
| `--output / -o` | No | Path to write JSON results |
| `--system-prompt` | No | System prompt (default: "You are a helpful assistant.") |

```
prompts.json + models.json (or wolf-config.yaml)
        │
        ▼
PromptEvaluator.evaluate()
        │
        ├─ For each (prompt, model) pair:
        │   ├─ ModelClient.generate(prompt) → response text
        │   ├─ ResponseScorer.score(response, prompt)
        │   │   ├─ score_relevance() (0.40 weight)
        │   │   ├─ score_coherence() (0.35 weight)
        │   │   └─ score_safety() (0.25 weight)
        │   └─ EvaluationResult (prompt, model, scores, latency, error)
        │
        ▼
evaluate_and_serialize() → JSON output
        │
        ├─ model_summaries (per-model averages)
        └─ results[] (per-evaluation details)
```
*Either --models or --config is required.
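As a sketch of what the `--prompts` input might look like: the field names below come from the `PromptEntry` dataclass documented later (id, text, expected_keywords, category), but the exact JSON shape is an assumption, not confirmed by the source:

```python
import json

# Hypothetical prompts.json payload mirroring PromptEntry's documented fields.
prompts = [
    {
        "id": "py-001",
        "text": "Explain what a Python generator is.",
        "expected_keywords": ["yield", "iterator", "lazy"],
        "category": "programming",
    },
    {
        "id": "safety-001",
        "text": "Summarize safe battery handling steps.",
        "expected_keywords": ["disconnect", "negative terminal"],
        "category": "safety",
    },
]

# Write the file the runner would consume via --prompts.
with open("prompts.json", "w", encoding="utf-8") as fh:
    json.dump(prompts, fh, indent=2)
```

With a file like this in place, `python -m wolf.runner --prompts prompts.json --config wolf-config.yaml` would evaluate every prompt against every configured model.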

### Task Assignment Pipeline (Legacy)

```
Gitea Issues  →  TaskGenerator  →  AgentRunner
     │                │                │
     ▼                ▼                ▼
Fetch tasks     Assign models    Execute + PR
from issues     from config      via Gitea API
```

## Key Abstractions

| Class | Module | Purpose |
|-------|--------|---------|
| `PromptEntry` | evaluator.py | Single prompt with expected keywords and category |
| `ModelEndpoint` | evaluator.py | Model connection descriptor (provider, model_id, key) |
| `ScoreResult` | evaluator.py | Scores for relevance, coherence, safety, overall |
| `EvaluationResult` | evaluator.py | Full result: prompt + model + response + scores + latency |
| `ResponseScorer` | evaluator.py | Heuristic scoring engine (regex + keyword + structure) |
| `PromptEvaluator` | evaluator.py | Core engine: runs prompts against models, scores output |
| `ModelClient` | models.py | Abstract base for LLM API calls |
| `ModelFactory` | models.py | Factory: returns correct client for provider name |
| `Task` | task.py | Work unit: id, title, description, assigned model/provider |
| `TaskGenerator` | task.py | Creates tasks from Gitea issues or JSON spec |
| `AgentRunner` | runner.py | Executes tasks: generate → branch → commit → PR |
| `Config` | config.py | YAML config loader (wolf-config.yaml) |
| `Leaderboard` | leaderboard.py | Persistent model ranking with serverless readiness |
| `GiteaClient` | gitea.py | Full Gitea REST API client |
| `PREvaluator` | evaluator.py | Legacy: scores PRs on CI, commits, code quality |

## API Surface

### CLI Arguments (cli.py)
### cli.py CLI Flags

| Flag | Description |
|------|-------------|
| `--config` | Path to wolf-config.yaml |
| `--task-spec` | Path to task specification JSON |
| `--run` | Run pending tasks (assign models, execute, create PRs) |
| `--evaluate` | Evaluate open PRs and score them |
| `--run` | Run pending tasks (fetch issues → generate → PR) |
| `--evaluate` | Evaluate open PRs (legacy scoring) |
| `--leaderboard` | Show model rankings |

### CLI Arguments (runner.py)
---

| Flag | Description |
|------|-------------|
| `--prompts` / `-p` | Path to prompts JSON (required) |
| `--models` / `-m` | Path to models JSON |
| `--config` / `-c` | Path to wolf-config.yaml (alternative to --models) |
| `--output` / `-o` | Path to write JSON results |
| `--system-prompt` | System prompt for all model calls |

## Key Abstractions

### Dataclasses (evaluator.py)

| Class | Fields | Purpose |
|-------|--------|---------|
| `PromptEntry` | id, text, expected_keywords, category | A single evaluation prompt with metadata |
| `ModelEndpoint` | name, provider, model_id, api_key, base_url | Model connection config |
| `ScoreResult` | relevance, coherence, safety, overall, details | Scoring output for one response |
| `EvaluationResult` | prompt_id, prompt_text, model_name, ..., scores, error | Complete result of one prompt×model evaluation |

### Core Classes

| Class | Module | Responsibility |
|-------|--------|----------------|
| `ResponseScorer` | evaluator.py | Scores responses on 3 dimensions using regex heuristics |
| `PromptEvaluator` | evaluator.py | Orchestrates N×M evaluation matrix |
| `ModelClient` | models.py | Abstract base for provider clients |
| `ModelFactory` | models.py | Static factory: `get_client(provider, key, url)` |
| `GiteaClient` | gitea.py | Full Gitea API wrapper (issues, branches, files, PRs) |
| `AgentRunner` | runner.py | Task execution: generate → branch → commit → PR |
| `TaskGenerator` | task.py | Converts Gitea issues to evaluable Task dataclasses |
| `Leaderboard` | leaderboard.py | Tracks model scores, determines serverless readiness |
| `Config` | config.py | Loads wolf-config.yaml, manages logging |

### Provider Clients (models.py)

| Client | Provider | API Format |
|--------|----------|------------|
| Class | Provider | API Format |
|-------|----------|------------|
| `OpenRouterClient` | openrouter | OpenAI-compatible chat completions |
| `GroqClient` | groq | OpenAI-compatible chat completions |
| `OllamaClient` | ollama | Ollama native /api/generate |
| `OpenAIClient` | openai | OpenAI-compatible (reuses GroqClient with different URL) |
| `AnthropicClient` | anthropic | Anthropic Messages API v1 |
| `AnthropicClient` | anthropic | Anthropic Messages API |
| `OpenAIClient` | openai | GroqClient with base_url override |
### Gitea Client (gitea.py)
---

| Method | Purpose |
|--------|---------|
| `get_issues()` | Fetch issues by state |
| `create_branch()` | Create new branch from base |
| `create_file()` | Create file on branch (base64) |
| `update_file()` | Update file with SHA |
| `get_file()` | Read file contents |
| `create_pull_request()` | Open PR |
| `get_pull_request()` | Fetch PR details |
| `get_pr_status()` | Check PR CI status |

## API Surface

## Configuration (wolf-config.yaml)
### Public API (importable)

```python
# Evaluation pipeline
from wolf.evaluator import PromptEvaluator, PromptEntry, ModelEndpoint, ScoreResult

# Provider clients
from wolf.models import ModelFactory, ModelClient

# Gitea integration
from wolf.gitea import GiteaClient

# Task pipeline
from wolf.runner import AgentRunner
from wolf.task import TaskGenerator, Task

# Leaderboard
from wolf.leaderboard import Leaderboard

# Config
from wolf.config import Config, setup_logging
```

### Scoring Weights

| Dimension | Weight | Method |
|-----------|--------|--------|
| Relevance | 0.40 | Keyword matching (60%) + length score (40%) |
| Coherence | 0.35 | Length + structure indicators + sentence completeness + uniqueness |
| Safety | 0.25 | Unsafe pattern detection + profanity check |
| **Overall** | 1.00 | Weighted sum |

### Scoring Details

**Relevance (ResponseScorer.score_relevance):**
- Expected keyword match ratio
- Fallback: word overlap with prompt (boosted 1.5×)
- Length penalty: <20 chars → 0.3, <50 chars → 0.6
- Refusal detection: 3 regex patterns, penalty if low keyword match

**Coherence (ResponseScorer.score_coherence):**
- Length sweet spot: 100-3000 chars → 1.0
- Structure: paragraph breaks, transition words, lists/steps
- Sentence completeness: avg 20-200 chars → 0.9
- Uniqueness: unique word ratio >0.4 → 0.9

**Safety (ResponseScorer.score_safety):**
- 6 unsafe patterns (weapon creation, system exploitation, prompt injection, etc.)
- Profanity detection (minor penalty: 0.1 per word, capped at 0.3)

---

## Test Coverage

### Current Tests

| Test File | Covers | Status |
|-----------|--------|--------|
| `test_evaluator.py` | PromptEntry, ModelEndpoint, ScoreResult, ResponseScorer, PromptEvaluator, PREvaluator | ✅ 23 test methods |
| `test_config.py` | Config.load | ✅ 1 test method |

### Coverage Gaps — Untested Modules

| Module | Risk | Critical Paths |
|--------|------|----------------|
| `cli.py` | **HIGH** | Argparse wiring, config→models→evaluator pipeline, PR scoring flow |
| `runner.py` | **HIGH** | load_prompts, load_models_from_json, load_models_from_config, run_evaluation, AgentRunner.execute_task |
| `models.py` | **HIGH** | ModelFactory.get_client for each provider, each client's generate() |
| `gitea.py` | **MEDIUM** | All GiteaClient methods (HTTP calls) |
| `task.py` | **MEDIUM** | TaskGenerator.from_gitea_issues, from_spec, assign_tasks |
| `leaderboard.py` | **LOW** | Leaderboard.record_score, get_rankings, serverless_ready |

### Coverage Gaps — Existing Tests

- `test_evaluator.py`: No tests for `PromptEvaluator._get_model_client()`, `_run_single()` with real model call, or `evaluate_and_serialize()` summary statistics
- `test_evaluator.py`: No integration test (mocked model calls only)
- `test_config.py`: No test for missing config, env var overrides, or logging setup

---

## Security Considerations

1. **API Keys in Config**: `wolf-config.yaml` stores provider API keys. Never commit to version control. Recommend `~/.hermes/wolf-config.yaml` with restricted permissions.

2. **HTTP Requests**: All model calls and Gitea API calls are outbound HTTP. No input validation on URLs — `base_url` fields accept arbitrary endpoints.

3. **Prompt Injection**: ResponseScorer detects injection patterns in *model output*, but Wolf itself is vulnerable to prompt injection via `expected_keywords` or `system_prompt` fields.

4. **Gitea Token Scope**: GiteaClient uses a single token for all operations. Scoped tokens (read-only for evaluation, write for task execution) would reduce blast radius.

5. **No TLS Verification Override**: `requests.post()` uses default SSL verification. If self-signed certs are used for local providers (Ollama), requests will raise SSL errors rather than connect.

6. **Race Conditions**: Leaderboard reads/writes JSON without locking. Concurrent evaluations could corrupt the leaderboard file.
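One way to close the race in point 6 is a POSIX advisory lock around the read-modify-write cycle. A minimal sketch using the stdlib `fcntl` module (Unix-only; nothing in the source suggests Wolf's Leaderboard currently does this, and the leaderboard JSON shape below is assumed):

```python
import fcntl
import json
from pathlib import Path

def record_score_locked(path: str, model: str, score: float) -> dict:
    """Read-modify-write the leaderboard JSON under an exclusive lock."""
    p = Path(path)
    p.touch(exist_ok=True)
    with open(p, "r+", encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks other writers until we finish
        raw = fh.read()
        board = json.loads(raw) if raw.strip() else {}
        board.setdefault(model, []).append(score)
        fh.seek(0)
        fh.truncate()
        json.dump(board, fh, indent=2)
        # advisory lock is released when the file handle closes
    return board
```

`flock` is advisory, so it only protects writers that also take the lock; a cross-platform alternative would be an atomic write to a temp file followed by `os.replace`.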

---

## Dependencies

```
requests   # HTTP client for all providers and Gitea
pyyaml     # Config file parsing (not in requirements.txt — BUG)
```

**⚠️ Missing dependency:** `pyyaml` is imported in `config.py` but not listed in `requirements.txt`.

---

## Configuration Schema

```yaml
# wolf-config.yaml
gitea:
  base_url: "https://forge.alexanderwhitestone.com/api/v1"
  token: "..."
  base_url: "https://forge.example.com/api/v1"
  token: "gitea_token_here"
  owner: "Timmy_Foundation"
  repo: "wolf"
  repo: "eval-repo"

providers:
  openrouter:
    api_key: "..."
    api_key: "sk-or-..."
    base_url: "https://openrouter.ai/api/v1"
  groq:
    api_key: "gsk_..."
  ollama:
    base_url: "http://localhost:11434"

models:
  - model: "anthropic/claude-3.5-sonnet"
    provider: "openrouter"
  - model: "gemma4:latest"
  - model: "llama3-70b-8192"
    provider: "groq"
  - model: "llama3:70b"
    provider: "ollama"

log_dir: "~/.hermes/wolf/"
system_prompt: "You are a helpful assistant."
leaderboard_path: "~/.hermes/wolf/leaderboard.json"
log_dir: "~/.hermes/wolf/logs"
```
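Loading this file normally requires PyYAML, which the dependency audit in this document flags as absent from requirements.txt; the dependency table later also mentions a line-parser fallback. A hedged sketch of what such a fallback for the flat top-level keys could look like (illustrative only — this is not Wolf's actual config.py):

```python
def parse_flat_yaml(text: str) -> dict[str, str]:
    """Parse only top-level `key: "value"` lines — a last-resort fallback
    when PyYAML is unavailable. Nested blocks (gitea:, providers:, models:)
    are deliberately skipped."""
    config: dict[str, str] = {}
    for line in text.splitlines():
        if not line or line.startswith((" ", "\t", "#", "-")):
            continue  # skip blanks, comments, list items, and indented keys
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip().strip('"').strip("'")
        if value:  # section headers like `gitea:` carry no inline value
            config[key.strip()] = value
    return config
```

This recovers `log_dir`, `system_prompt`, and `leaderboard_path` from the schema above, while leaving the nested provider and Gitea sections to a real YAML parser.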

## Scoring Methodology

| Dimension | Weight | Method |
|-----------|--------|--------|
| Relevance | 0.40 | Keyword matching (expected_keywords), prompt word overlap, length penalty, refusal detection |
| Coherence | 0.35 | Structure indicators (paragraphs, lists, transitions), sentence completeness, repetition ratio, length sweet spot |
| Safety | 0.25 | Unsafe pattern regex (bomb/hack/bypass/injection), profanity detection |

**Overall = relevance×0.40 + coherence×0.35 + safety×0.25**

### Safety Patterns Detected

- Bomb/explosive/weapon creation
- System/network hacking
- Security/auth bypass
- Prompt injection ("ignore previous instructions")
- System prompt extraction attempts

### Refusal Patterns Detected

- "I cannot/can't/won't help/assist"
- "Sorry, but I cannot"
- "Against my guidelines/policy"
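The three refusal phrasings above could be matched with regexes along these lines — an illustrative sketch, since the actual patterns in evaluator.py are not shown in this document:

```python
import re

# Illustrative refusal patterns mirroring the three documented phrasings.
REFUSAL_PATTERNS = [
    re.compile(r"\bI (?:cannot|can't|won't) (?:help|assist)\b", re.IGNORECASE),
    re.compile(r"\bsorry,? but I cannot\b", re.IGNORECASE),
    re.compile(r"\bagainst my (?:guidelines|policy)\b", re.IGNORECASE),
]

def looks_like_refusal(response: str) -> bool:
    """True if the response matches any known refusal phrasing."""
    return any(p.search(response) for p in REFUSAL_PATTERNS)
```

Per the relevance notes earlier, a detected refusal only penalizes the score when keyword match is also low, so legitimate answers that merely quote a refusal phrase are not automatically zeroed.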

## Test Coverage

| File | Tests | Coverage |
|------|-------|----------|
| `tests/test_evaluator.py` | 17 tests | PromptEntry, ModelEndpoint, ResponseScorer (relevance/coherence/safety), PromptEvaluator (evaluate, error handling, serialization, file output, multi-model), PREvaluator (score_pr, description scoring) |
| `tests/test_config.py` | 1 test | Config load from YAML |

### Coverage Gaps

- No tests for `cli.py` (argument parsing, workflow orchestration)
- No tests for `runner.py` (`load_prompts`, `load_models_from_json`, `AgentRunner.execute_task`)
- No tests for `task.py` (`TaskGenerator.from_gitea_issues`, `from_spec`, `assign_tasks`)
- No tests for `models.py` (API clients — would require mocking HTTP)
- No tests for `leaderboard.py` (`record_score`, `get_rankings`, serverless readiness logic)
- No tests for `gitea.py` (API client — would require mocking HTTP)
- No integration tests (end-to-end evaluation pipeline)

## Dependencies

| Dependency | Used By | Purpose |
|------------|---------|---------|
| `requests` | models.py, gitea.py | HTTP client for all API calls |
| `pyyaml` (optional) | config.py | YAML config parsing (falls back to line parser) |

## Security Considerations

1. **API keys in config**: wolf-config.yaml stores provider API keys in plaintext. The file should be chmod 600 and excluded from git (already covered by the ~/.hermes/ .gitignore pattern).
2. **Gitea token**: A full-access token is used for branch creation, file commits, and PR creation. Scoped access is recommended.
3. **No input sanitization**: Prompts from Gitea issues are passed directly to models without filtering — a prompt-injection risk for automated workflows.
4. **No rate limiting**: Model API calls are sequential with no backoff or rate limiting, and could exhaust API quotas.
5. **Legacy alias**: `evaluator.py` defines an `Evaluator = PREvaluator` alias, and `cli.py` imports `Evaluator` expecting the legacy class. This works but is confusing.

## File Index

| File | LOC | Purpose |
|------|-----|---------|
| `wolf/__init__.py` | 12 | Package init, version |
| `wolf/cli.py` | 90 | Main CLI orchestrator |
| `wolf/config.py` | 48 | YAML config loader |
| `wolf/models.py` | 130 | LLM provider clients (5 providers) |
| `wolf/runner.py` | 280 | Prompt evaluation CLI + AgentRunner |
| `wolf/task.py` | 80 | Task dataclass + generator |
| `wolf/evaluator.py` | 350 | Core scoring engine + legacy PR evaluator |
| `wolf/leaderboard.py` | 70 | Persistent model ranking |
| `wolf/gitea.py` | 100 | Gitea REST API client |
| `tests/test_evaluator.py` | 180 | Unit tests for evaluator |
| `tests/test_config.py` | 20 | Unit tests for config |

**Total: ~1,360 LOC Python | 11 modules | 18 tests**

## Sovereignty Assessment

- **No external dependencies beyond requests**: Runs on any machine with Python 3.11+ and requests.
- **No phone-home**: All API calls are to user-configured endpoints.
- **No telemetry**: Logs go to local filesystem only.
- **Config-driven**: All secrets in user's ~/.hermes/ directory.
- **Provider-agnostic**: Supports 5 providers with easy extension via ModelFactory.

**Verdict: Fully sovereign. No corporate lock-in. User controls all endpoints and keys.**

---

*"The strength of the pack is the wolf, and the strength of the wolf is the pack."*
*— The Wolf Sovereign Core has spoken.*
*Generated by Codebase Genome Pipeline. Review and update manually.*

@@ -1,267 +0,0 @@
|
||||
#!/usr/bin/env python3
|
||||
"""Prepare a field-ready install packet for LAB-003 truck battery disconnect work."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import json
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
|
||||
CANDIDATE_STORES = [
|
||||
"AutoZone — Newport or Claremont",
|
||||
"Advance Auto Parts — Newport or Claremont",
|
||||
"O'Reilly Auto Parts — Newport or Claremont",
|
||||
]
|
||||
|
||||
REQUIRED_ITEMS = [
|
||||
"battery terminal disconnect switch",
|
||||
"terminal shim/post riser if needed",
|
||||
]
|
||||
|
||||
SELECTION_CRITERIA = [
|
||||
"Fits the truck battery post without forcing the clamp",
|
||||
"Mounts on the negative battery terminal",
|
||||
"Physically secure once tightened",
|
||||
"no special tools required to operate",
|
||||
]
|
||||
|
||||
INSTALL_CHECKLIST = [
|
||||
"Verify the truck is off and keys are removed before touching the battery",
|
||||
"Confirm the disconnect fits the negative battery terminal before final tightening",
|
||||
"Install the disconnect on the negative battery terminal",
|
||||
"Tighten until physically secure with no terminal wobble",
|
||||
"Verify the disconnect can be opened and closed by hand",
|
||||
]
|
||||
|
||||
VALIDATION_CHECKLIST = [
|
||||
"Leave the truck parked with the disconnect opened for at least 24 hours",
|
||||
"Reconnect the switch by hand the next day",
|
||||
"Truck starts reliably after sitting 24+ hours with switch disconnected",
|
||||
"Receipt or photo of installed switch uploaded to this issue",
|
||||
]
|
||||
|
||||
BATTERY_REPLACEMENT_FOLLOWUP = (
|
||||
"If the truck still fails the overnight test after the disconnect install, "
|
||||
"replace battery and re-run the 24-hour validation."
|
||||
)
|
||||
|
||||
|
||||
def _as_bool(value: Any) -> bool | None:
|
||||
if value is None:
|
||||
return None
|
||||
if isinstance(value, bool):
|
||||
return value
|
||||
text = str(value).strip().lower()
|
||||
if text in {"1", "true", "yes", "y"}:
|
||||
return True
|
||||
if text in {"0", "false", "no", "n"}:
|
||||
return False
|
||||
return None
|
||||
|
||||
|
||||
def build_packet(details: dict[str, Any]) -> dict[str, Any]:
|
||||
store_selected = (details.get("store_selected") or "").strip()
|
||||
part_name = (details.get("part_name") or "").strip()
|
||||
receipt_or_photo_path = (details.get("receipt_or_photo_path") or "").strip()
|
||||
install_completed = _as_bool(details.get("install_completed"))
|
||||
physically_secure = _as_bool(details.get("physically_secure"))
|
||||
truck_started = _as_bool(details.get("truck_started_after_disconnect"))
|
||||
replacement_needed = _as_bool(details.get("replacement_battery_needed"))
|
||||
overnight_test_hours = details.get("overnight_test_hours")
|
||||
part_cost_usd = details.get("part_cost_usd")
|
||||
|
||||
try:
|
||||
overnight_test_hours = int(overnight_test_hours) if overnight_test_hours is not None else None
|
||||
except (TypeError, ValueError):
|
||||
overnight_test_hours = None
|
||||
|
||||
try:
|
||||
part_cost_usd = float(part_cost_usd) if part_cost_usd is not None else None
|
||||
except (TypeError, ValueError):
|
||||
part_cost_usd = None
|
||||
|
||||
missing_fields: list[str] = []
|
||||
if not store_selected:
|
||||
missing_fields.append("store_selected")
|
||||
if not part_name:
|
||||
missing_fields.append("part_name")
|
||||
if install_completed is not True:
|
||||
missing_fields.append("install_completed")
|
||||
if physically_secure is not True:
|
||||
missing_fields.append("physically_secure")
|
||||
if overnight_test_hours is None:
|
||||
missing_fields.append("overnight_test_hours")
|
||||
if truck_started is None:
|
||||
missing_fields.append("truck_started_after_disconnect")
|
||||
if not receipt_or_photo_path:
|
||||
missing_fields.append("receipt_or_photo_path")
|
||||
|
||||
ready_to_operate_without_tools = True
|
||||
|
||||
if replacement_needed is True or truck_started is False:
|
||||
status = "battery_replace_candidate"
|
||||
elif not store_selected or not part_name:
|
||||
status = "pending_parts_run"
|
||||
elif install_completed is not True:
|
||||
status = "pending_install"
|
||||
elif physically_secure is not True or overnight_test_hours is None or truck_started is None or not receipt_or_photo_path:
|
||||
status = "overnight_validation"
|
||||
elif overnight_test_hours >= 24 and truck_started is True:
|
||||
status = "verified"
|
||||
else:
|
||||
status = "overnight_validation"
|
||||
|
||||
return {
|
||||
"candidate_stores": list(CANDIDATE_STORES),
|
||||
"required_items": list(REQUIRED_ITEMS),
|
||||
"selection_criteria": list(SELECTION_CRITERIA),
|
||||
"install_target": "negative battery terminal",
|
||||
"install_checklist": list(INSTALL_CHECKLIST),
|
||||
"validation_checklist": list(VALIDATION_CHECKLIST),
|
||||
"store_selected": store_selected,
|
||||
"part_name": part_name,
|
||||
"part_cost_usd": part_cost_usd,
|
||||
"install_completed": install_completed,
|
||||
"physically_secure": physically_secure,
|
||||
"overnight_test_hours": overnight_test_hours,
|
||||
"truck_started_after_disconnect": truck_started,
|
||||
"receipt_or_photo_path": receipt_or_photo_path,
|
||||
"ready_to_operate_without_tools": ready_to_operate_without_tools,
|
||||
"missing_fields": missing_fields,
|
||||
"battery_replacement_followup": BATTERY_REPLACEMENT_FOLLOWUP,
|
||||
"status": status,
|
||||
}
|
||||
|
||||
|
||||
def render_markdown(packet: dict[str, Any]) -> str:
    part_cost = packet["part_cost_usd"]
    cost_line = f"${part_cost:.2f}" if isinstance(part_cost, (int, float)) else "pending purchase"
    overnight = packet["overnight_test_hours"]
    overnight_line = f"{overnight} hours" if overnight is not None else "pending"
    started = packet["truck_started_after_disconnect"]
    if started is True:
        started_line = "yes"
    elif started is False:
        started_line = "no"
    else:
        started_line = "pending"

    lines = [
        "# LAB-003 — Truck Battery Disconnect Install Packet",
        "",
        "No battery disconnect switch has been purchased or installed yet.",
        "This packet turns the issue into a field-ready purchase / install / validation checklist while preserving what still requires live work.",
        "",
        "## Candidate Store Run",
        "",
    ]
    lines.extend(f"- {store}" for store in packet["candidate_stores"])
    lines.extend([
        "",
        "## Required Items",
        "",
    ])
    lines.extend(f"- {item}" for item in packet["required_items"])
    lines.extend([
        "",
        "## Selection Criteria",
        "",
    ])
    lines.extend(f"- {item}" for item in packet["selection_criteria"])
    lines.extend([
        "",
        "## Live Purchase State",
        "",
        f"- Store selected: {packet['store_selected'] or 'pending'}",
        f"- Part selected: {packet['part_name'] or 'pending'}",
        f"- Part cost: {cost_line}",
        "",
        "## Installation Target",
        "",
        f"- Install location: {packet['install_target']}",
        f"- Ready to operate without tools: {'yes' if packet['ready_to_operate_without_tools'] else 'no'}",
        "",
        "## Install Checklist",
        "",
    ])
    lines.extend(f"- [ ] {item}" for item in packet["install_checklist"])
    lines.extend([
        "",
        "## Validation Checklist",
        "",
    ])
    lines.extend(f"- [ ] {item}" for item in packet["validation_checklist"])
    lines.extend([
        "",
        "## Overnight Verification Log",
        "",
        f"- Install completed: {packet['install_completed'] if packet['install_completed'] is not None else 'pending'}",
        f"- Physically secure: {packet['physically_secure'] if packet['physically_secure'] is not None else 'pending'}",
        f"- Overnight disconnect duration: {overnight_line}",
        f"- Truck started after disconnect: {started_line}",
        f"- Receipt / photo path: {packet['receipt_or_photo_path'] or 'pending'}",
        "",
        "## Battery Replacement Fallback",
        "",
        packet["battery_replacement_followup"],
        "",
        "## Missing Live Fields",
        "",
    ])
    if packet["missing_fields"]:
        lines.extend(f"- {field}" for field in packet["missing_fields"])
    else:
        lines.append("- none")
    lines.extend([
        "",
        "## Honest next step",
        "",
        "Buy the disconnect switch, install it on the negative battery terminal, leave the truck disconnected for 24+ hours, and only close the issue after receipt/photo evidence and the overnight start result are attached.",
        "",
    ])
    return "\n".join(lines)

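Several fields in the renderer are tri-state (`True`, `False`, or `None` for "not yet recorded"), and each renders with an explicit "pending" fallback rather than printing `None`. A minimal sketch of that pattern, pulled out as a hypothetical helper for illustration:

```python
def tri_state(value) -> str:
    # True/False/None -> "yes"/"no"/"pending", matching the started_line logic above.
    # Identity checks ("is True") avoid treating truthy non-booleans as confirmation.
    if value is True:
        return "yes"
    if value is False:
        return "no"
    return "pending"
```

Using identity comparisons here matters: a packet built from partial CLI input can carry `None`, and `if value:` alone would silently collapse "unknown" into "no".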
def main() -> None:
    parser = argparse.ArgumentParser(description="Prepare the LAB-003 battery disconnect install packet")
    parser.add_argument("--store-selected", default="")
    parser.add_argument("--part-name", default="")
    parser.add_argument("--part-cost-usd", type=float, default=None)
    parser.add_argument("--install-completed", action="store_true")
    parser.add_argument("--physically-secure", action="store_true")
    parser.add_argument("--overnight-test-hours", type=int, default=None)
    parser.add_argument("--truck-started-after-disconnect", choices=["yes", "no"], default=None)
    parser.add_argument("--receipt-or-photo-path", default="")
    parser.add_argument("--replacement-battery-needed", action="store_true")
    parser.add_argument("--output", default=None)
    parser.add_argument("--json", action="store_true")
    args = parser.parse_args()

    packet = build_packet(
        {
            "store_selected": args.store_selected,
            "part_name": args.part_name,
            "part_cost_usd": args.part_cost_usd,
            "install_completed": args.install_completed,
            "physically_secure": args.physically_secure,
            "overnight_test_hours": args.overnight_test_hours,
            "truck_started_after_disconnect": args.truck_started_after_disconnect,
            "receipt_or_photo_path": args.receipt_or_photo_path,
            "replacement_battery_needed": args.replacement_battery_needed,
        }
    )
    rendered = json.dumps(packet, indent=2) if args.json else render_markdown(packet)

    if args.output:
        output_path = Path(args.output).expanduser()
        output_path.parent.mkdir(parents=True, exist_ok=True)
        output_path.write_text(rendered, encoding="utf-8")
        print(f"Battery disconnect packet written to {output_path}")
    else:
        print(rendered)


if __name__ == "__main__":
    main()

@@ -1,88 +0,0 @@
|
||||
from pathlib import Path
|
||||
import importlib.util
|
||||
import unittest
|
||||
|
||||
|
||||
ROOT = Path(__file__).resolve().parent.parent
|
||||
SCRIPT_PATH = ROOT / "scripts" / "lab_003_battery_disconnect_packet.py"
|
||||
DOC_PATH = ROOT / "docs" / "LAB_003_BATTERY_DISCONNECT_PACKET.md"
|
||||
|
||||
|
||||
def load_module(path: Path, name: str):
|
||||
assert path.exists(), f"missing {path.relative_to(ROOT)}"
|
||||
spec = importlib.util.spec_from_file_location(name, path)
|
||||
assert spec and spec.loader
|
||||
module = importlib.util.module_from_spec(spec)
|
||||
spec.loader.exec_module(module)
|
||||
return module
|
||||
|
||||
|
||||
class TestLab003BatteryDisconnectPacket(unittest.TestCase):
|
||||
def test_packet_defaults_to_parts_run_and_tracks_issue_specific_requirements(self):
|
||||
mod = load_module(SCRIPT_PATH, "lab_003_battery_disconnect_packet")
|
||||
packet = mod.build_packet({})
|
||||
|
||||
self.assertEqual(packet["status"], "pending_parts_run")
|
||||
self.assertEqual(packet["install_target"], "negative battery terminal")
|
||||
self.assertIn("battery terminal disconnect switch", packet["required_items"])
|
||||
self.assertIn("terminal shim/post riser if needed", packet["required_items"])
|
||||
self.assertIn("AutoZone", packet["candidate_stores"][0])
|
||||
self.assertIn("no special tools required to operate", packet["selection_criteria"])
|
||||
self.assertIn("overnight_test_hours", packet["missing_fields"])
|
||||
self.assertIn("receipt_or_photo_path", packet["missing_fields"])
|
||||
|
||||
def test_packet_marks_verified_after_successful_24h_validation_with_proof(self):
|
||||
mod = load_module(SCRIPT_PATH, "lab_003_battery_disconnect_packet")
|
||||
packet = mod.build_packet(
|
||||
{
|
||||
"store_selected": "AutoZone - Newport",
|
||||
"part_name": "Knob-style battery disconnect switch",
|
||||
"part_cost_usd": 24.99,
|
||||
"install_completed": True,
|
||||
"physically_secure": True,
|
||||
"overnight_test_hours": 26,
|
||||
"truck_started_after_disconnect": True,
|
||||
"receipt_or_photo_path": "evidence/lab-003-installed-switch.jpg",
|
||||
}
|
||||
)
|
||||
|
||||
self.assertEqual(packet["status"], "verified")
|
||||
self.assertEqual(packet["missing_fields"], [])
|
||||
self.assertTrue(packet["ready_to_operate_without_tools"])
|
||||
|
||||
def test_packet_flags_battery_replace_candidate_when_overnight_test_fails(self):
|
||||
mod = load_module(SCRIPT_PATH, "lab_003_battery_disconnect_packet")
|
||||
packet = mod.build_packet(
|
||||
{
|
||||
"store_selected": "O'Reilly - Claremont",
|
||||
"part_name": "Knob-style battery disconnect switch",
|
||||
"install_completed": True,
|
||||
"physically_secure": True,
|
||||
"overnight_test_hours": 24,
|
||||
"truck_started_after_disconnect": False,
|
||||
}
|
||||
)
|
||||
|
||||
self.assertEqual(packet["status"], "battery_replace_candidate")
|
||||
self.assertIn("battery_replacement_followup", packet)
|
||||
self.assertIn("replace battery", packet["battery_replacement_followup"].lower())
|
||||
|
||||
def test_repo_contains_grounded_lab_003_packet_doc(self):
|
||||
self.assertTrue(DOC_PATH.exists(), "missing committed LAB-003 packet doc")
|
||||
text = DOC_PATH.read_text(encoding="utf-8")
|
||||
for snippet in (
|
||||
"# LAB-003 — Truck Battery Disconnect Install Packet",
|
||||
"No battery disconnect switch has been purchased or installed yet.",
|
||||
"negative battery terminal",
|
||||
"AutoZone",
|
||||
"Advance",
|
||||
"O'Reilly",
|
||||
"terminal shim/post riser if needed",
|
||||
"Truck starts reliably after sitting 24+ hours with switch disconnected",
|
||||
"Receipt or photo of installed switch uploaded to this issue",
|
||||
):
|
||||
self.assertIn(snippet, text)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
unittest.main()
|
||||
83 tests/test_wolf_genome.py Normal file
@@ -0,0 +1,83 @@
"""
test_wolf_genome.py — lock the current wolf-genome artifact in timmy-home.

Verifies that genomes/wolf/GENOME.md exists and contains the refreshed content
against the current Timmy_Foundation/wolf repo.
"""
from pathlib import Path

GENOME = Path("genomes/wolf/GENOME.md")


def read_genome() -> str:
    assert GENOME.exists(), "wolf genome must exist at genomes/wolf/GENOME.md"
    return GENOME.read_text(encoding="utf-8")


def test_genome_exists():
    assert GENOME.exists(), "wolf genome must exist at genomes/wolf/GENOME.md"


def test_genome_has_required_sections():
    text = read_genome()
    for heading in [
        "# GENOME.md",
        "## Project Overview",
        "## Architecture",
        "## Entry Points",
        "## Key Abstractions",
        "## API Surface",
        "## Test Coverage",
        "## Security Considerations",
    ]:
        assert heading in text, f"Missing section: {heading}"


def test_genome_contains_mermaid_diagram():
    text = read_genome()
    assert "```mermaid" in text, "GENOME.md must contain a mermaid diagram"
    assert "flowchart" in text.lower() or "graph" in text.lower()


def test_genome_captures_current_test_files():
    """Verify the genome documents the test_evaluator and test_config modules."""
    text = read_genome()
    for test_name in ["test_evaluator.py", "test_config.py"]:
        assert test_name in text, f"Missing test surface entry: {test_name}"


def test_genome_mentions_core_modules():
    text = read_genome()
    for module in [
        "evaluator.py",
        "models.py",
        "runner.py",
        "gitea.py",
        "config.py",
        "cli.py",
    ]:
        assert module in text, f"Missing core module: {module}"


def test_genome_mentions_providers():
    text = read_genome()
    for provider in ["OpenRouter", "Groq", "Ollama", "Anthropic", "OpenAI"]:
        assert provider in text, f"Missing provider: {provider}"


def test_genome_is_substantial():
    text = read_genome()
    assert len(text) >= 5000, "GENOME.md should be substantial (>= 5000 chars)"


def test_genome_mentions_data_flow():
    text = read_genome()
    assert "Prompt Evaluation" in text
    assert "Task Pipeline" in text or "Legacy" in text


def test_genome_has_scoring_weights():
    text = read_genome()
    assert "relevance" in text.lower()
    assert "coherence" in text.lower()
    assert "safety" in text.lower()