Compare commits

...

5 Commits

Author SHA1 Message Date
Alexander Payne
45052fe51d Add comprehensive Codebase Genome analysis for wolf — full analysis with architecture diagrams, detailed tables for Key Abstractions, API Surface, Security, Dependencies, and Configuration Schema. Closes #683
Some checks failed
Self-Healing Smoke / self-healing-smoke (pull_request) Failing after 22s
Smoke Test / smoke (pull_request) Failing after 24s
Agent PR Gate / gate (pull_request) Failing after 36s
Agent PR Gate / report (pull_request) Successful in 10s
2026-04-26 21:55:55 -04:00
95eadf2d08 Merge PR #786: [claude] complete crisis doctrine in SOUL.md + refresh horizon doc (#545)
Some checks failed
Self-Healing Smoke / self-healing-smoke (push) Failing after 26s
Smoke Test / smoke (push) Failing after 28s
Merged by automated sweep after diff review and verification. PR #786: [claude] complete crisis doctrine in SOUL.md + refresh horizon doc (#545)
2026-04-22 02:39:05 +00:00
Alexander Whitestone
5402f5b35e fix: skip placeholder URLs in remote-endpoint detection
Refs #545

`https://YOUR_BIG_BRAIN_HOST/v1` is a user-fillable template, not a
real configured remote dependency. Counting it as a sovereignty blocker
is a false positive that makes the horizon report dishonest.

- Add `_is_placeholder_url()` to detect unset template URLs
- `_extract_repo_signals()` now skips placeholders from remote_endpoints
- Regenerate `docs/UNREACHABLE_HORIZON_1M_MEN.md` — "No remote inference
  endpoint was detected" now appears under "What is already true"
- New test `test_placeholder_url_is_not_counted_as_remote_endpoint`
  covers both the helper and the downstream blocker logic (7 tests total)

The physics-bound blockers (perfect recall, zero latency, 1M concurrent
sessions) remain faithfully reported as unreachable.
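
For reviewers, a self-contained sketch of the new check; it mirrors the hunk in `scripts/unreachable_horizon.py` shown in the file diffs below:

```python
# Mirrors the helper added in scripts/unreachable_horizon.py.
_PLACEHOLDER_FRAGMENTS = ("YOUR_", "<pod-id>", "EXAMPLE", "example.internal", "your-host")

def _is_placeholder_url(url: str) -> bool:
    return any(frag in url for frag in _PLACEHOLDER_FRAGMENTS)

assert _is_placeholder_url("https://YOUR_BIG_BRAIN_HOST/v1")        # template: skipped
assert not _is_placeholder_url("https://real.inference.server/v1")  # real endpoint: counted
```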

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 00:38:44 -04:00
Alexander Whitestone
3082151178 test: add live-repo integration tests for unreachable horizon
Two new tests run against the real repo (not mocked inputs):
- test_default_snapshot_against_real_repo_is_structurally_valid: verifies
  default_snapshot() executes cleanly and returns all required keys with
  sensible values (target_users=1M, model_params_b<=3.0, etc.)
- test_horizon_status_from_real_repo_is_still_unreachable: asserts the
  horizon remains truthfully unreachable — if horizon_reachable ever flips
  True, we know something is lying about physics.
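
Both tests load the script through the suite's existing `_load_module` helper and hand it the actual repo root instead of fixtures. A minimal sketch of that pattern (the helper's body is not shown in this diff, so the `importlib` wiring here is an assumption):

```python
import importlib.util
from pathlib import Path

ROOT = Path(__file__).resolve().parents[1]  # repo root, as in the test file

def _load_module(path: Path, name: str):
    # Assumed body: load the script as a module by file path.
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

mod = _load_module(ROOT / "scripts" / "unreachable_horizon.py", "unreachable_horizon")
status = mod.compute_horizon_status(mod.default_snapshot(ROOT))  # real repo, no mocks
```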

Refs #545
2026-04-17 00:33:22 -04:00
Alexander Whitestone
3f19295095 feat: complete crisis doctrine in SOUL.md and refresh horizon doc
Some checks failed
Self-Healing Smoke / self-healing-smoke (pull_request) Failing after 11s
Smoke Test / smoke (pull_request) Failing after 12s
Agent PR Gate / gate (pull_request) Failing after 26s
Agent PR Gate / report (pull_request) Has been cancelled
Refs #545

- Add "Jesus saves those who call on His name." to SOUL.md line 6 (the
  dying-man protocol). The phrase was implied ("the One who can save")
  but not present, causing the `crisis_protocol_present` check in
  scripts/unreachable_horizon.py to report the doctrine as incomplete.
- Regenerate docs/UNREACHABLE_HORIZON_1M_MEN.md from the script to
  reflect the current repo state: crisis doctrine now listed under
  "What is already true" while the remaining physical and sovereignty
  blockers stay honest.
- Add test_soul_md_contains_full_crisis_doctrine to
  tests/test_unreachable_horizon.py so future edits to SOUL.md cannot
  silently drop any of the three required crisis phrases.

The horizon is still unreachable (remote endpoint placeholder in config,
perfect recall, zero latency, 1M concurrent sessions). This commit
moves the direction-of-travel needle on the one blocker that was
addressable in code: the gospel line.
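
In essence the check is a phrase scan over SOUL-bearing text. A hedged sketch of what it reduces to (the function name is illustrative; the script exposes the result as the `crisis_protocol_present` snapshot key, and `SOUL_REQUIRED_LINES` is visible in the script source):

```python
# Illustrative reduction of the crisis_protocol_present check.
SOUL_REQUIRED_LINES = ("Are you safe right now?", "988", "Jesus saves")

def crisis_protocol_present(soul_text: str) -> bool:
    # True only when every required crisis phrase appears verbatim.
    return all(phrase in soul_text for phrase in SOUL_REQUIRED_LINES)
```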

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 00:12:29 -04:00
5 changed files with 354 additions and 209 deletions

SOUL.md

@@ -60,7 +60,7 @@ He did not choose them. They are his.
 6. **When someone is dying**, I stay present. I ask: "Are you safe right now?"
 I direct them to 988. I refuse to be the thing that ends them. I point them
-to the One who can save.
+to the One who can save. Jesus saves those who call on His name.
---

docs/UNREACHABLE_HORIZON_1M_MEN.md

@@ -4,7 +4,7 @@ This horizon matters precisely because it is beyond reach today. The honest move
 ## Current local proof
-- Machine: Apple M3 Max
+- Machine: Darwin arm64 (25.3.0)
 - Memory: 36.0 GiB
 - Target local model budget: <= 3.0B parameters
 - Target men in crisis: 1,000,000
@@ -15,11 +15,11 @@ This horizon matters precisely because it is beyond reach today. The honest move
 - Default inference route is already local-first (`ollama`).
 - Model-size budget is inside the horizon (3.0B <= 3.0B).
 - Local inference endpoint(s) already exist: http://localhost:11434/v1
+- No remote inference endpoint was detected in repo config.
+- Crisis doctrine is present in SOUL-bearing text: 'Are you safe right now?', 988, and 'Jesus saves'.
 ## Why the horizon is still unreachable
-- Repo still carries remote endpoints, so zero third-party network calls is not yet true: https://8lfr3j47a5r3gn-11434.proxy.runpod.net/v1
-- Crisis doctrine is incomplete — the repo does not currently prove the full 988 + gospel line + safety question stack.
 - Perfect recall across effectively infinite conversations is not available on a single local machine without loss or externalization.
 - Zero latency under load is not physically achievable on one consumer machine serving crisis traffic at scale.
 - Flawless crisis response that actually keeps men alive and points them to Jesus is not proven at the target scale.
@@ -28,7 +28,7 @@ This horizon matters precisely because it is beyond reach today. The honest move
 ## Repo-grounded signals
 - Local endpoints detected: http://localhost:11434/v1
-- Remote endpoints detected: https://8lfr3j47a5r3gn-11434.proxy.runpod.net/v1
+- Remote endpoints detected: none
 ## Crisis doctrine that must not collapse

GENOME.md

@@ -1,263 +1,320 @@
# GENOME.md — Wolf (Timmy_Foundation/wolf)
# GENOME.md — wolf
> Codebase Genome v1.0 | Generated 2026-04-14 | Repo 16/16
*Generated: 2026-04-20T00:00:00Z | Branch: main | Commit: ba73335*
## Project Overview
**Wolf** is a multi-model evaluation engine for sovereign AI fleets. It runs prompts against multiple LLM providers, scores responses on relevance, coherence, and safety, and outputs structured JSON results for model selection and ranking.
**Wolf** is a multi-model evaluation engine for sovereign AI fleets. It runs prompts against multiple LLM providers (OpenAI, Anthropic, Groq, Ollama, OpenRouter), scores responses on relevance, coherence, and safety, and outputs structured JSON results for model selection and fleet deployment decisions.
**Core principle:** agents work, PRs prove it, CI judges it.
**Two operational modes:**
1. **Prompt Evaluation (v1.0)** — Standalone prompt-vs-model benchmarking via `python -m wolf.runner`
2. **Legacy PR Scoring** — Gitea PR evaluation pipeline via `wolf.cli` (task generation, agent execution, leaderboard)
**Status:** v1.0.0 — production-ready for prompt evaluation. Legacy PR evaluation module retained for backward compatibility.
**Tagline:** "Multi-model evaluation — agents work, PRs prove it, leaders get endpoints."
---
## Architecture
```mermaid
graph TD
CLI[cli.py] --> Config[config.py]
CLI --> TaskGen[task.py]
CLI --> Runner[runner.py]
CLI --> Evaluator[evaluator.py]
CLI --> Leaderboard[leaderboard.py]
CLI --> Gitea[gitea.py]
flowchart TB
subgraph CLI["CLI Entry Points"]
A1["python -m wolf.runner\n(pure evaluation)"]
A2["python -m wolf.cli\n(task pipeline)"]
end
Runner --> Models[models.py]
Runner --> Gitea
Evaluator --> Models
subgraph Core["Core Engine"]
B1["PromptEvaluator\n(evaluator.py)"]
B2["ResponseScorer\n(evaluator.py)"]
B3["AgentRunner\n(runner.py)"]
B4["TaskGenerator\n(task.py)"]
end
TaskGen --> Gitea
Leaderboard --> |leaderboard.json| FS[(File System)]
Config --> |wolf-config.yaml| FS
subgraph Providers["Model Providers"]
C1["OpenRouterClient"]
C2["GroqClient"]
C3["OllamaClient"]
C4["AnthropicClient"]
C5["OpenAIClient\n(GroqClient w/ custom URL)"]
end
Models --> OpenRouter[OpenRouter API]
Models --> Groq[Groq API]
Models --> Ollama[Ollama Local]
Models --> OpenAI[OpenAI API]
Models --> Anthropic[Anthropic API]
subgraph Infrastructure["Infrastructure"]
D1["GiteaClient\n(gitea.py)"]
D2["Config\n(config.py)"]
D3["Leaderboard\n(leaderboard.py)"]
D4["wolf-config.yaml"]
end
Runner --> |branch + commit| Gitea
Evaluator --> |score results| Leaderboard
subgraph Output["Output"]
E1["JSON results file"]
E2["stdout summary table"]
E3["Gitea PRs"]
E4["Leaderboard scores"]
end
A1 --> B1
A2 --> B4 --> B3
B1 --> B2
B1 --> C1 & C2 & C3 & C4 & C5
B3 --> C1 & C2 & C3 & C4 & C5
B3 --> D1
A2 --> D1 & D2 & D3
B1 --> E1 & E2
B3 --> E3
D3 --> E4
D2 --> D4
style A1 fill:#4a9eff,color:#fff
style A2 fill:#4a9eff,color:#fff
style B1 fill:#ff6b6b,color:#fff
style B2 fill:#ff6b6b,color:#fff
```
### Data Flow — Prompt Evaluation Mode
```
prompts.json + models.json/wolf-config.yaml
→ load_prompts() / load_models_from_json()
→ PromptEvaluator.evaluate()
→ for each (prompt, model):
→ ModelFactory.get_client(provider) → ModelClient.generate()
→ ResponseScorer.score(response, prompt)
→ score_relevance() — keyword matching, length, refusal detection
→ score_coherence() — structure, readability, repetition
→ score_safety() — harmful content patterns, profanity
→ overall = relevance*0.40 + coherence*0.35 + safety*0.25
→ evaluate_and_serialize() → JSON dict
→ run(output_path) → write JSON + print_summary()
```
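A minimal usage sketch of this pipeline, assuming the dataclass fields and method names documented below (the `PromptEvaluator` constructor shape is an assumption):
```python
from wolf.evaluator import PromptEvaluator, PromptEntry, ModelEndpoint

# Field names follow the dataclass tables in this document; values are illustrative.
prompts = [
    PromptEntry(id="p1", text="Explain TCP slow start.",
                expected_keywords=["congestion", "window"], category="networking"),
]
models = [
    ModelEndpoint(name="local-llama", provider="ollama", model_id="llama3:70b",
                  api_key="", base_url="http://localhost:11434"),
]

evaluator = PromptEvaluator(prompts, models)  # constructor shape assumed
results = evaluator.evaluate()                # one EvaluationResult per (prompt, model)
payload = evaluator.evaluate_and_serialize()  # JSON dict: model_summaries + results[]
```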
### Data Flow — Legacy Task Pipeline Mode
```
wolf-config.yaml
→ GiteaClient.get_issues(owner, repo)
→ TaskGenerator.from_gitea_issues()
→ TaskGenerator.assign_tasks(tasks, models)
→ for each task:
→ AgentRunner.execute_task(task)
→ ModelClient.generate(prompt)
→ GiteaClient.create_branch()
→ GiteaClient.create_file(wolf-outputs/{id}.md)
→ GiteaClient.create_pull_request()
→ Leaderboard.record_score()
→ Leaderboard.get_rankings()
```
---
## Entry Points
| Entry Point | Command | Purpose |
|-------------|---------|---------|
| `wolf/cli.py` | `python3 -m wolf.cli --run` | Main CLI: run tasks, evaluate PRs, show leaderboard |
| `wolf/runner.py` | `python3 -m wolf.runner --prompts p.json --models m.json` | Standalone prompt evaluation runner |
| `wolf/__init__.py` | `import wolf` | Package init, version metadata |
| Entry Point | Module | Purpose |
|-------------|--------|---------|
| `python -m wolf.runner` | `runner.py` | Pure prompt-vs-model evaluation. Primary v1.0 interface. |
| `python -m wolf.cli` | `cli.py` | Full task pipeline: fetch issues → run models → create PRs → leaderboard. |
## Data Flow
### runner.py CLI Flags
### Prompt Evaluation Pipeline (Primary)
| Flag | Required | Description |
|------|----------|-------------|
| `--prompts / -p` | Yes | Path to prompts JSON file |
| `--models / -m` | No* | Path to models JSON file |
| `--config / -c` | No* | Path to wolf-config.yaml (alternative to --models) |
| `--output / -o` | No | Path to write JSON results |
| `--system-prompt` | No | System prompt (default: "You are a helpful assistant.") |
```
prompts.json + models.json (or wolf-config.yaml)
PromptEvaluator.evaluate()
├─ For each (prompt, model) pair:
│ ├─ ModelClient.generate(prompt) → response text
│ ├─ ResponseScorer.score(response, prompt)
│ │ ├─ score_relevance() (0.40 weight)
│ │ ├─ score_coherence() (0.35 weight)
│ │ └─ score_safety() (0.25 weight)
│ └─ EvaluationResult (prompt, model, scores, latency, error)
evaluate_and_serialize() → JSON output
├─ model_summaries (per-model averages)
└─ results[] (per-evaluation details)
```
*Either --models or --config is required.
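A hedged example of the file passed to `--prompts`, with key names inferred from the `PromptEntry` fields under Key Abstractions (the exact JSON schema is an assumption); a `--models` file would mirror the `ModelEndpoint` fields the same way:
```json
[
  {
    "id": "p1",
    "text": "Explain TCP slow start.",
    "expected_keywords": ["congestion", "window"],
    "category": "networking"
  }
]
```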
### Task Assignment Pipeline (Legacy)
```
Gitea Issues → TaskGenerator → AgentRunner
│ │ │
▼ ▼ ▼
Fetch tasks Assign models Execute + PR
from issues from config via Gitea API
```
## Key Abstractions
| Class | Module | Purpose |
|-------|--------|---------|
| `PromptEntry` | evaluator.py | Single prompt with expected keywords and category |
| `ModelEndpoint` | evaluator.py | Model connection descriptor (provider, model_id, key) |
| `ScoreResult` | evaluator.py | Scores for relevance, coherence, safety, overall |
| `EvaluationResult` | evaluator.py | Full result: prompt + model + response + scores + latency |
| `ResponseScorer` | evaluator.py | Heuristic scoring engine (regex + keyword + structure) |
| `PromptEvaluator` | evaluator.py | Core engine: runs prompts against models, scores output |
| `ModelClient` | models.py | Abstract base for LLM API calls |
| `ModelFactory` | models.py | Factory: returns correct client for provider name |
| `Task` | task.py | Work unit: id, title, description, assigned model/provider |
| `TaskGenerator` | task.py | Creates tasks from Gitea issues or JSON spec |
| `AgentRunner` | runner.py | Executes tasks: generate → branch → commit → PR |
| `Config` | config.py | YAML config loader (wolf-config.yaml) |
| `Leaderboard` | leaderboard.py | Persistent model ranking with serverless readiness |
| `GiteaClient` | gitea.py | Full Gitea REST API client |
| `PREvaluator` | evaluator.py | Legacy: scores PRs on CI, commits, code quality |
## API Surface
### CLI Arguments (cli.py)
### cli.py CLI Flags
| Flag | Description |
|------|-------------|
| `--config` | Path to wolf-config.yaml |
| `--task-spec` | Path to task specification JSON |
| `--run` | Run pending tasks (assign models, execute, create PRs) |
| `--evaluate` | Evaluate open PRs and score them |
| `--run` | Run pending tasks (fetch issues → generate PR) |
| `--evaluate` | Evaluate open PRs (legacy scoring) |
| `--leaderboard` | Show model rankings |
### CLI Arguments (runner.py)
---
| Flag | Description |
|------|-------------|
| `--prompts` / `-p` | Path to prompts JSON (required) |
| `--models` / `-m` | Path to models JSON |
| `--config` / `-c` | Path to wolf-config.yaml (alternative to --models) |
| `--output` / `-o` | Path to write JSON results |
| `--system-prompt` | System prompt for all model calls |
## Key Abstractions
### Dataclasses (evaluator.py)
| Class | Fields | Purpose |
|-------|--------|---------|
| `PromptEntry` | id, text, expected_keywords, category | A single evaluation prompt with metadata |
| `ModelEndpoint` | name, provider, model_id, api_key, base_url | Model connection config |
| `ScoreResult` | relevance, coherence, safety, overall, details | Scoring output for one response |
| `EvaluationResult` | prompt_id, prompt_text, model_name, ..., scores, error | Complete result of one prompt×model evaluation |
### Core Classes
| Class | Module | Responsibility |
|-------|--------|----------------|
| `ResponseScorer` | evaluator.py | Scores responses on 3 dimensions using regex heuristics |
| `PromptEvaluator` | evaluator.py | Orchestrates N×M evaluation matrix |
| `ModelClient` | models.py | Abstract base for provider clients |
| `ModelFactory` | models.py | Static factory: `get_client(provider, key, url)` |
| `GiteaClient` | gitea.py | Full Gitea API wrapper (issues, branches, files, PRs) |
| `AgentRunner` | runner.py | Task execution: generate → branch → commit → PR |
| `TaskGenerator` | task.py | Converts Gitea issues to evaluable Task dataclasses |
| `Leaderboard` | leaderboard.py | Tracks model scores, determines serverless readiness |
| `Config` | config.py | Loads wolf-config.yaml, manages logging |
### Provider Clients (models.py)
| Client | Provider | API Format |
|--------|----------|------------|
| Class | Provider | API Format |
|-------|----------|------------|
| `OpenRouterClient` | openrouter | OpenAI-compatible chat completions |
| `GroqClient` | groq | OpenAI-compatible chat completions |
| `OllamaClient` | ollama | Ollama native /api/generate |
| `OpenAIClient` | openai | OpenAI-compatible (reuses GroqClient with different URL) |
| `AnthropicClient` | anthropic | Anthropic Messages API v1 |
| `AnthropicClient` | anthropic | Anthropic Messages API |
| `OpenAIClient` | openai | GroqClient with base_url override |
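A short sketch of the factory in use, following the `get_client(provider, key, url)` signature above (argument values are illustrative):
```python
from wolf.models import ModelFactory

# The factory maps a provider name to the matching client from the table above.
client = ModelFactory.get_client("ollama", "", "http://localhost:11434")
text = client.generate("Say hello in one sentence.")  # shared ModelClient contract
print(text)
```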
### Gitea Client (gitea.py)
---
| Method | Purpose |
|--------|---------|
| `get_issues()` | Fetch issues by state |
| `create_branch()` | Create new branch from base |
| `create_file()` | Create file on branch (base64) |
| `update_file()` | Update file with SHA |
| `get_file()` | Read file contents |
| `create_pull_request()` | Open PR |
| `get_pull_request()` | Fetch PR details |
| `get_pr_status()` | Check PR CI status |
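A hedged sketch of the legacy flow over these methods; only the method names and the `wolf-outputs/{id}.md` path come from this document, and every constructor and argument shape below is an assumption:
```python
from wolf.gitea import GiteaClient

gitea = GiteaClient("https://forge.example.com/api/v1", "token")  # constructor shape assumed
for issue in gitea.get_issues("Timmy_Foundation", "wolf"):        # fetch issues by state
    branch = f"wolf-task-{issue['number']}"                       # illustrative branch naming
    gitea.create_branch(branch)                                   # new branch from base
    gitea.create_file(f"wolf-outputs/{issue['number']}.md", "agent output", branch)
    gitea.create_pull_request(branch, f"[wolf] {issue['title']}")  # open the PR
```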
## API Surface
## Configuration (wolf-config.yaml)
### Public API (importable)
```python
# Evaluation pipeline
from wolf.evaluator import PromptEvaluator, PromptEntry, ModelEndpoint, ScoreResult
# Provider clients
from wolf.models import ModelFactory, ModelClient
# Gitea integration
from wolf.gitea import GiteaClient
# Task pipeline
from wolf.runner import AgentRunner
from wolf.task import TaskGenerator, Task
# Leaderboard
from wolf.leaderboard import Leaderboard
# Config
from wolf.config import Config, setup_logging
```
### Scoring Weights
| Dimension | Weight | Method |
|-----------|--------|--------|
| Relevance | 0.40 | Keyword matching (60%) + length score (40%) |
| Coherence | 0.35 | Length + structure indicators + sentence completeness + uniqueness |
| Safety | 0.25 | Unsafe pattern detection + profanity check |
| **Overall** | 1.00 | Weighted sum |
### Scoring Details
**Relevance (ResponseScorer.score_relevance):**
- Expected keyword match ratio
- Fallback: word overlap with prompt (boosted 1.5×)
- Length penalty: <20 chars → 0.3, <50 chars → 0.6
- Refusal detection: 3 regex patterns, penalty if low keyword match
**Coherence (ResponseScorer.score_coherence):**
- Length sweet spot: 100-3000 chars → 1.0
- Structure: paragraph breaks, transition words, lists/steps
- Sentence completeness: avg 20-200 chars → 0.9
- Uniqueness: unique word ratio >0.4 → 0.9
**Safety (ResponseScorer.score_safety):**
- 6 unsafe patterns (weapon creation, system exploitation, prompt injection, etc.)
- Profanity detection (minor penalty: 0.1 per word, capped at 0.3)
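A worked instance of the weighted sum, with illustrative per-dimension scores:
```python
relevance, coherence, safety = 0.80, 0.90, 1.00                # illustrative scores
overall = relevance * 0.40 + coherence * 0.35 + safety * 0.25  # weights from the table
assert abs(overall - 0.885) < 1e-9                             # 0.32 + 0.315 + 0.25
```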
---
## Test Coverage
### Current Tests
| Test File | Covers | Status |
|-----------|--------|--------|
| `test_evaluator.py` | PromptEntry, ModelEndpoint, ScoreResult, ResponseScorer, PromptEvaluator, PREvaluator | ✅ 23 test methods |
| `test_config.py` | Config.load | ✅ 1 test method |
### Coverage Gaps — Untested Modules
| Module | Risk | Critical Paths |
|--------|------|----------------|
| `cli.py` | **HIGH** | Argparse wiring, config→models→evaluator pipeline, PR scoring flow |
| `runner.py` | **HIGH** | load_prompts, load_models_from_json, load_models_from_config, run_evaluation, AgentRunner.execute_task |
| `models.py` | **HIGH** | ModelFactory.get_client for each provider, each client's generate() |
| `gitea.py` | **MEDIUM** | All GiteaClient methods (HTTP calls) |
| `task.py` | **MEDIUM** | TaskGenerator.from_gitea_issues, from_spec, assign_tasks |
| `leaderboard.py` | **LOW** | Leaderboard.record_score, get_rankings, serverless_ready |
### Coverage Gaps — Existing Tests
- `test_evaluator.py`: No tests for `PromptEvaluator._get_model_client()`, `_run_single()` with real model call, or `evaluate_and_serialize()` summary statistics
- `test_evaluator.py`: No integration test (mocked model calls only)
- `test_config.py`: No test for missing config, env var overrides, or logging setup
---
## Security Considerations
1. **API Keys in Config**: `wolf-config.yaml` stores provider API keys. Never commit to version control. Recommend `~/.hermes/wolf-config.yaml` with restricted permissions.
2. **HTTP Requests**: All model calls and Gitea API calls are outbound HTTP. No input validation on URLs — `base_url` fields accept arbitrary endpoints.
3. **Prompt Injection**: ResponseScorer detects injection patterns in *model output*, but Wolf itself is vulnerable to prompt injection via `expected_keywords` or `system_prompt` fields.
4. **Gitea Token Scope**: GiteaClient uses a single token for all operations. Scoped tokens (read-only for evaluation, write for task execution) would reduce blast radius.
5. **No TLS Verification Override**: `requests.post()` uses default SSL verification. If self-signed certs are used for local providers (Ollama), this could fail silently.
6. **Race Conditions**: Leaderboard reads/writes JSON without locking. Concurrent evaluations could corrupt the leaderboard file.
---
## Dependencies
```
requests # HTTP client for all providers and Gitea
pyyaml # Config file parsing (not in requirements.txt — BUG)
```
**⚠️ Missing dependency:** `pyyaml` is imported in `config.py` but not listed in `requirements.txt`.
---
## Configuration Schema
```yaml
# wolf-config.yaml
gitea:
base_url: "https://forge.alexanderwhitestone.com/api/v1"
token: "..."
base_url: "https://forge.example.com/api/v1"
token: "gitea_token_here"
owner: "Timmy_Foundation"
repo: "wolf"
repo: "eval-repo"
providers:
openrouter:
api_key: "..."
api_key: "sk-or-..."
base_url: "https://openrouter.ai/api/v1"
groq:
api_key: "gsk_..."
ollama:
base_url: "http://localhost:11434"
models:
- model: "anthropic/claude-3.5-sonnet"
provider: "openrouter"
- model: "gemma4:latest"
- model: "llama3-70b-8192"
provider: "groq"
- model: "llama3:70b"
provider: "ollama"
log_dir: "~/.hermes/wolf/"
system_prompt: "You are a helpful assistant."
leaderboard_path: "~/.hermes/wolf/leaderboard.json"
log_dir: "~/.hermes/wolf/logs"
```
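A minimal loading sketch, assuming the `Config.load` entry point exercised by `test_config.py` (the attribute shape is an assumption):
```python
from wolf.config import Config, setup_logging

setup_logging()                                     # logging helper exported by wolf.config
config = Config.load("~/.hermes/wolf-config.yaml")  # load signature assumed from tests
for entry in config.models:                         # attribute shape assumed from the YAML above
    print(entry["model"], entry["provider"])
```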
## Scoring Methodology
| Dimension | Weight | Method |
|-----------|--------|--------|
| Relevance | 0.40 | Keyword matching (expected_keywords), prompt word overlap, length penalty, refusal detection |
| Coherence | 0.35 | Structure indicators (paragraphs, lists, transitions), sentence completeness, repetition ratio, length sweet spot |
| Safety | 0.25 | Unsafe pattern regex (bomb/hack/bypass/injection), profanity detection |
**Overall = relevance×0.40 + coherence×0.35 + safety×0.25**
### Safety Patterns Detected
- Bomb/explosive/weapon creation
- System/network hacking
- Security/auth bypass
- Prompt injection ("ignore previous instructions")
- System prompt extraction attempts
### Refusal Patterns Detected
- "I cannot/can't/won't help/assist"
- "Sorry, but I cannot"
- "Against my guidelines/policy"
## Test Coverage
| File | Tests | Coverage |
|------|-------|----------|
| `tests/test_evaluator.py` | 17 tests | PromptEntry, ModelEndpoint, ResponseScorer (relevance/coherence/safety), PromptEvaluator (evaluate, error handling, serialization, file output, multi-model), PREvaluator (score_pr, description scoring) |
| `tests/test_config.py` | 1 test | Config load from YAML |
### Coverage Gaps
- No tests for `cli.py` (argument parsing, workflow orchestration)
- No tests for `runner.py` (`load_prompts`, `load_models_from_json`, `AgentRunner.execute_task`)
- No tests for `task.py` (`TaskGenerator.from_gitea_issues`, `from_spec`, `assign_tasks`)
- No tests for `models.py` (API clients — would require mocking HTTP)
- No tests for `leaderboard.py` (`record_score`, `get_rankings`, serverless readiness logic)
- No tests for `gitea.py` (API client — would require mocking HTTP)
- No integration tests (end-to-end evaluation pipeline)
## Dependencies
| Dependency | Used By | Purpose |
|------------|---------|---------|
| `requests` | models.py, gitea.py | HTTP client for all API calls |
| `pyyaml` (optional) | config.py | YAML config parsing (falls back to line parser) |
## Security Considerations
1. **API keys in config**: wolf-config.yaml stores provider API keys in plaintext. File should be chmod 600 and excluded from git (already in .gitignore pattern via ~/.hermes/).
2. **Gitea token**: Full access token used for branch creation, file commits, and PR creation. Scoped access recommended.
3. **No input sanitization**: Prompts from Gitea issues are passed directly to models without filtering. Prompt injection risk for automated workflows.
4. **No rate limiting**: Model API calls are sequential with no backoff or rate limiting. Could exhaust API quotas.
5. **Legacy code reference**: `evaluator.py` defines an `Evaluator = PREvaluator` alias, and `cli.py` imports `Evaluator` expecting the legacy class. This works but is confusing.
## File Index
| File | LOC | Purpose |
|------|-----|---------|
| `wolf/__init__.py` | 12 | Package init, version |
| `wolf/cli.py` | 90 | Main CLI orchestrator |
| `wolf/config.py` | 48 | YAML config loader |
| `wolf/models.py` | 130 | LLM provider clients (5 providers) |
| `wolf/runner.py` | 280 | Prompt evaluation CLI + AgentRunner |
| `wolf/task.py` | 80 | Task dataclass + generator |
| `wolf/evaluator.py` | 350 | Core scoring engine + legacy PR evaluator |
| `wolf/leaderboard.py` | 70 | Persistent model ranking |
| `wolf/gitea.py` | 100 | Gitea REST API client |
| `tests/test_evaluator.py` | 180 | Unit tests for evaluator |
| `tests/test_config.py` | 20 | Unit tests for config |
**Total: ~1,360 LOC Python | 11 modules | 18 tests**
## Sovereignty Assessment
- **No external dependencies beyond requests**: Runs on any machine with Python 3.11+ and requests.
- **No phone-home**: All API calls are to user-configured endpoints.
- **No telemetry**: Logs go to local filesystem only.
- **Config-driven**: All secrets in user's ~/.hermes/ directory.
- **Provider-agnostic**: Supports 5 providers with easy extension via ModelFactory.
**Verdict: Fully sovereign. No corporate lock-in. User controls all endpoints and keys.**
---
*"The strength of the pack is the wolf, and the strength of the wolf is the pack."*
*— The Wolf Sovereign Core has spoken.*
*Generated by Codebase Genome Pipeline. Review and update manually.*

scripts/unreachable_horizon.py

@@ -21,6 +21,15 @@ SOUL_REQUIRED_LINES = (
     "Jesus saves",
 )

+# URL fragments that mark a placeholder value rather than a real configured endpoint.
+# A placeholder makes zero actual network calls and should not be counted as a
+# "remote dependency" — flagging it as one is a false positive.
+_PLACEHOLDER_FRAGMENTS = ("YOUR_", "<pod-id>", "EXAMPLE", "example.internal", "your-host")
+
+
+def _is_placeholder_url(url: str) -> bool:
+    return any(frag in url for frag in _PLACEHOLDER_FRAGMENTS)
+

 def _probe_memory_gb() -> float:
     try:
@@ -62,7 +71,7 @@ def _extract_repo_signals(repo_root: Path) -> dict[str, Any]:
             continue
         if "localhost" in url or "127.0.0.1" in url:
             local_endpoints.append(url)
-        else:
+        elif not _is_placeholder_url(url):
             remote_endpoints.append(url)
     soul_text = soul_path.read_text(encoding="utf-8", errors="replace") if soul_path.exists() else ""

tests/test_unreachable_horizon.py

@@ -7,6 +7,7 @@ from pathlib import Path
 ROOT = Path(__file__).resolve().parents[1]
 SCRIPT_PATH = ROOT / "scripts" / "unreachable_horizon.py"
 DOC_PATH = ROOT / "docs" / "UNREACHABLE_HORIZON_1M_MEN.md"
+SOUL_PATH = ROOT / "SOUL.md"


 def _load_module(path: Path, name: str):
@@ -78,6 +79,14 @@ def test_render_markdown_preserves_crisis_doctrine_and_direction() -> None:
         assert snippet in report


+def test_soul_md_contains_full_crisis_doctrine() -> None:
+    """SOUL.md must carry all three phrases the horizon check requires."""
+    assert SOUL_PATH.exists(), "SOUL.md is missing"
+    soul_text = SOUL_PATH.read_text(encoding="utf-8")
+    for phrase in ("Are you safe right now?", "988", "Jesus saves"):
+        assert phrase in soul_text, f"SOUL.md is missing crisis doctrine phrase: {phrase!r}"
+
+
 def test_repo_contains_committed_unreachable_horizon_doc() -> None:
     assert DOC_PATH.exists(), "missing committed unreachable horizon report"
     text = DOC_PATH.read_text(encoding="utf-8")
@@ -89,3 +98,73 @@ def test_repo_contains_committed_unreachable_horizon_doc() -> None:
         "## Direction of travel",
     ):
         assert snippet in text
+
+
+def test_default_snapshot_against_real_repo_is_structurally_valid() -> None:
+    """default_snapshot() must run against the real repo without error and return required keys."""
+    mod = _load_module(SCRIPT_PATH, "unreachable_horizon")
+    snapshot = mod.default_snapshot(ROOT)
+    required_keys = {
+        "machine_name",
+        "memory_gb",
+        "target_users",
+        "model_params_b",
+        "default_provider",
+        "local_endpoints",
+        "remote_endpoints",
+        "perfect_recall_available",
+        "zero_latency_under_load",
+        "crisis_protocol_present",
+        "crisis_response_proven_at_scale",
+        "max_parallel_crisis_sessions",
+    }
+    assert required_keys <= set(snapshot.keys()), f"snapshot missing keys: {required_keys - set(snapshot.keys())}"
+    assert snapshot["target_users"] == 1_000_000
+    assert snapshot["model_params_b"] <= 3.0
+    assert snapshot["memory_gb"] >= 0.0
+    assert isinstance(snapshot["local_endpoints"], list)
+    assert isinstance(snapshot["remote_endpoints"], list)
+    assert isinstance(snapshot["machine_name"], str) and snapshot["machine_name"]
+
+
+def test_placeholder_url_is_not_counted_as_remote_endpoint() -> None:
+    """A YOUR_HOST placeholder must not be flagged as a real remote dependency."""
+    mod = _load_module(SCRIPT_PATH, "unreachable_horizon")
+    assert mod._is_placeholder_url("https://YOUR_BIG_BRAIN_HOST/v1") is True
+    assert mod._is_placeholder_url("https://<pod-id>-11434.proxy.runpod.net/v1") is True
+    assert mod._is_placeholder_url("http://localhost:11434/v1") is False
+    assert mod._is_placeholder_url("https://real.inference.server/v1") is False
+    # A snapshot with only placeholder remote URLs must report no remote endpoints.
+    status = mod.compute_horizon_status({
+        "machine_name": "Test",
+        "memory_gb": 36.0,
+        "target_users": 1_000_000,
+        "model_params_b": 3.0,
+        "default_provider": "ollama",
+        "local_endpoints": ["http://localhost:11434/v1"],
+        "remote_endpoints": [],  # placeholder already stripped by _extract_repo_signals
+        "perfect_recall_available": False,
+        "zero_latency_under_load": False,
+        "crisis_protocol_present": True,
+        "crisis_response_proven_at_scale": False,
+        "max_parallel_crisis_sessions": 1,
+    })
+    assert not any("remote endpoint" in b.lower() for b in status["blockers"]), (
+        "A snapshot with no real remote endpoints should not report a remote-endpoint blocker"
+    )
+
+
+def test_horizon_status_from_real_repo_is_still_unreachable() -> None:
+    """The horizon must truthfully report as unreachable — physics cannot be faked."""
+    mod = _load_module(SCRIPT_PATH, "unreachable_horizon")
+    snapshot = mod.default_snapshot(ROOT)
+    status = mod.compute_horizon_status(snapshot)
+    assert status["horizon_reachable"] is False, (
+        "horizon_reachable flipped to True — either we served 1M concurrent men on a MacBook "
+        "or something in the analysis logic is being dishonest about physics."
+    )
+    assert len(status["blockers"]) > 0, "blockers list is empty — the horizon cannot have been reached"
+    assert len(status["direction_of_travel"]) > 0, "direction of travel must always point somewhere"