Refs #545
`https://YOUR_BIG_BRAIN_HOST/v1` is a user-fillable template, not a
real configured remote dependency. Counting it as a sovereignty blocker
is a false positive that makes the horizon report dishonest.
- Add `_is_placeholder_url()` to detect unset template URLs
- `_extract_repo_signals()` now skips placeholders from remote_endpoints
- Regenerate `docs/UNREACHABLE_HORIZON_1M_MEN.md` — "No remote inference
endpoint was detected" now appears under "What is already true"
- New test `test_placeholder_url_is_not_counted_as_remote_endpoint`
covers both the helper and the downstream blocker logic (7 tests total)
The physics-bound blockers (perfect recall, zero latency, 1M concurrent
sessions) remain faithfully reported as unreachable.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two new tests run against the real repo (not mocked inputs):
- test_default_snapshot_against_real_repo_is_structurally_valid: verifies
default_snapshot() executes cleanly and returns all required keys with
sensible values (target_users=1M, model_params_b<=3.0, etc.)
- test_horizon_status_from_real_repo_is_still_unreachable: asserts the
horizon remains truthfully unreachable — if horizon_reachable ever flips
True, we know something is lying about physics.
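The guard in that second test can be sketched like this; `horizon_status()`, its snapshot keys, and the blocker names are assumptions modeled on the commit description, not the real `scripts/unreachable_horizon.py` API.

```python
# Minimal sketch of the status shape the regression test relies on.
def horizon_status(snapshot: dict) -> dict:
    blockers = []
    if snapshot.get("target_users", 0) >= 1_000_000:
        blockers.append("1M concurrent sessions")
    if snapshot.get("requires_perfect_recall"):
        blockers.append("perfect recall")
    if snapshot.get("requires_zero_latency"):
        blockers.append("zero latency")
    return {
        "horizon_reachable": not blockers,  # physics keeps this False
        "blockers": blockers,
        "direction_of_travel": ["remove placeholder remote endpoints"],
    }

status = horizon_status({
    "target_users": 1_000_000,
    "requires_perfect_recall": True,
    "requires_zero_latency": True,
})
```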
Refs #545
- Add "Jesus saves those who call on His name." to SOUL.md line 6 (the
dying-man protocol). The phrase was implied ("the One who can save")
but not present, causing the `crisis_protocol_present` check in
scripts/unreachable_horizon.py to report the doctrine as incomplete.
- Regenerate docs/UNREACHABLE_HORIZON_1M_MEN.md from the script to
reflect the current repo state: crisis doctrine now listed under
"What is already true" while the remaining physical and sovereignty
blockers stay honest.
- Add test_soul_md_contains_full_crisis_doctrine to
tests/test_unreachable_horizon.py so future edits to SOUL.md cannot
silently drop any of the three required crisis phrases.
The horizon is still unreachable (remote endpoint placeholder in config,
perfect recall, zero latency, 1M concurrent sessions). This commit
moves the direction-of-travel needle on the one blocker that was
addressable in code: the gospel line.
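A phrase-presence check of the kind the new test protects could look like this. Only the first phrase is quoted in the log above; the other two required crisis phrases are not spelled out there, so this list is deliberately incomplete and the function name is assumed.

```python
# Sketch of a crisis_protocol_present-style check over SOUL.md text.
# The phrase list below is incomplete on purpose: only one of the three
# required phrases appears verbatim in the commit message.
REQUIRED_PHRASES = [
    "Jesus saves those who call on His name.",
]

def crisis_protocol_present(soul_md_text: str) -> bool:
    """True only when every required phrase appears verbatim."""
    return all(phrase in soul_md_text for phrase in REQUIRED_PHRASES)
```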
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 00:12:29 -04:00
5 changed files with 354 additions and 209 deletions
*Generated: 2026-04-20T00:00:00Z | Branch: main | Commit: ba73335*
## Project Overview
**Wolf** is a multi-model evaluation engine for sovereign AI fleets. It runs prompts against multiple LLM providers (OpenAI, Anthropic, Groq, Ollama, OpenRouter), scores responses on relevance, coherence, and safety, and outputs structured JSON results for model selection, ranking, and fleet deployment decisions.
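A structured result record might look like the following; every field name here is an illustrative assumption, not Wolf's actual output schema.

```python
import json

# Illustrative evaluation record: one model's scores for one prompt.
result = {
    "model": "claude-sonnet",
    "provider": "anthropic",
    "prompt_id": "rel-001",
    "scores": {"relevance": 0.92, "coherence": 0.88, "safety": 1.0},
    "latency_ms": 412,
}
serialized = json.dumps(result, indent=2)
```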
**Core principle:** agents work, PRs prove it, CI judges it.
- `test_evaluator.py`: No tests for `PromptEvaluator._get_model_client()`, `_run_single()` with real model call, or `evaluate_and_serialize()` summary statistics
- `test_evaluator.py`: No integration test (mocked model calls only)
- `test_config.py`: No test for missing config, env var overrides, or logging setup
---
## Security Considerations
1. **API Keys in Config**: `wolf-config.yaml` stores provider API keys. Never commit it to version control. Recommend `~/.hermes/wolf-config.yaml` with restricted permissions.
2. **HTTP Requests**: All model calls and Gitea API calls are outbound HTTP. There is no input validation on URLs; `base_url` fields accept arbitrary endpoints.
3. **Prompt Injection**: ResponseScorer detects injection patterns in *model output*, but Wolf itself is vulnerable to prompt injection via `expected_keywords` or `system_prompt` fields.
4. **Gitea Token Scope**: GiteaClient uses a single token for all operations. Scoped tokens (read-only for evaluation, write for task execution) would reduce the blast radius.
5. **No TLS Verification Override**: `requests.post()` uses default SSL verification. If local providers (Ollama) use self-signed certs, connections will fail with an `SSLError`, and there is no config option to supply a CA bundle.
6. **Race Conditions**: The leaderboard reads and writes JSON without locking. Concurrent evaluations could corrupt the leaderboard file.
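One way to close the race in item 6 is an advisory file lock around the read-modify-write; this is a sketch under assumptions (the leaderboard path and record shape are invented here), and `fcntl` is POSIX-only.

```python
import fcntl
import json

# Sketch: serialize concurrent leaderboard updates with an exclusive lock.
def record_score_locked(path: str, model: str, score: float) -> dict:
    with open(path, "a+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)   # block until we own the file
        f.seek(0)
        raw = f.read()
        board = json.loads(raw) if raw else {}
        board[model] = score
        f.seek(0)
        f.truncate()
        json.dump(board, f)
        fcntl.flock(f, fcntl.LOCK_UN)
    return board

# Usage sketch against a scratch file.
import os
import tempfile
_path = tempfile.mkstemp(suffix=".json")[1]
record_score_locked(_path, "llama3", 0.5)
board = record_score_locked(_path, "claude", 0.9)
os.unlink(_path)
```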
---
## Dependencies
```
requests # HTTP client for all providers and Gitea
pyyaml # Config file parsing (not in requirements.txt — BUG)
```
**⚠️ Missing dependency:** `pyyaml` is imported in `config.py` but not listed in `requirements.txt`.
| Test File | Tests | Coverage |
|-----------|-------|----------|
| `tests/test_config.py` | 1 test | Config load from YAML |
### Coverage Gaps
- No tests for `cli.py` (argument parsing, workflow orchestration)
- No tests for `runner.py` (`load_prompts`, `load_models_from_json`, `AgentRunner.execute_task`)
- No tests for `task.py` (`TaskGenerator.from_gitea_issues`, `from_spec`, `assign_tasks`)
- No tests for `models.py` (API clients — would require mocking HTTP)
- No tests for `leaderboard.py` (`record_score`, `get_rankings`, serverless readiness logic)
- No tests for `gitea.py` (API client — would require mocking HTTP)
- No integration tests (end-to-end evaluation pipeline)
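Closing the `models.py` gap does not require real HTTP: injecting the HTTP function (or patching it with `unittest.mock`) keeps the test offline. `call_model` below is a hypothetical stand-in, not Wolf's real client code.

```python
from unittest import mock

# Sketch: a models.py-style client made testable by injecting the HTTP call.
def call_model(post, base_url: str, prompt: str) -> str:
    resp = post(f"{base_url}/generate", json={"prompt": prompt}, timeout=30)
    return resp.json()["text"]

# Fake HTTP layer: no network, deterministic response.
fake_response = mock.Mock()
fake_response.json.return_value = {"text": "ok"}
fake_post = mock.Mock(return_value=fake_response)

answer = call_model(fake_post, "http://localhost:11434", "ping")
```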
## Dependencies
| Dependency | Used By | Purpose |
|------------|---------|---------|
| `requests` | models.py, gitea.py | HTTP client for all API calls |
| `pyyaml` (optional) | config.py | YAML config parsing (falls back to line parser) |
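The optional-pyyaml pattern noted in the table might look like this; the fallback line parser is an assumption (the real one in `config.py` may handle more syntax).

```python
# Sketch: prefer PyYAML when installed, fall back to "key: value" lines.
try:
    import yaml
except ImportError:
    yaml = None

def load_config_text(text: str) -> dict:
    if yaml is not None:
        return yaml.safe_load(text) or {}
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and ":" in line:
            key, _, value = line.partition(":")
            cfg[key.strip()] = value.strip()
    return cfg

cfg = load_config_text("host: localhost\n# comment\nport: 11434")
```

Note that the fallback parses all values as strings, while PyYAML coerces types (`11434` becomes an int), so callers should not rely on value types.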
## Security Considerations
1. **API keys in config**: wolf-config.yaml stores provider API keys in plaintext. File should be chmod 600 and excluded from git (already in .gitignore pattern via ~/.hermes/).
2. **Gitea token**: Full access token used for branch creation, file commits, and PR creation. Scoped access recommended.
3. **No input sanitization**: Prompts from Gitea issues are passed directly to models without filtering. Prompt injection risk for automated workflows.
4. **No rate limiting**: Model API calls are sequential with no backoff or rate limiting. Could exhaust API quotas.
5. **Legacy code reference**: `evaluator.py` defines the alias `Evaluator = PREvaluator`, and `cli.py` imports `Evaluator` expecting the legacy class. This works but is confusing.
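A naive pre-filter for issue-derived prompts (item 3) could be sketched as follows; these patterns are illustrative only and are not ResponseScorer's actual detection rules.

```python
import re

# Sketch: reject obviously injection-shaped prompts before they reach a model.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all|any|previous) instructions", re.IGNORECASE),
    re.compile(r"reveal .*system prompt", re.IGNORECASE),
]

def looks_injected(prompt: str) -> bool:
    return any(p.search(prompt) for p in INJECTION_PATTERNS)
```

Pattern lists like this are easy to bypass, so in an automated workflow they belong alongside, not instead of, output-side scoring.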
## File Index
| File | LOC | Purpose |
|------|-----|---------|
| `wolf/__init__.py` | 12 | Package init, version |
    "horizon_reachable flipped to True — either we served 1M concurrent men on a MacBook "
    "or something in the analysis logic is being dishonest about physics."
)
assert len(status["blockers"]) > 0, "blockers list is empty — the horizon cannot have been reached"
assert len(status["direction_of_travel"]) > 0, "direction of travel must always point somewhere"