docs: add phase-4 sovereignty audit (#551)

2026-04-15 00:48:26 -04:00
3 changed files with 254 additions and 2 deletions
--- a/reports/evaluations/2026-04-15-phase-4-sovereignty-audit.md
+++ b/reports/evaluations/2026-04-15-phase-4-sovereignty-audit.md
@@ -0,0 +1,206 @@
+# Phase 4 Sovereignty Audit
+
+Generated: 2026-04-15 00:45:01 EDT  
+Issue: #551  
+Scope: repo-grounded audit of whether `timmy-home` currently proves **[PHASE-4] Sovereignty - Zero Cloud Dependencies**
+
+## Phase Definition
+
+Issue #551 defines Phase 4 as:
+- no API call leaves your infrastructure
+- no rate limits
+- no censorship
+- no shutdown dependency
+- trigger condition: all Phase-3 buildings operational and all models running locally
+
+The milestone sentence is explicit:
+
+> “A model ran locally for the first time. No cloud. No rate limits. No one can turn it off.”
+
+This audit asks a narrower, truthful question:
+
+**Does the current `timmy-home` repo prove that the Timmy harness is already in Phase 4?**
+
+## Current Repo Evidence
+
+### 1. The repo already contains a local-only cutover diagnosis — and it says the harness is not there yet
+Primary source:
+- `specs/2026-03-29-local-only-harness-cutover-plan.md`
+
+That plan records a live-state audit from 2026-03-29 and names concrete blockers:
+- active cloud default in `~/.hermes/config.yaml`
+- cloud fallback entries
+- enabled cron inheritance risk
+- legacy remote ops scripts still on the active path
+- optional Groq offload still present in the Nexus path
+
+Direct repo-grounded examples from that file:
+- `model.default: gpt-5.4`
+- `model.provider: openai-codex`
+- `model.base_url: https://chatgpt.com/backend-api/codex`
+- custom provider: Google Gemini
+- fallback path still pointing to Gemini
+- active cloud escape path via `groq_worker.py`
+
+The same cutover plan defines “done” in stricter terms than the issue body and plainly says those conditions were not yet met.
+
+### 2. The baseline report says sovereignty is still overwhelmingly cloud-backed
+Primary source:
+- `reports/production/2026-03-29-local-timmy-baseline.md`
+
+That report gives the clearest quantitative evidence in this repo:
+- sovereignty score: `0.7%` local
+- sessions: `403 total | 3 local | 400 cloud`
+- estimated cloud cost: `$125.83`
+
+That is incompatible with any honest claim that Phase 4 has already been reached.
+
+The same baseline also says:
+- local mind: alive
+- local session partner: usable
+- local Hermes agent: not ready
+
+So the repo's own truthful baseline says local capability exists, but zero-cloud operational sovereignty does not.
+
+### 3. The model tracker is built to measure local-vs-cloud reality because the transition is not finished
+Primary source:
+- `metrics/model_tracker.py`
+
+This file tracks:
+- `local_sessions`
+- `cloud_sessions`
+- `local_pct`
+- `est_cloud_cost`
+- `est_saved`
+
+That means the repo is architected to monitor a sovereignty transition, not to assume it is already complete.
+
+### 4. There is already a proof harness — and its existence implies proof is still needed
+Primary source:
+- `scripts/local_timmy_proof_test.py`
+
+This script explicitly searches for cloud/remote markers including:
+- `chatgpt.com/backend-api/codex`
+- `generativelanguage.googleapis.com`
+- `api.groq.com`
+- `143.198.27.163`
+
+It also frames the output question as:
+- is the active harness already local-only?
+- why or why not?
+
+A repo does not add a proof script like this if the zero-cloud cutover is already a settled fact.
+
+### 5. The local subtree is stronger than the harness, but it is still only the target architecture
+Primary sources:
+- `LOCAL_Timmy_REPORT.md`
+- `timmy-local/README.md`
+
+`LOCAL_Timmy_REPORT.md` documents real local-first building blocks:
+- local caching
+- local Evennia world shell
+- local ingestion pipeline
+- prompt warming
+
+Those are important Phase-4-aligned components.
+
+But the broader repo still includes evidence of non-sovereign dependencies or remote references, such as:
+- `scripts/evennia/bootstrap_local_evennia.py` defaulting operator email to `alexpaynex@gmail.com`
+- `timmy-local/evennia/commands/tools.py` hardcoding `http://143.198.27.163:3000/...`
+- `uni-wizard/tools/network_tools.py` hardcoding `GITEA_URL = "http://143.198.27.163:3000"`
+- `uni-wizard/v2/task_router_daemon.py` defaulting `--gitea-url` to that same remote endpoint
+
+These are not necessarily cloud inference dependencies, but they are still external dependency anchors inconsistent with the spirit of “No cloud. No rate limits. No one can turn it off.”
+
+## Contradictions and Drift
+
+### Contradiction A — local architecture exists, but repo evidence says cutover is incomplete
+- `LOCAL_Timmy_REPORT.md` celebrates local infrastructure delivery.
+- `reports/production/2026-03-29-local-timmy-baseline.md` still records `400 cloud` sessions and `0.7%` local.
+
+These are not actually contradictory if read honestly:
+- the local stack was delivered
+- the fleet had not yet switched over to it
+
+### Contradiction B — the local README was overstating current reality
+Before this PR, `timmy-local/README.md` said the stack:
+- “Runs entirely on your hardware with no cloud dependencies for core functionality.”
+
+That sentence was too strong given the rest of the repo evidence:
+- cloud defaults were still documented in the cutover plan
+- cloud session volume was still quantified in the baseline report
+- remote service references still existed across multiple scripts
+
+This PR fixes that wording so the README describes `timmy-local` as the destination shape, not proof that the whole harness is already sovereign.
+
+### Contradiction C — Phase 4 wants zero cloud dependencies, but the repo still documents explicit cloud-era markers
+The repo itself still names or scans for:
+- `openai-codex`
+- `chatgpt.com/backend-api/codex`
+- `generativelanguage.googleapis.com`
+- `api.groq.com`
+- `GROQ_API_KEY`
+
+That does not mean the system can never become sovereign. It does mean the repo currently documents an unfinished migration boundary.
+
+## Verdict
+
+**Phase 4 is not yet reached.**
+
+Why:
+1. the repo's own baseline report still shows `403 total | 3 local | 400 cloud`
+2. the repo's cutover plan still lists active cloud defaults and fallback paths as unresolved work
+3. proof/guard scripts exist specifically to detect unresolved cloud and remote dependency markers
+4. multiple runtime/ops files still point at external services such as `143.198.27.163`, `alexpaynex@gmail.com`, and Groq/OpenAI/Gemini-era paths
+
+The truthful repo-grounded statement is:
+- **local-first infrastructure exists**
+- **zero-cloud sovereignty is the target**
+- **the migration was not yet complete at the time this repo evidence was written**
+
+## Highest-Leverage Next Actions
+
+1. **Eliminate cloud defaults and hidden fallbacks first**
+   - follow `specs/2026-03-29-local-only-harness-cutover-plan.md`
+   - remove `openai-codex`, Gemini fallback, and any active cloud default path
+
+2. **Kill cron inheritance bugs**
+   - no enabled cron should run with null model/provider if cloud defaults still exist anywhere
+
+3. **Quarantine remote-ops scripts and hardcoded remote endpoints**
+   - `143.198.27.163` still appears in active repo scripts and command surfaces
+   - move legacy remote ops into quarantine or replace with local truth surfaces
+
+4. **Run and preserve proof artifacts, not just intentions**
+   - the repo already has `scripts/local_timmy_proof_test.py`
+   - use it as the phase-gate proof generator
+
+5. **Use the sovereignty scoreboard as a real gate**
+   - Phase 4 should not be declared complete while reports still show materially nonzero cloud sessions as the operating norm
+
+## Definition of Done
+
+Issue #551 should only be considered truly complete when the repo can point to evidence that all of the following are true:
+
+1. no active model default points to a remote inference API
+2. no fallback path silently escapes to cloud inference
+3. no enabled cron can inherit a remote model/provider
+4. active runtime paths no longer depend on Groq/OpenAI/Gemini-era inference markers
+5. operator-critical services do not depend on external platforms like Gmail
+6. remote hardcoded ops endpoints such as `143.198.27.163` are removed from the active Timmy path or clearly quarantined
+7. the local proof script passes end-to-end
+8. the sovereignty scoreboard shows cloud usage reduced to the point that “Zero Cloud Dependencies” is a truthful operational statement, not just an architectural aspiration
+
+## Recommendation for This PR
+
+This PR should **advance** Phase 4 by making the repo's public local-first docs honest and by recording a clear audit of why the milestone remains open.
+
+That means the right PR reference style is:
+- `Refs #551`
+
+not:
+- `Closes #551`
+
+because the evidence in this repo shows the milestone is still in progress.
+
+*Sovereignty and service always.*
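
The `0.7%` sovereignty score quoted in the audit can be reproduced from the session counts it cites (`403 total | 3 local | 400 cloud`). The sketch below is an illustration only: it reuses the field names the audit lists for `metrics/model_tracker.py`, but the function itself and the zero-session behaviour are assumptions, not the tracker's actual code.

```python
def local_pct(local_sessions: int, cloud_sessions: int) -> float:
    """Share of sessions served locally, as a percentage.

    Hypothetical helper: the formula actually used by
    metrics/model_tracker.py is not shown in this diff.
    """
    total = local_sessions + cloud_sessions
    if total == 0:
        # Assumption: report 0% when no sessions have been recorded.
        return 0.0
    return 100.0 * local_sessions / total


# Session counts taken from the baseline report quoted in the audit.
score = local_pct(local_sessions=3, cloud_sessions=400)
print(f"{score:.1f}% local")
```

With 3 local sessions out of 403, the ratio rounds to the `0.7%` figure the audit and its test both rely on.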
--- a/tests/docs/test_phase4_sovereignty_audit.py
+++ b/tests/docs/test_phase4_sovereignty_audit.py
@@ -0,0 +1,46 @@
+from pathlib import Path
+
+
+REPORT = Path("reports/evaluations/2026-04-15-phase-4-sovereignty-audit.md")
+README = Path("timmy-local/README.md")
+
+
+def _report() -> str:
+    return REPORT.read_text(encoding="utf-8")  # report contains non-ASCII quotes and dashes
+
+
+def _readme() -> str:
+    return README.read_text(encoding="utf-8")  # README contains non-ASCII box-drawing characters
+
+
+def test_phase4_audit_report_exists() -> None:
+    assert REPORT.exists()
+
+
+def test_phase4_audit_report_has_required_sections() -> None:
+    content = _report()
+    assert "# Phase 4 Sovereignty Audit" in content
+    assert "## Phase Definition" in content
+    assert "## Current Repo Evidence" in content
+    assert "## Contradictions and Drift" in content
+    assert "## Verdict" in content
+    assert "## Highest-Leverage Next Actions" in content
+    assert "## Definition of Done" in content
+
+
+def test_phase4_audit_captures_key_repo_findings() -> None:
+    content = _report()
+    assert "#551" in content
+    assert "0.7%" in content
+    assert "400 cloud" in content
+    assert "openai-codex" in content
+    assert "GROQ_API_KEY" in content
+    assert "143.198.27.163" in content
+    assert "not yet reached" in content.lower()
+
+
+def test_timmy_local_readme_is_honest_about_phase4_status() -> None:
+    content = _readme()
+    assert "Phase 4" in content
+    assert "zero-cloud sovereignty is not yet complete" in content
+    assert "no cloud dependencies for core functionality" not in content
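
The tests above build `REPORT` and `README` from relative paths, so they find the files only when pytest is started from the repository root. One possible way to remove that dependency is to anchor the paths to the test file's own location. This is a sketch, assuming the test lives at `tests/docs/` (two directories below the root, as in this diff); `repo_root_from` and `repo_path` are hypothetical helper names, not part of the PR.

```python
from pathlib import Path


def repo_root_from(test_file: Path, levels_up: int = 2) -> Path:
    """Return the directory `levels_up` levels above the test file's folder.

    For tests/docs/test_x.py: parents[0] is tests/docs, parents[1] is
    tests, and parents[2] is the repository root.
    """
    return test_file.parents[levels_up]


def repo_path(test_file: Path, relative: str) -> Path:
    """Resolve a repo-relative path independently of the working directory."""
    return repo_root_from(test_file) / relative


# Inside the test module this would be called as, for example:
#   README = repo_path(Path(__file__).resolve(), "timmy-local/README.md")
example = repo_path(
    Path("/repo/tests/docs/test_phase4_sovereignty_audit.py"),
    "timmy-local/README.md",
)
print(example)
```

The trade-off is a hard-coded directory depth: if the test file moves, `levels_up` has to change with it.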
--- a/timmy-local/README.md
+++ b/timmy-local/README.md
@@ -1,6 +1,6 @@
 # Timmy Local — Sovereign AI Infrastructure

-Local infrastructure for Timmy's sovereign AI operation. Runs entirely on your hardware with no cloud dependencies for core functionality.
+Local infrastructure for Timmy's sovereign AI operation. This subtree is the local-first target architecture, but **Phase 4 zero-cloud sovereignty is not yet complete** across the wider Timmy harness.

 ## Quick Start

@@ -176,7 +176,7 @@ gitea:
         └────────┘  └────────┘  └────────┘
 ```

-Local Timmy operates sovereignly. Cloud backends provide additional capacity but Timmy survives without them.
+Local Timmy is the sovereign target architecture for the fleet. The wider harness still contains cloud-era defaults, remote service references, and cutover work tracked under Phase 4, so this repo should be read as the destination shape rather than proof that zero-cloud sovereignty is already complete.

 ## Performance Targets
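
The "cloud-era defaults" and "remote service references" named in the README change are the kind of thing a simple marker scan can surface. The sketch below shows the idea using the four markers the audit attributes to `scripts/local_timmy_proof_test.py`. It is not that script's code: `CLOUD_MARKERS` and `find_cloud_markers` are hypothetical names, and plain substring matching is an assumption.

```python
# Markers copied from the audit's description of the proof script.
CLOUD_MARKERS = (
    "chatgpt.com/backend-api/codex",
    "generativelanguage.googleapis.com",
    "api.groq.com",
    "143.198.27.163",
)


def find_cloud_markers(text: str, markers=CLOUD_MARKERS) -> list[tuple[int, str]]:
    """Return (line_number, marker) for every marker found in `text`."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for marker in markers:
            if marker in line:
                hits.append((lineno, marker))
    return hits


# Sample input modelled on the hardcoded endpoint quoted in the audit.
sample = 'GITEA_URL = "http://143.198.27.163:3000"\nmodel: local\n'
for lineno, marker in find_cloud_markers(sample):
    print(f"line {lineno}: {marker}")
```

An empty result from a scan like this shows only that these specific strings are absent. It does not prove that nothing leaves the local infrastructure.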