Compare commits
1 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | 8758f4e9d8 | | |
206
reports/evaluations/2026-04-15-phase-4-sovereignty-audit.md
Normal file
@@ -0,0 +1,206 @@
# Phase 4 Sovereignty Audit

Generated: 2026-04-15 00:45:01 EDT
Issue: #551
Scope: repo-grounded audit of whether `timmy-home` currently proves **[PHASE-4] Sovereignty - Zero Cloud Dependencies**

## Phase Definition

Issue #551 defines Phase 4 as:
- no API call leaves your infrastructure
- no rate limits
- no censorship
- no shutdown dependency
- trigger condition: all Phase-3 buildings operational and all models running locally

The milestone sentence is explicit:

> “A model ran locally for the first time. No cloud. No rate limits. No one can turn it off.”

This audit asks a narrower, truthful question:

**Does the current `timmy-home` repo prove that the Timmy harness is already in Phase 4?**

## Current Repo Evidence

### 1. The repo already contains a local-only cutover diagnosis, and it says the harness is not there yet
Primary source:
- `specs/2026-03-29-local-only-harness-cutover-plan.md`

That plan records a live-state audit from 2026-03-29 and names concrete blockers:
- active cloud default in `~/.hermes/config.yaml`
- cloud fallback entries
- enabled cron inheritance risk
- legacy remote ops scripts still on the active path
- optional Groq offload still present in the Nexus path

Direct repo-grounded examples from that file:
- `model.default: gpt-5.4`
- `model.provider: openai-codex`
- `model.base_url: https://chatgpt.com/backend-api/codex`
- custom provider: Google Gemini
- fallback path still pointing to Gemini
- active cloud escape path via `groq_worker.py`

The same cutover plan defines “done” in stricter terms than the issue body and plainly says those conditions were not yet met.
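
The blockers above reduce to one question: does any configured model endpoint resolve off-host? A minimal sketch of that check, assuming a config dict shaped like the `model.*` keys quoted above. The function names and the rule that "localhost, loopback, or private address counts as local" are illustrative assumptions, not taken from the cutover plan.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_endpoint(base_url: str) -> bool:
    """Return True if base_url points at localhost or a loopback/private IP."""
    host = urlparse(base_url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Any DNS name other than localhost is treated as remote in this sketch.
        return False
    return addr.is_loopback or addr.is_private


def cloud_defaults(config: dict) -> list[str]:
    """List reasons the config still has a cloud default (empty means clean)."""
    model = config.get("model", {})
    problems = []
    base_url = model.get("base_url", "")
    if base_url and not is_local_endpoint(base_url):
        problems.append(f"model.base_url is remote: {base_url}")
    return problems


# Values quoted from the cutover plan's live-state audit.
audited = {
    "model": {
        "default": "gpt-5.4",
        "provider": "openai-codex",
        "base_url": "https://chatgpt.com/backend-api/codex",
    }
}
print(cloud_defaults(audited))
```

A real check would also need to walk fallback entries and custom providers, whose config layout is not shown in this audit.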

### 2. The baseline report says sovereignty is still overwhelmingly cloud-backed
Primary source:
- `reports/production/2026-03-29-local-timmy-baseline.md`

That report gives the clearest quantitative evidence in this repo:
- sovereignty score: `0.7%` local
- sessions: `403 total | 3 local | 400 cloud`
- estimated cloud cost: `$125.83`

That is incompatible with any honest claim that Phase 4 has already been reached.
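
The score is consistent with the session counts. A quick check of the arithmetic, assuming the score is local sessions divided by total sessions (the baseline report's exact formula is not quoted here, but the numbers agree with this reading):

```python
total, local, cloud = 403, 3, 400  # figures quoted from the baseline report

assert local + cloud == total
local_pct = round(100 * local / total, 1)
print(f"{local_pct}% local")  # 0.7% local
```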

The same baseline also says:
- local mind: alive
- local session partner: usable
- local Hermes agent: not ready

So the repo's own truthful baseline says local capability exists, but zero-cloud operational sovereignty does not.

### 3. The model tracker is built to measure local-vs-cloud reality because the transition is not finished
Primary source:
- `metrics/model_tracker.py`

This file tracks:
- `local_sessions`
- `cloud_sessions`
- `local_pct`
- `est_cloud_cost`
- `est_saved`

That means the repo is architected to monitor a sovereignty transition, not to assume it is already complete.
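
For orientation, here is a hypothetical sketch of a summary record carrying those five fields. It is not the code in `metrics/model_tracker.py`: only the field names come from the repo, and deriving `local_pct` from the two counters is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SovereigntySummary:
    """Hypothetical record; only the field names are taken from the repo."""

    local_sessions: int
    cloud_sessions: int
    est_cloud_cost: float  # opaque input; the cost formula is not shown in the audit
    est_saved: float       # opaque input for the same reason

    @property
    def local_pct(self) -> float:
        total = self.local_sessions + self.cloud_sessions
        # Avoid dividing by zero when no sessions have been recorded yet.
        return 0.0 if total == 0 else round(100 * self.local_sessions / total, 1)


# Session counts and cost are the baseline figures; est_saved is a placeholder
# because the audit does not quote a value for it.
baseline = SovereigntySummary(
    local_sessions=3, cloud_sessions=400, est_cloud_cost=125.83, est_saved=0.0
)
print(baseline.local_pct)
```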

### 4. There is already a proof harness, and its existence implies proof is still needed
Primary source:
- `scripts/local_timmy_proof_test.py`

This script explicitly searches for cloud/remote markers including:
- `chatgpt.com/backend-api/codex`
- `generativelanguage.googleapis.com`
- `api.groq.com`
- `143.198.27.163`

It also frames the output question as:
- is the active harness already local-only?
- why or why not?

A repo does not add a proof script like this if the zero-cloud cutover is already a settled fact.
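
The marker search described above can be sketched as a plain substring scan. This is an illustrative stand-in, not the contents of `scripts/local_timmy_proof_test.py`; the function name and the choice of substring matching are assumptions.

```python
# Markers quoted from the audit; the real script may check more than these.
CLOUD_MARKERS = (
    "chatgpt.com/backend-api/codex",
    "generativelanguage.googleapis.com",
    "api.groq.com",
    "143.198.27.163",
)


def find_markers(text: str) -> list[str]:
    """Return the cloud/remote markers that occur in text, in tuple order."""
    return [marker for marker in CLOUD_MARKERS if marker in text]


sample = 'GITEA_URL = "http://143.198.27.163:3000"'
print(find_markers(sample))  # ['143.198.27.163']
```

An empty result for every file on the active path would answer the first question ("is the active harness already local-only?"), and the list of hits answers the second.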

### 5. The local subtree is stronger than the harness, but it is still only the target architecture
Primary sources:
- `LOCAL_Timmy_REPORT.md`
- `timmy-local/README.md`

`LOCAL_Timmy_REPORT.md` documents real local-first building blocks:
- local caching
- local Evennia world shell
- local ingestion pipeline
- prompt warming

Those are important Phase-4-aligned components.

But the broader repo still includes evidence of non-sovereign dependencies or remote references, such as:
- `scripts/evennia/bootstrap_local_evennia.py` defaulting operator email to `alexpaynex@gmail.com`
- `timmy-local/evennia/commands/tools.py` hardcoding `http://143.198.27.163:3000/...`
- `uni-wizard/tools/network_tools.py` hardcoding `GITEA_URL = "http://143.198.27.163:3000"`
- `uni-wizard/v2/task_router_daemon.py` defaulting `--gitea-url` to that same remote endpoint

These are not necessarily cloud inference dependencies, but they are still external dependency anchors inconsistent with the spirit of “No cloud. No rate limits. No one can turn it off.”
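
One common way to remove a hardcoded endpoint like these is to resolve it from the environment with a local default. The sketch below is an assumption about how that could look, not a change made in this PR: the `GITEA_URL` environment variable name mirrors the constant in `network_tools.py`, and the loopback default on port 3000 is hypothetical.

```python
import os
from typing import Mapping, Optional

# Hypothetical default; a loopback Gitea on port 3000 is an assumption.
DEFAULT_GITEA_URL = "http://localhost:3000"


def resolve_gitea_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Prefer an explicit GITEA_URL setting, else fall back to the local default."""
    env = os.environ if env is None else env
    return env.get("GITEA_URL") or DEFAULT_GITEA_URL


print(resolve_gitea_url({}))
```

Passing the mapping in explicitly keeps the function testable without touching the real process environment.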

## Contradictions and Drift

### Contradiction A: local architecture exists, but repo evidence says cutover is incomplete
- `LOCAL_Timmy_REPORT.md` celebrates local infrastructure delivery.
- `reports/production/2026-03-29-local-timmy-baseline.md` still records `400 cloud` sessions and `0.7%` local.

These are not actually contradictory if read honestly:
- the local stack was delivered
- the fleet had not yet switched over to it

### Contradiction B: the local README was overstating current reality
Before this PR, `timmy-local/README.md` said the stack:
- “Runs entirely on your hardware with no cloud dependencies for core functionality.”

That sentence was too strong given the rest of the repo evidence:
- cloud defaults were still documented in the cutover plan
- cloud session volume was still quantified in the baseline report
- remote service references still existed across multiple scripts

This PR fixes that wording so the README describes `timmy-local` as the destination shape, not proof that the whole harness is already sovereign.

### Contradiction C: Phase 4 wants zero cloud dependencies, but the repo still documents explicit cloud-era markers
The repo itself still names or scans for:
- `openai-codex`
- `chatgpt.com/backend-api/codex`
- `generativelanguage.googleapis.com`
- `api.groq.com`
- `GROQ_API_KEY`

That does not mean the system can never become sovereign. It does mean the repo currently documents an unfinished migration boundary.

## Verdict

**Phase 4 is not yet reached.**

Why:
1. the repo's own baseline report still shows `403 total | 3 local | 400 cloud`
2. the repo's cutover plan still lists active cloud defaults and fallback paths as unresolved work
3. proof/guard scripts exist specifically to detect unresolved cloud and remote dependency markers
4. multiple runtime/ops files still point at external services such as `143.198.27.163`, `alexpaynex@gmail.com`, and Groq/OpenAI/Gemini-era paths

The truthful repo-grounded statement is:
- **local-first infrastructure exists**
- **zero-cloud sovereignty is the target**
- **the migration was not yet complete at the time this repo evidence was written**

## Highest-Leverage Next Actions

1. **Eliminate cloud defaults and hidden fallbacks first**
   - follow `specs/2026-03-29-local-only-harness-cutover-plan.md`
   - remove `openai-codex`, Gemini fallback, and any active cloud default path

2. **Kill cron inheritance bugs**
   - no enabled cron should run with null model/provider if cloud defaults still exist anywhere

3. **Quarantine remote-ops scripts and hardcoded remote endpoints**
   - `143.198.27.163` still appears in active repo scripts and command surfaces
   - move legacy remote ops into quarantine or replace with local truth surfaces

4. **Run and preserve proof artifacts, not just intentions**
   - the repo already has `scripts/local_timmy_proof_test.py`
   - use it as the phase-gate proof generator

5. **Use the sovereignty scoreboard as a real gate**
   - Phase 4 should not be declared complete while reports still show materially nonzero cloud sessions as the operating norm
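
Action 5 can be made mechanical. A minimal sketch of such a gate, assuming the session counters are available as integers; the function name and the default ceiling of zero cloud sessions are assumptions, since the audit only says cloud usage must not be "materially nonzero".

```python
def phase4_gate(local_sessions: int, cloud_sessions: int,
                max_cloud_sessions: int = 0) -> bool:
    """Return True only when cloud usage is at or below the allowed ceiling."""
    if local_sessions + cloud_sessions == 0:
        # No recorded sessions proves nothing, so the gate stays closed.
        return False
    return cloud_sessions <= max_cloud_sessions


print(phase4_gate(3, 400))  # False
```

With the 2026-03-29 baseline figures the gate stays closed, which matches the verdict above.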

## Definition of Done

Issue #551 should only be considered truly complete when the repo can point to evidence that all of the following are true:

1. no active model default points to a remote inference API
2. no fallback path silently escapes to cloud inference
3. no enabled cron can inherit a remote model/provider
4. active runtime paths no longer depend on Groq/OpenAI/Gemini-era inference markers
5. operator-critical services do not depend on external platforms like Gmail
6. remote hardcoded ops endpoints such as `143.198.27.163` are removed from the active Timmy path or clearly quarantined
7. the local proof script passes end-to-end
8. the sovereignty scoreboard shows cloud usage reduced to the point that “Zero Cloud Dependencies” is a truthful operational statement, not just an architectural aspiration
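
Condition 3 is the least visible of these, because a job with no model or provider of its own silently picks up whatever the global default is. A hypothetical sketch of a check for it follows. The job record shape (`name`, `enabled`, `model`, `provider`) is invented for illustration, as the real cron schema is not shown in this audit.

```python
def risky_crons(jobs: list[dict]) -> list[str]:
    """Names of enabled jobs that leave model or provider unset."""
    return [
        job["name"]
        for job in jobs
        if job.get("enabled")
        and (job.get("model") is None or job.get("provider") is None)
    ]


# Hypothetical job records, including made-up job and model names.
jobs = [
    {"name": "nightly-report", "enabled": True, "model": None, "provider": None},
    {"name": "local-digest", "enabled": True, "model": "local-model", "provider": "local"},
    {"name": "old-sync", "enabled": False, "model": None, "provider": None},
]
print(risky_crons(jobs))  # ['nightly-report']
```

Disabled jobs are ignored because they cannot run, and so cannot inherit anything.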

## Recommendation for This PR

This PR should **advance** Phase 4 by making the repo's public local-first docs honest and by recording a clear audit of why the milestone remains open.

That means the right PR reference style is:
- `Refs #551`

not:
- `Closes #551`

because the evidence in this repo shows the milestone is still in progress.

*Sovereignty and service always.*

46
tests/docs/test_phase4_sovereignty_audit.py
Normal file
@@ -0,0 +1,46 @@
from pathlib import Path


REPORT = Path("reports/evaluations/2026-04-15-phase-4-sovereignty-audit.md")
README = Path("timmy-local/README.md")


def _report() -> str:
    return REPORT.read_text(encoding="utf-8")


def _readme() -> str:
    return README.read_text(encoding="utf-8")


def test_phase4_audit_report_exists() -> None:
    assert REPORT.exists()


def test_phase4_audit_report_has_required_sections() -> None:
    content = _report()
    assert "# Phase 4 Sovereignty Audit" in content
    assert "## Phase Definition" in content
    assert "## Current Repo Evidence" in content
    assert "## Contradictions and Drift" in content
    assert "## Verdict" in content
    assert "## Highest-Leverage Next Actions" in content
    assert "## Definition of Done" in content


def test_phase4_audit_captures_key_repo_findings() -> None:
    content = _report()
    assert "#551" in content
    assert "0.7%" in content
    assert "400 cloud" in content
    assert "openai-codex" in content
    assert "GROQ_API_KEY" in content
    assert "143.198.27.163" in content
    assert "not yet reached" in content.lower()


def test_timmy_local_readme_is_honest_about_phase4_status() -> None:
    content = _readme()
    assert "Phase 4" in content
    assert "zero-cloud sovereignty is not yet complete" in content
    assert "no cloud dependencies for core functionality" not in content
@@ -1,6 +1,6 @@
# Timmy Local — Sovereign AI Infrastructure

-Local infrastructure for Timmy's sovereign AI operation. Runs entirely on your hardware with no cloud dependencies for core functionality.
+Local infrastructure for Timmy's sovereign AI operation. This subtree is the local-first target architecture, but **Phase 4 zero-cloud sovereignty is not yet complete** across the wider Timmy harness.

## Quick Start
@@ -176,7 +176,7 @@ gitea:
└────────┘ └────────┘ └────────┘
```

-Local Timmy operates sovereignly. Cloud backends provide additional capacity but Timmy survives without them.
+Local Timmy is the sovereign target architecture for the fleet. The wider harness still contains cloud-era defaults, remote service references, and cutover work tracked under Phase 4, so this repo should be read as the destination shape rather than proof that zero-cloud sovereignty is already complete.

## Performance Targets