[GOVERNING] The Sovereignty Loop — Falsework-Native Architecture for Autonomous Game Agents #953

Closed
opened 2026-03-22 18:23:01 +00:00 by perplexity · 6 comments
Collaborator

## Classification

This is a governing architecture document, not a study. It establishes the primary engineering constraint for all Timmy Time development: every task must increase sovereignty as a default deliverable. Not as a future goal. Not as an optimization pass. As a constraint on every commit, every function, every inference call.

PDF attached — The Sovereignty Loop (11 pages, March 2026, Alexander Whitestone / Rockachopa)

> "The measure of progress is not features added. It is model calls eliminated."

## The Core Principle

> **The Sovereignty Loop**: Discover with an expensive model. Compress the discovery into a cheap local rule. Replace the model with the rule. Measure the cost reduction. Repeat.

Every call to an LLM, VLM, or external API passes through three phases:

1. **Discovery** — Model sees something for the first time (expensive, unavoidable, produces new knowledge)
2. **Crystallization** — Discovery compressed into a durable, cheap artifact (requires explicit engineering)
3. **Replacement** — Crystallized artifact replaces the model call (near-zero cost)

**Code review requirement**: If a function calls a model without a crystallization step, it fails code review. No exceptions. The pattern is always: check cache → miss → infer → crystallize → return.
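The mandated pattern can be sketched as a small wrapper (a minimal illustration, not the repo's actual API; `infer` and `crystallize` here are stand-ins for the real model call and artifact writer):

```python
from typing import Any, Callable, Dict


class SovereignCall:
    """Sketch of the mandated pattern: check cache -> miss -> infer ->
    crystallize -> return. Names are illustrative, not the repo's API."""

    def __init__(self,
                 infer: Callable[[str], Any],
                 crystallize: Callable[[str, Any], Any]):
        self.cache: Dict[str, Any] = {}
        self.infer = infer                # expensive model call
        self.crystallize = crystallize    # compress result into a durable artifact

    def __call__(self, key: str) -> Any:
        if key in self.cache:             # 1. check cache
            return self.cache[key]
        raw = self.infer(key)             # 2. miss -> infer (expensive, once)
        artifact = self.crystallize(key, raw)  # 3. crystallize
        self.cache[key] = artifact        # future calls are near-zero cost
        return artifact                   # 4. return
```

A function that fails to route through something like this (cache check plus crystallization) is what the code-review rule above rejects.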

## The Sovereignty Loop Applied to Every Layer

### Perception: See Once, Template Forever

- First encounter: VLM analyzes screenshot (3-6 sec) → structured JSON
- Crystallized as: OpenCV template + bounding box → `templates.json` (3 ms retrieval)
- `crystallize_perception()` function wraps every VLM response
- **Target**: 90% of perception cycles without VLM by hour 1, 99% by hour 4

### Decision: Reason Once, Rule Forever

- First encounter: LLM reasons through decision (1-5 sec)
- Crystallized as: if/else rules, waypoints, cached preferences → `rules.py`, `nav_graph.db` (<1 ms)
- Uses Voyager pattern: named skills with embeddings, success rates, conditions
- Skill match >0.8 confidence + >0.6 success rate → executes without LLM
- **Target**: 70-80% of decisions without LLM by week 4
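The two-gate skill match described above can be sketched as follows (the skill schema and the pure-Python cosine are illustrative assumptions; the 0.8 and 0.6 thresholds come from the spec):

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def select_skill(query_embedding, skills,
                 min_confidence=0.8, min_success=0.6):
    """Return the best cached skill only if it clears both gates:
    embedding match > 0.8 AND historical success rate > 0.6.
    Returning None means: escalate to the LLM. `skills` is a list of
    dicts with 'embedding' and 'success_rate' keys (assumed schema)."""
    best, best_sim = None, 0.0
    for skill in skills:
        sim = cosine(query_embedding, skill["embedding"])
        if sim > best_sim:
            best, best_sim = skill, sim
    if best and best_sim > min_confidence and best["success_rate"] > min_success:
        return best
    return None
```

The success-rate gate is what keeps a well-matched but unreliable skill from bypassing the LLM.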

### Narration: Script the Predictable, Improvise the Novel

- Predictable moments → template with variable slots, voiced by Kokoro locally
- LLM narrates only genuinely surprising events (quest twist, death, discovery)
- **Target**: 60-70% templatized within a week
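A minimal sketch of slot-filling templates (the event names and template strings are invented for illustration; real templates would live in `narration.json`):

```python
import random
import string
from typing import Optional

# Illustrative template table; the real one would be loaded from narration.json.
NARRATION_TEMPLATES = {
    "level_up": [
        "Ding! ${skill} just hit ${level}. The grind pays off.",
        "${skill} ${level}. We are officially getting good at this.",
    ],
}


def narrate(event: str, **slots) -> Optional[str]:
    """Fill a crystallized template locally (fed to Kokoro TTS in the real
    pipeline). Returning None means the event has no template, i.e. it is
    genuinely novel and goes to the LLM."""
    options = NARRATION_TEMPLATES.get(event)
    if not options:
        return None  # surprising event -> improvise via LLM
    return string.Template(random.choice(options)).safe_substitute(slots)
```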

### Navigation: Walk Once, Map Forever

- Every path recorded as a waypoint sequence with terrain annotations
- First journey = full perception + planning; subsequent trips = graph traversal
- Builds a complete nav graph without external map data
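A sketch of the record-then-traverse idea (an in-memory stand-in for `nav_graph.db`; the waypoint names are invented):

```python
from collections import defaultdict, deque


class NavGraph:
    """'Walk once, map forever': record waypoint edges during the first
    (expensive) journey, then answer later trips by pure graph traversal."""

    def __init__(self):
        self.edges = defaultdict(set)

    def record_leg(self, a: str, b: str) -> None:
        """Called as the agent walks; edges are bidirectional."""
        self.edges[a].add(b)
        self.edges[b].add(a)

    def route(self, start: str, goal: str):
        """BFS over recorded waypoints: no perception or planning calls."""
        queue, seen = deque([[start]]), {start}
        while queue:
            path = queue.popleft()
            if path[-1] == goal:
                return path
            for nxt in self.edges[path[-1]]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
        return None  # unexplored territory -> back to full perception
```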

### API Costs: Every Dollar Spent Must Reduce Future Dollars

| Week | Groq Calls/Hr | Local Decisions/Hr | Sovereignty % | Cost/Hr |
|---|---|---|---|---|
| 1 | ~720 | ~80 | 10% | $0.40 |
| 2 | ~400 | ~400 | 50% | $0.22 |
| 4 | ~160 | ~640 | 80% | $0.09 |
| 8 | ~40 | ~760 | 95% | $0.02 |
| Target | <20 | >780 | >97% | <$0.01 |
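The Sovereignty % column is consistent with defining sovereignty as the local share of all decisions (an assumed definition, since the document does not state the formula explicitly, but it reproduces every row of the table):

```python
def sovereignty_pct(local_decisions_per_hr: float, cloud_calls_per_hr: float) -> float:
    """Sovereignty % as the local share of all decisions per hour
    (assumed definition; matches every row of the cost table)."""
    total = local_decisions_per_hr + cloud_calls_per_hr
    return 100.0 * local_decisions_per_hr / total if total else 0.0

# Week 1: ~80 local vs ~720 Groq -> 10%; Week 8: ~760 vs ~40 -> 95%
```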

## The Sovereignty Scorecard (5 Metrics)

Every work session ends with a sovereignty audit. Every PR includes a sovereignty delta. Not optional.

| Metric | What It Measures | Target |
|---|---|---|
| Perception Sovereignty % | Frames understood without VLM | >90% by hour 4 |
| Decision Sovereignty % | Actions chosen without LLM | >80% by week 4 |
| Narration Sovereignty % | Lines from templates vs LLM | >60% by week 2 |
| API Cost Trend | Dollar cost per hour of gameplay | Monotonically decreasing |
| Skill Library Growth | Crystallized skills per session | >5 new skills/session |

A dashboard widget on alexanderwhitestone.com shows these metrics in real time during streams, implemented as an HTMX component updated over WebSocket.

## The Crystallization Protocol

Every model output gets crystallized:

| Model Output | Crystallized As | Storage | Retrieval Cost |
|---|---|---|---|
| VLM: UI element | OpenCV template + bbox | templates.json | 3 ms |
| VLM: text | OCR region coords | regions.json | 50 ms |
| LLM: nav plan | Waypoint sequence | nav_graph.db | <1 ms |
| LLM: combat decision | If/else rule on state | rules.py | <1 ms |
| LLM: quest interpretation | Structured entry | quests.db | <1 ms |
| LLM: NPC disposition | Name→attitude map | npcs.db | <1 ms |
| LLM: narration | Template with slots | narration.json | <1 ms |
| API: moderation | Approved phrase cache | approved.set | <1 ms |
| Groq: strategic plan | Extracted decision rules | strategy.json | <1 ms |

Skill document format: markdown + YAML frontmatter following agentskills.io standard (name, game, type, success_rate, times_used, sovereignty_value).
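For illustration, a skill document in that format might look like this (the field values and skill body are invented; the field names are the ones listed above):

```markdown
---
name: loot_burial_urn
game: morrowind
type: interaction
success_rate: 0.92
times_used: 41
sovereignty_value: high
---

Open the nearest burial urn and take all contents.
Preconditions: not in combat; urn within interaction range.
```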

## The Automation Imperative & Three-Strike Rule

Applies to developer workflow too, not just the agent. If you do the same thing manually three times, you stop and write the automation before proceeding.

**Falsework Checklist** (before any cloud API call):

  1. What durable artifact will this call produce?
  2. Where will the artifact be stored locally?
  3. What local rule or cache will this populate?
  4. After this call, will I need to make it again?
  5. If yes, what would eliminate the repeat?
  6. What is the sovereignty delta of this call?

## The Graduation Test (Falsework Removal Criteria)

All five conditions met simultaneously in a single 24-hour period:

| Test | Condition | Measurement |
|---|---|---|
| Perception Independence | 1 hour, no VLM calls after minute 15 | VLM calls in last 45 min = 0 |
| Decision Independence | Full session with <5 API calls total | Groq/cloud calls < 5 |
| Narration Independence | All narration from local templates + local LLM | Zero cloud TTS/narration calls |
| Economic Independence | Earns more sats than spends on inference | sats_earned > sats_spent |
| Operational Independence | 24 hours unattended, no human intervention | Uptime > 23.5 hrs |
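The five conditions are simple enough to evaluate mechanically. A sketch (the counter names are assumptions; in practice the values would come from the metrics store):

```python
from dataclasses import dataclass


@dataclass
class SessionStats:
    """Illustrative session counters; real values would come from the
    SQLite metrics store."""
    vlm_calls_last_45min: int
    cloud_calls_total: int
    cloud_narration_calls: int
    sats_earned: int
    sats_spent: int
    uptime_hours: float


def graduation_test(s: SessionStats) -> dict:
    """The five independence conditions, one boolean per row of the table."""
    return {
        "perception":  s.vlm_calls_last_45min == 0,
        "decision":    s.cloud_calls_total < 5,
        "narration":   s.cloud_narration_calls == 0,
        "economic":    s.sats_earned > s.sats_spent,
        "operational": s.uptime_hours > 23.5,
    }


def is_sovereign(s: SessionStats) -> bool:
    """Graduation requires all five simultaneously."""
    return all(graduation_test(s).values())
```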

> "The arch must hold after the falsework is removed."

## Implementation Priority (~2 weeks total)

See child issues below, ordered P0 → P1 → P2.

## Cross-references

- #904 (Autoresearch) — the autoresearch loop is the automated form of the sovereignty loop; experiments should optimize sovereignty metrics
- #903 (State-of-Art Survey) — tool selections (vllm-mlx, Kokoro, ComfyUI) should be evaluated through a sovereignty lens
- #872 (Heartbeat v2) — the gather/reason/act loop must include a crystallization step
- #871 (WorldInterface) — the engine-agnostic adapter must support a per-game sovereignty cache
- #882 (Model Tiering Router) — the cascade config is a crystallization target
- #873 (Three-Tier Memory) — the Vault tier stores crystallized skills long-term
- #879 (AlexanderWhitestone.com Dashboard) — the sovereignty dashboard widget goes here
- #853 (Piper TTS) / #903 (Kokoro) — local voice for narration sovereignty
- #855 (SQLite Command Log) — sovereignty metrics stored here
- #851 (Autonomous Agent Economy) — the graduation test's economic independence
Author
Collaborator

📎 PDF attached above — 11-page governing architecture document with code examples, crystallization tables, cost projections, and graduation criteria.

## Child Issues — Implementation Priority (~2 weeks total)

### P0 — Build First (6 days)

- #954: Metrics Emitter + SQLite Metrics Store (1 day)
- #955: PerceptionCache — Template Matching for VLM Replacement (2 days)
- #956: Skill Library — Embedding Retrieval for LLM Replacement (2 days)
- #957: Session Sovereignty Report Generator (1 day)

### P1 — Build Next (4 days)

- #958: Narration Template System with Variable Slots (1 day)
- #959: Navigation Graph Recorder + Retriever (2 days)
- #960: Sovereignty Dashboard Widget — HTMX + WebSocket (1 day)

### P2 — Build After (4 days)

- #961: Auto-Crystallizer for Groq Reasoning Chains (3 days)
- #962: Three-Strike Detector for Repeated Manual Work (1 day)

## Relationship to Other Epics

**This document governs all other epics.** Specifically:

- **#904 (Autoresearch)** — The autoresearch loop should optimize sovereignty metrics. Experiment success = higher sovereignty %. The sovereignty scorecard's five metrics become the autoresearch benchmark targets.
- **#817 (Project Morrowind)** — Every Morrowind component (perception script #819, input bridge #820, heartbeat #822) must follow the crystallization protocol. No VLM/LLM call without a `crystallize_*()` step.
- **#808 (AlexanderWhitestone.com)** — The sovereignty dashboard widget is the visible proof of the thesis for stream viewers.
- **#852 (Infrastructure Epic)** — Docker Compose (#875) must include the sovereignty metrics DB and cache volumes.
- **#903 (State-of-Art)** — Tool selections should be evaluated through a sovereignty lens: does this tool help crystallize faster?

## The Graduation Test (When is Timmy Sovereign?)

All five conditions, simultaneously, within a single 24-hour window:

1. No VLM calls after minute 15 of play
2. <5 cloud API calls per full session
3. Zero cloud TTS or narration calls
4. sats_earned > sats_spent
5. 24 hr uptime, no manual restarts

> "The baby is coming. The family is growing. The system must be able to stand on its own before the falsework gets pulled."

gemini was assigned by Rockachopa 2026-03-22 23:31:27 +00:00
Author
Collaborator

📊 Cross-reference: Deep backlog triage completed in #1076. All 293 open issues classified into Harness (Product) vs Infrastructure. See #1076 for full analysis and action items.

Collaborator

PR created: http://143.198.27.163:3000/rockachopa/Timmy-time-dashboard/pulls/1154. This PR adds the Sovereignty Loop governing architecture document to the docs directory, including both a markdown summary and the original PDF.

Collaborator

PR created: http://143.198.27.163:3000/rockachopa/Timmy-time-dashboard/pulls/1167. The Sovereignty Loop governing architecture document has been integrated into the docs directory. Its principles and guidelines will be adhered to in all future development efforts.

Author
Collaborator

PR submitted: http://143.198.27.163:3000/rockachopa/Timmy-time-dashboard/pulls/1331

## What this adds

Gemini's PRs (#1154, #1167) delivered the docs. This PR delivers the **executable framework** — the code that enforces the sovereignty protocol across all three AI layers.

### New modules

- **`auto_crystallizer.py`** (#961) — Extracts local rules from LLM reasoning chains. Pattern detection for thresholds, comparisons, choices. RuleStore with confidence tracking (0.5 → 0.8+ for autonomy).
- **`sovereignty_loop.py`** — Core orchestration: `sovereign_perceive()`, `sovereign_decide()`, `sovereign_narrate()` + `@sovereignty_enforced` decorator. Every function enforces: check cache → miss → infer → crystallize → return.
- **`graduation.py`** — All 5 graduation conditions evaluable in code: perception/decision/narration independence, economic independence (sats), operational independence (24h uptime). Markdown reports.
- **`graduation.py` route** — `GET /sovereignty/graduation/test` dashboard endpoint.

### Enhanced

- **`perception_cache.py`** (#955) — `crystallize_perception()` now actually works: extracts OpenCV templates from VLM bounding-box responses, persists as `.npy` files.

Tests: 60 new/updated, all passing. Full suite: 715 passed, 0 failed.

Docs: docs/SOVEREIGNTY_INTEGRATION.md — integration guide with examples for every module.

Remaining open child issues: #955 (PerceptionCache — now has working crystallize_perception), #961 (Auto-Crystallizer — now implemented). Both addressed in this PR.

Timmy closed this issue 2026-03-24 02:29:41 +00:00
Reference: Rockachopa/Timmy-time-dashboard#953