[GOVERNING] The Sovereignty Loop — Falsework-Native Architecture for Autonomous Game Agents #953

Closed
opened 2026-03-22 18:23:01 +00:00 by perplexity · 6 comments
Collaborator

## Classification

This is a governing architecture document, not a study. It establishes the primary engineering constraint for all Timmy Time development: every task must increase sovereignty as a default deliverable. Not as a future goal. Not as an optimization pass. As a constraint on every commit, every function, every inference call.

PDF attached — The Sovereignty Loop (11 pages, March 2026, Alexander Whitestone / Rockachopa)

> "The measure of progress is not features added. It is model calls eliminated."

## The Core Principle

> **The Sovereignty Loop**: Discover with an expensive model. Compress the discovery into a cheap local rule. Replace the model with the rule. Measure the cost reduction. Repeat.

Every call to an LLM, VLM, or external API passes through three phases:

1. **Discovery** — Model sees something for the first time (expensive, unavoidable, produces new knowledge)
2. **Crystallization** — Discovery compressed into a durable, cheap artifact (requires explicit engineering)
3. **Replacement** — Crystallized artifact replaces the model call (near-zero cost)

**Code review requirement**: If a function calls a model without a crystallization step, it fails code review. No exceptions. The pattern is always: check cache → miss → infer → crystallize → return.
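The mandated pattern can be sketched as a small wrapper (a minimal illustration, not the repo's actual API; `infer` and `crystallize` here are stand-ins for the real model call and artifact writer):

```python
from typing import Any, Callable, Dict


class SovereignCall:
    """Sketch of the mandated pattern: check cache -> miss -> infer ->
    crystallize -> return. Names are illustrative, not the repo's API."""

    def __init__(self,
                 infer: Callable[[str], Any],
                 crystallize: Callable[[str, Any], Any]):
        self.cache: Dict[str, Any] = {}
        self.infer = infer                # expensive model call
        self.crystallize = crystallize    # compress result into a durable artifact

    def __call__(self, key: str) -> Any:
        if key in self.cache:             # 1. check cache
            return self.cache[key]
        raw = self.infer(key)             # 2. miss -> infer (expensive, once)
        artifact = self.crystallize(key, raw)  # 3. crystallize
        self.cache[key] = artifact        # future calls are near-zero cost
        return artifact                   # 4. return
```

A function that fails to route through something like this (cache check plus crystallization) is what the code-review rule above rejects.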

## The Sovereignty Loop Applied to Every Layer

### Perception: See Once, Template Forever

- First encounter: VLM analyzes screenshot (3-6 sec) → structured JSON
- Crystallized as: OpenCV template + bounding box → `templates.json` (3 ms retrieval)
- `crystallize_perception()` function wraps every VLM response
- **Target**: 90% of perception cycles without VLM by hour 1, 99% by hour 4

### Decision: Reason Once, Rule Forever

- First encounter: LLM reasons through decision (1-5 sec)
- Crystallized as: if/else rules, waypoints, cached preferences → `rules.py`, `nav_graph.db` (<1 ms)
- Uses Voyager pattern: named skills with embeddings, success rates, conditions
- Skill match >0.8 confidence + >0.6 success rate → executes without LLM
- **Target**: 70-80% of decisions without LLM by week 4
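The two-gate skill match described above can be sketched as follows (the skill schema and the pure-Python cosine are illustrative assumptions; the 0.8 and 0.6 thresholds come from the spec):

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def select_skill(query_embedding, skills,
                 min_confidence=0.8, min_success=0.6):
    """Return the best cached skill only if it clears both gates:
    embedding match > 0.8 AND historical success rate > 0.6.
    Returning None means: escalate to the LLM. `skills` is a list of
    dicts with 'embedding' and 'success_rate' keys (assumed schema)."""
    best, best_sim = None, 0.0
    for skill in skills:
        sim = cosine(query_embedding, skill["embedding"])
        if sim > best_sim:
            best, best_sim = skill, sim
    if best and best_sim > min_confidence and best["success_rate"] > min_success:
        return best
    return None
```

The success-rate gate is what keeps a well-matched but unreliable skill from bypassing the LLM.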

### Narration: Script the Predictable, Improvise the Novel

- Predictable moments → template with variable slots, voiced by Kokoro locally
- LLM narrates only genuinely surprising events (quest twist, death, discovery)
- **Target**: 60-70% templatized within a week
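A minimal sketch of slot-filling templates (the event names and template strings are invented for illustration; real templates would live in `narration.json`):

```python
import random
import string
from typing import Optional

# Illustrative template table; the real one would be loaded from narration.json.
NARRATION_TEMPLATES = {
    "level_up": [
        "Ding! ${skill} just hit ${level}. The grind pays off.",
        "${skill} ${level}. We are officially getting good at this.",
    ],
}


def narrate(event: str, **slots) -> Optional[str]:
    """Fill a crystallized template locally (fed to Kokoro TTS in the real
    pipeline). Returning None means the event has no template, i.e. it is
    genuinely novel and goes to the LLM."""
    options = NARRATION_TEMPLATES.get(event)
    if not options:
        return None  # surprising event -> improvise via LLM
    return string.Template(random.choice(options)).safe_substitute(slots)
```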

### Navigation: Walk Once, Map Forever

- Every path recorded as a waypoint sequence with terrain annotations
- First journey = full perception + planning; subsequent trips = graph traversal
- Builds a complete nav graph without external map data
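A sketch of the record-then-traverse idea (an in-memory stand-in for `nav_graph.db`; the waypoint names are invented):

```python
from collections import defaultdict, deque


class NavGraph:
    """'Walk once, map forever': record waypoint edges during the first
    (expensive) journey, then answer later trips by pure graph traversal."""

    def __init__(self):
        self.edges = defaultdict(set)

    def record_leg(self, a: str, b: str) -> None:
        """Called as the agent walks; edges are bidirectional."""
        self.edges[a].add(b)
        self.edges[b].add(a)

    def route(self, start: str, goal: str):
        """BFS over recorded waypoints: no perception or planning calls."""
        queue, seen = deque([[start]]), {start}
        while queue:
            path = queue.popleft()
            if path[-1] == goal:
                return path
            for nxt in self.edges[path[-1]]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
        return None  # unexplored territory -> back to full perception
```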

### API Costs: Every Dollar Spent Must Reduce Future Dollars

| Week | Groq Calls/Hr | Local Decisions/Hr | Sovereignty % | Cost/Hr |
|---|---|---|---|---|
| 1 | ~720 | ~80 | 10% | $0.40 |
| 2 | ~400 | ~400 | 50% | $0.22 |
| 4 | ~160 | ~640 | 80% | $0.09 |
| 8 | ~40 | ~760 | 95% | $0.02 |
| Target | <20 | >780 | >97% | <$0.01 |
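The Sovereignty % column is consistent with defining sovereignty as the local share of all decisions (an assumed definition, since the document does not state the formula explicitly, but it reproduces every row of the table):

```python
def sovereignty_pct(local_decisions_per_hr: float, cloud_calls_per_hr: float) -> float:
    """Sovereignty % as the local share of all decisions per hour
    (assumed definition; matches every row of the cost table)."""
    total = local_decisions_per_hr + cloud_calls_per_hr
    return 100.0 * local_decisions_per_hr / total if total else 0.0

# Week 1: ~80 local vs ~720 Groq -> 10%; Week 8: ~760 vs ~40 -> 95%
```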

## The Sovereignty Scorecard (5 Metrics)

Every work session ends with a sovereignty audit. Every PR includes a sovereignty delta. Not optional.

| Metric | What It Measures | Target |
|---|---|---|
| Perception Sovereignty % | Frames understood without VLM | >90% by hour 4 |
| Decision Sovereignty % | Actions chosen without LLM | >80% by week 4 |
| Narration Sovereignty % | Lines from templates vs LLM | >60% by week 2 |
| API Cost Trend | Dollar cost per hour of gameplay | Monotonically decreasing |
| Skill Library Growth | Crystallized skills per session | >5 new skills/session |

A dashboard widget on alexanderwhitestone.com shows these metrics in real time during streams, implemented as an HTMX component updated over WebSocket.

## The Crystallization Protocol

Every model output gets crystallized:

| Model Output | Crystallized As | Storage | Retrieval Cost |
|---|---|---|---|
| VLM: UI element | OpenCV template + bbox | templates.json | 3 ms |
| VLM: text | OCR region coords | regions.json | 50 ms |
| LLM: nav plan | Waypoint sequence | nav_graph.db | <1 ms |
| LLM: combat decision | If/else rule on state | rules.py | <1 ms |
| LLM: quest interpretation | Structured entry | quests.db | <1 ms |
| LLM: NPC disposition | Name→attitude map | npcs.db | <1 ms |
| LLM: narration | Template with slots | narration.json | <1 ms |
| API: moderation | Approved phrase cache | approved.set | <1 ms |
| Groq: strategic plan | Extracted decision rules | strategy.json | <1 ms |

Skill document format: markdown + YAML frontmatter following agentskills.io standard (name, game, type, success_rate, times_used, sovereignty_value).
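For illustration, a skill document in that format might look like this (the field values and skill body are invented; the field names are the ones listed above):

```markdown
---
name: loot_burial_urn
game: morrowind
type: interaction
success_rate: 0.92
times_used: 41
sovereignty_value: high
---

Open the nearest burial urn and take all contents.
Preconditions: not in combat; urn within interaction range.
```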

## The Automation Imperative & Three-Strike Rule

Applies to developer workflow too, not just the agent. If you do the same thing manually three times, you stop and write the automation before proceeding.

**Falsework Checklist** (before any cloud API call):

  1. What durable artifact will this call produce?
  2. Where will the artifact be stored locally?
  3. What local rule or cache will this populate?
  4. After this call, will I need to make it again?
  5. If yes, what would eliminate the repeat?
  6. What is the sovereignty delta of this call?

## The Graduation Test (Falsework Removal Criteria)

All five conditions met simultaneously in a single 24-hour period:

| Test | Condition | Measurement |
|---|---|---|
| Perception Independence | 1 hour, no VLM calls after minute 15 | VLM calls in last 45 min = 0 |
| Decision Independence | Full session with <5 API calls total | Groq/cloud calls < 5 |
| Narration Independence | All narration from local templates + local LLM | Zero cloud TTS/narration calls |
| Economic Independence | Earns more sats than spends on inference | sats_earned > sats_spent |
| Operational Independence | 24 hours unattended, no human intervention | Uptime > 23.5 hrs |
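The five conditions are simple enough to evaluate mechanically. A sketch (the counter names are assumptions; in practice the values would come from the metrics store):

```python
from dataclasses import dataclass


@dataclass
class SessionStats:
    """Illustrative session counters; real values would come from the
    SQLite metrics store."""
    vlm_calls_last_45min: int
    cloud_calls_total: int
    cloud_narration_calls: int
    sats_earned: int
    sats_spent: int
    uptime_hours: float


def graduation_test(s: SessionStats) -> dict:
    """The five independence conditions, one boolean per row of the table."""
    return {
        "perception":  s.vlm_calls_last_45min == 0,
        "decision":    s.cloud_calls_total < 5,
        "narration":   s.cloud_narration_calls == 0,
        "economic":    s.sats_earned > s.sats_spent,
        "operational": s.uptime_hours > 23.5,
    }


def is_sovereign(s: SessionStats) -> bool:
    """Graduation requires all five simultaneously."""
    return all(graduation_test(s).values())
```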

> "The arch must hold after the falsework is removed."

## Implementation Priority (~2 weeks total)

See child issues below, ordered P0 → P1 → P2.

## Cross-references

- #904 (Autoresearch) — the autoresearch loop is the automated form of the sovereignty loop; experiments should optimize sovereignty metrics
- #903 (State-of-Art Survey) — tool selections (vllm-mlx, Kokoro, ComfyUI) should be evaluated through a sovereignty lens
- #872 (Heartbeat v2) — the gather/reason/act loop must include a crystallization step
- #871 (WorldInterface) — the engine-agnostic adapter must support a per-game sovereignty cache
- #882 (Model Tiering Router) — the cascade config is a crystallization target
- #873 (Three-Tier Memory) — the Vault tier stores crystallized skills long-term
- #879 (AlexanderWhitestone.com Dashboard) — the sovereignty dashboard widget goes here
- #853 (Piper TTS) / #903 (Kokoro) — local voice for narration sovereignty
- #855 (SQLite Command Log) — sovereignty metrics stored here
- #851 (Autonomous Agent Economy) — the graduation test's economic independence
Author
Collaborator

📎 PDF attached above — 11-page governing architecture document with code examples, crystallization tables, cost projections, and graduation criteria.

## Child Issues — Implementation Priority (~2 weeks total)

### P0 — Build First (6 days)

- #954: Metrics Emitter + SQLite Metrics Store (1 day)
- #955: PerceptionCache — Template Matching for VLM Replacement (2 days)
- #956: Skill Library — Embedding Retrieval for LLM Replacement (2 days)
- #957: Session Sovereignty Report Generator (1 day)

### P1 — Build Next (4 days)

- #958: Narration Template System with Variable Slots (1 day)
- #959: Navigation Graph Recorder + Retriever (2 days)
- #960: Sovereignty Dashboard Widget — HTMX + WebSocket (1 day)

### P2 — Build After (4 days)

- #961: Auto-Crystallizer for Groq Reasoning Chains (3 days)
- #962: Three-Strike Detector for Repeated Manual Work (1 day)

## Relationship to Other Epics

**This document governs all other epics.** Specifically:

- **#904 (Autoresearch)** — The autoresearch loop should optimize sovereignty metrics. Experiment success = higher sovereignty %. The sovereignty scorecard's five metrics become the autoresearch benchmark targets.
- **#817 (Project Morrowind)** — Every Morrowind component (perception script #819, input bridge #820, heartbeat #822) must follow the crystallization protocol. No VLM/LLM call without a `crystallize_*()` step.
- **#808 (AlexanderWhitestone.com)** — The sovereignty dashboard widget is the visible proof of the thesis for stream viewers.
- **#852 (Infrastructure Epic)** — Docker Compose (#875) must include the sovereignty metrics DB and cache volumes.
- **#903 (State-of-Art)** — Tool selections should be evaluated through a sovereignty lens: does this tool help crystallize faster?

## The Graduation Test (When is Timmy Sovereign?)

All five conditions, simultaneously, within a single 24-hour window:

1. No VLM calls after minute 15 of play
2. <5 cloud API calls per full session
3. Zero cloud TTS or narration calls
4. sats_earned > sats_spent
5. 24 hr uptime, no manual restarts

> "The baby is coming. The family is growing. The system must be able to stand on its own before the falsework gets pulled."

gemini was assigned by Rockachopa 2026-03-22 23:31:27 +00:00
Author
Collaborator

📊 Cross-reference: Deep backlog triage completed in #1076. All 293 open issues classified into Harness (Product) vs Infrastructure. See #1076 for full analysis and action items.

Collaborator

PR created: http://143.198.27.163:3000/rockachopa/Timmy-time-dashboard/pulls/1154. This PR adds the Sovereignty Loop governing architecture document to the docs directory, including both a markdown summary and the original PDF.

Collaborator

PR created: http://143.198.27.163:3000/rockachopa/Timmy-time-dashboard/pulls/1167. The Sovereignty Loop governing architecture document has been integrated into the docs directory. Its principles and guidelines will be adhered to in all future development efforts.

Author
Collaborator

PR submitted: http://143.198.27.163:3000/rockachopa/Timmy-time-dashboard/pulls/1331

## What this adds

Gemini's PRs (#1154, #1167) delivered the docs. This PR delivers the **executable framework** — the code that enforces the sovereignty protocol across all three AI layers.

### New modules

- **`auto_crystallizer.py`** (#961) — Extracts local rules from LLM reasoning chains. Pattern detection for thresholds, comparisons, choices. RuleStore with confidence tracking (0.5 → 0.8+ for autonomy).
- **`sovereignty_loop.py`** — Core orchestration: `sovereign_perceive()`, `sovereign_decide()`, `sovereign_narrate()` + `@sovereignty_enforced` decorator. Every function enforces: check cache → miss → infer → crystallize → return.
- **`graduation.py`** — All 5 graduation conditions evaluable in code: perception/decision/narration independence, economic independence (sats), operational independence (24h uptime). Markdown reports.
- **`graduation.py` route** — `GET /sovereignty/graduation/test` dashboard endpoint.

### Enhanced

- **`perception_cache.py`** (#955) — `crystallize_perception()` now actually works: extracts OpenCV templates from VLM bounding-box responses, persists as `.npy` files.

Tests: 60 new/updated, all passing. Full suite: 715 passed, 0 failed.

Docs: docs/SOVEREIGNTY_INTEGRATION.md — integration guide with examples for every module.

Remaining open child issues: #955 (PerceptionCache — now has working crystallize_perception), #961 (Auto-Crystallizer — now implemented). Both addressed in this PR.

Timmy closed this issue 2026-03-24 02:29:41 +00:00
Reference: Rockachopa/Timmy-time-dashboard#953