Rockachopa/Timmy-time-dashboard

Google Gemini da29631c43

[gemini] feat: add Sovereignty Loop architecture document (#953) (#1154)

Co-authored-by: Google Gemini <gemini@hermes.local>
Co-committed-by: Google Gemini <gemini@hermes.local>

2026-03-23 19:00:45 +00:00

The Sovereignty Loop

This document establishes the primary engineering constraint for all Timmy Time development: every task must increase sovereignty as a default deliverable. Not as a future goal. Not as an optimization pass. As a constraint on every commit, every function, every inference call.

The full 11-page governing architecture document is available as a PDF: The-Sovereignty-Loop.pdf

"The measure of progress is not features added. It is model calls eliminated."

The Core Principle

The Sovereignty Loop: Discover with an expensive model. Compress the discovery into a cheap local rule. Replace the model with the rule. Measure the cost reduction. Repeat.

Every call to an LLM, VLM, or external API passes through three phases:

Discovery — Model sees something for the first time (expensive, unavoidable, produces new knowledge)
Crystallization — Discovery compressed into durable cheap artifact (requires explicit engineering)
Replacement — Crystallized artifact replaces the model call (near-zero cost)

Code review requirement: If a function calls a model without a crystallization step, it fails code review. No exceptions. The pattern is always: check cache → miss → infer → crystallize → return.
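The check cache → miss → infer → crystallize → return pattern can be sketched as follows. This is a minimal illustration, not the project's implementation: `CrystalCache`, its JSON file layout, and the `infer` callback are hypothetical names introduced here.

```python
import json
from pathlib import Path
from typing import Callable


class CrystalCache:
    """Hypothetical local store for crystallized model outputs."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Load previously crystallized answers so they survive restarts.
        self.rules = json.loads(path.read_text()) if path.exists() else {}
        self.model_calls = 0
        self.local_hits = 0

    def resolve(self, key: str, infer: Callable[[str], str]) -> str:
        # 1. Check cache: a hit costs no model call.
        if key in self.rules:
            self.local_hits += 1
            return self.rules[key]
        # 2. Miss: run the expensive model (discovery).
        self.model_calls += 1
        answer = infer(key)
        # 3. Crystallize: persist the answer as a durable local artifact.
        self.rules[key] = answer
        self.path.write_text(json.dumps(self.rules, indent=2))
        # 4. Return the freshly crystallized answer.
        return answer
```

With a wrapper of this shape, the second request for the same key never reaches the model, and the two counters give the raw numbers for a sovereignty percentage.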

The Sovereignty Loop Applied to Every Layer

Perception: See Once, Template Forever

First encounter: VLM analyzes screenshot (3-6 sec) → structured JSON
Crystallized as: OpenCV template + bounding box → templates.json (3 ms retrieval)
crystallize_perception() function wraps every VLM response
Target: 90% of perception cycles without VLM by hour 1, 99% by hour 4
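A rough sketch of what a `crystallize_perception()` wrapper might do with a VLM response is shown below. The response schema (`screen`, `elements`, `label`, `bbox`) is an assumption made for illustration, and the OpenCV template crop itself is left out so the example stays standard-library only.

```python
import json


def crystallize_perception(vlm_response: dict, templates: dict) -> dict:
    """Fold one VLM response into the template index and return the index.

    Assumed response shape (hypothetical):
    {"screen": str, "elements": [{"label": str, "bbox": [x, y, w, h]}]}
    """
    for element in vlm_response.get("elements", []):
        x, y, w, h = element["bbox"]
        # Store the bounding box so a local matcher can look here next time
        # instead of asking the VLM again.
        templates[element["label"]] = {
            "bbox": [x, y, w, h],
            "screen": vlm_response.get("screen", "unknown"),
        }
    return templates


# Usage: the serialized index is what would be written to templates.json.
index = crystallize_perception(
    {"screen": "inventory",
     "elements": [{"label": "close_button", "bbox": [10, 20, 30, 40]}]},
    {},
)
serialized = json.dumps(index, indent=2)
```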

Decision: Reason Once, Rule Forever

First encounter: LLM reasons through decision (1-5 sec)
Crystallized as: if/else rules, waypoints, cached preferences → rules.py, nav_graph.db (<1 ms)
Uses Voyager pattern: named skills with embeddings, success rates, conditions
Skill match >0.8 confidence + >0.6 success rate → executes without LLM
Target: 70-80% of decisions without LLM by week 4
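The skill-match gate above could look roughly like this. It assumes that "confidence" means cosine similarity between a situation embedding and a skill embedding, which the document does not specify; `Skill` and `select_skill` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    embedding: list
    success_rate: float


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_skill(situation, skills, min_confidence=0.8, min_success=0.6):
    """Return the best skill that clears both thresholds, else None."""
    best, best_score = None, 0.0
    for skill in skills:
        score = cosine(situation, skill.embedding)
        if (score > min_confidence
                and skill.success_rate > min_success
                and score > best_score):
            best, best_score = skill, score
    return best  # None means: fall back to the LLM
```

A `None` result is the only path that should reach the LLM, and whatever the LLM decides is then a candidate for a new named skill.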

Narration: Script the Predictable, Improvise the Novel

Predictable moments → template with variable slots, voiced by Kokoro locally
LLM narrates only genuinely surprising events (quest twist, death, discovery)
Target: 60-70% templatized within a week
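Template narration with an LLM fallback might be sketched as below. The event names and template texts are invented for illustration, and `llm_narrate` stands in for whatever model call the project actually uses.

```python
from string import Template

# Hypothetical templates; slots are filled from game state at runtime.
NARRATION_TEMPLATES = {
    "level_up": Template("Level $level reached. $skill is getting stronger."),
    "enter_area": Template("We have arrived at $area."),
}


def narrate(event: str, slots: dict, llm_narrate) -> str:
    """Use a local template when one exists; otherwise improvise via the LLM."""
    template = NARRATION_TEMPLATES.get(event)
    if template is None:
        # Genuinely novel event: pay for the model call.
        return llm_narrate(event, slots)
    # Predictable event: fill the slots locally at near-zero cost.
    return template.substitute(slots)
```

The resulting string would then be handed to the local voice (Kokoro in this document) regardless of which branch produced it.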

Navigation

Every path is recorded as a waypoint sequence with terrain annotations
First journey = full perception + planning; subsequent = graph traversal
Builds complete nav graph without external map data
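Recording paths as waypoint sequences and answering later journeys by graph traversal can be sketched like this. `NavGraph` is a hypothetical in-memory stand-in for `nav_graph.db`, it stores one terrain label per recorded path as a simplification, and it uses plain breadth-first search rather than any cost-aware planner.

```python
from collections import defaultdict, deque


class NavGraph:
    """Hypothetical waypoint graph built only from journeys already made."""

    def __init__(self) -> None:
        # waypoint -> {neighbor: terrain annotation}
        self.edges = defaultdict(dict)

    def record_path(self, waypoints, terrain: str) -> None:
        """Store each consecutive waypoint pair as a two-way edge."""
        for a, b in zip(waypoints, waypoints[1:]):
            self.edges[a][b] = terrain
            self.edges[b][a] = terrain

    def route(self, start, goal):
        """Breadth-first search; returns a waypoint list or None if unknown."""
        if start == goal:
            return [start]
        previous = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in self.edges.get(node, {}):
                if neighbor in previous:
                    continue
                previous[neighbor] = node
                if neighbor == goal:
                    path = [goal]
                    while previous[path[-1]] is not None:
                        path.append(previous[path[-1]])
                    return path[::-1]
                queue.append(neighbor)
        return None  # no known route: fall back to perception + planning
```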

API Costs: Every Dollar Spent Must Reduce Future Dollars

Week	Groq Calls/Hr	Local Decisions/Hr	Sovereignty %	Cost/Hr
1	~720	~80	10%	$0.40
2	~400	~400	50%	$0.22
4	~160	~640	80%	$0.09
8	~40	~760	95%	$0.02
Target	<20	>780	>97%	<$0.01
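The Sovereignty % column is consistent with local decisions divided by total decisions per hour, and the Cost/Hr column is consistent with a cost that scales linearly with Groq calls. The short check below reproduces both; the linear per-call cost is an inference from the week 1 row, not a figure stated in the document.

```python
# Rows copied from the table above: week -> (Groq calls/hr, local decisions/hr).
TABLE = {1: (720, 80), 2: (400, 400), 4: (160, 640), 8: (40, 760)}


def sovereignty_pct(cloud_calls: int, local_decisions: int) -> float:
    """Share of decisions made locally, as a percentage."""
    total = cloud_calls + local_decisions
    return 100.0 * local_decisions / total if total else 0.0


# Assumption: cost is linear in Groq calls, calibrated on week 1 ($0.40 / 720).
COST_PER_CALL = 0.40 / 720


def projected_cost_per_hour(cloud_calls: int) -> float:
    """Hourly cost in dollars, rounded to cents."""
    return round(cloud_calls * COST_PER_CALL, 2)


for week, (cloud, local) in TABLE.items():
    print(week, sovereignty_pct(cloud, local), projected_cost_per_hour(cloud))
```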

The Sovereignty Scorecard (5 Metrics)

Every work session ends with a sovereignty audit. Every PR includes a sovereignty delta. Not optional.

Metric	What It Measures	Target
Perception Sovereignty %	Frames understood without VLM	>90% by hour 4
Decision Sovereignty %	Actions chosen without LLM	>80% by week 4
Narration Sovereignty %	Lines from templates vs LLM	>60% by week 2
API Cost Trend	Dollar cost per hour of gameplay	Monotonically decreasing
Skill Library Growth	Crystallized skills per session	>5 new skills/session

A dashboard widget on alexanderwhitestone.com shows these metrics in real time during streams. It is an HTMX component updated over a WebSocket.
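One way to represent the scorecard and the per-PR sovereignty delta is sketched below. The field names are hypothetical, and "monotonically decreasing" is read here as "never increases from one measurement to the next", which is an assumption about the intended strictness.

```python
from dataclasses import asdict, dataclass


@dataclass
class Scorecard:
    perception_pct: float   # frames understood without VLM
    decision_pct: float     # actions chosen without LLM
    narration_pct: float    # lines from templates vs LLM
    cost_per_hour: float    # dollars per hour of gameplay
    new_skills: int         # crystallized skills this session


def sovereignty_delta(before: Scorecard, after: Scorecard) -> dict:
    """Per-metric change between two audits (after minus before)."""
    b, a = asdict(before), asdict(after)
    return {name: a[name] - b[name] for name in a}


def cost_is_monotonically_decreasing(costs) -> bool:
    """True if no measurement is higher than the one before it."""
    return all(later <= earlier for earlier, later in zip(costs, costs[1:]))
```

A PR description could then include the output of `sovereignty_delta` for the sessions before and after the change.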

The Crystallization Protocol

Every model output gets crystallized:

Model Output	Crystallized As	Storage	Retrieval Cost
VLM: UI element	OpenCV template + bbox	templates.json	3 ms
VLM: text	OCR region coords	regions.json	50 ms
LLM: nav plan	Waypoint sequence	nav_graph.db	<1 ms
LLM: combat decision	If/else rule on state	rules.py	<1 ms
LLM: quest interpretation	Structured entry	quests.db	<1 ms
LLM: NPC disposition	Name→attitude map	npcs.db	<1 ms
LLM: narration	Template with slots	narration.json	<1 ms
API: moderation	Approved phrase cache	approved.set	<1 ms
Groq: strategic plan	Extracted decision rules	strategy.json	<1 ms

Skill document format: markdown + YAML frontmatter following the agentskills.io standard (name, game, type, success_rate, times_used, sovereignty_value).
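As an illustration of that layout, the helper below renders a skill document with the six frontmatter fields listed above. All values are placeholders, the meaning and units of `sovereignty_value` are not defined in this document, and the hand-rolled serializer only handles simple scalar values; a real YAML library would be needed for anything that requires quoting.

```python
def render_skill_document(frontmatter: dict, body: str) -> str:
    """Render markdown with a simple YAML frontmatter header (scalars only)."""
    lines = ["---"]
    for key, value in frontmatter.items():
        lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + body.strip() + "\n"


# Placeholder values for illustration only.
skill_doc = render_skill_document(
    {
        "name": "open_door",
        "game": "example_game",
        "type": "navigation",
        "success_rate": 0.85,
        "times_used": 12,
        "sovereignty_value": 1,
    },
    "Walk to the door, face it, and press the interact key.",
)
```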

The Automation Imperative & Three-Strike Rule

This applies to the developer workflow too, not just the agent: if you do the same thing manually three times, stop and write the automation before proceeding.

Falsework Checklist (before any cloud API call):

What durable artifact will this call produce?
Where will the artifact be stored locally?
What local rule or cache will this populate?
After this call, will I need to make it again?
If yes, what would eliminate the repeat?
What is the sovereignty delta of this call?
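If the checklist were enforced in code rather than by convention, it might take a shape like the following. `FalseworkChecklist` and its field names are hypothetical; the six fields mirror the six questions in order.

```python
from dataclasses import dataclass


@dataclass
class FalseworkChecklist:
    durable_artifact: str     # What durable artifact will this call produce?
    local_storage: str        # Where will the artifact be stored locally?
    rule_or_cache: str        # What local rule or cache will this populate?
    repeat_needed: bool       # After this call, will I need to make it again?
    repeat_eliminator: str    # If yes, what would eliminate the repeat?
    sovereignty_delta: str    # What is the sovereignty delta of this call?

    def unanswered(self) -> list:
        """Names of questions that still lack an answer."""
        required = ("durable_artifact", "local_storage",
                    "rule_or_cache", "sovereignty_delta")
        missing = [name for name in required if not getattr(self, name).strip()]
        # The fifth question only applies when the call will repeat.
        if self.repeat_needed and not self.repeat_eliminator.strip():
            missing.append("repeat_eliminator")
        return missing
```

A cloud call wrapper could refuse to run while `unanswered()` is non-empty.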

The Graduation Test (Falsework Removal Criteria)

All five conditions must be met simultaneously within a single 24-hour period:

Test	Condition	Measurement
Perception Independence	1 hour, no VLM calls after minute 15	VLM calls in last 45 min = 0
Decision Independence	Full session with <5 API calls total	Groq/cloud calls < 5
Narration Independence	All narration from local templates + local LLM	Zero cloud TTS/narration calls
Economic Independence	Earns more sats than it spends on inference	sats_earned > sats_spent
Operational Independence	24 hours unattended, no human intervention	Uptime > 23.5 hrs
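The five measurements translate directly into boolean checks. The sketch below assumes a hypothetical `DayReport` record collected over the 24-hour window; only `sats_earned` and `sats_spent` are names taken from the table.

```python
from dataclasses import dataclass


@dataclass
class DayReport:
    vlm_calls_last_45_min: int    # VLM calls after minute 15 of the test hour
    cloud_calls_in_session: int   # Groq/cloud calls across the full session
    cloud_narration_calls: int    # cloud TTS/narration calls
    sats_earned: int
    sats_spent: int
    uptime_hours: float           # unattended uptime within the 24 hours


def graduation_results(report: DayReport) -> dict:
    """Evaluate each graduation condition from the table above."""
    return {
        "perception": report.vlm_calls_last_45_min == 0,
        "decision": report.cloud_calls_in_session < 5,
        "narration": report.cloud_narration_calls == 0,
        "economic": report.sats_earned > report.sats_spent,
        "operational": report.uptime_hours > 23.5,
    }


def graduated(report: DayReport) -> bool:
    """All five conditions must hold for the same 24-hour period."""
    return all(graduation_results(report).values())
```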

"The arch must hold after the falsework is removed."