This repository has been archived on 2026-03-24. You can view files and clone it. You cannot open issues or pull requests or push a commit.
Timmy-time-dashboard/docs/SOVEREIGNTY_LOOP.md
2026-03-23 19:00:45 +00:00

The Sovereignty Loop

This document establishes the primary engineering constraint for all Timmy Time development: every task must increase sovereignty as a default deliverable. Not as a future goal. Not as an optimization pass. As a constraint on every commit, every function, every inference call.

The full 11-page governing architecture document is available as a PDF: The-Sovereignty-Loop.pdf

"The measure of progress is not features added. It is model calls eliminated."

The Core Principle

The Sovereignty Loop: Discover with an expensive model. Compress the discovery into a cheap local rule. Replace the model with the rule. Measure the cost reduction. Repeat.

Every call to an LLM, VLM, or external API passes through three phases:

  1. Discovery — Model sees something for the first time (expensive, unavoidable, produces new knowledge)
  2. Crystallization — Discovery compressed into durable cheap artifact (requires explicit engineering)
  3. Replacement — Crystallized artifact replaces the model call (near-zero cost)

Code review requirement: If a function calls a model without a crystallization step, it fails code review. No exceptions. The pattern is always: check cache → miss → infer → crystallize → return.

The Sovereignty Loop Applied to Every Layer

Perception: See Once, Template Forever

  • First encounter: VLM analyzes screenshot (3-6 sec) → structured JSON
  • Crystallized as: OpenCV template + bounding box → templates.json (3 ms retrieval)
  • crystallize_perception() function wraps every VLM response
  • Target: 90% of perception cycles without VLM by hour 1, 99% by hour 4
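
One possible shape for `crystallize_perception()` is sketched below. Only the function name and `templates.json` come from this document; the response shape (`element`, `bbox`), the `lookup_template` helper, and the temporary directory are assumptions made for illustration. The OpenCV template image itself is left out so the sketch stays dependency-free.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List, Optional


def crystallize_perception(vlm_response: Dict[str, Any], store: Path) -> Dict[str, Any]:
    """Persist a VLM finding as a template record, then return it unchanged.

    Assumed response shape: {"element": str, "bbox": [x, y, w, h]}.
    A fuller version would also save the cropped image for template matching.
    """
    templates = json.loads(store.read_text()) if store.exists() else {}
    templates[vlm_response["element"]] = {"bbox": vlm_response["bbox"]}
    store.write_text(json.dumps(templates, indent=2))
    return vlm_response


def lookup_template(element: str, store: Path) -> Optional[List[int]]:
    """Return the stored bounding box for an element, or None on a miss."""
    if not store.exists():
        return None
    entry = json.loads(store.read_text()).get(element)
    return entry["bbox"] if entry else None


# Usage with a throwaway store location (hypothetical element name).
store = Path(tempfile.mkdtemp()) / "templates.json"
miss = lookup_template("health_bar", store)  # first encounter: nothing stored yet
crystallize_perception({"element": "health_bar", "bbox": [10, 20, 200, 16]}, store)
hit = lookup_template("health_bar", store)   # later encounters skip the VLM
```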

Decision: Reason Once, Rule Forever

  • First encounter: LLM reasons through decision (1-5 sec)
  • Crystallized as: if/else rules, waypoints, cached preferences → rules.py, nav_graph.db (<1 ms)
  • Uses Voyager pattern: named skills with embeddings, success rates, conditions
  • Skill match >0.8 confidence + >0.6 success rate → executes without LLM
  • Target: 70-80% of decisions without LLM by week 4
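
The two thresholds above can be expressed as a gate in front of the LLM. The sketch assumes that "confidence" means cosine similarity between the current situation's embedding and a skill's embedding; the document does not say how confidence is computed, and the `Skill` class, `pick_skill`, and the toy two-dimensional embeddings are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence

CONFIDENCE_MIN = 0.8    # skill-match threshold from the text
SUCCESS_RATE_MIN = 0.6  # success-rate threshold from the text


@dataclass
class Skill:
    name: str
    embedding: Sequence[float]
    success_rate: float


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_skill(situation: Sequence[float], skills: List[Skill]) -> Optional[Skill]:
    """Return the best skill clearing both thresholds, or None (fall back to the LLM)."""
    scored = [(cosine(situation, s.embedding), s) for s in skills
              if s.success_rate > SUCCESS_RATE_MIN]
    eligible = [(score, s) for score, s in scored if score > CONFIDENCE_MIN]
    if not eligible:
        return None
    return max(eligible, key=lambda pair: pair[0])[1]


skills = [
    Skill("cross_bridge", [1.0, 0.0], success_rate=0.9),
    Skill("fight_rat", [0.0, 1.0], success_rate=0.4),
]
known = pick_skill([0.9, 0.1], skills)    # close match, reliable skill
unknown = pick_skill([0.1, 0.9], skills)  # best match is unreliable
```

Here `known` is the `cross_bridge` skill, while `unknown` is `None` because the only close match has a success rate below 0.6, so the LLM would be consulted.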

Narration: Script the Predictable, Improvise the Novel

  • Predictable moments → template with variable slots, voiced by Kokoro locally
  • LLM narrates only genuinely surprising events (quest twist, death, discovery)
  • Target: 60-70% templatized within a week
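
A template with variable slots might look like the following sketch, which uses Python's standard `string.Template`. The event names and template wording are invented for illustration; voicing the result with Kokoro is outside the scope of the example.

```python
import string
from typing import Dict, Optional

# Hypothetical templates; a real build would load them from a local file.
TEMPLATES: Dict[str, str] = {
    "level_up": "Level $level reached. $skill feels sharper already.",
    "item_found": "Found $item near $place.",
}


def narrate(event: str, slots: Dict[str, str]) -> Optional[str]:
    """Fill a local template, or return None so the caller can ask the LLM."""
    template = TEMPLATES.get(event)
    if template is None:
        return None  # novel event: improvise with the model
    return string.Template(template).substitute(slots)


line = narrate("item_found", {"item": "iron key", "place": "the well"})
novel = narrate("quest_twist", {})
```

`line` is "Found iron key near the well." and `novel` is `None`, which signals that the event has no template yet.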

Navigation: Walk Once, Map Forever

  • Every path recorded as waypoint sequence with terrain annotations
  • First journey = full perception + planning; subsequent = graph traversal
  • Builds complete nav graph without external map data
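
The "walk once, map forever" idea can be sketched as a graph that grows from recorded waypoint sequences and is later queried by plain graph traversal. The `NavGraph` class, the waypoint names, and the assumption that every recorded path is walkable in both directions are illustrative only; terrain annotations are omitted for brevity.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Set


class NavGraph:
    def __init__(self) -> None:
        self.edges: Dict[str, Set[str]] = defaultdict(set)

    def record_path(self, waypoints: List[str]) -> None:
        """Store a walked path as edges between consecutive waypoints."""
        for a, b in zip(waypoints, waypoints[1:]):
            self.edges[a].add(b)
            self.edges[b].add(a)  # assumption: paths are walkable both ways

    def route(self, start: str, goal: str) -> Optional[List[str]]:
        """Breadth-first search: fewest-hop route, or None if unexplored."""
        if start == goal:
            return [start]
        previous: Dict[str, Optional[str]] = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in self.edges.get(node, ()):
                if nxt in previous:
                    continue
                previous[nxt] = node
                if nxt == goal:
                    path = [goal]
                    while previous[path[-1]] is not None:
                        path.append(previous[path[-1]])
                    return path[::-1]
                queue.append(nxt)
        return None


graph = NavGraph()
graph.record_path(["gate", "market", "dock"])  # first journey
graph.record_path(["market", "inn"])           # second journey
known_route = graph.route("dock", "inn")
unknown_route = graph.route("gate", "cave")
```

`known_route` combines two separately walked paths into `["dock", "market", "inn"]`. `unknown_route` is `None`, which is the cue to fall back to full perception and planning.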

API Costs: Every Dollar Spent Must Reduce Future Dollars

| Week | Groq Calls/Hr | Local Decisions/Hr | Sovereignty % | Cost/Hr |
|---|---|---|---|---|
| 1 | ~720 | ~80 | 10% | $0.40 |
| 2 | ~400 | ~400 | 50% | $0.22 |
| 4 | ~160 | ~640 | 80% | $0.09 |
| 8 | ~40 | ~760 | 95% | $0.02 |
| Target | <20 | >780 | >97% | <$0.01 |
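
The Sovereignty % figures are consistent with local decisions divided by all decisions (local plus Groq). The document does not state the formula, so treat the function below as an inferred reading that happens to reproduce the weekly figures.

```python
from typing import Dict, Tuple


def sovereignty_pct(cloud_calls: int, local_decisions: int) -> float:
    """Share of decisions made locally, as a percentage (inferred formula)."""
    total = cloud_calls + local_decisions
    return 100.0 * local_decisions / total if total else 0.0


# (Groq calls/hr, local decisions/hr) for weeks 1, 2, 4, and 8.
rows: Dict[int, Tuple[int, int]] = {
    1: (720, 80),
    2: (400, 400),
    4: (160, 640),
    8: (40, 760),
}
percentages = {week: sovereignty_pct(c, l) for week, (c, l) in rows.items()}
```

Each week totals about 800 decisions per hour, so the percentages come out to 10, 50, 80, and 95.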

The Sovereignty Scorecard (5 Metrics)

Every work session ends with a sovereignty audit. Every PR includes a sovereignty delta. Not optional.

| Metric | What It Measures | Target |
|---|---|---|
| Perception Sovereignty | % Frames understood without VLM | >90% by hour 4 |
| Decision Sovereignty | % Actions chosen without LLM | >80% by week 4 |
| Narration Sovereignty | % Lines from templates vs LLM | >60% by week 2 |
| API Cost Trend | Dollar cost per hour of gameplay | Monotonically decreasing |
| Skill Library Growth | Crystallized skills per session | >5 new skills/session |
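
Two of these checks are simple enough to automate in a PR or end-of-session script. The sketch below is one possible reading: it treats "monotonically decreasing" as strictly decreasing and "sovereignty delta" as the change in sovereignty percentage across a change set. Both interpretations, and the function names, are assumptions.

```python
from typing import Sequence


def is_monotonically_decreasing(costs: Sequence[float]) -> bool:
    """True if every cost is strictly lower than the one before it."""
    return all(later < earlier for earlier, later in zip(costs, costs[1:]))


def sovereignty_delta(before_pct: float, after_pct: float) -> float:
    """Percentage-point change in sovereignty; positive means improvement."""
    return after_pct - before_pct


# Cost/Hr figures for weeks 1, 2, 4, and 8 from the API cost table.
weekly_cost = [0.40, 0.22, 0.09, 0.02]
trend_ok = is_monotonically_decreasing(weekly_cost)
delta = sovereignty_delta(before_pct=10.0, after_pct=50.0)
```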

A dashboard widget on alexanderwhitestone.com shows these metrics in real time during streams. It is an HTMX component updated via WebSocket.

The Crystallization Protocol

Every model output gets crystallized:

| Model Output | Crystallized As | Storage | Retrieval Cost |
|---|---|---|---|
| VLM: UI element | OpenCV template + bbox | templates.json | 3 ms |
| VLM: text | OCR region coords | regions.json | 50 ms |
| LLM: nav plan | Waypoint sequence | nav_graph.db | <1 ms |
| LLM: combat decision | If/else rule on state | rules.py | <1 ms |
| LLM: quest interpretation | Structured entry | quests.db | <1 ms |
| LLM: NPC disposition | Name→attitude map | npcs.db | <1 ms |
| LLM: narration | Template with slots | narration.json | <1 ms |
| API: moderation | Approved phrase cache | approved.set | <1 ms |
| Groq: strategic plan | Extracted decision rules | strategy.json | <1 ms |

Skill document format: markdown + YAML frontmatter following agentskills.io standard (name, game, type, success_rate, times_used, sovereignty_value).
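
As an illustration of that format, the sketch below renders a skill document with the six frontmatter fields named above. The field values, the field types, the `SkillDoc` class, and the naive `key: value` serialization are all assumptions; the agentskills.io standard itself should be consulted for the authoritative schema, and a real implementation would use a proper YAML library to handle quoting.

```python
from dataclasses import asdict, dataclass


@dataclass
class SkillDoc:
    # Field names come from the text above; the types are assumed.
    name: str
    game: str
    type: str
    success_rate: float
    times_used: int
    sovereignty_value: float
    body: str = ""


def render(doc: SkillDoc) -> str:
    """Render frontmatter plus a markdown body (no YAML escaping is done)."""
    fields = asdict(doc)
    body = fields.pop("body")
    front = "\n".join(f"{key}: {value}" for key, value in fields.items())
    return f"---\n{front}\n---\n\n{body}\n"


# Hypothetical example values.
text = render(SkillDoc(
    name="cross_bridge",
    game="example_game",
    type="navigation",
    success_rate=0.9,
    times_used=12,
    sovereignty_value=0.5,
    body="Walk to the bridge waypoint, then follow the recorded path.",
))
```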

The Automation Imperative & Three-Strike Rule

This applies to the developer workflow too, not just the agent. If you do the same thing manually three times, stop and write the automation before proceeding.

Falsework Checklist (before any cloud API call):

  1. What durable artifact will this call produce?
  2. Where will the artifact be stored locally?
  3. What local rule or cache will this populate?
  4. After this call, will I need to make it again?
  5. If yes, what would eliminate the repeat?
  6. What is the sovereignty delta of this call?

The Graduation Test (Falsework Removal Criteria)

All five conditions must be met simultaneously in a single 24-hour period:

| Test | Condition | Measurement |
|---|---|---|
| Perception Independence | 1 hour, no VLM calls after minute 15 | VLM calls in last 45 min = 0 |
| Decision Independence | Full session with <5 API calls total | Groq/cloud calls < 5 |
| Narration Independence | All narration from local templates + local LLM | Zero cloud TTS/narration calls |
| Economic Independence | Earns more sats than spends on inference | sats_earned > sats_spent |
| Operational Independence | 24 hours unattended, no human intervention | Uptime > 23.5 hrs |
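
The five measurements translate directly into boolean checks. In the sketch below only `sats_earned` and `sats_spent` are names taken from this document; the `DayStats` container, its other field names, and the two functions are hypothetical, and collecting the numbers is left to whatever telemetry the project provides.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DayStats:
    vlm_calls_last_45_min: int   # within the 1-hour perception test
    cloud_calls_in_session: int  # Groq/cloud calls over a full session
    cloud_narration_calls: int   # cloud TTS or narration calls
    sats_earned: int
    sats_spent: int
    uptime_hours: float          # unattended uptime in the 24-hour window


def graduation_results(s: DayStats) -> Dict[str, bool]:
    """Evaluate each of the five conditions separately."""
    return {
        "perception": s.vlm_calls_last_45_min == 0,
        "decision": s.cloud_calls_in_session < 5,
        "narration": s.cloud_narration_calls == 0,
        "economic": s.sats_earned > s.sats_spent,
        "operational": s.uptime_hours > 23.5,
    }


def graduated(s: DayStats) -> bool:
    """All five conditions must hold in the same 24-hour period."""
    return all(graduation_results(s).values())


passing = DayStats(0, 3, 0, sats_earned=1200, sats_spent=900, uptime_hours=23.8)
failing = DayStats(0, 3, 0, sats_earned=800, sats_spent=900, uptime_hours=23.8)
```

`graduated(passing)` is true, while `graduated(failing)` is false because the economic check fails. Returning the per-condition results makes it clear which independence test is still outstanding.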

"The arch must hold after the falsework is removed."