
# The Sovereignty Loop

This document establishes the primary engineering constraint for all Timmy Time development: every task must increase sovereignty as a default deliverable. Not as a future goal. Not as an optimization pass. As a constraint on every commit, every function, every inference call.

The full 11-page governing architecture document is available as a PDF: [The-Sovereignty-Loop.pdf](./The-Sovereignty-Loop.pdf)

> "The measure of progress is not features added. It is model calls eliminated."

## The Core Principle

> **The Sovereignty Loop**: Discover with an expensive model. Compress the discovery into a cheap local rule. Replace the model with the rule. Measure the cost reduction. Repeat.

Every call to an LLM, VLM, or external API passes through three phases:
1. **Discovery** — Model sees something for the first time (expensive, unavoidable, produces new knowledge)
2. **Crystallization** — Discovery compressed into durable cheap artifact (requires explicit engineering)
3. **Replacement** — Crystallized artifact replaces the model call (near-zero cost)

**Code review requirement**: If a function calls a model without a crystallization step, it fails code review. No exceptions. The pattern is always: check cache → miss → infer → crystallize → return.
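
The pattern above can be sketched in a few lines. This is a minimal illustration, not the project's actual implementation: `sovereign_call`, the in-memory `crystallized` dict, and `fake_model` are hypothetical names, and a real version would persist to one of the durable stores named later in this document.

```python
from typing import Callable, Dict

# Hypothetical in-memory store; a real one would be a durable local artifact.
crystallized: Dict[str, str] = {}


def sovereign_call(key: str, infer: Callable[[str], str]) -> str:
    """Check cache -> miss -> infer -> crystallize -> return."""
    cached = crystallized.get(key)
    if cached is not None:
        return cached            # Replacement: no model call needed
    result = infer(key)          # Discovery: the expensive call
    crystallized[key] = result   # Crystallization: keep the artifact
    return result


model_calls = []


def fake_model(prompt: str) -> str:
    """Stand-in for an LLM/VLM call; records how often it is invoked."""
    model_calls.append(prompt)
    return f"answer for {prompt}"


first = sovereign_call("open inventory", fake_model)
second = sovereign_call("open inventory", fake_model)
```

The second call returns the stored result without invoking the model, which is the behavior a reviewer would look for.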

## The Sovereignty Loop Applied to Every Layer

### Perception: See Once, Template Forever
- First encounter: VLM analyzes screenshot (3-6 sec) → structured JSON
- Crystallized as: OpenCV template + bounding box → `templates.json` (3 ms retrieval)
- `crystallize_perception()` function wraps every VLM response
- **Target**: 90% of perception cycles without VLM by hour 1, 99% by hour 4
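
As a rough sketch of how `crystallize_perception()` might wrap a VLM response: the record layout (`elements`, `label`, `bbox`), the `screen_key` lookup, and the `perceive` helper are assumptions made for illustration. The OpenCV template crop itself is left out; only the bounding-box record that would land in `templates.json` is shown.

```python
import json
from typing import Callable, Dict


def crystallize_perception(templates: Dict[str, dict], screen_key: str,
                           vlm_json: dict) -> dict:
    """Keep only the durable parts of a VLM response: labels and bounding boxes."""
    record = {
        "elements": [
            {"label": e["label"], "bbox": list(e["bbox"])}
            for e in vlm_json["elements"]
        ]
    }
    templates[screen_key] = record
    return record


def perceive(templates: Dict[str, dict], screen_key: str,
             vlm: Callable[[str], dict]) -> dict:
    """Use the stored template if present; otherwise call the VLM once."""
    cached = templates.get(screen_key)
    if cached is not None:
        return cached
    return crystallize_perception(templates, screen_key, vlm(screen_key))


vlm_calls = []


def fake_vlm(screen_key: str) -> dict:
    """Stand-in for the VLM; the response shape is hypothetical."""
    vlm_calls.append(screen_key)
    return {"elements": [{"label": "inventory_button",
                          "bbox": [10, 20, 64, 64],
                          "note": "top-left corner"}]}


templates: Dict[str, dict] = {}
first_seen = perceive(templates, "main_hud", fake_vlm)
second_seen = perceive(templates, "main_hud", fake_vlm)
templates_json = json.dumps(templates, indent=2)  # what would be written to disk
```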

### Decision: Reason Once, Rule Forever
- First encounter: LLM reasons through decision (1-5 sec)
- Crystallized as: if/else rules, waypoints, cached preferences → `rules.py`, `nav_graph.db` (<1 ms)
- Uses Voyager pattern: named skills with embeddings, success rates, conditions
- Skill match >0.8 confidence + >0.6 success rate → executes without LLM
- **Target**: 70-80% of decisions without LLM by week 4
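
The skill gate above could look something like the following. Treating "confidence" as cosine similarity between a query embedding and a skill embedding is an assumption, as are the `Skill` dataclass and `match_skill` names; only the 0.8 and 0.6 thresholds come from this document.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Skill:
    name: str
    embedding: Sequence[float]
    success_rate: float


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_skill(query: Sequence[float], skills: List[Skill],
                min_confidence: float = 0.8,
                min_success: float = 0.6) -> Optional[Skill]:
    """Return the best skill passing both thresholds, or None (fall back to the LLM)."""
    best, best_score = None, min_confidence
    for skill in skills:
        score = cosine(query, skill.embedding)
        if score > best_score and skill.success_rate > min_success:
            best, best_score = skill, score
    return best


skills = [
    Skill("cross_bridge", [1.0, 0.0], 0.9),
    Skill("flee", [0.0, 1.0], 0.9),
    Skill("risky_shortcut", [1.0, 0.1], 0.3),  # close match, but unreliable
]
hit = match_skill([1.0, 0.05], skills)
miss = match_skill([1.0, 1.0], skills)
```

A `None` result is the signal to fall back to the LLM and crystallize whatever it decides.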

### Narration: Script the Predictable, Improvise the Novel
- Predictable moments → template with variable slots, voiced by Kokoro locally
- LLM narrates only genuinely surprising events (quest twist, death, discovery)
- **Target**: 60-70% templatized within a week
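
A template with variable slots can be as simple as Python's `string.Template`. The event names, template text, and `narrate` helper below are hypothetical, and the local voicing step (Kokoro) is not shown.

```python
from string import Template
from typing import Callable, Dict

# Hypothetical templates for predictable moments.
NARRATION_TEMPLATES: Dict[str, Template] = {
    "level_up": Template("$hero reaches level $level."),
    "item_found": Template("$hero picks up $item."),
}


def narrate(event: str, slots: Dict[str, str],
            llm: Callable[[str, Dict[str, str]], str]) -> str:
    """Fill a template when one exists; only novel events reach the LLM."""
    template = NARRATION_TEMPLATES.get(event)
    if template is not None:
        return template.substitute(slots)
    return llm(event, slots)


llm_calls = []


def fake_llm(event: str, slots: Dict[str, str]) -> str:
    """Stand-in for the narrating LLM."""
    llm_calls.append(event)
    return f"Something unexpected happens: {event}."


scripted = narrate("level_up", {"hero": "Timmy", "level": "5"}, fake_llm)
improvised = narrate("quest_twist", {"hero": "Timmy"}, fake_llm)
```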

### Navigation: Walk Once, Map Forever
- Every path recorded as waypoint sequence with terrain annotations
- First journey = full perception + planning; subsequent = graph traversal
- Builds complete nav graph without external map data
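
One way to picture "walk once, map forever" is a graph that grows as paths are recorded and is then searched directly. The sketch below is illustrative only: the `NavGraph` class is hypothetical, edges are assumed to be walkable in both directions, terrain annotations are omitted, and breadth-first search stands in for whatever traversal the real `nav_graph.db` uses.

```python
from collections import deque
from typing import Dict, List, Optional, Set


class NavGraph:
    def __init__(self) -> None:
        self.edges: Dict[str, Set[str]] = {}

    def record_path(self, waypoints: List[str]) -> None:
        """Add every consecutive waypoint pair from a walked path as an edge."""
        for a, b in zip(waypoints, waypoints[1:]):
            self.edges.setdefault(a, set()).add(b)
            self.edges.setdefault(b, set()).add(a)  # assumption: two-way paths

    def route(self, start: str, goal: str) -> Optional[List[str]]:
        """Breadth-first search over recorded edges; None means 'never walked'."""
        if start == goal:
            return [start]
        previous: Dict[str, Optional[str]] = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in sorted(self.edges.get(node, ())):
                if nxt in previous:
                    continue
                previous[nxt] = node
                if nxt == goal:
                    path = [goal]
                    while previous[path[-1]] is not None:
                        path.append(previous[path[-1]])
                    return path[::-1]
                queue.append(nxt)
        return None


graph = NavGraph()
graph.record_path(["town", "bridge", "forest"])
graph.record_path(["bridge", "mine"])
known_route = graph.route("town", "mine")
unknown_route = graph.route("town", "cave")
```

A `None` route marks a first journey, which still needs full perception and planning.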

### API Costs: Every Dollar Spent Must Reduce Future Dollars

| Week | Groq Calls/Hr | Local Decisions/Hr | Sovereignty % | Cost/Hr |
|---|---|---|---|---|
| 1 | ~720 | ~80 | 10% | $0.40 |
| 2 | ~400 | ~400 | 50% | $0.22 |
| 4 | ~160 | ~640 | 80% | $0.09 |
| 8 | ~40 | ~760 | 95% | $0.02 |
| Target | <20 | >780 | >97% | <$0.01 |
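
The Sovereignty % column appears to be the share of hourly decisions made locally, that is, local decisions divided by local decisions plus Groq calls. That reading is inferred from the table rather than stated in it; the check below reproduces the four numbered rows under that assumption.

```python
def sovereignty_pct(groq_calls_per_hr: int, local_decisions_per_hr: int) -> float:
    """Percentage of decisions made without a cloud call (assumed definition)."""
    total = groq_calls_per_hr + local_decisions_per_hr
    return 100 * local_decisions_per_hr / total if total else 0.0


# (Groq calls/hr, local decisions/hr) per week, copied from the table above.
rows = {1: (720, 80), 2: (400, 400), 4: (160, 640), 8: (40, 760)}
pct_by_week = {week: sovereignty_pct(groq, local)
               for week, (groq, local) in rows.items()}
```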

## The Sovereignty Scorecard (5 Metrics)

Every work session ends with a sovereignty audit. Every PR includes a sovereignty delta. Not optional.

| Metric | What It Measures | Target |
|---|---|---|
| Perception Sovereignty % | Frames understood without VLM | >90% by hour 4 |
| Decision Sovereignty % | Actions chosen without LLM | >80% by week 4 |
| Narration Sovereignty % | Lines from templates vs LLM | >60% by week 2 |
| API Cost Trend | Dollar cost per hour of gameplay | Monotonically decreasing |
| Skill Library Growth | Crystallized skills per session | >5 new skills/session |
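
Two of these checks are easy to automate. The sketch below assumes "monotonically decreasing" means strictly decreasing from one sample to the next and that a sovereignty delta is a simple before/after difference in percentage points; both function names are hypothetical.

```python
from typing import Sequence


def is_monotonically_decreasing(costs_per_hr: Sequence[float]) -> bool:
    """True if every cost sample is lower than the one before it (strict)."""
    return all(later < earlier
               for earlier, later in zip(costs_per_hr, costs_per_hr[1:]))


def sovereignty_delta(before_pct: float, after_pct: float) -> float:
    """Change in sovereignty percentage points attributable to a PR or session."""
    return after_pct - before_pct


# Cost/Hr samples from the API cost table in this document.
trend_ok = is_monotonically_decreasing([0.40, 0.22, 0.09, 0.02])
trend_bad = is_monotonically_decreasing([0.40, 0.45, 0.09])
delta = sovereignty_delta(10, 50)
```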

A dashboard widget on alexanderwhitestone.com shows these metrics in real time during streams. It is an HTMX component updated via WebSocket.

## The Crystallization Protocol

Every model output gets crystallized:

| Model Output | Crystallized As | Storage | Retrieval Cost |
|---|---|---|---|
| VLM: UI element | OpenCV template + bbox | templates.json | 3 ms |
| VLM: text | OCR region coords | regions.json | 50 ms |
| LLM: nav plan | Waypoint sequence | nav_graph.db | <1 ms |
| LLM: combat decision | If/else rule on state | rules.py | <1 ms |
| LLM: quest interpretation | Structured entry | quests.db | <1 ms |
| LLM: NPC disposition | Name→attitude map | npcs.db | <1 ms |
| LLM: narration | Template with slots | narration.json | <1 ms |
| API: moderation | Approved phrase cache | approved.set | <1 ms |
| Groq: strategic plan | Extracted decision rules | strategy.json | <1 ms |

Skill documents are markdown with YAML frontmatter, following the agentskills.io standard (fields: name, game, type, success_rate, times_used, sovereignty_value).
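
To make the format concrete, here is a small sketch that renders a skill document with the six frontmatter fields listed above. The field values, the `render_skill_doc` helper, and the flat `key: value` layout are illustrative assumptions; the agentskills.io standard itself should be consulted for the authoritative schema. The helper handles plain scalar values only and does no YAML quoting.

```python
from typing import Dict, Union

FIELDS = ("name", "game", "type", "success_rate", "times_used", "sovereignty_value")


def render_skill_doc(meta: Dict[str, Union[str, int, float]], body: str) -> str:
    """Render YAML frontmatter followed by a markdown body (scalars only)."""
    lines = ["---"]
    for key in FIELDS:
        lines.append(f"{key}: {meta[key]}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + body + "\n"


# All values below are made up for illustration.
skill_doc = render_skill_doc(
    {
        "name": "cross_bridge",
        "game": "example_game",
        "type": "navigation",
        "success_rate": 0.9,
        "times_used": 12,
        "sovereignty_value": 3,
    },
    "# cross_bridge\n\nWalk the recorded waypoints across the bridge.",
)
```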

## The Automation Imperative & Three-Strike Rule

This applies to the developer workflow too, not just the agent. If you do the same thing manually three times, stop and write the automation before proceeding.

**Falsework Checklist** (before any cloud API call):
1. What durable artifact will this call produce?
2. Where will the artifact be stored locally?
3. What local rule or cache will this populate?
4. After this call, will I need to make it again?
5. If yes, what would eliminate the repeat?
6. What is the sovereignty delta of this call?
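
The checklist could be enforced in code as a record that must be filled in before a cloud call is allowed. This is a hypothetical sketch: the `FalseworkChecklist` class, its field names, and the rule for what counts as "unanswered" are assumptions, not an existing project API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FalseworkChecklist:
    artifact: str             # 1. durable artifact the call will produce
    storage_path: str         # 2. where it is stored locally
    populates: str            # 3. local rule or cache it fills
    repeat_needed: bool       # 4. will the call be needed again?
    repeat_eliminator: str    # 5. what would remove the repeat
    sovereignty_delta: float  # 6. expected change in sovereignty

    def unanswered(self) -> List[str]:
        """Names of questions left blank; an empty list means the call may proceed."""
        missing = [name for name in ("artifact", "storage_path", "populates")
                   if not getattr(self, name).strip()]
        if self.repeat_needed and not self.repeat_eliminator.strip():
            missing.append("repeat_eliminator")
        return missing


complete = FalseworkChecklist("combat rule", "rules.py", "combat decisions",
                              False, "", 0.5)
incomplete = FalseworkChecklist("", "rules.py", "combat decisions",
                                True, "", 0.0)
```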

## The Graduation Test (Falsework Removal Criteria)

Graduation requires all five conditions to be met simultaneously within a single 24-hour period:

| Test | Condition | Measurement |
|---|---|---|
| Perception Independence | 1 hour, no VLM calls after minute 15 | VLM calls in last 45 min = 0 |
| Decision Independence | Full session with <5 API calls total | Groq/cloud calls < 5 |
| Narration Independence | All narration from local templates + local LLM | Zero cloud TTS/narration calls |
| Economic Independence | Earns more sats than spends on inference | sats_earned > sats_spent |
| Operational Independence | 24 hours unattended, no human intervention | Uptime > 23.5 hrs |
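
The five measurements translate directly into boolean checks. The sketch below assumes the day's statistics have already been collected into a record; the `DayStats` fields and function names are hypothetical, while the thresholds are the ones in the table.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DayStats:
    vlm_calls_last_45_min: int   # within the 1-hour perception run
    cloud_calls_in_session: int  # Groq/cloud calls over a full session
    cloud_narration_calls: int   # cloud TTS/narration calls
    sats_earned: int
    sats_spent: int
    uptime_hours: float
    human_interventions: int


def graduation_results(s: DayStats) -> Dict[str, bool]:
    """Evaluate each graduation condition separately so failures are visible."""
    return {
        "perception": s.vlm_calls_last_45_min == 0,
        "decision": s.cloud_calls_in_session < 5,
        "narration": s.cloud_narration_calls == 0,
        "economic": s.sats_earned > s.sats_spent,
        "operational": s.uptime_hours > 23.5 and s.human_interventions == 0,
    }


def graduated(s: DayStats) -> bool:
    """All five conditions must hold for the same 24-hour period."""
    return all(graduation_results(s).values())


passing = DayStats(0, 3, 0, 1200, 800, 23.9, 0)
failing = DayStats(0, 3, 0, 500, 800, 23.9, 0)  # spends more than it earns
```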

> "The arch must hold after the falsework is removed."