Co-authored-by: Google Gemini <gemini@hermes.local> Co-committed-by: Google Gemini <gemini@hermes.local>
5.7 KiB
The Sovereignty Loop
This document establishes the primary engineering constraint for all Timmy Time development: every task must increase sovereignty as a default deliverable. Not as a future goal. Not as an optimization pass. As a constraint on every commit, every function, every inference call.
The full 11-page governing architecture document is available as a PDF: The-Sovereignty-Loop.pdf
"The measure of progress is not features added. It is model calls eliminated."
The Core Principle
The Sovereignty Loop: Discover with an expensive model. Compress the discovery into a cheap local rule. Replace the model with the rule. Measure the cost reduction. Repeat.
Every call to an LLM, VLM, or external API passes through three phases:
- Discovery — Model sees something for the first time (expensive, unavoidable, produces new knowledge)
- Crystallization — Discovery compressed into durable cheap artifact (requires explicit engineering)
- Replacement — Crystallized artifact replaces the model call (near-zero cost)
Code review requirement: If a function calls a model without a crystallization step, it fails code review. No exceptions. The pattern is always: check cache → miss → infer → crystallize → return.
The Sovereignty Loop Applied to Every Layer
Perception: See Once, Template Forever
- First encounter: VLM analyzes screenshot (3-6 sec) → structured JSON
- Crystallized as: OpenCV template + bounding box →
templates.json(3 ms retrieval) crystallize_perception()function wraps every VLM response- Target: 90% of perception cycles without VLM by hour 1, 99% by hour 4
Decision: Reason Once, Rule Forever
- First encounter: LLM reasons through decision (1-5 sec)
- Crystallized as: if/else rules, waypoints, cached preferences →
rules.py,nav_graph.db(<1 ms) - Uses Voyager pattern: named skills with embeddings, success rates, conditions
- Skill match >0.8 confidence + >0.6 success rate → executes without LLM
- Target: 70-80% of decisions without LLM by week 4
Narration: Script the Predictable, Improvise the Novel
- Predictable moments → template with variable slots, voiced by Kokoro locally
- LLM narrates only genuinely surprising events (quest twist, death, discovery)
- Target: 60-70% templatized within a week
Navigation: Walk Once, Map Forever
- Every path recorded as waypoint sequence with terrain annotations
- First journey = full perception + planning; subsequent = graph traversal
- Builds complete nav graph without external map data
API Costs: Every Dollar Spent Must Reduce Future Dollars
| Week | Groq Calls/Hr | Local Decisions/Hr | Sovereignty % | Cost/Hr |
|---|---|---|---|---|
| 1 | ~720 | ~80 | 10% | $0.40 |
| 2 | ~400 | ~400 | 50% | $0.22 |
| 4 | ~160 | ~640 | 80% | $0.09 |
| 8 | ~40 | ~760 | 95% | $0.02 |
| Target | <20 | >780 | >97% | <$0.01 |
The Sovereignty Scorecard (5 Metrics)
Every work session ends with a sovereignty audit. Every PR includes a sovereignty delta. Not optional.
| Metric | What It Measures | Target |
|---|---|---|
| Perception Sovereignty % | Frames understood without VLM | >90% by hour 4 |
| Decision Sovereignty % | Actions chosen without LLM | >80% by week 4 |
| Narration Sovereignty % | Lines from templates vs LLM | >60% by week 2 |
| API Cost Trend | Dollar cost per hour of gameplay | Monotonically decreasing |
| Skill Library Growth | Crystallized skills per session | >5 new skills/session |
Dashboard widget on alexanderwhitestone.com shows these in real-time during streams. HTMX component via WebSocket.
The Crystallization Protocol
Every model output gets crystallized:
| Model Output | Crystallized As | Storage | Retrieval Cost |
|---|---|---|---|
| VLM: UI element | OpenCV template + bbox | templates.json | 3 ms |
| VLM: text | OCR region coords | regions.json | 50 ms |
| LLM: nav plan | Waypoint sequence | nav_graph.db | <1 ms |
| LLM: combat decision | If/else rule on state | rules.py | <1 ms |
| LLM: quest interpretation | Structured entry | quests.db | <1 ms |
| LLM: NPC disposition | Name→attitude map | npcs.db | <1 ms |
| LLM: narration | Template with slots | narration.json | <1 ms |
| API: moderation | Approved phrase cache | approved.set | <1 ms |
| Groq: strategic plan | Extracted decision rules | strategy.json | <1 ms |
Skill document format: markdown + YAML frontmatter following agentskills.io standard (name, game, type, success_rate, times_used, sovereignty_value).
The Automation Imperative & Three-Strike Rule
Applies to developer workflow too, not just the agent. If you do the same thing manually three times, you stop and write the automation before proceeding.
Falsework Checklist (before any cloud API call):
- What durable artifact will this call produce?
- Where will the artifact be stored locally?
- What local rule or cache will this populate?
- After this call, will I need to make it again?
- If yes, what would eliminate the repeat?
- What is the sovereignty delta of this call?
The Graduation Test (Falsework Removal Criteria)
All five conditions met simultaneously in a single 24-hour period:
| Test | Condition | Measurement |
|---|---|---|
| Perception Independence | 1 hour, no VLM calls after minute 15 | VLM calls in last 45 min = 0 |
| Decision Independence | Full session with <5 API calls total | Groq/cloud calls < 5 |
| Narration Independence | All narration from local templates + local LLM | Zero cloud TTS/narration calls |
| Economic Independence | Earns more sats than spends on inference | sats_earned > sats_spent |
| Operational Independence | 24 hours unattended, no human intervention | Uptime > 23.5 hrs |
"The arch must hold after the falsework is removed."