Co-authored-by: Google Gemini <gemini@hermes.local> Co-committed-by: Google Gemini <gemini@hermes.local>
# The Sovereignty Loop

This document establishes the primary engineering constraint for all Timmy Time development: every task must increase sovereignty as a default deliverable. Not as a future goal. Not as an optimization pass. As a constraint on every commit, every function, every inference call.

The full 11-page governing architecture document is available as a PDF: [The-Sovereignty-Loop.pdf](./The-Sovereignty-Loop.pdf)

> "The measure of progress is not features added. It is model calls eliminated."

## The Core Principle

> **The Sovereignty Loop**: Discover with an expensive model. Compress the discovery into a cheap local rule. Replace the model with the rule. Measure the cost reduction. Repeat.

Every call to an LLM, VLM, or external API passes through three phases:

1. **Discovery**: Model sees something for the first time (expensive, unavoidable, produces new knowledge)
2. **Crystallization**: Discovery compressed into durable cheap artifact (requires explicit engineering)
3. **Replacement**: Crystallized artifact replaces the model call (near-zero cost)

**Code review requirement**: If a function calls a model without a crystallization step, it fails code review. No exceptions. The pattern is always: check cache → miss → infer → crystallize → return.

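
The check cache → miss → infer → crystallize → return pattern can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name `sovereign_call`, the JSON file used as the cache, and the `infer` callable standing in for the model are all assumptions.

```python
import json
from pathlib import Path
from typing import Callable


def sovereign_call(key: str, infer: Callable[[str], dict], store: Path) -> dict:
    """Check cache -> miss -> infer -> crystallize -> return."""
    cache = json.loads(store.read_text()) if store.exists() else {}
    if key in cache:
        # Replacement: the crystallized artifact answers, no model call.
        return cache[key]
    # Discovery: the expensive model call (hypothetical callable).
    result = infer(key)
    # Crystallization: persist the result as a durable local artifact.
    cache[key] = result
    store.write_text(json.dumps(cache, indent=2))
    return result
```

A second call with the same key never reaches `infer`, which is the property the code review requirement is meant to enforce.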
## The Sovereignty Loop Applied to Every Layer

### Perception: See Once, Template Forever

- First encounter: VLM analyzes screenshot (3-6 sec) → structured JSON
- Crystallized as: OpenCV template + bounding box → `templates.json` (3 ms retrieval)
- `crystallize_perception()` function wraps every VLM response
- **Target**: 90% of perception cycles without VLM by hour 1, 99% by hour 4

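
One way `crystallize_perception()` might wrap a VLM response is sketched below. The response shape, the signature, and the template image paths are assumptions made for illustration; cropping the template image and matching it against later frames (for example with OpenCV's `cv2.matchTemplate`) are left out.

```python
import json
from pathlib import Path


def crystallize_perception(vlm_response: dict, store: Path) -> dict:
    """Persist each element the VLM reported so later frames can skip the VLM.

    Assumed response shape: {"elements": [{"name": str, "bbox": [x, y, w, h]}]}.
    """
    templates = json.loads(store.read_text()) if store.exists() else {}
    for element in vlm_response.get("elements", []):
        templates[element["name"]] = {
            "bbox": element["bbox"],
            # Hypothetical location where the cropped template image would live.
            "template": f"templates/{element['name']}.png",
        }
    store.write_text(json.dumps(templates, indent=2))
    # Pass the response through unchanged so the wrapper is transparent.
    return vlm_response
```

Because the wrapper returns the response untouched, it can sit around every VLM call without changing the caller's logic.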
### Decision: Reason Once, Rule Forever

- First encounter: LLM reasons through decision (1-5 sec)
- Crystallized as: if/else rules, waypoints, cached preferences → `rules.py`, `nav_graph.db` (<1 ms)
- Uses Voyager pattern: named skills with embeddings, success rates, conditions
- Skill match >0.8 confidence + >0.6 success rate → executes without LLM
- **Target**: 70-80% of decisions without LLM by week 4

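
The skill-match gate above could look roughly like this. It is a sketch under stated assumptions: "confidence" is taken to be cosine similarity between the situation embedding and the skill embedding, and the `Skill` class and `match_skill` function are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Skill:
    name: str
    embedding: List[float]
    success_rate: float


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_skill(query: List[float], skills: List[Skill],
                min_conf: float = 0.8, min_success: float = 0.6) -> Optional[Skill]:
    """Return the best skill clearing both thresholds, or None (fall back to the LLM)."""
    best, best_score = None, min_conf
    for skill in skills:
        score = cosine(query, skill.embedding)
        # Both gates must pass: similarity > 0.8 and success rate > 0.6.
        if score > best_score and skill.success_rate > min_success:
            best, best_score = skill, score
    return best
```

A `None` result is the signal to fall back to the LLM and crystallize whatever it decides.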
### Narration: Script the Predictable, Improvise the Novel

- Predictable moments → template with variable slots, voiced by Kokoro locally
- LLM narrates only genuinely surprising events (quest twist, death, discovery)
- **Target**: 60-70% templatized within a week

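
A template with variable slots can be as simple as the sketch below, which uses Python's standard `string.Template`. The event names and template texts are invented for illustration, and the hand-off to local voicing is omitted.

```python
from string import Template
from typing import Optional

# Hypothetical templates; real ones would be crystallized from LLM narration.
NARRATION_TEMPLATES = {
    "level_up": Template("$hero reaches level $level."),
    "item_found": Template("$hero picks up $item."),
}


def narrate(event: str, **slots) -> Optional[str]:
    """Fill a local template, or return None so the caller can ask the LLM."""
    template = NARRATION_TEMPLATES.get(event)
    if template is None:
        return None  # novel event: no template yet
    return template.substitute(**slots)
```

For example, `narrate("level_up", hero="Timmy", level=3)` yields `"Timmy reaches level 3."`, while an unknown event such as `"quest_twist"` returns `None`.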
### Navigation: Walk Once, Map Forever

- Every path recorded as waypoint sequence with terrain annotations
- First journey = full perception + planning; subsequent = graph traversal
- Builds complete nav graph without external map data

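
The walk-once, traverse-afterwards idea can be sketched with an in-memory graph and breadth-first search. The `NavGraph` class is hypothetical, terrain annotations are omitted, and paths are assumed to be walkable in both directions.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Set


class NavGraph:
    def __init__(self) -> None:
        self.edges: Dict[str, Set[str]] = defaultdict(set)

    def record_path(self, waypoints: List[str]) -> None:
        """Crystallize one walked journey as graph edges."""
        for a, b in zip(waypoints, waypoints[1:]):
            self.edges[a].add(b)
            self.edges[b].add(a)  # assumption: every path is reversible

    def route(self, start: str, goal: str) -> Optional[List[str]]:
        """Breadth-first search; None means the route is unmapped."""
        if start == goal:
            return [start]
        parents: Dict[str, Optional[str]] = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in self.edges.get(node, ()):
                if nxt in parents:
                    continue
                parents[nxt] = node
                if nxt == goal:
                    # Walk the parent links back to the start.
                    path = [nxt]
                    while parents[path[-1]] is not None:
                        path.append(parents[path[-1]])
                    return path[::-1]
                queue.append(nxt)
        return None
```

A `None` from `route` marks a first journey, which still needs full perception and planning before it can be recorded.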
### API Costs: Every Dollar Spent Must Reduce Future Dollars

| Week | Groq Calls/Hr | Local Decisions/Hr | Sovereignty % | Cost/Hr |
|---|---|---|---|---|
| 1 | ~720 | ~80 | 10% | $0.40 |
| 2 | ~400 | ~400 | 50% | $0.22 |
| 4 | ~160 | ~640 | 80% | $0.09 |
| 8 | ~40 | ~760 | 95% | $0.02 |
| Target | <20 | >780 | >97% | <$0.01 |

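
The Sovereignty % column appears to be the local share of all decisions per hour. That reading is inferred from the table's figures rather than stated in the text, so the sketch below is an assumption that happens to reproduce the listed values.

```python
def sovereignty_pct(cloud_calls: int, local_decisions: int) -> float:
    """Share of decisions made locally, as a percentage."""
    total = cloud_calls + local_decisions
    return 100.0 * local_decisions / total if total else 0.0


# (Groq calls/hr, local decisions/hr) taken from the table rows above.
ROWS = {1: (720, 80), 2: (400, 400), 4: (160, 640), 8: (40, 760)}

for week, (cloud, local) in ROWS.items():
    print(f"week {week}: {sovereignty_pct(cloud, local):.0f}% sovereign")
```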
## The Sovereignty Scorecard (5 Metrics)

Every work session ends with a sovereignty audit. Every PR includes a sovereignty delta. Not optional.

| Metric | What It Measures | Target |
|---|---|---|
| Perception Sovereignty % | Frames understood without VLM | >90% by hour 4 |
| Decision Sovereignty % | Actions chosen without LLM | >80% by week 4 |
| Narration Sovereignty % | Lines from templates vs LLM | >60% by week 2 |
| API Cost Trend | Dollar cost per hour of gameplay | Monotonically decreasing |
| Skill Library Growth | Crystallized skills per session | >5 new skills/session |

A dashboard widget on alexanderwhitestone.com shows these in real time during streams. It is an HTMX component updated via WebSocket.

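
Two of these checks are easy to automate. The sketch below assumes the sovereignty delta is a percentage-point difference and that "monotonically decreasing" means strictly decreasing between consecutive samples; both function names are hypothetical.

```python
from typing import Sequence


def sovereignty_delta(before_pct: float, after_pct: float) -> float:
    """Percentage-point change in a sovereignty metric across a PR or session."""
    return after_pct - before_pct


def cost_is_decreasing(hourly_costs: Sequence[float]) -> bool:
    """True when every sample is strictly lower than the one before it."""
    return all(later < earlier
               for earlier, later in zip(hourly_costs, hourly_costs[1:]))
```

A session audit could then reject a change whose delta is negative or whose cost samples fail `cost_is_decreasing`.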
## The Crystallization Protocol

Every model output gets crystallized:

| Model Output | Crystallized As | Storage | Retrieval Cost |
|---|---|---|---|
| VLM: UI element | OpenCV template + bbox | templates.json | 3 ms |
| VLM: text | OCR region coords | regions.json | 50 ms |
| LLM: nav plan | Waypoint sequence | nav_graph.db | <1 ms |
| LLM: combat decision | If/else rule on state | rules.py | <1 ms |
| LLM: quest interpretation | Structured entry | quests.db | <1 ms |
| LLM: NPC disposition | Name→attitude map | npcs.db | <1 ms |
| LLM: narration | Template with slots | narration.json | <1 ms |
| API: moderation | Approved phrase cache | approved.set | <1 ms |
| Groq: strategic plan | Extracted decision rules | strategy.json | <1 ms |

Skill document format: markdown + YAML frontmatter following agentskills.io standard (name, game, type, success_rate, times_used, sovereignty_value).

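
A skill document with the six frontmatter fields listed above might be rendered as in this sketch. It makes no claim to match the agentskills.io standard beyond those field names, it handles only simple scalar values without YAML quoting, and `render_skill_doc` is a hypothetical helper.

```python
FRONTMATTER_FIELDS = (
    "name", "game", "type", "success_rate", "times_used", "sovereignty_value",
)


def render_skill_doc(meta: dict, body: str) -> str:
    """Render markdown with a YAML frontmatter block (simple scalars only)."""
    lines = ["---"]
    # A missing field raises KeyError, so incomplete skills fail loudly.
    lines += [f"{key}: {meta[key]}" for key in FRONTMATTER_FIELDS]
    lines += ["---", "", body.strip(), ""]
    return "\n".join(lines)
```

Any values passed in `meta` are the caller's own; the source does not define how `sovereignty_value` is computed.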
## The Automation Imperative & Three-Strike Rule

The imperative applies to the developer workflow too, not just the agent. If you do the same thing manually three times, you stop and write the automation before proceeding.

**Falsework Checklist** (before any cloud API call):

1. What durable artifact will this call produce?
2. Where will the artifact be stored locally?
3. What local rule or cache will this populate?
4. After this call, will I need to make it again?
5. If yes, what would eliminate the repeat?
6. What is the sovereignty delta of this call?

## The Graduation Test (Falsework Removal Criteria)

All five conditions must be met simultaneously in a single 24-hour period:

| Test | Condition | Measurement |
|---|---|---|
| Perception Independence | 1 hour, no VLM calls after minute 15 | VLM calls in last 45 min = 0 |
| Decision Independence | Full session with <5 API calls total | Groq/cloud calls < 5 |
| Narration Independence | All narration from local templates + local LLM | Zero cloud TTS/narration calls |
| Economic Independence | Earns more sats than spends on inference | sats_earned > sats_spent |
| Operational Independence | 24 hours unattended, no human intervention | Uptime > 23.5 hrs |

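
The five measurements translate directly into boolean checks. The sketch below assumes a hypothetical `DayReport` record holding one 24-hour period's counters; how those counters are collected is outside its scope.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DayReport:
    vlm_calls_last_45_min: int
    cloud_calls_in_session: int
    cloud_narration_calls: int
    sats_earned: int
    sats_spent: int
    uptime_hours: float


def graduation_results(r: DayReport) -> Dict[str, bool]:
    """Evaluate each graduation condition from the table."""
    return {
        "perception": r.vlm_calls_last_45_min == 0,
        "decision": r.cloud_calls_in_session < 5,
        "narration": r.cloud_narration_calls == 0,
        "economic": r.sats_earned > r.sats_spent,
        "operational": r.uptime_hours > 23.5,
    }


def graduated(r: DayReport) -> bool:
    """All five conditions must hold in the same 24-hour period."""
    return all(graduation_results(r).values())
```

Returning the per-condition results alongside the overall verdict makes it clear which piece of falsework is still load-bearing.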
> "The arch must hold after the falsework is removed."