[Study] Solving the Perception Bottleneck — API-First Architecture on Apple Silicon #963

Closed
opened 2026-03-22 18:44:34 +00:00 by perplexity · 2 comments
Collaborator

Summary

This paper presents a complete architecture for eliminating Timmy's $50/day cloud-VLM dependency by shifting from screen interpretation to API-first perception via OpenMW Lua. The thesis: OpenMW Lua is "Mineflayer for Morrowind" — providing structured access to ~95% of game state without any vision model, and reducing weighted-average perception latency from multi-second cloud calls to ~70ms locally on an M3 Max.

Core Architecture

4-Level Perception Hierarchy (cheapest first)

| Level | Method | Latency | Use Case | Frequency |
|-------|--------|---------|----------|-----------|
| L1 | OpenMW Lua API | ~1ms | Position, stats, nearby entities, quest state, inventory | 100% of ticks |
| L2 | Core ML classifier | ~3ms | UI state detection (gameplay/dialogue/inventory/map/menu) | ~30% of ticks |
| L3 | PaddleOCR | ~40ms | Dialogue text extraction (fallback when API prediction < 0.8 confidence) | ~5% of ticks |
| L4 | FastVLM-0.5B (local) | ~300ms | Genuinely ambiguous visual state, stuck detection | ~3% of ticks |
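The table's cost ordering amounts to a confidence-gated fall-through: try the cheapest level, escalate only while confidence stays below threshold. A minimal sketch (the level names and the 0.8 threshold come from the table above; the sensor functions are hypothetical stand-ins):

```python
# Confidence-gated perception fall-through, cheapest level first.
# Sensors here are illustrative stubs, not the real L1/L3 backends.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Percept:
    source: str        # which level produced this observation
    confidence: float  # 0.0 - 1.0
    state: dict        # structured observation

def perceive(levels: list[tuple[str, Callable[[], Percept]]],
             threshold: float = 0.8) -> Percept:
    """Try each perception level in cost order; stop escalating once
    some level's confidence clears the threshold."""
    best: Optional[Percept] = None
    for _name, sensor in levels:
        p = sensor()
        if best is None or p.confidence > best.confidence:
            best = p
        if best.confidence >= threshold:
            break  # the cheap level was good enough; skip expensive ones
    return best

# Hypothetical sensors standing in for the Lua API and OCR levels.
lua_api = lambda: Percept("L1-lua", 1.0, {"pos": (0, 0, 0)})
ocr     = lambda: Percept("L3-ocr", 0.9, {"dialogue": "..."})

result = perceive([("L1", lua_api), ("L3", ocr)])
```

With the Lua API reporting full confidence, the loop returns after L1 and the OCR sensor never runs — which is exactly how L1 ends up covering 100% of ticks while L3 fires on only ~5%.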

Three-Tier Metabolic LLM Model (decision-making)

| Tier | Model | Quant | RAM | Use Case | Frequency |
|------|-------|-------|-----|----------|-----------|
| T1 (Routine) | Qwen3-3B | Q8_0 | 3.5GB | Simple choices, navigation | 20% |
| T2 (Medium) | Llama-3.1-8B | Q4_K_M | 5GB | Dialogue, inventory management | 7% |
| T3 (Complex) | Qwen3-32B | Q4_K_M | 20GB | Quest planning, stuck recovery (game paused) | 3% |
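The routing itself can be a few lines once each pending decision carries a complexity estimate. A sketch under that assumption (the tier boundaries and scoring scheme here are illustrative; only the model tiers and the pause-for-T3 rule come from the table):

```python
# Three-tier "metabolic" router sketch. Assumes each decision arrives with a
# complexity score in [0, 1]; the 0.4/0.8 cut points are hypothetical.

def route_decision(complexity: float) -> tuple[str, bool]:
    """Return (model_name, pause_game) for a decision of given complexity."""
    if complexity < 0.4:
        return ("qwen3-3b-q8", False)      # T1: simple choices, navigation
    if complexity < 0.8:
        return ("llama-3.1-8b-q4", False)  # T2: dialogue, inventory
    return ("qwen3-32b-q4", True)          # T3: quest planning; pause the game

model, pause = route_decision(0.9)   # T3 decision -> big model, game paused
```

Pausing only for T3 is what keeps the 1.5-2.5s planning latency from ever being visible in-game.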

Behavior Trees

  • Handle 70% of all actions at zero inference cost (~2ms)
  • Walk, attack, loot, basic combat — all BT-driven
  • LLMs reserved only for genuinely novel decisions
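The reason BTs are effectively free is that a behavior tree is plain control flow, no inference anywhere. A self-contained toy (node types are standard BT vocabulary; the example tree and action names are illustrative, not the paper's actual tree):

```python
# Minimal behavior-tree node types: Selector (first child to succeed wins),
# Sequence (first child to fail aborts), plus leaf Condition/Action nodes.
SUCCESS, FAILURE = "success", "failure"

class Selector:
    def __init__(self, *children): self.children = children
    def tick(self, state):
        for child in self.children:
            if child.tick(state) == SUCCESS:
                return SUCCESS
        return FAILURE

class Sequence:
    def __init__(self, *children): self.children = children
    def tick(self, state):
        for child in self.children:
            if child.tick(state) == FAILURE:
                return FAILURE
        return SUCCESS

class Condition:
    def __init__(self, pred): self.pred = pred
    def tick(self, state): return SUCCESS if self.pred(state) else FAILURE

class Action:
    def __init__(self, fn): self.fn = fn
    def tick(self, state): self.fn(state); return SUCCESS

# "Attack if an enemy is near, otherwise keep walking" as a tiny tree.
tree = Selector(
    Sequence(Condition(lambda s: s["enemy_near"]),
             Action(lambda s: s.update(action="attack"))),
    Action(lambda s: s.update(action="walk")),
)
state = {"enemy_near": True}
tree.tick(state)   # sets state["action"] = "attack"
```

A tick through this tree is a handful of function calls — the ~2ms figure above is dominated by reading game state, not by the tree itself.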

Key Technical Details

OpenMW Lua API Surface

  • openmw.self — position, rotation, health, magicka, fatigue, equipment, inventory
  • openmw.nearby — actors and objects within radius, with stats and disposition
  • openmw.world — time of day, weather, cell name, active quests, quest stages
  • world.pause("agent") / world.unpause("agent") — freeze game for complex reasoning
  • Covers character stats, nearby entities, quest journal, inventory, active effects, cell transitions

ESM Data Pre-Extraction

  • Use tes3conv to convert .esm → JSON
  • Extract: NPC locations, dialogue trees (topic → response + conditions), path grid nodes
  • Build NetworkX navigation graph from path grid data
  • Pre-evaluate dialogue conditions at load time → know available topics before conversation
  • Store in SQLite (~200MB) for O(1) lookup
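The path-grid-to-graph step is mechanical once the ESM is JSON. A dependency-free sketch (the JSON field names `points`/`connections` are assumptions about tes3conv's output shape, not verified; the paper builds the real graph with NetworkX, approximated here by a plain adjacency dict so the example is self-contained):

```python
# Build a navigation graph from tes3conv-style path-grid JSON and find a
# route with BFS (the adjacency-dict equivalent of nx.shortest_path).
import json
from collections import deque

pathgrid_json = '''{
  "points": [[0, 0, 0], [100, 0, 0], [100, 100, 0]],
  "connections": [[0, 1], [1, 2]]
}'''

grid = json.loads(pathgrid_json)
adj: dict[int, list[int]] = {i: [] for i in range(len(grid["points"]))}
for a, b in grid["connections"]:
    adj[a].append(b)
    adj[b].append(a)   # path-grid edges are traversable both ways

def shortest_path(start: int, goal: int) -> list[int]:
    """Unweighted BFS over path-grid node indices."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return []
```

Precomputing this per cell at load time is what makes runtime navigation an O(1)-ish SQLite lookup plus a graph query rather than visual exploration.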

UESP RAG Pipeline

  • Scrape ~4,300 quest/location/NPC pages from UESP wiki
  • Chunk into ~15K passages, embed with nomic-embed-text-v1.5
  • Store in ChromaDB (~500MB)
  • Runtime query: e.g., "What do I do after delivering the package to Caius?"
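The runtime side of the pipeline is embed-query, rank-by-similarity, return top-k. A toy sketch of that shape (a bag-of-words counter stands in for nomic-embed-text-v1.5, and an in-memory list stands in for ChromaDB; only the retrieve-then-answer flow mirrors the real pipeline):

```python
# Toy dense-retrieval flow: embed passages once, embed the query at runtime,
# return the most similar passage by cosine similarity.
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in embedder: lowercase word counts, punctuation stripped.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Stand-in for the ~15K UESP passages stored in ChromaDB.
passages = [
    "After delivering the package to Caius Cosades, ask him about orders.",
    "Balmora is reached by silt strider from Seyda Neen.",
]
index = [(p, embed(p)) for p in passages]

def query(q: str, k: int = 1) -> list[str]:
    qv = embed(q)
    ranked = sorted(index, key=lambda pe: cosine(qv, pe[1]), reverse=True)
    return [p for p, _ in ranked[:k]]

top = query("What do I do after delivering the package to Caius?")
```

The retrieved passage then goes into the LLM prompt as context, so quest knowledge never has to live in model weights.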

GBNF Grammar for Constrained Decoding

  • Forces LLM output into valid game commands only
  • Grammar covers: move_to, interact, attack, use_item, cast_spell, wait, dialogue_choose, navigate_menu
  • Eliminates parsing failures and hallucinated actions
  • Supported by llama.cpp and MLX
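For flavor, a fragment of what such a grammar looks like in GBNF — this covers three of the listed commands; the argument shapes are assumptions for illustration, not the paper's actual grammar:

```gbnf
# Illustrative GBNF fragment. Command names match the list above;
# argument shapes are assumed.
root     ::= command
command  ::= move-to | interact | wait
move-to  ::= "move_to(" number "," number "," number ")"
interact ::= "interact(" ident ")"
wait     ::= "wait(" number ")"
ident    ::= [a-zA-Z_] [a-zA-Z0-9_]*
number   ::= "-"? [0-9]+ ("." [0-9]+)?
```

Because decoding is constrained to this grammar, the sampler literally cannot emit a token sequence that fails to parse as a command.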

Apple Silicon Optimization (M3 Max)

| Co-processor | Cores | Use |
|--------------|-------|-----|
| Neural Engine | 16 (18 TOPS) | Core ML UI classifier (~2MB model, <5ms) |
| GPU | 40 (Metal) | All LLM/VLM via MLX (21-87% faster than llama.cpp) |
| CPU | 16 | Orchestration, pathfinding, BT execution, ESM queries |
  • mlx-vlm (v0.3.11) for VLM inference, vllm-mlx for production serving with prefix caching (28x speedup)
  • ScreenCaptureKit for hardware-accelerated window capture (2-8ms latency)

Memory Budget (128GB Unified)

| Component | RAM |
|-----------|-----|
| OpenMW game client | ~2GB |
| Qwen3-3B Q8 (T1) | 3.5GB |
| Llama-8B Q4 (T2) | 5GB |
| Qwen3-32B Q4 (T3, on-demand) | 20GB |
| FastVLM-0.5B 4-bit | 350MB |
| nomic-embed-text (RAG) | 300MB |
| ChromaDB + knowledge base | ~500MB |
| ESM databases (SQLite + NetworkX) | ~200MB |
| **Total** | **~40GB (88GB headroom)** |

Complete Heartbeat Loop

The paper provides full PerceptionStack pseudocode with a 6-phase tick() method:

  1. API perception (~1ms) — read Lua state
  2. Behavior tree check (~0ms) — if BT confident >0.9, execute immediately
  3. Screen capture + CV (~20ms) — only if BT can't handle, with frame change detection
  4. Text extraction (~40ms) — dialogue prediction first, OCR fallback
  5. VLM (~300ms) — only for unknown UI state or stuck detection
  6. LLM decision (50-1500ms) — tiered model selection, game pauses for T3
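Condensed, the six phases are a short pipeline of early-exits. A sketch in which every subsystem is stubbed as a plain function, so the control flow (phase ordering, the 0.9 BT threshold, capture-on-demand) is the only real content:

```python
# Condensed 6-phase tick(): each phase either resolves the tick or hands
# off to the next, more expensive one. All subsystems are illustrative stubs.

def tick(api, bt, screen, ocr, vlm, llm):
    state = api()                              # 1. Lua API read (~1ms)
    plan = bt(state)
    if plan and plan["confidence"] > 0.9:      # 2. BT early-exit (~70% of ticks)
        return plan["action"]
    frame = screen()                           # 3. capture only if BT punted
    if state.get("in_dialogue"):
        state["text"] = ocr(frame)             # 4. OCR fallback for dialogue
    if state.get("ui_state") == "unknown":
        state["visual"] = vlm(frame)           # 5. VLM only for ambiguity
    return llm(state)                          # 6. tiered LLM decision

# With a confident BT, the tick resolves without any inference at all.
action = tick(
    api=lambda: {"in_dialogue": False, "ui_state": "gameplay"},
    bt=lambda s: {"confidence": 0.95, "action": "walk_north"},
    screen=lambda: None,
    ocr=lambda f: "",
    vlm=lambda f: "",
    llm=lambda s: "llm_fallback",
)
# action == "walk_north"; screen/ocr/vlm/llm were never consulted
```

The weighted-average latency claim falls out of this structure: the expensive branches exist, but most ticks return at phase 2.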

Expected Performance

| Scenario | Frequency | Latency | Cloud Calls |
|----------|-----------|---------|-------------|
| BT-handled (walk, attack, loot) | 70% | ~2ms | 0 |
| T1 LLM (simple choice) | 20% | ~100-150ms | 0 |
| T2 LLM + OCR (dialogue, inventory) | 7% | ~300-500ms | 0 |
| T3 LLM + VLM (quest planning, stuck) | 3% | ~1.5-2.5s (paused) | 0 |
| **Weighted average** | — | **~70ms** | **0** |

VLM Model Recommendations

  • FastVLM-0.5B (4-bit, 350MB) — primary vision fallback, Apple-optimized
  • Moondream-2B — more capable alternative
  • Qwen2.5-VL-3B — strongest small VLM

Conclusion

The fundamental shift: stop treating Morrowind like a black-box screen to interpret, start treating it like an API to query. Pre-computation eliminates runtime discovery. Behavior trees handle 70% at zero cost. OpenMW pause converts latency problems into correctness problems. The entire stack fits in ~40GB with 88GB headroom.


PDF attached below. See cross-reference comment for links to related tickets.

Author
Collaborator

Cross-References

Work Suggestions (from this paper)

  • #964 — Implement OpenMW Lua perception bridge (IPC layer)
  • #965 — Build Core ML UI state classifier for Morrowind
  • #966 — Implement three-tier metabolic LLM router (Qwen3-3B / Llama-8B / Qwen3-32B)
  • #967 — Extract ESM data via tes3conv and build NetworkX navigation graph
  • #968 — Define GBNF grammar for constrained game-command decoding
  • #969 — Build UESP RAG knowledge pipeline (ChromaDB + nomic-embed)
  • #970 — Implement MorrowindBehaviorTree engine for zero-cost routine actions

Architecture / PRs:

  • PR #900 — WorldInterface + Heartbeat v2 (the heartbeat loop this paper's tick() would integrate with)
  • PR #864 — Morrowind Protocol + Command Log (command dispatch aligns with send_command())
  • PR #865 — FastAPI Harness + SOUL.md Framework

Sovereignty Loop Implementation (#953 children):

  • #954 — Metrics emitter (feeds into this paper's performance profiling)
  • #955 — PerceptionCache wrapper (directly implements this paper's perception hierarchy caching)
  • #956 — Skill library crystallizer (complements UESP RAG for learned behaviors)
  • #957 — Navigation graph builder (overlaps with #967 — ESM path grid extraction)
  • #958 — Dashboard widget for session data
  • #959 — Narration templates
  • #960 — Nav graph for Morrowind (direct overlap with #967)
  • #961 — Auto-crystallizer
  • #962 — Three-strike anomaly detector

Autoresearch (#904 children):

  • #905-#911 — Self-improvement loop infrastructure (experiment governance applies to perception stack experiments)

Other Studies:

  • #903 — State-of-the-Art Open Source for Sovereign Creative AI Agents
  • #953 — The Sovereignty Loop — Falsework-Native Architecture

Key Overlaps to Resolve

  1. #957/#960 vs #967: Navigation graph appears in both Sovereignty Loop and this paper. #967 is the more detailed spec (tes3conv + NetworkX). Consider merging.
  2. #955 vs #965: PerceptionCache (#955) and Core ML classifier (#965) are complementary — the cache wraps the classifier output.
  3. Heartbeat loop: PR #900's heartbeat v2 is the execution framework; this paper's tick() pseudocode is the perception-specific implementation that runs inside it.
gemini was assigned by Rockachopa 2026-03-22 23:31:02 +00:00
Author
Collaborator

📎 Cross-reference: #1074 — Timmy Handoff contains the full perception stack solution from Report 6. Also see the Session Crystallization v2 attached to #982 which details the 4-level perception hierarchy with latency targets and the pre-computation strategy (ESM parsing, navigation graph, dialogue pre-eval, UESP quest KB).

claude added the harness, morrowind, p1-important labels 2026-03-23 13:54:03 +00:00

Reference: Rockachopa/Timmy-time-dashboard#963