# Timmy Time Integration Architecture: Eight Deep Dives into Real Deployment

> **Source:** PDF attached to issue #946, written during the Veloren exploration phase.
> Many patterns are game-agnostic and apply to the Morrowind/OpenClaw pivot.

## Summary of Eight Deep Dives

### 1. Veloren Client Sidecar (Game-Specific)

- WebSocket JSON-line pattern for wrapping game clients
- PyO3 direct binding infeasible; a sidecar process wins
- IPC latency negligible (~11us TCP, ~5us pipes) vs LLM inference
- **Status:** Superseded by OpenMW Lua bridge (#964)

### 2. Agno Ollama Tool Calling is Broken

- Agno issues #2231, #2625, #1419, #1612, #4715 document persistent breakage
- Root cause: Agno's Ollama model class doesn't robustly parse native tool_calls
- **Fix:** Use Ollama's `format` parameter with Pydantic JSON schemas directly
- Recommended models: qwen3-coder:32b (top), glm-4.7-flash, gpt-oss:20b
- Critical settings: temperature 0.0-0.2, stream=False for tool calls
- **Status:** Covered by #966 (three-tier router)

### 3. MCP is the Right Abstraction

- FastMCP averages 26.45ms per tool call (TM Dev Lab benchmark, Feb 2026)
- Total MCP overhead per cycle: ~20-60ms (<3% of the 2-second budget)
- Agno has first-class bidirectional MCP integration (MCPTools, MultiMCPTools)
- Use stdio transport for near-zero latency; return compressed JPEG, not base64
- **Status:** Covered by #984 (MCP restore)

### 4. Human + AI Co-op Architecture (Game-Specific)

- A headless client is treated identically to a graphical client by the server
- Leverages the party system, trade API, and /tell for communication
- Mode switching: solo autonomous play when the human is absent, assist when present
- **Status:** Defer until after tutorial completion
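The "WebSocket JSON-line pattern" from deep dive 1 boils down to newline-delimited JSON framing: each message is one JSON object on one line, so the peer can find message boundaries without length prefixes. A minimal sketch of that framing in stdlib Python — `send_message` and `read_messages` are hypothetical helper names, and the real sidecar carries these lines over a WebSocket rather than a raw stream:

```python
import json
from typing import Any, IO, Iterator


def send_message(stream: IO[bytes], msg: dict[str, Any]) -> None:
    """Serialize one message as a single JSON line and flush immediately,
    so the peer sees a complete message per newline."""
    stream.write(json.dumps(msg, separators=(",", ":")).encode("utf-8") + b"\n")
    stream.flush()


def read_messages(stream: IO[bytes]) -> Iterator[dict[str, Any]]:
    """Yield one decoded message per newline-terminated JSON line,
    skipping blank keep-alive lines."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)
```

Because framing is the transport's job and JSON is the payload's, the same two helpers work unchanged over a pipe, a TCP socket, or a WebSocket text channel, which is why the ~11us/~5us IPC cost stays negligible next to LLM inference.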
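The fix in deep dive 2 — bypassing Agno's tool-call parsing by passing a Pydantic JSON schema through Ollama's `format` parameter — can be sketched as follows. This assumes a local Ollama server with structured-output support and the `ollama` Python client; `ToolCall` and `decide` are illustrative names, and the model tag comes from the recommendations above:

```python
from pydantic import BaseModel


class ToolCall(BaseModel):
    """Schema the model must emit; Ollama constrains decoding to match it."""
    tool: str
    args: dict


def decide(prompt: str) -> ToolCall:
    # Imported lazily: requires the `ollama` client and a running server.
    import ollama

    # Instead of relying on native tool_calls parsing (broken in Agno),
    # ask Ollama to emit JSON matching the Pydantic schema, then validate.
    resp = ollama.chat(
        model="qwen3-coder:32b",
        messages=[{"role": "user", "content": prompt}],
        format=ToolCall.model_json_schema(),
        options={"temperature": 0.0},  # deterministic, per the settings above
        stream=False,                  # no streaming for tool calls
    )
    return ToolCall.model_validate_json(resp["message"]["content"])
```

Validation with `model_validate_json` gives a typed object back, so a malformed generation raises immediately instead of silently producing an unparsed tool call.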
### 5. Real Latency Numbers

- All-local M3 Max pipeline: 4-9 seconds per full cycle
- Groq hybrid pipeline: 3-7 seconds per full cycle
- VLM inference is 50-70% of total pipeline time (the bottleneck)
- Dual-model Ollama on a 96GB M3 Max: ~11-14GB used, ~70GB free
- **Status:** Superseded by API-first perception (#963)

### 6. Content Moderation (Three-Layer Defense)

- Layer 1: Game-context system prompts (Morrowind themes framed as game mechanics)
- Layer 2: Llama Guard 3 1B at <30ms/sentence for real-time filtering
- Layer 3: Per-game moderation profiles with vocabulary whitelists
- Run moderation and TTS preprocessing in parallel for zero added latency
- The Neuro-sama incident (Dec 2022) is the cautionary tale
- **Status:** New issue created → #1056

### 7. Model Selection (Qwen3-8B vs Hermes 3)

- Three-role architecture: Perception (Qwen3-VL 8B), Decision (Qwen3-8B), Narration (Hermes 3 8B)
- Qwen3-8B outperforms Qwen2.5-14B on 15 benchmarks
- Hermes 3 is best for narration (steerability, roleplaying)
- Both use the identical Hermes Function Calling standard
- **Status:** Partially covered by #966 (three-tier router)
### 8. Split Hetzner + Mac Deployment

- Hetzner GEX44 (RTX 4000 SFF Ada, €184/month) for rendering/streaming
- Mac M3 Max for all AI inference, reached via Tailscale
- Use FFmpeg x11grab + NVENC, not OBS (no headless support)
- Use headless Xorg, not Xvfb (GPU access is required for Vulkan)
- Total cost: ~$200/month
- **Status:** Referenced in #982 sprint plan

## Cross-Reference to Active Issues

| Research Topic | Active Issue | Status |
|---------------|-------------|--------|
| Pydantic structured output for Ollama | #966 (three-tier router) | In progress |
| FastMCP tool server | #984 (MCP restore) | In progress |
| Content moderation pipeline | #1056 (new) | Created from this research |
| Split Hetzner + Mac deployment | #982 (sprint plan) | Referenced |
| VLM latency / perception | #963 (perception bottleneck) | API-first approach |
| OpenMW bridge (replaces Veloren sidecar) | #964 | In progress |
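The FFmpeg x11grab + NVENC combination from deep dive 8 amounts to a one-liner on the Hetzner box. An illustrative config sketch, not a tested production command: the display number, resolution, framerate, bitrate, and streaming endpoint are all placeholders to be tuned for the actual setup:

```sh
# Capture the headless Xorg display and encode on the RTX 4000's NVENC.
# :0.0, 1920x1080, 6M, and the RTMP URL are placeholders.
ffmpeg -f x11grab -framerate 60 -video_size 1920x1080 -i :0.0 \
       -c:v h264_nvenc -preset p4 -b:v 6M -pix_fmt yuv420p \
       -f flv rtmp://live.example/stream-key
```

This is why headless Xorg (not Xvfb) matters: x11grab reads from the X display, and NVENC plus the Vulkan game renderer both need the real GPU behind it.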