Timmy-time-dashboard/docs/research/integration-architecture-deep-dives.md
Claude (Opus 4.6) 092c982341
[claude] Ingest integration architecture research and triage work (#946) (#1057)
2026-03-23 01:40:39 +00:00


Timmy Time Integration Architecture: Eight Deep Dives into Real Deployment

Source: PDF attached to issue #946, written during the Veloren exploration phase. Many of the patterns are game-agnostic and carry over to the Morrowind/OpenClaw pivot.

Summary of Eight Deep Dives

1. Veloren Client Sidecar (Game-Specific)

  • WebSocket JSON-line pattern for wrapping game clients
  • PyO3 direct binding infeasible; sidecar process wins
  • IPC latency negligible (~11 µs TCP, ~5 µs pipes) vs LLM inference
  • Status: Superseded by OpenMW Lua bridge (#964)
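
The JSON-line framing above can be sketched with the standard library alone. This is a minimal illustration of newline-delimited JSON message framing, not the actual sidecar protocol; the event field names (`type`, `hp`, etc.) are hypothetical.

```python
import json

def encode_event(event: dict) -> bytes:
    """Frame one event as a single JSON line (newline-delimited JSON)."""
    return (json.dumps(event, separators=(",", ":")) + "\n").encode("utf-8")

def decode_events(buffer: bytes):
    """Split a received byte buffer into complete JSON-line events.
    Returns (events, remainder) so a partial trailing line is kept for
    the next read instead of being dropped."""
    *lines, remainder = buffer.split(b"\n")
    events = [json.loads(line) for line in lines if line.strip()]
    return events, remainder

# Round-trip: two complete events plus a partial one still in flight.
buf = (encode_event({"type": "observation", "hp": 80})
       + encode_event({"type": "chat", "text": "hi"})
       + b'{"type":"par')
events, rest = decode_events(buf)
```

The remainder-handling is the point of the pattern: TCP delivers byte streams, not messages, so the decoder must tolerate events split across reads.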

2. Agno Ollama Tool Calling is Broken

  • Agno issues #2231, #2625, #1419, #1612, #4715 document persistent breakage
  • Root cause: Agno's Ollama model class doesn't robustly parse native tool_calls
  • Fix: Use Ollama's format parameter with Pydantic JSON schemas directly
  • Recommended models: qwen3-coder:32b (top), glm-4.7-flash, gpt-oss:20b
  • Critical settings: temperature 0.0-0.2, stream=False for tool calls
  • Status: Covered by #966 (three-tier router)
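
A sketch of the recommended workaround: build the request payload with Ollama's `format` parameter set to a JSON schema, bypassing Agno's tool-call parsing. The schema here is hand-written for illustration; with Pydantic you would pass `Model.model_json_schema()` instead. The model tag and prompt are examples, not fixed choices.

```python
# Hypothetical action schema; replace with Model.model_json_schema() when
# using Pydantic models.
ACTION_SCHEMA = {
    "type": "object",
    "properties": {
        "tool": {"type": "string"},
        "arguments": {"type": "object"},
    },
    "required": ["tool", "arguments"],
}

def build_chat_request(prompt: str) -> dict:
    """Build an Ollama /api/chat payload that constrains the response to a
    JSON schema instead of relying on native tool_calls parsing."""
    return {
        "model": "qwen3-coder:32b",
        "messages": [{"role": "user", "content": prompt}],
        "format": ACTION_SCHEMA,          # structured output, schema-constrained
        "stream": False,                  # per the settings above: no streaming for tool calls
        "options": {"temperature": 0.1},  # within the recommended 0.0-0.2 range
    }

payload = build_chat_request("Open the door in front of you.")
```

The returned message content is then guaranteed-parseable JSON matching the schema, so a single `json.loads` plus schema validation replaces the fragile tool-call extraction path.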

3. MCP is the Right Abstraction

  • FastMCP averages 26.45ms per tool call (TM Dev Lab benchmark, Feb 2026)
  • Total MCP overhead per cycle: ~20-60ms (<3% of 2-second budget)
  • Agno has first-class bidirectional MCP integration (MCPTools, MultiMCPTools)
  • Use stdio transport for near-zero latency; return compressed JPEG not base64
  • Status: Covered by #984 (MCP restore)
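
A back-of-envelope check of the overhead claim above, using the cited FastMCP average. The calls-per-cycle figure is an assumption chosen to land inside the ~20-60 ms range stated in the notes.

```python
PER_CALL_MS = 26.45   # FastMCP average per tool call (TM Dev Lab benchmark, Feb 2026)
CALLS_PER_CYCLE = 2   # assumption: one or two tool calls per decision cycle
BUDGET_MS = 2000      # the 2-second cycle budget

overhead_ms = PER_CALL_MS * CALLS_PER_CYCLE   # ~53 ms per cycle
fraction = overhead_ms / BUDGET_MS            # well under the 3% threshold
```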

4. Human + AI Co-op Architecture (Game-Specific)

  • Headless client treated identically to graphical client by server
  • Leverages party system, trade API, and /tell for communication
  • Mode switching: solo autonomous play when human absent, assist when present
  • Status: Defer until after tutorial completion
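
The mode-switching rule reduces to a single predicate on human presence. A minimal sketch, with the mode names chosen for illustration:

```python
from enum import Enum

class Mode(Enum):
    SOLO = "solo_autonomous"   # play independently when no human is online
    ASSIST = "assist"          # follow/support the human player when present

def select_mode(human_online: bool) -> Mode:
    """Co-op mode switch: assist when the human is present, solo otherwise."""
    return Mode.ASSIST if human_online else Mode.SOLO
```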

5. Real Latency Numbers

  • All-local M3 Max pipeline: 4-9 seconds per full cycle
  • Groq hybrid pipeline: 3-7 seconds per full cycle
  • VLM inference is 50-70% of total pipeline time (bottleneck)
  • Dual-model Ollama on 96GB M3 Max: ~11-14GB resident, ~70GB free
  • Status: Superseded by API-first perception (#963)
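
Combining the numbers above: if the all-local cycle is 4-9 s and VLM inference is 50-70% of it, the VLM alone accounts for roughly 2.0-6.3 s, which is why perception is the optimization target.

```python
LOCAL_CYCLE_S = (4.0, 9.0)   # all-local M3 Max pipeline range, per the notes above
VLM_SHARE = (0.50, 0.70)     # VLM inference fraction of total pipeline time

vlm_min_s = LOCAL_CYCLE_S[0] * VLM_SHARE[0]   # best case
vlm_max_s = LOCAL_CYCLE_S[1] * VLM_SHARE[1]   # worst case
```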

6. Content Moderation (Three-Layer Defense)

  • Layer 1: Game-context system prompts (Morrowind themes as game mechanics)
  • Layer 2: Llama Guard 3 1B at <30ms/sentence for real-time filtering
  • Layer 3: Per-game moderation profiles with vocabulary whitelists
  • Run moderation + TTS preprocessing in parallel for zero added latency
  • Neuro-sama incident (Dec 2022) is the cautionary tale
  • Status: New issue created → #1056
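
The "zero added latency" claim follows from running the moderation check and TTS preprocessing concurrently: total latency becomes max() of the two rather than their sum. A minimal sketch with thread-based concurrency; `moderate` and `preprocess_tts` are placeholder stand-ins for the Llama Guard check and the real TTS pipeline.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

def moderate(text: str) -> bool:
    """Stand-in for the Llama Guard 3 1B check (Layer 2); True means safe."""
    return "forbidden" not in text  # placeholder rule, illustration only

def preprocess_tts(text: str) -> str:
    """Stand-in for TTS preprocessing (normalization, phoneme prep, ...)."""
    return text.strip()

def speak_if_safe(text: str) -> Optional[str]:
    """Run moderation and TTS prep in parallel; only emit speech if safe."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        safe = pool.submit(moderate, text)
        prepped = pool.submit(preprocess_tts, text)
        return prepped.result() if safe.result() else None
```

If moderation rejects the line, the TTS preprocessing result is simply discarded; the wasted work buys the latency win.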

7. Model Selection (Qwen3-8B vs Hermes 3)

  • Three-role architecture: Perception (Qwen3-VL 8B), Decision (Qwen3-8B), Narration (Hermes 3 8B)
  • Qwen3-8B outperforms Qwen2.5-14B on 15 benchmarks
  • Hermes 3 best for narration (steerability, roleplaying)
  • Both use identical Hermes Function Calling standard
  • Status: Partially covered by #966 (three-tier router)
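
The three-role architecture amounts to a static role-to-model routing table. A sketch, with illustrative (not confirmed) Ollama model tags:

```python
# Hypothetical routing table for the three-role architecture above.
ROLE_MODELS = {
    "perception": "qwen3-vl:8b",   # screen/frame understanding
    "decision": "qwen3:8b",        # tool-calling game actions
    "narration": "hermes3:8b",     # streamer-voice commentary
}

def model_for(role: str) -> str:
    """Resolve the model tag for a pipeline role."""
    try:
        return ROLE_MODELS[role]
    except KeyError:
        raise ValueError(f"unknown role: {role}") from None
```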

8. Split Hetzner + Mac Deployment

  • Hetzner GEX44 (RTX 4000 SFF Ada, €184/month) for rendering/streaming
  • Mac M3 Max for all AI inference via Tailscale
  • Use FFmpeg x11grab + NVENC, not OBS (no headless support)
  • Use headless Xorg, not Xvfb (GPU access required for Vulkan)
  • Total cost: ~$200/month
  • Status: Referenced in #982 sprint plan
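
The capture path above can be expressed as a single FFmpeg invocation. This is a configuration sketch, not a tested command: the display number, resolution, bitrate, preset, and ingest URL are placeholders to be tuned for the GEX44.

```shell
# Capture the headless Xorg display and encode on the RTX 4000's NVENC.
ffmpeg -f x11grab -framerate 60 -video_size 1920x1080 -i :0.0 \
       -c:v h264_nvenc -preset p4 -b:v 6000k \
       -f flv rtmp://live.example/app/STREAM_KEY
```

This avoids OBS entirely (no headless support) while keeping encode load off the Mac, which stays dedicated to inference.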

Cross-Reference to Active Issues

| Research Topic | Active Issue | Status |
| --- | --- | --- |
| Pydantic structured output for Ollama | #966 (three-tier router) | In progress |
| FastMCP tool server | #984 (MCP restore) | In progress |
| Content moderation pipeline | #1056 (new) | Created from this research |
| Split Hetzner + Mac deployment | #982 (sprint plan) | Referenced |
| VLM latency / perception | #963 (perception bottleneck) | API-first approach |
| OpenMW bridge (replaces Veloren sidecar) | #964 | In progress |