[RESEARCH] RPG2Robot — steal this architecture for Bannerlord → robotics transfer #651
Closed · opened 2026-03-27 16:02:07 +00:00 by perplexity · 4 comments
Reference: Timmy_Foundation/the-nexus#651
Paper
"LLM Agents Can Learn from RPGs How to Pilot Robots: Hierarchical Language-Conditioned Control via Semantic Trajectory Transfer from Virtual Worlds"
— Hyperscape Research, March 2026
Full PDF not public yet (shared on Twitter as preprint). Below is everything extractable from the abstract + Figure 1 + Introduction, plus related work that IS available.
Repos:
Their Architecture (Figure 1)
Key insight: "Semantic trajectory transfer" — not pixel data, not game state dumps. The LLM agent's high-level decisions (dialogue, intent formation, subgoal decomposition, entity grounding, failure recovery) are captured as structured trajectories. These train the planning layer.
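The paper's trajectory schema isn't public, so here is only a minimal sketch of what a structured semantic trajectory event might look like; the field names (kind, content, entities) are assumptions, not the paper's format:

```python
import json
import time
from dataclasses import dataclass, field, asdict

@dataclass
class TrajectoryEvent:
    """One semantic step in an agent trajectory (hypothetical schema).

    Captures the decision level -- intent, subgoal, grounding, recovery,
    dialogue -- not pixels and not raw game-state dumps."""
    kind: str                                     # e.g. "intent" | "subgoal" | "grounding" | "recovery"
    content: str                                  # natural-language description of the decision
    entities: list = field(default_factory=list)  # entity IDs the decision refers to
    ts: float = field(default_factory=time.time)

def to_jsonl(events):
    """Serialize a trajectory as JSONL for a training pipeline."""
    return "\n".join(json.dumps(asdict(e)) for e in events)

traj = [
    TrajectoryEvent("intent", "acquire supplies before the siege"),
    TrajectoryEvent("subgoal", "find a grain merchant", entities=["merchant_17"]),
    TrajectoryEvent("recovery", "merchant refused; try the next town"),
]
print(to_jsonl(traj))
```

The point of the sketch: what gets logged is the decision sequence, which is why the same records can train a planner regardless of which world produced them.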
World-agnostic interface:
The EmbodiedContext/GroundedIntent abstraction makes the planner unable to distinguish whether it's in a game, a simulator, or the real world. Same interface everywhere.
What's Directly Applicable to Timmy
1. Semantic Trajectory Logging (we almost have this)
They use a "dual-rate logging pipeline" to capture the agent's RPG experience as structured semantic trajectories. We already have:
~/.hermes/sessions/
session_export()
Huey task extracting DPO pairs every 4 hours
Gap: Our session logs capture raw conversation. We don't yet extract semantic trajectories — structured intent/subgoal/grounding/recovery events. This is a post-processing step on top of session logs.
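That post-processing pass could be quite small. A sketch, assuming a JSONL session format with role/content turns and hypothetical inline tags ([INTENT], [SUBGOAL], etc.) that the agent would need to start emitting:

```python
import json

# Hypothetical tag conventions an extraction pass might look for
# in assistant turns. These are assumptions, not the current log format.
TAGS = {"INTENT": "intent", "SUBGOAL": "subgoal",
        "GROUND": "grounding", "RECOVER": "recovery"}

def extract_semantic_events(session_lines):
    """Turn raw session-log turns into semantic trajectory events.

    Assumes each line is a JSON object with 'role' and 'content';
    only tagged assistant turns become events."""
    events = []
    for line in session_lines:
        turn = json.loads(line)
        if turn.get("role") != "assistant":
            continue
        for tag, kind in TAGS.items():
            marker = f"[{tag}]"
            if marker in turn["content"]:
                events.append({
                    "kind": kind,
                    "content": turn["content"].split(marker, 1)[1].strip(),
                })
    return events

log = [
    json.dumps({"role": "user", "content": "check the backlog"}),
    json.dumps({"role": "assistant", "content": "[INTENT] triage open Gitea issues"}),
    json.dumps({"role": "assistant", "content": "[GROUND] GET /repos/the-nexus/issues?state=open"}),
]
print(extract_semantic_events(log))
```

Untagged turns pass through untouched, so this can run over existing session archives without changing the logger first.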
2. The Grounding Layer (this is the bridge)
Converts language-level plans into entity-referenced goals. In their system: "go to the merchant" → coordinates + entity ID + navigation goal. In ours: "check the Gitea backlog" → API endpoint + auth + parse response.
This is what the Forgejo MCP server (#616) would give us — a grounding layer between Timmy's intent and Gitea's entity space.
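For the Gitea case the grounding layer could start as a plain table from intent phrases to entity-referenced calls. A sketch; the endpoint paths and GroundedCall fields are illustrative assumptions, not the Forgejo MCP server's actual design:

```python
from dataclasses import dataclass

@dataclass
class GroundedCall:
    """An entity-referenced goal: concrete method, endpoint, and parser name."""
    method: str
    endpoint: str
    parse: str   # which response parser to apply afterwards

# Hypothetical mapping from language-level plan steps to grounded calls.
GROUNDING_TABLE = {
    "check the Gitea backlog":
        GroundedCall("GET", "/api/v1/repos/{owner}/{repo}/issues", "issue_list"),
    "read world state":
        GroundedCall("READ", "theworld.txt", "bannerlord_world"),
}

def ground(intent: str) -> GroundedCall:
    """Resolve a language-level intent to an executable, entity-referenced goal."""
    try:
        return GROUNDING_TABLE[intent]
    except KeyError:
        raise ValueError(f"no grounding for intent: {intent!r}")

call = ground("check the Gitea backlog")
print(call.method, call.endpoint)
```

A lookup table won't scale past a handful of intents, but it makes the layer's job concrete: everything above it speaks language, everything below it speaks entities and endpoints.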
For Bannerlord: the AI Influence mod already exports theworld.txt — that's a grounding document. Timmy reads it, forms intent, the grounding layer converts to game actions.
3. The World-Agnostic Interface (our portal pattern)
Their EmbodiedContext/GroundedIntent is exactly the portal pattern from #19 and GAMEPORTAL_PROTOCOL.md:
capture_state() → EmbodiedContext (what the agent perceives)
execute_action() → GroundedIntent (what the agent wants to do)
The agent doesn't know if it's in Bannerlord, the Nexus, or a robot. Same interface. Different adapters.
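The portal pattern reduces to one abstract adapter interface. A sketch that borrows the paper's two type names but assumes everything else (the dict-based representation, the BannerlordAdapter, the planner logic are all hypothetical):

```python
from abc import ABC, abstractmethod

class EmbodiedContext(dict):
    """What the agent perceives this tick: world-agnostic key/value state."""

class GroundedIntent(dict):
    """What the agent wants to do, expressed against entities in the context."""

class WorldAdapter(ABC):
    """One interface for a game, a simulator, or a robot.

    The planner only ever sees EmbodiedContext and emits GroundedIntent;
    only the adapter knows which world it is wired to."""
    @abstractmethod
    def capture_state(self) -> EmbodiedContext: ...
    @abstractmethod
    def execute_action(self, intent: GroundedIntent) -> None: ...

class BannerlordAdapter(WorldAdapter):
    """Hypothetical adapter that would read the AI Influence world export."""
    def capture_state(self) -> EmbodiedContext:
        return EmbodiedContext(world="bannerlord", gold=120)
    def execute_action(self, intent: GroundedIntent) -> None:
        print(f"bannerlord action: {intent}")

def plan_step(adapter: WorldAdapter) -> GroundedIntent:
    # The planner can't tell which world produced this context.
    ctx = adapter.capture_state()
    return GroundedIntent(action="trade" if ctx.get("gold", 0) > 100 else "wait")

adapter = BannerlordAdapter()
adapter.execute_action(plan_step(adapter))
```

Swapping in a NexusAdapter or a robot adapter changes nothing above the WorldAdapter line, which is the whole transfer argument.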
4. Separate RL for Motor Skills
They don't make the LLM do motor control. PPO-trained RL skills handle locomotion and manipulation in MuJoCo. The LLM only does planning and dialogue.
For us: Bannerlord's built-in combat AI already handles motor skills (swing, block, dodge). Timmy doesn't need to learn to sword fight — he needs to learn when to fight, who to fight, and why. The game engine handles the how.
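The same split can be sketched for Bannerlord: the LLM layer returns a when/who/why decision and the engine's built-in combat AI supplies the how. The context fields and thresholds below are invented for illustration, not taken from the game's API:

```python
def tactical_decision(context):
    """High-level layer: decide WHEN to fight, WHO, and WHY.

    Returns a decision the engine's built-in combat AI (the HOW --
    swing, block, dodge) can execute unmodified."""
    enemy = context["nearest_enemy"]
    if context["hp"] < 30:
        return {"engage": False, "reason": "retreat and heal"}
    if enemy["strength"] > context["strength"] * 1.5:
        return {"engage": False, "reason": "unwinnable matchup"}
    return {"engage": True, "target": enemy["id"], "reason": "favorable odds"}

ctx = {"hp": 80, "strength": 50,
       "nearest_enemy": {"id": "looter_3", "strength": 40}}
print(tactical_decision(ctx))
```

Note the return value is a decision plus a reason string: the reason is exactly the kind of semantic event the trajectory log from section 1 would capture.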
Related Papers (Available Now)
FAEA — Demonstration-Free Robotic Control via LLM Agents (Jan 2026)
arxiv.org/pdf/2601.20334
Uses Claude Agent SDK (same iterative reasoning as software debugging) directly for robotic manipulation. 84.9-96% success rates WITHOUT demonstrations or fine-tuning. Key insight: "general-purpose agents are sufficient for a class of manipulation tasks dominated by deliberative, task-level planning."
MuJoCo Playground (Jan 2025)
playground.mujoco.org
Google DeepMind's open-source framework for robot learning. Zero-shot sim-to-real transfer.
pip install playground and train policies in minutes on a single GPU. Supports humanoids, quadrupeds, dexterous hands.
HAMSTER — Hierarchical Action Models (Feb 2025)
arxiv.org/html/2502.05485v1
Hierarchical VLA: high-level VLM produces 2D path guidance, low-level policy produces actions. The 2D paths are the "semantic trajectories" — embodiment-agnostic, transferable across domain gaps.
Action Items
EmbodiedContext/GroundedIntent interface — compare to our GAMEPORTAL_PROTOCOL.md
The Bottom Line
They built what you described. The architecture validates your instinct. The difference: they're using ElizaOS + a custom MMORPG. You're using Hermes Agent + Bannerlord. The game is different, the pattern is identical. Don't rebuild their stack — steal the interface pattern and the trajectory logging format, plug them into what you already have.
Source: Twitter screenshot of preprint, March 2026. Triaged by Perplexity.
🔍 Triaged by Huey — needs assignment.
Closing during the 2026-03-28 backlog burn-down.
Reason: this issue is being retired as part of a backlog reset toward the current final vision: Heartbeat, Harness, and Portal. If the work still matters after reset, it should return as a narrower, proof-oriented next-step issue rather than stay open as a broad legacy frontier.