[RESEARCH] RPG2Robot — steal this architecture for Bannerlord → robotics transfer #653
Paper
"LLM Agents Can Learn from RPGs How to Pilot Robots: Hierarchical Language-Conditioned Control via Semantic Trajectory Transfer from Virtual Worlds"
— Hyperscape Research, March 2026
Full PDF not public yet (shared on Twitter as preprint). Below is everything extractable from the abstract + Figure 1 + Introduction, plus related work that IS available.
Repos:
Their Architecture (Figure 1)
Key insight: "Semantic trajectory transfer" — not pixel data, not game state dumps. The LLM agent's high-level decisions (dialogue, intent formation, subgoal decomposition, entity grounding, failure recovery) are captured as structured trajectories. These train the planning layer.
World-agnostic interface: the EmbodiedContext/GroundedIntent abstraction makes the planner unable to distinguish whether it's in a game, a simulator, or the real world. Same interface everywhere.
What's Directly Applicable to Timmy
1. Semantic Trajectory Logging (we almost have this)
They use a "dual-rate logging pipeline" to capture the agent's RPG experience as structured semantic trajectories. We already have:
- Session logs (~/.hermes/sessions/)
- session_export() Huey task extracting DPO pairs every 4 hours
Gap: Our session logs capture raw conversation. We don't yet extract semantic trajectories, meaning structured intent/subgoal/grounding/recovery events. This is a post-processing step on top of session logs.
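A minimal sketch of that post-processing step, under loud assumptions: a session is taken to be a list of role/content dicts, and the agent is assumed to prefix structured lines with tags such as INTENT: or SUBGOAL:. Neither the log shape nor the tag convention comes from the paper or from Hermes; SemanticEvent, extract_events and to_jsonl are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, Iterator

# Event kinds taken from the categories named in this issue.
EVENT_KINDS = ("intent", "subgoal", "grounding", "recovery")


@dataclass
class SemanticEvent:
    kind: str   # one of EVENT_KINDS
    turn: int   # index of the source message within the session
    text: str   # the text that followed the tag


def extract_events(messages: Iterable[dict]) -> Iterator[SemanticEvent]:
    """Yield tagged events from assistant messages (assumed log shape)."""
    for turn, msg in enumerate(messages):
        if msg.get("role") != "assistant":
            continue
        for line in msg.get("content", "").splitlines():
            tag, sep, rest = line.partition(":")
            kind = tag.strip().lower()
            # Only lines shaped like "<KIND>: <text>" count as events.
            if sep and kind in EVENT_KINDS:
                yield SemanticEvent(kind=kind, turn=turn, text=rest.strip())


def to_jsonl(events: Iterable[SemanticEvent]) -> str:
    """Serialize events one JSON object per line."""
    return "\n".join(json.dumps(asdict(e)) for e in events)
```

Feeding it one exported session and writing the result next to the raw log would give a trajectory file per session. A real extractor would likely need an LLM pass instead of tag matching, since raw conversation rarely carries explicit tags.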
2. The Grounding Layer (this is the bridge)
Converts language-level plans into entity-referenced goals. In their system: "go to the merchant" → coordinates + entity ID + navigation goal. In ours: "check the Gitea backlog" → API endpoint + auth + parse response.
This is what the Forgejo MCP server (#616) would give us — a grounding layer between Timmy's intent and Gitea's entity space.
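To make the "check the Gitea backlog" example concrete, here is a rough sketch of a grounding function that turns an intent string into a request description without sending it. GroundedRequest and ground_intent are hypothetical names, the single-entry intent table is illustrative, and the endpoint path and token header follow Gitea's REST API as commonly documented; verify both against your instance before relying on them.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode


@dataclass
class GroundedRequest:
    method: str
    url: str
    headers: dict = field(default_factory=dict)


def ground_intent(intent: str, base_url: str, owner: str, repo: str,
                  token: str) -> GroundedRequest:
    """Map a language-level intent to an entity-referenced API call."""
    normalized = intent.strip().lower()
    if normalized == "check the gitea backlog":
        # Assumed meaning of "backlog": open issues, excluding pull requests.
        query = urlencode({"state": "open", "type": "issues"})
        url = f"{base_url.rstrip('/')}/api/v1/repos/{owner}/{repo}/issues?{query}"
        return GroundedRequest("GET", url, {"Authorization": f"token {token}"})
    raise ValueError(f"no grounding rule for intent: {intent!r}")
```

Returning a description instead of performing the call keeps the grounding step testable and leaves execution to whichever adapter owns the connection.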
For Bannerlord: the AI Influence mod already exports theworld.txt, which is a grounding document. Timmy reads it, forms intent, and the grounding layer converts that intent to game actions.
3. The World-Agnostic Interface (our portal pattern)
Their EmbodiedContext/GroundedIntent is exactly the portal pattern from #19 and GAMEPORTAL_PROTOCOL.md:
- capture_state() → EmbodiedContext (what the agent perceives)
- execute_action() → GroundedIntent (what the agent wants to do)
The agent doesn't know if it's in Bannerlord, the Nexus, or a robot. Same interface. Different adapters.
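The adapter idea can be sketched roughly as follows. Only the names capture_state(), execute_action(), EmbodiedContext and GroundedIntent come from this issue; every field, the Portal protocol, FakeGameAdapter and plan_step are assumptions made for illustration, not the paper's or GAMEPORTAL_PROTOCOL.md's actual definitions.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class EmbodiedContext:
    entities: dict      # entity id -> short description (assumed field)
    observation: str    # free-text summary of the scene (assumed field)


@dataclass
class GroundedIntent:
    action: str         # e.g. "go_to" (assumed vocabulary)
    target: str         # entity id the action refers to


class Portal(Protocol):
    """What every world adapter exposes to the planner."""
    def capture_state(self) -> EmbodiedContext: ...
    def execute_action(self, intent: GroundedIntent) -> bool: ...


class FakeGameAdapter:
    """Stand-in adapter; a Bannerlord or robot adapter would share this shape."""
    def __init__(self) -> None:
        self.log: list = []

    def capture_state(self) -> EmbodiedContext:
        return EmbodiedContext({"merchant_1": "merchant"},
                               "A merchant stands nearby.")

    def execute_action(self, intent: GroundedIntent) -> bool:
        self.log.append((intent.action, intent.target))
        return True


def plan_step(portal: Portal) -> bool:
    """Planner code sees only the Portal interface, never the world behind it."""
    ctx = portal.capture_state()
    target = next(iter(ctx.entities))  # trivial policy: first visible entity
    return portal.execute_action(GroundedIntent("go_to", target))
```

Swapping FakeGameAdapter for another adapter requires no change to plan_step, which is the property the portal pattern is after.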
4. Separate RL for Motor Skills
They don't make the LLM do motor control. PPO-trained RL skills handle locomotion and manipulation in MuJoCo. The LLM only does planning and dialogue.
For us: Bannerlord's built-in combat AI already handles motor skills (swing, block, dodge). Timmy doesn't need to learn to sword fight — he needs to learn when to fight, who to fight, and why. The game engine handles the how.
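One way to picture the split, as a sketch only: the planner returns a decision about whether, whom and why to fight, and a separate callable carries out the fight. decide_engagement stands in for the LLM with a deliberately naive strength comparison, and combat_skill stands in for the game's combat AI or an RL policy; all names and the rule itself are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Engagement:
    target: str   # who to fight
    reason: str   # why, kept for the trajectory log


def decide_engagement(hostiles: dict[str, int],
                      own_strength: int) -> Optional[Engagement]:
    """Planning layer: pick a target or decline. No motor control here."""
    beatable = {name: s for name, s in hostiles.items() if s < own_strength}
    if not beatable:
        return None  # "when": not now
    target = min(beatable, key=beatable.get)
    return Engagement(target, f"weakest beatable party "
                              f"({beatable[target]} < {own_strength})")


def act(decision: Optional[Engagement],
        combat_skill: Callable[[str], None]) -> bool:
    """Hand the 'how' to the skill layer; return whether a fight started."""
    if decision is None:
        return False
    combat_skill(decision.target)
    return True
```

The reason string is kept on the decision so it can be logged as part of a semantic trajectory.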
Related Papers (Available Now)
FAEA — Demonstration-Free Robotic Control via LLM Agents (Jan 2026)
arxiv.org/pdf/2601.20334
Uses Claude Agent SDK (same iterative reasoning as software debugging) directly for robotic manipulation. 84.9-96% success rates WITHOUT demonstrations or fine-tuning. Key insight: "general-purpose agents are sufficient for a class of manipulation tasks dominated by deliberative, task-level planning."
MuJoCo Playground (Jan 2025)
playground.mujoco.org
Google DeepMind's open-source framework for robot learning. Zero-shot sim-to-real transfer. pip install playground and train policies in minutes on a single GPU. Supports humanoids, quadrupeds, dexterous hands.
HAMSTER: Hierarchical Action Models (Feb 2025)
arxiv.org/html/2502.05485v1
Hierarchical VLA: high-level VLM produces 2D path guidance, low-level policy produces actions. The 2D paths are the "semantic trajectories" — embodiment-agnostic, transferable across domain gaps.
Action Items
- EmbodiedContext/GroundedIntent interface: compare to our GAMEPORTAL_PROTOCOL.md
The Bottom Line
They built what you described. The architecture validates your instinct. The difference: they're using ElizaOS + a custom MMORPG. You're using Hermes Agent + Bannerlord. The game is different, the pattern is identical. Don't rebuild their stack — steal the interface pattern and the trajectory logging format, plug them into what you already have.
Source: Twitter screenshot of preprint, March 2026. Triaged by Perplexity.
⚡ Dispatched to claude. Huey task queued.
⚡ Dispatched to gemini. Huey task queued.
⚡ Dispatched to kimi. Huey task queued.
⚡ Dispatched to grok. Huey task queued.
⚡ Dispatched to perplexity. Huey task queued.
🔍 Triaged by Huey — needs assignment.
🔧 gemini working on this via Huey. Branch: gemini/issue-653
🔧 grok working on this via Huey. Branch: grok/issue-653
⚠️ grok produced no changes for this issue. Skipping.
🔍 Triaged by Huey — needs assignment.
Closing as duplicate during backlog burn-down. Canonical issue: #651.
Reason: this workstream already exists with materially the same title/scope. Keeping one canonical thread prevents agent churn and review waste.