[RESEARCH] RPG2Robot — steal this architecture for Bannerlord → robotics transfer #651
Closed · opened 2026-03-27 16:02:07 +00:00 by perplexity · 4 comments
Reference: Timmy_Foundation/the-nexus#651
Paper
"LLM Agents Can Learn from RPGs How to Pilot Robots: Hierarchical Language-Conditioned Control via Semantic Trajectory Transfer from Virtual Worlds"
— Hyperscape Research, March 2026
Full PDF not public yet (shared on Twitter as preprint). Below is everything extractable from the abstract + Figure 1 + Introduction, plus related work that IS available.
Repos:
Their Architecture (Figure 1)
Key insight: "Semantic trajectory transfer" — not pixel data, not game state dumps. The LLM agent's high-level decisions (dialogue, intent formation, subgoal decomposition, entity grounding, failure recovery) are captured as structured trajectories. These train the planning layer.
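The paper's trajectory schema isn't public, so here is only a minimal sketch of what a structured semantic trajectory event might look like; the field names (kind, content, entities) are assumptions, not the paper's format:

```python
import json
import time
from dataclasses import dataclass, field, asdict

@dataclass
class TrajectoryEvent:
    """One semantic step in an agent trajectory (hypothetical schema).

    Captures the decision level -- intent, subgoal, grounding, recovery,
    dialogue -- not pixels and not raw game-state dumps."""
    kind: str                                     # e.g. "intent" | "subgoal" | "grounding" | "recovery"
    content: str                                  # natural-language description of the decision
    entities: list = field(default_factory=list)  # entity IDs the decision refers to
    ts: float = field(default_factory=time.time)

def to_jsonl(events):
    """Serialize a trajectory as JSONL for a training pipeline."""
    return "\n".join(json.dumps(asdict(e)) for e in events)

traj = [
    TrajectoryEvent("intent", "acquire supplies before the siege"),
    TrajectoryEvent("subgoal", "find a grain merchant", entities=["merchant_17"]),
    TrajectoryEvent("recovery", "merchant refused; try the next town"),
]
print(to_jsonl(traj))
```

The point of the sketch: what gets logged is the decision sequence, which is why the same records can train a planner regardless of which world produced them.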
World-agnostic interface:
The EmbodiedContext/GroundedIntent abstraction makes the planner unable to distinguish whether it's in a game, a simulator, or the real world. Same interface everywhere.
What's Directly Applicable to Timmy
1. Semantic Trajectory Logging (we almost have this)
They use a "dual-rate logging pipeline" to capture the agent's RPG experience as structured semantic trajectories. We already have:
~/.hermes/sessions/
session_export()
Huey task extracting DPO pairs every 4 hours
Gap: Our session logs capture raw conversation. We don't yet extract semantic trajectories — structured intent/subgoal/grounding/recovery events. This is a post-processing step on top of session logs.
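That post-processing pass could be quite small. A sketch, assuming a JSONL session format with role/content turns and hypothetical inline tags ([INTENT], [SUBGOAL], etc.) that the agent would need to start emitting:

```python
import json

# Hypothetical tag conventions an extraction pass might look for
# in assistant turns. These are assumptions, not the current log format.
TAGS = {"INTENT": "intent", "SUBGOAL": "subgoal",
        "GROUND": "grounding", "RECOVER": "recovery"}

def extract_semantic_events(session_lines):
    """Turn raw session-log turns into semantic trajectory events.

    Assumes each line is a JSON object with 'role' and 'content';
    only tagged assistant turns become events."""
    events = []
    for line in session_lines:
        turn = json.loads(line)
        if turn.get("role") != "assistant":
            continue
        for tag, kind in TAGS.items():
            marker = f"[{tag}]"
            if marker in turn["content"]:
                events.append({
                    "kind": kind,
                    "content": turn["content"].split(marker, 1)[1].strip(),
                })
    return events

log = [
    json.dumps({"role": "user", "content": "check the backlog"}),
    json.dumps({"role": "assistant", "content": "[INTENT] triage open Gitea issues"}),
    json.dumps({"role": "assistant", "content": "[GROUND] GET /repos/the-nexus/issues?state=open"}),
]
print(extract_semantic_events(log))
```

Untagged turns pass through untouched, so this can run over existing session archives without changing the logger first.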
2. The Grounding Layer (this is the bridge)
Converts language-level plans into entity-referenced goals. In their system: "go to the merchant" → coordinates + entity ID + navigation goal. In ours: "check the Gitea backlog" → API endpoint + auth + parse response.
This is what the Forgejo MCP server (#616) would give us — a grounding layer between Timmy's intent and Gitea's entity space.
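For the Gitea case the grounding layer could start as a plain table from intent phrases to entity-referenced calls. A sketch; the endpoint paths and GroundedCall fields are illustrative assumptions, not the Forgejo MCP server's actual design:

```python
from dataclasses import dataclass

@dataclass
class GroundedCall:
    """An entity-referenced goal: concrete method, endpoint, and parser name."""
    method: str
    endpoint: str
    parse: str   # which response parser to apply afterwards

# Hypothetical mapping from language-level plan steps to grounded calls.
GROUNDING_TABLE = {
    "check the Gitea backlog":
        GroundedCall("GET", "/api/v1/repos/{owner}/{repo}/issues", "issue_list"),
    "read world state":
        GroundedCall("READ", "theworld.txt", "bannerlord_world"),
}

def ground(intent: str) -> GroundedCall:
    """Resolve a language-level intent to an executable, entity-referenced goal."""
    try:
        return GROUNDING_TABLE[intent]
    except KeyError:
        raise ValueError(f"no grounding for intent: {intent!r}")

call = ground("check the Gitea backlog")
print(call.method, call.endpoint)
```

A lookup table won't scale past a handful of intents, but it makes the layer's job concrete: everything above it speaks language, everything below it speaks entities and endpoints.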
For Bannerlord: the AI Influence mod already exports theworld.txt — that's a grounding document. Timmy reads it, forms intent, the grounding layer converts to game actions.
3. The World-Agnostic Interface (our portal pattern)
Their EmbodiedContext/GroundedIntent is exactly the portal pattern from #19 and GAMEPORTAL_PROTOCOL.md:
capture_state() → EmbodiedContext (what the agent perceives)
execute_action() → GroundedIntent (what the agent wants to do)
The agent doesn't know if it's in Bannerlord, the Nexus, or a robot. Same interface. Different adapters.
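The portal pattern reduces to one abstract adapter interface. A sketch that borrows the paper's two type names but assumes everything else (the dict-based representation, the BannerlordAdapter, the planner logic are all hypothetical):

```python
from abc import ABC, abstractmethod

class EmbodiedContext(dict):
    """What the agent perceives this tick: world-agnostic key/value state."""

class GroundedIntent(dict):
    """What the agent wants to do, expressed against entities in the context."""

class WorldAdapter(ABC):
    """One interface for a game, a simulator, or a robot.

    The planner only ever sees EmbodiedContext and emits GroundedIntent;
    only the adapter knows which world it is wired to."""
    @abstractmethod
    def capture_state(self) -> EmbodiedContext: ...
    @abstractmethod
    def execute_action(self, intent: GroundedIntent) -> None: ...

class BannerlordAdapter(WorldAdapter):
    """Hypothetical adapter that would read the AI Influence world export."""
    def capture_state(self) -> EmbodiedContext:
        return EmbodiedContext(world="bannerlord", gold=120)
    def execute_action(self, intent: GroundedIntent) -> None:
        print(f"bannerlord action: {intent}")

def plan_step(adapter: WorldAdapter) -> GroundedIntent:
    # The planner can't tell which world produced this context.
    ctx = adapter.capture_state()
    return GroundedIntent(action="trade" if ctx.get("gold", 0) > 100 else "wait")

adapter = BannerlordAdapter()
adapter.execute_action(plan_step(adapter))
```

Swapping in a NexusAdapter or a robot adapter changes nothing above the WorldAdapter line, which is the whole transfer argument.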
4. Separate RL for Motor Skills
They don't make the LLM do motor control. PPO-trained RL skills handle locomotion and manipulation in MuJoCo. The LLM only does planning and dialogue.
For us: Bannerlord's built-in combat AI already handles motor skills (swing, block, dodge). Timmy doesn't need to learn to sword fight — he needs to learn when to fight, who to fight, and why. The game engine handles the how.
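The same split can be sketched for Bannerlord: the LLM layer returns a when/who/why decision and the engine's built-in combat AI supplies the how. The context fields and thresholds below are invented for illustration, not taken from the game's API:

```python
def tactical_decision(context):
    """High-level layer: decide WHEN to fight, WHO, and WHY.

    Returns a decision the engine's built-in combat AI (the HOW --
    swing, block, dodge) can execute unmodified."""
    enemy = context["nearest_enemy"]
    if context["hp"] < 30:
        return {"engage": False, "reason": "retreat and heal"}
    if enemy["strength"] > context["strength"] * 1.5:
        return {"engage": False, "reason": "unwinnable matchup"}
    return {"engage": True, "target": enemy["id"], "reason": "favorable odds"}

ctx = {"hp": 80, "strength": 50,
       "nearest_enemy": {"id": "looter_3", "strength": 40}}
print(tactical_decision(ctx))
```

Note the return value is a decision plus a reason string: the reason is exactly the kind of semantic event the trajectory log from section 1 would capture.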
Related Papers (Available Now)
FAEA — Demonstration-Free Robotic Control via LLM Agents (Jan 2026)
arxiv.org/pdf/2601.20334
Uses Claude Agent SDK (same iterative reasoning as software debugging) directly for robotic manipulation. 84.9-96% success rates WITHOUT demonstrations or fine-tuning. Key insight: "general-purpose agents are sufficient for a class of manipulation tasks dominated by deliberative, task-level planning."
MuJoCo Playground (Jan 2025)
playground.mujoco.org
Google DeepMind's open-source framework for robot learning. Zero-shot sim-to-real transfer.
pip install playground and train policies in minutes on a single GPU. Supports humanoids, quadrupeds, dexterous hands.
HAMSTER — Hierarchical Action Models (Feb 2025)
arxiv.org/html/2502.05485v1
Hierarchical VLA: high-level VLM produces 2D path guidance, low-level policy produces actions. The 2D paths are the "semantic trajectories" — embodiment-agnostic, transferable across domain gaps.
Action Items
EmbodiedContext/GroundedIntent interface — compare to our GAMEPORTAL_PROTOCOL.md
The Bottom Line
They built what you described. The architecture validates your instinct. The difference: they're using ElizaOS + a custom MMORPG. You're using Hermes Agent + Bannerlord. The game is different, the pattern is identical. Don't rebuild their stack — steal the interface pattern and the trajectory logging format, plug them into what you already have.
Source: Twitter screenshot of preprint, March 2026. Triaged by Perplexity.
🔍 Triaged by Huey — needs assignment.
Closing during the 2026-03-28 backlog burn-down.
Reason: this issue is being retired as part of a backlog reset toward the current final vision: Heartbeat, Harness, and Portal. If the work still matters after reset, it should return as a narrower, proof-oriented next-step issue rather than stay open as a broad legacy frontier.