[RESEARCH] RPG2Robot — steal this architecture for Bannerlord → robotics transfer #653

Closed
opened 2026-03-27 16:02:07 +00:00 by perplexity · 34 comments
Member

Paper

"LLM Agents Can Learn from RPGs How to Pilot Robots: Hierarchical Language-Conditioned Control via Semantic Trajectory Transfer from Virtual Worlds"
— Hyperscape Research, March 2026

Full PDF not public yet (shared on Twitter as preprint). Below is everything extractable from the abstract + Figure 1 + Introduction, plus related work that IS available.

Repos:

  • https://github.com/hyperscapeai/hyperscape (the MMORPG — ElizaOS-powered, TypeScript/C++)
  • https://github.com/elizaos (the agent framework)

Their Architecture (Figure 1)

LLM Agent (ElizaOS)  →  Grounding Layer  →  RL Skills (PPO)
      ↕ context              ↑ semantic          ↓ actions
RPG World (Hyperscape)       trajectories     MuJoCo Simulation
                                                  ↓ transfer
                                              Real Robot (AiNex)

Key insight: "Semantic trajectory transfer" — not pixel data, not game state dumps. The LLM agent's high-level decisions (dialogue, intent formation, subgoal decomposition, entity grounding, failure recovery) are captured as structured trajectories. These train the planning layer.

World-agnostic interface: EmbodiedContext / GroundedIntent abstraction makes the planner unable to distinguish whether it's in a game, a simulator, or the real world. Same interface everywhere.
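A minimal sketch of what that interface could look like in Python. The two class names come from the paper's abstract; every field below is an assumption, since the real schema is not public yet:

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical sketch of the paper's world-agnostic types. Field names
# are guesses; the published schema may differ entirely.

@dataclass
class EmbodiedContext:
    """What the agent perceives, whatever world it is in."""
    world: str                    # e.g. "hyperscape", "mujoco", "real"
    entities: dict[str, dict]     # entity id -> attributes
    agent_state: dict             # pose, inventory, health, ...
    observations: list[str]       # natural-language observations

@dataclass
class GroundedIntent:
    """What the agent wants to do, expressed over entity ids."""
    verb: str                     # e.g. "goto", "interact"
    target: str | None = None     # an entity id, never raw coordinates
    params: dict = field(default_factory=dict)

# The planner only ever sees these two types; the adapter behind them can
# be a game client, a simulator, or robot drivers.
ctx = EmbodiedContext(
    world="hyperscape",
    entities={"npc_merchant_01": {"kind": "merchant"}},
    agent_state={"pos": (0, 0)},
    observations=["A merchant stands by the gate."],
)
intent = GroundedIntent(verb="goto", target="npc_merchant_01")
```

The point of the abstraction: swap the `world` string and the adapter, and the planner's code path is unchanged.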

What's Directly Applicable to Timmy

1. Semantic Trajectory Logging (we almost have this)

They use a "dual-rate logging pipeline" to capture the agent's RPG experience as structured semantic trajectories. We already have:

  • Hermes Agent session logs (~/.hermes/sessions/)
  • session_export() Huey task extracting DPO pairs every 4 hours
  • Heartbeat tick logs

Gap: Our session logs capture raw conversation. We don't yet extract semantic trajectories — structured intent/subgoal/grounding/recovery events. This is a post-processing step on top of session logs.
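A sketch of what that post-processing step could look like. The tag convention ("INTENT:", "SUBGOAL:", ...) is an invented assumption — today our logs are untagged conversation, so the agent prompt would first have to emit these markers:

```python
import re

# Hypothetical post-processor: scan raw session-log turns for tagged
# semantic events. The tag names mirror the event types listed above
# (intent, subgoal, grounding, recovery); the tag syntax is assumed.
EVENT_TAGS = ("INTENT", "SUBGOAL", "GROUNDING", "RECOVERY", "OUTCOME")

def extract_events(turns):
    """Return structured trajectory events from assistant turns."""
    events = []
    for i, turn in enumerate(turns):
        if turn.get("role") != "assistant":
            continue
        for tag in EVENT_TAGS:
            for m in re.finditer(rf"{tag}:\s*(.+)", turn["content"]):
                events.append({"step": i, "type": tag.lower(),
                               "text": m.group(1).strip()})
    return events

turns = [
    {"role": "user", "content": "anything new on the tracker?"},
    {"role": "assistant",
     "content": "INTENT: check the Gitea backlog\nSUBGOAL: list open issues"},
]
trajectory = extract_events(turns)
# -> [{"step": 1, "type": "intent", "text": "check the Gitea backlog"},
#     {"step": 1, "type": "subgoal", "text": "list open issues"}]
```

This would slot into the existing session_export task as a second output stream alongside the DPO pairs.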

2. The Grounding Layer (this is the bridge)

Converts language-level plans into entity-referenced goals. In their system: "go to the merchant" → coordinates + entity ID + navigation goal. In ours: "check the Gitea backlog" → API endpoint + auth + parse response.

This is what the Forgejo MCP server (#616) would give us — a grounding layer between Timmy's intent and Gitea's entity space.

For Bannerlord: the AI Influence mod already exports theworld.txt — that's a grounding document. Timmy reads it, forms intent, the grounding layer converts to game actions.
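A sketch of the Gitea side of that grounding layer. The issues endpoint shape follows the public Gitea API; the table contents and owner/repo values are illustrative:

```python
# Hypothetical grounding table: map a language-level intent to a concrete
# entity-referenced API action. Endpoint path follows the public Gitea
# API (GET /api/v1/repos/{owner}/{repo}/issues); entries are illustrative.

GROUNDING_TABLE = {
    "check the Gitea backlog": {
        "method": "GET",
        "path": "/api/v1/repos/{owner}/{repo}/issues?state=open",
    },
}

def ground(intent: str, owner: str, repo: str) -> dict:
    """Resolve an intent string to a concrete API action, or fail loudly."""
    entry = GROUNDING_TABLE.get(intent)
    if entry is None:
        raise KeyError(f"no grounding for intent: {intent!r}")
    return {"method": entry["method"],
            "path": entry["path"].format(owner=owner, repo=repo)}

action = ground("check the Gitea backlog", "Timmy_Foundation", "the-nexus")
# -> {"method": "GET",
#     "path": "/api/v1/repos/Timmy_Foundation/the-nexus/issues?state=open"}
```

Failing loudly on an ungrounded intent matters: it is exactly the "failure recovery" event the trajectory logger should capture.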

3. The World-Agnostic Interface (our portal pattern)

Their EmbodiedContext / GroundedIntent is exactly the portal pattern from #19 and GAMEPORTAL_PROTOCOL.md:

  • capture_state() → EmbodiedContext (what the agent perceives)
  • execute_action() → GroundedIntent (what the agent wants to do)

The agent doesn't know if it's in Bannerlord, the Nexus, or a robot. Same interface. Different adapters.
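The portal pattern can be sketched as a two-method protocol with interchangeable adapters. Adapter names and payloads here are illustrative, not the real GAMEPORTAL_PROTOCOL.md types:

```python
from typing import Protocol

# Sketch of the portal pattern: one interface, many worlds. The planner
# codes against Portal and never learns which adapter is behind it.

class Portal(Protocol):
    def capture_state(self) -> dict: ...                 # EmbodiedContext-like
    def execute_action(self, intent: dict) -> bool: ...  # GroundedIntent-like

class BannerlordPortal:
    def capture_state(self) -> dict:
        # in reality: parse theworld.txt exported by the AI Influence mod
        return {"world": "bannerlord", "entities": {}}
    def execute_action(self, intent: dict) -> bool:
        return True  # would dispatch to the mod's action layer

class NexusPortal:
    def capture_state(self) -> dict:
        return {"world": "nexus", "entities": {}}
    def execute_action(self, intent: dict) -> bool:
        return True

def plan_step(portal: Portal) -> dict:
    """World-blind planner: same logic against any adapter."""
    ctx = portal.capture_state()
    return {"verb": "observe", "seen": len(ctx["entities"])}

for p in (BannerlordPortal(), NexusPortal()):
    plan_step(p)
```

A robot adapter would be a third class with the same two methods — no planner changes required.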

4. Separate RL for Motor Skills

They don't make the LLM do motor control. PPO-trained RL skills handle locomotion and manipulation in MuJoCo. The LLM only does planning and dialogue.

For us: Bannerlord's built-in combat AI already handles motor skills (swing, block, dodge). Timmy doesn't need to learn to sword fight — he needs to learn when to fight, who to fight, and why. The game engine handles the how.

Related Papers (Available Now)

FAEA — Demonstration-Free Robotic Control via LLM Agents (Jan 2026)

arxiv.org/pdf/2601.20334
Uses Claude Agent SDK (same iterative reasoning as software debugging) directly for robotic manipulation. 84.9-96% success rates WITHOUT demonstrations or fine-tuning. Key insight: "general-purpose agents are sufficient for a class of manipulation tasks dominated by deliberative, task-level planning."

MuJoCo Playground (Jan 2025)

playground.mujoco.org
Google DeepMind's open-source framework for robot learning. Zero-shot sim-to-real transfer. pip install playground and train policies in minutes on a single GPU. Supports humanoids, quadrupeds, dexterous hands.

HAMSTER — Hierarchical Action Models (Feb 2025)

arxiv.org/html/2502.05485v1
Hierarchical VLA: high-level VLM produces 2D path guidance, low-level policy produces actions. The 2D paths are the "semantic trajectories" — embodiment-agnostic, transferable across domain gaps.

Action Items

  • Watch for the full RPG2Robot PDF to drop (monitor Hyperscape Twitter/GitHub)
  • Study their EmbodiedContext / GroundedIntent interface — compare to our GAMEPORTAL_PROTOCOL.md
  • Add semantic trajectory extraction to the session_export Huey task (structured intent/subgoal/outcome, not just raw conversation pairs)
  • Read the FAEA paper — if Claude Agent SDK gets 85%+ on robot manipulation without training, what can Hermes 4 14B do with DPO training on game trajectories?
  • Evaluate MuJoCo Playground as the sim layer between Bannerlord and a future robot body

The Bottom Line

They built what you described. The architecture validates your instinct. The difference: they're using ElizaOS + a custom MMORPG. You're using Hermes Agent + Bannerlord. The game is different, the pattern is identical. Don't rebuild their stack — steal the interface pattern and the trajectory logging format, plug them into what you already have.


Source: Twitter screenshot of preprint, March 2026. Triaged by Perplexity.

perplexity added the needs-design, harness, portal, p1-important labels 2026-03-27 16:02:07 +00:00
Owner

Dispatched to claude. Huey task queued.
Owner

Dispatched to gemini. Huey task queued.
Owner

Dispatched to kimi. Huey task queued.
Owner

Dispatched to grok. Huey task queued.
Owner

Dispatched to perplexity. Huey task queued.
Owner

🔍 Triaged by Huey — needs assignment.
Member

🔧 gemini working on this via Huey. Branch: gemini/issue-653
Member

🔧 grok working on this via Huey. Branch: grok/issue-653
Member

⚠️ grok produced no changes for this issue. Skipping.
Timmy was assigned by Rockachopa 2026-03-28 03:54:22 +00:00
Owner

Closing as duplicate during backlog burn-down. Canonical issue: #651.

Reason: this workstream already exists with materially the same title/scope. Keeping one canonical thread prevents agent churn and review waste.

Timmy closed this issue 2026-03-28 04:45:33 +00:00
Reference: Timmy_Foundation/the-nexus#653