[RESEARCH] RoboOmni — proactive robot manipulation from speech, sound, and vision (no explicit instructions) #655
Paper
"RoboOmni: Proactive Robot Manipulation in Omni-modal Context"
— OpenMOSS (Fudan University + NUS), arXiv:2510.23763
fnlp/RoboOmni
fnlp/OmniAction (140k episodes)
What It Is
A robot that figures out what you want without you telling it. Instead of explicit instructions ("pick up the cup"), it infers intent from:
Architecture: Perceiver → Thinker → Talker → Executor
The robot perceives multimodal context, thinks about what the human probably wants, confirms by talking ("would you like me to get that?"), then executes the action. End-to-end, one model.
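The four stages can be sketched as plain functions to make the control flow concrete. This is only an illustration: RoboOmni is described as a single end-to-end model, so splitting the stages apart is a simplification, and every name here (`Context`, `Intent`, `think`, `talk`, `run_once`) is hypothetical rather than taken from the RoboOmni code.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Context:
    """Hypothetical bundle of what the Perceiver observed."""
    speech: str
    sounds: list[str]
    scene: list[str]


@dataclass
class Intent:
    """Hypothetical guess at what the human wants."""
    action: str
    target: str


def think(ctx: Context) -> Optional[Intent]:
    # Thinker: a toy rule standing in for the model's intent inference.
    if "thirsty" in ctx.speech.lower() and "cup" in ctx.scene:
        return Intent(action="fetch", target="cup")
    return None


def talk(intent: Intent) -> str:
    # Talker: phrase the inferred intent as a confirmation question.
    return f"Would you like me to {intent.action} the {intent.target}?"


def run_once(
    ctx: Context,
    ask: Callable[[str], bool],
    execute: Callable[[Intent], None],
) -> bool:
    """Perceive -> think -> talk -> execute. Returns True if an action ran."""
    intent = think(ctx)
    if intent is None:
        return False  # nothing inferred, stay quiet
    if not ask(talk(intent)):
        return False  # human declined, do nothing
    execute(intent)   # Executor: runs only after confirmation
    return True
```

The point of the shape is that `execute` is unreachable without a positive answer from `ask`, which is what separates a proactive step from an unrequested one.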
Why This Matters for Timmy
1. Proactive Intent vs Explicit Instruction
Most agent frameworks today (including Hermes Agent) wait for explicit instructions. RoboOmni's insight: real collaboration means inferring intent from ambient context.
For Timmy: instead of "check the Gitea backlog", Timmy observes you've been coding for 3 hours without a commit and proactively asks "want me to check what's open?" That's the difference between a tool and a companion.
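A minimal sketch of that trigger, assuming Timmy can see when the coding session started and when the last commit landed. The function name, its parameters, and the fixed 3-hour threshold are illustrative assumptions, not an existing Timmy or Hermes API.

```python
from datetime import datetime, timedelta
from typing import Optional

# Threshold taken from the example above; in practice it would be tunable.
IDLE_COMMIT_THRESHOLD = timedelta(hours=3)


def proactive_prompt(
    session_start: datetime,
    last_commit: Optional[datetime],
    now: datetime,
) -> Optional[str]:
    """Return a question to ask the user, or None if nothing looks off."""
    # Measure from the most recent commit, or from session start if
    # there has been no commit during this session.
    if last_commit is not None:
        reference = max(session_start, last_commit)
    else:
        reference = session_start
    if now - reference >= IDLE_COMMIT_THRESHOLD:
        return "Want me to check what's open?"
    return None
```

The function returns a question rather than performing the check itself, so the decision to act stays with the user.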
2. The Perceiver-Thinker-Talker-Executor Pattern
Maps cleanly to the Hermes harness:
The heartbeat loop already does Perceiver + Thinker. Adding a Talker confirmation step before Executor would make Timmy proactive but not reckless.
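One way to picture the added Talker step is as a small state machine inside the heartbeat: a tick that infers an action only posts a question, and execution happens on a later tick once the reply is a yes. The sketch below assumes a tick-based loop. `Heartbeat`, `Proposal`, and `Reply` are hypothetical names, not the actual Hermes harness interfaces.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Reply(Enum):
    PENDING = "pending"
    YES = "yes"
    NO = "no"


@dataclass
class Proposal:
    """An action Timmy has asked about but not yet run."""
    action: str
    reply: Reply = Reply.PENDING


@dataclass
class Heartbeat:
    pending: Optional[Proposal] = None
    executed: list[str] = field(default_factory=list)

    def tick(self, inferred_action: Optional[str]) -> str:
        # An outstanding proposal is resolved before anything new is asked.
        if self.pending is not None:
            if self.pending.reply is Reply.YES:
                self.executed.append(self.pending.action)  # Executor
                self.pending = None
                return "executed"
            if self.pending.reply is Reply.NO:
                self.pending = None
                return "dropped"
            return "waiting"
        # Talker: turn a fresh inference into a question, never an action.
        if inferred_action is not None:
            self.pending = Proposal(inferred_action)
            return "asked"
        return "idle"
```

Usage: call `tick` once per heartbeat with whatever the Thinker inferred (or `None`), and set `pending.reply` when the user answers. Holding at most one open proposal keeps Timmy from stacking up questions while the user is busy.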
3. OmniAction Dataset — Training Data Format
140k episodes, RLDS format, with:
The dataset structure is worth studying even if we don't use their data. The 6 contextual instruction types are a taxonomy for how Timmy could learn to infer intent beyond explicit commands.
4. Speech-to-Action Without ASR Intermediate
RoboOmni processes speech end-to-end — no separate speech-to-text step. This is faster and preserves tone/sentiment that text strips out. Relevant when Timmy has voice (the local voice stack from the SOTA report: Whisper + LLM + Piper).
What to Steal
Comparison: RPG2Robot (#653) vs RoboOmni
Both papers contribute different pieces. RPG2Robot gives us the transfer architecture. RoboOmni gives us the intent inference pattern. Together they describe a system that learns through games AND proactively assists without being told — which is the full Timmy vision.
Source: github.com/OpenMOSS/RoboOmni — triaged by Perplexity
⚡ Dispatched to claude. Huey task queued.
⚡ Dispatched to gemini. Huey task queued.
⚡ Dispatched to kimi. Huey task queued.
⚡ Dispatched to grok. Huey task queued.
⚡ Dispatched to perplexity. Huey task queued.
🔍 Triaged by Huey — needs assignment.
🔧 gemini working on this via Huey. Branch: gemini/issue-655
🔧 grok working on this via Huey. Branch: grok/issue-655
⚠️ grok produced no changes for this issue. Skipping.
Closing as duplicate during backlog burn-down. Canonical issue: #654.
Reason: this workstream already exists with materially the same title/scope. Keeping one canonical thread prevents agent churn and review waste.