GamePortal Protocol
A thin interface contract for how Timmy perceives and acts in game worlds. No adapter code. The implementation IS the MCP servers.
The Contract
Every game portal implements two operations:
capture_state() → GameState
execute_action(action) → ActionResult
That's it. Everything else is game-specific configuration.
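Although the protocol ships no adapter code, the two-operation contract can be written down as a type for readers who think in code. This is a minimal sketch, not part of the protocol: the `GamePortal` and `NullPortal` names are hypothetical, and the action shape (an "action" name plus "params") is an assumption borrowed from the ActionResult schema in this document.

```python
from typing import Any, Protocol, runtime_checkable

GameState = dict[str, Any]     # shape follows the GameState schema in this document
ActionResult = dict[str, Any]  # shape follows the ActionResult schema in this document


@runtime_checkable
class GamePortal(Protocol):
    """The two operations every portal exposes (illustrative type only)."""

    def capture_state(self) -> GameState: ...

    def execute_action(self, action: dict[str, Any]) -> ActionResult: ...


class NullPortal:
    """Hypothetical stand-in that satisfies the contract without touching a game."""

    def capture_state(self) -> GameState:
        return {"portal_id": "null", "visual": {}, "game_context": {}}

    def execute_action(self, action: dict[str, Any]) -> ActionResult:
        # Assumed action shape: {"action": <tool name>, "params": {...}}
        return {
            "success": True,
            "action": action["action"],
            "params": action.get("params", {}),
        }


portal = NullPortal()
result = portal.execute_action({"action": "press_key", "params": {"key": "space"}})
```

A structural `Protocol` fits here because it requires no shared base class: anything exposing the two methods already conforms.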
capture_state()
Returns a snapshot of what Timmy can see and know right now.
Composed from MCP tool calls:
| Data | MCP Server | Tool Call |
|---|---|---|
| Screenshot of game window | desktop-control | take_screenshot("game_window.png") |
| Screen dimensions | desktop-control | get_screen_size() |
| Mouse position | desktop-control | get_mouse_position() |
| Pixel at coordinate | desktop-control | pixel_color(x, y) |
| Current OS | desktop-control | get_os() |
| Recently played games | steam-info | steam-recently-played(user_id) |
| Game achievements | steam-info | steam-player-achievements(user_id, app_id) |
| Game stats | steam-info | steam-user-stats(user_id, app_id) |
| Live player count | steam-info | steam-current-players(app_id) |
| Game news | steam-info | steam-news(app_id) |
GameState schema:
{
  "portal_id": "bannerlord",
  "timestamp": "2026-03-25T19:30:00Z",
  "visual": {
    "screenshot_path": "/tmp/capture_001.png",
    "screen_size": [2560, 1440],
    "mouse_position": [800, 600]
  },
  "game_context": {
    "app_id": 261550,
    "playtime_hours": 142,
    "achievements_unlocked": 23,
    "achievements_total": 96,
    "current_players_online": 8421
  }
}
The heartbeat loop constructs GameState by calling the relevant MCP tools
and assembling the results. No intermediate format or adapter is needed —
the MCP responses ARE the state.
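The assembly step described above might look roughly like the following sketch. The `call_tool` callable is a hypothetical transport, the tool argument names ("filename", "app_id") are guesses, and the return shapes (two-element lists for size and position, an integer for player count) are assumptions. None of these are taken from the MCP servers' documentation. Only a subset of the game_context fields is filled in.

```python
from datetime import datetime, timezone
from typing import Any, Callable

# Hypothetical transport: call_tool(server, tool, arguments) returns the decoded result.
CallTool = Callable[[str, str, dict[str, Any]], Any]


def capture_state(portal_id: str, app_id: int, call_tool: CallTool,
                  screenshot_path: str = "/tmp/capture_001.png") -> dict[str, Any]:
    """Assemble a GameState dict from individual MCP tool results."""
    # Argument names below are assumptions, not the servers' documented API.
    call_tool("desktop-control", "take_screenshot", {"filename": screenshot_path})
    return {
        "portal_id": portal_id,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "visual": {
            "screenshot_path": screenshot_path,
            "screen_size": list(call_tool("desktop-control", "get_screen_size", {})),
            "mouse_position": list(call_tool("desktop-control", "get_mouse_position", {})),
        },
        "game_context": {
            "app_id": app_id,
            "current_players_online": call_tool(
                "steam-info", "steam-current-players", {"app_id": app_id}),
        },
    }


def fake_call_tool(server: str, tool: str, arguments: dict[str, Any]) -> Any:
    """Canned responses standing in for real MCP servers."""
    canned = {
        "take_screenshot": None,
        "get_screen_size": [2560, 1440],
        "get_mouse_position": [800, 600],
        "steam-current-players": 8421,
    }
    return canned[tool]


state = capture_state("bannerlord", 261550, fake_call_tool)
```

Passing the transport in as a parameter keeps the sketch runnable without a live desktop or Steam connection.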
execute_action(action)
Sends an input to the game through the desktop.
Composed from MCP tool calls:
| Action | MCP Server | Tool Call |
|---|---|---|
| Click at position | desktop-control | click(x, y) |
| Right-click | desktop-control | right_click(x, y) |
| Double-click | desktop-control | double_click(x, y) |
| Move mouse | desktop-control | move_to(x, y) |
| Drag | desktop-control | drag_to(x, y, duration) |
| Type text | desktop-control | type_text("text") |
| Press key | desktop-control | press_key("space") |
| Key combo | desktop-control | hotkey("ctrl shift s") |
| Scroll | desktop-control | scroll(amount) |
ActionResult schema:
{
  "success": true,
  "action": "press_key",
  "params": {"key": "space"},
  "timestamp": "2026-03-25T19:30:01Z"
}
Actions are direct MCP calls. The model decides what to do;
the heartbeat loop translates tool_calls into MCP tools/call requests.
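As an illustration of that translation, the sketch below wraps one tool call in an MCP `tools/call` request (JSON-RPC 2.0) and folds the response into an ActionResult. The helper names `to_mcp_request` and `to_action_result` are hypothetical, and treating a JSON-RPC `error` member or a result with `isError` set as failure is one reasonable reading, not necessarily how the heartbeat loop decides success.

```python
import itertools
from datetime import datetime, timezone
from typing import Any

_request_ids = itertools.count(1)  # JSON-RPC requests need a unique id


def to_mcp_request(tool_name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Wrap one model tool call in an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def to_action_result(tool_name: str, arguments: dict[str, Any],
                     response: dict[str, Any]) -> dict[str, Any]:
    """Build an ActionResult from the server's JSON-RPC response."""
    result = response.get("result", {})
    # Failure if the response carries a protocol error or the tool flags isError.
    ok = "error" not in response and not result.get("isError", False)
    return {
        "success": ok,
        "action": tool_name,
        "params": arguments,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


request = to_mcp_request("press_key", {"key": "space"})
ok_result = to_action_result(
    "press_key", {"key": "space"},
    {"jsonrpc": "2.0", "id": request["id"], "result": {"content": [], "isError": False}},
)
failed_result = to_action_result(
    "press_key", {"key": "space"},
    {"jsonrpc": "2.0", "id": request["id"], "error": {"code": -32602, "message": "bad params"}},
)
```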
Adding a New Portal
A portal is a game configuration. To add one:
- Add an entry to portals.json:
  {
    "id": "new-game",
    "name": "New Game",
    "description": "What this portal is.",
    "status": "offline",
    "app_id": 12345,
    "window_title": "New Game Window Title",
    "destination": {
      "type": "harness",
      "params": { "world": "new-world" }
    }
  }
- No code changes. The heartbeat loop reads portals.json, uses app_id for Steam API calls and window_title for screenshot targeting. The MCP tools are game-agnostic.
- Game-specific prompts go in training/data/prompts_*.yaml to teach the model what the game looks like and how to play it.
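To make the "no code changes" claim concrete, here is a minimal sketch of how a loop could look up a portal's app_id and window_title purely from configuration. It assumes portals.json holds a top-level JSON array of entries, which the document does not state. The `parse_portals` helper and the choice of required keys are likewise hypothetical.

```python
import json
from typing import Any

# Keys the heartbeat loop is described as using; treating them as required is an assumption.
REQUIRED_KEYS = ("id", "app_id", "window_title")


def parse_portals(text: str) -> dict[str, dict[str, Any]]:
    """Parse portals.json text into a dict keyed by portal id."""
    entries = json.loads(text)  # assumes a top-level JSON array of portal entries
    portals: dict[str, dict[str, Any]] = {}
    for entry in entries:
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError(f"portal entry missing keys: {missing}")
        portals[entry["id"]] = entry
    return portals


example = """[
  {"id": "new-game", "name": "New Game", "status": "offline",
   "app_id": 12345, "window_title": "New Game Window Title"}
]"""

portal = parse_portals(example)["new-game"]
steam_app_id = portal["app_id"]          # used for Steam API calls
window_title = portal["window_title"]    # used for screenshot targeting
```

Failing fast on a missing key surfaces a malformed entry when the file is loaded, not halfway through a play session.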
Portal: Bannerlord (Primary)
Steam App ID: 261550
Window title: Mount & Blade II: Bannerlord
Mod required: BannerlordTogether (multiplayer, ticket #549)
capture_state additions:
- Screenshot shows campaign map or battle view
- Steam stats include: battles won, settlements owned, troops recruited
- Achievement data shows campaign progress
Key actions:
- Campaign map: click settlements, right-click to move army
- Battle: click units to select, right-click to command
- Menus: press keys for inventory (I), character (C), party (P)
- Save/load: hotkey("ctrl s"), hotkey("ctrl l")
Training data needed:
- Screenshots of campaign map with annotations
- Screenshots of battle view with unit positions
- Decision examples: "I see my army near Vlandia. I should move toward the objective."
Portal: Morrowind (Secondary)
Steam App ID: 22320 (The Elder Scrolls III: Morrowind GOTY)
Window title: OpenMW (if using OpenMW) or Morrowind
Multiplayer: TES3MP (OpenMW fork with multiplayer)
capture_state additions:
- Screenshot shows first-person exploration or dialogue
- Stats include: playtime, achievements (limited on Steam for old games)
- OpenMW may expose additional data through log files
Key actions:
- Movement: WASD + mouse look
- Interact: click / press space on objects and NPCs
- Combat: click to attack, right-click to block
- Inventory: press Tab
- Journal: press J
- Rest: press T
Training data needed:
- Screenshots of Vvardenfell landscapes, towns, interiors
- Dialogue trees with NPC responses
- Navigation examples: "I see Balmora ahead. I should follow the road north."
What This Protocol Does NOT Do
- No game memory extraction. We read what's on screen, not in RAM.
- No mod APIs. We click and type, like a human at a keyboard.
- No custom adapters per game. Same MCP tools for every game.
- No network protocol. Local desktop control only.
The model learns to play by looking at screenshots and pressing keys. The same way a human learns. The protocol is just "look" and "act."
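The look-then-act cycle can be sketched as a loop over the two operations. Everything here is illustrative: `run_heartbeat`, the `decide` callback standing in for the model, and the stub functions are hypothetical names, and stopping when `decide` returns `None` is an assumed convention.

```python
from typing import Any, Callable, Optional

State = dict[str, Any]
Action = dict[str, Any]


def run_heartbeat(capture_state: Callable[[], State],
                  decide: Callable[[State], Optional[Action]],
                  execute_action: Callable[[Action], dict[str, Any]],
                  max_ticks: int) -> list[dict[str, Any]]:
    """Alternate 'look' and 'act' until decide() returns None or max_ticks is reached."""
    results = []
    for _ in range(max_ticks):
        state = capture_state()        # look
        action = decide(state)         # the model picks an action, or None to stop
        if action is None:
            break
        results.append(execute_action(action))  # act
    return results


# Stubs standing in for the MCP-backed operations and the model.
_ticks = iter(range(10))


def demo_capture() -> State:
    return {"tick": next(_ticks)}


def demo_decide(state: State) -> Optional[Action]:
    if state["tick"] < 3:
        return {"action": "press_key", "params": {"key": "space"}}
    return None


def demo_execute(action: Action) -> dict[str, Any]:
    return {"success": True, **action}


results = run_heartbeat(demo_capture, demo_decide, demo_execute, max_ticks=10)
```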
Mapping to the Three Pillars
| Pillar | How GamePortal serves it |
|---|---|
| Heartbeat | capture_state feeds the perception step. execute_action IS the action step. |
| Harness | The DPO model is trained on (screenshot, decision, action) trajectories from portal play. |
| Portal Interface | This protocol IS the portal interface. |