the-nexus/docs/computer-use.md
Alexander Whitestone 220f20c794
feat: add desktop automation primitives to Hermes (#1125)
Implements Phase 1 and Phase 2 tooling from issue #1125:

- nexus/computer_use.py: four Hermes tools with poka-yoke safety
    * computer_screenshot() — capture & base64-encode desktop snapshot
    * computer_click(x, y, button, confirm) — right/middle require confirm=True
    * computer_type(text, confirm) — sensitive keywords blocked without confirm=True;
      text value is never written to audit log
    * computer_scroll(x, y, amount) — scroll wheel
    * read_action_log() — inspect recent JSONL audit entries
    * pyautogui.FAILSAFE=True; all tools degrade gracefully when headless

- nexus/computer_use_demo.py: Phase 1 demo (baseline screenshot →
  open browser → navigate to Gitea forge → evidence screenshot)

- tests/test_computer_use.py: 32 unit tests, fully headless
  (pyautogui mocked), all passing

- docs/computer-use.md: API reference, safety table, phase roadmap,
  pilot recipes

- docker-compose.desktop.yml: sandboxed Xvfb + noVNC container

Fixes #1125

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 05:45:27 -04:00

Computer Use — Desktop Automation Primitives for Hermes

Issue: #1125

Overview

nexus/computer_use.py adds desktop automation primitives to the Hermes fleet. Agents can take screenshots, click, type, and scroll — enough to drive a browser, validate a UI, or diagnose a failed workflow page visually.

All actions are logged to a JSONL audit trail at ~/.nexus/computer_use_actions.jsonl.
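
Each line of the log is one self-contained JSON object, so writers only ever append. The exact entry schema lives in nexus/computer_use.py. The sketch below is a minimal example that assumes each entry carries the ts, action and result fields read by the read_action_log example further down, plus a hypothetical params field. The helper name append_action_log is also hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any

# Default location taken from this document; override via log_path.
DEFAULT_LOG = Path.home() / ".nexus" / "computer_use_actions.jsonl"


def append_action_log(action: str, params: dict[str, Any], result: dict[str, Any],
                      log_path: Path = DEFAULT_LOG) -> dict[str, Any]:
    """Append one audit entry as a single JSON line (hypothetical helper)."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),  # ISO-8601 UTC timestamp
        "action": action,
        "params": params,  # callers must strip secrets (e.g. typed text) first
        "result": result,
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode: existing entries are never rewritten.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry
```

Appending one line per action keeps the trail readable even if a run crashes partway through. At worst, the final line is truncated.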


Quick Start

Local (requires a real display or Xvfb)

# Install dependencies
pip install pyautogui Pillow

# Run the Phase 1 demo
python -m nexus.computer_use_demo

Sandboxed (Docker + Xvfb + noVNC)

docker compose -f docker-compose.desktop.yml up -d
# Visit http://localhost:6080 in your browser to see the virtual desktop

docker compose -f docker-compose.desktop.yml run hermes-desktop \
    python -m nexus.computer_use_demo

docker compose -f docker-compose.desktop.yml down

API Reference

computer_screenshot(save_path=None, log_path=...)

Capture the current desktop.

Parameters:

  • save_path (str | None): Path to save the PNG to. If None, the image is returned as a base64 string in image_b64.
  • log_path (Path): Audit log file.

Returns dict:

{
  "ok": true,
  "image_b64": "<base64 PNG or null>",
  "saved_to": "<path or null>",
  "error": null
}
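
When save_path is None, the caller gets the PNG back inline as base64. The sketch below is a minimal example that assumes the result dict has the shape shown above. It decodes the image and checks the standard 8-byte PNG signature before the bytes are passed to an image library or written to disk. decode_screenshot is a hypothetical helper, not part of nexus.computer_use.

```python
import base64
import binascii
from typing import Any, Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def decode_screenshot(result: dict[str, Any]) -> Optional[bytes]:
    """Return raw PNG bytes from a computer_screenshot() result, or None.

    Hypothetical helper: returns None when the capture failed, when the
    image was saved to disk instead of returned inline, or when the
    payload is not a valid base64-encoded PNG.
    """
    if not result.get("ok") or not result.get("image_b64"):
        return None
    try:
        data = base64.b64decode(result["image_b64"], validate=True)
    except binascii.Error:
        return None  # not valid base64
    return data if data.startswith(PNG_SIGNATURE) else None
```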

computer_click(x, y, button="left", confirm=False, log_path=...)

Click the mouse at screen coordinates.

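Parameters:

  • x (int): Horizontal coordinate, in pixels from the left edge of the screen
  • y (int): Vertical coordinate, in pixels from the top edge of the screen
  • button (str): "left", "right", or "middle"
  • confirm (bool): Must be True when button is "right" or "middle" (poka-yoke gate)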

Returns dict:

{"ok": true, "error": null}
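
The click confirmation rule is a check that can run before any mouse event is sent. The sketch below is a minimal example of that gate. It assumes that a blocked call returns ok=False with an explanatory error, the same shape the tools use for headless failures. check_click and its error strings are hypothetical; the real logic lives in nexus/computer_use.py.

```python
from typing import Any

# Buttons that need explicit confirmation, per this document's safety model.
CONFIRM_REQUIRED_BUTTONS = {"right", "middle"}
VALID_BUTTONS = {"left", "right", "middle"}


def check_click(button: str, confirm: bool) -> dict[str, Any]:
    """Poka-yoke gate evaluated before pyautogui is called (hypothetical)."""
    if button not in VALID_BUTTONS:
        return {"ok": False, "error": f"unknown button: {button!r}"}
    if button in CONFIRM_REQUIRED_BUTTONS and not confirm:
        return {"ok": False, "error": f"{button} click requires confirm=True"}
    return {"ok": True, "error": None}  # safe to dispatch the click
```

Returning a result dict instead of raising means an agent can read the refusal and retry with confirm=True deliberately.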

computer_type(text, confirm=False, interval=0.02, log_path=...)

Type text using the keyboard.

Parameters:

  • text (str): Text to type
  • confirm (bool): Must be True when text contains a sensitive keyword
  • interval (float): Delay between keystrokes, in seconds

Sensitive keywords (require confirm=True): password, passwd, secret, token, api_key, apikey, key, auth

Note: the actual text value is never written to the audit log — only its length and whether it was flagged as sensitive.

Returns dict:

{"ok": true, "error": null}
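
The sketch below combines the keyword gate and the "never log the text" rule. It is minimal and rests on an assumption: this document lists the keywords but not the matching rule, so the case-insensitive substring match used here is a guess. is_sensitive, check_type and type_log_params are hypothetical helpers.

```python
from typing import Any

# Keyword list copied from this document; the matching rule is an assumption.
SENSITIVE_KEYWORDS = ("password", "passwd", "secret", "token",
                      "api_key", "apikey", "key", "auth")


def is_sensitive(text: str) -> bool:
    """Case-insensitive substring match against the keyword list (assumed)."""
    lowered = text.lower()
    return any(kw in lowered for kw in SENSITIVE_KEYWORDS)


def check_type(text: str, confirm: bool) -> dict[str, Any]:
    """Refuse sensitive text unless confirm=True (hypothetical gate)."""
    if is_sensitive(text) and not confirm:
        return {"ok": False, "error": "text looks sensitive; pass confirm=True"}
    return {"ok": True, "error": None}


def type_log_params(text: str) -> dict[str, Any]:
    """Audit-log parameters: the length and the flag, never the text itself."""
    return {"length": len(text), "sensitive": is_sensitive(text)}
```

Under this substring rule, short keywords such as key and auth make the gate conservative. Words like "monkey" or "author" also trip it, which costs a confirm=True but never leaks a secret.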

computer_scroll(x, y, amount=3, log_path=...)

Scroll the mouse wheel at screen coordinates.

Parameters:

  • x (int): Horizontal coordinate
  • y (int): Vertical coordinate
  • amount (int): Scroll units. Positive scrolls up, negative scrolls down.

Returns dict:

{"ok": true, "error": null}

read_action_log(n=20, log_path=...)

Return the most recent n audit log entries, newest first.

from nexus.computer_use import read_action_log

for entry in read_action_log(n=5):
    print(entry["ts"], entry["action"], entry["result"]["ok"])
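
Because the log is append-only, the newest entries sit at the end of the file. The sketch below is a minimal newest-first reader. It assumes one JSON object per line and that corrupt lines should be skipped rather than raised. read_last_entries is a hypothetical name, not the shipped read_action_log.

```python
import json
from collections import deque
from pathlib import Path
from typing import Any


def read_last_entries(log_path: Path, n: int = 20) -> list[dict[str, Any]]:
    """Return the last n JSONL entries, newest first (hypothetical sketch)."""
    if n <= 0 or not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as fh:
        # deque(maxlen=n) streams the file but keeps only the final n lines.
        tail = deque((line for line in fh if line.strip()), maxlen=n)
    entries = []
    for line in reversed(tail):  # newest entry first
        try:
            entries.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # skip a partially written or corrupt line
    return entries
```

Using a bounded deque keeps memory proportional to n rather than to the size of the log, which grows without limit over a long-running fleet.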

Safety Model

Each action and its safety gate:

  • computer_click(button="right"): requires confirm=True
  • computer_click(button="middle"): requires confirm=True
  • computer_type with sensitive text: requires confirm=True
  • Mouse moved to the top-left corner: pyautogui FAILSAFE raises pyautogui.FailSafeException and halts automation
  • All actions: written to the JSONL audit log with a timestamp
  • Headless environment: all tools degrade gracefully, returning ok=False with an error message
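
The headless rule means a missing display surfaces as data rather than as an exception. The sketch below is a minimal decorator that provides this behaviour, assuming the tools return the {"ok": ..., "error": ...} shape documented above. degrade_gracefully is hypothetical. It also lets the caller name exception types to re-raise, since blindly swallowing pyautogui.FailSafeException would defeat the kill switch.

```python
import functools
from typing import Any, Callable

Tool = Callable[..., dict[str, Any]]


def degrade_gracefully(*reraise: type[BaseException]) -> Callable[[Tool], Tool]:
    """Convert tool exceptions into {"ok": False, "error": ...} (hypothetical).

    Exception types listed in `reraise` still propagate, e.g. pass
    pyautogui.FailSafeException so the corner kill switch stops the run.
    """
    def decorator(fn: Tool) -> Tool:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> dict[str, Any]:
            try:
                return fn(*args, **kwargs)
            except reraise:  # an empty tuple matches nothing
                raise
            except Exception as exc:  # e.g. no X display on a headless host
                return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
        return wrapper
    return decorator
```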

Phase Roadmap

Phase 1 — Environment & Primitives

  • Sandboxed desktop via Xvfb + noVNC (docker-compose.desktop.yml)
  • computer_screenshot, computer_click, computer_type, computer_scroll
  • Poka-yoke confirmation gates on right/middle clicks and sensitive text input
  • JSONL audit log for all actions
  • Demo: baseline screenshot → open browser → navigate to Gitea → evidence screenshot
  • 32 unit tests, fully headless (pyautogui mocked)

Phase 2 — Tool Integration (planned)

  • Register tools in the Hermes tool registry
  • LLM-based planner loop using screenshots as context
  • Destructive action confirmation UI

Phase 3 — Use-Case Pilots (planned)

  • Pilot 1: Automated visual regression test for fleet dashboard
  • Pilot 2: Screenshot-based diagnosis of failed CI workflow page

File Locations

  • nexus/computer_use.py: Core tool primitives
  • nexus/computer_use_demo.py: Phase 1 end-to-end demo
  • tests/test_computer_use.py: 32 unit tests
  • docker-compose.desktop.yml: Sandboxed desktop container
  • ~/.nexus/computer_use_actions.jsonl: Runtime audit log
  • ~/.nexus/computer_use_evidence/: Screenshot evidence (demo output)