Timmy_Foundation/the-nexus

Alexander Whitestone 220f20c794

feat: add desktop automation primitives to Hermes (#1125)

Implements Phase 1 and Phase 2 tooling from issue #1125:

- nexus/computer_use.py: four Hermes tools plus an audit-log reader, with poka-yoke safety
    * computer_screenshot() — capture & base64-encode desktop snapshot
    * computer_click(x, y, button, confirm) — right/middle require confirm=True
    * computer_type(text, confirm) — sensitive keywords blocked without confirm=True;
      text value is never written to audit log
    * computer_scroll(x, y, amount) — scroll wheel
    * read_action_log() — inspect recent JSONL audit entries
    * pyautogui.FAILSAFE=True; all tools degrade gracefully when headless

- nexus/computer_use_demo.py: Phase 1 demo (baseline screenshot →
  open browser → navigate to Gitea forge → evidence screenshot)

- tests/test_computer_use.py: 32 unit tests, fully headless
  (pyautogui mocked), all passing

- docs/computer-use.md: API reference, safety table, phase roadmap,
  pilot recipes

- docker-compose.desktop.yml: sandboxed Xvfb + noVNC container

Fixes #1125

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-10 05:45:27 -04:00

Computer Use — Desktop Automation Primitives for Hermes

Issue: #1125

Overview

nexus/computer_use.py adds desktop automation primitives to the Hermes fleet. Agents can take screenshots, click, type, and scroll, which is enough to drive a browser, validate a UI, or visually diagnose a failed workflow page.

All actions are logged to a JSONL audit trail at ~/.nexus/computer_use_actions.jsonl.
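
To make the JSONL format concrete, here is a minimal sketch of appending one JSON object per line to an audit file. The helper name `append_audit_entry` and the `params` field are assumptions for illustration. Only the `ts`, `action`, and `result` fields appear in this document, and the real module may structure its entries differently.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_audit_entry(log_path: Path, action: str, params: dict, result: dict) -> dict:
    """Append one audit record as a single JSON line (hypothetical helper)."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "params": params,  # assumed field, not confirmed by this document
        "result": result,
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


# Demo against a throwaway file rather than ~/.nexus/computer_use_actions.jsonl
demo_log = Path(tempfile.mkdtemp()) / "computer_use_actions.jsonl"
append_audit_entry(demo_log, "click", {"x": 10, "y": 20, "button": "left"},
                   {"ok": True, "error": None})
append_audit_entry(demo_log, "scroll", {"x": 10, "y": 20, "amount": 3},
                   {"ok": True, "error": None})
```

Because each record occupies exactly one line, the file can be appended to without rewriting earlier entries and can be inspected with ordinary line-oriented tools.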

Quick Start

Local (requires a real display or Xvfb)

# Install dependencies
pip install pyautogui Pillow

# Run the Phase 1 demo
python -m nexus.computer_use_demo

Sandboxed (Docker + Xvfb + noVNC)

docker compose -f docker-compose.desktop.yml up -d
# Visit http://localhost:6080 in your browser to see the virtual desktop

docker compose -f docker-compose.desktop.yml run hermes-desktop \
    python -m nexus.computer_use_demo

docker compose -f docker-compose.desktop.yml down

API Reference

`computer_screenshot(save_path=None, log_path=...)`

Capture the current desktop.

| Param | Type | Description |
| --- | --- | --- |
| `save_path` | `str \| None` | Path to save PNG. If `None`, returns base64 string. |
| `log_path` | `Path` | Audit log file. |

Returns dict:

{
  "ok": true,
  "image_b64": "<base64 PNG or null>",
  "saved_to": "<path or null>",
  "error": null
}
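
A caller that receives the image inline will usually want to decode it. The sketch below assumes only the result shape shown above. The helper `write_screenshot` and the stand-in result dict are hypothetical, used so the example runs without a display.

```python
import base64
import tempfile
from pathlib import Path


def write_screenshot(result: dict, out_path: Path) -> Path:
    """Decode the base64 PNG from a screenshot result dict and write it to disk."""
    if not result.get("ok"):
        raise RuntimeError(f"screenshot failed: {result.get('error')}")
    if result.get("image_b64") is None:
        raise ValueError("no inline image; check 'saved_to' instead")
    out_path.write_bytes(base64.b64decode(result["image_b64"]))
    return out_path


# Stand-in result; a real one would come from computer_screenshot().
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
fake_result = {
    "ok": True,
    "image_b64": base64.b64encode(PNG_MAGIC + b"demo").decode("ascii"),
    "saved_to": None,
    "error": None,
}
written = write_screenshot(fake_result, Path(tempfile.mkdtemp()) / "shot.png")
```

Checking `ok` before touching `image_b64` keeps the headless failure path explicit instead of surfacing as a decoding error.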

`computer_click(x, y, button="left", confirm=False, log_path=...)`

Click the mouse at screen coordinates.

| Param | Type | Description |
| --- | --- | --- |
| `x` | `int` | Horizontal coordinate |
| `y` | `int` | Vertical coordinate |
| `button` | `str` | `"left"` \| `"right"` \| `"middle"` |
| `confirm` | `bool` | Required `True` for `right` / `middle` (poka-yoke) |

Returns dict:

{"ok": true, "error": null}
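
The confirmation rule for `button` can be pictured as a small pre-flight check. This is a sketch of the gate described in the table, not the module's actual code. The function name `check_click`, the error strings, and the handling of unknown button names are assumptions.

```python
ALLOWED_BUTTONS = {"left", "right", "middle"}
CONFIRM_REQUIRED = {"right", "middle"}


def check_click(button: str = "left", confirm: bool = False) -> dict:
    """Return an ok/error dict describing whether a click would be allowed."""
    if button not in ALLOWED_BUTTONS:
        # Assumed behavior: reject anything outside the documented set.
        return {"ok": False, "error": f"unknown button: {button!r}"}
    if button in CONFIRM_REQUIRED and not confirm:
        return {"ok": False, "error": f"{button} click requires confirm=True"}
    return {"ok": True, "error": None}


left_result = check_click("left")
blocked_result = check_click("right")
confirmed_result = check_click("right", confirm=True)
```

The gate runs before any mouse movement, so a rejected call has no side effects on the desktop.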

`computer_type(text, confirm=False, interval=0.02, log_path=...)`

Type text using the keyboard.

| Param | Type | Description |
| --- | --- | --- |
| `text` | `str` | Text to type |
| `confirm` | `bool` | Required `True` when text contains a sensitive keyword |
| `interval` | `float` | Delay between keystrokes (seconds) |

Sensitive keywords (require confirm=True): password, passwd, secret, token, api_key, apikey, key, auth

Note: the actual text value is never written to the audit log — only its length and whether it was flagged as sensitive.

Returns dict:

{"ok": true, "error": null}
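
One way to picture the keyword check and the redacted audit record is sketched below. The keyword list is taken from this document. Case-insensitive substring matching, the helper names, and the record field names `length` and `sensitive` are assumptions, and the real module may match differently.

```python
SENSITIVE_KEYWORDS = (
    "password", "passwd", "secret", "token",
    "api_key", "apikey", "key", "auth",
)


def is_sensitive(text: str) -> bool:
    """Flag text containing any keyword (assumed: case-insensitive substring match)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in SENSITIVE_KEYWORDS)


def redacted_type_record(text: str) -> dict:
    """Build an audit payload that omits the typed text itself."""
    return {"length": len(text), "sensitive": is_sensitive(text)}


plain = redacted_type_record("hello world")
flagged = redacted_type_record("export API_KEY=abc123")
```

Under this assumed substring matching, short keywords such as `key` and `auth` also flag harmless words like "monkey" or "author". Those false positives only cost an extra `confirm=True`, which is the safer direction to err in.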

`computer_scroll(x, y, amount=3, log_path=...)`

Scroll the mouse wheel at screen coordinates.

| Param | Type | Description |
| --- | --- | --- |
| `x` | `int` | Horizontal coordinate |
| `y` | `int` | Vertical coordinate |
| `amount` | `int` | Scroll units. Positive = up, negative = down. |

Returns dict:

{"ok": true, "error": null}

`read_action_log(n=20, log_path=...)`

Return the most recent n audit log entries, newest first.

from nexus.computer_use import read_action_log

for entry in read_action_log(n=5):
    print(entry["ts"], entry["action"], entry["result"]["ok"])
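
For readers curious how a "most recent n, newest first" read over a JSONL file might work, here is a minimal sketch. The helper name `tail_jsonl`, the handling of a missing file, and the skipping of malformed lines are assumptions, not documented behavior of `read_action_log`.

```python
import json
import tempfile
from pathlib import Path


def tail_jsonl(log_path: Path, n: int = 20) -> list:
    """Return up to n parsed entries from a JSONL file, newest first."""
    if n <= 0 or not log_path.exists():
        return []
    entries = []
    with log_path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                entries.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # assumed: tolerate a partially written line
    # The file is append-only, so the last lines are the newest.
    return entries[-n:][::-1]


demo_log = Path(tempfile.mkdtemp()) / "actions.jsonl"
demo_log.write_text(
    "\n".join(json.dumps({"action": name}) for name in ("screenshot", "click", "type")) + "\n",
    encoding="utf-8",
)
recent = [entry["action"] for entry in tail_jsonl(demo_log, n=2)]
```

This version reads the whole file, which is fine for small logs. A long-lived log would call for reading from the end of the file instead.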

Safety Model

| Action | Safety gate |
| --- | --- |
| `computer_click(button="right")` | Requires `confirm=True` |
| `computer_click(button="middle")` | Requires `confirm=True` |
| `computer_type` with sensitive text | Requires `confirm=True` |
| Mouse to top-left corner | pyautogui FAILSAFE aborts immediately |
| All actions | Written to JSONL audit log with timestamp |
| Headless environment | All tools degrade gracefully and return `ok=False` with an error message |
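
The graceful-degradation row can be illustrated with a wrapper that converts exceptions into the `ok` / `error` result shape used throughout this API. This is only a sketch of the idea. The decorator name `degrade_gracefully`, the error string format, and the two stand-in functions are hypothetical, and the module itself may detect a missing display in a different way.

```python
import functools


def degrade_gracefully(fn):
    """Wrap a tool so failures come back as {'ok': False, 'error': ...}."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    return wrapper


@degrade_gracefully
def fake_headless_click(x: int, y: int) -> dict:
    # Stand-in for a tool running where no display is available.
    raise OSError("no display available")


@degrade_gracefully
def fake_working_click(x: int, y: int) -> dict:
    return {"ok": True, "error": None}


headless_result = fake_headless_click(10, 20)
working_result = fake_working_click(10, 20)
```

A blanket `except Exception` like this would also catch pyautogui's `FailSafeException`, so an implementation that wants the fail-safe to halt a whole run might re-raise that exception rather than fold it into a result dict.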

Phase Roadmap

Phase 1 — Environment & Primitives ✅

- Sandboxed desktop via Xvfb + noVNC (docker-compose.desktop.yml)
- computer_screenshot, computer_click, computer_type, computer_scroll
- Poka-yoke safety checks on all destructive actions
- JSONL audit log for all actions
- Demo: baseline screenshot → open browser → navigate to Gitea → evidence screenshot
- 32 unit tests, fully headless (pyautogui mocked)

Phase 2 — Tool Integration (planned)

- Register tools in the Hermes tool registry
- LLM-based planner loop using screenshots as context
- Destructive action confirmation UI

Phase 3 — Use-Case Pilots (planned)

- Pilot 1: Automated visual regression test for fleet dashboard
- Pilot 2: Screenshot-based diagnosis of failed CI workflow page

File Locations

| File | Purpose |
| --- | --- |
| `nexus/computer_use.py` | Core tool primitives |
| `nexus/computer_use_demo.py` | Phase 1 end-to-end demo |
| `tests/test_computer_use.py` | 32 unit tests |
| `docker-compose.desktop.yml` | Sandboxed desktop container |
| `~/.nexus/computer_use_actions.jsonl` | Runtime audit log |
| `~/.nexus/computer_use_evidence/` | Screenshot evidence (demo output) |
