# Computer Use: Desktop Automation Primitives for Hermes

Issue: [#1125](https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/1125)

## Overview

`nexus/computer_use.py` adds desktop automation primitives to the Hermes fleet. Agents can take screenshots, click, type, and scroll, which is enough to drive a browser, validate a UI, or diagnose a failed workflow page visually.

All actions are logged to a JSONL audit trail at `~/.nexus/computer_use_actions.jsonl`.

---

## Quick Start

### Local (requires a real display or Xvfb)

```bash
# Install dependencies
pip install pyautogui Pillow

# Run the Phase 1 demo
python -m nexus.computer_use_demo
```

### Sandboxed (Docker + Xvfb + noVNC)

```bash
docker compose -f docker-compose.desktop.yml up -d
# Visit http://localhost:6080 in your browser to see the virtual desktop

docker compose -f docker-compose.desktop.yml run hermes-desktop \
    python -m nexus.computer_use_demo

docker compose -f docker-compose.desktop.yml down
```

---

## API Reference

### `computer_screenshot(save_path=None, log_path=...)`

Capture the current desktop.

| Param | Type | Description |
|-------|------|-------------|
| `save_path` | `str \| None` | Path to save PNG. If `None`, returns base64 string. |
| `log_path` | `Path` | Audit log file. |

**Returns** `dict`:

```json
{
  "ok": true,
  "image_b64": "",
  "saved_to": "",
  "error": null
}
```

---

### `computer_click(x, y, button="left", confirm=False, log_path=...)`

Click the mouse at screen coordinates.

| Param | Type | Description |
|-------|------|-------------|
| `x` | `int` | Horizontal coordinate |
| `y` | `int` | Vertical coordinate |
| `button` | `str` | `"left"` \| `"right"` \| `"middle"` |
| `confirm` | `bool` | Required `True` for `right` / `middle` (poka-yoke) |

**Returns** `dict`:

```json
{"ok": true, "error": null}
```

---

### `computer_type(text, confirm=False, interval=0.02, log_path=...)`

Type text using the keyboard.

| Param | Type | Description |
|-------|------|-------------|
| `text` | `str` | Text to type |
| `confirm` | `bool` | Required `True` when text contains a sensitive keyword |
| `interval` | `float` | Delay between keystrokes (seconds) |

**Sensitive keywords** (require `confirm=True`):
`password`, `passwd`, `secret`, `token`, `api_key`, `apikey`, `key`, `auth`

> Note: the actual `text` value is never written to the audit log; only its length and whether it was flagged as sensitive are recorded.

**Returns** `dict`:

```json
{"ok": true, "error": null}
```

---

### `computer_scroll(x, y, amount=3, log_path=...)`

Scroll the mouse wheel at screen coordinates.

| Param | Type | Description |
|-------|------|-------------|
| `x` | `int` | Horizontal coordinate |
| `y` | `int` | Vertical coordinate |
| `amount` | `int` | Scroll units. Positive = up, negative = down. |

**Returns** `dict`:

```json
{"ok": true, "error": null}
```

---

### `read_action_log(n=20, log_path=...)`

Return the most recent `n` audit log entries, newest first.

```python
from nexus.computer_use import read_action_log

for entry in read_action_log(n=5):
    print(entry["ts"], entry["action"], entry["result"]["ok"])
```

---

## Safety Model

| Action | Safety gate |
|--------|-------------|
| `computer_click(button="right")` | Requires `confirm=True` |
| `computer_click(button="middle")` | Requires `confirm=True` |
| `computer_type` with sensitive text | Requires `confirm=True` |
| Mouse to top-left corner | pyautogui FAILSAFE: aborts immediately |
| All actions | Written to JSONL audit log with timestamp |
| Headless environment | All tools degrade gracefully: they return `ok=False` with an error message |

---

## Phase Roadmap

### Phase 1: Environment & Primitives ✅
- Sandboxed desktop via Xvfb + noVNC (`docker-compose.desktop.yml`)
- `computer_screenshot`, `computer_click`, `computer_type`, `computer_scroll`
- Poka-yoke safety checks on all destructive actions
- JSONL audit log for all actions
- Demo: baseline screenshot → open browser → navigate to Gitea → evidence screenshot
- 32 unit tests, fully headless (pyautogui mocked)

### Phase 2: Tool Integration (planned)
- Register tools in the Hermes tool registry
- LLM-based planner loop using screenshots as context
- Destructive action confirmation UI

### Phase 3: Use-Case Pilots (planned)
- Pilot 1: Automated visual regression test for fleet dashboard
- Pilot 2: Screenshot-based diagnosis of failed CI workflow page

---

## File Locations

| File | Purpose |
|------|---------|
| `nexus/computer_use.py` | Core tool primitives |
| `nexus/computer_use_demo.py` | Phase 1 end-to-end demo |
| `tests/test_computer_use.py` | 32 unit tests |
| `docker-compose.desktop.yml` | Sandboxed desktop container |
| `~/.nexus/computer_use_actions.jsonl` | Runtime audit log |
| `~/.nexus/computer_use_evidence/` | Screenshot evidence (demo output) |
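
---

## Appendix: Sketch of the Confirm Gate and Audit Entry

To make the Safety Model and audit log rules concrete, here is a minimal, self-contained sketch of how a `confirm` gate and a JSONL audit entry could fit together. It is not the code in `nexus/computer_use.py`. The names `gated_type`, `is_sensitive`, and `read_log` are hypothetical, the case-insensitive substring match is an assumed matching rule, and the `params` field is an assumed log layout (only `ts`, `action`, and `result` appear in this document). No keystrokes are sent, so it runs without a display.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Keyword list copied from the "Sensitive keywords" note in the API reference.
SENSITIVE_KEYWORDS = ("password", "passwd", "secret", "token",
                      "api_key", "apikey", "key", "auth")


def is_sensitive(text: str) -> bool:
    """Assumed rule: case-insensitive substring match against the keyword list."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in SENSITIVE_KEYWORDS)


def gated_type(text: str, log_path: Path, confirm: bool = False) -> dict:
    """Hypothetical stand-in for computer_type: apply the gate, then log."""
    sensitive = is_sensitive(text)
    if sensitive and not confirm:
        result = {"ok": False, "error": "sensitive text requires confirm=True"}
    else:
        # A real implementation would send the keystrokes here.
        result = {"ok": True, "error": None}

    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "action": "type",
        # Only metadata is logged; the text itself is never written.
        "params": {"length": len(text), "sensitive": sensitive},
        "result": result,
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return result


def read_log(log_path: Path, n: int = 20) -> list:
    """Return the most recent n entries, newest first."""
    if not log_path.exists():
        return []
    lines = log_path.read_text(encoding="utf-8").splitlines()
    return [json.loads(line) for line in reversed(lines[-n:]) if line.strip()]


if __name__ == "__main__":
    log = Path(tempfile.mkdtemp()) / "computer_use_actions.jsonl"
    print(gated_type("hello world", log))
    print(gated_type("my password is hunter2", log))
    print(gated_type("my password is hunter2", log, confirm=True))
    for entry in read_log(log, n=5):
        print(entry["action"], entry["params"], entry["result"]["ok"])
```

In this sketch a blocked attempt is still written to the log, so a rejected action leaves a trace. Whether the real module logs rejected calls the same way is not stated here, so treat that as a design choice of the sketch.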