feat: add desktop automation primitives to Hermes (#1125)

Implements Phase 1 and Phase 2 tooling from issue #1125:

- nexus/computer_use.py: four Hermes tools with poka-yoke safety, plus an audit-log reader
  * computer_screenshot(): capture and base64-encode a desktop snapshot
  * computer_click(x, y, button, confirm): right/middle require confirm=True
  * computer_type(text, confirm): sensitive keywords blocked without
    confirm=True; text value is never written to audit log
  * computer_scroll(x, y, amount): scroll wheel
  * read_action_log(): inspect recent JSONL audit entries
  * pyautogui.FAILSAFE=True; all tools degrade gracefully when headless
- nexus/computer_use_demo.py: Phase 1 demo (baseline screenshot → open browser →
  navigate to Gitea forge → evidence screenshot)
- tests/test_computer_use.py: 32 unit tests, fully headless (pyautogui mocked), all passing
- docs/computer-use.md: API reference, safety table, phase roadmap, pilot recipes
- docker-compose.desktop.yml: sandboxed Xvfb + noVNC container

Fixes #1125

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-10 05:45:27 -04:00
parent e85cefd9c0
commit 220f20c794
5 changed files with 1013 additions and 0 deletions
--- a/docs/computer-use.md
+++ b/docs/computer-use.md
@@ -0,0 +1,174 @@
+# Computer Use — Desktop Automation Primitives for Hermes
+
+Issue: [#1125](https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/1125)
+
+## Overview
+
+`nexus/computer_use.py` adds desktop automation primitives to the Hermes fleet. Agents can take screenshots, click, type, and scroll, which is enough to drive a browser, validate a UI, or visually diagnose a failed workflow page.
+
+All actions are logged to a JSONL audit trail at `~/.nexus/computer_use_actions.jsonl`.
+
+---
+
+## Quick Start
+
+### Local (requires a real display or Xvfb)
+
+```bash
+# Install dependencies
+pip install pyautogui Pillow
+
+# Run the Phase 1 demo
+python -m nexus.computer_use_demo
+```
+
+### Sandboxed (Docker + Xvfb + noVNC)
+
+```bash
+docker compose -f docker-compose.desktop.yml up -d
+# Visit http://localhost:6080 in your browser to see the virtual desktop
+
+docker compose -f docker-compose.desktop.yml run hermes-desktop \
+    python -m nexus.computer_use_demo
+
+docker compose -f docker-compose.desktop.yml down
+```
+
+---
+
+## API Reference
+
+### `computer_screenshot(save_path=None, log_path=...)`
+
+Capture the current desktop.
+
+| Param | Type | Description |
+|-------|------|-------------|
+| `save_path` | `str \| None` | Path to save the PNG. If `None`, the image is returned base64-encoded in `image_b64`. |
+| `log_path` | `Path` | Audit log file. |
+
+**Returns** `dict`:
+```json
+{
+  "ok": true,
+  "image_b64": "<base64 PNG or null>",
+  "saved_to": "<path or null>",
+  "error": null
+}
+```
+
+---
+
+### `computer_click(x, y, button="left", confirm=False, log_path=...)`
+
+Click the mouse at screen coordinates.
+
+| Param | Type | Description |
+|-------|------|-------------|
+| `x` | `int` | Horizontal coordinate |
+| `y` | `int` | Vertical coordinate |
+| `button` | `str` | `"left"` \| `"right"` \| `"middle"` |
+| `confirm` | `bool` | Must be `True` for `right` / `middle` clicks (poka-yoke) |
+
+**Returns** `dict`:
+```json
+{"ok": true, "error": null}
+```
+
+---
+
+### `computer_type(text, confirm=False, interval=0.02, log_path=...)`
+
+Type text using the keyboard.
+
+| Param | Type | Description |
+|-------|------|-------------|
+| `text` | `str` | Text to type |
+| `confirm` | `bool` | Must be `True` when `text` contains a sensitive keyword |
+| `interval` | `float` | Delay between keystrokes (seconds) |
+
+**Sensitive keywords** (require `confirm=True`): `password`, `passwd`, `secret`, `token`, `api_key`, `apikey`, `key`, `auth`
+
+> Note: the actual `text` value is never written to the audit log — only its length and whether it was flagged as sensitive.
+
+**Returns** `dict`:
+```json
+{"ok": true, "error": null}
+```
+
+---
+
+### `computer_scroll(x, y, amount=3, log_path=...)`
+
+Scroll the mouse wheel at screen coordinates.
+
+| Param | Type | Description |
+|-------|------|-------------|
+| `x` | `int` | Horizontal coordinate |
+| `y` | `int` | Vertical coordinate |
+| `amount` | `int` | Scroll units. Positive = up, negative = down. |
+
+**Returns** `dict`:
+```json
+{"ok": true, "error": null}
+```
+
+---
+
+### `read_action_log(n=20, log_path=...)`
+
+Return the most recent `n` audit log entries, newest first.
+
+```python
+from nexus.computer_use import read_action_log
+
+for entry in read_action_log(n=5):
+    print(entry["ts"], entry["action"], entry["result"]["ok"])
+```
+
+---
+
+## Safety Model
+
+| Action | Safety gate |
+|--------|-------------|
+| `computer_click(button="right")` | Requires `confirm=True` |
+| `computer_click(button="middle")` | Requires `confirm=True` |
+| `computer_type` with sensitive text | Requires `confirm=True` |
+| Mouse moved to top-left corner | pyautogui FAILSAFE raises `FailSafeException`, aborting the action immediately |
+| All actions | Written to JSONL audit log with timestamp |
+| Headless environment | All tools degrade gracefully — return `ok=False` with error message |
+
+---
+
+## Phase Roadmap
+
+### Phase 1 — Environment & Primitives ✅
+- Sandboxed desktop via Xvfb + noVNC (`docker-compose.desktop.yml`)
+- `computer_screenshot`, `computer_click`, `computer_type`, `computer_scroll`
+- Poka-yoke safety checks on all destructive actions
+- JSONL audit log for all actions
+- Demo: baseline screenshot → open browser → navigate to Gitea → evidence screenshot
+- 32 unit tests, fully headless (pyautogui mocked)
+
+### Phase 2 — Tool Integration (planned)
+- Register tools in the Hermes tool registry
+- LLM-based planner loop using screenshots as context
+- Destructive action confirmation UI
+
+### Phase 3 — Use-Case Pilots (planned)
+- Pilot 1: Automated visual regression test for fleet dashboard
+- Pilot 2: Screenshot-based diagnosis of failed CI workflow page
+
+---
+
+## File Locations
+
+| File | Purpose |
+|------|---------|
+| `nexus/computer_use.py` | Core tool primitives |
+| `nexus/computer_use_demo.py` | Phase 1 end-to-end demo |
+| `tests/test_computer_use.py` | 32 unit tests |
+| `docker-compose.desktop.yml` | Sandboxed desktop container |
+| `~/.nexus/computer_use_actions.jsonl` | Runtime audit log |
+| `~/.nexus/computer_use_evidence/` | Screenshot evidence (demo output) |
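
The confirm gates and the audit rule for `computer_type` described in docs/computer-use.md can be sketched roughly as follows. This is a minimal illustration, not the code in nexus/computer_use.py: the helper names (`is_sensitive`, `gate_click`, `gate_type`, `audit_line_for_type`), the case-insensitive substring match, and the `params` field are assumptions. The keyword list, the `ok`/`error` result shape, and the `ts`/`action`/`result` entry keys are taken from the documentation above.

```python
import json
from datetime import datetime, timezone

# Keyword list copied from the API reference for computer_type.
SENSITIVE_KEYWORDS = ("password", "passwd", "secret", "token",
                      "api_key", "apikey", "key", "auth")


def is_sensitive(text: str) -> bool:
    """Assumed rule: case-insensitive substring match on any keyword."""
    lowered = text.lower()
    return any(kw in lowered for kw in SENSITIVE_KEYWORDS)


def gate_click(button: str, confirm: bool) -> dict:
    """Refuse right/middle clicks unless the caller passes confirm=True."""
    if button not in ("left", "right", "middle"):
        return {"ok": False, "error": f"unknown button: {button!r}"}
    if button != "left" and not confirm:
        return {"ok": False, "error": f"{button} click requires confirm=True"}
    return {"ok": True, "error": None}


def gate_type(text: str, confirm: bool) -> dict:
    """Refuse sensitive-looking text unless the caller passes confirm=True."""
    if is_sensitive(text) and not confirm:
        return {"ok": False, "error": "sensitive text requires confirm=True"}
    return {"ok": True, "error": None}


def audit_line_for_type(text: str, result: dict) -> str:
    """Build one JSONL line recording length and flag, never the text itself."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "action": "type",
        # Hypothetical field name; the docs only say length and flag are logged.
        "params": {"length": len(text), "sensitive": is_sensitive(text)},
        "result": result,
    }
    return json.dumps(entry)
```

Under the substring rule assumed here, short keywords such as `key` and `auth` would also flag ordinary words like "monkey" or "author", so a caller would need `confirm=True` for those too. Whether the real implementation matches whole words or substrings is not stated in the documentation.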