# Computer Use — Desktop Automation Primitives for Hermes

Issue: [#1125](https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/1125)

## Overview

`nexus/computer_use.py` adds desktop automation primitives to the Hermes fleet. Agents can take screenshots, click, type, and scroll, which is enough to drive a browser, validate a UI, or visually diagnose a failed workflow page.

All actions are logged to a JSONL audit trail at `~/.nexus/computer_use_actions.jsonl`.
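
As a rough illustration of the JSONL format, the sketch below appends one JSON object per line. The field names `ts`, `action`, and `result` mirror those read in the `read_action_log` example; the helper name `append_action` and the `params` field are hypothetical and are not taken from `nexus/computer_use.py`.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_action(log_path: Path, action: str, params: dict, result: dict) -> dict:
    """Append one audit record as a single JSON line (hypothetical helper)."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "params": params,  # assumed field, not confirmed by the source
        "result": result,
    }
    # Create ~/.nexus (or any other parent directory) on first use.
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry
```

Because each record is a complete line, the log can be tailed or filtered with ordinary line-oriented tools without parsing the whole file.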

---

## Quick Start

### Local (requires a real display or Xvfb)

```bash
# Install dependencies
pip install pyautogui Pillow

# Run the Phase 1 demo
python -m nexus.computer_use_demo
```

### Sandboxed (Docker + Xvfb + noVNC)

```bash
docker compose -f docker-compose.desktop.yml up -d
# Visit http://localhost:6080 in your browser to see the virtual desktop

docker compose -f docker-compose.desktop.yml run hermes-desktop \
    python -m nexus.computer_use_demo

docker compose -f docker-compose.desktop.yml down
```

---

## API Reference

### `computer_screenshot(save_path=None, log_path=...)`

Capture the current desktop.

| Param | Type | Description |
|-------|------|-------------|
| `save_path` | `str \| None` | Path to save the PNG. If `None`, the image is returned as a base64 string in `image_b64`. |
| `log_path` | `Path` | Audit log file. |

**Returns** `dict`:
```json
{
  "ok": true,
  "image_b64": "<base64 PNG or null>",
  "saved_to": "<path or null>",
  "error": null
}
```
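
When `save_path` is omitted, the caller has to decode `image_b64` itself. A minimal sketch of that step, assuming the result dict has exactly the shape shown above; `save_screenshot_result` is a hypothetical helper, not part of the module:

```python
import base64
from pathlib import Path


def save_screenshot_result(result: dict, dest: Path) -> Path:
    """Write the base64 PNG from a screenshot result dict to dest."""
    if not result.get("ok"):
        raise RuntimeError(f"screenshot failed: {result.get('error')}")
    image_b64 = result.get("image_b64")
    if image_b64 is None:
        # Presumably the image was written to save_path instead.
        raise ValueError(f"no inline image; saved_to={result.get('saved_to')}")
    dest.write_bytes(base64.b64decode(image_b64))
    return dest
```

Typical use would be `save_screenshot_result(computer_screenshot(), Path("shot.png"))`.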

---

### `computer_click(x, y, button="left", confirm=False, log_path=...)`

Click the mouse at screen coordinates.

| Param | Type | Description |
|-------|------|-------------|
| `x` | `int` | Horizontal coordinate |
| `y` | `int` | Vertical coordinate |
| `button` | `str` | `"left"` \| `"right"` \| `"middle"` |
| `confirm` | `bool` | Must be `True` for `right` and `middle` clicks (poka-yoke guard) |
| `log_path` | `Path` | Audit log file. |

**Returns** `dict`:
```json
{"ok": true, "error": null}
```
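
The confirmation gate can be pictured as a pure check that runs before any mouse event is sent. The sketch below is an assumption about how such a guard might look, not the module's actual code; `check_click` and its error strings are hypothetical, and it assumes a rejected click reports `ok=False` rather than raising.

```python
def check_click(button: str = "left", confirm: bool = False) -> dict:
    """Return the result a gated click would produce, without touching the mouse."""
    if button not in ("left", "right", "middle"):
        return {"ok": False, "error": f"unknown button: {button!r}"}
    if button in ("right", "middle") and not confirm:
        # Poka-yoke: less common buttons need an explicit opt-in.
        return {"ok": False, "error": f"{button} click requires confirm=True"}
    return {"ok": True, "error": None}
```

Under this model a plain left click passes without extra arguments, while `check_click("right")` is rejected until the caller passes `confirm=True`.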

---

### `computer_type(text, confirm=False, interval=0.02, log_path=...)`

Type text using the keyboard.

| Param | Type | Description |
|-------|------|-------------|
| `text` | `str` | Text to type |
| `confirm` | `bool` | Must be `True` when `text` contains a sensitive keyword |
| `interval` | `float` | Delay between keystrokes (seconds) |
| `log_path` | `Path` | Audit log file. |

**Sensitive keywords** (require `confirm=True`): `password`, `passwd`, `secret`, `token`, `api_key`, `apikey`, `key`, `auth`

> Note: the actual `text` value is never written to the audit log — only its length and whether it was flagged as sensitive.

**Returns** `dict`:
```json
{"ok": true, "error": null}
```
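
A minimal sketch of the keyword check and of what reaches the audit log, assuming case-insensitive substring matching (the source does not say how matching is done). The names `is_sensitive` and `audit_fields`, and the `length` and `sensitive` keys, are hypothetical.

```python
SENSITIVE_KEYWORDS = (
    "password", "passwd", "secret", "token",
    "api_key", "apikey", "key", "auth",
)


def is_sensitive(text: str) -> bool:
    """True if any keyword occurs anywhere in the text (assumed matching rule)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in SENSITIVE_KEYWORDS)


def audit_fields(text: str) -> dict:
    """Fields safe to log: the typed text itself is deliberately left out."""
    return {"length": len(text), "sensitive": is_sensitive(text)}
```

If matching really is substring-based, short keywords such as `key` and `auth` will also flag harmless words like "keyboard" or "author", so callers should expect occasional false positives that need `confirm=True`.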

---

### `computer_scroll(x, y, amount=3, log_path=...)`

Scroll the mouse wheel at screen coordinates.

| Param | Type | Description |
|-------|------|-------------|
| `x` | `int` | Horizontal coordinate |
| `y` | `int` | Vertical coordinate |
| `amount` | `int` | Scroll units. Positive = up, negative = down. |
| `log_path` | `Path` | Audit log file. |

**Returns** `dict`:
```json
{"ok": true, "error": null}
```

---

### `read_action_log(n=20, log_path=...)`

Return the most recent `n` audit log entries, newest first.

```python
from nexus.computer_use import read_action_log

for entry in read_action_log(n=5):
    print(entry["ts"], entry["action"], entry["result"]["ok"])
```
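
For readers curious how a newest-first tail over a JSONL file can work, here is a small self-contained sketch. It is an illustration under stated assumptions, not the module's implementation; `tail_jsonl` is a hypothetical name, and returning an empty list for a missing file is an assumed behavior.

```python
import json
from pathlib import Path


def tail_jsonl(log_path: Path, n: int = 20) -> list:
    """Return up to n most recent JSONL records, newest first."""
    if n <= 0 or not log_path.exists():
        return []
    lines = [
        line
        for line in log_path.read_text(encoding="utf-8").splitlines()
        if line.strip()  # skip blank lines
    ]
    # Records are appended in time order, so the last n lines are the newest.
    return [json.loads(line) for line in reversed(lines[-n:])]
```

Reading the whole file is fine for a small audit log; a very large log would call for seeking from the end of the file instead.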

---

## Safety Model

| Action | Safety gate |
|--------|-------------|
| `computer_click(button="right")` | Requires `confirm=True` |
| `computer_click(button="middle")` | Requires `confirm=True` |
| `computer_type` with sensitive text | Requires `confirm=True` |
| Mouse moved to the top-left corner | pyautogui `FAILSAFE` triggers (`FailSafeException`) and aborts immediately |
| All actions | Written to JSONL audit log with timestamp |
| Headless environment | All tools degrade gracefully: they return `ok=False` with an error message |
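
Because failures are reported in the result dict, callers need to check `ok` after every call. One possible pattern is a small wrapper that turns a failed result into an exception; `require_ok` and `ComputerUseError` are hypothetical names used only for this sketch.

```python
class ComputerUseError(RuntimeError):
    """Raised when a tool result reports ok=False (hypothetical type)."""


def require_ok(result: dict) -> dict:
    """Return the result unchanged if it succeeded, otherwise raise."""
    if not result.get("ok", False):
        raise ComputerUseError(result.get("error") or "unknown error")
    return result
```

A call such as `require_ok(computer_click(100, 200))` then fails loudly in a headless environment or when a confirmation gate rejects the action, which keeps a multi-step script from continuing past a skipped step.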

---

## Phase Roadmap

### Phase 1 — Environment & Primitives ✅
- Sandboxed desktop via Xvfb + noVNC (`docker-compose.desktop.yml`)
- `computer_screenshot`, `computer_click`, `computer_type`, `computer_scroll`
- Poka-yoke safety checks on all destructive actions
- JSONL audit log for all actions
- Demo: baseline screenshot → open browser → navigate to Gitea → evidence screenshot
- 32 unit tests, fully headless (pyautogui mocked)

### Phase 2 — Tool Integration (planned)
- Register tools in the Hermes tool registry
- LLM-based planner loop using screenshots as context
- Destructive action confirmation UI

### Phase 3 — Use-Case Pilots (planned)
- Pilot 1: Automated visual regression test for fleet dashboard
- Pilot 2: Screenshot-based diagnosis of failed CI workflow page

---

## File Locations

| File | Purpose |
|------|---------|
| `nexus/computer_use.py` | Core tool primitives |
| `nexus/computer_use_demo.py` | Phase 1 end-to-end demo |
| `tests/test_computer_use.py` | 32 unit tests |
| `docker-compose.desktop.yml` | Sandboxed desktop container |
| `~/.nexus/computer_use_actions.jsonl` | Runtime audit log |
| `~/.nexus/computer_use_evidence/` | Screenshot evidence (demo output) |