[COMPUTER_USE] Add Desktop Automation Primitives to Hermes #1125

Open
opened 2026-04-07 21:17:09 +00:00 by Timmy · 1 comment
Owner

Objective

Add computer-use primitives to Hermes so that agents can control a desktop environment (screenshot, click, type, read screen content) for automation and testing.

Background

Anthropic’s "computer use" capability allows Claude to control a desktop via screenshots and mouse/keyboard actions. For our fleet, this unlocks: visual testing of web UIs, automated Gitea workflow verification, screenshot-based incident diagnosis, and operating GUI-only tools.

Acceptance Criteria

Phase 1 — Environment & Primitives (1 week)

  • A sandboxed desktop environment is available to Hermes (Xvfb + noVNC, or a Docker container with desktop)
  • Hermes can take a screenshot of the desktop
  • Hermes can execute mouse clicks and keyboard input via a safe API
  • A simple "open browser, navigate to Gitea, take screenshot" demo works end-to-end

Phase 2 — Tool Integration (1 week)

  • New Hermes tools created: computer_screenshot, computer_click, computer_type, computer_scroll
  • A planner loop uses screenshots to decide next actions (could be LLM-based or hardcoded for MVP)
  • Poka-yoke: destructive actions (e.g., rm -rf, password entry) require explicit confirmation
  • All actions are logged with screenshot evidence
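
The planner loop in Phase 2 could be sketched as follows, using the hardcoded (scripted) variant that the MVP allows. This is an illustration under stated assumptions, not Hermes code: `Action`, `run_planner_loop`, `scripted_planner`, and the `take_screenshot` / `execute` callables are all hypothetical names standing in for whatever the real tools expose.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """One tool invocation chosen by the planner (hypothetical shape)."""
    tool: str   # e.g. "computer_click"
    args: dict  # keyword arguments for that tool


# A planner receives the step index and the latest screenshot bytes and
# returns the next action, or None when it considers the task finished.
Planner = Callable[[int, bytes], Optional[Action]]


def run_planner_loop(
    take_screenshot: Callable[[], bytes],
    execute: Callable[[Action], object],
    planner: Planner,
    max_steps: int = 10,
) -> List[dict]:
    """Observe, decide, act; stop when the planner returns None or at max_steps."""
    history = []
    for step in range(max_steps):
        shot = take_screenshot()          # observe
        action = planner(step, shot)      # decide
        if action is None:
            break
        result = execute(action)          # act
        history.append({
            "step": step,
            "tool": action.tool,
            "args": action.args,
            "result": result,
            "screenshot_bytes": len(shot),  # evidence size only, for brevity
        })
    return history


def scripted_planner(script: List[Action]) -> Planner:
    """Hardcoded MVP planner: replays a fixed list and ignores the screenshot."""
    def plan(step: int, _shot: bytes) -> Optional[Action]:
        return script[step] if step < len(script) else None
    return plan
```

Swapping `scripted_planner` for an LLM-backed callable with the same signature would leave the loop unchanged, which is the main reason for passing the planner in as a function.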

Phase 3 — Use-Case Pilots (1 week)

  • Pilot 1: Automated visual regression test for a fleet dashboard
  • Pilot 2: Screenshot-based diagnosis of a failed CI workflow page
  • Document in the-nexus/docs/computer-use.md
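
For Pilot 1, one possible pass/fail rule is the fraction of pixels that differ between a baseline screenshot and a new one. The sketch below assumes screenshots have already been decoded into rows of RGB tuples; the function names, the per-channel `tolerance`, and the 1% threshold are illustrative choices, not values taken from the issue.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels


def diff_ratio(baseline: Image, candidate: Image, tolerance: int = 0) -> float:
    """Return the fraction of pixels whose largest channel delta exceeds tolerance."""
    if len(baseline) != len(candidate) or any(
        len(a) != len(b) for a, b in zip(baseline, candidate)
    ):
        raise ValueError("images must have identical dimensions")
    total = changed = 0
    for row_a, row_b in zip(baseline, candidate):
        for pa, pb in zip(row_a, row_b):
            total += 1
            if max(abs(ca - cb) for ca, cb in zip(pa, pb)) > tolerance:
                changed += 1
    return changed / total if total else 0.0


def passes_visual_regression(
    baseline: Image,
    candidate: Image,
    max_changed: float = 0.01,  # assumed threshold: at most 1% of pixels differ
    tolerance: int = 8,         # assumed slack for anti-aliasing noise
) -> bool:
    return diff_ratio(baseline, candidate, tolerance) <= max_changed
```

A small per-channel tolerance helps avoid failures caused by font rendering or compression noise; a real pilot would likely also mask regions that change on every load, such as timestamps.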

Suggested Implementation Path

  1. Use playwright, pyautogui, or anthropic-computer-use-demo reference code
  2. Run inside a container or VM to limit blast radius
  3. Expose tools through the existing Hermes tool registry
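
Step 3 might look roughly like the sketch below. The issue does not describe the Hermes tool registry API, so `TOOL_REGISTRY`, `register_tool`, and `dispatch` are hypothetical stand-ins, and the `computer_scroll` body is a stub that only echoes its arguments.

```python
from typing import Callable, Dict

# Hypothetical registry: maps a tool name to the callable that implements it.
TOOL_REGISTRY: Dict[str, Callable[..., dict]] = {}


def register_tool(name: str):
    """Decorator that adds a function to the registry under the given name."""
    def decorator(fn: Callable[..., dict]) -> Callable[..., dict]:
        if name in TOOL_REGISTRY:
            raise ValueError(f"tool already registered: {name}")
        TOOL_REGISTRY[name] = fn
        return fn
    return decorator


@register_tool("computer_scroll")
def computer_scroll(x: int, y: int, amount: int) -> dict:
    # Stub: a real tool would drive the desktop here; this one only echoes.
    return {"tool": "computer_scroll", "x": x, "y": y, "amount": amount}


def dispatch(name: str, **kwargs) -> dict:
    """Look up a tool by name and call it with keyword arguments."""
    if name not in TOOL_REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    return TOOL_REGISTRY[name](**kwargs)
```

For example, `dispatch("computer_scroll", x=10, y=20, amount=-3)` would return the echoed argument dictionary from the stub.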

Owner

Bezalel

Linked Epic

#1120

claude self-assigned this 2026-04-08 10:25:02 +00:00
Member

PR created: #1131 (https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/pulls/1131)

Summary of changes

nexus/computer_use.py — four Hermes tools plus a log reader, with poka-yoke safety and JSONL action logging:

  • computer_screenshot() — capture & log desktop snapshot
  • computer_click(x, y, button, confirm) — right/middle clicks require confirm=True
  • computer_type(text, confirm) — sensitive keywords (password/token/key) refused without confirm=True
  • computer_scroll(x, y, amount) — scroll wheel
  • read_action_log() — read recent log entries
  • pyautogui.FAILSAFE=True globally; all tools degrade gracefully when headless
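
The confirmation rules and JSONL logging described in this list could be sketched as below. This is not the code from the PR: `needs_confirmation`, `guarded_call`, the log entry fields, and the injected `backend` callable are assumptions made for illustration, and only the gating rules (right/middle clicks, the password/token/key keywords) come from the summary above.

```python
import json
import time
from pathlib import Path
from typing import Callable

SENSITIVE_KEYWORDS = ("password", "token", "key")


def needs_confirmation(tool: str, args: dict) -> bool:
    """Apply the gating rules from the summary: non-left clicks and sensitive text."""
    if tool == "computer_click":
        return args.get("button", "left") in ("right", "middle")
    if tool == "computer_type":
        text = str(args.get("text", "")).lower()
        return any(word in text for word in SENSITIVE_KEYWORDS)
    return False


def guarded_call(
    tool: str,
    args: dict,
    backend: Callable[[str, dict], object],  # hypothetical: performs the real action
    log_path: str,
    confirm: bool = False,
) -> dict:
    """Run the action unless it is gated and unconfirmed; always append a log line."""
    gated = needs_confirmation(tool, args)
    if gated and not confirm:
        status = "refused"
    else:
        backend(tool, args)
        status = "ok"
    # Assumed policy: never write gated text (possible secrets) to the log.
    logged_args = {
        k: ("<redacted>" if gated and k == "text" else v) for k, v in args.items()
    }
    entry = {
        "ts": time.time(),
        "tool": tool,
        "args": logged_args,
        "confirm": confirm,
        "status": status,
    }
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")  # one JSON object per line (JSONL)
    return entry
```

Passing the backend in as a callable keeps the guard testable without a display, in the same spirit as the headless, mocked test suite mentioned in the PR summary.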

nexus/computer_use_demo.py — Phase 1 demo: baseline screenshot → open browser → navigate to Gitea → evidence screenshot

tests/test_computer_use.py — 29 unit tests, fully headless (pyautogui mocked), all passing

docs/computer-use.md — full Phase 1–3 documentation with API reference, safety table, pilot recipes

docker-compose.desktop.yml — sandboxed Xvfb + noVNC container

bezalel was assigned by Timmy 2026-04-08 19:31:26 +00:00
Reference: Timmy_Foundation/the-nexus#1125