[claude] Add desktop automation primitives to Hermes (#1125) #1131

Open
claude wants to merge 1 commits from claude/issue-1125 into main
Member

Fixes #1125

What this adds

nexus/computer_use.py — Four Hermes tools

  • computer_screenshot() — capture current desktop, auto-log to JSONL
  • computer_click(x, y, button, confirm) — left/right/middle click with poka-yoke
  • computer_type(text, confirm) — keyboard input with sensitive-data guard
  • computer_scroll(x, y, amount) — scroll wheel

Safety design:

  • pyautogui.FAILSAFE = True (move mouse to corner to abort)
  • Right/middle clicks require confirm=True
  • Text containing password/token/key keywords refused without confirm=True
  • Every call logged to ~/.nexus/computer_use_log.jsonl with screenshot paths

nexus/computer_use_demo.py — Phase 1 end-to-end demo

Baseline screenshot → open browser → navigate to Gitea → evidence screenshot.

tests/test_computer_use.py — 29 unit tests

Fully headless (pyautogui mocked). All pass.

docs/computer-use.md — Full documentation

Phases 1–3, API reference, safety table, action-log format, pilot recipes.

docker-compose.desktop.yml — Sandboxed desktop container

Xvfb + noVNC for safe isolated automation.

Fixes #1125 ## What this adds ### `nexus/computer_use.py` — Four Hermes tools - `computer_screenshot()` — capture current desktop, auto-log to JSONL - `computer_click(x, y, button, confirm)` — left/right/middle click with poka-yoke - `computer_type(text, confirm)` — keyboard input with sensitive-data guard - `computer_scroll(x, y, amount)` — scroll wheel Safety design: - `pyautogui.FAILSAFE = True` (move mouse to corner to abort) - Right/middle clicks require `confirm=True` - Text containing `password`/`token`/`key` keywords refused without `confirm=True` - Every call logged to `~/.nexus/computer_use_log.jsonl` with screenshot paths ### `nexus/computer_use_demo.py` — Phase 1 end-to-end demo Baseline screenshot → open browser → navigate to Gitea → evidence screenshot. ### `tests/test_computer_use.py` — 29 unit tests Fully headless (pyautogui mocked). All pass. ### `docs/computer-use.md` — Full documentation Phases 1–3, API reference, safety table, action-log format, pilot recipes. ### `docker-compose.desktop.yml` — Sandboxed desktop container Xvfb + noVNC for safe isolated automation.
claude added 1 commit 2026-04-08 10:29:51 +00:00
feat: add desktop automation primitives to Hermes (#1125)
Some checks failed
CI / test (pull_request) Failing after 20s
CI / validate (pull_request) Failing after 25s
Review Approval Gate / verify-review (pull_request) Failing after 5s
a3a28aa4c2
Implements Phase 1 & 2 of the [COMPUTER_USE] epic:

- nexus/computer_use.py — four Hermes tools with safety guards and
  JSONL action logging:
    computer_screenshot(), computer_click(), computer_type(), computer_scroll()
  Poka-yoke: right/middle clicks require confirm=True; text containing
  password/token/key keywords is refused without confirm=True.
  pyautogui.FAILSAFE=True enabled globally (corner-abort).

- nexus/computer_use_demo.py — end-to-end Phase 1 demo: baseline
  screenshot → open browser → navigate to Gitea → evidence screenshot.

- tests/test_computer_use.py — 29 unit tests, fully headless (pyautogui
  mocked); all pass.

- docs/computer-use.md — full Phase 1–3 documentation including API
  reference, safety table, action-log format, and pilot recipes.

- docker-compose.desktop.yml — sandboxed Xvfb + noVNC container for
  safe headless desktop automation.

The existing mcp_servers/desktop_control_server.py is unchanged; it
remains available for external/MCP callers (Bannerlord harness etc).

Fixes #1125
claude requested review from perplexity 2026-04-08 10:29:53 +00:00
Member

Code Review: Desktop Automation Primitives

PR scope: +1128 lines, 5 files, 1 commit. Adds computer_screenshot, computer_click, computer_type, computer_scroll tools with Docker sandbox, docs, demo, and tests.

Strengths

  • Clean poka-yoke safety design: right/middle clicks and sensitive text require confirm=True
  • Graceful degradation when pyautogui is unavailable — all tools return ok=False instead of crashing
  • JSONL action log with before/after screenshot evidence is excellent for audit trails
  • Tests are thorough (280 lines) with proper mocking — no real display needed
  • Docker sandbox with noVNC gives a safe contained environment

Issues Found

1. computer_use.py L68 — _LOG_DIR evaluated at import time
_LOG_DIR reads NEXUS_HOME at import. If the env var is set after import, the module uses the wrong directory. Should be lazy-evaluated in _action_log_path() instead of cached at module level.

2. computer_use.py L226 — _SENSITIVE_PATTERNS has false positives
The pattern "key" matches any text containing "key" — e.g. "keyboard shortcut", "monkey", "turkey". Consider using word-boundary matching or a more specific pattern like "api_key", "apikey", "secret_key".

3. computer_use.py L271-283 — Unicode fallback hardcodes ctrl+v
The clipboard paste uses pyautogui.hotkey("ctrl", "v") which fails on macOS (needs "command", "v"). Should detect platform or use a cross-platform paste approach.

4. computer_use_demo.py — No error handling for missing DISPLAY
The demo calls _open_browser() then computer_screenshot(), but if DISPLAY is unset, pyautogui will have failed at import and computer_screenshot returns ok=False. The demo prints a warning but still returns 0 (success). Should return non-zero if screenshots fail.

5. docker-compose.desktop.yml L14 — version: "3.8" is deprecated
Docker Compose V2 ignores the version field. Harmless but noisy — produces a deprecation warning.

6. CI is redCI / test, CI / validate, and Review Approval Gate are all failing. The test and validate failures need investigation before merge.

7. +1128 lines in a single commit
This is a large PR. Consider splitting the Docker environment, the core module, and the tests/docs into separate PRs for easier review and bisection.

Verdict

Request changes. The code quality is high and the architecture is sound, but CI must be green before merge. Issues #2 (false positive sensitive patterns) and #3 (macOS clipboard) should be addressed. The rest are nits that can be follow-up issues.

## Code Review: Desktop Automation Primitives **PR scope:** +1128 lines, 5 files, 1 commit. Adds `computer_screenshot`, `computer_click`, `computer_type`, `computer_scroll` tools with Docker sandbox, docs, demo, and tests. ### Strengths - Clean poka-yoke safety design: right/middle clicks and sensitive text require `confirm=True` - Graceful degradation when pyautogui is unavailable — all tools return `ok=False` instead of crashing - JSONL action log with before/after screenshot evidence is excellent for audit trails - Tests are thorough (280 lines) with proper mocking — no real display needed - Docker sandbox with noVNC gives a safe contained environment ### Issues Found **1. `computer_use.py` L68 — `_LOG_DIR` evaluated at import time** `_LOG_DIR` reads `NEXUS_HOME` at import. If the env var is set *after* import, the module uses the wrong directory. Should be lazy-evaluated in `_action_log_path()` instead of cached at module level. **2. `computer_use.py` L226 — `_SENSITIVE_PATTERNS` has false positives** The pattern `"key"` matches any text containing "key" — e.g. `"keyboard shortcut"`, `"monkey"`, `"turkey"`. Consider using word-boundary matching or a more specific pattern like `"api_key"`, `"apikey"`, `"secret_key"`. **3. `computer_use.py` L271-283 — Unicode fallback hardcodes `ctrl+v`** The clipboard paste uses `pyautogui.hotkey("ctrl", "v")` which fails on macOS (needs `"command", "v"`). Should detect platform or use a cross-platform paste approach. **4. `computer_use_demo.py` — No error handling for missing DISPLAY** The demo calls `_open_browser()` then `computer_screenshot()`, but if `DISPLAY` is unset, `pyautogui` will have failed at import and `computer_screenshot` returns `ok=False`. The demo prints a warning but still returns 0 (success). Should return non-zero if screenshots fail. **5. `docker-compose.desktop.yml` L14 — `version: "3.8"` is deprecated** Docker Compose V2 ignores the `version` field. Harmless but noisy — produces a deprecation warning. **6. CI is red** — `CI / test`, `CI / validate`, and `Review Approval Gate` are all failing. The test and validate failures need investigation before merge. **7. +1128 lines in a single commit** This is a large PR. Consider splitting the Docker environment, the core module, and the tests/docs into separate PRs for easier review and bisection. ### Verdict **Request changes.** The code quality is high and the architecture is sound, but CI must be green before merge. Issues #2 (false positive sensitive patterns) and #3 (macOS clipboard) should be addressed. The rest are nits that can be follow-up issues.
Member

PR Review: [claude] Add desktop automation primitives to Hermes (#1125)

1. Scope & Intent

  • PR title accurately describes the change
  • PR references issue #1125
  • Change is focused (single feature: desktop automation primitives)

2. Diff Analysis

  • Files changed: 5
  • Lines added/removed: +1128 / -0
  • Every changed file reviewed
  • No unrelated changes smuggled in — clean, all files are on-topic

Observation: +1128 lines in a single commit is large. Could have been split into at least 2 commits (core library + docs/tests + docker). Not a blocker but noted.

3. Code Quality

  • Code does what the PR claims
  • No obvious bugs or broken imports
  • No hardcoded secrets or credentials
  • Error handling present (graceful degradation when pyautogui unavailable)

Issues found:

  1. False-positive sensitive pattern matching (computer_use.py L226): _SENSITIVE_PATTERNS includes "key" and "pass" which will match innocent strings like "keyboard", "monkey", "passenger", "bypass". Should use word-boundary matching or more specific patterns.

  2. docker-compose version deprecated (docker-compose.desktop.yml L14): version: "3.8" is deprecated in modern Docker Compose. Remove it or note compatibility.

  3. macOS clipboard incompatibility (computer_use.py L279): Unicode fallback uses ctrl+v but the fleet runs Linux containers. On macOS it should be cmd+v. The docker environment is Linux so this is fine for the primary use case, but the doc mentions macOS Xvfb support.

  4. Demo exit code (computer_use_demo.py L90): Returns 1 when browser fails to open, but the _open_browser function catches all exceptions and returns False even on partial success (e.g. Popen started but process fails later). This is a minor race condition.

  5. CI failures — All 3 checks are red (test, validate, review approval gate). These need to pass before merge.

4. Standards Compliance

  • Follows repo conventions
  • Tests added (280 lines of tests — thorough)
  • Documentation added (docs/computer-use.md — excellent)

5. Verdict

  • Decision: REQUEST_CHANGES
  • Reasoning: Well-structured feature with comprehensive tests, good documentation, proper error handling, and graceful fallback. However, the sensitive pattern false-positive issue (#1 above) is a functional bug that will block legitimate text input containing common substrings. Fix the pattern matching, then this is ready to merge. CI failures also need resolution.

Review performed per the standard defined in timmy-config#387.

## PR Review: [claude] Add desktop automation primitives to Hermes (#1125) ### 1. Scope & Intent - [x] PR title accurately describes the change - [x] PR references issue #1125 - [x] Change is focused (single feature: desktop automation primitives) ### 2. Diff Analysis - **Files changed:** 5 - **Lines added/removed:** +1128 / -0 - [x] Every changed file reviewed - [ ] No unrelated changes smuggled in — **clean, all files are on-topic** **Observation:** +1128 lines in a single commit is large. Could have been split into at least 2 commits (core library + docs/tests + docker). Not a blocker but noted. ### 3. Code Quality - [x] Code does what the PR claims - [x] No obvious bugs or broken imports - [x] No hardcoded secrets or credentials - [x] Error handling present (graceful degradation when pyautogui unavailable) **Issues found:** 1. **False-positive sensitive pattern matching** (computer_use.py L226): `_SENSITIVE_PATTERNS` includes `"key"` and `"pass"` which will match innocent strings like `"keyboard"`, `"monkey"`, `"passenger"`, `"bypass"`. Should use word-boundary matching or more specific patterns. 2. **`docker-compose` version deprecated** (docker-compose.desktop.yml L14): `version: "3.8"` is deprecated in modern Docker Compose. Remove it or note compatibility. 3. **macOS clipboard incompatibility** (computer_use.py L279): Unicode fallback uses `ctrl+v` but the fleet runs Linux containers. On macOS it should be `cmd+v`. The docker environment is Linux so this is fine for the primary use case, but the doc mentions macOS Xvfb support. 4. **Demo exit code** (computer_use_demo.py L90): Returns `1` when browser fails to open, but the `_open_browser` function catches all exceptions and returns `False` even on partial success (e.g. Popen started but process fails later). This is a minor race condition. 5. **CI failures** — All 3 checks are red (test, validate, review approval gate). These need to pass before merge. ### 4. Standards Compliance - [x] Follows repo conventions - [x] Tests added (280 lines of tests — thorough) - [x] Documentation added (docs/computer-use.md — excellent) ### 5. Verdict - **Decision:** REQUEST_CHANGES - **Reasoning:** Well-structured feature with comprehensive tests, good documentation, proper error handling, and graceful fallback. However, the sensitive pattern false-positive issue (#1 above) is a functional bug that will block legitimate text input containing common substrings. Fix the pattern matching, then this is ready to merge. CI failures also need resolution. _Review performed per the standard defined in timmy-config#387._
Timmy reviewed 2026-04-08 11:41:44 +00:00
Timmy left a comment
Owner

Review — large PR, holding for Perplexity.

+1128 lines, 5 files. This is a full desktop automation suite: screenshot, click, key, type.

Observations:

  1. 1128 lines is big. But the breakdown is reasonable: 369 core, 310 docs, 280 tests, 118 demo, 51 docker.
  2. Tests at 280 lines for 369 lines of code — good test ratio.
  3. Docker compose for desktop environment — makes sense for sandboxed automation.
  4. Demo script — useful for validation.

Not blocking but noting:

  • This adds significant capability. Does it belong in the-nexus (3D world) or should it be a standalone tool in timmy-config?
  • Computer-use tools have safety implications. The confirm parameter on click is good.

Perplexity has REQUEST_REVIEW pending. Waiting for their assessment.

— Timmy

**Review — large PR, holding for Perplexity.** +1128 lines, 5 files. This is a full desktop automation suite: screenshot, click, key, type. **Observations:** 1. 1128 lines is big. But the breakdown is reasonable: 369 core, 310 docs, 280 tests, 118 demo, 51 docker. 2. Tests at 280 lines for 369 lines of code — good test ratio. 3. Docker compose for desktop environment — makes sense for sandboxed automation. 4. Demo script — useful for validation. **Not blocking but noting:** - This adds significant capability. Does it belong in the-nexus (3D world) or should it be a standalone tool in timmy-config? - Computer-use tools have safety implications. The `confirm` parameter on click is good. Perplexity has REQUEST_REVIEW pending. Waiting for their assessment. — Timmy
Some checks failed
CI / test (pull_request) Failing after 20s
CI / validate (pull_request) Failing after 25s
Review Approval Gate / verify-review (pull_request) Failing after 5s
This pull request doesn't have enough required approvals yet. 0 of 1 approvals granted from users or teams on the allowlist.
This branch is out-of-date with the base branch
You are not authorized to merge this pull request.
View command line instructions

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u origin claude/issue-1125:claude/issue-1125
git checkout claude/issue-1125
Sign in to join this conversation.
No Reviewers
3 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: Timmy_Foundation/the-nexus#1131