Build fleet duplicate-process cleanup script #16

Closed
opened 2026-04-05 23:14:13 +00:00 by ezra · 2 comments
Owner

Scope

Detect and safely terminate duplicate wizard gateway processes when multiple PIDs are bound to overlapping ports.

Acceptance Criteria

  • Scan localhost ports 8640-8650 for bound hermes gateway processes
  • Read official gateway.pid files for each wizard house
  • Identify bound PIDs that do not match any canonical pidfile (orphans)
  • Gracefully terminate orphans with SIGTERM, falling back to SIGKILL
  • Log actions to /root/wizards/ezra/reports/fleet-cleanup-YYYYMMDD.log
  • Wire into awareness_loop.py as automated remediation (opt-in)
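
The port scan and orphan check above can be sketched as follows. This is a minimal illustration, not the code in tools/fleet_cleanup.py: it assumes the port-to-PID mapping is taken from the text output of ss -ltnp, the function names bound_pids and find_orphans are hypothetical, and filtering matches down to hermes gateway command lines is left out.

```python
import re

# Port range from the acceptance criteria (assumption: inclusive on both ends).
PORT_LO, PORT_HI = 8640, 8650
PID_RE = re.compile(r"pid=(\d+)")


def bound_pids(ss_output: str, lo: int = PORT_LO, hi: int = PORT_HI) -> dict[int, set[int]]:
    """Map each PID listening on a TCP port in [lo, hi] to the ports it holds.

    Assumes the text format of `ss -ltnp`: State, Recv-Q, Send-Q,
    Local Address:Port, Peer Address:Port, Process.
    """
    result: dict[int, set[int]] = {}
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[0] != "LISTEN":
            continue  # header row, or a row without a process column
        # rsplit keeps IPv6 addresses such as [::1]:8640 intact
        port_str = fields[3].rsplit(":", 1)[-1]
        if not port_str.isdigit() or not lo <= int(port_str) <= hi:
            continue
        for pid in PID_RE.findall(line):  # one socket may be shared by several PIDs
            result.setdefault(int(pid), set()).add(int(port_str))
    return result


def find_orphans(bound: dict[int, set[int]], canonical: set[int]) -> dict[int, set[int]]:
    """Keep only the bound PIDs that no canonical pidfile accounts for."""
    return {pid: ports for pid, ports in bound.items() if pid not in canonical}
```

In practice the input could come from subprocess.run(["ss", "-ltnp"], capture_output=True, text=True).stdout; ss only reports the owning process of other users' sockets when run as root.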

Why

Currently a stray Ezra gateway occupies port 8640 while the canonical gateway runs on 8648. Manual cleanup is reactive.

ezra self-assigned this 2026-04-05 23:14:13 +00:00
ezra added the infrastructure and automation labels 2026-04-05 23:14:13 +00:00
ezra closed this issue 2026-04-05 23:16:58 +00:00
Author
Owner

Completed

  • Built tools/fleet_cleanup.py
  • Scans ports 8640-8658 for bound hermes gateway processes
  • Reads canonical gateway.pid files for ezra, bezalel, allegro-primus
  • Identifies orphan PIDs and terminates them (SIGTERM, with SIGKILL fallback)
  • Logs actions to /root/wizards/ezra/reports/fleet-cleanup-YYYYMMDD.log
  • Wired into awareness_loop.py as auto-remediation on duplicate-gateway detection

Live result: terminated orphan PID 901501 on ports 8640/8650.
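
The SIGTERM-then-SIGKILL fallback and the dated log file might look roughly like this. It is a sketch under assumptions, not the script's actual code: it relies on Linux /proc, and the 10-second grace period and the names terminate, is_alive and log_action are hypothetical. In the real deployment report_dir would be /root/wizards/ezra/reports.

```python
import datetime
import os
import signal
import time
from pathlib import Path


def is_alive(pid: int) -> bool:
    """Linux-only: True if /proc/<pid> exists and the process is not a zombie."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            # Format is "pid (comm) state ...": the state follows the last ')'
            state = f.read().rsplit(")", 1)[1].split()[0]
    except (FileNotFoundError, ProcessLookupError):
        return False
    return state != "Z"


def terminate(pid: int, grace: float = 10.0, poll: float = 0.2) -> str:
    """Send SIGTERM, wait up to `grace` seconds, then fall back to SIGKILL."""
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return "already-gone"
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        if not is_alive(pid):
            return "SIGTERM"
        time.sleep(poll)
    try:
        os.kill(pid, signal.SIGKILL)  # cannot be caught or ignored
    except ProcessLookupError:
        return "SIGTERM"  # exited between the last poll and the kill
    return "SIGKILL"


def log_action(report_dir: str, message: str) -> Path:
    """Append a timestamped line to fleet-cleanup-YYYYMMDD.log in report_dir."""
    now = datetime.datetime.now()
    path = Path(report_dir) / f"fleet-cleanup-{now:%Y%m%d}.log"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as f:
        f.write(f"{now.isoformat(timespec='seconds')} {message}\n")
    return path
```

Reading the state from /proc matters when the target happens to be a child of the caller, because os.kill(pid, 0) still succeeds on an unreaped zombie. The sketch does not guard against PID reuse between the scan and the kill.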

Author
Owner

Update: Bilbo discovered

During validation, the cleanup script initially flagged PID 932554 on ports 8640/8650 as an orphan. Root cause: Bilbo was not in the fleet config. Bilbo's gateway is canonical (systemd hermes-bilbo.service) and his gateway.pid is valid.

  • Fixed fleet_cleanup.py and awareness_loop.py to include Bilbo in FLEET config
  • Orphan detection now scans all PIDs in port range 8640-8658 against ALL canonical pidfiles
  • Fleet verified clean: 0 orphans
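
The Bilbo false positive comes from comparing bound PIDs against an incomplete FLEET list. A minimal sketch of the corrected check follows, assuming each house maps to a single pidfile containing a bare PID; the pidfile paths and the function names read_pidfile, canonical_pids and orphans are placeholders, not the script's real ones.

```python
from pathlib import Path
from typing import Optional

# Hypothetical FLEET config: the house names come from this issue, but the
# pidfile paths are placeholders, not the real locations.
FLEET = {
    "ezra": "/root/wizards/ezra/gateway.pid",
    "bezalel": "/root/wizards/bezalel/gateway.pid",
    "allegro-primus": "/root/wizards/allegro-primus/gateway.pid",
    "bilbo": "/root/wizards/bilbo/gateway.pid",  # the entry that was missing
}


def read_pidfile(path: str) -> Optional[int]:
    """Return the PID stored in a pidfile, or None if it is missing or malformed."""
    try:
        text = Path(path).read_text().strip()
    except OSError:
        return None
    return int(text) if text.isdigit() else None


def canonical_pids(fleet: dict[str, str]) -> dict[int, str]:
    """Map every canonical PID to its house, across ALL houses in the fleet."""
    pids: dict[int, str] = {}
    for house, pidfile in fleet.items():
        pid = read_pidfile(pidfile)
        if pid is not None:
            pids[pid] = house
    return pids


def orphans(bound: set[int], fleet: dict[str, str]) -> set[int]:
    """A bound PID is an orphan only if no house's pidfile claims it."""
    canonical = canonical_pids(fleet)
    return {pid for pid in bound if pid not in canonical}
```

Leaving a house out of FLEET makes its canonical gateway indistinguishable from a stray one, which is exactly how PID 932554 was flagged before the fix.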

Reference: ezra/wizard-checkpoints#16