Commit Graph

5 Commits

Author SHA1 Message Date
Alexander Whitestone
31d81e76b3 feat(ci): poka-yoke runner health — provision + probe scripts (#1097)
Some checks failed
CI / test (pull_request) Failing after 5s
CI / validate (pull_request) Failing after 2s
Add three deliverables to make unregistered Gitea runners impossible to miss:

1. scripts/provision-runner.sh
   - Downloads and installs act_runner binary at a pinned version
   - Registers the runner with Gitea (fails non-zero if registration fails)
   - Creates and enables act_runner systemd service
   - Installs runner-health-probe timer as part of the same provisioning pass
   - Prints a poka-yoke checklist summary at the end

2. scripts/runner-health-probe.sh
   - Queries Gitea admin API for active runner count every run
   - Reports count to systemd journal (visible via journalctl)
   - On zero runners: writes P1 alert marker to /var/lib/act_runner/alerts/,
     attempts to restart act_runner service, re-queries to verify recovery,
     logs recovery or continued failure with actionable next steps
   - Cooldown (10 min) prevents restart storm

3. scripts/systemd/runner-health-probe.{service,timer}
   - Systemd oneshot service wrapping the probe script
   - Timer fires 2 min after boot then every 5 min (OnUnitActiveSec=5min)
   - Persistent=true so missed runs fire on recovery from downtime
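
The restart cooldown described for the probe could look roughly like the sketch below. It is not the code from scripts/runner-health-probe.sh: the function names, the `last-restart` marker file, and the `STATE_DIR` override are assumptions made for illustration. Only the 10 minute window and the alert directory come from the message above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the probe's restart cooldown; names and paths are assumptions.
set -u

COOLDOWN_SECONDS=600                                  # 10 min, per the commit message
STATE_DIR="${STATE_DIR:-/var/lib/act_runner/alerts}"  # alert directory named in the commit message
STAMP_FILE="$STATE_DIR/last-restart"                  # hypothetical marker file name

# Succeeds (exit 0) when no restart was recorded within the cooldown window.
cooldown_elapsed() {
    local now last
    now=$(date +%s)
    [ -f "$STAMP_FILE" ] || return 0
    last=$(cat "$STAMP_FILE")
    [ $(( now - last )) -ge "$COOLDOWN_SECONDS" ]
}

# Decide what to do for a given active-runner count.
# Prints one of: ok, restart, cooldown.
decide_action() {
    local count="$1"
    if [ "$count" -gt 0 ]; then
        echo ok
    elif cooldown_elapsed; then
        mkdir -p "$STATE_DIR"
        date +%s > "$STAMP_FILE"   # record the attempt before restarting
        echo restart               # caller would run: systemctl restart act_runner
    else
        echo cooldown
    fi
}
```

Recording the timestamp before the restart attempt means a restart that hangs or fails still counts against the cooldown, which is what keeps a broken runner from being restarted on every 5 minute timer tick.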

Fixes #1097
2026-04-07 10:31:58 -04:00
6ae9547145 fix(ci): repair JSON validation syntax, add repo-truth guard, copy robots.txt/index.html in Dockerfile
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
CI / test (pull_request) Failing after 8s
CI / validate (pull_request) Failing after 3s
2026-04-07 14:24:10 +00:00
52a1ade924 [claude] bezalel MemPalace field report + incremental mine script (#1072) (#1085)
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
2026-04-07 14:02:12 +00:00
68ee170bbb feat(ops): add cross-VPS fleet management script
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
- Local service control for alpha (4 agents) and beta (bezalel) hosts
- Status, restart, stop, start, update, and health commands
- Remote proxy via SSH with graceful fallback if keys not configured
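
One way the "graceful fallback" for the remote proxy might work is sketched below. This is not the fleet script itself: `run_on_host`, `SSH_CMD`, and `LOCAL_HOST` are hypothetical names. The sketch relies on `ssh` exiting with status 255 when the connection or authentication fails, and on `BatchMode=yes` to stop it from prompting for a password when no key is configured.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; only the host names alpha/beta come from the commit message.
set -u

SSH_CMD="${SSH_CMD:-ssh}"   # overridable so the fallback path can be exercised offline

# Run a command on a host: locally when the host is this machine, otherwise
# over SSH. If SSH itself fails (exit 255), warn and continue instead of aborting.
run_on_host() {
    local host="$1"; shift
    if [ "$host" = "${LOCAL_HOST:-$(hostname -s)}" ]; then
        "$@"
        return
    fi
    local rc=0
    "$SSH_CMD" -o BatchMode=yes -o ConnectTimeout=5 "$host" "$@" || rc=$?
    if [ "$rc" -eq 255 ]; then
        echo "skip $host: SSH unavailable (keys not configured?)" >&2
        return 0
    fi
    return "$rc"   # propagate the remote command's own exit status
}
```

Treating only exit status 255 as "SSH unavailable" keeps a genuinely failing remote command visible to the caller, rather than having it silently reported as a skipped host.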

Closes #910
2026-04-07 02:48:05 +00:00
fcf07357c1 [claude] Reassign Fenrir's orphaned issues to active wizards (#823) (#892)
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
2026-04-06 18:20:13 +00:00