feat(ci): poka-yoke runner health — provision + probe scripts (#1097)
Add three deliverables to make unregistered Gitea runners impossible to miss:

1. scripts/provision-runner.sh
   - Downloads and installs the act_runner binary at a pinned version
   - Registers the runner with Gitea (exits non-zero if registration fails)
   - Creates and enables the act_runner systemd service
   - Installs the runner-health-probe timer in the same provisioning pass
   - Prints a poka-yoke checklist summary at the end

2. scripts/runner-health-probe.sh
   - Queries the Gitea admin API for the active runner count on every run
   - Reports the count to the systemd journal (visible via journalctl)
   - On zero runners: writes a P1 alert marker to /var/lib/act_runner/alerts/,
     attempts to restart the act_runner service, re-queries to verify recovery,
     and logs recovery or continued failure with actionable next steps
   - A 10 min cooldown prevents restart storms

3. scripts/systemd/runner-health-probe.{service,timer}
   - Systemd oneshot service wrapping the probe script
   - Timer fires 2 min after boot, then every 5 min (OnUnitActiveSec=5min)
   - Persistent=true is set to catch up on missed runs after downtime; systemd
     applies Persistent= only to OnCalendar= timers, so with these monotonic
     triggers it is OnBootSec=2min that restarts the schedule after a reboot

Fixes #1097
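The probe script itself is not part of the hunks shown here, so the following is only a minimal sketch of the zero-runner decision and the 10 min cooldown described above. The function names (`cooldown_elapsed`, `decide_action`) and the stamp-file mechanism are assumptions made for illustration, not the committed implementation.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the probe's decision logic; not the committed script.

COOLDOWN_SECS=600   # 10 min cooldown, as stated in the commit message

# Succeeds (returns 0) if no restart was recorded within the cooldown window.
# $1: stamp file holding the epoch seconds of the last restart attempt
# $2: current epoch seconds (defaults to now; a parameter so it can be tested)
cooldown_elapsed() {
    local stamp_file="$1"
    local now="${2:-$(date +%s)}"
    local last
    [ -r "$stamp_file" ] || return 0        # no earlier attempt recorded
    last=$(cat "$stamp_file")
    [ $(( now - last )) -ge "$COOLDOWN_SECS" ]
}

# Prints the action the probe would take for a given active runner count.
# $1: runner count, $2: stamp file, $3: current epoch seconds (optional)
decide_action() {
    local count="$1"
    local stamp_file="$2"
    local now="${3:-$(date +%s)}"
    if [ "$count" -gt 0 ]; then
        echo "ok"
    elif cooldown_elapsed "$stamp_file" "$now"; then
        echo "$now" > "$stamp_file"         # record this restart attempt
        echo "restart"
    else
        echo "cooldown"
    fi
}
```

A caller would map `restart` to writing the alert marker and restarting act_runner, and `cooldown` to logging only, so that repeated zero-runner results inside the window cannot trigger a restart storm.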
16
scripts/systemd/runner-health-probe.service
Normal file
@@ -0,0 +1,16 @@
[Unit]
Description=Gitea Runner Health Probe (poka-yoke zero-runner detection)
Documentation=https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/1097
After=network.target act_runner.service

[Service]
Type=oneshot
ExecStart=/root/wizards/the-nexus/scripts/runner-health-probe.sh
StandardOutput=journal
StandardError=journal
Environment=HOME=/root
# Token can be set here or via /etc/act_runner/gitea-probe-token file
# EnvironmentFile=/etc/act_runner/probe.env

[Install]
WantedBy=multi-user.target
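The comment in the [Service] section says the token can come either from the unit's environment or from /etc/act_runner/gitea-probe-token. How the probe script reads it is not shown in this diff; one plausible lookup order is sketched below. The variable name `GITEA_PROBE_TOKEN` and the function `resolve_probe_token` are hypothetical, and only the file path is taken from the unit file.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: GITEA_PROBE_TOKEN and resolve_probe_token are
# illustrative names, not taken from the committed probe script.

TOKEN_FILE_DEFAULT=/etc/act_runner/gitea-probe-token   # path from the unit comment

# Prints the token on stdout; returns 1 if none can be found.
# $1: token file to read (optional, defaults to TOKEN_FILE_DEFAULT)
resolve_probe_token() {
    local token_file="${1:-$TOKEN_FILE_DEFAULT}"
    if [ -n "${GITEA_PROBE_TOKEN:-}" ]; then
        # The environment (Environment= or EnvironmentFile=) takes precedence.
        printf '%s\n' "$GITEA_PROBE_TOKEN"
    elif [ -r "$token_file" ]; then
        # Strip the trailing newline and any stray whitespace from the file.
        tr -d '[:space:]' < "$token_file"
        echo
    else
        echo "no probe token: set GITEA_PROBE_TOKEN or create $token_file" >&2
        return 1
    fi
}
```

Failing loudly when neither source exists fits the poka-yoke intent: a probe that cannot authenticate should surface as an error in the journal, not as a silent zero count.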
14
scripts/systemd/runner-health-probe.timer
Normal file
@@ -0,0 +1,14 @@
[Unit]
Description=Gitea Runner Health Probe — fires every 5 minutes (poka-yoke #1097)
Documentation=https://forge.alexanderwhitestone.com/Timmy_Foundation/the-nexus/issues/1097

[Timer]
# Start 2 minutes after boot (let network and act_runner settle)
OnBootSec=2min
# Then fire every 5 minutes
OnUnitActiveSec=5min
# Meant to re-fire missed runs; systemd honors Persistent= only with OnCalendar=
Persistent=true

[Install]
WantedBy=timers.target
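For readers installing these units by hand instead of through scripts/provision-runner.sh (which is not shown in this diff), the steps can be sketched roughly as follows. The `UNIT_SRC` and `UNIT_DST` defaults, the `run` helper, and the `DRY_RUN` switch are assumptions for illustration. The service's ExecStart also expects the probe script at /root/wizards/the-nexus/scripts/runner-health-probe.sh.

```shell
#!/usr/bin/env bash
# Sketch only: paths and helper names are assumptions, not the provisioning script.

UNIT_SRC="${UNIT_SRC:-scripts/systemd}"       # where the unit files live in the repo
UNIT_DST="${UNIT_DST:-/etc/systemd/system}"   # conventional location for local units

# Runs a command, or just prints it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Copies both units into place and starts the timer.
# The timer is enabled, not the service: the timer schedules the oneshot.
install_probe_units() {
    run install -m 0644 "$UNIT_SRC/runner-health-probe.service" "$UNIT_DST/" &&
    run install -m 0644 "$UNIT_SRC/runner-health-probe.timer" "$UNIT_DST/" &&
    run systemctl daemon-reload &&
    run systemctl enable --now runner-health-probe.timer &&
    run systemctl list-timers runner-health-probe.timer
}
```

Running `DRY_RUN=1; install_probe_units` prints the five commands without touching the system, which is a cheap way to review them before running as root. Afterwards, `journalctl -u runner-health-probe.service` shows the probe's output.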