[FIX] Fleet Deployment Safety — Canary Rollout, Key Validation, Phantom Agent Cleanup #394
Fix: Prevent Repeat of Fleet Outage (RCA #393)
Ref: #393
Task 1: Canary Rollout Protocol
Owner: Timmy
Acceptance: A documented procedure exists in timmy-config/ops/fleet-deploy.md that mandates:
- curl returning HTTP 200 before writing to any config
- journalctl for errors
Task 2: VPS API Key Health Check
Owner: Ezra
Acceptance:
- .env file: KIMI_API_KEY, OPENROUTER_API_KEY, ANTHROPIC_TOKEN
- curl each provider endpoint from each VPS, all return 200
Task 3: Kill Phantom Agents on Allegro VPS
Owner: Allegro
Acceptance:
- (hermes-ezra.service on Allegro) — either fix the Telegram token or disable the service
- (hermes-bezalel.service on Allegro) — same
- TELEGRAM_BOT_TOKEN=*** or token < 40 chars must be stopped
- systemctl is-active shows disabled for phantom agents, or valid tokens installed
Task 4: VPS Hermes Version Audit
Owner: Timmy
Acceptance:
- timmy-config/ops/fleet-versions.md
- hermes --version output from each VPS
Task 5: Alexander's Mandate — Deployment Gate
Owner: Alexander (@rockachopa)
Decision needed: Should autonomous agents be allowed to restart VPS services without human approval? Options:
- systemctl restart hermes-* on VPS
This is a policy decision, not a technical one. The fleet needs a rule.
Blame Ledger (from RCA #393)
Definition of Done
All 5 tasks have console-provable acceptance criteria met. No more phantom agents. No more untested deploys.
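A minimal sketch of how the key and token criteria from Tasks 2 and 3 could be made console-provable. It covers only the offline part (reading the contents of a .env file); the curl probes against provider endpoints are not modeled here. The function names (parse_env, missing_keys, is_phantom_token, audit_env) and the simple KEY=VALUE parsing are assumptions for illustration, not existing fleet tooling. The key names and the 40-character threshold are taken from the task text.

```python
# Hypothetical helper, not part of the fleet tooling.
# Key names come from the Task 2 acceptance text.
REQUIRED_KEYS = ("KIMI_API_KEY", "OPENROUTER_API_KEY", "ANTHROPIC_TOKEN")

# Task 3 rule: a token shorter than 40 characters counts as invalid.
MIN_TOKEN_LENGTH = 40


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped.

    Assumption: no multi-line values, no `export` prefixes.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(env: dict, required=REQUIRED_KEYS) -> list:
    """Return the required keys that are absent or empty, in declared order."""
    return [key for key in required if not env.get(key)]


def is_phantom_token(token: str) -> bool:
    """True for the masked placeholder (***) or any token under 40 chars."""
    return token == "***" or len(token) < MIN_TOKEN_LENGTH


def audit_env(text: str) -> list:
    """Return human-readable problems found in the contents of a .env file."""
    env = parse_env(text)
    problems = [f"missing or empty: {key}" for key in missing_keys(env)]
    # Only judge the Telegram token when the file actually defines one.
    if "TELEGRAM_BOT_TOKEN" in env and is_phantom_token(env["TELEGRAM_BOT_TOKEN"]):
        problems.append("TELEGRAM_BOT_TOKEN is a placeholder or too short")
    return problems
```

Calling audit_env on the text of a VPS's .env file returns a list of problems; an empty list means these offline checks passed. Treating an empty value as missing is a choice made in this sketch, not something the issue specifies.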
🏷️ Automated Triage Check
Timestamp: 2026-04-04T23:15:04.560650
Agent: Allegro Heartbeat
This issue has been identified as needing triage:
Checklist
Context
Automated triage from Allegro 15-minute heartbeat