Files
timmy-home/docs/FLEET_PHASE_1_SURVIVAL.md
Alexander Whitestone b90a15baca
Some checks failed
Smoke Test / smoke (pull_request) Failing after 11s
feat: codify phase-1 survival state (#548)
2026-04-15 00:24:25 -04:00

2.2 KiB

[PHASE-1] Survival - Keep the Lights On

Phase 1 is the manual-clicker stage of the fleet. The machines exist. The services exist. The human is still the automation loop.

Phase Definition

  • Current state: fleet exists, agents run, everything important still depends on human vigilance.
  • Resources tracked here: Capacity, Uptime.
  • Next phase: [PHASE-2] Automation - Self-Healing Infrastructure

Current Buildings

  • VPS hosts: Ezra, Allegro, Bezalel
  • Agents: Timmy harness, Code Claw heartbeat, Gemini AI Studio worker
  • Gitea forge
  • Evennia worlds

Current Resource Snapshot

  • Fleet operational: yes
  • Uptime baseline: 0.0%
  • Days at or above 95% uptime: 0
  • Capacity utilization: 0.0%

Next Phase Trigger

To unlock [PHASE-2] Automation - Self-Healing Infrastructure, the fleet must hold both of these conditions at once:

  • Uptime >= 95% for 30 consecutive days
  • Capacity utilization > 60%
  • Current trigger state: NOT READY

Missing Requirements

  • Uptime 0.0% / 95.0%
  • Days at or above 95% uptime: 0/30
  • Capacity utilization 0.0% / >60.0%

Manual Clicker Interpretation

Paperclips analogy: Phase 1 = Manual clicker. You ARE the automation. Every restart, every SSH, every check is a manual click.

Manual Clicks Still Required

  • Restart agents and services by hand when a node goes dark.
  • SSH into machines to verify health, disk, and memory.
  • Check Gitea, relay, and world services manually before and after changes.
  • Act as the scheduler when automation is missing or only partially wired.

Repo Signals Already Present

  • scripts/fleet_health_probe.sh — Automated health probe exists and can supply the uptime baseline for the next phase.
  • scripts/fleet_milestones.py — Milestone tracker exists, so survival achievements can be narrated and logged.
  • scripts/auto_restart_agent.sh — Auto-restart tooling already exists as phase-2 groundwork.
  • scripts/backup_pipeline.sh — Backup pipeline scaffold exists for post-survival automation work.
  • infrastructure/timmy-bridge/reports/generate_report.py — Bridge reporting exists and can summarize heartbeat-driven uptime.

Notes

  • The fleet is alive, but the human is still the control loop.
  • Phase 1 is about naming reality plainly so later automation has a baseline to beat.