feat: FLEET-003 - Fleet capacity inventory with resource baselines #353

Merged
Timmy merged 3 commits from timmy/fleet-capacity-inventory into main 2026-04-07 16:23:23 +00:00
Owner

Fleet Phase 1-3 + Phase 5 Infrastructure

Complete fleet build-out inspired by Paperclips progression architecture.

Files Added (all in fleet/ directory):

  • topology.md — Full inventory of 4 machines, processes, services, credentials
  • health_check.py — 5-min cron health check with Telegram alerts (running)
  • capacity-inventory.md — Resource baselines, utilization, risk assessment
  • milestones.md — 22 milestone messages + 11 Fibonacci uptime milestones
  • resource_tracker.py — Capacity/Uptime/Innovation tension model with Phase gates
  • auto_restart.py — Self-healing process monitor (3-retry max, cooldown)
  • delegation.py — Cross-agent task auto-assignment (keyword-based)
  • model-fallback.sh — 4-model local inference fallback chain
  • agent_lifecycle.py — Agent provision/deploy/monitor/retire manager

Fleet Health (live):

  • All 3 VPS online (Ezra, Allegro, Bezalel)
  • Gitea UP
  • Ollama running with 6 local models
  • Auto-restart agent monitoring 8 services across 4 machines

Resource Status:

  • Fleet capacity utilization: ~15-20%
  • Innovation: GENERATING (under 70% threshold)
  • Uptime: 100% VPS, 95.8% Gitea

Issues Fixed:

timmy-home#554 (topology), #555 (health check), #556 (capacity), #557 (milestones), #558 (resource tracker), #560 (auto-restart), #563 (delegation), #564 (model fallback), #565 (agent lifecycle)

Rockachopa added 1 commit 2026-04-07 15:58:55 +00:00
Full resource audit of all 4 machines (3 VPS + 1 Mac) with:
- vCPU, RAM, disk, swap per machine
- Key processes sorted by resource usage
- Capacity utilization: ~15-20%, Innovation GENERATING
- Uptime baseline: Ezra/Allegro/Bezalel 100%, Gitea 95.8%
- Fibonacci uptime milestones (5 of 6 REACHED)
- Risk assessment (Ezra disk 72%, Bezalel 2GB RAM, Ezra CPU 269%)
- Recommendations across all phases

Fixes timmy-home#556 (FLEET-003)
Rockachopa added 1 commit 2026-04-07 16:03:50 +00:00
FLEET-004: 22 milestone messages across 6 phases + 11 Fibonacci uptime milestones.
FLEET-005: Resource tracking system — Capacity/Uptime/Innovation tension model.
  - Tracks capacity spending and regeneration (2/hr baseline)
  - Innovation generates only when utilization < 70% (5/hr scaled)
  - Fibonacci uptime milestone detection (95% through 99.5%)
  - Phase gate checks (P2: 95% uptime, P3: 95% + 100 innovation, P5: 95% + 500)
  - CLI: status, regen commands

Fixes timmy-home#557 (FLEET-004), #558 (FLEET-005)
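The tension model described in this commit can be sketched roughly as follows. This is an illustrative sketch, not the contents of `resource_tracker.py`: the rates, the 70% threshold, and the phase gates come from the commit message, while the names (`FleetState`, `innovation_gain`, `regen`, `unlocked_phases`) and the linear interpretation of "5/hr scaled" are assumptions.

```python
from dataclasses import dataclass

# Values taken from the commit message.
CAPACITY_REGEN_PER_HOUR = 2.0
INNOVATION_RATE_PER_HOUR = 5.0
UTILIZATION_THRESHOLD = 0.70

# Phase gates from the commit message: phase -> (min uptime %, min innovation).
PHASE_GATES = {2: (95.0, 0.0), 3: (95.0, 100.0), 5: (95.0, 500.0)}


@dataclass
class FleetState:
    """Hypothetical container for the three tracked resources."""
    capacity: float = 0.0
    innovation: float = 0.0
    uptime_pct: float = 100.0


def innovation_gain(utilization: float, hours: float) -> float:
    """Innovation generated over `hours` at a given utilization (0.0 to 1.0).

    ASSUMPTION: "scaled" is read as linear in the headroom below the
    threshold, so an idle fleet earns the full 5/hr and a fleet at or
    above 70% utilization earns nothing.
    """
    if utilization >= UTILIZATION_THRESHOLD:
        return 0.0
    headroom = (UTILIZATION_THRESHOLD - utilization) / UTILIZATION_THRESHOLD
    return INNOVATION_RATE_PER_HOUR * headroom * hours


def regen(state: FleetState, utilization: float, hours: float) -> FleetState:
    """Apply baseline capacity regeneration and any innovation gain."""
    state.capacity += CAPACITY_REGEN_PER_HOUR * hours
    state.innovation += innovation_gain(utilization, hours)
    return state


def unlocked_phases(state: FleetState) -> list:
    """Return the phases whose uptime and innovation gates are both met."""
    return [
        phase
        for phase, (min_uptime, min_innovation) in sorted(PHASE_GATES.items())
        if state.uptime_pct >= min_uptime and state.innovation >= min_innovation
    ]
```

Under these assumptions, a fleet at the reported ~15-20% utilization sits well below the threshold and accrues innovation every hour, which is consistent with the GENERATING status above.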
Rockachopa added 1 commit 2026-04-07 16:04:37 +00:00
Daemon that monitors key services and restarts them automatically:
- Local: hermes-gateway, ollama, codeclaw-heartbeat
- Ezra: gitea, nginx, hermes-agent
- Allegro: hermes-agent
- Bezalel: hermes-agent, evennia
- Max 3 restart attempts per service per cycle (prevents loops)
- 1-hour cooldown after max retries with Telegram escalation
- Restart log at ~/.local/timmy/fleet-health/restarts.log
- Modes: one-off check by default, --status for restart history, --daemon for continuous monitoring

Fixes timmy-home#560
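The retry and cooldown behaviour listed above might look something like the sketch below. It is not the code in `auto_restart.py`: the limits (3 attempts, 1-hour cooldown, escalation) come from the commit message, but the `RestartPolicy` class and its `is_up`, `restart`, `escalate`, and `clock` hooks are hypothetical names, injected so the logic can be exercised without touching real services or Telegram.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict

MAX_ATTEMPTS = 3          # max restart attempts per service per cycle
COOLDOWN_SECONDS = 3600   # 1-hour cooldown after the attempts are exhausted


@dataclass
class RestartPolicy:
    """Hypothetical policy object; the callables are supplied by the caller."""
    is_up: Callable[[str], bool]      # health probe for one service
    restart: Callable[[str], None]    # issues a restart for one service
    escalate: Callable[[str], None]   # e.g. sends an alert to a human
    clock: Callable[[], float] = time.time
    cooldown_until: Dict[str, float] = field(default_factory=dict)

    def check(self, service: str) -> str:
        """Run one monitoring cycle for `service` and report what happened."""
        now = self.clock()
        if self.is_up(service):
            return "up"
        # Still cooling down after a failed cycle: do not retry yet.
        if now < self.cooldown_until.get(service, 0.0):
            return "cooldown"
        for _ in range(MAX_ATTEMPTS):
            self.restart(service)
            if self.is_up(service):
                return "restarted"
        # All attempts failed: back off for an hour and hand over to a human.
        self.cooldown_until[service] = now + COOLDOWN_SECONDS
        self.escalate(service)
        return "escalated"
```

Capping attempts per cycle and then backing off is what prevents a permanently broken service from being restarted in a tight loop; a daemon mode would call `check` for each monitored service on every pass.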
Timmy approved these changes 2026-04-07 16:23:20 +00:00
Timmy left a comment
Owner

Approved during fleet check.

Timmy merged commit d023512858 into main 2026-04-07 16:23:23 +00:00
Rockachopa reviewed 2026-04-07 16:23:27 +00:00
Rockachopa left a comment
Author
Owner

Approved - fleet infrastructure ready

Rockachopa reviewed 2026-04-07 16:25:24 +00:00
Rockachopa left a comment
Author
Owner

Approved
