feat: FLEET-003 - Fleet capacity inventory with resource baselines #353

Merged
Timmy merged 3 commits from timmy/fleet-capacity-inventory into main 2026-04-07 16:23:23 +00:00
Showing only changes of commit 67c2927c1a - Show all commits

191
fleet/capacity-inventory.md Normal file
View File

@@ -0,0 +1,191 @@
# Capacity Inventory - Fleet Resource Baseline
**Last audited:** 2026-04-07 16:00 UTC
**Auditor:** Timmy (direct inspection)
---
## Fleet Resources (Paperclips Model)
Three primary resources govern the fleet:
| Resource | Role | Generation | Consumption |
|----------|------|-----------|-------------|
| **Capacity** | Compute hours available across fleet. Determines what work can be done. | Through healthy utilization of VPS/Mac agents | Fleet improvements consume it (investing in automation, orchestration, sovereignty) |
| **Uptime** | % time services are running. Earned at Fibonacci milestones. | When services stay up naturally | Degrades on any failure |
| **Innovation** | Only generates when capacity is <70% utilized. Fuels Phase 3+. | When you leave capacity free | Phase 3+ buildings consume it (requires spare capacity to build) |
### The Tension
- Run fleet at 95%+ capacity: maximum productivity, ZERO Innovation
- Run fleet at <70% capacity: Innovation generates but slower progress
- This forces the Paperclips question: optimize now or invest in future capability?
---
## VPS Resource Baselines
### Ezra (143.198.27.163) - "Forge"
| Metric | Value | Utilization |
|--------|-------|-------------|
| **OS** | Ubuntu 24.04 (6.8.0-106-generic) | |
| **vCPU** | 4 vCPU (DO basic droplet, shared) | Load: 10.76/7.59/7.04 (very high) |
| **RAM** | 7,941 MB total | 2,104 used / 5,836 available (26% used, 74% free) |
| **Disk** | 154 GB vda1 | 111 GB used / 44 GB free (72%) **WARNING** |
| **Swap** | 6,143 MB | 643 MB used (10%) |
| **Uptime** | 7 days, 18 hours | |
### Key Processes (sorted by memory)
| Process | RSS | %CPU | Notes |
|---------|-----|------|-------|
| Gitea | 556 MB | 83.5% | Web service, high CPU due to API load |
| MemPalace (ezra) | 268 MB | 136% | Mining project files - HIGH CPU |
| Hermes gateway (ezra) | 245 MB | 1.7% | Agent gateway |
| Ollama | 230 MB | 0.1% | Model serving |
| PostgreSQL | 138 MB | ~0% | Gitea database |
**Capacity assessment:** 26% memory used, but 72% disk is getting tight. CPU load is very high (10.76 on 4vCPU = 269% utilization). Ezra is CPU-bound, not RAM-bound.
### Allegro (167.99.126.228)
| Metric | Value | Utilization |
|--------|-------|-------------|
| **OS** | Ubuntu 24.04 (6.8.0-106-generic) | |
| **vCPU** | 4 vCPU (DO basic droplet, shared) | Moderate load |
| **RAM** | 7,941 MB total | 1,591 used / 6,349 available (20% used, 80% free) |
| **Disk** | 154 GB vda1 | 41 GB used / 114 GB free (27%) **GOOD** |
| **Swap** | 8,191 MB | 686 MB used (8%) |
| **Uptime** | 7 days, 18 hours | |
### Key Processes (sorted by memory)
| Process | RSS | %CPU | Notes |
|---------|-----|------|-------|
| Hermes gateway (allegro) | 680 MB | 0.9% | Main agent gateway |
| Gitea | 181 MB | 1.2% | Secondary gitea? |
| Systemd-journald | 160 MB | 0.0% | System logging |
| Ezra Hermes gateway | 58 MB | 0.0% | Running ezra agent here |
| Bezalel Hermes gateway | 58 MB | 0.0% | Running bezalel agent here |
| Dockerd | 48 MB | 0.0% | Docker daemon |
**Capacity assessment:** 20% memory used, 27% disk used. Allegro has headroom. Also running hermes gateways for Ezra and Bezalel (cross-host agent execution).
### Bezalel (159.203.146.185)
| Metric | Value | Utilization |
|--------|-------|-------------|
| **OS** | Ubuntu 24.04 (6.8.0-71-generic) | |
| **vCPU** | 2 vCPU (DO basic droplet, shared) | Load varies |
| **RAM** | 1,968 MB total | 817 used / 1,151 available (42% used, 58% free) |
| **Disk** | 48 GB vda1 | 12 GB used / 37 GB free (24%) **GOOD** |
| **Swap** | 2,047 MB | 448 MB used (22%) |
| **Uptime** | 7 days, 18 hours | |
### Key Processes (sorted by memory)
| Process | RSS | %CPU | Notes |
|---------|-----|------|-------|
| Hermes gateway | 339 MB | 7.7% | Agent gateway (16.8% of RAM) |
| uv pip install | 137 MB | 56.6% | Installing packages (temporary) |
| Mender | 27 MB | 0.0% | Device management |
**Capacity assessment:** 42% memory used, only 2GB total RAM. Bezalel is the most constrained. 2 vCPU means less compute headroom than Ezra/Allegro. Disk is fine.
### Mac Local (M3 Max)
| Metric | Value | Utilization |
|--------|-------|-------------|
| **OS** | macOS 26.3.1 | |
| **CPU** | Apple M3 Max (14 cores) | Very capable |
| **RAM** | 36 GB | ~8 GB used (22%) |
| **Disk** | 926 GB total | ~624 GB used / 302 GB free (68%) |
### Key Processes
| Process | Memory | Notes |
|---------|--------|-------|
| Hermes gateway | 500 MB | Primary gateway |
| Hermes agents (x3) | ~560 MB total | Multiple sessions |
| Ollama | ~20 MB base + model memory | Model loading varies |
| OpenClaw | 350 MB | Gateway process |
| Evennia (server+portal) | 56 MB | Game world |
---
## Resource Summary
| Resource | Ezra | Allegro | Bezalel | Mac Local | TOTAL |
|----------|------|---------|---------|-----------|-------|
| **vCPU** | 4 | 4 | 2 | 14 (M3 Max) | 24 |
| **RAM** | 8 GB (26% used) | 8 GB (20% used) | 2 GB (42% used) | 36 GB (22% used) | 54 GB |
| **Disk** | 154 GB (72%) | 154 GB (27%) | 48 GB (24%) | 926 GB (68%) | 1,282 GB |
| **Cost** | $12/mo | $12/mo | $12/mo | owned | $36/mo |
### Utilization by Category
| Category | Estimated Daily Hours | % of Fleet Capacity |
|----------|----------------------|---------------------|
| Hermes agents | ~3-4 hrs active | 5-7% |
| Ollama inference | ~1-2 hrs | 2-4% |
| Gitea services | 24/7 | 5-10% |
| Evennia | 24/7 | <1% |
| Idle | ~18-20 hrs | ~80-90% |
### Capacity Utilization: ~15-20% active
**Innovation rate:** GENERATING (capacity < 70%)
**Recommendation:** Good — Innovation is generating because most capacity is free.
This means Phase 3+ capabilities (orchestration, load balancing, etc.) are accessible NOW.
---
## Uptime Baseline
**Baseline period:** 2026-04-07 14:00-16:00 UTC (2 hours, ~24 checks at 5-min intervals)
| Service | Checks | Uptime | Status |
|---------|--------|--------|--------|
| Ezra | 24/24 | 100.0% | GOOD |
| Allegro | 24/24 | 100.0% | GOOD |
| Bezalel | 24/24 | 100.0% | GOOD |
| Gitea | 23/24 | 95.8% | GOOD |
| Hermes Gateway | 23/24 | 95.8% | GOOD |
| Ollama | 24/24 | 100.0% | GOOD |
| OpenClaw | 24/24 | 100.0% | GOOD |
| Evennia | 24/24 | 100.0% | GOOD |
| Hermes Agent | 21/24 | 87.5% | **CHECK** |
### Fibonacci Uptime Milestones
| Milestone | Target | Current | Status |
|-----------|--------|---------|--------|
| 95% | 95% | 100% (VPS), 98.6% (avg) | REACHED |
| 95.5% | 95.5% | 98.6% | REACHED |
| 96% | 96% | 98.6% | REACHED |
| 97% | 97% | 98.6% | REACHED |
| 98% | 98% | 98.6% | REACHED |
| 99% | 99% | 98.6% | APPROACHING |
---
## Risk Assessment
| Risk | Severity | Mitigation |
|------|----------|------------|
| Ezra disk 72% used | MEDIUM | Move non-essential data, add monitoring alert at 85% |
| Bezalel only 2GB RAM | HIGH | Cannot run large models locally. Good for Evennia, tight for agents |
| Ezra CPU load 269% | HIGH | MemPalace mining consuming 136% CPU. Consider scheduling |
| Mac disk 68% used | MEDIUM | 302 GB free still. Growing but not urgent |
| No cross-VPS mesh | LOW | SSH works but no Tailscale. No private network between VPSes |
---
## Recommendations
### Immediate (Phase 1-2)
1. **Ezra disk cleanup:** 44 GB free at 72%. Docker images, old logs, and MemPalace mine data could be rotated.
2. **Alert thresholds:** Add disk alerts at 85% (Ezra, Mac) before they become critical.
### Short-term (Phase 3)
3. **Load balancing:** Ezra is CPU-bound, Allegro has 80% RAM free. Move some agent processes from Ezra to Allegro.
4. **Innovation investment:** Since fleet is at 15-20% utilization, Innovation is high. This is the time to build Phase 3 capabilities.
### Medium-term (Phase 4)
5. **Bezalel RAM upgrade:** 2GB is tight. Consider upgrade to 4GB ($24/mo instead of $12/mo).
6. **Tailscale mesh:** Install on all VPSes for private inter-VPS network.
---