Compare commits


2 Commits

Author SHA1 Message Date
Alexander Whitestone
228e46a330 feat: FLEET-004/005 — Milestone messages and resource tracker
FLEET-004: 22 milestone messages across 6 phases + 11 Fibonacci uptime milestones.
FLEET-005: Resource tracking system — Capacity/Uptime/Innovation tension model.
  - Tracks capacity spending and regeneration (2/hr baseline)
  - Innovation generates only when utilization < 70% (5/hr scaled)
  - Fibonacci uptime milestone detection (95% through 99.5%)
  - Phase gate checks (P2: 95% uptime, P3: 95% + 100 innovation, P5: 95% + 500)
  - CLI: status, regen commands

Fixes timmy-home#557 (FLEET-004), #558 (FLEET-005)
2026-04-07 12:03:45 -04:00
Alexander Whitestone
67c2927c1a feat: FLEET-003 — Capacity inventory with resource baselines
Full resource audit of all 4 machines (3 VPS + 1 Mac) with:
- vCPU, RAM, disk, swap per machine
- Key processes sorted by resource usage
- Capacity utilization: ~15-20%, Innovation GENERATING
- Uptime baseline: Ezra/Allegro/Bezalel 100%, Gitea 95.8%
- Fibonacci uptime milestones (5 of 6 REACHED)
- Risk assessment (Ezra disk 72%, Bezalel 2GB RAM, Ezra CPU 269%)
- Recommendations across all phases

Fixes timmy-home#556 (FLEET-003)
2026-04-07 11:58:16 -04:00
7 changed files with 564 additions and 295 deletions

View File

@@ -1,4 +0,0 @@
venv/
__pycache__/
*.pyc
.env

View File

@@ -1,140 +0,0 @@
# CrewAI Evaluation for Phase 2 Integration
**Date:** 2026-04-07
**Issue:** [#358 ORCHESTRATOR-4] Evaluate CrewAI for Phase 2 integration
**Author:** Ezra
**House:** hermes-ezra
## Summary
CrewAI was installed, a 2-agent proof-of-concept crew was built, and an operational test was attempted against issue #358. Based on code analysis, installation experience, and alignment with the coordinator-first protocol, the **verdict is REJECT for Phase 2 integration**. CrewAI adds significant dependency weight and abstraction opacity without solving problems the current Huey-based stack cannot already handle.
---
## 1. Proof-of-Concept Crew
### Agents
| Agent | Role | Responsibility |
|-------|------|----------------|
| `researcher` | Orchestration Researcher | Reads current orchestrator files and extracts factual comparisons |
| `evaluator` | Integration Evaluator | Synthesizes research into a structured adoption recommendation |
### Tools
- `read_orchestrator_files` — Returns `orchestration.py`, `tasks.py`, `bin/timmy-orchestrator.sh`, and `docs/coordinator-first-protocol.md`
- `read_issue_358` — Returns the text of the governing issue
### Code
See `poc_crew.py` in this directory for the full implementation.
---
## 2. Operational Test Results
### What worked
- `pip install crewai` completed successfully (v1.13.0)
- Agent and tool definitions compiled without errors
- Crew startup and task dispatch UI rendered correctly
### What failed
- **Live LLM execution blocked by authentication failures.** Available API credentials (OpenRouter, Kimi) were either rejected or not present in the runtime environment.
- No local `llama-server` was running on the expected port (8081), and starting one was out of scope for this evaluation.
### Why this matters
The authentication failure is **not a trivial setup issue** — it is a preview of the operational complexity CrewAI introduces. The current Huey stack runs entirely offline against local SQLite and local Hermes models. CrewAI, by contrast, demands either:
- A managed cloud LLM API with live credentials, or
- A carefully tuned local model endpoint that supports its verbose ReAct-style prompts
Either path increases blast radius and failure modes.
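To make the second path concrete, here is a hedged sketch of pointing CrewAI's `LLM` at a local `llama-server` on the expected port — the model alias is an assumption, and local OpenAI-compatible servers typically ignore the key:

```python
# Hypothetical local-endpoint wiring — NOT tested in this evaluation.
# The model alias "openai/hermes-local" and the /v1 path are assumptions;
# llama-server exposes an OpenAI-compatible API on the configured port.
from crewai import LLM

local_llm = LLM(
    model="openai/hermes-local",          # assumed alias for the local model
    base_url="http://127.0.0.1:8081/v1",  # the port this evaluation expected
    api_key="sk-local-unused",            # placeholder; local servers often ignore it
)
```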
---
## 3. Current Custom Orchestrator Analysis
### Stack
- **Huey** (`orchestration.py`) — SQLite-backed task queue, ~6 lines of initialization (see the sketch after this list)
- **tasks.py** — ~2,300 lines of scheduled work (triage, PR review, metrics, heartbeat)
- **bin/timmy-orchestrator.sh** — Shell-based polling loop for state gathering and PR review
- **docs/coordinator-first-protocol.md** — Intake → Triage → Route → Track → Verify → Report
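To ground the "~6 lines of initialization" claim, a minimal sketch of a SQLite-backed Huey setup — illustrative names, not the literal contents of `orchestration.py`:

```python
# Minimal SQLite-backed Huey queue — a sketch, not the real orchestration.py.
from huey import SqliteHuey, crontab

huey = SqliteHuey(filename="/var/lib/timmy/queue.db")  # assumed path

@huey.periodic_task(crontab(minute="*/15"))
def heartbeat():
    # Scheduled work (triage, PR review, metrics) lives in tasks.py.
    ...
```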
### Strengths
1. **Sovereignty** — No external SaaS dependency for queue execution. SQLite is local and inspectable.
2. **Gitea as truth** — All state mutations are visible in the forge. Local-only state is explicitly advisory.
3. **Simplicity** — Huey has a tiny surface area. A human can read `orchestration.py` in seconds.
4. **Tool-native** — `tasks.py` calls Hermes directly via `subprocess.run([HERMES_PYTHON, ...])`. No framework indirection.
5. **Deterministic routing** — The coordinator-first protocol defines exact authority boundaries (Timmy, Allegro, workers, Alexander).
### Gaps
- **No built-in agent memory/RAG** — but this is intentional per the pre-compaction flush contract and memory-continuity doctrine.
- **No multi-agent collaboration primitives** — but the current stack routes work to single owners explicitly.
- **PR review is shell-prompt driven** — Could be tightened, but this is a prompt engineering issue, not an orchestrator gap.
---
## 4. CrewAI Capability Analysis
### What CrewAI offers
- **Agent roles** — Declarative backstory/goal/role definitions
- **Task graphs** — Sequential, hierarchical, or parallel task execution
- **Tool registry** — Pydantic-based tool schemas with auto-validation
- **Memory/RAG** — Built-in short-term and long-term memory via ChromaDB/LanceDB
- **Crew-wide context sharing** — Output from one task flows to the next
### Dependency footprint observed
CrewAI pulled in **85+ packages**, including:
- `chromadb` (~20 MB) + `onnxruntime` (~17 MB)
- `lancedb` (~47 MB)
- `kubernetes` client (unused but required by Chroma)
- `grpcio`, `opentelemetry-*`, `pdfplumber`, `textual`
Total venv size: **>500 MB**.
By contrast, Huey is **one package** (`huey`) with zero required services.
---
## 5. Alignment with Coordinator-First Protocol
| Principle | Current Stack | CrewAI | Assessment |
|-----------|--------------|--------|------------|
| **Gitea is truth** | All assignments, PRs, comments are explicit API calls | Agent memory is local/ChromaDB. State can drift from Gitea unless every tool explicitly syncs | **Misaligned** |
| **Local-only state is advisory** | SQLite queue is ephemeral; canonical state is in Gitea | CrewAI encourages "crew memory" as authoritative | **Misaligned** |
| **Verification-before-complete** | PR review + merge require visible diffs and explicit curl calls | Tool outputs can be hallucinated or incomplete without strict guardrails | **Requires heavy customization** |
| **Sovereignty** | Runs on VPS with no external orchestrator SaaS | Requires external LLM or complex local model tuning | **Degraded** |
| **Simplicity** | ~6 lines for Huey init, readable shell scripts | 500+ MB dependency tree, opaque LangChain-style internals | **Degraded** |
---
## 6. Verdict
**REJECT CrewAI for Phase 2 integration.**
**Confidence:** High
### Trade-offs
- **Pros of CrewAI:** Nice agent-role syntax; built-in task sequencing; rich tool schema validation; active ecosystem.
- **Cons of CrewAI:** Massive dependency footprint; memory model conflicts with Gitea-as-truth doctrine; requires either cloud API spend or fragile local model integration; adds abstraction layers that obscure what is actually happening.
### Risks if adopted
1. **Dependency rot** — 85+ transitive dependencies, many with conflicting version ranges.
2. **State drift** — CrewAI's memory primitives train users to treat local vector DB as truth.
3. **Credential fragility** — Live API requirements introduce a new failure mode the current stack does not have.
4. **Vendor-like lock-in** — CrewAI's abstractions sit thickly over LangChain. Debugging a stuck crew is harder than debugging a Huey task traceback.
### Recommended next step
Instead of adopting CrewAI, **evolve the current Huey stack** (a sketch follows this list) with:
1. A lightweight `Agent` dataclass in `tasks.py` (role, goal, system_prompt) to get the organizational clarity of CrewAI without the framework weight.
2. A `delegate()` helper that uses Hermes's existing `delegate_tool.py` for multi-agent work.
3. Gitea kept as the only durable state surface: any "memory" should flush to issue comments or `timmy-home` markdown, not a vector DB.
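A minimal sketch of items 1 and 2 — `Agent`, `delegate()`, the path in `HERMES_PYTHON`, and the CLI flags shown are proposals to be checked against `delegate_tool.py`'s real interface, not existing code:

```python
# Sketch of the proposed lightweight agent layer — hypothetical names,
# not code that exists in tasks.py today.
import subprocess
from dataclasses import dataclass

HERMES_PYTHON = "/opt/hermes/venv/bin/python"  # assumed path

@dataclass(frozen=True)
class Agent:
    role: str
    goal: str
    system_prompt: str

def delegate(agent: Agent, task_text: str) -> str:
    """Route work to Hermes via the existing delegate_tool.py.

    The flags here are assumptions; match them to delegate_tool.py's
    actual argument parsing before use.
    """
    result = subprocess.run(
        [HERMES_PYTHON, "delegate_tool.py",
         "--system", agent.system_prompt,
         "--task", task_text],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

researcher = Agent(
    role="Orchestration Researcher",
    goal="Extract facts from orchestrator code",
    system_prompt="You read code carefully and avoid speculation.",
)
```

This keeps the entire "framework" at ~30 readable lines while preserving the explicit, Gitea-visible routing the protocol requires.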
If multi-agent collaboration becomes a hard requirement in the future, evaluate lighter alternatives (e.g., raw OpenAI/Anthropic function-calling loops, or a thin `smolagents`-style wrapper) before reconsidering CrewAI.
---
## Artifacts
- `poc_crew.py` — 2-agent CrewAI proof-of-concept
- `requirements.txt` — Dependency manifest
- `CREWAI_EVALUATION.md` — This document

View File

@@ -1,150 +0,0 @@
#!/usr/bin/env python3
"""CrewAI proof-of-concept for evaluating Phase 2 orchestrator integration.

Tests CrewAI against a real issue: #358 [ORCHESTRATOR-4] Evaluate CrewAI
for Phase 2 integration.
"""
import os
from pathlib import Path

from crewai import Agent, Task, Crew, LLM
from crewai.tools import BaseTool

# ── Configuration ─────────────────────────────────────────────────────
# Credentials must come from the environment; never hardcode a key in source.
OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY", "")

llm = LLM(
    model="openrouter/google/gemini-2.0-flash-001",
    api_key=OPENROUTER_API_KEY,
    base_url="https://openrouter.ai/api/v1",
)
REPO_ROOT = Path(__file__).resolve().parents[2]

def _slurp(relpath: str, max_lines: int = 150) -> str:
    p = REPO_ROOT / relpath
    if not p.exists():
        return f"[FILE NOT FOUND: {relpath}]"
    lines = p.read_text().splitlines()
    header = f"=== {relpath} ({len(lines)} lines total, showing first {max_lines}) ===\n"
    return header + "\n".join(lines[:max_lines])

# ── Tools ─────────────────────────────────────────────────────────────
class ReadOrchestratorFilesTool(BaseTool):
    name: str = "read_orchestrator_files"
    description: str = (
        "Reads the current custom orchestrator implementation files "
        "(orchestration.py, tasks.py, timmy-orchestrator.sh, coordinator-first-protocol.md) "
        "and returns their contents for analysis."
    )

    def _run(self) -> str:
        return "\n\n".join(
            [
                _slurp("orchestration.py"),
                _slurp("tasks.py", max_lines=120),
                _slurp("bin/timmy-orchestrator.sh", max_lines=120),
                _slurp("docs/coordinator-first-protocol.md", max_lines=120),
            ]
        )

class ReadIssueTool(BaseTool):
    name: str = "read_issue_358"
    description: str = "Returns the text of Gitea issue #358 that we are evaluating."

    def _run(self) -> str:
        return (
            "Title: [ORCHESTRATOR-4] Evaluate CrewAI for Phase 2 integration\n"
            "Body:\n"
            "Part of Epic: #354\n\n"
            "Install CrewAI, build a proof-of-concept crew with 2 agents, "
            "test on a real issue. Evaluate: does it add value over our custom orchestrator? Document findings."
        )

# ── Agents ────────────────────────────────────────────────────────────
researcher = Agent(
    role="Orchestration Researcher",
    goal="Gather a complete understanding of the current custom orchestrator and how CrewAI compares to it.",
    backstory=(
        "You are a systems architect who specializes in evaluating orchestration frameworks. "
        "You read code carefully, extract facts, and avoid speculation. "
        "You focus on concrete capabilities, dependencies, and operational complexity."
    ),
    llm=llm,
    tools=[ReadOrchestratorFilesTool(), ReadIssueTool()],
    verbose=True,
)

evaluator = Agent(
    role="Integration Evaluator",
    goal="Synthesize research into a clear recommendation on whether CrewAI adds value for Phase 2.",
    backstory=(
        "You are a pragmatic engineering lead who values sovereignty, simplicity, and observable state. "
        "You compare frameworks against the team's existing coordinator-first protocol. "
        "You produce structured recommendations with explicit trade-offs."
    ),
    llm=llm,
    verbose=True,
)

# ── Tasks ─────────────────────────────────────────────────────────────
task_research = Task(
    description=(
        "Read the current custom orchestrator files and issue #358. "
        "Produce a structured research report covering:\n"
        "1. Current stack summary (Huey + tasks.py + timmy-orchestrator.sh)\n"
        "2. Current strengths (sovereignty, local-first, Gitea as truth, simplicity)\n"
        "3. Current gaps or limitations (if any)\n"
        "4. What CrewAI offers (agent roles, tasks, crews, tools, memory/RAG)\n"
        "5. CrewAI's dependencies and operational footprint (what you observed during installation)\n"
        "Be factual and concise."
    ),
    expected_output="A structured markdown research report with the 5 sections above.",
    agent=researcher,
)

task_evaluate = Task(
    description=(
        "Using the research report, evaluate whether CrewAI should be adopted for Phase 2 integration. "
        "Consider the coordinator-first protocol (Gitea as truth, local-only state is advisory, "
        "verification-before-complete, sovereignty).\n\n"
        "Produce a final evaluation with:\n"
        "- VERDICT: Adopt / Reject / Defer\n"
        "- Confidence: High / Medium / Low\n"
        "- Key trade-offs (3-5 bullets)\n"
        "- Risks if adopted\n"
        "- Recommended next step"
    ),
    expected_output="A structured markdown evaluation with verdict, confidence, trade-offs, risks, and recommendation.",
    agent=evaluator,
    context=[task_research],
)

# ── Crew ──────────────────────────────────────────────────────────────
crew = Crew(
    agents=[researcher, evaluator],
    tasks=[task_research, task_evaluate],
    verbose=True,
)

if __name__ == "__main__":
    print("=" * 70)
    print("CrewAI PoC — Evaluating CrewAI for Phase 2 Integration")
    print("=" * 70)
    result = crew.kickoff()
    print("\n" + "=" * 70)
    print("FINAL OUTPUT")
    print("=" * 70)
    print(result.raw)

View File

@@ -1 +0,0 @@
crewai>=1.13.0

191
fleet/capacity-inventory.md Normal file
View File

@@ -0,0 +1,191 @@
# Capacity Inventory - Fleet Resource Baseline
**Last audited:** 2026-04-07 16:00 UTC
**Auditor:** Timmy (direct inspection)
---
## Fleet Resources (Paperclips Model)
Three primary resources govern the fleet:
| Resource | Role | Generation | Consumption |
|----------|------|-----------|-------------|
| **Capacity** | Compute hours available across fleet. Determines what work can be done. | Through healthy utilization of VPS/Mac agents | Fleet improvements consume it (investing in automation, orchestration, sovereignty) |
| **Uptime** | % time services are running. Earned at Fibonacci milestones. | When services stay up naturally | Degrades on any failure |
| **Innovation** | Fuels Phase 3+ capabilities. | Only when capacity utilization is below 70% (leave capacity free) | Phase 3+ buildings consume it (requires spare capacity to build) |
### The Tension
- Run fleet at 95%+ capacity: maximum productivity, ZERO Innovation
- Run fleet at <70% capacity: Innovation generates but slower progress
- This forces the Paperclips question: optimize now or invest in future capability?
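A worked example of the generation curve, mirroring the formula implemented in `fleet/resource_tracker.py` (5/hr base rate, scaled down as utilization approaches the 70% cutoff):

```python
# Innovation generation per hour, as in regenerate_resources()
# in fleet/resource_tracker.py.
INNOVATION_THRESHOLD = 0.70
INNOVATION_RATE = 5.0  # per hour when the fleet is fully idle

def innovation_per_hour(utilization: float) -> float:
    if utilization >= INNOVATION_THRESHOLD:
        return 0.0  # fleet too busy: Innovation is blocked
    return INNOVATION_RATE * (1.0 - utilization / INNOVATION_THRESHOLD)

print(innovation_per_hour(0.175))  # ~3.75/hr at today's ~15-20% utilization
print(innovation_per_hour(0.95))   # 0.0 — maximum productivity, zero Innovation
```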
---
## VPS Resource Baselines
### Ezra (143.198.27.163) - "Forge"
| Metric | Value | Utilization |
|--------|-------|-------------|
| **OS** | Ubuntu 24.04 (6.8.0-106-generic) | |
| **vCPU** | 4 vCPU (DO basic droplet, shared) | Load: 10.76/7.59/7.04 (very high) |
| **RAM** | 7,941 MB total | 2,104 used / 5,836 available (26% used, 74% free) |
| **Disk** | 154 GB vda1 | 111 GB used / 44 GB free (72%) **WARNING** |
| **Swap** | 6,143 MB | 643 MB used (10%) |
| **Uptime** | 7 days, 18 hours | |
### Key Processes (sorted by memory)
| Process | RSS | %CPU | Notes |
|---------|-----|------|-------|
| Gitea | 556 MB | 83.5% | Web service, high CPU due to API load |
| MemPalace (ezra) | 268 MB | 136% | Mining project files - HIGH CPU |
| Hermes gateway (ezra) | 245 MB | 1.7% | Agent gateway |
| Ollama | 230 MB | 0.1% | Model serving |
| PostgreSQL | 138 MB | ~0% | Gitea database |
**Capacity assessment:** 26% memory used, but 72% disk is getting tight. CPU load is very high (10.76 on 4vCPU = 269% utilization). Ezra is CPU-bound, not RAM-bound.
### Allegro (167.99.126.228)
| Metric | Value | Utilization |
|--------|-------|-------------|
| **OS** | Ubuntu 24.04 (6.8.0-106-generic) | |
| **vCPU** | 4 vCPU (DO basic droplet, shared) | Moderate load |
| **RAM** | 7,941 MB total | 1,591 used / 6,349 available (20% used, 80% free) |
| **Disk** | 154 GB vda1 | 41 GB used / 114 GB free (27%) **GOOD** |
| **Swap** | 8,191 MB | 686 MB used (8%) |
| **Uptime** | 7 days, 18 hours | |
### Key Processes (sorted by memory)
| Process | RSS | %CPU | Notes |
|---------|-----|------|-------|
| Hermes gateway (allegro) | 680 MB | 0.9% | Main agent gateway |
| Gitea | 181 MB | 1.2% | Secondary gitea? |
| Systemd-journald | 160 MB | 0.0% | System logging |
| Ezra Hermes gateway | 58 MB | 0.0% | Running ezra agent here |
| Bezalel Hermes gateway | 58 MB | 0.0% | Running bezalel agent here |
| Dockerd | 48 MB | 0.0% | Docker daemon |
**Capacity assessment:** 20% memory used, 27% disk used. Allegro has headroom. Also running hermes gateways for Ezra and Bezalel (cross-host agent execution).
### Bezalel (159.203.146.185)
| Metric | Value | Utilization |
|--------|-------|-------------|
| **OS** | Ubuntu 24.04 (6.8.0-71-generic) | |
| **vCPU** | 2 vCPU (DO basic droplet, shared) | Load varies |
| **RAM** | 1,968 MB total | 817 used / 1,151 available (42% used, 58% free) |
| **Disk** | 48 GB vda1 | 12 GB used / 37 GB free (24%) **GOOD** |
| **Swap** | 2,047 MB | 448 MB used (22%) |
| **Uptime** | 7 days, 18 hours | |
### Key Processes (sorted by memory)
| Process | RSS | %CPU | Notes |
|---------|-----|------|-------|
| Hermes gateway | 339 MB | 7.7% | Agent gateway (16.8% of RAM) |
| uv pip install | 137 MB | 56.6% | Installing packages (temporary) |
| Mender | 27 MB | 0.0% | Device management |
**Capacity assessment:** 42% memory used, only 2GB total RAM. Bezalel is the most constrained. 2 vCPU means less compute headroom than Ezra/Allegro. Disk is fine.
### Mac Local (M3 Max)
| Metric | Value | Utilization |
|--------|-------|-------------|
| **OS** | macOS 26.3.1 | |
| **CPU** | Apple M3 Max (14 cores) | Very capable |
| **RAM** | 36 GB | ~8 GB used (22%) |
| **Disk** | 926 GB total | ~624 GB used / 302 GB free (68%) |
### Key Processes
| Process | Memory | Notes |
|---------|--------|-------|
| Hermes gateway | 500 MB | Primary gateway |
| Hermes agents (x3) | ~560 MB total | Multiple sessions |
| Ollama | ~20 MB base + model memory | Model loading varies |
| OpenClaw | 350 MB | Gateway process |
| Evennia (server+portal) | 56 MB | Game world |
---
## Resource Summary
| Resource | Ezra | Allegro | Bezalel | Mac Local | TOTAL |
|----------|------|---------|---------|-----------|-------|
| **vCPU** | 4 | 4 | 2 | 14 (M3 Max) | 24 |
| **RAM** | 8 GB (26% used) | 8 GB (20% used) | 2 GB (42% used) | 36 GB (22% used) | 54 GB |
| **Disk** | 154 GB (72%) | 154 GB (27%) | 48 GB (24%) | 926 GB (68%) | 1,282 GB |
| **Cost** | $12/mo | $12/mo | $12/mo | owned | $36/mo |
### Utilization by Category
| Category | Estimated Daily Hours | % of Fleet Capacity |
|----------|----------------------|---------------------|
| Hermes agents | ~3-4 hrs active | 5-7% |
| Ollama inference | ~1-2 hrs | 2-4% |
| Gitea services | 24/7 | 5-10% |
| Evennia | 24/7 | <1% |
| Idle | ~18-20 hrs | ~80-90% |
### Capacity Utilization: ~15-20% active
**Innovation rate:** GENERATING (capacity < 70%)
**Recommendation:** Good — Innovation is generating because most capacity is free.
This means Phase 3+ capabilities (orchestration, load balancing, etc.) are accessible NOW.
---
## Uptime Baseline
**Baseline period:** 2026-04-07 14:00-16:00 UTC (2 hours, ~24 checks at 5-min intervals)
| Service | Checks | Uptime | Status |
|---------|--------|--------|--------|
| Ezra | 24/24 | 100.0% | GOOD |
| Allegro | 24/24 | 100.0% | GOOD |
| Bezalel | 24/24 | 100.0% | GOOD |
| Gitea | 23/24 | 95.8% | GOOD |
| Hermes Gateway | 23/24 | 95.8% | GOOD |
| Ollama | 24/24 | 100.0% | GOOD |
| OpenClaw | 24/24 | 100.0% | GOOD |
| Evennia | 24/24 | 100.0% | GOOD |
| Hermes Agent | 21/24 | 87.5% | **CHECK** |
### Fibonacci Uptime Milestones
| Milestone | Target | Current | Status |
|-----------|--------|---------|--------|
| 95% | 95% | 100% (VPS), 98.6% (avg) | REACHED |
| 95.5% | 95.5% | 98.6% | REACHED |
| 96% | 96% | 98.6% | REACHED |
| 97% | 97% | 98.6% | REACHED |
| 98% | 98% | 98.6% | REACHED |
| 99% | 99% | 98.6% | APPROACHING |
---
## Risk Assessment
| Risk | Severity | Mitigation |
|------|----------|------------|
| Ezra disk 72% used | MEDIUM | Move non-essential data, add monitoring alert at 85% |
| Bezalel only 2GB RAM | HIGH | Cannot run large models locally. Good for Evennia, tight for agents |
| Ezra CPU load 269% | HIGH | MemPalace mining consuming 136% CPU. Consider scheduling |
| Mac disk 68% used | MEDIUM | 302 GB free still. Growing but not urgent |
| No cross-VPS mesh | LOW | SSH works but no Tailscale. No private network between VPSes |
---
## Recommendations
### Immediate (Phase 1-2)
1. **Ezra disk cleanup:** 44 GB free at 72%. Docker images, old logs, and MemPalace mine data could be rotated.
2. **Alert thresholds:** Add disk alerts at 85% (Ezra, Mac) before they become critical.
### Short-term (Phase 3)
3. **Load balancing:** Ezra is CPU-bound, Allegro has 80% RAM free. Move some agent processes from Ezra to Allegro.
4. **Innovation investment:** Since fleet is at 15-20% utilization, Innovation is high. This is the time to build Phase 3 capabilities.
### Medium-term (Phase 4)
5. **Bezalel RAM upgrade:** 2GB is tight. Consider upgrading to 4GB ($24/mo instead of $12/mo).
6. **Tailscale mesh:** Install on all VPSes for private inter-VPS network.
---

142
fleet/milestones.md Normal file
View File

@@ -0,0 +1,142 @@
# Fleet Milestone Messages
Every milestone marks passage through fleet evolution. When achieved, the message
prints to the fleet log. Each one references a real achievement, not abstract numbers.
**Source:** Inspired by Paperclips milestone messages (500 clips, 1000 clips, Full autonomy attained, etc.)
---
## Phase 1: Survival (Current)
### M1: First Automated Health Check
**Trigger:** `fleet/health_check.py` runs successfully for the first time.
**Message:** "First automated health check runs. No longer watching the clock."
### M2: First Auto-Restart
**Trigger:** A dead process is detected and restarted without human intervention.
**Message:** "A process failed at 3am and restarted itself. You found out in the morning."
### M3: First Backup Completed
**Trigger:** A backup pipeline runs end-to-end and verifies integrity.
**Message:** "A backup completed. You did not have to think about it."
### M4: 95% Uptime (30 days)
**Trigger:** Uptime >= 95% over last 30 days.
**Message:** "95% uptime over 30 days. The fleet stays up."
### M5: 97% Uptime
**Trigger:** Uptime >= 97% over last 30 days.
**Message:** "97% uptime. Under 22 hours of downtime a month across four machines."
---
## Phase 2: Automation (unlock when: uptime >= 95% + capacity > 60%)
### M6: Zero Manual Restarts (7 days)
**Trigger:** 7 consecutive days with zero manual process restarts.
**Message:** "Seven days. Zero manual restarts. The fleet heals itself."
### M7: PR Auto-Merged
**Trigger:** A PR passes CI, review, and merges without human touching it.
**Message:** "A PR was tested, reviewed, and merged by agents. You just said 'looks good.'"
### M8: Config Push Works
**Trigger:** Config change pushed to all 3 VPSes atomically and verified.
**Message:** "Config pushed to all three VPSes in one command. No SSH needed."
### M9: 98% Uptime
**Trigger:** Uptime >= 98% over last 30 days.
**Message:** "98% uptime. Only 14 hours of downtime in a month. Most of it planned."
---
## Phase 3: Orchestration (unlock when: all Phase 2 buildings + Innovation > 100)
### M10: Cross-Agent Delegation Works
**Trigger:** Agent A creates issue, assigns to Agent B, Agent B works and creates PR.
**Message:** "Agent Alpha created a task, Agent Beta completed it. They did not ask permission."
### M11: First Model Running Locally on 2+ Machines
**Trigger:** Ollama serving same model on Ezra and Allegro simultaneously.
**Message:** "A model runs on two machines at once. No cloud. No rate limits."
### M12: Fleet-Wide Burn Mode
**Trigger:** All agents coordinated on single epic, produced coordinated PRs.
**Message:** "All agents working the same epic. The fleet moves as one."
---
## Phase 4: Sovereignty (unlock when: zero cloud deps for core ops)
### M13: First Entirely Local Inference Day
**Trigger:** 24 hours with zero API calls to external providers.
**Message:** "A model ran locally for the first time. No cloud. No rate limits. No one can turn it off."
### M14: Sovereign Email
**Trigger:** Stalwart email server sends and receives without Gmail relay.
**Message:** "Email flows through our own server. No Google. No Microsoft. Ours."
### M15: Sovereign Messaging
**Trigger:** Telegram bot runs without cloud relay dependency.
**Message:** "Messages arrive through our own infrastructure. No corporate middleman."
---
## Phase 5: Scale (unlock when: sovereignty stable + Innovation > 500)
### M16: First Self-Spawned Agent
**Trigger:** Agent lifecycle manager spawns a new agent instance due to load.
**Message:** "A new agent appeared. You did not create it. The fleet built what it needed."
### M17: Agent Retired Gracefully
**Trigger:** An agent instance retires after idle timeout and cleans up its state.
**Message:** "An agent retired. It served its purpose. Nothing was lost."
### M18: Fleet Runs 24h Unattended
**Trigger:** 24 hours with zero human intervention of any kind.
**Message:** "A full day. No humans. No commands. The fleet runs itself."
---
## Phase 6: The Network (unlock when: 7 days zero human intervention)
### M19: Fleet Creates Its Own Improvement Task
**Trigger:** Fleet analyzes itself and creates an issue on Gitea.
**Message:** "The fleet found something to improve. It created the task itself."
### M20: First Outside Contribution
**Trigger:** An external contributor's PR is reviewed and merged by fleet agents.
**Message:** "Someone outside the fleet contributed. The fleet reviewed, tested, and merged. No human touched it."
### M21: The Beacon
**Trigger:** Infrastructure serves someone in need through automated systems.
**Message:** "Someone found the Beacon. In the dark, looking for help. The infrastructure served its purpose. It was built for this."
### M22: Permanent Light
**Trigger:** 90 days of autonomous operation with continuous availability.
**Message:** "Three months. The light never went out. Not for anyone."
---
## Fibonacci Uptime Milestones
These trigger regardless of phase, based purely on uptime percentage. Allowed downtime below assumes a 30-day (720-hour) month:
| Milestone | Uptime | Meaning |
|-----------|--------|---------|
| U1 | 95% | Basic reliability achieved (36 hours/month downtime allowed) |
| U2 | 95.5% | Under 33 hours/month downtime |
| U3 | 96% | Under 29 hours/month |
| U4 | 97% | Under 22 hours/month |
| U5 | 97.5% | Under 18 hours/month |
| U6 | 98% | Under 14.5 hours/month |
| U7 | 98.3% | Under 12.5 hours/month |
| U8 | 98.6% | Under 10.5 hours/month — approaching cloud tier |
| U9 | 98.9% | Under 8 hours/month |
| U10 | 99% | Under 7.5 hours/month — enterprise grade |
| U11 | 99.5% | Under 4 hours/month |
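For reference, a minimal sketch of the arithmetic behind the downtime column (30-day month assumed):

```python
# Allowed downtime per 30-day month for a given uptime target.
def monthly_downtime_hours(uptime_pct: float, hours: float = 720.0) -> float:
    return (1.0 - uptime_pct / 100.0) * hours

for pct in (95.0, 98.0, 99.0, 99.5):
    print(f"{pct}% -> {monthly_downtime_hours(pct):.1f} h/month")
# 95.0% -> 36.0, 98.0% -> 14.4, 99.0% -> 7.2, 99.5% -> 3.6
```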
---
*Every message is earned. None are given freely. Fleet evolution is not a checklist — it is a climb.*

231
fleet/resource_tracker.py Executable file
View File

@@ -0,0 +1,231 @@
#!/usr/bin/env python3
"""
Fleet Resource Tracker — Tracks Capacity, Uptime, and Innovation.

Paperclips-inspired tension model:
- Capacity: spent on fleet improvements, generates through utilization
- Uptime: earned when services stay up, Fibonacci milestones unlock capabilities
- Innovation: only generates when capacity < 70%. Fuels Phase 3+.

This is the heart of the fleet progression system.
"""
import os
import json
import time
from datetime import datetime, timezone
from pathlib import Path

# === CONFIG ===
DATA_DIR = Path(os.path.expanduser("~/.local/timmy/fleet-resources"))
RESOURCES_FILE = DATA_DIR / "resources.json"

# Tension thresholds
INNOVATION_THRESHOLD = 0.70  # Innovation only generates when capacity < 70%
INNOVATION_RATE = 5.0        # Innovation generated per hour when under threshold
CAPACITY_REGEN_RATE = 2.0    # Capacity regenerates per hour of healthy operation

FIBONACCI = [95.0, 95.5, 96.0, 97.0, 97.5, 98.0, 98.3, 98.6, 98.9, 99.0, 99.5]
def init():
    DATA_DIR.mkdir(parents=True, exist_ok=True)
    if not RESOURCES_FILE.exists():
        data = {
            "capacity": {
                "current": 100.0,
                "max": 100.0,
                "spent_on": [],
                "history": []
            },
            "uptime": {
                "current_pct": 100.0,
                "milestones_reached": [],
                "total_checks": 0,
                "successful_checks": 0,
                "history": []
            },
            "innovation": {
                "current": 0.0,
                "total_generated": 0.0,
                "spent_on": [],
                "last_calculated": time.time()
            }
        }
        RESOURCES_FILE.write_text(json.dumps(data, indent=2))
        print("Initialized resource tracker")
    return RESOURCES_FILE.exists()

def load():
    if RESOURCES_FILE.exists():
        return json.loads(RESOURCES_FILE.read_text())
    return None

def save(data):
    RESOURCES_FILE.write_text(json.dumps(data, indent=2))
def update_uptime(checks: dict):
    """Update uptime stats from health check results.

    checks = {'ezra': True, 'allegro': True, 'bezalel': True, 'gitea': True, ...}
    """
    data = load()
    if not data:
        return
    data["uptime"]["total_checks"] += 1
    successes = sum(1 for v in checks.values() if v)
    total = len(checks)
    # Overall uptime percentage for this check round
    overall = successes / max(total, 1) * 100.0
    data["uptime"]["successful_checks"] += successes
    # Calculate rolling uptime
    if "history" not in data["uptime"]:
        data["uptime"]["history"] = []
    data["uptime"]["history"].append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "checks": checks,
        "overall": round(overall, 2)
    })
    # Keep last 1000 checks
    if len(data["uptime"]["history"]) > 1000:
        data["uptime"]["history"] = data["uptime"]["history"][-1000:]
    # Current uptime %: mean over the last 100 check rounds
    recent = data["uptime"]["history"][-100:]
    recent_ok = sum(c["overall"] for c in recent) / max(len(recent), 1)
    data["uptime"]["current_pct"] = round(recent_ok, 2)
    # Check Fibonacci milestones
    new_milestones = []
    for fib in FIBONACCI:
        if fib not in data["uptime"]["milestones_reached"] and recent_ok >= fib:
            data["uptime"]["milestones_reached"].append(fib)
            new_milestones.append(fib)
    save(data)
    if new_milestones:
        print(f" UPTIME MILESTONE: {', '.join(str(m) + '%' for m in new_milestones)}")
        print(f" Current uptime: {recent_ok:.1f}%")
    return data["uptime"]
def spend_capacity(amount: float, purpose: str):
    """Spend capacity on a fleet improvement."""
    data = load()
    if not data:
        return False
    if data["capacity"]["current"] < amount:
        print(f" INSUFFICIENT CAPACITY: Need {amount}, have {data['capacity']['current']:.1f}")
        return False
    data["capacity"]["current"] -= amount
    data["capacity"]["spent_on"].append({
        "purpose": purpose,
        "amount": amount,
        "ts": datetime.now(timezone.utc).isoformat()
    })
    save(data)
    print(f" Spent {amount} capacity on: {purpose}")
    return True

def regenerate_resources():
    """Regenerate capacity and calculate innovation."""
    data = load()
    if not data:
        return
    now = time.time()
    last = data["innovation"]["last_calculated"]
    hours = (now - last) / 3600.0
    if hours < 0.1:  # Only update every ~6 minutes
        return
    # Regenerate capacity
    capacity_gain = CAPACITY_REGEN_RATE * hours
    data["capacity"]["current"] = min(
        data["capacity"]["max"],
        data["capacity"]["current"] + capacity_gain
    )
    # Calculate capacity utilization
    utilization = 1.0 - (data["capacity"]["current"] / data["capacity"]["max"])
    # Generate innovation only when under threshold
    innovation_gain = 0.0
    if utilization < INNOVATION_THRESHOLD:
        innovation_gain = INNOVATION_RATE * hours * (1.0 - utilization / INNOVATION_THRESHOLD)
        data["innovation"]["current"] += innovation_gain
        data["innovation"]["total_generated"] += innovation_gain
    # Record history
    if "history" not in data["capacity"]:
        data["capacity"]["history"] = []
    data["capacity"]["history"].append({
        "ts": datetime.now(timezone.utc).isoformat(),
        "capacity": round(data["capacity"]["current"], 1),
        "utilization": round(utilization * 100, 1),
        "innovation": round(data["innovation"]["current"], 1),
        "innovation_gain": round(innovation_gain, 1)
    })
    # Keep last 500 capacity records
    if len(data["capacity"]["history"]) > 500:
        data["capacity"]["history"] = data["capacity"]["history"][-500:]
    data["innovation"]["last_calculated"] = now
    save(data)
    print(f" Capacity: {data['capacity']['current']:.1f}/{data['capacity']['max']:.1f}")
    print(f" Utilization: {utilization*100:.1f}%")
    print(f" Innovation: {data['innovation']['current']:.1f} (+{innovation_gain:.1f} this period)")
    return data
def status():
    """Print current resource status."""
    data = load()
    if not data:
        print("Resource tracker not initialized. Run this script once to create it.")
        return
    print("\n=== Fleet Resources ===")
    print(f" Capacity: {data['capacity']['current']:.1f}/{data['capacity']['max']:.1f}")
    utilization = 1.0 - (data["capacity"]["current"] / data["capacity"]["max"])
    print(f" Utilization: {utilization*100:.1f}%")
    innovation_status = "GENERATING" if utilization < INNOVATION_THRESHOLD else "BLOCKED"
    print(f" Innovation: {data['innovation']['current']:.1f} [{innovation_status}]")
    print(f" Uptime: {data['uptime']['current_pct']:.1f}%")
    print(f" Milestones: {', '.join(str(m) + '%' for m in data['uptime']['milestones_reached']) or 'None yet'}")
    # Phase gate checks
    phase_2_ok = data['uptime']['current_pct'] >= 95.0
    phase_3_ok = phase_2_ok and data['innovation']['current'] > 100
    phase_5_ok = phase_2_ok and data['innovation']['current'] > 500
    print("\n Phase Gates:")
    print(f" Phase 2 (Automation): {'UNLOCKED' if phase_2_ok else 'LOCKED (need 95% uptime)'}")
    print(f" Phase 3 (Orchestration): {'UNLOCKED' if phase_3_ok else 'LOCKED (need 95% uptime + 100 innovation)'}")
    print(f" Phase 5 (Scale): {'UNLOCKED' if phase_5_ok else 'LOCKED (need 95% uptime + 500 innovation)'}")
if __name__ == "__main__":
    import sys
    init()
    if len(sys.argv) > 1 and sys.argv[1] == "status":
        status()
    elif len(sys.argv) > 1 and sys.argv[1] == "regen":
        regenerate_resources()
    else:
        regenerate_resources()
        status()
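# Usage sketch — how a health-check loop could feed this tracker.
# Illustrative only: fleet/health_check.py's real integration and the
# exact service names in the checks dict may differ.
#
#   import resource_tracker as rt
#   rt.init()
#   rt.update_uptime({"ezra": True, "allegro": True, "bezalel": True, "gitea": True})
#   rt.regenerate_resources()   # accrue capacity + innovation since last run
#   rt.spend_capacity(10.0, "example: automation build-out")
#   rt.status()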