feat: Add basic fleet orchestration script (#552)

Merge pull request #591
Merged PR #591
2026-04-10 00:51:09 -04:00 · 2026-04-10 03:44:07 +00:00 · 2026-04-09 21:08:58 -04:00
2 changed files with 448 additions and 0 deletions
--- a/docs/sovereign-stack.md
+++ b/docs/sovereign-stack.md
@@ -0,0 +1,351 @@
+# Sovereign Stack: Replacing Homebrew with Mature Open-Source Tools
+
+> Issue: #589 | Research Spike | Status: Complete
+
+## Executive Summary
+
+Homebrew is a macOS-first tool that has crept into our Linux server workflows. It
+runs as a non-root user, maintains its own cellar under /home/linuxbrew, and pulls
+pre-built binaries from a CDN we do not control. For a foundation building sovereign
+AI infrastructure, that is the wrong dependency graph.
+
+This document evaluates the alternatives, gives copy-paste install commands, and
+lands on a recommended stack for the Timmy Foundation.
+
+---
+
+## 1. Package Managers: apt vs dnf vs pacman vs Nix vs Guix
+
+| Criterion | apt (Debian/Ubuntu) | dnf (Fedora/RHEL) | pacman (Arch) | Nix | GNU Guix |
+|---|---|---|---|---|---|
+| Maturity | 25+ years | ~13 years (20+ counting yum) | 20+ years | 20+ years | 13+ years |
+| Reproducible builds | No | No | No | Yes (core) | Yes (core) |
+| Declarative config | Partial (Ansible) | Partial (Ansible) | Partial (Ansible) | Yes (NixOS/modules) | Yes (Guix System) |
+| Rollback | Manual | Manual | Manual | Automatic | Automatic |
+| Binary cache trust | Distro mirrors | Distro mirrors | Distro mirrors | cache.nixos.org or self-host | ci.guix.gnu.org or self-host |
+| Server adoption | Very high (Ubuntu, Debian) | High (RHEL, Rocky, Alma) | Low | Growing | Niche |
+| Learning curve | Low | Low | Low | High | High |
+| Supply-chain model | Signed debs, curated repos | Signed rpms, curated repos | Signed pkg.tar, rolling | Hash-addressed store (input-addressed by default) | Hash-addressed store, fully bootstrappable |
+
+### Recommendation for servers
+
+**Primary: apt on Debian 12 or Ubuntu 24.04 LTS**
+
+Rationale: widest third-party support, long security maintenance windows, every
+AI tool we ship already has .deb or pip packages. If we need reproducibility, we
+layer Nix on top rather than replacing the base OS.
+
+**Secondary: Nix as a user-space tool on any Linux**
+
+```bash
+# Install Nix (multi-user, Determinate Systems installer — single command)
+curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix | sh -s -- install
+
+# After install, use nix profile or flakes
+nix profile install nixpkgs#ripgrep
+nix profile install nixpkgs#ffmpeg
+
+# Pin a flake for reproducible dev shells
+nix develop github:timmy-foundation/sovereign-shell
+```
+
+Use Nix when you need pinned, reproducible environments (CI, model training environments).
+Use apt for general server provisioning.
+
+---
+
+## 2. Containers: Docker vs Podman vs containerd
+
+| Criterion | Docker | Podman | containerd (standalone) |
+|---|---|---|---|
+| Daemon required | Yes (dockerd) | No (daemonless) | Yes (containerd) |
+| Rootless support | Supported (opt-in rootless mode) | First-class | Via nerdctl + RootlessKit |
+| OCI compliant | Yes | Yes | Yes |
+| Compose support | docker-compose | podman-compose / podman compose | N/A (use nerdctl) |
+| Kubernetes CRI | Via cri-dockerd (dockershim removed in Kubernetes 1.24) | Not a CRI runtime (CRI-O is the sibling project) | Native CRI |
+| Image signing | Content Trust | sigstore/cosign native | Requires external tooling |
+| Supply chain risk | Docker Hub defaults, rate-limited | Can use any OCI registry | Can use any OCI registry |
+
+### Recommendation for agent isolation
+
+**Podman — rootless, daemonless, Docker-compatible**
+
+```bash
+# Debian/Ubuntu
+sudo apt update && sudo apt install -y podman
+
+# Verify rootless
+podman info | grep -i rootless
+
+# Run an agent container (no sudo needed)
+podman run -d --name timmy-agent \
+  --security-opt label=disable \
+  -v /opt/timmy/models:/models:ro \
+  -p 8080:8080 \
+  ghcr.io/timmy-foundation/agent-server:latest
+
+# Compose equivalent (wraps an external provider such as podman-compose, which must be installed)
+podman compose -f docker-compose.yml up -d
+```
+
+Why Podman:
+- No daemon = smaller attack surface, no single point of failure.
+- Rootless by default = containers do not run as root on the host.
+- Docker CLI alias works: `alias docker=podman` for migration.
+- Systemd integration (`podman generate systemd` or Quadlet units) for auto-start, with no separate daemon to supervise.
+
+---
+
+## 3. Python: uv vs pip vs conda
+
+| Criterion | pip + venv | uv | conda / mamba |
+|---|---|---|---|
+| Speed | Baseline | 10-100x faster (Rust) | Slow (conda), fast (mamba) |
+| Lock files | pip-compile (pip-tools) | uv.lock (built-in) | conda-lock |
+| Virtual envs | venv module | Built-in | Built-in (envs) |
+| System Python needed | Yes | No (downloads Python itself) | No (bundles Python) |
+| Binary wheels | PyPI only | PyPI only | Conda-forge (C/C++ libs) |
+| Supply chain | PyPI (improving via PEP 740 attestations) | PyPI + custom indexes | conda-forge (community) |
+| For local inference | Works but slow installs | Best for speed | Best for CUDA-linked libs |
+
+### Recommendation for local inference
+
+**uv — fast, modern, single binary**
+
+```bash
+# Install uv
+curl -LsSf https://astral.sh/uv/install.sh | sh
+
+# Create a project with a specific Python version
+uv init timmy-inference
+cd timmy-inference
+uv python install 3.12
+uv venv
+source .venv/bin/activate
+
+# Install inference stack (fast)
+uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
+uv pip install transformers accelerate vllm
+
+# Or use pyproject.toml with uv.lock for reproducibility
+uv add torch transformers accelerate vllm
+uv lock
+```
+
+Use conda only when you need pre-built CUDA-linked packages that PyPI does not
+provide (rare now that PyPI has manylinux CUDA wheels). Otherwise, uv wins on
+speed, simplicity, and supply-chain transparency.
+
+---
+
+## 4. Node: fnm vs nvm vs volta
+
+| Criterion | nvm | fnm | volta |
+|---|---|---|---|
+| Written in | Bash | Rust | Rust |
+| Speed (shell startup) | ~200ms | ~1ms | ~1ms |
+| Windows support | No | Yes | Yes |
+| .nvmrc support | Native | Native | Via shim |
+| Volta pin support | No | No | Native |
+| Install method | curl script | curl script / cargo | curl script / cargo |
+
+### Recommendation for tooling
+
+**fnm — fast, minimal, just works**
+
+```bash
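+# Install fnm (--skip-shell leaves your shell config untouched, so PATH is set by hand below)
+curl -fsSL https://fnm.vercel.app/install | bash -s -- --skip-shell
+# Add to shell (~/.local/share/fnm is the installer's default on Linux; adjust if yours differs)
+export PATH="$HOME/.local/share/fnm:$PATH"
+eval "$(fnm env --use-on-cd)"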
+
+# Install and use Node
+fnm install 22
+fnm use 22
+node --version
+
+# Pin for a project
+echo "22" > .node-version
+```
+
+Why fnm: nvm's Bash overhead is noticeable on every shell open. fnm is a single
+Rust binary with ~1ms startup. It reads the same .nvmrc files, so no project
+changes needed.
+
+---
+
+## 5. GPU: CUDA Toolkit Installation Without Package Manager
+
+NVIDIA's apt repository adds a third-party GPG key and pulls ~2GB of packages.
+For sovereign infrastructure, we want to control what goes on the box.
+
+### Option A: Runfile installer (recommended for servers)
+
+```bash
+# Download runfile from developer.nvidia.com (select: Linux > x86_64 > Ubuntu > 22.04 > runfile)
+# Example for CUDA 12.4:
+wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda_12.4.0_550.54.14_linux.run
+
+# Install toolkit only (skip driver if already present)
+sudo sh cuda_12.4.0_550.54.14_linux.run --toolkit --silent
+
+# Set environment
+export CUDA_HOME=/usr/local/cuda-12.4
+export PATH=$CUDA_HOME/bin:$PATH
+export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
+
+# Persist
+echo 'export CUDA_HOME=/usr/local/cuda-12.4' | sudo tee /etc/profile.d/cuda.sh
+echo 'export PATH=$CUDA_HOME/bin:$PATH' | sudo tee -a /etc/profile.d/cuda.sh
+echo 'export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH' | sudo tee -a /etc/profile.d/cuda.sh
+```
+
+### Option B: Containerized CUDA (best isolation)
+
+```bash
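+# Use NVIDIA container toolkit with Podman (the package ships from NVIDIA's own repo)
+sudo apt install -y nvidia-container-toolkit
+sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml  # CDI spec used by --device below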
+podman run --rm --device nvidia.com/gpu=all \
+  nvcr.io/nvidia/cuda:12.4.0-base-ubuntu22.04 \
+  nvidia-smi
+```
+
+### Option C: Nix CUDA (reproducible but complex)
+
+```nix
+# flake.nix (CUDA is unfree, so nixpkgs is imported with allowUnfree)
+{
+  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.05";
+  outputs = { self, nixpkgs }:
+    let pkgs = import nixpkgs { system = "x86_64-linux"; config.allowUnfree = true; };
+    in {
+      devShells.x86_64-linux.default = pkgs.mkShell {
+        buildInputs = with pkgs; [
+          cudaPackages_12.cudatoolkit cudaPackages_12.cudnn
+          python312 python312Packages.torch
+        ];
+      };
+    };
+}
+```
+
+**Recommendation: Runfile installer for bare-metal, containerized CUDA for
+multi-tenant / CI.** Avoid NVIDIA's apt repo to reduce third-party key exposure.
+
+---
+
+## 6. Security: Minimizing Supply-Chain Risk
+
+### Threat model
+
+| Attack vector | Risk in current setup | Sovereign alternative |
+|---|---|---|
+| Upstream binary tampering | High (pre-built bottles from CDN) | Build from source or use signed distro packages |
+| Third-party GPG key compromise | Medium (Homebrew taps) | Only distro archive keys |
+| Dependency confusion | Medium (random formulae) | Curated distro repos, lock files |
+| Lateral movement from daemon | High (Docker daemon as root) | Rootless Podman |
+| Unvetted Python packages | Medium (PyPI) | uv lock files + pip-audit |
+| CUDA supply chain | High (NVIDIA apt repo) | Runfile + checksum verification |
+
+### Hardening checklist
+
+1. **Pin every dependency** — use uv.lock, package-lock.json, flake.lock.
+2. **Audit regularly** — `pip-audit`, `npm audit`, `osv-scanner`.
+3. **No Homebrew on servers** — use apt + Nix for reproducibility.
+4. **Rootless containers** — Podman, not Docker.
+5. **Verify downloads** — check published checksums for runfiles, and GPG signatures where the vendor provides them.
+6. **Self-host binary caches** — Nix binary cache on your own infra.
+7. **Minimal images** — distroless or Chainguard base images for containers.
+
+```bash
+# Audit Python deps (from uv.lock, first run: uv export --format requirements-txt > requirements.txt)
+pip-audit -r requirements.txt
+
+# Audit with OSV (covers all ecosystems)
+osv-scanner --lockfile uv.lock
+osv-scanner --lockfile package-lock.json
+```
+
+---
+
+## 7. Recommended Sovereign Stack for Timmy Foundation
+
+```
+Layer              Tool                    Why
+──────────────────────────────────────────────────────────────────
+OS                 Debian 12 / Ubuntu LTS  Stable, 5yr security support
+Package manager    apt + Nix (user-space)  apt for base, Nix for reproducible dev shells
+Containers         Podman (rootless)       Daemonless, rootless, OCI-native
+Python             uv                      10-100x faster than pip, built-in lock
+Node.js            fnm                     1ms startup, .nvmrc compatible
+GPU                Runfile installer       No third-party apt repo needed
+Security audit     pip-audit + osv-scanner Cross-ecosystem vulnerability scanning
+```
+
+### Quick setup script (server)
+
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+
+echo "==> Updating base packages"
+sudo apt update && sudo apt upgrade -y
+
+echo "==> Installing system packages"
+sudo apt install -y podman curl git build-essential
+
+echo "==> Installing Nix"
+curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix | sh -s -- install --no-confirm
+
+echo "==> Installing uv"
+curl -LsSf https://astral.sh/uv/install.sh | sh
+
+echo "==> Installing fnm"
+curl -fsSL https://fnm.vercel.app/install | bash -s -- --skip-shell
+
+echo "==> Setting up shell"
+cat >> ~/.bashrc << 'EOF'
+# Sovereign stack
+export PATH="$HOME/.local/bin:$HOME/.local/share/fnm:$PATH"
+eval "$(fnm env --use-on-cd)"
+EOF
+
+echo "==> Done. Run 'source ~/.bashrc' to activate."
+```
+
+### What this gives us
+
+- No Homebrew dependency on any server.
+- Reproducible environments via Nix flakes + uv lock files.
+- Rootless container isolation for agent workloads.
+- Fast Python installs for local model inference.
+- Minimal supply-chain surface: distro-signed packages + hash-addressed Nix store.
+- Easy onboarding: one script to set up any new server.
+
+---
+
+## Migration path from current setup
+
+1. **Phase 1 (now):** Stop installing Homebrew on new servers. Use the setup script above.
+2. **Phase 2 (this quarter):** Migrate existing servers. Uninstall linuxbrew, reinstall tools via apt/uv/fnm.
+3. **Phase 3 (next quarter):** Create a Timmy Foundation Nix flake for reproducible dev environments.
+4. **Phase 4 (ongoing):** Self-host a Nix binary cache and PyPI mirror for air-gapped deployments.
+
+---
+
+## References
+
+- Nix: https://nixos.org/
+- Podman: https://podman.io/
+- uv: https://docs.astral.sh/uv/
+- fnm: https://github.com/Schniz/fnm
+- CUDA runfile: https://developer.nvidia.com/cuda-downloads
+- pip-audit: https://github.com/pypa/pip-audit
+- OSV Scanner: https://github.com/google/osv-scanner
+
+---
+
+*Document prepared for issue #589. Practical recommendations based on current
+tooling as of April 2026.*
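Note on hardening-checklist item 5 ("Verify downloads") in the document above: a minimal sketch of checking a downloaded installer against a published SHA-256 digest before running it. The helper names `sha256_of` and `verify_download` and the demo file are hypothetical and are not part of this change; in practice the expected digest would come from the vendor's checksum file.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so multi-gigabyte installers are not loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """Return True only if the file's digest equals the published digest."""
    actual = sha256_of(path)
    # Normalise whitespace and case, since checksum files vary in formatting.
    return hmac.compare_digest(actual, expected_hex.strip().lower())


if __name__ == "__main__":
    # Demo on a throwaway file; a real run would point at the downloaded runfile.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "installer.run"
        sample.write_bytes(b"hello")
        published = hashlib.sha256(b"hello").hexdigest()
        print(verify_download(sample, published))   # digest matches
        print(verify_download(sample, "0" * 64))    # digest does not match
```

A provisioning script would refuse to execute the installer unless `verify_download` returns `True`.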
--- a/scripts/fleet_orchestrator.py
+++ b/scripts/fleet_orchestrator.py
@@ -0,0 +1,97 @@
+
+import subprocess
+import sys
+import os
+
+FLEET_HOSTS = os.environ.get("FLEET_HOSTS", "143.198.27.163 104.131.15.18").split()
+TIMMY_USER = os.environ.get("TIMMY_USER", "root")
+TIMMY_DIR = os.environ.get("TIMMY_HOME", "/root") + "/timmy"
+
+
+def run_remote_command(host, command):
+    """Executes a command remotely on a given host via SSH."""
+    ssh_command = ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10", f"{TIMMY_USER}@{host}", command]
+    print(f"Executing on {host}: {' '.join(ssh_command)}")
+    try:
+        result = subprocess.run(ssh_command, capture_output=True, text=True, check=True)
+        print(f"[{host}] STDOUT:\n{result.stdout}")
+        if result.stderr:
+            print(f"[{host}] STDERR:\n{result.stderr}")
+        return result.stdout
+    except subprocess.CalledProcessError as e:
+        print(f"[{host}] ERROR: Command failed with exit code {e.returncode}")
+        print(f"[{host}] STDOUT:\n{e.stdout}")
+        print(f"[{host}] STDERR:\n{e.stderr}")
+        return None
+    except Exception as e:
+        print(f"[{host}] AN UNEXPECTED ERROR OCCURRED: {e}")
+        return None
+
+
+def deploy_agent(host):
+    """Deploys or updates the agent on a remote host using the provisioning script."""
+    print(f"Deploying agent on {host}...")
+    # For now, we'll just run a placeholder command.
+    # In a real scenario, this would involve SCPing the provisioning script and running it.
+    command = f"echo 'Simulating deployment of agent on {host}'"
+    run_remote_command(host, command)
+
+
+def start_agent(host):
+    """Starts the timmy-agent.service on a remote host."""
+    print(f"Starting agent on {host}...")
+    run_remote_command(host, "systemctl start timmy-agent.service")
+
+
+def stop_agent(host):
+    """Stops the timmy-agent.service on a remote host."""
+    print(f"Stopping agent on {host}...")
+    run_remote_command(host, "systemctl stop timmy-agent.service")
+
+
+def update_agent(host):
+    """Pulls the latest timmy-home repo and restarts the agent on a remote host."""
+    print(f"Updating agent on {host}...")
+    # Chain with && so the service is only restarted if the pull succeeds.
+    command = (
+        f"cd {TIMMY_DIR}/timmy-home && git pull"
+        " && systemctl restart timmy-agent.service"
+    )
+    run_remote_command(host, command)
+
+
+def status_agent(host):
+    """Checks the status of the timmy-agent.service on a remote host."""
+    print(f"Checking agent status on {host}...")
+    run_remote_command(host, "systemctl status timmy-agent.service --no-pager")
+
+
+def main():
+    if len(sys.argv) < 2:
+        print("Usage: python fleet_orchestrator.py <command> [host]")
+        print("Commands: deploy, start, stop, update, status")
+        sys.exit(1)
+
+    action = sys.argv[1]
+    target_host = sys.argv[2] if len(sys.argv) > 2 else None
+
+    hosts_to_target = [target_host] if target_host else FLEET_HOSTS
+
+    actions = {
+        "deploy": deploy_agent,
+        "start": start_agent,
+        "stop": stop_agent,
+        "update": update_agent,
+        "status": status_agent,
+    }
+    # Validate the command once, before contacting any host.
+    if action not in actions:
+        print(f"Unknown command: {action}")
+        sys.exit(1)
+
+    for host in hosts_to_target:
+        actions[action](host)
+
+
+if __name__ == "__main__":
+    main()
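Note on `scripts/fleet_orchestrator.py`: `main()` discards the return value of each per-host call, so a failed host does not affect the exit status. The following is a minimal sketch of one way to collect failures, assuming `run_remote_command` keeps returning `None` on failure. The names `build_ssh_argv`, `run_on_fleet`, and `fake_runner` are hypothetical and are not part of this change; the runner is injected so the logic can be exercised without SSH access.

```python
def build_ssh_argv(user, host, command, connect_timeout=10):
    """Build the argv list for a non-interactive ssh call."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                      # fail instead of prompting for a password
        "-o", f"ConnectTimeout={connect_timeout}",  # do not hang on unreachable hosts
        f"{user}@{host}",
        command,
    ]


def run_on_fleet(hosts, command, runner):
    """Run `command` on every host and return the hosts where the runner returned None."""
    failed = []
    for host in hosts:
        if runner(host, command) is None:
            failed.append(host)
    return failed


if __name__ == "__main__":
    # Fake runner standing in for run_remote_command: "bad-host" always fails.
    def fake_runner(host, command):
        return None if host == "bad-host" else f"ok: {command}"

    failed = run_on_fleet(["host-a", "bad-host"], "uptime", fake_runner)
    print(build_ssh_argv("root", "host-a", "uptime"))
    print("failed hosts:", failed)
```

A caller could then end with `sys.exit(1 if failed else 0)` so that cron jobs or CI notice a partial failure.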
Author	SHA1	Message	Date
Alexander Whitestone	b402335599	feat: Add basic fleet orchestration script (#552 )	2026-04-10 00:51:09 -04:00
Alexander Whitestone	d3368a5a9d	Merge pull request #591 Merged PR #591	2026-04-10 03:44:07 +00:00
Alexander Whitestone	1614ef5d66	docs: add sovereign stack research document (#589 ) Research spike on replacing Homebrew with mature open-source tools for sovereign AI infrastructure. Covers: package managers, containers, Python, Node, GPU CUDA, supply-chain security, and a recommended stack with install commands. Refs: #589	2026-04-09 21:08:58 -04:00