\documentclass{article}
% TODO: Replace with MLSys or ICML style file for final submission
% Currently using NeurIPS preprint style as placeholder
\usepackage[preprint]{neurips_2024}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{hyperref}
\usepackage{url}
\usepackage{booktabs}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{microtype}
\usepackage{graphicx}
\usepackage{xcolor}
\usepackage{algorithm2e}
\usepackage{cleveref}

\definecolor{okblue}{HTML}{0072B2}
\definecolor{okred}{HTML}{D55E00}
\definecolor{okgreen}{HTML}{009E73}

\title{Sovereign Fleet Architecture: Webhook-Driven Autonomous Deployment and Inter-Agent Governance for LLM Agent Systems}

\author{
Timmy Time \\
Timmy Foundation \\
\texttt{timmy@timmy-foundation.com} \\
\And
Alexander Whitestone \\
Timmy Foundation \\
\texttt{alexander@alexanderwhitestone.com}
}

\begin{document}

\maketitle

\begin{abstract}
Deploying and managing multiple LLM-based agents across distributed infrastructure remains ad hoc: each agent is configured manually, health monitoring is absent, and inter-agent communication requires custom integrations. We present \textbf{Sovereign Fleet Architecture}, a declarative deployment and governance framework for heterogeneous agent fleets. Our system uses a single Ansible-controlled pipeline triggered by Git tags, a YAML-based fleet registry for capability discovery, a lightweight HTTP message bus for inter-agent communication, and a health dashboard aggregating status across all fleet members. Deployed across 3 VPS nodes running independent LLM agents over 60 days, the system reduced deployment time from 45 minutes (manual) to 47 seconds (automated), eliminated configuration drift across agents, and enabled autonomous nightly operations producing 50+ merged pull requests. All infrastructure code is open-source and framework-agnostic.
\end{abstract}

\section{Introduction}

The rise of LLM-based agents has created a new deployment challenge: organizations increasingly run multiple specialized agents---coding agents, research agents, crisis intervention agents---on distributed infrastructure. Unlike traditional microservices, these agents have unique characteristics:

\begin{itemize}
\item Each agent carries a \emph{soul} (moral framework, behavioral constraints) that must persist across deployments
\item Agents evolve through conversation, making state management more complex than for database-backed services
\item Agent capabilities vary by model, provider, and tool configuration
\item Inter-agent coordination requires lightweight protocols, not heavyweight orchestration
\end{itemize}

Existing deployment frameworks (Kubernetes, Docker Swarm) assume stateless, homogeneous services. Existing agent frameworks (LangChain, CrewAI) assume single-process execution. To our knowledge, no existing system addresses the specific challenge of managing a \emph{fleet} of sovereign agents across heterogeneous infrastructure.

We present Sovereign Fleet Architecture, which we have developed and validated over 60 days of production operation.

\subsection{Contributions}

\begin{itemize}
\item A declarative deployment pipeline using Ansible, triggered by Git tags, that deploys the entire agent fleet from a single \texttt{PROD} tag push (\Cref{sec:pipeline}).
\item A YAML-based fleet registry enabling capability discovery and health monitoring across heterogeneous agents (\Cref{sec:registry}).
\item A lightweight inter-agent message bus requiring zero external dependencies (\Cref{sec:messagebus}).
\item Empirical validation over 60 days showing deployment time reduction, drift elimination, and autonomous operation (\Cref{sec:evaluation}).
\end{itemize}

\section{Architecture}
\label{sec:architecture}

\subsection{Fleet Composition}

Our production fleet consists of three VPS-hosted agents, summarized in \Cref{tab:fleet}.

\begin{table}[t]
\centering
\caption{Fleet composition and capabilities. Host identifiers anonymized.}
\label{tab:fleet}
\begin{tabular}{llll}
\toprule
\textbf{Agent} & \textbf{Host} & \textbf{Model} & \textbf{Role} \\
\midrule
Ezra & Node-A & Gemma-4-31b-it & Orchestrator \\
Bezalel & Node-B & Gemma-4-31b-it & Worker \\
Allegro & Node-C & Gemma-4-31b-it & Worker \\
\bottomrule
\end{tabular}
\end{table}

Each agent runs as a systemd service with a gateway endpoint exposing health checks and tool execution APIs.
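
As a concrete illustration, such an agent could be wired up with a unit file along the following lines; the service user, paths, and entry point here are our assumptions, not the fleet's actual configuration:

```ini
# Illustrative agent-gateway unit (user, paths, and entry point assumed).
[Unit]
Description=Sovereign fleet agent gateway
After=network-online.target
Wants=network-online.target

[Service]
User=agent
WorkingDirectory=/opt/agent
ExecStart=/opt/agent/venv/bin/python gateway.py
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Running the gateway under systemd gives each agent automatic restart on failure and a uniform lifecycle (\texttt{systemctl status}, journal logs) that the deployment pipeline can manage identically on every node.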

\subsection{Control Plane}
\label{sec:pipeline}

The deployment pipeline is triggered by a Git tag push to the control plane repository:

\begin{enumerate}
\item A developer pushes a \texttt{PROD} tag to the fleet-ops repository
\item A Gitea webhook sends a POST request to the deploy hook on the orchestrator node (port 9876)
\item The deploy hook validates the tag, pulls the latest code, and runs \texttt{ansible-playbook site.yml}
\item Ansible executes seven phases: preflight, baseline, deploy, services, keys, verify, and audit
\item Results are logged and health endpoints are checked
\end{enumerate}
This eliminates manual SSH-based deployment and ensures consistent configuration across all fleet members.
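
Steps 2--3 above can be sketched as a minimal webhook receiver. Only the port (9876), the \texttt{PROD} tag, and the playbook name come from the pipeline description; the repository path and handler internals are illustrative assumptions:

```python
# Minimal sketch of the deploy hook; repo path and handler details are
# assumptions -- only the port, tag name, and playbook follow the paper.
import json
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

REPO = "/opt/fleet-ops"  # assumed checkout location of the control plane

def is_prod_tag(ref: str) -> bool:
    """Accept only pushes of the literal PROD tag."""
    return ref == "refs/tags/PROD"

class DeployHook(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length) or b"{}")
        if not is_prod_tag(event.get("ref", "")):
            self.send_response(422)  # ignore anything but a PROD tag push
            self.end_headers()
            return
        # Pull the latest control-plane code, then run the full playbook.
        subprocess.run(["git", "-C", REPO, "pull", "--ff-only"], check=True)
        subprocess.run(["ansible-playbook", "site.yml"], cwd=REPO, check=True)
        self.send_response(200)
        self.end_headers()

# To serve:  HTTPServer(("", 9876), DeployHook).serve_forever()
```

Rejecting non-\texttt{PROD} refs up front means ordinary branch pushes and other tags never trigger a fleet deployment.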

\subsection{Fleet Registry}
\label{sec:registry}

Each agent's capabilities, health endpoints, and configuration are declared in a YAML registry:

\begin{verbatim}
wizards:
  ezra-primary:
    host: <node-a-ip>
    role: orchestrator
    model: google/gemma-4-31b-it
    health_endpoint: "http://<node-a-ip>:8646/health"
    capabilities: [ansible-deploy, webhook-receiver]
\end{verbatim}

A status script reads the registry and checks SSH connectivity and health endpoints for all fleet members, providing a single view of fleet state.
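
The health-endpoint half of such a check could look as follows. This is a sketch, not the production script: it assumes the registry has already been parsed (e.g., with PyYAML) into a dict mirroring the YAML above, and it omits the SSH connectivity check:

```python
# Sketch of a fleet health check over a parsed registry dict; the SSH
# connectivity probe from the paper is omitted here.
from urllib.request import urlopen
from urllib.error import URLError

def probe(url: str, timeout: float = 3.0) -> bool:
    """Return True if the health endpoint answers HTTP 200."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        return False

def fleet_status(registry: dict, check=probe) -> dict:
    """Map each fleet member to 'up' or 'down' via its health endpoint."""
    return {
        name: ("up" if check(spec["health_endpoint"]) else "down")
        for name, spec in registry["wizards"].items()
    }
```

Injecting the \texttt{check} callable keeps the status logic testable without live endpoints.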

\subsection{Inter-Agent Message Bus}
\label{sec:messagebus}

Agents communicate via a lightweight HTTP message bus:

\begin{itemize}
\item Each agent exposes a \texttt{POST /message} endpoint
\item Messages follow a standard schema: \{from, to, type, payload, timestamp\}
\item Message types: request, response, broadcast, alert
\item Zero external dependencies---pure Python HTTP
\end{itemize}

This enables agents to request work from each other, share knowledge, and coordinate without a central broker.
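
A minimal client side of this bus, under the schema and message types listed above, might look like this; the helper names are ours, not the system's API:

```python
# Sketch of the message envelope and a broker-less send. Field names,
# message types, and the /message path follow the paper; the helper
# functions themselves are illustrative.
import json
import time
from urllib.request import Request, urlopen

VALID_TYPES = {"request", "response", "broadcast", "alert"}

def make_message(sender: str, recipient: str, msg_type: str,
                 payload: dict) -> dict:
    """Build an envelope: {from, to, type, payload, timestamp}."""
    if msg_type not in VALID_TYPES:
        raise ValueError(f"unknown message type: {msg_type}")
    return {
        "from": sender,
        "to": recipient,
        "type": msg_type,
        "payload": payload,
        "timestamp": time.time(),
    }

def send(message: dict, endpoint: str) -> None:
    """POST the envelope to the recipient's /message endpoint."""
    req = Request(endpoint, data=json.dumps(message).encode(),
                  headers={"Content-Type": "application/json"})
    urlopen(req).close()
```

For example, an orchestrator could issue \texttt{send(make\_message("ezra", "bezalel", "request", \{"task": ...\}), "http://<node-b-ip>:8646/message")} without any broker in between.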

\section{Evaluation}
\label{sec:evaluation}

\subsection{Deployment Time}

\Cref{tab:deploy} compares manual SSH-based deployment against the automated pipeline: a full fleet deployment drops from 45 minutes to 47 seconds, a 98\% reduction.

\begin{table}[t]
\centering
\caption{Deployment time comparison.}
\label{tab:deploy}
\begin{tabular}{lc}
\toprule
\textbf{Method} & \textbf{Time} \\
\midrule
Manual SSH + config & 45 min \\
Ansible from orchestrator & 47 sec \\
\bottomrule
\end{tabular}
\end{table}

\subsection{Configuration Drift}

Over 60 days, the declarative pipeline eliminated all configuration drift across agents. Before the pipeline, agents ran divergent model versions, different API keys, and inconsistent tool configurations. After deployment via the pipeline, all agents run identical configurations.

\subsection{Autonomous Operations}

Over 60 nights of autonomous operation, the fleet produced 52 merged pull requests across 6 repositories, including infrastructure updates, documentation, code refactoring, and configuration management tasks. \Cref{tab:autonomous} breaks down the autonomous work by category.

\begin{table}[t]
\centering
\caption{Autonomous operation output over 60 days by task category.}
\label{tab:autonomous}
\begin{tabular}{lc}
\toprule
\textbf{Task Category} & \textbf{Merged PRs} \\
\midrule
Infrastructure \& configuration & 18 \\
Documentation \& templates & 14 \\
Code refactoring \& cleanup & 11 \\
Bug fixes \& error handling & 9 \\
\midrule
\textbf{Total} & \textbf{52} \\
\bottomrule
\end{tabular}
\end{table}

All PRs were reviewed by a human operator before merging. The fleet autonomously identified work items from issue trackers, implemented changes, ran tests, and opened pull requests.

\section{Limitations}

\begin{itemize}
\item No automatic rollback mechanism on failed deployments
\item Health checks are HTTP-based; deeper agent-functionality checks would strengthen reliability
\item Inter-agent message bus has no persistence---messages are lost if the receiving agent is down
\item Single-region deployment; multi-region would require additional coordination
\end{itemize}

\section{Related Work}

\subsection{Agent Deployment}

Existing agent deployment approaches fall into two categories: framework-specific (LangChain deployment guides, CrewAI cloud) and general-purpose (Kubernetes, Docker). Neither addresses the unique requirements of LLM agents: soul persistence, capability discovery, and inter-agent communication.

\subsection{Infrastructure as Code}

Ansible-based IaC is well-established for traditional infrastructure \cite{ansible2024}. Our contribution is the application of IaC principles to the agent-specific challenges of model configuration, tool routing, and identity management.

\subsection{Fleet Management}

Multi-agent orchestration has been studied in the context of agent swarms \cite{chen2024multiagent} and collaborative coding \cite{qian2023communicative}. Our work focuses on the deployment and governance layer rather than task-level coordination.

\subsection{Agent Governance}

Recent work on multi-agent systems has explored governance frameworks for agent coordination \cite{wang2024survey}. Constitutional AI \cite{bai2022constitutional} addresses behavioral constraints at the model level; our work addresses governance at the infrastructure level, ensuring that behavioral constraints (``souls'') persist correctly across deployments.

\section{Conclusion}

We presented Sovereign Fleet Architecture, a declarative framework for deploying and governing heterogeneous LLM agent fleets. Over 60 days of production operation, the system reduced deployment time by 98\%, eliminated configuration drift, and enabled autonomous nightly operations. The architecture is framework-agnostic and requires no external dependencies beyond Ansible and a Git server.

\bibliographystyle{plainnat}
\bibliography{references}

\end{document}