Compare commits: 1 commit (156418a6fd)

genomes/fleet-ops-GENOME.md (normal file, +185 lines)

# GENOME.md: fleet-ops

**Generated:** 2026-04-14
**Repo:** Timmy_Foundation/fleet-ops
**Purpose:** Sovereign fleet operations -- Ansible playbooks, monitoring, dispatch, infrastructure-as-code
**Size:** 284 files | 14 Ansible roles | 15 Python scripts | 11 test files

---

## Project Overview

fleet-ops is the infrastructure-as-code repository for the Timmy Foundation's sovereign wizard fleet. It manages three VPS-based AI agent workers (Bezalel, Ezra, Allegro) through Ansible playbooks, Docker Compose, tmux dispatch, and automated monitoring.

The fleet runs 50+ AI agent sessions simultaneously, dispatches work through Gitea issues, and monitors health through cron-based watchdogs. fleet-ops is the control plane.

## Architecture

```mermaid
graph TD
    A[Local Machine] -->|tmux dispatch| B[BURN Session: 50+ panes]
    A -->|Ansible| C[VPS: Bezalel]
    A -->|Ansible| D[VPS: Ezra]
    A -->|Ansible| E[VPS: Allegro]
    A -->|Gitea API| F[Forge: Gitea]

    C -->|hermes-agent| G[Agent Worker]
    D -->|hermes-agent| G
    E -->|hermes-agent| G

    H[Cron Jobs] --> I[Monitoring]
    H --> J[Burndown]
    H --> K[Dispatch Consumer]
    H --> L[Nightly Efficiency]

    I -->|deadman switch| M[Alert Manager]
    I -->|health check| N[Telegram Alerts]

    J -->|scan issues| F
    K -->|consume issues| B
    L -->|token stats| O[Reports]
```

## Entry Points

| Entry Point | Type | Purpose |
|-------------|------|---------|
| `playbooks/site.yml` | Ansible | Master playbook -- deploys entire fleet |
| `playbooks/provision_and_deploy.yml` | Ansible | Full VPS provisioning + service deploy |
| `playbooks/deploy_hermes.yml` | Ansible | Deploy hermes-agent to wizard VPSes |
| `playbooks/deploy_ollama.yml` | Ansible | Deploy Ollama inference server |
| `playbooks/deploy_gitea.yml` | Ansible | Deploy Gitea forge |
| `docker-compose.yml` | Docker | Local multi-service stack (ollama, gitea, agent, monitor) |
| `scripts/dispatch_consumer.py` | Python | Consume Gitea issues and dispatch to tmux panes |
| `scripts/burndown_watcher.py` | Python | Monitor backlog velocity across repos |
| `scripts/fleet-status.py` | Python | One-command fleet health report |
| `scripts/tmux-dispatch.sh` | Shell | Route work to tmux pane windows |

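
As an illustration of how the Ansible entry points might be invoked, the sketch below builds an `ansible-playbook` command line without running it. The inventory path `inventory/hosts.ini` and the helper name `playbook_command` are assumptions made for this example and do not come from the repository; `-i`, `--limit`, and `--check` are standard `ansible-playbook` options.

```python
import shlex


def playbook_command(playbook, inventory="inventory/hosts.ini", limit=None, check=False):
    """Build an ansible-playbook argv list without executing it.

    The default inventory path is hypothetical; pass the real one.
    """
    argv = ["ansible-playbook", "-i", inventory, playbook]
    if limit:
        # Restrict the run to one host or group from the inventory.
        argv += ["--limit", limit]
    if check:
        # Dry run: report what would change without applying it.
        argv.append("--check")
    return argv


if __name__ == "__main__":
    cmd = playbook_command("playbooks/deploy_hermes.yml", limit="allegro-vps", check=True)
    # Print a copy-pasteable shell command instead of executing it.
    print(shlex.join(cmd))
```

Returning a list instead of a string keeps the command safe to hand to `subprocess.run` later without shell quoting issues.
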
## Data Flow

```
Gitea Issue Created
    |
    v
dispatch_consumer.py (cron: 5m)
    |
    v
tmux-dispatch.sh -> Assign to pane window (CRUCIBLE/GNOMES/FOUNDRY)
    |
    v
hermes-agent in tmux pane (agent worker)
    |
    v
Agent creates branch -> commits -> pushes -> opens PR
    |
    v
auto_merge.sh (cron: 10m) -> Safe PRs merge automatically
    |
    v
nightly_efficiency_report.py -> Token usage, cost, throughput stats
```

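
The assignment step in the flow above can be sketched as follows. The flow names the three pane windows but does not state how issues are distributed among them, so the round-robin policy here is an assumption, and `assign_issues` is a hypothetical helper, not a function from `dispatch_consumer.py` or `tmux-dispatch.sh`.

```python
from itertools import cycle

# Window names taken from the data flow; the ordering is arbitrary.
WINDOWS = ("CRUCIBLE", "GNOMES", "FOUNDRY")


def assign_issues(issue_numbers, windows=WINDOWS):
    """Pair each issue number with a window, cycling through windows in order.

    Round-robin is an assumed policy chosen for illustration only.
    """
    return list(zip(issue_numbers, cycle(windows)))


if __name__ == "__main__":
    for issue, window in assign_issues([101, 102, 103, 104]):
        print(f"issue #{issue} -> {window}")
```

Round-robin spreads load evenly when issues are of similar size; a real dispatcher might instead pick the window with the fewest busy panes.
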
## Key Abstractions

| Abstraction | Description |
|-------------|-------------|
| **Wizard** | A VPS running hermes-agent. Three active: Bezalel, Ezra, Allegro. |
| **Fleet** | The collective of all wizards + local orchestration. |
| **Burn** | High-throughput execution mode. 50+ agents working in parallel. |
| **Dispatch** | Routing Gitea issues to tmux panes for agent processing. |
| **Deadman** | Watchdog that alerts when heartbeats stop. |
| **Burndown** | Tracking backlog velocity. Issues closed vs created per day. |
| **Sovereignty** | No cloud dependency for core operations. Local inference preferred. |

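
To make the Burndown definition concrete (issues closed vs created per day), here is a minimal sketch that computes a per-day net figure. It assumes the creation and close dates have already been fetched; the function name `daily_net_burndown` and the sample dates are illustrative and are not taken from `burndown_watcher.py`.

```python
from collections import Counter
from datetime import date


def daily_net_burndown(created, closed):
    """Return {day: closed_count - created_count} for every day seen.

    A positive value means the backlog shrank that day.
    """
    created_per_day = Counter(created)
    closed_per_day = Counter(closed)
    days = sorted(set(created_per_day) | set(closed_per_day))
    # Counter returns 0 for days missing from one side.
    return {day: closed_per_day[day] - created_per_day[day] for day in days}


if __name__ == "__main__":
    d1, d2 = date(2026, 4, 13), date(2026, 4, 14)
    net = daily_net_burndown(created=[d1, d1, d2], closed=[d1, d2, d2])
    for day, delta in net.items():
        print(day.isoformat(), f"{delta:+d}")
```
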
## Ansible Roles

| Role | Purpose | Key Files |
|------|---------|-----------|
| `common` | Base OS config, packages, users | tasks/main.yml |
| `hermes-agent` | Deploy agent service, config, env | templates/config.yaml.j2, hermes.service.j2 |
| `ollama` | Deploy Ollama inference server | templates/ollama.service.j2 |
| `gitea` | Deploy Gitea forge (Docker) | templates/docker-compose.yml.j2 |
| `nginx` | Reverse proxy for all services | templates/site.conf.j2 |
| `backups` | Automated backups for gitea, evennia | templates/backup.cron.j2 |
| `monitoring` | Health checks, deadman switch | templates/deadman-switch.sh.j2 |
| `auto-merge` | PR auto-merge for safe changes | files/scripts/auto_merge.sh |
| `conduit` | Matrix homeserver (Conduit) | templates/conduit.toml.j2 |
| `nostr-relay` | Nostr relay for sovereign comms | templates/strfry.conf.j2 |
| `docker` | Docker installation and config | tasks/main.yml |
| `evennia` | MUD world server (The Tower) | templates/settings.py.j2 |
| `message-bus` | Inter-agent message bus | templates/busd.service.j2 |
| `knowledge-store` | Persistent knowledge store | templates/knowledged.service.j2 |

## Inventory

| Host | Wizard | Role | Model |
|------|--------|------|-------|
| bezal | Bezalel | Agent worker | gemma-4-31b-it |
| hermes-vps | Ezra | Agent worker | gemma-4-31b-it |
| allegro-vps | Allegro | Agent worker | gemma-4-31b-it |
| gitea-forge | -- | Gitea, registry | -- |

## Cron Jobs

| Job | Schedule | Script | Purpose |
|-----|----------|--------|---------|
| dispatch-consumer | 5m | scripts/dispatch_consumer.py | Route issues to agents |
| burndown-watcher | 15m | scripts/burndown_watcher.py | Track backlog velocity |
| nightly-efficiency | Daily 03:00 | scripts/nightly_efficiency_report.py | Token/cost report |
| auto-merge | 10m | scripts/auto_merge.sh | Merge safe PRs |
| morning-report | Daily 07:00 | scripts/morning_report_compile.py | Fleet status digest |
| sovereign-guard | 5m | sovereign_sentinel.py | Security monitoring |
| sovereign-pulse | 5m | sovereign_pulse.py | Health heartbeat |

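
The Schedule column uses shorthand rather than crontab syntax. The sketch below shows one plausible translation into standard five-field cron expressions, assuming `5m` means "every 5 minutes" and `Daily HH:MM` means once a day at that local time. The helper `to_cron` is hypothetical; the actual crontab entries in the repository may differ.

```python
import re


def to_cron(schedule):
    """Translate '<N>m' or 'Daily HH:MM' into a five-field cron expression.

    The interpretation of the shorthand is an assumption, not taken
    from the repository's crontab.
    """
    interval = re.fullmatch(r"(\d+)m", schedule)
    if interval:
        # Step syntax: run whenever the minute is divisible by N.
        return f"*/{int(interval.group(1))} * * * *"
    daily = re.fullmatch(r"Daily (\d{2}):(\d{2})", schedule)
    if daily:
        hour, minute = int(daily.group(1)), int(daily.group(2))
        return f"{minute} {hour} * * *"
    raise ValueError(f"unrecognized schedule: {schedule!r}")


if __name__ == "__main__":
    for shorthand in ("5m", "15m", "Daily 03:00", "Daily 07:00"):
        print(f"{shorthand:12} -> {to_cron(shorthand)}")
```
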
## Test Coverage

| Test File | Tests | Area |
|-----------|-------|------|
| test_dispatch_consumer.py | Y | Issue dispatch routing |
| test_health_dashboard.py | Y | Health check aggregation |
| test_knowledge_store.py | Y | Knowledge persistence |
| test_message_bus.py | Y | Inter-agent messaging |
| test_nightly_efficiency_report.py | Y | Token/cost calculation |
| test_profile_isolation.py | Y | Agent profile separation |
| test_skill_scorer.py | Y | Skill quality scoring |
| test_synthesis.py | Y | Synthesis engine |
| test_video_engine_client.py | Y | Video generation client |
| test_federation_sync.py | Y | Cross-wizard state sync |
| test_heart.py | Y | Heart/compassion layer |

### Gaps

- No integration tests for full dispatch-to-merge pipeline
- Ansible roles lack molecule tests (only lint)
- No test for deadman switch (shell script, not Python)
- tmux-dispatch.sh is pure shell, no test coverage
- Docker Compose tested manually only

## Security Considerations

- **Vault-encrypted secrets.** API keys, tokens in `playbooks/group_vars/vault.yml` (ansible-vault)
- **SSH key auth only.** No password auth on VPSes.
- **Registry auth.** Private container registry at forge.alexanderwhitestone.com
- **Nostr relay.** Sovereign comms channel, no third-party dependency.
- **Deadman switch.** Alerts on heartbeat loss. Prevents silent fleet death.
- **Sovereign sentinel.** Monitors for unauthorized access patterns.

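
The deadman switch idea (alert when heartbeats stop) can be sketched as a staleness check over the last heartbeat seen per host. The real implementation is a shell script templated from `deadman-switch.sh.j2`; this Python version is only an illustration of the logic. The 10-minute threshold (two missed 5m pulses) and the name `stale_hosts` are assumptions.

```python
from datetime import datetime, timedelta, timezone


def stale_hosts(last_heartbeat, now, max_age=timedelta(minutes=10)):
    """Return the sorted names of hosts whose last heartbeat is older than max_age.

    The default max_age is an assumed threshold, not a documented value.
    """
    return sorted(
        host for host, seen in last_heartbeat.items() if now - seen > max_age
    )


if __name__ == "__main__":
    now = datetime(2026, 4, 14, 12, 0, tzinfo=timezone.utc)
    heartbeats = {
        "bezal": now - timedelta(minutes=3),
        "hermes-vps": now - timedelta(minutes=25),
        "allegro-vps": now - timedelta(minutes=9),
    }
    for host in stale_hosts(heartbeats, now):
        print(f"ALERT: no heartbeat from {host}")
```

Passing `now` explicitly keeps the check deterministic, which would also make it straightforward to cover in a unit test.
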
## Docker Services

| Service | Image | Port | Purpose |
|---------|-------|------|---------|
| ollama | ollama/ollama:latest | 11434 | Local LLM inference |
| gitea | gitea/gitea:latest | 3000, 2222 | Git hosting, issues |
| agent | hermes-agent:prod | 8080 | Agent worker loop |
| monitor | custom | internal | Health reporter |

## Key Dependencies

| Dependency | Type | Purpose |
|------------|------|---------|
| Ansible | IaC | Fleet provisioning and deployment |
| Docker | Container | Service isolation |
| tmux | Process | Agent session management |
| Gitea | Forge | Issue tracking, PR workflow |
| Ollama | Inference | Local model serving |
| Telegram | Alerts | Human notification channel |