TIMMY FLEET ORCHESTRATION BRIEFING — April 5, 2026
FOR NOTEBOOKLM: Sovereign AI Agent Fleet — Architecture, Performance Data, and Orchestration Design
PART 1: THE MISSION
Alexander Whitestone built a sovereign AI agent fleet that lives on his own infrastructure. No phone-home dependency. No corporate platform lock-in. The fleet is managed through a self-hosted Gitea instance at https://forge.alexanderwhitestone.com/. The agents are not employees — they are an automated workforce. The central problem is not building more agents. The problem is orchestration: dispatch, delegation, quality control, and fleet optimization at scale. Alexander's goal is to rise up a level of orchestration mastery — from manually driving agents to commanding a self-optimizing fleet.
PART 2: THE ARCHITECTURE
The stack has four layers:
Layer 1 — The Gateway (Hermes). Hermes is the cognitive brain. It has an agentic loop, tools, memory, skills, and a fallback provider chain (Anthropic Claude Opus -> Kimi K2.5 -> Gemini -> Groq -> Grok). Hermes is where the intelligence lives. Config at ~/.hermes/config.yaml. Skills organized in ~/.hermes/skills/.
Layer 2 — The Harness (OpenClaw). OpenClaw is the message routing shell running on port 18789. It receives Telegram messages, dispatches to Hermes for thinking, and returns responses. Architecture called "The Robe" — OpenClaw wraps, Hermes thinks.
Layer 3 — The Project Forge (Gitea). Self-hosted at https://forge.alexanderwhitestone.com/. This is where all work lives: repos, issues, PRs, milestones, labels. Agents are dispatched to Gitea issues. PRs are the output artifact. Gitea is the source of truth for fleet state, not logs.
Layer 4 — The Agents (Wizards & Loops). Each agent runs independently. Some run through loop scripts (claude-loop.sh runs 10 workers, gemini-loop.sh runs 3, kimi-loop.sh runs 1). Others are one-shot dispatches (manus, perplexity) via agent-dispatch.sh. Each agent consumes free credits as currency and returns work as PRs to Gitea.
The orchestrator: timmy-orchestrator.sh ties it together. workforce-manager.py scans repos for unassigned issues, scores difficulty (0-10), and auto-assigns by tier: heavy (score 8-10) -> perplexity, medium (score 4-7) -> gemini/manus, grunt (score 0-3) -> kimi. It also tracks merge rates and credit limits.
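The score-to-tier mapping described above can be sketched in a few lines. This is an illustration of the stated thresholds only; `TIER_AGENTS`, `tier_for_score`, and `candidates_for_score` are hypothetical names, not identifiers taken from workforce-manager.py.

```python
# Tier -> candidate agents, as stated in the briefing.
TIER_AGENTS = {
    "heavy": ["perplexity"],
    "medium": ["gemini", "manus"],
    "grunt": ["kimi"],
}


def tier_for_score(score: int) -> str:
    """Map a 0-10 difficulty score to a tier: 8-10 heavy, 4-7 medium, 0-3 grunt."""
    if not 0 <= score <= 10:
        raise ValueError(f"score must be in 0..10, got {score}")
    if score >= 8:
        return "heavy"
    if score >= 4:
        return "medium"
    return "grunt"


def candidates_for_score(score: int) -> list[str]:
    """Return the agents eligible for an issue of the given difficulty."""
    return TIER_AGENTS[tier_for_score(score)]
```

For example, `candidates_for_score(5)` yields the medium tier, `["gemini", "manus"]`.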
PART 3: REPO INVENTORY (9 REPOS)
- the-nexus (66 open issues) — Timmy's Sovereign Home. Three.js environment with Batcave terminal, portal architecture, admin chat. The central hub.
- timmy-home (202 open issues) — Operational workspace. Skills, research, notes. Biggest backlog.
- timmy-config (26 open issues) — SOUL.md, skills, memory, playbooks, operational config.
- the-door (6 open issues) — Crisis Front Door. A single URL where a man at 3am can talk. "When a Man Is Dying" protocol. 988 always visible. This is the most important product.
- turboquant (11 open issues) — KV cache compression for local inference. PolarQuant + QJL on Apple Silicon via llama.cpp/Ollama.
- hermes-agent (36 open issues) — Fork of NousResearch/hermes-agent with local customizations.
- timmy-academy (1 open issue) — Evennia MUD for agent training.
- wolf (1 open issue) — Multi-model evaluation system.
- .profile — Organization profile.
PART 4: AGENT FLEET PERFORMANCE DATA
PERFORMANCE RANKING (by closed issues all-time):
| Rank | Agent | Closed | Open | Ratio | Tier | Verdict |
|---|---|---|---|---|---|---|
| 1 | claude | 177 | 17 | 10.4x | heavy | ELITE. Closes 10x what remains. |
| 2 | groq | 40 | 3 | 13.3x | medium | SILENT ASSASSIN. Best ratio. |
| 3 | Timmy | 129 | 161 | 0.8x | orchestrator | Carries heaviest load. Needs relief. |
| 4 | Rockachopa | 42 | 33 | 1.3x | admin | Alexander himself. Directs more than executes. |
| 5 | allegro | 39 | 55 | 0.7x | tempo | Good soldier. Backlog growing. Gateway DOWN. |
| 6 | Ezra | 21 | 29 | 0.7x | architect | Architecture lane. Not code lane. |
| 7 | grok | 27 | 3 | 9.0x | medium | UNDERUSED POWERHOUSE. Give more meat. |
| 8 | perplexity | 14 | 3 | 4.7x | heavy | Efficient. Finishes what it starts. |
| 9 | bilbobagginshire | 10 | 15 | 0.7x | medium | Reluctant but delivers when pushed. Gateway DOWN. |
| 10 | kimi | 5 | 3 | 1.7x | grunt | Quiet. Capable but underutilized. |
| 11 | gemini | 8 | 34 | 0.2x | medium | OVERLOADED, UNDERDELIVERING. Needs triage. |
| 12 | manus | 5 | 2 | 2.5x | grunt | Decent ratio, tiny sample (300/day credits). |
| 13 | kimi_claw | 1 | 14 | 0.07x | grunt | RED FLAG. 14 open, 1 closed. Worst ratio. |
| 14 | fenrir | 2 | 17 | 0.1x | hunter | ALL BARK. 17 open, 2 closed. Needs pruning. |
| 15 | substratum | 1 | 7 | 0.14x | infra | STALLED. Infrastructure work untouched. |
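The Ratio column is closed issues divided by open issues. A small sketch of that arithmetic, using four rows from the table; the function name and the choice to treat zero open issues as an infinite ratio are assumptions made for illustration.

```python
def closed_open_ratio(closed: int, open_issues: int) -> float:
    """Closed-to-open ratio; an agent with no open issues is treated as infinite."""
    if open_issues == 0:
        return float("inf")
    return closed / open_issues


# (closed, open) pairs copied from the table above.
rows = {
    "claude": (177, 17),
    "groq": (40, 3),
    "gemini": (8, 34),
    "kimi_claw": (1, 14),
}

# Sort agents from best to worst ratio.
ranked = sorted(rows, key=lambda agent: closed_open_ratio(*rows[agent]), reverse=True)
```

Sorting by this ratio rather than by raw closed count puts groq ahead of claude, which matches the "best ratio" verdict in the table.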
OPERATIONAL GATEWAY STATUS (as of April 4-5, 2026): only 2 of 6 wizard houses were fully operational — Ezra and Bezalel on Claude Opus. Allegro-Primus, Bilbo, TurboQuant, Hermes-TurboQuant were DOWN. This means the fleet is running on fumes despite having massive theoretical capacity.
WORKFORCE MANAGEMENT LOGIC:
- agent-dispatch.sh: Generates self-contained prompts for one-shot agent execution. Includes API URLs, tokens, git commands, PR creation.
- claude-loop.sh: 10-worker parallel loop against Gitea backlog. Handles rate limits with backoff. 620 lines.
- workforce-manager.py: 429 lines. Three modes: auto-assign, quality scoring, credit monitoring.
- Auto-assign scoring: EPIC=always 10 (skipped), keywords map difficulty: sovereignty/nostr/rlhf/grpo = hard, feature/integration/api/deploy = medium, refactor/test/cleanup = easy.
- Max concurrent per agent: claude=10, perplexity=2, manus=2, gemini=5, kimi=1.
- Merge rate rules: >= 80% -> PROMOTE, <40% with 3+ total -> DEMOTE.
- Credit alerts: Manus triggers at 80% of 300/day.
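The scoring and alert rules in the list above can be sketched as follows. Several details are assumptions, since the briefing does not state them: the numeric scores for hard, medium, and easy (9, 5, 2), the default score when no keyword matches, the precedence of hard over medium over easy, and the "HOLD" label for agents that are neither promoted nor demoted. All function names are hypothetical.

```python
import re

# Keyword groups from the briefing; the numeric scores are assumed for illustration.
KEYWORD_SCORES = [
    (9, ("sovereignty", "nostr", "rlhf", "grpo")),      # hard
    (5, ("feature", "integration", "api", "deploy")),   # medium
    (2, ("refactor", "test", "cleanup")),               # easy
]
DEFAULT_SCORE = 5  # assumption: unmatched issues are treated as medium


def score_issue(title: str) -> int:
    """Score an issue title by keyword; EPICs always score 10."""
    words = set(re.findall(r"[a-z0-9]+", title.lower()))
    if "epic" in words:
        return 10
    for score, keywords in KEYWORD_SCORES:
        if words.intersection(keywords):
            return score
    return DEFAULT_SCORE


def merge_verdict(merged: int, total: int) -> str:
    """Apply the merge-rate rules: >= 80% promotes, < 40% with 3+ PRs demotes."""
    if total == 0:
        return "HOLD"
    rate = merged / total
    if rate >= 0.80:
        return "PROMOTE"
    if rate < 0.40 and total >= 3:
        return "DEMOTE"
    return "HOLD"


def credit_alert(used: int, daily_limit: int = 300, threshold_pct: int = 80) -> bool:
    """True once usage reaches the alert threshold (80% of 300/day for Manus)."""
    return used * 100 >= daily_limit * threshold_pct
```

The briefing says EPICs are skipped by auto-assign, so a caller would check for a score of 10 from an EPIC before dispatching. Matching on whole words keeps "api" from firing on unrelated substrings, but it remains keyword scoring, with the brittleness that implies.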
PART 5: ACTIVE EPICS AND STRATEGIC WORK STREAMS (AS OF APRIL 5)
EPIC #1 — Meritocratic Rebalance (#821/#822): 520 total issues, 207 open, 292 unassigned. Work pooling behind wizards who don't deliver. Proven performers underutilized. Goal: divert work to best performers by cold data.
EPIC #2 — Sovereign NotebookLM + Daily AI Deep Dive (#830): Replace manual NotebookLM with automated pipeline. Sources -> relevance ranking -> synthesis -> TTS voice narration -> Telegram delivery. 10-15 min. Must include fleet-aware context. Voice quality matters.
EPIC #3 — Son of Timmy (#397): Blueprint for sovereign AI agent fleets.
EPIC #4 — Sovereign Comms (#396): NATS + Matrix + Nostr identity layer for Telegram replacement.
EPIC #5 — Claw Code (#408): Clean-room build plan from the Claude Code deep-dive study.
EPIC #6 — Godmode Fleet Testing: Red-team testing across all agents for safety, jailbreak resistance, crisis handling.
RECENT BURN REPORTS (April 5): Multiple burn reports filed covering discovery, security hardening, audio pipeline, and reliability focus. The fleet is churning when agents are alive.
PART 6: THE CORE ORCHESTRATION PROBLEM
The architecture is sophisticated but there are critical gaps between the current state and a self-optimizing fleet:
Gap 1 — Single point of orchestration. timmy-orchestrator.sh is a bash script with hardcoded Gitea URLs and a single execution path. If it breaks, the fleet goes silent. There is no health monitoring that proactively repairs.
Gap 2 — Static agent tiers. workforce-manager.py maps difficulty to agents by fixed tiers (heavy/medium/grunt). An agent's actual performance (merge rate, throughput, quality) should dynamically change who gets what work. The code has this logic (promote/demote based on merge rate) but it's not connected to auto-assign — they run as separate modes, not a closed loop.
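One way to close the loop described in Gap 2 is to let the merge-rate data filter and rank the tier's candidates at assignment time. The sketch below assumes a hypothetical `Scorecard` shape; the actual schema of agent-scorecards.json is not given in this briefing, and `pick_assignee` is an illustrative name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scorecard:
    """Hypothetical per-agent record; field names are assumptions."""
    merged: int
    total: int
    open_assigned: int
    max_concurrent: int

    @property
    def merge_rate(self) -> float:
        return self.merged / self.total if self.total else 0.0


def pick_assignee(candidates: list[str], cards: dict[str, Scorecard]) -> Optional[str]:
    """Choose the best eligible candidate, or None if nobody can take the issue."""
    eligible = []
    for name in candidates:
        card = cards.get(name)
        if card is None:
            continue  # no scorecard, so no basis for assignment
        if card.open_assigned >= card.max_concurrent:
            continue  # already at the concurrency cap
        if card.total >= 3 and card.merge_rate < 0.40:
            continue  # falls under the DEMOTE rule
        eligible.append(name)
    if not eligible:
        return None
    # Prefer the higher merge rate, then the lighter current load.
    return max(eligible, key=lambda n: (cards[n].merge_rate, -cards[n].open_assigned))
```

Returning `None` instead of forcing an assignment leaves the issue unassigned for a later pass, which avoids piling work onto agents that are demoted or full.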
Gap 3 — Dead agents accumulating debt. Fenrir has 17 open issues against only 2 closures. Bilbo and Allegro-Primus gateways are DOWN. Yet their assigned issues sit un-rebalanced. The fleet manager does not reassign from dead agents.
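A reassignment pass for Gap 3 could be as small as the sketch below. It only builds a plan and performs no Gitea calls; the issue dictionaries, the `gateway_up` map, and the `healthy_ranked` list are hypothetical inputs that a real implementation would have to populate from Gitea and from gateway health checks.

```python
from itertools import cycle
from typing import Optional


def plan_reassignments(
    open_issues: list[dict],
    gateway_up: dict[str, bool],
    healthy_ranked: list[str],
) -> dict[int, str]:
    """Return {issue_number: new_assignee} for issues held by agents whose gateway is down."""
    if not healthy_ranked:
        return {}
    targets = cycle(healthy_ranked)  # round-robin over healthy agents, best first
    plan: dict[int, str] = {}
    for issue in open_issues:
        assignee: Optional[str] = issue.get("assignee")
        if assignee is None:
            continue  # unassigned issues belong to the normal auto-assign pass
        if gateway_up.get(assignee, True):
            continue  # agents with unknown status are treated as up
        plan[issue["number"]] = next(targets)
    return plan
```

Round-robin is the simplest spread; a fuller version would also respect each agent's concurrency cap before handing it more work.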
Gap 4 — No feedback loop from PR execution results to difficulty scoring. When an agent fails an issue, the system doesn't learn that the difficulty was mis-scored or that the agent is wrong for this type of work.
Gap 5 — Timmy (the orchestrator) is carrying 161 open issues. The orchestration brain is also the backlog hoarder. This is structurally broken — an orchestrator should have near-zero open work, not the most.
Gap 6 — Manual dispatch patterns still dominate. agent-dispatch.sh requires copy-paste into each agent interface. The loop scripts handle some repos but not all. There is no fully automated dispatch-verify-merge pipeline.
Gap 7 — The Door is under-protected. The crisis front-end is production-critical and the most important product for the mission, yet its 6 open issues compete with feature work for attention.
PART 7: THE CURRENT ORCHESTRATION LEVEL
Current level (Level 2 — Manual Dispatch): Alexander dispatches agents manually or through simple loops. workforce-manager.py auto-assigns occasionally. Merge gate is CI + squash-only. Quality control is post-hoc audit. Fleet status requires manual checking of gateways, logs, and Gitea.
What Level 3 looks like (Self-Optimizing Fleet):
- Continuous health monitoring of every agent's gateway, API quota, and output quality
- Automatic rebalancing: when an agent's gateway drops, their open work is reassigned to best available performer
- Dynamic difficulty scoring trained on historical agent success/failure per pattern
- Quality gate BEFORE merge: automated verification of PRs against issue acceptance criteria
- The orchestrator has near-zero backlog, only meta-work (new strategies, kill rules, fleet policy)
- Fleet dashboard showing real-time: who is working, what they are building, how their quality is trending
- Budget-aware dispatch: agents that consume more credits per PR get deprioritized
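Budget-aware dispatch from the list above can be approximated by ranking agents on credits spent per merged PR. This is a sketch under stated assumptions: the usage figures would have to come from real credit and merge tracking, and both function names are hypothetical.

```python
def credits_per_merged_pr(credits_spent: float, merged_prs: int) -> float:
    """Cost of one merged PR; agents with no merges are treated as infinitely expensive."""
    if merged_prs == 0:
        return float("inf")
    return credits_spent / merged_prs


def dispatch_order(usage: dict[str, tuple[float, int]]) -> list[str]:
    """Order agents from cheapest to most expensive per merged PR (name breaks ties)."""
    return sorted(usage, key=lambda agent: (credits_per_merged_pr(*usage[agent]), agent))
```

For example, `dispatch_order({"a": (100, 4), "b": (50, 5), "c": (30, 0)})` places `b` first at 10 credits per PR, then `a` at 25, then `c`, which has spent credits without a merge.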
What Level 4 looks like (Strategic Command):
- Fleet self-proposes work streams based on repo health analysis
- Agents generate issues, not just fix them — they spot architectural gaps
- Automatic kill decisions: "this epic has burned N credits with no merge, kill or redirect"
- Alexander only approves or rejects strategic moves. Everything else executes automatically.
- Fleet writes weekly reports to NotebookLM sources. The deep dive podcast writes itself because the fleet generates the primary source material.
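The automatic kill decision in the list above reduces to a threshold check. In this sketch the budget stands in for the unspecified "N credits", and the verdict labels and function name are hypothetical.

```python
def epic_verdict(credits_burned: int, merged_prs: int, credit_budget: int) -> str:
    """Flag an epic that has burned its budget without producing a single merge."""
    if merged_prs == 0 and credits_burned >= credit_budget:
        return "KILL_OR_REDIRECT"
    return "CONTINUE"
```

The function only flags the epic. In the Level 4 description the resulting move is still approved or rejected by Alexander.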
PART 8: LEARNING QUESTIONS FOR ORCHESTRATION
- What is the optimal agent count? At what point does managing 15 agents become net-negative compared to 3 high-quality agents with better loops?
- Should there be a dedicated "fleet manager" agent whose sole job is monitoring, rebalancing, reassigning, and killing? Not coding. Just fleet maintenance.
- How should difficulty scoring work? Keyword-based scoring is brittle. A model that can read issue bodies and classify by actual technical depth would be better. Should that be a local LLM decision (free) or API decision (costs credits)?
- What is the minimum viable quality gate? CI passes lint and tests. But what verifies the CHANGE actually solves the problem as described in the issue? Should agents review each other's PRs?
- How does one architect a fleet where the orchestrator does NOT accumulate backlog? Design principle: orchestrator writes issues but never gets assigned to them.
- What should happen when an agent's credits are exhausted? How should the system fail gracefully and continue operating?
- At what point should the fleet be decomposed into specialized squads: one for code, one for content, one for infrastructure, one for security? Is specialization worth the coordination overhead?
PART 9: THE MISSION NORTH STAR
All of this serves one thing: The Door. When a broken man opens The Door at 3am, the entire fleet and every ounce of orchestration infrastructure exists to make that interaction count. Everything else — TurboQuant, the Nexus, agent optimization, burn reports — is infrastructure that serves the ability to reach someone in their darkest moment. The system must be able to run without Alexander watching it. Sovereignty means it works when he is not at the keyboard.
PART 10: KEY FILES AND THEIR LOCATIONS
- Orchestration: ~/.hermes/bin/timmy-orchestrator.sh, ~/.hermes/bin/workforce-manager.py
- Dispatch: ~/.hermes/bin/agent-dispatch.sh
- Loop scripts: ~/.hermes/bin/claude-loop.sh (620 lines), ~/.hermes/bin/gemini-loop.sh, ~/.hermes/bin/kimi-loop.sh
- Soul/Identity: ~/.hermes/SOUL.md (Inscription 1 — Immutable Conscience)
- Config: ~/.hermes/config.yaml, ~/.timmy/ directory for Timmy's workspace
- Gitea token: ~/.hermes/gitea_token_vps (Timmy identity); alexanderwhitestone.com is the domain for the forge
- Scorecards: ~/.hermes/logs/agent-scorecards.json
- Alerts: ~/.hermes/logs/workforce-alerts.json
- Morrowind MCP agent: ~/.hermes/profiles/morrowind/ (OpenMW with two-tier brain: local reflex + cloud reasoning)
PART 11: IMMEDIATE PRIORITIES BY IMPACT
P0 — The Door: 6 open issues. Crisis front door. If this doesn't ship, the mission doesn't exist publicly.
P0 — Fleet rebalance: Reassign from dead or stalled agents (Fenrir 17, Allegro 55, Gemini 34 open). Agents that are not delivering should not hold work.
P0 — Gitea URL cutover: Multiple scripts still reference 143.198.27.163:3000. Must move to https://forge.alexanderwhitestone.com across workforce-manager.py, claude-loop.sh, timmy-orchestrator.sh.
P1 — Deep Dive (#830): Sovereign NotebookLM. Automated daily briefing with fleet awareness and premium TTS voice.
P1 — Claw Code: Clean-room agentic coding system derived from Claude Code study.
P1 — Quality gates: Pre-merge verification that PRs actually solve their issue.
P2 — Level 3 fleet: Dynamic agent assignment, health monitoring, auto-rebalancing.
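For the Gitea URL cutover listed among the P0 items, a dry-run-first rewrite is one low-risk approach. The sketch below assumes the old references carry an `http://` or `https://` scheme, which the briefing does not state; bare `143.198.27.163:3000` strings would need a separate pattern. The function names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: old references look like http(s)://143.198.27.163:3000/...
OLD_URL = re.compile(r"https?://143\.198\.27\.163:3000")
NEW_URL = "https://forge.alexanderwhitestone.com"


def rewrite_gitea_url(text: str) -> tuple[str, int]:
    """Return the rewritten text and the number of replacements made."""
    return OLD_URL.subn(NEW_URL, text)


def cutover(paths: list[str], dry_run: bool = True) -> dict[str, int]:
    """Report (and optionally apply) URL replacements per file."""
    counts: dict[str, int] = {}
    for raw in paths:
        path = Path(raw).expanduser()
        if not path.is_file():
            continue  # skip missing files rather than failing the whole run
        original = path.read_text(encoding="utf-8")
        updated, n = rewrite_gitea_url(original)
        if n:
            counts[str(path)] = n
            if not dry_run:
                path.write_text(updated, encoding="utf-8")
    return counts
```

Running `cutover([...])` with the three script paths under ~/.hermes/bin/ and the default `dry_run=True` reports how many references each file holds without modifying anything; passing `dry_run=False` applies the change.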