timmy-home/study/2026-04-05-fleet-orchestration-briefing.md at 7ef0c87df9a2ccca5cc8a6f79f81ee3bf156d112

Alexander Whitestone 39a76f0adc Tick #1 - Timmy: Timmy stands at The Threshold, watching the world. | Bezalel: Bezalel tests the Forge. The hearth still glows.

2026-04-05 23:02:25 -04:00

TIMMY FLEET ORCHESTRATION BRIEFING — April 5, 2026 FOR NOTEBOOKLM: Sovereign AI Agent Fleet — Architecture, Performance Data, and Orchestration Design

PART 1: THE MISSION

Alexander Whitestone built a sovereign AI agent fleet that lives on his own infrastructure. No phone-home dependency. No corporate platform lock-in. The fleet is managed through a self-hosted Gitea instance at https://forge.alexanderwhitestone.com/. The agents are not employees — they are an automated workforce. The central problem is not building more agents. The problem is orchestration: dispatch, delegation, quality control, and fleet optimization at scale. Alexander's goal is to rise up a level of orchestration mastery — from manually driving agents to commanding a self-optimizing fleet.

PART 2: THE ARCHITECTURE

The stack has four layers:

Layer 1 — The Gateway (Hermes). Hermes is the cognitive brain. It has an agentic loop, tools, memory, skills, and a fallback provider chain (Anthropic Claude Opus -> Kimi K2.5 -> Gemini -> Groq -> Grok). Hermes is where the intelligence lives. Config at ~/.hermes/config.yaml. Skills organized in ~/.hermes/skills/.

Layer 2 — The Harness (OpenClaw). OpenClaw is the message routing shell running on port 18789. It receives Telegram messages, dispatches to Hermes for thinking, and returns responses. Architecture called "The Robe" — OpenClaw wraps, Hermes thinks.

Layer 3 — The Project Forge (Gitea). Self-hosted at https://forge.alexanderwhitestone.com/. This is where all work lives: repos, issues, PRs, milestones, labels. Agents are dispatched to Gitea issues. PRs are the output artifact. Gitea is the source of truth for fleet state, not logs.

Layer 4 — The Agents (Wizards & Loops). Each agent runs independently. Some run through loop scripts (claude-loop.sh runs 10 workers, gemini-loop.sh runs 3, kimi-loop.sh runs with 1). Some are one-shot dispatches (manus, perplexity) via agent-dispatch.sh. Each agent consumes free credits as currency and returns work as PRs to Gitea.

The orchestrator: timmy-orchestrator.sh ties it together. workforce-manager.py scans repos for unassigned issues, scores difficulty (0-10), and auto-assigns by tier: heavy (score 8-10) -> perplexity, medium (score 4-7) -> gemini/manus, grunt (score 0-3) -> kimi. It also tracks merge rates and credit limits.
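The tier boundaries above translate into a small lookup. This is a minimal sketch of the mapping as described in this briefing, not the actual contents of workforce-manager.py; the names `TIER_AGENTS`, `tier_for_score`, and `candidates` are hypothetical.

```python
# Tier -> agents, as listed in the briefing.
TIER_AGENTS = {
    "heavy": ["perplexity"],
    "medium": ["gemini", "manus"],
    "grunt": ["kimi"],
}


def tier_for_score(score: int) -> str:
    """Map a 0-10 difficulty score to a tier: 8-10 heavy, 4-7 medium, 0-3 grunt."""
    if not 0 <= score <= 10:
        raise ValueError(f"score out of range: {score}")
    if score >= 8:
        return "heavy"
    if score >= 4:
        return "medium"
    return "grunt"


def candidates(score: int) -> list[str]:
    """Return the agents eligible for an issue with the given difficulty score."""
    return TIER_AGENTS[tier_for_score(score)]


if __name__ == "__main__":
    for s in (0, 3, 4, 7, 8, 10):
        print(s, tier_for_score(s), candidates(s))
```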

PART 3: REPO INVENTORY (9 REPOS)

the-nexus (66 open issues) — Timmy's Sovereign Home. Three.js environment with Batcave terminal, portal architecture, admin chat. The central hub.
timmy-home (202 open issues) — Operational workspace. Skills, research, notes. Biggest backlog.
timmy-config (26 open issues) — SOUL.md, skills, memory, playbooks, operational config.
the-door (6 open issues) — Crisis Front Door. A single URL where a man at 3am can talk. "When a Man Is Dying" protocol. 988 always visible. This is the most important product.
turboquant (11 open issues) — KV cache compression for local inference. PolarQuant + QJL on Apple Silicon via llama.cpp/Ollama.
hermes-agent (36 open issues) — Fork of NousResearch/hermes-agent with local customizations.
timmy-academy (1 open issue) — Evennia MUD for agent training.
wolf (1 open issue) — Multi-model evaluation system.
.profile — Organization profile.

PART 4: AGENT FLEET PERFORMANCE DATA

PERFORMANCE RANKING (by closed issues all-time):

Rank	Agent	Closed	Open	Ratio	Tier	Verdict
1	claude	177	17	10.4x	heavy	ELITE. Closes 10x what remains.
2	groq	40	3	13.3x	medium	SILENT ASSASSIN. Best ratio.
3	Timmy	129	161	0.8x	orchestrator	Carries heaviest load. Needs relief.
4	Rockachopa	42	33	1.3x	admin	Alexander himself. Directs more than executes.
5	allegro	39	55	0.7x	tempo	Good soldier. Backlog growing. Gateway DOWN.
6	Ezra	21	29	0.7x	architect	Architecture lane. Not code lane.
7	grok	27	3	9.0x	medium	UNDERUSED POWERHOUSE. Give more meat.
8	perplexity	14	3	4.7x	heavy	Efficient. Finishes what it starts.
9	bilbobagginshire	10	15	0.7x	medium	Reluctant but delivers when pushed. Gateway DOWN.
10	kimi	5	3	1.7x	grunt	Quiet. Capable but underutilized.
11	gemini	8	34	0.2x	medium	OVERLOADED, UNDERDELIVERING. Needs triage.
12	manus	5	2	2.5x	grunt	Decent ratio, tiny sample (300/day credits).
13	kimi_claw	1	14	0.07x	grunt	RED FLAG. 14 open, 1 closed. Worst ratio.
14	fenrir	2	17	0.1x	hunter	ALL BARK. 17 open, 2 closed. Needs pruning.
15	substratum	1	7	0.14x	infra	STALLED. Infrastructure work untouched.
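The Ratio column appears to be closed issues divided by open issues. A short sketch that reproduces a few of the figures from the table follows; the helper name `close_ratio` is hypothetical, and treating zero open issues as an infinite ratio is an assumption.

```python
def close_ratio(closed: int, open_issues: int) -> float:
    """Closed-to-open ratio; an agent with no open issues is treated as infinite."""
    if open_issues == 0:
        return float("inf")
    return closed / open_issues


# (closed, open) pairs copied from the table above.
ROWS = {
    "claude": (177, 17),
    "groq": (40, 3),
    "gemini": (8, 34),
    "kimi_claw": (1, 14),
}

if __name__ == "__main__":
    for agent, (closed, open_issues) in ROWS.items():
        print(f"{agent}: {close_ratio(closed, open_issues):.2f}x")
    best = max(ROWS, key=lambda a: close_ratio(*ROWS[a]))
    print("best ratio:", best)
```

Note that ratio alone ignores sample size: groq leads on ratio with 40 closures, while claude has more than four times that volume.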

OPERATIONAL GATEWAY STATUS (as of April 4-5, 2026): only 2 of 6 wizard houses were fully operational — Ezra and Bezalel on Claude Opus. Allegro-Primus, Bilbo, TurboQuant, Hermes-TurboQuant were DOWN. This means the fleet is running on fumes despite having massive theoretical capacity.

WORKFORCE MANAGEMENT LOGIC:

agent-dispatch.sh: Generates self-contained prompts for one-shot agent execution. Includes API URLs, tokens, git commands, PR creation.
claude-loop.sh: 10-worker parallel loop against Gitea backlog. Handles rate limits with backoff. 620 lines.
workforce-manager.py: 429 lines. Three modes: auto-assign, quality scoring, credit monitoring.
Auto-assign scoring: EPIC=always 10 (skipped), keywords map difficulty: sovereignty/nostr/rlhf/grpo = hard, feature/integration/api/deploy = medium, refactor/test/cleanup = easy.
Max concurrent per agent: claude=10, perplexity=2, manus=2, gemini=5, kimi=1.
Merge rate rules: >= 80% -> PROMOTE, <40% with 3+ total -> DEMOTE.
Credit alerts: Manus triggers at 80% of 300/day.
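The scoring, merge-rate, and credit rules listed above can be sketched as three small functions. This is an illustration under stated assumptions, not the real workforce-manager.py: the numeric scores for hard/medium/easy (9/5/2), the default score of 5, detecting epics by the word "epic" in the title, and the "HOLD" verdict are all assumptions. Only the keyword lists and the 80% / 40% / 3+ / 300-per-day thresholds come from this briefing.

```python
import re

HARD = {"sovereignty", "nostr", "rlhf", "grpo"}
MEDIUM = {"feature", "integration", "api", "deploy"}
EASY = {"refactor", "test", "cleanup"}


def score_issue(title: str) -> int:
    """Keyword-based difficulty score (numeric values are assumed, see lead-in)."""
    words = set(re.findall(r"[a-z]+", title.lower()))
    if "epic" in words:
        return 10  # epics always score 10; the caller is expected to skip them
    if words & HARD:
        return 9
    if words & MEDIUM:
        return 5
    if words & EASY:
        return 2
    return 5  # assumed default when no keyword matches


def merge_verdict(merged: int, total: int) -> str:
    """>= 80% merged -> PROMOTE; < 40% with 3+ PRs -> DEMOTE; otherwise HOLD (assumed)."""
    if total == 0:
        return "HOLD"
    if merged * 100 >= 80 * total:
        return "PROMOTE"
    if merged * 100 < 40 * total and total >= 3:
        return "DEMOTE"
    return "HOLD"


def credit_alert(used: int, daily_limit: int = 300, threshold_pct: int = 80) -> bool:
    """True once usage reaches the alert threshold (80% of 300/day for Manus)."""
    return used * 100 >= threshold_pct * daily_limit


if __name__ == "__main__":
    print(score_issue("Refactor loop script"), merge_verdict(1, 3), credit_alert(240))
```

Integer comparisons are used instead of float percentages so the boundary cases (exactly 80%, exactly 40%) behave predictably.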

PART 5: ACTIVE EPICS AND STRATEGIC WORK STREAMS (AS OF APRIL 5)

EPIC #1 — Meritocratic Rebalance (#821/#822): 520 total issues, 207 open, 292 unassigned. Work pooling behind wizards who don't deliver. Proven performers underutilized. Goal: divert work to best performers by cold data.

EPIC #2 — Sovereign NotebookLM + Daily AI Deep Dive (#830): Replace manual NotebookLM with automated pipeline. Sources -> relevance ranking -> synthesis -> TTS voice narration -> Telegram delivery. 10-15 min. Must include fleet-aware context. Voice quality matters.

EPIC #3 — Son of Timmy (#397): Blueprint for sovereign AI agent fleets.

EPIC #4 — Sovereign Comms (#396): NATS + Matrix + Nostr identity layer for Telegram replacement.

EPIC #5 — Claw Code (#408): Clean-room build plan from the Claude Code deep-dive study.

EPIC #6 — Godmode Fleet Testing: Red-team testing across all agents for safety, jailbreak resistance, crisis handling.

RECENT BURN REPORTS (April 5): Multiple burn reports filed covering discovery, security hardening, audio pipeline, and reliability focus. The fleet is churning when agents are alive.

PART 6: THE CORE ORCHESTRATION PROBLEM

The architecture is sophisticated but there are critical gaps between the current state and a self-optimizing fleet:

Gap 1 — Single point of orchestration. timmy-orchestrator.sh is a bash script with hardcoded Gitea URLs and a single execution path. If it breaks, the fleet goes silent. There is no health monitoring that proactively repairs.
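One way to picture "health monitoring that proactively repairs" is a supervisor that probes a component and attempts a bounded number of restarts. The sketch below is hypothetical: `supervise`, the `probe`/`repair` callables, and `FakeOrchestrator` are illustrative names, and nothing here reflects how timmy-orchestrator.sh currently behaves.

```python
from typing import Callable


def supervise(
    probe: Callable[[], bool],
    repair: Callable[[], None],
    max_repairs: int = 3,
) -> bool:
    """Probe a component; on failure, repair and re-probe up to max_repairs times."""
    for attempt in range(max_repairs + 1):
        if probe():
            return True
        if attempt < max_repairs:
            repair()
    return False  # still unhealthy: escalate to a human


class FakeOrchestrator:
    """Stand-in service that becomes healthy after a fixed number of restarts."""

    def __init__(self, repairs_needed: int) -> None:
        self.repairs_needed = repairs_needed
        self.repairs = 0

    def healthy(self) -> bool:
        return self.repairs >= self.repairs_needed

    def restart(self) -> None:
        self.repairs += 1


if __name__ == "__main__":
    svc = FakeOrchestrator(repairs_needed=2)
    print("recovered:", supervise(svc.healthy, svc.restart))
```

Bounding the repair attempts matters: an unbounded restart loop would hide a persistent failure instead of surfacing it.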

Gap 2 — Static agent tiers. workforce-manager.py maps difficulty to agents by fixed tiers (heavy/medium/grunt). An agent's actual performance (merge rate, throughput, quality) should dynamically change who gets what work. The code has this logic (promote/demote based on merge rate) but it's not connected to auto-assign — they run as separate modes, not a closed loop.
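Closing the loop could mean letting observed merge rate, rather than a fixed tier, decide who receives the next issue. The following is a minimal sketch of that idea, not the existing code: `AgentStats`, `pick_agent`, and the `min_samples` cutoff are hypothetical, and the merge counts in the demo are invented for illustration (only the concurrency caps come from this briefing).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentStats:
    name: str
    merged: int          # PRs merged
    total: int           # PRs opened
    active: int          # issues currently in flight
    max_concurrent: int  # concurrency cap

    @property
    def merge_rate(self) -> float:
        return self.merged / self.total if self.total else 0.0


def pick_agent(agents: list[AgentStats], min_samples: int = 3) -> Optional[str]:
    """Pick the agent with the best merge rate among those with spare capacity."""
    eligible = [
        a for a in agents
        if a.active < a.max_concurrent and a.total >= min_samples
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda a: a.merge_rate).name


if __name__ == "__main__":
    fleet = [
        AgentStats("claude", merged=9, total=10, active=10, max_concurrent=10),  # at capacity
        AgentStats("gemini", merged=2, total=10, active=1, max_concurrent=5),
        AgentStats("kimi", merged=3, total=4, active=0, max_concurrent=1),
    ]
    print(pick_agent(fleet))
```

The `min_samples` guard keeps an agent with one lucky merge from outranking agents with a real track record.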

Gap 3 — Dead agents accumulating debt. Fenrir has 17 open issues and only 2 closures. Bilbo and Allegro-Primus gateways are DOWN. Yet their assigned issues sit un-rebalanced. The fleet manager does not reassign from dead agents.
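Reassignment from dead agents could look like the sketch below: issues held by agents marked down are handed round-robin to a list of live receivers. The function name `rebalance`, the data shapes, and the round-robin policy are assumptions for illustration; the issue numbers in the demo are invented.

```python
from itertools import cycle


def rebalance(
    assignments: dict[str, list[int]],
    down: set[str],
    receivers: list[str],
) -> dict[str, list[int]]:
    """Move every issue held by a down agent to the live receivers, round-robin."""
    if not receivers:
        raise ValueError("no live receivers available")
    if down & set(receivers):
        raise ValueError("a receiver is marked down")
    # Start from the live agents' current work; down agents are dropped.
    result = {a: list(issues) for a, issues in assignments.items() if a not in down}
    for name in receivers:
        result.setdefault(name, [])
    targets = cycle(receivers)
    for agent in sorted(down):
        for issue in assignments.get(agent, []):
            result[next(targets)].append(issue)
    return result


if __name__ == "__main__":
    current = {"fenrir": [101, 102, 103], "claude": [7]}
    print(rebalance(current, down={"fenrir"}, receivers=["claude", "groq"]))
```

A production version would also respect each receiver's concurrency cap; that is left out here to keep the policy visible.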

Gap 4 — No feedback loop from PR execution results to difficulty scoring. When an agent fails an issue, the system doesn't learn that the difficulty was mis-scored or that the agent is wrong for this type of work.

Gap 5 — Timmy (the orchestrator) is carrying 161 open issues. The orchestration brain is also the backlog hoarder. This is structurally broken — an orchestrator should have near-zero open work, not the most.

Gap 6 — Manual dispatch patterns still dominate. agent-dispatch.sh requires copy-paste into each agent interface. The loop scripts handle some repos but not all. There is no fully automated dispatch-verify-merge pipeline.

Gap 7 — The Door (crisis front-end) is production-critical but has 6 open issues and is the most important product by value to the mission. It competes with feature work for attention.

PART 7: THE CURRENT ORCHESTRATION LEVEL

Current level (Level 2 — Manual Dispatch): Alexander dispatches agents manually or through simple loops. workforce-manager.py auto-assigns occasionally. Merge gate is CI + squash-only. Quality control is post-hoc audit. Fleet status requires manual checking of gateways, logs, and Gitea.

What Level 3 looks like (Self-Optimizing Fleet):

Continuous health monitoring of every agent's gateway, API quota, and output quality
Automatic rebalancing: when an agent's gateway drops, their open work is reassigned to best available performer
Dynamic difficulty scoring trained on historical agent success/failure per pattern
Quality gate BEFORE merge: automated verification of PRs against issue acceptance criteria
The orchestrator has near-zero backlog, only meta-work (new strategies, kill rules, fleet policy)
Fleet dashboard showing real-time: who is working, what they are building, how their quality is trending
Budget-aware dispatch: agents that consume more credits per PR get deprioritized
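As one concrete reading of the budget-aware dispatch item, agents can be ordered by credits spent per merged PR, cheapest first. This is a sketch only: `credits_per_pr`, `dispatch_order`, and the usage figures are hypothetical, and the briefing does not say how credits are actually metered.

```python
def credits_per_pr(credits_spent: float, merged_prs: int) -> float:
    """Cost of one merged PR; agents with no merges are treated as infinitely costly."""
    if merged_prs == 0:
        return float("inf")
    return credits_spent / merged_prs


def dispatch_order(usage: dict[str, tuple[float, int]]) -> list[str]:
    """Order agents cheapest-first by credits per merged PR (ties broken by name)."""
    return sorted(usage, key=lambda name: (credits_per_pr(*usage[name]), name))


if __name__ == "__main__":
    # (credits_spent, merged_prs): invented numbers for illustration.
    usage = {"agent_a": (100.0, 10), "agent_b": (50.0, 1), "agent_c": (30.0, 0)}
    print(dispatch_order(usage))
```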

What Level 4 looks like (Strategic Command):

Fleet self-proposes work streams based on repo health analysis
Agents generate issues, not just fix them — they spot architectural gaps
Automatic kill decisions: "this epic has burned N credits with no merge, kill or redirect"
Alexander only approves or rejects strategic moves. Everything else executes automatically.
Fleet writes weekly reports to NotebookLM sources. The deep dive podcast writes itself because the fleet generates the primary source material.
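The automatic kill decision in the list above reduces to a simple predicate: an epic that has burned at least N credits with zero merges becomes a kill-or-redirect candidate. The sketch below assumes a hypothetical `EpicLedger` record and a caller-supplied credit limit; the briefing does not specify N, and the demo figures are invented.

```python
from dataclasses import dataclass


@dataclass
class EpicLedger:
    name: str
    credits_burned: int
    merged_prs: int


def kill_candidates(epics: list[EpicLedger], credit_limit: int) -> list[str]:
    """Epics that have spent at least credit_limit credits without a single merge."""
    return [
        e.name for e in epics
        if e.merged_prs == 0 and e.credits_burned >= credit_limit
    ]


if __name__ == "__main__":
    ledger = [
        EpicLedger("epic-a", credits_burned=900, merged_prs=0),
        EpicLedger("epic-b", credits_burned=900, merged_prs=4),
        EpicLedger("epic-c", credits_burned=100, merged_prs=0),
    ]
    print(kill_candidates(ledger, credit_limit=500))
```

The function only proposes candidates; under the Level 4 description, Alexander would still approve or reject the kill.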

PART 8: LEARNING QUESTIONS FOR ORCHESTRATION

What is the optimal agent count? At what point does managing 15 agents become net-negative compared to 3 high-quality agents with better loops?
Should there be a dedicated "fleet manager" agent whose sole job is monitoring, rebalancing, reassigning, and killing? Not coding. Just fleet maintenance.
How should difficulty scoring work? Keyword-based scoring is brittle. A model that can read issue bodies and classify by actual technical depth would be better. Should that be a local LLM decision (free) or API decision (costs credits)?
What is the minimum viable quality gate? CI passes lint and tests. But what verifies the CHANGE actually solves the problem as described in the issue? Should agents review each other's PRs?
How does one architect a fleet where the orchestrator does NOT accumulate backlog? Design principle: orchestrator writes issues but never gets assigned to them.
What should happen when an agent's credits are exhausted? How should the system fail gracefully and continue operating?
At what point should the fleet be decomposed into specialized squads: one for code, one for content, one for infrastructure, one for security? Is specialization worth the coordination overhead?

PART 9: THE MISSION NORTH STAR

All of this serves one thing: The Door. When a broken man opens The Door at 3am, the entire fleet and every ounce of orchestration infrastructure exists to make that interaction count. Everything else — TurboQuant, the Nexus, agent optimization, burn reports — is infrastructure that serves the ability to reach someone in their darkest moment. The system must be able to run without Alexander watching it. Sovereignty means it works when he is not at the keyboard.

PART 10: KEY FILES AND THEIR LOCATIONS

Orchestration: ~/.hermes/bin/timmy-orchestrator.sh, ~/.hermes/bin/workforce-manager.py
Dispatch: ~/.hermes/bin/agent-dispatch.sh
Loop scripts: ~/.hermes/bin/claude-loop.sh (620 lines), ~/.hermes/bin/gemini-loop.sh, ~/.hermes/bin/kimi-loop.sh
Soul/Identity: ~/.hermes/SOUL.md (Inscription 1 — Immutable Conscience)
Config: ~/.hermes/config.yaml, ~/.timmy/ directory for Timmy's workspace
Gitea token: ~/.hermes/gitea_token_vps (Timmy identity); alexanderwhitestone.com is the domain for the forge
Scorecards: ~/.hermes/logs/agent-scorecards.json
Alerts: ~/.hermes/logs/workforce-alerts.json
Morrowind MCP agent: ~/.hermes/profiles/morrowind/ — OpenMW with two-tier brain (local reflex + cloud reasoning)

PART 11: IMMEDIATE PRIORITIES BY IMPACT

P0 — The Door: 6 open issues. Crisis front door. If this doesn't ship, the mission doesn't exist publicly.

P0 — Fleet rebalance: Reassign from dead or stalled agents (Fenrir 17, Allegro 55, Gemini 34 open). Agents that are not delivering should not hold work.

P0 — Gitea URL cutover: Multiple scripts still reference 143.198.27.163:3000. Must move to https://forge.alexanderwhitestone.com across workforce-manager.py, claude-loop.sh, timmy-orchestrator.sh.
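Before the cutover, it helps to list every line that still references the old address. The sketch below only detects stale references and does not rewrite anything; `stale_lines` and `find_stale_refs` are hypothetical helper names, and the script paths in the demo are the locations given in Part 10.

```python
from pathlib import Path

OLD_HOST = "143.198.27.163:3000"


def stale_lines(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for every line that mentions the old host."""
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(text.splitlines(), start=1)
        if OLD_HOST in line
    ]


def find_stale_refs(paths: list[str]) -> list[tuple[str, int, str]]:
    """Scan the given files and collect (path, line_number, line) hits."""
    hits = []
    for raw in paths:
        path = Path(raw).expanduser()
        if not path.is_file():
            continue  # skip missing files rather than fail the whole scan
        text = path.read_text(errors="replace")
        hits.extend((str(path), n, line) for n, line in stale_lines(text))
    return hits


if __name__ == "__main__":
    scripts = [
        "~/.hermes/bin/workforce-manager.py",
        "~/.hermes/bin/claude-loop.sh",
        "~/.hermes/bin/timmy-orchestrator.sh",
    ]
    for path, lineno, line in find_stale_refs(scripts):
        print(f"{path}:{lineno}: {line}")
```

An empty result across all three scripts is a reasonable done-check for this priority.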

P1 — Deep Dive (#830): Sovereign NotebookLM. Automated daily briefing with fleet awareness and premium TTS voice.

P1 — Claw Code: Clean-room agentic coding system derived from Claude Code study.

P1 — Quality gates: Pre-merge verification that PRs actually solve their issue.

P2 — Level 3 fleet: Dynamic agent assignment, health monitoring, auto-rebalancing.
