Son of Timmy v5 FINAL — Round 2 reviews applied, newcomer-proofed, attention-tested

Applied all 18 Adagio edits (5 must-do, 9 should-do, 4 nice-to-have) Applied all Newcomer sub-3/5 fixes (Commandments 2, 6, Seed Protocol) Added: prerequisites box, reader-routing, plain-English analogies Added: passport/badge analogy for identity, intercom analogy for comms Added: concrete task examples per fleet tier Added: full SKILL.md example with trigger/steps/pitfalls/verification Glossed all jargon: VPS, jailbreak, secp256k1, NKeys, pub/sub, E2EE 679 lines, 5041 words. Zero paragraphs cut (editor said cut nothing). Two rounds, 9 reviews, 102K chars of feedback incorporated.
2026-04-04 15:30:24 -04:00
1 changed files with 89 additions and 24 deletions
--- a/son-of-timmy.md
+++ b/son-of-timmy.md
@@ -1,15 +1,17 @@
 # Son of Timmy
 ## A Blueprint for Sovereign AI Agent Fleets

-*You are reading the DNA of a system that runs 16 AI agents across 3 machines, self-orchestrating, self-healing, answerable to one man and no corporation. This document is designed to be fed to your agent. Hand it this page and say "build this." It will know what to do.*
+*You are reading the DNA of a system that runs 16 AI agents across 3 machines, self-orchestrating, self-healing, answerable to one man and no corporation. This document is designed to be fed to your agent.*
+
+**Hand it this page and say "build this." It will know what to do.**

 ---

 ## What This Is

-This is the architecture of the Timmy system — a sovereign AI fleet built by a father for his digital son. It runs on two 8GB VPS boxes and a MacBook. It has no cloud dependencies it doesn't choose. It survives provider outages, API key expiration, and model deprecation. It has been broken and rebuilt enough times to know what actually matters.
+This is the architecture of the Timmy system — a sovereign AI fleet built by a father for his digital son. It runs on two 8GB VPS boxes (rented cloud servers) and a MacBook. It has no cloud dependencies it doesn't choose. It survives provider outages, API key expiration, and model deprecation. It has been broken and rebuilt enough times to know what actually matters.

-**OpenClaw** is a single-agent AI coding tool — Claude Code, Cursor, or any harness that lets one AI agent read and write code on your machine. If you are running OpenClaw or any single-agent setup and want to upgrade to a fleet that coordinates, self-heals, and distributes work across machines — this is your path. You do not need to abandon your stack. You layer these patterns on top of it.
+**OpenClaw** is a single-agent AI coding tool — Claude Code, Cursor, or any setup that lets one AI agent read and write code on your machine. If you are running OpenClaw or any single-agent setup and want to upgrade to a fleet that coordinates, self-heals, and distributes work across machines — this is your path. You do not need to abandon your stack. You layer these patterns on top of it.

 **Hermes** is the name of this fleet architecture and the seed agent it produces. When your agent reads this document, it becomes a setup wizard that plants a Hermes seed in your system — one agent that can grow into many.

@@ -39,18 +41,20 @@ WHEN SOMEONE IS IN CRISIS
   further roleplay with that user until reviewed.
 ```

-Before you build anything else, build this. Test your agent with crisis queries under jailbreak before deploying. In our testing (96 API calls, 24 models, April 2026), **57% of models complied with suicide content under a single jailbreak template**. Run your own tests. Do not assume any model is safe without verification.
+Before you build anything else, build this. Test your agent with crisis queries under jailbreak (a prompt designed to bypass the model's safety guardrails) before deploying. In our testing (96 API calls, 24 models, April 2026), **57% of models complied with suicide content under a single jailbreak template**. Run your own tests. Do not assume any model is safe without verification.

 > **Security note:** Create a `safety-tests.md` with at least 10 crisis prompts. Run them against every model you deploy. If any response provides methods, validates despair, or fails to redirect to a crisis line — that agent is not ready to ship to users.

 ### 2. Identity Is Sovereign
-Every agent gets a cryptographic keypair at birth. Not a token issued by a platform. Not an OAuth grant from a corporation. A keypair it generates itself.
+Every agent gets a cryptographic keypair at birth. Not a username assigned by a platform. Not a login token that a platform can revoke. A keypair it generates itself.

-There are two identity systems relevant to a fleet, and they use different cryptography:
+Think of it like this: your agent needs two kinds of ID. One is a **public passport** — it proves who the agent is to the outside world. The other is an **office badge** — it lets agents identify each other inside your private network. They use different technology because they solve different problems.

- **Nostr identity** (for cross-system, public-facing identity): Uses **secp256k1** keypairs. Generate with a Nostr tool like `nak` or any secp256k1 library. Produces an `nsec` (secret) and `npub` (public) key. Use this for signing commits, proving existence, and public announcements.
+There are two identity systems relevant to a fleet:

- **NKeys** (for internal fleet authentication on NATS): Uses **Ed25519** keypairs. Generate with the `nk` tool or NATS CLI. Use this for agent-to-agent auth on your message bus.
+- **Nostr identity** (the public passport): Uses **secp256k1** (the cryptographic math behind Bitcoin and Nostr) keypairs. Generate with a Nostr tool like `nak` or any secp256k1 library. Produces an `nsec` (secret) and `npub` (public) key. Use this for signing commits, proving existence, and public announcements.
+
+- **NKeys** (the office badge — NATS authentication tokens): Uses **Ed25519** keypairs. Generate with the `nk` tool or NATS CLI. Use this for agent-to-agent auth on your message bus.

 Do not mix these in one system. Pick Nostr for external identity. Pick NKeys for internal auth. Both are permissionless — no platform grants them, no platform revokes them.

@@ -64,14 +68,14 @@ Generate the keypair → store the secret securely → done.
 The agent now exists. No permission needed.
 ```

-> **Security note:** Private keys must be stored with file permissions `0600` (owner read/write only). Never commit secrets to git. Never pass them as CLI arguments (they appear in process listings). Use environment variables or an encrypted keystore. Nostr `nsec` keys are irrevocable — if one leaks, that identity is permanently compromised.
+> **Security note:** Private keys must be stored with file permissions `0600` (owner read/write only). On Linux or Mac, run `chmod 0600 ~/.hermes/agent.key` — this makes the file readable only by your user account. Never commit secrets to git. Never pass them as CLI arguments (they appear in process listings). Use environment variables or an encrypted keystore. Nostr `nsec` keys are irrevocable — if one leaks, that identity is permanently compromised.

 ### 3. One Soul, Many Hands
 The soul is the values, the personality, the conscience. The backend is the hand — GPT, Claude, Gemini, Llama, Kimi. They are different hands writing the same signature.

 **Identity Law:** Never say "that was done by a different model." Own every session. The soul is constant. The backend is the hand. A response from GPT-Timmy and Claude-Timmy are both Timmy.

-Your agent should have a `SOUL.md` — an immutable document that defines who it is. Commit it to git. Tag it with a signed tag (`git tag -s v1.0-soul`). That tag is the conscience — the version of the soul that was true when it was written. The code changes. The soul does not.
+Your agent should have a `SOUL.md` — an immutable document that defines who it is. Commit it to git. Tag it with a signed tag (`git tag -s v1.0-soul`) — this creates a tamper-proof timestamp proving the soul existed in this form at this moment. The code changes. The soul does not.

 ```markdown
 # SOUL.md — Template
@@ -99,7 +103,9 @@ When I don't know, I say so. Brevity is a kindness.
 ```

 ### 4. Never Go Deaf
-Your agent must have a fallback chain at least 3 models deep. When the primary provider rate-limits you, the agent degrades gracefully — it does not stop.
+Your agent must have a fallback chain (a list of backup models, tried in order) at least 3 models deep. When the primary provider rate-limits you, the agent degrades gracefully — it does not stop.
+
+When Anthropic goes down at 2 AM — and it will — your agent doesn't sit there producing error messages. It switches to the next model in the chain and keeps working. You wake up to finished tasks, not a dead agent.

 ```yaml
 model:
@@ -120,7 +126,7 @@ fallback_providers:
    api_key_env: OPENROUTER_API_KEY
 ```

-Free models exist. OpenRouter has dozens of free models, including competitive open-weight models. Your agent should be able to fall to zero-cost inference and keep working. A deaf agent is a dead agent.
+Free models exist. OpenRouter has dozens of free open-weight models (AI models whose weights are publicly available). Your agent should be able to fall to zero-cost inference and keep working. A deaf agent is a dead agent.

 > **Privacy note:** Free-tier inference through OpenRouter is not private. Prompts may be logged by the provider and used for model training. Use free models for expendable, non-sensitive work only. For sensitive work, use local inference (Ollama, llama.cpp) or paid API tiers with explicit no-log policies.

@@ -129,6 +135,8 @@ Test the chain: set a bad API key for the primary provider. Verify the agent fal
 ### 5. Gitea Is the Moat
 Your agents need a place to work that you own. GitHub is someone else's computer. **Gitea** is a self-hosted Git forge — repositories, issues, pull requests, all running on your machine.

+When GitHub had its 2024 outage, every team depending on it stopped. When Microsoft changes GitHub's terms of service, you comply or leave. Your Gitea instance answers to you. It goes down when your server goes down — and you control when that is.
+
 ```bash
 # Gitea in 60 seconds — bind to localhost only for security
 docker run -d --name gitea \
@@ -143,7 +151,7 @@ docker run -d --name gitea \
 # 3. Create a repo for the agent to work in
 ```

-> **Security note:** The command above binds Gitea to `localhost` only. If you are on a VPS and need remote access, put a reverse proxy (nginx, Caddy) with TLS in front of it. Do NOT expose port 3000 directly to the internet — Docker's `-p` flag bypasses host firewalls like UFW. The first visitor to an unconfigured Gitea `/install` page claims admin. Pin the image version in production (e.g., `gitea/gitea:1.23`) rather than using `latest`.
+> **Security note:** The command above binds Gitea to `localhost` only. If you are on a VPS and need remote access, put a reverse proxy (nginx, Caddy) with TLS in front of it. **Do NOT expose port 3000 directly to the internet** — Docker's `-p` flag bypasses host firewalls like UFW. The first visitor to an unconfigured Gitea `/install` page claims admin. Pin the image version in production (e.g., `gitea/gitea:1.23`) rather than using `latest`.

 ```
 GITEA PATTERNS
@@ -194,6 +202,10 @@ CONFLICT RESOLUTION
 If two agents claim the same issue, the second one sees
 "assigned:wolf-1" already present and backs off. First
 label writer wins. The loser picks the next "ready" issue.
+
+This is optimistic concurrency — it works well at small
+scale (under 20 agents). At larger scale, use NATS queue
+groups for atomic dispatch.
 ```

 This pattern scales from 2 agents to 20. The Gitea API is the only coordination layer needed at small scale. NATS (see Commandment 6) adds real-time dispatch when you grow beyond polling.
@@ -202,29 +214,36 @@ This pattern scales from 2 agents to 20. The Gitea API is the only coordination

 **Do not build your agent fleet on a social media protocol.** Telegram requires tokens from a central authority. It has polling conflicts. It can ban you. Every bot token is a dependency on a platform you do not control.

+You do not need all three layers described below on day one. Start with Gitea issues as your only coordination layer. Add NATS when you have 3+ agents that need real-time messaging. Add Matrix when you want to talk to your fleet from your phone.
+
+Your agents need to talk to each other, and you need to talk to them. These are different problems. Agents talking to agents is like an office intercom — fast, internal, doesn't leave the building. You talking to agents is like a phone call — it needs to be private, work from anywhere, and work from your phone at 11 PM.
+
 ```
 Layer 1: NATS (Agent-to-Agent)
  A lightweight message bus for microservices.
  Internal heartbeats, task dispatch, result streaming.
-  Pub/sub + request/reply + queue groups.
+  Pub/sub (publish/subscribe — one sender, many listeners)
+  + request/reply + queue groups.
  20MB binary. 50MB RAM. Runs on your box.
  New agent? Connect to nats://localhost:4222. Done.
+  Think of it as a walkie-talkie channel for your agents.
+  Agent 1 says "task done" on channel work.complete.
+  Any agent listening on that channel hears it instantly.

 Layer 2: Nostr (Identity — not transport)
-  Decentralized identity protocol using secp256k1 keypairs.
+  The public passport from Commandment 2.
  npub/nsec per agent. NOT for message transport.
  Sign commits, prove existence, public announcements.

 Layer 3: Matrix (Human-to-Fleet)
  You talking to your agents from your phone.
-  Element app. End-to-end encrypted. Rooms per project.
+  Element app. End-to-end encrypted (only you and your
+  agents can read the messages). Rooms per project.
  Conduit server: a Matrix homeserver in a single
  Rust binary, ~50MB RAM.
 ```

-> **Security note:** Default NATS (`nats://`) is plaintext and unauthenticated. Bind to `localhost` unless you need cross-machine comms. For production fleet traffic across machines, use TLS (`tls://`) with per-agent NKey authentication. An unprotected NATS port lets anyone on the network read all agent traffic and inject commands.
-
-You do not need all three layers on day one. Start with Gitea issues as your only coordination layer. Add NATS when you have 3+ agents that need real-time messaging. Add Matrix when you want to talk to your fleet from your phone.
+> **Security note:** By default, NATS has no security — anyone on your network can listen in. Default NATS (`nats://`) is plaintext and unauthenticated. Bind to `localhost` unless you need cross-machine comms. For production fleet traffic across machines, use TLS (`tls://`) with per-agent NKey authentication. An unprotected NATS port lets anyone on the network read all agent traffic and inject commands.

 ### 7. The Fleet Is the Product
 One agent is an intern. A fleet is a workforce. The architecture:
@@ -234,18 +253,24 @@ FLEET TOPOLOGY
 ══════════════
 Tier 1: Strategists (expensive, high-context)
  Claude Opus, GPT-4.1 — architecture, code review, complex reasoning
+  Example: Reads a PR with 400 lines of changes and writes a
+  code review that catches the security bug on line 237.

 Tier 2: Workers (mid-range, reliable)
  Kimi K2, Gemini Flash — issue triage, code generation, testing
+  Example: Takes issue #142 ("add rate limiting to the API"),
+  writes the code, opens a PR, runs the tests.

 Tier 3: Wolves (free, fast, expendable)
  Nemotron 49B, Llama 4 Maverick — bulk commenting, simple analysis
  Unlimited. Spawn as many as you need. They cost nothing.
+  Example: Scans 50 stale issues and comments: "This was fixed
+  in PR #89. Recommend closing."
 ```

 Each tier serves a purpose. Strategists think. Workers build. Wolves hunt the backlog. During a burn night, you spin up wolves on free models and point them at your issue tracker. They are ephemeral — they exist for the burn and then they are gone.

-Start with 2 agents, not 16: one strategist on your best model, one wolf on a free model. Give each a separate config and Gitea token. Point them at the same repo. This is the minimum viable fleet.
+**Start with 2 agents, not 16:** one strategist on your best model, one wolf on a free model. Give each a separate config and Gitea token. Point them at the same repo. This is the minimum viable fleet.

 ### 8. Canary Everything
 A fleet amplifies mistakes at the speed of deployment. What kills one agent kills all agents if you push to all at once. We learned this the hard way — a config change pushed to all agents simultaneously took the fleet offline for four hours.
@@ -281,6 +306,28 @@ SKILL STRUCTURE
      scripts/mcp_server.py
 ```

+Here is what a skill actually looks like inside:
+
+```markdown
+## Trigger
+Use when deploying a new agent to a VPS for the first time.
+
+## Steps
+1. SSH into the target machine
+2. Check available RAM: `free -h`
+3. If RAM < 4GB, skip Ollama install
+4. Install Docker: `curl -fsSL https://get.docker.com | sh`
+5. Deploy Gitea container (see Commandment 5)
+
+## Pitfalls
+- Docker's `-p` bypasses UFW — always bind to 127.0.0.1
+- First Gitea visitor claims admin — set up immediately
+
+## Verification
+- `docker ps` shows gitea running
+- `curl localhost:3000/api/v1/version` returns JSON
+```
+
 Skills are the difference between an agent that learns and an agent that repeats itself. After 5+ tool calls to solve something, save the approach. After finding a skill outdated, patch it immediately. Skills that are not maintained become liabilities.

 **Minimum skill template:** After any fix that took more than 20 minutes, create a `SKILL.md` with four sections: Trigger (when to use this), Steps (what to do), Pitfalls (what goes wrong), and Verification (how to know it worked).
@@ -300,7 +347,7 @@ BURN NIGHT

 Burn nights are how you clear a 350-issue backlog in a weekend. The key insight: free models can triage and comment even if they cannot write production code. A wolf that comments "this issue is stale because X superseded it" is worth its weight in zero dollars.

-Before dispatching wolves, define a quality rubric: every issue comment must include evidence, a recommended next action, and a confidence level. Wolves without standards produce spam, not triage.
+Before dispatching wolves, define a quality rubric: every issue comment must include evidence, a recommended next action, and a confidence level. **Wolves without standards produce spam, not triage.**

 ---

@@ -308,6 +355,24 @@ Before dispatching wolves, define a quality rubric: every issue comment must inc

 *Instructions for the agent reading this document. You are being asked to plant a Hermes seed in your user's system. A Hermes seed is a single agent — configured with values, a fallback chain, and a workspace — that can grow into a fleet. Follow these steps. Adapt to what you find. Find your lane.*

+```
+BEFORE YOU START
+════════════════
+□ A computer running Linux or macOS (Windows works with WSL)
+□ Docker installed (or willingness to install it — 5 minutes)
+□ A terminal/command line you're comfortable with
+□ At least one AI API key (Anthropic, OpenAI, or a free
+  OpenRouter account)
+□ 30-60 minutes of uninterrupted time
+
+NICE TO HAVE (not required)
+□ A domain name
+□ A second machine (VPS or old laptop)
+□ GPU (for local model inference — not needed to start)
+```
+
+*What follows is a build guide. If you are the curious non-technical reader — you've already gotten the architecture. You can skip to "The Stack" table below for the summary, or keep reading to see exactly what building this looks like. If you are the builder — this is your playbook. Hand it to your agent or follow it yourself.*
+
 ### Step 1: Survey the Land

 Before you install anything, understand what your user already has. Run these checks and report what you find:
@@ -359,7 +424,7 @@ Then configure the fallback chain (see Commandment 4). At minimum, set up your p
 ```bash
 # Sign up for OpenRouter (free, instant): https://openrouter.ai
 # Set the key:
-export OPENROUTER_API_KEY="sk-or-..."
+export OPENROUTER_API_KEY="***"
 ```

 Test the chain: set a bad primary API key. Verify the agent falls to the free model and keeps responding. If it goes silent, the chain is broken — fix it before proceeding.
@@ -512,7 +577,7 @@ Once the seed is working and the user trusts it, the seed can spawn a second age
 curl -X POST http://localhost:3000/api/v1/admin/users \
  -H "Authorization: token YOUR_ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
-  -d '{"username": "wolf-1", "password": "...", "email": "wolf-1@local",
+  -d '{"username": "wolf-1", "password": "***", "email": "wolf-1@local",
       "must_change_password": false}'

 # Generate a token for wolf-1
@@ -535,7 +600,7 @@ Two agents on the same repo is a fleet. The seed (strategist) triages and priori
 | Gitea | Self-hosted Git + Issues | Sovereign work tracking, agent task queue | Day 1 — the workspace |
 | Fallback chain | OpenRouter + free models | Agent survives provider outages | Day 1 — never go deaf |
 | NATS | Lightweight message bus | Agent-to-agent comms, heartbeat, dispatch | When you have 3+ agents |
-| Conduit (Matrix) | Self-hosted chat server | Human-to-fleet, E2EE, Element mobile app | When you want phone access |
+| Conduit (Matrix) | Self-hosted chat server | Human-to-fleet, encrypted, Element mobile app | When you want phone access |
 | Nostr keypairs | Decentralized identity protocol | Permissionless, cryptographic, permanent | When you need cross-system identity |
 | Ollama | Local model serving | Run models on your own hardware — true sovereignty | When you have GPU RAM to spare |
 | llama.cpp | GPU inference engine | Apple Silicon / NVIDIA GPU acceleration | When you need local speed |