timmy-config/docs/FLEET_BEHAVIOUR_HARDENING.md

# Fleet Behaviour Hardening — Review & Action Plan

**Author:** @perplexity
**Date:** 2026-04-08
**Context:** Alexander asked: "Is it the memory system or the behaviour guardrails?"
**Answer:** It's the guardrails. The memory system is adequate. The enforcement machinery is aspirational.

---

## Diagnosis: Why the Fleet Isn't Smart Enough

After auditing SOUL.md, config.yaml, all 8 playbooks, the orchestrator, the guard scripts, and the v7.0.0 checkin, the pattern is clear:

**The fleet has excellent design documents and broken enforcement.**

| Layer | Design Quality | Enforcement Quality | Gap |
|---|---|---|---|
| SOUL.md | Excellent | None — no code reads it at runtime | Philosophy without machinery |
| Playbooks (7 yaml) | Good lane map | Not invoked by orchestrator | Playbooks exist but nobody calls them |
| Guard scripts (9) | Solid code | 1 of 9 wired (#395 audit) | 89% of guards are dead code |
| Orchestrator | Sound design | Gateway dispatch is a no-op (#391) | Assigns issues but doesn't trigger work |
| Cycle Guard | Good 10-min rule | No cron/loop calls it | Discipline without enforcement |
| PR Reviewer | Clear rules | Runs every 30m (if scheduled) | Only guard that might actually fire |
| Memory (MemPalace) | Working code | Retrieval enforcer wired | Actually operational |

### The Core Problem

Agents pick up issues and produce output, but there is **no pre-task checklist** and **no post-task quality gate**. An agent can:

1. Start work without checking if someone else already did it
2. Produce output without running tests
3. Submit a PR without verifying it addresses the issue
4. Work for hours on something out of scope
5. Create duplicate branches/PRs without detection

SOUL.md says "grounding before generation", but no code enforces it.
The playbooks define lanes but the orchestrator doesn't load them.
The guards exist but nothing calls them.

---

## What the Fleet Needs (Priority Order)

### 1. Pre-Task Gate (MISSING — this PR adds it)

Before an agent starts any issue:
- [ ] Check if issue is already assigned to another agent
- [ ] Check if a branch already exists for this issue
- [ ] Check if a PR already exists for this issue
- [ ] Load relevant MemPalace context (retrieval enforcer)
- [ ] Verify the agent has the right lane for this work (playbook check)
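
The checklist above could be expressed as one function that returns the reasons an agent must not start. This is a minimal sketch, not the contents of `task_gate.py`: the `IssueState` shape, the field names, and the `pre_task_gate` signature are all assumptions for illustration. The caller is assumed to have already gathered assignee, branch, and PR data from the forge.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class IssueState:
    """Snapshot of one issue, collected by the caller (hypothetical shape)."""
    number: int
    lane: str                                            # lane assigned by triage
    assignee: Optional[str] = None
    branches: List[str] = field(default_factory=list)    # existing branches for it
    open_prs: List[int] = field(default_factory=list)    # open PRs referencing it


def pre_task_gate(agent: str, agent_lanes: Set[str], issue: IssueState,
                  palace_context_loaded: bool) -> List[str]:
    """Return blocking reasons; an empty list means the agent may start."""
    blockers = []
    if issue.assignee is not None and issue.assignee != agent:
        blockers.append(f"issue #{issue.number} is assigned to {issue.assignee}")
    if issue.branches:
        blockers.append(f"branch already exists: {issue.branches[0]}")
    if issue.open_prs:
        blockers.append(f"PR already open: #{issue.open_prs[0]}")
    if not palace_context_loaded:
        blockers.append("MemPalace context was not loaded")
    if issue.lane not in agent_lanes:
        blockers.append(f"lane '{issue.lane}' is outside this agent's lanes")
    return blockers
```

Returning every blocker rather than stopping at the first gives the agent loop a complete reason to log. A real gate would probably also need to let an agent resume its own existing branch, which this sketch treats as a blocker.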

### 2. Post-Task Gate (MISSING — this PR adds it)

Before an agent submits a PR:
- [ ] Verify the diff addresses the issue title/body
- [ ] Run syntax_guard.py on changed files
- [ ] Check for duplicate PRs targeting the same issue
- [ ] Verify branch name follows convention
- [ ] Run tests if they exist for changed files
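
A rough sketch of the mechanical parts of this gate follows. Several things here are assumptions: the branch convention (`<agent>/issue-<number>[-<slug>]`) is invented for illustration, and `ast.parse` stands in for `syntax_guard.py`, whose interface is not shown in this document. The "diff addresses the issue" and "run tests" checks are left out because they depend on details not covered here.

```python
import ast
import re
from typing import Dict, List

# Hypothetical convention: <agent>/issue-<number>[-<slug>]
BRANCH_PATTERN = re.compile(r"^[a-z0-9_-]+/issue-(\d+)(?:-[a-z0-9-]+)?$")


def branch_matches_issue(branch: str, issue_number: int) -> bool:
    """True if the branch follows the assumed convention for this issue."""
    match = BRANCH_PATTERN.match(branch)
    return match is not None and int(match.group(1)) == issue_number


def python_syntax_errors(changed_files: Dict[str, str]) -> List[str]:
    """Parse each changed .py file; a stand-in for syntax_guard.py."""
    errors = []
    for path, source in changed_files.items():
        if not path.endswith(".py"):
            continue  # only Python sources are checked in this sketch
        try:
            ast.parse(source, filename=path)
        except SyntaxError as exc:
            errors.append(f"{path}:{exc.lineno}: {exc.msg}")
    return errors


def post_task_gate(issue_number: int, branch: str,
                   changed_files: Dict[str, str],
                   other_open_prs: List[int]) -> List[str]:
    """Return blocking reasons; an empty list means the PR may be submitted."""
    blockers = []
    if not changed_files:
        blockers.append("diff is empty")
    if not branch_matches_issue(branch, issue_number):
        blockers.append(f"branch '{branch}' does not match issue #{issue_number}")
    if other_open_prs:
        blockers.append(f"duplicate PR already open: #{other_open_prs[0]}")
    blockers.extend(python_syntax_errors(changed_files))
    return blockers
```

`changed_files` maps each changed path to its new contents, so the gate can run without touching the working tree.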

### 3. Wire the Existing Guards (8 of 9 are dead code)

Per the #395 audit:
- Pre-commit hooks: need a symlink on every machine
- Cycle guard: needs cron/loop integration
- Forge health check: needs a cron entry
- Smoke test + deploy validate: need deploy script integration
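
For the pre-commit item, the symlink step could be made idempotent so a provisioning script can run it on every machine without harm. The sketch below assumes a standard checkout where `.git` is a directory. The function name and the location of the hook script are hypothetical.

```python
from pathlib import Path


def install_pre_commit_hook(repo_root: Path, hook_script: Path) -> bool:
    """Symlink hook_script to .git/hooks/pre-commit.

    Returns True if a link was created or replaced, False if the correct
    link was already in place.
    """
    git_dir = repo_root / ".git"
    if not git_dir.is_dir():
        # Worktrees and submodules store .git as a file; not handled here.
        raise FileNotFoundError(f"{git_dir} is not a directory")

    hooks_dir = git_dir / "hooks"
    hooks_dir.mkdir(exist_ok=True)
    link = hooks_dir / "pre-commit"
    target = hook_script.resolve()

    if link.is_symlink() and link.resolve() == target:
        return False  # already wired, nothing to do
    if link.exists() or link.is_symlink():
        link.unlink()  # replace a stale link or a stray sample hook
    link.symlink_to(target)
    return True
```

Git only runs the hook if the target script is executable, so the deploy script would also need to set that permission.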

### 4. Make Orchestrator Dispatch Actually Work

Per the #391 audit, the orchestrator scores and assigns issues, but the gateway dispatch only writes to `/tmp/hermes-dispatch.log`, and nothing reads that file. The dispatch needs to either:
- Trigger `hermes` CLI on the target machine, or
- Post a webhook that the agent loop picks up
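
The webhook option might look like the sketch below. The payload fields, the function names, and the idea that the agent loop answers with a 2xx status are all assumptions, since the real receiver is not described here. The sender is injectable so the dispatch path can be tested without a network.

```python
import json
import urllib.request
from typing import Callable


def build_dispatch(issue_number: int, agent: str, score: float) -> bytes:
    """Serialize a dispatch message (field names are hypothetical)."""
    payload = {"issue": issue_number, "agent": agent, "score": score}
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def post_webhook(url: str, body: bytes, timeout: float = 10.0) -> int:
    """POST the body as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def dispatch(issue_number: int, agent: str, score: float, url: str,
             send: Callable[[str, bytes], int] = post_webhook) -> bool:
    """Return True only if the receiver acknowledged with a 2xx status."""
    try:
        status = send(url, build_dispatch(issue_number, agent, score))
    except OSError:
        # Covers connection failures and urllib's URLError/HTTPError.
        return False
    return 200 <= status < 300
```

The key difference from the current behaviour is the return value: a dispatch that nobody received reports failure instead of silently succeeding, so the orchestrator can retry or reassign.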

### 5. Agent Self-Assessment Loop

After completing work, agents should answer:
- Did I address the issue as stated?
- Did I stay in scope?
- Did I check the palace for prior work?
- Did I run verification?

This is what SOUL.md calls "the apparatus that gives these words teeth."
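
One way to make the four questions machine-checkable is a small record that the agent fills in and the loop inspects before closing out the task. The class and field names below are illustrative assumptions, not an existing interface.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class SelfAssessment:
    """One boolean per self-assessment question (hypothetical names)."""
    addressed_issue: bool
    stayed_in_scope: bool
    checked_palace: bool
    ran_verification: bool

    def failed_checks(self) -> List[str]:
        """Names of the questions answered 'no', in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def passed(self) -> bool:
        return not self.failed_checks()
```

For example, `SelfAssessment(True, True, False, True).failed_checks()` returns `["checked_palace"]`, which the agent loop could attach to the PR description or use to block submission.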

---

## What's Working (Don't Touch)

- **MemPalace sovereign_store.py** — SQLite + FTS5 + HRR, operational
- **Retrieval enforcer** — wired to SovereignStore as of 14 hours ago
- **Wake-up protocol** — palace-first boot sequence
- **PR reviewer playbook** — clear rules, well-scoped
- **Issue triager playbook** — comprehensive lane map with 11 agents
- **Cycle guard code** — solid 10-min slice discipline (just needs wiring)
- **Config drift guard** — active cron, working
- **Dead man switch** — active, working

---

## Recommendation

The memory system is not the bottleneck. The behaviour guardrails are. Specifically:

1. **Add `task_gate.py`** — pre-task and post-task quality gates that every agent loop calls
2. **Wire cycle_guard.py** — add start/complete calls to agent loop
3. **Wire pre-commit hooks** — deploy script should symlink on provision
4. **Fix orchestrator dispatch** — make it actually trigger work, not just log

This PR adds item 1. Items 2-4 need SSH access and are flagged for Timmy/Allegro.