Fleet Behaviour Hardening — Review & Action Plan

Author: @perplexity
Date: 2026-04-08
Context: Alexander asked: "Is it the memory system or the behaviour guardrails?"
Answer: It's the guardrails. The memory system is adequate. The enforcement machinery is aspirational.

Diagnosis: Why the Fleet Isn't Smart Enough

After auditing SOUL.md, config.yaml, all 8 playbooks, the orchestrator, the guard scripts, and the v7.0.0 checkin, the pattern is clear:

The fleet has excellent design documents and broken enforcement.

Layer	Design Quality	Enforcement Quality	Gap
SOUL.md	Excellent	None — no code reads it at runtime	Philosophy without machinery
Playbooks (7 yaml)	Good lane map	Not invoked by orchestrator	Playbooks exist but nobody calls them
Guard scripts (9)	Solid code	1 of 9 wired (#395 audit)	89% of guards are dead code
Orchestrator	Sound design	Gateway dispatch is a no-op (#391)	Assigns issues but doesn't trigger work
Cycle Guard	Good 10-min rule	No cron/loop calls it	Discipline without enforcement
PR Reviewer	Clear rules	Runs every 30m (if scheduled)	Only guard that might actually fire
Memory (MemPalace)	Working code	Retrieval enforcer wired	Actually operational

The Core Problem

Agents pick up issues and produce output, but there is no pre-task checklist and no post-task quality gate. An agent can:

Start work without checking if someone else already did it
Produce output without running tests
Submit a PR without verifying it addresses the issue
Work for hours on something out of scope
Create duplicate branches/PRs without detection

The SOUL.md says "grounding before generation" but no code enforces it.
The playbooks define lanes but the orchestrator doesn't load them.
The guards exist but nothing calls them.

What the Fleet Needs (Priority Order)

1. Pre-Task Gate (MISSING — this PR adds it)

Before an agent starts any issue:

Check if issue is already assigned to another agent
Check if a branch already exists for this issue
Check if a PR already exists for this issue
Load relevant MemPalace context (retrieval enforcer)
Verify the agent has the right lane for this work (playbook check)
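
The five checks above can be sketched as one function that returns a list of blocking reasons. This is a minimal illustration, not the contents of task_gate.py: the IssueState shape, the field names, and the pre_task_gate signature are all assumptions, and the forge and MemPalace lookups are expected to happen before the call.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class IssueState:
    """Snapshot of forge state for one issue (hypothetical shape)."""
    number: int
    assignee: Optional[str] = None                      # agent currently assigned, if any
    branches: List[str] = field(default_factory=list)   # branches already tied to the issue
    open_prs: List[int] = field(default_factory=list)   # open PRs referencing the issue
    lane: str = ""                                      # lane label for this work


def pre_task_gate(agent: str, agent_lanes: Set[str], issue: IssueState,
                  palace_context: Optional[List[str]]) -> List[str]:
    """Return blocking reasons; an empty list means the agent may start."""
    problems = []
    if issue.assignee not in (None, agent):
        problems.append(f"issue #{issue.number} is assigned to {issue.assignee}")
    if issue.branches:
        problems.append(f"branch already exists: {issue.branches[0]}")
    if issue.open_prs:
        problems.append(f"PR already exists: #{issue.open_prs[0]}")
    # None means retrieval never ran; an empty list means it ran and found nothing.
    if palace_context is None:
        problems.append("MemPalace context was not loaded")
    if issue.lane not in agent_lanes:
        problems.append(f"lane '{issue.lane}' is outside this agent's playbook")
    return problems
```

Returning every reason at once, rather than failing on the first, lets the agent loop log the full picture before it skips the issue.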

2. Post-Task Gate (MISSING — this PR adds it)

Before an agent submits a PR:

Verify the diff addresses the issue title/body
Run syntax_guard.py on changed files
Check for duplicate PRs targeting the same issue
Verify branch name follows convention
Run tests if they exist for changed files
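
A post-task gate could follow the same pattern. In the sketch below, Python's ast module stands in for syntax_guard.py, whose interface is not described here, and the branch pattern is a hypothetical convention, since the real one is not stated. Whether the diff addresses the issue needs judgment, so the sketch only rejects an empty diff as a weak proxy.

```python
import ast
import re

# Hypothetical convention "<agent>/issue-<number>[-slug]"; replace with the real one.
BRANCH_PATTERN = re.compile(r"^[a-z0-9-]+/issue-(\d+)(-[a-z0-9-]+)?$")


def python_syntax_errors(sources):
    """Stand-in for syntax_guard.py: parse each changed .py file with ast."""
    errors = []
    for path, text in sources.items():
        if not path.endswith(".py"):
            continue
        try:
            ast.parse(text, filename=path)
        except SyntaxError as exc:
            errors.append(f"{path}:{exc.lineno}: {exc.msg}")
    return errors


def post_task_gate(issue_number, branch, changed_sources, other_open_prs, tests_passed):
    """Return blocking reasons; an empty list means the PR may be submitted.

    changed_sources maps path -> new file content.
    tests_passed is True/False, or None when no tests exist for the changed files.
    """
    problems = []
    if not changed_sources:
        problems.append("diff is empty, so it cannot address the issue")
    problems.extend(python_syntax_errors(changed_sources))
    match = BRANCH_PATTERN.match(branch)
    if match is None or int(match.group(1)) != issue_number:
        problems.append(f"branch '{branch}' does not follow the naming convention")
    if other_open_prs:
        problems.append(f"duplicate PR for this issue: #{other_open_prs[0]}")
    if tests_passed is False:
        problems.append("tests failed")
    return problems
```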

3. Wire the Existing Guards (8 of 9 are dead code)

Per #395 audit:

Pre-commit hooks: need symlink on every machine
Cycle guard: need cron/loop integration
Forge health check: need cron entry
Smoke test + deploy validate: need deploy script integration
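
For the pre-commit item, a provisioning step might install the symlink along these lines. The function name and arguments are illustrative, not taken from the existing deploy script, and the sketch assumes the hook source is already executable.

```python
from pathlib import Path


def install_pre_commit_hook(repo: Path, hook_source: Path) -> Path:
    """Symlink hook_source to repo/.git/hooks/pre-commit.

    Any existing pre-commit file or link is replaced, so the call is safe
    to repeat on every provision. Git only runs executable hooks, so the
    source file must have its execute bit set.
    """
    hooks_dir = repo / ".git" / "hooks"
    hooks_dir.mkdir(parents=True, exist_ok=True)
    target = hooks_dir / "pre-commit"
    if target.is_symlink() or target.exists():
        target.unlink()
    target.symlink_to(hook_source.resolve())
    return target
```

Replacing an existing hook without asking is a deliberate simplification; a real deploy script may prefer to back up a hand-written hook first.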

4. Make Orchestrator Dispatch Actually Work

Per #391 audit: the orchestrator scores and assigns issues, but the gateway dispatch only writes to /tmp/hermes-dispatch.log, and nothing reads that file. The dispatch needs to either:

Trigger hermes CLI on the target machine, or
Post a webhook that the agent loop picks up
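
The webhook option might look like the sketch below, using only the standard library. The payload fields, the webhook URL, and both function names are assumptions; the orchestrator's real dispatch interface is not shown in this review. The send parameter exists so the call can be exercised without a network.

```python
import json
import urllib.request


def build_dispatch_request(webhook_url, agent, issue_number):
    """Build a JSON POST telling the agent loop which issue to pick up."""
    payload = json.dumps({"agent": agent, "issue": issue_number}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def dispatch(webhook_url, agent, issue_number, send=urllib.request.urlopen, timeout=10):
    """Send the dispatch and return the HTTP status code.

    urlopen raises on connection errors and on 4xx/5xx responses, so a
    failed dispatch surfaces as an exception instead of a silent log line.
    """
    request = build_dispatch_request(webhook_url, agent, issue_number)
    with send(request, timeout=timeout) as response:
        return response.status
```

Unlike appending to a log file, this fails loudly when nothing is listening on the other end.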

5. Agent Self-Assessment Loop

After completing work, agents should answer:

Did I address the issue as stated?
Did I stay in scope?
Did I check the palace for prior work?
Did I run verification?

This is what SOUL.md calls "the apparatus that gives these words teeth."
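
One way to make the four questions machine-checkable is a small record the agent fills in before it submits. The class and field names below are illustrative only.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class SelfAssessment:
    """One boolean per question in the self-assessment list."""
    addressed_issue: bool
    stayed_in_scope: bool
    checked_palace: bool
    ran_verification: bool

    def failures(self) -> List[str]:
        """Names of the questions answered 'no', in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]
```

An agent loop could refuse to open a PR while failures() is non-empty, or attach the list to the PR description for the reviewer.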

What's Working (Don't Touch)

MemPalace sovereign_store.py — SQLite + FTS5 + HRR, operational
Retrieval enforcer — wired to SovereignStore about 14 hours before this review
Wake-up protocol — palace-first boot sequence
PR reviewer playbook — clear rules, well-scoped
Issue triager playbook — comprehensive lane map with 11 agents
Cycle guard code — solid 10-min slice discipline (just needs wiring)
Config drift guard — active cron, working
Dead man switch — active, working

Recommendation

The memory system is not the bottleneck. The behaviour guardrails are. Specifically:

1. Add task_gate.py: pre-task and post-task quality gates that every agent loop calls
2. Wire cycle_guard.py: add start/complete calls to the agent loop
3. Wire pre-commit hooks: the deploy script should symlink them on provision
4. Fix orchestrator dispatch: make it actually trigger work, not just log

This PR adds item 1. Items 2-4 need SSH access and are flagged for Timmy/Allegro.
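
For the cycle guard wiring, the start/complete calls could wrap each unit of work in a context manager, as sketched here. The real cycle_guard.py interface is not shown in this review, so start and complete are passed in as plain callables, and the budget reflects the 10-minute rule.

```python
import time
from contextlib import contextmanager

SLICE_SECONDS = 10 * 60  # the 10-minute slice rule


@contextmanager
def guarded_slice(start, complete, clock=time.monotonic, budget=SLICE_SECONDS):
    """Call start() before the work and complete(elapsed, over_budget) after it.

    complete() runs even if the work raises, so an aborted slice is still
    recorded. The clock parameter exists only to make the timing testable.
    """
    began = clock()
    start()
    try:
        yield
    finally:
        elapsed = clock() - began
        complete(elapsed, elapsed > budget)
```

An agent loop would then run each task inside `with guarded_slice(start_fn, complete_fn):`, where both callables are whatever cycle_guard.py actually exposes.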
