# Fleet Behaviour Hardening: Review & Action Plan

**Author:** @perplexity
**Date:** 2026-04-08
**Context:** Alexander asked: "Is it the memory system or the behaviour guardrails?"
**Answer:** It's the guardrails. The memory system is adequate. The enforcement machinery is aspirational.

---

## Diagnosis: Why the Fleet Isn't Smart Enough

After auditing SOUL.md, config.yaml, all 8 playbooks, the orchestrator, the guard scripts, and the v7.0.0 checkin, the pattern is clear:

**The fleet has excellent design documents and broken enforcement.**

| Layer | Design Quality | Enforcement Quality | Gap |
|---|---|---|---|
| SOUL.md | Excellent | None; no code reads it at runtime | Philosophy without machinery |
| Playbooks (7 yaml) | Good lane map | Not invoked by orchestrator | Playbooks exist but nobody calls them |
| Guard scripts (9) | Solid code | 1 of 9 wired (#395 audit) | 89% of guards are dead code |
| Orchestrator | Sound design | Gateway dispatch is a no-op (#391) | Assigns issues but doesn't trigger work |
| Cycle Guard | Good 10-min rule | No cron/loop calls it | Discipline without enforcement |
| PR Reviewer | Clear rules | Runs every 30m (if scheduled) | Only guard that might actually fire |
| Memory (MemPalace) | Working code | Retrieval enforcer wired | Actually operational |

### The Core Problem

Agents pick up issues and produce output, but there is **no pre-task checklist** and **no post-task quality gate**. An agent can:

1. Start work without checking whether someone else has already done it
2. Produce output without running tests
3. Submit a PR without verifying that it addresses the issue
4. Work for hours on something out of scope
5. Create duplicate branches/PRs without detection

SOUL.md says "grounding before generation", but no code enforces it.
The playbooks define lanes, but the orchestrator doesn't load them.
The guards exist, but nothing calls them.

---

## What the Fleet Needs (Priority Order)

### 1. Pre-Task Gate (MISSING: this PR adds it)

Before an agent starts any issue:
- [ ] Check if issue is already assigned to another agent
- [ ] Check if a branch already exists for this issue
- [ ] Check if a PR already exists for this issue
- [ ] Load relevant MemPalace context (retrieval enforcer)
- [ ] Verify the agent has the right lane for this work (playbook check)
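
The checklist above could be expressed as one function that returns the reasons an agent should not start. This is a minimal sketch, not the contents of `task_gate.py`: the `IssueState` fields, the `pre_task_gate` name, and the assumption that forge state has already been fetched into a plain object are all illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IssueState:
    """Snapshot of an issue as seen by the gate (hypothetical shape)."""
    number: int
    assignee: str | None = None                         # agent currently assigned, if any
    branches: list[str] = field(default_factory=list)   # existing branches for this issue
    open_prs: list[int] = field(default_factory=list)   # existing PR numbers for this issue
    lane: str = ""                                      # lane the issue belongs to


def pre_task_gate(agent: str, agent_lanes: set[str], issue: IssueState,
                  palace_context_loaded: bool) -> list[str]:
    """Return blocking reasons; an empty list means the agent may start."""
    blockers = []
    if issue.assignee not in (None, agent):
        blockers.append(f"issue #{issue.number} is assigned to {issue.assignee}")
    if issue.branches:
        blockers.append(f"branch already exists: {issue.branches[0]}")
    if issue.open_prs:
        blockers.append(f"PR already exists: #{issue.open_prs[0]}")
    if not palace_context_loaded:
        blockers.append("MemPalace context was not loaded")
    if issue.lane not in agent_lanes:
        blockers.append(f"lane '{issue.lane}' is outside this agent's playbook")
    return blockers
```

Returning a list of reasons rather than a single boolean lets the agent loop log exactly which check failed before it skips the issue.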

### 2. Post-Task Gate (MISSING: this PR adds it)

Before an agent submits a PR:
- [ ] Verify the diff addresses the issue title/body
- [ ] Run syntax_guard.py on changed files
- [ ] Check for duplicate PRs targeting the same issue
- [ ] Verify branch name follows convention
- [ ] Run tests if they exist for changed files
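
Three of these checks are mechanical enough to sketch. The code below is an illustration under stated assumptions, not the gate this PR ships: the branch convention (`<agent>/issue-<number>-<slug>`) is invented for the example because the real convention is not given here, and `ast.parse` stands in for `syntax_guard.py`, whose interface is not shown. Checking that the diff addresses the issue and running the tests are left out.

```python
from __future__ import annotations

import ast
import re

# Assumed convention, e.g. "agent-name/issue-395-wire-guards" (hypothetical).
BRANCH_PATTERN = re.compile(r"^[a-z0-9-]+/issue-(\d+)(-[a-z0-9-]+)?$")


def branch_matches_issue(branch: str, issue_number: int) -> bool:
    """True if the branch follows the assumed convention and names this issue."""
    match = BRANCH_PATTERN.match(branch)
    return bool(match) and int(match.group(1)) == issue_number


def python_syntax_errors(sources: dict[str, str]) -> list[str]:
    """Stand-in for syntax_guard.py: report .py files whose source does not parse."""
    failures = []
    for path, text in sources.items():
        if not path.endswith(".py"):
            continue  # only Python files are checked in this sketch
        try:
            ast.parse(text, filename=path)
        except SyntaxError as exc:
            failures.append(f"{path}:{exc.lineno}: {exc.msg}")
    return failures


def post_task_gate(branch: str, issue_number: int, sources: dict[str, str],
                   other_open_prs: list[int]) -> list[str]:
    """Return blocking reasons; an empty list means the PR may be submitted."""
    blockers = []
    if not branch_matches_issue(branch, issue_number):
        blockers.append(f"branch '{branch}' does not match issue #{issue_number}")
    blockers.extend(python_syntax_errors(sources))
    if other_open_prs:
        blockers.append(f"duplicate PRs target this issue: {other_open_prs}")
    return blockers
```

Here `sources` maps each changed file path to its new contents, and `other_open_prs` is whatever list of PR numbers the forge reports for the same issue.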

### 3. Wire the Existing Guards (8 of 9 are dead code)

Per #395 audit:
- Pre-commit hooks: need symlink on every machine
- Cycle guard: need cron/loop integration
- Forge health check: need cron entry
- Smoke test + deploy validate: need deploy script integration
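
For the cycle guard item, loop integration amounts to bracketing each unit of work with a start call and a complete call. The sketch below uses a stand-in `CycleGuard` class because the actual interface of `cycle_guard.py` is not shown in this document. The 10-minute budget comes from the table above; the class, method, and helper names are hypothetical.

```python
import time
from contextlib import contextmanager

SLICE_SECONDS = 10 * 60  # the 10-minute rule


class CycleGuard:
    """Hypothetical stand-in for cycle_guard.py: tracks one work slice."""

    def __init__(self, budget_seconds=SLICE_SECONDS, clock=time.monotonic):
        self.budget_seconds = budget_seconds
        self.clock = clock          # injectable so the guard can be tested
        self.started_at = None

    def start(self):
        self.started_at = self.clock()

    def complete(self):
        """Return True if the slice finished within budget."""
        if self.started_at is None:
            raise RuntimeError("complete() called before start()")
        elapsed = self.clock() - self.started_at
        self.started_at = None
        return elapsed <= self.budget_seconds


@contextmanager
def guarded_slice(guard, on_overrun):
    """Bracket a unit of work so complete() runs even if the work raises."""
    guard.start()
    try:
        yield
    finally:
        if not guard.complete():
            on_overrun()
```

An agent loop would then wrap each issue in `with guarded_slice(guard, on_overrun=...):`, which makes it impossible to start a slice without also closing it.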

### 4. Make Orchestrator Dispatch Actually Work

Per #391 audit: the orchestrator scores and assigns issues, but the gateway dispatch only writes to `/tmp/hermes-dispatch.log`, and nothing reads that file. The dispatch needs to either:
- Trigger the `hermes` CLI on the target machine, or
- Post a webhook that the agent loop picks up
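
The webhook option might look roughly like the sketch below. The payload shape, the function names, and the idea that the agent loop exposes an HTTP endpoint are assumptions for illustration. The `hermes` CLI option is not sketched because its arguments are not given in this document.

```python
import json
import urllib.request


def build_dispatch_request(webhook_url, agent, issue_number):
    """Build a POST request describing the assignment (payload shape is assumed)."""
    payload = {"agent": agent, "issue": issue_number, "action": "start"}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def dispatch(webhook_url, agent, issue_number, opener=urllib.request.urlopen):
    """Send the assignment and report whether the receiver accepted it.

    Unlike appending to a log file, a failed delivery is visible to the caller.
    """
    request = build_dispatch_request(webhook_url, agent, issue_number)
    try:
        with opener(request, timeout=10) as response:
            return 200 <= response.status < 300
    except OSError:  # URLError and HTTPError are OSError subclasses
        return False
```

The boolean return value is the important part: the orchestrator can retry or reassign when delivery fails, which a write to a log file can never tell it.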

### 5. Agent Self-Assessment Loop

After completing work, agents should answer:
- Did I address the issue as stated?
- Did I stay in scope?
- Did I check the palace for prior work?
- Did I run verification?
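
One way to make these four answers machine-checkable is to record them as a small structured object instead of free text. This is a sketch only; the class and field names are illustrative and do not come from the existing codebase.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SelfAssessment:
    """One agent's answers to the four questions (field names are illustrative)."""
    addressed_issue: bool
    stayed_in_scope: bool
    checked_palace: bool
    ran_verification: bool

    def failed_checks(self):
        """Names of the questions answered 'no', in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def passed(self):
        """True only when every question was answered 'yes'."""
        return not self.failed_checks()
```

A post-task gate could refuse to open a PR unless `passed()` is true, or attach `failed_checks()` to the PR description so a reviewer sees what was skipped.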

This is what SOUL.md calls "the apparatus that gives these words teeth."

---

## What's Working (Don't Touch)

- **MemPalace sovereign_store.py**: SQLite + FTS5 + HRR, operational
- **Retrieval enforcer**: wired to SovereignStore as of 14 hours ago
- **Wake-up protocol**: palace-first boot sequence
- **PR reviewer playbook**: clear rules, well-scoped
- **Issue triager playbook**: comprehensive lane map with 11 agents
- **Cycle guard code**: solid 10-min slice discipline (just needs wiring)
- **Config drift guard**: active cron, working
- **Dead man switch**: active, working

---

## Recommendation

The memory system is not the bottleneck. The behaviour guardrails are. Specifically:

1. **Add `task_gate.py`**: pre-task and post-task quality gates that every agent loop calls
2. **Wire cycle_guard.py**: add start/complete calls to the agent loop
3. **Wire pre-commit hooks**: the deploy script should create the symlink on provision
4. **Fix orchestrator dispatch**: make it actually trigger work, not just log

This PR adds item 1. Items 2-4 need SSH access and are flagged for Timmy/Allegro.