# SOTA Research: Multi-Agent Coordination & Fleet Knowledge Graphs

**Date:** 2026-04-14  
**Scope:** Agent-to-agent communication, shared memory, task delegation, consensus protocols  
**Frameworks Analyzed:** CrewAI, AutoGen, MetaGPT, ChatDev, CAMEL

---

## 1. Architecture Pattern Summary

### 1.1 CrewAI — Role-Based Crew Orchestration

**Core Pattern:** Agents organized into "Crews" with explicit roles, goals, and backstories. Tasks are assigned to agents, executed via sequential or hierarchical process flows.

**Agent-to-Agent Communication:**
- **Sequential:** Agent A completes Task A → output injected into Task B's context for Agent B
- **Hierarchical:** Manager agent delegates to worker agents, collects results, synthesizes
- **Context passing:** Tasks can declare `context: [other_tasks]` — outputs from dependent tasks are automatically injected into the current task's prompt
- **No direct agent-to-agent messaging** — communication is mediated through task outputs

**Shared Memory (v2 — Unified Memory):**
- `Memory` class with `remember()` / `recall()` using vector embeddings (LanceDB/ChromaDB)
- **Scope-based isolation:** `MemoryScope` provides path-based namespacing (`/crew/research/agent-foo`)
- **Composite scoring:** semantic similarity (0.5) + recency (0.3) + importance (0.2)
- **RecallFlow:** LLM-driven deep recall with adaptive query expansion
- **Privacy flags:** Private memories only visible to the source that created them
- **Background saves:** `ThreadPoolExecutor` with a write barrier (`drain_writes` runs before recall)
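
The composite scoring above can be sketched as a weighted sum. This is a minimal illustration, not CrewAI's implementation: the weights come from the summary, while `MemoryRecord`, the exponential recency decay, and the 30-day half-life are assumptions made for the example.

```python
from dataclasses import dataclass

# Weights as listed in the summary above.
W_SIMILARITY, W_RECENCY, W_IMPORTANCE = 0.5, 0.3, 0.2
HALF_LIFE_DAYS = 30.0  # assumption: recency halves every 30 days


@dataclass
class MemoryRecord:  # hypothetical record type for this sketch
    text: str
    similarity: float  # similarity to the query, in [0, 1]
    age_days: float
    importance: float  # in [0, 1]


def recency(age_days: float) -> float:
    """Exponential decay: 1.0 for a new record, 0.5 after one half-life."""
    return 0.5 ** (age_days / HALF_LIFE_DAYS)


def composite_score(record: MemoryRecord) -> float:
    """Weighted sum of similarity, recency, and importance."""
    return (W_SIMILARITY * record.similarity
            + W_RECENCY * recency(record.age_days)
            + W_IMPORTANCE * record.importance)


def rank(records):
    """Return records ordered from highest to lowest composite score."""
    return sorted(records, key=composite_score, reverse=True)
```

With these weights a fresh, moderately similar record can outrank an older record with higher similarity, which is the behavior the recency term is meant to provide.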

**Task Delegation:**
- Agent tools include `Delegate Work to Co-worker` and `Ask Question to Co-worker`
- Delegation creates a new task for another agent, results come back to delegator
- Depth-limited (no infinite delegation chains)

**State & Checkpointing:**
- `SqliteProvider` / `JsonProvider` for state checkpoint persistence
- `CheckpointConfig` with event-driven persistence
- Flow state is Pydantic models with serialization

**Cache:**
- Thread-safe in-memory tool result cache with RWLock
- Key: `{tool_name}-{input}` → cached output
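
A rough sketch of the `{tool_name}-{input}` keying scheme follows. It is not CrewAI's code: the `ToolCache` class is hypothetical, and it uses a plain `threading.Lock` because the Python standard library has no read-write lock.

```python
import json
import threading


class ToolCache:
    """In-memory tool result cache keyed by '{tool_name}-{input}'."""

    def __init__(self):
        self._lock = threading.Lock()  # plain mutex standing in for an RWLock
        self._store = {}

    @staticmethod
    def _key(tool_name, tool_input):
        # Sorted keys make equal dict inputs serialize to the same string.
        return f"{tool_name}-{json.dumps(tool_input, sort_keys=True)}"

    def get(self, tool_name, tool_input):
        """Return the cached output, or None on a miss."""
        with self._lock:
            return self._store.get(self._key(tool_name, tool_input))

    def put(self, tool_name, tool_input, output):
        """Store the output for this tool and input."""
        with self._lock:
            self._store[self._key(tool_name, tool_input)] = output
```

A real read-write lock would let concurrent readers proceed in parallel; the single mutex here trades that for simplicity.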

### 1.2 AutoGen (Microsoft) — Conversation-Centric Teams

**Core Pattern:** Agents communicate through shared conversation threads. A "Group Chat Manager" controls turn-taking and speaker selection.

**Agent-to-Agent Communication:**
- **Shared message thread** — all agents see all messages (like a group chat)
- **Five team and orchestration patterns:**
  - `RoundRobinGroupChat`: Fixed order cycling through participants
  - `SelectorGroupChat`: LLM-based speaker selection with candidate filtering
  - `Swarm`: Handoff-based routing (agent sends `HandoffMessage` to next agent)
  - `GraphFlow` (DiGraph): DAG-based execution with conditional edges, parallel fan-out, loops
  - `MagenticOneOrchestrator`: Ledger-based orchestration with task planning, progress tracking, stall detection

**Shared State:**
- `ChatCompletionContext` — manages message history per agent (can be unbounded or windowed)
- `ModelContext` shared across agents in a team
- State serialization: `save_state()` / `load_state()` for all managers
- **No built-in vector memory** — context is purely conversational

**Task Delegation:**
- `Swarm`: Agents use `HandoffMessage` to explicitly route control
- `GraphFlow`: Conditional edges route based on message content (keyword or callable)
- `MagenticOne`: Orchestrator maintains a "task ledger" (facts + plan) and dynamically re-plans on stalls

**Consensus / Termination:**
- `TerminationCondition` — composable conditions (text match, max messages, source-based)
- No explicit consensus protocols — termination is manager-decided
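
To illustrate what "composable" means here, the sketch below combines stop conditions with `|` and `&`. It is a simplified, stateless stand-in written for this note and does not use AutoGen's classes; `Message`, `Condition`, `MaxMessages`, and `TextMention` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Message:
    source: str
    content: str


class Condition:
    """A predicate over the message history so far."""

    def met(self, history):
        raise NotImplementedError

    def __or__(self, other):
        return _Any(self, other)

    def __and__(self, other):
        return _All(self, other)


class MaxMessages(Condition):
    def __init__(self, limit):
        self.limit = limit

    def met(self, history):
        return len(history) >= self.limit


class TextMention(Condition):
    def __init__(self, text):
        self.text = text

    def met(self, history):
        return any(self.text in m.content for m in history)


class _Any(Condition):
    def __init__(self, *conditions):
        self.conditions = conditions

    def met(self, history):
        # Stop as soon as any one sub-condition holds.
        return any(c.met(history) for c in self.conditions)


class _All(Condition):
    def __init__(self, *conditions):
        self.conditions = conditions

    def met(self, history):
        # Stop only when every sub-condition holds.
        return all(c.met(history) for c in self.conditions)
```

For example, `MaxMessages(20) | TextMention("TERMINATE")` stops on whichever happens first, and the manager evaluates the combined condition after each turn.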

**Key Insight:** AutoGen's `ChatCompletionContext` is the closest analog to shared memory, but it's purely sequential message history, not a knowledge base.

### 1.3 MetaGPT — SOP-Driven Software Teams

**Core Pattern:** Agents follow Standard Operating Procedures (SOPs). Each agent has a defined role (Product Manager, Architect, Engineer, QA) and produces structured artifacts.

**Agent-to-Agent Communication:**
- **Publish-Subscribe via Environment:** Agents publish "actions" to a shared Environment, subscribers react
- **Structured outputs:** Each role produces specific artifact types (PRD, design doc, code, test cases)
- **Message routing:** Environment acts as a message bus, filtering by subscriber interest
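
The publish-subscribe routing described above can be sketched in a few lines. This is an illustrative message bus under assumed names (`Artifact`, `Environment.subscribe`, `Environment.publish`), not MetaGPT's actual classes or signatures.

```python
from dataclasses import dataclass


@dataclass
class Artifact:
    kind: str    # e.g. "PRD", "design", "code"
    author: str
    body: str


class Environment:
    """Minimal message bus: roles subscribe to the artifact kinds they consume."""

    def __init__(self):
        self._subscribers = {}  # kind -> list of callbacks

    def subscribe(self, kind, callback):
        self._subscribers.setdefault(kind, []).append(callback)

    def publish(self, artifact):
        # Deliver only to roles that registered interest in this kind.
        for callback in list(self._subscribers.get(artifact.kind, [])):
            callback(artifact)
```

In this shape the SOP emerges from the subscriptions: an architect callback subscribed to `"PRD"` publishes a `"design"` artifact, which in turn triggers whichever role subscribed to `"design"`.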

**Shared Memory:**
- `Environment` class maintains shared state (project workspace)
- File-based shared memory: agents write/read from a shared filesystem
- `SharedMemory` for cross-agent context (structured data, not free-form text)

**Task Delegation:**
- Implicit through SOP stages: PM → Architect → Engineer → QA
- Each agent's output is the next agent's input
- No dynamic re-delegation

**Consensus:**
- Sequential SOP execution (no parallel agents)
- QA agent can trigger re-work loops back to Engineer

### 1.4 ChatDev — Chat-Chain Software Development

**Core Pattern:** Agents follow a "chat chain" — a sequence of chat phases (designing, coding, testing, documenting). Each phase involves a pair of agents (CEO↔CTO, Programmer↔Reviewer, etc.).

**Agent-to-Agent Communication:**
- **Paired chat sessions:** Two agents communicate in each phase (role-play between instructor and assistant)
- **Chain propagation:** Phase N's output (code, design doc) becomes Phase N+1's input
- **No broadcast** — communication is strictly pairwise within phases

**Shared Memory:**
- Software-centric: shared code repository is the "memory"
- Each phase modifies/inherits the codebase
- No explicit vector memory or knowledge graph

**Task Delegation:**
- Hardcoded phase sequence: Design → Code → Test → Document
- Each phase delegates to a specific agent pair
- No dynamic task re-assignment

**Consensus:**
- Phase-level termination: when both agents agree the phase is complete
- "Thought" tokens for chain-of-thought within chat

### 1.5 CAMEL — Role-Playing & Workforce

**Core Pattern:** Two primary modes:
1. **RolePlaying:** Two-agent conversation with task specification and optional critic
2. **Workforce:** Multi-agent with coordinator, task planner, and worker pool

**Agent-to-Agent Communication:**
- **RolePlaying:** Structured turn-taking between assistant and user agents
- **Workforce:** Coordinator assigns tasks via `TaskChannel`, workers return results
- **Worker types:** `SingleAgentWorker` (single ChatAgent), `RolePlayingWorker` (two-agent pair)

**Shared Memory / Task Channel:**
- `TaskChannel` — async queue-based task dispatch with packet tracking
  - States: SENT → PROCESSING → RETURNED → ARCHIVED
  - O(1) lookup by task ID, status-based filtering, assignee/publisher queues
- `WorkflowMemoryManager` — persists workflow patterns as markdown files
  - Role-based organization: workflows stored by `role_identifier`
  - Agent-based intelligent selection: LLM picks relevant past workflows
  - Versioned: metadata tracks creation time and version numbers
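
The packet lifecycle can be modeled as a small state machine. The sketch below is synchronous and uses hypothetical names (`PacketState`, `Packet`, `post`, `advance`, `by_state`); CAMEL's `TaskChannel` is async and its API may differ.

```python
from dataclasses import dataclass
from enum import Enum


class PacketState(Enum):
    SENT = "SENT"
    PROCESSING = "PROCESSING"
    RETURNED = "RETURNED"
    ARCHIVED = "ARCHIVED"


# Each state may only advance to the next one in the lifecycle.
_NEXT = {
    PacketState.SENT: PacketState.PROCESSING,
    PacketState.PROCESSING: PacketState.RETURNED,
    PacketState.RETURNED: PacketState.ARCHIVED,
}


@dataclass
class Packet:
    task_id: str
    publisher: str
    assignee: str
    state: PacketState = PacketState.SENT


class TaskChannel:
    def __init__(self):
        self._packets = {}  # task_id -> Packet, O(1) average lookup

    def post(self, task_id, publisher, assignee):
        if task_id in self._packets:
            raise ValueError(f"duplicate task id: {task_id}")
        self._packets[task_id] = Packet(task_id, publisher, assignee)

    def advance(self, task_id):
        """Move a packet to its next state and return the new state."""
        packet = self._packets[task_id]
        if packet.state not in _NEXT:
            raise ValueError(f"{task_id} is already archived")
        packet.state = _NEXT[packet.state]
        return packet.state

    def by_state(self, state):
        # Linear scan for brevity; a per-state index would avoid it.
        return [p.task_id for p in self._packets.values() if p.state is state]
```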

**Task Delegation:**
- Coordinator agent decomposes complex tasks using LLM analysis
- Tasks assigned to workers based on capability matching
- Failed tasks trigger: retry, create new worker, or further decomposition
- `FailureHandlingConfig` with configurable `RecoveryStrategy`

**Consensus / Quality:**
- Quality evaluation via structured output (response format enforced)
- Task dependencies tracked (worker receives dependency tasks as context)
- `WorkforceMetrics` for tracking execution statistics

---

## 2. Key Architectural Patterns for Fleet Knowledge Graph

### 2.1 Communication Topology Patterns

| Pattern | Used By | Description |
|---------|---------|-------------|
| **Sequential Chain** | CrewAI, ChatDev, MetaGPT | A→B→C linear flow, output feeds next |
| **Shared Thread** | AutoGen | All agents see all messages |
| **Publish-Subscribe** | MetaGPT | Environment-based message bus |
| **Paired Chat** | ChatDev, CAMEL | Two-agent conversation pairs |
| **Handoff Routing** | AutoGen Swarm | Agent explicitly names next speaker |
| **DAG Graph** | AutoGen GraphFlow | Conditional edges, parallel, loops |
| **Ledger Orchestration** | AutoGen MagenticOne | Maintains task ledger, re-plans |
| **Task Channel** | CAMEL | Async queue with packet states |

### 2.2 Shared State Patterns

| Pattern | Used By | Description |
|---------|---------|-------------|
| **Vector Memory** | CrewAI | Embeddings + scope-based namespacing |
| **Message History** | AutoGen | Sequential conversation context |
| **File System** | MetaGPT, ChatDev | Agents read/write shared files |
| **Task Channel** | CAMEL | Async packet-based task dispatch |
| **Workflow Files** | CAMEL | Markdown-based workflow memory |
| **Tool Cache** | CrewAI | In-memory RWLock tool result cache |
| **State Checkpoint** | CrewAI, AutoGen | Pydantic/SQLite checkpoints (CrewAI); `save_state()` / `load_state()` (AutoGen) |

### 2.3 Task Delegation Patterns

| Pattern | Used By | Description |
|---------|---------|-------------|
| **Role Assignment** | CrewAI | Fixed agent per task |
| **Manager Delegation** | CrewAI Hierarchical | Manager assigns tasks dynamically |
| **Speaker Selection** | AutoGen Selector | LLM picks next agent |
| **Handoff** | AutoGen Swarm | Agent explicitly transfers control |
| **SOP Routing** | MetaGPT | Stage-based implicit delegation |
| **Coordinator** | CAMEL Workforce | LLM-based task decomposition + assignment |
| **Dynamic Worker Creation** | CAMEL Workforce | Create new workers on failure |

### 2.4 Conflict Resolution Patterns

| Pattern | Used By | Description |
|---------|---------|-------------|
| **Manager Arbitration** | CrewAI Hierarchical | Manager resolves conflicts |
| **Critic-in-the-loop** | CAMEL | Critic agent evaluates and selects |
| **Quality Gate** | CAMEL Workforce | Structured quality evaluation |
| **Termination Conditions** | AutoGen | Composable stop conditions |
| **Stall Detection** | AutoGen MagenticOne | Re-plans when progress stalls |

---

## 3. Recommendations for Hermes Fleet Knowledge Graph

### 3.1 Architecture: Hybrid Graph + Memory

Based on the analysis above, the fleet knowledge graph should combine four elements:

1. **CrewAI's scoped memory** for hierarchical knowledge organization
   - Path-based namespaces: `/fleet/{fleet_id}/agent/{agent_id}/diary`
   - Composite scoring: semantic + recency + importance
   - Background writes, with pending writes drained before each recall

2. **CAMEL's TaskChannel** for task dispatch and tracking
   - Packet states (SENT → PROCESSING → RETURNED → ARCHIVED)
   - O(1) lookup by task ID
   - Assignee/publisher tracking

3. **AutoGen's DiGraph** for execution flow definition
   - DAG with conditional edges for complex workflows
   - Parallel fan-out for independent tasks
   - Activation conditions (all vs any) for synchronization points

4. **AutoGen MagenticOne's ledger** for shared task context
   - Maintained facts, plan, and progress ledger
   - Dynamic re-planning on stalls
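
The difference between "all" and "any" activation at a synchronization point can be shown with a small sequential simulation. This is a sketch for acyclic graphs only; `run_order` and its arguments are hypothetical and do not reflect AutoGen's `DiGraphBuilder` API, which also handles loops and parallel execution.

```python
from collections import deque


def run_order(edges, activation=None):
    """Return the order in which nodes fire.

    edges: dict mapping node -> list of successor nodes (must be acyclic).
    activation: dict mapping node -> "all" (default) or "any".
    """
    activation = activation or {}
    nodes = set(edges) | {t for targets in edges.values() for t in targets}
    preds = {n: set() for n in nodes}
    for src, targets in edges.items():
        for target in targets:
            preds[target].add(src)

    done, order = set(), []
    # Start with nodes that have no predecessors, sorted for determinism.
    ready = deque(sorted(n for n in nodes if not preds[n]))
    while ready:
        node = ready.popleft()
        done.add(node)
        order.append(node)
        for succ in edges.get(node, []):
            if succ in done or succ in ready:
                continue
            finished = preds[succ] & done
            mode = activation.get(succ, "all")
            # "any": one finished predecessor suffices; "all": every one.
            if (mode == "any" and finished) or finished == preds[succ]:
                ready.append(succ)
    return order
```

With `edges = {"plan": ["fast", "slow1"], "slow1": ["slow2"], "fast": ["merge"], "slow2": ["merge"]}`, the `merge` node fires last under the default "all" activation, but fires before `slow2` when `activation={"merge": "any"}`.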

### 3.2 Fleet Knowledge Graph Schema

```
/fleet/{fleet_id}/
  ├── shared/              # Shared knowledge (all agents read)
  │   ├── facts/           # Known facts, constraints
  │   ├── decisions/       # Record of decisions made
  │   └── context/         # Active task context
  ├── agent/{agent_id}/
  │   ├── diary/           # Agent's personal experience log
  │   ├── capabilities/    # What this agent can do
  │   └── state/           # Current task state
  ├── tasks/
  │   ├── {task_id}/       # Task metadata, dependencies, status
  │   └── graph/           # DAG definition for task dependencies
  └── consensus/
      ├── proposals/       # Pending proposals
      └── decisions/       # Resolved consensus decisions
```
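
A possible helper for building and checking these scope paths is sketched below. The function names are hypothetical; the path layout follows the tree above. Comparing path components rather than raw strings avoids treating `agent/a10` as a child of `agent/a1`.

```python
from pathlib import PurePosixPath


def agent_scope(fleet_id, agent_id, area):
    """Scope for one agent's area, e.g. 'diary', 'capabilities', 'state'."""
    return f"/fleet/{fleet_id}/agent/{agent_id}/{area}"


def shared_scope(fleet_id, area):
    """Scope for fleet-wide areas, e.g. 'facts', 'decisions', 'context'."""
    return f"/fleet/{fleet_id}/shared/{area}"


def in_scope(path, scope):
    """True if path equals scope or lies beneath it, compared by component."""
    path_parts = PurePosixPath(path).parts
    scope_parts = PurePosixPath(scope).parts
    return path_parts[:len(scope_parts)] == scope_parts
```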

### 3.3 Key Design Decisions

1. **Diary System (Agent Memory):**
   - Each agent writes to its own scoped memory after every significant action
   - LLM-analyzed importance scoring (like CrewAI's unified memory)
   - Cross-agent recall: agents can query other agents' diaries for relevant experiences
   - Decay: old low-importance memories expire

2. **Shared State (Fleet Knowledge):**
   - SQLite-backed (like Hermes' existing `state.db`) with FTS5 search
   - Hierarchical scopes (like CrewAI's MemoryScope)
   - Write-ahead log for concurrent access
   - Read barriers before queries (like CrewAI's `drain_writes`)

3. **Task Delegation:**
   - Coordinator pattern (like CAMEL's Workforce)
   - Task decomposition via LLM
   - Failed task → retry, reassign, or decompose
   - Max depth limit (like Hermes' existing `MAX_DEPTH=2`)

4. **Consensus Protocol:**
   - Proposal-based: agent proposes, others vote/acknowledge
   - Timeout-based fallback: if no response within N seconds, proceed
   - Manager override: designated manager can break ties
   - Simple majority for non-critical, unanimity for critical decisions

5. **Conflict Resolution:**
   - Last-write-wins for non-critical state
   - Optimistic locking with version numbers for state where a lost update matters
   - Manager arbitration for task assignment conflicts
   - Quality gates (like CAMEL) for output validation
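
One way the consensus rules in item 4 might fit together is sketched below. All names are hypothetical, and several behaviors are assumptions the list above does not settle: on timeout, silent voters are treated as abstaining; a critical proposal that times out without unanimity is rejected; ties go to the manager's vote.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    proposal_id: str
    voters: frozenset            # agents entitled to vote
    critical: bool = False
    votes: dict = field(default_factory=dict)  # agent_id -> bool

    def vote(self, agent_id, approve):
        if agent_id not in self.voters:
            raise ValueError(f"{agent_id} may not vote on {self.proposal_id}")
        self.votes[agent_id] = approve


def resolve(proposal, timed_out=False, manager_vote=None):
    """Return "accepted", "rejected", or "pending"."""
    yes = sum(1 for v in proposal.votes.values() if v)
    no = len(proposal.votes) - yes
    total = len(proposal.voters)

    if proposal.critical:
        # Unanimity: a single "no" rejects; every voter must approve.
        if no:
            return "rejected"
        if yes == total:
            return "accepted"
        return "rejected" if timed_out else "pending"

    # Simple majority of all voters decides immediately.
    if yes * 2 > total:
        return "accepted"
    if no * 2 > total:
        return "rejected"
    if not timed_out and yes + no < total:
        return "pending"
    # Timed out or all votes in: decide on votes cast, manager breaks ties.
    if yes != no:
        return "accepted" if yes > no else "rejected"
    if manager_vote is not None:
        return "accepted" if manager_vote else "rejected"
    return "pending"
```

The caller is expected to own the clock: it calls `resolve(..., timed_out=True)` once the N-second window has elapsed.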

### 3.4 Integration with Existing Hermes Architecture

Hermes already has strong foundations:
- **Delegation system** (`delegate_tool.py`): Isolated child agents, parallel execution, depth limits
- **State DB** (`hermes_state.py`): SQLite + FTS5, WAL mode, session tracking, message history
- **Credential pools**: Shared credentials with rotation

The fleet knowledge graph should extend these patterns:
- **Session DB → Fleet DB:** Add tables for fleet metadata, agent registrations, task graphs
- **Memory tool → Fleet Memory:** Scoped vector memory shared across fleet agents
- **Delegate tool → Fleet Delegation:** Task channel with persistence, quality evaluation
- **New: Consensus module:** Proposal/vote protocol with timeout handling
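
As one possible shape for the Fleet DB extension, the sketch below stores scoped state in SQLite with a version column for optimistic locking. The `fleet_state` table and the function names are assumptions made for illustration; they are not Hermes' existing schema, and FTS5 and WAL configuration are omitted.

```python
import sqlite3

# Hypothetical table: one row per scope path, with a version counter.
SCHEMA = """
CREATE TABLE IF NOT EXISTS fleet_state (
    scope   TEXT PRIMARY KEY,
    value   TEXT NOT NULL,
    version INTEGER NOT NULL DEFAULT 1
)
"""


def open_fleet_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def read(conn, scope):
    """Return (value, version) for a scope, or None if it does not exist."""
    return conn.execute(
        "SELECT value, version FROM fleet_state WHERE scope = ?", (scope,)
    ).fetchone()


def write(conn, scope, value, expected_version=None):
    """Create the row when expected_version is None, else compare-and-swap.

    Returns True if the write was applied, False if it lost the race.
    """
    with conn:  # commits on success, rolls back on an exception
        if expected_version is None:
            cur = conn.execute(
                "INSERT OR IGNORE INTO fleet_state (scope, value) VALUES (?, ?)",
                (scope, value),
            )
        else:
            cur = conn.execute(
                "UPDATE fleet_state SET value = ?, version = version + 1 "
                "WHERE scope = ? AND version = ?",
                (value, scope, expected_version),
            )
        return cur.rowcount == 1
```

An agent reads `(value, version)`, computes its update, and passes the version it saw to `write`; a `False` result means another agent wrote first and the caller should re-read and retry.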

---

## 4. Reference Implementations

| Component | Best Reference | Key Takeaway |
|-----------|---------------|--------------|
| Scoped Memory | CrewAI `Memory` + `MemoryScope` | Path-based namespaces, composite scoring, background writes |
| Task Dispatch | CAMEL `TaskChannel` | Packet-based with state machine, O(1) lookup |
| Execution DAG | AutoGen `DiGraphBuilder` | Fluent builder, conditional edges, activation groups |
| Orchestration | AutoGen `MagenticOneOrchestrator` | Ledger-based planning, stall detection, re-planning |
| Agent Communication | AutoGen `SelectorGroupChat` | LLM-based speaker selection, shared message thread |
| Quality Evaluation | CAMEL Workforce | Structured output for quality scoring |
| Workflow Memory | CAMEL `WorkflowMemoryManager` | Markdown-based, role-organized, versioned |
| State Checkpoint | CrewAI `SqliteProvider` | JSONB checkpoints, WAL mode |
| Tool Cache | CrewAI `CacheHandler` | RWLock-based concurrent tool result cache |

---

## 5. Open Questions

1. **Graph vs Vector for knowledge:** Should fleet knowledge use a proper graph DB (e.g., Neo4j) or stick with vector + SQLite?
   - Recommendation: Start with SQLite + vectors (existing stack), add graph later if needed

2. **Real-time vs Batch:** Should agents receive updates in real-time or batched?
   - Recommendation: Event-driven for critical updates, batched for diary entries

3. **Security model:** How should cross-agent access be controlled?
   - Recommendation: Role-based ACLs on scope paths, similar to CrewAI's privacy flags

4. **Scalability:** How many agents can a single fleet support?
   - Recommendation: Start with 10-agent fleets, optimize SQLite concurrency first