# SOTA Research: Multi-Agent Coordination & Fleet Knowledge Graphs

**Date:** 2026-04-14
**Scope:** Agent-to-agent communication, shared memory, task delegation, consensus protocols
**Frameworks Analyzed:** CrewAI, AutoGen, MetaGPT, ChatDev, CAMEL

---

## 1. Architecture Pattern Summary

### 1.1 CrewAI — Role-Based Crew Orchestration

**Core Pattern:** Agents organized into "Crews" with explicit roles, goals, and backstories. Tasks are assigned to agents, executed via sequential or hierarchical process flows.

**Agent-to-Agent Communication:**
- **Sequential:** Agent A completes Task A → output injected into Task B's context for Agent B
- **Hierarchical:** Manager agent delegates to worker agents, collects results, synthesizes
- **Context passing:** Tasks can declare `context: [other_tasks]` — outputs from dependent tasks are automatically injected into the current task's prompt
- **No direct agent-to-agent messaging** — communication is mediated through task outputs

**Shared Memory (v2 — Unified Memory):**
- `Memory` class with `remember()` / `recall()` using vector embeddings (LanceDB/ChromaDB)
- **Scope-based isolation:** `MemoryScope` provides path-based namespacing (`/crew/research/agent-foo`)
- **Composite scoring:** semantic similarity (0.5) + recency (0.3) + importance (0.2)
- **RecallFlow:** LLM-driven deep recall with adaptive query expansion
- **Privacy flags:** Private memories only visible to the source that created them
- **Background saves:** ThreadPoolExecutor with write barrier (`drain_writes` before recall)

**Task Delegation:**
- Agent tools include `Delegate Work to Co-worker` and `Ask Question to Co-worker`
- Delegation creates a new task for another agent; results come back to the delegator
- Depth-limited (no infinite delegation chains)

**State & Checkpointing:**
- `SqliteProvider` / `JsonProvider` for state checkpoint persistence
- `CheckpointConfig` with event-driven persistence
- Flow state is Pydantic models with serialization

**Cache:**
- Thread-safe in-memory tool result cache with RWLock
- Key: `{tool_name}-{input}` → cached output

### 1.2 AutoGen (Microsoft) — Conversation-Centric Teams

**Core Pattern:** Agents communicate through shared conversation threads. A "Group Chat Manager" controls turn-taking and speaker selection.

**Agent-to-Agent Communication:**
- **Shared message thread** — all agents see all messages (like a group chat)
- **Team patterns:**
  - `RoundRobinGroupChat`: Fixed order cycling through participants
  - `SelectorGroupChat`: LLM-based speaker selection with candidate filtering
  - `Swarm`: Handoff-based routing (agent sends `HandoffMessage` to next agent)
  - `GraphFlow` (DiGraph): DAG-based execution with conditional edges, parallel fan-out, loops
  - `MagenticOneOrchestrator`: Ledger-based orchestration with task planning, progress tracking, stall detection

**Shared State:**
- `ChatCompletionContext` — manages message history per agent (can be unbounded or windowed)
- `ModelContext` shared across agents in a team
- State serialization: `save_state()` / `load_state()` for all managers
- **No built-in vector memory** — context is purely conversational

**Task Delegation:**
- `Swarm`: Agents use `HandoffMessage` to explicitly route control
- `GraphFlow`: Conditional edges route based on message content (keyword or callable)
- `MagenticOne`: Orchestrator maintains a "task ledger" (facts + plan) and dynamically re-plans on stalls

**Consensus / Termination:**
- `TerminationCondition` — composable conditions (text match, max messages, source-based)
- No explicit consensus protocols — termination is manager-decided

**Key Insight:** AutoGen's `ChatCompletionContext` is the closest analog to shared memory, but it's purely sequential message history, not a knowledge base.

### 1.3 MetaGPT — SOP-Driven Software Teams

**Core Pattern:** Agents follow Standard Operating Procedures (SOPs). Each agent has a defined role (Product Manager, Architect, Engineer, QA) and produces structured artifacts.

**Agent-to-Agent Communication:**
- **Publish-Subscribe via Environment:** Agents publish "actions" to a shared Environment, subscribers react
- **Structured outputs:** Each role produces specific artifact types (PRD, design doc, code, test cases)
- **Message routing:** Environment acts as a message bus, filtering by subscriber interest

**Shared Memory:**
- `Environment` class maintains shared state (project workspace)
- File-based shared memory: agents write/read from a shared filesystem
- `SharedMemory` for cross-agent context (structured data, not free-form text)

**Task Delegation:**
- Implicit through SOP stages: PM → Architect → Engineer → QA
- Each agent's output is the next agent's input
- No dynamic re-delegation

**Consensus:**
- Sequential SOP execution (no parallel agents)
- QA agent can trigger re-work loops back to Engineer

### 1.4 ChatDev — Chat-Chain Software Development

**Core Pattern:** Agents follow a "chat chain" — a sequence of chat phases (designing, coding, testing, documenting). Each phase involves a pair of agents (CEO↔CTO, Programmer↔Reviewer, etc.).
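
The chat chain can be pictured as a fold over phases: each phase runs one paired conversation and hands its artifact to the next phase. The sketch below is only an illustration of that shape and is not ChatDev's API; `Phase`, `run_chain`, and the stub `talk` callables are hypothetical names, and the lambdas stand in for real instructor/assistant dialogues.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Phase:
    """One chat phase run by a pair of roles (all names here are hypothetical)."""
    name: str
    instructor: str
    assistant: str
    # Stand-in for the paired chat: takes the artifact so far, returns the updated one.
    talk: Callable[[str], str]


def run_chain(phases: list[Phase], task: str) -> tuple[str, list[str]]:
    """Run phases in order, feeding each phase's output into the next."""
    artifact = task
    log: list[str] = []
    for phase in phases:
        artifact = phase.talk(artifact)
        log.append(f"{phase.name}: {phase.instructor}<->{phase.assistant}")
    return artifact, log


# Two of the phases named in the text; later phases would follow the same shape.
chain = [
    Phase("designing", "CEO", "CTO", lambda a: a + " -> design"),
    Phase("coding", "Programmer", "Reviewer", lambda a: a + " -> code"),
]

final_artifact, chain_log = run_chain(chain, "todo app")
print(final_artifact)
print(chain_log)
```

Because every phase only sees the artifact handed to it, the chain has no broadcast channel; anything a later phase needs must be carried in the artifact itself.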

**Agent-to-Agent Communication:**
- **Paired chat sessions:** Two agents communicate in each phase (role-play between instructor and assistant)
- **Chain propagation:** Phase N's output (code, design doc) becomes Phase N+1's input
- **No broadcast** — communication is strictly pairwise within phases

**Shared Memory:**
- Software-centric: shared code repository is the "memory"
- Each phase modifies/inherits the codebase
- No explicit vector memory or knowledge graph

**Task Delegation:**
- Hardcoded phase sequence: Design → Code → Test → Document
- Each phase delegates to a specific agent pair
- No dynamic task re-assignment

**Consensus:**
- Phase-level termination: when both agents agree the phase is complete
- "Thought" tokens for chain-of-thought within chat

### 1.5 CAMEL — Role-Playing & Workforce

**Core Pattern:** Two primary modes:
1. **RolePlaying:** Two-agent conversation with task specification and optional critic
2. **Workforce:** Multi-agent with coordinator, task planner, and worker pool

**Agent-to-Agent Communication:**
- **RolePlaying:** Structured turn-taking between assistant and user agents
- **Workforce:** Coordinator assigns tasks via `TaskChannel`, workers return results
- **Worker types:** `SingleAgentWorker` (single ChatAgent), `RolePlayingWorker` (two-agent pair)

**Shared Memory / Task Channel:**
- `TaskChannel` — async queue-based task dispatch with packet tracking
  - States: SENT → PROCESSING → RETURNED → ARCHIVED
  - O(1) lookup by task ID, status-based filtering, assignee/publisher queues
- `WorkflowMemoryManager` — persists workflow patterns as markdown files
  - Role-based organization: workflows stored by `role_identifier`
  - Agent-based intelligent selection: LLM picks relevant past workflows
  - Versioned: metadata tracks creation time and version numbers

**Task Delegation:**
- Coordinator agent decomposes complex tasks using LLM analysis
- Tasks assigned to workers based on capability matching
- Failed tasks trigger: retry, create new worker, or further decomposition
- `FailureHandlingConfig` with configurable `RecoveryStrategy`

**Consensus / Quality:**
- Quality evaluation via structured output (response format enforced)
- Task dependencies tracked (worker receives dependency tasks as context)
- `WorkforceMetrics` for tracking execution statistics

---

## 2. Key Architectural Patterns for Fleet Knowledge Graph

### 2.1 Communication Topology Patterns

| Pattern | Used By | Description |
|---------|---------|-------------|
| **Sequential Chain** | CrewAI, ChatDev, MetaGPT | A→B→C linear flow, output feeds next |
| **Shared Thread** | AutoGen | All agents see all messages |
| **Publish-Subscribe** | MetaGPT | Environment-based message bus |
| **Paired Chat** | ChatDev, CAMEL | Two-agent conversation pairs |
| **Handoff Routing** | AutoGen Swarm | Agent explicitly names next speaker |
| **DAG Graph** | AutoGen GraphFlow | Conditional edges, parallel, loops |
| **Ledger Orchestration** | AutoGen MagenticOne | Maintains task ledger, re-plans |
| **Task Channel** | CAMEL | Async queue with packet states |

### 2.2 Shared State Patterns

| Pattern | Used By | Description |
|---------|---------|-------------|
| **Vector Memory** | CrewAI | Embeddings + scope-based namespacing |
| **Message History** | AutoGen | Sequential conversation context |
| **File System** | MetaGPT, ChatDev | Agents read/write shared files |
| **Task Channel** | CAMEL | Async packet-based task dispatch |
| **Workflow Files** | CAMEL | Markdown-based workflow memory |
| **Tool Cache** | CrewAI | In-memory RWLock tool result cache |
| **State Checkpoint** | CrewAI, AutoGen | Serialized Pydantic/SQLite checkpoints |

### 2.3 Task Delegation Patterns

| Pattern | Used By | Description |
|---------|---------|-------------|
| **Role Assignment** | CrewAI | Fixed agent per task |
| **Manager Delegation** | CrewAI Hierarchical | Manager assigns tasks dynamically |
| **Speaker Selection** | AutoGen Selector | LLM picks next agent |
| **Handoff** | AutoGen Swarm | Agent explicitly transfers control |
| **SOP Routing** | MetaGPT | Stage-based implicit delegation |
| **Coordinator** | CAMEL Workforce | LLM-based task decomposition + assignment |
| **Dynamic Worker Creation** | CAMEL Workforce | Create new workers on failure |

### 2.4 Conflict Resolution Patterns

| Pattern | Used By | Description |
|---------|---------|-------------|
| **Manager Arbitration** | CrewAI Hierarchical | Manager resolves conflicts |
| **Critic-in-the-loop** | CAMEL | Critic agent evaluates and selects |
| **Quality Gate** | CAMEL Workforce | Structured quality evaluation |
| **Termination Conditions** | AutoGen | Composable stop conditions |
| **Stall Detection** | AutoGen MagenticOne | Re-plans when progress stalls |

---

## 3. Recommendations for Hermes Fleet Knowledge Graph

### 3.1 Architecture: Hybrid Graph + Memory

Based on the SOTA analysis, the optimal fleet knowledge graph should combine:

1. **CrewAI's scoped memory** for hierarchical knowledge organization
   - Path-based namespaces: `/fleet/{fleet_id}/agent/{agent_id}/diary`
   - Composite scoring: semantic + recency + importance
   - Background writes with read barriers
2. **CAMEL's TaskChannel** for task dispatch and tracking
   - Packet states (SENT → PROCESSING → RETURNED → ARCHIVED)
   - O(1) lookup by task ID
   - Assignee/publisher tracking
3. **AutoGen's DiGraph** for execution flow definition
   - DAG with conditional edges for complex workflows
   - Parallel fan-out for independent tasks
   - Activation conditions (all vs any) for synchronization points
4. **AutoGen MagenticOne's ledger** for shared task context
   - Maintained facts, plan, and progress ledger
   - Dynamic re-planning on stalls

### 3.2 Fleet Knowledge Graph Schema

```
/fleet/{fleet_id}/
├── shared/                # Shared knowledge (all agents read)
│   ├── facts/             # Known facts, constraints
│   ├── decisions/         # Record of decisions made
│   └── context/           # Active task context
├── agent/{agent_id}/
│   ├── diary/             # Agent's personal experience log
│   ├── capabilities/      # What this agent can do
│   └── state/             # Current task state
├── tasks/
│   ├── {task_id}/         # Task metadata, dependencies, status
│   └── graph/             # DAG definition for task dependencies
└── consensus/
    ├── proposals/         # Pending proposals
    └── decisions/         # Resolved consensus decisions
```

### 3.3 Key Design Decisions

1. **Diary System (Agent Memory):**
   - Each agent writes to its own scoped memory after every significant action
   - LLM-analyzed importance scoring (like CrewAI's unified memory)
   - Cross-agent recall: agents can query other agents' diaries for relevant experiences
   - Decay: old low-importance memories expire
2. **Shared State (Fleet Knowledge):**
   - SQLite-backed (like Hermes' existing `state.db`) with FTS5 search
   - Hierarchical scopes (like CrewAI's MemoryScope)
   - Write-ahead log for concurrent access
   - Read barriers before queries (like CrewAI's `drain_writes`)
3. **Task Delegation:**
   - Coordinator pattern (like CAMEL's Workforce)
   - Task decomposition via LLM
   - Failed task → retry, reassign, or decompose
   - Max depth limit (like Hermes' existing MAX_DEPTH=2)
4. **Consensus Protocol:**
   - Proposal-based: agent proposes, others vote/acknowledge
   - Timeout-based fallback: if no response within N seconds, proceed
   - Manager override: designated manager can break ties
   - Simple majority for non-critical, unanimity for critical decisions
5. **Conflict Resolution:**
   - Last-write-wins for non-critical state
   - Optimistic locking with version numbers
   - Manager arbitration for task assignment conflicts
   - Quality gates (like CAMEL) for output validation

### 3.4 Integration with Existing Hermes Architecture

Hermes already has strong foundations:
- **Delegation system** (`delegate_tool.py`): Isolated child agents, parallel execution, depth limits
- **State DB** (`hermes_state.py`): SQLite + FTS5, WAL mode, session tracking, message history
- **Credential pools**: Shared credentials with rotation

The fleet knowledge graph should extend these patterns:
- **Session DB → Fleet DB:** Add tables for fleet metadata, agent registrations, task graphs
- **Memory tool → Fleet Memory:** Scoped vector memory shared across fleet agents
- **Delegate tool → Fleet Delegation:** Task channel with persistence, quality evaluation
- **New: Consensus module:** Proposal/vote protocol with timeout handling

---

## 4. Reference Implementations

| Component | Best Reference | Key Takeaway |
|-----------|---------------|--------------|
| Scoped Memory | CrewAI `Memory` + `MemoryScope` | Path-based namespaces, composite scoring, background writes |
| Task Dispatch | CAMEL `TaskChannel` | Packet-based with state machine, O(1) lookup |
| Execution DAG | AutoGen `DiGraphBuilder` | Fluent builder, conditional edges, activation groups |
| Orchestration | AutoGen `MagenticOneOrchestrator` | Ledger-based planning, stall detection, re-planning |
| Agent Communication | AutoGen `SelectorGroupChat` | LLM-based speaker selection, shared message thread |
| Quality Evaluation | CAMEL Workforce | Structured output for quality scoring |
| Workflow Memory | CAMEL `WorkflowMemoryManager` | Markdown-based, role-organized, versioned |
| State Checkpoint | CrewAI `SqliteProvider` | JSONB checkpoints, WAL mode |
| Tool Cache | CrewAI `CacheHandler` | RWLock-based concurrent tool result cache |

---

## 5. Open Questions

1. **Graph vs Vector for knowledge:** Should fleet knowledge use a proper graph DB (e.g., Neo4j) or stick with vector + SQLite?
   - Recommendation: Start with SQLite + vectors (existing stack), add graph later if needed
2. **Real-time vs Batch:** Should agents receive updates in real-time or batched?
   - Recommendation: Event-driven for critical updates, batched for diary entries
3. **Security model:** How should cross-agent access be controlled?
   - Recommendation: Role-based ACLs on scope paths, similar to CrewAI's privacy flags
4. **Scalability:** How many agents can a single fleet support?
   - Recommendation: Start with 10-agent fleets, optimize SQLite concurrency first
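
For the security-model question above, role-based ACLs on scope paths could be as simple as a prefix check over path segments. The following is a minimal sketch under stated assumptions, not an existing Hermes or CrewAI API: `ScopeACL`, `allow`, and `check` are hypothetical names, the roles and fleet/agent IDs are made up, and the paths follow the `/fleet/{fleet_id}/...` layout from section 3.2.

```python
from dataclasses import dataclass, field


def _segments(path: str) -> list[str]:
    """Split a scope path into non-empty segments."""
    return [part for part in path.split("/") if part]


@dataclass
class ScopeACL:
    """Hypothetical role -> [(scope prefix segments, allowed modes)] table."""
    grants: dict[str, list[tuple[list[str], set[str]]]] = field(default_factory=dict)

    def allow(self, role: str, prefix: str, modes: set[str]) -> None:
        """Grant `role` the given modes on everything under `prefix`."""
        self.grants.setdefault(role, []).append((_segments(prefix), set(modes)))

    def check(self, role: str, path: str, mode: str) -> bool:
        """True if any grant for `role` covers `path` with `mode`.

        Matching is per segment, so `/fleet/f1/shared` does not cover
        `/fleet/f1/shared-x`.
        """
        target = _segments(path)
        return any(
            mode in modes and target[: len(prefix)] == prefix
            for prefix, modes in self.grants.get(role, [])
        )


acl = ScopeACL()
acl.allow("worker", "/fleet/f1/shared", {"read"})
acl.allow("worker", "/fleet/f1/agent/a1", {"read", "write"})

print(acl.check("worker", "/fleet/f1/shared/facts", "read"))
print(acl.check("worker", "/fleet/f1/shared/facts", "write"))
print(acl.check("worker", "/fleet/f1/agent/a2/diary", "read"))
```

Denying by default keeps a private diary private unless a grant names it, which is roughly the effect the privacy-flag comparison points at; whether grants should attach to roles or to individual agent IDs is left open here.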