# SOTA Research: Multi-Agent Coordination & Fleet Knowledge Graphs

**Date:** 2026-04-14
**Scope:** Agent-to-agent communication, shared memory, task delegation, consensus protocols
**Frameworks Analyzed:** CrewAI, AutoGen, MetaGPT, ChatDev, CAMEL

---

## 1. Architecture Pattern Summary

### 1.1 CrewAI — Role-Based Crew Orchestration

**Core Pattern:** Agents organized into "Crews" with explicit roles, goals, and backstories. Tasks are assigned to agents, executed via sequential or hierarchical process flows.

**Agent-to-Agent Communication:**
- **Sequential:** Agent A completes Task A → output injected into Task B's context for Agent B
- **Hierarchical:** Manager agent delegates to worker agents, collects results, synthesizes
- **Context passing:** Tasks can declare `context: [other_tasks]` — outputs from dependent tasks are automatically injected into the current task's prompt
- **No direct agent-to-agent messaging** — communication is mediated through task outputs

**Shared Memory (v2 — Unified Memory):**
- `Memory` class with `remember()` / `recall()` using vector embeddings (LanceDB/ChromaDB)
- **Scope-based isolation:** `MemoryScope` provides path-based namespacing (`/crew/research/agent-foo`)
- **Composite scoring:** semantic similarity (0.5) + recency (0.3) + importance (0.2)
- **RecallFlow:** LLM-driven deep recall with adaptive query expansion
- **Privacy flags:** Private memories only visible to the source that created them
- **Background saves:** ThreadPoolExecutor with write barrier (drain_writes before recall)

**Task Delegation:**
- Agent tools include `Delegate Work to Co-worker` and `Ask Question to Co-worker`
- Delegation creates a new task for another agent, results come back to delegator
- Depth-limited (no infinite delegation chains)

**State & Checkpointing:**
- `SqliteProvider` / `JsonProvider` for state checkpoint persistence
- `CheckpointConfig` with event-driven persistence
- Flow state is Pydantic models with serialization

**Cache:**
- Thread-safe in-memory tool result cache with RWLock
- Key: `{tool_name}-{input}` → cached output

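The composite scoring listed under Shared Memory above can be illustrated with a minimal sketch. The three weights come from that bullet; the exponential recency decay, the 30-day half-life, and the names `MemoryRecord`, `recency`, `composite_score`, and `rank` are assumptions made for illustration and are not CrewAI's implementation.

```python
from dataclasses import dataclass

# Weights taken from the composite-scoring bullet above.
W_SEMANTIC, W_RECENCY, W_IMPORTANCE = 0.5, 0.3, 0.2


@dataclass
class MemoryRecord:
    """Hypothetical record; field names are illustrative only."""
    text: str
    similarity: float   # semantic similarity to the query, in [0, 1]
    age_days: float     # time since the memory was written
    importance: float   # importance score, in [0, 1]


def recency(age_days: float, half_life_days: float = 30.0) -> float:
    """Assumed decay: 1.0 for a fresh memory, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)


def composite_score(rec: MemoryRecord) -> float:
    """Weighted sum of the three components."""
    return (W_SEMANTIC * rec.similarity
            + W_RECENCY * recency(rec.age_days)
            + W_IMPORTANCE * rec.importance)


def rank(records: list[MemoryRecord]) -> list[MemoryRecord]:
    """Return records ordered from highest to lowest composite score."""
    return sorted(records, key=composite_score, reverse=True)
```

Because the weights sum to 1.0 and each component lies in [0, 1], the score also lies in [0, 1], which makes a single recall threshold easy to reason about.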
### 1.2 AutoGen (Microsoft) — Conversation-Centric Teams

**Core Pattern:** Agents communicate through shared conversation threads. A "Group Chat Manager" controls turn-taking and speaker selection.

**Agent-to-Agent Communication:**
- **Shared message thread** — all agents see all messages (like a group chat)
- **Team patterns:**
  - `RoundRobinGroupChat`: Fixed order cycling through participants
  - `SelectorGroupChat`: LLM-based speaker selection with candidate filtering
  - `Swarm`: Handoff-based routing (agent sends `HandoffMessage` to next agent)
  - `GraphFlow` (DiGraph): DAG-based execution with conditional edges, parallel fan-out, loops
  - `MagenticOneOrchestrator`: Ledger-based orchestration with task planning, progress tracking, stall detection

**Shared State:**
- `ChatCompletionContext` — manages message history per agent (can be unbounded or windowed)
- `ModelContext` shared across agents in a team
- State serialization: `save_state()` / `load_state()` for all managers
- **No built-in vector memory** — context is purely conversational

**Task Delegation:**
- `Swarm`: Agents use `HandoffMessage` to explicitly route control
- `GraphFlow`: Conditional edges route based on message content (keyword or callable)
- `MagenticOne`: Orchestrator maintains a "task ledger" (facts + plan) and dynamically re-plans on stalls

**Consensus / Termination:**
- `TerminationCondition` — composable conditions (text match, max messages, source-based)
- No explicit consensus protocols — termination is manager-decided

**Key Insight:** AutoGen's `ChatCompletionContext` is the closest analog to shared memory, but it's purely sequential message history, not a knowledge base.

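To make "composable conditions" concrete, here is a minimal sketch of stop conditions that combine with `|` and `&` over a shared message thread. The `Condition` class, the three factory functions, and the dict-shaped messages are stand-ins written for this note; they are not AutoGen's classes.

```python
from typing import Callable, Sequence

Message = dict  # hypothetical shape: {"source": str, "content": str}


class Condition:
    """Hypothetical stop condition evaluated against the shared thread."""

    def __init__(self, check: Callable[[Sequence[Message]], bool]):
        self._check = check

    def __call__(self, thread: Sequence[Message]) -> bool:
        return self._check(thread)

    def __or__(self, other: "Condition") -> "Condition":
        # Stop when either condition holds.
        return Condition(lambda t: self(t) or other(t))

    def __and__(self, other: "Condition") -> "Condition":
        # Stop only when both conditions hold.
        return Condition(lambda t: self(t) and other(t))


def max_messages(limit: int) -> Condition:
    """Stop once the thread holds at least `limit` messages."""
    return Condition(lambda t: len(t) >= limit)


def text_mention(token: str) -> Condition:
    """Stop once any message contains `token`."""
    return Condition(lambda t: any(token in m["content"] for m in t))


def from_source(name: str) -> Condition:
    """Stop when the most recent message came from `name`."""
    return Condition(lambda t: bool(t) and t[-1]["source"] == name)


# Stop after 10 messages, or as soon as the reviewer's latest message approves.
stop = max_messages(10) | (text_mention("APPROVE") & from_source("reviewer"))

thread = [
    {"source": "coder", "content": "Here is the patch."},
    {"source": "reviewer", "content": "APPROVE"},
]
```

With this thread, `stop(thread)` is true and `stop(thread[:1])` is false. The decision still sits with whoever evaluates the condition (the manager), which is why this is termination rather than consensus.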
### 1.3 MetaGPT — SOP-Driven Software Teams

**Core Pattern:** Agents follow Standard Operating Procedures (SOPs). Each agent has a defined role (Product Manager, Architect, Engineer, QA) and produces structured artifacts.

**Agent-to-Agent Communication:**
- **Publish-Subscribe via Environment:** Agents publish "actions" to a shared Environment, subscribers react
- **Structured outputs:** Each role produces specific artifact types (PRD, design doc, code, test cases)
- **Message routing:** Environment acts as a message bus, filtering by subscriber interest

**Shared Memory:**
- `Environment` class maintains shared state (project workspace)
- File-based shared memory: agents write/read from a shared filesystem
- `SharedMemory` for cross-agent context (structured data, not free-form text)

**Task Delegation:**
- Implicit through SOP stages: PM → Architect → Engineer → QA
- Each agent's output is the next agent's input
- No dynamic re-delegation

**Consensus:**
- Sequential SOP execution (no parallel agents)
- QA agent can trigger re-work loops back to Engineer

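The publish-subscribe routing described above can be sketched as a small in-process message bus. Everything here (`Message`, `Role`, `Environment`, the `watches` / `produces` fields) is a hypothetical simplification for illustration and does not reproduce MetaGPT's classes or signatures.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    kind: str      # artifact type, e.g. "PRD", "design", "code"
    sender: str
    content: str


class Role:
    """Hypothetical role: watches some artifact kinds, emits at most one kind."""

    def __init__(self, name: str, watches: set, produces: Optional[str]):
        self.name, self.watches, self.produces = name, watches, produces

    def react(self, msg: Message) -> Optional[Message]:
        if self.produces is None:
            return None  # terminal role: consumes but publishes nothing
        return Message(self.produces, self.name,
                       f"{self.produces} based on {msg.kind}")


class Environment:
    """Message bus: routes each published message to the roles watching its kind."""

    def __init__(self):
        self.subscribers = defaultdict(list)
        self.history = []

    def add_role(self, role: Role) -> None:
        for kind in role.watches:
            self.subscribers[kind].append(role)

    def publish(self, msg: Message) -> None:
        queue = [msg]
        while queue:
            current = queue.pop(0)
            self.history.append(current)
            for role in self.subscribers[current.kind]:
                reply = role.react(current)
                if reply is not None:
                    queue.append(reply)


env = Environment()
env.add_role(Role("architect", {"PRD"}, "design"))
env.add_role(Role("engineer", {"design"}, "code"))
env.add_role(Role("qa", {"code"}, None))
env.publish(Message("PRD", "product_manager", "Build a todo app"))
print([m.kind for m in env.history])  # ['PRD', 'design', 'code']
```

The SOP ordering (PM → Architect → Engineer → QA) is never stated explicitly in this sketch. It falls out of which artifact kinds each role subscribes to, which is the sense in which delegation is implicit.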
### 1.4 ChatDev — Chat-Chain Software Development

**Core Pattern:** Agents follow a "chat chain" — a sequence of chat phases (designing, coding, testing, documenting). Each phase involves a pair of agents (CEO↔CTO, Programmer↔Reviewer, etc.).

**Agent-to-Agent Communication:**
- **Paired chat sessions:** Two agents communicate in each phase (role-play between instructor and assistant)
- **Chain propagation:** Phase N's output (code, design doc) becomes Phase N+1's input
- **No broadcast** — communication is strictly pairwise within phases

**Shared Memory:**
- Software-centric: shared code repository is the "memory"
- Each phase modifies/inherits the codebase
- No explicit vector memory or knowledge graph

**Task Delegation:**
- Hardcoded phase sequence: Design → Code → Test → Document
- Each phase delegates to a specific agent pair
- No dynamic task re-assignment

**Consensus:**
- Phase-level termination: when both agents agree the phase is complete
- "Thought" tokens for chain-of-thought within chat

### 1.5 CAMEL — Role-Playing & Workforce

**Core Pattern:** Two primary modes:
1. **RolePlaying:** Two-agent conversation with task specification and optional critic
2. **Workforce:** Multi-agent with coordinator, task planner, and worker pool

**Agent-to-Agent Communication:**
- **RolePlaying:** Structured turn-taking between assistant and user agents
- **Workforce:** Coordinator assigns tasks via `TaskChannel`, workers return results
- **Worker types:** `SingleAgentWorker` (single ChatAgent), `RolePlayingWorker` (two-agent pair)

**Shared Memory / Task Channel:**
- `TaskChannel` — async queue-based task dispatch with packet tracking
  - States: SENT → PROCESSING → RETURNED → ARCHIVED
  - O(1) lookup by task ID, status-based filtering, assignee/publisher queues
- `WorkflowMemoryManager` — persists workflow patterns as markdown files
  - Role-based organization: workflows stored by `role_identifier`
  - Agent-based intelligent selection: LLM picks relevant past workflows
  - Versioned: metadata tracks creation time and version numbers

**Task Delegation:**
- Coordinator agent decomposes complex tasks using LLM analysis
- Tasks assigned to workers based on capability matching
- Failed tasks trigger: retry, create new worker, or further decomposition
- `FailureHandlingConfig` with configurable `RecoveryStrategy`

**Consensus / Quality:**
- Quality evaluation via structured output (response format enforced)
- Task dependencies tracked (worker receives dependency tasks as context)
- `WorkforceMetrics` for tracking execution statistics

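The packet lifecycle and O(1) lookup described for the task channel can be sketched as a dict-backed state machine. This is a synchronous simplification under assumed names (`PacketState`, `Packet`, `TaskChannel`, `post`, `advance`); CAMEL's `TaskChannel` is async and its method names may differ. The final lines show a packet moving from SENT to RETURNED.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PacketState(Enum):
    SENT = "SENT"
    PROCESSING = "PROCESSING"
    RETURNED = "RETURNED"
    ARCHIVED = "ARCHIVED"


# Each state may only advance to the next one in the lifecycle.
NEXT_STATE = {
    PacketState.SENT: PacketState.PROCESSING,
    PacketState.PROCESSING: PacketState.RETURNED,
    PacketState.RETURNED: PacketState.ARCHIVED,
}


@dataclass
class Packet:
    task_id: str
    publisher: str
    assignee: str
    payload: str
    state: PacketState = PacketState.SENT
    result: Optional[str] = None


class TaskChannel:
    """Hypothetical in-memory channel; a dict gives O(1) lookup by task ID."""

    def __init__(self):
        self._packets = {}

    def post(self, packet: Packet) -> None:
        if packet.task_id in self._packets:
            raise ValueError(f"duplicate task id: {packet.task_id}")
        self._packets[packet.task_id] = packet

    def get(self, task_id: str) -> Packet:
        return self._packets[task_id]

    def advance(self, task_id: str, result: Optional[str] = None) -> PacketState:
        """Move a packet one step along the lifecycle, optionally storing a result."""
        packet = self._packets[task_id]
        if packet.state not in NEXT_STATE:
            raise ValueError(f"{task_id} is already archived")
        packet.state = NEXT_STATE[packet.state]
        if result is not None:
            packet.result = result
        return packet.state

    def by_state(self, state: PacketState) -> list:
        return [p for p in self._packets.values() if p.state is state]

    def for_assignee(self, assignee: str) -> list:
        return [p for p in self._packets.values() if p.assignee == assignee]


channel = TaskChannel()
channel.post(Packet("t1", publisher="coordinator", assignee="worker-1",
                    payload="summarize"))
channel.advance("t1")                     # SENT -> PROCESSING
channel.advance("t1", result="summary")   # PROCESSING -> RETURNED
```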

---

## 2. Key Architectural Patterns for Fleet Knowledge Graph

### 2.1 Communication Topology Patterns

| Pattern | Used By | Description |
|---------|---------|-------------|
| **Sequential Chain** | CrewAI, ChatDev, MetaGPT | A→B→C linear flow, output feeds next |
| **Shared Thread** | AutoGen | All agents see all messages |
| **Publish-Subscribe** | MetaGPT | Environment-based message bus |
| **Paired Chat** | ChatDev, CAMEL | Two-agent conversation pairs |
| **Handoff Routing** | AutoGen Swarm | Agent explicitly names next speaker |
| **DAG Graph** | AutoGen GraphFlow | Conditional edges, parallel, loops |
| **Ledger Orchestration** | AutoGen MagenticOne | Maintains task ledger, re-plans |
| **Task Channel** | CAMEL | Async queue with packet states |

### 2.2 Shared State Patterns

| Pattern | Used By | Description |
|---------|---------|-------------|
| **Vector Memory** | CrewAI | Embeddings + scope-based namespacing |
| **Message History** | AutoGen | Sequential conversation context |
| **File System** | MetaGPT, ChatDev | Agents read/write shared files |
| **Task Channel** | CAMEL | Async packet-based task dispatch |
| **Workflow Files** | CAMEL | Markdown-based workflow memory |
| **Tool Cache** | CrewAI | In-memory RWLock tool result cache |
| **State Checkpoint** | CrewAI, AutoGen | Serialized Pydantic/SQLite checkpoints |

### 2.3 Task Delegation Patterns

| Pattern | Used By | Description |
|---------|---------|-------------|
| **Role Assignment** | CrewAI | Fixed agent per task |
| **Manager Delegation** | CrewAI Hierarchical | Manager assigns tasks dynamically |
| **Speaker Selection** | AutoGen Selector | LLM picks next agent |
| **Handoff** | AutoGen Swarm | Agent explicitly transfers control |
| **SOP Routing** | MetaGPT | Stage-based implicit delegation |
| **Coordinator** | CAMEL Workforce | LLM-based task decomposition + assignment |
| **Dynamic Worker Creation** | CAMEL Workforce | Create new workers on failure |

### 2.4 Conflict Resolution Patterns

| Pattern | Used By | Description |
|---------|---------|-------------|
| **Manager Arbitration** | CrewAI Hierarchical | Manager resolves conflicts |
| **Critic-in-the-loop** | CAMEL | Critic agent evaluates and selects |
| **Quality Gate** | CAMEL Workforce | Structured quality evaluation |
| **Termination Conditions** | AutoGen | Composable stop conditions |
| **Stall Detection** | AutoGen MagenticOne | Re-plans when progress stalls |

---

## 3. Recommendations for Hermes Fleet Knowledge Graph

### 3.1 Architecture: Hybrid Graph + Memory

Based on the SOTA analysis, the optimal fleet knowledge graph should combine:

1. **CrewAI's scoped memory** for hierarchical knowledge organization
   - Path-based namespaces: `/fleet/{fleet_id}/agent/{agent_id}/diary`
   - Composite scoring: semantic + recency + importance
   - Background writes with read barriers

2. **CAMEL's TaskChannel** for task dispatch and tracking
   - Packet states (SENT → PROCESSING → RETURNED → ARCHIVED)
   - O(1) lookup by task ID
   - Assignee/publisher tracking

3. **AutoGen's DiGraph** for execution flow definition
   - DAG with conditional edges for complex workflows
   - Parallel fan-out for independent tasks
   - Activation conditions (all vs any) for synchronization points

4. **AutoGen MagenticOne's ledger** for shared task context
   - Maintained facts, plan, and progress ledger
   - Dynamic re-planning on stalls

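The "all vs any" activation condition in item 3 decides whether a join node waits for every predecessor or fires after the first one. A minimal sketch of that rule follows, assuming an acyclic graph in which each node fires at most once; `run_dag` and its dict-based inputs are hypothetical and unrelated to AutoGen's `DiGraphBuilder` API.

```python
from collections import deque


def run_dag(edges: dict, activation: dict, start: str) -> list:
    """Return the order in which nodes fire.

    edges maps a node to its successors. activation[node] is "all" (wait for
    every predecessor) or "any" (fire after the first predecessor); nodes
    default to "all". Loops are out of scope for this sketch.
    """
    # Build the predecessor sets once so readiness checks are cheap.
    preds = {}
    for src, dsts in edges.items():
        for dst in dsts:
            preds.setdefault(dst, set()).add(src)

    done, order = set(), []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in done:
            continue
        done.add(node)
        order.append(node)
        for nxt in edges.get(node, []):
            finished = preds[nxt] & done
            mode = activation.get(nxt, "all")
            ready = finished == preds[nxt] if mode == "all" else bool(finished)
            if ready and nxt not in done:
                queue.append(nxt)
    return order


# "review" joins two branches, but only the "research" branch is reachable
# from "plan" (imagine the edge plan -> code was not taken).
edges = {"plan": ["research"], "research": ["review"], "code": ["review"]}

print(run_dag(edges, {"review": "all"}, "plan"))  # ['plan', 'research']
print(run_dag(edges, {"review": "any"}, "plan"))  # ['plan', 'research', 'review']
```

The example shows why the choice matters for conditional workflows: an "all" join stalls when a branch was skipped, while an "any" join proceeds as soon as one branch completes.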
### 3.2 Fleet Knowledge Graph Schema

```
/fleet/{fleet_id}/
├── shared/                  # Shared knowledge (all agents read)
│   ├── facts/               # Known facts, constraints
│   ├── decisions/           # Record of decisions made
│   └── context/             # Active task context
├── agent/{agent_id}/
│   ├── diary/               # Agent's personal experience log
│   ├── capabilities/        # What this agent can do
│   └── state/               # Current task state
├── tasks/
│   ├── {task_id}/           # Task metadata, dependencies, status
│   └── graph/               # DAG definition for task dependencies
└── consensus/
    ├── proposals/           # Pending proposals
    └── decisions/           # Resolved consensus decisions
```

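Scope paths in this layout can be built and checked with the standard library alone. The helpers below (`agent_scope`, `shared_scope`, `in_scope`) are hypothetical names, and the sketch assumes fleet and agent IDs are plain tokens without `/`.

```python
from pathlib import PurePosixPath


def agent_scope(fleet_id: str, agent_id: str, leaf: str) -> str:
    """Build a path such as /fleet/f1/agent/a7/diary."""
    return str(PurePosixPath("/fleet", fleet_id, "agent", agent_id, leaf))


def shared_scope(fleet_id: str, leaf: str) -> str:
    """Build a path such as /fleet/f1/shared/facts."""
    return str(PurePosixPath("/fleet", fleet_id, "shared", leaf))


def in_scope(path: str, scope: str) -> bool:
    """True if path equals scope or lies beneath it.

    The comparison is component-wise, so /fleet/f1/agent/a70 is not
    treated as being inside /fleet/f1/agent/a7.
    """
    p, s = PurePosixPath(path), PurePosixPath(scope)
    return p == s or s in p.parents
```

Comparing path components rather than raw string prefixes avoids the classic namespacing bug where one agent's ID is a prefix of another's.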
### 3.3 Key Design Decisions

1. **Diary System (Agent Memory):**
   - Each agent writes to its own scoped memory after every significant action
   - LLM-analyzed importance scoring (like CrewAI's unified memory)
   - Cross-agent recall: agents can query other agents' diaries for relevant experiences
   - Decay: old low-importance memories expire

2. **Shared State (Fleet Knowledge):**
   - SQLite-backed (like Hermes' existing `state.db`) with FTS5 search
   - Hierarchical scopes (like CrewAI's MemoryScope)
   - Write-ahead log for concurrent access
   - Read barriers before queries (like CrewAI's `drain_writes`)

3. **Task Delegation:**
   - Coordinator pattern (like CAMEL's Workforce)
   - Task decomposition via LLM
   - Failed task → retry, reassign, or decompose
   - Max depth limit (like Hermes' existing MAX_DEPTH=2)

4. **Consensus Protocol:**
   - Proposal-based: agent proposes, others vote/acknowledge
   - Timeout-based fallback: if no response within N seconds, proceed
   - Manager override: designated manager can break ties
   - Simple majority for non-critical, unanimity for critical decisions

5. **Conflict Resolution:**
   - Last-write-wins for non-critical state
   - Optimistic locking with version numbers
   - Manager arbitration for task assignment conflicts
   - Quality gates (like CAMEL) for output validation

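The consensus rules in item 4 leave several details open, so the following is one possible reading rather than a specification. It assumes that majority is counted against all eligible voters, that a timeout decides on the votes received so far, that a tie falls to the manager and otherwise proceeds on timeout, and that critical proposals never treat silence as consent. The `Proposal` class and its fields are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Proposal:
    """Hypothetical proposal record; field names are illustrative."""
    proposer: str
    voters: set
    critical: bool = False
    timeout_s: float = 30.0
    created_at: float = field(default_factory=time.monotonic)
    votes: dict = field(default_factory=dict)

    def vote(self, agent: str, approve: bool) -> None:
        if agent not in self.voters:
            raise ValueError(f"{agent} is not a voter on this proposal")
        self.votes[agent] = approve

    def resolve(self, now: Optional[float] = None,
                manager_vote: Optional[bool] = None) -> Optional[bool]:
        """Return True (accepted), False (rejected), or None (still pending)."""
        now = time.monotonic() if now is None else now
        yes = sum(self.votes.values())
        no = len(self.votes) - yes
        total = len(self.voters)

        if self.critical:
            # Unanimity: one rejection is final; missing votes keep it pending.
            if no:
                return False
            return True if yes == total else None

        # Simple majority of all eligible voters.
        if yes * 2 > total:
            return True
        if no * 2 > total:
            return False
        timed_out = now - self.created_at >= self.timeout_s
        all_in = len(self.votes) == total
        if all_in or timed_out:
            if yes != no:
                return yes > no              # decide on the votes received
            if manager_vote is not None:
                return manager_vote          # manager breaks the tie
            return True if timed_out else None  # timeout fallback: proceed
        return None
```

Passing `now` explicitly keeps the timeout logic deterministic in tests. A caller in production would omit it and let `time.monotonic()` supply the clock.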
### 3.4 Integration with Existing Hermes Architecture

Hermes already has strong foundations:
- **Delegation system** (`delegate_tool.py`): Isolated child agents, parallel execution, depth limits
- **State DB** (`hermes_state.py`): SQLite + FTS5, WAL mode, session tracking, message history
- **Credential pools**: Shared credentials with rotation

The fleet knowledge graph should extend these patterns:
- **Session DB → Fleet DB:** Add tables for fleet metadata, agent registrations, task graphs
- **Memory tool → Fleet Memory:** Scoped vector memory shared across fleet agents
- **Delegate tool → Fleet Delegation:** Task channel with persistence, quality evaluation
- **New: Consensus module:** Proposal/vote protocol with timeout handling

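As a sketch of what a Fleet DB table with optimistic locking might look like on the existing SQLite stack, the example below uses only Python's standard `sqlite3` module. The `fleet_state` table, its columns, and the `open_fleet_db` / `read` / `write` helpers are hypothetical and are not part of `hermes_state.py`. A writer passes the version it last read; the write succeeds only if no other agent has updated the row in the meantime, and a `False` return tells the caller to re-read and retry or escalate.

```python
import sqlite3


def open_fleet_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # WAL lets readers proceed while one writer commits
    # (the pragma has no effect on in-memory databases).
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        """CREATE TABLE IF NOT EXISTS fleet_state (
               scope   TEXT PRIMARY KEY,
               value   TEXT NOT NULL,
               version INTEGER NOT NULL DEFAULT 1
           )"""
    )
    return conn


def read(conn: sqlite3.Connection, scope: str):
    """Return (value, version), or None if the scope has no entry."""
    return conn.execute(
        "SELECT value, version FROM fleet_state WHERE scope = ?", (scope,)
    ).fetchone()


def write(conn: sqlite3.Connection, scope: str, value: str,
          expected_version: int) -> bool:
    """Write only if the row is still at expected_version (0 means 'create')."""
    with conn:  # commits on success, rolls back on error
        if expected_version == 0:
            cur = conn.execute(
                "INSERT OR IGNORE INTO fleet_state (scope, value) VALUES (?, ?)",
                (scope, value),
            )
        else:
            cur = conn.execute(
                "UPDATE fleet_state SET value = ?, version = version + 1 "
                "WHERE scope = ? AND version = ?",
                (value, scope, expected_version),
            )
    # Exactly one affected row means the write won; zero means a conflict.
    return cur.rowcount == 1
```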

---

## 4. Reference Implementations

| Component | Best Reference | Key Takeaway |
|-----------|---------------|--------------|
| Scoped Memory | CrewAI `Memory` + `MemoryScope` | Path-based namespaces, composite scoring, background writes |
| Task Dispatch | CAMEL `TaskChannel` | Packet-based with state machine, O(1) lookup |
| Execution DAG | AutoGen `DiGraphBuilder` | Fluent builder, conditional edges, activation groups |
| Orchestration | AutoGen `MagenticOneOrchestrator` | Ledger-based planning, stall detection, re-planning |
| Agent Communication | AutoGen `SelectorGroupChat` | LLM-based speaker selection, shared message thread |
| Quality Evaluation | CAMEL Workforce | Structured output for quality scoring |
| Workflow Memory | CAMEL `WorkflowMemoryManager` | Markdown-based, role-organized, versioned |
| State Checkpoint | CrewAI `SqliteProvider` | JSONB checkpoints, WAL mode |
| Tool Cache | CrewAI `CacheHandler` | RWLock-based concurrent tool result cache |

---

## 5. Open Questions

1. **Graph vs Vector for knowledge:** Should fleet knowledge use a proper graph DB (e.g., Neo4j) or stick with vector + SQLite?
   - Recommendation: Start with SQLite + vectors (existing stack), add graph later if needed

2. **Real-time vs Batch:** Should agents receive updates in real-time or batched?
   - Recommendation: Event-driven for critical updates, batched for diary entries

3. **Security model:** How should cross-agent access be controlled?
   - Recommendation: Role-based ACLs on scope paths, similar to CrewAI's privacy flags

4. **Scalability:** How many agents can a single fleet support?
   - Recommendation: Start with 10-agent fleets, optimize SQLite concurrency first