SOTA Research: Multi-Agent Coordination & Fleet Knowledge Graphs
Date: 2026-04-14
Scope: Agent-to-agent communication, shared memory, task delegation, consensus protocols
Frameworks Analyzed: CrewAI, AutoGen, MetaGPT, ChatDev, CAMEL
1. Architecture Pattern Summary
1.1 CrewAI — Role-Based Crew Orchestration
Core Pattern: Agents organized into "Crews" with explicit roles, goals, and backstories. Tasks are assigned to agents, executed via sequential or hierarchical process flows.
Agent-to-Agent Communication:
- Sequential: Agent A completes Task A → output injected into Task B's context for Agent B
- Hierarchical: Manager agent delegates to worker agents, collects results, synthesizes
- Context passing: Tasks can declare `context: [other_tasks]`; outputs from dependent tasks are automatically injected into the current task's prompt
- No direct agent-to-agent messaging: communication is mediated through task outputs
Shared Memory (v2 — Unified Memory):
- `Memory` class with `remember()` / `recall()` using vector embeddings (LanceDB/ChromaDB)
- Scope-based isolation: `MemoryScope` provides path-based namespacing (`/crew/research/agent-foo`)
- Composite scoring: semantic similarity (0.5) + recency (0.3) + importance (0.2)
- RecallFlow: LLM-driven deep recall with adaptive query expansion
- Privacy flags: Private memories only visible to the source that created them
- Background saves: ThreadPoolExecutor with write barrier (drain_writes before recall)
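The composite scoring rule above can be illustrated with a small sketch. The weights are the ones quoted in the list; the exponential recency decay, the 30-day half-life, and the names `MemoryRecord`, `recency`, and `composite_score` are assumptions made for illustration and are not taken from CrewAI's code.

```python
from dataclasses import dataclass

# Weights quoted in the text above.
W_SEMANTIC, W_RECENCY, W_IMPORTANCE = 0.5, 0.3, 0.2
HALF_LIFE_DAYS = 30.0  # hypothetical decay half-life, not a CrewAI setting


@dataclass
class MemoryRecord:
    text: str
    similarity: float  # semantic similarity to the query, in [0, 1]
    age_days: float    # time since the memory was written
    importance: float  # importance score, in [0, 1]


def recency(age_days: float) -> float:
    """Exponential decay: 1.0 for a new memory, 0.5 after one half-life."""
    return 0.5 ** (age_days / HALF_LIFE_DAYS)


def composite_score(record: MemoryRecord) -> float:
    """Weighted sum of similarity, recency, and importance."""
    return (W_SEMANTIC * record.similarity
            + W_RECENCY * recency(record.age_days)
            + W_IMPORTANCE * record.importance)


def rank(records: list[MemoryRecord]) -> list[MemoryRecord]:
    """Return records ordered from highest to lowest composite score."""
    return sorted(records, key=composite_score, reverse=True)
```

With these weights a fresh, moderately similar memory can outrank an older one that matches the query more closely, which is the intended effect of blending recency into the score.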
Task Delegation:
- Agent tools include `Delegate Work to Co-worker` and `Ask Question to Co-worker`
- Delegation creates a new task for another agent; results come back to the delegator
- Depth-limited (no infinite delegation chains)
State & Checkpointing:
- `SqliteProvider` / `JsonProvider` for state checkpoint persistence
- `CheckpointConfig` with event-driven persistence
- Flow state is stored as Pydantic models with serialization
Cache:
- Thread-safe in-memory tool result cache with RWLock
- Key: `{tool_name}-{input}` → cached output
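A minimal sketch of such a cache follows. Python's standard library has no read-write lock, so a plain `threading.Lock` stands in for the RWLock here; the class name `ToolResultCache` and its method names are hypothetical and do not mirror CrewAI's implementation.

```python
import threading
from typing import Any, Optional


class ToolResultCache:
    """In-memory cache keyed by "{tool_name}-{input}"."""

    def __init__(self) -> None:
        # Plain mutex standing in for a read-write lock (assumption).
        self._lock = threading.Lock()
        self._store: dict[str, Any] = {}

    @staticmethod
    def _key(tool_name: str, tool_input: str) -> str:
        return f"{tool_name}-{tool_input}"

    def read(self, tool_name: str, tool_input: str) -> Optional[Any]:
        """Return the cached output, or None on a cache miss."""
        with self._lock:
            return self._store.get(self._key(tool_name, tool_input))

    def add(self, tool_name: str, tool_input: str, output: Any) -> None:
        """Store the output of a tool call under its composite key."""
        with self._lock:
            self._store[self._key(tool_name, tool_input)] = output
```

A real read-write lock would let concurrent readers proceed in parallel; the single mutex trades that for simplicity.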
1.2 AutoGen (Microsoft) — Conversation-Centric Teams
Core Pattern: Agents communicate through shared conversation threads. A "Group Chat Manager" controls turn-taking and speaker selection.
Agent-to-Agent Communication:
- Shared message thread — all agents see all messages (like a group chat)
- Three team patterns:
  - `RoundRobinGroupChat`: Fixed order cycling through participants
  - `SelectorGroupChat`: LLM-based speaker selection with candidate filtering
  - `SwarmGroupChat`: Handoff-based routing (agent sends `HandoffMessage` to next agent)
- `GraphFlow` (`DiGraph`): DAG-based execution with conditional edges, parallel fan-out, loops
- `MagenticOneOrchestrator`: Ledger-based orchestration with task planning, progress tracking, stall detection
Shared State:
- `ChatCompletionContext`: manages message history per agent (can be unbounded or windowed)
- `ModelContext` shared across agents in a team
- State serialization: `save_state()` / `load_state()` for all managers
- No built-in vector memory: context is purely conversational
Task Delegation:
- `Swarm`: Agents use `HandoffMessage` to explicitly route control
- `GraphFlow`: Conditional edges route based on message content (keyword or callable)
- `MagenticOne`: Orchestrator maintains a "task ledger" (facts + plan) and dynamically re-plans on stalls
Consensus / Termination:
- `TerminationCondition`: composable conditions (text match, max messages, source-based)
- No explicit consensus protocols: termination is manager-decided
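To make "composable conditions" concrete, here is a framework-independent sketch in which stop conditions combine with `|` (either) and `&` (both). The `Condition` class and the `text_mention` and `max_messages` helpers are hypothetical illustrations and deliberately do not use AutoGen's own classes.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Check = Callable[[Sequence[str]], bool]


@dataclass(frozen=True)
class Condition:
    """Wraps a predicate over the message history so conditions can be combined."""
    check: Check

    def __call__(self, messages: Sequence[str]) -> bool:
        return self.check(messages)

    def __or__(self, other: "Condition") -> "Condition":
        # Stop when either condition holds.
        return Condition(lambda m: self(m) or other(m))

    def __and__(self, other: "Condition") -> "Condition":
        # Stop only when both conditions hold.
        return Condition(lambda m: self(m) and other(m))


def text_mention(text: str) -> Condition:
    """True once any message contains the given text."""
    return Condition(lambda m: any(text in msg for msg in m))


def max_messages(limit: int) -> Condition:
    """True once the history holds at least `limit` messages."""
    return Condition(lambda m: len(m) >= limit)
```

A manager loop would evaluate the combined condition after every turn and end the conversation the first time it returns true.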
Key Insight: AutoGen's ChatCompletionContext is the closest analog to shared memory, but it's purely sequential message history, not a knowledge base.
1.3 MetaGPT — SOP-Driven Software Teams
Core Pattern: Agents follow Standard Operating Procedures (SOPs). Each agent has a defined role (Product Manager, Architect, Engineer, QA) and produces structured artifacts.
Agent-to-Agent Communication:
- Publish-Subscribe via Environment: Agents publish "actions" to a shared Environment, subscribers react
- Structured outputs: Each role produces specific artifact types (PRD, design doc, code, test cases)
- Message routing: Environment acts as a message bus, filtering by subscriber interest
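The publish-subscribe routing described above can be sketched as a tiny in-process message bus. This is an illustration of the pattern only: `SketchEnvironment`, `Message`, and the artifact kinds used are hypothetical names, not MetaGPT's API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Message:
    kind: str     # artifact type, e.g. "prd" or "design_doc"
    sender: str   # role that produced the artifact
    content: str


class SketchEnvironment:
    """Routes each published message to the handlers subscribed to its kind."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[Message], None]]] = defaultdict(list)

    def subscribe(self, kind: str, handler: Callable[[Message], None]) -> None:
        """Register interest in one artifact kind."""
        self._handlers[kind].append(handler)

    def publish(self, message: Message) -> int:
        """Deliver the message and return how many subscribers received it."""
        handlers = self._handlers.get(message.kind, [])
        for handler in handlers:
            handler(message)
        return len(handlers)
```

In this sketch an architect role would subscribe to `"prd"` messages, so a product manager's output reaches it without the two agents addressing each other directly.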
Shared Memory:
- `Environment` class maintains shared state (project workspace)
- File-based shared memory: agents write/read from a shared filesystem
- `SharedMemory` for cross-agent context (structured data, not free-form text)
Task Delegation:
- Implicit through SOP stages: PM → Architect → Engineer → QA
- Each agent's output is the next agent's input
- No dynamic re-delegation
Consensus:
- Sequential SOP execution (no parallel agents)
- QA agent can trigger re-work loops back to Engineer
1.4 ChatDev — Chat-Chain Software Development
Core Pattern: Agents follow a "chat chain" — a sequence of chat phases (designing, coding, testing, documenting). Each phase involves a pair of agents (CEO↔CTO, Programmer↔Reviewer, etc.).
Agent-to-Agent Communication:
- Paired chat sessions: Two agents communicate in each phase (role-play between instructor and assistant)
- Chain propagation: Phase N's output (code, design doc) becomes Phase N+1's input
- No broadcast — communication is strictly pairwise within phases
Shared Memory:
- Software-centric: shared code repository is the "memory"
- Each phase modifies/inherits the codebase
- No explicit vector memory or knowledge graph
Task Delegation:
- Hardcoded phase sequence: Design → Code → Test → Document
- Each phase delegates to a specific agent pair
- No dynamic task re-assignment
Consensus:
- Phase-level termination: when both agents agree the phase is complete
- "Thought" tokens for chain-of-thought within chat
1.5 CAMEL — Role-Playing & Workforce
Core Pattern: Two primary modes:
- RolePlaying: Two-agent conversation with task specification and optional critic
- Workforce: Multi-agent with coordinator, task planner, and worker pool
Agent-to-Agent Communication:
- RolePlaying: Structured turn-taking between assistant and user agents
- Workforce: Coordinator assigns tasks via `TaskChannel`, workers return results
- Worker types: `SingleAgentWorker` (single `ChatAgent`), `RolePlayingWorker` (two-agent pair)
Shared Memory / Task Channel:
- `TaskChannel`: async queue-based task dispatch with packet tracking
  - States: SENT → PROCESSING → RETURNED → ARCHIVED
  - O(1) lookup by task ID, status-based filtering, assignee/publisher queues
- `WorkflowMemoryManager`: persists workflow patterns as markdown files
  - Role-based organization: workflows stored by `role_identifier`
  - Agent-based intelligent selection: LLM picks relevant past workflows
  - Versioned: metadata tracks creation time and version numbers
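The packet lifecycle above can be sketched as a small state machine over a dictionary keyed by task ID, which gives the O(1) lookup. This is a synchronous simplification of the async design; `SketchTaskChannel`, `Packet`, and `PacketStatus` are hypothetical names rather than CAMEL's classes.

```python
from dataclasses import dataclass
from enum import Enum


class PacketStatus(Enum):
    SENT = "SENT"
    PROCESSING = "PROCESSING"
    RETURNED = "RETURNED"
    ARCHIVED = "ARCHIVED"


# Each status has exactly one legal successor; ARCHIVED is terminal.
_NEXT = {
    PacketStatus.SENT: PacketStatus.PROCESSING,
    PacketStatus.PROCESSING: PacketStatus.RETURNED,
    PacketStatus.RETURNED: PacketStatus.ARCHIVED,
}


@dataclass
class Packet:
    task_id: str
    publisher_id: str
    assignee_id: str
    payload: str
    status: PacketStatus = PacketStatus.SENT


class SketchTaskChannel:
    def __init__(self) -> None:
        self._packets: dict[str, Packet] = {}  # task_id -> packet

    def post(self, packet: Packet) -> None:
        if packet.task_id in self._packets:
            raise ValueError(f"duplicate task id: {packet.task_id}")
        self._packets[packet.task_id] = packet

    def get(self, task_id: str) -> Packet:
        return self._packets[task_id]  # dictionary lookup by task ID

    def advance(self, task_id: str) -> PacketStatus:
        """Move a packet to its next state, rejecting moves past ARCHIVED."""
        packet = self._packets[task_id]
        if packet.status not in _NEXT:
            raise ValueError(f"{task_id} is already {packet.status.value}")
        packet.status = _NEXT[packet.status]
        return packet.status

    def by_status(self, status: PacketStatus) -> list[Packet]:
        """Status-based filtering (a linear scan in this sketch)."""
        return [p for p in self._packets.values() if p.status == status]
```

Restricting every packet to a single forward transition keeps the channel easy to audit: a task can never jump from SENT to RETURNED without having been picked up.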
Task Delegation:
- Coordinator agent decomposes complex tasks using LLM analysis
- Tasks assigned to workers based on capability matching
- Failed tasks trigger: retry, create new worker, or further decomposition
- `FailureHandlingConfig` with configurable `RecoveryStrategy`
Consensus / Quality:
- Quality evaluation via structured output (response format enforced)
- Task dependencies tracked (worker receives dependency tasks as context)
- `WorkforceMetrics` for tracking execution statistics
2. Key Architectural Patterns for Fleet Knowledge Graph
2.1 Communication Topology Patterns
| Pattern | Used By | Description |
|---|---|---|
| Sequential Chain | CrewAI, ChatDev, MetaGPT | A→B→C linear flow, output feeds next |
| Shared Thread | AutoGen | All agents see all messages |
| Publish-Subscribe | MetaGPT | Environment-based message bus |
| Paired Chat | ChatDev, CAMEL | Two-agent conversation pairs |
| Handoff Routing | AutoGen Swarm | Agent explicitly names next speaker |
| DAG Graph | AutoGen GraphFlow | Conditional edges, parallel, loops |
| Ledger Orchestration | AutoGen MagenticOne | Maintains task ledger, re-plans |
| Task Channel | CAMEL | Async queue with packet states |
2.2 Shared State Patterns
| Pattern | Used By | Description |
|---|---|---|
| Vector Memory | CrewAI | Embeddings + scope-based namespacing |
| Message History | AutoGen | Sequential conversation context |
| File System | MetaGPT, ChatDev | Agents read/write shared files |
| Task Channel | CAMEL | Async packet-based task dispatch |
| Workflow Files | CAMEL | Markdown-based workflow memory |
| Tool Cache | CrewAI | In-memory RWLock tool result cache |
| State Checkpoint | CrewAI, AutoGen | Serialized Pydantic/SQLite checkpoints |
2.3 Task Delegation Patterns
| Pattern | Used By | Description |
|---|---|---|
| Role Assignment | CrewAI | Fixed agent per task |
| Manager Delegation | CrewAI Hierarchical | Manager assigns tasks dynamically |
| Speaker Selection | AutoGen Selector | LLM picks next agent |
| Handoff | AutoGen Swarm | Agent explicitly transfers control |
| SOP Routing | MetaGPT | Stage-based implicit delegation |
| Coordinator | CAMEL Workforce | LLM-based task decomposition + assignment |
| Dynamic Worker Creation | CAMEL Workforce | Create new workers on failure |
2.4 Conflict Resolution Patterns
| Pattern | Used By | Description |
|---|---|---|
| Manager Arbitration | CrewAI Hierarchical | Manager resolves conflicts |
| Critic-in-the-loop | CAMEL | Critic agent evaluates and selects |
| Quality Gate | CAMEL Workforce | Structured quality evaluation |
| Termination Conditions | AutoGen | Composable stop conditions |
| Stall Detection | AutoGen MagenticOne | Re-plans when progress stalls |
3. Recommendations for Hermes Fleet Knowledge Graph
3.1 Architecture: Hybrid Graph + Memory
Based on the SOTA analysis, the optimal fleet knowledge graph should combine:
- CrewAI's scoped memory for hierarchical knowledge organization
  - Path-based namespaces: `/fleet/{fleet_id}/agent/{agent_id}/diary`
  - Composite scoring: semantic + recency + importance
  - Background writes with read barriers
- CAMEL's TaskChannel for task dispatch and tracking
  - Packet states (SENT → PROCESSING → RETURNED → ARCHIVED)
  - O(1) lookup by task ID
  - Assignee/publisher tracking
- AutoGen's DiGraph for execution flow definition
  - DAG with conditional edges for complex workflows
  - Parallel fan-out for independent tasks
  - Activation conditions (all vs any) for synchronization points
- AutoGen MagenticOne's ledger for shared task context
  - Maintained facts, plan, and progress ledger
  - Dynamic re-planning on stalls
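The "all vs any" activation conditions mentioned for the execution DAG can be sketched without any framework. In this illustration a node with `activation="all"` waits for every parent, while `activation="any"` fires as soon as one parent finishes; `Node` and `ready_nodes` are hypothetical names and do not reproduce AutoGen's `DiGraph` API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    parents: list[str] = field(default_factory=list)
    activation: str = "all"  # "all": every parent done; "any": one parent suffices


def ready_nodes(nodes: dict[str, Node], done: set[str]) -> list[str]:
    """Return the names of unfinished nodes whose activation condition is met."""
    ready = []
    for node in nodes.values():
        if node.name in done:
            continue
        if not node.parents:
            ok = True  # root nodes are always runnable
        else:
            finished = [parent in done for parent in node.parents]
            ok = all(finished) if node.activation == "all" else any(finished)
        if ok:
            ready.append(node.name)
    return ready
```

Every name returned in one call is independent of the others, so a scheduler could dispatch them in parallel, which is how fan-out falls out of the same rule.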
3.2 Fleet Knowledge Graph Schema
/fleet/{fleet_id}/
├── shared/ # Shared knowledge (all agents read)
│ ├── facts/ # Known facts, constraints
│ ├── decisions/ # Record of decisions made
│ └── context/ # Active task context
├── agent/{agent_id}/
│ ├── diary/ # Agent's personal experience log
│ ├── capabilities/ # What this agent can do
│ └── state/ # Current task state
├── tasks/
│ ├── {task_id}/ # Task metadata, dependencies, status
│ └── graph/ # DAG definition for task dependencies
└── consensus/
    ├── proposals/ # Pending proposals
    └── decisions/ # Resolved consensus decisions
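Paths in a layout like the one above are easiest to handle as path objects rather than raw strings. The sketch below builds scope paths and checks containment component by component; the helper names `agent_scope`, `shared_scope`, and `in_scope` are hypothetical.

```python
from pathlib import PurePosixPath


def agent_scope(fleet_id: str, agent_id: str, leaf: str) -> PurePosixPath:
    """Build /fleet/{fleet_id}/agent/{agent_id}/{leaf}."""
    return PurePosixPath("/fleet") / fleet_id / "agent" / agent_id / leaf


def shared_scope(fleet_id: str, leaf: str) -> PurePosixPath:
    """Build /fleet/{fleet_id}/shared/{leaf}."""
    return PurePosixPath("/fleet") / fleet_id / "shared" / leaf


def in_scope(path: PurePosixPath, scope: PurePosixPath) -> bool:
    """True if `path` is the scope itself or lies anywhere beneath it."""
    return path == scope or scope in path.parents
```

Comparing whole path components avoids a trap of plain string prefix checks, where `/fleet/f1/agent/a10` would wrongly appear to sit inside `/fleet/f1/agent/a1`.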
3.3 Key Design Decisions
- Diary System (Agent Memory):
  - Each agent writes to its own scoped memory after every significant action
  - LLM-analyzed importance scoring (like CrewAI's unified memory)
  - Cross-agent recall: agents can query other agents' diaries for relevant experiences
  - Decay: old low-importance memories expire
- Shared State (Fleet Knowledge):
  - SQLite-backed (like Hermes' existing `state.db`) with FTS5 search
  - Hierarchical scopes (like CrewAI's `MemoryScope`)
  - Write-ahead log for concurrent access
  - Read barriers before queries (like CrewAI's `drain_writes`)
- Task Delegation:
  - Coordinator pattern (like CAMEL's Workforce)
  - Task decomposition via LLM
  - Failed task → retry, reassign, or decompose
  - Max depth limit (like Hermes' existing `MAX_DEPTH=2`)
- Consensus Protocol:
  - Proposal-based: agent proposes, others vote/acknowledge
  - Timeout-based fallback: if no response within N seconds, proceed
  - Manager override: designated manager can break ties
  - Simple majority for non-critical, unanimity for critical decisions
- Conflict Resolution:
  - Last-write-wins for non-critical state
  - Optimistic locking with version numbers
  - Manager arbitration for task assignment conflicts
  - Quality gates (like CAMEL) for output validation
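Optimistic locking with version numbers, as listed under Conflict Resolution, can be sketched against SQLite using only the standard library. The table layout and the `open_store`, `read`, and `write` helpers are assumptions for illustration and are not part of Hermes' existing `state.db` schema.

```python
import sqlite3
from typing import Optional


def open_store() -> sqlite3.Connection:
    """Create an in-memory store; a real deployment would use a file with WAL mode."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE state ("
        "key TEXT PRIMARY KEY, value TEXT NOT NULL, version INTEGER NOT NULL)"
    )
    return conn


def read(conn: sqlite3.Connection, key: str) -> Optional[tuple]:
    """Return (value, version), or None if the key does not exist."""
    return conn.execute(
        "SELECT value, version FROM state WHERE key = ?", (key,)
    ).fetchone()


def write(conn: sqlite3.Connection, key: str, value: str, expected_version: int) -> bool:
    """Write only if the stored version still equals expected_version.

    expected_version == 0 means "create the key". Returns False on a conflict.
    """
    try:
        with conn:  # commits on success, rolls back on error
            if expected_version == 0:
                conn.execute(
                    "INSERT INTO state (key, value, version) VALUES (?, ?, 1)",
                    (key, value),
                )
                return True
            cur = conn.execute(
                "UPDATE state SET value = ?, version = version + 1 "
                "WHERE key = ? AND version = ?",
                (value, key, expected_version),
            )
            return cur.rowcount == 1
    except sqlite3.IntegrityError:
        return False  # another writer created the key first
```

A writer that gets `False` back re-reads the row, reapplies its change to the newer value, and retries, so no lock is held while an agent is thinking.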
3.4 Integration with Existing Hermes Architecture
Hermes already has strong foundations:
- Delegation system (`delegate_tool.py`): Isolated child agents, parallel execution, depth limits
- State DB (`hermes_state.py`): SQLite + FTS5, WAL mode, session tracking, message history
- Credential pools: Shared credentials with rotation
The fleet knowledge graph should extend these patterns:
- Session DB → Fleet DB: Add tables for fleet metadata, agent registrations, task graphs
- Memory tool → Fleet Memory: Scoped vector memory shared across fleet agents
- Delegate tool → Fleet Delegation: Task channel with persistence, quality evaluation
- New: Consensus module: Proposal/vote protocol with timeout handling
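One way the proposal/vote protocol with timeout handling could work is sketched below. It follows the rules stated in section 3.3 (simple majority for non-critical decisions, unanimity for critical ones, manager tie-break) and adds one assumption not settled by the text: after a timeout, agents that stayed silent are counted as acknowledging the proposal. `Proposal` and `resolve` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Proposal:
    proposal_id: str
    voters: frozenset[str]
    critical: bool = False
    votes: dict[str, bool] = field(default_factory=dict)

    def vote(self, agent_id: str, approve: bool) -> None:
        if agent_id not in self.voters:
            raise ValueError(f"{agent_id} is not a registered voter")
        self.votes[agent_id] = approve


def resolve(proposal: Proposal, timed_out: bool = False,
            manager_vote: Optional[bool] = None) -> Optional[bool]:
    """Return True (accepted), False (rejected), or None (still pending)."""
    yes = sum(1 for approve in proposal.votes.values() if approve)
    no = len(proposal.votes) - yes
    total = len(proposal.voters)
    pending = total - len(proposal.votes)

    if pending and not timed_out:
        # Decide early only if the outcome can no longer change.
        if proposal.critical:
            return False if no else None
        if yes > total / 2:
            return True
        if no > total / 2:
            return False
        return None

    # All votes are in, or the timeout fired.
    yes += pending  # assumption: silence after the timeout counts as approval
    if proposal.critical:
        return no == 0  # unanimity required
    if yes == no:
        return manager_vote  # manager breaks the tie; None if not yet cast
    return yes > no
```

Treating silence as approval keeps the fleet moving when an agent is slow or offline; a stricter variant could instead reject critical proposals that time out.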
4. Reference Implementations
| Component | Best Reference | Key Takeaway |
|---|---|---|
| Scoped Memory | CrewAI `Memory` + `MemoryScope` | Path-based namespaces, composite scoring, background writes |
| Task Dispatch | CAMEL `TaskChannel` | Packet-based with state machine, O(1) lookup |
| Execution DAG | AutoGen `DiGraphBuilder` | Fluent builder, conditional edges, activation groups |
| Orchestration | AutoGen `MagenticOneOrchestrator` | Ledger-based planning, stall detection, re-planning |
| Agent Communication | AutoGen `SelectorGroupChat` | LLM-based speaker selection, shared message thread |
| Quality Evaluation | CAMEL Workforce | Structured output for quality scoring |
| Workflow Memory | CAMEL `WorkflowMemoryManager` | Markdown-based, role-organized, versioned |
| State Checkpoint | CrewAI `SqliteProvider` | JSONB checkpoints, WAL mode |
| Tool Cache | CrewAI `CacheHandler` | RWLock-based concurrent tool result cache |
5. Open Questions
- Graph vs Vector for knowledge: Should fleet knowledge use a proper graph DB (e.g., Neo4j) or stick with vector + SQLite?
  - Recommendation: Start with SQLite + vectors (existing stack), add graph later if needed
- Real-time vs Batch: Should agents receive updates in real-time or batched?
  - Recommendation: Event-driven for critical updates, batched for diary entries
- Security model: How should cross-agent access be controlled?
  - Recommendation: Role-based ACLs on scope paths, similar to CrewAI's privacy flags
- Scalability: How many agents can a single fleet support?
  - Recommendation: Start with 10-agent fleets, optimize SQLite concurrency first