Multi-Agent Coordination SOTA Research Report
Fleet Knowledge Graph — Architecture Patterns & Integration Recommendations
Date: 2025-04-14
Scope: Agent-to-agent communication, shared memory, task delegation, consensus protocols, conflict resolution
Frameworks Analyzed: CrewAI, AutoGen, MetaGPT, ChatDev, CAMEL, LangGraph
Target Fleet: Hermes (orchestrator), Timmy, Claude Code, Gemini, Kimi
1. EXECUTIVE SUMMARY
Six major multi-agent frameworks each solve coordination differently. The SOTA converges on four core patterns: role-based delegation with capability matching, shared state via publish-subscribe messaging, directed-graph task flows with conditional routing, and layered memory (short-term context + long-term knowledge graph). For our fleet, the optimal architecture combines AutoGen's GraphFlow (DAG-based task routing), CrewAI's hierarchical memory (short-term RAG + long-term SQLite + entity memory), MetaGPT's standardized output contracts (typed task artifacts), and CAMEL's role-playing delegation protocol (inception-prompted agent negotiation).
2. FRAMEWORK-BY-FRAMEWORK ANALYSIS
2.1 CrewAI (v1.14.x) — Role-Based Crews with Hierarchical Orchestration
Core Architecture:
- Process modes: `Process.sequential` (tasks execute in order), `Process.hierarchical` (manager agent delegates to workers)
- Agent delegation: `allow_delegation=True` enables agents to call other agents as tools, selecting the best agent for subtasks
- Memory system: crew-level `memory=True` enables UnifiedMemory with:
  - Short-term: RAG-backed (embeddings → vector store) for recent task context
  - Long-term: SQLite-backed for persistent task outcomes
  - Entity memory: tracks entities (people, companies, concepts) across tasks
  - User memory: per-user preference tracking
  - Embedder: configurable (OpenAI, Cohere, Jina, local ONNX, etc.)
- Knowledge sources: `knowledge_sources=[StringKnowledgeSource(...)]` for RAG-grounded context per agent or crew
- Flows: `@start`, `@listen`, `@router` decorators for DAG orchestration across crews; `or_()` and `and_()` combinators for conditional triggers
- Callbacks: `before_kickoff_callbacks`, `after_kickoff_callbacks`, `step_callback`, `task_callback`
Key Patterns for Fleet:
- Delegation-as-tool: Agents can invoke other agents by role → our fleet agents could expose themselves as callable tools to each other
- Sequential handoff: Task output from Agent A feeds directly as input to Agent B → pipeline pattern
- Hierarchical manager: A manager LLM decomposes goals and assigns tasks → matches Hermes-as-orchestrator pattern
- Shared memory with scopes: Crew-level memory visible to all agents, agent-level memory private
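CrewAI's delegation-as-tool idea can be shown framework-free: each agent exposes its peers as callables keyed by role name. A minimal sketch (the `FleetAgent` class and its methods are invented here for illustration, not CrewAI's API):

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class FleetAgent:
    """Toy agent that can expose peers as callable tools (illustrative only)."""
    name: str
    handler: Callable[[str], str]              # the agent's own task logic
    peers: Dict[str, "FleetAgent"] = field(default_factory=dict)

    def register_peer(self, agent: "FleetAgent") -> None:
        self.peers[agent.name] = agent

    def delegate(self, peer_name: str, task: str) -> str:
        # Invoke another agent by role name, the way allow_delegation
        # lets a CrewAI agent call a colleague as a tool.
        return self.peers[peer_name].handler(task)

coder = FleetAgent("claude_code", handler=lambda t: f"code for: {t}")
orchestrator = FleetAgent("hermes", handler=lambda t: t)
orchestrator.register_peer(coder)
result = orchestrator.delegate("claude_code", "parse config")
```

In a real crew the `handler` would be an LLM call; the point is only that delegation reduces to a registry of role-keyed callables.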
Limitations:
- No native inter-process communication — all agents live in the same process
- Manager/hierarchical mode requires an LLM call just for delegation decisions (extra latency/cost)
- No built-in conflict resolution for concurrent writes to shared memory
2.2 AutoGen (v0.7.5) — Flexible Team Topologies with Graph-Based Coordination
Core Architecture:
- Team topologies (5 types):
  - `RoundRobinGroupChat`: sequential turn-taking, each agent speaks in order
  - `SelectorGroupChat`: LLM selects the next speaker based on conversation context (`selector_prompt` template)
  - `MagenticOneGroupChat`: orchestrator-driven (from Microsoft's Magentic-One paper), with stall detection and replanning
  - `Swarm`: handoff-based — the current speaker explicitly hands off to a target via `HandoffMessage`
  - `GraphFlow`: directed-graph execution — agents execute along graph edges with conditional routing, fan-out, join patterns, and loop support
- Agent types:
  - `AssistantAgent`: standard LLM agent with tools
  - `CodeExecutorAgent`: runs code in isolated environments
  - `UserProxyAgent`: human-in-the-loop proxy
  - `SocietyOfMindAgent`: meta-agent — wraps an inner team and summarizes its output as a single response (composable nesting)
  - `MessageFilterAgent`: filters/transforms messages between agents
- Termination conditions: `TextMentionTermination`, `MaxMessageTermination`, `SourceMatchTermination`, `HandoffTermination`, `TimeoutTermination`, `FunctionCallTermination`, `TokenUsageTermination`, `ExternalTermination` (programmatic control), `FunctionalTermination` (custom function)
- Memory: `Sequence[Memory]` on agents — per-agent memory stores (RAG-backed)
- GraphFlow specifics:
  - `DiGraphBuilder.add_node(agent, activation='all'|'any')`
  - `DiGraphBuilder.add_edge(source, target, condition=callable|str)` — conditional edges
  - `set_entry_point(agent)` — defines the graph root
  - Supports sequential, parallel fan-out, conditional branching, join patterns, and loops with exit conditions
- Node activation: `'all'` (wait for all incoming edges) vs `'any'` (trigger on first)
Key Patterns for Fleet:
- GraphFlow is the SOTA pattern for multi-agent orchestration — DAG-based, conditional, supports parallel branches and joins
- SocietyOfMindAgent enables hierarchical composition — a team of agents wrapped as a single agent that can participate in a larger team
- Selector pattern (LLM picks next speaker) is elegant for heterogeneous fleets where capability matching matters
- Swarm handoff maps directly to our ACP handoff mechanism
- Termination conditions are composable — `termination_a | termination_b` (OR), `termination_a & termination_b` (AND)
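The `'all'`/`'any'` activation semantics are easiest to see in a toy executor. Below is a pure-Python sketch of GraphFlow-style fan-out/join; `run_graph` and its signature are our illustration, not AutoGen's API:

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

def run_graph(nodes: Dict[str, Callable[[list], str]],
              edges: List[Tuple[str, str]],
              entry: str,
              activation: Dict[str, str]) -> Dict[str, str]:
    """Tiny DAG executor: 'all' joins wait for every parent result,
    'any' joins fire on the first completed parent. Illustration only."""
    parents, children = defaultdict(set), defaultdict(list)
    for src, dst in edges:
        parents[dst].add(src)
        children[src].append(dst)

    results: Dict[str, str] = {}
    ready, fired = [entry], set()
    while ready:
        node = ready.pop(0)
        if node in fired:
            continue
        fired.add(node)
        inputs = [results[p] for p in parents[node] if p in results]
        results[node] = nodes[node](inputs)
        for child in children[node]:
            done = sum(1 for p in parents[child] if p in results)
            need = len(parents[child]) if activation.get(child, "all") == "all" else 1
            if done >= need:
                ready.append(child)
    return results

# Fan-out from an orchestrator to two workers, then an 'all' join.
nodes = {
    "hermes": lambda _: "plan",
    "code":   lambda inp: "code<-" + inp[0],
    "docs":   lambda inp: "docs<-" + inp[0],
    "join":   lambda inp: "+".join(sorted(inp)),
}
edges = [("hermes", "code"), ("hermes", "docs"), ("code", "join"), ("docs", "join")]
out = run_graph(nodes, edges, "hermes", {"join": "all"})
```

The `join` node only runs once both branches have produced results; switching its activation to `'any'` would make it fire on whichever branch finishes first.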
2.3 MetaGPT — SOP-Driven Multi-Agent with Standardized Artifacts
Core Architecture (from paper + codebase):
- SOP (Standard Operating Procedure): Tasks decomposed into phases, each with specific roles and required artifacts
- Role-based agents: each role has `name`, `profile`, `goal`, `constraints`, and `actions` (specific output types)
- Shared message environment: all agents publish to and subscribe from a shared `Environment` object
- Publish-subscribe: agents subscribe to the message types/topics they care about and ignore the rest
- Standardized output: each action produces a typed artifact (e.g., `SystemDesign`, `Task`, `Code`) — structured contracts between agents
- Memory: a `Memory` class stores all messages, retrievable by relevance; `Role.react()` calls `observe()` then `act()` based on observed messages
- Communication: asynchronous message passing — agents publish results to the environment, and interested agents react
Key Patterns for Fleet:
- Typed artifact contracts: Each agent publishes structured outputs (not free-form text) → reduces ambiguity in inter-agent communication
- Pub-sub messaging: Decouples sender from receiver — agents don't need to know about each other, just subscribe to relevant topics
- SOP-driven phases: Define workflow phases (e.g., "analysis" → "implementation" → "review") with specific agents per phase
- Environment as blackboard: Shared state all agents can read/write — classic blackboard architecture for AI systems
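The environment-as-blackboard pattern reduces to very little code. A hedged sketch (class and method names are ours, not MetaGPT's):

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List

@dataclass(frozen=True)
class Message:
    topic: str          # artifact type, e.g. "SystemDesign", "Code"
    sender: str
    content: str

class Environment:
    """Blackboard: agents publish typed messages; subscribers react. Sketch only."""
    def __init__(self) -> None:
        self.history: List[Message] = []
        self.subscribers: DefaultDict[str, List[Callable[[Message], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Message], None]) -> None:
        self.subscribers[topic].append(callback)

    def publish(self, msg: Message) -> None:
        self.history.append(msg)                 # everything lands on the blackboard
        for cb in self.subscribers[msg.topic]:   # only interested agents react
            cb(msg)

env = Environment()
received: List[Message] = []
env.subscribe("Code", received.append)           # a reviewer only cares about Code
env.publish(Message("SystemDesign", "architect", "schema v1"))
env.publish(Message("Code", "coder", "def f(): ..."))
```

The decoupling is the point: the coder never names the reviewer, it just publishes to a topic.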
2.4 ChatDev — Chat-Chain Architecture for Software Development
Core Architecture:
- Chat Chain: Sequential phases (design → code → test → document), each phase is a two-agent conversation
- Role pairing: Each phase pairs complementary roles (e.g., CEO ↔ CTO, Programmer ↔ Reviewer)
- Communicative dehallucination: Agents communicate through structured prompts that constrain outputs to prevent hallucination
- Phase transitions: Phase completion triggers next phase, output from one phase seeds the next
- Memory: Conversation history within each phase; phase outputs stored as artifacts
Key Patterns for Fleet:
- Phase-gated pipeline: Each phase must produce a specific artifact type before proceeding
- Complementary role pairing: Pair agents with opposing perspectives (creator ↔ reviewer) for higher quality
- Communicative protocols: Structured conversation templates reduce free-form ambiguity
2.5 CAMEL — Role-Playing Autonomous Multi-Agent Communication
Core Architecture:
- RolePlaying society: Two agents (assistant + user) collaborate with inception prompting
- Task specification: `with_task_specify=True` uses a task-specify agent to refine the initial prompt into a concrete task
- Task planning: `with_task_planner=True` adds a planning agent that decomposes the task
- Critic-in-the-loop: `with_critic_in_the_loop=True` adds a critic agent that evaluates and approves/rejects
- Inception prompting: both agents receive system messages that establish their roles, goals, and communication protocol
- Termination: Agents signal completion via specific tokens or phrases
Key Patterns for Fleet:
- Inception prompting: Agents negotiate a shared understanding of the task before executing
- Critic-in-the-loop: A dedicated reviewer agent validates outputs before acceptance
- Role-playing protocol: Structured back-and-forth between complementary agents
- Task refinement chain: Raw goal → specified task → planned subtasks → executed
2.6 LangGraph — Graph-Based Stateful Agent Workflows
Core Architecture (from documentation/paper):
- StateGraph: Typed state schema shared across all nodes (agents/tools)
- Nodes: Functions (agents, tools, transforms) that read/modify shared state
- Edges: Conditional routing based on state or agent decisions
- Checkpointer: Persistent state snapshots (SQLite, Postgres, in-memory) — enables pause/resume
- Human-in-the-loop: Interrupt nodes for approval, edit, review
- Streaming: Real-time node-by-node or token-by-token output
- Subgraphs: Composable graph composition — subgraph as a node in parent graph
- State channels: Multiple state namespaces for different aspects of the workflow
Key Patterns for Fleet:
- Shared typed state: All agents operate on a well-defined state schema — eliminates ambiguity about what data each agent sees
- Checkpoint persistence: Workflow can be paused, resumed, forked — critical for long-running agent tasks
- Conditional edges: Route based on agent output type or state values
- Subgraph composition: Each fleet agent could be a subgraph, composed into larger workflows
- Command-based routing: nodes return `Command(goto="node_name", update={...})` for explicit control flow
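Command-based routing can be mimicked with a small loop over a shared state dict. A sketch using a simplified `Command` type modeled loosely on LangGraph's (not the real API):

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

@dataclass
class Command:
    goto: Optional[str]                         # next node name, or None to stop
    update: Dict[str, str] = field(default_factory=dict)

def run(nodes: Dict[str, Callable[[dict], Command]],
        entry: str, state: dict, max_steps: int = 20) -> dict:
    """Each node reads the shared state and returns a Command that both
    patches the state and names the next node. Illustration only."""
    current: Optional[str] = entry
    for _ in range(max_steps):                  # guard against routing loops
        if current is None:
            break
        cmd = nodes[current](state)
        state.update(cmd.update)
        current = cmd.goto
    return state

nodes = {
    "plan":   lambda s: Command(goto="code",   update={"plan": "split into 2 steps"}),
    "code":   lambda s: Command(goto="review", update={"code": "done"}),
    "review": lambda s: Command(goto=None,     update={"approved": "yes"}),
}
final = run(nodes, "plan", {})
```

Because routing decisions live in node return values rather than a static edge list, a node can branch on state at runtime (e.g. `goto="code"` again if review fails).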
3. CROSS-CUTTING PATTERNS ANALYSIS
3.1 Agent-to-Agent Communication
| Pattern | Frameworks | Latency | Decoupling | Structured |
|---|---|---|---|---|
| Direct tool invocation | CrewAI, AutoGen | Low | Low | Medium |
| Pub-sub messaging | MetaGPT | Medium | High | High |
| Handoff messages | AutoGen Swarm | Low | Medium | High |
| Chat-chain conversations | ChatDev, CAMEL | High | Low | Medium |
| Shared state graph | LangGraph, AutoGen GraphFlow | Low | Medium | High |
Recommendation: Use handoff + shared state pattern. Agents communicate via typed handoff messages (what task was completed, what artifacts produced) while sharing a typed state object (knowledge graph entries).
3.2 Shared Memory Patterns
| Pattern | Frameworks | Persistence | Scope | Query Method |
|---|---|---|---|---|
| RAG-backed short-term | CrewAI, AutoGen | Session | Crew/Team | Embedding similarity |
| SQLite long-term | CrewAI | Cross-session | Global | SQL + embeddings |
| Entity memory | CrewAI | Cross-session | Global | Entity lookup |
| Message store | MetaGPT | Session | Environment | Relevance search |
| Typed state channels | LangGraph | Checkpointed | Graph | State field access |
| Frozen snapshot | Hermes (current) | Cross-session | Agent | System prompt injection |
Recommendation: Implement three-tier memory:
- Session state (LangGraph-style typed state graph) — shared within a workflow
- Fleet knowledge graph (new) — structured triples/relations between entities, projects, decisions
- Agent-local memory (existing MEMORY.md pattern) — per-agent persistent notes
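The three tiers could be held in one structure per workflow. A sketch with illustrative placeholder names (this is not an existing Hermes API):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class ThreeTierMemory:
    """Sketch of the recommended layering; names are placeholders."""
    session_state: Dict[str, str] = field(default_factory=dict)        # volatile, per-workflow
    knowledge_graph: List[Tuple[str, str, str]] = field(default_factory=list)  # (subject, relation, object) triples
    agent_notes: Dict[str, List[str]] = field(default_factory=dict)    # per-agent MEMORY.md analogue

    def record_decision(self, agent: str, subject: str, relation: str, obj: str) -> None:
        # A decision lands in both the shared graph and the author's local notes.
        self.knowledge_graph.append((subject, relation, obj))
        self.agent_notes.setdefault(agent, []).append(f"{subject} {relation} {obj}")

mem = ThreeTierMemory()
mem.session_state["current_task"] = "refactor auth"
mem.record_decision("hermes", "auth-module", "depends-on", "session-store")
```

In the real design the graph tier would be the SQLite-backed FKG of §4.5 rather than an in-memory list.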
3.3 Task Delegation
| Pattern | Frameworks | Decision Maker | Granularity |
|---|---|---|---|
| Manager decomposition | CrewAI hierarchical | Manager LLM | Task-level |
| Delegation-as-tool | CrewAI | Self-selecting | Subtask |
| Selector-based | AutoGen SelectorGroupChat | LLM selector | Turn-level |
| Handoff-based | AutoGen Swarm | Current agent | Message-level |
| Graph-defined | AutoGen GraphFlow, LangGraph | Pre-defined DAG | Node-level |
| SOP-based | MetaGPT | Phase rules | Phase-level |
Recommendation: Use hybrid delegation:
- Graph-based for known workflows (CI/CD, code review pipelines) — pre-defined DAGs
- Selector-based for exploratory tasks (research, debugging) — LLM picks best agent
- Handoff-based for agent-initiated delegation — current agent explicitly hands off
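The non-LLM half of selector-based routing is plain capability matching. A sketch with a hypothetical capability registry (the real one would come from the fleet agent registry):

```python
from typing import Dict, List

# Hypothetical registry: agent name -> declared capabilities.
AGENT_CAPABILITIES: Dict[str, List[str]] = {
    "claude_code": ["code", "debugging", "architecture"],
    "gemini":      ["vision", "analysis", "large_context"],
    "kimi":        ["code", "long_context"],
}

def select_agent(task_tags: List[str]) -> str:
    """Heuristic fallback: pick the agent with the largest capability overlap.
    A deployed selector would put an LLM decision in front of this."""
    def score(agent: str) -> int:
        return len(set(task_tags) & set(AGENT_CAPABILITIES[agent]))
    return max(AGENT_CAPABILITIES, key=score)

choice = select_agent(["vision", "analysis"])
```

Ties fall back to registry order here; a production router would also weigh availability and past performance (§5.4).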
3.4 Consensus Protocols
No framework implements true consensus protocols (Raft, PBFT). Instead:
| Pattern | What It Solves |
|---|---|
| Critic-in-the-loop (CAMEL) | Single reviewer approves/rejects |
| Aggregator synthesis (MoA/Mixture-of-Agents) | Multiple responses synthesized into one |
| Hierarchical manager (CrewAI) | Manager makes final decision |
| MagenticOne orchestrator (AutoGen) | Orchestrator plans and replans |
Recommendation for Fleet: Implement weighted ensemble consensus:
- Multiple agents produce independent solutions
- A synthesis agent aggregates (like MoA pattern already in Hermes)
- For critical decisions, require 2-of-3 agreement from designated expert agents
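The 2-of-3 rule amounts to a few lines of counting. A sketch (agent names and answers are placeholders):

```python
from collections import Counter
from typing import Dict, Optional

def consensus(votes: Dict[str, str], min_agreement: int = 2) -> Optional[str]:
    """k-of-n agreement over expert agents' answers.
    Returns None when no answer reaches the threshold, signalling
    escalation to the synthesis agent (or a human)."""
    answer, count = Counter(votes.values()).most_common(1)[0]
    return answer if count >= min_agreement else None

votes = {"claude_code": "approach A", "gemini": "approach A", "kimi": "approach B"}
decision = consensus(votes)        # 2-of-3 agree on "approach A"
```

Weighted voting would replace the plain `Counter` with per-agent expertise weights, but the threshold-or-escalate shape stays the same.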
3.5 Conflict Resolution
| Conflict Type | Resolution Strategy |
|---|---|
| Concurrent memory writes | File locking + atomic rename (Hermes already does this) |
| Conflicting agent outputs | Critic/validator agent evaluates both |
| Task assignment conflicts | Single orchestrator (Hermes) assigns, no self-assignment |
| State graph race conditions | LangGraph checkpoint + merge strategies |
Recommendation:
- Write conflicts: Atomic operations with optimistic locking (existing pattern)
- Output conflicts: Dedicate one agent as "judge" for each workflow
- Assignment conflicts: Centralized orchestrator (Hermes) — no agent self-delegation to other fleet members without approval
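Optimistic locking for FKG writes needs only a version column and a conditional UPDATE. A self-contained sqlite3 sketch (the table and column names are illustrative, not the real Hermes schema):

```python
import sqlite3

# In-memory stand-in for the FKG database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, value TEXT, version INTEGER)")
db.execute("INSERT INTO entities VALUES ('project-x', 'v0', 1)")

def optimistic_write(conn: sqlite3.Connection, entity_id: str,
                     new_value: str, expected_version: int) -> bool:
    """Succeeds only if no other agent bumped the version since we read it."""
    cur = conn.execute(
        "UPDATE entities SET value = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_value, entity_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1    # 0 rows updated means conflict: caller re-reads and retries

first = optimistic_write(db, "project-x", "v1", expected_version=1)   # wins
second = optimistic_write(db, "project-x", "v2", expected_version=1)  # stale, rejected
```

The losing writer re-reads the row (now at version 2) and decides whether to retry or raise a `concurrent_write` conflict for the critic to resolve.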
4. FLEET ARCHITECTURE RECOMMENDATION
4.1 Proposed Architecture: "Fleet Knowledge Graph" (FKG)
┌─────────────────────────────────────────────────────────────┐
│ FLEET KNOWLEDGE GRAPH │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Entities │ │ Relations│ │ Artifacts│ │ Decisions│ │
│ │ (nodes) │──│ (edges) │──│ (typed) │──│ (history)│ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ │
│ Storage: SQLite + FTS5 (existing hermes_state.py pattern) │
│ Schema: RDF-lite triples with typed properties │
└─────────────────────┬───────────────────────────────────────┘
│
┌───────────┼───────────┐
│ │ │
┌────▼────┐ ┌────▼────┐ ┌───▼─────┐
│ Session │ │ Agent │ │ Workflow│
│ State │ │ Memory │ │ History │
│ (shared)│ │ (local) │ │ (audit) │
└─────────┘ └─────────┘ └─────────┘
4.2 Fleet Member Roles
| Agent | Role | Strengths | Delegation Style |
|---|---|---|---|
| Hermes | Orchestrator | Planning, tool use, multi-platform | Delegator (spawns others) |
| Claude Code | Code specialist | Deep code reasoning, ACP integration | Executor (receives tasks) |
| Gemini | Multimodal analyst | Vision, large context, fast | Executor (receives tasks) |
| Kimi | Coding assistant | Code generation, long context | Executor (receives tasks) |
| Timmy | (Details TBD) | TBD | Executor (receives tasks) |
4.3 Communication Protocol
Inter-Agent Message Format (inspired by MetaGPT's typed artifacts):
{
"message_type": "task_request|task_response|handoff|knowledge_update|conflict",
"source_agent": "hermes",
"target_agent": "claude_code",
"task_id": "uuid",
"parent_task_id": "uuid|null",
"payload": {
"goal": "...",
"context": "...",
"artifacts": [{"type": "code", "path": "..."}, {"type": "analysis", "content": "..."}],
"constraints": ["..."],
"priority": "high|medium|low"
},
"knowledge_graph_refs": ["entity:project-x", "relation:depends-on"],
"timestamp": "ISO8601",
"signature": "hmac-or-uuid"
}
4.4 Task Flow Patterns
Pattern 1: Pipeline (ChatDev-style)
Hermes → [Analyze] → Claude Code → [Implement] → Gemini → [Review] → Hermes → [Deliver]
Pattern 2: Fan-out/Fan-in (AutoGen GraphFlow-style)
┌→ Claude Code (code) ──┐
Hermes ──┼→ Gemini (analysis) ───┼→ Hermes (synthesize)
└→ Kimi (docs) ─────────┘
Pattern 3: Debate (CAMEL-style)
Claude Code (proposal) ↔ Gemini (critic) → Hermes (judge)
Pattern 4: Selector (AutoGen SelectorGroupChat)
Hermes (orchestrator) → LLM selects best agent → Agent executes → Result → Repeat
4.5 Knowledge Graph Schema
-- Core entities
CREATE TABLE fkg_entities (
id TEXT PRIMARY KEY,
entity_type TEXT NOT NULL, -- 'project', 'file', 'agent', 'task', 'concept', 'decision'
name TEXT NOT NULL,
properties JSON, -- Flexible typed properties
created_by TEXT, -- Agent that created this
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Relations between entities
CREATE TABLE fkg_relations (
id INTEGER PRIMARY KEY AUTOINCREMENT,
source_entity TEXT REFERENCES fkg_entities(id),
target_entity TEXT REFERENCES fkg_entities(id),
relation_type TEXT NOT NULL, -- 'depends-on', 'created-by', 'reviewed-by', 'part-of', 'conflicts-with'
properties JSON,
confidence REAL DEFAULT 1.0,
created_by TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Task execution history
CREATE TABLE fkg_task_history (
task_id TEXT PRIMARY KEY,
parent_task_id TEXT,
goal TEXT,
assigned_agent TEXT,
status TEXT, -- 'pending', 'running', 'completed', 'failed', 'conflict'
result_summary TEXT,
artifacts JSON, -- List of produced artifacts
knowledge_refs JSON, -- Entities/relations this task touched
started_at TIMESTAMP,
completed_at TIMESTAMP
);
-- Conflict tracking
CREATE TABLE fkg_conflicts (
id INTEGER PRIMARY KEY AUTOINCREMENT,
entity_id TEXT REFERENCES fkg_entities(id),
conflict_type TEXT, -- 'concurrent_write', 'contradictory_output', 'resource_contention'
agent_a TEXT,
agent_b TEXT,
resolution TEXT,
resolved_by TEXT,
resolved_at TIMESTAMP
);
-- Full-text search across entities. Note: an external-content FTS5 table must
-- name columns that exist in the content table, and needs triggers (or manual
-- inserts) to stay in sync with fkg_entities.
CREATE VIRTUAL TABLE fkg_search USING fts5(
name, entity_type, properties,
content='fkg_entities', content_rowid='rowid'
);
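The schema can be exercised with stdlib sqlite3. The sketch below simplifies the table shape and uses a standalone FTS5 table rather than the external-content form, so it runs with no sync triggers; helper names are ours:

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE fkg_entities (
    id TEXT PRIMARY KEY, entity_type TEXT NOT NULL,
    name TEXT NOT NULL, properties JSON)""")
# Standalone FTS5 index; id is stored but not tokenized.
db.execute("CREATE VIRTUAL TABLE fkg_search USING fts5(id UNINDEXED, name, properties_text)")

def add_entity(entity_id: str, entity_type: str, name: str, properties: dict) -> None:
    """Insert an entity and mirror it into the search index."""
    props = json.dumps(properties)
    db.execute("INSERT INTO fkg_entities VALUES (?, ?, ?, ?)",
               (entity_id, entity_type, name, props))
    db.execute("INSERT INTO fkg_search VALUES (?, ?, ?)", (entity_id, name, props))

add_entity("entity:project-x", "project", "Project X", {"lang": "python"})
hits = db.execute("SELECT id FROM fkg_search WHERE fkg_search MATCH 'python'").fetchall()
```

This keeps writes dual (base table + index), which is exactly the bookkeeping the external-content form plus triggers would automate.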
5. INTEGRATION RECOMMENDATIONS
5.1 Phase 1: Foundation (Immediate — 1-2 weeks)
- Implement the FKG SQLite database at `~/.hermes/fleet_knowledge.db`
  - Extend the existing `hermes_state.py` pattern (already uses SQLite + FTS5)
  - Add the schema from §4.5
  - Create `tools/fleet_knowledge_tool.py` with CRUD operations
- Create a fleet agent registry in `agent/fleet_registry.py`
  - Map agent names → transport (ACP, API, subprocess)
  - Store capabilities, specializations, and availability status
  - Integrate with the existing `acp_adapter/` and `delegate_tool.py`
- Define the message protocol as typed Python dataclasses
  - `FleetMessage`, `TaskRequest`, `TaskResponse`, `KnowledgeUpdate`
  - Validation via Pydantic (already a CrewAI dependency)
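A stdlib-dataclass version of the §4.3 envelope might look as follows (the plan calls for Pydantic validation; this sketch keeps only stdlib, and the `to_json` helper is illustrative):

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

@dataclass
class FleetMessage:
    """Sketch of the inter-agent envelope; field names follow §4.3."""
    message_type: str                    # task_request | task_response | handoff | ...
    source_agent: str
    target_agent: str
    payload: dict
    task_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    parent_task_id: Optional[str] = None
    knowledge_graph_refs: List[str] = field(default_factory=list)
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def to_json(self) -> str:
        """Serialize for transport (ACP stdio or HTTP body)."""
        return json.dumps(asdict(self))

msg = FleetMessage("task_request", "hermes", "claude_code",
                   payload={"goal": "refactor auth", "priority": "high"},
                   knowledge_graph_refs=["entity:project-x"])
wire = msg.to_json()
```

Pydantic would add the enum/UUID validation on receipt that plain dataclasses skip.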
5.2 Phase 2: Communication Layer (2-4 weeks)
- Build fleet delegation on top of the existing `delegate_tool.py`
  - Extend it to support cross-agent delegation (not just child subagents)
  - ACP transport for Claude Code (already supported via `acp_command`)
  - OpenRouter/OpenAI-compatible API for Gemini and Kimi
  - Reuse the existing credential pool and provider resolution
- Implement selector-based task routing (AutoGen SelectorGroupChat pattern)
  - LLM-based agent selection from the task description + agent capabilities
  - Hermes acts as the selector/orchestrator
  - Simple heuristic fallback (code → Claude Code, vision → Gemini, etc.)
- Add typed artifact contracts (MetaGPT pattern)
  - Each task produces a typed artifact (code, analysis, docs, review)
  - Artifacts stored in the FKG with entity relations
  - Downstream agents consume typed inputs, not free-form text
5.3 Phase 3: Advanced Patterns (4-6 weeks)
- Implement workflow DAGs (AutoGen GraphFlow pattern)
  - Pre-defined workflows as directed graphs (code review pipeline, research pipeline)
  - Conditional routing based on artifact types or agent decisions
  - Fan-out/fan-in for parallel execution across fleet agents
- Add conflict resolution
  - Detect concurrent writes to the same FKG entities
  - Critic agent validates contradictory outputs
  - Track resolution history for learning
- Build a consensus mechanism for critical decisions
  - Weighted voting based on agent expertise
  - MoA-style aggregation (already implemented in `mixture_of_agents_tool.py`)
  - Escalation to a human for irreconcilable conflicts
5.4 Phase 4: Intelligence (6-8 weeks)
- Learning from delegation history
  - Track which agent performs best for which task types
  - Adjust routing weights over time
  - RL-style improvement of delegation decisions
- Fleet-level memory evolution
  - Entities and relations in the FKG become the "shared brain"
  - Agents contribute knowledge as they work
  - Cross-agent knowledge synthesis (one agent's discovery benefits all)
6. BENCHMARKS & PERFORMANCE CONSIDERATIONS
6.1 Latency Estimates
| Pattern | Overhead | Notes |
|---|---|---|
| Direct delegation (current) | ~30s per subagent | Spawn + run + collect |
| ACP transport (Claude Code) | ~2-5s connection + task time | Subprocess handshake |
| API-based (Gemini/Kimi) | ~1-2s + task time | Standard HTTP |
| Selector routing | +1 LLM call (~2-5s) | For agent selection |
| GraphFlow routing | +state overhead (~100ms) | Pre-defined, no LLM call |
| FKG query | ~1-5ms | SQLite indexed query |
| MoA consensus | ~15-30s (4 parallel + 1 aggregator) | Already implemented |
6.2 Recommended Configuration
# Fleet coordination config (add to config.yaml)
fleet:
enabled: true
knowledge_db: "~/.hermes/fleet_knowledge.db"
agents:
hermes:
role: orchestrator
transport: local
claude_code:
role: code_specialist
transport: acp
acp_command: "claude"
acp_args: ["--acp", "--stdio"]
capabilities: ["code", "debugging", "architecture"]
gemini:
role: multimodal_analyst
transport: api
provider: openrouter
model: "google/gemini-3-pro-preview"
capabilities: ["vision", "analysis", "large_context"]
kimi:
role: coding_assistant
transport: api
provider: kimi-coding
capabilities: ["code", "long_context"]
delegation:
strategy: selector # selector | pipeline | graph
max_concurrent: 3
timeout_seconds: 300
consensus:
enabled: true
min_agreement: 2 # 2-of-3 for critical decisions
escalation_agent: hermes
knowledge:
auto_extract: true # Extract entities from task results
relation_confidence_threshold: 0.7
search_provider: fts5 # fts5 | vector | hybrid
7. EXISTING HERMES INFRASTRUCTURE TO LEVERAGE
| Component | What It Provides | Reuse For |
|---|---|---|
| `delegate_tool.py` | Subagent spawning, isolated contexts | Fleet delegation transport |
| `mixture_of_agents_tool.py` | Multi-model consensus/aggregation | Fleet consensus protocol |
| `memory_tool.py` | Bounded persistent memory with atomic writes | Pattern for FKG writes |
| `acp_adapter/` | ACP server for IDE integration | Claude Code transport |
| `hermes_state.py` | SQLite + FTS5 session store | FKG database foundation |
| `tools/registry.py` | Central tool registry | Fleet knowledge tool registration |
| `agent/credential_pool.py` | Credential rotation | Multi-provider auth |
| `hermes_cli/runtime_provider.py` | Provider resolution | Fleet agent connection |
8. KEY TAKEAWAYS
- GraphFlow (AutoGen) is the SOTA orchestration pattern — DAG-based execution with conditional routing beats sequential chains and pure LLM delegation for structured workflows
- Three-tier memory is essential — session state (volatile), knowledge graph (persistent, structured), agent memory (persistent per-agent notes)
- Typed artifacts over free-form text — MetaGPT's standardized output contracts dramatically reduce inter-agent ambiguity
- Hybrid delegation beats any single pattern — pre-defined DAGs for known workflows, LLM selection for exploratory tasks, handoff for agent-initiated delegation
- Critic-in-the-loop is the practical consensus mechanism — don't implement Byzantine fault tolerance; a dedicated reviewer agent with clear acceptance criteria is sufficient
- Our existing infrastructure covers ~60% of what's needed — delegate_tool, MoA, memory_tool, the ACP adapter, and the SQLite patterns are solid foundations to build on
- The fleet knowledge graph is the differentiator — no existing framework persists a proper shared knowledge graph across agent interactions; building this gives us a unique advantage
Report generated from analysis of CrewAI v1.14.1, AutoGen v0.7.5, CAMEL v0.2.90 (installed locally), plus MetaGPT, ChatDev, and LangGraph documentation.