SOTA Research: Multi-Agent Coordination & Fleet Knowledge Graphs

Date: 2026-04-14
Scope: Agent-to-agent communication, shared memory, task delegation, consensus protocols
Frameworks Analyzed: CrewAI, AutoGen, MetaGPT, ChatDev, CAMEL


1. Architecture Pattern Summary

1.1 CrewAI — Role-Based Crew Orchestration

Core Pattern: Agents organized into "Crews" with explicit roles, goals, and backstories. Tasks are assigned to agents, executed via sequential or hierarchical process flows.

Agent-to-Agent Communication:

  • Sequential: Agent A completes Task A → output injected into Task B's context for Agent B
  • Hierarchical: Manager agent delegates to worker agents, collects results, synthesizes
  • Context passing: Tasks can declare context: [other_tasks] — outputs from dependent tasks are automatically injected into the current task's prompt
  • No direct agent-to-agent messaging — communication is mediated through task outputs

Shared Memory (v2 — Unified Memory):

  • Memory class with remember() / recall() using vector embeddings (LanceDB/ChromaDB)
  • Scope-based isolation: MemoryScope provides path-based namespacing (/crew/research/agent-foo)
  • Composite scoring: semantic similarity (0.5) + recency (0.3) + importance (0.2)
  • RecallFlow: LLM-driven deep recall with adaptive query expansion
  • Privacy flags: Private memories only visible to the source that created them
  • Background saves: ThreadPoolExecutor with write barrier (drain_writes before recall)

Task Delegation:

  • Agent tools include Delegate Work to Co-worker and Ask Question to Co-worker
  • Delegation creates a new task for another agent, results come back to delegator
  • Depth-limited (no infinite delegation chains)

State & Checkpointing:

  • SqliteProvider / JsonProvider for state checkpoint persistence
  • CheckpointConfig with event-driven persistence
  • Flow state is Pydantic models with serialization

Cache:

  • Thread-safe in-memory tool result cache with RWLock
  • Key: {tool_name}-{input} → cached output

1.2 AutoGen (Microsoft) — Conversation-Centric Teams

Core Pattern: Agents communicate through shared conversation threads. A "Group Chat Manager" controls turn-taking and speaker selection.

Agent-to-Agent Communication:

  • Shared message thread — all agents see all messages (like a group chat)
  • Five team patterns:
    • RoundRobinGroupChat: Fixed order cycling through participants
    • SelectorGroupChat: LLM-based speaker selection with candidate filtering
    • SwarmGroupChat: Handoff-based routing (agent sends HandoffMessage to next agent)
    • GraphFlow (DiGraph): DAG-based execution with conditional edges, parallel fan-out, loops
    • MagenticOneOrchestrator: Ledger-based orchestration with task planning, progress tracking, stall detection
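The DAG-with-conditional-edges idea can be sketched generically. This is in the spirit of GraphFlow, not AutoGen's actual API: each edge carries a predicate over the last output, and fan-out follows every matching edge:

```python
from collections.abc import Callable

Handler = Callable[[str], str]     # a node: consumes a message, emits one
Predicate = Callable[[str], bool]  # an edge guard over the emitted message

class Graph:
    """Toy DAG executor with conditional edges (hypothetical, not AutoGen)."""

    def __init__(self) -> None:
        self.nodes: dict[str, Handler] = {}
        self.edges: dict[str, list[tuple[str, Predicate]]] = {}

    def add_node(self, name: str, handler: Handler) -> "Graph":
        self.nodes[name] = handler
        self.edges.setdefault(name, [])
        return self

    def add_edge(self, src: str, dst: str,
                 when: Predicate = lambda _msg: True) -> "Graph":
        self.edges[src].append((dst, when))
        return self

    def run(self, start: str, message: str) -> list[tuple[str, str]]:
        """Breadth-first execution; returns (node, output) in visit order.
        Edges whose predicate matches all fire, giving parallel fan-out."""
        trace, frontier = [], [(start, message)]
        while frontier:
            node, msg = frontier.pop(0)
            out = self.nodes[node](msg)
            trace.append((node, out))
            frontier.extend((dst, out) for dst, when in self.edges[node]
                            if when(out))
        return trace
```

Note this sketch has no cycle detection or join synchronization; the "activation condition (all vs any)" machinery a real DAG team needs is deliberately left out.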

Shared State:

  • ChatCompletionContext — manages message history per agent (can be unbounded or windowed)
  • ModelContext shared across agents in a team
  • State serialization: save_state() / load_state() for all managers
  • No built-in vector memory — context is purely conversational

Task Delegation:

  • Swarm: Agents use HandoffMessage to explicitly route control
  • GraphFlow: Conditional edges route based on message content (keyword or callable)
  • MagenticOne: Orchestrator maintains a "task ledger" (facts + plan) and dynamically re-plans on stalls

Consensus / Termination:

  • TerminationCondition — composable conditions (text match, max messages, source-based)
  • No explicit consensus protocols — termination is manager-decided

Key Insight: AutoGen's ChatCompletionContext is the closest analog to shared memory, but it's purely sequential message history, not a knowledge base.

1.3 MetaGPT — SOP-Driven Software Teams

Core Pattern: Agents follow Standard Operating Procedures (SOPs). Each agent has a defined role (Product Manager, Architect, Engineer, QA) and produces structured artifacts.

Agent-to-Agent Communication:

  • Publish-Subscribe via Environment: Agents publish "actions" to a shared Environment, subscribers react
  • Structured outputs: Each role produces specific artifact types (PRD, design doc, code, test cases)
  • Message routing: Environment acts as a message bus, filtering by subscriber interest

Shared Memory:

  • Environment class maintains shared state (project workspace)
  • File-based shared memory: agents write/read from a shared filesystem
  • SharedMemory for cross-agent context (structured data, not free-form text)

Task Delegation:

  • Implicit through SOP stages: PM → Architect → Engineer → QA
  • Each agent's output is the next agent's input
  • No dynamic re-delegation

Consensus:

  • Sequential SOP execution (no parallel agents)
  • QA agent can trigger re-work loops back to Engineer

1.4 ChatDev — Chat-Chain Software Development

Core Pattern: Agents follow a "chat chain" — a sequence of chat phases (designing, coding, testing, documenting). Each phase involves a pair of agents (CEO↔CTO, Programmer↔Reviewer, etc.).

Agent-to-Agent Communication:

  • Paired chat sessions: Two agents communicate in each phase (role-play between instructor and assistant)
  • Chain propagation: Phase N's output (code, design doc) becomes Phase N+1's input
  • No broadcast — communication is strictly pairwise within phases

Shared Memory:

  • Software-centric: shared code repository is the "memory"
  • Each phase modifies/inherits the codebase
  • No explicit vector memory or knowledge graph

Task Delegation:

  • Hardcoded phase sequence: Design → Code → Test → Document
  • Each phase delegates to a specific agent pair
  • No dynamic task re-assignment

Consensus:

  • Phase-level termination: when both agents agree the phase is complete
  • "Thought" tokens for chain-of-thought within chat

1.5 CAMEL — Role-Playing & Workforce

Core Pattern: Two primary modes:

  1. RolePlaying: Two-agent conversation with task specification and optional critic
  2. Workforce: Multi-agent with coordinator, task planner, and worker pool

Agent-to-Agent Communication:

  • RolePlaying: Structured turn-taking between assistant and user agents
  • Workforce: Coordinator assigns tasks via TaskChannel, workers return results
  • Worker types: SingleAgentWorker (single ChatAgent), RolePlayingWorker (two-agent pair)

Shared Memory / Task Channel:

  • TaskChannel — async queue-based task dispatch with packet tracking
    • States: SENT → PROCESSING → RETURNED → ARCHIVED
    • O(1) lookup by task ID, status-based filtering, assignee/publisher queues
  • WorkflowMemoryManager — persists workflow patterns as markdown files
    • Role-based organization: workflows stored by role_identifier
    • Agent-based intelligent selection: LLM picks relevant past workflows
    • Versioned: metadata tracks creation time and version numbers

Task Delegation:

  • Coordinator agent decomposes complex tasks using LLM analysis
  • Tasks assigned to workers based on capability matching
  • Failed tasks trigger: retry, create new worker, or further decomposition
  • FailureHandlingConfig with configurable RecoveryStrategy

Consensus / Quality:

  • Quality evaluation via structured output (response format enforced)
  • Task dependencies tracked (worker receives dependency tasks as context)
  • WorkforceMetrics for tracking execution statistics

2. Key Architectural Patterns for Fleet Knowledge Graph

2.1 Communication Topology Patterns

| Pattern | Used By | Description |
|---|---|---|
| Sequential Chain | CrewAI, ChatDev, MetaGPT | A→B→C linear flow, output feeds next |
| Shared Thread | AutoGen | All agents see all messages |
| Publish-Subscribe | MetaGPT | Environment-based message bus |
| Paired Chat | ChatDev, CAMEL | Two-agent conversation pairs |
| Handoff Routing | AutoGen Swarm | Agent explicitly names next speaker |
| DAG Graph | AutoGen GraphFlow | Conditional edges, parallel, loops |
| Ledger Orchestration | AutoGen MagenticOne | Maintains task ledger, re-plans |
| Task Channel | CAMEL | Async queue with packet states |

2.2 Shared State Patterns

| Pattern | Used By | Description |
|---|---|---|
| Vector Memory | CrewAI | Embeddings + scope-based namespacing |
| Message History | AutoGen | Sequential conversation context |
| File System | MetaGPT, ChatDev | Agents read/write shared files |
| Task Channel | CAMEL | Async packet-based task dispatch |
| Workflow Files | CAMEL | Markdown-based workflow memory |
| Tool Cache | CrewAI | In-memory RWLock tool result cache |
| State Checkpoint | CrewAI, AutoGen | Serialized Pydantic/SQLite checkpoints |

2.3 Task Delegation Patterns

| Pattern | Used By | Description |
|---|---|---|
| Role Assignment | CrewAI | Fixed agent per task |
| Manager Delegation | CrewAI Hierarchical | Manager assigns tasks dynamically |
| Speaker Selection | AutoGen | Selector LLM picks next agent |
| Handoff | AutoGen Swarm | Agent explicitly transfers control |
| SOP Routing | MetaGPT | Stage-based implicit delegation |
| Coordinator | CAMEL Workforce | LLM-based task decomposition + assignment |
| Dynamic Worker Creation | CAMEL Workforce | Create new workers on failure |

2.4 Conflict Resolution Patterns

| Pattern | Used By | Description |
|---|---|---|
| Manager Arbitration | CrewAI Hierarchical | Manager resolves conflicts |
| Critic-in-the-loop | CAMEL | Critic agent evaluates and selects |
| Quality Gate | CAMEL Workforce | Structured quality evaluation |
| Termination Conditions | AutoGen | Composable stop conditions |
| Stall Detection | AutoGen MagenticOne | Re-plans when progress stalls |

3. Recommendations for Hermes Fleet Knowledge Graph

3.1 Architecture: Hybrid Graph + Memory

Based on the SOTA analysis, the optimal fleet knowledge graph should combine:

  1. CrewAI's scoped memory for hierarchical knowledge organization

    • Path-based namespaces: /fleet/{fleet_id}/agent/{agent_id}/diary
    • Composite scoring: semantic + recency + importance
    • Background writes with read barriers
  2. CAMEL's TaskChannel for task dispatch and tracking

    • Packet states (SENT → PROCESSING → RETURNED → ARCHIVED)
    • O(1) lookup by task ID
    • Assignee/publisher tracking
  3. AutoGen's DiGraph for execution flow definition

    • DAG with conditional edges for complex workflows
    • Parallel fan-out for independent tasks
    • Activation conditions (all vs any) for synchronization points
  4. AutoGen MagenticOne's ledger for shared task context

    • Maintained facts, plan, and progress ledger
    • Dynamic re-planning on stalls

3.2 Fleet Knowledge Graph Schema

/fleet/{fleet_id}/
  ├── shared/              # Shared knowledge (all agents read)
  │   ├── facts/           # Known facts, constraints
  │   ├── decisions/       # Record of decisions made
  │   └── context/         # Active task context
  ├── agent/{agent_id}/
  │   ├── diary/           # Agent's personal experience log
  │   ├── capabilities/    # What this agent can do
  │   └── state/           # Current task state
  ├── tasks/
  │   ├── {task_id}/       # Task metadata, dependencies, status
  │   └── graph/           # DAG definition for task dependencies
  └── consensus/
      ├── proposals/       # Pending proposals
      └── decisions/       # Resolved consensus decisions

3.3 Key Design Decisions

  1. Diary System (Agent Memory):

    • Each agent writes to its own scoped memory after every significant action
    • LLM-analyzed importance scoring (like CrewAI's unified memory)
    • Cross-agent recall: agents can query other agents' diaries for relevant experiences
    • Decay: old low-importance memories expire
  2. Shared State (Fleet Knowledge):

    • SQLite-backed (like Hermes' existing state.db) with FTS5 search
    • Hierarchical scopes (like CrewAI's MemoryScope)
    • Write-ahead log for concurrent access
    • Read barriers before queries (like CrewAI's drain_writes)
  3. Task Delegation:

    • Coordinator pattern (like CAMEL's Workforce)
    • Task decomposition via LLM
    • Failed task → retry, reassign, or decompose
    • Max depth limit (like Hermes' existing MAX_DEPTH=2)
  4. Consensus Protocol:

    • Proposal-based: agent proposes, others vote/acknowledge
    • Timeout-based fallback: if no response within N seconds, proceed
    • Manager override: designated manager can break ties
    • Simple majority for non-critical, unanimity for critical decisions
  5. Conflict Resolution:

    • Last-write-wins for non-critical state
    • Optimistic locking with version numbers
    • Manager arbitration for task assignment conflicts
    • Quality gates (like CAMEL) for output validation

3.4 Integration with Existing Hermes Architecture

Hermes already has strong foundations:

  • Delegation system (delegate_tool.py): Isolated child agents, parallel execution, depth limits
  • State DB (hermes_state.py): SQLite + FTS5, WAL mode, session tracking, message history
  • Credential pools: Shared credentials with rotation

The fleet knowledge graph should extend these patterns:

  • Session DB → Fleet DB: Add tables for fleet metadata, agent registrations, task graphs
  • Memory tool → Fleet Memory: Scoped vector memory shared across fleet agents
  • Delegate tool → Fleet Delegation: Task channel with persistence, quality evaluation
  • New: Consensus module: Proposal/vote protocol with timeout handling
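The "Session DB → Fleet DB" extension could look roughly like the sketch below: fleet and agent registration tables plus an FTS5 index over shared memories, with a version column for the optimistic-locking scheme from 3.3. All table and column names are illustrative, not Hermes' actual schema, and the FTS table is standalone here (a real build would sync it to `memories` via triggers):

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS fleets (
    fleet_id   TEXT PRIMARY KEY,
    created_at REAL NOT NULL
);
CREATE TABLE IF NOT EXISTS agents (
    agent_id     TEXT PRIMARY KEY,
    fleet_id     TEXT NOT NULL REFERENCES fleets(fleet_id),
    capabilities TEXT
);
CREATE TABLE IF NOT EXISTS memories (
    memory_id  INTEGER PRIMARY KEY,
    scope      TEXT NOT NULL,      -- e.g. a /fleet/... path from 3.2
    body       TEXT NOT NULL,
    importance REAL DEFAULT 0.5,
    version    INTEGER DEFAULT 1   -- optimistic locking, see 3.3
);
CREATE VIRTUAL TABLE IF NOT EXISTS memories_fts USING fts5(scope, body);
"""

def open_fleet_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # concurrent readers, one writer
    conn.executescript(SCHEMA)
    return conn
```

This stays inside the existing SQLite + FTS5 + WAL stack, which is the point: the fleet layer is additive tables, not a new storage engine.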

4. Reference Implementations

| Component | Best Reference | Key Takeaway |
|---|---|---|
| Scoped Memory | CrewAI Memory + MemoryScope | Path-based namespaces, composite scoring, background writes |
| Task Dispatch | CAMEL TaskChannel | Packet-based with state machine, O(1) lookup |
| Execution DAG | AutoGen DiGraphBuilder | Fluent builder, conditional edges, activation groups |
| Orchestration | AutoGen MagenticOneOrchestrator | Ledger-based planning, stall detection, re-planning |
| Agent Communication | AutoGen SelectorGroupChat | LLM-based speaker selection, shared message thread |
| Quality Evaluation | CAMEL Workforce | Structured output for quality scoring |
| Workflow Memory | CAMEL WorkflowMemoryManager | Markdown-based, role-organized, versioned |
| State Checkpoint | CrewAI SqliteProvider | JSONB checkpoints, WAL mode |
| Tool Cache | CrewAI CacheHandler | RWLock-based concurrent tool result cache |

5. Open Questions

  1. Graph vs Vector for knowledge: Should fleet knowledge use a proper graph DB (e.g., Neo4j) or stick with vector + SQLite?

    • Recommendation: Start with SQLite + vectors (existing stack), add graph later if needed
  2. Real-time vs Batch: Should agents receive updates in real-time or batched?

    • Recommendation: Event-driven for critical updates, batched for diary entries
  3. Security model: How should cross-agent access be controlled?

    • Recommendation: Role-based ACLs on scope paths, similar to CrewAI's privacy flags
  4. Scalability: How many agents can a single fleet support?

    • Recommendation: Start with 10-agent fleets, optimize SQLite concurrency first