# Multi-Agent Coordination SOTA Research Report
## Fleet Knowledge Graph: Architecture Patterns & Integration Recommendations

**Date**: 2025-04-14
**Scope**: Agent-to-agent communication, shared memory, task delegation, consensus protocols, conflict resolution
**Frameworks Analyzed**: CrewAI, AutoGen, MetaGPT, ChatDev, CAMEL, LangGraph
**Target Fleet**: Hermes (orchestrator), Timmy, Claude Code, Gemini, Kimi

---

## 1. EXECUTIVE SUMMARY

Six major multi-agent frameworks each solve coordination differently. The SOTA converges on **four core patterns**: role-based delegation with capability matching, shared state via publish-subscribe messaging, directed-graph task flows with conditional routing, and layered memory (short-term context + long-term knowledge graph). For our fleet, the recommended architecture combines **AutoGen's GraphFlow** (graph-based task routing), **CrewAI's hierarchical memory** (short-term RAG + long-term SQLite + entity memory), **MetaGPT's standardized output contracts** (typed task artifacts), and **CAMEL's role-playing delegation protocol** (inception-prompted agent negotiation).

---

## 2. FRAMEWORK-BY-FRAMEWORK ANALYSIS

### 2.1 CrewAI (v1.14.x): Role-Based Crews with Hierarchical Orchestration

**Core Architecture:**
- **Process modes**: `Process.sequential` (tasks execute in order), `Process.hierarchical` (manager agent delegates to workers)
- **Agent delegation**: `allow_delegation=True` enables agents to call other agents as tools, selecting the best agent for subtasks
- **Memory system**: Crew-level `memory=True` enables UnifiedMemory with:
  - **Short-term**: RAG-backed (embeddings → vector store) for recent task context
  - **Long-term**: SQLite-backed for persistent task outcomes
  - **Entity memory**: Tracks entities (people, companies, concepts) across tasks
  - **User memory**: Per-user preference tracking
  - **Embedder**: Configurable (OpenAI, Cohere, Jina, local ONNX, etc.)
- **Knowledge sources**: `knowledge_sources=[StringKnowledgeSource(...)]` for RAG-grounded context per agent or crew
- **Flows**: `@start`, `@listen`, `@router` decorators for DAG orchestration across crews. `or_()` and `and_()` combinators for conditional triggers
- **Callbacks**: `before_kickoff_callbacks`, `after_kickoff_callbacks`, `step_callback`, `task_callback`

**Key Patterns for Fleet:**
- **Delegation-as-tool**: Agents can invoke other agents by role → our fleet agents could expose themselves as callable tools to each other
- **Sequential handoff**: Task output from Agent A feeds directly as input to Agent B → pipeline pattern
- **Hierarchical manager**: A manager LLM decomposes goals and assigns tasks → matches Hermes-as-orchestrator pattern
- **Shared memory with scopes**: Crew-level memory visible to all agents, agent-level memory private

**Limitations:**
- No native inter-process communication: all agents live in the same process
- Manager/hierarchical mode requires an LLM call just for delegation decisions (extra latency/cost)
- No built-in conflict resolution for concurrent writes to shared memory

### 2.2 AutoGen (v0.7.5): Flexible Team Topologies with Graph-Based Coordination

**Core Architecture:**
- **Team topologies** (5 types):
  - `RoundRobinGroupChat`: Sequential turn-taking, each agent speaks in order
  - `SelectorGroupChat`: LLM selects next speaker based on conversation context (`selector_prompt` template)
  - `MagenticOneGroupChat`: Orchestrator-driven (from Microsoft's Magentic-One paper), with stall detection and replanning
  - `Swarm`: Handoff-based; the current speaker explicitly hands off to a target via `HandoffMessage`
  - `GraphFlow`: **Directed graph** execution; agents run along graph edges with conditional routing, fan-out, join patterns, and loops (so the graph is not strictly acyclic)
- **Agent types**:
  - `AssistantAgent`: Standard LLM agent with tools
  - `CodeExecutorAgent`: Runs code in isolated environments
  - `UserProxyAgent`: Human-in-the-loop proxy
  - `SocietyOfMindAgent`: **Meta-agent** that wraps an inner team and summarizes their output as a single response (composable nesting)
  - `MessageFilterAgent`: Filters/transforms messages between agents
- **Termination conditions**: `TextMentionTermination`, `MaxMessageTermination`, `SourceMatchTermination`, `HandoffTermination`, `TimeoutTermination`, `FunctionCallTermination`, `TokenUsageTermination`, `ExternalTermination` (programmatic control), `FunctionalTermination` (custom function)
- **Memory**: `Sequence[Memory]` on agents, i.e. per-agent memory stores (RAG-backed)
- **GraphFlow specifics**:
  - `DiGraphBuilder.add_node(agent, activation='all'|'any')`
  - `DiGraphBuilder.add_edge(source, target, condition=callable|str)` for conditional edges
  - `set_entry_point(agent)` defines the graph root
  - Supports: sequential, parallel fan-out, conditional branching, join patterns, loops with exit conditions
  - Node activation: `'all'` (wait for all incoming edges) vs `'any'` (trigger on first)

**Key Patterns for Fleet:**
- **GraphFlow is the SOTA pattern** for multi-agent orchestration: graph-based, conditional, with support for parallel branches, joins, and loops with exit conditions
- **SocietyOfMindAgent** enables hierarchical composition: a team of agents wrapped as a single agent that can participate in a larger team
- **Selector pattern** (LLM picks next speaker) is elegant for heterogeneous fleets where capability matching matters
- **Swarm handoff** maps directly to our ACP handoff mechanism
- **Termination conditions** are composable: `termination_a | termination_b` (OR), `termination_a & termination_b` (AND)

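The `'all'` versus `'any'` activation semantics can be illustrated without AutoGen itself. The following is a minimal stand-alone sketch, not AutoGen's implementation: `run_graph`, its arguments, and the handler convention are hypothetical names introduced here, and each node fires at most once, so loops are out of scope.

```python
from collections import defaultdict, deque


def run_graph(edges, activation, entry, handlers):
    """Run a directed graph once from `entry` (hypothetical helper).

    edges:      list of (source, target) pairs
    activation: node -> "all" (wait for every parent) or "any" (first parent)
    handlers:   node -> callable(list_of_parent_outputs) -> output
    """
    parents, children = defaultdict(set), defaultdict(list)
    for src, dst in edges:
        parents[dst].add(src)
        children[src].append(dst)

    outputs = {}  # node -> result; membership also means "already fired"
    order = []
    ready = deque([entry])
    while ready:
        node = ready.popleft()
        if node in outputs:
            continue  # a node queued twice still fires only once
        inputs = [outputs[p] for p in sorted(parents[node]) if p in outputs]
        outputs[node] = handlers[node](inputs)
        order.append(node)
        for child in children[node]:
            finished = sum(1 for p in parents[child] if p in outputs)
            mode = activation.get(child, "all")
            if mode == "any" or finished == len(parents[child]):
                ready.append(child)
    return order, outputs


# kimi has an edge into synthesize but is never triggered in this run.
edges = [
    ("hermes", "claude_code"), ("hermes", "gemini"),
    ("claude_code", "synthesize"), ("gemini", "synthesize"),
    ("kimi", "synthesize"),
]
handlers = defaultdict(lambda: len)  # every node just counts its inputs

order_all, _ = run_graph(edges, {"synthesize": "all"}, "hermes", handlers)
order_any, out_any = run_graph(edges, {"synthesize": "any"}, "hermes", handlers)
```

With `'all'`, `synthesize` never runs because one of its parents never finishes; with `'any'`, it runs once at least one parent has produced output. A real join node would therefore need a timeout or an explicit exit condition when using `'all'`.
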
### 2.3 MetaGPT: SOP-Driven Multi-Agent with Standardized Artifacts

**Core Architecture (from paper + codebase):**
- **SOP (Standard Operating Procedure)**: Tasks decomposed into phases, each with specific roles and required artifacts
- **Role-based agents**: Each role has `name`, `profile`, `goal`, `constraints`, `actions` (specific output types)
- **Shared Message Environment**: All agents publish to and subscribe from a shared `Environment` object
- **Publish-Subscribe**: Agents subscribe to message types/topics they care about, ignore others
- **Standardized Output**: Each action produces a typed artifact (e.g., `SystemDesign`, `Task`, `Code`), giving structured contracts between agents
- **Memory**: `Memory` class stores all messages, retrievable by relevance. `Role.react()` calls `observe()` then `act()` based on observed messages
- **Communication**: Asynchronous message passing; agents publish results to the environment and interested agents react

**Key Patterns for Fleet:**
- **Typed artifact contracts**: Each agent publishes structured outputs (not free-form text) → reduces ambiguity in inter-agent communication
- **Pub-sub messaging**: Decouples sender from receiver; agents don't need to know about each other, they just subscribe to relevant topics
- **SOP-driven phases**: Define workflow phases (e.g., "analysis" → "implementation" → "review") with specific agents per phase
- **Environment as blackboard**: Shared state all agents can read/write, i.e. the classic blackboard architecture for AI systems

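The combination of typed artifacts and publish-subscribe can be sketched in a few lines. This is an illustrative, synchronous toy and not MetaGPT's actual `Environment` class: the `subscribe`/`publish` methods and the field names on `SystemDesign` and `Code` are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemDesign:  # hypothetical artifact type
    summary: str


@dataclass(frozen=True)
class Code:  # hypothetical artifact type
    path: str
    source: str


class Environment:
    """Shared message space: agents subscribe by artifact type."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self.log = []  # every published artifact, in publish order

    def subscribe(self, artifact_type, callback):
        self._subscribers[artifact_type].append(callback)

    def publish(self, artifact):
        self.log.append(artifact)
        # Only callbacks registered for this exact type are invoked.
        for callback in self._subscribers[type(artifact)]:
            callback(artifact)


env = Environment()
# An "engineer" reacts to designs by publishing code.
env.subscribe(
    SystemDesign,
    lambda d: env.publish(Code("app.py", f"# implements: {d.summary}")),
)
# A "reviewer" only cares about code and ignores everything else.
reviewed = []
env.subscribe(Code, lambda c: reviewed.append(c.path))

env.publish(SystemDesign("REST API for task tracking"))
```

The publisher of the design never references the engineer or the reviewer; the subscription table is the only coupling between them.
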
### 2.4 ChatDev: Chat-Chain Architecture for Software Development

**Core Architecture:**
- **Chat Chain**: Sequential phases (design → code → test → document), each phase is a two-agent conversation
- **Role pairing**: Each phase pairs complementary roles (e.g., CEO ↔ CTO, Programmer ↔ Reviewer)
- **Communicative dehallucination**: Agents communicate through structured prompts that constrain outputs to prevent hallucination
- **Phase transitions**: Phase completion triggers next phase, output from one phase seeds the next
- **Memory**: Conversation history within each phase; phase outputs stored as artifacts

**Key Patterns for Fleet:**
- **Phase-gated pipeline**: Each phase must produce a specific artifact type before proceeding
- **Complementary role pairing**: Pair agents with opposing perspectives (creator ↔ reviewer) for higher quality
- **Communicative protocols**: Structured conversation templates reduce free-form ambiguity

### 2.5 CAMEL: Role-Playing Autonomous Multi-Agent Communication

**Core Architecture:**
- **RolePlaying society**: Two agents (assistant + user) collaborate with inception prompting
- **Task specification**: `with_task_specify=True` uses a task-specify agent to refine the initial prompt into a concrete task
- **Task planning**: `with_task_planner=True` adds a planning agent that decomposes the task
- **Critic-in-the-loop**: `with_critic_in_the_loop=True` adds a critic agent that evaluates and approves/rejects
- **Inception prompting**: Both agents receive system messages that establish their roles, goals, and communication protocol
- **Termination**: Agents signal completion via specific tokens or phrases

**Key Patterns for Fleet:**
- **Inception prompting**: Agents negotiate a shared understanding of the task before executing
- **Critic-in-the-loop**: A dedicated reviewer agent validates outputs before acceptance
- **Role-playing protocol**: Structured back-and-forth between complementary agents
- **Task refinement chain**: Raw goal → specified task → planned subtasks → executed

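The critic-in-the-loop pattern reduces to a bounded propose/critique loop. The sketch below is framework-independent and does not use CAMEL's API: `run_with_critic`, `propose`, and `critique` are hypothetical names, and the two stub functions stand in for LLM-backed agents.

```python
def run_with_critic(task, propose, critique, max_rounds=3):
    """Loop until the critic approves or the round budget is spent."""
    feedback = None
    draft = None
    for round_no in range(1, max_rounds + 1):
        draft = propose(task, feedback)
        approved, feedback = critique(task, draft)
        if approved:
            return {"status": "approved", "rounds": round_no, "output": draft}
    # Budget exhausted: hand the last draft to a human or orchestrator.
    return {"status": "escalate", "rounds": max_rounds, "output": draft}


def propose(task, feedback):
    """Stub proposer: adds a test marker only after being asked to."""
    body = "def add(a, b):\n    return a + b"
    return body + "\n# tested" if feedback else body


def critique(task, draft):
    """Stub critic: approves only drafts that carry the test marker."""
    ok = "# tested" in draft
    return ok, (None if ok else "add a test marker")


result = run_with_critic("implement add()", propose, critique)
stuck = run_with_critic("implement add()", propose, lambda t, d: (False, "no"))
```

Capping the number of rounds matters in practice: without it, a proposer and critic that disagree would loop indefinitely and consume tokens.
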
### 2.6 LangGraph: Graph-Based Stateful Agent Workflows

**Core Architecture (from documentation/paper):**
- **StateGraph**: Typed state schema shared across all nodes (agents/tools)
- **Nodes**: Functions (agents, tools, transforms) that read/modify shared state
- **Edges**: Conditional routing based on state or agent decisions
- **Checkpointer**: Persistent state snapshots (SQLite, Postgres, in-memory), which enable pause/resume
- **Human-in-the-loop**: Interrupt nodes for approval, edit, review
- **Streaming**: Real-time node-by-node or token-by-token output
- **Subgraphs**: Composable graphs; a subgraph can act as a node in a parent graph
- **State channels**: Multiple state namespaces for different aspects of the workflow

**Key Patterns for Fleet:**
- **Shared typed state**: All agents operate on a well-defined state schema, which removes ambiguity about what data each agent sees
- **Checkpoint persistence**: Workflow can be paused, resumed, forked; critical for long-running agent tasks
- **Conditional edges**: Route based on agent output type or state values
- **Subgraph composition**: Each fleet agent could be a subgraph, composed into larger workflows
- **Command-based routing**: Nodes return `Command(goto="node_name", update={...})` for explicit control flow

---

## 3. CROSS-CUTTING PATTERNS ANALYSIS

### 3.1 Agent-to-Agent Communication

| Pattern | Frameworks | Latency | Decoupling | Structured |
|---------|-----------|---------|------------|------------|
| Direct tool invocation | CrewAI, AutoGen | Low | Low | Medium |
| Pub-sub messaging | MetaGPT | Medium | High | High |
| Handoff messages | AutoGen Swarm | Low | Medium | High |
| Chat-chain conversations | ChatDev, CAMEL | High | Low | Medium |
| Shared state graph | LangGraph, AutoGen GraphFlow | Low | Medium | High |

**Recommendation**: Use **handoff + shared state** pattern. Agents communicate via typed handoff messages (what task was completed, what artifacts produced) while sharing a typed state object (knowledge graph entries).

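A minimal sketch of the handoff + shared state recommendation follows. All names here (`Handoff`, `SessionState`, `apply_handoff`, and their fields) are hypothetical and chosen for illustration; the ownership check is one possible policy, not something prescribed by any of the surveyed frameworks.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Handoff:
    source: str
    target: str
    task_id: str
    completed: str          # what the source agent finished
    artifacts: tuple = ()   # (artifact_type, reference) pairs


@dataclass
class SessionState:
    owner: str                                   # agent holding the task
    artifacts: dict = field(default_factory=dict)
    trail: list = field(default_factory=list)    # audit trail of handoffs


def apply_handoff(state, msg):
    """Merge a handoff into shared state and transfer ownership."""
    if msg.source != state.owner:
        raise ValueError(f"{msg.source} does not own task {msg.task_id}")
    for kind, ref in msg.artifacts:
        state.artifacts.setdefault(kind, []).append(ref)
    state.trail.append((msg.source, msg.target, msg.completed))
    state.owner = msg.target
    return state


state = SessionState(owner="hermes")
apply_handoff(state, Handoff("hermes", "claude_code", "t-1",
                             "analysis", (("analysis", "notes.md"),)))

try:
    # gemini is not the current owner, so this handoff is rejected.
    apply_handoff(state, Handoff("gemini", "kimi", "t-1", "review"))
    rejected = False
except ValueError:
    rejected = True
```

Keeping the message immutable and the state mutable makes the division of labour explicit: messages record what happened, state records where things stand.
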
### 3.2 Shared Memory Patterns

| Pattern | Frameworks | Persistence | Scope | Query Method |
|---------|-----------|-------------|-------|-------------|
| RAG-backed short-term | CrewAI, AutoGen | Session | Crew/Team | Embedding similarity |
| SQLite long-term | CrewAI | Cross-session | Global | SQL + embeddings |
| Entity memory | CrewAI | Cross-session | Global | Entity lookup |
| Message store | MetaGPT | Session | Environment | Relevance search |
| Typed state channels | LangGraph | Checkpointed | Graph | State field access |
| Frozen snapshot | Hermes (current) | Cross-session | Agent | System prompt injection |

**Recommendation**: Implement **three-tier memory**:
1. **Session state** (LangGraph-style typed state graph): shared within a workflow
2. **Fleet knowledge graph** (new): structured triples/relations between entities, projects, decisions
3. **Agent-local memory** (existing MEMORY.md pattern): per-agent persistent notes

### 3.3 Task Delegation

| Pattern | Frameworks | Decision Maker | Granularity |
|---------|-----------|---------------|-------------|
| Manager decomposition | CrewAI hierarchical | Manager LLM | Task-level |
| Delegation-as-tool | CrewAI | Self-selecting | Subtask |
| Selector-based | AutoGen SelectorGroupChat | LLM selector | Turn-level |
| Handoff-based | AutoGen Swarm | Current agent | Message-level |
| Graph-defined | AutoGen GraphFlow, LangGraph | Pre-defined graph | Node-level |
| SOP-based | MetaGPT | Phase rules | Phase-level |

**Recommendation**: Use **hybrid delegation**:
- **Graph-based** for known workflows (CI/CD, code review pipelines), using pre-defined DAGs
- **Selector-based** for exploratory tasks (research, debugging), where an LLM picks the best agent
- **Handoff-based** for agent-initiated delegation, where the current agent explicitly hands off

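One way to combine the three delegation modes is a single routing function that tries them in a fixed order. The sketch below is an assumption-laden illustration: `route`, `WORKFLOWS`, `KEYWORDS`, the `workflow` key on the task, and the precedence order are all hypothetical, and `selector` stands in for an LLM call.

```python
# Pre-defined workflows (graph-based delegation); contents are illustrative.
WORKFLOWS = {"code_review": ["claude_code", "gemini", "hermes"]}

# Keyword fallback used when no selector is available or it abstains.
KEYWORDS = {
    "claude_code": ("code", "bug", "refactor"),
    "gemini": ("image", "vision", "screenshot"),
}


def route(task, handoff_target=None, selector=None):
    """Return (strategy, [agents]) for a task dict with a 'goal' key."""
    if handoff_target:                        # agent-initiated handoff
        return ("handoff", [handoff_target])
    if task.get("workflow") in WORKFLOWS:     # known workflow: fixed graph
        return ("graph", WORKFLOWS[task["workflow"]])
    if selector is not None:                  # exploratory: ask the selector
        choice = selector(task["goal"])
        if choice:
            return ("selector", [choice])
    goal = task["goal"].lower()
    for agent, words in KEYWORDS.items():     # cheap heuristic fallback
        if any(word in goal for word in words):
            return ("heuristic", [agent])
    return ("heuristic", ["hermes"])          # default to the orchestrator


r_graph = route({"goal": "Review PR", "workflow": "code_review"})
r_selector = route({"goal": "Describe this diagram"}, selector=lambda g: "gemini")
r_keyword = route({"goal": "Fix the login bug"})
r_handoff = route({"goal": "anything"}, handoff_target="kimi")
r_default = route({"goal": "Plan the sprint"})
```

The heuristic branch never calls an LLM, so it doubles as a degraded mode when the selector is unavailable.
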
### 3.4 Consensus Protocols

None of the surveyed frameworks implements a true consensus protocol (Raft, PBFT). Instead, they rely on lighter-weight patterns:

| Pattern | What It Solves |
|---------|---------------|
| Critic-in-the-loop (CAMEL) | Single reviewer approves/rejects |
| Aggregator synthesis (MoA/Mixture-of-Agents) | Multiple responses synthesized into one |
| Hierarchical manager (CrewAI) | Manager makes final decision |
| MagenticOne orchestrator (AutoGen) | Orchestrator plans and replans |

**Recommendation for Fleet**: Implement **weighted ensemble consensus**:
1. Multiple agents produce independent solutions
2. A synthesis agent aggregates (like MoA pattern already in Hermes)
3. For critical decisions, require 2-of-3 agreement from designated expert agents

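The 2-of-3 rule with expertise weights might look like the sketch below. It is one possible reading of "weighted ensemble consensus", not an existing Hermes function: `quorum_decision`, its parameters, and the `"escalate"` sentinel are hypothetical. Weights pick the leading verdict, but the head-count threshold must still be met.

```python
from collections import Counter


def quorum_decision(votes, min_agreement=2, weights=None):
    """votes maps agent name -> verdict; returns a verdict or 'escalate'."""
    if not votes:
        return "escalate"
    weights = weights or {}
    count, support = Counter(), Counter()
    for agent, verdict in votes.items():
        count[verdict] += 1                         # head count
        support[verdict] += weights.get(agent, 1.0)  # expertise-weighted
    best = max(support, key=support.get)
    # The weighted leader still needs enough independent backers.
    return best if count[best] >= min_agreement else "escalate"


votes = {"claude_code": "approve", "gemini": "approve", "kimi": "reject"}
plain = quorum_decision(votes)
split = quorum_decision({"claude_code": "a", "gemini": "b", "kimi": "c"})
heavy = quorum_decision(votes, weights={"kimi": 5.0})
```

Under this design a heavily weighted expert can block a decision (forcing escalation) but cannot carry it alone, which seems a reasonable default for critical changes.
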
### 3.5 Conflict Resolution

| Conflict Type | Resolution Strategy |
|--------------|-------------------|
| Concurrent memory writes | File locking + atomic rename (Hermes already does this) |
| Conflicting agent outputs | Critic/validator agent evaluates both |
| Task assignment conflicts | Single orchestrator (Hermes) assigns, no self-assignment |
| State graph race conditions | LangGraph checkpoint + merge strategies |

**Recommendation**:
- **Write conflicts**: Atomic operations with optimistic locking (existing pattern)
- **Output conflicts**: Dedicate one agent as "judge" for each workflow
- **Assignment conflicts**: Centralized orchestrator (Hermes); no agent self-delegation to other fleet members without approval

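Optimistic locking for write conflicts can be expressed as a compare-and-swap on a version counter. The sketch below uses Python's standard `sqlite3` module; the `entities` table, its `version` column, and `update_entity` are assumptions made for the example and are not part of the existing Hermes code.

```python
import sqlite3


def update_entity(conn, entity_id, new_props, expected_version):
    """Write only if the row is still at the version the caller read."""
    cur = conn.execute(
        "UPDATE entities SET properties = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_props, entity_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1  # False means someone else wrote first


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entities ("
    "id TEXT PRIMARY KEY, properties TEXT, "
    "version INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO entities (id, properties) VALUES ('project-x', '{}')")

# Two agents both read version 0, then try to write.
first = update_entity(conn, "project-x", '{"owner": "claude_code"}', 0)
second = update_entity(conn, "project-x", '{"owner": "gemini"}', 0)

stored = conn.execute(
    "SELECT properties, version FROM entities WHERE id = 'project-x'"
).fetchone()
```

The losing writer gets `False` back and can re-read, merge, and retry, or record the clash for a judge agent to resolve.
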
---

## 4. FLEET ARCHITECTURE RECOMMENDATION

### 4.1 Proposed Architecture: "Fleet Knowledge Graph" (FKG)

```
┌─────────────────────────────────────────────────────────────┐
│                    FLEET KNOWLEDGE GRAPH                    │
│                                                             │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐     │
│  │ Entities │  │ Relations│  │ Artifacts│  │ Decisions│     │
│  │ (nodes)  │──│ (edges)  │──│ (typed)  │──│ (history)│     │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘     │
│                                                             │
│  Storage: SQLite + FTS5 (existing hermes_state.py pattern)  │
│  Schema: RDF-lite triples with typed properties             │
└─────────────────────┬───────────────────────────────────────┘
                      │
          ┌───────────┼───────────┐
          │           │           │
     ┌────▼────┐ ┌────▼────┐ ┌────▼────┐
     │ Session │ │  Agent  │ │ Workflow│
     │  State  │ │  Memory │ │ History │
     │ (shared)│ │ (local) │ │ (audit) │
     └─────────┘ └─────────┘ └─────────┘
```

### 4.2 Fleet Member Roles

| Agent | Role | Strengths | Delegation Style |
|-------|------|-----------|-----------------|
| **Hermes** | Orchestrator | Planning, tool use, multi-platform | Delegator (spawns others) |
| **Claude Code** | Code specialist | Deep code reasoning, ACP integration | Executor (receives tasks) |
| **Gemini** | Multimodal analyst | Vision, large context, fast | Executor (receives tasks) |
| **Kimi** | Coding assistant | Code generation, long context | Executor (receives tasks) |
| **Timmy** | (Details TBD) | TBD | Executor (receives tasks) |

### 4.3 Communication Protocol

**Inter-Agent Message Format** (inspired by MetaGPT's typed artifacts):

```json
{
  "message_type": "task_request|task_response|handoff|knowledge_update|conflict",
  "source_agent": "hermes",
  "target_agent": "claude_code",
  "task_id": "uuid",
  "parent_task_id": "uuid|null",
  "payload": {
    "goal": "...",
    "context": "...",
    "artifacts": [{"type": "code", "path": "..."}, {"type": "analysis", "content": "..."}],
    "constraints": ["..."],
    "priority": "high|medium|low"
  },
  "knowledge_graph_refs": ["entity:project-x", "relation:depends-on"],
  "timestamp": "ISO8601",
  "signature": "hmac-or-uuid"
}
```

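The message format above could be mirrored as a Python dataclass, with the `signature` field computed as an HMAC over a canonical JSON encoding. This is a sketch under assumptions: the `FleetMessage` class shape, the `sign`/`verify` helpers, the choice of HMAC-SHA256, and the demo key are all illustrative rather than an agreed protocol.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass, field
from typing import Optional

MESSAGE_TYPES = {"task_request", "task_response", "handoff",
                 "knowledge_update", "conflict"}


@dataclass
class FleetMessage:
    message_type: str
    source_agent: str
    target_agent: str
    task_id: str
    payload: dict
    parent_task_id: Optional[str] = None
    knowledge_graph_refs: list = field(default_factory=list)
    timestamp: str = ""
    signature: str = ""

    def __post_init__(self):
        if self.message_type not in MESSAGE_TYPES:
            raise ValueError(f"unknown message_type: {self.message_type}")


def sign(msg, key):
    """HMAC-SHA256 over every field except the signature itself."""
    body = asdict(msg)
    body.pop("signature")
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, canonical.encode(), hashlib.sha256).hexdigest()


def verify(msg, key):
    return hmac.compare_digest(msg.signature, sign(msg, key))


key = b"demo-key"  # placeholder; a real key would come from a secret store
msg = FleetMessage("task_request", "hermes", "claude_code", "t-1",
                   {"goal": "refactor parser", "priority": "high"})
msg.signature = sign(msg, key)
valid_before = verify(msg, key)

msg.payload["goal"] = "delete repository"  # tampering after signing
valid_after = verify(msg, key)
```

Sorting keys and fixing separators makes the encoding deterministic, so sender and receiver compute the same digest regardless of dict ordering.
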
### 4.4 Task Flow Patterns

**Pattern 1: Pipeline (ChatDev-style)**
```
Hermes → [Analyze] → Claude Code → [Implement] → Gemini → [Review] → Hermes → [Deliver]
```

**Pattern 2: Fan-out/Fan-in (AutoGen GraphFlow-style)**
```
         ┌→ Claude Code (code) ──┐
Hermes ──┼→ Gemini (analysis) ───┼→ Hermes (synthesize)
         └→ Kimi (docs) ─────────┘
```

**Pattern 3: Debate (CAMEL-style)**
```
Claude Code (proposal) ↔ Gemini (critic) → Hermes (judge)
```

**Pattern 4: Selector (AutoGen SelectorGroupChat)**
```
Hermes (orchestrator) → LLM selects best agent → Agent executes → Result → Repeat
```

### 4.5 Knowledge Graph Schema

```sql
-- Core entities
CREATE TABLE fkg_entities (
    id TEXT PRIMARY KEY,
    entity_type TEXT NOT NULL,  -- 'project', 'file', 'agent', 'task', 'concept', 'decision'
    name TEXT NOT NULL,
    properties JSON,            -- Flexible typed properties
    created_by TEXT,            -- Agent that created this
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Relations between entities
CREATE TABLE fkg_relations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    source_entity TEXT REFERENCES fkg_entities(id),
    target_entity TEXT REFERENCES fkg_entities(id),
    relation_type TEXT NOT NULL,  -- 'depends-on', 'created-by', 'reviewed-by', 'part-of', 'conflicts-with'
    properties JSON,
    confidence REAL DEFAULT 1.0,
    created_by TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Task execution history
CREATE TABLE fkg_task_history (
    task_id TEXT PRIMARY KEY,
    parent_task_id TEXT,
    goal TEXT,
    assigned_agent TEXT,
    status TEXT,            -- 'pending', 'running', 'completed', 'failed', 'conflict'
    result_summary TEXT,
    artifacts JSON,         -- List of produced artifacts
    knowledge_refs JSON,    -- Entities/relations this task touched
    started_at TIMESTAMP,
    completed_at TIMESTAMP
);

-- Conflict tracking
CREATE TABLE fkg_conflicts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    entity_id TEXT REFERENCES fkg_entities(id),
    conflict_type TEXT,     -- 'concurrent_write', 'contradictory_output', 'resource_contention'
    agent_a TEXT,
    agent_b TEXT,
    resolution TEXT,
    resolved_by TEXT,
    resolved_at TIMESTAMP
);

-- Full-text search over entities.
-- With an external-content FTS5 table, the indexed column names must match
-- columns of fkg_entities, and the index must be kept in sync with it
-- (for example via INSERT/UPDATE/DELETE triggers).
CREATE VIRTUAL TABLE fkg_search USING fts5(
    name, entity_type, properties,
    content='fkg_entities', content_rowid='rowid'
);
```

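A typical query against this schema is "what does an entity transitively depend on". The sketch below runs a recursive CTE over a trimmed copy of `fkg_entities` and `fkg_relations` in an in-memory database; the helper `transitive_targets` and the sample rows are hypothetical, and only the columns needed for the query are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fkg_entities (
    id TEXT PRIMARY KEY,
    entity_type TEXT NOT NULL,
    name TEXT NOT NULL
);
CREATE TABLE fkg_relations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    source_entity TEXT REFERENCES fkg_entities(id),
    target_entity TEXT REFERENCES fkg_entities(id),
    relation_type TEXT NOT NULL
);
""")
conn.executemany(
    "INSERT INTO fkg_entities VALUES (?, ?, ?)",
    [("project-x", "project", "Project X"),
     ("auth-lib", "project", "Auth library"),
     ("crypto-lib", "project", "Crypto library"),
     ("fleet", "concept", "Fleet")],
)
conn.executemany(
    "INSERT INTO fkg_relations (source_entity, target_entity, relation_type) "
    "VALUES (?, ?, ?)",
    [("project-x", "depends-on", "auth-lib")[::2] + ("depends-on",),
     ("auth-lib", "crypto-lib", "depends-on"),
     ("project-x", "fleet", "part-of")],
)


def transitive_targets(conn, start, relation_type):
    """Follow one relation type outward from `start`, any number of hops."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(id) AS (
            SELECT target_entity FROM fkg_relations
            WHERE source_entity = ? AND relation_type = ?
            UNION  -- UNION (not UNION ALL) drops repeats, so cycles terminate
            SELECT r.target_entity FROM fkg_relations r
            JOIN reach ON r.source_entity = reach.id
            WHERE r.relation_type = ?
        )
        SELECT id FROM reach ORDER BY id
        """,
        (start, relation_type, relation_type),
    ).fetchall()
    return [row[0] for row in rows]


deps = transitive_targets(conn, "project-x", "depends-on")
```

Because traversal happens inside SQLite, an agent can ask for the full dependency closure in one query instead of walking edges hop by hop.
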
---

## 5. INTEGRATION RECOMMENDATIONS

### 5.1 Phase 1: Foundation (Immediate, 1-2 weeks)

1. **Implement FKG SQLite database** at `~/.hermes/fleet_knowledge.db`
   - Extend existing `hermes_state.py` pattern (already uses SQLite + FTS5)
   - Add schema from §4.5
   - Create `tools/fleet_knowledge_tool.py` with CRUD operations

2. **Create fleet agent registry** in `agent/fleet_registry.py`
   - Map agent names → transport (ACP, API, subprocess)
   - Store capabilities, specializations, availability status
   - Integrate with existing `acp_adapter/` and `delegate_tool.py`

3. **Define message protocol** as typed Python dataclasses
   - `FleetMessage`, `TaskRequest`, `TaskResponse`, `KnowledgeUpdate`
   - Validation via Pydantic (already a CrewAI dependency)

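The registry in step 2 could start as little more than a dictionary with a capability lookup. The following sketch is hypothetical throughout: `AgentEntry`, `FleetRegistry`, and their methods are illustrative names, and the capabilities are taken from the example fleet in this report rather than from any existing `fleet_registry.py`.

```python
from dataclasses import dataclass


@dataclass
class AgentEntry:
    name: str
    transport: str            # "local" | "acp" | "api"
    capabilities: frozenset
    available: bool = True


class FleetRegistry:
    """In-memory registry mapping agent names to transport and skills."""

    def __init__(self):
        self._agents = {}

    def register(self, entry):
        self._agents[entry.name] = entry

    def set_available(self, name, available):
        self._agents[name].available = available

    def find(self, capability):
        """Names of available agents offering a capability, sorted."""
        return sorted(
            entry.name
            for entry in self._agents.values()
            if entry.available and capability in entry.capabilities
        )


registry = FleetRegistry()
registry.register(AgentEntry("hermes", "local", frozenset({"planning"})))
registry.register(AgentEntry("claude_code", "acp",
                             frozenset({"code", "debugging", "architecture"})))
registry.register(AgentEntry("gemini", "api",
                             frozenset({"vision", "analysis", "large_context"})))
registry.register(AgentEntry("kimi", "api", frozenset({"code", "long_context"})))

coders = registry.find("code")
registry.set_available("kimi", False)
coders_after = registry.find("code")
```

Keeping availability separate from capabilities lets the router skip an agent that is down without forgetting what it can do.
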
### 5.2 Phase 2: Communication Layer (2-4 weeks)

4. **Build fleet delegation on top of existing `delegate_tool.py`**
   - Extend to support cross-agent delegation (not just child subagents)
   - ACP transport for Claude Code (already supported via `acp_command`)
   - OpenRouter/OpenAI-compatible API for Gemini, Kimi
   - Reuse existing credential pool and provider resolution

5. **Implement selector-based task routing** (AutoGen SelectorGroupChat pattern)
   - LLM-based agent selection based on task description + agent capabilities
   - Hermes acts as the selector/orchestrator
   - Simple heuristic fallback (code → Claude Code, vision → Gemini, etc.)

6. **Add typed artifact contracts** (MetaGPT pattern)
   - Each task produces a typed artifact (code, analysis, docs, review)
   - Artifacts stored in FKG with entity relations
   - Downstream agents consume typed inputs, not free-form text

### 5.3 Phase 3: Advanced Patterns (4-6 weeks)

7. **Implement workflow DAGs** (AutoGen GraphFlow pattern)
   - Pre-defined workflows as directed graphs (code review pipeline, research pipeline)
   - Conditional routing based on artifact types or agent decisions
   - Fan-out/fan-in for parallel execution across fleet agents

8. **Add conflict resolution**
   - Detect concurrent writes to same FKG entities
   - Critic agent validates contradictory outputs
   - Track resolution history for learning

9. **Build consensus mechanism** for critical decisions
   - Weighted voting based on agent expertise
   - MoA-style aggregation (already implemented in `mixture_of_agents_tool.py`)
   - Escalation to human for irreconcilable conflicts

### 5.4 Phase 4: Intelligence (6-8 weeks)

10. **Learning from delegation history**
    - Track which agent performs best for which task types
    - Adjust routing weights over time
    - RL-style improvement of delegation decisions

11. **Fleet-level memory evolution**
    - Entities and relations in FKG become the "shared brain"
    - Agents contribute knowledge as they work
    - Cross-agent knowledge synthesis (one agent's discovery benefits all)

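Adjusting routing weights from delegation history does not have to start with reinforcement learning. A simple first step, sketched below under assumptions, is a smoothed success rate per (agent, task type) pair. `RoutingStats` and its methods are hypothetical names, and the add-one (Laplace) smoothing is a substitute chosen for illustration, not a method named in this report.

```python
from collections import defaultdict


class RoutingStats:
    """Tracks outcomes per (agent, task_type) and derives routing weights."""

    def __init__(self):
        self._wins = defaultdict(int)
        self._runs = defaultdict(int)

    def record(self, agent, task_type, success):
        key = (agent, task_type)
        self._runs[key] += 1
        self._wins[key] += int(success)

    def weight(self, agent, task_type):
        # Add-one smoothing: unseen pairs start at 0.5 instead of 0 or 1.
        key = (agent, task_type)
        return (self._wins[key] + 1) / (self._runs[key] + 2)

    def best(self, agents, task_type):
        return max(agents, key=lambda a: self.weight(a, task_type))


stats = RoutingStats()
unseen = stats.weight("kimi", "code")
for _ in range(3):
    stats.record("claude_code", "code", True)
stats.record("gemini", "code", False)

choice = stats.best(["claude_code", "gemini", "kimi"], "code")
```

The smoothing keeps a new or rarely used agent from being ruled out after a single failure, which leaves room for exploration before any RL-style policy is introduced.
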
---

## 6. BENCHMARKS & PERFORMANCE CONSIDERATIONS

### 6.1 Latency Estimates

| Pattern | Overhead | Notes |
|---------|----------|-------|
| Direct delegation (current) | ~30s per subagent | Spawn + run + collect |
| ACP transport (Claude Code) | ~2-5s connection + task time | Subprocess handshake |
| API-based (Gemini/Kimi) | ~1-2s + task time | Standard HTTP |
| Selector routing | +1 LLM call (~2-5s) | For agent selection |
| GraphFlow routing | +state overhead (~100ms) | Pre-defined, no LLM call |
| FKG query | ~1-5ms | SQLite indexed query |
| MoA consensus | ~15-30s (4 parallel + 1 aggregator) | Already implemented |

### 6.2 Recommended Configuration

```yaml
# Fleet coordination config (add to config.yaml)
fleet:
  enabled: true
  knowledge_db: "~/.hermes/fleet_knowledge.db"

  agents:
    hermes:
      role: orchestrator
      transport: local
    claude_code:
      role: code_specialist
      transport: acp
      acp_command: "claude"
      acp_args: ["--acp", "--stdio"]
      capabilities: ["code", "debugging", "architecture"]
    gemini:
      role: multimodal_analyst
      transport: api
      provider: openrouter
      model: "google/gemini-3-pro-preview"
      capabilities: ["vision", "analysis", "large_context"]
    kimi:
      role: coding_assistant
      transport: api
      provider: kimi-coding
      capabilities: ["code", "long_context"]

  delegation:
    strategy: selector  # selector | pipeline | graph
    max_concurrent: 3
    timeout_seconds: 300

  consensus:
    enabled: true
    min_agreement: 2  # 2-of-3 for critical decisions
    escalation_agent: hermes

  knowledge:
    auto_extract: true  # Extract entities from task results
    relation_confidence_threshold: 0.7
    search_provider: fts5  # fts5 | vector | hybrid
```

---

## 7. EXISTING HERMES INFRASTRUCTURE TO LEVERAGE

| Component | What It Provides | Reuse For |
|-----------|-----------------|-----------|
| `delegate_tool.py` | Subagent spawning, isolated contexts | Fleet delegation transport |
| `mixture_of_agents_tool.py` | Multi-model consensus/aggregation | Fleet consensus protocol |
| `memory_tool.py` | Bounded persistent memory with atomic writes | Pattern for FKG writes |
| `acp_adapter/` | ACP server for IDE integration | Claude Code transport |
| `hermes_state.py` | SQLite + FTS5 session store | FKG database foundation |
| `tools/registry.py` | Central tool registry | Fleet knowledge tool registration |
| `agent/credential_pool.py` | Credential rotation | Multi-provider auth |
| `hermes_cli/runtime_provider.py` | Provider resolution | Fleet agent connection |

---

## 8. KEY TAKEAWAYS

1. **GraphFlow (AutoGen) is the SOTA orchestration pattern**: graph-based execution with conditional routing beats sequential chains and pure LLM delegation for structured workflows

2. **Three-tier memory is essential**: session state (volatile), knowledge graph (persistent structured), agent memory (persistent per-agent notes)

3. **Typed artifacts over free-form text**: MetaGPT's approach of standardized output contracts reduces inter-agent ambiguity

4. **Hybrid delegation beats any single pattern**: pre-defined DAGs for known workflows, LLM selection for exploratory tasks, handoff for agent-initiated delegation

5. **Critic-in-the-loop is the practical consensus mechanism**: don't implement Byzantine fault tolerance; a dedicated reviewer agent with clear acceptance criteria is sufficient

6. **Our existing infrastructure covers roughly 60% of what's needed**: `delegate_tool.py`, MoA, `memory_tool.py`, the ACP adapter, and the SQLite patterns are solid foundations to build on

7. **The fleet knowledge graph is the differentiator**: none of the six frameworks surveyed provides a shared knowledge graph that persists across agent interactions. Building one gives our fleet a capability the others lack.

---

*Report generated from analysis of CrewAI v1.14.1, AutoGen v0.7.5, CAMEL v0.2.90 (installed locally), plus MetaGPT, ChatDev, and LangGraph documentation.*