SOTA Research: Multi-Agent Coordination & Fleet Knowledge Graphs

Date: 2026-04-14
Scope: Agent-to-agent communication, shared memory, task delegation, consensus protocols
Frameworks Analyzed: CrewAI, AutoGen, MetaGPT, ChatDev, CAMEL


1. Architecture Pattern Summary

1.1 CrewAI — Role-Based Crew Orchestration

Core Pattern: Agents organized into "Crews" with explicit roles, goals, and backstories. Tasks are assigned to agents, executed via sequential or hierarchical process flows.

Agent-to-Agent Communication:

  • Sequential: Agent A completes Task A → output injected into Task B's context for Agent B
  • Hierarchical: Manager agent delegates to worker agents, collects results, synthesizes
  • Context passing: Tasks can declare context: [other_tasks] — outputs from dependent tasks are automatically injected into the current task's prompt
  • No direct agent-to-agent messaging — communication is mediated through task outputs

Shared Memory (v2 — Unified Memory):

  • Memory class with remember() / recall() using vector embeddings (LanceDB/ChromaDB)
  • Scope-based isolation: MemoryScope provides path-based namespacing (/crew/research/agent-foo)
  • Composite scoring: semantic similarity (0.5) + recency (0.3) + importance (0.2)
  • RecallFlow: LLM-driven deep recall with adaptive query expansion
  • Privacy flags: Private memories only visible to the source that created them
  • Background saves: ThreadPoolExecutor with write barrier (drain_writes before recall)

Task Delegation:

  • Agent tools include Delegate Work to Co-worker and Ask Question to Co-worker
  • Delegation creates a new task for another agent, results come back to delegator
  • Depth-limited (no infinite delegation chains)

State & Checkpointing:

  • SqliteProvider / JsonProvider for state checkpoint persistence
  • CheckpointConfig with event-driven persistence
  • Flow state is Pydantic models with serialization

Cache:

  • Thread-safe in-memory tool result cache with RWLock
  • Key: {tool_name}-{input} → cached output

1.2 AutoGen (Microsoft) — Conversation-Centric Teams

Core Pattern: Agents communicate through shared conversation threads. A "Group Chat Manager" controls turn-taking and speaker selection.

Agent-to-Agent Communication:

  • Shared message thread — all agents see all messages (like a group chat)
  • Five team patterns:
    • RoundRobinGroupChat: Fixed order cycling through participants
    • SelectorGroupChat: LLM-based speaker selection with candidate filtering
    • SwarmGroupChat: Handoff-based routing (agent sends HandoffMessage to next agent)
    • GraphFlow (DiGraph): DAG-based execution with conditional edges, parallel fan-out, loops
    • MagenticOneOrchestrator: Ledger-based orchestration with task planning, progress tracking, stall detection
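The DAG-with-conditional-edges idea can be sketched generically. This is in the spirit of GraphFlow, not AutoGen's actual API: each edge carries a predicate over the last output, and fan-out follows every matching edge:

```python
from collections.abc import Callable

Handler = Callable[[str], str]     # a node: consumes a message, emits one
Predicate = Callable[[str], bool]  # an edge guard over the emitted message

class Graph:
    """Toy DAG executor with conditional edges (hypothetical, not AutoGen)."""

    def __init__(self) -> None:
        self.nodes: dict[str, Handler] = {}
        self.edges: dict[str, list[tuple[str, Predicate]]] = {}

    def add_node(self, name: str, handler: Handler) -> "Graph":
        self.nodes[name] = handler
        self.edges.setdefault(name, [])
        return self

    def add_edge(self, src: str, dst: str,
                 when: Predicate = lambda _msg: True) -> "Graph":
        self.edges[src].append((dst, when))
        return self

    def run(self, start: str, message: str) -> list[tuple[str, str]]:
        """Breadth-first execution; returns (node, output) in visit order.
        Edges whose predicate matches all fire, giving parallel fan-out."""
        trace, frontier = [], [(start, message)]
        while frontier:
            node, msg = frontier.pop(0)
            out = self.nodes[node](msg)
            trace.append((node, out))
            frontier.extend((dst, out) for dst, when in self.edges[node]
                            if when(out))
        return trace
```

Note this sketch has no cycle detection or join synchronization; the "activation condition (all vs any)" machinery a real DAG team needs is deliberately left out.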

Shared State:

  • ChatCompletionContext — manages message history per agent (can be unbounded or windowed)
  • ModelContext shared across agents in a team
  • State serialization: save_state() / load_state() for all managers
  • No built-in vector memory — context is purely conversational

Task Delegation:

  • Swarm: Agents use HandoffMessage to explicitly route control
  • GraphFlow: Conditional edges route based on message content (keyword or callable)
  • MagenticOne: Orchestrator maintains a "task ledger" (facts + plan) and dynamically re-plans on stalls

Consensus / Termination:

  • TerminationCondition — composable conditions (text match, max messages, source-based)
  • No explicit consensus protocols — termination is manager-decided

Key Insight: AutoGen's ChatCompletionContext is the closest analog to shared memory, but it's purely sequential message history, not a knowledge base.

1.3 MetaGPT — SOP-Driven Software Teams

Core Pattern: Agents follow Standard Operating Procedures (SOPs). Each agent has a defined role (Product Manager, Architect, Engineer, QA) and produces structured artifacts.

Agent-to-Agent Communication:

  • Publish-Subscribe via Environment: Agents publish "actions" to a shared Environment, subscribers react
  • Structured outputs: Each role produces specific artifact types (PRD, design doc, code, test cases)
  • Message routing: Environment acts as a message bus, filtering by subscriber interest

Shared Memory:

  • Environment class maintains shared state (project workspace)
  • File-based shared memory: agents write/read from a shared filesystem
  • SharedMemory for cross-agent context (structured data, not free-form text)

Task Delegation:

  • Implicit through SOP stages: PM → Architect → Engineer → QA
  • Each agent's output is the next agent's input
  • No dynamic re-delegation

Consensus:

  • Sequential SOP execution (no parallel agents)
  • QA agent can trigger re-work loops back to Engineer

1.4 ChatDev — Chat-Chain Software Development

Core Pattern: Agents follow a "chat chain" — a sequence of chat phases (designing, coding, testing, documenting). Each phase involves a pair of agents (CEO↔CTO, Programmer↔Reviewer, etc.).

Agent-to-Agent Communication:

  • Paired chat sessions: Two agents communicate in each phase (role-play between instructor and assistant)
  • Chain propagation: Phase N's output (code, design doc) becomes Phase N+1's input
  • No broadcast — communication is strictly pairwise within phases

Shared Memory:

  • Software-centric: shared code repository is the "memory"
  • Each phase modifies/inherits the codebase
  • No explicit vector memory or knowledge graph

Task Delegation:

  • Hardcoded phase sequence: Design → Code → Test → Document
  • Each phase delegates to a specific agent pair
  • No dynamic task re-assignment

Consensus:

  • Phase-level termination: when both agents agree the phase is complete
  • "Thought" tokens for chain-of-thought within chat

1.5 CAMEL — Role-Playing & Workforce

Core Pattern: Two primary modes:

  1. RolePlaying: Two-agent conversation with task specification and optional critic
  2. Workforce: Multi-agent with coordinator, task planner, and worker pool

Agent-to-Agent Communication:

  • RolePlaying: Structured turn-taking between assistant and user agents
  • Workforce: Coordinator assigns tasks via TaskChannel, workers return results
  • Worker types: SingleAgentWorker (single ChatAgent), RolePlayingWorker (two-agent pair)

Shared Memory / Task Channel:

  • TaskChannel — async queue-based task dispatch with packet tracking
    • States: SENT → PROCESSING → RETURNED → ARCHIVED
    • O(1) lookup by task ID, status-based filtering, assignee/publisher queues
  • WorkflowMemoryManager — persists workflow patterns as markdown files
    • Role-based organization: workflows stored by role_identifier
    • Agent-based intelligent selection: LLM picks relevant past workflows
    • Versioned: metadata tracks creation time and version numbers

Task Delegation:

  • Coordinator agent decomposes complex tasks using LLM analysis
  • Tasks assigned to workers based on capability matching
  • Failed tasks trigger: retry, create new worker, or further decomposition
  • FailureHandlingConfig with configurable RecoveryStrategy

Consensus / Quality:

  • Quality evaluation via structured output (response format enforced)
  • Task dependencies tracked (worker receives dependency tasks as context)
  • WorkforceMetrics for tracking execution statistics

2. Key Architectural Patterns for Fleet Knowledge Graph

2.1 Communication Topology Patterns

| Pattern | Used By | Description |
|---|---|---|
| Sequential Chain | CrewAI, ChatDev, MetaGPT | A→B→C linear flow, output feeds next |
| Shared Thread | AutoGen | All agents see all messages |
| Publish-Subscribe | MetaGPT | Environment-based message bus |
| Paired Chat | ChatDev, CAMEL | Two-agent conversation pairs |
| Handoff Routing | AutoGen Swarm | Agent explicitly names next speaker |
| DAG Graph | AutoGen GraphFlow | Conditional edges, parallel, loops |
| Ledger Orchestration | AutoGen MagenticOne | Maintains task ledger, re-plans |
| Task Channel | CAMEL | Async queue with packet states |

2.2 Shared State Patterns

| Pattern | Used By | Description |
|---|---|---|
| Vector Memory | CrewAI | Embeddings + scope-based namespacing |
| Message History | AutoGen | Sequential conversation context |
| File System | MetaGPT, ChatDev | Agents read/write shared files |
| Task Channel | CAMEL | Async packet-based task dispatch |
| Workflow Files | CAMEL | Markdown-based workflow memory |
| Tool Cache | CrewAI | In-memory RWLock tool result cache |
| State Checkpoint | CrewAI, AutoGen | Serialized Pydantic/SQLite checkpoints |

2.3 Task Delegation Patterns

| Pattern | Used By | Description |
|---|---|---|
| Role Assignment | CrewAI | Fixed agent per task |
| Manager Delegation | CrewAI Hierarchical | Manager assigns tasks dynamically |
| Speaker Selection | AutoGen | Selector LLM picks next agent |
| Handoff | AutoGen Swarm | Agent explicitly transfers control |
| SOP Routing | MetaGPT | Stage-based implicit delegation |
| Coordinator | CAMEL Workforce | LLM-based task decomposition + assignment |
| Dynamic Worker Creation | CAMEL Workforce | Create new workers on failure |

2.4 Conflict Resolution Patterns

| Pattern | Used By | Description |
|---|---|---|
| Manager Arbitration | CrewAI Hierarchical | Manager resolves conflicts |
| Critic-in-the-loop | CAMEL | Critic agent evaluates and selects |
| Quality Gate | CAMEL Workforce | Structured quality evaluation |
| Termination Conditions | AutoGen | Composable stop conditions |
| Stall Detection | AutoGen MagenticOne | Re-plans when progress stalls |

3. Recommendations for Hermes Fleet Knowledge Graph

3.1 Architecture: Hybrid Graph + Memory

Based on the SOTA analysis, the optimal fleet knowledge graph should combine:

  1. CrewAI's scoped memory for hierarchical knowledge organization

    • Path-based namespaces: /fleet/{fleet_id}/agent/{agent_id}/diary
    • Composite scoring: semantic + recency + importance
    • Background writes with read barriers
  2. CAMEL's TaskChannel for task dispatch and tracking

    • Packet states (SENT → PROCESSING → RETURNED → ARCHIVED)
    • O(1) lookup by task ID
    • Assignee/publisher tracking
  3. AutoGen's DiGraph for execution flow definition

    • DAG with conditional edges for complex workflows
    • Parallel fan-out for independent tasks
    • Activation conditions (all vs any) for synchronization points
  4. AutoGen MagenticOne's ledger for shared task context

    • Maintained facts, plan, and progress ledger
    • Dynamic re-planning on stalls

3.2 Fleet Knowledge Graph Schema

/fleet/{fleet_id}/
  ├── shared/              # Shared knowledge (all agents read)
  │   ├── facts/           # Known facts, constraints
  │   ├── decisions/       # Record of decisions made
  │   └── context/         # Active task context
  ├── agent/{agent_id}/
  │   ├── diary/           # Agent's personal experience log
  │   ├── capabilities/    # What this agent can do
  │   └── state/           # Current task state
  ├── tasks/
  │   ├── {task_id}/       # Task metadata, dependencies, status
  │   └── graph/           # DAG definition for task dependencies
  └── consensus/
      ├── proposals/       # Pending proposals
      └── decisions/       # Resolved consensus decisions

3.3 Key Design Decisions

  1. Diary System (Agent Memory):

    • Each agent writes to its own scoped memory after every significant action
    • LLM-analyzed importance scoring (like CrewAI's unified memory)
    • Cross-agent recall: agents can query other agents' diaries for relevant experiences
    • Decay: old low-importance memories expire
  2. Shared State (Fleet Knowledge):

    • SQLite-backed (like Hermes' existing state.db) with FTS5 search
    • Hierarchical scopes (like CrewAI's MemoryScope)
    • Write-ahead log for concurrent access
    • Read barriers before queries (like CrewAI's drain_writes)
  3. Task Delegation:

    • Coordinator pattern (like CAMEL's Workforce)
    • Task decomposition via LLM
    • Failed task → retry, reassign, or decompose
    • Max depth limit (like Hermes' existing MAX_DEPTH=2)
  4. Consensus Protocol:

    • Proposal-based: agent proposes, others vote/acknowledge
    • Timeout-based fallback: if no response within N seconds, proceed
    • Manager override: designated manager can break ties
    • Simple majority for non-critical, unanimity for critical decisions
  5. Conflict Resolution:

    • Last-write-wins for non-critical state
    • Optimistic locking with version numbers
    • Manager arbitration for task assignment conflicts
    • Quality gates (like CAMEL) for output validation

3.4 Integration with Existing Hermes Architecture

Hermes already has strong foundations:

  • Delegation system (delegate_tool.py): Isolated child agents, parallel execution, depth limits
  • State DB (hermes_state.py): SQLite + FTS5, WAL mode, session tracking, message history
  • Credential pools: Shared credentials with rotation

The fleet knowledge graph should extend these patterns:

  • Session DB → Fleet DB: Add tables for fleet metadata, agent registrations, task graphs
  • Memory tool → Fleet Memory: Scoped vector memory shared across fleet agents
  • Delegate tool → Fleet Delegation: Task channel with persistence, quality evaluation
  • New: Consensus module: Proposal/vote protocol with timeout handling
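The "Session DB → Fleet DB" extension could look roughly like the sketch below: fleet and agent registration tables plus an FTS5 index over shared memories, with a version column for the optimistic-locking scheme from 3.3. All table and column names are illustrative, not Hermes' actual schema, and the FTS table is standalone here (a real build would sync it to `memories` via triggers):

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS fleets (
    fleet_id   TEXT PRIMARY KEY,
    created_at REAL NOT NULL
);
CREATE TABLE IF NOT EXISTS agents (
    agent_id     TEXT PRIMARY KEY,
    fleet_id     TEXT NOT NULL REFERENCES fleets(fleet_id),
    capabilities TEXT
);
CREATE TABLE IF NOT EXISTS memories (
    memory_id  INTEGER PRIMARY KEY,
    scope      TEXT NOT NULL,      -- e.g. a /fleet/... path from 3.2
    body       TEXT NOT NULL,
    importance REAL DEFAULT 0.5,
    version    INTEGER DEFAULT 1   -- optimistic locking, see 3.3
);
CREATE VIRTUAL TABLE IF NOT EXISTS memories_fts USING fts5(scope, body);
"""

def open_fleet_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # concurrent readers, one writer
    conn.executescript(SCHEMA)
    return conn
```

This stays inside the existing SQLite + FTS5 + WAL stack, which is the point: the fleet layer is additive tables, not a new storage engine.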

4. Reference Implementations

| Component | Best Reference | Key Takeaway |
|---|---|---|
| Scoped Memory | CrewAI Memory + MemoryScope | Path-based namespaces, composite scoring, background writes |
| Task Dispatch | CAMEL TaskChannel | Packet-based with state machine, O(1) lookup |
| Execution DAG | AutoGen DiGraphBuilder | Fluent builder, conditional edges, activation groups |
| Orchestration | AutoGen MagenticOneOrchestrator | Ledger-based planning, stall detection, re-planning |
| Agent Communication | AutoGen SelectorGroupChat | LLM-based speaker selection, shared message thread |
| Quality Evaluation | CAMEL Workforce | Structured output for quality scoring |
| Workflow Memory | CAMEL WorkflowMemoryManager | Markdown-based, role-organized, versioned |
| State Checkpoint | CrewAI SqliteProvider | JSONB checkpoints, WAL mode |
| Tool Cache | CrewAI CacheHandler | RWLock-based concurrent tool result cache |

5. Open Questions

  1. Graph vs Vector for knowledge: Should fleet knowledge use a proper graph DB (e.g., Neo4j) or stick with vector + SQLite?

    • Recommendation: Start with SQLite + vectors (existing stack), add graph later if needed
  2. Real-time vs Batch: Should agents receive updates in real-time or batched?

    • Recommendation: Event-driven for critical updates, batched for diary entries
  3. Security model: How should cross-agent access be controlled?

    • Recommendation: Role-based ACLs on scope paths, similar to CrewAI's privacy flags
  4. Scalability: How many agents can a single fleet support?

    • Recommendation: Start with 10-agent fleets, optimize SQLite concurrency first