SOTA Research: Structured Memory Systems for AI Agents
Date: 2026-04-14
Purpose: Inform MemPalace integration for Hermes Agent
1. Landscape Overview
| System | Type | License | Retrieval Method | Storage |
|---|---|---|---|---|
| MemPalace | Local verbatim store | Open Source | ChromaDB vector + metadata filtering (wings/rooms) | ChromaDB + filesystem |
| Mem0 | Managed memory layer | Apache 2.0 | Vector DB + LLM extraction/consolidation | Qdrant/Chroma/Pinecone + graph |
| MemGPT/Letta | OS-inspired memory tiers | MIT | Hierarchical recall (core/recall/archival) | In-context + DB archival |
| Zep | Context engineering platform | Commercial | Temporal knowledge graph (Graphiti) + vector | Graph DB + vector |
| LangMem | Memory toolkit (LangChain) | MIT | LangGraph store (semantic search) | Postgres/in-memory store |
| Engram | CLI binary (Rust) | MIT | Hybrid Gemini Embed + FTS5 + RRF | SQLite FTS5 + embeddings |
2. Benchmark Comparison (LongMemEval)
LongMemEval is the primary academic benchmark for long-term memory retrieval: 500 questions, with 96% of the retrieval corpus consisting of distractor content.
| System | LongMemEval R@5 | LongMemEval R@1 | API Required | Notes |
|---|---|---|---|---|
| MemPalace (raw) | 96.6% | — | None | Zero API calls, pure ChromaDB |
| MemPalace (hybrid+Haiku rerank) | 100% (500/500) | — | Optional | Reranking adds cost |
| MemPalace (AAAK compression) | 84.2% | — | None | Lossy, 12.4pt regression vs raw |
| Engram (hybrid) | 99.0% | 91.0% | Gemini API | R@5 is 0.6pt above the 98.4% MemPalace figure the Engram team cites (see section 7) |
| Engram (+Cohere rerank) | 98.0% | 93.0% | Gemini+Cohere | First 100 Qs only |
| Mem0 | ~85% | — | Yes | On LOCOMO benchmark |
| Zep | ~85% | — | Yes | Cloud service |
| Mastra | 94.87% | — | Yes (GPT) | — |
| Supermemory ASMR | ~99% | — | Yes | — |
LOCOMO Benchmark (Mem0's paper, arXiv:2504.19413)
| Method | Accuracy | Median Search Latency | p95 Search Latency | End-to-End p95 | Tokens/Convo |
|---|---|---|---|---|---|
| Full Context | 72.9% | — | — | 17.12s | ~26,000 |
| Standard RAG | 61.0% | 0.26s | 0.70s | — | — |
| OpenAI Memory | 52.9% | — | — | — | — |
| Mem0 | 66.9% | 0.15s | 0.20s | 1.44s | ~1,800 |
| Mem0ᵍ (graph) | 68.4% | 0.48s | 0.66s | 2.59s | — |
Key Mem0 claims: +26% accuracy over OpenAI Memory, 91% lower p95 latency vs full-context, 90% token savings.
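Checking those claims against the table: (17.12 − 1.44) / 17.12 ≈ 91.6% lower p95 latency, and 1 − 1,800 / 26,000 ≈ 93% fewer tokens, so both claims are consistent with (indeed slightly conservative relative to) the reported numbers.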
3. Retrieval Latency
| System | Reported Latency | Notes |
|---|---|---|
| Mem0 | 0.15s median search, 0.71s median end-to-end | LOCOMO benchmark |
| Zep | <200ms claimed | Cloud service, sub-200ms SLA |
| MemPalace | ~seconds for ChromaDB search | Local, depends on corpus size; raw mode is fast |
| Engram | Fast (Rust binary) | No published latency numbers |
| LangMem | Depends on underlying store | In-memory fast, Postgres slower |
| MemGPT/Letta | Variable by tier | Core (in-context) is instant; archival has DB latency |
Target for Hermes: <100ms is achievable with local ChromaDB + small embedding model (all-MiniLM-L6-v2, ~50MB).
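As a quick local sanity check for that target, a minimal sketch assuming chromadb's default embedding function (an ONNX build of all-MiniLM-L6-v2, matching the model proposed above); the collection name and corpus are illustrative, and the first query is excluded from timing because it loads the model:

```python
# Latency sanity check for local vector search. chromadb's default
# embedding function is an ONNX all-MiniLM-L6-v2, so this exercises
# the same stack proposed for Hermes. Names/corpus are illustrative.
import time
import chromadb

client = chromadb.PersistentClient(path="./mem_bench")  # fully local, no cloud
col = client.get_or_create_collection("hermes_memories")

# Index a small corpus; real numbers depend on corpus size and hardware.
docs = [f"conversation turn {i}: user discussed topic {i % 50}" for i in range(5000)]
col.add(documents=docs, ids=[f"t{i}" for i in range(5000)])

col.query(query_texts=["warmup"], n_results=1)  # load the model before timing
start = time.perf_counter()
col.query(query_texts=["what topics came up recently?"], n_results=5)
print(f"top-5 query: {(time.perf_counter() - start) * 1000:.1f} ms")  # target: <100 ms
```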
4. Compression Techniques
| System | Technique | Compression Ratio | Fidelity Impact |
|---|---|---|---|
| MemPalace AAAK | Lossy abbreviation dialect (entity codes, truncation) | Claimed ~30x (disputed) | 12.4pt R@5 regression (96.6% → 84.2%) |
| Mem0 | LLM extraction → structured facts | ~14x token reduction (26K → 1.8K) | 6pt accuracy loss vs full-context |
| MemGPT | Hierarchical summarization + eviction | Variable | Depends on tier management |
| Zep | Graph compression + temporal invalidation | N/A | Maintains temporal accuracy |
| Engram | None (stores raw) | 1x | No loss |
| LangMem | Background consolidation via LLM | Variable | Depends on LLM quality |
Key insight: MemPalace's raw mode (no compression) achieves the best retrieval scores. Compression trades fidelity for token density. For Hermes, raw storage + semantic search is the safest starting point.
5. Architecture Patterns
MemPalace (recommended for Hermes integration)
- Hierarchical: Wings (scope: global/workspace) → Rooms (priority: explicit/implicit)
- Dual-store: SQLite for canonical data, ChromaDB for vector search
- Verbatim storage: No LLM extraction, raw conversation storage
- Explicit-first ranking: User instructions always surface above auto-extracted context
- Workspace isolation: Memories scoped per project
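A sketch of how this wing/room hierarchy could map onto ChromaDB metadata filters; the field names and values are assumptions for illustration, not MemPalace's actual schema:

```python
# Hypothetical wing/room scoping via ChromaDB metadata filters.
# The "wing"/"room" fields are assumptions, not MemPalace's schema.
import chromadb

col = chromadb.PersistentClient(path="./palace").get_or_create_collection("memories")
col.add(
    documents=["Always run tests before committing."],
    ids=["m1"],
    metadatas=[{"wing": "workspace:hermes", "room": "explicit"}],
)

# Explicit-first ranking: query the explicit room first, then fall back
# to the implicit room scoped to the same workspace wing.
explicit = col.query(
    query_texts=["commit policy"],
    n_results=3,
    where={"$and": [{"wing": "workspace:hermes"}, {"room": "explicit"}]},
)
```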
Mem0 (graph-enhanced)
- Two-phase pipeline: Extraction → Update
- LLM-driven: Uses LLM to extract candidate memories, decide ADD/UPDATE/DELETE/NOOP
- Graph variant (Mem0ᵍ): Entity extraction → relationship graph → conflict detection → temporal updates
- Multi-level: User, Session, Agent state
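A schematic of that two-phase loop; the prompt wording and the `llm` callable are placeholders for illustration, not Mem0's actual API:

```python
# Schematic of the extraction -> update pipeline described above.
# The prompt and `llm` callable are placeholders, not Mem0's real API.
from typing import Callable

def update_memory(candidate: str, existing: list[str], llm: Callable[[str], str]) -> str:
    """Phase 2: decide how a candidate fact changes the memory store."""
    prompt = (
        "Stored memories:\n" + "\n".join(f"- {m}" for m in existing) +
        f"\n\nCandidate fact: {candidate}\n"
        "Answer with exactly one of: ADD, UPDATE, DELETE, NOOP."
    )
    op = llm(prompt).strip().upper()
    return op if op in {"ADD", "UPDATE", "DELETE", "NOOP"} else "NOOP"
```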
Letta/MemGPT (OS-inspired)
- Memory tiers: Core (in-context), Recall (searchable), Archival (deep storage)
- Self-editing: Agent manages its own memory via function calls
- Interrupts: Control flow between agent and user
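A toy rendering of the tiered layout (tier names follow the MemGPT paper's core/recall/archival split; the eviction policy and sizes are illustrative assumptions):

```python
# Toy three-tier memory in the MemGPT style. Eviction policy and
# limits are illustrative assumptions, not Letta's implementation.
from collections import deque

class TieredMemory:
    def __init__(self, core_limit: int = 10):
        self.core: deque[str] = deque(maxlen=core_limit)  # in-context
        self.recall: list[str] = []                       # searchable log
        self.archival: list[str] = []                     # deep storage

    def observe(self, turn: str) -> None:
        if len(self.core) == self.core.maxlen:
            # Evicting from core demotes the oldest turn to recall.
            self.recall.append(self.core[0])
        self.core.append(turn)

    def archive(self, fact: str) -> None:
        # Self-editing: the agent calls this as a tool to persist facts.
        self.archival.append(fact)
```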
Zep (knowledge graph)
- Temporal knowledge graph: Facts have valid_at/invalid_at timestamps
- Graph RAG: Relationship-aware retrieval
- Powered by Graphiti: Open-source temporal KG framework
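A minimal sketch of temporal invalidation in this style, assuming the valid_at/invalid_at fields described above; the triple representation is illustrative:

```python
# Temporal fact invalidation in the Graphiti style: facts are never
# deleted, only closed out, so point-in-time queries stay answerable.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_at: datetime
    invalid_at: datetime | None = None  # None = still current

def assert_fact(graph: list[Fact], new: Fact) -> None:
    for f in graph:
        if (f.subject, f.predicate) == (new.subject, new.predicate) and f.invalid_at is None:
            f.invalid_at = new.valid_at  # supersede, don't delete
    graph.append(new)
```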
6. Integration Patterns for Hermes
Current Hermes Memory (memory_tool.py)
- File-backed: MEMORY.md + USER.md
- Delimiter-based entries (§)
- Frozen snapshot in system prompt
- No semantic search
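For reference, a sketch of how such a file might be parsed; beyond the § delimiter named above, the entry format is an assumption:

```python
# Parse §-delimited entries out of MEMORY.md. Only the § delimiter is
# documented above; treating each delimited chunk as one entry is an
# assumption for illustration.
from pathlib import Path

def load_entries(path: str = "MEMORY.md") -> list[str]:
    text = Path(path).read_text(encoding="utf-8")
    return [chunk.strip() for chunk in text.split("§") if chunk.strip()]
```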
MemPalace Plugin (hermes_memorypalace)
- Implements MemoryProviderABC
- ChromaDB + SQLite dual-store
- Lifecycle hooks: initialize, system_prompt_block, prefetch, sync_turn
- Tools: mempalace_remember_explicit, mempalace_store_implicit, mempalace_recall
- Local embedding model (all-MiniLM-L6-v2)
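A skeleton of the plugin surface implied by those hooks; MemoryProviderABC's real signatures are not shown in this report, so the ones below are assumptions:

```python
# Skeleton matching the lifecycle hooks named above. MemoryProviderABC's
# actual signatures are not documented here, so these are assumed.
class MemPalaceProvider:  # would subclass MemoryProviderABC
    def initialize(self, workspace: str) -> None:
        """Open SQLite + ChromaDB stores scoped to the workspace wing."""

    def system_prompt_block(self) -> str:
        """Return the memory block injected into the system prompt."""
        return ""

    def prefetch(self, user_message: str) -> list[str]:
        """Vector-search relevant memories before response generation."""
        return []

    def sync_turn(self, user_message: str, assistant_message: str) -> None:
        """Store the completed turn as implicit context in the background."""
```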
Recommended Integration Approach
- Keep MEMORY.md/USER.md as L0 (always-loaded baseline)
- Add MemPalace as L1 (semantic search layer)
- Prefetch on each turn: Run vector search before response generation
- Background sync: Store conversation turns as implicit context
- Workspace scoping: Isolate memories per project
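Putting the layers together, a hedged sketch of per-turn context assembly that composes the load_entries and MemPalaceProvider sketches above; the ordering mirrors MemPalace's explicit-first ranking rule:

```python
# Per-turn context assembly for the L0/L1 layering above. Composes the
# load_entries() and MemPalaceProvider sketches from earlier sections.
def build_context(provider: "MemPalaceProvider", user_message: str) -> str:
    l0 = load_entries("MEMORY.md") + load_entries("USER.md")  # always-loaded baseline
    l1 = provider.prefetch(user_message)                      # semantic search layer
    # Explicit-first: the L0 baseline precedes retrieved implicit context.
    return "\n".join(l0 + l1)
```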
7. Critical Caveats
- Retrieval ≠ answer accuracy: The Engram team showed that an R@5 of 98.4% (MemPalace) can yield only 17% correct answers once an LLM actually attempts the questions. The retrieval-to-accuracy gap is the real bottleneck.
- MemPalace's 96.6% is retrieval-only: It is not end-to-end QA accuracy; end-to-end numbers are much lower (~17-40%, depending on question difficulty).
- AAAK compression is lossy: 12.4pt R@5 regression. Use raw mode for accuracy-critical work.
- Mem0's LOCOMO numbers are from a different benchmark: They are not directly comparable to LongMemEval scores.
- Latency depends heavily on corpus size and hardware: Local ChromaDB on an M2 Ultra runs fast; older hardware may not meet <100ms targets.
8. Recommendations for Hermes MemPalace Integration
| Metric | Target | Achievable? | Approach |
|---|---|---|---|
| Retrieval latency | <100ms | Yes | Local ChromaDB + small model, pre-indexed |
| Retrieval accuracy (R@5) | >95% | Yes | Raw verbatim mode, no compression |
| Token efficiency | <2000 tokens/convo | Yes | Selective retrieval, not full-context |
| Workspace isolation | Per-project | Yes | Wing-based scoping |
| Zero cloud dependency | 100% local | Yes | all-MiniLM-L6-v2 runs offline |
Priority: Integrate the existing hermes_memorypalace plugin in raw mode. Defer AAAK compression. Focus on retrieval latency and explicit-first ranking.
Sources
- Mem0 paper: arXiv:2504.19413
- MemGPT paper: arXiv:2310.08560
- MemPalace repo: github.com/MemPalace/mempalace
- Engram benchmarks: github.com/199-biotechnologies/engram-2
- Hermes MemPalace plugin: github.com/neilharding/hermes_memorypalace
- LOCOMO benchmark results from mem0.ai/research
- LongMemEval: huggingface.co/datasets/xiaowu0162/longmemeval-cleaned