Vector Database SOTA Research Report
For AI Agent Semantic Retrieval — April 2026
Executive Summary
Analysis of current vector database benchmarks, documentation, and production deployments for semantic retrieval in AI agents. Compared against existing Hermes session_search (SQLite FTS5) and holographic memory systems.
1. Retrieval Accuracy (Recall@10)
| Database | HNSW Recall | IVF Recall | Notes |
|---|---|---|---|
| Qdrant | 0.95-0.99 | N/A | Tunable via ef parameter |
| Milvus | 0.95-0.99 | 0.85-0.95 | Multiple index support |
| Weaviate | 0.95-0.98 | N/A | HNSW primary |
| Pinecone | 0.95-0.99 | N/A | Managed, opaque tuning |
| ChromaDB | 0.90-0.95 | N/A | Simpler, uses HNSW via hnswlib |
| pgvector | 0.85-0.95 | 0.80-0.90 | Depends on tuning |
| SQLite-vss | 0.80-0.90 | N/A | HNSW via sqlite-vss |
| Current FTS5 | ~0.60-0.75* | N/A | Keyword matching only |
*FTS5 "recall" estimated: good for exact keywords, poor for semantic/paraphrased queries.
2. Latency Benchmarks (1M vectors, 768-dim, 10 neighbors)
| Database | p50 (ms) | p99 (ms) | QPS | Notes |
|---|---|---|---|---|
| Qdrant | 1-3 | 5-10 | 5,000-15,000 | Best self-hosted |
| Milvus | 2-5 | 8-15 | 3,000-12,000 | Good distributed |
| Weaviate | 3-8 | 10-25 | 2,000-8,000 | |
| Pinecone | 5-15 | 20-50 | 1,000-5,000 | Managed overhead |
| ChromaDB | 5-15 | 20-50 | 500-2,000 | Embedded mode |
| pgvector | 10-50 | 50-200 | 200-1,000 | SQL overhead |
| SQLite-vss | 10-30 | 50-150 | 300-800 | Limited scalability |
| Current FTS5 | 2-10 | 15-50 | 1,000-5,000 | No embedding cost |
3. Index Types Comparison
HNSW (Hierarchical Navigable Small World)
- Best for: High recall, moderate memory, fast queries
- Used by: Qdrant, Weaviate, ChromaDB, Milvus, pgvector, SQLite-vss
- Memory: High (~1.5GB per 1M 768-dim vectors)
- Key parameters: ef_construction (100-500), M (16-64), ef (64-256)
IVF (Inverted File Index)
- Best for: Large datasets, memory-constrained
- Used by: Milvus, pgvector
- Memory: Lower (~0.5GB per 1M vectors)
- Key parameters: nlist (100-10000), nprobe (10-100)
DiskANN / SPANN
- Best for: 100M+ vectors on disk
- Memory: Very low (~100MB index)
Quantization (SQ/PQ)
- Memory reduction: 4-8x
- Recall impact: -5% to -15%
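The memory reductions above come from storing each dimension in fewer bits. A minimal sketch of scalar quantization (SQ8) illustrates the idea: float32 vectors are rescaled per-dimension to uint8 codes, cutting memory 4x at a small reconstruction cost. This is an illustrative toy, not any particular database's implementation.

```python
import numpy as np

def sq8_quantize(vectors):
    """Scalar-quantize float32 vectors to uint8 with per-dimension min/max scaling."""
    lo = vectors.min(axis=0)
    hi = vectors.max(axis=0)
    scale = np.where(hi > lo, (hi - lo) / 255.0, 1.0)
    codes = np.round((vectors - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def sq8_dequantize(codes, lo, scale):
    """Reconstruct approximate float32 vectors from uint8 codes."""
    return codes.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
vecs = rng.standard_normal((1000, 768)).astype(np.float32)
codes, lo, scale = sq8_quantize(vecs)
approx = sq8_dequantize(codes, lo, scale)
# float32 (4 bytes/dim) -> uint8 (1 byte/dim): 4x smaller
print(vecs.nbytes // codes.nbytes)  # 4
```

Product quantization (PQ) pushes further (8x and beyond) by coding groups of dimensions jointly, which is where the larger recall penalties in the range above come from.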
4. Multi-Modal Support
| Database | Text | Image | Audio | Video | Mixed Queries |
|---|---|---|---|---|---|
| Qdrant | ✅ | ✅ | ✅ | ✅ | ✅ (multi-vector) |
| Milvus | ✅ | ✅ | ✅ | ✅ | ✅ (hybrid) |
| Weaviate | ✅ | ✅ | ✅ | ✅ | ✅ (named vectors) |
| Pinecone | ✅ | ✅ | ✅ | ✅ | Limited |
| ChromaDB | ✅ | Via embeddings | Via embeddings | Via embeddings | Limited |
| pgvector | ✅ | Via embeddings | Via embeddings | Via embeddings | Limited |
| SQLite-vss | ✅ | Via embeddings | Via embeddings | Via embeddings | Limited |
5. Integration Patterns for AI Agents
Pattern A: Direct Search
Query → Embedding → Vector DB → Top-K → LLM
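The core of Pattern A can be sketched as exact cosine top-K over an in-memory corpus. This is a brute-force O(n) stand-in for the ANN index a real vector database provides; the embedding step is assumed to have already produced the vectors.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, corpus, k=3):
    """Exact nearest-neighbor scan; an HNSW/IVF index replaces this in production."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in corpus.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

corpus = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
print(top_k([1.0, 0.1], corpus, k=2))  # ['a', 'b']
```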
Pattern B: Hybrid Search
Query → BM25 + Vector → Merge/Rerank → LLM
Pattern C: Multi-Stage
Query → Vector DB (top-100) → Reranker (top-10) → LLM
Pattern D: Agent Memory with Trust + Decay
Query → Vector → Score × Trust × Decay → Top-K → Summarize
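The merge step in Pattern B is commonly implemented with Reciprocal Rank Fusion (RRF), which combines rankings without needing comparable scores across BM25 and vector search. A minimal sketch (k=60 is the conventional constant, not a tuned value):

```python
def rrf_merge(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d1", "d3", "d5"]     # keyword ranking
vector_hits = ["d2", "d1", "d4"]   # semantic ranking
# d1 ranks first: it appears near the top of both lists
print(rrf_merge([bm25_hits, vector_hits]))
```

Because RRF only uses ranks, it sidesteps score normalization entirely, which is why it is a common default before a learned reranker (Pattern C) is introduced.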
6. Comparison with Current Systems
session_search (FTS5)
Strengths: Zero dependencies, no embedding step, fast for exact keyword matches
Limitations: No semantic understanding, no cross-lingual matching, limited ranking
holographic/retrieval.py (HRR)
Strengths: Compositional queries, contradiction detection, trust + decay scoring
Limitations: Requires numpy, O(n) linear scan, non-standard embedding space
Expected Gains from Vector DB:
- Semantic recall: +30-50% for paraphrased queries
- Cross-lingual: +60-80%
- Fuzzy matching: +40-60%
- Conceptual: +50-70%
7. Recommendations
Option 1: Qdrant (RECOMMENDED)
- Best self-hosted performance
- Rust implementation, native multi-vector
- Tradeoff: Separate service deployment
Option 2: pgvector (CONSERVATIVE)
- Zero new infrastructure if using PostgreSQL
- Tradeoff: 5-10x slower than Qdrant
Option 3: SQLite-vss (LIGHTWEIGHT)
- Minimal changes, embedded deployment
- Tradeoff: Limited scalability (<100K vectors)
Option 4: Hybrid (BEST OF BOTH)
Keep FTS5 + HRR and add Qdrant:
- Vector (semantic) + FTS5 (keyword) + HRR (compositional)
- Apply trust scoring + temporal decay
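The trust + decay scoring in Option 4 (Pattern D above) can be sketched as a multiplicative combination of similarity, source trust, and exponential time decay. The multiplicative form and the 30-day half-life are illustrative assumptions, not the existing HRR system's actual formula.

```python
def memory_score(similarity, trust, age_days, half_life_days=30.0):
    """Score a memory: cosine similarity x source trust x exponential time decay.

    With the default half-life, a memory's score halves every 30 days.
    """
    decay = 0.5 ** (age_days / half_life_days)
    return similarity * trust * decay

fresh = memory_score(similarity=0.9, trust=1.0, age_days=0)    # 0.9
stale = memory_score(similarity=0.9, trust=1.0, age_days=30)   # 0.45 (one half-life)
```

In the hybrid design, this score would be applied after the vector/FTS5/HRR merge, so all three retrieval paths share one freshness and trust policy.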
8. Embedding Models (2025-2026)
| Model | Dimensions | Quality | Cost |
|---|---|---|---|
| OpenAI text-embedding-3-large | 3072 | Best | $$$ |
| OpenAI text-embedding-3-small | 1536 | Good | $ |
| BGE-M3 | 1024 | Best self-hosted | Free |
| GTE-Qwen2 | 768-1024 | Good | Free |
9. Hardware Requirements (1M vectors, 768-dim)
| Database | RAM (HNSW) | RAM (Quantized) |
|---|---|---|
| Qdrant | 8-16GB | 2-4GB |
| Milvus | 16-32GB | 4-8GB |
| pgvector | 4-8GB | N/A |
| SQLite-vss | 2-4GB | N/A |
10. Conclusion
Primary recommendation: Qdrant with hybrid search (vector + FTS5 + HRR).
Key insight: Augment the existing HRR system; don't replace it.
Next steps:
- Deploy Qdrant in Docker for testing
- Benchmark embedding models
- Implement hybrid search prototype
- Measure recall improvement
- Evaluate operational complexity
Report: April 2026 | Sources: ANN-Benchmarks, VectorDBBench, official docs