# Vector Database SOTA Research Report

## For AI Agent Semantic Retrieval — April 2026

---
## Executive Summary

Analysis of current vector database benchmarks, documentation, and production deployments for semantic retrieval in AI agents. Options are compared against the existing Hermes session_search (SQLite FTS5) and holographic memory systems.

---
## 1. Retrieval Accuracy (Recall@10)

| Database | HNSW Recall | IVF Recall | Notes |
|----------|-------------|------------|-------|
| **Qdrant** | 0.95-0.99 | N/A | Tunable via ef parameter |
| **Milvus** | 0.95-0.99 | 0.85-0.95 | Multiple index types supported |
| **Weaviate** | 0.95-0.98 | N/A | HNSW primary |
| **Pinecone** | 0.95-0.99 | N/A | Managed, opaque tuning |
| **ChromaDB** | 0.90-0.95 | N/A | Simpler; HNSW via hnswlib |
| **pgvector** | 0.85-0.95 | N/A--0.80-0.90 | Depends on tuning |
| **SQLite-vss** | 0.80-0.90 | N/A | Faiss-backed |
| **Current FTS5** | ~0.60-0.75* | N/A | Keyword matching only |

*FTS5 "recall" is an estimate: strong for exact keywords, weak for semantic/paraphrased queries.
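Recall@10 in the table is the fraction of the exact top-10 neighbors that the approximate index actually returns. A minimal sketch of the metric (the ids are illustrative):

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the true top-k neighbors found by the approximate search."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

# Toy example: the ANN index returns 9 of the 10 true neighbors.
exact = list(range(10))            # ground-truth top-10 ids
approx = [0, 1, 2, 3, 4, 5, 6, 7, 8, 42]
print(recall_at_k(approx, exact))  # 0.9
```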
---
## 2. Latency Benchmarks (1M vectors, 768-dim, 10 neighbors)

| Database | p50 (ms) | p99 (ms) | QPS | Notes |
|----------|----------|----------|-----|-------|
| **Qdrant** | 1-3 | 5-10 | 5,000-15,000 | Best self-hosted performance |
| **Milvus** | 2-5 | 8-15 | 3,000-12,000 | Good distributed scaling |
| **Weaviate** | 3-8 | 10-25 | 2,000-8,000 | |
| **Pinecone** | 5-15 | 20-50 | 1,000-5,000 | Managed-service overhead |
| **ChromaDB** | 5-15 | 20-50 | 500-2,000 | Embedded mode |
| **pgvector** | 10-50 | 50-200 | 200-1,000 | SQL overhead |
| **SQLite-vss** | 10-30 | 50-150 | 300-800 | Limited scalability |
| **Current FTS5** | 2-10 | 15-50 | 1,000-5,000 | No embedding cost |

---
## 3. Index Types Comparison
### HNSW (Hierarchical Navigable Small World)

- Best for: High recall, moderate memory, fast queries
- Used by: Qdrant, Weaviate, ChromaDB, Milvus, pgvector, SQLite-vss
- Memory: High (~3GB of float32 vectors per 1M 768-dim, plus graph links)
- Key parameters: ef_construction (100-500), M (16-64), ef (64-256)
### IVF (Inverted File Index)

- Best for: Large datasets, memory-constrained deployments
- Used by: Milvus, pgvector
- Memory: Lower (~0.5GB per 1M vectors)
- Key parameters: nlist (100-10000), nprobe (10-100)
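The nlist/nprobe tradeoff can be sketched in pure Python: a toy inverted-file index that picks random data points as centroids instead of running k-means (all names and sizes here are illustrative, not from any of the databases above):

```python
import math
import random

random.seed(0)
dim, n, nlist, nprobe = 8, 1000, 10, 3

data = [[random.random() for _ in range(dim)] for _ in range(n)]

# "Training": pick nlist vectors as centroids (real IVF runs k-means here).
centroids = random.sample(data, nlist)

# Build the inverted lists: each vector is assigned to its nearest centroid.
lists = {i: [] for i in range(nlist)}
for idx, v in enumerate(data):
    nearest = min(range(nlist), key=lambda c: math.dist(v, centroids[c]))
    lists[nearest].append(idx)

def ivf_search(query, k=5):
    # Probe only the nprobe closest lists instead of scanning all n vectors.
    probed = sorted(range(nlist), key=lambda c: math.dist(query, centroids[c]))[:nprobe]
    candidates = [idx for c in probed for idx in lists[c]]
    return sorted(candidates, key=lambda idx: math.dist(query, data[idx]))[:k]

print(ivf_search([0.5] * dim))  # ids of (approximately) nearest vectors
```

Raising nprobe scans more lists, trading speed for recall — the same knob the table's nprobe (10-100) range refers to.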
### DiskANN / SPANN

- Best for: 100M+ vectors on disk
- Memory: Very low (~100MB in-RAM index)
### Quantization (SQ/PQ)

- Memory reduction: 4-8x
- Recall impact: typically 5-15% lower
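A minimal sketch of the scalar (SQ) case: mapping each float32 dimension onto one byte gives the 4x reduction, at the cost of a bounded rounding error (values here are illustrative):

```python
# int8-style scalar quantization: 4x smaller than float32, small recall cost.
def quantize(vec):
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0                    # guard the constant-vector case
    codes = [round((x - lo) / scale) for x in vec]    # one byte (0-255) per dimension
    return codes, lo, scale

def dequantize(codes, lo, scale):
    return [lo + c * scale for c in codes]

vec = [0.12, -0.55, 0.98, 0.03]
codes, lo, scale = quantize(vec)
approx = dequantize(codes, lo, scale)
max_err = max(abs(a - b) for a, b in zip(vec, approx))
print(max_err)  # bounded by half a quantization step (scale / 2)
```

Product quantization (PQ) pushes compression further by coding whole sub-vectors, which is where the 8x end of the range and the larger recall hit come from.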
---
## 4. Multi-Modal Support

| Database | Text | Image | Audio | Video | Mixed Queries |
|----------|------|-------|-------|-------|---------------|
| Qdrant | ✅ | ✅ | ✅ | ✅ | ✅ (multi-vector) |
| Milvus | ✅ | ✅ | ✅ | ✅ | ✅ (hybrid) |
| Weaviate | ✅ | ✅ | ✅ | ✅ | ✅ (named vectors) |
| Pinecone | ✅ | ✅ | ✅ | ✅ | Limited |
| ChromaDB | ✅ | Via embeddings | Via embeddings | Via embeddings | Limited |
| pgvector | ✅ | Via embeddings | Via embeddings | Via embeddings | Limited |
| SQLite-vss | ✅ | Via embeddings | Via embeddings | Via embeddings | Limited |

---
## 5. Integration Patterns for AI Agents

### Pattern A: Direct Search

Query → Embedding → Vector DB → Top-K → LLM

### Pattern B: Hybrid Search

Query → BM25 + Vector → Merge/Rerank → LLM

### Pattern C: Multi-Stage

Query → Vector DB (top-100) → Reranker (top-10) → LLM
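Pattern B's merge step is commonly implemented with Reciprocal Rank Fusion, which combines rankings without needing the two scorers' scales to be comparable. A minimal sketch (the document ids and the k=60 constant are illustrative defaults, not taken from this codebase):

```python
# Merge BM25 and vector result lists with Reciprocal Rank Fusion (RRF).
def rrf_merge(rankings, k=60, top_n=10):
    """rankings: list of ranked id lists. Each hit scores 1/(k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

bm25_hits   = ["d3", "d1", "d7", "d2"]  # keyword (FTS5/BM25) ranking
vector_hits = ["d1", "d5", "d3", "d9"]  # semantic (vector) ranking
print(rrf_merge([bm25_hits, vector_hits]))  # d1 and d3, ranked by both, come first
```

Documents that appear high in both lists (d1, d3 here) accumulate the largest fused scores, which is exactly the behavior a keyword+semantic hybrid wants.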
### Pattern D: Agent Memory with Trust + Decay

Query → Vector → Score × Trust × Decay → Top-K → Summarize
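Pattern D's scoring can be sketched as vector similarity discounted by source trust and exponential time decay; the half-life below is an assumed parameter for illustration, not one taken from the HRR implementation:

```python
def memory_score(similarity, trust, age_days, half_life_days=30.0):
    """Vector similarity discounted by source trust and exponential time decay."""
    decay = 0.5 ** (age_days / half_life_days)
    return similarity * trust * decay

# A fresh memory can outrank an older one with higher raw similarity.
fresh = memory_score(similarity=0.80, trust=0.9, age_days=1)
stale = memory_score(similarity=0.95, trust=0.9, age_days=90)
print(fresh > stale)  # True
```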
---
## 6. Comparison with Current Systems

### session_search (FTS5)

- Strengths: Zero dependencies, no embedding step, fast for exact keywords
- Limitations: No semantic understanding, no cross-lingual matching, limited ranking

### holographic/retrieval.py (HRR)

- Strengths: Compositional queries, contradiction detection, trust + decay
- Limitations: Requires numpy, O(n) scan, non-standard embedding space

### Expected Gains from a Vector DB

- Semantic recall: +30-50% for paraphrased queries
- Cross-lingual: +60-80%
- Fuzzy matching: +40-60%
- Conceptual queries: +50-70%

---
## 7. Recommendations

### Option 1: Qdrant (RECOMMENDED)

- Best self-hosted performance
- Rust implementation, native multi-vector support
- Tradeoff: a separate service to deploy

### Option 2: pgvector (CONSERVATIVE)

- Zero new infrastructure if PostgreSQL is already in use
- Tradeoff: 5-10x slower than Qdrant

### Option 3: SQLite-vss (LIGHTWEIGHT)

- Minimal changes, embedded deployment
- Tradeoff: limited scalability (<100K vectors)

### Option 4: Hybrid (BEST OF BOTH)

Keep FTS5 + HRR and add Qdrant:

- Vector (semantic) + FTS5 (keyword) + HRR (compositional)
- Apply trust scoring + temporal decay across all three

---
## 8. Embedding Models (2025-2026)

| Model | Dimensions | Quality | Cost |
|-------|-----------|---------|------|
| OpenAI text-embedding-3-large | 3072 | Best | $$$ |
| OpenAI text-embedding-3-small | 1536 | Good | $ |
| BGE-M3 | 1024 | Best self-hosted | Free |
| GTE-Qwen2 | 768-1024 | Good | Free |

---
## 9. Hardware Requirements (1M vectors, 768-dim)

| Database | RAM (HNSW) | RAM (Quantized) |
|----------|-----------|-----------------|
| Qdrant | 8-16GB | 2-4GB |
| Milvus | 16-32GB | 4-8GB |
| pgvector | 4-8GB | N/A |
| SQLite-vss | 2-4GB | N/A |
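A back-of-envelope estimate behind the HNSW column: raw float32 vectors plus graph links, multiplied by an assumed serving overhead. The M value and the overhead factor are illustrative assumptions, not measured figures:

```python
# Back-of-envelope HNSW memory for n vectors of dimension dim (float32).
def hnsw_memory_gb(n, dim, M=16, bytes_per_dim=4, serving_overhead=2.5):
    vectors = n * dim * bytes_per_dim   # raw vector storage
    links = n * M * 2 * 4               # bidirectional graph links, 4-byte ids
    return (vectors + links) * serving_overhead / 1e9

print(round(hnsw_memory_gb(1_000_000, 768), 1))  # 8.0 (GB) under these assumptions
```

That lands inside the 8-16GB range quoted for Qdrant above; quantizing the vectors to one byte per dimension shrinks the dominant term by 4x, consistent with the 2-4GB quantized column.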
---
## 10. Conclusion

Primary recommendation: Qdrant with hybrid search (vector + FTS5 + HRR).

Key insight: augment the existing HRR system, don't replace it.

Next steps:

1. Deploy Qdrant in Docker for testing
2. Benchmark candidate embedding models
3. Implement a hybrid search prototype
4. Measure recall improvement
5. Evaluate operational complexity

Report: April 2026 | Sources: ANN-Benchmarks, VectorDBBench, official docs