# Vector Database SOTA Research Report

## For AI Agent Semantic Retrieval — April 2026

---
## Executive Summary

This report analyzes current vector database benchmarks, documentation, and production deployments for semantic retrieval in AI agents, comparing the candidates against the existing Hermes session_search (SQLite FTS5) and holographic memory systems.

---
## 1. Retrieval Accuracy (Recall@10)

| Database | HNSW Recall | IVF Recall | Notes |
|----------|-------------|------------|-------|
| **Qdrant** | 0.95-0.99 | N/A | Tunable via the `ef` parameter |
| **Milvus** | 0.95-0.99 | 0.85-0.95 | Multiple index types supported |
| **Weaviate** | 0.95-0.98 | N/A | HNSW primary |
| **Pinecone** | 0.95-0.99 | N/A | Managed; opaque tuning |
| **ChromaDB** | 0.90-0.95 | N/A | Simpler; HNSW via hnswlib |
| **pgvector** | 0.85-0.95 | 0.80-0.90 | Depends on tuning |
| **SQLite-vss** | 0.80-0.90 | N/A | HNSW via sqlite-vss |
| **Current FTS5** | ~0.60-0.75\* | N/A | Keyword matching only |

\*FTS5 "recall" is estimated: good for exact keywords, poor for semantic or paraphrased queries.

---
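Recall@10 is the fraction of the true (brute-force) nearest neighbors that the approximate index returns in its top 10. A minimal sketch of the metric; the document IDs are illustrative:

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of the true top-k neighbors present in the retrieved top-k."""
    return len(set(retrieved[:k]) & set(relevant[:k])) / min(k, len(relevant))

# Ground truth from exact search vs. results from the ANN index:
ground_truth = [3, 17, 42, 5, 99, 8, 61, 23, 74, 11]
ann_results = [3, 17, 42, 5, 99, 8, 61, 23, 74, 50]  # one true neighbor missed

print(recall_at_k(ann_results, ground_truth))  # 0.9
```

This is the metric ANN-Benchmarks reports; the FTS5 row cannot be measured this way directly, hence the starred estimate.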
## 2. Latency Benchmarks (1M vectors, 768-dim, 10 neighbors)

| Database | p50 (ms) | p99 (ms) | QPS | Notes |
|----------|----------|----------|-----|-------|
| **Qdrant** | 1-3 | 5-10 | 5,000-15,000 | Best self-hosted |
| **Milvus** | 2-5 | 8-15 | 3,000-12,000 | Good distributed performance |
| **Weaviate** | 3-8 | 10-25 | 2,000-8,000 | |
| **Pinecone** | 5-15 | 20-50 | 1,000-5,000 | Managed-service overhead |
| **ChromaDB** | 5-15 | 20-50 | 500-2,000 | Embedded mode |
| **pgvector** | 10-50 | 50-200 | 200-1,000 | SQL overhead |
| **SQLite-vss** | 10-30 | 50-150 | 300-800 | Limited scalability |
| **Current FTS5** | 2-10 | 15-50 | 1,000-5,000 | No embedding cost |

---
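When reproducing these numbers, p50/p99 are percentiles over per-query wall-clock latencies. A minimal nearest-rank sketch; the latency values are placeholders, not measurements:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the value at rank ceil(p/100 * n)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Placeholder per-query latencies in milliseconds; in a real benchmark,
# wall-clock each query against the live index instead.
latencies_ms = [1.2, 1.5, 2.0, 2.2, 2.8, 3.1, 3.5, 4.0, 6.5, 9.8]
print(percentile(latencies_ms, 50), percentile(latencies_ms, 99))  # 2.8 9.8
```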
## 3. Index Types Comparison

### HNSW (Hierarchical Navigable Small World)
- Best for: High recall, moderate memory, fast queries
- Used by: Qdrant, Weaviate, ChromaDB, Milvus, pgvector, SQLite-vss
- Memory: High (~3GB per 1M 768-dim float32 vectors, plus graph links)
- Key parameters: ef_construction (100-500), M (16-64), ef (64-256)

### IVF (Inverted File Index)
- Best for: Large datasets, memory-constrained deployments
- Used by: Milvus, pgvector
- Memory: Lower (~0.5GB per 1M vectors)
- Key parameters: nlist (100-10000), nprobe (10-100)

### DiskANN / SPANN
- Best for: 100M+ vectors served from disk
- Memory: Very low (~100MB in-memory index)

### Quantization (SQ/PQ)
- Memory reduction: 4-8x
- Recall impact: −5 to −15%

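The 4-8x figure follows from storing each dimension in fewer bytes. A minimal scalar-quantization sketch (int8, the 4x case; the vector values are illustrative):

```python
from array import array

def scalar_quantize(vec):
    """Symmetric scalar quantization: float32 components -> int8."""
    scale = max(abs(v) for v in vec) / 127 or 1.0  # guard all-zero vectors
    return array('b', (round(v / scale) for v in vec)), scale

original = array('f', [0.12, -0.98, 0.45, 0.07])
quantized, scale = scalar_quantize(original)

# 4 bytes/dim (float32) vs. 1 byte/dim (int8) -> 4x smaller
print(len(original) * original.itemsize, len(quantized) * quantized.itemsize)  # 16 4
```

Product quantization (PQ) pushes further (the 8x end of the range) by encoding subvectors against learned codebooks, at a larger recall cost.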
---

## 4. Multi-Modal Support

| Database | Text | Image | Audio | Video | Mixed Queries |
|----------|------|-------|-------|-------|---------------|
| Qdrant | ✅ | ✅ | ✅ | ✅ | ✅ (multi-vector) |
| Milvus | ✅ | ✅ | ✅ | ✅ | ✅ (hybrid) |
| Weaviate | ✅ | ✅ | ✅ | ✅ | ✅ (named vectors) |
| Pinecone | ✅ | ✅ | ✅ | ✅ | Limited |
| ChromaDB | ✅ | Via embeddings | Via embeddings | Via embeddings | Limited |
| pgvector | ✅ | Via embeddings | Via embeddings | Via embeddings | Limited |
| SQLite-vss | ✅ | Via embeddings | Via embeddings | Via embeddings | Limited |

---
## 5. Integration Patterns for AI Agents

### Pattern A: Direct Search
Query → Embedding → Vector DB → Top-K → LLM

### Pattern B: Hybrid Search
Query → BM25 + Vector → Merge/Rerank → LLM

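Pattern B's merge step is commonly implemented with Reciprocal Rank Fusion (RRF). A minimal sketch with illustrative document IDs; k=60 is the conventional RRF constant:

```python
def rrf_merge(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1/(k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc2", "doc7", "doc1"]    # keyword (FTS5/BM25) ranking
vector_hits = ["doc1", "doc2", "doc9"]  # semantic (vector) ranking

print(rrf_merge([bm25_hits, vector_hits]))  # ['doc2', 'doc1', 'doc7', 'doc9']
```

RRF needs no score normalization across the two retrievers, which is why it is a common default for fusing BM25 with vector results.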
### Pattern C: Multi-Stage
Query → Vector DB (top-100) → Reranker (top-10) → LLM

### Pattern D: Agent Memory with Trust + Decay
Query → Vector → Score × Trust × Decay → Top-K → Summarize

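Pattern D's scoring can be sketched as a product of similarity, trust, and time decay, in the spirit of the existing HRR trust + decay machinery. The half-life and field names below are assumptions for illustration, not the current implementation:

```python
def memory_score(similarity, trust, age_days, half_life_days=30.0):
    """Vector similarity scaled by per-memory trust and exponential recency decay."""
    return similarity * trust * 0.5 ** (age_days / half_life_days)

# Illustrative memories; the 30-day half-life is an assumed tuning knob.
memories = [
    {"id": "m1", "sim": 0.90, "trust": 0.5, "age_days": 60},  # stale, low trust
    {"id": "m2", "sim": 0.80, "trust": 1.0, "age_days": 0},   # fresh, trusted
]
ranked = sorted(memories, reverse=True,
                key=lambda m: memory_score(m["sim"], m["trust"], m["age_days"]))
print([m["id"] for m in ranked])  # ['m2', 'm1']
```

Note that a fresh, trusted memory outranks a stale one even with lower raw similarity, which is the intended behavior for agent memory.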
---

## 6. Comparison with Current Systems

### session_search (FTS5)
- Strengths: Zero dependencies, no embedding step, fast for exact keywords
- Limitations: No semantic understanding, no cross-lingual matching, limited ranking

### holographic/retrieval.py (HRR)
- Strengths: Compositional queries, contradiction detection, trust + decay scoring
- Limitations: Requires numpy, O(n) scan per query, non-standard embedding space

### Expected Gains from a Vector DB
- Semantic recall: +30-50% for paraphrased queries
- Cross-lingual queries: +60-80%
- Fuzzy matching: +40-60%
- Conceptual queries: +50-70%

---
## 7. Recommendations

### Option 1: Qdrant (RECOMMENDED)
- Best self-hosted performance
- Rust implementation; native multi-vector support
- Tradeoff: Requires a separate service deployment

### Option 2: pgvector (CONSERVATIVE)
- Zero new infrastructure if already running PostgreSQL
- Tradeoff: 5-10x slower than Qdrant

### Option 3: SQLite-vss (LIGHTWEIGHT)
- Minimal changes; embedded deployment
- Tradeoff: Limited scalability (<100K vectors)

### Option 4: Hybrid (BEST OF BOTH)
Keep FTS5 + HRR and add Qdrant:
- Vector (semantic) + FTS5 (keyword) + HRR (compositional)
- Apply trust scoring + temporal decay

---
## 8. Embedding Models (2025-2026)

| Model | Dimensions | Quality | Cost |
|-------|-----------|---------|------|
| OpenAI text-embedding-3-large | 3072 | Best | $$$ |
| OpenAI text-embedding-3-small | 1536 | Good | $ |
| BGE-M3 | 1024 | Best self-hosted | Free |
| GTE-Qwen2 | 768-1024 | Good | Free |

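Whichever model is chosen, ranking ultimately reduces to cosine similarity between query and document embeddings. A minimal sketch with toy 4-dim vectors (real embeddings have the dimensions listed above):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [0.1, 0.3, -0.2, 0.4]
doc = [0.1, 0.3, -0.2, 0.4]      # identical direction
other = [-0.4, 0.2, 0.3, -0.1]   # dissimilar direction

print(round(cosine_similarity(query, doc), 4))  # 1.0
```

In production, the databases above compute this (or an equivalent distance) inside the index; normalizing vectors once at insert time lets cosine reduce to a dot product.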
---

## 9. Hardware Requirements (1M vectors, 768-dim)

| Database | RAM (HNSW) | RAM (Quantized) |
|----------|-----------|-----------------|
| Qdrant | 8-16GB | 2-4GB |
| Milvus | 16-32GB | 4-8GB |
| pgvector | 4-8GB | N/A |
| SQLite-vss | 2-4GB | N/A |

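A back-of-the-envelope check on these figures: raw float32 storage alone is n × d × 4 bytes, before HNSW graph links and runtime overhead (the overhead multiplier in the comment is an assumption):

```python
def raw_vector_bytes(n_vectors, dims, bytes_per_dim=4):
    """Raw storage for float32 embeddings, before any index overhead."""
    return n_vectors * dims * bytes_per_dim

raw = raw_vector_bytes(1_000_000, 768)
print(round(raw / 2**30, 2), "GiB raw")  # 2.86 GiB raw
# HNSW graph links, allocator slack, and query buffers typically multiply
# this several times over, in line with the RAM ranges tabulated above.
```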
---

## 10. Conclusion

Primary recommendation: Qdrant with hybrid search (vector + FTS5 + HRR).

Key insight: Augment the existing HRR system; don't replace it.

Next steps:
1. Deploy Qdrant in Docker for testing
2. Benchmark embedding models
3. Implement a hybrid search prototype
4. Measure recall improvement
5. Evaluate operational complexity

Report: April 2026 | Sources: ANN-Benchmarks, VectorDBBench, official docs