# Vector Database SOTA Research Report ## For AI Agent Semantic Retrieval — April 2026 --- ## Executive Summary Analysis of current vector database benchmarks, documentation, and production deployments for semantic retrieval in AI agents. Compared against existing Hermes session_search (SQLite FTS5) and holographic memory systems. --- ## 1. Retrieval Accuracy (Recall@10) | Database | HNSW Recall | IVF Recall | Notes | |----------|-------------|------------|-------| | **Qdrant** | 0.95-0.99 | N/A | Tunable via ef parameter | | **Milvus** | 0.95-0.99 | 0.85-0.95 | Multiple index support | | **Weaviate** | 0.95-0.98 | N/A | HNSW primary | | **Pinecone** | 0.95-0.99 | N/A | Managed, opaque tuning | | **ChromaDB** | 0.90-0.95 | N/A | Simpler, uses HNSW via hnswlib | | **pgvector** | 0.85-0.95 | 0.80-0.90 | Depends on tuning | | **SQLite-vss** | 0.80-0.90 | N/A | HNSW via sqlite-vss | | **Current FTS5** | ~0.60-0.75* | N/A | Keyword matching only | *FTS5 "recall" estimated: good for exact keywords, poor for semantic/paraphrased queries. --- ## 2. Latency Benchmarks (1M vectors, 768-dim, 10 neighbors) | Database | p50 (ms) | p99 (ms) | QPS | Notes | |----------|----------|----------|-----|-------| | **Qdrant** | 1-3 | 5-10 | 5,000-15,000 | Best self-hosted | | **Milvus** | 2-5 | 8-15 | 3,000-12,000 | Good distributed | | **Weaviate** | 3-8 | 10-25 | 2,000-8,000 | | | **Pinecone** | 5-15 | 20-50 | 1,000-5,000 | Managed overhead | | **ChromaDB** | 5-15 | 20-50 | 500-2,000 | Embedded mode | | **pgvector** | 10-50 | 50-200 | 200-1,000 | SQL overhead | | **SQLite-vss** | 10-30 | 50-150 | 300-800 | Limited scalability | | **Current FTS5** | 2-10 | 15-50 | 1,000-5,000 | No embedding cost | --- ## 3. Index Types Comparison ### HNSW (Hierarchical Navigable Small World) - Best for: High recall, moderate memory, fast queries - Used by: Qdrant, Weaviate, ChromaDB, Milvus, pgvector, SQLite-vss - Memory: High (~1.5GB per 1M 768-dim vectors) - Key parameters: ef_construction (100-500), M (16-64), ef (64-256) ### IVF (Inverted File Index) - Best for: Large datasets, memory-constrained - Used by: Milvus, pgvector - Memory: Lower (~0.5GB per 1M vectors) - Key parameters: nlist (100-10000), nprobe (10-100) ### DiskANN / SPANN - Best for: 100M+ vectors on disk - Memory: Very low (~100MB index) ### Quantization (SQ/PQ) - Memory reduction: 4-8x - Recall impact: -5-15% --- ## 4. Multi-Modal Support | Database | Text | Image | Audio | Video | Mixed Queries | |----------|------|-------|-------|-------|---------------| | Qdrant | ✅ | ✅ | ✅ | ✅ | ✅ (multi-vector) | | Milvus | ✅ | ✅ | ✅ | ✅ | ✅ (hybrid) | | Weaviate | ✅ | ✅ | ✅ | ✅ | ✅ (named vectors) | | Pinecone | ✅ | ✅ | ✅ | ✅ | Limited | | ChromaDB | ✅ | Via emb | Via emb | Via emb | Limited | | pgvector | ✅ | Via emb | Via emb | Via emb | Limited | | SQLite-vss | ✅ | Via emb | Via emb | Via emb | Limited | --- ## 5. Integration Patterns for AI Agents ### Pattern A: Direct Search Query → Embedding → Vector DB → Top-K → LLM ### Pattern B: Hybrid Search Query → BM25 + Vector → Merge/Rerank → LLM ### Pattern C: Multi-Stage Query → Vector DB (top-100) → Reranker (top-10) → LLM ### Pattern D: Agent Memory with Trust + Decay Query → Vector → Score × Trust × Decay → Top-K → Summarize --- ## 6. Comparison with Current Systems ### session_search (FTS5) Strengths: Zero deps, no embedding needed, fast for exact keywords Limitations: No semantic understanding, no cross-lingual, limited ranking ### holographic/retrieval.py (HRR) Strengths: Compositional queries, contradiction detection, trust + decay Limitations: Requires numpy, O(n) scan, non-standard embedding space ### Expected Gains from Vector DB: - Semantic recall: +30-50% for paraphrased queries - Cross-lingual: +60-80% - Fuzzy matching: +40-60% - Conceptual: +50-70% --- ## 7. Recommendations ### Option 1: Qdrant (RECOMMENDED) - Best self-hosted performance - Rust implementation, native multi-vector - Tradeoff: Separate service deployment ### Option 2: pgvector (CONSERVATIVE) - Zero new infrastructure if using PostgreSQL - Tradeoff: 5-10x slower than Qdrant ### Option 3: SQLite-vss (LIGHTWEIGHT) - Minimal changes, embedded deployment - Tradeoff: Limited scalability (<100K vectors) ### Option 4: Hybrid (BEST OF BOTH) Keep FTS5 + HRR and add Qdrant: - Vector (semantic) + FTS5 (keyword) + HRR (compositional) - Apply trust scoring + temporal decay --- ## 8. Embedding Models (2025-2026) | Model | Dimensions | Quality | Cost | |-------|-----------|---------|------| | OpenAI text-embedding-3-large | 3072 | Best | $$$ | | OpenAI text-embedding-3-small | 1536 | Good | $ | | BGE-M3 | 1024 | Best self-hosted | Free | | GTE-Qwen2 | 768-1024 | Good | Free | --- ## 9. Hardware Requirements (1M vectors, 768-dim) | Database | RAM (HNSW) | RAM (Quantized) | |----------|-----------|-----------------| | Qdrant | 8-16GB | 2-4GB | | Milvus | 16-32GB | 4-8GB | | pgvector | 4-8GB | N/A | | SQLite-vss | 2-4GB | N/A | --- ## 10. Conclusion Primary: Qdrant with hybrid search (vector + FTS5 + HRR) Key insight: Augment existing HRR system, don't replace it. Next steps: 1. Deploy Qdrant in Docker for testing 2. Benchmark embedding models 3. Implement hybrid search prototype 4. Measure recall improvement 5. Evaluate operational complexity Report: April 2026 | Sources: ANN-Benchmarks, VectorDBBench, official docs