initial: sovereign home — morrowind agent, skills, training-data, research, specs, notes, operational docs
Tracked: morrowind agent (py/cfg), skills/, training-data/, research/, notes/, specs/, test-results/, metrics/, heartbeat/, briefings/, memories/, skins/, hooks/, decisions.md, OPERATIONS.md, SOUL.md
Excluded: screenshots, PNGs, binaries, sessions, databases, secrets, audio cache, timmy-config/ and timmy-telemetry/ (separate repos)
skills/mlops/vector-databases/DESCRIPTION.md
---
description: Vector similarity search and embedding databases for RAG, semantic search, and AI application backends.
---
skills/mlops/vector-databases/chroma/SKILL.md
---
name: chroma
description: Open-source embedding database for AI applications. Store embeddings and metadata, perform vector and full-text search, filter by metadata. Simple 4-function API. Scales from notebooks to production clusters. Use for semantic search, RAG applications, or document retrieval. Best for local development and open-source projects.
version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [chromadb, sentence-transformers]
metadata:
  hermes:
    tags: [RAG, Chroma, Vector Database, Embeddings, Semantic Search, Open Source, Self-Hosted, Document Retrieval, Metadata Filtering]
---
# Chroma - Open-Source Embedding Database

The AI-native database for building LLM applications with memory.

## When to use Chroma

**Use Chroma when:**
- Building RAG (retrieval-augmented generation) applications
- You need a local or self-hosted vector database
- You want an open-source solution (Apache 2.0)
- Prototyping in notebooks
- Running semantic search over documents
- Storing embeddings alongside metadata

**Metrics**:
- **24,300+ GitHub stars**
- **1,900+ forks**
- **v1.3.3** (stable, weekly releases)
- **Apache 2.0 license**

**Use alternatives instead**:
- **Pinecone**: Managed cloud, auto-scaling
- **FAISS**: Pure similarity search, no metadata
- **Weaviate**: Production ML-native database
- **Qdrant**: High performance, Rust-based
## Quick start

### Installation

```bash
# Python
pip install chromadb

# JavaScript/TypeScript
npm install chromadb @chroma-core/default-embed
```

### Basic usage (Python)

```python
import chromadb

# Create an in-memory client
client = chromadb.Client()

# Create a collection
collection = client.create_collection(name="my_collection")

# Add documents (embedded automatically with the default model)
collection.add(
    documents=["This is document 1", "This is document 2"],
    metadatas=[{"source": "doc1"}, {"source": "doc2"}],
    ids=["id1", "id2"]
)

# Query by text
results = collection.query(
    query_texts=["document about topic"],
    n_results=2
)

print(results)
```
## Core operations

### 1. Create collection

```python
# Simple collection
collection = client.create_collection("my_docs")

# With a custom embedding function
from chromadb.utils import embedding_functions

openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-key",
    model_name="text-embedding-3-small"
)

collection = client.create_collection(
    name="my_docs",
    embedding_function=openai_ef
)

# Get an existing collection
collection = client.get_collection("my_docs")

# Delete a collection
client.delete_collection("my_docs")
```

### 2. Add documents

```python
# Add documents with per-document metadata
collection.add(
    documents=["Doc 1", "Doc 2", "Doc 3"],
    metadatas=[
        {"source": "web", "category": "tutorial"},
        {"source": "pdf", "page": 5},
        {"source": "api", "timestamp": "2025-01-01"}
    ],
    ids=["id1", "id2", "id3"]
)

# Add with precomputed embeddings
collection.add(
    embeddings=[[0.1, 0.2, ...], [0.3, 0.4, ...]],
    documents=["Doc 1", "Doc 2"],
    ids=["id1", "id2"]
)
```

### 3. Query (similarity search)

```python
# Basic query
results = collection.query(
    query_texts=["machine learning tutorial"],
    n_results=5
)

# Query with a metadata filter
results = collection.query(
    query_texts=["Python programming"],
    n_results=3,
    where={"source": "web"}
)

# Query with a compound metadata filter
results = collection.query(
    query_texts=["advanced topics"],
    where={
        "$and": [
            {"category": "tutorial"},
            {"difficulty": {"$gte": 3}}
        ]
    }
)

# Access results
print(results["documents"])  # List of matching documents
print(results["metadatas"])  # Metadata for each doc
print(results["distances"])  # Distances (lower = more similar)
print(results["ids"])        # Document IDs
```

### 4. Get documents

```python
# Get by IDs
docs = collection.get(
    ids=["id1", "id2"]
)

# Get with filters
docs = collection.get(
    where={"category": "tutorial"},
    limit=10
)

# Get all documents
docs = collection.get()
```

### 5. Update documents

```python
# Update document content and metadata
collection.update(
    ids=["id1"],
    documents=["Updated content"],
    metadatas=[{"source": "updated"}]
)
```

### 6. Delete documents

```python
# Delete by IDs
collection.delete(ids=["id1", "id2"])

# Delete with a filter
collection.delete(
    where={"source": "outdated"}
)
```
## Persistent storage

```python
# Persist to disk
client = chromadb.PersistentClient(path="./chroma_db")

collection = client.create_collection("my_docs")
collection.add(documents=["Doc 1"], ids=["id1"])

# Data is persisted automatically;
# reload later with the same path
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_collection("my_docs")
```

## Embedding functions

### Default (Sentence Transformers)

```python
# Uses sentence-transformers by default
collection = client.create_collection("my_docs")
# Default model: all-MiniLM-L6-v2
```
### OpenAI

```python
from chromadb.utils import embedding_functions

openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key="your-key",
    model_name="text-embedding-3-small"
)

collection = client.create_collection(
    name="openai_docs",
    embedding_function=openai_ef
)
```

### HuggingFace

```python
huggingface_ef = embedding_functions.HuggingFaceEmbeddingFunction(
    api_key="your-key",
    model_name="sentence-transformers/all-mpnet-base-v2"
)

collection = client.create_collection(
    name="hf_docs",
    embedding_function=huggingface_ef
)
```

### Custom embedding function

```python
from chromadb import Documents, EmbeddingFunction, Embeddings

class MyEmbeddingFunction(EmbeddingFunction):
    def __call__(self, input: Documents) -> Embeddings:
        # Your embedding logic: return one vector per input document
        embeddings = [embed(doc) for doc in input]  # embed() is your own model
        return embeddings

my_ef = MyEmbeddingFunction()
collection = client.create_collection(
    name="custom_docs",
    embedding_function=my_ef
)
```
## Metadata filtering

```python
# Exact match
results = collection.query(
    query_texts=["query"],
    where={"category": "tutorial"}
)

# Comparison operators
results = collection.query(
    query_texts=["query"],
    where={"page": {"$gt": 10}}  # $gt, $gte, $lt, $lte, $ne
)

# Logical operators
results = collection.query(
    query_texts=["query"],
    where={
        "$and": [
            {"category": "tutorial"},
            {"difficulty": {"$lte": 3}}
        ]
    }  # Also: $or
)

# Membership
results = collection.query(
    query_texts=["query"],
    where={"tags": {"$in": ["python", "ml"]}}
)
```
## LangChain integration

```python
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split documents (`documents` is a list of LangChain Document objects)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000)
docs = text_splitter.split_documents(documents)

# Create Chroma vector store
vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(),
    persist_directory="./chroma_db"
)

# Query
results = vectorstore.similarity_search("machine learning", k=3)

# As retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
```

## LlamaIndex integration

```python
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import VectorStoreIndex, StorageContext
import chromadb

# Initialize Chroma
db = chromadb.PersistentClient(path="./chroma_db")
collection = db.get_or_create_collection("my_collection")

# Create vector store
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Create index
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context
)

# Query
query_engine = index.as_query_engine()
response = query_engine.query("What is machine learning?")
```
## Server mode

```python
# Run the Chroma server
# Terminal: chroma run --path ./chroma_db --port 8000

# Connect to the server
import chromadb
from chromadb.config import Settings

client = chromadb.HttpClient(
    host="localhost",
    port=8000,
    settings=Settings(anonymized_telemetry=False)
)

# Use as normal
collection = client.get_or_create_collection("my_docs")
```
## Best practices

1. **Use the persistent client** - Don't lose data on restart
2. **Add metadata** - Enables filtering and tracking
3. **Batch operations** - Add multiple docs at once
4. **Choose the right embedding model** - Balance speed and quality
5. **Use filters** - Narrow the search space
6. **Use unique IDs** - Avoid collisions
7. **Back up regularly** - Copy the chroma_db directory
8. **Monitor collection size** - Scale up if needed
9. **Test embedding functions** - Ensure quality
10. **Use server mode for production** - Better for multi-user deployments
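Practices 1, 3, and 6 can be sketched together. The helpers below (`chunked`, `stable_id`) are illustrative, not part of the Chroma API; the batches and IDs they produce are what you would pass to `collection.add`:

```python
import hashlib

def chunked(items, size=100):
    """Yield successive batches so large corpora are added in bulk (practice 3)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def stable_id(text: str) -> str:
    """Derive a deterministic, collision-resistant ID from content (practice 6)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]

docs = [f"document {i}" for i in range(250)]
batches = list(chunked(docs, size=100))
ids = [stable_id(doc) for doc in docs]

print(len(batches))          # 3 batches (100 + 100 + 50 docs)
print(len(set(ids)))         # 250 distinct IDs, stable across re-runs
```

Because the IDs are content-derived, re-adding the same document maps to the same ID instead of silently creating a duplicate.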
## Performance

| Operation | Latency | Notes |
|-----------|---------|-------|
| Add 100 docs | ~1-3s | With embedding |
| Query (top 10) | ~50-200ms | Depends on collection size |
| Metadata filter | ~10-50ms | Fast with proper indexing |
## Resources

- **GitHub**: https://github.com/chroma-core/chroma ⭐ 24,300+
- **Docs**: https://docs.trychroma.com
- **Discord**: https://discord.gg/MMeYNTmh3x
- **Version**: 1.3.3+
- **License**: Apache 2.0
# Chroma Integration Guide

Integration with LangChain, LlamaIndex, and other frameworks.

## LangChain

```python
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(),
    persist_directory="./chroma_db"
)

# Query
results = vectorstore.similarity_search("query", k=3)

# As retriever
retriever = vectorstore.as_retriever()
```

## LlamaIndex

```python
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

db = chromadb.PersistentClient(path="./chroma_db")
collection = db.get_or_create_collection("docs")

vector_store = ChromaVectorStore(chroma_collection=collection)
```

## Resources

- **Docs**: https://docs.trychroma.com
skills/mlops/vector-databases/faiss/SKILL.md
---
name: faiss
description: Facebook's library for efficient similarity search and clustering of dense vectors. Supports billions of vectors, GPU acceleration, and various index types (Flat, IVF, HNSW). Use for fast k-NN search, large-scale vector retrieval, or when you need pure similarity search without metadata. Best for high-performance applications.
version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [faiss-cpu, faiss-gpu, numpy]
metadata:
  hermes:
    tags: [RAG, FAISS, Similarity Search, Vector Search, Facebook AI, GPU Acceleration, Billion-Scale, K-NN, HNSW, High Performance, Large Scale]
---
# FAISS - Efficient Similarity Search

Facebook AI's library for billion-scale vector similarity search.

## When to use FAISS

**Use FAISS when:**
- You need fast similarity search over large vector datasets (millions to billions)
- GPU acceleration is required
- Pure vector similarity is enough (no metadata filtering needed)
- High throughput and low latency are critical
- Doing offline/batch processing of embeddings

**Metrics**:
- **31,700+ GitHub stars**
- Maintained by Meta/Facebook AI Research
- **Handles billions of vectors**
- **C++** core with Python bindings

**Use alternatives instead**:
- **Chroma/Pinecone**: Need metadata filtering
- **Weaviate**: Need full database features
- **Annoy**: Simpler, fewer features
## Quick start

### Installation

```bash
# CPU only
pip install faiss-cpu

# GPU support
pip install faiss-gpu
```

### Basic usage

```python
import faiss
import numpy as np

# Create sample data (1000 vectors, 128 dimensions)
d = 128
nb = 1000
vectors = np.random.random((nb, d)).astype('float32')

# Create index
index = faiss.IndexFlatL2(d)  # L2 distance
index.add(vectors)            # Add vectors

# Search
k = 5  # Find 5 nearest neighbors
query = np.random.random((1, d)).astype('float32')
distances, indices = index.search(query, k)

print(f"Nearest neighbors: {indices}")
print(f"Distances: {distances}")
```
## Index types

### 1. Flat (exact search)

```python
# L2 (Euclidean) distance
index = faiss.IndexFlatL2(d)

# Inner product (cosine similarity if vectors are normalized)
index = faiss.IndexFlatIP(d)

# Slowest, most accurate
```

### 2. IVF (inverted file) - Fast approximate

```python
# Create quantizer
quantizer = faiss.IndexFlatL2(d)

# IVF index with 100 clusters
nlist = 100
index = faiss.IndexIVFFlat(quantizer, d, nlist)

# Train on data
index.train(vectors)

# Add vectors
index.add(vectors)

# Search (nprobe = clusters to search)
index.nprobe = 10
distances, indices = index.search(query, k)
```

### 3. HNSW (Hierarchical NSW) - Best quality/speed

```python
# HNSW index
M = 32  # Number of connections per layer
index = faiss.IndexHNSWFlat(d, M)

# No training needed
index.add(vectors)

# Search
distances, indices = index.search(query, k)
```

### 4. Product Quantization - Memory efficient

```python
# PQ reduces memory by 16-32×
m = 8      # Number of subquantizers
nbits = 8
index = faiss.IndexPQ(d, m, nbits)

# Train and add
index.train(vectors)
index.add(vectors)
```
## Save and load

```python
# Save index
faiss.write_index(index, "large.index")

# Load index
index = faiss.read_index("large.index")

# Continue using
distances, indices = index.search(query, k)
```

## GPU acceleration

```python
# Single GPU
res = faiss.StandardGpuResources()
index_cpu = faiss.IndexFlatL2(d)
index_gpu = faiss.index_cpu_to_gpu(res, 0, index_cpu)  # GPU 0

# Multi-GPU
index_gpu = faiss.index_cpu_to_all_gpus(index_cpu)

# 10-100× faster than CPU
```
## LangChain integration

```python
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# Create FAISS vector store
vectorstore = FAISS.from_documents(docs, OpenAIEmbeddings())

# Save
vectorstore.save_local("faiss_index")

# Load
vectorstore = FAISS.load_local(
    "faiss_index",
    OpenAIEmbeddings(),
    allow_dangerous_deserialization=True
)

# Search
results = vectorstore.similarity_search("query", k=5)
```

## LlamaIndex integration

```python
from llama_index.vector_stores.faiss import FaissVectorStore
import faiss

# Create FAISS index (1536 = OpenAI text-embedding dimension)
d = 1536
faiss_index = faiss.IndexFlatL2(d)

vector_store = FaissVectorStore(faiss_index=faiss_index)
```
## Best practices

1. **Choose the right index type** - Flat for <10K, IVF for 10K-1M, HNSW for quality
2. **Normalize for cosine** - Use IndexFlatIP with normalized vectors
3. **Use GPU for large datasets** - 10-100× faster
4. **Save trained indices** - Training is expensive
5. **Tune nprobe/efSearch** - Balance speed and accuracy
6. **Monitor memory** - Use PQ for large datasets
7. **Batch queries** - Better GPU utilization
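Practice 2 rests on the identity that the inner product of L2-normalized vectors equals cosine similarity, which is why `IndexFlatIP` plus `faiss.normalize_L2` gives cosine search. A NumPy-only check of that identity (no FAISS required):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((5, 128)).astype("float32")
b = rng.random((5, 128)).astype("float32")

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm (what faiss.normalize_L2 does in place)."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

an, bn = l2_normalize(a), l2_normalize(b)

# Inner product of normalized vectors (what IndexFlatIP scores)
inner = (an * bn).sum(axis=1)

# Cosine similarity of the raw vectors
cosine = (a * b).sum(axis=1) / (
    np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
)

assert np.allclose(inner, cosine, atol=1e-5)
```

Note that normalization must be applied to both the indexed vectors and every query, or the scores are no longer cosine similarities.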
## Performance

| Index Type | Build Time | Search Time | Memory | Accuracy |
|------------|------------|-------------|--------|----------|
| Flat | Fast | Slow | High | 100% |
| IVF | Medium | Fast | Medium | 95-99% |
| HNSW | Slow | Fastest | High | 99% |
| PQ | Medium | Fast | Low | 90-95% |
## Resources

- **GitHub**: https://github.com/facebookresearch/faiss ⭐ 31,700+
- **Wiki**: https://github.com/facebookresearch/faiss/wiki
- **License**: MIT
skills/mlops/vector-databases/faiss/references/index_types.md
# FAISS Index Types Guide

Complete guide to choosing and using FAISS index types.

## Index selection guide

| Dataset Size | Index Type | Training | Accuracy | Speed |
|--------------|------------|----------|----------|-------|
| < 10K | Flat | No | 100% | Slow |
| 10K-1M | IVF | Yes | 95-99% | Fast |
| 1M-10M | HNSW | No | 99% | Fastest |
| > 10M | IVF+PQ | Yes | 90-95% | Fast, low memory |
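The table above can be encoded as a tiny helper that returns an `index_factory` descriptor string. This is a sketch using this guide's thresholds, not a FAISS API; the cluster counts (`IVF100`, `IVF4096`) are the illustrative values used elsewhere in this guide:

```python
def suggest_index(n_vectors: int) -> str:
    """Map dataset size to a faiss.index_factory descriptor per the table above."""
    if n_vectors < 10_000:
        return "Flat"            # exact search is affordable
    if n_vectors < 1_000_000:
        return "IVF100,Flat"     # fast approximate, needs training
    if n_vectors < 10_000_000:
        return "HNSW32"          # best quality, no training, more memory
    return "IVF4096,PQ8"         # compressed, billion-scale

print(suggest_index(5_000))
print(suggest_index(500_000))
print(suggest_index(50_000_000))
```

The returned string can be passed straight to `faiss.index_factory(d, descriptor)`.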
## Flat indices (exact search)

### IndexFlatL2 - L2 (Euclidean) distance

```python
import faiss
import numpy as np

d = 128  # Dimension
index = faiss.IndexFlatL2(d)

# Add vectors
vectors = np.random.random((1000, d)).astype('float32')
index.add(vectors)

# Search
k = 5
query = np.random.random((1, d)).astype('float32')
distances, indices = index.search(query, k)
```

**Use when:**
- Dataset < 10,000 vectors
- Need 100% accuracy
- Serving as a baseline

### IndexFlatIP - Inner product (cosine similarity)

```python
# For cosine similarity, normalize vectors first
import faiss

d = 128
index = faiss.IndexFlatIP(d)

# Normalize vectors (required for cosine similarity)
faiss.normalize_L2(vectors)
index.add(vectors)

# Search (the query must be normalized too)
faiss.normalize_L2(query)
distances, indices = index.search(query, k)
```

**Use when:**
- Need cosine similarity
- Recommendation systems
- Text embeddings
## IVF indices (inverted file)

### IndexIVFFlat - Cluster-based search

```python
# Create quantizer
quantizer = faiss.IndexFlatL2(d)

# Create IVF index with 100 clusters
nlist = 100  # Number of clusters
index = faiss.IndexIVFFlat(quantizer, d, nlist)

# Train on data (required!)
index.train(vectors)

# Add vectors
index.add(vectors)

# Search (nprobe = clusters to search)
index.nprobe = 10  # Search the 10 closest clusters
distances, indices = index.search(query, k)
```

**Parameters:**
- `nlist`: Number of clusters (√N to 4√N recommended)
- `nprobe`: Clusters to search (1 to nlist; higher = more accurate)

**Use when:**
- Dataset 10K-1M vectors
- Need fast approximate search
- Can afford training time
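The √N to 4√N rule of thumb for `nlist` can be computed directly. A minimal sketch (the helper name is ours, not FAISS's):

```python
import math

def nlist_range(n_vectors: int) -> tuple[int, int]:
    """Recommended nlist bounds (√N to 4√N) for IndexIVFFlat."""
    root = math.isqrt(n_vectors)
    return root, 4 * root

print(nlist_range(10_000))     # (100, 400)
print(nlist_range(1_000_000))  # (1000, 4000)
```

Note that FAISS needs roughly 30-plus training vectors per cluster, so very large `nlist` values also demand a correspondingly large training set.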
### Tuning nprobe

```python
# Test different nprobe values
for nprobe in [1, 5, 10, 20, 50]:
    index.nprobe = nprobe
    distances, indices = index.search(query, k)
    # Measure the recall/speed trade-off at each setting
```

**Guidelines:**
- `nprobe=1`: Fastest, ~50% recall
- `nprobe=10`: Good balance, ~95% recall
- `nprobe=nlist`: Exact search (same as Flat)
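The trade-off measured in the loop above is usually quantified as recall@k: the fraction of true nearest neighbors (from an exact Flat index) that the IVF search recovers. A minimal, FAISS-free sketch with illustrative result IDs:

```python
import numpy as np

def recall_at_k(approx_ids: np.ndarray, exact_ids: np.ndarray) -> float:
    """Fraction of exact neighbors recovered by the approximate search.

    Both arrays have shape (n_queries, k), as returned by index.search.
    """
    hits = sum(
        len(set(approx_row) & set(exact_row))
        for approx_row, exact_row in zip(approx_ids, exact_ids)
    )
    return hits / exact_ids.size

# Illustrative data: 2 queries, k=3; the approximate search misses one neighbor
exact = np.array([[0, 1, 2], [3, 4, 5]])
approx = np.array([[0, 2, 9], [3, 4, 5]])

recall = recall_at_k(approx, exact)  # 5 of 6 neighbors found
```

In practice `exact` would come from `IndexFlatL2.search` and `approx` from the IVF index at each `nprobe` setting.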
## HNSW indices (graph-based)

### IndexHNSWFlat - Hierarchical NSW

```python
# HNSW index
M = 32  # Number of connections per layer (16-64)
index = faiss.IndexHNSWFlat(d, M)

# Optional: set efConstruction (build-time parameter)
index.hnsw.efConstruction = 40  # Higher = better quality, slower build

# Add vectors (no training needed!)
index.add(vectors)

# Search
index.hnsw.efSearch = 16  # Search-time parameter
distances, indices = index.search(query, k)
```

**Parameters:**
- `M`: Connections per layer (16-64, default 32)
- `efConstruction`: Build quality (40-200, higher = better)
- `efSearch`: Search quality (16-512, higher = more accurate)

**Use when:**
- Need the best-quality approximate search
- Can afford higher memory (more connections)
- Dataset 1M-10M vectors
## PQ indices (product quantization)

### IndexPQ - Memory-efficient

```python
# PQ reduces memory by 16-32×
m = 8      # Number of subquantizers (must divide d)
nbits = 8  # Bits per subquantizer

index = faiss.IndexPQ(d, m, nbits)

# Train (required!)
index.train(vectors)

# Add vectors
index.add(vectors)

# Search
distances, indices = index.search(query, k)
```

**Parameters:**
- `m`: Subquantizers (d must be divisible by m)
- `nbits`: Bits per code (8 or 16)

**Memory savings:**
- Original: d × 4 bytes (float32)
- PQ: m bytes
- Compression ratio: 4d/m

**Use when:**
- Limited memory
- Large datasets (> 10M vectors)
- Can accept ~90-95% accuracy
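Plugging the defaults from the example above (d=128, m=8, nbits=8) into the 4d/m formula shows the savings concretely:

```python
d, m = 128, 8             # dimensions, subquantizers (as in the example above)

original_bytes = d * 4    # float32: 4 bytes per dimension per vector
pq_bytes = m              # nbits=8 → 1 byte per subquantizer code

ratio = original_bytes // pq_bytes
print(original_bytes, pq_bytes, ratio)  # 512 bytes → 8 bytes, 64× smaller
```

So at one billion vectors, raw float32 storage would be ~512 GB while the PQ codes fit in ~8 GB (plus codebooks and index overhead).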
### IndexIVFPQ - IVF + PQ combined

```python
# Best for very large datasets
nlist = 4096
m = 8
nbits = 8

quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)

# Train
index.train(vectors)
index.add(vectors)

# Search
index.nprobe = 32
distances, indices = index.search(query, k)
```

**Use when:**
- Dataset > 10M vectors
- Need fast search + low memory
- Can accept 90-95% accuracy
## GPU indices

### Single GPU

```python
import faiss

# Create CPU index
index_cpu = faiss.IndexFlatL2(d)

# Move to GPU
res = faiss.StandardGpuResources()  # GPU resources
index_gpu = faiss.index_cpu_to_gpu(res, 0, index_cpu)  # GPU 0

# Use normally
index_gpu.add(vectors)
distances, indices = index_gpu.search(query, k)
```

### Multi-GPU

```python
# Use all available GPUs
index_gpu = faiss.index_cpu_to_all_gpus(index_cpu)

# Or specific GPUs
gpus = [0, 1, 2, 3]  # Use GPUs 0-3
index_gpu = faiss.index_cpu_to_gpus_list(index_cpu, gpus)
```

**Speedup:**
- Single GPU: 10-50× faster than CPU
- Multi-GPU: Near-linear scaling
## Index factory

```python
# Easy index creation with string descriptors
index = faiss.index_factory(d, "IVF100,Flat")
index = faiss.index_factory(d, "HNSW32")
index = faiss.index_factory(d, "IVF4096,PQ8")

# Train and use
index.train(vectors)
index.add(vectors)
```

**Common descriptors:**
- `"Flat"`: Exact search
- `"IVF100,Flat"`: IVF with 100 clusters
- `"HNSW32"`: HNSW with M=32
- `"IVF4096,PQ8"`: IVF + PQ compression
## Performance comparison

### Search speed (1M vectors, k=10)

| Index | Build Time | Search Time | Memory | Recall |
|-------|------------|-------------|--------|--------|
| Flat | 0s | 50ms | 512 MB | 100% |
| IVF100 | 5s | 2ms | 512 MB | 95% |
| HNSW32 | 60s | 1ms | 1 GB | 99% |
| IVF4096+PQ8 | 30s | 3ms | 32 MB | 90% |

*CPU (16 cores), 128-dim vectors*
## Best practices

1. **Start with Flat** - Baseline for comparison
2. **Use IVF for medium datasets** - Good balance
3. **Use HNSW for best quality** - If memory allows
4. **Add PQ for memory savings** - Large datasets
5. **Use GPU for > 100K vectors** - 10-50× speedup
6. **Tune nprobe/efSearch** - Trade off speed vs. accuracy
7. **Train on representative data** - Better clustering
8. **Save trained indices** - Avoid retraining
## Resources

- **Wiki**: https://github.com/facebookresearch/faiss/wiki
- **Paper**: https://arxiv.org/abs/1702.08734
skills/mlops/vector-databases/pinecone/SKILL.md
---
name: pinecone
description: Managed vector database for production AI applications. Fully managed, auto-scaling, with hybrid search (dense + sparse), metadata filtering, and namespaces. Low latency (<100ms p95). Use for production RAG, recommendation systems, or semantic search at scale. Best for serverless, managed infrastructure.
version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [pinecone-client]
metadata:
  hermes:
    tags: [RAG, Pinecone, Vector Database, Managed Service, Serverless, Hybrid Search, Production, Auto-Scaling, Low Latency, Recommendations]
---
# Pinecone - Managed Vector Database

The vector database for production AI applications.

## When to use Pinecone

**Use when:**
- You need a managed, serverless vector database
- Building production RAG applications
- Auto-scaling is required
- Low latency is critical (<100ms)
- You don't want to manage infrastructure
- You need hybrid search (dense + sparse vectors)

**Metrics**:
- Fully managed SaaS
- Auto-scales to billions of vectors
- **p95 latency <100ms**
- 99.9% uptime SLA

**Use alternatives instead**:
- **Chroma**: Self-hosted, open-source
- **FAISS**: Offline, pure similarity search
- **Weaviate**: Self-hosted with more features
## Quick start

### Installation

```bash
pip install pinecone-client
```

### Basic usage

```python
from pinecone import Pinecone, ServerlessSpec

# Initialize
pc = Pinecone(api_key="your-api-key")

# Create index
pc.create_index(
    name="my-index",
    dimension=1536,   # Must match embedding dimension
    metric="cosine",  # or "euclidean", "dotproduct"
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)

# Connect to index
index = pc.Index("my-index")

# Upsert vectors
index.upsert(vectors=[
    {"id": "vec1", "values": [0.1, 0.2, ...], "metadata": {"category": "A"}},
    {"id": "vec2", "values": [0.3, 0.4, ...], "metadata": {"category": "B"}}
])

# Query
results = index.query(
    vector=[0.1, 0.2, ...],
    top_k=5,
    include_metadata=True
)

print(results["matches"])
```
## Core operations

### Create index

```python
# Serverless (recommended)
pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(
        cloud="aws",  # or "gcp", "azure"
        region="us-east-1"
    )
)

# Pod-based (for consistent performance)
from pinecone import PodSpec

pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=PodSpec(
        environment="us-east1-gcp",
        pod_type="p1.x1"
    )
)
```

### Upsert vectors

```python
# Single upsert
index.upsert(vectors=[
    {
        "id": "doc1",
        "values": [0.1, 0.2, ...],  # 1536 dimensions
        "metadata": {
            "text": "Document content",
            "category": "tutorial",
            "timestamp": "2025-01-01"
        }
    }
])

# Batch upsert (recommended)
vectors = [
    {"id": f"vec{i}", "values": embedding, "metadata": metadata}
    for i, (embedding, metadata) in enumerate(zip(embeddings, metadatas))
]

index.upsert(vectors=vectors, batch_size=100)
```
### Query vectors

```python
# Basic query
results = index.query(
    vector=[0.1, 0.2, ...],
    top_k=10,
    include_metadata=True,
    include_values=False
)

# With metadata filtering
results = index.query(
    vector=[0.1, 0.2, ...],
    top_k=5,
    filter={"category": {"$eq": "tutorial"}}
)

# Namespace query
results = index.query(
    vector=[0.1, 0.2, ...],
    top_k=5,
    namespace="production"
)

# Access results
for match in results["matches"]:
    print(f"ID: {match['id']}")
    print(f"Score: {match['score']}")
    print(f"Metadata: {match['metadata']}")
```

### Metadata filtering

```python
# Exact match
filter = {"category": "tutorial"}

# Comparison ($gt, $gte, $lt, $lte, $ne)
filter = {"price": {"$gte": 100}}

# Logical operators ($and, $or)
filter = {
    "$and": [
        {"category": "tutorial"},
        {"difficulty": {"$lte": 3}}
    ]
}

# In operator
filter = {"tags": {"$in": ["python", "ml"]}}
```

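These operators compose mechanically, so a small helper can build them. A sketch in plain Python (the `build_filter` name and its keyword arguments are hypothetical, not part of the Pinecone SDK):

```python
def build_filter(equals=None, ranges=None, any_of=None):
    """Compose a Pinecone-style metadata filter dict.

    equals: {field: value} exact matches
    ranges: {field: (low, high)} inclusive numeric ranges
    any_of: {field: [values]} $in matches
    """
    clauses = []
    for field, value in (equals or {}).items():
        clauses.append({field: {"$eq": value}})
    for field, (low, high) in (ranges or {}).items():
        clauses.append({field: {"$gte": low, "$lte": high}})
    for field, values in (any_of or {}).items():
        clauses.append({field: {"$in": values}})
    if len(clauses) == 1:
        return clauses[0]  # no $and wrapper needed for a single clause
    return {"$and": clauses}

f = build_filter(equals={"category": "tutorial"}, ranges={"price": (100, 500)})
# f == {"$and": [{"category": {"$eq": "tutorial"}},
#                {"price": {"$gte": 100, "$lte": 500}}]}
```

The resulting dict is passed unchanged as `filter=` to `index.query()`.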
## Namespaces

```python
# Partition data by namespace
index.upsert(
    vectors=[{"id": "vec1", "values": [...]}],
    namespace="user-123"
)

# Query a specific namespace
results = index.query(
    vector=[...],
    namespace="user-123",
    top_k=5
)

# List namespaces
stats = index.describe_index_stats()
print(stats["namespaces"])
```

## Hybrid search (dense + sparse)

```python
# Upsert with sparse vectors
index.upsert(vectors=[
    {
        "id": "doc1",
        "values": [0.1, 0.2, ...],  # Dense vector
        "sparse_values": {
            "indices": [10, 45, 123],  # Token IDs
            "values": [0.5, 0.3, 0.8]  # TF-IDF scores
        },
        "metadata": {"text": "..."}
    }
])

# Hybrid query. Pinecone's query has no server-side weighting parameter;
# weight the two signals client-side by scaling the dense query by alpha
# and the sparse values by (1 - alpha) before querying
# (alpha=0 -> sparse only, alpha=1 -> dense only).
results = index.query(
    vector=[0.05, 0.1, ...],  # Dense query pre-scaled by alpha=0.5
    sparse_vector={
        "indices": [10, 45],
        "values": [0.25, 0.15]  # Sparse values pre-scaled by (1 - alpha)
    },
    top_k=5
)
```

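The dense/sparse weighting is commonly factored into a small helper that scales both query vectors before the call. A sketch, pure Python (Pinecone's own examples use a similar `hybrid_score_norm` pattern; this version is reimplemented here and is not an SDK function):

```python
def hybrid_score_norm(dense, sparse, alpha):
    """Scale a dense query by alpha and a sparse query by (1 - alpha).

    alpha=1.0 keeps only the dense signal, alpha=0.0 only the sparse one.
    """
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = {
        "indices": sparse["indices"],
        "values": [v * (1 - alpha) for v in sparse["values"]],
    }
    return scaled_dense, scaled_sparse

d, s = hybrid_score_norm([1.0, 0.5], {"indices": [3, 7], "values": [0.8, 0.4]}, alpha=0.75)
# d == [0.75, 0.375]; sparse values are scaled down to roughly [0.2, 0.1]
```

The scaled pair is then passed as `vector=` and `sparse_vector=` to `index.query()`.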
## LangChain integration

```python
from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings

# Create vector store
vectorstore = PineconeVectorStore.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(),
    index_name="my-index"
)

# Query
results = vectorstore.similarity_search("query", k=5)

# With metadata filter
results = vectorstore.similarity_search(
    "query",
    k=5,
    filter={"category": "tutorial"}
)

# As retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 10})
```

## LlamaIndex integration

```python
from llama_index.vector_stores.pinecone import PineconeVectorStore

# Connect to Pinecone
pc = Pinecone(api_key="your-key")
pinecone_index = pc.Index("my-index")

# Create vector store
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)

# Use in LlamaIndex
from llama_index.core import StorageContext, VectorStoreIndex

storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```

## Index management

```python
# List indexes
indexes = pc.list_indexes()

# Describe index
index_info = pc.describe_index("my-index")
print(index_info)

# Get index stats
stats = index.describe_index_stats()
print(f"Total vectors: {stats['total_vector_count']}")
print(f"Namespaces: {stats['namespaces']}")

# Delete index
pc.delete_index("my-index")
```

## Delete vectors

```python
# Delete by ID
index.delete(ids=["vec1", "vec2"])

# Delete by metadata filter (pod-based indexes only)
index.delete(filter={"category": "old"})

# Delete all vectors in a namespace
index.delete(delete_all=True, namespace="test")

# Delete all vectors in the index (the index itself remains;
# use pc.delete_index() to remove the index)
index.delete(delete_all=True)
```

## Best practices

1. **Use serverless** - Auto-scaling, cost-effective
2. **Batch upserts** - More efficient (100-200 vectors per batch)
3. **Add metadata** - Enables filtering
4. **Use namespaces** - Isolate data by user/tenant
5. **Monitor usage** - Check the Pinecone dashboard
6. **Optimize filters** - Index frequently filtered fields
7. **Test with the free tier** - 1 index, 100K vectors free
8. **Use hybrid search** - Better retrieval quality
9. **Set appropriate dimensions** - Match the embedding model
10. **Regular backups** - Export important data

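The batch-upsert advice above can be sketched with a small chunking helper (hypothetical names; pure Python, no SDK required):

```python
def chunked(items, size=100):
    """Yield successive fixed-size batches from a list of vectors."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Usage sketch: upsert at most 100 vectors per request
# for batch in chunked(vectors, size=100):
#     index.upsert(vectors=batch)

sizes = [len(b) for b in chunked(list(range(250)), size=100)]
# sizes == [100, 100, 50]
```

This keeps each request under Pinecone's payload limits while minimizing round trips; the SDK's own `batch_size=` argument to `upsert` does the same thing internally.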
## Performance

| Operation | Latency | Notes |
|-----------|---------|-------|
| Upsert | ~50-100ms | Per batch |
| Query (p50) | ~50ms | Depends on index size |
| Query (p95) | ~100ms | SLA target |
| Metadata filter | +10-20ms | Additional overhead |

## Pricing (as of 2025)

**Serverless**:
- $0.096 per million read units
- $0.06 per million write units
- $0.06 per GB storage/month

**Free tier**:
- 1 serverless index
- 100K vectors (1536 dimensions)
- Great for prototyping

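A back-of-the-envelope monthly estimate using the serverless rates above (a sketch; a real bill depends on how individual queries and upserts map onto read/write units):

```python
def monthly_cost(read_units, write_units, storage_gb,
                 read_rate=0.096, write_rate=0.06, storage_rate=0.06):
    """Estimate monthly serverless cost in USD.

    read_units / write_units are monthly totals; rates are per million units.
    """
    return (read_units / 1e6) * read_rate \
         + (write_units / 1e6) * write_rate \
         + storage_gb * storage_rate

# e.g. 50M reads, 10M writes, 20 GB stored:
cost = monthly_cost(50_000_000, 10_000_000, 20)
# roughly 4.8 + 0.6 + 1.2 = 6.6 USD
```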
## Resources

- **Website**: https://www.pinecone.io
- **Docs**: https://docs.pinecone.io
- **Console**: https://app.pinecone.io
- **Pricing**: https://www.pinecone.io/pricing

181
skills/mlops/vector-databases/pinecone/references/deployment.md
Normal file
@@ -0,0 +1,181 @@
# Pinecone Deployment Guide

Production deployment patterns for Pinecone.

## Serverless vs Pod-based

### Serverless (Recommended)

```python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-key")

# Create serverless index
pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(
        cloud="aws",  # or "gcp", "azure"
        region="us-east-1"
    )
)
```

**Benefits:**

- Auto-scaling
- Pay per usage
- No infrastructure management
- Cost-effective for variable load

**Use when:**
- Traffic is variable
- Cost optimization matters
- Consistent latency is not required

### Pod-based

```python
from pinecone import PodSpec

pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=PodSpec(
        environment="us-east1-gcp",
        pod_type="p1.x1",  # or p1.x2, p1.x4, p1.x8
        pods=2,  # Number of pods
        replicas=2  # High availability
    )
)
```

**Benefits:**

- Consistent performance
- Predictable latency
- Higher throughput
- Dedicated resources

**Use when:**
- Running production workloads
- Consistent p95 latency is required
- High throughput is required

## Hybrid search

### Dense + Sparse vectors

```python
# Upsert with both dense and sparse vectors
index.upsert(vectors=[
    {
        "id": "doc1",
        "values": [0.1, 0.2, ...],  # Dense (semantic)
        "sparse_values": {
            "indices": [10, 45, 123],  # Token IDs
            "values": [0.5, 0.3, 0.8]  # TF-IDF/BM25 scores
        },
        "metadata": {"text": "..."}
    }
])

# Hybrid query. Weighting is done client-side: scale the dense query
# by alpha and the sparse values by (1 - alpha) before the call
# (alpha=0 -> sparse only, alpha=1 -> dense only).
results = index.query(
    vector=[0.05, 0.1, ...],  # Dense query pre-scaled by alpha=0.5
    sparse_vector={
        "indices": [10, 45],
        "values": [0.25, 0.15]  # Sparse values pre-scaled by (1 - alpha)
    },
    top_k=10
)
```

**Benefits:**

- Best of both worlds
- Semantic + keyword matching
- Better recall than either alone

## Namespaces for multi-tenancy

```python
# Separate data by user/tenant
index.upsert(
    vectors=[{"id": "doc1", "values": [...]}],
    namespace="user-123"
)

# Query a specific namespace
results = index.query(
    vector=[...],
    namespace="user-123",
    top_k=5
)

# List namespaces
stats = index.describe_index_stats()
print(stats["namespaces"])
```

**Use cases:**

- Multi-tenant SaaS
- User-specific data isolation
- A/B testing (prod/staging namespaces)

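Tenant isolation usually reduces to deriving a deterministic namespace name from a tenant ID. A minimal routing sketch (the `tenant_namespace` helper is hypothetical):

```python
def tenant_namespace(tenant_id, env="prod"):
    """Derive a deterministic, sanitized namespace name for a tenant."""
    safe = "".join(c for c in tenant_id.lower() if c.isalnum() or c == "-")
    return f"{env}-{safe}"

# Usage sketch: every read and write for a tenant goes through the same helper
# index.upsert(vectors=batch, namespace=tenant_namespace("User-123"))
# index.query(vector=q, namespace=tenant_namespace("User-123"), top_k=5)
ns = tenant_namespace("User-123")
# ns == "prod-user-123"
```

Routing all reads and writes through one function makes cross-tenant leakage a code-review check rather than a runtime hope.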
## Metadata filtering

### Exact match

```python
results = index.query(
    vector=[...],
    filter={"category": "tutorial"},
    top_k=5
)
```

### Range queries

```python
results = index.query(
    vector=[...],
    filter={"price": {"$gte": 100, "$lte": 500}},
    top_k=5
)
```

### Complex filters

```python
results = index.query(
    vector=[...],
    filter={
        "$and": [
            {"category": {"$in": ["tutorial", "guide"]}},
            {"difficulty": {"$lte": 3}},
            {"published": {"$gte": 1704067200}}  # Unix timestamp; range operators require numbers
        ]
    },
    top_k=5
)
```

## Best practices

1. **Use serverless for development** - Cost-effective
2. **Switch to pods for production** - Consistent performance
3. **Implement namespaces** - Multi-tenancy
4. **Add metadata strategically** - Enables filtering
5. **Use hybrid search** - Better retrieval quality
6. **Batch upserts** - 100-200 vectors per batch
7. **Monitor usage** - Check the Pinecone dashboard
8. **Set up alerts** - Usage/cost thresholds
9. **Regular backups** - Export important data
10. **Test filters** - Verify performance

## Resources

- **Docs**: https://docs.pinecone.io
- **Console**: https://app.pinecone.io

496
skills/mlops/vector-databases/qdrant/SKILL.md
Normal file
@@ -0,0 +1,496 @@
---
name: qdrant-vector-search
description: High-performance vector similarity search engine for RAG and semantic search. Use when building production RAG systems requiring fast nearest neighbor search, hybrid search with filtering, or scalable vector storage with Rust-powered performance.
version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [qdrant-client>=1.12.0]
metadata:
  hermes:
    tags: [RAG, Vector Search, Qdrant, Semantic Search, Embeddings, Similarity Search, HNSW, Production, Distributed]
---

# Qdrant - Vector Similarity Search Engine

High-performance vector database written in Rust for production RAG and semantic search.

## When to use Qdrant

**Use Qdrant when:**

- Building production RAG systems requiring low latency
- Need hybrid search (vectors + metadata filtering)
- Require horizontal scaling with sharding/replication
- Want on-premise deployment with full data control
- Need multi-vector storage per record (dense + sparse)
- Building real-time recommendation systems

**Key features:**
- **Rust-powered**: Memory-safe, high performance
- **Rich filtering**: Filter by any payload field during search
- **Multiple vectors**: Dense, sparse, multi-dense per point
- **Quantization**: Scalar, product, binary for memory efficiency
- **Distributed**: Raft consensus, sharding, replication
- **REST + gRPC**: Both APIs with full feature parity

**Use alternatives instead:**
- **Chroma**: Simpler setup, embedded use cases
- **FAISS**: Maximum raw speed, research/batch processing
- **Pinecone**: Fully managed, zero ops preferred
- **Weaviate**: GraphQL preference, built-in vectorizers

## Quick start

### Installation

```bash
# Python client
pip install qdrant-client

# Docker (recommended for development)
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant

# Docker with persistent storage
docker run -p 6333:6333 -p 6334:6334 \
  -v $(pwd)/qdrant_storage:/qdrant/storage \
  qdrant/qdrant
```

### Basic usage

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

# Connect to Qdrant
client = QdrantClient(host="localhost", port=6333)

# Create collection
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE)
)

# Insert vectors with payload
client.upsert(
    collection_name="documents",
    points=[
        PointStruct(
            id=1,
            vector=[0.1, 0.2, ...],  # 384-dim vector
            payload={"title": "Doc 1", "category": "tech"}
        ),
        PointStruct(
            id=2,
            vector=[0.3, 0.4, ...],
            payload={"title": "Doc 2", "category": "science"}
        )
    ]
)

# Search with filtering
results = client.search(
    collection_name="documents",
    query_vector=[0.15, 0.25, ...],
    query_filter={
        "must": [{"key": "category", "match": {"value": "tech"}}]
    },
    limit=10
)

for point in results:
    print(f"ID: {point.id}, Score: {point.score}, Payload: {point.payload}")
```

## Core concepts

### Points - Basic data unit

```python
from qdrant_client.models import PointStruct

# Point = ID + Vector(s) + Payload
point = PointStruct(
    id=123,  # Integer or UUID string
    vector=[0.1, 0.2, 0.3, ...],  # Dense vector
    payload={  # Arbitrary JSON metadata
        "title": "Document title",
        "category": "tech",
        "timestamp": 1699900000,
        "tags": ["python", "ml"]
    }
)

# Batch upsert (recommended)
client.upsert(
    collection_name="documents",
    points=[point1, point2, point3],
    wait=True  # Wait for indexing
)
```

### Collections - Vector containers

```python
from qdrant_client.models import VectorParams, Distance, HnswConfigDiff

# Create with HNSW configuration
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(
        size=384,  # Vector dimensions
        distance=Distance.COSINE  # COSINE, EUCLID, DOT, MANHATTAN
    ),
    hnsw_config=HnswConfigDiff(
        m=16,  # Connections per node (default 16)
        ef_construct=100,  # Build-time accuracy (default 100)
        full_scan_threshold=10000  # Brute force below this point count
    ),
    on_disk_payload=True  # Store payload on disk
)

# Collection info
info = client.get_collection("documents")
print(f"Points: {info.points_count}, Vectors: {info.vectors_count}")
```

### Distance metrics

| Metric | Use Case | Range |
|--------|----------|-------|
| `COSINE` | Text embeddings, normalized vectors | 0 to 2 |
| `EUCLID` | Spatial data, image features | 0 to ∞ |
| `DOT` | Recommendations, unnormalized | -∞ to ∞ |
| `MANHATTAN` | Sparse features, discrete data | 0 to ∞ |

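To build intuition for the first three metrics, a small pure-Python comparison (no Qdrant required; these are the textbook formulas, not SDK calls):

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product over the product of norms."""
    dot_ab = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot_ab / (norm_a * norm_b)

def euclid(a, b):
    """Euclidean (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot(a, b):
    """Raw dot product; unlike cosine, vector magnitude matters."""
    return sum(x * y for x, y in zip(a, b))

a, b = [1.0, 0.0], [2.0, 0.0]
# Same direction, different magnitude:
# cosine(a, b) == 1.0, dot(a, b) == 2.0, euclid(a, b) == 1.0
```

This is why `DOT` suits recommendation scores (popular items can have larger magnitudes) while `COSINE` suits text embeddings where only direction carries meaning.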
## Search operations

### Basic search

```python
# Simple nearest neighbor search
results = client.search(
    collection_name="documents",
    query_vector=[0.1, 0.2, ...],
    limit=10,
    with_payload=True,
    with_vectors=False  # Don't return vectors (faster)
)
```

### Filtered search

```python
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range

# Complex filtering
results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[
            FieldCondition(key="category", match=MatchValue(value="tech")),
            FieldCondition(key="timestamp", range=Range(gte=1699000000))
        ],
        must_not=[
            FieldCondition(key="status", match=MatchValue(value="archived"))
        ]
    ),
    limit=10
)

# Shorthand filter syntax
results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    query_filter={
        "must": [
            {"key": "category", "match": {"value": "tech"}},
            {"key": "price", "range": {"gte": 10, "lte": 100}}
        ]
    },
    limit=10
)
```

### Batch search

```python
from qdrant_client.models import SearchRequest

# Multiple queries in one request
results = client.search_batch(
    collection_name="documents",
    requests=[
        SearchRequest(vector=[0.1, ...], limit=5),
        SearchRequest(vector=[0.2, ...], limit=5, filter={"must": [...]}),
        SearchRequest(vector=[0.3, ...], limit=10)
    ]
)
```

## RAG integration

### With sentence-transformers

```python
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, PointStruct

# Initialize
encoder = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient(host="localhost", port=6333)

# Create collection
client.create_collection(
    collection_name="knowledge_base",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE)
)

# Index documents
documents = [
    {"id": 1, "text": "Python is a programming language", "source": "wiki"},
    {"id": 2, "text": "Machine learning uses algorithms", "source": "textbook"},
]

points = [
    PointStruct(
        id=doc["id"],
        vector=encoder.encode(doc["text"]).tolist(),
        payload={"text": doc["text"], "source": doc["source"]}
    )
    for doc in documents
]
client.upsert(collection_name="knowledge_base", points=points)

# RAG retrieval
def retrieve(query: str, top_k: int = 5) -> list[dict]:
    query_vector = encoder.encode(query).tolist()
    results = client.search(
        collection_name="knowledge_base",
        query_vector=query_vector,
        limit=top_k
    )
    return [{"text": r.payload["text"], "score": r.score} for r in results]

# Use in a RAG pipeline
context = retrieve("What is Python?")
prompt = f"Context: {context}\n\nQuestion: What is Python?"
```

### With LangChain

```python
from langchain_community.vectorstores import Qdrant
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = Qdrant.from_documents(
    documents,
    embeddings,
    url="http://localhost:6333",
    collection_name="docs"
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
```

### With LlamaIndex

```python
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import VectorStoreIndex, StorageContext

vector_store = QdrantVectorStore(client=client, collection_name="llama_docs")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
query_engine = index.as_query_engine()
```

## Multi-vector support

### Named vectors (different embedding models)

```python
from qdrant_client.models import VectorParams, Distance

# Collection with multiple named dense vectors, e.g. two embedding models
# (true sparse vectors use sparse_vectors_config; see the next section)
client.create_collection(
    collection_name="multi_model",
    vectors_config={
        "minilm": VectorParams(size=384, distance=Distance.COSINE),
        "mpnet": VectorParams(size=768, distance=Distance.COSINE)
    }
)

# Insert with named vectors
client.upsert(
    collection_name="multi_model",
    points=[
        PointStruct(
            id=1,
            vector={
                "minilm": minilm_embedding,
                "mpnet": mpnet_embedding
            },
            payload={"text": "document text"}
        )
    ]
)

# Search a specific named vector
results = client.search(
    collection_name="multi_model",
    query_vector=("minilm", query_embedding),  # Specify which vector
    limit=10
)
```

### Sparse vectors (BM25, SPLADE)

```python
from qdrant_client.models import SparseVectorParams, SparseIndexParams, SparseVector

# Collection with sparse vectors
client.create_collection(
    collection_name="sparse_search",
    vectors_config={},
    sparse_vectors_config={
        "text": SparseVectorParams(index=SparseIndexParams(on_disk=False))
    }
)

# Insert a sparse vector
client.upsert(
    collection_name="sparse_search",
    points=[
        PointStruct(
            id=1,
            vector={"text": SparseVector(indices=[1, 5, 100], values=[0.5, 0.8, 0.2])},
            payload={"text": "document"}
        )
    ]
)
```

## Quantization (memory optimization)

```python
from qdrant_client.models import ScalarQuantization, ScalarQuantizationConfig, ScalarType

# Scalar quantization (4x memory reduction: float32 -> int8)
client.create_collection(
    collection_name="quantized",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,
            quantile=0.99,  # Clip outliers
            always_ram=True  # Keep quantized vectors in RAM
        )
    )
)

# Search with rescoring
results = client.search(
    collection_name="quantized",
    query_vector=query,
    search_params={"quantization": {"rescore": True}},  # Rescore top results
    limit=10
)
```

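The "4x" figure follows directly from element width: int8 stores one byte per dimension where float32 stores four. A rough sizing sketch (raw vector storage only; it ignores HNSW graph and payload overhead, and the helper name is made up for illustration):

```python
def vector_memory_bytes(n_vectors, dim, bytes_per_element=4):
    """Raw storage for n_vectors of dimension dim (float32 by default)."""
    return n_vectors * dim * bytes_per_element

# 1M vectors of dimension 384:
float32 = vector_memory_bytes(1_000_000, 384)  # 1,536,000,000 bytes (~1.5 GB)
int8 = vector_memory_bytes(1_000_000, 384, bytes_per_element=1)  # 384,000,000 bytes
# float32 // int8 == 4
```

Numbers like these make the ">1M vectors" rule of thumb in the best practices below concrete: at that scale, quantization saves gigabytes of RAM.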
## Payload indexing

```python
from qdrant_client.models import PayloadSchemaType

# Create payload indexes for faster filtering
client.create_payload_index(
    collection_name="documents",
    field_name="category",
    field_schema=PayloadSchemaType.KEYWORD
)

client.create_payload_index(
    collection_name="documents",
    field_name="timestamp",
    field_schema=PayloadSchemaType.INTEGER
)

# Index types: KEYWORD, INTEGER, FLOAT, GEO, TEXT (full-text), BOOL
```

## Production deployment

### Qdrant Cloud

```python
from qdrant_client import QdrantClient

# Connect to Qdrant Cloud
client = QdrantClient(
    url="https://your-cluster.cloud.qdrant.io",
    api_key="your-api-key"
)
```

### Performance tuning

```python
# Optimize for search quality (higher recall, slower index builds)
client.update_collection(
    collection_name="documents",
    hnsw_config=HnswConfigDiff(ef_construct=200, m=32)
)

# Optimize for indexing speed (bulk loads)
client.update_collection(
    collection_name="documents",
    optimizer_config={"indexing_threshold": 20000}
)
```

## Best practices

1. **Batch operations** - Use batch upsert/search for efficiency
2. **Payload indexing** - Index fields used in filters
3. **Quantization** - Enable for large collections (>1M vectors)
4. **Sharding** - Use for collections >10M vectors
5. **On-disk storage** - Enable `on_disk_payload` for large payloads
6. **Connection pooling** - Reuse client instances

## Common issues

**Slow search with filters:**
```python
# Create a payload index for filtered fields
client.create_payload_index(
    collection_name="docs",
    field_name="category",
    field_schema=PayloadSchemaType.KEYWORD
)
```

**Out of memory:**
```python
# Enable quantization and on-disk storage
client.create_collection(
    collection_name="large_collection",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    quantization_config=ScalarQuantization(...),
    on_disk_payload=True
)
```

**Connection issues:**
```python
# Use a timeout and retry; prefer gRPC
client = QdrantClient(
    host="localhost",
    port=6333,
    timeout=30,
    prefer_grpc=True  # gRPC for better performance
)
```

## References

- **[Advanced Usage](references/advanced-usage.md)** - Distributed mode, hybrid search, recommendations
- **[Troubleshooting](references/troubleshooting.md)** - Common issues, debugging, performance tuning

## Resources

- **GitHub**: https://github.com/qdrant/qdrant (22k+ stars)
- **Docs**: https://qdrant.tech/documentation/
- **Python Client**: https://github.com/qdrant/qdrant-client
- **Cloud**: https://cloud.qdrant.io
- **Version**: 1.12.0+
- **License**: Apache 2.0

@@ -0,0 +1,648 @@
# Qdrant Advanced Usage Guide

## Distributed Deployment

### Cluster Setup

Qdrant uses Raft consensus for distributed coordination.

```yaml
# docker-compose.yml for a 3-node cluster
version: '3.8'
services:
  qdrant-node-1:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
      - "6334:6334"
      - "6335:6335"
    volumes:
      - ./node1_storage:/qdrant/storage
    environment:
      - QDRANT__CLUSTER__ENABLED=true
      - QDRANT__CLUSTER__P2P__PORT=6335
      - QDRANT__SERVICE__HTTP_PORT=6333
      - QDRANT__SERVICE__GRPC_PORT=6334

  qdrant-node-2:
    image: qdrant/qdrant:latest
    ports:
      - "6343:6333"
      - "6344:6334"
      - "6345:6335"
    volumes:
      - ./node2_storage:/qdrant/storage
    environment:
      - QDRANT__CLUSTER__ENABLED=true
      - QDRANT__CLUSTER__P2P__PORT=6335
      - QDRANT__CLUSTER__BOOTSTRAP=http://qdrant-node-1:6335
    depends_on:
      - qdrant-node-1

  qdrant-node-3:
    image: qdrant/qdrant:latest
    ports:
      - "6353:6333"
      - "6354:6334"
      - "6355:6335"
    volumes:
      - ./node3_storage:/qdrant/storage
    environment:
      - QDRANT__CLUSTER__ENABLED=true
      - QDRANT__CLUSTER__P2P__PORT=6335
      - QDRANT__CLUSTER__BOOTSTRAP=http://qdrant-node-1:6335
    depends_on:
      - qdrant-node-1
```

### Sharding Configuration

```python
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance

client = QdrantClient(host="localhost", port=6333)

# Create a sharded collection
client.create_collection(
    collection_name="large_collection",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    shard_number=6,  # Number of shards
    replication_factor=2,  # Replicas per shard
    write_consistency_factor=1  # Required acks per write
)

# Check cluster status
cluster_info = client.get_cluster_info()
print(f"Peers: {cluster_info.peers}")
print(f"Raft state: {cluster_info.raft_info}")
```

### Replication and Consistency

```python
from qdrant_client.models import WriteOrdering

# Strong consistency write
client.upsert(
    collection_name="critical_data",
    points=points,
    ordering=WriteOrdering.STRONG  # Wait for all replicas
)

# Eventual consistency (faster)
client.upsert(
    collection_name="logs",
    points=points,
    ordering=WriteOrdering.WEAK  # Return after the primary acks
)

# Read with a consistency guarantee
results = client.search(
    collection_name="documents",
    query_vector=query,
    consistency="majority"  # Read from a majority of replicas
)
```

## Hybrid Search

### Dense + Sparse Vectors

Combine semantic (dense) and keyword (sparse) search:

```python
from qdrant_client.models import (
    VectorParams, SparseVectorParams, SparseIndexParams,
    Distance, PointStruct, SparseVector, Prefetch, FusionQuery
)

# Create a hybrid collection
client.create_collection(
    collection_name="hybrid",
    vectors_config={
        "dense": VectorParams(size=384, distance=Distance.COSINE)
    },
    sparse_vectors_config={
        "sparse": SparseVectorParams(
            index=SparseIndexParams(on_disk=False)
        )
    }
)

# Insert with both vector types
def encode_sparse(text: str) -> SparseVector:
    """Simple BM25-like sparse encoding."""
    from collections import Counter
    tokens = text.lower().split()
    counts = Counter(tokens)
    # Map tokens to indices (use a real vocabulary in production)
    indices = [hash(t) % 30000 for t in counts.keys()]
    values = [float(v) for v in counts.values()]
    return SparseVector(indices=indices, values=values)

client.upsert(
    collection_name="hybrid",
    points=[
        PointStruct(
            id=1,
            vector={
                "dense": dense_encoder.encode("Python programming").tolist(),
                "sparse": encode_sparse("Python programming language code")
            },
            payload={"text": "Python programming language code"}
        )
    ]
)

# Hybrid search with Reciprocal Rank Fusion (RRF)
results = client.query_points(
    collection_name="hybrid",
    prefetch=[
        Prefetch(query=dense_query, using="dense", limit=20),
        Prefetch(query=sparse_query, using="sparse", limit=20)
    ],
    query=FusionQuery(fusion="rrf"),  # Combine the two result lists
    limit=10
)
```

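RRF itself is simple enough to compute by hand. A pure-Python sketch of the fusion the server performs (using the common k=60 constant; this is the textbook formula, not Qdrant code):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["a", "b", "c"]    # order from the dense search
sparse_hits = ["b", "c", "d"]   # order from the sparse search
fused = rrf_fuse([dense_hits, sparse_hits])
# "b" ranks first: it appears near the top of both lists
```

Because only ranks (not raw scores) are combined, RRF needs no score normalization between the dense and sparse retrievers, which is why it is the default fusion choice.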
### Multi-Stage Search
|
||||
|
||||
```python
|
||||
from qdrant_client.models import Prefetch, Query
|
||||
|
||||
# Two-stage retrieval: coarse then fine
|
||||
results = client.query_points(
|
||||
collection_name="documents",
|
||||
prefetch=[
|
||||
Prefetch(
|
||||
query=query_vector,
|
||||
limit=100, # Broad first stage
|
||||
params={"quantization": {"rescore": False}} # Fast, approximate
|
||||
)
|
||||
],
|
||||
query=Query(nearest=query_vector),
|
||||
limit=10,
|
||||
params={"quantization": {"rescore": True}} # Accurate reranking
|
||||
)
|
||||
```

## Recommendations

### Item-to-Item Recommendations

```python
# Find similar items
recommendations = client.recommend(
    collection_name="products",
    positive=[1, 2, 3],  # IDs user liked
    negative=[4],        # IDs user disliked
    limit=10
)

# With filtering
recommendations = client.recommend(
    collection_name="products",
    positive=[1, 2],
    query_filter={
        "must": [
            {"key": "category", "match": {"value": "electronics"}},
            {"key": "in_stock", "match": {"value": True}}
        ]
    },
    limit=10
)
```

### Lookup from Another Collection

```python
from qdrant_client.models import RecommendStrategy, LookupLocation

# Resolve the positive example IDs against another collection's vectors
results = client.recommend(
    collection_name="products",
    positive=["user_123"],  # ID looked up in user_history, not products
    lookup_from=LookupLocation(collection="user_history"),
    strategy=RecommendStrategy.AVERAGE_VECTOR,
    limit=10
)
```

## Advanced Filtering

### Nested Payload Filtering

```python
from qdrant_client.models import Filter, FieldCondition, MatchValue, NestedCondition, Nested

# Filter on nested objects
results = client.search(
    collection_name="documents",
    query_vector=query,
    query_filter=Filter(
        must=[
            NestedCondition(
                nested=Nested(
                    key="metadata",
                    filter=Filter(
                        must=[
                            FieldCondition(
                                key="author.name",
                                match=MatchValue(value="John")
                            )
                        ]
                    )
                )
            )
        ]
    ),
    limit=10
)
```

### Geo Filtering

```python
from qdrant_client.models import FieldCondition, GeoRadius, GeoPoint

# Find points within a radius
results = client.search(
    collection_name="locations",
    query_vector=query,
    query_filter=Filter(
        must=[
            FieldCondition(
                key="location",
                geo_radius=GeoRadius(
                    center=GeoPoint(lat=40.7128, lon=-74.0060),
                    radius=5000  # meters
                )
            )
        ]
    ),
    limit=10
)

# Geo bounding box
from qdrant_client.models import GeoBoundingBox

results = client.search(
    collection_name="locations",
    query_vector=query,
    query_filter=Filter(
        must=[
            FieldCondition(
                key="location",
                geo_bounding_box=GeoBoundingBox(
                    top_left=GeoPoint(lat=40.8, lon=-74.1),
                    bottom_right=GeoPoint(lat=40.6, lon=-73.9)
                )
            )
        ]
    ),
    limit=10
)
```

### Full-Text Search

```python
from qdrant_client.models import TextIndexParams, TokenizerType

# Create text index
client.create_payload_index(
    collection_name="documents",
    field_name="content",
    field_schema=TextIndexParams(
        type="text",
        tokenizer=TokenizerType.WORD,
        min_token_len=2,
        max_token_len=15,
        lowercase=True
    )
)

# Full-text filter
from qdrant_client.models import MatchText

results = client.search(
    collection_name="documents",
    query_vector=query,
    query_filter=Filter(
        must=[
            FieldCondition(
                key="content",
                match=MatchText(text="machine learning")
            )
        ]
    ),
    limit=10
)
```

## Quantization Strategies

### Scalar Quantization (INT8)

```python
from qdrant_client.models import ScalarQuantization, ScalarQuantizationConfig, ScalarType

# ~4x memory reduction, minimal accuracy loss
client.create_collection(
    collection_name="scalar_quantized",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,
            quantile=0.99,   # Clip extreme values
            always_ram=True  # Keep quantized vectors in RAM
        )
    )
)
```

### Product Quantization

```python
from qdrant_client.models import ProductQuantization, ProductQuantizationConfig, CompressionRatio

# ~16x memory reduction, some accuracy loss
client.create_collection(
    collection_name="product_quantized",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    quantization_config=ProductQuantization(
        product=ProductQuantizationConfig(
            compression=CompressionRatio.X16,
            always_ram=True
        )
    )
)
```

### Binary Quantization

```python
from qdrant_client.models import (
    BinaryQuantization, BinaryQuantizationConfig,
    SearchParams, QuantizationSearchParams
)

# ~32x memory reduction, requires oversampling at query time
client.create_collection(
    collection_name="binary_quantized",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    quantization_config=BinaryQuantization(
        binary=BinaryQuantizationConfig(always_ram=True)
    )
)

# Search with oversampling and rescoring
results = client.search(
    collection_name="binary_quantized",
    query_vector=query,
    search_params=SearchParams(
        quantization=QuantizationSearchParams(
            rescore=True,
            oversampling=2.0  # Retrieve 2x candidates, rescore with original vectors
        )
    ),
    limit=10
)
```

## Snapshots and Backups

### Create Snapshot

```python
# Create collection snapshot
snapshot_info = client.create_snapshot(collection_name="documents")
print(f"Snapshot: {snapshot_info.name}")

# List snapshots
snapshots = client.list_snapshots(collection_name="documents")
for s in snapshots:
    print(f"{s.name}: {s.size} bytes")

# Full storage snapshot
full_snapshot = client.create_full_snapshot()
```

### Restore from Snapshot

```python
# Download and restore via the REST API
import requests

# Download the snapshot file
snap = requests.get(
    "http://localhost:6333/collections/documents/snapshots/documents-2024-01-01.snapshot"
)
with open("./backup/documents-2024-01-01.snapshot", "wb") as f:
    f.write(snap.content)

# Recover from a snapshot visible to the server
response = requests.put(
    "http://localhost:6333/collections/documents/snapshots/recover",
    json={"location": "file:///backup/documents-2024-01-01.snapshot"}
)
```

## Collection Aliases

```python
# Create alias
client.update_collection_aliases(
    change_aliases_operations=[
        {"create_alias": {"alias_name": "production", "collection_name": "documents_v2"}}
    ]
)

# Blue-green deployment
# 1. Create new collection with updates
client.create_collection(collection_name="documents_v3", ...)

# 2. Populate new collection
client.upsert(collection_name="documents_v3", points=new_points)

# 3. Atomic switch
client.update_collection_aliases(
    change_aliases_operations=[
        {"delete_alias": {"alias_name": "production"}},
        {"create_alias": {"alias_name": "production", "collection_name": "documents_v3"}}
    ]
)

# Search via alias
results = client.search(collection_name="production", query_vector=query, limit=10)
```
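
The delete-then-create switch can be wrapped in a small helper so every deployment uses the same atomic request (a sketch; `promote` is a hypothetical name, not part of the client API):

```python
def promote(client, alias_name, new_collection):
    """Atomically repoint an alias: delete + create in one aliases request."""
    client.update_collection_aliases(
        change_aliases_operations=[
            {"delete_alias": {"alias_name": alias_name}},
            {"create_alias": {"alias_name": alias_name,
                              "collection_name": new_collection}},
        ]
    )
```

Call `promote(client, "production", "documents_v3")` once the new collection is fully populated; readers searching via the alias never see a gap because both operations ship in one request.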

## Scroll and Iteration

### Scroll Through All Points

```python
# Paginated iteration
offset = None
all_points = []

while True:
    results, offset = client.scroll(
        collection_name="documents",
        limit=100,
        offset=offset,
        with_payload=True,
        with_vectors=False
    )
    all_points.extend(results)

    if offset is None:
        break

print(f"Total points: {len(all_points)}")
```
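
The loop above can be wrapped in a generator so callers never hold the whole collection in memory (a sketch; `iter_points` is not part of the client API):

```python
def iter_points(client, collection, batch_size=100, **kwargs):
    """Yield points lazily, one scroll page at a time."""
    offset = None
    while True:
        batch, offset = client.scroll(
            collection_name=collection,
            limit=batch_size,
            offset=offset,
            **kwargs
        )
        yield from batch
        if offset is None:
            break
```

Usage: `for point in iter_points(client, "documents", with_payload=True): ...`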

### Filtered Scroll

```python
# Scroll with filter
results, _ = client.scroll(
    collection_name="documents",
    scroll_filter=Filter(
        must=[
            FieldCondition(key="status", match=MatchValue(value="active"))
        ]
    ),
    limit=1000
)
```

## Async Client

```python
import asyncio
from qdrant_client import AsyncQdrantClient
from qdrant_client.models import VectorParams, Distance

async def main():
    client = AsyncQdrantClient(host="localhost", port=6333)

    # Async operations
    await client.create_collection(
        collection_name="async_docs",
        vectors_config=VectorParams(size=384, distance=Distance.COSINE)
    )

    await client.upsert(
        collection_name="async_docs",
        points=points
    )

    results = await client.search(
        collection_name="async_docs",
        query_vector=query,
        limit=10
    )

    return results

results = asyncio.run(main())
```

## gRPC Client

```python
from qdrant_client import QdrantClient

# Prefer gRPC for better performance
client = QdrantClient(
    host="localhost",
    port=6333,
    grpc_port=6334,
    prefer_grpc=True  # Use gRPC when available
)

# gRPC-only client
client = QdrantClient(
    host="localhost",
    grpc_port=6334,
    prefer_grpc=True,
    https=False
)
```

## Multitenancy

### Payload-Based Isolation

```python
# Single collection, filter by tenant
client.upsert(
    collection_name="multi_tenant",
    points=[
        PointStruct(
            id=1,
            vector=embedding,
            payload={"tenant_id": "tenant_a", "text": "..."}
        )
    ]
)

# Search within tenant
results = client.search(
    collection_name="multi_tenant",
    query_vector=query,
    query_filter=Filter(
        must=[FieldCondition(key="tenant_id", match=MatchValue(value="tenant_a"))]
    ),
    limit=10
)
```
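
Filtering on `tenant_id` in every request stays cheap only if that field is indexed. A sketch of the usual setup step, wrapped in a hypothetical helper (`ensure_tenant_index` is not a client method):

```python
def ensure_tenant_index(client, collection="multi_tenant", field="tenant_id"):
    """Create an exact-match keyword index on the tenant field."""
    client.create_payload_index(
        collection_name=collection,
        field_name=field,
        field_schema="keyword"
    )
```

Run this once per collection at setup time; without the index, every tenant-filtered search falls back to scanning payloads.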

### Collection-Per-Tenant

```python
# Create tenant collection
def create_tenant_collection(tenant_id: str):
    client.create_collection(
        collection_name=f"tenant_{tenant_id}",
        vectors_config=VectorParams(size=384, distance=Distance.COSINE)
    )

# Search tenant collection
def search_tenant(tenant_id: str, query_vector: list, limit: int = 10):
    return client.search(
        collection_name=f"tenant_{tenant_id}",
        query_vector=query_vector,
        limit=limit
    )
```

## Performance Monitoring

### Collection Statistics

```python
# Collection info
info = client.get_collection("documents")
print(f"Points: {info.points_count}")
print(f"Indexed vectors: {info.indexed_vectors_count}")
print(f"Segments: {info.segments_count}")
print(f"Status: {info.status}")
```

### Telemetry

```python
# Get telemetry data via the REST API
import requests

telemetry = requests.get("http://localhost:6333/telemetry").json()
print(telemetry["result"]["collections"])
print(telemetry["result"]["requests"])
```
# Qdrant Troubleshooting Guide

## Installation Issues

### Docker Issues

**Error**: `Cannot connect to Docker daemon`

**Fix**:
```bash
# Start Docker daemon
sudo systemctl start docker

# Or use Docker Desktop on Mac/Windows
open -a Docker
```

**Error**: `Port 6333 already in use`

**Fix**:
```bash
# Find the process using the port
lsof -i :6333

# Kill the process, or map a different host port
docker run -p 6334:6333 qdrant/qdrant
```

### Python Client Issues

**Error**: `ModuleNotFoundError: No module named 'qdrant_client'`

**Fix**:
```bash
pip install qdrant-client

# With a specific version (quote so the shell doesn't interpret >=)
pip install "qdrant-client>=1.12.0"
```

**Error**: `grpc._channel._InactiveRpcError`

**Fix**:
```bash
# Install with gRPC support
pip install 'qdrant-client[grpc]'
```
```python
# Or disable gRPC
client = QdrantClient(host="localhost", port=6333, prefer_grpc=False)
```

## Connection Issues

### Cannot Connect to Server

**Error**: `ConnectionRefusedError: [Errno 111] Connection refused`

**Solutions**:

1. **Check server is running**:
```bash
docker ps | grep qdrant
curl http://localhost:6333/healthz
```

2. **Verify port binding**:
```bash
# Check listening ports
netstat -tlnp | grep 6333

# Docker port mapping
docker port <container_id>
```

3. **Use correct host**:
```python
# Docker on Linux
client = QdrantClient(host="localhost", port=6333)

# Docker on Mac/Windows with networking issues
client = QdrantClient(host="127.0.0.1", port=6333)

# Inside a Docker network (service name)
client = QdrantClient(host="qdrant", port=6333)
```

### Timeout Errors

**Error**: `TimeoutError: Connection timed out`

**Fix**:
```python
# Increase timeout
client = QdrantClient(
    host="localhost",
    port=6333,
    timeout=60  # seconds
)

# For large operations
client.upsert(
    collection_name="documents",
    points=large_batch,
    wait=False  # Don't wait for indexing
)
```

### SSL/TLS Errors

**Error**: `ssl.SSLCertVerificationError`

**Fix**:
```python
# Qdrant Cloud
client = QdrantClient(
    url="https://cluster.cloud.qdrant.io",
    api_key="your-api-key"
)

# Self-signed certificate
client = QdrantClient(
    host="localhost",
    port=6333,
    https=True,
    verify=False  # Disable verification (not recommended for production)
)
```

## Collection Issues

### Collection Already Exists

**Error**: `ValueError: Collection 'documents' already exists`

**Fix**:
```python
# Check before creating
collections = client.get_collections().collections
names = [c.name for c in collections]

if "documents" not in names:
    client.create_collection(...)

# Or recreate (drops any existing data)
client.recreate_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE)
)
```

### Collection Not Found

**Error**: `NotFoundException: Collection 'docs' not found`

**Fix**:
```python
# List available collections
collections = client.get_collections()
print([c.name for c in collections.collections])

# Check the exact name (case-sensitive)
try:
    info = client.get_collection("documents")
except Exception as e:
    print(f"Collection not found: {e}")
```

### Vector Dimension Mismatch

**Error**: `ValueError: Vector dimension mismatch. Expected 384, got 768`

**Fix**:
```python
# Check collection config
info = client.get_collection("documents")
print(f"Expected dimension: {info.config.params.vectors.size}")

# Recreate with the correct dimension (drops existing data)
client.recreate_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE)  # Match your embeddings
)
```

## Search Issues

### Empty Search Results

**Problem**: Search returns empty results.

**Solutions**:

1. **Verify data exists**:
```python
info = client.get_collection("documents")
print(f"Points: {info.points_count}")

# Scroll to check data
points, _ = client.scroll(
    collection_name="documents",
    limit=10,
    with_payload=True
)
print(points)
```

2. **Check vector format**:
```python
# Must be a list of floats
query_vector = embedding.tolist()  # Convert numpy to list

# Check dimensions
print(f"Query dimension: {len(query_vector)}")
```

3. **Verify filter conditions**:
```python
# Test without a filter first
results = client.search(
    collection_name="documents",
    query_vector=query,
    limit=10
    # No filter
)

# Then add filters incrementally
```

### Slow Search Performance

**Problem**: Search takes too long.

**Solutions**:

1. **Create payload indexes**:
```python
# Index fields used in filters
client.create_payload_index(
    collection_name="documents",
    field_name="category",
    field_schema="keyword"
)
```

2. **Enable quantization**:
```python
client.update_collection(
    collection_name="documents",
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(type=ScalarType.INT8)
    )
)
```

3. **Tune HNSW parameters**:
```python
from qdrant_client.models import HnswConfigDiff, SearchParams

# Faster search (less accurate)
client.update_collection(
    collection_name="documents",
    hnsw_config=HnswConfigDiff(ef_construct=64, m=8)
)

# Use the ef search parameter
results = client.search(
    collection_name="documents",
    query_vector=query,
    search_params=SearchParams(hnsw_ef=64),  # Lower = faster
    limit=10
)
```

4. **Use gRPC**:
```python
client = QdrantClient(
    host="localhost",
    port=6333,
    grpc_port=6334,
    prefer_grpc=True
)
```

### Inconsistent Results

**Problem**: The same query returns different results.

**Solutions**:

1. **Wait for indexing**:
```python
client.upsert(
    collection_name="documents",
    points=points,
    wait=True  # Wait for the index update
)
```

2. **Check replication consistency**:
```python
# Strong consistency read
results = client.search(
    collection_name="documents",
    query_vector=query,
    consistency="all"  # Read from all replicas
)
```

## Upsert Issues

### Batch Upsert Fails

**Error**: `PayloadError: Payload too large`

**Fix**:
```python
# Split into smaller batches
def batch_upsert(client, collection, points, batch_size=100):
    for i in range(0, len(points), batch_size):
        batch = points[i:i + batch_size]
        client.upsert(
            collection_name=collection,
            points=batch,
            wait=True
        )

batch_upsert(client, "documents", large_points_list)
```
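
Long bulk loads also hit transient network errors; a hedged sketch of per-batch retry with exponential backoff (the batch size, retry count, and delays are arbitrary choices, and `upsert_with_retry` is a hypothetical helper):

```python
import time

def upsert_with_retry(client, collection, points, batch_size=100,
                      retries=3, base_delay=1.0):
    """Upsert in batches, retrying each failed batch with exponential backoff."""
    for i in range(0, len(points), batch_size):
        batch = points[i:i + batch_size]
        for attempt in range(retries):
            try:
                client.upsert(collection_name=collection, points=batch, wait=True)
                break  # Batch succeeded, move to the next one
            except Exception:
                if attempt == retries - 1:
                    raise  # Out of retries, surface the error
                time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Only the failed batch is retried, so a flaky connection mid-load does not force re-sending everything.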

### Invalid Point ID

**Error**: `ValueError: Invalid point ID`

**Fix**:
```python
# Valid ID types: unsigned int or UUID string
from uuid import uuid4

# Integer ID
PointStruct(id=123, vector=vec, payload={})

# UUID string
PointStruct(id=str(uuid4()), vector=vec, payload={})

# NOT valid: arbitrary strings
PointStruct(id="custom-string-123", ...)  # Use UUID format instead
```

### Payload Validation Errors

**Error**: `ValidationError: Invalid payload`

**Fix**:
```python
# Ensure a JSON-serializable payload
import json
from datetime import datetime

payload = {
    "title": "Document",
    "count": 42,
    "tags": ["a", "b"],
    "nested": {"key": "value"}
}

# Validate before upsert
json.dumps(payload)  # Should not raise

# Avoid non-serializable types
# NOT valid: datetime, numpy arrays, custom objects
payload = {
    "timestamp": datetime.now().isoformat(),  # Convert to string
    "vector": embedding.tolist()  # Convert numpy array to list
}
```

## Memory Issues

### Out of Memory

**Error**: `MemoryError` or container killed

**Solutions**:

1. **Enable on-disk storage**:
```python
client.create_collection(
    collection_name="large_collection",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    on_disk_payload=True,  # Store payloads on disk
    hnsw_config=HnswConfigDiff(on_disk=True)  # Store the HNSW index on disk
)
```

2. **Use quantization**:
```python
# ~4x memory reduction
client.update_collection(
    collection_name="large_collection",
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,
            always_ram=False  # Keep quantized vectors on disk
        )
    )
)
```

3. **Increase Docker memory**:
```bash
docker run -m 8g -p 6333:6333 qdrant/qdrant
```

4. **Configure Qdrant storage**:
```yaml
# config.yaml
storage:
  performance:
    max_search_threads: 2
  optimizers:
    memmap_threshold_kb: 20000
```

### High Memory Usage During Indexing

**Fix**:
```python
from qdrant_client.models import OptimizersConfigDiff

# Raise the indexing threshold for bulk loads
client.update_collection(
    collection_name="documents",
    optimizers_config=OptimizersConfigDiff(
        indexing_threshold=50000  # Delay indexing
    )
)

# Bulk insert
client.upsert(collection_name="documents", points=all_points, wait=False)

# Then resume normal indexing
client.update_collection(
    collection_name="documents",
    optimizers_config=OptimizersConfigDiff(
        indexing_threshold=10000
    )
)
```

## Cluster Issues

### Node Not Joining Cluster

**Problem**: A new node fails to join the cluster.

**Fix**:
```bash
# Check network connectivity
docker exec qdrant-node-2 ping qdrant-node-1

# Verify the bootstrap URL
docker logs qdrant-node-2 | grep bootstrap

# Check Raft state
curl http://localhost:6333/cluster
```

### Split Brain

**Problem**: The cluster has inconsistent state.

**Fix**:
```bash
# Force leader election
curl -X POST http://localhost:6333/cluster/recover

# Or restart the minority nodes
docker restart qdrant-node-2 qdrant-node-3
```

### Replication Lag

**Problem**: Replicas fall behind.

**Fix**:
```python
from qdrant_client.models import WriteOrdering

# Check collection status
info = client.get_collection("documents")
print(f"Status: {info.status}")

# Use strong write ordering for critical writes
client.upsert(
    collection_name="documents",
    points=points,
    ordering=WriteOrdering.STRONG
)
```

## Performance Tuning

### Benchmark Configuration

```python
import time
import numpy as np

def benchmark_search(client, collection, n_queries=100, dimension=384):
    # Generate random queries
    queries = [np.random.rand(dimension).tolist() for _ in range(n_queries)]

    # Warmup
    for q in queries[:10]:
        client.search(collection_name=collection, query_vector=q, limit=10)

    # Benchmark
    start = time.perf_counter()
    for q in queries:
        client.search(collection_name=collection, query_vector=q, limit=10)
    elapsed = time.perf_counter() - start

    print(f"QPS: {n_queries / elapsed:.2f}")
    print(f"Mean latency: {elapsed / n_queries * 1000:.2f}ms")

benchmark_search(client, "documents")
```
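
Mean latency hides tail behavior; recording per-query timings and summarizing percentiles is a small extension of the benchmark above (a sketch, with `summarize_latencies` as a hypothetical helper):

```python
import numpy as np

def summarize_latencies(latencies_ms):
    """Return p50/p95/p99 from a list of per-query latencies in milliseconds."""
    arr = np.asarray(latencies_ms, dtype=float)
    return {
        "p50": float(np.percentile(arr, 50)),
        "p95": float(np.percentile(arr, 95)),
        "p99": float(np.percentile(arr, 99)),
    }

# Collect per-query timings instead of one total:
# latencies = []
# for q in queries:
#     t0 = time.perf_counter()
#     client.search(collection_name="documents", query_vector=q, limit=10)
#     latencies.append((time.perf_counter() - t0) * 1000)
# print(summarize_latencies(latencies))
```

A healthy p50 with a large p99 usually points at cold segments, GC in the client, or filters missing a payload index.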

### Optimal HNSW Parameters

```python
# High recall (slower)
client.create_collection(
    collection_name="high_recall",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    hnsw_config=HnswConfigDiff(
        m=32,             # More connections per node
        ef_construct=200  # Higher build quality
    )
)

# High speed (lower recall)
client.create_collection(
    collection_name="high_speed",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    hnsw_config=HnswConfigDiff(
        m=8,             # Fewer connections
        ef_construct=64  # Lower build quality
    )
)

# Balanced
client.create_collection(
    collection_name="balanced",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    hnsw_config=HnswConfigDiff(
        m=16,             # Default
        ef_construct=100  # Default
    )
)
```

## Debugging Tips

### Enable Verbose Logging

```python
import logging

logging.basicConfig(level=logging.DEBUG)
logging.getLogger("qdrant_client").setLevel(logging.DEBUG)
```

### Check Server Logs

```bash
# Docker logs
docker logs -f qdrant

# With timestamps
docker logs --timestamps qdrant

# Last 100 lines
docker logs --tail 100 qdrant
```

### Inspect Collection State

```python
# Collection info
info = client.get_collection("documents")
print(f"Status: {info.status}")
print(f"Points: {info.points_count}")
print(f"Segments: {info.segments_count}")
print(f"Config: {info.config}")

# Sample points
points, _ = client.scroll(
    collection_name="documents",
    limit=5,
    with_payload=True,
    with_vectors=True
)
for p in points:
    print(f"ID: {p.id}, Payload: {p.payload}")
```

### Test Connection

```python
def test_connection(host="localhost", port=6333):
    try:
        client = QdrantClient(host=host, port=port, timeout=5)
        collections = client.get_collections()
        print(f"Connected! Collections: {len(collections.collections)}")
        return True
    except Exception as e:
        print(f"Connection failed: {e}")
        return False

test_connection()
```

## Getting Help

1. **Documentation**: https://qdrant.tech/documentation/
2. **GitHub Issues**: https://github.com/qdrant/qdrant/issues
3. **Discord**: https://discord.gg/qdrant
4. **Stack Overflow**: Tag `qdrant`

### Reporting Issues

Include:
- Qdrant version: `curl http://localhost:6333/`
- Python client version: `pip show qdrant-client`
- Full error traceback
- Minimal reproducible code
- Collection configuration