initial: sovereign home — morrowind agent, skills, training-data, research, specs, notes, operational docs

Tracked: morrowind agent (py/cfg), skills/, training-data/, research/,
notes/, specs/, test-results/, metrics/, heartbeat/, briefings/,
memories/, skins/, hooks/, decisions.md, OPERATIONS.md, SOUL.md

Excluded: screenshots, PNGs, binaries, sessions, databases, secrets,
audio cache, timmy-config/ and timmy-telemetry/ (separate repos)
Alexander Whitestone
2026-03-27 13:05:57 -04:00
commit 0d64d8e559
2393 changed files with 178606 additions and 0 deletions

@@ -0,0 +1,3 @@
---
description: Vector similarity search and embedding databases for RAG, semantic search, and AI application backends.
---

@@ -0,0 +1,409 @@
---
name: chroma
description: Open-source embedding database for AI applications. Store embeddings and metadata, perform vector and full-text search, filter by metadata. Simple 4-function API. Scales from notebooks to production clusters. Use for semantic search, RAG applications, or document retrieval. Best for local development and open-source projects.
version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [chromadb, sentence-transformers]
metadata:
  hermes:
    tags: [RAG, Chroma, Vector Database, Embeddings, Semantic Search, Open Source, Self-Hosted, Document Retrieval, Metadata Filtering]
---
# Chroma - Open-Source Embedding Database
The AI-native database for building LLM applications with memory.
## When to use Chroma
**Use Chroma when:**
- Building RAG (retrieval-augmented generation) applications
- Need local/self-hosted vector database
- Want open-source solution (Apache 2.0)
- Prototyping in notebooks
- Semantic search over documents
- Storing embeddings with metadata
**Metrics**:
- **24,300+ GitHub stars**
- **1,900+ forks**
- **v1.3.3** (stable, weekly releases)
- **Apache 2.0 license**
**Use alternatives instead**:
- **Pinecone**: Managed cloud, auto-scaling
- **FAISS**: Pure similarity search, no metadata
- **Weaviate**: Production ML-native database
- **Qdrant**: High performance, Rust-based
## Quick start
### Installation
```bash
# Python
pip install chromadb
# JavaScript/TypeScript
npm install chromadb @chroma-core/default-embed
```
### Basic usage (Python)
```python
import chromadb
# Create client
client = chromadb.Client()
# Create collection
collection = client.create_collection(name="my_collection")
# Add documents
collection.add(
documents=["This is document 1", "This is document 2"],
metadatas=[{"source": "doc1"}, {"source": "doc2"}],
ids=["id1", "id2"]
)
# Query
results = collection.query(
query_texts=["document about topic"],
n_results=2
)
print(results)
```
## Core operations
### 1. Create collection
```python
# Simple collection
collection = client.create_collection("my_docs")
# With custom embedding function
from chromadb.utils import embedding_functions
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
api_key="your-key",
model_name="text-embedding-3-small"
)
collection = client.create_collection(
name="my_docs",
embedding_function=openai_ef
)
# Get existing collection
collection = client.get_collection("my_docs")
# Delete collection
client.delete_collection("my_docs")
```
### 2. Add documents
```python
# Add documents with metadata and explicit IDs
collection.add(
documents=["Doc 1", "Doc 2", "Doc 3"],
metadatas=[
{"source": "web", "category": "tutorial"},
{"source": "pdf", "page": 5},
{"source": "api", "timestamp": "2025-01-01"}
],
ids=["id1", "id2", "id3"]
)
# Add with custom embeddings
collection.add(
embeddings=[[0.1, 0.2, ...], [0.3, 0.4, ...]],
documents=["Doc 1", "Doc 2"],
ids=["id1", "id2"]
)
```
### 3. Query (similarity search)
```python
# Basic query
results = collection.query(
query_texts=["machine learning tutorial"],
n_results=5
)
# Query with filters
results = collection.query(
query_texts=["Python programming"],
n_results=3,
where={"source": "web"}
)
# Query with metadata filters
results = collection.query(
query_texts=["advanced topics"],
where={
"$and": [
{"category": "tutorial"},
{"difficulty": {"$gte": 3}}
]
}
)
# Access results
print(results["documents"]) # List of matching documents
print(results["metadatas"]) # Metadata for each doc
print(results["distances"]) # Distances (lower = more similar)
print(results["ids"]) # Document IDs
```
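Query results are column-oriented and grouped per query text: each field holds one inner list for every entry in `query_texts`. The following is a minimal sketch of flattening that shape into per-hit records. It uses a hand-written dictionary in place of a live query, and both the `flatten_results` helper and the sample values are illustrative rather than part of Chroma.

```python
def flatten_results(results):
    """Turn column-oriented query results into one list of hit dicts per query."""
    per_query = []
    for ids, docs, metas, dists in zip(
        results["ids"],
        results["documents"],
        results["metadatas"],
        results["distances"],
    ):
        hits = [
            {"id": i, "document": d, "metadata": m, "distance": dist}
            for i, d, m, dist in zip(ids, docs, metas, dists)
        ]
        per_query.append(hits)
    return per_query


# Hand-written stand-in for the return value of collection.query(...)
sample = {
    "ids": [["id1", "id2"]],
    "documents": [["This is document 1", "This is document 2"]],
    "metadatas": [[{"source": "doc1"}, {"source": "doc2"}]],
    "distances": [[0.31, 0.87]],
}

hits = flatten_results(sample)
for hit in hits[0]:  # hits for the first (and only) query text
    print(hit["id"], hit["distance"], hit["document"])
```

Indexing with `[0]` selects the hits for the first query text, which is the usual case when a single query string is passed.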
### 4. Get documents
```python
# Get by IDs
docs = collection.get(
ids=["id1", "id2"]
)
# Get with filters
docs = collection.get(
where={"category": "tutorial"},
limit=10
)
# Get all documents
docs = collection.get()
```
### 5. Update documents
```python
# Update document content
collection.update(
ids=["id1"],
documents=["Updated content"],
metadatas=[{"source": "updated"}]
)
```
### 6. Delete documents
```python
# Delete by IDs
collection.delete(ids=["id1", "id2"])
# Delete with filter
collection.delete(
where={"source": "outdated"}
)
```
## Persistent storage
```python
# Persist to disk
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.create_collection("my_docs")
collection.add(documents=["Doc 1"], ids=["id1"])
# Data persisted automatically
# Reload later with same path
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_collection("my_docs")
```
## Embedding functions
### Default (Sentence Transformers)
```python
# Uses sentence-transformers by default
collection = client.create_collection("my_docs")
# Default model: all-MiniLM-L6-v2
```
### OpenAI
```python
from chromadb.utils import embedding_functions
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
api_key="your-key",
model_name="text-embedding-3-small"
)
collection = client.create_collection(
name="openai_docs",
embedding_function=openai_ef
)
```
### HuggingFace
```python
from chromadb.utils import embedding_functions
huggingface_ef = embedding_functions.HuggingFaceEmbeddingFunction(
api_key="your-key",
model_name="sentence-transformers/all-mpnet-base-v2"
)
collection = client.create_collection(
name="hf_docs",
embedding_function=huggingface_ef
)
```
### Custom embedding function
```python
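from chromadb import Documents, EmbeddingFunction, Embeddings
class MyEmbeddingFunction(EmbeddingFunction):
    def __call__(self, input: Documents) -> Embeddings:
        # Your embedding logic: return one vector (list of floats) per document
        embeddings = [[float(len(doc))] for doc in input]  # placeholder, not a real embedding
        return embeddings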
my_ef = MyEmbeddingFunction()
collection = client.create_collection(
name="custom_docs",
embedding_function=my_ef
)
```
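To make the expected shape of a custom embedding function concrete, here is a toy sketch that needs no model: it hashes tokens into a fixed number of buckets and L2-normalizes the result. The `HashingEmbedder` class is hypothetical and deliberately does not subclass Chroma's `EmbeddingFunction`, so it runs standalone. It only illustrates the contract of one equal-length vector per input text and is not a useful embedding for real search.

```python
import hashlib
import math


class HashingEmbedder:
    """Toy stand-in: maps each text to a fixed-size, L2-normalized vector."""

    def __init__(self, dim: int = 8):
        self.dim = dim

    def __call__(self, input):
        vectors = []
        for text in input:
            vec = [0.0] * self.dim
            for token in text.lower().split():
                # Hash the token into one of `dim` buckets and count it.
                digest = hashlib.md5(token.encode("utf-8")).hexdigest()
                vec[int(digest, 16) % self.dim] += 1.0
            # Guard against empty texts, which would give a zero norm.
            norm = math.sqrt(sum(x * x for x in vec)) or 1.0
            vectors.append([x / norm for x in vec])
        return vectors


embedder = HashingEmbedder(dim=8)
out = embedder(["hello world", "hello"])
print(len(out), len(out[0]))
```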
## Metadata filtering
```python
# Exact match
results = collection.query(
query_texts=["query"],
where={"category": "tutorial"}
)
# Comparison operators
results = collection.query(
query_texts=["query"],
where={"page": {"$gt": 10}} # $gt, $gte, $lt, $lte, $ne
)
# Logical operators
results = collection.query(
query_texts=["query"],
where={
"$and": [
{"category": "tutorial"},
{"difficulty": {"$lte": 3}}
]
} # Also: $or
)
# Value is one of the listed options ($in)
results = collection.query(
query_texts=["query"],
where={"tags": {"$in": ["python", "ml"]}}
)
```
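The filter syntax above can be read as a small expression language over each record's metadata. The sketch below evaluates such a filter against a plain dictionary to make the semantics explicit. The `matches` helper is hypothetical and simplified (for instance, a missing field never matches a comparison), so it should not be taken as Chroma's actual implementation.

```python
import operator

# Operators used in this section, mapped to Python equivalents.
_OPS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
}


def matches(where, metadata):
    """Return True if `metadata` satisfies the `where` filter (illustrative only)."""
    for key, condition in where.items():
        if key == "$and":
            if not all(matches(sub, metadata) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(sub, metadata) for sub in condition):
                return False
        elif isinstance(condition, dict):
            # Form: {"field": {"$op": operand}}
            if key not in metadata:
                return False
            for op, operand in condition.items():
                if not _OPS[op](metadata[key], operand):
                    return False
        elif metadata.get(key) != condition:
            # Form: {"field": value} is shorthand for equality
            return False
    return True


doc = {"category": "tutorial", "difficulty": 2, "page": 12}
print(matches({"page": {"$gt": 10}}, doc))
print(matches({"$and": [{"category": "tutorial"}, {"difficulty": {"$gte": 3}}]}, doc))
```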
## LangChain integration
```python
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Split documents
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000)
docs = text_splitter.split_documents(documents)
# Create Chroma vector store
vectorstore = Chroma.from_documents(
documents=docs,
embedding=OpenAIEmbeddings(),
persist_directory="./chroma_db"
)
# Query
results = vectorstore.similarity_search("machine learning", k=3)
# As retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
```
## LlamaIndex integration
```python
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import VectorStoreIndex, StorageContext
import chromadb
# Initialize Chroma
db = chromadb.PersistentClient(path="./chroma_db")
collection = db.get_or_create_collection("my_collection")
# Create vector store
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# Create index
index = VectorStoreIndex.from_documents(
documents,
storage_context=storage_context
)
# Query
query_engine = index.as_query_engine()
response = query_engine.query("What is machine learning?")
```
## Server mode
```python
# Run Chroma server
# Terminal: chroma run --path ./chroma_db --port 8000
# Connect to server
import chromadb
from chromadb.config import Settings
client = chromadb.HttpClient(
host="localhost",
port=8000,
settings=Settings(anonymized_telemetry=False)
)
# Use as normal
collection = client.get_or_create_collection("my_docs")
```
## Best practices
1. **Use persistent client** - Don't lose data on restart
2. **Add metadata** - Enables filtering and tracking
3. **Batch operations** - Add multiple docs at once
4. **Choose right embedding model** - Balance speed/quality
5. **Use filters** - Narrow search space
6. **Unique IDs** - Avoid collisions
7. **Regular backups** - Copy chroma_db directory
8. **Monitor collection size** - Scale up if needed
9. **Test embedding functions** - Ensure quality
10. **Use server mode for production** - Better for multi-user
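For practice 6, one way to avoid ID collisions is to derive each ID from the document's source and content, so that re-ingesting the same text yields the same ID. This is a minimal sketch; the `make_doc_id` helper and the 32-character truncation are assumptions for illustration, not a Chroma convention.

```python
import hashlib


def make_doc_id(source: str, text: str) -> str:
    """Derive a stable ID from a document's source and content."""
    # A NUL separator keeps ("ab", "c") distinct from ("a", "bc").
    payload = f"{source}\x00{text}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:32]


documents = ["This is document 1", "This is document 2"]
ids = [make_doc_id("doc-store", text) for text in documents]
print(ids)
```

The resulting list can be passed as the `ids` argument alongside `documents` when adding to a collection.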
## Performance
| Operation | Latency | Notes |
|-----------|---------|-------|
| Add 100 docs | ~1-3s | With embedding |
| Query (top 10) | ~50-200ms | Depends on collection size |
| Metadata filter | ~10-50ms | Fast with proper indexing |
## Resources
- **GitHub**: https://github.com/chroma-core/chroma ⭐ 24,300+
- **Docs**: https://docs.trychroma.com
- **Discord**: https://discord.gg/MMeYNTmh3x
- **Version**: 1.3.3+
- **License**: Apache 2.0

@@ -0,0 +1,38 @@
# Chroma Integration Guide
Integration with LangChain, LlamaIndex, and frameworks.
## LangChain
```python
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings
vectorstore = Chroma.from_documents(
documents=docs,
embedding=OpenAIEmbeddings(),
persist_directory="./chroma_db"
)
# Query
results = vectorstore.similarity_search("query", k=3)
# As retriever
retriever = vectorstore.as_retriever()
```
## LlamaIndex
```python
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb
db = chromadb.PersistentClient(path="./chroma_db")
collection = db.get_or_create_collection("docs")
vector_store = ChromaVectorStore(chroma_collection=collection)
```
## Resources
- **Docs**: https://docs.trychroma.com

@@ -0,0 +1,224 @@
---
name: faiss
description: Facebook's library for efficient similarity search and clustering of dense vectors. Supports billions of vectors, GPU acceleration, and various index types (Flat, IVF, HNSW). Use for fast k-NN search, large-scale vector retrieval, or when you need pure similarity search without metadata. Best for high-performance applications.
version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [faiss-cpu, faiss-gpu, numpy]
metadata:
  hermes:
    tags: [RAG, FAISS, Similarity Search, Vector Search, Facebook AI, GPU Acceleration, Billion-Scale, K-NN, HNSW, High Performance, Large Scale]
---
# FAISS - Efficient Similarity Search
Facebook AI's library for billion-scale vector similarity search.
## When to use FAISS
**Use FAISS when:**
- Need fast similarity search on large vector datasets (millions/billions)
- GPU acceleration required
- Pure vector similarity (no metadata filtering needed)
- High throughput, low latency critical
- Offline/batch processing of embeddings
**Metrics**:
- **31,700+ GitHub stars**
- Meta/Facebook AI Research
- **Handles billions of vectors**
- **C++** with Python bindings
**Use alternatives instead**:
- **Chroma/Pinecone**: Need metadata filtering
- **Weaviate**: Need full database features
- **Annoy**: Simpler, fewer features
## Quick start
### Installation
```bash
# CPU only
pip install faiss-cpu
# GPU support
pip install faiss-gpu
```
### Basic usage
```python
import faiss
import numpy as np
# Create sample data (1000 vectors, 128 dimensions)
d = 128
nb = 1000
vectors = np.random.random((nb, d)).astype('float32')
# Create index
index = faiss.IndexFlatL2(d) # L2 distance
index.add(vectors) # Add vectors
# Search
k = 5 # Find 5 nearest neighbors
query = np.random.random((1, d)).astype('float32')
distances, indices = index.search(query, k)
print(f"Nearest neighbors: {indices}")
print(f"Distances: {distances}")
```
## Index types
### 1. Flat (exact search)
```python
# L2 (Euclidean) distance
index = faiss.IndexFlatL2(d)
# Inner product (cosine similarity if normalized)
index = faiss.IndexFlatIP(d)
# Slowest, most accurate
```
### 2. IVF (inverted file) - Fast approximate
```python
# Create quantizer
quantizer = faiss.IndexFlatL2(d)
# IVF index with 100 clusters
nlist = 100
index = faiss.IndexIVFFlat(quantizer, d, nlist)
# Train on data
index.train(vectors)
# Add vectors
index.add(vectors)
# Search (nprobe = clusters to search)
index.nprobe = 10
distances, indices = index.search(query, k)
```
### 3. HNSW (Hierarchical NSW) - Best quality/speed
```python
# HNSW index
M = 32 # Number of connections per layer
index = faiss.IndexHNSWFlat(d, M)
# No training needed
index.add(vectors)
# Search
distances, indices = index.search(query, k)
```
### 4. Product Quantization - Memory efficient
```python
# PQ stores each vector in m bytes at nbits=8 (vs. 4*d bytes as float32)
m = 8 # Number of subquantizers
nbits = 8
index = faiss.IndexPQ(d, m, nbits)
# Train and add
index.train(vectors)
index.add(vectors)
```
## Save and load
```python
# Save index
faiss.write_index(index, "large.index")
# Load index
index = faiss.read_index("large.index")
# Continue using
distances, indices = index.search(query, k)
```
## GPU acceleration
```python
# Single GPU
res = faiss.StandardGpuResources()
index_cpu = faiss.IndexFlatL2(d)
index_gpu = faiss.index_cpu_to_gpu(res, 0, index_cpu) # GPU 0
# Multi-GPU
index_gpu = faiss.index_cpu_to_all_gpus(index_cpu)
# 10-100× faster than CPU
```
## LangChain integration
```python
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
# Create FAISS vector store
vectorstore = FAISS.from_documents(docs, OpenAIEmbeddings())
# Save
vectorstore.save_local("faiss_index")
# Load
vectorstore = FAISS.load_local(
"faiss_index",
OpenAIEmbeddings(),
allow_dangerous_deserialization=True
)
# Search
results = vectorstore.similarity_search("query", k=5)
```
## LlamaIndex integration
```python
from llama_index.vector_stores.faiss import FaissVectorStore
import faiss
# Create FAISS index
d = 1536
faiss_index = faiss.IndexFlatL2(d)
vector_store = FaissVectorStore(faiss_index=faiss_index)
```
## Best practices
1. **Choose right index type** - Flat for <10K, IVF for 10K-1M, HNSW for quality
2. **Normalize for cosine** - Use IndexFlatIP with normalized vectors
3. **Use GPU for large datasets** - 10-100× faster
4. **Save trained indices** - Training is expensive
5. **Tune nprobe/ef_search** - Balance speed/accuracy
6. **Monitor memory** - PQ for large datasets
7. **Batch queries** - Better GPU utilization
## Performance
| Index Type | Build Time | Search Time | Memory | Accuracy |
|------------|------------|-------------|--------|----------|
| Flat | Fast | Slow | High | 100% |
| IVF | Medium | Fast | Medium | 95-99% |
| HNSW | Slow | Fastest | High | 99% |
| PQ | Medium | Fast | Low | 90-95% |
## Resources
- **GitHub**: https://github.com/facebookresearch/faiss ⭐ 31,700+
- **Wiki**: https://github.com/facebookresearch/faiss/wiki
- **License**: MIT

@@ -0,0 +1,280 @@
# FAISS Index Types Guide
Complete guide to choosing and using FAISS index types.
## Index selection guide
| Dataset Size | Index Type | Training | Accuracy | Speed |
|--------------|------------|----------|----------|-------|
| < 10K | Flat | No | 100% | Slow |
| 10K-1M | IVF | Yes | 95-99% | Fast |
| 1M-10M | HNSW | No | 99% | Fastest |
| > 10M | IVF+PQ | Yes | 90-95% | Fast, low memory |
## Flat indices (exact search)
### IndexFlatL2 - L2 (Euclidean) distance
```python
import faiss
import numpy as np
d = 128 # Dimension
index = faiss.IndexFlatL2(d)
# Add vectors
vectors = np.random.random((1000, d)).astype('float32')
index.add(vectors)
# Search
k = 5
query = np.random.random((1, d)).astype('float32')
distances, indices = index.search(query, k)
```
**Use when:**
- Dataset < 10,000 vectors
- Need 100% accuracy
- Serving as baseline
### IndexFlatIP - Inner product (cosine similarity)
```python
# For cosine similarity, normalize vectors first
import faiss
d = 128
index = faiss.IndexFlatIP(d)
# Normalize vectors (required for cosine similarity)
faiss.normalize_L2(vectors)
index.add(vectors)
# Search
faiss.normalize_L2(query)
distances, indices = index.search(query, k)
```
**Use when:**
- Need cosine similarity
- Recommendation systems
- Text embeddings
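The reason normalization is needed for `IndexFlatIP` is that the inner product of two unit-length vectors equals their cosine similarity. The following dependency-free sketch checks this on two small vectors. The helper names are illustrative, and zero vectors are not handled.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale a vector to unit L2 norm (assumes a non-zero vector)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity computed directly from the definition."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0, 0.0]
b = [1.0, 2.0, 2.0]

ip_normalized = dot(l2_normalize(a), l2_normalize(b))
print(ip_normalized, cosine(a, b))  # both equal 11/15 up to rounding
```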
## IVF indices (inverted file)
### IndexIVFFlat - Cluster-based search
```python
# Create quantizer
quantizer = faiss.IndexFlatL2(d)
# Create IVF index with 100 clusters
nlist = 100 # Number of clusters
index = faiss.IndexIVFFlat(quantizer, d, nlist)
# Train on data (required!)
index.train(vectors)
# Add vectors
index.add(vectors)
# Search (nprobe = clusters to search)
index.nprobe = 10 # Search 10 closest clusters
distances, indices = index.search(query, k)
```
**Parameters:**
- `nlist`: Number of clusters (√N to 4√N recommended)
- `nprobe`: Clusters to search (1-nlist, higher = more accurate)
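Taking the √N to 4√N rule of thumb above at face value, a starting range for `nlist` can be computed from the dataset size. The `suggest_nlist` helper is hypothetical, and the result is a starting point for tuning rather than a FAISS-provided recommendation.

```python
import math


def suggest_nlist(n_vectors: int) -> tuple[int, int]:
    """Return the (low, high) nlist range implied by the sqrt(N)..4*sqrt(N) rule."""
    root = math.isqrt(n_vectors)  # integer square root, rounded down
    return root, 4 * root


low, high = suggest_nlist(1_000_000)
print(low, high)  # 1000 4000
```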
**Use when:**
- Dataset 10K-1M vectors
- Need fast approximate search
- Can afford training time
### Tuning nprobe
```python
# Test different nprobe values
for nprobe in [1, 5, 10, 20, 50]:
    index.nprobe = nprobe
    distances, indices = index.search(query, k)
    # Measure recall/speed trade-off
```
**Guidelines:**
- `nprobe=1`: Fastest, ~50% recall
- `nprobe=10`: Good balance, ~95% recall
- `nprobe=nlist`: Exact search (same as Flat)
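Measuring the trade-off requires a recall number. A common approach is to run the same queries against a Flat index to obtain exact neighbors, then count how many of them the approximate index returned. The sketch below computes that fraction from two lists of ID rows. The `recall_at_k` helper and the sample IDs are illustrative.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of exact neighbors also present in the approximate results."""
    found = 0
    total = 0
    for approx_row, exact_row in zip(approx_ids, exact_ids):
        exact_set = set(exact_row)
        found += len(exact_set.intersection(approx_row))
        total += len(exact_set)
    return found / total


# One row of neighbor IDs per query, as returned in `indices` by index.search
exact = [[4, 8, 15, 16, 23], [1, 2, 3, 5, 8]]
approx = [[4, 8, 15, 42, 23], [1, 2, 3, 5, 8]]

print(recall_at_k(approx, exact))  # 0.9
```

Calling this once per `nprobe` value, alongside a timing measurement, gives the recall/speed curve the guidelines refer to.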
## HNSW indices (graph-based)
### IndexHNSWFlat - Hierarchical NSW
```python
# HNSW index
M = 32 # Number of connections per layer (16-64)
index = faiss.IndexHNSWFlat(d, M)
# Optional: Set ef_construction (build time parameter)
index.hnsw.efConstruction = 40 # Higher = better quality, slower build
# Add vectors (no training needed!)
index.add(vectors)
# Search
index.hnsw.efSearch = 16 # Search time parameter
distances, indices = index.search(query, k)
```
**Parameters:**
- `M`: Connections per layer (16-64, default 32)
- `efConstruction`: Build quality (40-200, higher = better)
- `efSearch`: Search quality (16-512, higher = more accurate)
**Use when:**
- Need best quality approximate search
- Can afford higher memory (more connections)
- Dataset 1M-10M vectors
## PQ indices (product quantization)
### IndexPQ - Memory-efficient
```python
# PQ stores each vector in m bytes at nbits=8 (vs. 4*d bytes as float32)
m = 8 # Number of subquantizers (divides d)
nbits = 8 # Bits per subquantizer
index = faiss.IndexPQ(d, m, nbits)
# Train (required!)
index.train(vectors)
# Add vectors
index.add(vectors)
# Search
distances, indices = index.search(query, k)
```
**Parameters:**
- `m`: Subquantizers (d must be divisible by m)
- `nbits`: Bits per code (8 or 16)
**Memory savings:**
- Original: d × 4 bytes (float32)
- PQ: m × nbits / 8 bytes (m bytes at nbits = 8)
- Compression ratio: 4d/m at nbits = 8 (64× for d = 128, m = 8)
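The arithmetic behind these figures can be checked directly. This sketch computes per-vector sizes and the resulting compression ratio. It counts only the raw vector data and PQ codes, not codebooks or ID storage, and the function names are illustrative.

```python
def flat_bytes(d: int) -> int:
    """Bytes per vector when stored as float32."""
    return d * 4


def pq_code_bytes(m: int, nbits: int = 8) -> float:
    """Bytes per vector for a PQ code with m subquantizers of nbits each."""
    return m * nbits / 8


def compression_ratio(d: int, m: int, nbits: int = 8) -> float:
    """How many times smaller the PQ code is than the float32 vector."""
    return flat_bytes(d) / pq_code_bytes(m, nbits)


print(compression_ratio(128, 8))    # 512 bytes -> 8 bytes: 64.0
print(compression_ratio(1536, 96))  # 6144 bytes -> 96 bytes: 64.0
```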
**Use when:**
- Limited memory
- Large datasets (> 10M vectors)
- Can accept ~90-95% accuracy
### IndexIVFPQ - IVF + PQ combined
```python
# Best for very large datasets
nlist = 4096
m = 8
nbits = 8
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)
# Train
index.train(vectors)
index.add(vectors)
# Search
index.nprobe = 32
distances, indices = index.search(query, k)
```
**Use when:**
- Dataset > 10M vectors
- Need fast search + low memory
- Can accept 90-95% accuracy
## GPU indices
### Single GPU
```python
import faiss
# Create CPU index
index_cpu = faiss.IndexFlatL2(d)
# Move to GPU
res = faiss.StandardGpuResources() # GPU resources
index_gpu = faiss.index_cpu_to_gpu(res, 0, index_cpu) # GPU 0
# Use normally
index_gpu.add(vectors)
distances, indices = index_gpu.search(query, k)
```
### Multi-GPU
```python
# Use all available GPUs
index_gpu = faiss.index_cpu_to_all_gpus(index_cpu)
# Or specific GPUs
gpus = [0, 1, 2, 3] # Use GPUs 0-3
index_gpu = faiss.index_cpu_to_gpus_list(index_cpu, gpus)
```
**Speedup:**
- Single GPU: 10-50× faster than CPU
- Multi-GPU: Near-linear scaling
## Index factory
```python
# Easy index creation with string descriptors
index = faiss.index_factory(d, "IVF100,Flat")
index = faiss.index_factory(d, "HNSW32")
index = faiss.index_factory(d, "IVF4096,PQ8")
# Train and use
index.train(vectors)
index.add(vectors)
```
**Common descriptors:**
- `"Flat"`: Exact search
- `"IVF100,Flat"`: IVF with 100 clusters
- `"HNSW32"`: HNSW with M=32
- `"IVF4096,PQ8"`: IVF + PQ compression
## Performance comparison
### Search speed (1M vectors, k=10)
| Index | Build Time | Search Time | Memory | Recall |
|-------|------------|-------------|--------|--------|
| Flat | 0s | 50ms | 512 MB | 100% |
| IVF100 | 5s | 2ms | 512 MB | 95% |
| HNSW32 | 60s | 1ms | 1 GB | 99% |
| IVF4096+PQ8 | 30s | 3ms | 32 MB | 90% |
*CPU (16 cores), 128-dim vectors*
## Best practices
1. **Start with Flat** - Baseline for comparison
2. **Use IVF for medium datasets** - Good balance
3. **Use HNSW for best quality** - If memory allows
4. **Add PQ for memory savings** - Large datasets
5. **GPU for > 100K vectors** - 10-50× speedup
6. **Tune nprobe/efSearch** - Trade-off speed/accuracy
7. **Train on representative data** - Better clustering
8. **Save trained indices** - Avoid retraining
## Resources
- **Wiki**: https://github.com/facebookresearch/faiss/wiki
- **Paper**: https://arxiv.org/abs/1702.08734

@@ -0,0 +1,361 @@
---
name: pinecone
description: Managed vector database for production AI applications. Fully managed, auto-scaling, with hybrid search (dense + sparse), metadata filtering, and namespaces. Low latency (<100ms p95). Use for production RAG, recommendation systems, or semantic search at scale. Best for serverless, managed infrastructure.
version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [pinecone-client]
metadata:
  hermes:
    tags: [RAG, Pinecone, Vector Database, Managed Service, Serverless, Hybrid Search, Production, Auto-Scaling, Low Latency, Recommendations]
---
# Pinecone - Managed Vector Database
The vector database for production AI applications.
## When to use Pinecone
**Use when:**
- Need managed, serverless vector database
- Production RAG applications
- Auto-scaling required
- Low latency critical (<100ms)
- Don't want to manage infrastructure
- Need hybrid search (dense + sparse vectors)
**Metrics**:
- Fully managed SaaS
- Auto-scales to billions of vectors
- **p95 latency <100ms**
- 99.9% uptime SLA
**Use alternatives instead**:
- **Chroma**: Self-hosted, open-source
- **FAISS**: Offline, pure similarity search
- **Weaviate**: Self-hosted with more features
## Quick start
### Installation
```bash
pip install pinecone  # package formerly published as pinecone-client
```
### Basic usage
```python
from pinecone import Pinecone, ServerlessSpec
# Initialize
pc = Pinecone(api_key="your-api-key")
# Create index
pc.create_index(
name="my-index",
dimension=1536, # Must match embedding dimension
metric="cosine", # or "euclidean", "dotproduct"
spec=ServerlessSpec(cloud="aws", region="us-east-1")
)
# Connect to index
index = pc.Index("my-index")
# Upsert vectors
index.upsert(vectors=[
{"id": "vec1", "values": [0.1, 0.2, ...], "metadata": {"category": "A"}},
{"id": "vec2", "values": [0.3, 0.4, ...], "metadata": {"category": "B"}}
])
# Query
results = index.query(
vector=[0.1, 0.2, ...],
top_k=5,
include_metadata=True
)
print(results["matches"])
```
## Core operations
### Create index
```python
# Serverless (recommended)
pc.create_index(
name="my-index",
dimension=1536,
metric="cosine",
spec=ServerlessSpec(
cloud="aws", # or "gcp", "azure"
region="us-east-1"
)
)
# Pod-based (for consistent performance)
from pinecone import PodSpec
pc.create_index(
name="my-index",
dimension=1536,
metric="cosine",
spec=PodSpec(
environment="us-east1-gcp",
pod_type="p1.x1"
)
)
```
### Upsert vectors
```python
# Single upsert
index.upsert(vectors=[
{
"id": "doc1",
"values": [0.1, 0.2, ...], # 1536 dimensions
"metadata": {
"text": "Document content",
"category": "tutorial",
"timestamp": "2025-01-01"
}
}
])
# Batch upsert (recommended)
vectors = [
{"id": f"vec{i}", "values": embedding, "metadata": metadata}
for i, (embedding, metadata) in enumerate(zip(embeddings, metadatas))
]
index.upsert(vectors=vectors, batch_size=100)
```
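When the vectors come from a generator or are too many to hold in one request, the batching can also be done explicitly on the client. The following is a minimal sketch of a chunking helper built on `itertools.islice`. The `chunks` name and the dummy two-dimensional vectors are illustrative.

```python
import itertools


def chunks(iterable, batch_size=100):
    """Yield tuples of up to `batch_size` items from any iterable."""
    it = iter(iterable)
    chunk = tuple(itertools.islice(it, batch_size))
    while chunk:
        yield chunk
        chunk = tuple(itertools.islice(it, batch_size))


# Dummy records standing in for real embeddings
vectors = ({"id": f"vec{i}", "values": [0.0, 0.0]} for i in range(250))

sizes = [len(batch) for batch in chunks(vectors, batch_size=100)]
print(sizes)  # [100, 100, 50]
```

Each yielded batch could then be sent with its own call such as `index.upsert(vectors=list(batch))`.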
### Query vectors
```python
# Basic query
results = index.query(
vector=[0.1, 0.2, ...],
top_k=10,
include_metadata=True,
include_values=False
)
# With metadata filtering
results = index.query(
vector=[0.1, 0.2, ...],
top_k=5,
filter={"category": {"$eq": "tutorial"}}
)
# Namespace query
results = index.query(
vector=[0.1, 0.2, ...],
top_k=5,
namespace="production"
)
# Access results
for match in results["matches"]:
    print(f"ID: {match['id']}")
    print(f"Score: {match['score']}")
    print(f"Metadata: {match['metadata']}")
```
### Metadata filtering
```python
# Exact match
filter = {"category": "tutorial"}
# Comparison
filter = {"price": {"$gte": 100}} # $gt, $gte, $lt, $lte, $ne
# Logical operators
filter = {
"$and": [
{"category": "tutorial"},
{"difficulty": {"$lte": 3}}
]
} # Also: $or
# In operator
filter = {"tags": {"$in": ["python", "ml"]}}
```
## Namespaces
```python
# Partition data by namespace
index.upsert(
vectors=[{"id": "vec1", "values": [...]}],
namespace="user-123"
)
# Query specific namespace
results = index.query(
vector=[...],
namespace="user-123",
top_k=5
)
# List namespaces
stats = index.describe_index_stats()
print(stats['namespaces'])
```
## Hybrid search (dense + sparse)
```python
# Upsert with sparse vectors
index.upsert(vectors=[
{
"id": "doc1",
"values": [0.1, 0.2, ...], # Dense vector
"sparse_values": {
"indices": [10, 45, 123], # Token IDs
"values": [0.5, 0.3, 0.8] # TF-IDF scores
},
"metadata": {"text": "..."}
}
])
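# Hybrid query (sparse-dense queries require a "dotproduct" index).
# query() takes no alpha argument; to weight the two signals, scale the
# dense values by alpha and the sparse values by (1 - alpha) before querying.
results = index.query(
    vector=[0.1, 0.2, ...],
    sparse_vector={
        "indices": [10, 45],
        "values": [0.5, 0.3]
    },
    top_k=5
)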
```
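The weighting between dense and sparse signals is commonly applied on the client by scaling both query vectors before they are sent. Because dot-product scores are linear in the query, this is equivalent to a convex combination of the two scores. The sketch below shows one such helper. The `hybrid_scale` name and the sample values are illustrative, not part of the Pinecone client.

```python
def hybrid_scale(dense, sparse, alpha):
    """Weight a dense/sparse query pair: alpha=1 is dense only, alpha=0 is sparse only."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = {
        "indices": list(sparse["indices"]),
        "values": [v * (1.0 - alpha) for v in sparse["values"]],
    }
    return scaled_dense, scaled_sparse


dense_q, sparse_q = hybrid_scale(
    dense=[0.2, 0.4],
    sparse={"indices": [10, 45], "values": [0.5, 0.25]},
    alpha=0.5,
)
print(dense_q)   # [0.1, 0.2]
print(sparse_q)  # {'indices': [10, 45], 'values': [0.25, 0.125]}
```

The two return values would be passed as `vector` and `sparse_vector` in the query call.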
## LangChain integration
```python
from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings
# Create vector store
vectorstore = PineconeVectorStore.from_documents(
documents=docs,
embedding=OpenAIEmbeddings(),
index_name="my-index"
)
# Query
results = vectorstore.similarity_search("query", k=5)
# With metadata filter
results = vectorstore.similarity_search(
"query",
k=5,
filter={"category": "tutorial"}
)
# As retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 10})
```
## LlamaIndex integration
```python
from pinecone import Pinecone
from llama_index.vector_stores.pinecone import PineconeVectorStore
# Connect to Pinecone
pc = Pinecone(api_key="your-key")
pinecone_index = pc.Index("my-index")
# Create vector store
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
# Use in LlamaIndex
from llama_index.core import StorageContext, VectorStoreIndex
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```
## Index management
```python
# List indices
indexes = pc.list_indexes()
# Describe index
index_info = pc.describe_index("my-index")
print(index_info)
# Get index stats
stats = index.describe_index_stats()
print(f"Total vectors: {stats['total_vector_count']}")
print(f"Namespaces: {stats['namespaces']}")
# Delete index
pc.delete_index("my-index")
```
## Delete vectors
```python
# Delete by ID
index.delete(ids=["vec1", "vec2"])
# Delete by filter
index.delete(filter={"category": "old"})
# Delete all in namespace
index.delete(delete_all=True, namespace="test")
# Delete all vectors in the default namespace
# (to remove the index itself, use pc.delete_index("my-index"))
index.delete(delete_all=True)
```
## Best practices
1. **Use serverless** - Auto-scaling, cost-effective
2. **Batch upserts** - More efficient (100-200 per batch)
3. **Add metadata** - Enable filtering
4. **Use namespaces** - Isolate data by user/tenant
5. **Monitor usage** - Check Pinecone dashboard
6. **Optimize filters** - Index frequently filtered fields
7. **Test with free tier** - 1 index, 100K vectors free
8. **Use hybrid search** - Better quality
9. **Set appropriate dimensions** - Match embedding model
10. **Regular backups** - Export important data
## Performance
| Operation | Latency | Notes |
|-----------|---------|-------|
| Upsert | ~50-100ms | Per batch |
| Query (p50) | ~50ms | Depends on index size |
| Query (p95) | ~100ms | SLA target |
| Metadata filter | ~+10-20ms | Additional overhead |
## Pricing (as of 2025)
**Serverless**:
- $0.096 per million read units
- $0.06 per million write units
- $0.06 per GB storage/month
**Free tier**:
- 1 serverless index
- 100K vectors (1536 dimensions)
- Great for prototyping
## Resources
- **Website**: https://www.pinecone.io
- **Docs**: https://docs.pinecone.io
- **Console**: https://app.pinecone.io
- **Pricing**: https://www.pinecone.io/pricing

@@ -0,0 +1,181 @@
# Pinecone Deployment Guide
Production deployment patterns for Pinecone.
## Serverless vs Pod-based
### Serverless (Recommended)
```python
from pinecone import Pinecone, ServerlessSpec
pc = Pinecone(api_key="your-key")
# Create serverless index
pc.create_index(
name="my-index",
dimension=1536,
metric="cosine",
spec=ServerlessSpec(
cloud="aws", # or "gcp", "azure"
region="us-east-1"
)
)
```
**Benefits:**
- Auto-scaling
- Pay per usage
- No infrastructure management
- Cost-effective for variable load
**Use when:**
- Variable traffic
- Cost optimization important
- Don't need consistent latency
### Pod-based
```python
from pinecone import PodSpec
pc.create_index(
name="my-index",
dimension=1536,
metric="cosine",
spec=PodSpec(
environment="us-east1-gcp",
pod_type="p1.x1", # or p1.x2, p1.x4, p1.x8
pods=2, # Number of pods
replicas=2 # High availability
)
)
```
**Benefits:**
- Consistent performance
- Predictable latency
- Higher throughput
- Dedicated resources
**Use when:**
- Production workloads
- Need consistent p95 latency
- High throughput required
## Hybrid search
### Dense + Sparse vectors
```python
# Upsert with both dense and sparse vectors
index.upsert(vectors=[
{
"id": "doc1",
"values": [0.1, 0.2, ...], # Dense (semantic)
"sparse_values": {
"indices": [10, 45, 123], # Token IDs
"values": [0.5, 0.3, 0.8] # TF-IDF/BM25 scores
},
"metadata": {"text": "..."}
}
])
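# Hybrid query (sparse-dense queries require a "dotproduct" index).
# query() takes no alpha argument; to weight the two signals, scale the
# dense values by alpha and the sparse values by (1 - alpha) before querying.
results = index.query(
    vector=[0.1, 0.2, ...], # Dense query
    sparse_vector={
        "indices": [10, 45],
        "values": [0.5, 0.3]
    },
    top_k=10
)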
```
**Benefits:**
- Best of both worlds
- Semantic + keyword matching
- Better recall than either alone
## Namespaces for multi-tenancy
```python
# Separate data by user/tenant
index.upsert(
vectors=[{"id": "doc1", "values": [...]}],
namespace="user-123"
)
# Query specific namespace
results = index.query(
vector=[...],
namespace="user-123",
top_k=5
)
# List namespaces
stats = index.describe_index_stats()
print(stats['namespaces'])
```
**Use cases:**
- Multi-tenant SaaS
- User-specific data isolation
- A/B testing (prod/staging namespaces)
## Metadata filtering
### Exact match
```python
results = index.query(
vector=[...],
filter={"category": "tutorial"},
top_k=5
)
```
### Range queries
```python
results = index.query(
vector=[...],
filter={"price": {"$gte": 100, "$lte": 500}},
top_k=5
)
```
### Complex filters
```python
results = index.query(
vector=[...],
filter={
"$and": [
{"category": {"$in": ["tutorial", "guide"]}},
{"difficulty": {"$lte": 3}},
{"published": {"$gte": 1704067200}}  # range operators compare numbers; 2024-01-01 UTC as a Unix timestamp
]
},
top_k=5
)
```
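Range operators such as `$gte` compare numeric values, so dates are typically stored in metadata as numbers, for example Unix timestamps. This is a minimal sketch of converting an ISO date into such a value for use in a filter. The `to_unix_ts` helper and the choice of midnight UTC are assumptions for illustration.

```python
from datetime import datetime, timezone


def to_unix_ts(date_str: str) -> int:
    """Convert an ISO date (YYYY-MM-DD) to a Unix timestamp at midnight UTC."""
    dt = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


published_filter = {"published": {"$gte": to_unix_ts("2024-01-01")}}
print(published_filter)  # {'published': {'$gte': 1704067200}}
```

The same conversion has to be applied when the metadata is written, so that stored values and filter values are on the same scale.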
## Best practices
1. **Use serverless for development** - Cost-effective
2. **Switch to pods for production** - Consistent performance
3. **Implement namespaces** - Multi-tenancy
4. **Add metadata strategically** - Enable filtering
5. **Use hybrid search** - Better quality
6. **Batch upserts** - 100-200 vectors per batch
7. **Monitor usage** - Check Pinecone dashboard
8. **Set up alerts** - Usage/cost thresholds
9. **Regular backups** - Export important data
10. **Test filters** - Verify performance
## Resources
- **Docs**: https://docs.pinecone.io
- **Console**: https://app.pinecone.io

@@ -0,0 +1,496 @@
---
name: qdrant-vector-search
description: High-performance vector similarity search engine for RAG and semantic search. Use when building production RAG systems requiring fast nearest neighbor search, hybrid search with filtering, or scalable vector storage with Rust-powered performance.
version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [qdrant-client>=1.12.0]
metadata:
hermes:
tags: [RAG, Vector Search, Qdrant, Semantic Search, Embeddings, Similarity Search, HNSW, Production, Distributed]
---
# Qdrant - Vector Similarity Search Engine
High-performance vector database written in Rust for production RAG and semantic search.
## When to use Qdrant
**Use Qdrant when:**
- Building production RAG systems requiring low latency
- Need hybrid search (vectors + metadata filtering)
- Require horizontal scaling with sharding/replication
- Want on-premise deployment with full data control
- Need multi-vector storage per record (dense + sparse)
- Building real-time recommendation systems
**Key features:**
- **Rust-powered**: Memory-safe, high performance
- **Rich filtering**: Filter by any payload field during search
- **Multiple vectors**: Dense, sparse, multi-dense per point
- **Quantization**: Scalar, product, binary for memory efficiency
- **Distributed**: Raft consensus, sharding, replication
- **REST + gRPC**: Both APIs with full feature parity
**Use alternatives instead:**
- **Chroma**: Simpler setup, embedded use cases
- **FAISS**: Maximum raw speed, research/batch processing
- **Pinecone**: Fully managed, zero ops preferred
- **Weaviate**: GraphQL preference, built-in vectorizers
## Quick start
### Installation
```bash
# Python client
pip install qdrant-client
# Docker (recommended for development)
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
# Docker with persistent storage
docker run -p 6333:6333 -p 6334:6334 \
-v $(pwd)/qdrant_storage:/qdrant/storage \
qdrant/qdrant
```
### Basic usage
```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
# Connect to Qdrant
client = QdrantClient(host="localhost", port=6333)
# Create collection
client.create_collection(
collection_name="documents",
vectors_config=VectorParams(size=384, distance=Distance.COSINE)
)
# Insert vectors with payload
client.upsert(
collection_name="documents",
points=[
PointStruct(
id=1,
vector=[0.1, 0.2, ...], # 384-dim vector
payload={"title": "Doc 1", "category": "tech"}
),
PointStruct(
id=2,
vector=[0.3, 0.4, ...],
payload={"title": "Doc 2", "category": "science"}
)
]
)
# Search with filtering
results = client.search(
collection_name="documents",
query_vector=[0.15, 0.25, ...],
query_filter={
"must": [{"key": "category", "match": {"value": "tech"}}]
},
limit=10
)
for point in results:
    print(f"ID: {point.id}, Score: {point.score}, Payload: {point.payload}")
```
## Core concepts
### Points - Basic data unit
```python
from qdrant_client.models import PointStruct
# Point = ID + Vector(s) + Payload
point = PointStruct(
id=123, # Integer or UUID string
vector=[0.1, 0.2, 0.3, ...], # Dense vector
payload={ # Arbitrary JSON metadata
"title": "Document title",
"category": "tech",
"timestamp": 1699900000,
"tags": ["python", "ml"]
}
)
# Batch upsert (recommended)
client.upsert(
collection_name="documents",
points=[point1, point2, point3],
    wait=True  # Block until the write is applied (indexing may continue in the background)
)
```
### Collections - Vector containers
```python
from qdrant_client.models import VectorParams, Distance, HnswConfigDiff
# Create with HNSW configuration
client.create_collection(
collection_name="documents",
vectors_config=VectorParams(
size=384, # Vector dimensions
distance=Distance.COSINE # COSINE, EUCLID, DOT, MANHATTAN
),
hnsw_config=HnswConfigDiff(
m=16, # Connections per node (default 16)
ef_construct=100, # Build-time accuracy (default 100)
full_scan_threshold=10000 # Switch to brute force below this
),
on_disk_payload=True # Store payload on disk
)
# Collection info
info = client.get_collection("documents")
print(f"Points: {info.points_count}, Vectors: {info.vectors_count}")
```
### Distance metrics
| Metric | Use Case | Score range |
|--------|----------|-------------|
| `COSINE` | Text embeddings, normalized vectors | -1 to 1 (similarity; higher is closer) |
| `EUCLID` | Spatial data, image features | 0 to ∞ (distance; lower is closer) |
| `DOT` | Recommendations, unnormalized | -∞ to ∞ (higher is closer) |
| `MANHATTAN` | Sparse features, discrete data | 0 to ∞ (distance; lower is closer) |
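
To make the four metrics concrete, they can be computed by hand for a pair of small vectors. This is a plain-Python sketch of the textbook definitions, not Qdrant's implementation; the function names are chosen for illustration.

```python
import math


def dot(a, b):
    """Dot product: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product of the two vectors after length normalization."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean(a, b):
    """Straight-line distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences: smaller means more similar."""
    return sum(abs(x - y) for x, y in zip(a, b))


a = [1.0, 0.0]
b = [0.0, 1.0]
for name, fn in [("cosine", cosine_similarity), ("dot", dot),
                 ("euclid", euclidean), ("manhattan", manhattan)]:
    print(name, fn(a, b))
```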
## Search operations
### Basic search
```python
# Simple nearest neighbor search
results = client.search(
collection_name="documents",
query_vector=[0.1, 0.2, ...],
limit=10,
with_payload=True,
with_vectors=False # Don't return vectors (faster)
)
```
### Filtered search
```python
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range
# Complex filtering
results = client.search(
collection_name="documents",
query_vector=query_embedding,
query_filter=Filter(
must=[
FieldCondition(key="category", match=MatchValue(value="tech")),
FieldCondition(key="timestamp", range=Range(gte=1699000000))
],
must_not=[
FieldCondition(key="status", match=MatchValue(value="archived"))
]
),
limit=10
)
# Shorthand filter syntax
results = client.search(
collection_name="documents",
query_vector=query_embedding,
query_filter={
"must": [
{"key": "category", "match": {"value": "tech"}},
{"key": "price", "range": {"gte": 10, "lte": 100}}
]
},
limit=10
)
```
### Batch search
```python
from qdrant_client.models import SearchRequest
# Multiple queries in one request
results = client.search_batch(
collection_name="documents",
requests=[
SearchRequest(vector=[0.1, ...], limit=5),
SearchRequest(vector=[0.2, ...], limit=5, filter={"must": [...]}),
SearchRequest(vector=[0.3, ...], limit=10)
]
)
```
## RAG integration
### With sentence-transformers
```python
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, PointStruct
# Initialize
encoder = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient(host="localhost", port=6333)
# Create collection
client.create_collection(
collection_name="knowledge_base",
vectors_config=VectorParams(size=384, distance=Distance.COSINE)
)
# Index documents
documents = [
{"id": 1, "text": "Python is a programming language", "source": "wiki"},
{"id": 2, "text": "Machine learning uses algorithms", "source": "textbook"},
]
points = [
PointStruct(
id=doc["id"],
vector=encoder.encode(doc["text"]).tolist(),
payload={"text": doc["text"], "source": doc["source"]}
)
for doc in documents
]
client.upsert(collection_name="knowledge_base", points=points)
# RAG retrieval
def retrieve(query: str, top_k: int = 5) -> list[dict]:
    query_vector = encoder.encode(query).tolist()
    results = client.search(
        collection_name="knowledge_base",
        query_vector=query_vector,
        limit=top_k
    )
    return [{"text": r.payload["text"], "score": r.score} for r in results]
# Use in RAG pipeline
context = retrieve("What is Python?")
prompt = f"Context: {context}\n\nQuestion: What is Python?"
```
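
Retrieved passages usually read better in a prompt when they are formatted as numbered text rather than interpolated as a raw list of dicts. A minimal sketch, assuming results shaped like the output of `retrieve` above; `build_prompt` is a hypothetical helper and the scores are made-up placeholders.

```python
def build_prompt(question, contexts):
    """Format retrieved passages as a numbered context block."""
    lines = [f"[{i}] {c['text']}" for i, c in enumerate(contexts, start=1)]
    context_block = "\n".join(lines)
    return f"Context:\n{context_block}\n\nQuestion: {question}"


# Placeholder results in the same shape that retrieve() returns.
retrieved = [
    {"text": "Python is a programming language", "score": 0.82},
    {"text": "Machine learning uses algorithms", "score": 0.31},
]
prompt = build_prompt("What is Python?", retrieved)
print(prompt)
```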
### With LangChain
```python
from langchain_community.vectorstores import Qdrant
from langchain_community.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = Qdrant.from_documents(documents, embeddings, url="http://localhost:6333", collection_name="docs")
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
```
### With LlamaIndex
```python
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import VectorStoreIndex, StorageContext
vector_store = QdrantVectorStore(client=client, collection_name="llama_docs")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
query_engine = index.as_query_engine()
```
## Multi-vector support
### Named vectors (different embedding models)
```python
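from qdrant_client.models import VectorParams, Distance, PointStruct
# Collection with multiple named vectors per point
client.create_collection(
    collection_name="hybrid_search",
    vectors_config={
        "dense": VectorParams(size=384, distance=Distance.COSINE),
        # Note: despite its name, this is a dense 30000-dim vector.
        # For true sparse vectors, use sparse_vectors_config (next section).
        "sparse": VectorParams(size=30000, distance=Distance.DOT)
    }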
)
# Insert with named vectors
client.upsert(
collection_name="hybrid_search",
points=[
PointStruct(
id=1,
vector={
"dense": dense_embedding,
"sparse": sparse_embedding
},
payload={"text": "document text"}
)
]
)
# Search specific vector
results = client.search(
collection_name="hybrid_search",
query_vector=("dense", query_dense), # Specify which vector
limit=10
)
```
### Sparse vectors (BM25, SPLADE)
```python
from qdrant_client.models import SparseVectorParams, SparseIndexParams, SparseVector
# Collection with sparse vectors
client.create_collection(
collection_name="sparse_search",
vectors_config={},
sparse_vectors_config={"text": SparseVectorParams(index=SparseIndexParams(on_disk=False))}
)
# Insert sparse vector
client.upsert(
collection_name="sparse_search",
points=[PointStruct(id=1, vector={"text": SparseVector(indices=[1, 5, 100], values=[0.5, 0.8, 0.2])}, payload={"text": "document"})]
)
```
## Quantization (memory optimization)
```python
from qdrant_client.models import ScalarQuantization, ScalarQuantizationConfig, ScalarType
# Scalar quantization (4x memory reduction)
client.create_collection(
collection_name="quantized",
vectors_config=VectorParams(size=384, distance=Distance.COSINE),
quantization_config=ScalarQuantization(
scalar=ScalarQuantizationConfig(
type=ScalarType.INT8,
quantile=0.99, # Clip outliers
always_ram=True # Keep quantized in RAM
)
)
)
# Search with rescoring
results = client.search(
collection_name="quantized",
query_vector=query,
search_params={"quantization": {"rescore": True}}, # Rescore top results
limit=10
)
```
## Payload indexing
```python
from qdrant_client.models import PayloadSchemaType
# Create payload index for faster filtering
client.create_payload_index(
collection_name="documents",
field_name="category",
field_schema=PayloadSchemaType.KEYWORD
)
client.create_payload_index(
collection_name="documents",
field_name="timestamp",
field_schema=PayloadSchemaType.INTEGER
)
# Index types: KEYWORD, INTEGER, FLOAT, GEO, TEXT (full-text), BOOL
```
## Production deployment
### Qdrant Cloud
```python
from qdrant_client import QdrantClient
# Connect to Qdrant Cloud
client = QdrantClient(
url="https://your-cluster.cloud.qdrant.io",
api_key="your-api-key"
)
```
### Performance tuning
```python
# Optimize for recall (denser HNSW graph; slower builds and more memory)
client.update_collection(
collection_name="documents",
hnsw_config=HnswConfigDiff(ef_construct=200, m=32)
)
# Optimize for indexing speed (bulk loads)
client.update_collection(
collection_name="documents",
optimizer_config={"indexing_threshold": 20000}
)
```
## Best practices
1. **Batch operations** - Use batch upsert/search for efficiency
2. **Payload indexing** - Index fields used in filters
3. **Quantization** - Enable for large collections (>1M vectors)
4. **Sharding** - Use for collections >10M vectors
5. **On-disk storage** - Enable `on_disk_payload` for large payloads
6. **Connection pooling** - Reuse client instances
## Common issues
**Slow search with filters:**
```python
# Create payload index for filtered fields
client.create_payload_index(
collection_name="docs",
field_name="category",
field_schema=PayloadSchemaType.KEYWORD
)
```
**Out of memory:**
```python
# Enable quantization and on-disk storage
client.create_collection(
collection_name="large_collection",
vectors_config=VectorParams(size=384, distance=Distance.COSINE),
quantization_config=ScalarQuantization(...),
on_disk_payload=True
)
```
**Connection issues:**
```python
# Use timeout and retry
client = QdrantClient(
host="localhost",
port=6333,
timeout=30,
prefer_grpc=True # gRPC for better performance
)
```
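
The snippet above sets a timeout but does not show the retry. One way to add it is a small wrapper with exponential backoff. This is a sketch under assumptions: `with_retry` is a hypothetical helper, and the exception types are placeholders, since the client may raise its own exception classes depending on the transport.

```python
import time


def with_retry(operation, attempts=3, base_delay=0.5,
               retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Stand-in for a client call that fails twice before succeeding.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("server not ready")
    return "ok"


result = with_retry(flaky, attempts=5, sleep=lambda seconds: None)
print(result, calls["count"])
```

In practice the wrapped operation would be something like `lambda: client.get_collections()`.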
## References
- **[Advanced Usage](references/advanced-usage.md)** - Distributed mode, hybrid search, recommendations
- **[Troubleshooting](references/troubleshooting.md)** - Common issues, debugging, performance tuning
## Resources
- **GitHub**: https://github.com/qdrant/qdrant (22k+ stars)
- **Docs**: https://qdrant.tech/documentation/
- **Python Client**: https://github.com/qdrant/qdrant-client
- **Cloud**: https://cloud.qdrant.io
- **Version**: 1.12.0+
- **License**: Apache 2.0

@@ -0,0 +1,648 @@
# Qdrant Advanced Usage Guide
## Distributed Deployment
### Cluster Setup
Qdrant uses Raft consensus for distributed coordination.
```yaml
# docker-compose.yml for 3-node cluster
version: '3.8'
services:
  qdrant-node-1:
    image: qdrant/qdrant:latest
    # The first node announces its own peer URI
    command: ./qdrant --uri http://qdrant-node-1:6335
    ports:
      - "6333:6333"
      - "6334:6334"
      - "6335:6335"
    volumes:
      - ./node1_storage:/qdrant/storage
    environment:
      - QDRANT__CLUSTER__ENABLED=true
      - QDRANT__CLUSTER__P2P__PORT=6335
      - QDRANT__SERVICE__HTTP_PORT=6333
      - QDRANT__SERVICE__GRPC_PORT=6334
  qdrant-node-2:
    image: qdrant/qdrant:latest
    # The bootstrap peer is passed as a CLI flag, not an environment variable
    command: ./qdrant --bootstrap http://qdrant-node-1:6335
    ports:
      - "6343:6333"
      - "6344:6334"
      - "6345:6335"
    volumes:
      - ./node2_storage:/qdrant/storage
    environment:
      - QDRANT__CLUSTER__ENABLED=true
      - QDRANT__CLUSTER__P2P__PORT=6335
    depends_on:
      - qdrant-node-1
  qdrant-node-3:
    image: qdrant/qdrant:latest
    command: ./qdrant --bootstrap http://qdrant-node-1:6335
    ports:
      - "6353:6333"
      - "6354:6334"
      - "6355:6335"
    volumes:
      - ./node3_storage:/qdrant/storage
    environment:
      - QDRANT__CLUSTER__ENABLED=true
      - QDRANT__CLUSTER__P2P__PORT=6335
    depends_on:
      - qdrant-node-1
```
### Sharding Configuration
```python
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, ShardingMethod
client = QdrantClient(host="localhost", port=6333)
# Create sharded collection
client.create_collection(
collection_name="large_collection",
vectors_config=VectorParams(size=384, distance=Distance.COSINE),
shard_number=6, # Number of shards
replication_factor=2, # Replicas per shard
write_consistency_factor=1 # Required acks for write
)
# Check cluster status
cluster_info = client.get_cluster_info()
print(f"Peers: {cluster_info.peers}")
print(f"Raft state: {cluster_info.raft_info}")
```
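
The sharding parameters multiply: each of the `shard_number` shards is stored `replication_factor` times. A back-of-the-envelope sketch for the example above on the 3-node cluster, assuming replicas of one shard sit on distinct nodes; `shard_replica_layout` is a hypothetical helper, not a client API.

```python
def shard_replica_layout(shard_number, replication_factor, node_count):
    """Count shard replicas and the average load per node.

    Assumes replicas of one shard are placed on distinct nodes, so the
    replication factor cannot exceed the node count.
    """
    if replication_factor > node_count:
        raise ValueError("replication_factor cannot exceed node_count")
    total = shard_number * replication_factor
    return {
        "total_shard_replicas": total,
        "average_per_node": total / node_count,
    }


layout = shard_replica_layout(shard_number=6, replication_factor=2, node_count=3)
print(layout)
```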
### Replication and Consistency
```python
from qdrant_client.models import WriteOrdering
# Strong ordering: writes are serialized through the permanent leader
client.upsert(
    collection_name="critical_data",
    points=points,
    ordering=WriteOrdering.STRONG  # Strictest ordering guarantee, slowest
)
# Weak ordering (default, faster)
client.upsert(
    collection_name="logs",
    points=points,
    ordering=WriteOrdering.WEAK  # Operations may be reordered
)
# Read with majority consistency
results = client.search(
    collection_name="documents",
    query_vector=query,
    consistency="majority"  # Return the result agreed on by a majority of replicas
)
```
## Hybrid Search
### Dense + Sparse Vectors
Combine semantic (dense) and keyword (sparse) search:
```python
from qdrant_client.models import (
VectorParams, SparseVectorParams, SparseIndexParams,
Distance, PointStruct, SparseVector, Prefetch, Query
)
# Create hybrid collection
client.create_collection(
collection_name="hybrid",
vectors_config={
"dense": VectorParams(size=384, distance=Distance.COSINE)
},
sparse_vectors_config={
"sparse": SparseVectorParams(
index=SparseIndexParams(on_disk=False)
)
}
)
# Insert with both vector types
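def encode_sparse(text: str) -> SparseVector:
    """Simple term-frequency sparse encoding (a stand-in for BM25)"""
    import zlib
    from collections import Counter
    counts = Counter(text.lower().split())
    # Map tokens to indices (use a real vocabulary in production).
    # crc32 is stable across processes, unlike the salted built-in hash().
    weights: dict[int, float] = {}
    for token, count in counts.items():
        index = zlib.crc32(token.encode("utf-8")) % 30000
        weights[index] = weights.get(index, 0.0) + count  # Merge index collisions
    return SparseVector(indices=list(weights), values=list(weights.values()))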
client.upsert(
collection_name="hybrid",
points=[
PointStruct(
id=1,
vector={
"dense": dense_encoder.encode("Python programming").tolist(),
"sparse": encode_sparse("Python programming language code")
},
payload={"text": "Python programming language code"}
)
]
)
# Hybrid search with Reciprocal Rank Fusion (RRF)
from qdrant_client.models import FusionQuery
results = client.query_points(
collection_name="hybrid",
prefetch=[
Prefetch(query=dense_query, using="dense", limit=20),
Prefetch(query=sparse_query, using="sparse", limit=20)
],
query=FusionQuery(fusion="rrf"), # Combine results
limit=10
)
```
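
Reciprocal Rank Fusion itself is simple enough to sketch in plain Python: every document scores the sum of `1 / (k + rank)` over the result lists it appears in. The version below uses the commonly cited constant `k=60`; that value is an assumption for illustration, and the constant Qdrant applies internally may differ.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists; returns (id, score) pairs, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Placeholder result lists from a dense and a sparse search.
dense_hits = [1, 2, 3]
sparse_hits = [3, 1, 4]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print([doc_id for doc_id, _ in fused])
```

Documents that appear near the top of both lists rise above documents that rank high in only one.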
### Multi-Stage Search
```python
from qdrant_client.models import Prefetch, SearchParams, QuantizationSearchParams
# Two-stage retrieval: coarse then fine
results = client.query_points(
    collection_name="documents",
    prefetch=[
        Prefetch(
            query=query_vector,
            limit=100,  # Broad first stage
            params=SearchParams(
                quantization=QuantizationSearchParams(rescore=False)  # Fast, approximate
            )
        )
    ],
    query=query_vector,  # Re-rank the prefetched candidates
    limit=10,
    search_params=SearchParams(
        quantization=QuantizationSearchParams(rescore=True)  # Accurate reranking
    )
)
```
## Recommendations
### Item-to-Item Recommendations
```python
# Find similar items
recommendations = client.recommend(
collection_name="products",
positive=[1, 2, 3], # IDs user liked
negative=[4], # IDs user disliked
limit=10
)
# With filtering
recommendations = client.recommend(
collection_name="products",
positive=[1, 2],
query_filter={
"must": [
{"key": "category", "match": {"value": "electronics"}},
{"key": "in_stock", "match": {"value": True}}
]
},
limit=10
)
```
### Lookup from Another Collection
```python
from qdrant_client.models import RecommendStrategy, LookupLocation
# Recommend using vectors from another collection
results = client.recommend(
    collection_name="products",
    positive=[123],  # ID of a point stored in "user_history"
    lookup_from=LookupLocation(collection="user_history"),
    strategy=RecommendStrategy.AVERAGE_VECTOR,
    limit=10
)
```
## Advanced Filtering
### Nested Payload Filtering
```python
from qdrant_client.models import Filter, FieldCondition, MatchValue, Nested, NestedCondition
# Filter on nested objects (arrays of objects in the payload)
results = client.search(
    collection_name="documents",
    query_vector=query,
    query_filter=Filter(
        must=[
            NestedCondition(
                nested=Nested(
                    key="metadata",
                    filter=Filter(
                        must=[
                            FieldCondition(
                                key="author.name",
                                match=MatchValue(value="John")
                            )
                        ]
                    )
                )
            )
        ]
    ),
    limit=10
)
```
### Geo Filtering
```python
from qdrant_client.models import Filter, FieldCondition, GeoRadius, GeoPoint
# Find within radius
results = client.search(
collection_name="locations",
query_vector=query,
query_filter=Filter(
must=[
FieldCondition(
key="location",
geo_radius=GeoRadius(
center=GeoPoint(lat=40.7128, lon=-74.0060),
radius=5000 # meters
)
)
]
),
limit=10
)
# Geo bounding box
from qdrant_client.models import GeoBoundingBox
results = client.search(
collection_name="locations",
query_vector=query,
query_filter=Filter(
must=[
FieldCondition(
key="location",
geo_bounding_box=GeoBoundingBox(
top_left=GeoPoint(lat=40.8, lon=-74.1),
bottom_right=GeoPoint(lat=40.6, lon=-73.9)
)
)
]
),
limit=10
)
```
### Full-Text Search
```python
from qdrant_client.models import TextIndexParams, TokenizerType
# Create text index
client.create_payload_index(
collection_name="documents",
field_name="content",
field_schema=TextIndexParams(
type="text",
tokenizer=TokenizerType.WORD,
min_token_len=2,
max_token_len=15,
lowercase=True
)
)
# Full-text filter
from qdrant_client.models import MatchText
results = client.search(
collection_name="documents",
query_vector=query,
query_filter=Filter(
must=[
FieldCondition(
key="content",
match=MatchText(text="machine learning")
)
]
),
limit=10
)
```
## Quantization Strategies
### Scalar Quantization (INT8)
```python
from qdrant_client.models import ScalarQuantization, ScalarQuantizationConfig, ScalarType
# ~4x memory reduction, minimal accuracy loss
client.create_collection(
collection_name="scalar_quantized",
vectors_config=VectorParams(size=384, distance=Distance.COSINE),
quantization_config=ScalarQuantization(
scalar=ScalarQuantizationConfig(
type=ScalarType.INT8,
quantile=0.99, # Clip extreme values
always_ram=True # Keep quantized vectors in RAM
)
)
)
```
### Product Quantization
```python
from qdrant_client.models import ProductQuantization, ProductQuantizationConfig, CompressionRatio
# ~16x memory reduction, some accuracy loss
client.create_collection(
collection_name="product_quantized",
vectors_config=VectorParams(size=384, distance=Distance.COSINE),
quantization_config=ProductQuantization(
product=ProductQuantizationConfig(
compression=CompressionRatio.X16,
always_ram=True
)
)
)
```
### Binary Quantization
```python
from qdrant_client.models import BinaryQuantization, BinaryQuantizationConfig
# ~32x memory reduction, requires oversampling
client.create_collection(
collection_name="binary_quantized",
vectors_config=VectorParams(size=384, distance=Distance.COSINE),
quantization_config=BinaryQuantization(
binary=BinaryQuantizationConfig(always_ram=True)
)
)
# Search with oversampling
results = client.search(
collection_name="binary_quantized",
query_vector=query,
search_params={
"quantization": {
"rescore": True,
"oversampling": 2.0 # Retrieve 2x candidates, rescore
}
},
limit=10
)
```
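
The compression ratios quoted in this section follow from the bits stored per vector component. The arithmetic below is a rough sketch for 384-dimensional vectors; it ignores index overhead, payloads, and the original vectors kept for rescoring, and the helper names are hypothetical.

```python
import math

BITS_PER_COMPONENT = {"float32": 32, "int8 (scalar)": 8, "binary": 1}


def bytes_per_vector(dim, bits):
    """Storage for one vector at the given bits per component."""
    return math.ceil(dim * bits / 8)


def rescoring_candidates(limit, oversampling):
    """Candidates fetched from the quantized index before rescoring."""
    return math.ceil(limit * oversampling)


dim = 384
sizes = {name: bytes_per_vector(dim, bits)
         for name, bits in BITS_PER_COMPONENT.items()}
candidates = rescoring_candidates(limit=10, oversampling=2.0)
print(sizes, candidates)
```

With `limit=10` and `oversampling=2.0`, the search above pulls 20 quantized candidates and rescores them against the original vectors.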
## Snapshots and Backups
### Create Snapshot
```python
# Create collection snapshot
snapshot_info = client.create_snapshot(collection_name="documents")
print(f"Snapshot: {snapshot_info.name}")
# List snapshots
snapshots = client.list_snapshots(collection_name="documents")
for s in snapshots:
    print(f"{s.name}: {s.size} bytes")
# Full storage snapshot
full_snapshot = client.create_full_snapshot()
```
### Restore from Snapshot
```python
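# Download snapshot over the REST API
import requests
snapshot_name = "documents-2024-01-01.snapshot"
response = requests.get(
    f"http://localhost:6333/collections/documents/snapshots/{snapshot_name}",
    timeout=60
)
response.raise_for_status()
with open(f"./backup/{snapshot_name}", "wb") as f:
    f.write(response.content)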
# Restore (via REST API)
import requests
response = requests.put(
"http://localhost:6333/collections/documents/snapshots/recover",
json={"location": "file:///backup/documents-2024-01-01.snapshot"}
)
```
## Collection Aliases
```python
# Create alias
client.update_collection_aliases(
change_aliases_operations=[
{"create_alias": {"alias_name": "production", "collection_name": "documents_v2"}}
]
)
# Blue-green deployment
# 1. Create new collection with updates
client.create_collection(collection_name="documents_v3", ...)
# 2. Populate new collection
client.upsert(collection_name="documents_v3", points=new_points)
# 3. Atomic switch
client.update_collection_aliases(
change_aliases_operations=[
{"delete_alias": {"alias_name": "production"}},
{"create_alias": {"alias_name": "production", "collection_name": "documents_v3"}}
]
)
# Search via alias
results = client.search(collection_name="production", query_vector=query, limit=10)
```
## Scroll and Iteration
### Scroll Through All Points
```python
# Paginated iteration
offset = None
all_points = []
while True:
    results, offset = client.scroll(
        collection_name="documents",
        limit=100,
        offset=offset,
        with_payload=True,
        with_vectors=False
    )
    all_points.extend(results)
    if offset is None:
        break
print(f"Total points: {len(all_points)}")
```
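
Collecting every point in one list can exhaust memory on large collections. The same offset-based loop can be wrapped in a generator so that callers process points as they arrive. In this sketch `iter_all` and `fetch_page` are hypothetical names: `fetch_page(offset, limit)` stands in for a thin wrapper around `client.scroll` that returns `(items, next_offset)`, and `fake_fetch` simulates it over an in-memory list.

```python
def iter_all(fetch_page, page_size=100):
    """Yield items page by page until the next offset is None."""
    offset = None
    while True:
        items, offset = fetch_page(offset, page_size)
        yield from items
        if offset is None:
            return


# Stand-in data source that mimics offset-based paging.
DATA = list(range(250))


def fake_fetch(offset, limit):
    start = offset or 0
    end = start + limit
    next_offset = end if end < len(DATA) else None
    return DATA[start:end], next_offset


streamed = list(iter_all(fake_fetch, page_size=100))
print(len(streamed))
```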
### Filtered Scroll
```python
# Scroll with filter
results, _ = client.scroll(
collection_name="documents",
scroll_filter=Filter(
must=[
FieldCondition(key="status", match=MatchValue(value="active"))
]
),
limit=1000
)
```
## Async Client
```python
import asyncio
from qdrant_client import AsyncQdrantClient
async def main():
    client = AsyncQdrantClient(host="localhost", port=6333)
    # Async operations
    await client.create_collection(
        collection_name="async_docs",
        vectors_config=VectorParams(size=384, distance=Distance.COSINE)
    )
    await client.upsert(
        collection_name="async_docs",
        points=points
    )
    results = await client.search(
        collection_name="async_docs",
        query_vector=query,
        limit=10
    )
    return results
results = asyncio.run(main())
```
## gRPC Client
```python
from qdrant_client import QdrantClient
# Prefer gRPC for better performance
client = QdrantClient(
host="localhost",
port=6333,
grpc_port=6334,
prefer_grpc=True # Use gRPC when available
)
# gRPC-only client
from qdrant_client import QdrantClient
client = QdrantClient(
host="localhost",
grpc_port=6334,
prefer_grpc=True,
https=False
)
```
## Multitenancy
### Payload-Based Isolation
```python
# Single collection, filter by tenant
client.upsert(
collection_name="multi_tenant",
points=[
PointStruct(
id=1,
vector=embedding,
payload={"tenant_id": "tenant_a", "text": "..."}
)
]
)
# Search within tenant
results = client.search(
collection_name="multi_tenant",
query_vector=query,
query_filter=Filter(
must=[FieldCondition(key="tenant_id", match=MatchValue(value="tenant_a"))]
),
limit=10
)
```
### Collection-Per-Tenant
```python
# Create tenant collection
def create_tenant_collection(tenant_id: str):
    client.create_collection(
        collection_name=f"tenant_{tenant_id}",
        vectors_config=VectorParams(size=384, distance=Distance.COSINE)
    )
# Search tenant collection
def search_tenant(tenant_id: str, query_vector: list, limit: int = 10):
    return client.search(
        collection_name=f"tenant_{tenant_id}",
        query_vector=query_vector,
        limit=limit
    )
```
## Performance Monitoring
### Collection Statistics
```python
# Collection info
info = client.get_collection("documents")
print(f"Points: {info.points_count}")
print(f"Indexed vectors: {info.indexed_vectors_count}")
print(f"Segments: {info.segments_count}")
print(f"Status: {info.status}")
# Optimizer status
print(f"Optimizer: {info.optimizer_status}")
```
### Telemetry
```python
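# Get telemetry data from the REST endpoint
import requests
telemetry = requests.get("http://localhost:6333/telemetry", timeout=10).json()["result"]
print(f"Collections: {telemetry.get('collections')}")
print(f"Requests: {telemetry.get('requests')}")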
```

@@ -0,0 +1,631 @@
# Qdrant Troubleshooting Guide
## Installation Issues
### Docker Issues
**Error**: `Cannot connect to Docker daemon`
**Fix**:
```bash
# Start Docker daemon
sudo systemctl start docker
# Or launch Docker Desktop (macOS command shown; on Windows start it from the Start menu)
open -a Docker
```
**Error**: `Port 6333 already in use`
**Fix**:
```bash
# Find process using port
lsof -i :6333
# Kill the process, or map Qdrant to a different host port
# (6334 is avoided here because it is Qdrant's default gRPC port)
docker run -p 7333:6333 qdrant/qdrant
```
### Python Client Issues
**Error**: `ModuleNotFoundError: No module named 'qdrant_client'`
**Fix**:
```bash
pip install qdrant-client
# With a minimum version (quoted so the shell does not treat > as a redirect)
pip install 'qdrant-client>=1.12.0'
```
**Error**: `grpc._channel._InactiveRpcError`
**Fix**:
```bash
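# Upgrade the client together with grpcio
pip install --upgrade qdrant-client grpcio
# Make sure the gRPC port is published:
#   docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
# Or fall back to REST from Python:
#   client = QdrantClient(host="localhost", port=6333, prefer_grpc=False)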
```
## Connection Issues
### Cannot Connect to Server
**Error**: `ConnectionRefusedError: [Errno 111] Connection refused`
**Solutions**:
1. **Check server is running**:
```bash
docker ps | grep qdrant
curl http://localhost:6333/healthz
```
2. **Verify port binding**:
```bash
# Check listening ports
netstat -tlnp | grep 6333
# Docker port mapping
docker port <container_id>
```
3. **Use correct host**:
```python
# Docker on Linux
client = QdrantClient(host="localhost", port=6333)
# Docker on Mac/Windows with networking issues
client = QdrantClient(host="127.0.0.1", port=6333)
# Inside Docker network
client = QdrantClient(host="qdrant", port=6333)
```
### Timeout Errors
**Error**: `TimeoutError: Connection timed out`
**Fix**:
```python
# Increase timeout
client = QdrantClient(
host="localhost",
port=6333,
timeout=60 # seconds
)
# For large operations
client.upsert(
collection_name="documents",
points=large_batch,
    wait=False  # Return immediately; the write is applied asynchronously
)
```
### SSL/TLS Errors
**Error**: `ssl.SSLCertVerificationError`
**Fix**:
```python
# Qdrant Cloud
client = QdrantClient(
url="https://cluster.cloud.qdrant.io",
api_key="your-api-key"
)
# Self-signed certificate
client = QdrantClient(
host="localhost",
port=6333,
https=True,
verify=False # Disable verification (not recommended for production)
)
```
## Collection Issues
### Collection Already Exists
**Error**: `ValueError: Collection 'documents' already exists`
**Fix**:
```python
# Check before creating
if not client.collection_exists("documents"):
    client.create_collection(...)
# Or recreate (this deletes all existing points)
client.recreate_collection(
collection_name="documents",
vectors_config=VectorParams(size=384, distance=Distance.COSINE)
)
```
### Collection Not Found
**Error**: `NotFoundException: Collection 'docs' not found`
**Fix**:
```python
# List available collections
collections = client.get_collections()
print([c.name for c in collections.collections])
# Check exact name (case-sensitive)
try:
    info = client.get_collection("documents")
except Exception as e:
    print(f"Collection not found: {e}")
```
### Vector Dimension Mismatch
**Error**: `ValueError: Vector dimension mismatch. Expected 384, got 768`
**Fix**:
```python
# Check collection config
info = client.get_collection("documents")
print(f"Expected dimension: {info.config.params.vectors.size}")
# Recreate with the correct dimension (this deletes all existing points)
client.recreate_collection(
collection_name="documents",
vectors_config=VectorParams(size=768, distance=Distance.COSINE) # Match your embeddings
)
```
## Search Issues
### Empty Search Results
**Problem**: Search returns empty results.
**Solutions**:
1. **Verify data exists**:
```python
info = client.get_collection("documents")
print(f"Points: {info.points_count}")
# Scroll to check data
points, _ = client.scroll(
collection_name="documents",
limit=10,
with_payload=True
)
print(points)
```
2. **Check vector format**:
```python
# Must be list of floats
query_vector = embedding.tolist() # Convert numpy to list
# Check dimensions
print(f"Query dimension: {len(query_vector)}")
```
3. **Verify filter conditions**:
```python
# Test without filter first
results = client.search(
collection_name="documents",
query_vector=query,
limit=10
# No filter
)
# Then add filter incrementally
```
### Slow Search Performance
**Problem**: Search takes too long.
**Solutions**:
1. **Create payload indexes**:
```python
# Index fields used in filters
client.create_payload_index(
collection_name="documents",
field_name="category",
field_schema="keyword"
)
```
2. **Enable quantization**:
```python
client.update_collection(
collection_name="documents",
quantization_config=ScalarQuantization(
scalar=ScalarQuantizationConfig(type=ScalarType.INT8)
)
)
```
3. **Tune HNSW parameters**:
```python
# Faster search (less accurate)
client.update_collection(
collection_name="documents",
hnsw_config=HnswConfigDiff(ef_construct=64, m=8)
)
# Use ef search parameter
results = client.search(
collection_name="documents",
query_vector=query,
search_params={"hnsw_ef": 64}, # Lower = faster
limit=10
)
```
4. **Use gRPC**:
```python
client = QdrantClient(
host="localhost",
port=6333,
grpc_port=6334,
prefer_grpc=True
)
```
### Inconsistent Results
**Problem**: Same query returns different results.
**Solutions**:
1. **Wait for indexing**:
```python
client.upsert(
collection_name="documents",
points=points,
    wait=True  # Block until the write is applied
)
```
2. **Check replication consistency**:
```python
# Strong consistency read
results = client.search(
collection_name="documents",
query_vector=query,
consistency="all" # Read from all replicas
)
```
## Upsert Issues
### Batch Upsert Fails
**Error**: `PayloadError: Payload too large`
**Fix**:
```python
# Split into smaller batches
def batch_upsert(client, collection, points, batch_size=100):
    for i in range(0, len(points), batch_size):
        batch = points[i:i + batch_size]
        client.upsert(
            collection_name=collection,
            points=batch,
            wait=True
        )
batch_upsert(client, "documents", large_points_list)
```
### Invalid Point ID
**Error**: `ValueError: Invalid point ID`
**Fix**:
```python
# Valid ID types: int or UUID string
from uuid import uuid4
# Integer ID
PointStruct(id=123, vector=vec, payload={})
# UUID string
PointStruct(id=str(uuid4()), vector=vec, payload={})
# NOT valid
PointStruct(id="custom-string-123", ...) # Use UUID format
```
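
When records already carry string keys, one option is to derive a deterministic UUID from each key so that the same key always maps to the same point ID. A minimal sketch with the standard library; `point_id_from_key` is a hypothetical helper and the choice of `NAMESPACE_URL` is arbitrary.

```python
from uuid import NAMESPACE_URL, UUID, uuid5


def point_id_from_key(key: str) -> str:
    """Map an arbitrary string key to a stable UUID string."""
    return str(uuid5(NAMESPACE_URL, key))


point_id = point_id_from_key("custom-string-123")
# Same key, same ID: re-upserting overwrites the point instead of duplicating it.
same_again = point_id_from_key("custom-string-123")
print(point_id, point_id == same_again)
```

Keeping the original key in the payload (for example under `"external_id"`) makes the mapping easy to reverse.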
### Payload Validation Errors
**Error**: `ValidationError: Invalid payload`
**Fix**:
```python
# Ensure JSON-serializable payload
import json
payload = {
"title": "Document",
"count": 42,
"tags": ["a", "b"],
"nested": {"key": "value"}
}
# Validate before upsert
json.dumps(payload) # Should not raise
# Avoid non-serializable types
# NOT valid: datetime, numpy arrays, custom objects
payload = {
"timestamp": datetime.now().isoformat(), # Convert to string
"vector": embedding.tolist() # Convert numpy to list
}
```
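
The manual conversions above can be folded into one recursive helper that runs before every upsert. This is a sketch using only the standard library; `to_json_safe` is a hypothetical name, and the `tolist` check is an assumption that covers NumPy arrays and scalars without importing NumPy.

```python
import json
from datetime import date, datetime


def to_json_safe(value):
    """Recursively convert common non-serializable types."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    if isinstance(value, dict):
        return {str(k): to_json_safe(v) for k, v in value.items()}
    if isinstance(value, (list, tuple, set)):
        return [to_json_safe(v) for v in value]
    if hasattr(value, "tolist"):  # NumPy arrays and scalars
        return to_json_safe(value.tolist())
    return value


payload = to_json_safe({
    "created": datetime(2024, 1, 1, 12, 0),
    "tags": ("a", "b"),
    "nested": {"count": 42},
})
encoded = json.dumps(payload)  # Raises TypeError if anything unsupported remains
print(encoded)
```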
## Memory Issues
### Out of Memory
**Error**: `MemoryError` or container killed
**Solutions**:
1. **Enable on-disk storage**:
```python
client.create_collection(
    collection_name="large_collection",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    on_disk_payload=True,  # Store payloads on disk
    hnsw_config=HnswConfigDiff(on_disk=True)  # Store HNSW on disk
)
```
2. **Use quantization**:
```python
# INT8 scalar quantization: quantized vectors are about 4x smaller than float32
client.update_collection(
    collection_name="large_collection",
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,
            always_ram=False  # Keep on disk
        )
    )
)
```
3. **Increase Docker memory**:
```bash
docker run -m 8g -p 6333:6333 qdrant/qdrant
```
4. **Configure Qdrant storage**:
```yaml
# config.yaml
storage:
  performance:
    max_search_threads: 2
  optimizers:
    memmap_threshold_kb: 20000
```
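Before choosing among these options, it helps to estimate how much RAM the vectors need. The sketch below assumes the commonly cited rule of thumb of vectors × dimension × 4 bytes × 1.5 overhead for float32 vectors held in memory. The 1.5 factor is an approximation, the function name is hypothetical, and payloads and payload indexes are not included.

```python
def estimate_vector_ram_bytes(
    num_vectors: int,
    dimension: int,
    bytes_per_component: int = 4,  # float32; use 1 for INT8-quantized vectors
    overhead: float = 1.5,         # assumed allowance for index and metadata
) -> int:
    """Rough RAM estimate for storing vectors in memory."""
    return int(num_vectors * dimension * bytes_per_component * overhead)

full = estimate_vector_ram_bytes(1_000_000, 384)
int8 = estimate_vector_ram_bytes(1_000_000, 384, bytes_per_component=1)
print(f"float32: {full / 1024**3:.2f} GiB")
print(f"int8:    {int8 / 1024**3:.2f} GiB")
```

If the float32 estimate exceeds the container limit, on-disk storage or quantization is likely needed, not just a larger `-m` value.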
### High Memory Usage During Indexing
**Fix**:
```python
# Increase indexing threshold for bulk loads
from qdrant_client.models import OptimizersConfigDiff

client.update_collection(
    collection_name="documents",
    optimizers_config=OptimizersConfigDiff(
        indexing_threshold=50000  # Delay indexing
    )
)
# Bulk insert
client.upsert(collection_name="documents", points=all_points, wait=False)
# Then optimize
client.update_collection(
    collection_name="documents",
    optimizers_config=OptimizersConfigDiff(
        indexing_threshold=10000  # Resume normal indexing
    )
)
```
## Cluster Issues
### Node Not Joining Cluster
**Problem**: New node fails to join cluster.
**Fix**:
```bash
# Check network connectivity
docker exec qdrant-node-2 ping qdrant-node-1
# Verify bootstrap URL
docker logs qdrant-node-2 | grep bootstrap
# Check Raft state
curl http://localhost:6333/cluster
```
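The JSON returned by `/cluster` can be summarized in a few lines to spot a missing leader or a peer that never registered. This is a sketch that assumes the response carries `result.status`, `result.peers`, and `result.raft_info` with `leader`, `role`, and `pending_operations` fields. Verify those names against your server version. The sample below is illustrative, not captured from a real cluster.

```python
def summarize_cluster(response: dict) -> dict:
    """Extract the fields most useful for diagnosing join problems."""
    result = response.get("result", {})
    raft = result.get("raft_info", {})
    peers = result.get("peers", {})
    return {
        "enabled": result.get("status") == "enabled",
        "peer_count": len(peers),
        "has_leader": raft.get("leader") is not None,
        "role": raft.get("role"),
        "pending_operations": raft.get("pending_operations", 0),
    }

# Illustrative response body with made-up peer IDs and URIs.
sample = {
    "result": {
        "status": "enabled",
        "peer_id": 1001,
        "peers": {
            "1001": {"uri": "http://qdrant-node-1:6335/"},
            "1002": {"uri": "http://qdrant-node-2:6335/"},
        },
        "raft_info": {
            "term": 3,
            "commit": 42,
            "pending_operations": 0,
            "leader": 1001,
            "role": "Leader",
            "is_voter": True,
        },
    },
    "status": "ok",
}
print(summarize_cluster(sample))
```

A `peer_count` lower than the number of nodes started, or `has_leader` being false, points at the bootstrap URL or network checks above.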
### Split Brain
**Problem**: Cluster has inconsistent state.
**Fix**:
```bash
# Ask this peer to try to recover its Raft state
curl -X POST http://localhost:6333/cluster/recover
# Or restart minority nodes
docker restart qdrant-node-2 qdrant-node-3
```
### Replication Lag
**Problem**: Replicas fall behind.
**Fix**:
```python
# Check collection status
info = client.get_collection("documents")
print(f"Status: {info.status}")
# Use strong write ordering for critical writes
from qdrant_client.models import WriteOrdering

client.upsert(
    collection_name="documents",
    points=points,
    ordering=WriteOrdering.STRONG
)
```
## Performance Tuning
### Benchmark Configuration
```python
import time
import numpy as np
def benchmark_search(client, collection, n_queries=100, dimension=384):
    # Generate random queries
    queries = [np.random.rand(dimension).tolist() for _ in range(n_queries)]
    # Warmup
    for q in queries[:10]:
        client.search(collection_name=collection, query_vector=q, limit=10)
    # Benchmark
    start = time.perf_counter()
    for q in queries:
        client.search(collection_name=collection, query_vector=q, limit=10)
    elapsed = time.perf_counter() - start
    print(f"QPS: {n_queries / elapsed:.2f}")
    print(f"Latency: {elapsed / n_queries * 1000:.2f}ms")

benchmark_search(client, "documents")
```
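The benchmark above reports only mean latency, which can hide slow outliers. If each query is timed individually, tail percentiles can be computed with a small helper like the sketch below. The function names are hypothetical, and it uses the nearest-rank definition of a percentile.

```python
import math

def percentile(sorted_values: list[float], pct: float) -> float:
    """Nearest-rank percentile of an ascending-sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]

def summarize_latencies(latencies_ms: list[float]) -> dict:
    """Return median and tail latencies for a list of per-query timings."""
    ordered = sorted(latencies_ms)
    return {
        "p50": percentile(ordered, 50),
        "p95": percentile(ordered, 95),
        "p99": percentile(ordered, 99),
        "max": ordered[-1],
    }

# Stand-in for measured per-query times in milliseconds.
latencies = [float(i) for i in range(1, 101)]
print(summarize_latencies(latencies))
```

To feed it, record `time.perf_counter()` before and after each `client.search` call inside the benchmark loop and append the difference in milliseconds.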
### Optimal HNSW Parameters
```python
# High recall (slower)
client.create_collection(
    collection_name="high_recall",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    hnsw_config=HnswConfigDiff(
        m=32,  # More connections
        ef_construct=200  # Higher build quality
    )
)
# High speed (lower recall)
client.create_collection(
    collection_name="high_speed",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    hnsw_config=HnswConfigDiff(
        m=8,  # Fewer connections
        ef_construct=64  # Lower build quality
    )
)
# Balanced
client.create_collection(
    collection_name="balanced",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    hnsw_config=HnswConfigDiff(
        m=16,  # Default
        ef_construct=100  # Default
    )
)
```
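Choosing among these presets is easier with a recall number to compare. A minimal sketch: compute recall@k as the fraction of exact nearest neighbors that the approximate search also returned. It assumes the ground-truth IDs come from an exact search (for example, a query issued with `SearchParams(exact=True)`), and `recall_at_k` and `mean_recall` are hypothetical helper names.

```python
def recall_at_k(approx_ids: list, exact_ids: list) -> float:
    """Fraction of the exact top-k results present in the approximate top-k."""
    if not exact_ids:
        raise ValueError("exact_ids must not be empty")
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

def mean_recall(pairs: list[tuple[list, list]]) -> float:
    """Average recall over (approx_ids, exact_ids) pairs, one per query."""
    return sum(recall_at_k(a, e) for a, e in pairs) / len(pairs)

# Illustrative result IDs for two queries.
pairs = [
    ([1, 2, 3, 4], [1, 2, 3, 5]),  # 3 of 4 exact neighbors found
    ([7, 8, 9, 10], [7, 8, 9, 10]),  # all found
]
print(mean_recall(pairs))
```

Running the same query set against each collection and comparing mean recall with the measured QPS shows where the trade-off lands for a given dataset.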
## Debugging Tips
### Enable Verbose Logging
```python
import logging
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("qdrant_client").setLevel(logging.DEBUG)
```
### Check Server Logs
```bash
# Docker logs
docker logs -f qdrant
# With timestamps
docker logs --timestamps qdrant
# Last 100 lines
docker logs --tail 100 qdrant
```
### Inspect Collection State
```python
# Collection info
info = client.get_collection("documents")
print(f"Status: {info.status}")
print(f"Points: {info.points_count}")
print(f"Segments: {info.segments_count}")
print(f"Config: {info.config}")
# Sample points
points, _ = client.scroll(
    collection_name="documents",
    limit=5,
    with_payload=True,
    with_vectors=True
)
for p in points:
    print(f"ID: {p.id}, Payload: {p.payload}")
```
### Test Connection
```python
from qdrant_client import QdrantClient

def test_connection(host="localhost", port=6333):
    try:
        client = QdrantClient(host=host, port=port, timeout=5)
        collections = client.get_collections()
        print(f"Connected! Collections: {len(collections.collections)}")
        return True
    except Exception as e:
        print(f"Connection failed: {e}")
        return False

test_connection()
```
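A server that is still starting up will fail a single connection test even though it becomes healthy seconds later. The sketch below retries any boolean check with exponential backoff. `wait_until_ready` is a hypothetical helper, and the `sleep` parameter is injectable only so the example can run without real delays.

```python
import time
from typing import Callable

def wait_until_ready(
    check: Callable[[], bool],
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() until it returns True, doubling the delay between tries."""
    for attempt in range(attempts):
        if check():
            return True
        # Do not sleep after the final failed attempt.
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return False

# Stand-in check that succeeds on the third call.
calls = {"n": 0}

def flaky_check() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3

delays: list[float] = []
ready = wait_until_ready(flaky_check, sleep=delays.append)
print(ready, delays)
```

In practice the `test_connection` function above can be passed as `check`, for example `wait_until_ready(test_connection)`.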
## Getting Help
1. **Documentation**: https://qdrant.tech/documentation/
2. **GitHub Issues**: https://github.com/qdrant/qdrant/issues
3. **Discord**: https://discord.gg/qdrant
4. **Stack Overflow**: Tag `qdrant`
### Reporting Issues
Include:
- Qdrant version: `curl http://localhost:6333/`
- Python client version: `pip show qdrant-client`
- Full error traceback
- Minimal reproducible code
- Collection configuration
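The client-side details can be collected in one step with the standard library. This is a small sketch; `client_environment` is a hypothetical helper, and it reports only what is installed locally, not the server version.

```python
import platform
from importlib import metadata

def client_environment(package: str = "qdrant-client") -> dict:
    """Collect interpreter, OS, and installed package version for a bug report."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = "not installed"
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        package: version,
    }

for key, value in client_environment().items():
    print(f"{key}: {value}")
```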