182 lines
3.5 KiB
Markdown
182 lines
3.5 KiB
Markdown
|
|
# Pinecone Deployment Guide
|
||
|
|
|
||
|
|
Production deployment patterns for Pinecone.
|
||
|
|
|
||
|
|
## Serverless vs Pod-based
|
||
|
|
|
||
|
|
### Serverless (Recommended)
|
||
|
|
|
||
|
|
```python
|
||
|
|
from pinecone import Pinecone, ServerlessSpec
|
||
|
|
|
||
|
|
pc = Pinecone(api_key="your-key")
|
||
|
|
|
||
|
|
# Create serverless index
|
||
|
|
pc.create_index(
|
||
|
|
name="my-index",
|
||
|
|
dimension=1536,
|
||
|
|
metric="cosine",
|
||
|
|
spec=ServerlessSpec(
|
||
|
|
cloud="aws", # or "gcp", "azure"
|
||
|
|
region="us-east-1"
|
||
|
|
)
|
||
|
|
)
|
||
|
|
```
|
||
|
|
|
||
|
|
**Benefits:**
|
||
|
|
- Auto-scaling
|
||
|
|
- Pay per usage
|
||
|
|
- No infrastructure management
|
||
|
|
- Cost-effective for variable load
|
||
|
|
|
||
|
|
**Use when:**
|
||
|
|
- Variable traffic
|
||
|
|
- Cost optimization important
|
||
|
|
- Don't need consistent latency
|
||
|
|
|
||
|
|
### Pod-based
|
||
|
|
|
||
|
|
```python
|
||
|
|
from pinecone import PodSpec
|
||
|
|
|
||
|
|
pc.create_index(
|
||
|
|
name="my-index",
|
||
|
|
dimension=1536,
|
||
|
|
metric="cosine",
|
||
|
|
spec=PodSpec(
|
||
|
|
environment="us-east1-gcp",
|
||
|
|
pod_type="p1.x1", # or p1.x2, p1.x4, p1.x8
|
||
|
|
pods=2, # Number of pods
|
||
|
|
replicas=2 # High availability
|
||
|
|
)
|
||
|
|
)
|
||
|
|
```
|
||
|
|
|
||
|
|
**Benefits:**
|
||
|
|
- Consistent performance
|
||
|
|
- Predictable latency
|
||
|
|
- Higher throughput
|
||
|
|
- Dedicated resources
|
||
|
|
|
||
|
|
**Use when:**
|
||
|
|
- Production workloads
|
||
|
|
- Need consistent p95 latency
|
||
|
|
- High throughput required
|
||
|
|
|
||
|
|
## Hybrid search
|
||
|
|
|
||
|
|
### Dense + Sparse vectors
|
||
|
|
|
||
|
|
```python
|
||
|
|
# Upsert with both dense and sparse vectors
|
||
|
|
index.upsert(vectors=[
|
||
|
|
{
|
||
|
|
"id": "doc1",
|
||
|
|
"values": [0.1, 0.2, ...], # Dense (semantic)
|
||
|
|
"sparse_values": {
|
||
|
|
"indices": [10, 45, 123], # Token IDs
|
||
|
|
"values": [0.5, 0.3, 0.8] # TF-IDF/BM25 scores
|
||
|
|
},
|
||
|
|
"metadata": {"text": "..."}
|
||
|
|
}
|
||
|
|
])
|
||
|
|
|
||
|
|
# Hybrid query
|
||
|
|
results = index.query(
|
||
|
|
vector=[0.1, 0.2, ...], # Dense query
|
||
|
|
sparse_vector={
|
||
|
|
"indices": [10, 45],
|
||
|
|
"values": [0.5, 0.3]
|
||
|
|
},
|
||
|
|
top_k=10,
|
||
|
|
alpha=0.5 # 0=sparse only, 1=dense only, 0.5=balanced
|
||
|
|
)
|
||
|
|
```
|
||
|
|
|
||
|
|
**Benefits:**
|
||
|
|
- Best of both worlds
|
||
|
|
- Semantic + keyword matching
|
||
|
|
- Better recall than either alone
|
||
|
|
|
||
|
|
## Namespaces for multi-tenancy
|
||
|
|
|
||
|
|
```python
|
||
|
|
# Separate data by user/tenant
|
||
|
|
index.upsert(
|
||
|
|
vectors=[{"id": "doc1", "values": [...]}],
|
||
|
|
namespace="user-123"
|
||
|
|
)
|
||
|
|
|
||
|
|
# Query specific namespace
|
||
|
|
results = index.query(
|
||
|
|
vector=[...],
|
||
|
|
namespace="user-123",
|
||
|
|
top_k=5
|
||
|
|
)
|
||
|
|
|
||
|
|
# List namespaces
|
||
|
|
stats = index.describe_index_stats()
|
||
|
|
print(stats['namespaces'])
|
||
|
|
```
|
||
|
|
|
||
|
|
**Use cases:**
|
||
|
|
- Multi-tenant SaaS
|
||
|
|
- User-specific data isolation
|
||
|
|
- A/B testing (prod/staging namespaces)
|
||
|
|
|
||
|
|
## Metadata filtering
|
||
|
|
|
||
|
|
### Exact match
|
||
|
|
|
||
|
|
```python
|
||
|
|
results = index.query(
|
||
|
|
vector=[...],
|
||
|
|
filter={"category": "tutorial"},
|
||
|
|
top_k=5
|
||
|
|
)
|
||
|
|
```
|
||
|
|
|
||
|
|
### Range queries
|
||
|
|
|
||
|
|
```python
|
||
|
|
results = index.query(
|
||
|
|
vector=[...],
|
||
|
|
filter={"price": {"$gte": 100, "$lte": 500}},
|
||
|
|
top_k=5
|
||
|
|
)
|
||
|
|
```
|
||
|
|
|
||
|
|
### Complex filters
|
||
|
|
|
||
|
|
```python
|
||
|
|
results = index.query(
|
||
|
|
vector=[...],
|
||
|
|
filter={
|
||
|
|
"$and": [
|
||
|
|
{"category": {"$in": ["tutorial", "guide"]}},
|
||
|
|
{"difficulty": {"$lte": 3}},
|
||
|
|
{"published": {"$gte": "2024-01-01"}}
|
||
|
|
]
|
||
|
|
},
|
||
|
|
top_k=5
|
||
|
|
)
|
||
|
|
```
|
||
|
|
|
||
|
|
## Best practices
|
||
|
|
|
||
|
|
1. **Use serverless for development** - Cost-effective
|
||
|
|
2. **Switch to pods for production** - Consistent performance
|
||
|
|
3. **Implement namespaces** - Multi-tenancy
|
||
|
|
4. **Add metadata strategically** - Enable filtering
|
||
|
|
5. **Use hybrid search** - Better quality
|
||
|
|
6. **Batch upserts** - 100-200 vectors per batch
|
||
|
|
7. **Monitor usage** - Check Pinecone dashboard
|
||
|
|
8. **Set up alerts** - Usage/cost thresholds
|
||
|
|
9. **Regular backups** - Export important data
|
||
|
|
10. **Test filters** - Verify performance
|
||
|
|
|
||
|
|
## Resources
|
||
|
|
|
||
|
|
- **Docs**: https://docs.pinecone.io
|
||
|
|
- **Console**: https://app.pinecone.io
|