# Pinecone Deployment Guide

Production deployment patterns for Pinecone.

## Serverless vs Pod-based

### Serverless (Recommended)

```python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-key")

# Create serverless index
pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(
        cloud="aws",  # or "gcp", "azure"
        region="us-east-1"
    )
)
```

**Benefits:**
- Auto-scaling
- Pay per usage
- No infrastructure management
- Cost-effective for variable load

**Use when:**
- Traffic is variable or bursty
- Cost optimization is a priority
- Some latency variability is acceptable
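
Index creation is asynchronous, so deployment scripts usually create the index only if it is missing and then wait until it reports ready. A minimal sketch, assuming a `Pinecone` client is passed in as `pc`; the helper name `ensure_index` and its timeout parameters are hypothetical, not part of the SDK:

```python
import time


def ensure_index(pc, name, dimension, spec, metric="cosine",
                 timeout_s=60.0, poll_s=1.0):
    """Create `name` if it is missing, then block until it reports ready.

    `pc` is a Pinecone client and `spec` a ServerlessSpec or PodSpec.
    `ensure_index` itself is a hypothetical helper, not an SDK function.
    """
    # Skip creation when an index with this name already exists
    if name not in pc.list_indexes().names():
        pc.create_index(name=name, dimension=dimension, metric=metric, spec=spec)

    # Poll the index status until it is ready or the timeout expires
    deadline = time.monotonic() + timeout_s
    while not pc.describe_index(name).status["ready"]:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"index {name!r} not ready after {timeout_s}s")
        time.sleep(poll_s)
    return pc.Index(name)
```

With the client from the example above, this could be called as `index = ensure_index(pc, "my-index", 1536, ServerlessSpec(cloud="aws", region="us-east-1"))`.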

### Pod-based

```python
from pinecone import PodSpec

pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=PodSpec(
        environment="us-east1-gcp",
        pod_type="p1.x1",  # or p1.x2, p1.x4, p1.x8
        pods=2,  # Number of pods
        replicas=2  # High availability
    )
)
```

**Benefits:**
- Consistent performance
- Predictable latency
- Higher throughput
- Dedicated resources

**Use when:**
- Production workloads
- Need consistent p95 latency
- High throughput required
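
When sizing a pod-based index, the total pod count is the number of shards multiplied by the number of replicas, so `pods=2, replicas=2` above corresponds to a single shard. A small sketch of that arithmetic; `pod_layout` is a hypothetical helper, and its result is meant to be unpacked into `PodSpec(...)` alongside `environment` and `pod_type`:

```python
def pod_layout(shards: int, replicas: int) -> dict:
    """Return consistent shards/replicas/pods values for a PodSpec.

    Hypothetical helper: it only encodes pods = shards * replicas.
    """
    if shards < 1 or replicas < 1:
        raise ValueError("shards and replicas must both be >= 1")
    return {"shards": shards, "replicas": replicas, "pods": shards * replicas}


# One shard served by two replicas, matching the example above
layout = pod_layout(shards=1, replicas=2)
print(layout)
```

Replicas add query throughput and availability, while shards add storage capacity, so the two are usually tuned separately.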

## Hybrid search

### Dense + Sparse vectors

```python
# Assumes the index was created with metric="dotproduct",
# which Pinecone requires for sparse values
index = pc.Index("my-index")

# Upsert with both dense and sparse vectors
index.upsert(vectors=[
    {
        "id": "doc1",
        "values": [0.1, 0.2, ...],  # Dense (semantic)
        "sparse_values": {
            "indices": [10, 45, 123],  # Token IDs
            "values": [0.5, 0.3, 0.8]   # TF-IDF/BM25 scores
        },
        "metadata": {"text": "..."}
    }
])

# Hybrid query
# query() has no alpha parameter: to weight dense vs sparse, scale the
# vectors client-side (dense * alpha, sparse * (1 - alpha)) before querying
results = index.query(
    vector=[0.1, 0.2, ...],  # Dense query
    sparse_vector={
        "indices": [10, 45],
        "values": [0.5, 0.3]
    },
    top_k=10
)
```

**Benefits:**
- Best of both worlds
- Semantic + keyword matching
- Better recall than either alone
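
The balance between semantic and keyword matching is commonly controlled by scaling the query vectors on the client with a weight `alpha` (0 = sparse only, 1 = dense only). A minimal sketch of that convex weighting; the function name `hybrid_scale` is an assumption for illustration, not an SDK call:

```python
def hybrid_scale(dense, sparse, alpha):
    """Weight a dense/sparse query pair: dense * alpha, sparse * (1 - alpha).

    Hypothetical helper; `sparse` is a dict with "indices" and "values".
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = {
        "indices": list(sparse["indices"]),
        "values": [v * (1.0 - alpha) for v in sparse["values"]],
    }
    return scaled_dense, scaled_sparse


# Balanced weighting of a toy query
dense_q, sparse_q = hybrid_scale(
    [0.2, 0.4], {"indices": [10, 45], "values": [0.5, 0.3]}, alpha=0.5
)
```

The scaled pair is then passed to `index.query(vector=dense_q, sparse_vector=sparse_q, top_k=10)`.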

## Namespaces for multi-tenancy

```python
# Separate data by user/tenant
index.upsert(
    vectors=[{"id": "doc1", "values": [...]}],
    namespace="user-123"
)

# Query specific namespace
results = index.query(
    vector=[...],
    namespace="user-123",
    top_k=5
)

# List namespaces
stats = index.describe_index_stats()
print(stats['namespaces'])
```

**Use cases:**
- Multi-tenant SaaS
- User-specific data isolation
- A/B testing or environment separation (e.g. prod/staging namespaces)
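
In a multi-tenant service, a forgotten `namespace` argument can leak data across tenants. One way to reduce that risk is a thin wrapper that binds every call to a single namespace. A sketch under stated assumptions: the `TenantIndex` class and the `user-<id>` naming scheme are hypothetical, while `upsert`, `query`, and `delete(delete_all=True, namespace=...)` are the index methods being wrapped:

```python
class TenantIndex:
    """Bind every call to one tenant's namespace (hypothetical wrapper)."""

    def __init__(self, index, tenant_id: str):
        if not tenant_id:
            raise ValueError("tenant_id must be non-empty")
        self._index = index
        # Naming scheme follows the "user-123" example above
        self.namespace = f"user-{tenant_id}"

    def upsert(self, vectors):
        return self._index.upsert(vectors=vectors, namespace=self.namespace)

    def query(self, vector, top_k=5, **kwargs):
        # Extra keyword arguments (e.g. filter=...) are passed through
        return self._index.query(
            vector=vector, top_k=top_k, namespace=self.namespace, **kwargs
        )

    def purge(self):
        # Remove all of this tenant's vectors, e.g. on account deletion
        return self._index.delete(delete_all=True, namespace=self.namespace)
```

Usage would look like `tenant = TenantIndex(index, "123")` followed by `tenant.query(vector=[...])`, so application code never passes a namespace directly.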

## Metadata filtering

### Exact match

```python
results = index.query(
    vector=[...],
    filter={"category": "tutorial"},
    top_k=5
)
```

### Range queries

```python
results = index.query(
    vector=[...],
    filter={"price": {"$gte": 100, "$lte": 500}},
    top_k=5
)
```

### Complex filters

```python
results = index.query(
    vector=[...],
    filter={
        "$and": [
            {"category": {"$in": ["tutorial", "guide"]}},
            {"difficulty": {"$lte": 3}},
            {"published": {"$gte": 1704067200}}  # Unix timestamp for 2024-01-01 UTC; range operators need numbers
        ]
    },
    top_k=5
)
```
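
Pinecone's range operators (`$gt`, `$gte`, `$lt`, `$lte`) compare numbers, so dates are typically stored in metadata as Unix timestamps rather than strings. A small sketch of the conversion; the helper name `to_epoch_seconds` is hypothetical, and dates without a timezone are assumed to be UTC:

```python
from datetime import datetime, timezone


def to_epoch_seconds(iso_date: str) -> int:
    """Convert an ISO 8601 date or datetime string to Unix seconds.

    Hypothetical helper; naive inputs are treated as UTC.
    """
    dt = datetime.fromisoformat(iso_date)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


# Numeric filter usable with $gte; the same conversion applies at upsert time
published_filter = {"published": {"$gte": to_epoch_seconds("2024-01-01")}}
print(published_filter)  # {'published': {'$gte': 1704067200}}
```

The metadata field must hold the same numeric representation when vectors are upserted, otherwise the range filter will not match.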

## Best practices

1. **Use serverless for development** - Cost-effective
2. **Consider pods for latency-sensitive production workloads** - Consistent performance
3. **Implement namespaces** - Multi-tenancy
4. **Add metadata strategically** - Enable filtering
5. **Use hybrid search** - Better quality
6. **Batch upserts** - 100-200 vectors per batch
7. **Monitor usage** - Check Pinecone dashboard
8. **Set up alerts** - Usage/cost thresholds
9. **Regular backups** - Export important data
10. **Test filters** - Verify performance
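
The batching advice in item 6 can be sketched as a small chunking helper. This is an illustration under stated assumptions: `chunked` and `upsert_in_batches` are hypothetical names, and the default batch size of 100 is taken from the range given above:

```python
from itertools import islice


def chunked(iterable, size=100):
    """Yield successive lists of at most `size` items from `iterable`."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def upsert_in_batches(index, vectors, size=100, namespace=None):
    """Upsert `vectors` in batches and return how many were sent."""
    # Only pass namespace when the caller supplied one
    kwargs = {} if namespace is None else {"namespace": namespace}
    total = 0
    for batch in chunked(vectors, size):
        index.upsert(vectors=batch, **kwargs)
        total += len(batch)
    return total
```

Because `chunked` consumes any iterable lazily, `vectors` can be a generator that streams embeddings instead of a fully materialized list.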

## Resources

- **Docs**: https://docs.pinecone.io
- **Console**: https://app.pinecone.io