---
name: faiss
description: Facebook's library for efficient similarity search and clustering of dense vectors. Supports billions of vectors, GPU acceleration, and various index types (Flat, IVF, HNSW). Use for fast k-NN search, large-scale vector retrieval, or when you need pure similarity search without metadata. Best for high-performance applications.
version: 1.0.0
author: Orchestra Research
license: MIT
dependencies: [faiss-cpu, faiss-gpu, numpy]
metadata:
  hermes:
    tags: [RAG, FAISS, Similarity Search, Vector Search, Facebook AI, GPU Acceleration, Billion-Scale, K-NN, HNSW, High Performance, Large Scale]

---

# FAISS - Efficient Similarity Search

Facebook AI's library for billion-scale vector similarity search.

## When to use FAISS

**Use FAISS when:**
- Need fast similarity search on large vector datasets (millions/billions)
- GPU acceleration required
- Pure vector similarity (no metadata filtering needed)
- High throughput, low latency critical
- Offline/batch processing of embeddings

**Metrics**:
- **31,700+ GitHub stars**
- Meta/Facebook AI Research
- **Handles billions of vectors**
- **C++** with Python bindings

**Use alternatives instead**:
- **Chroma/Pinecone**: Need metadata filtering
- **Weaviate**: Need full database features
- **Annoy**: Simpler, fewer features

## Quick start

### Installation

```bash
# CPU only
pip install faiss-cpu

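# GPU support (PyPI GPU wheels may lag behind FAISS releases; the project
# distributes its official GPU builds through conda)
pip install faiss-gpu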
```

### Basic usage

```python
import faiss
import numpy as np

# Create sample data (1000 vectors, 128 dimensions)
d = 128
nb = 1000
vectors = np.random.random((nb, d)).astype('float32')

# Create index
index = faiss.IndexFlatL2(d)  # L2 distance
index.add(vectors)             # Add vectors

# Search
k = 5  # Find 5 nearest neighbors
query = np.random.random((1, d)).astype('float32')
distances, indices = index.search(query, k)

print(f"Nearest neighbors: {indices}")
print(f"Distances: {distances}")
```
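
To make the returned values concrete, here is a minimal NumPy sketch of what an exact L2 search computes. It is an illustration under simplifying assumptions, not FAISS's implementation, and the helper name `brute_force_l2` is hypothetical. Like `IndexFlatL2`, it reports squared Euclidean distances.

```python
import numpy as np

def brute_force_l2(vectors: np.ndarray, queries: np.ndarray, k: int):
    """Exact k-NN by squared L2 distance (hypothetical helper, for illustration)."""
    # ||q - v||^2 = ||q||^2 - 2 q.v + ||v||^2, computed for all pairs at once
    d2 = (
        (queries ** 2).sum(axis=1, keepdims=True)
        - 2.0 * queries @ vectors.T
        + (vectors ** 2).sum(axis=1)
    )
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives caused by rounding
    idx = np.argsort(d2, axis=1)[:, :k]
    return np.take_along_axis(d2, idx, axis=1), idx

rng = np.random.default_rng(0)
vectors = rng.random((1000, 128)).astype("float32")
queries = vectors[:3]  # query with stored vectors, so each is its own nearest neighbor

distances, indices = brute_force_l2(vectors, queries, k=5)
print(indices[:, 0])  # first column holds each query's own row id
```

Both outputs have shape `(number of queries, k)`, the same layout `index.search` returns.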

## Index types

### 1. Flat (exact search)

```python
# L2 (Euclidean) distance
index = faiss.IndexFlatL2(d)

# Inner product (cosine similarity if normalized)
index = faiss.IndexFlatIP(d)

# Exact results, but every query scans all vectors, so this is the slowest option at scale
```
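
The "cosine similarity if normalized" remark can be checked with a short NumPy sketch: once every vector has unit L2 norm, the inner product equals the cosine similarity. The helper `l2_normalize` is hypothetical and returns a copy; FAISS ships `faiss.normalize_L2`, which normalizes a float32 array in place.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Return a float32 copy of x with every row scaled to unit L2 norm."""
    x = np.asarray(x, dtype="float32")
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)  # guard against all-zero rows

rng = np.random.default_rng(0)
vectors = rng.random((1000, 128)).astype("float32")
query = rng.random((1, 128)).astype("float32")

# Inner product of unit vectors: the score an inner-product index would rank by
ip_scores = l2_normalize(query) @ l2_normalize(vectors).T

# Reference cosine similarity computed from the raw vectors
cosine = (query @ vectors.T) / (
    np.linalg.norm(query, axis=1, keepdims=True) * np.linalg.norm(vectors, axis=1)
)

print(np.allclose(ip_scores, cosine, atol=1e-5))
```

Normalize both the stored vectors and the queries; normalizing only one side gives scores that are no longer cosine similarities.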

### 2. IVF (inverted file) - Fast approximate

```python
# Create quantizer
quantizer = faiss.IndexFlatL2(d)

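# IVF index with 100 clusters (inverted lists)
nlist = 100
index = faiss.IndexIVFFlat(quantizer, d, nlist)

# Train the coarse quantizer (k-means) before adding; FAISS warns if the
# training set is small relative to nlist
index.train(vectors)

# Add vectors
index.add(vectors)

# Search: nprobe = number of clusters scanned per query (default 1);
# higher values improve recall at the cost of speed
index.nprobe = 10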
distances, indices = index.search(query, k)
```
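
The role of `nlist` and `nprobe` is easier to see in a toy version of the idea. The NumPy sketch below is a simplification, not FAISS's implementation: random data points stand in for trained k-means centroids, and the names `sq_dists` and `ivf_search` are hypothetical.

```python
import numpy as np

def sq_dists(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Squared L2 distances between every row of a and every row of b."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)

rng = np.random.default_rng(0)
d, nb, nlist, k = 16, 500, 10, 5
vectors = rng.random((nb, d)).astype("float32")
query = rng.random((1, d)).astype("float32")

# "Training": random data points stand in for k-means centroids (simplification)
centroids = vectors[rng.choice(nb, size=nlist, replace=False)]
# "Adding": each vector goes into the inverted list of its nearest centroid
assignments = sq_dists(vectors, centroids).argmin(axis=1)

def ivf_search(query: np.ndarray, nprobe: int):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    probed = np.argsort(sq_dists(query, centroids)[0])[:nprobe]
    candidates = np.flatnonzero(np.isin(assignments, probed))
    dists = sq_dists(query, vectors[candidates])[0]
    order = np.argsort(dists)[:k]
    return candidates[order], dists[order]

exact_ids = np.argsort(sq_dists(query, vectors)[0])[:k]
approx_ids, _ = ivf_search(query, nprobe=2)      # fast, may miss neighbors
full_ids, _ = ivf_search(query, nprobe=nlist)    # scans everything, exact
print(len(set(approx_ids) & set(exact_ids)), "of", k, "true neighbors found with nprobe=2")
```

With `nprobe` equal to `nlist` every cluster is scanned and the result matches the exact search; smaller values trade recall for speed.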

### 3. HNSW (Hierarchical Navigable Small World) - Best quality/speed

```python
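# HNSW index
M = 32  # Number of graph neighbors per node (higher = better recall, more memory)
index = faiss.IndexHNSWFlat(d, M)

# No training needed
index.add(vectors)

# Search: efSearch controls how many candidates are explored per query
# (higher = better recall, slower search)
index.hnsw.efSearch = 64
distances, indices = index.search(query, k)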
```

### 4. Product Quantization - Memory efficient

```python
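# PQ stores each vector as m codes of nbits bits. With d=128, m=8, nbits=8 that is
# 8 bytes instead of 512 bytes of float32 (64× smaller); d must be divisible by m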
m = 8   # Number of subquantizers
nbits = 8
index = faiss.IndexPQ(d, m, nbits)

# Train and add
index.train(vectors)
index.add(vectors)
```
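
The memory saving follows directly from the parameters, so it can be estimated before building anything. The sketch below is plain arithmetic with hypothetical helper names; it counts only the stored codes and ignores codebooks and any ID or index overhead.

```python
def pq_code_bytes(m: int, nbits: int) -> int:
    """Bytes per PQ-encoded vector: m codes of nbits bits each, rounded up."""
    return (m * nbits + 7) // 8

def flat_bytes(d: int) -> int:
    """Bytes per uncompressed float32 vector."""
    return 4 * d

d, m, nbits = 128, 8, 8
n = 1_000_000_000  # one billion vectors, purely illustrative

ratio = flat_bytes(d) / pq_code_bytes(m, nbits)
flat_gb = n * flat_bytes(d) / 1e9
pq_gb = n * pq_code_bytes(m, nbits) / 1e9

print(f"flat: {flat_bytes(d)} B/vector, PQ: {pq_code_bytes(m, nbits)} B/vector")
print(f"compression: {ratio:.0f}x")
print(f"at n={n:,}: {flat_gb:.0f} GB vs {pq_gb:.0f} GB")
```

Raising `m` or `nbits` lowers the compression ratio but usually improves recall, so the ratio is a knob rather than a fixed property of PQ.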

## Save and load

```python
# Save index (CPU indexes only; convert a GPU index with faiss.index_gpu_to_cpu first)
faiss.write_index(index, "large.index")

# Load index
index = faiss.read_index("large.index")

# Continue using
distances, indices = index.search(query, k)
```

## GPU acceleration

```python
# Single GPU
res = faiss.StandardGpuResources()
index_cpu = faiss.IndexFlatL2(d)
index_gpu = faiss.index_cpu_to_gpu(res, 0, index_cpu)  # GPU 0

# Multi-GPU
index_gpu = faiss.index_cpu_to_all_gpus(index_cpu)

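# Can be 10-100× faster than CPU; the actual speedup depends on index type,
# batch size, and hardware. Requires a GPU build of FAISS.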
```

## LangChain integration

```python
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# Create FAISS vector store (docs: a list of LangChain Document objects)
vectorstore = FAISS.from_documents(docs, OpenAIEmbeddings())

# Save
vectorstore.save_local("faiss_index")

# Load (the docstore is unpickled, so only enable this flag for files you trust)
vectorstore = FAISS.load_local(
    "faiss_index",
    OpenAIEmbeddings(),
    allow_dangerous_deserialization=True
)

# Search
results = vectorstore.similarity_search("query", k=5)
```

## LlamaIndex integration

```python
from llama_index.vector_stores.faiss import FaissVectorStore
import faiss

# Create FAISS index (d must match the embedding model's output dimension)
d = 1536
faiss_index = faiss.IndexFlatL2(d)

vector_store = FaissVectorStore(faiss_index=faiss_index)
```

## Best practices

1. **Choose the right index type** - Flat for <10K vectors, IVF for 10K-1M, HNSW when recall matters most
2. **Normalize for cosine** - Use IndexFlatIP with L2-normalized vectors (`faiss.normalize_L2`)
3. **Use GPU for large datasets** - Can be 10-100× faster, depending on index type and batch size
4. **Save trained indices** - Training is expensive
5. **Tune nprobe (IVF) / efSearch (HNSW)** - Balance speed/accuracy
6. **Monitor memory** - Use PQ to compress large datasets
7. **Batch queries** - Better GPU utilization
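
Balancing speed against accuracy needs a number to tune against. A common choice is recall@k: the fraction of the exact (Flat) neighbors that the approximate index also returns. The sketch below is a minimal, dependency-free version; `recall_at_k` is a hypothetical helper, and it accepts either nested lists or the `(nq, k)` ID arrays returned by `index.search`.

```python
def recall_at_k(approx_ids, exact_ids) -> float:
    """Fraction of exact neighbors recovered by the approximate search.

    Both arguments hold one row of neighbor IDs per query.
    """
    hits = 0
    total = 0
    for approx_row, exact_row in zip(approx_ids, exact_ids):
        truth = {int(i) for i in exact_row}
        hits += len(truth & {int(i) for i in approx_row})
        total += len(truth)
    return hits / total

# Toy IDs for two queries with k=4 (made-up values, not real search output)
exact = [[0, 1, 2, 3], [4, 5, 6, 7]]
approx = [[0, 1, 2, 9], [4, 5, 8, 9]]

print(recall_at_k(approx, exact))  # 5 of 8 true neighbors found → 0.625
```

In practice, compute `exact_ids` once with a Flat index on a sample of queries, then raise `nprobe` (or the HNSW search parameter) until recall reaches the level the application needs.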

## Performance

| Index Type | Build Time | Search Time | Memory | Recall (typical, depends on data and tuning) |
|------------|------------|-------------|--------|----------|
| Flat | Fast | Slow | High | 100% (exact) |
| IVF | Medium | Fast | Medium | ~95-99% |
| HNSW | Slow | Fastest | High | ~99% |
| PQ | Medium | Fast | Low | ~90-95% |

## Resources

- **GitHub**: https://github.com/facebookresearch/faiss ⭐ 31,700+
- **Wiki**: https://github.com/facebookresearch/faiss/wiki
- **License**: MIT