TurboQuant Ansible Deployment

Deploy TurboQuant-compressed Gemma 4 inference across fleet nodes.

Quick Start

# 1. Copy and edit inventory
cp inventory.ini.example inventory.ini
vim inventory.ini

# 2. Deploy to all nodes
ansible-playbook -i inventory.ini deploy_turboquant.yml

# 3. Deploy without integration tests
ansible-playbook -i inventory.ini deploy_turboquant.yml -e run_integration_tests=false

# 4. Deploy to a specific node
ansible-playbook -i inventory.ini deploy_turboquant.yml --limit timmy

Deployment Matrix

Node	Hardware	Model	Preset
Mac (Timmy)	M1, 16 GB	gemma-4-26B-A4B	turboquant_k8v4
Allegro VPS	2 cores, 8 GB	gemma-4-E4B	GGUF q4_0

Health Check

# Check local node
./health_check.sh localhost 8081

# Check remote node
./health_check.sh 192.168.1.100 8081
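
For readers who want to see what such a check might involve, here is a minimal sketch. It is not the contents of health_check.sh, which are not shown here. It assumes the node runs the llama.cpp HTTP server, which exposes a /health endpoint; the function names health_url and check_node are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: NOT the actual health_check.sh from this repository.
# Assumption: the node runs the llama.cpp HTTP server, which serves /health.

# Build the health URL for a host and port (defaults: localhost, 8081).
health_url() {
  local host="${1:-localhost}"
  local port="${2:-8081}"
  printf 'http://%s:%s/health\n' "$host" "$port"
}

# Probe a node; return 0 if the endpoint answers with a success status.
check_node() {
  # -f: fail on HTTP errors, -sS: quiet but report errors,
  # --max-time: give up after 5 seconds.
  if curl -fsS --max-time 5 "$(health_url "$1" "$2")" >/dev/null; then
    echo "OK: ${1:-localhost}:${2:-8081}"
  else
    echo "FAIL: ${1:-localhost}:${2:-8081}" >&2
    return 1
  fi
}
```

After sourcing the file, check_node localhost 8081 would probe the local node. A server that is still loading its model may answer with an error status, so a failure right after deployment does not necessarily mean the deployment is broken.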

Role Variables

See roles/turboquant-deploy/defaults/main.yml for all configurable variables.

Key variables:

llama_cpp_port: Server port (default: 8081)
turboquant_kv_type: KV cache compression type (default: turbo4)
max_context_tokens: Maximum context length (default: 131072)
gemma4_model_filename: Model filename per node
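
Any of these variables can presumably be overridden per run with -e, the same mechanism Quick Start uses for run_integration_tests. The sketch below only prints the resulting command so it can be reviewed before running; build_deploy_cmd is a hypothetical helper, and the override values are illustrative, not recommended settings.

```shell
#!/usr/bin/env bash
# Sketch only: build_deploy_cmd is a hypothetical helper, not part of this repo.
# It prints an ansible-playbook command with one -e flag per key=value override.
build_deploy_cmd() {
  local cmd="ansible-playbook -i inventory.ini deploy_turboquant.yml"
  local kv
  for kv in "$@"; do
    cmd="$cmd -e $kv"
  done
  printf '%s\n' "$cmd"
}

# Illustrative values only: a different port and a smaller context window.
build_deploy_cmd llama_cpp_port=8082 max_context_tokens=32768
```

Values set with -e take precedence over role defaults in Ansible, so they apply to every node in the run; a per-node setting such as gemma4_model_filename is usually better placed in the inventory.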
