TurboQuant Ansible Deployment
Deploy TurboQuant-compressed Gemma 4 inference across fleet nodes.
Quick Start
# 1. Copy and edit inventory
cp inventory.ini.example inventory.ini
vim inventory.ini
# 2. Deploy to all nodes
ansible-playbook -i inventory.ini deploy_turboquant.yml
# 3. Deploy without integration tests
ansible-playbook -i inventory.ini deploy_turboquant.yml -e run_integration_tests=false
# 4. Deploy to a specific node
ansible-playbook -i inventory.ini deploy_turboquant.yml --limit timmy
Deployment Matrix
| Node | Hardware | Model | Preset |
|---|---|---|---|
| Mac (Timmy) | M1, 16GB | gemma-4-26B-A4B | turboquant_k8v4 |
| Allegro VPS | 2 cores, 8GB | gemma-4-E4B | GGUF q4_0 |
Health Check
# Check local node
./health_check.sh localhost 8081
# Check remote node
./health_check.sh 192.168.1.100 8081
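The contents of health_check.sh are not shown here. As a rough sketch, assuming each node runs llama.cpp's HTTP server (which exposes a GET /health endpoint), an equivalent manual probe might look like the following. The function names `health_url` and `probe_node` are hypothetical and are not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical manual probe; this is NOT the contents of health_check.sh.
# Assumes the node serves llama.cpp's HTTP API with its /health endpoint.

# Build the health URL for a host and port (port defaults to 8081).
health_url() {
  local host="$1" port="${2:-8081}"
  printf 'http://%s:%s/health\n' "$host" "$port"
}

# Return 0 if the server answers the health endpoint with a 2xx status
# within 5 seconds, non-zero otherwise.
probe_node() {
  curl --silent --fail --max-time 5 --output /dev/null "$(health_url "$@")"
}

# Usage:
#   probe_node localhost 8081 && echo "up" || echo "down"
```

A non-zero exit from `probe_node` only means the endpoint did not return success in time; it does not distinguish a stopped server from a model that is still loading.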
Role Variables
See roles/turboquant-deploy/defaults/main.yml for all configurable variables.
Key variables:
- llama_cpp_port: Server port (default: 8081)
- turboquant_kv_type: KV cache compression type (default: turbo4)
- max_context_tokens: Maximum context length (default: 131072)
- gemma4_model_filename: Model filename per node
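Any of these variables can presumably be overridden for a single run with `-e`, the same way `run_integration_tests` is overridden in Quick Start. The sketch below only prints the resulting command so it can be reviewed before running. The helper name `override_cmd` and the values in the usage line are illustrative assumptions, not recommended settings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints an ansible-playbook command that overrides
# two role variables for one node. It does not run anything itself.
override_cmd() {
  local node="$1" port="$2" ctx="$3"
  printf 'ansible-playbook -i inventory.ini deploy_turboquant.yml --limit %s -e llama_cpp_port=%s -e max_context_tokens=%s\n' \
    "$node" "$port" "$ctx"
}

# Usage (illustrative values):
#   override_cmd timmy 8082 32768
```

For overrides that should persist across runs, setting them per host in the inventory is likely a better fit than repeating `-e` flags.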