turboquant/ansible/README.md
Alexander Whitestone bf68627ea1
feat: add ansible/README.md for TurboQuant deployment
2026-04-16 01:51:23 +00:00

TurboQuant Ansible Deployment

Deploy TurboQuant-compressed Gemma 4 inference across fleet nodes.

Quick Start

# 1. Copy and edit inventory
cp inventory.ini.example inventory.ini
vim inventory.ini

# 2. Deploy to all nodes
ansible-playbook -i inventory.ini deploy_turboquant.yml

# 3. Deploy without integration tests
ansible-playbook -i inventory.ini deploy_turboquant.yml -e run_integration_tests=false

# 4. Deploy to a specific node
ansible-playbook -i inventory.ini deploy_turboquant.yml --limit timmy
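
Before a full deploy it can help to confirm that the inventory exists, the nodes are reachable, and the playbook parses. The following is a minimal pre-flight sketch, not part of this repository: the `preflight` function and the `ADHOC_CMD` / `PLAYBOOK_CMD` variables are hypothetical names introduced here, and it assumes the `ansible` and `ansible-playbook` CLIs are on `PATH`.

```shell
#!/usr/bin/env bash
# Pre-flight sketch (hypothetical helper, not shipped with the playbook).
# ADHOC_CMD and PLAYBOOK_CMD are illustrative overrides for the Ansible CLIs.
ADHOC_CMD="${ADHOC_CMD:-ansible}"
PLAYBOOK_CMD="${PLAYBOOK_CMD:-ansible-playbook}"

preflight() {
    local inventory="${1:-inventory.ini}"
    local playbook="${2:-deploy_turboquant.yml}"

    # Step 1 of the Quick Start must have been done already.
    if [ ! -f "$inventory" ]; then
        echo "missing inventory: $inventory (copy inventory.ini.example first)" >&2
        return 1
    fi

    # Reachability: run Ansible's ping module against every host.
    "$ADHOC_CMD" all -i "$inventory" -m ping || return 1

    # Parse the playbook without executing any tasks.
    "$PLAYBOOK_CMD" -i "$inventory" "$playbook" --syntax-check || return 1
}
```

Calling `preflight` with no arguments checks `inventory.ini` and `deploy_turboquant.yml` in the current directory and returns non-zero on the first failing step.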

Deployment Matrix

| Node        | Hardware     | Model           | Preset          |
|-------------|--------------|-----------------|-----------------|
| Mac (Timmy) | M1, 16GB     | gemma-4-26B-A4B | turboquant_k8v4 |
| Allegro VPS | 2 cores, 8GB | gemma-4-E4B     | GGUF q4_0       |

Health Check

# Check local node
./health_check.sh localhost 8081

# Check remote node
./health_check.sh 192.168.1.100 8081
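
To check several nodes in one pass, the script can be wrapped in a loop. This is a sketch under an assumption the README does not confirm: that `health_check.sh` exits with status 0 for a healthy node and non-zero otherwise. The `check_nodes` function and the `HEALTH_CHECK` variable are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch: run the health check against several nodes and report each result.
# Assumption: health_check.sh exits 0 when healthy, non-zero otherwise.
HEALTH_CHECK="${HEALTH_CHECK:-./health_check.sh}"

check_nodes() {
    local port="$1"
    shift
    local failed=0
    local host
    for host in "$@"; do
        if "$HEALTH_CHECK" "$host" "$port" >/dev/null 2>&1; then
            echo "OK   $host:$port"
        else
            echo "FAIL $host:$port"
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every node passed.
    [ "$failed" -eq 0 ]
}

# Usage: check_nodes 8081 localhost 192.168.1.100
```

The function prints one line per node and returns non-zero if any check failed, which makes it usable as a gate in a larger script.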

Role Variables

See roles/turboquant-deploy/defaults/main.yml for all configurable variables.

Key variables:

  • llama_cpp_port: Server port (default: 8081)
  • turboquant_kv_type: KV cache compression type (default: turbo4)
  • max_context_tokens: Maximum context length (default: 131072)
  • gemma4_model_filename: Model filename per node
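
Any of these variables can be overridden at deploy time with `-e`, the same mechanism the Quick Start uses for `run_integration_tests`. The sketch below wraps that in a small helper; the `deploy` function and the `DRY_RUN` switch are hypothetical, and the override values in the usage line are illustrative rather than recommended settings.

```shell
#!/usr/bin/env bash
# Sketch: build the ansible-playbook command with one -e flag per override.
# With DRY_RUN=1 the command is printed instead of executed.
deploy() {
    local cmd=(ansible-playbook -i inventory.ini deploy_turboquant.yml)
    local override
    for override in "$@"; do
        cmd+=(-e "$override")
    done
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

# Usage (illustrative values):
#   DRY_RUN=1 deploy llama_cpp_port=8082 max_context_tokens=32768
```

Extra variables passed with `-e` take precedence over role defaults in Ansible, so the values in roles/turboquant-deploy/defaults/main.yml remain the fallback.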