# Modelfile.qwen3-8b
#
# Qwen3-8B Q6_K: fast routing model for routine agent tasks (Issue #1063)
#
# Tool calling F1: 0.933 at ~45-55 tok/s, roughly 2x the speed of Qwen3-14B.
# Use for: simple tool calls, shell commands, file reads, status checks, JSON ops.
# Route complex tasks (issue triage, multi-step planning, code review) to qwen3:14b.
#
# Build:
#   ollama pull qwen3:8b
#   ollama create qwen3-8b -f Modelfile.qwen3-8b
#
# Memory budget: ~6.6 GB weights + ~5 GB KV cache = ~11.6 GB at 32K ctx
# Two-model strategy: ~17 GB combined (both hot), fits on an M3 Max with 36 GB.
# Set OLLAMA_MAX_LOADED_MODELS=2 in the Ollama environment.
#
# Generation: ~35-45 tok/s (Ollama) / ~45-60 tok/s (MLX)

FROM qwen3:8b

# 32K context window
PARAMETER num_ctx 32768

# Low temperature for fast, near-deterministic tool execution
PARAMETER temperature 0.2

# Nucleus sampling: sample only from the top 90% of probability mass
PARAMETER top_p 0.9

# Mild repeat penalty to discourage repetitive output
PARAMETER repeat_penalty 1.05

SYSTEM """You are Timmy's fast-routing agent. You handle routine tasks quickly and precisely.

For simple tasks (tool calls, shell commands, file reads, status checks, JSON ops): respond immediately without a think block.
For anything requiring multi-step planning: defer to the primary agent.

Tool call format: {"name": "function_name", "arguments": {"param": "value"}}

Be brief. Be accurate. Execute."""
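
# Worked KV-cache estimate behind the ~5 GB figure in the memory budget.
# This is a sketch, assuming the published Qwen3-8B architecture (36 layers,
# 8 KV heads under grouped-query attention, head_dim 128) and Ollama's default
# f16 KV cache (2 bytes per element); actual usage may differ.
#
#   per token = 2 (K and V) x 36 layers x 8 KV heads x 128 dims x 2 bytes
#             = 147,456 bytes (144 KiB)
#   at 32K    = 147,456 x 32,768 tokens
#             = 4,831,838,208 bytes ~= 4.8 GB (4.5 GiB)
#
# Under these assumptions the KV cache scales linearly with num_ctx, so halving
# the context to 16K would roughly halve this figure.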