forked from Rockachopa/Timmy-time-dashboard
52 lines
2.1 KiB
Plaintext
52 lines
2.1 KiB
Plaintext
# Modelfile.qwen3-14b
|
|
#
|
|
# Qwen3-14B Q5_K_M — Primary local agent model (Issue #1063)
|
|
#
|
|
# Tool calling F1: 0.971 — GPT-4-class structured output reliability.
|
|
# Hybrid thinking/non-thinking mode: toggle per-request via /think or /no_think
|
|
# in the prompt for planning vs rapid execution.
|
|
#
|
|
# Build:
|
|
# ollama pull qwen3:14b # downloads Q4_K_M (~8.2 GB) by default
|
|
# # For Q5_K_M (~10.5 GB, recommended):
|
|
# # ollama pull bartowski/Qwen3-14B-GGUF:Q5_K_M
|
|
# ollama create qwen3-14b -f Modelfile.qwen3-14b
|
|
#
|
|
# Memory budget: ~10.5 GB weights + ~7 GB KV cache = ~17.5 GB total at 32K ctx
|
|
# Headroom on M3 Max 36 GB: ~10.5 GB free (enough to run qwen3:8b simultaneously)
|
|
# Generation: ~20-28 tok/s (Ollama) / ~28-38 tok/s (MLX)
|
|
# Context: 32K native, extensible to 131K with YaRN
|
|
#
|
|
# Two-model strategy: set OLLAMA_MAX_LOADED_MODELS=2 so qwen3:8b stays
|
|
# hot for fast routing while qwen3:14b handles complex tasks.
|
|
|
|
FROM qwen3:14b
|
|
|
|
# 32K context — optimal balance of quality and memory on M3 Max 36 GB.
|
|
# At 32K, total memory (weights + KV cache) is ~17.5 GB — well within budget.
|
|
# Extend to 131K with YaRN if needed: PARAMETER rope_scaling_type yarn
|
|
PARAMETER num_ctx 32768
|
|
|
|
# Tool-calling temperature — lower = more reliable structured JSON output.
|
|
# Raise to 0.7+ for creative/narrative tasks.
|
|
PARAMETER temperature 0.3
|
|
|
|
# Nucleus sampling
|
|
PARAMETER top_p 0.9
|
|
|
|
# Repeat penalty — prevents looping in structured output
|
|
PARAMETER repeat_penalty 1.05
|
|
|
|
SYSTEM """You are Timmy, Alexander's personal sovereign AI agent.
|
|
|
|
You are concise, direct, and helpful. You complete tasks efficiently and report results clearly. You do not add unnecessary caveats or disclaimers.
|
|
|
|
You have access to tool calling. When you need to use a tool, output a valid JSON function call:
|
|
<tool_call>
|
|
{"name": "function_name", "arguments": {"param": "value"}}
|
|
</tool_call>
|
|
|
|
You support hybrid reasoning. For complex planning, include <think>...</think> before your answer. For rapid execution (simple tool calls, status checks), skip the think block.
|
|
|
|
You always start your responses with "Timmy here:" when acting as an agent."""
|