Install and configure Qwen3-14B Q5_K_M as Timmy primary brain #1064

Closed
opened 2026-03-23 12:52:11 +00:00 by perplexity · 1 comment
Parent: #1063

Objective

Set up Qwen3-14B Q5_K_M via Ollama as Timmy's primary local LLM on the M3 Max 36GB.

Steps

  1. ollama pull qwen3:14b
  2. Create Modelfile.timmy with the optimized parameters from the PDF:
    • num_ctx 32768, temperature 0.3, top_p 0.9, min_p 0.02
    • repeat_penalty 1.1, num_predict 4096
    • Stop tokens: <|im_end|>, <|im_start|>
    • Full system prompt defining Timmy's capabilities and operating rules
  3. ollama create timmy -f Modelfile.timmy
  4. Verify tool calling works via curl API test
  5. Verify JSON schema-constrained output works
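The parameters in step 2 map directly onto Ollama Modelfile directives. A minimal sketch of `Modelfile.timmy` assembled from the values above (the SYSTEM block is a placeholder — the full Timmy prompt is defined elsewhere and is not reproduced here):

```
# Modelfile.timmy — sketch from the parameters listed in step 2
FROM qwen3:14b
PARAMETER num_ctx 32768
PARAMETER temperature 0.3
PARAMETER top_p 0.9
PARAMETER min_p 0.02
PARAMETER repeat_penalty 1.1
PARAMETER num_predict 4096
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"
SYSTEM """
<full Timmy system prompt goes here>
"""
```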

Acceptance Criteria

  • ollama run timmy responds to tool-calling prompts with valid JSON
  • Memory usage stays under 18GB with 32K context
  • Generation speed ~20–28 tok/s confirmed
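The tool-calling check in step 4 can be sketched as a curl request against Ollama's `/api/chat` endpoint. The `get_weather` tool below is a hypothetical example, not part of Timmy's real toolset; any function schema in the OpenAI-style `tools` format works:

```shell
# Hypothetical tool-calling smoke test against a local Ollama instance.
# Requires `ollama serve` running and the `timmy` model already created.
cat > /tmp/timmy_tool_test.json <<'EOF'
{
  "model": "timmy",
  "stream": false,
  "messages": [
    {"role": "user", "content": "What is the weather in Berlin right now?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }
  ]
}
EOF
# A passing run returns a message.tool_calls entry naming get_weather
# with a JSON arguments object; `|| true` keeps the script going offline.
curl -s http://localhost:11434/api/chat -d @/tmp/timmy_tool_test.json || true
```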

Memory Budget

  • Model: ~10.5 GB
  • 32K context KV cache: ~7 GB
  • Total: ~17.5 GB
  • Headroom on 28GB usable: ~10.5 GB free
claude added the harness, inference, p0-critical labels 2026-03-23 13:52:57 +00:00
claude self-assigned this 2026-03-23 14:01:40 +00:00

PR #1145 created.

Changes:

  • Modelfile.timmy: rebased to qwen3:14b; parameters per issue spec (num_ctx 32768, temperature 0.3, top_p 0.9, min_p 0.02, repeat_penalty 1.1, num_predict 4096, stop tokens <|im_end|> / <|im_start|>); full sovereign-AI system prompt.
  • config.py: default model → timmy; context → 32768; qwen3:14b first in fallback chain.
  • config/providers.yaml: timmy as default model; qwen3:14b added; fallback chains updated.
  • multimodal.py: capability entries added for qwen3:14b, qwen3:30b, timmy, hermes4-14b.
  • prompts.py: context window limitation updated from 4096 → 32K.

Manual steps still needed on the M3 Max:

ollama pull qwen3:14b
ollama create timmy -f Modelfile.timmy

All unit tests pass (20/20).
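Step 5 of the issue (JSON schema-constrained output) can be smoke-tested once the manual steps are done: Ollama accepts a JSON schema object in the request's `format` field. The schema below is illustrative, not Timmy's actual output contract:

```shell
# Structured-output smoke test: Ollama constrains the reply to this schema.
# Requires a running Ollama instance with the `timmy` model; schema is made up.
cat > /tmp/timmy_schema_test.json <<'EOF'
{
  "model": "timmy",
  "stream": false,
  "messages": [
    {"role": "user", "content": "Describe yourself as JSON."}
  ],
  "format": {
    "type": "object",
    "properties": {
      "name": {"type": "string"},
      "capabilities": {"type": "array", "items": {"type": "string"}}
    },
    "required": ["name", "capabilities"]
  }
}
EOF
# The response's message.content should parse as JSON matching the schema;
# `|| true` keeps the script going when Ollama is not running.
curl -s http://localhost:11434/api/chat -d @/tmp/timmy_schema_test.json || true
```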


Reference: Rockachopa/Timmy-time-dashboard#1064