[claude] Configure Qwen3-14B Q5_K_M as Timmy primary brain (#1064) #1145

Closed
claude wants to merge 1 commit from claude/issue-1064 into main
Collaborator

Fixes #1064

Summary

  • **Modelfile.timmy**: rebased from the Hermes4 LoRA GGUF to `qwen3:14b`; adds `min_p 0.02`, `num_predict 4096`, explicit stop tokens (`<|im_end|>`, `<|im_start|>`), and a full sovereign-AI system prompt matching the issue spec.
  • **config.py**: default `ollama_model` → `timmy`; `ollama_num_ctx` → 32768; `qwen3:14b` added as the first text fallback.
  • **config/providers.yaml**: `timmy` promoted to default model (Qwen3-14B Q5_K_M); `qwen3:14b` added as a named model; fallback chains (tools + text) updated to lead with `timmy` → `qwen3:14b`.
  • **multimodal.py**: `qwen3`, `qwen3:14b`, `qwen3:30b`, `timmy`, and `hermes4-14b` added to `KNOWN_MODEL_CAPABILITIES`; `timmy` and `qwen3:14b` prepended to the TOOLS fallback chain.
  • **prompts.py**: corrected the "small 4096 token" context-limitation comment to 32K.
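For reference, the parameters named above would map onto an Ollama Modelfile roughly as follows. This is an illustrative sketch from the summary, not the actual file contents — the real SYSTEM prompt and any additional parameters live in `Modelfile.timmy`:

```
# Sketch of Modelfile.timmy (parameters from the PR summary; SYSTEM elided)
FROM qwen3:14b

PARAMETER num_ctx 32768
PARAMETER num_predict 4096
PARAMETER min_p 0.02
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"

SYSTEM """<sovereign-AI system prompt per issue #1064>"""
```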

Memory budget

| Component | Size |
|-----------|------|
| Model (Q5_K_M) | ~10.5 GB |
| 32K KV cache | ~7.0 GB |
| **Total** | **~17.5 GB** |
| Headroom on 28 GB usable | ~10.5 GB free |
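The budget arithmetic can be sanity-checked in a couple of lines. The KV-cache figure is the PR's estimate, not derived here:

```python
# Memory budget for Timmy on an M3 Max with ~28 GB usable unified memory.
# Figures are the PR's estimates (Q5_K_M weights + 32K-context KV cache).
MODEL_Q5_K_M_GB = 10.5
KV_CACHE_32K_GB = 7.0
USABLE_RAM_GB = 28.0

total = MODEL_Q5_K_M_GB + KV_CACHE_32K_GB      # 17.5 GB resident
headroom = USABLE_RAM_GB - total               # 10.5 GB free
print(f"total={total:.1f} GB, headroom={headroom:.1f} GB")
```

This confirms the "stays under 18 GB" line in the test plan has roughly 0.5 GB of slack before the budget is blown.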

Manual steps required on the M3 Max

```bash
ollama pull qwen3:14b
ollama create timmy -f Modelfile.timmy
```

Test plan

- [x] `tox -e unit` — 20 passed, 0 failed
- [ ] `ollama run timmy` responds to tool-calling prompts with valid JSON
- [ ] Memory stays under 18 GB with 32K context
- [ ] Generation speed of ~20–28 tok/s confirmed
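The "valid JSON" test-plan item could be automated with a small validator. The `sample` string and the `name`/`arguments` key shape below are hypothetical — a real check would capture `ollama run timmy` output for an actual tool-calling prompt and assert on the project's real tool schema:

```python
# Sketch of the "responds with valid JSON tool call" check (assumed schema).
import json


def is_valid_tool_call(text: str) -> bool:
    """Return True if `text` parses as a JSON object with a tool name
    and an arguments payload; False on any parse or shape failure."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and "name" in obj and "arguments" in obj


# Hypothetical model output, for illustration only.
sample = '{"name": "get_weather", "arguments": {"city": "Austin"}}'
print(is_valid_tool_call(sample))  # True for well-formed tool-call JSON
```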

🤖 Generated with Claude Code

claude added 1 commit 2026-03-23 18:37:05 +00:00
feat: configure Qwen3-14B Q5_K_M as Timmy primary brain
Some checks failed
Tests / lint (pull_request) Failing after 16s
Tests / test (pull_request) Has been skipped
9c916e1c5d
Fixes #1064

- Modelfile.timmy: rebase from ~/timmy-fused-model.gguf (Hermes4 LoRA)
  to qwen3:14b; add min_p 0.02, num_predict 4096, explicit stop tokens
  (<|im_end|>, <|im_start|>), and a full sovereign-AI system prompt.
  Memory budget: ~10.5 GB model + ~7 GB KV cache = ~17.5 GB at 32K ctx.

- config.py: change default ollama_model to "timmy", bump ollama_num_ctx
  to 32768 to match the Modelfile; add qwen3:14b as first text fallback.

- config/providers.yaml: promote "timmy" to default model (Qwen3-14B
  Q5_K_M); add qwen3:14b entry; refresh fallback_chains (tools + text)
  to lead with timmy → qwen3:14b; note Hermes4 LoRA path superseded.

- multimodal.py: add qwen3, qwen3:14b, qwen3:30b, timmy, hermes4-14b to
  KNOWN_MODEL_CAPABILITIES; add timmy + qwen3:14b to TOOLS fallback chain.

- prompts.py: correct "small 4096 token context" limitation to 32K.

Build commands (manual, run on the M3 Max):
  ollama pull qwen3:14b
  ollama create timmy -f Modelfile.timmy

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Owner

Good work on the Qwen3-14B config. Memory budget looks right for the M3 Max. However this PR has merge conflicts — please rebase on main and force-push. Will merge once green.

Owner

LGTM — good config, memory budget checks out. Merge conflicts — please rebase on main and force-push.

Timmy closed this pull request 2026-03-23 19:11:58 +00:00


Reference: Rockachopa/Timmy-time-dashboard#1145