[claude] Configure Qwen3-14B Q5_K_M as Timmy primary brain (#1064) #1145

Closed
claude wants to merge 1 commit from claude/issue-1064 into main
Collaborator

Fixes #1064

Summary

  • **Modelfile.timmy**: rebased from the Hermes4 LoRA GGUF to `qwen3:14b`; adds `min_p 0.02`, `num_predict 4096`, explicit stop tokens (`<|im_end|>`, `<|im_start|>`), and a full sovereign-AI system prompt matching the issue spec.
  • **config.py**: default `ollama_model` → `timmy`; `ollama_num_ctx` → 32768; `qwen3:14b` added as the first text fallback.
  • **config/providers.yaml**: `timmy` promoted to default model (Qwen3-14B Q5_K_M); `qwen3:14b` added as a named model; fallback chains (tools + text) updated to lead with `timmy` → `qwen3:14b`.
  • **multimodal.py**: `qwen3`, `qwen3:14b`, `qwen3:30b`, `timmy`, and `hermes4-14b` added to `KNOWN_MODEL_CAPABILITIES`; `timmy` and `qwen3:14b` prepended to the TOOLS fallback chain.
  • **prompts.py**: corrected the "small 4096 token" context-limitation comment to 32K.
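For reference, the parameters named above would map onto an Ollama Modelfile roughly as follows. This is an illustrative sketch from the summary, not the actual file contents — the real SYSTEM prompt and any additional parameters live in `Modelfile.timmy`:

```
# Sketch of Modelfile.timmy (parameters from the PR summary; SYSTEM elided)
FROM qwen3:14b

PARAMETER num_ctx 32768
PARAMETER num_predict 4096
PARAMETER min_p 0.02
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>"

SYSTEM """<sovereign-AI system prompt per issue #1064>"""
```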

Memory budget

| Component | Size |
|-----------|------|
| Model (Q5_K_M) | ~10.5 GB |
| 32K KV cache | ~7.0 GB |
| **Total** | **~17.5 GB** |
| Headroom on 28 GB usable | ~10.5 GB free |
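The budget arithmetic can be sanity-checked in a couple of lines. The KV-cache figure is the PR's estimate, not derived here:

```python
# Memory budget for Timmy on an M3 Max with ~28 GB usable unified memory.
# Figures are the PR's estimates (Q5_K_M weights + 32K-context KV cache).
MODEL_Q5_K_M_GB = 10.5
KV_CACHE_32K_GB = 7.0
USABLE_RAM_GB = 28.0

total = MODEL_Q5_K_M_GB + KV_CACHE_32K_GB      # 17.5 GB resident
headroom = USABLE_RAM_GB - total               # 10.5 GB free
print(f"total={total:.1f} GB, headroom={headroom:.1f} GB")
```

This confirms the "stays under 18 GB" line in the test plan has roughly 0.5 GB of slack before the budget is blown.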

Manual steps required on the M3 Max

```bash
ollama pull qwen3:14b
ollama create timmy -f Modelfile.timmy
```

Test plan

- [x] `tox -e unit` — 20 passed, 0 failed
- [ ] `ollama run timmy` responds to tool-calling prompts with valid JSON
- [ ] Memory stays under 18 GB with 32K context
- [ ] Generation speed of ~20–28 tok/s confirmed
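The "valid JSON" test-plan item could be automated with a small validator. The `sample` string and the `name`/`arguments` key shape below are hypothetical — a real check would capture `ollama run timmy` output for an actual tool-calling prompt and assert on the project's real tool schema:

```python
# Sketch of the "responds with valid JSON tool call" check (assumed schema).
import json


def is_valid_tool_call(text: str) -> bool:
    """Return True if `text` parses as a JSON object with a tool name
    and an arguments payload; False on any parse or shape failure."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and "name" in obj and "arguments" in obj


# Hypothetical model output, for illustration only.
sample = '{"name": "get_weather", "arguments": {"city": "Austin"}}'
print(is_valid_tool_call(sample))  # True for well-formed tool-call JSON
```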

🤖 Generated with Claude Code

claude added 1 commit 2026-03-23 18:37:05 +00:00
feat: configure Qwen3-14B Q5_K_M as Timmy primary brain
Some checks failed
Tests / lint (pull_request) Failing after 16s
Tests / test (pull_request) Has been skipped
9c916e1c5d
Fixes #1064

- Modelfile.timmy: rebase from ~/timmy-fused-model.gguf (Hermes4 LoRA)
  to qwen3:14b; add min_p 0.02, num_predict 4096, explicit stop tokens
  (<|im_end|>, <|im_start|>), and a full sovereign-AI system prompt.
  Memory budget: ~10.5 GB model + ~7 GB KV cache = ~17.5 GB at 32K ctx.

- config.py: change default ollama_model to "timmy", bump ollama_num_ctx
  to 32768 to match the Modelfile; add qwen3:14b as first text fallback.

- config/providers.yaml: promote "timmy" to default model (Qwen3-14B
  Q5_K_M); add qwen3:14b entry; refresh fallback_chains (tools + text)
  to lead with timmy → qwen3:14b; note Hermes4 LoRA path superseded.

- multimodal.py: add qwen3, qwen3:14b, qwen3:30b, timmy, hermes4-14b to
  KNOWN_MODEL_CAPABILITIES; add timmy + qwen3:14b to TOOLS fallback chain.

- prompts.py: correct "small 4096 token context" limitation to 32K.

Build commands (manual, run on the M3 Max):
  ollama pull qwen3:14b
  ollama create timmy -f Modelfile.timmy

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Owner

Good work on the Qwen3-14B config. Memory budget looks right for the M3 Max. However this PR has merge conflicts — please rebase on main and force-push. Will merge once green.

Owner

LGTM — good config, memory budget checks out. Merge conflicts — please rebase on main and force-push.

Timmy closed this pull request 2026-03-23 19:11:58 +00:00


Reference: Rockachopa/Timmy-time-dashboard#1145