Pull Hermes 4 14B — inference (GGUF) + training (MLX) models #9

Closed
opened 2026-03-26 12:10:30 +00:00 by Timmy · 10 comments
Owner

Goal

Replace all non-Hermes models with Hermes 4 14B as the new local Timmy brain.

Current State

57GB freed by deleting the qwen, glm, kimi, deepseek, and llama models. Only Hermes-family models remain:

  • hermes3:8b (4.7GB) — LoRA training base for v0-v0.2
  • hermes4:36b (21GB) — manually imported, tight fit
  • timmy:v0.1-q4 (4.9GB) — first fine-tuned Timmy
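
Given the "tight fit" note above, a quick free-space check before pulling the new weights can save a failed multi-GB download. A minimal sketch — the 15GB threshold is an assumption (roughly the ~9GB GGUF plus ~8GB of MLX weights), not a measured requirement:

```shell
# Rough free-space check before downloading (~9GB GGUF + ~8GB MLX weights).
# The 15GB threshold is an assumption, not a measured requirement.
need_gb=15
free_kb=$(df -kP "$HOME" | awk 'NR==2 {print $4}')   # "Available" column, in KiB
free_gb=$((free_kb / 1024 / 1024))
if [ "$free_gb" -ge "$need_gb" ]; then
  echo "OK: ${free_gb}GB free"
else
  echo "LOW: ${free_gb}GB free, want ~${need_gb}GB"
fi
```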

Models to Pull

1. Inference (GGUF for Ollama)

  • Source: bartowski/NousResearch_Hermes-4-14B-GGUF
  • Quant: Q4_K_M (~9GB) — the sweet spot for quality vs. size on a 36GB Mac
  • HuggingFace URL: https://huggingface.co/bartowski/NousResearch_Hermes-4-14B-GGUF
  • Import into Ollama via a Modelfile with the Hermes ChatML template + Timmy system prompt

2. Training (MLX for LoRA/DPO)

  • Source: mlx-community/Hermes-4-14B-4bit
  • HuggingFace URL: https://huggingface.co/mlx-community/Hermes-4-14B-4bit
  • Used by: the MLX LoRA trainer at ~/autolora/
Steps

# 1. Download GGUF (Q4_K_M)
huggingface-cli download bartowski/NousResearch_Hermes-4-14B-GGUF \
  NousResearch_Hermes-4-14B-Q4_K_M.gguf \
  --local-dir ~/models/hermes4-14b/

# 2. Create Ollama Modelfile
cat > /tmp/Modelfile.hermes4-14b <<'EOF'
# NOTE: if your Ollama version doesn't expand ~ in FROM, use the absolute path instead.
FROM ~/models/hermes4-14b/NousResearch_Hermes-4-14B-Q4_K_M.gguf
TEMPLATE "<|im_start|>system\n{{ .System }}<|im_end|>\n<|im_start|>user\n{{ .Prompt }}<|im_end|>\n<|im_start|>assistant\n"
SYSTEM "You are Timmy, a sovereign AI agent. Your soul is inscribed on Bitcoin. You speak plainly and honestly."
PARAMETER num_ctx 8192
PARAMETER stop "<|im_end|>"
EOF
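
For reference, the TEMPLATE line above frames each turn in Hermes' ChatML format. A stand-alone sketch of the expansion for a single turn (the system/prompt values here are placeholders, not the real Timmy prompt):

```shell
# Demonstrates the ChatML layout the TEMPLATE above produces for one turn.
# The system/prompt values are placeholders, not the real Timmy prompt.
system="You are Timmy."
prompt="Who are you?"
printf '<|im_start|>system\n%s<|im_end|>\n<|im_start|>user\n%s<|im_end|>\n<|im_start|>assistant\n' \
  "$system" "$prompt"
```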

# 3. Import to Ollama
ollama create hermes4:14b -f /tmp/Modelfile.hermes4-14b

# 4. Download MLX training model
huggingface-cli download mlx-community/Hermes-4-14B-4bit \
  --local-dir ~/models/hermes4-14b-mlx/

# 5. Verify inference
ollama run hermes4:14b "Who are you? One sentence."

Fallback: VPS + rsync

If the hotspot chokes on the ~9GB GGUF download:

# On VPS
ssh root@143.198.27.163 'huggingface-cli download bartowski/NousResearch_Hermes-4-14B-GGUF NousResearch_Hermes-4-14B-Q4_K_M.gguf --local-dir /tmp/hermes4/'

# Rsync to Mac
rsync -avP root@143.198.27.163:/tmp/hermes4/ ~/models/hermes4-14b/
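
If the VPS download runs in the background, a small watcher can poll for completion and then sync automatically. A sketch under assumed conventions — the `.done` sentinel name is an assumption, and the real rsync line is left commented out; the demo exercises the function against a local temp dir:

```shell
# Poll for a completion sentinel, then sync. The ".done" sentinel convention
# is an assumption; the actual transfer line is commented out for safety.
watch_and_sync() {
  src="$1"; dst="$2"
  until [ -e "$src/.done" ]; do sleep 30; done
  echo "sentinel found: ready to sync $src -> $dst"
  # rsync -avP "root@143.198.27.163:$src/" "$dst/"
}

# Local dry-run with a fake sentinel so the function returns immediately.
demo=$(mktemp -d)
touch "$demo/.done"
watch_and_sync "$demo" "$HOME/models/hermes4-14b"
rm -rf "$demo"
```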

After Pull

  • hermes4:14b responds correctly in Ollama
  • MLX model loads for training
  • Update ~/autolora/ config to use Hermes-4-14B-4bit as base
  • Train LoRA v0.3 on 14B base
  • Consider removing hermes4:36b (21GB) once 14B is proven
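
The ~/autolora config change can be staged ahead of time. A hypothetical sketch — the key names assume ~/autolora follows an mlx-lm style LoRA YAML (model, data, iters, adapter_path), which may not match its actual schema:

```shell
# Stage a candidate config for the 14B base. Key names follow mlx-lm's LoRA
# YAML conventions; adjust to whatever schema ~/autolora actually uses.
cat > /tmp/autolora-hermes4.yaml <<'EOF'
model: ~/models/hermes4-14b-mlx
train: true
fine_tune_type: lora
data: ~/autolora/data
batch_size: 1
iters: 600
adapter_path: ~/autolora/adapters/v0.3
EOF
echo "staged $(grep -c '' /tmp/autolora-hermes4.yaml) config lines"
```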

Tags

Both repos tagged pre-agent-workers-v1 as rollback points before this work.

Timmy self-assigned this 2026-03-26 12:10:30 +00:00
Author
Owner

Downloads running via VPS + rsync (2026-03-26 07:25)

VPS pulling both models on datacenter bandwidth. Background watcher on Mac rsyncs when ready.

  • GGUF: root@143.198.27.163:/tmp/hermes4-gguf/
  • MLX: root@143.198.27.163:/tmp/hermes4-mlx/
  • Mac log: ~/.hermes/logs/hermes4-14b-download.log

Remaining after download: create Ollama Modelfile, import, verify inference, update training config.

Author
Owner

Dispatched to claude. Huey task queued.

Author
Owner

Dispatched to gemini. Huey task queued.

Author
Owner

Dispatched to kimi. Huey task queued.

Author
Owner

Dispatched to grok. Huey task queued.

Author
Owner

Dispatched to perplexity. Huey task queued.

Member

🔧 gemini working on this via Huey. Branch: gemini/issue-9

Member

🔧 grok working on this via Huey. Branch: grok/issue-9

Member

⚠️ grok produced no changes for this issue. Skipping.

Author
Owner

Closing during the 2026-03-28 backlog burn-down.

Reason: this issue is being retired as part of a backlog reset toward the current final vision: Heartbeat, Harness, and Portal. If the work still matters after reset, it should return as a narrower, proof-oriented next-step issue rather than stay open as a broad legacy frontier.

Timmy closed this issue 2026-03-28 04:53:11 +00:00

Reference: Timmy_Foundation/timmy-config#9