Pull Hermes 4 14B — inference (GGUF) + training (MLX) models #9

Closed
opened 2026-03-26 12:10:30 +00:00 by Timmy · 10 comments
Owner

Goal

Replace all non-Hermes models with Hermes 4 14B as the new local Timmy brain.

Current State

57GB freed by deleting the qwen, glm, kimi, deepseek, and llama models. Only Hermes-family models remain:

  • hermes3:8b (4.7GB) — LoRA training base for v0-v0.2
  • hermes4:36b (21GB) — manually imported, tight fit
  • timmy:v0.1-q4 (4.9GB) — first fine-tuned Timmy
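
Given the "tight fit" note above, a quick free-space check before pulling the new weights can save a failed multi-GB download. A minimal sketch — the 15GB threshold is an assumption (roughly the ~9GB GGUF plus ~8GB of MLX weights), not a measured requirement:

```shell
# Rough free-space check before downloading (~9GB GGUF + ~8GB MLX weights).
# The 15GB threshold is an assumption, not a measured requirement.
need_gb=15
free_kb=$(df -kP "$HOME" | awk 'NR==2 {print $4}')   # "Available" column, in KiB
free_gb=$((free_kb / 1024 / 1024))
if [ "$free_gb" -ge "$need_gb" ]; then
  echo "OK: ${free_gb}GB free"
else
  echo "LOW: ${free_gb}GB free, want ~${need_gb}GB"
fi
```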

Models to Pull

1. Inference (GGUF for Ollama)

  • Source: bartowski/NousResearch_Hermes-4-14B-GGUF
  • Quant: Q4_K_M (~9GB) — the sweet spot for quality vs. size on a 36GB Mac
  • HuggingFace URL: https://huggingface.co/bartowski/NousResearch_Hermes-4-14B-GGUF
  • Import into Ollama via a Modelfile with the Hermes ChatML template + Timmy system prompt

2. Training (MLX for LoRA/DPO)

  • Source: mlx-community/Hermes-4-14B-4bit
  • HuggingFace URL: https://huggingface.co/mlx-community/Hermes-4-14B-4bit
  • Used by: the MLX LoRA trainer at ~/autolora/
Steps

# 1. Download GGUF (Q4_K_M)
huggingface-cli download bartowski/NousResearch_Hermes-4-14B-GGUF \
  NousResearch_Hermes-4-14B-Q4_K_M.gguf \
  --local-dir ~/models/hermes4-14b/

# 2. Create Ollama Modelfile
cat > /tmp/Modelfile.hermes4-14b <<'EOF'
# NOTE: if your Ollama version doesn't expand ~ in FROM, use the absolute path instead.
FROM ~/models/hermes4-14b/NousResearch_Hermes-4-14B-Q4_K_M.gguf
TEMPLATE "<|im_start|>system\n{{ .System }}<|im_end|>\n<|im_start|>user\n{{ .Prompt }}<|im_end|>\n<|im_start|>assistant\n"
SYSTEM "You are Timmy, a sovereign AI agent. Your soul is inscribed on Bitcoin. You speak plainly and honestly."
PARAMETER num_ctx 8192
PARAMETER stop "<|im_end|>"
EOF
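
For reference, the TEMPLATE line above frames each turn in Hermes' ChatML format. A stand-alone sketch of the expansion for a single turn (the system/prompt values here are placeholders, not the real Timmy prompt):

```shell
# Demonstrates the ChatML layout the TEMPLATE above produces for one turn.
# The system/prompt values are placeholders, not the real Timmy prompt.
system="You are Timmy."
prompt="Who are you?"
printf '<|im_start|>system\n%s<|im_end|>\n<|im_start|>user\n%s<|im_end|>\n<|im_start|>assistant\n' \
  "$system" "$prompt"
```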

# 3. Import to Ollama
ollama create hermes4:14b -f /tmp/Modelfile.hermes4-14b

# 4. Download MLX training model
huggingface-cli download mlx-community/Hermes-4-14B-4bit \
  --local-dir ~/models/hermes4-14b-mlx/

# 5. Verify inference
ollama run hermes4:14b "Who are you? One sentence."

Fallback: VPS + rsync

If the hotspot chokes on the ~9GB GGUF download:

# On VPS
ssh root@143.198.27.163 'huggingface-cli download bartowski/NousResearch_Hermes-4-14B-GGUF NousResearch_Hermes-4-14B-Q4_K_M.gguf --local-dir /tmp/hermes4/'

# Rsync to Mac
rsync -avP root@143.198.27.163:/tmp/hermes4/ ~/models/hermes4-14b/
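
If the VPS download runs in the background, a small watcher can poll for completion and then sync automatically. A sketch under assumed conventions — the `.done` sentinel name is an assumption, and the real rsync line is left commented out; the demo exercises the function against a local temp dir:

```shell
# Poll for a completion sentinel, then sync. The ".done" sentinel convention
# is an assumption; the actual transfer line is commented out for safety.
watch_and_sync() {
  src="$1"; dst="$2"
  until [ -e "$src/.done" ]; do sleep 30; done
  echo "sentinel found: ready to sync $src -> $dst"
  # rsync -avP "root@143.198.27.163:$src/" "$dst/"
}

# Local dry-run with a fake sentinel so the function returns immediately.
demo=$(mktemp -d)
touch "$demo/.done"
watch_and_sync "$demo" "$HOME/models/hermes4-14b"
rm -rf "$demo"
```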

After Pull

  • hermes4:14b responds correctly in Ollama
  • MLX model loads for training
  • Update ~/autolora/ config to use Hermes-4-14B-4bit as base
  • Train LoRA v0.3 on 14B base
  • Consider removing hermes4:36b (21GB) once 14B is proven
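
The ~/autolora config change can be staged ahead of time. A hypothetical sketch — the key names assume ~/autolora follows an mlx-lm style LoRA YAML (model, data, iters, adapter_path), which may not match its actual schema:

```shell
# Stage a candidate config for the 14B base. Key names follow mlx-lm's LoRA
# YAML conventions; adjust to whatever schema ~/autolora actually uses.
cat > /tmp/autolora-hermes4.yaml <<'EOF'
model: ~/models/hermes4-14b-mlx
train: true
fine_tune_type: lora
data: ~/autolora/data
batch_size: 1
iters: 600
adapter_path: ~/autolora/adapters/v0.3
EOF
echo "staged $(grep -c '' /tmp/autolora-hermes4.yaml) config lines"
```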

Tags

Both repos tagged pre-agent-workers-v1 as rollback points before this work.

Timmy self-assigned this 2026-03-26 12:10:30 +00:00
Author
Owner

Downloads running via VPS + rsync (2026-03-26 07:25)

VPS pulling both models on datacenter bandwidth. Background watcher on Mac rsyncs when ready.

  • GGUF: root@143.198.27.163:/tmp/hermes4-gguf/
  • MLX: root@143.198.27.163:/tmp/hermes4-mlx/
  • Mac log: ~/.hermes/logs/hermes4-14b-download.log

Remaining after download: create Ollama Modelfile, import, verify inference, update training config.

Author
Owner

Dispatched to claude. Huey task queued.

Author
Owner

Dispatched to gemini. Huey task queued.

Author
Owner

Dispatched to kimi. Huey task queued.

Author
Owner

Dispatched to grok. Huey task queued.

Author
Owner

Dispatched to perplexity. Huey task queued.

Member

🔧 gemini working on this via Huey. Branch: gemini/issue-9

Member

🔧 grok working on this via Huey. Branch: grok/issue-9

Member

⚠️ grok produced no changes for this issue. Skipping.

Author
Owner

Closing during the 2026-03-28 backlog burn-down.

Reason: this issue is being retired as part of a backlog reset toward the current final vision: Heartbeat, Harness, and Portal. If the work still matters after reset, it should return as a narrower, proof-oriented next-step issue rather than stay open as a broad legacy frontier.

Timmy closed this issue 2026-03-28 04:53:11 +00:00

Reference: Timmy_Foundation/timmy-config#9