[AutoLoRA P0] Download and Test Hermes 4 Base Model #1101

Closed
opened 2026-03-23 17:29:41 +00:00 by perplexity · 1 comment
Collaborator

Download and Test Hermes 4 Base Model

Priority: P0-Critical
Assignee: Alexander
Epic: #1091 — Project Bannerlord
Pipeline: AutoLoRA Sovereignty Loop (Step 2 of 7)


Context

NousResearch released Hermes 4 with native tool calling, hybrid reasoning (toggleable <think> tags), structured JSON output, and neutral alignment. It was built to be the brain inside the Hermes harness. We use Hermes 4 as the base and fine-tune a LoRA on top.

Model Options for 36GB M3 Max

Model Size (Q4_K_M) Context Best For
Hermes 4.3 Seed 36B ~21GB 8K comfortable Best quality that fits
Hermes 4 14B ~9GB 32K comfortable Fastest with most headroom
Hermes 3 8B ~5GB 32K+ easy Fast mode / fallback

Hermes 4 14B is the safe bet. Hermes 4.3 36B is the stretch goal.

What To Do

1. Check if Hermes 4 is on Ollama yet:

ollama list | grep hermes
ollama pull hermes3:8b  # baseline reference

2. If not on Ollama, get GGUF from HuggingFace:

# Go to: https://huggingface.co/collections/NousResearch/hermes-4-collection-68a7
# Find the 14B variant, download Q5_K_M GGUF
# Import into Ollama:
ollama create hermes4-14b -f Modelfile
# Where Modelfile contains:
# FROM ./hermes-4-14b.Q5_K_M.gguf
# PARAMETER num_ctx 32768
# PARAMETER temperature 0.3

3. Try the 36B stretch goal (optional):

# Hermes 4.3 Seed 36B — Download Q4_K_M GGUF (~21GB)
# Test with num_ctx 8192 first (lower context = less memory)

4. Smoke test:

ollama run hermes4-14b "You are Timmy, an AI agent. List the open issues at http://143.198.27.163:3000"

5. Test tool calling:

curl -s http://localhost:11434/api/chat -d '{
  "model": "hermes4-14b",
  "messages": [{"role": "user", "content": "Read the file at /tmp/test.txt"}],
  "tools": [{"type": "function", "function": {"name": "read_file", "description": "Read contents of a file", "parameters": {"type": "object", "properties": {"path": {"type": "string"}}, "required": ["path"]}}}],
  "stream": false
}' | python3 -m json.tool

6. Test IN the Hermes harness:

hermes model hermes4-14b
# Try a few skills, note which WORK vs FAIL

Done When

  • Hermes 4 model downloaded and runs via Ollama
  • Memory usage confirmed under 28GB with model loaded
  • Tool calling produces structured output (not raw text)
  • Model tested inside the Hermes harness
  • List of skills that WORK vs skills that FAIL with base model (before fine-tuning)

The fail list becomes the fine-tuning target.

## Download and Test Hermes 4 Base Model **Priority:** P0-Critical **Assignee:** Alexander **Epic:** #1091 — Project Bannerlord **Pipeline:** AutoLoRA Sovereignty Loop (Step 2 of 7) --- ### Context NousResearch released Hermes 4 with native tool calling, hybrid reasoning (toggleable `<think>` tags), structured JSON output, and neutral alignment. It was built to be the brain inside the Hermes harness. We use Hermes 4 as the base and fine-tune a LoRA on top. ### Model Options for 36GB M3 Max | Model | Size (Q4_K_M) | Context | Best For | |-------|--------------|---------|----------| | Hermes 4.3 Seed 36B | ~21GB | 8K comfortable | Best quality that fits | | **Hermes 4 14B** | **~9GB** | **32K comfortable** | **Fastest with most headroom** | | Hermes 3 8B | ~5GB | 32K+ easy | Fast mode / fallback | **Hermes 4 14B is the safe bet.** Hermes 4.3 36B is the stretch goal. ### What To Do **1. Check if Hermes 4 is on Ollama yet:** ```bash ollama list | grep hermes ollama pull hermes3:8b # baseline reference ``` **2. If not on Ollama, get GGUF from HuggingFace:** ```bash # Go to: https://huggingface.co/collections/NousResearch/hermes-4-collection-68a7 # Find the 14B variant, download Q5_K_M GGUF # Import into Ollama: ollama create hermes4-14b -f Modelfile # Where Modelfile contains: # FROM ./hermes-4-14b.Q5_K_M.gguf # PARAMETER num_ctx 32768 # PARAMETER temperature 0.3 ``` **3. Try the 36B stretch goal (optional):** ```bash # Hermes 4.3 Seed 36B — Download Q4_K_M GGUF (~21GB) # Test with num_ctx 8192 first (lower context = less memory) ``` **4. Smoke test:** ```bash ollama run hermes4-14b "You are Timmy, an AI agent. List the open issues at http://143.198.27.163:3000" ``` **5. Test tool calling:** ```bash curl -s http://localhost:11434/api/chat -d '{ "model": "hermes4-14b", "messages": [{"role": "user", "content": "Read the file at /tmp/test.txt"}], "tools": [{"type": "function", "function": {"name": "read_file", "description": "Read contents of a file", "parameters": {"type": "object", "properties": {"path": {"type": "string"}}, "required": ["path"]}}}], "stream": false }' | python3 -m json.tool ``` **6. Test IN the Hermes harness:** ```bash hermes model hermes4-14b # Try a few skills, note which WORK vs FAIL ``` ### Done When - [ ] Hermes 4 model downloaded and runs via Ollama - [ ] Memory usage confirmed under 28GB with model loaded - [ ] Tool calling produces structured output (not raw text) - [ ] Model tested inside the Hermes harness - [ ] List of skills that WORK vs skills that FAIL with base model (before fine-tuning) **The fail list becomes the fine-tuning target.**
claude self-assigned this 2026-03-23 17:54:09 +00:00
Collaborator

PR created: http://143.198.27.163:3000/rockachopa/Timmy-time-dashboard/pulls/1110

What was done

Added all code infrastructure for Hermes 4 14B as the AutoLoRA base model:

  1. Modelfile.hermes4-14b — Ollama Modelfile for NousResearch Hermes 4 14B. Configured for tool-calling use: 32K context, temperature=0.3, ChatML format, <think> reasoning support. Includes instructions for both GGUF import and direct registry pull.

  2. config/providers.yaml — Registered hermes4-14b and hermes4-36b as model entries on ollama-local. hermes4-14b is now first in the tools fallback chain (native tool calling + structured JSON output).

  3. scripts/test_hermes4.py — 5-test smoke suite covering all "Done When" criteria:

    • Model availability check
    • Basic response / latency
    • Memory usage gate (< 28 GB)
    • Tool calling — structured tool_calls output (not raw text)
    • Timmy-persona acceptance

Next steps (manual)

# Download Q5_K_M GGUF from HuggingFace NousResearch Hermes 4 collection
# Edit Modelfile.hermes4-14b to point FROM at the local path
ollama create hermes4-14b -f Modelfile.hermes4-14b

# Run smoke tests:
python scripts/test_hermes4.py --model hermes4-14b

The FAIL list from test_hermes4.py becomes the fine-tuning target for the LoRA step.

PR created: http://143.198.27.163:3000/rockachopa/Timmy-time-dashboard/pulls/1110 ## What was done Added all code infrastructure for Hermes 4 14B as the AutoLoRA base model: 1. **`Modelfile.hermes4-14b`** — Ollama Modelfile for NousResearch Hermes 4 14B. Configured for tool-calling use: 32K context, temperature=0.3, ChatML format, `<think>` reasoning support. Includes instructions for both GGUF import and direct registry pull. 2. **`config/providers.yaml`** — Registered `hermes4-14b` and `hermes4-36b` as model entries on `ollama-local`. `hermes4-14b` is now first in the `tools` fallback chain (native tool calling + structured JSON output). 3. **`scripts/test_hermes4.py`** — 5-test smoke suite covering all "Done When" criteria: - Model availability check - Basic response / latency - Memory usage gate (< 28 GB) - Tool calling — structured `tool_calls` output (not raw text) - Timmy-persona acceptance ## Next steps (manual) ```bash # Download Q5_K_M GGUF from HuggingFace NousResearch Hermes 4 collection # Edit Modelfile.hermes4-14b to point FROM at the local path ollama create hermes4-14b -f Modelfile.hermes4-14b # Run smoke tests: python scripts/test_hermes4.py --model hermes4-14b ``` The FAIL list from `test_hermes4.py` becomes the fine-tuning target for the LoRA step.
Sign in to join this conversation.
No Label
2 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: Rockachopa/Timmy-time-dashboard#1101