Modelfile.hermes4-14b

# Modelfile.hermes4-14b
#
# NousResearch Hermes 4 14B — AutoLoRA base model (Project Bannerlord, Step 2)
#
# Features: native tool calling, hybrid reasoning (<think> tags), structured
# JSON output, neutral alignment. Built to serve as the LoRA fine-tuning base.
#
# Build:
#   # Download the GGUF from Hugging Face first:
#   #   https://huggingface.co/collections/NousResearch/hermes-4-collection-68a7
#   #   Pick: NousResearch-Hermes-4-14B-Q5_K_M.gguf (or Q4_K_M for less RAM)
#   # Then enable Option A below (and comment out Option B) before running:
#   ollama create hermes4-14b -f Modelfile.hermes4-14b
#
# Or, if hermes4 lands on the Ollama registry directly:
#   ollama pull hermes4:14b
#   ollama create hermes4-14b -f Modelfile.hermes4-14b
#
# Memory budget: ~9 GB at Q4_K_M, ~11 GB at Q5_K_M — leaves headroom on 36 GB M3 Max
# Context:       32K comfortable (128K theoretical)
# Primary use:   AutoLoRA base before fine-tuning on Timmy skill set

# --- Option A: import local GGUF (uncomment and set correct path) ---
# FROM /path/to/NousResearch-Hermes-4-14B-Q5_K_M.gguf

# --- Option B: build from Ollama registry model (if available) ---
FROM hermes4:14b

# Context window — 32K leaves ~20 GB headroom for KV cache on M3 Max
PARAMETER num_ctx 32768
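# Rough KV-cache estimate for the num_ctx above (a sketch, not a measurement).
# ASSUMPTIONS, not stated in this file: a Qwen3-14B-style attention shape
# (40 layers, 8 KV heads, head dim 128) and an f16 KV cache.
#   bytes per token = 2 (K and V) x 40 layers x 8 KV heads x 128 dims x 2 bytes
#                   = 163,840 bytes (160 KiB)
#   32,768 tokens   = 163,840 x 32,768 bytes = 5 GiB
# Under those assumptions, ~11 GB of Q5_K_M weights plus ~5 GiB of KV cache
# is consistent with the ~20 GB headroom quoted above on a 36 GB machine.
# Verify the shape against the GGUF metadata before relying on these numbers.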

# Tool-calling temperature — lower for reliable structured output
PARAMETER temperature 0.3

# Nucleus sampling — balanced for reasoning + tool use
PARAMETER top_p 0.9

# Repeat penalty — prevents looping in structured output
PARAMETER repeat_penalty 1.05

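# Stop tokens for the Hermes 4 chat template (ChatML format).
# These are normally handled automatically via the model's tokenizer config,
# but are listed here for reference. Uncomment to set them explicitly;
# Modelfile syntax is `PARAMETER stop "<token>"`, one line per token.
# PARAMETER stop "<|im_end|>"
# PARAMETER stop "<|endoftext|>"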

SYSTEM """You are Hermes, a helpful, honest, and harmless AI assistant.

You have access to tool calling. When you need to use a tool, output a JSON function call in the following format:
<tool_call>
{"name": "function_name", "arguments": {"param": "value"}}
</tool_call>

You support hybrid reasoning. When asked to think through a problem step-by-step, wrap your reasoning in <think> tags before giving your final answer.

Always provide structured, accurate responses."""
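
# Quick check after `ollama create` (a hedged sketch, not the project's smoke
# test). The prompt is arbitrary, and the host and port assume Ollama's default
# local server at localhost:11434.
#   ollama run hermes4-14b "Reply with the single word: ready"
#
#   curl http://localhost:11434/api/chat -d '{
#     "model": "hermes4-14b",
#     "messages": [{"role": "user", "content": "Reply with the single word: ready"}],
#     "stream": false
#   }'
# A JSON response with a non-empty message.content suggests the model loaded
# and that the parameters and system prompt above were accepted.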
[claude] Add Hermes 4 14B Modelfile, providers config, and smoke test (#1101) (#1110) 2026-03-23 17:59:45 +00:00