# Modelfile.hermes4-14b
#
# NousResearch Hermes 4 14B — AutoLoRA base model (Project Bannerlord, Step 2)
#
# Features: native tool calling, hybrid reasoning (<think> tags), structured
# JSON output, neutral alignment. Built to serve as the LoRA fine-tuning base.
#
# Build:
#   # Download GGUF from HuggingFace first:
#   #   https://huggingface.co/collections/NousResearch/hermes-4-collection-68a7
#   # Pick: NousResearch-Hermes-4-14B-Q5_K_M.gguf (or Q4_K_M for less RAM)
#   ollama create hermes4-14b -f Modelfile.hermes4-14b
#
# Or, if hermes4 lands on the Ollama registry directly:
#   ollama pull hermes4:14b
#   ollama create hermes4-14b -f Modelfile.hermes4-14b
#
# Memory budget: ~9 GB at Q4_K_M, ~11 GB at Q5_K_M — leaves headroom on a 36 GB M3 Max
# Context: 32K comfortable (128K theoretical)
# Primary use: AutoLoRA base before fine-tuning on the Timmy skill set

# --- Option A: import local GGUF (uncomment and set correct path) ---
# FROM /path/to/NousResearch-Hermes-4-14B-Q5_K_M.gguf

# --- Option B: build from Ollama registry model (if available) ---
FROM hermes4:14b

# Context window — 32K leaves ~20 GB headroom for KV cache on M3 Max
PARAMETER num_ctx 32768

# Tool-calling temperature — lower for reliable structured output
PARAMETER temperature 0.3

# Nucleus sampling — balanced for reasoning + tool use
PARAMETER top_p 0.9

# Repeat penalty — prevents looping in structured output
PARAMETER repeat_penalty 1.05

# Stop tokens for the Hermes 4 chat template (ChatML format)
# These are handled automatically by the model's tokenizer config,
# but listed here for reference.
# STOP "<|im_end|>"
# STOP "<|endoftext|>"

SYSTEM """You are Hermes, a helpful, honest, and harmless AI assistant.

You have access to tool calling. When you need to use a tool, output a JSON
function call in the following format:
{"name": "function_name", "arguments": {"param": "value"}}

You support hybrid reasoning. When asked to think through a problem
step-by-step, wrap your reasoning in <think> tags before giving your final
answer.
Always provide structured, accurate responses."""
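
# --- Usage sketch (comments only, so the Modelfile stays valid) ---
# A minimal smoke test and tool-calling check, assuming the model was built
# as "hermes4-14b" per the create command above and Ollama is serving on its
# default port 11434. The "get_weather" tool below is a hypothetical example,
# not part of this model; the "tools" field follows the /api/chat request
# shape supported by Ollama 0.3+.
#
# Smoke test from the shell:
#   ollama run hermes4-14b "Reply with a single JSON object: {\"ok\": true}"
#
# Tool-calling check against the local API:
#   curl http://localhost:11434/api/chat -d '{
#     "model": "hermes4-14b",
#     "stream": false,
#     "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
#     "tools": [{
#       "type": "function",
#       "function": {
#         "name": "get_weather",
#         "description": "Get current weather for a city",
#         "parameters": {
#           "type": "object",
#           "properties": {"city": {"type": "string"}},
#           "required": ["city"]
#         }
#       }
#     }]
#   }'
#
# A working response should carry a "tool_calls" entry naming "get_weather"
# in the assistant message, rather than plain prose.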