# Modelfile.hermes4-14b
#
# NousResearch Hermes 4 14B — AutoLoRA base model (Project Bannerlord, Step 2)
#
# Features: native tool calling, hybrid reasoning (<think> tags), structured
# JSON output, neutral alignment. Built to serve as the LoRA fine-tuning base.
#
# Build:
#   # Download GGUF from HuggingFace first:
#   #   https://huggingface.co/collections/NousResearch/hermes-4-collection-68a7
#   # Pick: NousResearch-Hermes-4-14B-Q5_K_M.gguf (or Q4_K_M for less RAM)
#   ollama create hermes4-14b -f Modelfile.hermes4-14b
#
# Or, if hermes4 lands on the Ollama registry directly:
#   ollama pull hermes4:14b
#   ollama create hermes4-14b -f Modelfile.hermes4-14b
#
# Memory budget: ~9 GB at Q4_K_M, ~11 GB at Q5_K_M — leaves headroom on a 36 GB M3 Max
# Context: 32K comfortable (128K theoretical)
# Primary use: AutoLoRA base before fine-tuning on the Timmy skill set

# --- Option A: import local GGUF (uncomment and set correct path) ---
# FROM /path/to/NousResearch-Hermes-4-14B-Q5_K_M.gguf

# --- Option B: build from Ollama registry model (if available) ---
FROM hermes4:14b

# Context window — 32K leaves ~20 GB headroom for KV cache on M3 Max
PARAMETER num_ctx 32768

# Tool-calling temperature — lower for reliable structured output
PARAMETER temperature 0.3

# Nucleus sampling — balanced for reasoning + tool use
PARAMETER top_p 0.9

# Repeat penalty — prevents looping in structured output
PARAMETER repeat_penalty 1.05

# Stop tokens for the Hermes 4 chat template (ChatML format)
# These are handled automatically by the model's tokenizer config,
# but listed here for reference.
# STOP "<|im_end|>"
# STOP "<|endoftext|>"

SYSTEM """You are Hermes, a helpful, honest, and harmless AI assistant.

You have access to tool calling. When you need to use a tool, output a JSON
function call in the following format:
{"name": "function_name", "arguments": {"param": "value"}}

You support hybrid reasoning. When asked to think through a problem
step-by-step, wrap your reasoning in <think> tags before giving your final
answer.
Always provide structured, accurate responses."""
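
# --- Usage sketch (comments only, so the Modelfile stays valid) ---
# A minimal smoke test and tool-calling check, assuming the model was built
# as "hermes4-14b" per the create command above and Ollama is serving on its
# default port 11434. The "get_weather" tool below is a hypothetical example,
# not part of this model; the "tools" field follows the /api/chat request
# shape supported by Ollama 0.3+.
#
# Smoke test from the shell:
#   ollama run hermes4-14b "Reply with a single JSON object: {\"ok\": true}"
#
# Tool-calling check against the local API:
#   curl http://localhost:11434/api/chat -d '{
#     "model": "hermes4-14b",
#     "stream": false,
#     "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
#     "tools": [{
#       "type": "function",
#       "function": {
#         "name": "get_weather",
#         "description": "Get current weather for a city",
#         "parameters": {
#           "type": "object",
#           "properties": {"city": {"type": "string"}},
#           "required": ["city"]
#         }
#       }
#     }]
#   }'
#
# A working response should carry a "tool_calls" entry naming "get_weather"
# in the assistant message, rather than plain prose.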