# Modelfile.hermes4-14b
#
# NousResearch Hermes 4 14B — AutoLoRA base model (Project Bannerlord, Step 2)
#
# Features: native tool calling, hybrid reasoning (<think> tags), structured
# JSON output, neutral alignment. Built to serve as the LoRA fine-tuning base.
#
# Build:
# # Download the GGUF from HuggingFace first:
# # https://huggingface.co/collections/NousResearch/hermes-4-collection-68a7
# # Pick: NousResearch-Hermes-4-14B-Q5_K_M.gguf (or Q4_K_M for less RAM)
# ollama create hermes4-14b -f Modelfile.hermes4-14b
#
# Or, if hermes4 lands on the Ollama registry directly:
# ollama pull hermes4:14b
# ollama create hermes4-14b -f Modelfile.hermes4-14b
#
# Memory budget: ~9 GB at Q4_K_M, ~11 GB at Q5_K_M — leaves headroom on a 36 GB M3 Max
# Context: 32K comfortable (128K theoretical)
# Primary use: AutoLoRA base before fine-tuning on the Timmy skill set

# --- Option A: import local GGUF (uncomment and set the correct path) ---
# FROM /path/to/NousResearch-Hermes-4-14B-Q5_K_M.gguf

# --- Option B: build from Ollama registry model (if available) ---
FROM hermes4:14b

# Context window — 32K leaves ~20 GB headroom for KV cache on M3 Max
PARAMETER num_ctx 32768

# Tool-calling temperature — lower for reliable structured output
PARAMETER temperature 0.3

# Nucleus sampling — balanced for reasoning + tool use
PARAMETER top_p 0.9

# Repeat penalty — prevents looping in structured output
PARAMETER repeat_penalty 1.05

# Stop tokens for the Hermes 4 chat template (ChatML format).
# These are handled automatically by the model's tokenizer config,
# but listed here for reference (Modelfile syntax uses PARAMETER stop):
# PARAMETER stop "<|im_end|>"
# PARAMETER stop "<|endoftext|>"

SYSTEM """You are Hermes, a helpful, honest, and harmless AI assistant.

You have access to tool calling. When you need to use a tool, output a JSON function call in the following format:
<tool_call>
{"name": "function_name", "arguments": {"param": "value"}}
</tool_call>

You support hybrid reasoning. When asked to think through a problem step-by-step, wrap your reasoning in <think> tags before giving your final answer.

Always provide structured, accurate responses."""
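
# --- Quick smoke test (a sketch; assumes the model was created with the
# --- name used in the build commands above) ---
# ollama run hermes4-14b "Reply with a one-line JSON object: {\"status\": \"ok\"}"
# # A clean JSON reply with no stray <think> or <|im_end|> text suggests the
# # chat template and stop tokens are being applied correctly.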