Timmy-time-dashboard/Modelfile.hermes4-14b
Alexander Whitestone 895d6c6ba3
feat: add Hermes 4 14B Modelfile, providers config, and smoke test suite
Sets up the AutoLoRA base model infrastructure for Project Bannerlord.
No live model download (requires manual GGUF import into Ollama), but
provides all scaffolding needed to onboard and validate the model.
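Because the GGUF import is manual, one way to confirm it worked is to ask the local Ollama server which models it has. The sketch below is illustrative and is not taken from scripts/test_hermes4.py. `model_names` and `is_available` are hypothetical helpers. It assumes Ollama listens on its default address, http://localhost:11434, where `GET /api/tags` returns the locally available models.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address (assumed setup)


def model_names(tags_payload: dict) -> set[str]:
    """Collect model names from an /api/tags response, with and without the tag suffix."""
    names = set()
    for model in tags_payload.get("models", []):
        name = model.get("name", "")
        names.add(name)
        names.add(name.split(":", 1)[0])  # "hermes4-14b:latest" -> "hermes4-14b"
    return names


def is_available(model: str, base_url: str = OLLAMA_URL) -> bool:
    """Query the running Ollama server and report whether `model` has been imported."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        payload = json.load(resp)
    return model in model_names(payload)
```

After `ollama create hermes4-14b -f Modelfile.hermes4-14b`, the model is normally stored as `hermes4-14b:latest`, so `is_available("hermes4-14b")` should return True while the server is running.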

Changes:
- Modelfile.hermes4-14b: Ollama Modelfile for NousResearch Hermes 4 14B
  (Q5_K_M, 32K context, temp=0.3, tool-calling + reasoning configured)
- config/providers.yaml: register hermes4-14b and hermes4-36b as model
  entries on ollama-local; hermes4-14b promoted to top of tools fallback
  chain (native tool calling + structured JSON)
- scripts/test_hermes4.py: 5-test smoke suite (availability, basic
  response, memory < 28 GB, tool calling, Timmy-persona acceptance)
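You can sanity-check the memory < 28 GB threshold with rough arithmetic. Quantized weights take about params × bits-per-weight / 8 bytes, and the KV cache grows linearly with context length. This is a back-of-the-envelope sketch. The bits-per-weight values and the layer/head counts are assumptions for illustration, not figures from this repository. Read the real values from the GGUF metadata before relying on the result.

```python
# Back-of-the-envelope memory estimate for a quantized ~14B model at 32K context.
# ASSUMPTIONS: the bits-per-weight values and architecture numbers below are
# illustrative placeholders; check them against the actual GGUF metadata.

GB = 1e9  # decimal gigabytes

# Approximate average bits per weight for llama.cpp k-quants (assumed values).
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q5_K_M": 5.69}


def weight_bytes(n_params: float, quant: str) -> float:
    """Bytes needed to hold the quantized weights."""
    return n_params * BITS_PER_WEIGHT[quant] / 8


def kv_cache_bytes(n_ctx: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """One K and one V vector per layer, per KV head, per position (fp16 by default)."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_ctx


if __name__ == "__main__":
    n_params = 14e9  # nominal "14B"; the real parameter count differs slightly
    # Hypothetical architecture values (assumed, not read from the model).
    kv = kv_cache_bytes(n_ctx=32768, n_layers=40, n_kv_heads=8, head_dim=128)
    for quant in BITS_PER_WEIGHT:
        weights = weight_bytes(n_params, quant)
        total = weights + kv
        print(f"{quant}: weights {weights / GB:.1f} GB"
              f" + KV {kv / GB:.1f} GB = {total / GB:.1f} GB"
              f" (under 28 GB: {total < 28 * GB})")
```

Under these assumptions, both quantizations stay well below 28 GB at 32K context. Real usage will be somewhat higher once compute buffers and runtime overhead are included.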

Refs #1101
2026-03-23 13:58:40 -04:00


# Modelfile.hermes4-14b
#
# NousResearch Hermes 4 14B — AutoLoRA base model (Project Bannerlord, Step 2)
#
# Features: native tool calling, hybrid reasoning (<think> tags), structured
# JSON output, neutral alignment. Built to serve as the LoRA fine-tuning base.
#
# Build:
# # Download the GGUF from HuggingFace first:
# # https://huggingface.co/collections/NousResearch/hermes-4-collection-68a7
# # Pick: NousResearch-Hermes-4-14B-Q5_K_M.gguf (or Q4_K_M for less RAM)
# # Then enable Option A below with that path and comment out Option B.
# ollama create hermes4-14b -f Modelfile.hermes4-14b
#
# Or, if hermes4 lands on the Ollama registry directly (Option B below):
# ollama pull hermes4:14b
# ollama create hermes4-14b -f Modelfile.hermes4-14b
#
# Memory budget: ~9 GB at Q4_K_M, ~11 GB at Q5_K_M, leaving ample headroom on a 36 GB M3 Max
# Context: 32K runs comfortably (128K theoretical maximum)
# Primary use: AutoLoRA base before fine-tuning on Timmy skill set

# --- Option A: import a local GGUF (uncomment, set the correct path, and comment out Option B) ---
# FROM /path/to/NousResearch-Hermes-4-14B-Q5_K_M.gguf
# --- Option B: build from Ollama registry model (if available) ---
FROM hermes4:14b
# Context window: 32K tokens leaves ~20 GB headroom for the KV cache on a 36 GB M3 Max
PARAMETER num_ctx 32768
# Tool-calling temperature — lower for reliable structured output
PARAMETER temperature 0.3
# Nucleus sampling — balanced for reasoning + tool use
PARAMETER top_p 0.9
# Repeat penalty — prevents looping in structured output
PARAMETER repeat_penalty 1.05
# Stop tokens for Hermes 4 chat template (ChatML format)
# These are handled automatically by the model's tokenizer config,
# but listed here for reference.
# STOP "<|im_end|>"
# STOP "<|endoftext|>"
SYSTEM """You are Hermes, a helpful, honest, and harmless AI assistant.
You have access to tool calling. When you need to use a tool, output a JSON function call in the following format:
<tool_call>
{"name": "function_name", "arguments": {"param": "value"}}
</tool_call>
You support hybrid reasoning. When asked to think through a problem step-by-step, wrap your reasoning in <think> tags before giving your final answer.
Always provide structured, accurate responses."""
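The SYSTEM prompt fixes the output format: tool calls go as JSON inside `<tool_call>` tags, and optional reasoning goes inside `<think>` tags. When a client works with the raw completion text, a minimal parser for that format might look like the following. It is a sketch, not the repository's smoke test. `extract_tool_calls`, `strip_reasoning`, and the `get_weather` tool used in the example are hypothetical.

```python
import json
import re

# Patterns for the tags the SYSTEM prompt asks the model to emit.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def extract_tool_calls(text: str) -> list[dict]:
    """Return every well-formed {"name": ..., "arguments": {...}} call in the text."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON; a smoke test would likely count this as a failure
        if isinstance(call, dict) and "name" in call and isinstance(call.get("arguments"), dict):
            calls.append(call)
    return calls


def strip_reasoning(text: str) -> str:
    """Remove <think>...</think> blocks, leaving only the final answer."""
    return THINK_RE.sub("", text).strip()
```

For a response such as `<think>Need weather.</think>` followed by a `<tool_call>` block containing `{"name": "get_weather", "arguments": {"city": "Paris"}}`, `extract_tool_calls` returns that single dict. `strip_reasoning` returns the text with the `<think>` block removed.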