From 773d3bb4dfe6e594986db8fa0208446b93cb8af8 Mon Sep 17 00:00:00 2001 From: Teknium <127238744+teknium1@users.noreply.github.com> Date: Tue, 24 Mar 2026 07:19:26 -0700 Subject: [PATCH] docs: update all docs for /model command overhaul and custom provider support Documents the full /model command overhaul across 6 files: AGENTS.md: - Add model_switch.py to project structure tree configuration.md: - Rewrite General Setup with 3 config methods (interactive, config.yaml, env vars) - Add new 'Switching Models with /model' section documenting all syntax variants - Add 'Named Custom Providers' section with config.yaml examples and custom:name:model triple syntax slash-commands.md: - Update /model descriptions in both CLI and messaging tables with full syntax examples (provider:model, custom:model, custom:name:model, bare custom auto-detect) cli-commands.md: - Add /model slash command subsection under hermes model with syntax table - Add custom endpoint config to hermes model use cases faq.md: - Add config.yaml example for offline/local model setup - Note that provider: custom is a first-class provider - Document /model custom auto-detect provider-runtime.md: - Add model_switch.py to implementation file list - Update provider families to show Custom as first-class with named variants --- AGENTS.md | 1 + .../docs/developer-guide/provider-runtime.md | 10 ++- website/docs/reference/cli-commands.md | 17 +++++ website/docs/reference/faq.md | 11 ++- website/docs/reference/slash-commands.md | 4 +- website/docs/user-guide/configuration.md | 74 +++++++++++++++++-- 6 files changed, 105 insertions(+), 12 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index fa733bc00..a25393ad9 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -38,6 +38,7 @@ hermes-agent/ │ ├── tools_config.py # `hermes tools` — enable/disable tools per platform │ ├── skills_hub.py # `/skills` slash command (search, browse, install) │ ├── models.py # Model catalog, provider model lists +│ ├── model_switch.py # Shared /model 
switch pipeline (CLI + gateway) │ └── auth.py # Provider credential resolution ├── tools/ # Tool implementations (one file per tool) │ ├── registry.py # Central tool registry (schemas, handlers, dispatch) diff --git a/website/docs/developer-guide/provider-runtime.md b/website/docs/developer-guide/provider-runtime.md index faa84d5f6..007729595 100644 --- a/website/docs/developer-guide/provider-runtime.md +++ b/website/docs/developer-guide/provider-runtime.md @@ -16,9 +16,10 @@ Hermes has a shared provider runtime resolver used across: Primary implementation: -- `hermes_cli/runtime_provider.py` -- `hermes_cli/auth.py` -- `agent/auxiliary_client.py` +- `hermes_cli/runtime_provider.py` — credential resolution, `_resolve_custom_runtime()` +- `hermes_cli/auth.py` — provider registry, `resolve_provider()` +- `hermes_cli/model_switch.py` — shared `/model` switch pipeline (CLI + gateway) +- `agent/auxiliary_client.py` — auxiliary model routing If you are trying to add a new first-class inference provider, read [Adding Providers](./adding-providers.md) alongside this page. 
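The output of runtime resolution can be pictured as a small record tying together provider, model, endpoint, and credentials. The sketch below is illustrative only — the field names and structure are assumptions for this page, not the actual types in `hermes_cli/runtime_provider.py`:

```python
# Illustrative sketch only: the real resolver in hermes_cli/runtime_provider.py
# may use different names and fields. This shows the kind of record a
# provider-runtime resolution step typically produces.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class RuntimeResolution:
    provider: str                        # e.g. "custom", "openrouter"
    model: str                           # model identifier sent to the API
    base_url: str                        # OpenAI-compatible endpoint root
    api_key: Optional[str] = None        # None for keyless local servers
    api_mode: str = "chat_completions"   # or "anthropic_messages"

# A named custom provider ("local") resolved for a session:
resolution = RuntimeResolution(
    provider="custom",
    model="qwen-2.5",
    base_url="http://localhost:8080/v1",
)
```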
@@ -46,7 +47,8 @@ Current provider families include: - Kimi / Moonshot - MiniMax - MiniMax China -- custom OpenAI-compatible endpoints +- Custom (`provider: custom`) — first-class provider for any OpenAI-compatible endpoint +- Named custom providers (`custom_providers` list in config.yaml) ## Output of runtime resolution diff --git a/website/docs/reference/cli-commands.md b/website/docs/reference/cli-commands.md index b76859081..db8a0d314 100644 --- a/website/docs/reference/cli-commands.md +++ b/website/docs/reference/cli-commands.md @@ -98,8 +98,25 @@ Use this when you want to: - switch default providers - log into OAuth-backed providers during model selection - pick from provider-specific model lists +- configure a custom/self-hosted endpoint - save the new default into config +### `/model` slash command (mid-session) + +Switch models without leaving a session: + +``` +/model # Show current model and available options +/model claude-sonnet-4 # Switch model (auto-detects provider) +/model zai:glm-5 # Switch provider and model +/model custom:qwen-2.5 # Use model on your custom endpoint +/model custom # Auto-detect model from custom endpoint +/model custom:local:qwen-2.5 # Use a named custom provider +/model openrouter:anthropic/claude-sonnet-4 # Switch back to cloud +``` + +Provider and base URL changes are persisted to `config.yaml` automatically. When switching away from a custom endpoint, the stale base URL is cleared to prevent it leaking into other providers. + ## `hermes gateway` ```bash diff --git a/website/docs/reference/faq.md b/website/docs/reference/faq.md index 97051fcee..5e8326ff9 100644 --- a/website/docs/reference/faq.md +++ b/website/docs/reference/faq.md @@ -53,7 +53,16 @@ hermes model # Context length: 32768 ← set this to match your server's actual context window ``` -Hermes persists the endpoint in `config.yaml` and prompts for the context window size so compression triggers at the right time. 
If you leave context length blank, Hermes auto-detects it from the server's `/models` endpoint or [models.dev](https://models.dev). +Or configure it directly in `config.yaml`: + +```yaml +model: + default: qwen3.5:27b + provider: custom + base_url: http://localhost:11434/v1 +``` + +Hermes persists the endpoint, provider, and base URL in `config.yaml` so it survives restarts. If your local server has exactly one model loaded, `/model custom` auto-detects it. You can also set `provider: custom` in config.yaml — it's a first-class provider, not an alias for anything else. This works with Ollama, vLLM, llama.cpp server, SGLang, LocalAI, and others. See the [Configuration guide](../user-guide/configuration.md) for details. diff --git a/website/docs/reference/slash-commands.md b/website/docs/reference/slash-commands.md index 0ccf116fc..9c9b42cbe 100644 --- a/website/docs/reference/slash-commands.md +++ b/website/docs/reference/slash-commands.md @@ -40,7 +40,7 @@ Type `/` in the CLI to open the autocomplete menu. Built-in commands are case-in | Command | Description | |---------|-------------| | `/config` | Show current configuration | -| `/model` | Show or change the current model | +| `/model [model-name]` | Show or change the current model. Supports: `/model claude-sonnet-4`, `/model provider:model` (switch providers), `/model custom:model` (custom endpoint), `/model custom:name:model` (named custom provider), `/model custom` (auto-detect from endpoint) | | `/provider` | Show available providers and current provider | | `/prompt` | View/set custom system prompt | | `/personality` | Set a predefined personality | @@ -98,7 +98,7 @@ The messaging gateway supports the following built-in commands inside Telegram, | `/reset` | Reset conversation history. | | `/status` | Show session info. | | `/stop` | Kill all running background processes and interrupt the running agent. | -| `/model [provider:model]` | Show or change the model, including provider switches. 
| +| `/model [provider:model]` | Show or change the model. Supports provider switches (`/model zai:glm-5`), custom endpoints (`/model custom:model`), named custom providers (`/model custom:local:qwen`), and auto-detect (`/model custom`). | | `/provider` | Show provider availability and auth status. | | `/personality [name]` | Set a personality overlay for the session. | | `/retry` | Retry the last message. | diff --git a/website/docs/user-guide/configuration.md b/website/docs/user-guide/configuration.md index 10cb92b48..8e97cf99f 100644 --- a/website/docs/user-guide/configuration.md +++ b/website/docs/user-guide/configuration.md @@ -214,24 +214,57 @@ Hermes Agent works with **any OpenAI-compatible API endpoint**. If a server impl ### General Setup -Two ways to configure a custom endpoint: +Three ways to configure a custom endpoint: -**Interactive (recommended):** +**Interactive setup (recommended):** ```bash hermes model # Select "Custom endpoint (self-hosted / VLLM / etc.)" # Enter: API base URL, API key, Model name ``` -**Manual (`.env` file):** +**Manual config (`config.yaml`):** +```yaml +# In ~/.hermes/config.yaml +model: + default: your-model-name + provider: custom + base_url: http://localhost:8000/v1 + api_key: your-key-or-leave-empty-for-local +``` + +**Environment variables (`.env` file):** ```bash # Add to ~/.hermes/.env OPENAI_BASE_URL=http://localhost:8000/v1 -OPENAI_API_KEY=*** +OPENAI_API_KEY=your-key # Any non-empty string for local servers LLM_MODEL=your-model-name ``` -`hermes model` and the manual `.env` approach end up in the same runtime path. If you save a custom endpoint through `hermes model`, Hermes persists the provider + base URL in `config.yaml` so later sessions keep using that endpoint even if `OPENAI_BASE_URL` is not exported in your current shell. +All three approaches end up in the same runtime path. 
`hermes model` persists provider, model, and base URL to `config.yaml` so later sessions keep using that endpoint even if env vars are not set. + +### Switching Models with `/model` + +Once a custom endpoint is configured, you can switch models mid-session: + +``` +/model custom:qwen-2.5 # Switch to a model on your custom endpoint +/model custom # Auto-detect the model from the endpoint +/model openrouter:claude-sonnet-4 # Switch back to a cloud provider +``` + +If you have **named custom providers** configured (see below), use the triple syntax: + +``` +/model custom:local:qwen-2.5 # Use the "local" custom provider with model qwen-2.5 +/model custom:work:llama3 # Use the "work" custom provider with llama3 +``` + +When switching providers, Hermes persists the base URL and provider to config so the change survives restarts. When switching away from a custom endpoint to a built-in provider, the stale base URL is automatically cleared. + +:::tip +`/model custom` (bare, no model name) queries your endpoint's `/models` API and auto-selects the model if exactly one is loaded. Useful for local servers running a single model. +::: Everything below follows this same pattern — just change the URL, key, and model name. 
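The argument forms above can be sketched as a small parser. This is an illustrative sketch under stated assumptions — the actual switch logic lives in `hermes_cli/model_switch.py` and may differ (in particular, a real implementation would consult the provider registry to disambiguate bare model names that contain a colon, such as Ollama-style `qwen3.5:27b`):

```python
# Hypothetical sketch of parsing the /model argument forms documented
# above; not the actual implementation in model_switch.py.

def parse_model_arg(arg: str):
    """Split a /model argument into (provider, endpoint_name, model).

    endpoint_name is only set for the custom:name:model triple syntax;
    missing parts are None.
    """
    if not arg:
        return (None, None, None)           # bare /model: show current model
    parts = arg.split(":", 2)
    if parts[0] != "custom":
        if len(parts) == 1:
            return (None, None, parts[0])   # /model claude-sonnet-4
        # Split on the first colon only, so models like
        # "anthropic/claude-sonnet-4" pass through intact.
        provider, model = arg.split(":", 1)
        return (provider, None, model)      # /model zai:glm-5
    if len(parts) == 1:
        return ("custom", None, None)       # /model custom (auto-detect)
    if len(parts) == 2:
        return ("custom", None, parts[1])   # /model custom:qwen-2.5
    return ("custom", parts[1], parts[2])   # /model custom:local:qwen-2.5
```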
@@ -462,6 +495,37 @@ custom_providers: --- +### Named Custom Providers + +If you work with multiple custom endpoints (e.g., a local dev server and a remote GPU server), you can define them as named custom providers in `config.yaml`: + +```yaml +custom_providers: + - name: local + base_url: http://localhost:8080/v1 + # api_key omitted — Hermes uses "no-key-required" for keyless local servers + - name: work + base_url: https://gpu-server.internal.corp/v1 + api_key: corp-api-key + api_mode: chat_completions # optional, auto-detected from URL + - name: anthropic-proxy + base_url: https://proxy.example.com/anthropic + api_key: proxy-key + api_mode: anthropic_messages # for Anthropic-compatible proxies +``` + +Switch between them mid-session with the triple syntax: + +``` +/model custom:local:qwen-2.5 # Use the "local" endpoint with qwen-2.5 +/model custom:work:llama3-70b # Use the "work" endpoint with llama3-70b +/model custom:anthropic-proxy:claude-sonnet-4 # Use the proxy +``` + +You can also select named custom providers from the interactive `hermes model` menu. + +--- + ### Choosing the Right Setup | Use Case | Recommended |