diff --git a/website/docs/user-guide/configuration.md b/website/docs/user-guide/configuration.md
index e1698f7cb..450c9e6b0 100644
--- a/website/docs/user-guide/configuration.md
+++ b/website/docs/user-guide/configuration.md
@@ -75,7 +75,7 @@ The OpenAI Codex provider authenticates via device code (open a URL, enter a cod
 :::
 
 :::warning
-Even when using Nous Portal, Codex, or a custom endpoint, some tools (vision, web summarization, MoA) use OpenRouter independently. An `OPENROUTER_API_KEY` enables these tools.
+Even when using Nous Portal, Codex, or a custom endpoint, some tools (vision, web summarization, MoA) use a separate "auxiliary" model — by default Gemini Flash via OpenRouter. An `OPENROUTER_API_KEY` enables these tools automatically. You can also configure which model and provider these tools use — see [Auxiliary Models](#auxiliary-models) below.
 :::
 
 ### First-Class Chinese AI Providers
@@ -432,9 +432,78 @@ node_modules/
 ```yaml
 compression:
   enabled: true
-  threshold: 0.85  # Compress at 85% of context limit
+  threshold: 0.85                                  # Compress at 85% of context limit
+  summary_model: "google/gemini-3-flash-preview"   # Model for summarization
+  # summary_provider: "auto"                       # "auto", "openrouter", "nous", "main"
 ```
 
+The `summary_model` must support a context length at least as large as your main model's, since it receives the full middle section of the conversation for compression.
+
+## Auxiliary Models
+
+Hermes uses lightweight "auxiliary" models for side tasks like image analysis, web page summarization, and browser screenshot analysis. By default, these use **Gemini Flash** via OpenRouter or Nous Portal — you don't need to configure anything.
+
+To use a different model, add an `auxiliary` section to `~/.hermes/config.yaml`:
+
+```yaml
+auxiliary:
+  # Image analysis (vision_analyze tool + browser screenshots)
+  vision:
+    provider: "auto"   # "auto", "openrouter", "nous", "main"
+    model: ""          # e.g. "openai/gpt-4o", "google/gemini-2.5-flash"
+
+  # Web page summarization + browser page text extraction
+  web_extract:
+    provider: "auto"
+    model: ""          # e.g. "google/gemini-2.5-flash"
+```
+
+### Changing the Vision Model
+
+To use GPT-4o instead of Gemini Flash for image analysis:
+
+```yaml
+auxiliary:
+  vision:
+    model: "openai/gpt-4o"
+```
+
+Or via environment variable (in `~/.hermes/.env`):
+
+```bash
+AUXILIARY_VISION_MODEL=openai/gpt-4o
+```
+
+### Provider Options
+
+| Provider | Description |
+|----------|-------------|
+| `"auto"` | Best available (default). Vision only tries OpenRouter + Nous Portal. |
+| `"openrouter"` | Force OpenRouter (requires `OPENROUTER_API_KEY`) |
+| `"nous"` | Force Nous Portal (requires `hermes login`) |
+| `"main"` | Use your main chat model's provider. Useful for local/self-hosted models. |
+
+:::warning
+**Vision requires a multimodal model.** In `auto` mode, only OpenRouter and Nous Portal are tried because they support image input (via Gemini). If you set `provider: "main"`, make sure your endpoint supports multimodal/vision — otherwise image analysis will fail.
+:::
+
+### Environment Variables
+
+You can also configure auxiliary models via environment variables instead of `config.yaml`:
+
+| Setting | Environment Variable |
+|---------|---------------------|
+| Vision provider | `AUXILIARY_VISION_PROVIDER` |
+| Vision model | `AUXILIARY_VISION_MODEL` |
+| Web extract provider | `AUXILIARY_WEB_EXTRACT_PROVIDER` |
+| Web extract model | `AUXILIARY_WEB_EXTRACT_MODEL` |
+| Compression provider | `CONTEXT_COMPRESSION_PROVIDER` |
+| Compression model | `CONTEXT_COMPRESSION_MODEL` |
+
+:::tip
+Run `hermes config` to see your current auxiliary model settings. Overrides only show up when they differ from the defaults.
+:::
+
 ## Reasoning Effort
 
 Control how much "thinking" the model does before responding: