feat: add 'openai' as auxiliary provider option

Users can now set provider: "openai" for auxiliary tasks (vision, web extract, compression) to use OpenAI's API directly with their OPENAI_API_KEY. This hits api.openai.com/v1 with gpt-4o-mini as the default model, which supports vision since GPT-4o handles image input.

Provider options are now: auto, openrouter, nous, openai, main.

Changes:
- agent/auxiliary_client.py: added _try_openai(), "openai" case in _resolve_forced_provider(), updated auxiliary_max_tokens_param() to use max_completion_tokens for OpenAI
- Updated docs: cli-config.yaml.example, AGENTS.md, and user-facing configuration.md with Common Setups section showing OpenAI, OpenRouter, and local model examples
- 3 new tests for OpenAI provider resolution

Tests: 2459 passed (was 2429).
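
For reviewers, the resolution behaviour described in this message can be sketched roughly as follows. This is a simplified illustration, not the code in agent/auxiliary_client.py: the names resolve_openai_aux and max_tokens_param are hypothetical, the sketch returns client keyword arguments instead of constructing an OpenAI client, and the non-OpenAI branch is reduced to a plain max_tokens fallback, which is an assumption.

```python
import os
from typing import Mapping, Optional, Tuple

# Values taken from this commit; the constant names here are illustrative.
OPENAI_BASE_URL = "https://api.openai.com/v1"
OPENAI_AUX_MODEL = "gpt-4o-mini"


def resolve_openai_aux(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[Optional[dict], Optional[str]]:
    """Return (client kwargs, model), or (None, None) if no API key is set.

    Hypothetical stand-in for _try_openai(): it yields the keyword
    arguments that would be passed to the OpenAI client constructor.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        return None, None
    return {"api_key": api_key, "base_url": OPENAI_BASE_URL}, OPENAI_AUX_MODEL


def max_tokens_param(provider: str, value: int) -> dict:
    """Pick the token-limit parameter name for a forced provider.

    Simplified assumption: only the "openai" provider uses
    max_completion_tokens; everything else falls back to max_tokens.
    """
    if provider == "openai":
        return {"max_completion_tokens": value}
    return {"max_tokens": value}


if __name__ == "__main__":
    kwargs, model = resolve_openai_aux({"OPENAI_API_KEY": "sk-test"})
    print(kwargs, model)
    print(max_tokens_param("openai", 512))
```

A missing or blank OPENAI_API_KEY yields (None, None), which corresponds to the warning path added in _resolve_forced_provider().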
2026-03-08 18:25:30 -07:00
parent 169615abc8
commit ae4a674c84
5 changed files with 89 additions and 9 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -744,9 +744,10 @@ compression:
 | `"auto"` | Best available (default). For vision, only tries OpenRouter + Nous. |
 | `"openrouter"` | Force OpenRouter (requires `OPENROUTER_API_KEY`) |
 | `"nous"` | Force Nous Portal (requires `hermes login`) |
+| `"openai"` | Force OpenAI direct API at `api.openai.com` (requires `OPENAI_API_KEY`). Supports vision via GPT-4o. |
 | `"main"` | Use the same provider as your main chat model. Skips OpenRouter/Nous. Useful for local models. |

-**Important:** Vision tasks require a multimodal-capable model. In `auto` mode, only OpenRouter and Nous Portal are tried (they route to Gemini, which supports images). Setting `provider: "main"` for vision will work only if your main endpoint supports multimodal input.
+**Important:** Vision tasks require a multimodal-capable model. In `auto` mode, only OpenRouter and Nous Portal are tried (they route to Gemini, which supports images). The `"openai"` provider also works for vision since GPT-4o supports image input. Setting `provider: "main"` for vision will work only if your main endpoint supports multimodal input.

 **Key files:** `agent/auxiliary_client.py` (resolution chain), `tools/vision_tools.py`, `tools/browser_tool.py`, `tools/web_tools.py`

--- a/agent/auxiliary_client.py
+++ b/agent/auxiliary_client.py
@@ -21,7 +21,7 @@ Resolution order for vision/multimodal tasks (auto mode):

 Per-task provider overrides (e.g. AUXILIARY_VISION_PROVIDER,
 CONTEXT_COMPRESSION_PROVIDER) can force a specific provider for each task:
-"openrouter", "nous", or "main" (= steps 3-5).
+"openrouter", "nous", "openai", or "main" (= steps 3-5).
 Default "auto" follows the chains above.

 Per-task model overrides (e.g. AUXILIARY_VISION_MODEL,
@@ -71,6 +71,11 @@ _NOUS_MODEL = "gemini-3-flash"
 _NOUS_DEFAULT_BASE_URL = "https://inference-api.nousresearch.com/v1"
 _AUTH_JSON_PATH = Path.home() / ".hermes" / "auth.json"

+# OpenAI direct: uses OPENAI_API_KEY with the official API endpoint.
+# gpt-4o-mini is cheap/fast and supports vision — good default for auxiliary tasks.
+_OPENAI_AUX_MODEL = "gpt-4o-mini"
+_OPENAI_BASE_URL = "https://api.openai.com/v1"
+
 # Codex fallback: uses the Responses API (the only endpoint the Codex
 # OAuth token can access) with a fast model for auxiliary tasks.
 _CODEX_AUX_MODEL = "gpt-5.3-codex"
@@ -385,6 +390,15 @@ def _try_nous() -> Tuple[Optional[OpenAI], Optional[str]]:
    )


+def _try_openai() -> Tuple[Optional[OpenAI], Optional[str]]:
+    """Try OpenAI direct API (api.openai.com) using OPENAI_API_KEY."""
+    api_key = os.getenv("OPENAI_API_KEY", "").strip()
+    if not api_key:
+        return None, None
+    logger.debug("Auxiliary client: OpenAI direct (%s)", _OPENAI_AUX_MODEL)
+    return OpenAI(api_key=api_key, base_url=_OPENAI_BASE_URL), _OPENAI_AUX_MODEL
+
+
 def _try_custom_endpoint() -> Tuple[Optional[OpenAI], Optional[str]]:
    custom_base = os.getenv("OPENAI_BASE_URL")
    custom_key = os.getenv("OPENAI_API_KEY")
@@ -418,6 +432,12 @@ def _resolve_forced_provider(forced: str) -> Tuple[Optional[OpenAI], Optional[st
            logger.warning("auxiliary.provider=nous but Nous Portal not configured (run: hermes login)")
        return client, model

+    if forced == "openai":
+        client, model = _try_openai()
+        if client is None:
+            logger.warning("auxiliary.provider=openai but OPENAI_API_KEY not set")
+        return client, model
+
    if forced == "main":
        # "main" = skip OpenRouter/Nous, use the main chat model's credentials.
        for try_fn in (_try_custom_endpoint, _try_codex, _resolve_api_key_provider):
@@ -530,6 +550,10 @@ def auxiliary_max_tokens_param(value: int) -> dict:
    The Codex adapter translates max_tokens internally, so we use max_tokens
    for it as well.
    """
+    # If any auxiliary task is forced to "openai", use max_completion_tokens (applies to every caller, not only that task)
+    for task in ("vision", "web_extract", "compression"):
+        if _get_auxiliary_provider(task) == "openai":
+            return {"max_completion_tokens": value}
    custom_base = os.getenv("OPENAI_BASE_URL", "")
    or_key = os.getenv("OPENROUTER_API_KEY")
    # Only use max_completion_tokens for direct OpenAI custom endpoints
--- a/cli-config.yaml.example
+++ b/cli-config.yaml.example
@@ -241,6 +241,9 @@ compression:
 #   "auto"       - Best available: OpenRouter → Nous Portal → main endpoint (default)
 #   "openrouter" - Force OpenRouter (requires OPENROUTER_API_KEY)
 #   "nous"       - Force Nous Portal (requires: hermes login)
+#   "openai"     - Force OpenAI direct API (requires OPENAI_API_KEY).
+#                  Uses api.openai.com/v1 with models like gpt-4o, gpt-4o-mini.
+#                  Great for vision since GPT-4o supports image input.
 #   "main"       - Use the same provider & credentials as your main chat model.
 #                  Skips OpenRouter/Nous and uses your custom endpoint
 #                  (OPENAI_BASE_URL), Codex OAuth, or API-key provider directly.
--- a/tests/agent/test_auxiliary_client.py
+++ b/tests/agent/test_auxiliary_client.py
@@ -218,6 +218,15 @@ class TestVisionClientFallback:
        assert client is None
        assert model is None

+    def test_vision_forced_openai(self, monkeypatch):
+        """When forced to 'openai', vision uses OpenAI direct API."""
+        monkeypatch.setenv("AUXILIARY_VISION_PROVIDER", "openai")
+        monkeypatch.setenv("OPENAI_API_KEY", "sk-test")
+        with patch("agent.auxiliary_client.OpenAI") as mock_openai:
+            client, model = get_vision_auxiliary_client()
+        assert client is not None
+        assert model == "gpt-4o-mini"
+

 class TestGetAuxiliaryProvider:
    """Tests for _get_auxiliary_provider env var resolution."""
@@ -313,6 +322,22 @@ class TestResolveForcedProvider:
        assert isinstance(client, CodexAuxiliaryClient)
        assert model == "gpt-5.3-codex"

+    def test_forced_openai_with_key(self, monkeypatch):
+        monkeypatch.setenv("OPENAI_API_KEY", "sk-test-key")
+        with patch("agent.auxiliary_client.OpenAI") as mock_openai:
+            client, model = _resolve_forced_provider("openai")
+        assert model == "gpt-4o-mini"
+        assert client is not None
+        call_kwargs = mock_openai.call_args
+        assert call_kwargs.kwargs["base_url"] == "https://api.openai.com/v1"
+        assert call_kwargs.kwargs["api_key"] == "sk-test-key"
+
+    def test_forced_openai_no_key(self, monkeypatch):
+        monkeypatch.delenv("OPENAI_API_KEY", raising=False)
+        client, model = _resolve_forced_provider("openai")
+        assert client is None
+        assert model is None
+
    def test_forced_unknown_returns_none(self, monkeypatch):
        with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
             patch("agent.auxiliary_client._read_codex_access_token", return_value=None):
--- a/website/docs/user-guide/configuration.md
+++ b/website/docs/user-guide/configuration.md
@@ -476,15 +476,42 @@ AUXILIARY_VISION_MODEL=openai/gpt-4o

 ### Provider Options

-| Provider | Description |
-|----------|-------------|
-| `"auto"` | Best available (default). Vision only tries OpenRouter + Nous Portal. |
-| `"openrouter"` | Force OpenRouter (requires `OPENROUTER_API_KEY`) |
-| `"nous"` | Force Nous Portal (requires `hermes login`) |
-| `"main"` | Use your main chat model's provider. Useful for local/self-hosted models. |
+| Provider | Description | Requirements |
+|----------|-------------|-------------|
+| `"auto"` | Best available (default). Vision only tries OpenRouter + Nous Portal. | — |
+| `"openrouter"` | Force OpenRouter — routes to any model (Gemini, GPT-4o, Claude, etc.) | `OPENROUTER_API_KEY` |
+| `"nous"` | Force Nous Portal | `hermes login` |
+| `"openai"` | Force OpenAI direct API (`api.openai.com`). Supports vision (GPT-4o). | `OPENAI_API_KEY` |
+| `"main"` | Use your main chat model's provider. For local/self-hosted models. | Depends on your setup |
+
+### Common Setups
+
+**Using OpenAI for vision** (if you have an OpenAI API key):
+```yaml
+auxiliary:
+  vision:
+    provider: "openai"
+    model: "gpt-4o"       # or "gpt-4o-mini" for lower cost
+```
+
+**Using OpenRouter for vision** (route to any model):
+```yaml
+auxiliary:
+  vision:
+    provider: "openrouter"
+    model: "openai/gpt-4o"      # or "google/gemini-2.5-flash", etc.
+```
+
+**Using a local/self-hosted model:**
+```yaml
+auxiliary:
+  vision:
+    provider: "main"      # uses your OPENAI_BASE_URL endpoint
+    model: "my-local-model"
+```

 :::warning
-**Vision requires a multimodal model.** In `auto` mode, only OpenRouter and Nous Portal are tried because they support image input (via Gemini). If you set `provider: "main"`, make sure your endpoint supports multimodal/vision — otherwise image analysis will fail.
+**Vision requires a multimodal model.** In `auto` mode, only OpenRouter and Nous Portal are tried (they route to Gemini, which supports images). If you set `provider: "main"`, make sure your endpoint supports multimodal/vision — otherwise image analysis will fail. The `"openai"` provider works for vision since GPT-4o supports image input.
 :::

 ### Environment Variables