feat: add 'openai' as auxiliary provider option

Users can now set provider: "openai" for auxiliary tasks (vision, web
extract, compression) to use OpenAI's API directly with their
OPENAI_API_KEY. This hits api.openai.com/v1 with gpt-4o-mini as the
default model — supports vision since GPT-4o handles image input.

Provider options are now: auto, openrouter, nous, openai, main.

Changes:
- agent/auxiliary_client.py: added _try_openai(), "openai" case in
  _resolve_forced_provider(), updated auxiliary_max_tokens_param()
  to use max_completion_tokens for OpenAI
- Updated docs: cli-config.yaml.example, AGENTS.md, and user-facing
  configuration.md with Common Setups section showing OpenAI,
  OpenRouter, and local model examples
- 3 new tests for OpenAI provider resolution

Tests: 2459 passed (was 2429).
This commit is contained in:
teknium1
2026-03-08 18:25:30 -07:00
parent 169615abc8
commit ae4a674c84
5 changed files with 89 additions and 9 deletions

View File

@@ -744,9 +744,10 @@ compression:
| `"auto"` | Best available (default). For vision, only tries OpenRouter + Nous. |
| `"openrouter"` | Force OpenRouter (requires `OPENROUTER_API_KEY`) |
| `"nous"` | Force Nous Portal (requires `hermes login`) |
| `"openai"` | Force OpenAI direct API at `api.openai.com` (requires `OPENAI_API_KEY`). Supports vision via GPT-4o. |
| `"main"` | Use the same provider as your main chat model. Skips OpenRouter/Nous. Useful for local models. |
**Important:** Vision tasks require a multimodal-capable model. In `auto` mode, only OpenRouter and Nous Portal are tried (they route to Gemini, which supports images). Setting `provider: "main"` for vision will work only if your main endpoint supports multimodal input.
**Important:** Vision tasks require a multimodal-capable model. In `auto` mode, only OpenRouter and Nous Portal are tried (they route to Gemini, which supports images). The `"openai"` provider also works for vision since GPT-4o supports image input. Setting `provider: "main"` for vision will work only if your main endpoint supports multimodal input.
**Key files:** `agent/auxiliary_client.py` (resolution chain), `tools/vision_tools.py`, `tools/browser_tool.py`, `tools/web_tools.py`