fix: vision auto-detection now falls back to custom/local endpoints

In auto mode, vision previously tried only OpenRouter, Nous, and Codex for multimodal requests and deliberately skipped custom endpoints, on the assumption that they 'may not handle vision input.' This caused silent failures for users running local multimodal models (Qwen-VL, LLaVA, Pixtral, etc.) without any cloud API keys.

Custom endpoints are now tried as a last resort in auto mode. If the model doesn't support vision, the API call fails gracefully, and users with local vision models no longer need to manually set auxiliary.vision.provider: main in config.yaml.

Reported by @Spadav and @kotyKD.
2026-03-09 15:36:19 -07:00
parent 1a2141d04d
commit ef5d811aba
2 changed files with 16 additions and 8 deletions
--- a/agent/auxiliary_client.py
+++ b/agent/auxiliary_client.py
@@ -560,12 +560,16 @@ def get_vision_auxiliary_client() -> Tuple[Optional[OpenAI], Optional[str]]:
    forced = _get_auxiliary_provider("vision")
    if forced != "auto":
        return _resolve_forced_provider(forced)
-    # Auto: only multimodal-capable providers
-    for try_fn in (_try_openrouter, _try_nous, _try_codex):
+    # Auto: try providers known to support multimodal first, then fall
+    # back to the user's custom endpoint.  Many local models (Qwen-VL,
+    # LLaVA, Pixtral, etc.) support vision — skipping them entirely
+    # caused silent failures for local-only users.
+    for try_fn in (_try_openrouter, _try_nous, _try_codex,
+                   _try_custom_endpoint):
        client, model = try_fn()
        if client is not None:
            return client, model
-    logger.debug("Auxiliary vision client: none available (auto only tries OpenRouter/Nous/Codex)")
+    logger.debug("Auxiliary vision client: none available")
    return None, None