feat: Wire Gemma 4 vision into browser_tool for screenshot analysis

Default browser_vision screenshot analysis to google/gemma-4-27b-it
(Gemma 4, natively multimodal) to reduce latency and use a single model
for both text and vision.

Resolution order for _get_vision_model():

1. BROWSER_VISION_MODEL env var (new, browser-specific override)
2. auxiliary.browser_vision.model in config.yaml (new config key)
3. AUXILIARY_VISION_MODEL env var (existing global vision override)
4. Default: google/gemma-4-27b-it

Backward compatibility: existing AUXILIARY_VISION_MODEL users are
unaffected. Unless BROWSER_VISION_MODEL or auxiliary.browser_vision.model
is set, their override still flows through to browser_vision.

Also documents the new auxiliary.browser_vision config section in
cli-config.yaml.example and adds 14 unit tests covering the full
priority chain.

Fixes #816
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
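The four-step resolution order can be sketched in Python as shown below. This is a minimal illustration, not the code from this commit. `resolve_browser_vision_model`, `_non_empty`, and the `config`/`env` parameters are hypothetical. The sketch also assumes two things: config.yaml has already been parsed into a dict, and a blank value means "unset" and falls through to the next step. The second assumption is based on the `Set to "" to fall back` note in cli-config.yaml.example.

```python
import os
from typing import Any, Mapping, Optional

# Default taken from this commit; everything else is a hypothetical sketch,
# not the actual _get_vision_model() implementation.
DEFAULT_BROWSER_VISION_MODEL = "google/gemma-4-27b-it"


def _non_empty(value: Any) -> Optional[str]:
    """Return the stripped string, or None if the value is missing or blank."""
    if isinstance(value, str) and value.strip():
        return value.strip()
    return None


def resolve_browser_vision_model(
    config: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the browser_vision model using the four-step priority chain."""
    env = os.environ if env is None else env

    # 1. BROWSER_VISION_MODEL: browser-specific env override.
    model = _non_empty(env.get("BROWSER_VISION_MODEL"))
    if model:
        return model

    # 2. auxiliary.browser_vision.model from the parsed config.yaml.
    #    `or {}` covers sections that are absent or present but empty (None).
    browser_vision = (config.get("auxiliary") or {}).get("browser_vision") or {}
    model = _non_empty(browser_vision.get("model"))
    if model:
        return model

    # 3. AUXILIARY_VISION_MODEL: existing global vision override.
    model = _non_empty(env.get("AUXILIARY_VISION_MODEL"))
    if model:
        return model

    # 4. Built-in default.
    return DEFAULT_BROWSER_VISION_MODEL


if __name__ == "__main__":
    cfg = {"auxiliary": {"browser_vision": {"model": "google/gemini-2.5-flash"}}}
    print(resolve_browser_vision_model(cfg, env={}))  # google/gemini-2.5-flash
    print(resolve_browser_vision_model(cfg, env={"BROWSER_VISION_MODEL": "example/model"}))  # example/model
    print(resolve_browser_vision_model({}, env={"AUXILIARY_VISION_MODEL": "example/global"}))  # example/global
    print(resolve_browser_vision_model({}, env={}))  # google/gemma-4-27b-it
```

Passing `env` explicitly keeps the function free of global state. A priority-chain test can then feed in plain dicts instead of patching `os.environ`. The stated order also has a consequence for existing users: an uncommented `auxiliary.browser_vision.model` in config.yaml shadows `AUXILIARY_VISION_MODEL`.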
2026-04-21 17:12:58 -04:00
parent 12b5d9a7fd
commit 95bb842a21
3 changed files with 34 additions and 2 deletions
--- a/cli-config.yaml.example
+++ b/cli-config.yaml.example
@@ -360,6 +360,7 @@ compression:
 #   # Defaults to Gemma 4 27B — natively multimodal, same model family as the main
 #   # text model, which avoids model-switching overhead and improves context continuity.
 #   # Override with any vision-capable model.  Set to "" to fall back to auto-detection.
+#   # Can also be overridden per-session with BROWSER_VISION_MODEL env var.
 #   browser_vision:
 #     model: "google/gemma-4-27b-it"  # default; override e.g. "google/gemini-2.5-flash"
 #
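The config comment notes that `BROWSER_VISION_MODEL` can override the model per session. One way to scope that override to a single run is to pass a modified copy of the environment to a child process. This leaves the parent process and shell untouched. The Python sketch below is hedged on two points. `run_with_browser_vision_model` is a hypothetical helper. The probe command is a stand-in for the real CLI entry point, which this diff does not name.

```python
import os
import subprocess
import sys


def run_with_browser_vision_model(model: str, argv: list[str]) -> subprocess.CompletedProcess:
    """Run argv in a child process with BROWSER_VISION_MODEL set only for that child."""
    # Copy the current environment so the override never touches this process.
    child_env = {**os.environ, "BROWSER_VISION_MODEL": model}
    return subprocess.run(argv, env=child_env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Stand-in for the real CLI: a Python one-liner that reports what it sees.
    probe = [
        sys.executable,
        "-c",
        "import os; print(os.environ.get('BROWSER_VISION_MODEL', '<unset>'))",
    ]
    result = run_with_browser_vision_model("google/gemini-2.5-flash", probe)
    print("child sees:", result.stdout.strip())
    print("parent sees:", os.environ.get("BROWSER_VISION_MODEL", "<unset>"))
```

From a POSIX shell, the equivalent is to prefix the command with `BROWSER_VISION_MODEL=google/gemini-2.5-flash`. The variable then applies only to that one command.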