feat: Wire Gemma 4 vision into browser_tool for screenshot analysis

Default browser_vision screenshot analysis to google/gemma-4-27b-it
(Gemma 4, natively multimodal) to reduce latency and to use a single
model for both text and vision.

Resolution order for _get_vision_model():
1. BROWSER_VISION_MODEL env var (new, browser-specific override)
2. auxiliary.browser_vision.model in config.yaml (new config key)
3. AUXILIARY_VISION_MODEL env var (existing global vision override)
4. Default: google/gemma-4-27b-it
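
The resolution order above can be sketched roughly as follows. This is a minimal illustration, not the actual body of _get_vision_model(): the function name resolve_browser_vision_model, the constant name, the dict-shaped config argument, and the choice to treat empty or whitespace-only values as unset are all assumptions made for the example.

```python
import os

# Default named in this commit; the constant name itself is hypothetical.
DEFAULT_BROWSER_VISION_MODEL = "google/gemma-4-27b-it"


def resolve_browser_vision_model(config, env=None):
    """Pick the model for browser screenshot analysis.

    Hypothetical stand-in for _get_vision_model(). `config` is assumed to be
    the parsed config.yaml as a plain dict; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env

    # 1. Browser-specific env override (new in this commit).
    model = env.get("BROWSER_VISION_MODEL", "").strip()
    if model:
        return model

    # 2. auxiliary.browser_vision.model in config.yaml (new config key).
    #    Missing or null sections are tolerated and treated as unset.
    auxiliary = config.get("auxiliary") or {}
    browser_vision = auxiliary.get("browser_vision") or {}
    model = str(browser_vision.get("model") or "").strip()
    if model:
        return model

    # 3. Existing global vision override.
    model = env.get("AUXILIARY_VISION_MODEL", "").strip()
    if model:
        return model

    # 4. Fall back to the default.
    return DEFAULT_BROWSER_VISION_MODEL
```

Passing env explicitly keeps the priority chain testable without touching the real process environment, for example resolve_browser_vision_model({}, env={"AUXILIARY_VISION_MODEL": "my-model"}).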

Backward compatibility: existing AUXILIARY_VISION_MODEL users are
unaffected. Their override still flows through to browser_vision
unless one of the two new, higher-priority settings is also set.

Also documents the new auxiliary.browser_vision config section in
cli-config.yaml.example and adds 14 unit tests covering the full
priority chain.

Fixes #816

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Alexander Whitestone
2026-04-21 17:12:58 -04:00
parent 12b5d9a7fd
commit 95bb842a21
3 changed files with 34 additions and 2 deletions

@@ -360,6 +360,7 @@ compression:
# # Defaults to Gemma 4 27B — natively multimodal, same model family as the main
# # text model, which avoids model-switching overhead and improves context continuity.
# # Override with any vision-capable model. Set to "" to fall back to auto-detection.
# # Can also be overridden per-session with BROWSER_VISION_MODEL env var.
# browser_vision:
# model: "google/gemma-4-27b-it" # default; override e.g. "google/gemini-2.5-flash"
#