feat: wire Gemma 4 vision into browser_tool for screenshot analysis
All checks were successful
Lint / lint (pull_request) Successful in 19s
Default browser screenshot analysis now uses Gemma 4 27B (google/gemma-4-27b-it) instead of deferring to the auxiliary router's auto-detection. Gemma 4 is natively multimodal and belongs to the same model family already in use for text tasks, which avoids cold-start model-switching overhead and improves context continuity.

Resolution order for _get_vision_model():
1. BROWSER_VISION_MODEL env var (browser-specific override)
2. auxiliary.browser_vision.model in config.yaml
3. AUXILIARY_VISION_MODEL env var (shared/legacy override)
4. google/gemma-4-27b-it (new default)

- Add _BROWSER_VISION_DEFAULT_MODEL constant to browser_tool.py
- Document auxiliary.browser_vision config key in cli-config.yaml.example
- Add 10 unit tests covering all resolution steps

Fixes #816

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
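The resolution order described in the commit message can be sketched roughly as follows. This is an illustration only, not the code from browser_tool.py: the function name resolve_vision_model, its parameters, and the shape of the parsed config dict are hypothetical. The sketch also assumes that empty or whitespace-only values count as unset and fall through to the next step; the special handling of "" mentioned in the example config (fall back to auto-detection) is not modelled here.

```python
import os
from typing import Any, Mapping, Optional

# Constant name taken from the commit message; the value is the documented default.
_BROWSER_VISION_DEFAULT_MODEL = "google/gemma-4-27b-it"


def resolve_vision_model(
    config: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical sketch of the four-step lookup; not the project's actual function."""
    env = os.environ if env is None else env

    # 1. Browser-specific environment override.
    model = (env.get("BROWSER_VISION_MODEL") or "").strip()
    if model:
        return model

    # 2. auxiliary.browser_vision.model from the parsed config.yaml
    #    (assumed here to be a plain nested dict).
    auxiliary = config.get("auxiliary") or {}
    browser_vision = auxiliary.get("browser_vision") or {}
    model = str(browser_vision.get("model") or "").strip()
    if model:
        return model

    # 3. Shared/legacy environment override.
    model = (env.get("AUXILIARY_VISION_MODEL") or "").strip()
    if model:
        return model

    # 4. New default.
    return _BROWSER_VISION_DEFAULT_MODEL
```

Passing env explicitly keeps the lookup testable without mutating the process environment; calling resolve_vision_model(parsed_config) alone reads from os.environ.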
@@ -348,7 +348,7 @@ compression:
 # Other providers pick a sensible default automatically.
 #
 # auxiliary:
-#   # Image analysis: vision_analyze tool + browser screenshots
+#   # Image analysis: vision_analyze tool
 #   vision:
 #     provider: "auto"
 #     model: ""  # e.g. "google/gemini-2.5-flash", "openai/gpt-4o"
@@ -356,6 +356,13 @@ compression:
 #     download_timeout: 30  # Image HTTP download timeout (seconds)
 #     # Increase for slow connections or self-hosted image servers
 #
+#   # Browser screenshot analysis (browser_vision tool)
+#   # Defaults to Gemma 4 27B — natively multimodal, same model family as the main
+#   # text model, which avoids model-switching overhead and improves context continuity.
+#   # Override with any vision-capable model. Set to "" to fall back to auto-detection.
+#   browser_vision:
+#     model: "google/gemma-4-27b-it"  # default; override e.g. "google/gemini-2.5-flash"
+#
 #   # Web page scraping / summarization + browser page text extraction
 #   web_extract:
 #     provider: "auto"