Commit Graph

4 Commits

Author SHA1 Message Date
Alexander Whitestone
671283389c feat: Wire Gemma 4 vision into browser_tool for screenshot analysis
All checks were successful
Lint / lint (pull_request) Successful in 8s
_get_vision_model() now resolves via a layered priority chain:
  1. BROWSER_VISION_MODEL env var (browser-specific override)
  2. config.yaml browser.vision_model
  3. AUXILIARY_VISION_MODEL env var (backward-compat shared override)
  4. google/gemma-4-27b-it — Gemma 4 native multimodal default

Add browser.vision_model config key to hermes_cli/config.py defaults
with inline documentation.

call_kwargs["model"] is now always set (model is never None), and a
debug log line records which model is in use for each screenshot.

Fixes #816

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 20:51:04 -04:00
Alexander Whitestone
da9c4cf10c feat: wire Gemma 4 vision into browser_tool for screenshot analysis
All checks were successful
Lint / lint (pull_request) Successful in 7s
Extends `_get_vision_model()` with a 5-level resolution chain:
1. `BROWSER_VISION_MODEL` env var — browser-specific override
2. `auxiliary.browser.vision_model` config key — per-install default
3. `AUXILIARY_VISION_MODEL` env var — backward-compat shared override
4. Auto-select `gemma-4-27b-it` when the main provider is Gemini/Google
5. `None` — fall through to `call_llm` vision router

Adds `_BROWSER_VISION_DEFAULT_MODEL = "gemma-4-27b-it"` constant and
registers `gemma-4-27b-it` in the Gemini provider model catalog.

16 new tests in `tests/tools/test_browser_vision_model.py` cover each
priority level, edge cases (empty env, config exceptions, wrong provider).

Fixes #816

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 18:18:30 -04:00
Alexander Whitestone
95bb842a21 feat: Wire Gemma 4 vision into browser_tool for screenshot analysis
All checks were successful
Lint / lint (pull_request) Successful in 8s
Default browser_vision screenshots to google/gemma-4-27b-it (Gemma 4
native multimodal) for reduced latency and unified text+vision model.

Resolution order for _get_vision_model():
1. BROWSER_VISION_MODEL env var (new, browser-specific override)
2. auxiliary.browser_vision.model in config.yaml (new config key)
3. AUXILIARY_VISION_MODEL env var (existing global vision override)
4. Default: google/gemma-4-27b-it

Backward compatibility: existing AUXILIARY_VISION_MODEL users are
unaffected — their override still flows through to browser_vision.

Also documents the new auxiliary.browser_vision config section in
cli-config.yaml.example and adds 14 unit tests covering the full
priority chain.

Fixes #816

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 17:14:32 -04:00
Alexander Whitestone
b6398b8b0d feat: wire Gemma 4 vision into browser_tool for screenshot analysis
All checks were successful
Lint / lint (pull_request) Successful in 19s
Default browser screenshot analysis now uses Gemma 4 27B
(google/gemma-4-27b-it) instead of deferring to the auxiliary router's
auto-detection.  Gemma 4 is natively multimodal — the same model family
already in use for text tasks — which avoids cold-start model-switching
overhead and improves context continuity.

Resolution order for _get_vision_model():
  1. BROWSER_VISION_MODEL env var (browser-specific override)
  2. auxiliary.browser_vision.model in config.yaml
  3. AUXILIARY_VISION_MODEL env var (shared/legacy override)
  4. google/gemma-4-27b-it (new default)

- Add _BROWSER_VISION_DEFAULT_MODEL constant to browser_tool.py
- Document auxiliary.browser_vision config key in cli-config.yaml.example
- Add 10 unit tests covering all resolution steps

Fixes #816

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 12:49:46 -04:00