Timmy
eed87e454e
Contributor Attribution Check / check-attribution (pull_request) Successful in 26s
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 26s
Tests / e2e (pull_request) Successful in 2m38s
Tests / test (pull_request) Failing after 47m49s
test: Benchmark Gemma 4 vision accuracy vs current approach (#817)
Vision benchmark suite comparing Gemma 4 (google/gemma-4-27b-it)
against the current model, Gemini 3 Flash Preview
(google/gemini-3-flash-preview).
Metrics:
- OCR accuracy (character + word overlap)
- Description completeness (keyword coverage)
- Structural quality (length, sentences, numbers)
- Latency (ms per image)
- Token usage
- Consistency across runs
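The commit names these metrics without defining how each score is computed. Below is a minimal sketch of one plausible set of definitions, assuming every score is normalized to [0, 1]. The helper names (`char_overlap`, `word_overlap`, `keyword_coverage`, `consistency`) are hypothetical. Using `difflib.SequenceMatcher` for character overlap and word-set recall for word overlap are assumptions, not necessarily what the benchmark does.

```python
import difflib


def char_overlap(expected: str, actual: str) -> float:
    """Character-level similarity in [0, 1] (assumed: difflib ratio, case-insensitive)."""
    return difflib.SequenceMatcher(None, expected.lower(), actual.lower()).ratio()


def word_overlap(expected: str, actual: str) -> float:
    """Fraction of expected words found in the model output (word-level recall)."""
    expected_words = set(expected.lower().split())
    if not expected_words:
        return 1.0
    actual_words = set(actual.lower().split())
    return len(expected_words & actual_words) / len(expected_words)


def keyword_coverage(keywords: list[str], description: str) -> float:
    """Share of expected keywords that appear anywhere in the description."""
    if not keywords:
        return 1.0
    text = description.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits / len(keywords)


def consistency(outputs: list[str]) -> float:
    """Mean pairwise character similarity across repeated runs on one image."""
    if len(outputs) < 2:
        return 1.0
    scores = [
        char_overlap(a, b)
        for i, a in enumerate(outputs)
        for b in outputs[i + 1:]
    ]
    return sum(scores) / len(scores)


# Usage: a transcription that contains every expected word scores full recall.
print(word_overlap("Total: 42 USD", "total: 42 usd due"))
```

Word overlap is measured as recall, so extra words in the output are not penalized. A precision or F1 variant would penalize verbose answers, which may or may not be desirable for OCR.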
Features:
- 24 diverse test images (screenshots, diagrams, photos, charts)
- Category-specific evaluation prompts
- Automated verdict with composite scoring
- JSON + markdown report output
- 28 unit tests passing
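The composite score and verdict logic are not described in the commit. The sketch below shows one way it could work, under stated assumptions: per-model quality metrics are already averaged into [0, 1], the `WEIGHTS` values are hypothetical, and latency only breaks ties when the quality difference falls within a small margin. `ModelResult`, `composite`, and `verdict` are illustrative names, not the script's API.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    # Hypothetical per-model aggregates; quality scores are assumed to be in [0, 1].
    ocr_accuracy: float
    completeness: float
    structure: float
    consistency: float
    latency_ms: float


# Illustrative weights only (they sum to 1.0); the real benchmark may weight differently.
WEIGHTS = {"ocr_accuracy": 0.35, "completeness": 0.30, "structure": 0.15, "consistency": 0.20}


def composite(result: ModelResult) -> float:
    """Weighted sum of the quality metrics."""
    return sum(getattr(result, name) * weight for name, weight in WEIGHTS.items())


def verdict(candidate: ModelResult, baseline: ModelResult, margin: float = 0.02) -> str:
    """Pick the better model; fall back to latency when quality is within `margin`."""
    diff = composite(candidate) - composite(baseline)
    if diff > margin:
        return "candidate"
    if diff < -margin:
        return "baseline"
    # Quality is effectively tied, so prefer the faster model.
    return "candidate" if candidate.latency_ms < baseline.latency_ms else "baseline"


# Usage: a hypothetical Gemma 4 result against a hypothetical Gemini baseline.
gemma = ModelResult(0.9, 0.8, 0.7, 0.9, latency_ms=1200.0)
gemini = ModelResult(0.7, 0.6, 0.7, 0.8, latency_ms=800.0)
print(verdict(gemma, gemini))
```

Treating latency as a tie-breaker rather than a weighted term keeps the score unit-free. Folding latency into the weighted sum would require normalizing milliseconds against some reference value.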
Usage:
python benchmarks/vision_benchmark.py --images benchmarks/test_images.json
python benchmarks/vision_benchmark.py --url https://example.com/img.png
python benchmarks/vision_benchmark.py --generate-dataset
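The three invocations above suggest three input modes. Here is a minimal `argparse` sketch of how those flags might be declared. It assumes the modes are mutually exclusive and that exactly one is required per run; the commit does not confirm either point, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Vision benchmark (sketch)")
    # Assumption: each run uses exactly one input mode.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--images", metavar="PATH", help="JSON file listing test images")
    mode.add_argument("--url", help="benchmark a single image URL")
    mode.add_argument(
        "--generate-dataset",
        action="store_true",
        help="generate the built-in test image dataset",
    )
    return parser


# Usage: mirrors the second example command above.
args = build_parser().parse_args(["--url", "https://example.com/img.png"])
print(args.url, args.images, args.generate_dataset)
```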
Closes #817.
2026-04-15 23:02:02 -04:00