Some checks failed
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Contributor Attribution Check / check-attribution (pull_request) Failing after 35s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 34s
Tests / e2e (pull_request) Successful in 2m45s
Tests / test (pull_request) Failing after 17m0s
The vision benchmark used external URLs that may become unavailable, causing flaky CI runs.

New benchmarks/test_images.json:
- 5 test images with local paths, descriptions, expected answers
- Categories: shape_color, ocr, counting

New benchmarks/test_images/:
- 5 generated PNG test images (red_circle, blue_square, green_triangle, text_hello, mixed_shapes)
- Deterministic, always available, ~1-3KB each

New benchmarks/vision_benchmark.py:
- load_test_dataset(): loads test_images.json
- verify_images_exist(): checks all images present
- run_vision_test(): single test with base64 image encoding
- evaluate_response(): checks expected keywords in response
- run_benchmark(): full benchmark suite
- format_report(): human-readable results
- --model, --base-url, --json flags

Closes #868
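The body of evaluate_response() is not shown here, so the following is only a minimal sketch of what a keyword check of this kind could look like. The signature, the case-insensitive substring matching, and the shape of the returned dict are all assumptions, not the code from vision_benchmark.py.

```python
def evaluate_response(response, expected_keywords):
    """Check that every expected keyword occurs in the model response.

    Hypothetical sketch: matching is a case-insensitive substring test,
    and the result dict layout is an assumption made for illustration.
    """
    text = response.lower()
    # Collect the keywords that do not appear anywhere in the response.
    missing = [kw for kw in expected_keywords if kw.lower() not in text]
    return {"passed": not missing, "missing": missing}


if __name__ == "__main__":
    result = evaluate_response("I see a Red circle.", ["red", "circle"])
    print(result["passed"], result["missing"])
```

Substring matching is deliberately lenient: a response such as "reddish circle" would still pass for the keyword "red". A stricter variant could compare whole words instead.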
43 lines
1.3 KiB
JSON
[
  {
    "id": "img_001",
    "name": "red_circle",
    "path": "benchmarks/test_images/red_circle.png",
    "description": "A red circle on a white background",
    "expected_answer_contains": ["red", "circle"],
    "category": "shape_color"
  },
  {
    "id": "img_002",
    "name": "blue_square",
    "path": "benchmarks/test_images/blue_square.png",
    "description": "A blue square on a white background",
    "expected_answer_contains": ["blue", "square"],
    "category": "shape_color"
  },
  {
    "id": "img_003",
    "name": "green_triangle",
    "path": "benchmarks/test_images/green_triangle.png",
    "description": "A green triangle on a white background",
    "expected_answer_contains": ["green", "triangle"],
    "category": "shape_color"
  },
  {
    "id": "img_004",
    "name": "text_hello",
    "path": "benchmarks/test_images/text_hello.png",
    "description": "An image containing the text 'Hello World'",
    "expected_answer_contains": ["hello", "world"],
    "category": "ocr"
  },
  {
    "id": "img_005",
    "name": "mixed_shapes",
    "path": "benchmarks/test_images/mixed_shapes.png",
    "description": "Multiple colored shapes: red circle, blue square, yellow star",
    "expected_answer_contains": ["red", "blue", "yellow"],
    "category": "counting"
  }
]
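A manifest like this one can be loaded and checked against the filesystem before any model call is made. The sketch below reuses the function names from the commit message, but the signatures, the repo_root parameter, and the return values are assumptions; the actual vision_benchmark.py may differ.

```python
import json
from pathlib import Path


def load_test_dataset(dataset_path):
    """Read the JSON manifest and return its list of entries.

    Hypothetical signature: the real function may hard-code the path.
    """
    with open(dataset_path, encoding="utf-8") as f:
        entries = json.load(f)
    if not isinstance(entries, list):
        raise ValueError("expected a top-level JSON array of test entries")
    return entries


def verify_images_exist(entries, repo_root="."):
    """Return the 'path' values that are not regular files under repo_root.

    An empty list means every image is present. Returning the missing
    paths (rather than a bool) is an assumption made for illustration.
    """
    root = Path(repo_root)
    return [e["path"] for e in entries if not (root / e["path"]).is_file()]


if __name__ == "__main__":
    manifest = Path("benchmarks/test_images.json")
    if manifest.is_file():
        missing = verify_images_exist(load_test_dataset(manifest))
        print("missing images:", missing)
    else:
        print("manifest not found; run from the repository root")
```

Because the paths in the manifest are relative to the repository root, running the check from another directory requires passing repo_root explicitly.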