Some checks failed
Docker Build and Publish / build-and-push (pull_request) Has been skipped
Contributor Attribution Check / check-attribution (pull_request) Failing after 35s
Supply Chain Audit / Scan PR for supply chain risks (pull_request) Successful in 34s
Tests / e2e (pull_request) Successful in 2m45s
Tests / test (pull_request) Failing after 17m0s
Vision benchmark used external URLs that may become unavailable, causing flaky CI runs.

New benchmarks/test_images.json:
- 5 test images with local paths, descriptions, expected answers
- Categories: shape_color, ocr, counting

New benchmarks/test_images/:
- 5 generated PNG test images (red_circle, blue_square, green_triangle, text_hello, mixed_shapes)
- Deterministic, always available, ~1-3KB each

New benchmarks/vision_benchmark.py:
- load_test_dataset(): loads test_images.json
- verify_images_exist(): checks all images present
- run_vision_test(): single test with base64 image encoding
- evaluate_response(): checks expected keywords in response
- run_benchmark(): full benchmark suite
- format_report(): human-readable results
- --model, --base-url, --json flags

Closes #868
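For reviewers, here is a minimal sketch of how the local-image flow described above might fit together. It is not the code in benchmarks/vision_benchmark.py. The JSON field name `path`, the helper names `missing_images` and `encode_image`, and the rule that every expected keyword must appear (case-insensitively) are assumptions made for illustration only.

```python
import base64
import json
from pathlib import Path


def load_test_dataset(path):
    """Load the list of test cases from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def missing_images(dataset, root):
    """Return the image paths listed in the dataset that are absent on disk.

    Assumes each case is a dict with a hypothetical "path" key holding a
    path relative to `root`.
    """
    return [
        case["path"]
        for case in dataset
        if not (Path(root) / case["path"]).is_file()
    ]


def encode_image(path):
    """Read an image file and return its bytes as a base64 ASCII string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def evaluate_response(response, expected_keywords):
    """Return True if every expected keyword occurs in the response.

    Matching is a case-insensitive substring check; this is an assumed
    policy, the real benchmark may score differently.
    """
    text = response.lower()
    return all(keyword.lower() in text for keyword in expected_keywords)
```

Under these assumptions, `evaluate_response("A red circle", ["red", "circle"])` passes, while a response that mentions only one of the two keywords fails. Because the images are read from disk, a run never depends on a remote host being reachable.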