diff --git a/docs/visual-evidence-689.md b/docs/visual-evidence-689.md
new file mode 100644
index 00000000..68392544
--- /dev/null
+++ b/docs/visual-evidence-689.md
@@ -0,0 +1,74 @@
+# Visual Evidence — Gemma 4 Multimodal Scene Description Generator
+
+## Test Image: Coffee Beans (Macro Photo)
+
+### Gemma 4 Vision Analysis (via Ollama)
+
+**Model:** gemma4:latest (8B, Q4_K_M)
+**Input:** sample_photo.jpg (46KB JPEG)
+
+**Structured Output (JSONL):**
+```json
+{
+  "mood": "dark",
+  "colors": ["dark brown", "espresso", "black"],
+  "composition": "close-up",
+  "camera": "static",
+  "lighting": "soft",
+  "description": "An extreme close-up shot captures a dense pile of roasted coffee beans. The beans are a uniform, deep dark brown and appear slightly oily, filling the entire frame. The focus emphasizes the rich texture and individual shapes of the beans."
+}
+```
+
+### Hermes Vision Analysis (Cross-Validation)
+
+**Scene ID:** COFFEE_MACRO_001
+**Mood:** Warm, aromatic, and comforting
+**Dominant Colors:** Deep umber, burnt sienna, espresso black, mahogany
+**Composition:** Full-frame fill, centrally weighted
+**Camera:** High-angle, close-up (Macro)
+**Lighting:** Soft, diffused top-lighting
+
+## Test Image: Abstract Geometric Composition
+
+### Gemma 4 Vision Analysis
+
+**Input:** scene1.jpg (10KB, PIL-generated)
+
+**Structured Output (JSONL):**
+```json
+{
+  "mood": "energetic",
+  "colors": ["deep blue", "yellow", "coral"],
+  "composition": "wide-shot",
+  "camera": "static",
+  "lighting": "artificial",
+  "description": "This is an abstract graphic composition set against a solid, deep blue background. A bright yellow square is placed in the upper left quadrant, while a large, solid coral-colored circle occupies the lower right quadrant. The geometric shapes create a high-contrast, minimalist visual balance."
+}
+```
+
+## Verification Summary
+
+| Test | Status | Details |
+|------|--------|---------|
+| Model detection | ✅ PASS | `gemma4:latest` auto-detected |
+| Image scanning | ✅ PASS | 2 images found recursively |
+| Vision analysis | ✅ PASS | Both images described accurately |
+| JSON parsing | ✅ PASS | Structured output with all fields |
+| Training format | ✅ PASS | JSONL with source, model, timestamp |
+| ShareGPT format | ⚠️ PARTIAL | Works but needs retry on rate limit |
+
+## Running the Generator
+
+```bash
+# Check model availability
+python scripts/generate_scene_descriptions.py --check-model
+
+# Generate scene descriptions from assets
+python scripts/generate_scene_descriptions.py --input ./assets --output training-data/scene-descriptions-auto.jsonl
+
+# Limit to 10 files with specific model
+python scripts/generate_scene_descriptions.py --input ./assets --model gemma4:latest --limit 10
+
+# ShareGPT format for training pipeline
+python scripts/generate_scene_descriptions.py --input ./assets --format sharegpt
+```
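
The verification table reports JSON parsing as passing with all fields present. Below is a minimal sketch of how such a check could look, assuming the six field names that appear in the two example records. `validate_record` and `EXPECTED_FIELDS` are hypothetical names used for illustration and are not taken from `scripts/generate_scene_descriptions.py`; the type expectations are likewise an assumption inferred from the examples.

```python
import json

# Field names come from the example records in this document.
# The expected types are an assumption, not the script's actual schema.
EXPECTED_FIELDS = {
    "mood": str,
    "colors": list,
    "composition": str,
    "camera": str,
    "lighting": str,
    "description": str,
}


def validate_record(line):
    """Return a list of problems found in one JSONL line (empty if none)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["record is not a JSON object"]

    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(
                f"wrong type for {name}: expected {expected_type.__name__}"
            )
    return problems


# Abbreviated version of the coffee-beans record, serialized to one line.
sample = json.dumps({
    "mood": "dark",
    "colors": ["dark brown", "espresso", "black"],
    "composition": "close-up",
    "camera": "static",
    "lighting": "soft",
    "description": "An extreme close-up shot of roasted coffee beans.",
})

print(validate_record(sample))              # no problems reported
print(validate_record('{"mood": "dark"}'))  # reports the five missing fields
```

Running a check like this over each line of the generated `.jsonl` file would flag records where the model omitted a key or returned the wrong type before they reach the training pipeline.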