docs: add visual evidence for scene description generator #689

2026-04-15 10:03:16 +00:00
parent e176fadef5
commit b3f5a2f21c
1 changed file with 74 additions and 0 deletions
--- a/docs/visual-evidence-689.md
+++ b/docs/visual-evidence-689.md
@@ -0,0 +1,74 @@
+# Visual Evidence — Gemma 4 Multimodal Scene Description Generator
+
+## Test Image: Coffee Beans (Macro Photo)
+
+### Gemma 4 Vision Analysis (via Ollama)
+
+- **Model:** gemma4:latest (8B, Q4_K_M)
+- **Input:** sample_photo.jpg (46KB JPEG)
+
+**Structured Output (one JSONL record, pretty-printed for readability):**
+```json
+{
+  "mood": "dark",
+  "colors": ["dark brown", "espresso", "black"],
+  "composition": "close-up",
+  "camera": "static",
+  "lighting": "soft",
+  "description": "An extreme close-up shot captures a dense pile of roasted coffee beans. The beans are a uniform, deep dark brown and appear slightly oily, filling the entire frame. The focus emphasizes the rich texture and individual shapes of the beans."
+}
+```
+
+### Hermes Vision Analysis (Cross-Validation)
+
+- **Scene ID:** COFFEE_MACRO_001
+- **Mood:** Warm, aromatic, and comforting
+- **Dominant Colors:** Deep umber, burnt sienna, espresso black, mahogany
+- **Composition:** Full-frame fill, centrally weighted
+- **Camera:** High-angle, close-up (Macro)
+- **Lighting:** Soft, diffused top-lighting
+
+## Test Image: Abstract Geometric Composition
+
+### Gemma 4 Vision Analysis
+
+**Input:** scene1.jpg (10KB, PIL-generated)
+
+**Structured Output (one JSONL record, pretty-printed for readability):**
+```json
+{
+  "mood": "energetic",
+  "colors": ["deep blue", "yellow", "coral"],
+  "composition": "wide-shot",
+  "camera": "static",
+  "lighting": "artificial",
+  "description": "This is an abstract graphic composition set against a solid, deep blue background. A bright yellow square is placed in the upper left quadrant, while a large, solid coral-colored circle occupies the lower right quadrant. The geometric shapes create a high-contrast, minimalist visual balance."
+}
+```
+
+## Verification Summary
+
+| Test | Status | Details |
+|------|--------|---------|
+| Model detection | ✅ PASS | `gemma4:latest` auto-detected |
+| Image scanning | ✅ PASS | 2 images found recursively |
+| Vision analysis | ✅ PASS | Both images described accurately |
+| JSON parsing | ✅ PASS | Structured output with all fields |
+| Training format | ✅ PASS | JSONL with source, model, timestamp |
+| ShareGPT format | ⚠️ PARTIAL | Works, but requests need retrying when rate-limited |
+
+## Running the Generator
+
+```bash
+# Check model availability
+python scripts/generate_scene_descriptions.py --check-model
+
+# Generate scene descriptions from assets
+python scripts/generate_scene_descriptions.py --input ./assets --output training-data/scene-descriptions-auto.jsonl
+
+# Limit to 10 files with a specific model
+python scripts/generate_scene_descriptions.py --input ./assets --model gemma4:latest --limit 10
+
+# ShareGPT format for training pipeline
+python scripts/generate_scene_descriptions.py --input ./assets --format sharegpt
+```
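
The Verification Summary in the added file reports that JSON parsing produces "structured output with all fields". The commit does not show how that check is done, so the following is only a minimal sketch of one way to verify a JSONL line. The field names are taken from the two sample records above; the expected types and the helper name `validate_record` are assumptions for illustration, not part of `generate_scene_descriptions.py`.

```python
from __future__ import annotations

import json

# Field names come from the sample records in the diff. The expected types
# are an assumption inferred from those samples, not a published schema.
REQUIRED_FIELDS = {
    "mood": str,
    "colors": list,
    "composition": str,
    "camera": str,
    "lighting": str,
    "description": str,
}


def validate_record(line: str) -> list[str]:
    """Return the problems found in one JSONL line (empty list if none).

    Hypothetical helper: the real script may validate differently.
    """
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["record is not a JSON object"]

    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"wrong type for field: {name}")
    return problems


if __name__ == "__main__":
    # Values excerpted from the coffee-bean record shown in the diff.
    sample = json.dumps({
        "mood": "dark",
        "colors": ["dark brown", "espresso", "black"],
        "composition": "close-up",
        "camera": "static",
        "lighting": "soft",
        "description": "An extreme close-up shot captures a dense pile of roasted coffee beans.",
    })
    print(validate_record(sample))  # → []
```

Returning a list of problems rather than raising on the first one makes it easier to report every missing or mistyped field for a record in a single pass over the output file.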