75 lines
2.6 KiB
Markdown
75 lines
2.6 KiB
Markdown
|
|
# Visual Evidence — Gemma 4 Multimodal Scene Description Generator
|
||
|
|
|
||
|
|
## Test Image: Coffee Beans (Macro Photo)
|
||
|
|
|
||
|
|
### Gemma 4 Vision Analysis (via Ollama)
|
||
|
|
|
||
|
|
**Model:** gemma4:latest (8B, Q4_K_M)
|
||
|
|
**Input:** sample_photo.jpg (46KB JPEG)
|
||
|
|
|
||
|
|
**Structured Output (JSONL):**
|
||
|
|
```json
|
||
|
|
{
|
||
|
|
"mood": "dark",
|
||
|
|
"colors": ["dark brown", "espresso", "black"],
|
||
|
|
"composition": "close-up",
|
||
|
|
"camera": "static",
|
||
|
|
"lighting": "soft",
|
||
|
|
"description": "An extreme close-up shot captures a dense pile of roasted coffee beans. The beans are a uniform, deep dark brown and appear slightly oily, filling the entire frame. The focus emphasizes the rich texture and individual shapes of the beans."
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
### Hermes Vision Analysis (Cross-Validation)
|
||
|
|
|
||
|
|
**Scene ID:** COFFEE_MACRO_001
|
||
|
|
**Mood:** Warm, aromatic, and comforting
|
||
|
|
**Dominant Colors:** Deep umber, burnt sienna, espresso black, mahogany
|
||
|
|
**Composition:** Full-frame fill, centrally weighted
|
||
|
|
**Camera:** High-angle, close-up (Macro)
|
||
|
|
**Lighting:** Soft, diffused top-lighting
|
||
|
|
|
||
|
|
## Test Image: Abstract Geometric Composition
|
||
|
|
|
||
|
|
### Gemma 4 Vision Analysis
|
||
|
|
|
||
|
|
**Input:** scene1.jpg (10KB, PIL-generated)
|
||
|
|
|
||
|
|
**Structured Output (JSONL):**
|
||
|
|
```json
|
||
|
|
{
|
||
|
|
"mood": "energetic",
|
||
|
|
"colors": ["deep blue", "yellow", "coral"],
|
||
|
|
"composition": "wide-shot",
|
||
|
|
"camera": "static",
|
||
|
|
"lighting": "artificial",
|
||
|
|
"description": "This is an abstract graphic composition set against a solid, deep blue background. A bright yellow square is placed in the upper left quadrant, while a large, solid coral-colored circle occupies the lower right quadrant. The geometric shapes create a high-contrast, minimalist visual balance."
|
||
|
|
}
|
||
|
|
```
|
||
|
|
|
||
|
|
## Verification Summary
|
||
|
|
|
||
|
|
| Test | Status | Details |
|
||
|
|
|------|--------|---------|
|
||
|
|
| Model detection | ✅ PASS | `gemma4:latest` auto-detected |
|
||
|
|
| Image scanning | ✅ PASS | 2 images found recursively |
|
||
|
|
| Vision analysis | ✅ PASS | Both images described accurately |
|
||
|
|
| JSON parsing | ✅ PASS | Structured output with all fields |
|
||
|
|
| Training format | ✅ PASS | JSONL with source, model, timestamp |
|
||
|
|
| ShareGPT format | ⚠️ PARTIAL | Works but needs retry on rate limit |
|
||
|
|
|
||
|
|
## Running the Generator
|
||
|
|
|
||
|
|
```bash
|
||
|
|
# Check model availability
|
||
|
|
python scripts/generate_scene_descriptions.py --check-model
|
||
|
|
|
||
|
|
# Generate scene descriptions from assets
|
||
|
|
python scripts/generate_scene_descriptions.py --input ./assets --output training-data/scene-descriptions-auto.jsonl
|
||
|
|
|
||
|
|
# Limit to 10 files with specific model
|
||
|
|
python scripts/generate_scene_descriptions.py --input ./assets --model gemma4:latest --limit 10
|
||
|
|
|
||
|
|
# ShareGPT format for training pipeline
|
||
|
|
python scripts/generate_scene_descriptions.py --input ./assets --format sharegpt
|
||
|
|
```
|