scripts/generate_scenes_from_media.py:
Scans an assets directory for images and videos (jpg, png, mp4, mov, etc.)
Calls a vision model (llava/gpt-4/claude) to describe each scene
Outputs training pairs: image_path -> scene description
Includes provenance: model, timestamp, source_session_id
CLI flags: --assets <dir>, --output <file>, --model, --max, --dry-run
Parses JSON responses, with a fallback for plain-text responses
tests/test_generate_scenes_from_media.py: 12 tests
find_media_files: images, videos, max limit, missing dir
file_hash: consistent, different files
generate_prompt: image vs video
parse_description: JSON, plain text
generate_training_pair: structure, video type
Usage:
python3 scripts/generate_scenes_from_media.py --assets ~/assets/
python3 scripts/generate_scenes_from_media.py --assets ~/assets/ --model gpt-4
python3 scripts/generate_scenes_from_media.py --assets ~/assets/ --dry-run
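For reference, one record in the --output file might look like the following. The field names and values are an assumption based on the pair/provenance notes above, not verified against the script:

```json
{
  "image_path": "/home/user/assets/clip_001.mp4",
  "media_type": "video",
  "description": "A red barn at dusk, light rain, camera panning left.",
  "provenance": {
    "model": "llava",
    "timestamp": "2025-01-15T12:34:56+00:00",
    "source_session_id": "sess-001",
    "file_hash": "3a7bd3e2360a3d29eea436fcfb7e44c735d117c42d1c1835420b6b9942dd4f1b"
  }
}
```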