# Gemma 4 Multimodal Backlog

## Epic 1: Visual QA for Nexus World
- **Goal:** Use Gemma 4's vision to audit screenshots of the Three.js Nexus world for layout inconsistencies and UI bugs.
- **Tasks:**
  - [x] Capture automated screenshots of all primary Nexus zones.
  - [ ] Analyze images for clipping, overlapping UI elements, and lighting glitches.
  - [ ] Generate a structured bug report with coordinates and suggested fixes.

## Epic 2: The Testament Visual Consistency Audit
- **Goal:** Ensure the generated image assets for The Testament align with the narrative mood and visual manifest.
- **Tasks:**
  - [ ] Compare generated assets against `visual_manifest.json` descriptions.
  - [ ] Flag images that diverge from the "Cinematic Noir, 35mm, high contrast" aesthetic.
  - [ ] Refine prompts for divergent beats and trigger re-renders.

## Epic 3: Sovereign Heart Emotive Stillness
- **Goal:** Develop a system for selecting the most emotive static image based on the sentiment of generated TTS.
- **Tasks:**
  - [ ] Analyze TTS output for emotional valence and arousal.
  - [ ] Map sentiment kernels to the visual asset library.
  - [ ] Implement a "breathing" transition logic between assets for an expressive presence.

## Epic 4: Multimodal Architecture Synthesis
- **Goal:** Extract and synthesize architectural patterns from visual research papers.
- **Tasks:**
  - [ ] Ingest PDF research papers on agentic workflows.
  - [ ] Analyze diagrams and charts to extract structural logic.
  - [ ] Synthesize findings into `Sovereign_Knowledge_Graph.md`.

## General Tasks
- [x] **Task 1:** Add Gemma 4 entries to `KNOWN_MODEL_CAPABILITIES` and vision fallback chain in `src/infrastructure/models/multimodal.py`. Gemma 4 is a multimodal model supporting vision, text, tools, JSON, and streaming. ✅ PR #1493
- [x] **Task 3:** Add a `ModelCapability.VIDEO` enum member for future video understanding models. ✅ PR #1494
- [ ] **Task 4:** Implement `get_model_for_content("video")` routing with appropriate fallback chain.
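
As a starting point for Task 4, the routing could look roughly like the sketch below. It is not the code in `src/infrastructure/models/multimodal.py`: only the names `ModelCapability`, `KNOWN_MODEL_CAPABILITIES`, and `get_model_for_content` come from this backlog, while their shapes, the `FALLBACK_CHAINS` and `REQUIRED_CAPABILITY` tables, the `available` parameter, and every model name are assumptions made for illustration.

```python
from enum import Enum, auto
from typing import Iterable, Optional


class ModelCapability(Enum):
    # Assumed members; the real enum may define more (tools, JSON, streaming).
    TEXT = auto()
    VISION = auto()
    VIDEO = auto()


# Hypothetical registry contents. The model names are placeholders.
KNOWN_MODEL_CAPABILITIES = {
    "gemma4": {ModelCapability.TEXT, ModelCapability.VISION},
    "example-video-a": {ModelCapability.TEXT, ModelCapability.VISION, ModelCapability.VIDEO},
    "example-video-b": {ModelCapability.TEXT, ModelCapability.VIDEO},
}

# Hypothetical fallback chains, tried in order of preference.
FALLBACK_CHAINS = {
    "image": ["gemma4", "example-video-a"],
    "video": ["example-video-a", "example-video-b"],
}

# Capability a model must declare to handle each content type.
REQUIRED_CAPABILITY = {
    "image": ModelCapability.VISION,
    "video": ModelCapability.VIDEO,
}


def get_model_for_content(
    content_type: str, available: Optional[Iterable[str]] = None
) -> Optional[str]:
    """Return the first model in the chain that is available and capable.

    `available` is an assumed parameter: the set of models currently
    loaded. When it is None, every registered model counts as available.
    Returns None when no model in the chain qualifies.
    """
    if content_type not in FALLBACK_CHAINS:
        raise ValueError(f"no fallback chain for content type {content_type!r}")
    required = REQUIRED_CAPABILITY[content_type]
    allowed = None if available is None else set(available)
    for name in FALLBACK_CHAINS[content_type]:
        if allowed is not None and name not in allowed:
            continue  # model is not loaded, try the next one in the chain
        if required in KNOWN_MODEL_CAPABILITIES.get(name, set()):
            return name
    return None
```

Returning `None` rather than raising when the chain is exhausted leaves the caller free to degrade gracefully, for example by sampling frames and routing them as images. Whether that is the right behaviour here is an open design choice.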
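
For Epic 1's "structured bug report with coordinates and suggested fixes", one possible report shape is sketched below. The field names, the pixel bounding-box convention, and the `build_report` helper are all hypothetical. The sketch covers only the serialization step and assumes the findings have already been extracted from the model's output by some other component.

```python
import json
from dataclasses import asdict, dataclass
from typing import List

# Bug categories taken from the Epic 1 task list.
ALLOWED_KINDS = {"clipping", "overlap", "lighting"}


@dataclass
class UIBug:
    zone: str           # Nexus zone the screenshot was taken in
    kind: str           # one of ALLOWED_KINDS
    x: int              # bounding box, in screenshot pixels (assumed convention)
    y: int
    width: int
    height: int
    suggested_fix: str

    def __post_init__(self) -> None:
        # Reject unknown categories early so reports stay consistent.
        if self.kind not in ALLOWED_KINDS:
            raise ValueError(f"unknown bug kind: {self.kind!r}")


def build_report(screenshot: str, bugs: List[UIBug]) -> str:
    """Serialize the findings for one screenshot as a JSON document."""
    payload = {
        "screenshot": screenshot,
        "bug_count": len(bugs),
        "bugs": [asdict(bug) for bug in bugs],
    }
    return json.dumps(payload, indent=2)
```

A report for a single finding could then be produced with `build_report("hub.png", [UIBug("hub", "overlap", 120, 48, 200, 32, "Raise the panel z-index")])`, where the file name and values are made up for the example.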