# Gemma 4 Multimodal Backlog

## Epic 1: Visual QA for Nexus World
- **Goal:** Use Gemma 4's vision capabilities to audit screenshots of the Three.js Nexus world for layout inconsistencies and UI bugs.
- **Tasks:**
  - [x] Capture automated screenshots of all primary Nexus zones.
  - [ ] Analyze images for clipping, overlapping UI elements, and lighting glitches.
  - [ ] Generate a structured bug report with coordinates and suggested fixes.
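
One possible shape for the structured bug report is sketched below. Every name here (`BugFinding`, `BugReport`, `parse_findings`, the field names, and the three category strings) is an assumption made for illustration; the backlog does not define a schema. The sketch assumes the vision model is prompted to return a JSON array and that malformed entries should be dropped rather than trusted.

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical categories, taken from the analysis task above.
ALLOWED_CATEGORIES = {"clipping", "overlap", "lighting"}


@dataclass
class BugFinding:
    zone: str                          # Nexus zone the screenshot came from
    category: str                      # one of ALLOWED_CATEGORIES
    bbox: tuple[int, int, int, int]    # x, y, width, height in pixels
    description: str
    suggested_fix: str


@dataclass
class BugReport:
    screenshot: str
    findings: list[BugFinding] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the report so it can be attached to an issue."""
        return json.dumps(asdict(self), indent=2)


def parse_findings(raw: str) -> list[BugFinding]:
    """Validate the model's JSON output and keep only well-formed findings."""
    findings = []
    for item in json.loads(raw):
        bbox = item.get("bbox")
        bbox_ok = (
            isinstance(bbox, list)
            and len(bbox) == 4
            and all(isinstance(v, int) for v in bbox)
        )
        if item.get("category") not in ALLOWED_CATEGORIES or not bbox_ok:
            continue  # skip entries the model got wrong instead of guessing
        findings.append(
            BugFinding(
                zone=str(item.get("zone", "")),
                category=item["category"],
                bbox=tuple(bbox),
                description=str(item.get("description", "")),
                suggested_fix=str(item.get("suggested_fix", "")),
            )
        )
    return findings
```

Validating the output before building the report keeps a hallucinated category or a missing bounding box from reaching the bug tracker.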

## Epic 2: The Testament Visual Consistency Audit
- **Goal:** Ensure the generated image assets for The Testament align with the narrative mood and visual manifest.
- **Tasks:**
  - [ ] Compare generated assets against `visual_manifest.json` descriptions.
  - [ ] Flag images that diverge from the "Cinematic Noir, 35mm, high contrast" aesthetic.
  - [ ] Refine prompts for divergent beats and trigger re-renders.

## Epic 3: Sovereign Heart Emotive Stillness
- **Goal:** Develop a system that selects the most emotive static image for each piece of generated TTS audio, based on its sentiment.
- **Tasks:**
  - [ ] Analyze TTS output for emotional valence and arousal.
  - [ ] Map sentiment kernels to the visual asset library.
  - [ ] Implement a "breathing" transition logic between assets for an expressive presence.
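
The mapping and transition tasks above could be approached as a nearest-neighbour lookup in valence/arousal space plus a slow periodic blend weight. This is only a sketch under stated assumptions: `EmotiveAsset`, `select_asset`, and `breathing_weight` are hypothetical names, the value ranges are assumed, and the cosine curve is one illustrative choice for the "breathing" effect, not a decided design.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EmotiveAsset:
    path: str
    valence: float  # assumed range: -1.0 (negative) to 1.0 (positive)
    arousal: float  # assumed range: 0.0 (calm) to 1.0 (excited)


def select_asset(valence: float, arousal: float, library: list[EmotiveAsset]) -> EmotiveAsset:
    """Return the asset whose tagged emotion is closest to the TTS sentiment."""
    if not library:
        raise ValueError("asset library is empty")
    return min(
        library,
        key=lambda a: math.hypot(a.valence - valence, a.arousal - arousal),
    )


def breathing_weight(t_seconds: float, period_seconds: float = 4.0) -> float:
    """Blend weight in [0, 1] that rises and falls once per period.

    0.0 shows only the current asset, 1.0 shows only the next one.
    """
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * t_seconds / period_seconds))


# Hypothetical library entries for illustration only.
LIBRARY = [
    EmotiveAsset("calm.png", valence=0.3, arousal=0.1),
    EmotiveAsset("joyful.png", valence=0.8, arousal=0.8),
    EmotiveAsset("sorrow.png", valence=-0.7, arousal=0.2),
]

chosen = select_asset(valence=0.7, arousal=0.9, library=LIBRARY)
print(chosen.path)
```

A renderer could call `breathing_weight` every frame and use the result as the opacity of the incoming asset, which gives a gradual crossfade instead of a hard cut.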

## Epic 4: Multimodal Architecture Synthesis
- **Goal:** Extract and synthesize architectural patterns from the diagrams and charts in research papers.
- **Tasks:**
  - [ ] Ingest PDF research papers on agentic workflows.
  - [ ] Analyze diagrams and charts to extract structural logic.
  - [ ] Synthesize findings into `Sovereign_Knowledge_Graph.md`.

## General Tasks

- [x] **Task 1:** Add Gemma 4 entries to `KNOWN_MODEL_CAPABILITIES` and to the vision fallback chain in `src/infrastructure/models/multimodal.py`. Gemma 4 is a multimodal model supporting vision, text, tools, JSON, and streaming. ✅ PR #1493
- [x] **Task 3:** Add a `ModelCapability.VIDEO` enum member for future video understanding models. ✅ PR #1494
- [ ] **Task 4:** Implement `get_model_for_content("video")` routing with appropriate fallback chain.
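
As a starting point for Task 4, the routing could walk an ordered fallback chain and return the first available model that declares the required capability. The sketch below is not the contents of `multimodal.py`: the function signature, the `FALLBACK_CHAINS` and `CONTENT_TO_CAPABILITY` tables, and the model keys `"gemma4"` and `"example-video-model"` are all assumptions made for illustration. Only the names `ModelCapability.VIDEO`, `KNOWN_MODEL_CAPABILITIES`, and `get_model_for_content` come from the tasks above.

```python
from enum import Enum, auto
from typing import Optional


class ModelCapability(Enum):
    TEXT = auto()
    VISION = auto()
    VIDEO = auto()


# Hypothetical registry; the real one lives in multimodal.py.
KNOWN_MODEL_CAPABILITIES = {
    "gemma4": {ModelCapability.TEXT, ModelCapability.VISION},
    "example-video-model": {
        ModelCapability.TEXT,
        ModelCapability.VISION,
        ModelCapability.VIDEO,
    },
}

# Ordered preference lists per content type (assumed structure).
FALLBACK_CHAINS = {
    "image": ["gemma4", "example-video-model"],
    "video": ["example-video-model"],
}

CONTENT_TO_CAPABILITY = {
    "image": ModelCapability.VISION,
    "video": ModelCapability.VIDEO,
}


def get_model_for_content(content_type: str, available: set[str]) -> Optional[str]:
    """Return the first available model in the chain that supports the content type."""
    capability = CONTENT_TO_CAPABILITY.get(content_type)
    if capability is None:
        raise ValueError(f"unsupported content type: {content_type!r}")
    for name in FALLBACK_CHAINS.get(content_type, []):
        if name in available and capability in KNOWN_MODEL_CAPABILITIES.get(name, set()):
            return name
    return None  # caller decides how to degrade when nothing matches
```

Returning `None` instead of raising when the chain is exhausted lets the caller choose a degradation path, such as sampling frames from the video and routing them through the image chain.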