Gemma 4 Multimodal Backlog
Epic 1: Visual QA for Nexus World
- Goal: Use Gemma 4's vision to audit screenshots of the Three.js Nexus world for layout inconsistencies and UI bugs.
- Tasks:
- Capture automated screenshots of all primary Nexus zones.
- Analyze images for clipping, overlapping UI elements, and lighting glitches.
- Generate a structured bug report with coordinates and suggested fixes.
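A minimal sketch of what the structured bug report could look like, assuming each finding carries a zone name, an issue kind, a pixel bounding box, and a suggested fix. The `UIBug` type, its field names, and `build_report` are illustrative, not part of any existing code.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class UIBug:
    """One issue found in a screenshot (all field names are hypothetical)."""
    zone: str                        # Nexus zone the screenshot was taken in
    kind: str                        # e.g. "clipping", "overlap", "lighting"
    bbox: tuple[int, int, int, int]  # (x, y, width, height) in screenshot pixels
    suggested_fix: str


def build_report(bugs: list[UIBug]) -> str:
    """Serialize bugs to JSON, grouped by zone, sorted by zone then kind."""
    by_zone: dict[str, list[dict]] = {}
    for bug in sorted(bugs, key=lambda b: (b.zone, b.kind)):
        by_zone.setdefault(bug.zone, []).append(asdict(bug))
    return json.dumps({"zones": by_zone, "total": len(bugs)}, indent=2)
```

Keeping the box in screenshot pixels lets a reviewer overlay it on the captured image without knowing the Three.js camera setup.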
Epic 2: The Testament Visual Consistency Audit
- Goal: Ensure the generated image assets for The Testament align with the narrative mood and visual manifest.
- Tasks:
- Compare generated assets against visual_manifest.json descriptions.
- Flag images that diverge from the "Cinematic Noir, 35mm, high contrast" aesthetic.
- Refine prompts for divergent beats and trigger re-renders.
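The audit loop for this epic might be sketched as follows. The manifest schema (a list of entries with `beat`, `image`, and `description` keys), the prompt wording, and the `ask_model` callable are all assumptions; the real visual_manifest.json layout and the Gemma 4 client are not shown in this backlog.

```python
from __future__ import annotations

import json
from typing import Callable

STYLE = "Cinematic Noir, 35mm, high contrast"


def build_audit_prompt(description: str, style: str = STYLE) -> str:
    """Ask the vision model for a JSON verdict on one image."""
    return (
        "Does this image match the description and style below? "
        'Answer with JSON: {"matches": true or false, "reason": "..."}.\n'
        f"Description: {description}\nStyle: {style}"
    )


def find_divergent(
    manifest: list[dict],
    ask_model: Callable[[str, str], str],  # hypothetical: (image_path, prompt) -> JSON text
) -> list[dict]:
    """Return the beats whose image the model judged as not matching."""
    flagged = []
    for entry in manifest:
        raw = ask_model(entry["image"], build_audit_prompt(entry["description"]))
        verdict = json.loads(raw)
        if not verdict.get("matches", False):
            flagged.append({"beat": entry["beat"], "reason": verdict.get("reason", "")})
    return flagged
```

Passing the model call in as a function keeps the loop testable with a stub, and the flagged list gives the prompt-refinement step its input.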
Epic 3: Sovereign Heart Emotive Stillness
- Goal: Develop a system for selecting the most emotive static image based on the sentiment of generated TTS.
- Tasks:
- Analyze TTS output for emotional valence and arousal.
- Map sentiment kernels to the visual asset library.
- Implement "breathing" transition logic between assets for an expressive presence.
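One way to read the mapping and transition tasks, as a sketch only: treat each asset as a point in valence/arousal space, pick the nearest one to the TTS sentiment, and drive the crossfade with a cosine ease. The asset names, their coordinates, and the 4-second period are made up for illustration.

```python
import math

# Hypothetical asset library: name -> (valence, arousal), each in [-1, 1].
ASSETS = {
    "calm": (0.5, -0.6),
    "joyful": (0.8, 0.6),
    "somber": (-0.6, -0.5),
    "tense": (-0.5, 0.7),
}


def select_asset(valence: float, arousal: float, assets=ASSETS) -> str:
    """Pick the asset whose (valence, arousal) point is closest to the input."""
    return min(assets, key=lambda name: math.dist(assets[name], (valence, arousal)))


def breathing_alpha(t: float, period: float = 4.0) -> float:
    """Blend weight in [0, 1] that eases 0 -> 1 -> 0 once per period (seconds)."""
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * t / period))
```

A renderer could use `breathing_alpha(t)` as the opacity of the incoming asset over the outgoing one, so the change has no hard cut at either end.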
Epic 4: Multimodal Architecture Synthesis
- Goal: Extract and synthesize architectural patterns from visual research papers.
- Tasks:
- Ingest PDF research papers on agentic workflows.
- Analyze diagrams and charts to extract structural logic.
- Synthesize findings into Sovereign_Knowledge_Graph.md.
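The synthesis step could be as simple as grouping extracted findings by pattern and rendering a Markdown fragment. The `Finding` fields and the section layout below are assumptions; the actual structure of Sovereign_Knowledge_Graph.md is not specified here.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    paper: str    # title of the source paper
    pattern: str  # architectural pattern name
    summary: str  # one-line description extracted from a diagram or chart


def render_section(findings):
    """Group findings by pattern (alphabetical) and render Markdown text."""
    grouped = {}
    for finding in findings:
        grouped.setdefault(finding.pattern, []).append(finding)
    lines = []
    for pattern in sorted(grouped):
        lines.append(f"## {pattern}")
        for finding in grouped[pattern]:
            lines.append(f"- {finding.summary} (source: {finding.paper})")
        lines.append("")  # blank line between sections
    return "\n".join(lines)
```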
General Tasks
- Task 1: Add Gemma 4 entries to KNOWN_MODEL_CAPABILITIES and vision fallback chain in src/infrastructure/models/multimodal.py. Gemma 4 is a multimodal model supporting vision, text, tools, JSON, and streaming. ✅ PR #1493
- Task 3: Add a ModelCapability.VIDEO enum member for future video understanding models. ✅ PR #1494
- Task 4: Implement get_model_for_content("video") routing with appropriate fallback chain.
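A rough sketch of how the capability table and content routing might fit together. Only the names `ModelCapability`, `KNOWN_MODEL_CAPABILITIES`, and `get_model_for_content` come from the tasks above; the enum members other than VIDEO, the model tags (`gemma4`, `example-video-model`, `example-text-model`), the chain contents, and the `available` parameter are assumptions, and the real code in multimodal.py may differ.

```python
from __future__ import annotations

from enum import Enum, auto


class ModelCapability(Enum):
    TEXT = auto()
    VISION = auto()
    TOOLS = auto()
    JSON = auto()
    STREAMING = auto()
    VIDEO = auto()  # reserved for future video understanding models


C = ModelCapability

# Hypothetical model tags; the real registry keys may differ.
KNOWN_MODEL_CAPABILITIES: dict[str, set[ModelCapability]] = {
    "gemma4": {C.TEXT, C.VISION, C.TOOLS, C.JSON, C.STREAMING},
    "example-video-model": {C.TEXT, C.VISION, C.VIDEO},
    "example-text-model": {C.TEXT, C.STREAMING},
}

# Capability each content type requires.
CONTENT_CAPABILITY = {"text": C.TEXT, "image": C.VISION, "video": C.VIDEO}

# Ordered candidates per content type; the first usable one wins.
FALLBACK_CHAINS = {
    "text": ["gemma4", "example-text-model"],
    "image": ["gemma4", "example-video-model"],
    "video": ["example-video-model"],
}


def get_model_for_content(content_type: str, available: set[str] | None = None) -> str | None:
    """Return the first model in the chain that is available and capable."""
    if content_type not in CONTENT_CAPABILITY:
        raise ValueError(f"unknown content type: {content_type!r}")
    if available is None:
        available = set(KNOWN_MODEL_CAPABILITIES)
    required = CONTENT_CAPABILITY[content_type]
    for name in FALLBACK_CHAINS.get(content_type, []):
        if name in available and required in KNOWN_MODEL_CAPABILITIES.get(name, set()):
            return name
    return None
```

Checking the capability as well as availability means a chain entry that lacks VIDEO is skipped rather than handed content it cannot process.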