# Gemma 4 Multimodal Backlog

## Epic 1: Visual QA for Nexus World
- **Goal:** Use Gemma 4's vision to audit screenshots of the Three.js Nexus world for layout inconsistencies and UI bugs.
- **Tasks:**
  - [x] Capture automated screenshots of all primary Nexus zones.
  - [ ] Analyze images for clipping, overlapping UI elements, and lighting glitches.
  - [ ] Generate a structured bug report with coordinates and suggested fixes.

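The "structured bug report with coordinates and suggested fixes" task could take roughly the following shape. This is a minimal sketch, not the project's implementation: the `UIBug` fields, the `ALLOWED_KINDS` labels, and the assumption that the vision model is prompted to answer with a JSON array of `{"kind", "bbox", "suggested_fix"}` objects are all hypothetical.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Tuple

# Hypothetical labels mirroring the three glitch types named in the task list.
ALLOWED_KINDS = {"clipping", "overlap", "lighting"}


@dataclass
class UIBug:
    zone: str                         # Nexus zone the screenshot was taken in
    kind: str                         # one of ALLOWED_KINDS
    bbox: Tuple[int, int, int, int]   # x, y, width, height in screenshot pixels
    suggested_fix: str


def parse_findings(zone: str, model_json: str) -> List[UIBug]:
    """Turn the model's JSON answer into validated UIBug records.

    Assumes (hypothetically) that the model returns a JSON array of objects.
    Entries with an unknown kind or a malformed bbox are skipped.
    """
    bugs = []
    for item in json.loads(model_json):
        kind = item.get("kind")
        bbox = item.get("bbox")
        if kind not in ALLOWED_KINDS:
            continue
        if not isinstance(bbox, list) or len(bbox) != 4:
            continue
        bugs.append(
            UIBug(
                zone=zone,
                kind=kind,
                bbox=tuple(int(v) for v in bbox),
                suggested_fix=str(item.get("suggested_fix", "")),
            )
        )
    return bugs


def build_report(bugs: List[UIBug]) -> str:
    """Serialize the findings as a JSON report."""
    return json.dumps(
        {"bug_count": len(bugs), "bugs": [asdict(b) for b in bugs]},
        indent=2,
    )
```

Validating the model output before it reaches the report keeps hallucinated categories or malformed coordinates out of the bug tracker.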
## Epic 2: The Testament Visual Consistency Audit
- **Goal:** Ensure the generated image assets for The Testament align with the narrative mood and visual manifest.
- **Tasks:**
  - [ ] Compare generated assets against `visual_manifest.json` descriptions.
  - [ ] Flag images that diverge from the "Cinematic Noir, 35mm, high contrast" aesthetic.
  - [ ] Refine prompts for divergent beats and trigger re-renders.

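One way the compare-and-flag tasks above might be wired together is sketched below. The schema of `visual_manifest.json` is not given here, so the `{"beats": [{"id", "description"}]}` layout, the 0 to 1 consistency score, and the `0.7` threshold are assumptions made purely for illustration.

```python
from typing import Dict, List

TARGET_STYLE = "Cinematic Noir, 35mm, high contrast"


def build_audit_prompts(manifest: dict) -> Dict[str, str]:
    """Build one vision-model prompt per beat.

    Assumes a hypothetical manifest layout:
    {"beats": [{"id": "...", "description": "..."}]}
    """
    prompts = {}
    for beat in manifest.get("beats", []):
        prompts[beat["id"]] = (
            "Rate from 0 to 1 how well the attached image matches this "
            f"description: {beat['description']!r} and this aesthetic: "
            f"{TARGET_STYLE!r}. Answer with the number only."
        )
    return prompts


def flag_divergent(scores: Dict[str, float], threshold: float = 0.7) -> List[str]:
    """Return the beat ids whose consistency score falls below the threshold.

    The scores are assumed to come from the model's answers to the prompts
    above; the threshold is an arbitrary starting point, not a tuned value.
    """
    return sorted(beat for beat, score in scores.items() if score < threshold)
```

The flagged beat ids are the ones whose prompts would then be refined and re-rendered.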
## Epic 3: Sovereign Heart Emotive Stillness
- **Goal:** Develop a system for selecting the most emotive static image based on the sentiment of generated TTS.
- **Tasks:**
  - [ ] Analyze TTS output for emotional valence and arousal.
  - [ ] Map sentiment kernels to the visual asset library.
  - [ ] Implement a "breathing" transition logic between assets for an expressive presence.

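The mapping and transition tasks above could be prototyped along these lines. This sketch assumes each asset is tagged with a point in the valence-arousal plane (both in the range -1 to 1) and picks the nearest one; the asset names, their coordinates, and the cosine easing used for the "breathing" effect are illustrative choices, not decisions recorded in this backlog.

```python
import math
from typing import Dict, Tuple

# Hypothetical asset library: file name -> (valence, arousal), each in [-1, 1].
ASSETS: Dict[str, Tuple[float, float]] = {
    "serene.png": (0.6, -0.5),
    "grief.png": (-0.7, -0.4),
    "joy.png": (0.8, 0.6),
    "alarm.png": (-0.6, 0.7),
}


def select_asset(valence: float, arousal: float,
                 assets: Dict[str, Tuple[float, float]] = ASSETS) -> str:
    """Pick the asset whose tagged emotion is closest to the TTS sentiment."""
    target = (valence, arousal)
    return min(assets, key=lambda name: math.dist(assets[name], target))


def breathing_weight(t: float, period: float = 4.0) -> float:
    """Blend weight for the incoming asset at time t (seconds).

    Rises smoothly from 0 to 1 over half a period and falls back again,
    so cross-fading with this weight gives a slow inhale/exhale rhythm.
    """
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * t / period))
```

A renderer could draw the current asset at opacity `1 - w` and the next one at opacity `w`, with `w = breathing_weight(t)`.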
## Epic 4: Multimodal Architecture Synthesis
- **Goal:** Extract and synthesize architectural patterns from visual research papers.
- **Tasks:**
  - [ ] Ingest PDF research papers on agentic workflows.
  - [ ] Analyze diagrams and charts to extract structural logic.
  - [ ] Synthesize findings into `Sovereign_Knowledge_Graph.md`.

## General Tasks

- [x] **Task 1:** Add Gemma 4 entries to `KNOWN_MODEL_CAPABILITIES` and vision fallback chain in `src/infrastructure/models/multimodal.py`. Gemma 4 is a multimodal model supporting vision, text, tools, JSON, and streaming. ✅ PR #1493
- [x] **Task 3:** Add a `ModelCapability.VIDEO` enum member for future video understanding models. ✅ PR #1494
- [ ] **Task 4:** Implement `get_model_for_content("video")` routing with appropriate fallback chain.
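
For Task 4, the routing might look something like the standalone sketch below. It reuses the names `ModelCapability`, `KNOWN_MODEL_CAPABILITIES`, and `get_model_for_content` from the tasks above, but everything else is an assumption: the real definitions in `src/infrastructure/models/multimodal.py` are not shown here, the model keys `"gemma4"` and `"text-only-model"` are placeholders, and the extra `available` parameter and the video → image → text fallback order are illustrative only.

```python
from enum import Enum, auto
from typing import Dict, List, Optional, Set


class ModelCapability(Enum):
    TEXT = auto()
    VISION = auto()
    TOOLS = auto()
    JSON = auto()
    STREAMING = auto()
    VIDEO = auto()


# Placeholder registry; the model keys are hypothetical.
KNOWN_MODEL_CAPABILITIES: Dict[str, Set[ModelCapability]] = {
    "gemma4": {
        ModelCapability.TEXT,
        ModelCapability.VISION,
        ModelCapability.TOOLS,
        ModelCapability.JSON,
        ModelCapability.STREAMING,
    },
    "text-only-model": {ModelCapability.TEXT, ModelCapability.STREAMING},
}

# Capability each content type needs, and what to degrade to if no model has it.
CONTENT_REQUIREMENTS = {
    "text": ModelCapability.TEXT,
    "image": ModelCapability.VISION,
    "video": ModelCapability.VIDEO,
}
FALLBACK_CHAINS: Dict[str, List[str]] = {
    "video": ["image", "text"],
    "image": ["text"],
    "text": [],
}


def get_model_for_content(content_type: str, available: List[str]) -> Optional[str]:
    """Return the first available model that can handle the content type.

    Walks the fallback chain (assumed order: video -> image -> text) and
    returns None when nothing in `available` fits any step of the chain.
    """
    if content_type not in CONTENT_REQUIREMENTS:
        raise ValueError(f"unknown content type: {content_type!r}")
    for kind in [content_type, *FALLBACK_CHAINS[content_type]]:
        needed = CONTENT_REQUIREMENTS[kind]
        for model in available:
            if needed in KNOWN_MODEL_CAPABILITIES.get(model, set()):
                return model
    return None
```

With this registry, a `"video"` request degrades to a vision-capable model, since no entry declares `ModelCapability.VIDEO` yet.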