Timmy-time-dashboard/MULTIMODAL_BACKLOG.md
2026-04-15 05:12:49 +00:00

Gemma 4 Multimodal Backlog

Epic 1: Visual QA for Nexus World

  • Goal: Use Gemma 4's vision to audit screenshots of the Three.js Nexus world for layout inconsistencies and UI bugs.
  • Tasks:
    • Capture automated screenshots of all primary Nexus zones.
    • Analyze images for clipping, overlapping UI elements, and lighting glitches.
    • Generate a structured bug report with coordinates and suggested fixes.
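One possible shape for the structured bug report is sketched below. The `UIBug` fields, the zone name, and the pixel bounding-box convention are assumptions for illustration, not an existing schema in this repo; the Gemma 4 call that would produce the findings is not shown.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class UIBug:
    """One finding from a screenshot audit (hypothetical schema)."""
    zone: str                          # Nexus zone the screenshot came from
    kind: str                          # e.g. "clipping", "overlap", "lighting"
    bbox: tuple[int, int, int, int]    # x, y, width, height in screenshot pixels
    suggested_fix: str


def build_report(bugs: list[UIBug]) -> str:
    """Serialize findings to JSON, grouped by zone."""
    by_zone: dict[str, list[dict]] = {}
    for bug in bugs:
        entry = asdict(bug)
        zone = entry.pop("zone")
        by_zone.setdefault(zone, []).append(entry)
    return json.dumps(by_zone, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(build_report([
        UIBug("lobby", "overlap", (10, 20, 30, 40), "Raise the HUD panel z-index."),
    ]))
```

Grouping by zone keeps the report aligned with the per-zone screenshot capture step.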

Epic 2: The Testament Visual Consistency Audit

  • Goal: Ensure the generated image assets for The Testament align with the narrative mood and visual manifest.
  • Tasks:
    • Compare generated assets against visual_manifest.json descriptions.
    • Flag images that diverge from the "Cinematic Noir, 35mm, high contrast" aesthetic.
    • Refine prompts for divergent beats and trigger re-renders.
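The flagging step could look roughly like the sketch below. It assumes the vision model has already returned a list of style tags per beat; the beat ids, the tag vocabulary, and the function names are hypothetical, and the structure of visual_manifest.json is not assumed here.

```python
# Style terms taken from the aesthetic named in the task above.
REQUIRED_STYLE = ("cinematic noir", "35mm", "high contrast")


def missing_style_tags(observed_tags):
    """Return the required style terms that do not appear in observed_tags."""
    observed = {tag.strip().lower() for tag in observed_tags}
    return [term for term in REQUIRED_STYLE if term not in observed]


def flag_divergent(observations):
    """Map each divergent beat id to the style terms it is missing.

    observations: dict of beat id -> list of tags reported by the model.
    Beats that match the full aesthetic are omitted from the result.
    """
    flagged = {}
    for beat_id, tags in observations.items():
        missing = missing_style_tags(tags)
        if missing:
            flagged[beat_id] = missing
    return flagged
```

The missing terms for a flagged beat can then be appended to its prompt before a re-render is triggered.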

Epic 3: Sovereign Heart Emotive Stillness

  • Goal: Develop a system that selects the most emotive static image based on the sentiment of the generated TTS output.
  • Tasks:
    • Analyze TTS output for emotional valence and arousal.
    • Map sentiment kernels to the visual asset library.
    • Implement "breathing" transition logic between assets to create an expressive presence.
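A minimal sketch of the mapping and transition steps, assuming each asset is annotated with a (valence, arousal) point in [-1, 1] and that "breathing" means a smooth periodic crossfade weight. The asset names, coordinates, and the 4-second period are all hypothetical.

```python
import math

# Hypothetical asset library: name -> (valence, arousal), each in [-1, 1].
ASSETS = {
    "calm_smile": (0.6, -0.4),
    "bright_joy": (0.8, 0.7),
    "quiet_grief": (-0.7, -0.5),
    "alarm": (-0.6, 0.8),
}


def select_asset(valence, arousal, assets=ASSETS):
    """Pick the asset whose annotation is nearest to the TTS sentiment."""
    target = (valence, arousal)
    return min(assets, key=lambda name: math.dist(assets[name], target))


def breathing_weight(t, period=4.0):
    """Blend weight in [0, 1] that rises and falls once every `period` seconds."""
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * t / period))
```

With this shape, the renderer would crossfade between the current and next asset using `breathing_weight(t)` as the opacity of the incoming image.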

Epic 4: Multimodal Architecture Synthesis

  • Goal: Extract architectural patterns from the diagrams and charts in research papers and synthesize them into a shared knowledge graph.
  • Tasks:
    • Ingest PDF research papers on agentic workflows.
    • Analyze diagrams and charts to extract structural logic.
    • Synthesize findings into Sovereign_Knowledge_Graph.md.
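The synthesis step could be sketched as below, assuming the diagram analysis yields (subject, relation, object) triples. The triple format and the Markdown layout are assumptions; the actual structure of Sovereign_Knowledge_Graph.md is not specified here.

```python
from collections import defaultdict


def render_graph_markdown(triples):
    """Render (subject, relation, object) triples as Markdown sections.

    Each subject becomes a heading and each outgoing edge a list item,
    sorted so repeated runs produce a stable diff.
    """
    edges = defaultdict(list)
    for subject, relation, obj in triples:
        edges[subject].append((relation, obj))

    lines = []
    for subject in sorted(edges):
        lines.append(f"## {subject}")
        for relation, obj in sorted(edges[subject]):
            lines.append(f"- {relation}: {obj}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```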

General Tasks

  • Task 1: Add Gemma 4 entries to KNOWN_MODEL_CAPABILITIES and vision fallback chain in src/infrastructure/models/multimodal.py. Gemma 4 is a multimodal model supporting vision, text, tools, JSON, and streaming. PR #1493
  • Task 3: Add a ModelCapability.VIDEO enum member for future video understanding models. PR #1494
  • Task 4: Implement get_model_for_content("video") routing with appropriate fallback chain.
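The sketch below illustrates how Tasks 1, 3, and 4 might fit together. Only the names `KNOWN_MODEL_CAPABILITIES`, `ModelCapability.VIDEO`, and `get_model_for_content` come from the tasks above; the model tags, the other enum members, the return type, and the idea of degrading video to a vision-capable model are assumptions and may not match src/infrastructure/models/multimodal.py.

```python
from enum import Enum, auto


class ModelCapability(Enum):
    TEXT = auto()
    VISION = auto()
    TOOLS = auto()
    JSON = auto()
    STREAMING = auto()
    VIDEO = auto()  # Task 3: reserved for future video understanding models


C = ModelCapability

# Hypothetical model tags, listed in priority order.
KNOWN_MODEL_CAPABILITIES = {
    "example-video-model": {C.TEXT, C.VIDEO},
    "gemma4": {C.TEXT, C.VISION, C.TOOLS, C.JSON, C.STREAMING},
}

# Assumed policy: video may degrade to a vision model (e.g. sampled frames).
CAPABILITY_FALLBACK = {
    "image": [C.VISION],
    "video": [C.VIDEO, C.VISION],
}


def get_model_for_content(content_type, available):
    """Return (model, capability) for the first usable fallback, else None."""
    for capability in CAPABILITY_FALLBACK.get(content_type, []):
        for model, caps in KNOWN_MODEL_CAPABILITIES.items():
            if model in available and capability in caps:
                return model, capability
    return None
```

Returning the matched capability alongside the model lets the caller tell native video support apart from a degraded vision-only fallback.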