[claude] Research: Google Imagen 3 — Nexus concept art & agent avatars (#290) (#316)

2026-03-24 04:56:02 +00:00
parent b10f23c12d
commit db8e9802bc
1 changed files with 327 additions and 0 deletions
--- a/IMAGEN3_REPORT.md
+++ b/IMAGEN3_REPORT.md
@@ -0,0 +1,327 @@
 # Google Imagen 3 — Nexus Concept Art & Agent Avatars Research Report
 *Compiled March 2026*
 ## Executive Summary
Google Imagen 3 is Google DeepMind's state-of-the-art text-to-image generation model, available via API through the Gemini Developer API and Vertex AI. This report evaluates Imagen 3 for generating Nexus concept art (space/3D/cyberpunk environments) and AI agent avatars, covering API access, prompt engineering, integration architecture, and comparison to alternatives.

---
 ## 1. Model Overview
 Google Imagen 3 was released in late 2024 and made generally available in early 2025. It is the third major generation of Google's Imagen series, with Imagen 4 now available as the current-generation model. Both Imagen 3 and 4 share near-identical APIs.
 ### Available Model Variants
 | Model ID | Purpose |
 |---|---|
 | `imagen-3.0-generate-002` | Primary high-quality model (recommended for Nexus) |
 | `imagen-3.0-generate-001` | Earlier Imagen 3 variant |
 | `imagen-3.0-fast-generate-001` | ~40% lower latency, slightly reduced quality |
 | `imagen-3.0-capability-001` | Extended features (editing, inpainting, upscaling) |
 | `imagen-4.0-generate-001` | Current-generation (Imagen 4) |
 | `imagen-4.0-fast-generate-001` | Fast Imagen 4 variant |
 ### Core Capabilities
 - Photorealistic and stylized image generation from text prompts
- Fewer artifacts, with improved detail and lighting vs. Imagen 2
 - In-image text rendering — up to 25 characters reliably (best-in-class)
 - Multiple artistic styles: photorealism, digital art, impressionism, anime, watercolor, cinematic
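- Negative prompt support (`negativePrompt`), though the Vertex AI reference lists this parameter as unsupported on `imagen-3.0-generate-002` and newer models; confirm per model before relying on it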
- Seed-based reproducible generation (Vertex AI only, with watermarking disabled; useful for consistent agent avatar identity)
- SynthID invisible digital watermarking on outputs by default (always on via the Gemini API)
 - Inpainting, outpainting, and image editing (via `capability-001` model)
 ---
 ## 2. API Access & Pricing
 ### Access Paths
 **Path A — Gemini Developer API (recommended for Nexus)**
 - Endpoint: `https://generativelanguage.googleapis.com/v1beta/models/{model}:predict`
 - Auth: API key via `x-goog-api-key` header
 - Key obtained at: Google AI Studio (aistudio.google.com)
 - No Google Cloud project required for basic access
 - Price: **$0.03/image** (Imagen 3), **$0.04/image** (Imagen 4 Standard)
 **Path B — Vertex AI (enterprise)**
 - Requires a Google Cloud project with billing enabled
 - Auth: OAuth 2.0 or Application Default Credentials
 - More granular safety controls, regional selection, SLAs
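
A minimal sketch of a Path A request in Python, assembled without any SDK so it can be inspected or unit-tested before it is sent. The helper name `build_predict_request` is hypothetical; the URL, the `x-goog-api-key` header, and the `instances`/`parameters` body follow the Gemini Developer API shape used by the Node.js proxy in Section 5.

```python
from __future__ import annotations

import json

GEMINI_BASE = "https://generativelanguage.googleapis.com/v1beta/models"


def build_predict_request(api_key: str, prompt: str,
                          model: str = "imagen-3.0-generate-002",
                          aspect_ratio: str = "1:1",
                          sample_count: int = 1) -> tuple[str, dict, bytes]:
    """Return (url, headers, body) for a Gemini API Imagen :predict call.

    Hypothetical helper: it only builds the request and never sends it.
    """
    url = f"{GEMINI_BASE}/{model}:predict"
    headers = {
        "x-goog-api-key": api_key,  # key created in Google AI Studio
        "Content-Type": "application/json",
    }
    body = {
        "instances": [{"prompt": prompt}],
        "parameters": {"sampleCount": sample_count, "aspectRatio": aspect_ratio},
    }
    return url, headers, json.dumps(body).encode("utf-8")
```

To send it, pass the three values to any HTTP client, e.g. `urllib.request.Request(url, data=body, headers=headers, method="POST")`.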
 ### Pricing Summary
 | Model | Price/Image |
 |---|---|
 | Imagen 3 (`imagen-3.0-generate-002`) | $0.03 |
 | Imagen 4 Fast | $0.02 |
 | Imagen 4 Standard | $0.04 |
 | Imagen 4 Ultra | $0.06 |
 | Image editing/inpainting (Vertex) | $0.02 |
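
Because every generated image is billed, batch jobs are easy to cost up front. A sketch, assuming the list prices in the table above (re-check them before budgeting) and assuming `imagen-4.0-ultra-generate-001` as the Ultra model ID, which this report does not list elsewhere; `estimate_cost` is a hypothetical helper. `Decimal` avoids floating-point rounding in currency sums.

```python
from decimal import Decimal

# Per-image list prices (USD) from the pricing table; they change over time.
# The Ultra model ID below is an assumption.
PRICE_PER_IMAGE = {
    "imagen-3.0-generate-002": Decimal("0.03"),
    "imagen-4.0-fast-generate-001": Decimal("0.02"),
    "imagen-4.0-generate-001": Decimal("0.04"),
    "imagen-4.0-ultra-generate-001": Decimal("0.06"),
}


def estimate_cost(model: str, prompts: int, samples_per_prompt: int = 1) -> Decimal:
    """Estimate spend, assuming each returned image is billed at list price."""
    if model not in PRICE_PER_IMAGE:
        raise KeyError(f"No price on file for {model!r}")
    return PRICE_PER_IMAGE[model] * prompts * samples_per_prompt
```

For example, pre-generating 40 concept-art prompts with 4 samples each on Imagen 3 is `estimate_cost("imagen-3.0-generate-002", 40, 4)`, i.e. $4.80.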
 ### Rate Limits
 | Tier | Images/Minute |
 |---|---|
 | Free (AI Studio web UI only) | ~2 IPM |
 | Tier 1 (billing linked) | 10 IPM |
 | Tier 2 ($250 cumulative spend) | Higher — contact Google |
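
At 10 images per minute, a batch job needs to pace itself client-side. The sketch below is a generic sliding-window limiter, not a Google library; the class name and the 10-per-60-seconds defaults are assumptions taken from the Tier 1 row above, and the server-side quota remains authoritative. Call `time.sleep(limiter.wait_time())` before each image request and `limiter.record()` after sending it; the injectable `clock` exists only so the logic can be tested without real waiting.

```python
from __future__ import annotations

import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Keep image requests under `limit` per `window` seconds (client-side guard)."""

    def __init__(self, limit: int = 10, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sent: deque[float] = deque()  # timestamps of recent image requests

    def wait_time(self) -> float:
        """Seconds to wait before the next image may be requested (0.0 if none)."""
        now = self.clock()
        # Forget requests that have aged out of the window
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        # Wait until the oldest request in the window expires
        return self.window - (now - self.sent[0])

    def record(self) -> None:
        """Record one image request at the current time."""
        self.sent.append(self.clock())
```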
 ---
 ## 3. Image Resolutions & Formats
 | Aspect Ratio | Pixel Size | Best Use |
 |---|---|---|
| 1:1 | 1024×1024 (2048×2048 on Imagen 4 Standard/Ultra) | Agent avatars, thumbnails |
 | 16:9 | 1408×768 | Nexus concept art, widescreen |
 | 4:3 | 1280×896 | Environment shots |
 | 3:4 | 896×1280 | Portrait concept art |
 | 9:16 | 768×1408 | Vertical banners |
- Default output size: 1K (e.g., 1024×1024 at 1:1); 2K (2048×2048 at 1:1) is supported only by Imagen 4 Standard and Ultra
 - Output formats: PNG (default), JPEG
 - Prompt input limit: 480 tokens
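
Knowing the pixel size before generation lets the Three.js side size planes and UVs in advance. A sketch, assuming the 1K Imagen 3 sizes from the table above; `output_size` and `prompt_probably_fits` are hypothetical helpers, and the four-characters-per-token ratio is a rough heuristic rather than Imagen's actual tokenizer.

```python
from __future__ import annotations

# Imagen 3 output sizes at the default 1K setting (from the table above)
IMAGEN3_SIZES = {
    "1:1": (1024, 1024),
    "16:9": (1408, 768),
    "4:3": (1280, 896),
    "3:4": (896, 1280),
    "9:16": (768, 1408),
}

MAX_PROMPT_TOKENS = 480


def output_size(aspect_ratio: str) -> tuple[int, int]:
    """Return (width, height) in pixels for a supported aspect ratio."""
    try:
        return IMAGEN3_SIZES[aspect_ratio]
    except KeyError:
        raise ValueError(f"Unsupported aspect ratio {aspect_ratio!r}") from None


def prompt_probably_fits(prompt: str, chars_per_token: float = 4.0) -> bool:
    """Rough pre-check against the 480-token limit.

    Assumption: ~4 characters per token is a heuristic, so prompts near the
    limit should still be treated as uncertain.
    """
    return len(prompt) / chars_per_token <= MAX_PROMPT_TOKENS
```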
 ---
 ## 4. Prompt Engineering for the Nexus
 ### Core Formula
 ```
 [Subject] + [Setting/Context] + [Style] + [Lighting] + [Technical Specs]
 ```
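The formula maps directly onto a small helper that keeps prompt parts in a fixed order and skips empty slots. This is a sketch; `build_prompt` is a hypothetical name, and the comma separator simply matches the example prompts in this section.

```python
def build_prompt(subject: str, setting: str = "", style: str = "",
                 lighting: str = "", technical: str = "") -> str:
    """Join [Subject] + [Setting] + [Style] + [Lighting] + [Technical Specs] in order."""
    parts = [subject, setting, style, lighting, technical]
    # Strip stray whitespace and drop slots the caller left empty
    return ", ".join(p.strip() for p in parts if p and p.strip())
```

Keeping the order fixed makes prompts easy to diff and A/B test one slot at a time.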
 ### Style Keywords for Space/Cyberpunk Concept Art
 **Rendering:**
 `cinematic`, `octane render`, `unreal engine 5`, `ray tracing`, `subsurface scattering`, `matte painting`, `digital concept art`, `hyperrealistic`
 **Lighting:**
 `volumetric light shafts`, `neon glow`, `cyberpunk neon`, `dramatic rim lighting`, `chiaroscuro`, `bioluminescent`
 **Quality:**
 `4K`, `8K resolution`, `ultra-detailed`, `HDR`, `photorealistic`, `professional`
 **Sci-fi/Space:**
 `hard science fiction aesthetic`, `dark void background`, `nebula`, `holographic`, `glowing circuits`, `orbital`
 ### Example Prompts: Nexus Concept Art
 **The Nexus Hub (main environment):**
 ```
 Exterior view of a glowing orbital space station against a deep purple nebula,
 holographic data streams flowing between modules in cyan and gold,
 three.js aesthetic, hard science fiction,
 rendered in Unreal Engine 5, volumetric lighting,
 4K, ultra-detailed, cinematic 16:9
 ```
 **Portal Chamber:**
 ```
 Interior of a circular chamber with six glowing portal doorways
 arranged in a hexagonal pattern, each portal displaying a different dimension,
 neon-lit cyber baroque architecture, glowing runes on obsidian floor,
 cyberpunk aesthetic, volumetric light shafts, ray tracing,
 4K matte painting, wide angle
 ```
 **Cyberpunk Nexus Exterior:**
 ```
 Exterior of a towering brutalist cyber-tower floating in deep space,
 neon holographic advertisements in multiple languages,
 rain streaks catching neon light, 2087 aesthetic,
 cinematic lighting, anamorphic lens flare, film grain,
 ultra-detailed, 4K
 ```
 ### Example Prompts: AI Agent Avatars
 **Timmy (Sovereign AI Host):**
 ```
 Portrait of a warm humanoid AI entity, translucent synthetic skin
 revealing golden circuit patterns beneath, kind glowing amber eyes,
 soft studio rim lighting, deep space background with subtle star field,
 digital concept art, shallow depth of field,
 professional 3D render, 1:1 square format, 8K
 ```
 **Technical Agent Avatar (e.g. Kimi, Claude):**
 ```
 Portrait of a sleek android entity, obsidian chrome face
 with glowing cyan ocular sensors and circuit filaments visible at temples,
 neutral expression suggesting deep processing,
 dark gradient background, dramatic rim lighting in electric blue,
 digital concept art, highly detailed, professional 3D render, 8K
 ```
 **Pixar-Style Friendly Agent:**
 ```
 Ultra-cute 3D cartoon android character,
 big expressive glowing teal eyes, smooth chrome dome with small antenna,
 soft Pixar/Disney render style, pastel color palette on dark space background,
 high detail, cinematic studio lighting, ultra-high resolution, 1:1
 ```
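To keep the avatar set visually consistent, the shared framing (portrait, render style) can live in one template while only agent-specific descriptors vary. A hypothetical sketch: the template and the `timmy`/`technical` entries are condensed from the example prompts above, not a format Imagen requires.

```python
# Shared framing; only the per-agent descriptors change (hypothetical structure)
AVATAR_TEMPLATE = (
    "Portrait of {entity}, {features}, {background}, {lighting}, "
    "digital concept art, highly detailed, professional 3D render"
)

AGENT_DESCRIPTORS = {
    "timmy": {
        "entity": "a warm humanoid AI entity",
        "features": "translucent synthetic skin revealing golden circuit patterns, "
                    "kind glowing amber eyes",
        "background": "deep space background with subtle star field",
        "lighting": "soft studio rim lighting",
    },
    "technical": {
        "entity": "a sleek android entity",
        "features": "obsidian chrome face with glowing cyan ocular sensors",
        "background": "dark gradient background",
        "lighting": "dramatic rim lighting in electric blue",
    },
}


def avatar_prompt(agent: str) -> str:
    """Fill the shared template with one agent's descriptors."""
    return AVATAR_TEMPLATE.format(**AGENT_DESCRIPTORS[agent])
```

Fixing the style and lighting slots keeps the set coherent; exact reproduction of a single avatar still depends on the `seed` parameter discussed in Section 7.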
 ### Negative Prompt Best Practices
 Use plain nouns/adjectives, not instructions:
 ```
 blurry, watermark, text overlay, low quality, overexposed,
 deformed, distorted, ugly, bad anatomy, jpeg artifacts
 ```
 Note: Do NOT write "no blur" or "don't add text" — use the noun form only.
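
The noun-only rule can be checked automatically before a request goes out. A sketch with a hypothetical `lint_negative_prompt` helper; the instruction words it flags (`no`, `not`, `don't`, `do not`, `without`, `avoid`) are an assumed list, not an official one.

```python
from __future__ import annotations

import re

# Instructional words to flag (assumed list); \b keeps "no" from matching "nothing"
_INSTRUCTIVE = re.compile(r"\b(no|not|don't|do not|without|avoid)\b", re.IGNORECASE)


def lint_negative_prompt(negative_prompt: str) -> list[str]:
    """Return comma-separated terms that read as instructions rather than nouns."""
    terms = [t.strip() for t in negative_prompt.split(",") if t.strip()]
    return [t for t in terms if _INSTRUCTIVE.search(t)]
```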
 ---
 ## 5. Integration Architecture for the Nexus
 **Security requirement:** Never call Imagen APIs from browser-side JavaScript. The API key would be exposed in client code.
 ### Recommended Pattern
 ```
 Browser (Three.js / Nexus) → Backend Proxy → Imagen API → Base64 → Browser
 ```
 ### Backend Proxy (Node.js)
 ```javascript
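// Server-side only (Node.js 18+ for global fetch). Keep the API key in an
// environment variable, never in client code.
const IMAGEN_URL =
  'https://generativelanguage.googleapis.com/v1beta/models/imagen-3.0-generate-002:predict';

async function generateNexusImage(prompt, aspectRatio = '16:9') {
  const response = await fetch(IMAGEN_URL, {
    method: 'POST',
    headers: {
      'x-goog-api-key': process.env.GEMINI_API_KEY,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      instances: [{ prompt }],
      parameters: {
        sampleCount: 1,
        aspectRatio,
        negativePrompt: 'blurry, watermark, low quality, deformed',
        addWatermark: true,
      },
    }),
  });
  // Surface HTTP errors (bad key, quota exceeded, invalid parameters)
  if (!response.ok) {
    throw new Error(`Imagen API error ${response.status}: ${await response.text()}`);
  }
  const data = await response.json();
  // Safety filtering can remove every image, leaving `predictions` empty or absent
  const prediction = data.predictions?.[0];
  if (!prediction?.bytesBase64Encoded) {
    throw new Error('No image returned (the prompt may have been filtered)');
  }
  // Use the MIME type the API reports instead of assuming PNG
  const mimeType = prediction.mimeType ?? 'image/png';
  return `data:${mimeType};base64,${prediction.bytesBase64Encoded}`;
}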
 ```
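For the Python-based Phase 1 pipeline, the same response handling can be sketched in Python. The field names `predictions` and `bytesBase64Encoded` match the Node.js proxy above; `decode_predictions` is a hypothetical helper, and falling back to `image/png` when `mimeType` is missing is an assumption. Safety filtering can leave fewer images than requested, or none, so an empty result raises a clear error instead of an `IndexError`.

```python
from __future__ import annotations

import base64


def decode_predictions(response_json: dict) -> list[tuple[str, bytes]]:
    """Return (mime_type, image_bytes) for each image in a :predict response."""
    images = []
    for pred in response_json.get("predictions", []):
        data = pred.get("bytesBase64Encoded")
        if data:  # skip entries that carry no image payload
            images.append((pred.get("mimeType", "image/png"), base64.b64decode(data)))
    if not images:
        raise RuntimeError("No images returned; the prompt may have been filtered")
    return images
```

Writing each result with `Path("nexus_hub.png").write_bytes(data)` (from `pathlib`) produces the static assets Phase 1 calls for.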
 ### Applying to Three.js (Nexus app.js)
 ```javascript
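// Browser-side: request an image from the backend proxy (never from Imagen directly)
async function fetchFromProxy(endpoint, prompt) {
  const res = await fetch(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
  });
  if (!res.ok) throw new Error(`Proxy error ${res.status}`);
  const { imageDataUrl } = await res.json(); // hypothetical response shape: { imageDataUrl }
  return imageDataUrl;
}
// Load a generated image (data URL) as a Three.js texture; reject on load errors
function loadGeneratedTexture(imageDataUrl) {
  return new Promise((resolve, reject) => {
    new THREE.TextureLoader().load(imageDataUrl, resolve, undefined, reject);
  });
}
// Apply to a portal or background plane (top-level await requires an ES module)
const texture = await loadGeneratedTexture(await fetchFromProxy('/api/generate-image', prompt));
texture.colorSpace = THREE.SRGBColorSpace; // three.js r152+
portalMesh.material.map = texture;
portalMesh.material.needsUpdate = true;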
 ```
 ### Python SDK (Vertex AI)
 ```python
import vertexai
from vertexai.preview.vision_models import ImageGenerationModel

# Uses Application Default Credentials (see Path B above)
vertexai.init(project="YOUR_PROJECT_ID", location="us-central1")
model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-002")
images = model.generate_images(
    prompt="Nexus orbital station, cyberpunk, 4K, cinematic",
    number_of_images=1,
    aspect_ratio="16:9",
    negative_prompt="blurry, low quality",
)
# Safety filtering can return fewer images than requested, so check before indexing
if images.images:
    images[0].save(location="nexus_concept.png")
 ```
 ---
 ## 6. Comparison to Alternatives
 | Feature | Imagen 3/4 | DALL-E 3 / GPT-Image-1.5 | Stable Diffusion 3.5 | Midjourney |
 |---|---|---|---|---|
 | **Photorealism** | Excellent | Excellent | Very Good | Excellent |
 | **Text in Images** | Best-in-class | Strong | Weak | Weak |
 | **Cyberpunk/Concept Art** | Very Good | Good | Excellent (custom models) | Excellent |
 | **Portrait Avatars** | Very Good | Good | Excellent | Excellent |
 | **API Access** | Yes | Yes | Yes (various) | No public API |
 | **Price/image** | $0.02–$0.06 | $0.011–$0.25 | $0.002–$0.05 | N/A (subscription) |
 | **Free Tier** | UI only | ChatGPT free | Local run | Limited |
 | **Open Source** | No | No | Yes | No |
 | **Negative Prompts** | Yes | No | Yes | Partial |
| **Seed Control** | Yes (Vertex AI) | No | Yes | Yes |
| **Watermark** | SynthID (default; always on via Gemini API) | No | No | Subtle |
 ### Assessment for the Nexus
 - **Imagen 3/4** — Best choice for Google ecosystem integration; excellent photorealism and text rendering; slightly weaker on artistic stylization than alternatives.
- **Stable Diffusion** — Most powerful for cyberpunk/concept art via community fine-tunes (e.g., DreamShaper) built on base models such as SDXL; can run locally at zero API cost; requires more setup.
 - **DALL-E 3** — Strong natural language understanding; accessible; no negative prompts.
 - **Midjourney** — Premium aesthetic quality; no API access makes it unsuitable for automated generation.
**Recommendation:** Use Imagen 3 (`imagen-3.0-generate-002`) via the Gemini API for the initial implementation: it offers the lowest friction within the Google ecosystem, costs $0.03/image, and gives strong results with the prompt patterns above. Consider Stable Diffusion for offline or cost-sensitive generation of bulk assets.

---
 ## 7. Key Considerations
1. **SynthID watermark** is embedded in every output by default (imperceptible to the human eye but present in the pixel data). It cannot be disabled on the Gemini API; on Vertex AI it can be disabled with `addWatermark: false`.
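2. **Seed parameter** enables reproducible generation: the same seed with the same prompt, model, and parameters should return the same image, which helps keep agent avatars stable across sessions. It does not by itself preserve a character's identity across different prompts. Requires `addWatermark: false` to work (Vertex AI only).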
 3. **Prompt enhancement** (`enhancePrompt: true`) is enabled by default — Imagen's LLM rewrites your prompt for better results. Disable to use prompts verbatim.
 4. **Person generation controls** are geo-restricted. The `allow_all` setting (adults + children) is blocked in EU, UK, Switzerland, and MENA regions.
 5. **Nexus color palette compatibility** — use explicit color keywords in prompts to match the Nexus color scheme defined in `NEXUS.colors` (e.g., specify `#0ff cyan`, `deep purple`, `gold`).
6. **Imagen 3 vs. 4** — Imagen 3 (`imagen-3.0-generate-002`) is the stable, proven model at $0.03/image. Imagen 4 Standard improves quality at $0.04/image. Both use a near-identical API structure; Imagen 4 Standard and Ultra add 2K output.
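
Several of these constraints interact, so it helps to enforce them where the Vertex AI `parameters` object is built. A sketch, assuming the Vertex parameter names quoted in this section (`seed`, `addWatermark`, `enhancePrompt`) plus `personGeneration`, `sampleCount`, and `aspectRatio`; `vertex_parameters`, the region labels in `ALLOW_ALL_BLOCKED`, and the `allow_adult` default are hypothetical choices for illustration.

```python
from __future__ import annotations

# Hypothetical region labels for where personGeneration="allow_all" is blocked
ALLOW_ALL_BLOCKED = {"EU", "UK", "CH", "MENA"}


def vertex_parameters(aspect_ratio: str = "1:1", seed: int | None = None,
                      add_watermark: bool = True, enhance_prompt: bool = True,
                      person_generation: str = "allow_adult",
                      region: str | None = None) -> dict:
    """Build a Vertex AI `parameters` dict, rejecting combinations noted above."""
    if seed is not None and add_watermark:
        raise ValueError("seed requires addWatermark=False (Vertex AI only)")
    if person_generation == "allow_all" and region in ALLOW_ALL_BLOCKED:
        raise ValueError(f"allow_all is not available in {region}")
    params = {
        "sampleCount": 1,
        "aspectRatio": aspect_ratio,
        "addWatermark": add_watermark,
        "enhancePrompt": enhance_prompt,  # False sends the prompt verbatim
        "personGeneration": person_generation,
    }
    if seed is not None:
        params["seed"] = seed
    return params
```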
 ---
 ## 8. Implementation Roadmap for the Nexus
 ### Phase 1 — Concept Art Generation (Offline/Pre-generated)
 - Use Python + Vertex AI to generate Nexus concept art images
 - Optimal prompts for: hub environment, portal chamber, exterior shot
 - Store as static assets; apply as Three.js textures
 ### Phase 2 — Agent Avatar Generation
 - Define avatar prompt templates per agent (Timmy, Kimi, Claude, Perplexity)
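- Generate at 1:1 with a fixed `seed` for reproducibility (Vertex AI, watermark disabled); 2048×2048 output requires Imagen 4 Standard/Ultra, while Imagen 3 produces 1024×1024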
 - Apply as HUD portraits and 3D billboard sprites
 ### Phase 3 — Live Generation Proxy (Future)
 - Add `/api/generate-image` backend endpoint
 - Allow Nexus to request dynamic portal concept art on-demand
 - Cache results in Cloud Storage for cost efficiency
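
Caching only pays off if identical requests map to the same object name. A sketch of a deterministic key, assuming a hypothetical `imagen-cache/<model>/<hash>.png` layout in Cloud Storage; `cache_key` is not part of any Google SDK. Without a fixed seed, identical requests would produce different images anyway, so the cache simply treats the first result as canonical.

```python
import hashlib
import json


def cache_key(model: str, prompt: str, parameters: dict) -> str:
    """Deterministic object name for a generated image.

    Keys are sorted before hashing, so parameter order does not change the key.
    """
    canonical = json.dumps(
        {"model": model, "prompt": prompt, "parameters": parameters},
        sort_keys=True, separators=(",", ":"),
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"imagen-cache/{model}/{digest}.png"
```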
 ---
 ## Sources
 - Google DeepMind — Imagen 3: deepmind.google/technologies/imagen-3/
 - Google Cloud — Imagen 3 on Vertex AI documentation
 - Google AI for Developers — Imagen API (Gemini Developer API)
 - Google Cloud Vertex AI Pricing
 - Gemini Developer API Pricing
 - A developer's guide to Imagen 3 on Vertex AI — Google Cloud Blog
 - Imagen 3: A Guide With Examples — DataCamp
 - DALL-E 3 vs Imagen comparison — ToolsCompare.ai
 - Best Text-to-Image Models 2026 — AIPortalX