[claude] Research: Google Imagen 3 — Nexus concept art & agent avatars (#290) (#316)

2026-03-24 04:56:02 +00:00
parent b10f23c12d
commit db8e9802bc
1 changed files with 327 additions and 0 deletions
--- a/IMAGEN3_REPORT.md
+++ b/IMAGEN3_REPORT.md
@@ -0,0 +1,327 @@
+# Google Imagen 3 — Nexus Concept Art & Agent Avatars Research Report
+
+*Compiled March 2026*
+
+## Executive Summary
+
+Google Imagen 3 is Google DeepMind's state-of-the-art text-to-image generation model, available via API through the Gemini Developer API and Vertex AI. This report evaluates Imagen 3 for generating Nexus concept art (space/3D/cyberpunk environments) and AI agent avatars, covering API access, prompt engineering, integration architecture, and comparison to alternatives.
+
+---
+
+## 1. Model Overview
+
+Google Imagen 3 was released in late 2024 and made generally available in early 2025. It is the third major generation of Google's Imagen series; Imagen 4 is now available as the current-generation model. Imagen 3 and Imagen 4 share a near-identical API.
+
+### Available Model Variants
+
+| Model ID | Purpose |
+|---|---|
+| `imagen-3.0-generate-002` | Primary high-quality model (recommended for Nexus) |
+| `imagen-3.0-generate-001` | Earlier Imagen 3 variant |
+| `imagen-3.0-fast-generate-001` | ~40% lower latency, slightly reduced quality |
+| `imagen-3.0-capability-001` | Extended features (editing, inpainting, upscaling) |
+| `imagen-4.0-generate-001` | Current-generation (Imagen 4) |
+| `imagen-4.0-fast-generate-001` | Fast Imagen 4 variant |
+
+### Core Capabilities
+
+- Photorealistic and stylized image generation from text prompts
+- Fewer visual artifacts, with improved detail and lighting vs. Imagen 2
+- In-image text rendering — up to 25 characters reliably (best-in-class)
+- Multiple artistic styles: photorealism, digital art, impressionism, anime, watercolor, cinematic
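+- Negative prompt support (model-dependent; confirm against the current API reference for `imagen-3.0-generate-002`)
+- Seed-based reproducible generation (useful for consistent agent avatar identity; see Section 7 for the watermark constraint)
+- SynthID invisible digital watermarking, applied by default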
+- Inpainting, outpainting, and image editing (via `capability-001` model)
+
+---
+
+## 2. API Access & Pricing
+
+### Access Paths
+
+**Path A — Gemini Developer API (recommended for Nexus)**
+- Endpoint: `https://generativelanguage.googleapis.com/v1beta/models/{model}:predict`
+- Auth: API key via `x-goog-api-key` header
+- Key obtained at: Google AI Studio (aistudio.google.com)
+- No Google Cloud project required for basic access
+- Price: **$0.03/image** (Imagen 3), **$0.04/image** (Imagen 4 Standard)
+
+**Path B — Vertex AI (enterprise)**
+- Requires a Google Cloud project with billing enabled
+- Auth: OAuth 2.0 or Application Default Credentials
+- More granular safety controls, regional selection, SLAs
+
+### Pricing Summary
+
+| Model | Price/Image |
+|---|---|
+| Imagen 3 (`imagen-3.0-generate-002`) | $0.03 |
+| Imagen 4 Fast | $0.02 |
+| Imagen 4 Standard | $0.04 |
+| Imagen 4 Ultra | $0.06 |
+| Image editing/inpainting (Vertex) | $0.02 |
+
+### Rate Limits
+
+| Tier | Images/Minute |
+|---|---|
+| Free (AI Studio web UI only) | ~2 IPM |
+| Tier 1 (billing linked) | 10 IPM |
+| Tier 2 ($250 cumulative spend) | Higher — contact Google |
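+
+A batch script that generates many images has to stay under the images-per-minute limits above. The following is a minimal sketch of a client-side sliding-window throttle, not part of any Google SDK; the class name `ImagesPerMinuteLimiter` and its methods are hypothetical, and the default of 10 requests per 60 seconds is taken from the Tier 1 row.
+
+```python
+import time
+from collections import deque
+
+
+class ImagesPerMinuteLimiter:
+    """Sliding-window limiter: at most `limit` requests per `window` seconds."""
+
+    def __init__(self, limit=10, window=60.0, clock=time.monotonic):
+        self.limit = limit
+        self.window = window
+        self.clock = clock  # injectable so the limiter can be tested without sleeping
+        self.sent = deque()  # timestamps of requests still inside the window
+
+    def seconds_until_free(self):
+        """Return how long to wait before the next request is allowed."""
+        now = self.clock()
+        # Drop timestamps that have left the window.
+        while self.sent and now - self.sent[0] >= self.window:
+            self.sent.popleft()
+        if len(self.sent) < self.limit:
+            return 0.0
+        return self.window - (now - self.sent[0])
+
+    def record(self):
+        """Call once for every request actually sent."""
+        self.sent.append(self.clock())
+```
+
+A caller would sleep for `seconds_until_free()` before each request and call `record()` right after sending it.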
+
+---
+
+## 3. Image Resolutions & Formats
+
+| Aspect Ratio | Pixel Size | Best Use |
+|---|---|---|
+| 1:1 | 1024×1024 or 2048×2048 | Agent avatars, thumbnails |
+| 16:9 | 1408×768 | Nexus concept art, widescreen |
+| 4:3 | 1280×896 | Environment shots |
+| 3:4 | 896×1280 | Portrait concept art |
+| 9:16 | 768×1408 | Vertical banners |
+
+- Default output: 1K (1024px); max: 2K (2048px)
+- Output formats: PNG (default), JPEG
+- Prompt input limit: 480 tokens
+
+---
+
+## 4. Prompt Engineering for the Nexus
+
+### Core Formula
+```
+[Subject] + [Setting/Context] + [Style] + [Lighting] + [Technical Specs]
+```
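+
+The formula above can be applied mechanically when prompts are produced from templates. The sketch below is one way to do that, under stated assumptions: `build_prompt` is a hypothetical helper, and `max_words` is only a rough stand-in for the 480-token input limit from Section 3, since the actual tokenizer is not available client-side.
+
+```python
+def build_prompt(subject, setting, style, lighting, technical, max_words=300):
+    """Join the five formula parts into one comma-separated prompt.
+
+    Empty parts are skipped. `max_words` is an assumed, conservative proxy
+    for the token limit, not an exact token count.
+    """
+    parts = [subject, setting, style, lighting, technical]
+    cleaned = [p.strip().rstrip(",") for p in parts if p and p.strip()]
+    prompt = ", ".join(cleaned)
+    if len(prompt.split()) > max_words:
+        raise ValueError(f"prompt has more than {max_words} words; shorten it")
+    return prompt
+```
+
+For example, `build_prompt("Orbital space station", "against a deep purple nebula", "digital concept art", "volumetric lighting", "4K")` yields a single prompt string in the order the formula prescribes.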
+
+### Style Keywords for Space/Cyberpunk Concept Art
+
+**Rendering:**
+`cinematic`, `octane render`, `unreal engine 5`, `ray tracing`, `subsurface scattering`, `matte painting`, `digital concept art`, `hyperrealistic`
+
+**Lighting:**
+`volumetric light shafts`, `neon glow`, `cyberpunk neon`, `dramatic rim lighting`, `chiaroscuro`, `bioluminescent`
+
+**Quality:**
+`4K`, `8K resolution`, `ultra-detailed`, `HDR`, `photorealistic`, `professional`
+
+**Sci-fi/Space:**
+`hard science fiction aesthetic`, `dark void background`, `nebula`, `holographic`, `glowing circuits`, `orbital`
+
+### Example Prompts: Nexus Concept Art
+
+**The Nexus Hub (main environment):**
+```
+Exterior view of a glowing orbital space station against a deep purple nebula,
+holographic data streams flowing between modules in cyan and gold,
+three.js aesthetic, hard science fiction,
+rendered in Unreal Engine 5, volumetric lighting,
+4K, ultra-detailed, cinematic 16:9
+```
+
+**Portal Chamber:**
+```
+Interior of a circular chamber with six glowing portal doorways
+arranged in a hexagonal pattern, each portal displaying a different dimension,
+neon-lit cyber baroque architecture, glowing runes on obsidian floor,
+cyberpunk aesthetic, volumetric light shafts, ray tracing,
+4K matte painting, wide angle
+```
+
+**Cyberpunk Nexus Exterior:**
+```
+Exterior of a towering brutalist cyber-tower floating in deep space,
+neon holographic advertisements in multiple languages,
+rain streaks catching neon light, 2087 aesthetic,
+cinematic lighting, anamorphic lens flare, film grain,
+ultra-detailed, 4K
+```
+
+### Example Prompts: AI Agent Avatars
+
+**Timmy (Sovereign AI Host):**
+```
+Portrait of a warm humanoid AI entity, translucent synthetic skin
+revealing golden circuit patterns beneath, kind glowing amber eyes,
+soft studio rim lighting, deep space background with subtle star field,
+digital concept art, shallow depth of field,
+professional 3D render, 1:1 square format, 8K
+```
+
+**Technical Agent Avatar (e.g. Kimi, Claude):**
+```
+Portrait of a sleek android entity, obsidian chrome face
+with glowing cyan ocular sensors and circuit filaments visible at temples,
+neutral expression suggesting deep processing,
+dark gradient background, dramatic rim lighting in electric blue,
+digital concept art, highly detailed, professional 3D render, 8K
+```
+
+**Pixar-Style Friendly Agent:**
+```
+Ultra-cute 3D cartoon android character,
+big expressive glowing teal eyes, smooth chrome dome with small antenna,
+soft Pixar/Disney render style, pastel color palette on dark space background,
+high detail, cinematic studio lighting, ultra-high resolution, 1:1
+```
+
+### Negative Prompt Best Practices
+
+Use plain nouns/adjectives, not instructions:
+```
+blurry, watermark, text overlay, low quality, overexposed,
+deformed, distorted, ugly, bad anatomy, jpeg artifacts
+```
+
+Note: Do NOT write "no blur" or "don't add text"; name the unwanted element itself (`blurry`, `text overlay`).
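+
+The rule above is easy to break when negative prompts are assembled from user input or templates. As a hedged illustration, the hypothetical helper `check_negative_prompt` below flags comma-separated terms that begin with an instruction word; the word list is an assumption and is not exhaustive.
+
+```python
+import re
+
+# Terms starting with these words read as instructions, not descriptors.
+INSTRUCTION_PATTERN = re.compile(
+    r"^(no|not|without|avoid|never|don'?t|do not)\b", re.IGNORECASE
+)
+
+
+def check_negative_prompt(negative_prompt):
+    """Return the terms that look like instructions and should be reworded."""
+    terms = [t.strip() for t in negative_prompt.split(",") if t.strip()]
+    return [t for t in terms if INSTRUCTION_PATTERN.match(t)]
+```
+
+An empty result means every term is phrased as a plain descriptor.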
+
+---
+
+## 5. Integration Architecture for the Nexus
+
+**Security requirement:** Never call Imagen APIs from browser-side JavaScript. The API key would be exposed in client code.
+
+### Recommended Pattern
+```
+Browser (Three.js / Nexus) → Backend Proxy → Imagen API → Base64 → Browser
+```
+
+### Backend Proxy (Node.js)
+```javascript
+// server-side only — keep API key in environment variable, never in client code
+async function generateNexusImage(prompt, aspectRatio = '16:9') {
+  const response = await fetch(
+    'https://generativelanguage.googleapis.com/v1beta/models/imagen-3.0-generate-002:predict',
+    {
+      method: 'POST',
+      headers: {
+        'x-goog-api-key': process.env.GEMINI_API_KEY,
+        'Content-Type': 'application/json',
+      },
+      body: JSON.stringify({
+        instances: [{ prompt }],
+        parameters: {
+          sampleCount: 1,
+          aspectRatio,
+          negativePrompt: 'blurry, watermark, low quality, deformed',
+          addWatermark: true,
+        }
+      })
+    }
+  );
+
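+  if (!response.ok) {
+    throw new Error(`Imagen request failed: ${response.status} ${await response.text()}`);
+  }
+
+  const data = await response.json();
+  // predictions may be missing or empty (for example, if the request was filtered)
+  const base64 = data.predictions?.[0]?.bytesBase64Encoded;
+  if (!base64) {
+    throw new Error('Imagen response contained no image');
+  }
+  return `data:image/png;base64,${base64}`;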
+}
+```
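+
+For offline scripts, the same request body and response handling can be sketched in Python. This is a minimal sketch that only covers building the JSON body and decoding the result; the function names are hypothetical, the field names (`instances`, `parameters`, `predictions`, `bytesBase64Encoded`) are the ones used in the Node.js proxy, and the assumption that a response may arrive without images is handled explicitly.
+
+```python
+import base64
+
+
+def build_predict_body(prompt, aspect_ratio="16:9", sample_count=1):
+    """Build the JSON body for the :predict endpoint, mirroring the proxy."""
+    return {
+        "instances": [{"prompt": prompt}],
+        "parameters": {"sampleCount": sample_count, "aspectRatio": aspect_ratio},
+    }
+
+
+def extract_images(response_json):
+    """Decode every returned image to bytes; raise if the response has none."""
+    predictions = response_json.get("predictions") or []
+    images = [
+        base64.b64decode(p["bytesBase64Encoded"])
+        for p in predictions
+        if "bytesBase64Encoded" in p
+    ]
+    if not images:
+        raise RuntimeError("response contained no images")
+    return images
+```
+
+Keeping the HTTP call separate from these two functions makes them testable without network access or an API key.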
+
+### Applying to Three.js (Nexus app.js)
+```javascript
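+// Load a generated image (data URL) as a Three.js texture
+function loadGeneratedTexture(imageDataUrl) {
+  return new Promise((resolve, reject) => {
+    const loader = new THREE.TextureLoader();
+    loader.load(imageDataUrl, resolve, undefined, reject); // onLoad, onProgress, onError
+  });
+}
+
+// Hypothetical helper (not defined in the original): POSTs the prompt to the
+// backend proxy and assumes it responds with JSON of the form { imageDataUrl }
+async function fetchFromProxy(path, prompt) {
+  const res = await fetch(path, {
+    method: 'POST',
+    headers: { 'Content-Type': 'application/json' },
+    body: JSON.stringify({ prompt }),
+  });
+  if (!res.ok) throw new Error(`Proxy request failed: ${res.status}`);
+  return (await res.json()).imageDataUrl;
+}
+
+// Apply to a portal or background plane (top-level await requires an ES module)
+const texture = await loadGeneratedTexture(await fetchFromProxy('/api/generate-image', prompt));
+portalMesh.material.map = texture;
+portalMesh.material.needsUpdate = true;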
+```
+
+### Python SDK (Vertex AI)
+```python
+from vertexai.preview.vision_models import ImageGenerationModel
+import vertexai
+
+vertexai.init(project="YOUR_PROJECT_ID", location="us-central1")
+model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-002")
+
+images = model.generate_images(
+    prompt="Nexus orbital station, cyberpunk, 4K, cinematic",
+    number_of_images=1,
+    aspect_ratio="16:9",
+    negative_prompt="blurry, low quality",
+)
+images[0].save(location="nexus_concept.png")
+```
+
+---
+
+## 6. Comparison to Alternatives
+
+| Feature | Imagen 3/4 | DALL-E 3 / GPT-Image-1.5 | Stable Diffusion 3.5 | Midjourney |
+|---|---|---|---|---|
+| **Photorealism** | Excellent | Excellent | Very Good | Excellent |
+| **Text in Images** | Best-in-class | Strong | Weak | Weak |
+| **Cyberpunk/Concept Art** | Very Good | Good | Excellent (custom models) | Excellent |
+| **Portrait Avatars** | Very Good | Good | Excellent | Excellent |
+| **API Access** | Yes | Yes | Yes (various) | No public API |
+| **Price/image** | $0.02–$0.06 | $0.011–$0.25 | $0.002–$0.05 | N/A (subscription) |
+| **Free Tier** | UI only | ChatGPT free | Local run | Limited |
+| **Open Source** | No | No | Yes | No |
+| **Negative Prompts** | Yes | No | Yes | Partial |
+| **Seed Control** | Yes | No | Yes | Yes |
+| **Watermark** | SynthID (default; can be disabled on Vertex AI) | No | No | Subtle |
+
+### Assessment for the Nexus
+
+- **Imagen 3/4** — Best choice for Google ecosystem integration; excellent photorealism and text rendering; slightly weaker on artistic stylization than alternatives.
+- **Stable Diffusion** — Most powerful for cyberpunk/concept art via community models (DreamShaper, SDXL); can run locally at zero API cost; requires more setup.
+- **DALL-E 3** — Strong natural language understanding; accessible; no negative prompts.
+- **Midjourney** — Premium aesthetic quality; no API access makes it unsuitable for automated generation.
+
+**Recommendation:** Use Imagen 3 (`imagen-3.0-generate-002`) via Gemini API for initial implementation — lowest friction for Google ecosystem, $0.03/image, strong results with the prompt patterns above. Consider Stable Diffusion for offline/cost-sensitive generation of bulk assets.
+
+---
+
+## 7. Key Considerations
+
+1. **SynthID watermark** is applied to Imagen outputs by default (imperceptible to the human eye but embedded in the pixel data). It cannot be disabled on the Gemini API; on Vertex AI it can be disabled with `addWatermark: false`.
+
+2. **Seed parameter** enables reproducible avatar generation: the same prompt, seed, and parameters should reproduce the same image, which supports a consistent agent identity across sessions. It requires `addWatermark: false` to work, so it is available on Vertex AI only.
+
+3. **Prompt enhancement** (`enhancePrompt: true`) is enabled by default: Imagen's LLM rewrites your prompt for better results. Set it to `false` to use prompts verbatim.
+
+4. **Person generation controls** are geo-restricted. The `allow_all` setting (adults + children) is blocked in EU, UK, Switzerland, and MENA regions.
+
+5. **Nexus color palette compatibility** — use explicit color keywords in prompts to match the Nexus color scheme defined in `NEXUS.colors` (e.g., specify `#0ff cyan`, `deep purple`, `gold`).
+
+6. **Imagen 3 vs. 4**: Imagen 3 (`imagen-3.0-generate-002`) is the stable, proven model at $0.03/image. Imagen 4 Standard improves quality at $0.04/image. Both use a near-identical API structure.
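+
+The interaction between `seed` and `addWatermark` described in item 2 is easy to get wrong when parameters are assembled in code. The sketch below encodes that constraint in a hypothetical `build_parameters` helper; the parameter names come from this report, and the defaults are assumptions chosen to match the behavior described above.
+
+```python
+def build_parameters(aspect_ratio="1:1", seed=None, add_watermark=True,
+                     enhance_prompt=True):
+    """Build the `parameters` object, rejecting seed + watermark together."""
+    if seed is not None and add_watermark:
+        # Per the notes above, seed only works with the watermark disabled,
+        # which in turn is only possible on Vertex AI.
+        raise ValueError("seed requires add_watermark=False (Vertex AI only)")
+    params = {
+        "aspectRatio": aspect_ratio,
+        "addWatermark": add_watermark,
+        "enhancePrompt": enhance_prompt,
+    }
+    if seed is not None:
+        params["seed"] = seed
+    return params
+```
+
+Failing fast here avoids discovering the conflict only after a request has been sent.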
+
+---
+
+## 8. Implementation Roadmap for the Nexus
+
+### Phase 1 — Concept Art Generation (Offline/Pre-generated)
+- Use Python + Vertex AI to generate Nexus concept art images
+- Optimal prompts for: hub environment, portal chamber, exterior shot
+- Store as static assets; apply as Three.js textures
+
+### Phase 2 — Agent Avatar Generation
+- Define avatar prompt templates per agent (Timmy, Kimi, Claude, Perplexity)
+- Generate at 1:1 / 2048×2048 with `seed` for reproducibility
+- Apply as HUD portraits and 3D billboard sprites
+
+### Phase 3 — Live Generation Proxy (Future)
+- Add `/api/generate-image` backend endpoint
+- Allow Nexus to request dynamic portal concept art on-demand
+- Cache results in Cloud Storage for cost efficiency
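+
+Caching in Phase 3 needs a stable key so that identical requests map to the same stored object. One possible approach, sketched here with a hypothetical `cache_key` helper, is to hash the model ID, prompt, and parameters; the `.png` suffix and the key layout are assumptions, not a Cloud Storage requirement.
+
+```python
+import hashlib
+import json
+
+
+def cache_key(model, prompt, parameters):
+    """Return a deterministic object name for one generation request."""
+    # sort_keys makes the key independent of dict insertion order.
+    payload = json.dumps(
+        {"model": model, "prompt": prompt, "parameters": parameters},
+        sort_keys=True,
+        separators=(",", ":"),
+    )
+    return hashlib.sha256(payload.encode("utf-8")).hexdigest() + ".png"
+```
+
+The proxy would look up this key before calling the API and store the generated image under it afterwards.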
+
+---
+
+## Sources
+
+- Google DeepMind — Imagen 3: deepmind.google/technologies/imagen-3/
+- Google Cloud — Imagen 3 on Vertex AI documentation
+- Google AI for Developers — Imagen API (Gemini Developer API)
+- Google Cloud Vertex AI Pricing
+- Gemini Developer API Pricing
+- A developer's guide to Imagen 3 on Vertex AI — Google Cloud Blog
+- Imagen 3: A Guide With Examples — DataCamp
+- DALL-E 3 vs Imagen comparison — ToolsCompare.ai
+- Best Text-to-Image Models 2026 — AIPortalX