From ddf0eb8e9e966460559847fddace4059608f1e42 Mon Sep 17 00:00:00 2001
From: Alexander Whitestone
Date: Tue, 24 Mar 2026 00:54:49 -0400
Subject: [PATCH] docs: Add Google Imagen 3 research report for Nexus concept art and agent avatars

Comprehensive research covering Imagen 3 API access, pricing, prompt engineering patterns for space/cyberpunk concept art and AI agent avatars, integration architecture for Three.js Nexus app, and comparison to DALL-E 3, Stable Diffusion, and Midjourney alternatives.

Refs #290
---
 IMAGEN3_REPORT.md | 327 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 327 insertions(+)
 create mode 100644 IMAGEN3_REPORT.md

diff --git a/IMAGEN3_REPORT.md b/IMAGEN3_REPORT.md
new file mode 100644
index 0000000..cc38194
--- /dev/null
+++ b/IMAGEN3_REPORT.md
@@ -0,0 +1,327 @@
+# Google Imagen 3 — Nexus Concept Art & Agent Avatars Research Report
+
+*Compiled March 2026*
+
+## Executive Summary
+
+Google Imagen 3 is Google DeepMind's state-of-the-art text-to-image generation model, available via API through the Gemini Developer API and Vertex AI. This report evaluates Imagen 3 for generating Nexus concept art (space/3D/cyberpunk environments) and AI agent avatars, covering API access, prompt engineering, integration architecture, and comparison to alternatives.
+
+---
+
+## 1. Model Overview
+
+Google Imagen 3 was released in late 2024 and made generally available in early 2025. It is the third major generation of Google's Imagen series, with Imagen 4 now available as the current-generation model. Both Imagen 3 and 4 share near-identical APIs.
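If the two generations really do share the same request shape, switching models should come down to changing the model ID in the endpoint path. The sketch below illustrates that under stated assumptions: `imagenEndpoint` and `imagenRequestBody` are hypothetical helper names, the base URL is the Gemini Developer API endpoint listed under Access Paths, and the body fields mirror the proxy example in section 5. It has not been checked against a live Imagen 4 call.

```javascript
// Base URL taken from the "Access Paths" section of this report.
const GEMINI_MODELS_BASE = 'https://generativelanguage.googleapis.com/v1beta/models';

// Hypothetical helper: build the :predict endpoint for any Imagen model ID.
function imagenEndpoint(modelId) {
  return `${GEMINI_MODELS_BASE}/${encodeURIComponent(modelId)}:predict`;
}

// Hypothetical helper: request body in the shape used by this report's proxy example.
function imagenRequestBody(prompt, aspectRatio = '16:9') {
  return {
    instances: [{ prompt }],
    parameters: { sampleCount: 1, aspectRatio },
  };
}

// Same body, different model ID: only the URL changes.
const body = imagenRequestBody('Nexus orbital station, cyberpunk, cinematic');
const imagen3Url = imagenEndpoint('imagen-3.0-generate-002');
const imagen4Url = imagenEndpoint('imagen-4.0-generate-001');
console.log(imagen3Url);
console.log(imagen4Url);
console.log(JSON.stringify(body));
```

Keeping the model ID as a single parameter would let the Nexus move from Imagen 3 to Imagen 4 by changing one string, provided the claim of near-identical APIs holds for the parameters in use.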
+
+### Available Model Variants
+
+| Model ID | Purpose |
+|---|---|
+| `imagen-3.0-generate-002` | Primary high-quality model (recommended for Nexus) |
+| `imagen-3.0-generate-001` | Earlier Imagen 3 variant |
+| `imagen-3.0-fast-generate-001` | ~40% lower latency, slightly reduced quality |
+| `imagen-3.0-capability-001` | Extended features (editing, inpainting, upscaling) |
+| `imagen-4.0-generate-001` | Current-generation (Imagen 4) |
+| `imagen-4.0-fast-generate-001` | Fast Imagen 4 variant |
+
+### Core Capabilities
+
+- Photorealistic and stylized image generation from text prompts
+- Artifact-free output with improved detail and lighting vs. Imagen 2
+- In-image text rendering — up to 25 characters reliably (best-in-class)
+- Multiple artistic styles: photorealism, digital art, impressionism, anime, watercolor, cinematic
+- Negative prompt support
+- Seed-based reproducible generation (useful for consistent agent avatar identity)
+- SynthID invisible digital watermarking on all outputs
+- Inpainting, outpainting, and image editing (via `capability-001` model)
+
+---
+
+## 2. API Access & Pricing
+
+### Access Paths
+
+**Path A — Gemini Developer API (recommended for Nexus)**
+- Endpoint: `https://generativelanguage.googleapis.com/v1beta/models/{model}:predict`
+- Auth: API key via `x-goog-api-key` header
+- Key obtained at: Google AI Studio (aistudio.google.com)
+- No Google Cloud project required for basic access
+- Price: **$0.03/image** (Imagen 3), **$0.04/image** (Imagen 4 Standard)
+
+**Path B — Vertex AI (enterprise)**
+- Requires a Google Cloud project with billing enabled
+- Auth: OAuth 2.0 or Application Default Credentials
+- More granular safety controls, regional selection, SLAs
+
+### Pricing Summary
+
+| Model | Price/Image |
+|---|---|
+| Imagen 3 (`imagen-3.0-generate-002`) | $0.03 |
+| Imagen 4 Fast | $0.02 |
+| Imagen 4 Standard | $0.04 |
+| Imagen 4 Ultra | $0.06 |
+| Image editing/inpainting (Vertex) | $0.02 |
+
+### Rate Limits
+
+| Tier | Images/Minute |
+|---|---|
+| Free (AI Studio web UI only) | ~2 IPM |
+| Tier 1 (billing linked) | 10 IPM |
+| Tier 2 ($250 cumulative spend) | Higher — contact Google |
+
+---
+
+## 3. Image Resolutions & Formats
+
+| Aspect Ratio | Pixel Size | Best Use |
+|---|---|---|
+| 1:1 | 1024×1024 or 2048×2048 | Agent avatars, thumbnails |
+| 16:9 | 1408×768 | Nexus concept art, widescreen |
+| 4:3 | 1280×896 | Environment shots |
+| 3:4 | 896×1280 | Portrait concept art |
+| 9:16 | 768×1408 | Vertical banners |
+
+- Default output: 1K (1024px); max: 2K (2048px)
+- Output formats: PNG (default), JPEG
+- Prompt input limit: 480 tokens
+
+---
+
+## 4. Prompt Engineering for the Nexus
+
+### Core Formula
+```
+[Subject] + [Setting/Context] + [Style] + [Lighting] + [Technical Specs]
+```
+
+### Style Keywords for Space/Cyberpunk Concept Art
+
+**Rendering:**
+`cinematic`, `octane render`, `unreal engine 5`, `ray tracing`, `subsurface scattering`, `matte painting`, `digital concept art`, `hyperrealistic`
+
+**Lighting:**
+`volumetric light shafts`, `neon glow`, `cyberpunk neon`, `dramatic rim lighting`, `chiaroscuro`, `bioluminescent`
+
+**Quality:**
+`4K`, `8K resolution`, `ultra-detailed`, `HDR`, `photorealistic`, `professional`
+
+**Sci-fi/Space:**
+`hard science fiction aesthetic`, `dark void background`, `nebula`, `holographic`, `glowing circuits`, `orbital`
+
+### Example Prompts: Nexus Concept Art
+
+**The Nexus Hub (main environment):**
+```
+Exterior view of a glowing orbital space station against a deep purple nebula,
+holographic data streams flowing between modules in cyan and gold,
+three.js aesthetic, hard science fiction,
+rendered in Unreal Engine 5, volumetric lighting,
+4K, ultra-detailed, cinematic 16:9
+```
+
+**Portal Chamber:**
+```
+Interior of a circular chamber with six glowing portal doorways
+arranged in a hexagonal pattern, each portal displaying a different dimension,
+neon-lit cyber baroque architecture, glowing runes on obsidian floor,
+cyberpunk aesthetic, volumetric light shafts, ray tracing,
+4K matte painting, wide angle
+```
+
+**Cyberpunk Nexus Exterior:**
+```
+Exterior of a towering brutalist cyber-tower floating in deep space,
+neon holographic advertisements in multiple languages,
+rain streaks catching neon light, 2087 aesthetic,
+cinematic lighting, anamorphic lens flare, film grain,
+ultra-detailed, 4K
+```
+
+### Example Prompts: AI Agent Avatars
+
+**Timmy (Sovereign AI Host):**
+```
+Portrait of a warm humanoid AI entity, translucent synthetic skin
+revealing golden circuit patterns beneath, kind glowing amber eyes,
+soft studio rim lighting, deep space background with subtle star field,
+digital concept art, shallow depth of field,
+professional 3D render, 1:1 square format, 8K
+```
+
+**Technical Agent Avatar (e.g. Kimi, Claude):**
+```
+Portrait of a sleek android entity, obsidian chrome face
+with glowing cyan ocular sensors and circuit filaments visible at temples,
+neutral expression suggesting deep processing,
+dark gradient background, dramatic rim lighting in electric blue,
+digital concept art, highly detailed, professional 3D render, 8K
+```
+
+**Pixar-Style Friendly Agent:**
+```
+Ultra-cute 3D cartoon android character,
+big expressive glowing teal eyes, smooth chrome dome with small antenna,
+soft Pixar/Disney render style, pastel color palette on dark space background,
+high detail, cinematic studio lighting, ultra-high resolution, 1:1
+```
+
+### Negative Prompt Best Practices
+
+Use plain nouns/adjectives, not instructions:
+```
+blurry, watermark, text overlay, low quality, overexposed,
+deformed, distorted, ugly, bad anatomy, jpeg artifacts
+```
+
+Note: Do NOT write "no blur" or "don't add text" — use the noun form only.
+
+---
+
+## 5. Integration Architecture for the Nexus
+
+**Security requirement:** Never call Imagen APIs from browser-side JavaScript. The API key would be exposed in client code.
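One way to honor this requirement on the client is to send only the prompt to a same-origin route, so the browser never sees the key. The sketch below assumes a hypothetical `fetchFromProxy` helper, the `/api/generate-image` route named in the roadmap, and a JSON response of the form `{ imageDataUrl }`. The response shape is an assumption; the report does not specify it. `fetchImpl` is injectable so the helper can be exercised without a network.

```javascript
// Browser-side helper (hypothetical): talks only to our own backend,
// never to the Imagen endpoint, so no API key appears in client code.
// Assumption: the proxy accepts { prompt } and replies with { imageDataUrl }.
async function fetchFromProxy(path, prompt, fetchImpl = fetch) {
  const response = await fetchImpl(path, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
  });

  // Surface proxy failures instead of handing a bad value to the texture loader.
  if (!response.ok) {
    throw new Error(`Proxy request failed with status ${response.status}`);
  }

  const { imageDataUrl } = await response.json();
  return imageDataUrl;
}
```

A call such as `fetchFromProxy('/api/generate-image', 'portal chamber, neon, 16:9')` would then resolve to a data URL that can be passed to a texture loader, with the key staying in the server's environment.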
+
+### Recommended Pattern
+```
+Browser (Three.js / Nexus) → Backend Proxy → Imagen API → Base64 → Browser
+```
+
+### Backend Proxy (Node.js)
+```javascript
+// server-side only — keep API key in environment variable, never in client code
+async function generateNexusImage(prompt, aspectRatio = '16:9') {
+  const response = await fetch(
+    'https://generativelanguage.googleapis.com/v1beta/models/imagen-3.0-generate-002:predict',
+    {
+      method: 'POST',
+      headers: {
+        'x-goog-api-key': process.env.GEMINI_API_KEY,
+        'Content-Type': 'application/json',
+      },
+      body: JSON.stringify({
+        instances: [{ prompt }],
+        parameters: {
+          sampleCount: 1,
+          aspectRatio,
+          negativePrompt: 'blurry, watermark, low quality, deformed',
+          addWatermark: true,
+        }
+      })
+    }
+  );
+  if (!response.ok) throw new Error(`Imagen API request failed: ${response.status}`);
+  const data = await response.json();
+  const base64 = data.predictions[0].bytesBase64Encoded;
+  return `data:image/png;base64,${base64}`;
+}
+```
+
+### Applying to Three.js (Nexus app.js)
+```javascript
+// Load a generated image as a Three.js texture (assumes THREE is already in scope)
+async function loadGeneratedTexture(imageDataUrl) {
+  return new Promise((resolve, reject) => {
+    const loader = new THREE.TextureLoader();
+    loader.load(imageDataUrl, resolve, undefined, reject);
+  });
+}
+
+// Apply to a portal or background plane (fetchFromProxy: assumed app-side helper that POSTs the prompt to the backend proxy)
+const texture = await loadGeneratedTexture(await fetchFromProxy('/api/generate-image', prompt));
+portalMesh.material.map = texture;
+portalMesh.material.needsUpdate = true;
+```
+
+### Python SDK (Vertex AI)
+```python
+from vertexai.preview.vision_models import ImageGenerationModel
+import vertexai
+
+vertexai.init(project="YOUR_PROJECT_ID", location="us-central1")
+model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-002")
+
+images = model.generate_images(
+    prompt="Nexus orbital station, cyberpunk, 4K, cinematic",
+    number_of_images=1,
+    aspect_ratio="16:9",
+    negative_prompt="blurry, low quality",
+)
+images[0].save(location="nexus_concept.png")
+```
+
+---
+
+## 6. Comparison to Alternatives
+
+| Feature | Imagen 3/4 | DALL-E 3 / GPT-Image-1.5 | Stable Diffusion 3.5 | Midjourney |
+|---|---|---|---|---|
+| **Photorealism** | Excellent | Excellent | Very Good | Excellent |
+| **Text in Images** | Best-in-class | Strong | Weak | Weak |
+| **Cyberpunk/Concept Art** | Very Good | Good | Excellent (custom models) | Excellent |
+| **Portrait Avatars** | Very Good | Good | Excellent | Excellent |
+| **API Access** | Yes | Yes | Yes (various) | No public API |
+| **Price/image** | $0.02–$0.06 | $0.011–$0.25 | $0.002–$0.05 | N/A (subscription) |
+| **Free Tier** | UI only | ChatGPT free | Local run | Limited |
+| **Open Source** | No | No | Yes | No |
+| **Negative Prompts** | Yes | No | Yes | Partial |
+| **Seed Control** | Yes | No | Yes | Yes |
+| **Watermark** | SynthID (always) | No | No | Subtle |
+
+### Assessment for the Nexus
+
+- **Imagen 3/4** — Best choice for Google ecosystem integration; excellent photorealism and text rendering; slightly weaker on artistic stylization than alternatives.
+- **Stable Diffusion** — Most powerful for cyberpunk/concept art via community models (DreamShaper, SDXL); can run locally at zero API cost; requires more setup.
+- **DALL-E 3** — Strong natural language understanding; accessible; no negative prompts.
+- **Midjourney** — Premium aesthetic quality; no API access makes it unsuitable for automated generation.
+
+**Recommendation:** Use Imagen 3 (`imagen-3.0-generate-002`) via Gemini API for initial implementation — lowest friction for Google ecosystem, $0.03/image, strong results with the prompt patterns above. Consider Stable Diffusion for offline/cost-sensitive generation of bulk assets.
+
+---
+
+## 7. Key Considerations
+
+1. **SynthID watermark** is always present on all Imagen outputs (imperceptible to human eye but embedded in pixel data). Cannot be disabled on Gemini API; can be disabled on Vertex AI with `addWatermark: false`.
+
+2. **Seed parameter** enables reproducible avatar generation — critical for consistent agent identity across sessions. Requires `addWatermark: false` to work (Vertex AI only).
+
+3. **Prompt enhancement** (`enhancePrompt: true`) is enabled by default — Imagen's LLM rewrites your prompt for better results. Disable to use prompts verbatim.
+
+4. **Person generation controls** are geo-restricted. The `allow_all` setting (adults + children) is blocked in EU, UK, Switzerland, and MENA regions.
+
+5. **Nexus color palette compatibility** — use explicit color keywords in prompts to match the Nexus color scheme defined in `NEXUS.colors` (e.g., specify `#0ff cyan`, `deep purple`, `gold`).
+
+6. **Imagen 3 vs. 4** — Imagen 3 (`imagen-3.0-generate-002`) is the stable proven model at $0.03/image. Imagen 4 Standard improves quality at $0.04/image. Both use identical API structure.
+
+---
+
+## 8. Implementation Roadmap for the Nexus
+
+### Phase 1 — Concept Art Generation (Offline/Pre-generated)
+- Use Python + Vertex AI to generate Nexus concept art images
+- Optimal prompts for: hub environment, portal chamber, exterior shot
+- Store as static assets; apply as Three.js textures
+
+### Phase 2 — Agent Avatar Generation
+- Define avatar prompt templates per agent (Timmy, Kimi, Claude, Perplexity)
+- Generate at 1:1 / 2048×2048 with `seed` for reproducibility
+- Apply as HUD portraits and 3D billboard sprites
+
+### Phase 3 — Live Generation Proxy (Future)
+- Add `/api/generate-image` backend endpoint
+- Allow Nexus to request dynamic portal concept art on-demand
+- Cache results in Cloud Storage for cost efficiency
+
+---
+
+## Sources
+
+- Google DeepMind — Imagen 3: deepmind.google/technologies/imagen-3/
+- Google Cloud — Imagen 3 on Vertex AI documentation
+- Google AI for Developers — Imagen API (Gemini Developer API)
+- Google Cloud Vertex AI Pricing
+- Gemini Developer API Pricing
+- A developer's guide to Imagen 3 on Vertex AI — Google Cloud Blog
+- Imagen 3: A Guide With Examples — DataCamp
+- DALL-E 3 vs Imagen comparison — ToolsCompare.ai
+- Best Text-to-Image Models 2026 — AIPortalX
-- 
2.43.0