[claude] Research: Google Imagen 3 — Nexus concept art & agent avatars (#290) (#316)
Some checks failed
Deploy Nexus / deploy (push) Has been cancelled
This commit was merged in pull request #316.
IMAGEN3_REPORT.md
# Google Imagen 3: Nexus Concept Art & Agent Avatars Research Report

*Compiled March 2026*

## Executive Summary

Imagen 3 is a text-to-image generation model from Google DeepMind, available through the Gemini Developer API and Vertex AI. This report evaluates Imagen 3 for generating Nexus concept art (space/3D/cyberpunk environments) and AI agent avatars. It covers API access, prompt engineering, integration architecture, and a comparison to alternatives.

---

## 1. Model Overview

Google Imagen 3 was released in late 2024 and made generally available in early 2025. It is the third major generation of Google's Imagen series; Imagen 4 is now available as the current-generation model, and the two share near-identical APIs.

### Available Model Variants

| Model ID | Purpose |
|---|---|
| `imagen-3.0-generate-002` | Primary high-quality model (recommended for Nexus) |
| `imagen-3.0-generate-001` | Earlier Imagen 3 variant |
| `imagen-3.0-fast-generate-001` | ~40% lower latency, slightly reduced quality |
| `imagen-3.0-capability-001` | Extended features (editing, inpainting, upscaling) |
| `imagen-4.0-generate-001` | Current-generation (Imagen 4) |
| `imagen-4.0-fast-generate-001` | Fast Imagen 4 variant |

### Core Capabilities

- Photorealistic and stylized image generation from text prompts
- Fewer visual artifacts, with improved detail and lighting vs. Imagen 2
- In-image text rendering: up to 25 characters reliably (best-in-class)
- Multiple artistic styles: photorealism, digital art, impressionism, anime, watercolor, cinematic
- Negative prompt support
- Seed-based reproducible generation (useful for consistent agent avatar identity)
- SynthID invisible digital watermarking, enabled by default on outputs
- Inpainting, outpainting, and image editing (via the `imagen-3.0-capability-001` model)

---

## 2. API Access & Pricing

### Access Paths

**Path A: Gemini Developer API (recommended for Nexus)**
- Endpoint: `https://generativelanguage.googleapis.com/v1beta/models/{model}:predict`
- Auth: API key via `x-goog-api-key` header
- Key obtained at: Google AI Studio (aistudio.google.com)
- No Google Cloud project required for basic access
- Price: **$0.03/image** (Imagen 3), **$0.04/image** (Imagen 4 Standard)

**Path B: Vertex AI (enterprise)**
- Requires a Google Cloud project with billing enabled
- Auth: OAuth 2.0 or Application Default Credentials
- More granular safety controls, regional selection, SLAs

### Pricing Summary

| Model | Price/Image |
|---|---|
| Imagen 3 (`imagen-3.0-generate-002`) | $0.03 |
| Imagen 4 Fast | $0.02 |
| Imagen 4 Standard | $0.04 |
| Imagen 4 Ultra | $0.06 |
| Image editing/inpainting (Vertex) | $0.02 |
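
To make the per-image prices easier to budget against, the arithmetic can be sketched as below. The dictionary keys are hypothetical labels chosen for this example (they are not API model IDs), and the prices are copied from the table above, so they should be re-checked against current pricing before use.

```python
# Per-image prices in USD, copied from the Pricing Summary table.
# The keys are illustrative labels, not official model identifiers.
PRICE_PER_IMAGE_USD = {
    "imagen-3": 0.03,
    "imagen-4-fast": 0.02,
    "imagen-4-standard": 0.04,
    "imagen-4-ultra": 0.06,
}


def estimate_cost(model_label: str, image_count: int) -> float:
    """Return the estimated cost in USD for generating `image_count` images."""
    if image_count < 0:
        raise ValueError("image_count must be non-negative")
    # Round to cents to hide floating-point noise in the product.
    return round(PRICE_PER_IMAGE_USD[model_label] * image_count, 2)


if __name__ == "__main__":
    # Example: 4 agents with 5 candidate avatars each on Imagen 3.
    print(estimate_cost("imagen-3", 4 * 5))
```

At these prices, even a few hundred exploratory generations stay in the single-digit dollar range, which is why the report treats rate limits rather than cost as the main constraint for early experiments.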

### Rate Limits

| Tier | Images/Minute |
|---|---|
| Free (AI Studio web UI only) | ~2 IPM |
| Tier 1 (billing linked) | 10 IPM |
| Tier 2 ($250 cumulative spend) | Higher (contact Google) |
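
A batch job that generates many images needs to stay under the images-per-minute limit. The sketch below spaces requests evenly (one every 60 / IPM seconds), which is a conservative choice: the real quota may allow short bursts. The class name `ImageRateLimiter` and its injectable `clock` and `sleep` arguments are assumptions made for this example.

```python
import time


class ImageRateLimiter:
    """Space calls evenly so at most `images_per_minute` requests start per minute."""

    def __init__(self, images_per_minute, clock=time.monotonic, sleep=time.sleep):
        if images_per_minute <= 0:
            raise ValueError("images_per_minute must be positive")
        self.min_interval = 60.0 / images_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next request may start

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds waited."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return delay


if __name__ == "__main__":
    limiter = ImageRateLimiter(images_per_minute=10)  # Tier 1 limit from the table
    print(f"Minimum spacing between requests: {limiter.min_interval} s")
```

Calling `limiter.wait()` before each generation request keeps a long-running batch within the Tier 1 limit without any retry logic.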

---

## 3. Image Resolutions & Formats

| Aspect Ratio | Pixel Size | Best Use |
|---|---|---|
| 1:1 | 1024×1024 or 2048×2048 | Agent avatars, thumbnails |
| 16:9 | 1408×768 | Nexus concept art, widescreen |
| 4:3 | 1280×896 | Environment shots |
| 3:4 | 896×1280 | Portrait concept art |
| 9:16 | 768×1408 | Vertical banners |

- Default output: 1K (1024px); max: 2K (2048px)
- Output formats: PNG (default), JPEG
- Prompt input limit: 480 tokens
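
When a Nexus surface (a portal plane, a HUD slot) has arbitrary dimensions, it helps to pick the supported aspect ratio closest to it. The following is a minimal sketch that uses the sizes from the table above; for 1:1 it uses the 1K default. The function name `closest_aspect_ratio` is hypothetical.

```python
# Output sizes (width, height) per aspect ratio, taken from the table above.
# 1:1 also supports 2048x2048; the 1K default is used here.
ASPECT_RATIO_SIZES = {
    "1:1": (1024, 1024),
    "16:9": (1408, 768),
    "4:3": (1280, 896),
    "3:4": (896, 1280),
    "9:16": (768, 1408),
}


def closest_aspect_ratio(width: int, height: int) -> str:
    """Return the supported aspect ratio whose output shape best matches width x height."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = width / height

    def distance(ratio_name: str) -> float:
        w, h = ASPECT_RATIO_SIZES[ratio_name]
        return abs(w / h - target)

    return min(ASPECT_RATIO_SIZES, key=distance)


if __name__ == "__main__":
    ratio = closest_aspect_ratio(1920, 1080)
    print(ratio, ASPECT_RATIO_SIZES[ratio])
```

The comparison uses the actual pixel sizes rather than the nominal ratio, since 1408×768 is only approximately 16:9.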

---

## 4. Prompt Engineering for the Nexus

### Core Formula

```
[Subject] + [Setting/Context] + [Style] + [Lighting] + [Technical Specs]
```
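
The formula above can be turned into a small prompt builder so that every Nexus prompt follows the same ordering. This is only a sketch: the `PromptParts` dataclass and `build_prompt` function are names invented for this example, and the builder does not check the 480-token input limit because that would require the model's tokenizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptParts:
    """The five slots of the core formula; only the subject is required."""
    subject: str
    setting: str = ""
    style: str = ""
    lighting: str = ""
    technical: str = ""


def build_prompt(parts: PromptParts) -> str:
    """Join the non-empty slots in formula order, separated by commas."""
    if not parts.subject.strip():
        raise ValueError("subject is required")
    ordered = [parts.subject, parts.setting, parts.style, parts.lighting, parts.technical]
    # Trim whitespace and stray commas so the joined prompt has clean separators.
    cleaned = [piece.strip().strip(" ,") for piece in ordered]
    return ", ".join(piece for piece in cleaned if piece)


if __name__ == "__main__":
    hub = PromptParts(
        subject="Exterior view of a glowing orbital space station",
        setting="against a deep purple nebula",
        style="hard science fiction, digital concept art",
        lighting="volumetric lighting",
        technical="4K, ultra-detailed, cinematic 16:9",
    )
    print(build_prompt(hub))
```

Keeping the slots separate makes it straightforward to swap the style or lighting of a prompt while holding the subject fixed, which is useful when comparing variations of the same scene.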

### Style Keywords for Space/Cyberpunk Concept Art

**Rendering:**
`cinematic`, `octane render`, `unreal engine 5`, `ray tracing`, `subsurface scattering`, `matte painting`, `digital concept art`, `hyperrealistic`

**Lighting:**
`volumetric light shafts`, `neon glow`, `cyberpunk neon`, `dramatic rim lighting`, `chiaroscuro`, `bioluminescent`

**Quality:**
`4K`, `8K resolution`, `ultra-detailed`, `HDR`, `photorealistic`, `professional`

**Sci-fi/Space:**
`hard science fiction aesthetic`, `dark void background`, `nebula`, `holographic`, `glowing circuits`, `orbital`

### Example Prompts: Nexus Concept Art

**The Nexus Hub (main environment):**
```
Exterior view of a glowing orbital space station against a deep purple nebula,
holographic data streams flowing between modules in cyan and gold,
three.js aesthetic, hard science fiction,
rendered in Unreal Engine 5, volumetric lighting,
4K, ultra-detailed, cinematic 16:9
```

**Portal Chamber:**
```
Interior of a circular chamber with six glowing portal doorways
arranged in a hexagonal pattern, each portal displaying a different dimension,
neon-lit cyber baroque architecture, glowing runes on obsidian floor,
cyberpunk aesthetic, volumetric light shafts, ray tracing,
4K matte painting, wide angle
```

**Cyberpunk Nexus Exterior:**
```
Exterior of a towering brutalist cyber-tower floating in deep space,
neon holographic advertisements in multiple languages,
rain streaks catching neon light, 2087 aesthetic,
cinematic lighting, anamorphic lens flare, film grain,
ultra-detailed, 4K
```

### Example Prompts: AI Agent Avatars

**Timmy (Sovereign AI Host):**
```
Portrait of a warm humanoid AI entity, translucent synthetic skin
revealing golden circuit patterns beneath, kind glowing amber eyes,
soft studio rim lighting, deep space background with subtle star field,
digital concept art, shallow depth of field,
professional 3D render, 1:1 square format, 8K
```

**Technical Agent Avatar (e.g. Kimi, Claude):**
```
Portrait of a sleek android entity, obsidian chrome face
with glowing cyan ocular sensors and circuit filaments visible at temples,
neutral expression suggesting deep processing,
dark gradient background, dramatic rim lighting in electric blue,
digital concept art, highly detailed, professional 3D render, 8K
```

**Pixar-Style Friendly Agent:**
```
Ultra-cute 3D cartoon android character,
big expressive glowing teal eyes, smooth chrome dome with small antenna,
soft Pixar/Disney render style, pastel color palette on dark space background,
high detail, cinematic studio lighting, ultra-high resolution, 1:1
```

### Negative Prompt Best Practices

List the unwanted elements as plain nouns and adjectives, not as instructions:
```
blurry, watermark, text overlay, low quality, overexposed,
deformed, distorted, ugly, bad anatomy, jpeg artifacts
```

Note: Do NOT write "no blur" or "don't add text". Name the unwanted element itself (`blur`, `text`).
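
A quick check before sending a request can catch instruction-style terms in a negative prompt. The sketch below is a heuristic only: the list of leading words it flags (`no`, `not`, `without`, `avoid`, `never`, `don't`, `do not`) is an assumption made for this example, and the function names are hypothetical.

```python
import re

# Terms that start with one of these words read as instructions, not descriptions.
_INSTRUCTION_PATTERN = re.compile(
    r"^(no|not|without|avoid|never|don'?t|do not)\b", re.IGNORECASE
)


def split_negative_terms(negative_prompt: str) -> list[str]:
    """Split a comma-separated negative prompt into trimmed, non-empty terms."""
    return [term.strip() for term in negative_prompt.split(",") if term.strip()]


def find_instruction_terms(negative_prompt: str) -> list[str]:
    """Return the terms that look like instructions and should be reworded."""
    return [
        term
        for term in split_negative_terms(negative_prompt)
        if _INSTRUCTION_PATTERN.match(term)
    ]


if __name__ == "__main__":
    flagged = find_instruction_terms("blurry, no blur, watermark, don't add text")
    print(flagged)
```

The word-boundary match means ordinary terms such as `noise` are not flagged even though they begin with "no".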

---

## 5. Integration Architecture for the Nexus

**Security requirement:** Never call Imagen APIs from browser-side JavaScript. The API key would be exposed in client code.

### Recommended Pattern

```
Browser (Three.js / Nexus) → Backend Proxy → Imagen API → Base64 → Browser
```

### Backend Proxy (Node.js)

```javascript
// Server-side only: keep the API key in an environment variable, never in client code.
// Uses the global fetch available in Node.js 18+.
async function generateNexusImage(prompt, aspectRatio = '16:9') {
  const response = await fetch(
    'https://generativelanguage.googleapis.com/v1beta/models/imagen-3.0-generate-002:predict',
    {
      method: 'POST',
      headers: {
        'x-goog-api-key': process.env.GEMINI_API_KEY,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        instances: [{ prompt }],
        parameters: {
          sampleCount: 1,
          aspectRatio,
          negativePrompt: 'blurry, watermark, low quality, deformed',
          addWatermark: true,
        },
      }),
    }
  );

  // Surface HTTP errors (bad key, quota, invalid parameters) instead of
  // failing later on a missing field.
  if (!response.ok) {
    throw new Error(`Imagen request failed: ${response.status} ${await response.text()}`);
  }

  const data = await response.json();
  const prediction = data.predictions?.[0];
  // Guard against a response without image data, e.g. if the output was filtered.
  if (!prediction?.bytesBase64Encoded) {
    throw new Error('Imagen response contained no image data');
  }
  return `data:image/png;base64,${prediction.bytesBase64Encoded}`;
}
```

### Applying to Three.js (Nexus app.js)

```javascript
// Load a generated image (data URL) as a Three.js texture
function loadGeneratedTexture(imageDataUrl) {
  return new Promise((resolve, reject) => {
    const loader = new THREE.TextureLoader();
    // load(url, onLoad, onProgress, onError): reject so load failures are not silent
    loader.load(imageDataUrl, resolve, undefined, reject);
  });
}

// Apply to a portal or background plane.
// fetchFromProxy is not defined in this report; it is assumed to POST the prompt
// to the backend proxy and resolve to the image data URL.
// Top-level await requires an ES module; otherwise wrap this in an async function.
const imageDataUrl = await fetchFromProxy('/api/generate-image', prompt);
const texture = await loadGeneratedTexture(imageDataUrl);
portalMesh.material.map = texture;
portalMesh.material.needsUpdate = true;
```

### Python SDK (Vertex AI)

```python
from vertexai.preview.vision_models import ImageGenerationModel
import vertexai

vertexai.init(project="YOUR_PROJECT_ID", location="us-central1")
model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-002")

images = model.generate_images(
    prompt="Nexus orbital station, cyberpunk, 4K, cinematic",
    number_of_images=1,
    aspect_ratio="16:9",
    negative_prompt="blurry, low quality",
)
images[0].save(location="nexus_concept.png")
```

---

## 6. Comparison to Alternatives

| Feature | Imagen 3/4 | DALL-E 3 / GPT-Image-1.5 | Stable Diffusion 3.5 | Midjourney |
|---|---|---|---|---|
| **Photorealism** | Excellent | Excellent | Very Good | Excellent |
| **Text in Images** | Best-in-class | Strong | Weak | Weak |
| **Cyberpunk/Concept Art** | Very Good | Good | Excellent (custom models) | Excellent |
| **Portrait Avatars** | Very Good | Good | Excellent | Excellent |
| **API Access** | Yes | Yes | Yes (various) | No public API |
| **Price/image** | $0.02–$0.06 | $0.011–$0.25 | $0.002–$0.05 | N/A (subscription) |
| **Free Tier** | UI only | ChatGPT free | Local run | Limited |
| **Open Source** | No | No | Yes | No |
| **Negative Prompts** | Yes | No | Yes | Partial |
| **Seed Control** | Yes | No | Yes | Yes |
| **Watermark** | SynthID (on by default) | No | No | Subtle |

### Assessment for the Nexus

- **Imagen 3/4**: Best choice for Google ecosystem integration; excellent photorealism and text rendering; slightly weaker on artistic stylization than alternatives.
- **Stable Diffusion**: Most powerful for cyberpunk/concept art via community models (DreamShaper, SDXL); can run locally at zero API cost; requires more setup.
- **DALL-E 3**: Strong natural language understanding; accessible; no negative prompts.
- **Midjourney**: Premium aesthetic quality; no API access makes it unsuitable for automated generation.

**Recommendation:** Use Imagen 3 (`imagen-3.0-generate-002`) via the Gemini API for the initial implementation. It has the lowest friction within the Google ecosystem, costs $0.03/image, and gives strong results with the prompt patterns above. Consider Stable Diffusion for offline or cost-sensitive generation of bulk assets.

---

## 7. Key Considerations

1. **SynthID watermark** is embedded in Imagen outputs by default (imperceptible to the human eye but encoded in the pixel data). It cannot be disabled on the Gemini API; on Vertex AI it can be disabled with `addWatermark: false`.

2. **Seed parameter** enables reproducible avatar generation, which is critical for consistent agent identity across sessions. It only works with `addWatermark: false`, so it is limited to Vertex AI.

3. **Prompt enhancement** (`enhancePrompt: true`) is enabled by default: Imagen's LLM rewrites your prompt for better results. Set it to `false` to use prompts verbatim.

4. **Person generation controls** are geo-restricted. The `allow_all` setting (adults + children) is blocked in EU, UK, Switzerland, and MENA regions.

5. **Nexus color palette compatibility**: use explicit color keywords in prompts to match the Nexus color scheme defined in `NEXUS.colors` (e.g., specify `#0ff cyan`, `deep purple`, `gold`).

6. **Imagen 3 vs. 4**: Imagen 3 (`imagen-3.0-generate-002`) is the stable, proven model at $0.03/image. Imagen 4 Standard improves quality at $0.04/image. Both use a near-identical API structure.
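
Considerations 1 to 3 combine into a single set of request parameters for reproducible avatars. The sketch below derives a stable seed from the agent's name and pairs it with `addWatermark: false`, as the seed requirement above describes. The helper names are hypothetical, the parameter names are the ones used elsewhere in this report, and the seed range of 1 to 2147483647 is an assumption that should be checked against the Vertex AI reference.

```python
import hashlib

# Assumed upper bound for the seed value; verify against the Vertex AI API reference.
MAX_SEED = 2**31 - 1


def seed_for_agent(agent_name: str) -> int:
    """Derive a stable seed in [1, MAX_SEED] from a case-insensitive agent name."""
    normalized = agent_name.strip().lower().encode("utf-8")
    digest = hashlib.sha256(normalized).digest()
    return 1 + int.from_bytes(digest[:8], "big") % MAX_SEED


def avatar_parameters(agent_name: str, enhance_prompt: bool = False) -> dict:
    """Build request parameters for a reproducible 1:1 avatar (Vertex AI only)."""
    return {
        "sampleCount": 1,
        "aspectRatio": "1:1",
        "seed": seed_for_agent(agent_name),
        # The report notes that seed only works when the watermark is disabled.
        "addWatermark": False,
        # Prompt rewriting would change the effective prompt between runs.
        "enhancePrompt": enhance_prompt,
    }


if __name__ == "__main__":
    for agent in ("Timmy", "Kimi", "Claude", "Perplexity"):
        print(agent, avatar_parameters(agent)["seed"])
```

Deriving the seed from the name avoids storing a seed table: regenerating an avatar for the same agent and prompt template reuses the same seed automatically. Disabling prompt enhancement by default is a design choice in this sketch, made so that the prompt sent is the prompt used.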

---

## 8. Implementation Roadmap for the Nexus

### Phase 1: Concept Art Generation (Offline/Pre-generated)
- Use Python + Vertex AI to generate Nexus concept art images
- Refine prompts for the hub environment, portal chamber, and exterior shot
- Store as static assets; apply as Three.js textures

### Phase 2: Agent Avatar Generation
- Define avatar prompt templates per agent (Timmy, Kimi, Claude, Perplexity)
- Generate at 1:1 / 2048×2048 with `seed` for reproducibility (requires Vertex AI with `addWatermark: false`, per Section 7)
- Apply as HUD portraits and 3D billboard sprites

### Phase 3: Live Generation Proxy (Future)
- Add `/api/generate-image` backend endpoint
- Allow Nexus to request dynamic portal concept art on-demand
- Cache results in Cloud Storage for cost efficiency
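
The caching step in Phase 3 can be sketched as a content-addressed cache keyed on the prompt and parameters, so an identical request never pays for a second generation. In this example a local directory stands in for Cloud Storage, and the `generate` callable is a placeholder for whatever function actually calls Imagen; both are assumptions, as are the function names.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cache_key(prompt: str, parameters: dict) -> str:
    """Hash the prompt and parameters into a stable, order-independent key."""
    canonical = json.dumps(
        {"prompt": prompt, "parameters": parameters},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def get_or_generate(
    prompt: str,
    parameters: dict,
    generate: Callable[[str, dict], bytes],
    cache_dir: Path,
) -> bytes:
    """Return cached image bytes if present; otherwise generate and store them."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(prompt, parameters)}.png"
    if path.exists():
        return path.read_bytes()
    image_bytes = generate(prompt, parameters)
    path.write_bytes(image_bytes)
    return image_bytes


if __name__ == "__main__":
    def fake_generate(prompt: str, parameters: dict) -> bytes:
        # Placeholder for the real Imagen call.
        return b"not-a-real-image"

    data = get_or_generate(
        "portal chamber", {"aspectRatio": "16:9"}, fake_generate, Path("image_cache")
    )
    print(len(data), "bytes")
```

Note that a cache like this only makes sense for requests that are meant to be repeatable; if variety is wanted for the same prompt, the key would need to include something that differs per request.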

---

## Sources

- Google DeepMind, Imagen 3: deepmind.google/technologies/imagen-3/
- Google Cloud, Imagen 3 on Vertex AI documentation
- Google AI for Developers, Imagen API (Gemini Developer API)
- Google Cloud Vertex AI Pricing
- Gemini Developer API Pricing
- A developer's guide to Imagen 3 on Vertex AI (Google Cloud Blog)
- Imagen 3: A Guide With Examples (DataCamp)
- DALL-E 3 vs Imagen comparison (ToolsCompare.ai)
- Best Text-to-Image Models 2026 (AIPortalX)