---
title: Image Generation
description: Generate high-quality images using FLUX 2 Pro with automatic upscaling via FAL.ai.
sidebar_label: Image Generation
sidebar_position: 6
---

# Image Generation

Hermes Agent can generate images from text prompts using FAL.ai's FLUX 2 Pro model with automatic 2x upscaling via the Clarity Upscaler for enhanced quality.

## Setup

### Get a FAL API Key

- Sign up at fal.ai
- Generate an API key from your dashboard

### Configure the Key

```bash
# Add to ~/.hermes/.env
FAL_KEY=your-fal-api-key-here
```

### Install the Client Library

```bash
pip install fal-client
```

:::info
The image generation tool is automatically available when FAL_KEY is set. No additional toolset configuration is needed.
:::
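
To make that rule concrete, here is a minimal Python sketch of an availability check. It assumes a simple `KEY=VALUE` format for `~/.hermes/.env`; the helpers `parse_env_file` and `image_generation_available` are hypothetical and not part of Hermes, and the extra check that `fal_client` is importable is an assumption drawn from the install step above.

```python
import importlib.util
import os
from pathlib import Path


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def image_generation_available(env: dict[str, str]) -> bool:
    """Hypothetical check: FAL_KEY is non-empty AND fal-client is installed (assumed)."""
    has_key = bool(env.get("FAL_KEY", "").strip())
    has_client = importlib.util.find_spec("fal_client") is not None
    return has_key and has_client


if __name__ == "__main__":
    env_path = Path.home() / ".hermes" / ".env"
    env = dict(os.environ)
    if env_path.exists():
        # In this sketch, values from the .env file override the process environment.
        env.update(parse_env_file(env_path.read_text(encoding="utf-8")))
    print("image generation available:", image_generation_available(env))
```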

## How It Works

When you ask Hermes to generate an image:

1. **Generation**: Your prompt is sent to the FLUX 2 Pro model (`fal-ai/flux-2-pro`).
2. **Upscaling**: The generated image is automatically upscaled 2x using the Clarity Upscaler (`fal-ai/clarity-upscaler`).
3. **Delivery**: The upscaled image URL is returned.

If upscaling fails for any reason, the original image is returned as a fallback.
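
The three steps and the fallback can be sketched as a small pipeline. This is an illustration under assumptions, not the Hermes implementation: `generate_with_upscale` is a hypothetical helper, and the two FAL.ai calls are passed in as plain callables so the control flow can run without network access.

```python
from typing import Callable

# Hypothetical step signatures; in practice each would wrap a FAL.ai request.
GenerateFn = Callable[[str], str]      # prompt -> generated image URL
UpscaleFn = Callable[[str, str], str]  # (image URL, prompt) -> upscaled image URL


def generate_with_upscale(prompt: str, generate: GenerateFn, upscale: UpscaleFn) -> str:
    """Run generation, then a best-effort upscale that falls back to the original URL."""
    original_url = generate(prompt)  # Step 1: FLUX 2 Pro (fal-ai/flux-2-pro)
    try:
        # Step 2: Clarity Upscaler (fal-ai/clarity-upscaler)
        return upscale(original_url, prompt)
    except Exception:
        # Any upscaler failure (network error, rate limit, ...) keeps the original image.
        return original_url


if __name__ == "__main__":
    def flaky_upscale(url: str, prompt: str) -> str:
        raise TimeoutError("simulated upscaler outage")

    # Step 3: the returned URL is what gets delivered.
    url = generate_with_upscale(
        "a serene mountain landscape",
        generate=lambda p: "https://example.com/original.png",
        upscale=flaky_upscale,
    )
    print(url)  # https://example.com/original.png
```

Injecting the steps keeps the fallback logic separate from the FAL client, so the same function works with real API wrappers or with test stubs.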

## Usage

Simply ask Hermes to create an image:

```
Generate an image of a serene mountain landscape with cherry blossoms
Create a portrait of a wise old owl perched on an ancient tree branch
Make me a futuristic cityscape with flying cars and neon lights
```

## Parameters

The `image_generate_tool` accepts these parameters:

| Parameter | Default | Range | Description |
|---|---|---|---|
| `prompt` | (required) | n/a | Text description of the desired image |
| `aspect_ratio` | `"landscape"` | `landscape`, `square`, `portrait` | Image aspect ratio |
| `num_inference_steps` | `50` | 1–100 | Number of denoising steps (more = higher quality, slower) |
| `guidance_scale` | `4.5` | 0.1–20.0 | How closely to follow the prompt |
| `num_images` | `1` | 1–4 | Number of images to generate |
| `output_format` | `"png"` | `png`, `jpeg` | Image file format |
| `seed` | (random) | any integer | Random seed for reproducible results |
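
A minimal sketch of how the defaults and ranges in this table could be applied before a request is sent. `normalize_image_params` is a hypothetical helper; it rejects out-of-range values, whereas the real tool may clamp them or leave validation to FAL.ai. `aspect_ratio` is passed through unchecked here.

```python
from typing import Any

# Defaults and ranges copied from the parameter table above.
DEFAULTS: dict[str, Any] = {
    "aspect_ratio": "landscape",
    "num_inference_steps": 50,
    "guidance_scale": 4.5,
    "num_images": 1,
    "output_format": "png",
    "seed": None,  # None = let the backend pick a random seed
}


def normalize_image_params(prompt: str, **overrides: Any) -> dict[str, Any]:
    """Merge overrides onto the defaults and reject out-of-range values (hypothetical helper)."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    params = {**DEFAULTS, **overrides, "prompt": prompt}
    if not 1 <= params["num_inference_steps"] <= 100:
        raise ValueError("num_inference_steps must be between 1 and 100")
    if not 0.1 <= params["guidance_scale"] <= 20.0:
        raise ValueError("guidance_scale must be between 0.1 and 20.0")
    if not 1 <= params["num_images"] <= 4:
        raise ValueError("num_images must be between 1 and 4")
    if params["output_format"] not in ("png", "jpeg"):
        raise ValueError("output_format must be 'png' or 'jpeg'")
    if params["seed"] is not None and not isinstance(params["seed"], int):
        raise ValueError("seed must be an integer")
    return params
```

For example, `normalize_image_params("a wise old owl", num_images=2, seed=42)` returns the full parameter set with the remaining defaults filled in.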

## Aspect Ratios

The tool uses simplified aspect ratio names that map to FLUX 2 Pro image sizes:

| Aspect Ratio | Maps To | Best For |
|---|---|---|
| `landscape` | `landscape_16_9` | Wallpapers, banners, scenes |
| `square` | `square_hd` | Profile pictures, social media posts |
| `portrait` | `portrait_16_9` | Character art, phone wallpapers |

:::tip
You can also use the raw FLUX 2 Pro size presets directly: `square_hd`, `square`, `portrait_4_3`, `portrait_16_9`, `landscape_4_3`, `landscape_16_9`. Custom sizes up to 2048x2048 are also supported.
:::
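
The simplified-name mapping plus the raw presets could be resolved as follows. This is a sketch, not Hermes code: `resolve_image_size` is hypothetical, and passing custom sizes as a `{"width": ..., "height": ...}` dict is an assumed request shape. Because `square` appears both as a simplified name and as a raw preset, the sketch gives the simplified mapping priority; the page does not say how Hermes resolves that overlap.

```python
from typing import Union

# Simplified names from the aspect ratio table -> FLUX 2 Pro size presets.
ASPECT_RATIO_MAP = {
    "landscape": "landscape_16_9",
    "square": "square_hd",
    "portrait": "portrait_16_9",
}
RAW_PRESETS = {
    "square_hd", "square", "portrait_4_3", "portrait_16_9",
    "landscape_4_3", "landscape_16_9",
}
MAX_SIDE = 2048  # custom sizes up to 2048x2048


def resolve_image_size(value: Union[str, tuple[int, int]]) -> Union[str, dict[str, int]]:
    """Return a preset name, or a {"width", "height"} dict for a custom size (assumed shape)."""
    if isinstance(value, tuple):
        width, height = value
        if not (0 < width <= MAX_SIDE and 0 < height <= MAX_SIDE):
            raise ValueError(f"custom size must be at most {MAX_SIDE}x{MAX_SIDE}")
        return {"width": width, "height": height}
    # Simplified names are checked first, so "square" resolves to "square_hd".
    if value in ASPECT_RATIO_MAP:
        return ASPECT_RATIO_MAP[value]
    if value in RAW_PRESETS:
        return value
    raise ValueError(f"unknown aspect ratio or size preset: {value!r}")
```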

## Automatic Upscaling

Every generated image is automatically upscaled 2x using FAL.ai's Clarity Upscaler with these settings:
| Setting | Value |
|---|---|
| Upscale Factor | 2x |
| Creativity | 0.35 |
| Resemblance | 0.6 |
| Guidance Scale | 4 |
| Inference Steps | 18 |
| Positive Prompt | "masterpiece, best quality, highres" + your original prompt |
| Negative Prompt | "(worst quality, low quality, normal quality:2)" |
The upscaler enhances detail and resolution while preserving the original composition. If the upscaler fails (network issue, rate limit), the original resolution image is returned automatically.
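
The settings table translates into a request payload roughly like the one below. The field names (`image_url`, `upscale_factor`, `creativity`, and so on) and the comma used to join the prefix with your prompt are assumptions for illustration; check them against the `fal-ai/clarity-upscaler` input schema before relying on them. `build_upscale_request` is a hypothetical helper.

```python
UPSCALE_PROMPT_PREFIX = "masterpiece, best quality, highres"
UPSCALE_NEGATIVE_PROMPT = "(worst quality, low quality, normal quality:2)"


def build_upscale_request(image_url: str, prompt: str) -> dict:
    """Assemble upscaler arguments from the settings table (field names are assumptions)."""
    return {
        "image_url": image_url,
        # Assumed joining rule: prefix, comma, then the user's original prompt.
        "prompt": f"{UPSCALE_PROMPT_PREFIX}, {prompt}",
        "negative_prompt": UPSCALE_NEGATIVE_PROMPT,
        "upscale_factor": 2,
        "creativity": 0.35,
        "resemblance": 0.6,
        "guidance_scale": 4,
        "num_inference_steps": 18,
    }


if __name__ == "__main__":
    args = build_upscale_request("https://example.com/original.png", "an owl on a branch")
    print(args["prompt"])  # masterpiece, best quality, highres, an owl on a branch
```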

## Example Prompts

Here are some effective prompts to try:

```
A candid street photo of a woman with a pink bob and bold eyeliner
Modern architecture building with glass facade, sunset lighting
Abstract art with vibrant colors and geometric patterns
Portrait of a wise old owl perched on ancient tree branch
Futuristic cityscape with flying cars and neon lights
```

## Debugging

Enable debug logging for image generation:

```bash
export IMAGE_TOOLS_DEBUG=true
```

Debug logs are saved to `./logs/image_tools_debug_<session_id>.json` with details about each generation request: its parameters, timing, and any errors.
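
To inspect these logs from a script, a helper along these lines can locate and parse them. It is a sketch with assumptions: the function names are hypothetical, treating only `true` (in any letter case) as enabling debug mode is a guess, and the code makes no claims about the JSON structure inside the file.

```python
import json
import os
from pathlib import Path
from typing import Any, Mapping, Optional


def debug_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Assumed rule: IMAGE_TOOLS_DEBUG counts as on only when it equals "true" (any case)."""
    return environ.get("IMAGE_TOOLS_DEBUG", "").strip().lower() == "true"


def debug_log_path(session_id: str, logs_dir: str = "./logs") -> Path:
    """Build the documented path ./logs/image_tools_debug_<session_id>.json."""
    return Path(logs_dir) / f"image_tools_debug_{session_id}.json"


def load_debug_log(session_id: str, logs_dir: str = "./logs") -> Optional[Any]:
    """Return the parsed JSON log for a session, or None if no log file exists."""
    path = debug_log_path(session_id, logs_dir)
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```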

## Safety Settings

The image generation tool runs with safety checks disabled (`safety_tolerance: 5`, the most permissive setting). This value is set in code and cannot be changed by users.

## Platform Delivery

Generated images are delivered differently depending on the platform:
| Platform | Delivery method |
|---|---|
| CLI | Image URL printed as a markdown image link; click it to open in a browser |
| Telegram | Image sent as a photo message with the prompt as caption |
| Discord | Image embedded in a message |
| Slack | Image URL in message (Slack unfurls it) |
|  | Image sent as a media message |
| Other platforms | Image URL in plain text |

The agent uses `MEDIA:<url>` syntax in its response, which the platform adapter converts to the appropriate format.
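
A simplified, text-only version of that conversion might look like the sketch below. `extract_media` and `render_plain` are hypothetical, the regular expression assumes a `MEDIA:` token is followed by a URL with no whitespace, and native uploads (Telegram photos, Discord embeds) are not modeled.

```python
import re

# Assumed token shape: "MEDIA:" followed by a URL with no whitespace.
MEDIA_PATTERN = re.compile(r"MEDIA:(\S+)")


def extract_media(text: str) -> tuple[str, list[str]]:
    """Split a response into (text without MEDIA tokens, list of media URLs)."""
    urls = MEDIA_PATTERN.findall(text)
    cleaned = MEDIA_PATTERN.sub("", text).strip()
    return cleaned, urls


def render_plain(text: str, platform: str) -> str:
    """Hypothetical text-only adapter: markdown images for the CLI, bare URLs elsewhere."""
    cleaned, urls = extract_media(text)
    if platform == "cli":
        media_lines = [f"![image]({url})" for url in urls]
    else:
        media_lines = urls
    return "\n".join([cleaned, *media_lines]).strip()
```

Keeping extraction separate from rendering means each platform adapter only has to decide how to present the list of URLs.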

## Limitations

- **Requires a FAL API key**: image generation incurs API costs on your FAL.ai account.
- **No image editing**: this is text-to-image only, with no inpainting or img2img.
- **URL-based delivery**: images are returned as temporary FAL.ai URLs and are not saved locally. The URLs expire after a period (typically hours).
- **Upscaling adds latency**: the automatic 2x upscale step adds processing time.
- **Max 4 images per request**: `num_images` is capped at 4.
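
Because the returned URLs are temporary, you may want to keep a local copy. The sketch below is not a Hermes feature; `save_image_locally` is a hypothetical helper built on the standard library's `urllib.request.urlopen`, with the opener injectable so it can be exercised without network access.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def save_image_locally(url: str, dest_dir: str, opener=urllib.request.urlopen) -> Path:
    """Copy a temporary image URL to dest_dir before it expires (not a Hermes feature)."""
    filename = Path(urlparse(url).path).name or "image.png"
    dest = Path(dest_dir) / filename
    dest.parent.mkdir(parents=True, exist_ok=True)
    # `opener` is injectable so the function can be tested without network access.
    with opener(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```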