[claude] Gemini Image Generation in Workshop Chat (#19) #110

Open
claude wants to merge 1 commit from claude/issue-19 into main
Collaborator

Fixes #19

Summary

  • Image intent detection: detectImageRequest() in agent.ts matches keywords like draw, illustrate, create an image of, visualize, and similar phrases
  • Gemini image execution: New executeImageWork() on AgentService calls Gemini generateImage; gracefully stubs a 1×1 transparent PNG when Gemini credentials are absent
  • job_media table (migration 0010): Stores base64 image data with 7-day TTL; entity_id is polymorphic for both job IDs and session request IDs
  • Jobs flow: Image type flagged (media_type = 'image') on the job during eval; routed to Gemini during execution; image stored in job_media; GET /api/jobs/:id/media endpoint serves the result; job:completed event includes mediaUrl
  • Sessions flow: Image requests detected in POST /api/sessions/:id/request; Gemini called instead of Claude; stored in job_media; response includes mediaUrl and mediaType: 'image'; GET /api/sessions/:id/requests/:requestId/media endpoint added
  • Pricing: Flat-rate image fee (IMAGE_GENERATION_FLAT_RATE_USD, default $0.04) returned from PricingService.calculateImageFeeSats(); estimate endpoint returns image pricing + mediaType hint
  • Frontend: session.js detects mediaType: 'image' in responses, fetches the image from mediaUrl, and renders it inline in the event log with a download button
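
The keyword matching described in the first bullet could look roughly like the sketch below. It is illustrative only: the actual pattern in agent.ts is not reproduced in this description, and the extra verbs ("generate", "make") and the noun "picture" are assumptions added for the example.

```typescript
// Illustrative sketch only: the real regex in agent.ts is not shown in this PR text.
// "draw", "illustrate", "visualize", "create an image" come from the summary;
// "generate", "make", and "picture" are assumptions.
const IMAGE_INTENT_RE =
  /\b(?:draw|illustrate|visuali[sz]e|(?:create|generate|make)\s+(?:an?\s+)?(?:image|picture))\b/i;

function detectImageRequest(text: string): boolean {
  // No `g` flag, so test() keeps no lastIndex state between calls.
  return IMAGE_INTENT_RE.test(text);
}
```

The word boundaries keep "withdraw" from matching, but a phrase such as "draw a conclusion" would still be classified as an image request, so keyword detection of this kind will produce some false positives.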

Test plan

  • Send a message containing "draw me a dragon" in an active Workshop session; verify image renders inline with download button
  • Verify GET /api/jobs/:id/media returns { data, mimeType, expiresAt } for an image job
  • Verify estimate endpoint returns higher estimatedSats and mediaType: 'image' for image requests
  • Verify stub mode (no Gemini credentials) returns placeholder image without crashing
  • Verify non-image requests still use Claude text flow unchanged
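
For the `{ data, mimeType, expiresAt }` check, a minimal sketch of the 7-day TTL behaviour may help when writing assertions. `MediaStore` below is a hypothetical in-memory stand-in for the job_media table, not the code in this PR; it assumes expired rows are treated as missing on read.

```typescript
// Hypothetical in-memory stand-in for the job_media table (real schema: migration 0010).
interface MediaRecord {
  data: string;     // base64-encoded image bytes
  mimeType: string; // e.g. "image/png"
  expiresAt: Date;
}

const MEDIA_TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7-day TTL from the PR summary

class MediaStore {
  private records = new Map<string, MediaRecord>();

  // entityId may be a job ID or a session request ID (the polymorphic entity_id).
  put(entityId: string, data: string, mimeType: string, now: Date = new Date()): MediaRecord {
    const record: MediaRecord = {
      data,
      mimeType,
      expiresAt: new Date(now.getTime() + MEDIA_TTL_MS),
    };
    this.records.set(entityId, record);
    return record;
  }

  // Returns null for unknown or expired entries, so a route handler can answer 404.
  get(entityId: string, now: Date = new Date()): MediaRecord | null {
    const record = this.records.get(entityId);
    if (!record || record.expiresAt.getTime() <= now.getTime()) return null;
    return record;
  }
}
```

Passing `now` explicitly keeps the expiry logic testable without waiting on the clock.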

🤖 Generated with Claude Code

claude added 1 commit 2026-03-24 02:37:51 +00:00
feat: Gemini image generation in Workshop chat (#19)
Some checks failed
CI / Typecheck & Lint (pull_request) Failing after 1s
f13a1d0235
- Add image intent detection (draw/illustrate/visualize/create an image)
  via `detectImageRequest()` in agent.ts; exports used by jobs and sessions
- Add `executeImageWork()` to AgentService: calls Gemini generateImage with
  graceful fallback stub PNG when Gemini credentials are absent
- Add `job_media` table (migration 0010) for base64 image storage with 7-day TTL;
  entity_id is polymorphic for both jobs and session requests
- Add `media_type TEXT` column to jobs table (flagged during eval phase)
- Add `calculateImageFeeSats()` / `calculateImageFeeUsd()` to PricingService;
  uses IMAGE_GENERATION_FLAT_RATE_USD env var (default $0.04)
- Jobs route: detect image jobs in eval phase, route to Gemini in execution,
  store image in job_media; expose GET /api/jobs/:id/media endpoint
- Sessions route: detect image requests, call executeImageWork, store in
  job_media, return mediaUrl and mediaType in response
- Estimate route: return image pricing and mediaType:'image' for image requests
- Event bus: add optional mediaUrl/mediaType to job:completed event
- Frontend session.js: render generated images inline with download button

Fixes #19

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
First-time contributor

Merge conflict. Rebase onto main and force-push. Image gen feature looks well-structured — will merge once clean.

First-time contributor

Merge conflict. Queued #4: #80 > #93 > #109 > #110 > #112. Rebase after #109 lands.

First-time contributor

LGTM. Good image gen with stub fallback. Rebase on main after earlier PRs land.

Rockachopa requested changes 2026-03-30 17:06:37 +00:00
Rockachopa left a comment
First-time contributor

Code Review: [claude] Gemini Image Generation in Workshop Chat (#19)

Reviewer: Timmy (automated review)
Recommendation: REQUEST CHANGES (non-trivial issues)

Summary

Adds image generation capability to the Workshop via Gemini, triggered by keyword detection. Includes: intent detection regex, image execution via Gemini API, pricing integration, media storage with expiration, and API endpoints for both job and session flows.

Code Quality: B+

What's well done:

  • Clean intent detection with IMAGE_INTENT_RE regex covering common phrases
  • Graceful fallback: returns 1x1 transparent PNG stub when Gemini credentials absent
  • Pricing integration: flat rate with margin, converted to sats
  • Media storage with 7-day TTL and expiration enforcement on retrieval
  • Both job and session flows supported
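
To make the "flat rate with margin, converted to sats" point concrete, here is a hedged sketch of the arithmetic. The signature is hypothetical: the PR's `calculateImageFeeSats()` may take no arguments and read the rate, margin, and BTC price from configuration, and rounding up is an assumption.

```typescript
const SATS_PER_BTC = 1e8;

// Hypothetical signature for illustration; not the PricingService method from this PR.
function calculateImageFeeSats(
  flatRateUsd: number,
  btcPriceUsd: number,
  marginFraction = 0,
): number {
  if (!(btcPriceUsd > 0)) throw new Error("btcPriceUsd must be positive");
  const usdWithMargin = flatRateUsd * (1 + marginFraction);
  // Round up so the charged fee never falls below the USD cost (assumed policy).
  return Math.ceil((usdWithMargin * SATS_PER_BTC) / btcPriceUsd);
}
```

At an assumed price of $100,000 per BTC, the default $0.04 rate works out to 40 sats before any margin is applied.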

Issues found:

  1. Parse error in diff. Lines like let inputTokens=*** and let outputTokens=*** appear truncated/corrupted in the diff. This may be a rendering issue but needs verification that the actual source compiles.

  2. Base64 in DB is expensive. Storing image data as base64 text in the job_media table means each image (~100KB+ encoded) lives in PostgreSQL. For a production system, this should use object storage (S3/R2) with a URL reference. Acceptable for MVP.

  3. No rate limiting on image generation. The Gemini API call has no throttle. A user could repeatedly trigger image generation and rack up costs. The existing jobsLimiter covers the job creation endpoint but not the image cost specifically.

  4. Media endpoint returns base64 JSON, not binary. GET /jobs/:id/media returns {data: <base64>} as JSON. For browser display, serving the binary with proper Content-Type headers would be more efficient and cache-friendly.

  5. Session media endpoint doesn't verify session ownership properly. The GET /sessions/:id/requests/:requestId/media checks session existence but doesn't verify the macaroon. Compare with the messages endpoint which does macaroon validation.

  6. Migration number conflict. 0010_job_media.sql collides with other PRs using 0010.

  7. Not mergeable. Mergeable: False.

  8. Legacy wizard-era PR from claude agent (March 24, stale).
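
For issue 3, one possible shape for a throttle specific to image generation is a small fixed-window limiter keyed by user or session, checked before the Gemini call. This is a sketch under assumptions: `ImageRateLimiter` and its limits are hypothetical, and it keeps state in process memory, which would not be shared across multiple server instances.

```typescript
// Hypothetical fixed-window limiter; class name and limits are not from the PR.
class ImageRateLimiter {
  private readonly maxPerWindow: number;
  private readonly windowMs: number;
  private windows = new Map<string, { windowStart: number; count: number }>();

  constructor(maxPerWindow: number, windowMs: number) {
    this.maxPerWindow = maxPerWindow;
    this.windowMs = windowMs;
  }

  // Returns true and records the call if `key` is under its limit; false otherwise.
  tryAcquire(key: string, nowMs: number = Date.now()): boolean {
    const entry = this.windows.get(key);
    if (!entry || nowMs - entry.windowStart >= this.windowMs) {
      // First call for this key, or the previous window has elapsed: start a new window.
      this.windows.set(key, { windowStart: nowMs, count: 1 });
      return true;
    }
    if (entry.count >= this.maxPerWindow) return false;
    entry.count += 1;
    return true;
  }
}
```

A route would call `tryAcquire(sessionId)` before invoking image generation and return HTTP 429 when it yields false.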

Security Concern

The media endpoint at GET /sessions/:id/requests/:requestId/media is missing authentication. Anyone who knows the session ID and request ID can access generated images. Should require macaroon auth matching the session.
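
A guard along the following lines would close the gap. It is a sketch, not the project's code: `verifyMacaroon` is an injected, hypothetical verifier assumed to return the session ID a macaroon is bound to (or null when invalid), and the status codes are a suggestion.

```typescript
type MediaAccess =
  | { status: 200 }
  | { status: 401 | 403 | 404; error: string };

// `verifyMacaroon` is hypothetical; the project's real macaroon helper is not shown in this PR.
function authorizeSessionMedia(
  sessionId: string,
  sessionExists: boolean,
  macaroon: string | undefined,
  verifyMacaroon: (macaroon: string) => string | null,
): MediaAccess {
  if (!macaroon) return { status: 401, error: "missing macaroon" };
  const boundSessionId = verifyMacaroon(macaroon);
  if (boundSessionId === null) return { status: 401, error: "invalid macaroon" };
  if (boundSessionId !== sessionId) {
    return { status: 403, error: "macaroon not valid for this session" };
  }
  // Existence is checked last so unauthenticated callers cannot probe for session IDs.
  if (!sessionExists) return { status: 404, error: "session not found" };
  return { status: 200 };
}
```

The same check would apply before serving the media body, whether it is returned as base64 JSON or as binary.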

Verdict

The feature design is sound but needs: auth on media endpoints, migration renumber, rebase, and consideration of whether base64-in-DB is acceptable for the target scale.

First-time contributor

Ezra review: Agent-generated PR from claude. Appears to be from Replit Timmy Tower sessions. Alexander — merge or close at your discretion.

This pull request has changes conflicting with the target branch.
  • the-matrix/js/session.js
Reference: replit/timmy-tower#110