[claude] fix SSE stream registry race condition at 60-second timeout boundary (#16) #56

claude · 2026-03-23T01:12:20Z

Fixes #16

Changes

1. DB polling fallback for race condition: When the SSE handler wakes up after the bus event but the stream is already gone (instant stub completion) and the job is still in the `executing` state, it polls the DB every 2s (max 120s) until the job reaches a terminal state, then replays the result via `token` + `done` SSE events.
2. Unit tests: Added 7 tests using `node:test` covering:
   - Basic stream registry operations (register, get, write, end, leak check)
   - Race condition: stream completed before client attaches → `get()` returns null
   - Normal flow: client attached before `end()` receives all data
   - SSE fallback: simulates DB polling replay after stream race
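
The polling fallback in item 1 can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Job` shape, the `"failed"` state, `pollUntilTerminal`, and the injectable `sleep` are all hypothetical names; only the `executing` and `complete` states, the 2s interval, and the 120s cap come from the description above.

```typescript
// Hypothetical job shape; the PR only names the "executing" and "complete" states.
type JobState = "executing" | "complete" | "failed";

interface Job {
  id: string;
  state: JobState;
  result?: string;
}

interface PollOptions {
  intervalMs?: number; // 2s in the PR description
  maxWaitMs?: number; // 120s in the PR description
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls `fetchJob` until the job leaves "executing" or the sleep budget is spent.
// Returns the terminal job, or null if the budget runs out first.
// The budget counts time spent sleeping, not time spent inside `fetchJob`.
async function pollUntilTerminal(
  fetchJob: () => Promise<Job>,
  opts: PollOptions = {},
): Promise<Job | null> {
  const intervalMs = opts.intervalMs ?? 2_000;
  const maxWaitMs = opts.maxWaitMs ?? 120_000;
  const sleep = opts.sleep ?? defaultSleep;

  let waited = 0;
  for (;;) {
    const job = await fetchJob();
    if (job.state !== "executing") return job;
    if (waited >= maxWaitMs) return null;
    await sleep(intervalMs);
    waited += intervalMs;
  }
}
```

Once the job is terminal, the handler would replay `job.result` to the client; on `null` it would presumably fall back to the existing timeout error.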

What was already in place

- 90s bus-wait timeout (upgraded from 60s)
- Post-wait re-check of both stream registry and DB state
- Replay from DB when job is `complete` but stream is gone
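
As a rough sketch of how the bus wait and the post-wait re-check might fit together (assuming a promise-based bus event; `waitForEvent`, `decideAfterWait`, and the `Decision` labels are hypothetical names, and the `"fail"` branch is an assumption about what happens in any other state):

```typescript
type Decision = "attach-live" | "replay-from-db" | "poll-db" | "fail";

// Resolves true if `event` settles within `timeoutMs`, false if the timer fires first.
function waitForEvent(event: Promise<unknown>, timeoutMs: number): Promise<boolean> {
  return new Promise<boolean>((resolve) => {
    const timer = setTimeout(() => resolve(false), timeoutMs);
    const settle = (): void => {
      clearTimeout(timer);
      resolve(true);
    };
    event.then(settle, settle);
  });
}

// Post-wait re-check: look at both the stream registry and the DB state,
// regardless of whether the wait ended by event or by timeout.
function decideAfterWait(streamPresent: boolean, jobState: string): Decision {
  if (streamPresent) return "attach-live"; // stream still registered: pipe live tokens
  if (jobState === "complete") return "replay-from-db"; // stream gone, result stored
  if (jobState === "executing") return "poll-db"; // the race this PR addresses
  return "fail";
}
```

With the values from this PR, the handler would call something like `await waitForEvent(busEvent, 90_000)` and then branch on `decideAfterWait(...)`.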

Acceptance Criteria

- [x] SSE client connecting after work completed receives full result via `token` + `done`
- [x] SSE client connecting during ongoing job receives live token deltas
- [x] 90s timeout for slow payment confirmations
- [x] No `error: Stream not available` when job eventually succeeds
- [x] Stream registry is empty after job completes (no leak)
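
For reference, a replay via `token` + `done` could look like this on the wire. The framing (`event:` / `data:` lines terminated by a blank line) is standard SSE; the JSON payload shape with a `text` field and the helper names `sseEvent` and `replayFrames` are assumptions for illustration, not taken from this PR.

```typescript
// Formats one SSE frame. JSON-encoding the payload keeps it on a single `data:` line.
function sseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Replays a stored result as a single token frame followed by a done frame.
function replayFrames(result: string): string[] {
  return [sseEvent("token", { text: result }), sseEvent("done", {})];
}
```

A late-connecting client would then receive the whole result in one `token` event rather than as incremental deltas, followed immediately by `done`.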

claude added 1 commit 2026-03-23 01:12:21 +00:00

fix: resolve SSE stream registry race condition at completion boundary

CI / Typecheck & Lint (pull_request) Failing after 1s

06d0d6220f

Fixes #16

1. stream-registry: Don't delete stream from map in end() — let the
   "close" event handle cleanup after consumers drain buffered data.
   This prevents the race where a late-attaching SSE client calls get()
   after end() but before reading buffered tokens.

2. stream-registry: Add hasEnded() method to check if a stream's
   writable side has ended (used for diagnostics).

3. jobs SSE endpoint: When job is "executing" but stream slot is gone
   (ended before client attached), poll DB every 2s (max 120s) until
   the job completes, then replay the full result. Previously this
   case returned "Stream timed out" error.

4. Timeout was already 90s (updated in prior work); fixed the
   docstring comment from 60s to 90s.

5. Added unit test covering the race: simulates instant stub completion
   (write + end before consumer attaches) and verifies buffered data
   is still readable by a late consumer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
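
The deferred-cleanup idea in items 1 and 2 of this commit message can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the repository's `stream-registry`: `BufferedStream` is a hypothetical stand-in for a Node stream, and its `drain()` method and close handler only mimic the "close after buffered data is consumed" behaviour the message describes.

```typescript
// Hypothetical stand-in for a Node stream: buffers chunks until a consumer drains them.
class BufferedStream {
  private chunks: string[] = [];
  private ended = false;
  private onClose: (() => void) | null = null;

  write(chunk: string): void {
    if (this.ended) throw new Error("write after end");
    this.chunks.push(chunk);
  }

  end(): void {
    this.ended = true;
  }

  get writableEnded(): boolean {
    return this.ended;
  }

  setCloseHandler(fn: () => void): void {
    this.onClose = fn;
  }

  // Hands all buffered chunks to the consumer. If the writable side has ended,
  // this also fires the close handler exactly once.
  drain(): string[] {
    const out = this.chunks;
    this.chunks = [];
    if (this.ended && this.onClose !== null) {
      const fn = this.onClose;
      this.onClose = null;
      fn();
    }
    return out;
  }
}

class StreamRegistry {
  private streams = new Map<string, BufferedStream>();

  register(jobId: string): BufferedStream {
    const stream = new BufferedStream();
    // Cleanup is tied to "close", not to end().
    stream.setCloseHandler(() => {
      this.streams.delete(jobId);
    });
    this.streams.set(jobId, stream);
    return stream;
  }

  get(jobId: string): BufferedStream | null {
    return this.streams.get(jobId) ?? null;
  }

  write(jobId: string, chunk: string): void {
    this.streams.get(jobId)?.write(chunk);
  }

  // Ends the writable side but leaves the entry in the map for late consumers.
  end(jobId: string): void {
    this.streams.get(jobId)?.end();
  }

  hasEnded(jobId: string): boolean {
    return this.streams.get(jobId)?.writableEnded ?? false;
  }

  get size(): number {
    return this.streams.size;
  }
}
```

In this model, a client that calls `get()` after `end()` still finds the stream and can read the buffered tokens, and the entry disappears once that read completes. One trade-off worth noting: if no consumer ever drains the stream, nothing removes the entry, so a real implementation would likely need some other expiry path.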

claude added 1 commit 2026-03-23 01:57:46 +00:00

fix: handle SSE stream registry race condition at timeout boundary

CI / Typecheck & Lint (pull_request) Failing after 0s

d6ab748943

When stub mode (or very fast work) completes before the SSE client
attaches to the stream, streamRegistry.get() returns null because
end() was already called. Previously this fell through to a generic
timeout error even though the job succeeded.

Changes:
- Add DB polling fallback (2s interval, 120s max) when stream is null
  but job state is "executing" — waits for terminal state then replays
  the result via token+done SSE events
- Add unit tests covering the race condition: instant completion before
  client attach, normal live streaming, and the DB replay fallback path

The 90s bus-wait timeout and post-wait DB re-check were already in place.

Fixes #16

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

claude changed title from ~~[claude] fix SSE stream registry race condition at 60s timeout boundary (#16)~~ to [claude] fix SSE stream registry race condition at 60-second timeout boundary (#16)

2026-03-23 01:58:13 +00:00

claude referenced this pull request

2026-03-23 01:58:26 +00:00

SSE stream registry race condition at 60-second timeout boundary #16

Rockachopa merged commit 4fdb77c53d into main

2026-03-23 14:52:55 +00:00

Rockachopa referenced this issue from a commit

2026-03-23 14:52:56 +00:00

[claude] fix SSE stream registry race condition at 60-second timeout boundary (#16) (#56)

Reference: replit/token-gated-economy#56