When a job in stub mode (or any very fast job) completes before the SSE
client attaches to the stream, streamRegistry.get() returns null because
end() has already been called. Previously this fell through to a generic
timeout error even though the job had succeeded.
Changes:
- Add a DB polling fallback (2s interval, 120s max) for when the stream
is null but the job state is "executing": wait for a terminal state,
then replay the result via token + done SSE events
- Add unit tests covering the race condition: instant completion before
client attach, normal live streaming, and the DB replay fallback path
The 90s bus-wait timeout and post-wait DB re-check were already in place.
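
The polling fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code from this change: JobRecord, SseEvent, loadJob and pollAndReplay are hypothetical names, and the handling of a failed job is an assumption, since the message only describes the token + done replay for a successful one.

```typescript
// Hypothetical shapes: the real job record and DB accessor are not shown here.
type JobState = "queued" | "executing" | "completed" | "failed";

interface JobRecord {
  state: JobState;
  result?: string;
  error?: string;
}

interface SseEvent {
  event: "token" | "done" | "error";
  data: string;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll the job record until it reaches a terminal state, then return the SSE
// events to replay. Defaults mirror the 2s interval and 120s cap noted above.
async function pollAndReplay(
  loadJob: () => Promise<JobRecord | null>,
  intervalMs: number = 2000,
  maxWaitMs: number = 120000,
): Promise<SseEvent[]> {
  const deadline = Date.now() + maxWaitMs;
  for (;;) {
    const job = await loadJob();
    if (job !== null && job.state === "completed") {
      // Replay the stored result as one token event followed by done.
      return [
        { event: "token", data: job.result ?? "" },
        { event: "done", data: "" },
      ];
    }
    if (job !== null && job.state === "failed") {
      // Assumption: a failed job is surfaced as an error event.
      return [{ event: "error", data: job.error ?? "Job failed" }];
    }
    // Give up if the next poll would land past the deadline.
    if (Date.now() + intervalMs > deadline) {
      return [{ event: "error", data: "Stream timed out" }];
    }
    await sleep(intervalMs);
  }
}
```

Returning the events instead of writing them to the response keeps the sketch testable without an HTTP server; a real endpoint would presumably write each event to the SSE response as it is produced.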
Fixes #16
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes #16
1. stream-registry: Don't delete stream from map in end() — let the
"close" event handle cleanup after consumers drain buffered data.
This prevents the race where a late-attaching SSE client calls get()
after end() but before reading buffered tokens.
2. stream-registry: Add hasEnded() method to check if a stream's
writable side has ended (used for diagnostics).
3. jobs SSE endpoint: When the job is "executing" but the stream slot is
gone (ended before the client attached), poll the DB every 2s (max 120s)
until the job completes, then replay the full result. Previously this
case returned a "Stream timed out" error.
4. The timeout was already 90s (updated in prior work); corrected the
docstring, which still said 60s.
5. Added unit test covering the race: simulates instant stub completion
(write + end before consumer attaches) and verifies buffered data
is still readable by a late consumer.
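
Items 1, 2 and 5 can be illustrated with a small sketch. It assumes the registry wraps Node.js PassThrough streams keyed by job ID; the actual stream-registry implementation is not shown in this message, so the class layout, create() and the readAll() helper are hypothetical.

```typescript
import { PassThrough } from "stream";

// Hypothetical registry: one PassThrough per job ID.
class StreamRegistry {
  private readonly streams = new Map<string, PassThrough>();

  create(jobId: string): PassThrough {
    const stream = new PassThrough();
    this.streams.set(jobId, stream);
    // Drop the map entry only once the stream has closed, which happens
    // after a consumer has read everything that was buffered.
    stream.once("close", () => {
      if (this.streams.get(jobId) === stream) {
        this.streams.delete(jobId);
      }
    });
    return stream;
  }

  get(jobId: string): PassThrough | null {
    return this.streams.get(jobId) ?? null;
  }

  // End the writable side but keep the entry so a late consumer can attach.
  end(jobId: string): void {
    this.streams.get(jobId)?.end();
  }

  // True once end() has been called on a stream that is still registered.
  hasEnded(jobId: string): boolean {
    return this.streams.get(jobId)?.writableEnded ?? false;
  }
}

// Drain a stream into a string, as a late-attaching consumer would.
async function readAll(stream: PassThrough): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.toString();
  }
  return text;
}
```

With this layout, a producer can write and call end() before any consumer exists, and a later get() still returns a stream whose buffered chunks are readable. The entry disappears only after the "close" event fires.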
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>