fix(agent): prevent AsyncOpenAI/httpx cross-loop deadlock in gateway mode (#2701)

In gateway mode, async tools (vision_analyze, web_extract, session_search)
deadlock: _run_async() spawns a thread that calls asyncio.run(), which
creates a new event loop, but _get_cached_client() returns an AsyncOpenAI
client bound to a different loop. An httpx.AsyncClient cannot be used
across event loop boundaries, so await client.chat.completions.create()
hangs forever.
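
The loop-affinity rule behind this bug can be reproduced with plain asyncio, with no httpx or OpenAI client involved. The sketch below creates a Future on one loop and awaits it from a task on another. The httpx case described above hangs silently, but asyncio's own primitives detect the mismatch and raise RuntimeError. The helper name await_from_other_loop is hypothetical.

```python
import asyncio


def await_from_other_loop() -> str:
    """Await a Future owned by loop A from a task running on loop B."""
    loop_a = asyncio.new_event_loop()
    try:
        # The Future is bound to loop_a, much as a cached client's
        # connections are bound to the loop that created them.
        foreign_future = loop_a.create_future()

        async def consumer():
            # Runs on the fresh loop that asyncio.run() creates below.
            return await foreign_future

        try:
            asyncio.run(consumer())
        except RuntimeError as exc:
            # asyncio notices the cross-loop await and fails loudly.
            return str(exc)
        return "no error"
    finally:
        loop_a.close()


print(await_from_other_loop())  # message ends with "attached to a different loop"
```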

Fix: include the event loop identity in the async client cache key so each
loop gets its own AsyncOpenAI instance. Also fix session_search_tool.py,
which had its own broken asyncio.run()-in-thread pattern; it now uses the
centralized _run_async() bridge.
ctlst
2026-03-25 17:31:56 -07:00
committed by GitHub
parent 0d7f739675
commit 281100e2df
3 changed files with 218 additions and 16 deletions


@@ -358,12 +358,14 @@ def session_search(
             return await asyncio.gather(*coros, return_exceptions=True)
         try:
-            asyncio.get_running_loop()
-            with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
-                results = pool.submit(lambda: asyncio.run(_summarize_all())).result(timeout=60)
-        except RuntimeError:
-            # No event loop running, create a new one
-            results = asyncio.run(_summarize_all())
+            # Use _run_async() which properly manages event loops across
+            # CLI, gateway, and worker-thread contexts. The previous
+            # pattern (asyncio.run() in a ThreadPoolExecutor) created a
+            # disposable event loop that conflicted with cached
+            # AsyncOpenAI/httpx clients bound to a different loop,
+            # causing deadlocks in gateway mode (#2681).
+            from model_tools import _run_async
+            results = _run_async(_summarize_all())
         except concurrent.futures.TimeoutError:
             logging.warning(
                 "Session summarization timed out after 60 seconds",