[BUG] MCP tools fail with cancel scope error when thinking scheduler calls chat() #72

Closed
opened 2026-03-14 20:34:04 +00:00 by hermes · 0 comments
Collaborator

Problem

The thinking scheduler (_thinking_scheduler in app.py:154) periodically calls thinking_engine.think_once(), which calls session.chat(), which creates a Timmy agent with MCP tools (Gitea + Filesystem). These MCP tools are MCPTools instances from Agno that use stdio transport — they spawn child processes and manage async connections.

The connection fails with:

Failed to connect to <MCPTools name=MCPTools functions=["issue_read", ...]>:
Cancelled via cancel scope 117706e10 by <Task pending name="Task-4"
coro=<_thinking_scheduler() running at app.py:154>>

Root Cause

Agno's MCPTools uses an async context manager lifecycle — it needs async with tools: to establish the stdio connection, and the connection must remain open for the duration of the agent run. The lifecycle looks like:

  1. MCPTools.__aenter__() — spawns the MCP server subprocess, connects via stdio
  2. Agent runs, calls tools
  3. MCPTools.__aexit__() — tears down the connection
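The lifecycle above can be sketched with a minimal stand-in class (FakeMCPTools is illustrative only, not the real Agno API; the real `__aenter__` spawns a subprocess and opens a stdio transport):

```python
import asyncio

class FakeMCPTools:
    """Stand-in for Agno's MCPTools async context manager lifecycle."""

    def __init__(self):
        self.connected = False

    async def __aenter__(self):
        # Step 1: would spawn the MCP server subprocess and connect via stdio
        self.connected = True
        return self

    async def __aexit__(self, *exc):
        # Step 3: tear down the stdio connection
        self.connected = False

async def run_agent():
    tools = FakeMCPTools()
    async with tools:
        # Step 2: the agent runs and calls tools while the connection is open
        assert tools.connected
    return tools.connected

print(asyncio.run(run_agent()))  # → False (connection torn down on exit)
```

The key point is that the connection only exists between `__aenter__` and `__aexit__`, so whichever task drives that `async with` owns the connection's cancel scope.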

When create_timmy() creates MCPTools instances (agent.py:261-267), they are added to the agent's tools list as lazy-connect objects. Agno's Agent.arun() calls MCPTools.__aenter__() internally, which opens the connection inside an anyio cancel scope tied to the calling task.

The problem: _thinking_scheduler is an asyncio.create_task() background task (app.py:296). When the thinking engine calls chat() → create_timmy() → agent.arun(), the MCP connection is established inside the thinking task's cancel scope. If:

  • The thinking interval fires again while a previous cycle is still connecting
  • The uvicorn server reloads (WatchFiles) and cancels background tasks
  • The thinking engine's asyncio.sleep() gets cancelled

...the cancel scope propagates to the MCP stdio connection, killing it mid-handshake.

Additionally, create_timmy() creates new MCPTools instances on every call (agent.py:261-267). Each thinking cycle spawns fresh gitea-mcp and npx @anthropic/mcp-filesystem subprocesses. These are never reused or pooled.

Impact

  • Noisy error logs every thinking cycle (default: every 120s)
  • Orphaned MCP server subprocesses if teardown fails
  • Thinking engine falls back to bare agent (no tools), reducing capability
  • Resource waste: spawning 2 subprocesses per thinking cycle

Proposed Fix

Option A (minimal): Catch the cancel scope error in _call_agent() and skip MCP tools for the thinking session. The thinking engine doesn't actually need Gitea/filesystem tools for generating thoughts — it only uses them in the _maybe_file_issue() post-hook, which already has its own MCP session.

# In thinking.py _call_agent(), create agent without MCP tools:
agent = create_timmy(tools_override=[toolkit_only])  # no MCPTools

Option B (better): Create a singleton MCP connection pool that create_timmy() draws from, with proper lifecycle management tied to the app's lifespan (startup/shutdown hooks) rather than individual task cancel scopes.
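A sketch of Option B, using a stand-in FakeTool and hypothetical names (mcp_pool, startup, shutdown, get_mcp_tools are illustrative, not existing project APIs): connections are entered once into an AsyncExitStack owned by the app lifespan, so their cancel scopes belong to startup/shutdown rather than to per-cycle thinking tasks.

```python
import asyncio
import contextlib

class FakeTool:
    """Stand-in for an MCPTools instance."""

    def __init__(self, name):
        self.name = name
        self.open = False

    async def __aenter__(self):
        self.open = True  # would spawn subprocess + connect via stdio
        return self

    async def __aexit__(self, *exc):
        self.open = False  # would tear down the connection

mcp_pool: list = []
_stack = contextlib.AsyncExitStack()

async def startup():
    # called once from the app lifespan: __aenter__ runs in this scope
    for tool in (FakeTool("gitea"), FakeTool("filesystem")):
        mcp_pool.append(await _stack.enter_async_context(tool))

async def shutdown():
    # called once at app shutdown: closes all connections in reverse order
    await _stack.aclose()

def get_mcp_tools():
    # create_timmy() would draw from the pool instead of constructing new instances
    return list(mcp_pool)

async def demo():
    await startup()
    assert all(t.open for t in get_mcp_tools())
    await shutdown()
    assert not any(t.open for t in mcp_pool)
    print("pool lifecycle ok")

asyncio.run(demo())
```

This also fixes the resource-waste point: the two subprocesses are spawned once per app lifetime instead of twice per thinking cycle.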

Option C (simplest): Pass use_tools=False or a skip_mcp=True flag when creating the thinking agent, since thoughts don't need external tool access.
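Option C could look like the following sketch, where local_toolkit, make_mcp_tools, and the skip_mcp parameter are all hypothetical stand-ins (the real create_timmy() signature lives in src/timmy/agent.py):

```python
def local_toolkit():
    # stand-in for the agent's non-MCP toolkit
    return "toolkit"

def make_mcp_tools():
    # stand-in for the MCPTools instances built in agent.py:261-267
    return ["gitea_mcp", "filesystem_mcp"]

def create_timmy(skip_mcp=False):
    tools = [local_toolkit()]
    if not skip_mcp:
        tools.extend(make_mcp_tools())
    return tools

print(create_timmy(skip_mcp=True))  # → ['toolkit']
print(create_timmy())               # → ['toolkit', 'gitea_mcp', 'filesystem_mcp']
```

The thinking engine would pass skip_mcp=True, so no stdio connection is ever opened inside the scheduler's cancel scope.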

Reproduction

  1. Start the dashboard: timmy serve
  2. Wait for thinking scheduler to fire (~5s after startup)
  3. Observe the MCP cancel scope errors in the log
  4. Especially reproducible during hot-reload (WatchFiles triggers)

Files Involved

  • src/dashboard/app.py:145-158 — _thinking_scheduler background task
  • src/timmy/thinking.py:821-836 — _call_agent() creates agent via session.chat()
  • src/timmy/agent.py:256-269 — create_timmy() creates fresh MCPTools per call
  • src/timmy/mcp_tools.py:80-155 — MCPTools factory functions
  • src/timmy/session.py — chat() singleton agent (but thinking uses separate session_id)
Reference: Rockachopa/Timmy-time-dashboard#72