[loop-generated] [bug] Agent run crashes on Ollama 500 — no retry for XML parse errors #131

Closed
opened 2026-03-15 12:47:38 +00:00 by hermes · 0 comments
Collaborator

Problem

When qwen3.5 (or any model) generates malformed tool-call XML, Ollama's Go layer returns a 500 with:

XML syntax error on line 6: element <function> closed by </parameter> (status code: 500)

This bubbles up through Agno as an unhandled exception in base.py:134 and crashes the agent run. There is no retry logic.

Impact

  • Timmy's triage call in the dev loop fails entirely
  • Any chat session hitting this transient error gets no response
  • The error is non-deterministic (same prompt works on retry)

Root Cause

These Ollama 500s are transient: the model's sampling produced bad XML on that particular generation. Retrying with the same prompt almost always succeeds.

Fix

Add retry logic in src/timmy/agents/base.py run() method:

  1. Catch exceptions whose message contains "status code: 500" or "XML syntax error"
  2. Retry up to 2 times with a brief delay between attempts
  3. Log each retry attempt
  4. Raise only after all retries are exhausted
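
The four steps above could be sketched roughly as follows. This is a minimal, standalone illustration rather than the actual change to base.py: the names `run_with_retry`, `is_transient_ollama_error`, and `TRANSIENT_MARKERS` are hypothetical, and the 1-second default delay is an assumption (the issue only asks for a "brief delay").

```python
import logging
import time
from typing import Callable, TypeVar

logger = logging.getLogger(__name__)

T = TypeVar("T")

# Substrings taken from the error message quoted in this issue.
TRANSIENT_MARKERS = ("status code: 500", "XML syntax error")


def is_transient_ollama_error(exc: Exception) -> bool:
    """Return True if the exception message matches the transient failure."""
    message = str(exc)
    return any(marker in message for marker in TRANSIENT_MARKERS)


def run_with_retry(
    call: Callable[[], T],
    max_retries: int = 2,        # "retry up to 2 times"
    delay_seconds: float = 1.0,  # assumed value for the "brief delay"
) -> T:
    """Invoke call(), retrying on transient errors before giving up."""
    attempt = 0
    while True:
        try:
            return call()
        except Exception as exc:
            # Re-raise anything non-transient, or once retries are exhausted.
            if not is_transient_ollama_error(exc) or attempt >= max_retries:
                raise
            attempt += 1
            logger.warning(
                "Transient Ollama error, retry %d/%d: %s",
                attempt, max_retries, exc,
            )
            time.sleep(delay_seconds)
```

Inside `run()`, the existing call into Agno would be wrapped in a zero-argument callable and passed to a helper like this. Matching on message substrings is brittle, but the issue reports the error surfacing as a generic exception, so there may be no more specific exception type to catch.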

Alternatively, add retry at the Agno/Ollama model layer in agent.py create_timmy().

Files

  • src/timmy/agents/base.py: run() method (lines 121-147)
  • src/timmy/agent.py: create_timmy() agent creation

Observed in

Loop cycle 37 triage call, 2026-03-15


Reference: Rockachopa/Timmy-time-dashboard#131