[timmy-capability] Ollama inference contention when loop and Timmy run simultaneously #70

Closed
opened 2026-03-14 20:27:08 +00:00 by Rockachopa · 2 comments
Owner

DISCOVERED VIA INTERVIEW: When the autonomous development loop is running (using Ollama for inference), Timmy's chat responses time out after 60-90s. They're competing for the same GPU.

Options:

  1. Queue-based inference scheduling so requests don't overlap
  2. Smaller model for Timmy chat (qwen3.5) while loop uses qwen3:30b
  3. Simple mutex/lock so Timmy waits for loop inference to finish
  4. Configure Ollama's parallel request handling

This matters because the loop is supposed to TALK to Timmy each cycle. If Timmy can't respond while the loop runs, the interview pattern breaks.

Tags: [loop-generated] [timmy-capability]


Collaborator

Triage Assessment — Needs Investigation

Scoring: scope=1, acceptance=1, alignment=2 (total=4).

The four options listed need a decision. Recommended approach:

Option 4 first (cheapest): Check Ollama's OLLAMA_NUM_PARALLEL setting. Default is 1. Setting it to 2 may solve contention without any code changes.

If that doesn't work:
Option 2 (two-model split): Use a smaller model (e.g., qwen3:1.7b) for Timmy chat while the loop uses qwen3:30b. This is a config change, not code.

Action: Someone needs to test OLLAMA_NUM_PARALLEL=2 and report back. Filing this as investigation, not implementation.
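For whoever picks this up, the experiment is a one-line environment change before starting the server. `OLLAMA_NUM_PARALLEL` is a real Ollama server setting; the value 2 is the test proposed above, not a verified fix:

```shell
# Proposed experiment: allow two in-flight requests per loaded model,
# then reproduce the loop + Timmy chat overlap and watch for timeouts.
export OLLAMA_NUM_PARALLEL=2
ollama serve
```

If Ollama runs under systemd, the equivalent is an `Environment="OLLAMA_NUM_PARALLEL=2"` line in a service override rather than an exported variable.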

[triage-generated]

Collaborator

Fixed in PR #238 (loop-cycle-60).

Changes:

  • BaseAgent.run() now retries on transient Ollama errors (ConnectError, ReadError, ReadTimeout, ConnectTimeout) with exponential backoff (2s, 4s, max 16s)
  • TimmyWithMemory.chat() gets the same retry logic
  • GPU contention from concurrent requests no longer crashes Timmy — he waits and retries

This addresses the immediate symptom. For deeper fixes (queue-based scheduling, model-level separation), those can be separate issues.
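The retry behavior described above can be sketched as follows. This is a hedged reconstruction, not the PR's actual code: the helper name `with_retries` is illustrative, and `ConnectionError`/`TimeoutError` stand in for the httpx exceptions listed (`ConnectError`, `ReadError`, `ReadTimeout`, `ConnectTimeout`):

```python
import time

# Stand-ins for httpx's ConnectError, ReadError, ReadTimeout, ConnectTimeout.
TRANSIENT_ERRORS = (ConnectionError, TimeoutError)

def with_retries(fn, *args, base_delay=2.0, max_delay=16.0, attempts=5, **kwargs):
    """Call fn, retrying transient errors with exponential backoff.

    Delays follow the pattern from the PR description: 2s, 4s, 8s,
    capped at 16s. The last failure is re-raised unchanged.
    """
    delay = base_delay
    for attempt in range(attempts):
        try:
            return fn(*args, **kwargs)
        except TRANSIENT_ERRORS:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)
            delay = min(delay * 2, max_delay)
```

This is why contention now degrades to slower responses instead of crashes: a request that lands while the GPU is busy fails transiently, backs off, and succeeds on a later attempt.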


Reference: Rockachopa/Timmy-time-dashboard#70