Performance: eliminate redundant LLM calls in the request path #24

Closed
opened 2026-03-14 13:33:34 +00:00 by Rockachopa · 0 comments
Owner

What

A single user message can trigger 2-4 separate LLM inference calls before producing a response:

  1. Routing call — TimmyOrchestrator.orchestrate() asks Helm to classify the request (now replaced by YAML patterns, but old code may still be in the path)
  2. Main inference — the actual response generation
  3. Fact extraction — session.py:_extract_facts() runs on every message
  4. Agentic loop — if triggered, each step is a separate inference call with the full plan re-serialized

On a local model where each call takes 30-60 seconds, this is devastating: the five-call worst case below means 2.5-5 minutes of latency before the user sees a reply.

Current flow (worst case)

User message
  → _extract_facts() [regex, no LLM — OK]
  → agent.arun() [LLM call #1]
    → model decides to call plan_and_execute
      → planning prompt [LLM call #2]
      → step 1 execution [LLM call #3]
      → step 2 execution [LLM call #4]
      → summary [LLM call #5]
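For the audit, a thin counting wrapper around the model client makes the per-message call count in the flow above visible. This is a minimal sketch — the wrapper and fake_model names are illustrative, and Timmy's actual client interface may differ:

```python
class CallCounter:
    """Wraps any callable model interface and counts invocations.

    Drop this around the real LLM client (hypothetical interface)
    before session.chat() runs, then read .calls afterwards.
    """

    def __init__(self, llm_call):
        self._llm_call = llm_call
        self.calls = 0

    def __call__(self, *args, **kwargs):
        self.calls += 1
        return self._llm_call(*args, **kwargs)


def fake_model(prompt: str) -> str:
    """Stand-in for the real inference call."""
    return f"response to: {prompt}"


counted = CallCounter(fake_model)
counted("planning prompt")
counted("step 1 execution")
counted("step 2 execution")
print(counted.calls)  # number of LLM calls made through the wrapper
```

Logging the count per user message (instead of printing) would show immediately whether a "simple" reply is secretly paying for three inferences.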

What to do

  1. Audit the hot path — trace what happens from session.chat() through to response. Count LLM calls.
  2. Kill unnecessary calls — fact extraction is regex-only (good), but verify nothing else sneaks in
  3. Agentic loop optimization — don't re-serialize the full plan + all previous results into every step prompt. Use a sliding window.
  4. Context budget — the system prompt + memory context + conversation history can eat 50%+ of context window before the user's message even arrives. Measure and cap it.
  5. Cache system prompts — get_system_prompt() is called fresh each time. Cache it.
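Items 3 and 5 above can be sketched together. This assumes get_system_prompt() takes no arguments and that step results are plain strings — both are simplifications of whatever prompts.py and agentic_loop.py actually do; if the real prompt depends on per-session state, the cache needs a key or explicit invalidation:

```python
import functools


@functools.lru_cache(maxsize=1)
def get_system_prompt() -> str:
    # Expensive assembly (tool descriptions, persona, etc.) runs once;
    # every later call returns the cached string. Hypothetical body.
    return "You are Timmy. " + "tool descriptions go here"


def step_context(plan_steps: list[str], results: list[str], window: int = 2) -> str:
    """Build the prompt context for the next agentic step.

    Instead of re-serializing the full plan plus every previous result,
    include only a one-line plan summary and the last `window` results.
    """
    recent = results[-window:]
    lines = [f"Plan: {len(plan_steps)} steps total, {len(results)} done."]
    lines += [f"Recent result: {r}" for r in recent]
    return "\n".join(lines)
```

With a window of 2, a ten-step plan's final prompt carries two results instead of nine, which directly cuts the token count (and on a local model, the per-step latency) of calls #3 onward.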

Files

  • src/timmy/session.py — hot path entry point
  • src/timmy/agent.py — agent creation, prompt assembly
  • src/timmy/agentic_loop.py — multi-step execution
  • src/timmy/prompts.py — system prompt generation

Estimated effort: 2-3 hours

Reference: Rockachopa/Timmy-time-dashboard#24