Teknium c58e16757a docs: fix 40+ discrepancies between documentation and codebase (#5818)

Comprehensive audit of all ~100 doc pages against the actual code, fixing:

Reference docs:
- HERMES_API_TIMEOUT default 900 -> 1800 (env-vars)
- TERMINAL_DOCKER_IMAGE default python:3.11 -> nikolaik/python-nodejs (env-vars)
- compression.summary_model default shown as gemini -> actually empty string (env-vars)
- Add missing GOOGLE_API_KEY, GEMINI_API_KEY, GEMINI_BASE_URL env vars (env-vars)
- Add missing /branch (/fork) slash command (slash-commands)
- Fix hermes-cli tool count 39 -> 38 (toolsets-reference)
- Fix hermes-api-server drop list to include text_to_speech (toolsets-reference)
- Fix total tool count 47 -> 48, standalone 14 -> 15 (tools-reference)

User guide:
- web_extract.timeout default 30 -> 360 (configuration)
- Remove display.theme_mode (not implemented in code) (configuration)
- Remove display.background_process_notifications (not in defaults) (configuration)
- Browser inactivity timeout 300/5min -> 120/2min (browser)
- Screenshot path browser_screenshots -> cache/screenshots (browser)
- batch_runner default model claude-sonnet-4-20250514 -> claude-sonnet-4.6
- Add minimax to TTS provider list (voice-mode)
- Remove credential_pool_strategies from auth.json example (credential-pools)
- Fix Slack token path platforms/slack/ -> root ~/.hermes/ (slack)
- Fix Matrix store path for new installs (matrix)
- Fix WhatsApp session path for new installs (whatsapp)
- Fix HomeAssistant config from gateway.json to config.yaml (homeassistant)
- Fix WeCom gateway start command (wecom)

Developer guide:
- Fix tool/toolset counts in architecture overview
- Update line counts: main.py ~5500, setup.py ~3100, run.py ~7500, mcp_tool ~2200
- Replace nonexistent agent/memory_store.py with memory_manager.py + memory_provider.py
- Update _discover_tools() list: remove honcho_tools, add skill_manager_tool
- Add session_search and delegate_task to intercepted tools list (agent-loop)
- Fix budget warning: two-tier system (70% caution, 90% warning) (agent-loop)
- Fix gateway auth order (per-platform first, global last) (gateway-internals)
- Fix email_adapter.py -> email.py, add webhook.py + api_server.py (gateway-internals)
- Add 7 missing providers to provider-runtime list

Other:
- Add Docker --cap-add entries to security doc
- Fix Python version 3.10+ -> 3.11+ (contributing)
- Fix AGENTS.md discovery claim (not hierarchical walk) (tips)
- Fix cron 'add' -> canonical 'create' (cron-internals)
- Add pre_api_request/post_api_request hooks to plugin guide
- Add Google/Gemini provider to providers page
- Clarify OPENAI_BASE_URL deprecation (providers)

2026-04-07 10:17:44 -07:00

sidebar_position	title	description
3	Agent Loop Internals	Detailed walkthrough of AIAgent execution, API modes, tools, callbacks, and fallback behavior

Agent Loop Internals

The core orchestration engine is run_agent.py's AIAgent class — roughly 9,200 lines that handle everything from prompt assembly to tool dispatch to provider failover.

Core Responsibilities

AIAgent is responsible for:

Assembling the effective system prompt and tool schemas via prompt_builder.py
Selecting the correct provider/API mode (chat_completions, codex_responses, anthropic_messages)
Making interruptible model calls with cancellation support
Executing tool calls (sequentially or concurrently via thread pool)
Maintaining conversation history in OpenAI message format
Handling compression, retries, and fallback model switching
Tracking iteration budgets across parent and child agents
Flushing persistent memory before context is lost

Two Entry Points

# Simple interface — returns final response string
response = agent.chat("Fix the bug in main.py")

# Full interface — returns dict with messages, metadata, usage stats
result = agent.run_conversation(
    user_message="Fix the bug in main.py",
    system_message=None,           # auto-built if omitted
    conversation_history=None,      # auto-loaded from session if omitted
    task_id="task_abc123"
)

chat() is a thin wrapper around run_conversation() that extracts the final_response field from the result dict.
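
As a rough illustration of that wrapper relationship, here is a minimal sketch. `SketchAgent` and its echo reply are hypothetical stand-ins, not the real `AIAgent`; only the parameter names and the `final_response` field come from the text above.

```python
from typing import Any, Dict, List, Optional


class SketchAgent:
    """Hypothetical stand-in for AIAgent; only the wrapper relationship is shown."""

    def run_conversation(
        self,
        user_message: str,
        system_message: Optional[str] = None,
        conversation_history: Optional[List[Dict[str, Any]]] = None,
        task_id: Optional[str] = None,
    ) -> Dict[str, Any]:
        # A real implementation would run the full agent loop here.
        history = list(conversation_history or [])
        history.append({"role": "user", "content": user_message})
        reply = f"echo: {user_message}"  # placeholder for the model's answer
        history.append({"role": "assistant", "content": reply})
        return {"final_response": reply, "messages": history}

    def chat(self, message: str) -> str:
        # Thin wrapper: delegate, then extract the final_response field.
        return self.run_conversation(user_message=message)["final_response"]
```

Callers that only need the answer text can use `chat()`; callers that need the message list or other metadata call `run_conversation()` directly.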

API Modes

Hermes supports three API execution modes, resolved from provider selection, explicit args, and base URL heuristics:

API mode	Used for	Client type
`chat_completions`	OpenAI-compatible endpoints (OpenRouter, custom, most providers)	`openai.OpenAI`
`codex_responses`	OpenAI Codex / Responses API	`openai.OpenAI` with Responses format
`anthropic_messages`	Native Anthropic Messages API	`anthropic.Anthropic` via adapter

The mode determines how messages are formatted, how tool calls are structured, how responses are parsed, and how caching/streaming works. All three converge on the same internal message format (OpenAI-style role/content/tool_calls dicts) before and after API calls.

Mode resolution order:

Explicit api_mode constructor arg (highest priority)
Provider-specific detection (e.g., anthropic provider → anthropic_messages)
Base URL heuristics (e.g., api.anthropic.com → anthropic_messages)
Default: chat_completions
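
The resolution order above can be sketched as a small function. This is an illustration under assumptions: `resolve_api_mode` and its signature are hypothetical, and only the two Anthropic examples given in the list are encoded; the real detection logic likely covers more providers and URLs.

```python
from typing import Optional


def resolve_api_mode(
    api_mode: Optional[str] = None,
    provider: Optional[str] = None,
    base_url: Optional[str] = None,
) -> str:
    """Hypothetical helper mirroring the documented priority order."""
    # 1. An explicit constructor argument always wins.
    if api_mode:
        return api_mode
    # 2. Provider-specific detection (only the documented example is shown).
    if provider == "anthropic":
        return "anthropic_messages"
    # 3. Base URL heuristics (only the documented example is shown).
    if base_url and "api.anthropic.com" in base_url:
        return "anthropic_messages"
    # 4. Everything else falls back to the OpenAI-compatible mode.
    return "chat_completions"
```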

Turn Lifecycle

Each call to run_conversation() follows this sequence. Steps 1-4 run once per call; steps 5-9 repeat until the model returns a text response:

run_conversation()
  1. Generate task_id if not provided
  2. Append user message to conversation history
  3. Build or reuse cached system prompt (prompt_builder.py)
  4. Check if preflight compression is needed (>50% context)
  5. Build API messages from conversation history
     - chat_completions: OpenAI format as-is
     - codex_responses: convert to Responses API input items
     - anthropic_messages: convert via anthropic_adapter.py
  6. Inject ephemeral prompt layers (budget warnings, context pressure)
  7. Apply prompt caching markers if on Anthropic
  8. Make interruptible API call (_api_call_with_interrupt)
  9. Parse response:
     - If tool_calls: execute them, append results, loop back to step 5
     - If text response: persist session, flush memory if needed, return
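
The looping part of this sequence (steps 5 to 9) can be sketched in a few lines. This is a simplified illustration, not the actual implementation: `run_loop`, `call_model`, and `run_tool` are hypothetical names, and compression, prompt layers, caching, and persistence are left out.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def run_loop(
    messages: List[Message],
    call_model: Callable[[List[Message]], Message],
    run_tool: Callable[[Dict[str, Any]], str],
    max_iterations: int = 90,
) -> str:
    """Call the model until it answers with text instead of tool calls."""
    for _ in range(max_iterations):
        reply = call_model(messages)  # stands in for the interruptible API call
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply["content"]  # text response ends the turn
        for call in tool_calls:
            # Each tool result is appended, then the loop calls the model again.
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": run_tool(call)}
            )
    return "iteration budget exhausted"
```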

Message Format

All messages use OpenAI-compatible format internally:

{"role": "system", "content": "..."}
{"role": "user", "content": "..."}
{"role": "assistant", "content": "...", "tool_calls": [...]}
{"role": "tool", "tool_call_id": "...", "content": "..."}

Reasoning content (from models that support extended thinking) is stored in assistant_msg["reasoning"] and optionally displayed via the reasoning_callback.

Message Alternation Rules

The agent loop enforces strict message role alternation:

After the system message: User → Assistant → User → Assistant → ...
During tool calling: Assistant (with tool_calls) → Tool → Tool → ... → Assistant
Never two assistant messages in a row
Never two user messages in a row
Only tool role can have consecutive entries (parallel tool results)

Providers validate these sequences and will reject malformed histories.
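
A history can be checked against these rules before it is sent. The validator below is a minimal sketch; `validate_alternation` is a hypothetical helper, not a function from the codebase, and it covers only the rules listed above.

```python
from typing import Any, Dict, List


def validate_alternation(messages: List[Dict[str, Any]]) -> List[str]:
    """Return a list of rule violations; an empty list means the history is valid."""
    errors: List[str] = []
    prev: Dict[str, Any] = {}
    for i, msg in enumerate(messages):
        role, prev_role = msg["role"], prev.get("role")
        if role == "system" and i != 0:
            errors.append(f"{i}: system message must come first")
        elif role in ("user", "assistant") and role == prev_role:
            errors.append(f"{i}: two {role} messages in a row")
        elif role == "tool":
            after_tool_call = prev_role == "assistant" and bool(prev.get("tool_calls"))
            if not (after_tool_call or prev_role == "tool"):
                errors.append(f"{i}: tool message without a preceding tool call")
        prev = msg
    return errors
```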

Interruptible API Calls

API requests are wrapped in _api_call_with_interrupt() which runs the actual HTTP call in a background thread while monitoring an interrupt event:

┌──────────────────────┐     ┌──────────────┐
│  Main thread         │     │  API thread   │
│  wait on:            │────▶│  HTTP POST    │
│  - response ready    │     │  to provider  │
│  - interrupt event   │     └──────────────┘
│  - timeout           │
└──────────────────────┘

When interrupted (user sends new message, /stop command, or signal):

The API thread is abandoned (response discarded)
The agent can process the new input or shut down cleanly
No partial response is injected into conversation history
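
The pattern can be sketched with the standard library alone. The code below is an illustration of the general technique, not the body of `_api_call_with_interrupt()`; `call_with_interrupt`, `Interrupted`, and the polling interval are assumptions made for the example.

```python
import threading
import time
from typing import Any, Callable, Dict


class Interrupted(Exception):
    """Raised when the interrupt event fires before the response arrives."""


def call_with_interrupt(
    request: Callable[[], Any],
    interrupt: threading.Event,
    timeout: float = 30.0,
    poll: float = 0.05,
) -> Any:
    result: Dict[str, Any] = {}
    done = threading.Event()

    def worker() -> None:
        try:
            result["value"] = request()  # the blocking HTTP call goes here
        except Exception as exc:  # surfaced to the caller below
            result["error"] = exc
        finally:
            done.set()

    # Daemon thread: if abandoned, it cannot keep the process alive.
    threading.Thread(target=worker, daemon=True).start()
    deadline = time.monotonic() + timeout
    while not done.wait(poll):
        if interrupt.is_set():
            raise Interrupted("API call abandoned; response will be discarded")
        if time.monotonic() >= deadline:
            raise TimeoutError("API call timed out")
    if "error" in result:
        raise result["error"]
    return result["value"]
```

Because the worker writes only to a local dict, an abandoned call cannot leak a late response into conversation history.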

Tool Execution

Sequential vs Concurrent

When the model returns tool calls:

Single tool call → executed directly in the main thread
Multiple tool calls → executed concurrently via ThreadPoolExecutor
- Exception: tools marked as interactive (e.g., clarify) force sequential execution
- Results are reinserted in the original tool call order regardless of completion order
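
A minimal sketch of this dispatch rule follows. `execute_tool_calls`, `INTERACTIVE_TOOLS`, and the flat `{"name": ...}` call shape are hypothetical simplifications; the sketch relies on `ThreadPoolExecutor.map`, which yields results in input order regardless of completion order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

# Assumption: the set of interactive tools; only clarify is named in the docs.
INTERACTIVE_TOOLS = {"clarify"}


def execute_tool_calls(
    tool_calls: List[Dict[str, Any]],
    run_tool: Callable[[Dict[str, Any]], str],
    max_workers: int = 4,
) -> List[str]:
    """Run tool calls and return results in the original call order."""
    sequential = len(tool_calls) == 1 or any(
        call["name"] in INTERACTIVE_TOOLS for call in tool_calls
    )
    if sequential:
        # Single or interactive calls run in the calling thread, one at a time.
        return [run_tool(call) for call in tool_calls]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order even when later calls finish first.
        return list(pool.map(run_tool, tool_calls))
```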

Execution Flow

for each tool_call in response.tool_calls:
    1. Resolve handler from tools/registry.py
    2. Fire pre_tool_call plugin hook
    3. Check if dangerous command (tools/approval.py)
       - If dangerous: invoke approval_callback, wait for user
    4. Execute handler with args + task_id
    5. Fire post_tool_call plugin hook
    6. Append {"role": "tool", "tool_call_id": tool_call.id, "content": result} to history
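
The six steps can be sketched for a single call as follows. Everything here is hypothetical scaffolding: the registry is a plain dict, hooks are a single `fire_hook` callable, and the denial message and the choice to skip the post hook on denial are assumptions, since the text above does not say what happens when approval is refused.

```python
from typing import Any, Callable, Dict, Optional

Handler = Callable[[Dict[str, Any], Optional[str]], str]


def run_tool_call(
    tool_call: Dict[str, Any],
    registry: Dict[str, Handler],
    fire_hook: Callable[[str, str], None],
    is_dangerous: Callable[[str, Dict[str, Any]], bool],
    approval_callback: Callable[[str, Dict[str, Any]], bool],
    task_id: Optional[str] = None,
) -> Dict[str, Any]:
    name, args = tool_call["name"], tool_call["arguments"]
    handler = registry[name]                     # 1. resolve handler
    fire_hook("pre_tool_call", name)             # 2. pre hook
    if is_dangerous(name, args) and not approval_callback(name, args):
        result = "Command denied by user."       # 3. assumption: denial text
    else:
        result = handler(args, task_id)          # 4. execute with args + task_id
        fire_hook("post_tool_call", name)        # 5. post hook
    # 6. the caller appends this message to the conversation history
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": result}
```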

Agent-Level Tools

Some tools are intercepted by run_agent.py before reaching handle_function_call():

Tool	Why intercepted
`todo`	Reads/writes agent-local task state
`memory`	Writes to persistent memory files with character limits
`session_search`	Queries session history via the agent's session DB
`delegate_task`	Spawns subagent(s) with isolated context

These tools modify agent state directly and return synthetic tool results without going through the registry.
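
In outline, the interception is a name check ahead of the normal dispatch path. The sketch below is illustrative only: `dispatch` and `agent_handlers` are hypothetical, and `handle_function_call` is passed in as a parameter because its real signature is not documented here.

```python
from typing import Any, Callable, Dict

# Tool names taken from the table above.
AGENT_LEVEL_TOOLS = {"todo", "memory", "session_search", "delegate_task"}


def dispatch(
    name: str,
    args: Dict[str, Any],
    agent_handlers: Dict[str, Callable[[Dict[str, Any]], str]],
    handle_function_call: Callable[[str, Dict[str, Any]], str],
) -> str:
    if name in AGENT_LEVEL_TOOLS:
        # Handled by the agent itself; the registry is never consulted.
        return agent_handlers[name](args)
    # Everything else goes through the normal registry dispatch.
    return handle_function_call(name, args)
```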

Callback Surfaces

AIAgent supports platform-specific callbacks that enable real-time progress in the CLI, gateway, and ACP integrations:

Callback	When fired	Used by
`tool_progress_callback`	Before/after each tool execution	CLI spinner, gateway progress messages
`thinking_callback`	When model starts/stops thinking	CLI "thinking..." indicator
`reasoning_callback`	When model returns reasoning content	CLI reasoning display, gateway reasoning blocks
`clarify_callback`	When `clarify` tool is called	CLI input prompt, gateway interactive message
`step_callback`	After each complete agent turn	Gateway step tracking, ACP progress
`stream_delta_callback`	Each streaming token (when enabled)	CLI streaming display
`tool_gen_callback`	When tool call is parsed from stream	CLI tool preview in spinner
`status_callback`	State changes (thinking, executing, etc.)	ACP status updates

Budget and Fallback Behavior

Iteration Budget

The agent tracks iterations via IterationBudget:

Default: 90 iterations (configurable via agent.max_turns)
Shared across parent and child agents — a subagent consumes from the parent's budget
Two-tier budget pressure via _get_budget_warning():
- At 70%+ usage (caution tier): appends [BUDGET: Iteration X/Y. N iterations left. Start consolidating your work.] to the last tool result
- At 90%+ usage (warning tier): appends [BUDGET WARNING: Iteration X/Y. Only N iteration(s) left. Provide your final response NOW.]
At 100%, the agent stops and returns a summary of work done
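
The two tiers can be sketched as a single function. This is an illustration of the thresholds and message formats quoted above, not the body of `_get_budget_warning()`; the function name and signature are assumptions. Integer arithmetic is used so the 70% and 90% boundaries are exact.

```python
from typing import Optional


def budget_warning(used: int, total: int) -> Optional[str]:
    """Return the text to append to the last tool result, or None below 70%."""
    remaining = total - used
    if used * 10 >= total * 9:  # 90%+ usage: warning tier
        return (
            f"[BUDGET WARNING: Iteration {used}/{total}. "
            f"Only {remaining} iteration(s) left. Provide your final response NOW.]"
        )
    if used * 10 >= total * 7:  # 70%+ usage: caution tier
        return (
            f"[BUDGET: Iteration {used}/{total}. "
            f"{remaining} iterations left. Start consolidating your work.]"
        )
    return None
```

With the default budget of 90 iterations, the caution tier starts at iteration 63 and the warning tier at iteration 81.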

Fallback Model

When the primary model fails (429 rate limit, 5xx server error, 401/403 auth error):

Check fallback_providers list in config
Try each fallback in order
On success, continue the conversation with the new provider
On 401/403, attempt credential refresh before failing over

The fallback system also covers auxiliary tasks independently — vision, compression, web extraction, and session search each have their own fallback chain configurable via the auxiliary.* config section.
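
The primary-model failover steps listed above can be sketched as follows. This is a simplified illustration: `ProviderError`, `call_with_fallback`, and `refresh_credentials` are hypothetical names, providers are modeled as zero-argument callables, and applying the credential refresh to every provider in the chain is an assumption.

```python
from typing import Callable, List


class ProviderError(Exception):
    def __init__(self, status: int) -> None:
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


def should_fail_over(status: int) -> bool:
    # 429 rate limit, 5xx server error, 401/403 auth error
    return status == 429 or 500 <= status < 600 or status in (401, 403)


def call_with_fallback(
    primary: Callable[[], str],
    fallbacks: List[Callable[[], str]],
    refresh_credentials: Callable[[], bool],
) -> str:
    last_error = None
    for provider in [primary, *fallbacks]:
        try:
            return provider()
        except ProviderError as exc:
            error = exc
        if error.status in (401, 403) and refresh_credentials():
            try:
                return provider()  # one retry with refreshed credentials
            except ProviderError as exc:
                error = exc
        if not should_fail_over(error.status):
            raise error  # e.g. a 400: not a failover condition
        last_error = error  # move on to the next provider in the chain
    raise last_error
```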

Compression and Persistence

When Compression Triggers

Preflight (before API call): If the conversation exceeds 50% of the model's context window
Gateway auto-compression: If the conversation exceeds 85% of the context window (more aggressive, runs between turns)

What Happens During Compression

Memory is flushed to disk first (preventing data loss)
Middle conversation turns are summarized into a compact summary
The last N messages are preserved intact (compression.protect_last_n, default: 20)
Tool call/result message pairs are kept together (never split)
A new session lineage ID is generated (compression creates a "child" session)
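
The summarize-and-protect steps can be sketched as below. This is not the algorithm in `agent/context_compressor.py`; `compress` and `summarize` are hypothetical, the memory flush and new session lineage are omitted, and the role given to the summary message is an assumption (a real implementation also has to keep role alternation valid around it).

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def compress(
    messages: List[Message],
    summarize: Callable[[List[Message]], str],
    protect_last_n: int = 20,
) -> List[Message]:
    """Summarize the middle of a history while keeping the tail intact."""
    head = messages[:1] if messages and messages[0]["role"] == "system" else []
    body = messages[len(head):]
    if len(body) <= protect_last_n:
        return list(messages)  # nothing old enough to summarize
    cut = len(body) - protect_last_n
    # Move the cut earlier so tool results stay with their assistant tool call.
    while 0 < cut < len(body) and body[cut]["role"] == "tool":
        cut -= 1
    middle, tail = body[:cut], body[cut:]
    if not middle:
        return list(messages)
    summary = {
        "role": "user",  # assumption: role of the summary message
        "content": "[Summary of earlier conversation] " + summarize(middle),
    }
    return head + [summary] + tail
```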

Session Persistence

After each turn:

Messages are saved to the session store (SQLite via hermes_state.py)
Memory changes are flushed to MEMORY.md / USER.md
The session can be resumed later via /resume or hermes chat --resume

Key Source Files

File	Purpose
`run_agent.py`	AIAgent class — the complete agent loop (~9,200 lines)
`agent/prompt_builder.py`	System prompt assembly from memory, skills, context files, personality
`agent/context_compressor.py`	Conversation compression algorithm
`agent/prompt_caching.py`	Anthropic prompt caching markers and cache metrics
`agent/auxiliary_client.py`	Auxiliary LLM client for side tasks (vision, summarization)
`model_tools.py`	Tool schema collection, `handle_function_call()` dispatch
