docs: document tool progress streaming in API server and Open WebUI (#4138)

Update docs to reflect that tool progress now streams inline during SSE responses. Previously docs said tool calls were invisible. - api-server.md: add 'Tool progress in streams' note to streaming docs - open-webui.md: update 'How It Works' steps, add Tool Progress tip
2026-03-30 19:40:39 -07:00
parent cc63b2d1cd
commit fb2af3bd1d
2 changed files with 9 additions and 3 deletions
--- a/website/docs/user-guide/features/api-server.md
+++ b/website/docs/user-guide/features/api-server.md
@@ -8,7 +8,7 @@ description: "Expose hermes-agent as an OpenAI-compatible API for any frontend"

 The API server exposes hermes-agent as an OpenAI-compatible HTTP endpoint. Any frontend that speaks the OpenAI format — Open WebUI, LobeChat, LibreChat, NextChat, ChatBox, and hundreds more — can connect to hermes-agent and use it as a backend.

-Your agent handles requests with its full toolset (terminal, file operations, web search, memory, skills) and returns the final response. Tool calls execute invisibly server-side.
+Your agent handles requests with its full toolset (terminal, file operations, web search, memory, skills) and returns the final response. When streaming, tool progress indicators appear inline so frontends can show what the agent is doing.

 ## Quick Start

@@ -85,6 +85,8 @@ Standard OpenAI Chat Completions format. Stateless — the full conversation is

 **Streaming** (`"stream": true`): Returns Server-Sent Events (SSE) with token-by-token response chunks. When streaming is enabled in config, tokens are emitted live as the LLM generates them. When disabled, the full response is sent as a single SSE chunk.

+**Tool progress in streams**: When the agent calls tools during a streaming request, brief progress indicators are injected into the content stream as the tools start executing (e.g. `` `💻 pwd` ``, `` `🔍 Python docs` ``). These appear as inline markdown before the agent's response text, giving frontends like Open WebUI real-time visibility into tool execution.
+
 ### POST /v1/responses

 OpenAI Responses API format. Supports server-side conversation state via `previous_response_id` — the server stores full conversation history (including tool calls and results) so multi-turn context is preserved without the client managing it.
--- a/website/docs/user-guide/messaging/open-webui.md
+++ b/website/docs/user-guide/messaging/open-webui.md
@@ -147,12 +147,16 @@ When you send a message in Open WebUI:
 1. Open WebUI sends a `POST /v1/chat/completions` request with your message and conversation history
 2. Hermes Agent creates an AIAgent instance with its full toolset
 3. The agent processes your request — it may call tools (terminal, file operations, web search, etc.)
-4. Tool calls happen invisibly server-side
-5. The agent's final text response is returned to Open WebUI
+4. As tools execute, **inline progress messages stream to the UI** so you can see what the agent is doing (e.g. `` `💻 ls -la` ``, `` `🔍 Python 3.12 release` ``)
+5. The agent's final text response streams back to Open WebUI
 6. Open WebUI displays the response in its chat interface

 Your agent has access to all the same tools and capabilities as when using the CLI or Telegram — the only difference is the frontend.

+:::tip Tool Progress
+With streaming enabled (the default), you'll see brief inline indicators as tools run — the tool emoji and its key argument. These appear in the response stream before the agent's final answer, giving you visibility into what's happening behind the scenes.
+:::
+
 ## Configuration Reference

 ### Hermes Agent (API server)