MDX v2+ interprets curly braces in regular markdown as JSX
expressions. The headings 'GET /v1/responses/{id}' and
'DELETE /v1/responses/{id}' caused a ReferenceError during
Docusaurus static site generation because 'id' is not a
defined JavaScript variable. Escaped with backslashes.
Co-authored-by: Test <test@test.com>
224 lines
7.1 KiB
Markdown
224 lines
7.1 KiB
Markdown
---
|
|
sidebar_position: 14
|
|
title: "API Server"
|
|
description: "Expose hermes-agent as an OpenAI-compatible API for any frontend"
|
|
---
|
|
|
|
# API Server
|
|
|
|
The API server exposes hermes-agent as an OpenAI-compatible HTTP endpoint. Any frontend that speaks the OpenAI format — Open WebUI, LobeChat, LibreChat, NextChat, ChatBox, and hundreds more — can connect to hermes-agent and use it as a backend.
|
|
|
|
Your agent handles requests with its full toolset (terminal, file operations, web search, memory, skills) and returns the final response. Tool calls execute invisibly server-side.
|
|
|
|
## Quick Start
|
|
|
|
### 1. Enable the API server
|
|
|
|
Add to `~/.hermes/.env`:
|
|
|
|
```bash
|
|
API_SERVER_ENABLED=true
|
|
```
|
|
|
|
### 2. Start the gateway
|
|
|
|
```bash
|
|
hermes gateway
|
|
```
|
|
|
|
You'll see:
|
|
|
|
```
|
|
[API Server] API server listening on http://127.0.0.1:8642
|
|
```
|
|
|
|
### 3. Connect a frontend
|
|
|
|
Point any OpenAI-compatible client at `http://localhost:8642/v1`:
|
|
|
|
```bash
|
|
# Test with curl
|
|
curl http://localhost:8642/v1/chat/completions \
|
|
-H "Content-Type: application/json" \
|
|
-d '{"model": "hermes-agent", "messages": [{"role": "user", "content": "Hello!"}]}'
|
|
```
|
|
|
|
Or connect Open WebUI, LobeChat, or any other frontend — see the [Open WebUI integration guide](/docs/user-guide/messaging/open-webui) for step-by-step instructions.
|
|
|
|
## Endpoints
|
|
|
|
### POST /v1/chat/completions
|
|
|
|
Standard OpenAI Chat Completions format. Stateless — the full conversation is included in each request via the `messages` array.
|
|
|
|
**Request:**
|
|
```json
|
|
{
|
|
"model": "hermes-agent",
|
|
"messages": [
|
|
{"role": "system", "content": "You are a Python expert."},
|
|
{"role": "user", "content": "Write a fibonacci function"}
|
|
],
|
|
"stream": false
|
|
}
|
|
```
|
|
|
|
**Response:**
|
|
```json
|
|
{
|
|
"id": "chatcmpl-abc123",
|
|
"object": "chat.completion",
|
|
"created": 1710000000,
|
|
"model": "hermes-agent",
|
|
"choices": [{
|
|
"index": 0,
|
|
"message": {"role": "assistant", "content": "Here's a fibonacci function..."},
|
|
"finish_reason": "stop"
|
|
}],
|
|
"usage": {"prompt_tokens": 50, "completion_tokens": 200, "total_tokens": 250}
|
|
}
|
|
```
|
|
|
|
**Streaming** (`"stream": true`): Returns Server-Sent Events (SSE) with token-by-token response chunks. When streaming is enabled in config, tokens are emitted live as the LLM generates them. When disabled, the full response is sent as a single SSE chunk.
|
|
|
|
### POST /v1/responses
|
|
|
|
OpenAI Responses API format. Supports server-side conversation state via `previous_response_id` — the server stores full conversation history (including tool calls and results) so multi-turn context is preserved without the client managing it.
|
|
|
|
**Request:**
|
|
```json
|
|
{
|
|
"model": "hermes-agent",
|
|
"input": "What files are in my project?",
|
|
"instructions": "You are a helpful coding assistant.",
|
|
"store": true
|
|
}
|
|
```
|
|
|
|
**Response:**
|
|
```json
|
|
{
|
|
"id": "resp_abc123",
|
|
"object": "response",
|
|
"status": "completed",
|
|
"model": "hermes-agent",
|
|
"output": [
|
|
{"type": "function_call", "name": "terminal", "arguments": "{\"command\": \"ls\"}", "call_id": "call_1"},
|
|
{"type": "function_call_output", "call_id": "call_1", "output": "README.md src/ tests/"},
|
|
{"type": "message", "role": "assistant", "content": [{"type": "output_text", "text": "Your project has..."}]}
|
|
],
|
|
"usage": {"input_tokens": 50, "output_tokens": 200, "total_tokens": 250}
|
|
}
|
|
```
|
|
|
|
#### Multi-turn with previous_response_id
|
|
|
|
Chain responses to maintain full context (including tool calls) across turns:
|
|
|
|
```json
|
|
{
|
|
"input": "Now show me the README",
|
|
"previous_response_id": "resp_abc123"
|
|
}
|
|
```
|
|
|
|
The server reconstructs the full conversation from the stored response chain — all previous tool calls and results are preserved.
|
|
|
|
#### Named conversations
|
|
|
|
Use the `conversation` parameter instead of tracking response IDs:
|
|
|
|
```json
|
|
{"input": "Hello", "conversation": "my-project"}
|
|
{"input": "What's in src/?", "conversation": "my-project"}
|
|
{"input": "Run the tests", "conversation": "my-project"}
|
|
```
|
|
|
|
The server automatically chains to the latest response in that conversation. Like the `/title` command for gateway sessions.
|
|
|
|
### GET /v1/responses/\{id\}
|
|
|
|
Retrieve a previously stored response by ID.
|
|
|
|
### DELETE /v1/responses/\{id\}
|
|
|
|
Delete a stored response.
|
|
|
|
### GET /v1/models
|
|
|
|
Lists `hermes-agent` as an available model. Required by most frontends for model discovery.
|
|
|
|
### GET /health
|
|
|
|
Health check. Returns `{"status": "ok"}`.
|
|
|
|
## System Prompt Handling
|
|
|
|
When a frontend sends a `system` message (Chat Completions) or `instructions` field (Responses API), hermes-agent **layers it on top** of its core system prompt. Your agent keeps all its tools, memory, and skills — the frontend's system prompt adds extra instructions.
|
|
|
|
This means you can customize behavior per-frontend without losing capabilities:
|
|
- Open WebUI system prompt: "You are a Python expert. Always include type hints."
|
|
- The agent still has terminal, file tools, web search, memory, etc.
|
|
|
|
## Authentication
|
|
|
|
Bearer token auth via the `Authorization` header:
|
|
|
|
```
|
|
Authorization: Bearer ***
|
|
```
|
|
|
|
Configure the key via `API_SERVER_KEY` env var. If no key is set, all requests are allowed (for local-only use).
|
|
|
|
:::warning Security
|
|
The API server gives full access to hermes-agent's toolset, **including terminal commands**. If you change the bind address to `0.0.0.0` (network-accessible), **always set `API_SERVER_KEY`** — without it, anyone on your network can execute arbitrary commands on your machine.
|
|
|
|
The default bind address (`127.0.0.1`) is safe for local-only use.
|
|
:::
|
|
|
|
## Configuration
|
|
|
|
### Environment Variables
|
|
|
|
| Variable | Default | Description |
|
|
|----------|---------|-------------|
|
|
| `API_SERVER_ENABLED` | `false` | Enable the API server |
|
|
| `API_SERVER_PORT` | `8642` | HTTP server port |
|
|
| `API_SERVER_HOST` | `127.0.0.1` | Bind address (localhost only by default) |
|
|
| `API_SERVER_KEY` | _(none)_ | Bearer token for auth |
|
|
|
|
### config.yaml
|
|
|
|
```yaml
|
|
# Not yet supported — use environment variables.
|
|
# config.yaml support coming in a future release.
|
|
```
|
|
|
|
## CORS
|
|
|
|
The API server includes CORS headers on all responses (`Access-Control-Allow-Origin: *`), so browser-based frontends can connect directly.
|
|
|
|
## Compatible Frontends
|
|
|
|
Any frontend that supports the OpenAI API format works. Tested/documented integrations:
|
|
|
|
| Frontend | Stars | Connection |
|
|
|----------|-------|------------|
|
|
| [Open WebUI](/docs/user-guide/messaging/open-webui) | 126k | Full guide available |
|
|
| LobeChat | 73k | Custom provider endpoint |
|
|
| LibreChat | 34k | Custom endpoint in librechat.yaml |
|
|
| AnythingLLM | 56k | Generic OpenAI provider |
|
|
| NextChat | 87k | BASE_URL env var |
|
|
| ChatBox | 39k | API Host setting |
|
|
| Jan | 26k | Remote model config |
|
|
| HF Chat-UI | 8k | OPENAI_BASE_URL |
|
|
| big-AGI | 7k | Custom endpoint |
|
|
| OpenAI Python SDK | — | `OpenAI(base_url="http://localhost:8642/v1")` |
|
|
| curl | — | Direct HTTP requests |
|
|
|
|
## Limitations
|
|
|
|
- **Response storage is in-memory** — stored responses (for `previous_response_id`) are lost on gateway restart. Max 100 stored responses (LRU eviction).
|
|
- **No file upload** — vision/document analysis via uploaded files is not yet supported through the API.
|
|
- **Model field is cosmetic** — the `model` field in requests is accepted but the actual LLM model used is configured server-side in config.yaml.
|