Developer guide stubs expanded to full documentation: - trajectory-format.md: 56→233 lines (JSONL format, ShareGPT example, normalization rules, reasoning markup, replay code) - session-storage.md: 66→388 lines (SQLite schema, migration table, FTS5 search syntax, lineage queries, Python API examples) - context-compression-and-caching.md: 72→321 lines (dual compression system, config defaults, 4-phase algorithm, before/after example, prompt caching mechanics, cache-aware patterns) - tools-runtime.md: 65→246 lines (registry API, dispatch flow, availability checking, error wrapping, approval flow) - prompt-assembly.md: 89→246 lines (concrete assembled prompt example, SOUL.md injection, context file discovery table) User-facing pages expanded: - docker.md: 62→224 lines (volumes, env forwarding, docker-compose, resource limits, troubleshooting) - updating.md: 79→167 lines (update behavior, version checking, rollback instructions, Nix users) - skins.md: 80→206 lines (all color/spinner/branding keys, built-in skin descriptions, full custom skin YAML template) Hub pages improved: - integrations/index.md: 25→82 lines (web search backends table, TTS/browser providers, quick config example) - features/overview.md: added Integrations section with 6 missing links Specific fixes: - configuration.md: removed duplicate Gateway Streaming section - mcp.md: removed internal "PR work" language - plugins.md: added inline minimal plugin example (self-contained) 13 files changed, ~1700 lines added. Docusaurus build verified clean.
247 lines
8.8 KiB
Markdown
247 lines
8.8 KiB
Markdown
---
|
|
sidebar_position: 5
|
|
title: "Prompt Assembly"
|
|
description: "How Hermes builds the system prompt, preserves cache stability, and injects ephemeral layers"
|
|
---
|
|
|
|
# Prompt Assembly
|
|
|
|
Hermes deliberately separates:
|
|
|
|
- **cached system prompt state**
|
|
- **ephemeral API-call-time additions**
|
|
|
|
This is one of the most important design choices in the project because it affects:
|
|
|
|
- token usage
|
|
- prompt caching effectiveness
|
|
- session continuity
|
|
- memory correctness
|
|
|
|
Primary files:
|
|
|
|
- `run_agent.py`
|
|
- `agent/prompt_builder.py`
|
|
- `tools/memory_tool.py`
|
|
|
|
## Cached system prompt layers
|
|
|
|
The cached system prompt is assembled in roughly this order:
|
|
|
|
1. agent identity — `SOUL.md` from `HERMES_HOME` when available, otherwise falls back to `DEFAULT_AGENT_IDENTITY` in `prompt_builder.py`
|
|
2. tool-aware behavior guidance
|
|
3. Honcho static block (when active)
|
|
4. optional system message
|
|
5. frozen MEMORY snapshot
|
|
6. frozen USER profile snapshot
|
|
7. skills index
|
|
8. context files (`AGENTS.md`, `.cursorrules`, `.cursor/rules/*.mdc`) — SOUL.md is **not** included here when it was already loaded as the identity in step 1
|
|
9. timestamp / optional session ID
|
|
10. platform hint
|
|
|
|
When `skip_context_files` is set (e.g., subagent delegation), SOUL.md is not loaded and the hardcoded `DEFAULT_AGENT_IDENTITY` is used instead.
|
|
|
|
### Concrete example: assembled system prompt
|
|
|
|
Here is a simplified view of what the final system prompt looks like when all layers are present (comments show the source of each section):
|
|
|
|
```
|
|
# Layer 1: Agent Identity (from ~/.hermes/SOUL.md)
|
|
You are Hermes, an AI assistant created by Nous Research.
|
|
You are an expert software engineer and researcher.
|
|
You value correctness, clarity, and efficiency.
|
|
...
|
|
|
|
# Layer 2: Tool-aware behavior guidance
|
|
You have persistent memory across sessions. Save durable facts using
|
|
the memory tool: user preferences, environment details, tool quirks,
|
|
and stable conventions. Memory is injected into every turn, so keep
|
|
it compact and focused on facts that will still matter later.
|
|
...
|
|
When the user references something from a past conversation or you
|
|
suspect relevant cross-session context exists, use session_search
|
|
to recall it before asking them to repeat themselves.
|
|
|
|
# Tool-use enforcement (for GPT/Codex models only)
|
|
You MUST use your tools to take action — do not describe what you
|
|
would do or plan to do without actually doing it.
|
|
...
|
|
|
|
# Layer 3: Honcho static block (when active)
|
|
[Honcho personality/context data]
|
|
|
|
# Layer 4: Optional system message (from config or API)
|
|
[User-configured system message override]
|
|
|
|
# Layer 5: Frozen MEMORY snapshot
|
|
## Persistent Memory
|
|
- User prefers Python 3.12, uses pyproject.toml
|
|
- Default editor is nvim
|
|
- Working on project "atlas" in ~/code/atlas
|
|
- Timezone: US/Pacific
|
|
|
|
# Layer 6: Frozen USER profile snapshot
|
|
## User Profile
|
|
- Name: Alice
|
|
- GitHub: alice-dev
|
|
|
|
# Layer 7: Skills index
|
|
## Skills (mandatory)
|
|
Before replying, scan the skills below. If one clearly matches
|
|
your task, load it with skill_view(name) and follow its instructions.
|
|
...
|
|
<available_skills>
|
|
software-development:
|
|
- code-review: Structured code review workflow
|
|
- test-driven-development: TDD methodology
|
|
research:
|
|
- arxiv: Search and summarize arXiv papers
|
|
</available_skills>
|
|
|
|
# Layer 8: Context files (from project directory)
|
|
# Project Context
|
|
The following project context files have been loaded and should be followed:
|
|
|
|
## AGENTS.md
|
|
This is the atlas project. Use pytest for testing. The main
|
|
entry point is src/atlas/main.py. Always run `make lint` before
|
|
committing.
|
|
|
|
# Layer 9: Timestamp + session
|
|
Current time: 2026-03-30T14:30:00-07:00
|
|
Session: abc123
|
|
|
|
# Layer 10: Platform hint
|
|
You are a CLI AI Agent. Try not to use markdown but simple text
|
|
renderable inside a terminal.
|
|
```
|
|
|
|
## How SOUL.md appears in the prompt
|
|
|
|
`SOUL.md` lives at `~/.hermes/SOUL.md` and serves as the agent's identity — the very first section of the system prompt. The loading logic in `prompt_builder.py` works as follows:
|
|
|
|
```python
|
|
# From agent/prompt_builder.py (simplified)
|
|
def load_soul_md() -> Optional[str]:
|
|
soul_path = get_hermes_home() / "SOUL.md"
|
|
if not soul_path.exists():
|
|
return None
|
|
content = soul_path.read_text(encoding="utf-8").strip()
|
|
content = _scan_context_content(content, "SOUL.md") # Security scan
|
|
content = _truncate_content(content, "SOUL.md") # Cap at 20k chars
|
|
return content
|
|
```
|
|
|
|
When `load_soul_md()` returns content, it replaces the hardcoded `DEFAULT_AGENT_IDENTITY`. The `build_context_files_prompt()` function is then called with `skip_soul=True` to prevent SOUL.md from appearing twice (once as identity, once as a context file).
|
|
|
|
If `SOUL.md` doesn't exist, the system falls back to:
|
|
|
|
```
|
|
You are Hermes Agent, an intelligent AI assistant created by Nous Research.
|
|
You are helpful, knowledgeable, and direct. You assist users with a wide
|
|
range of tasks including answering questions, writing and editing code,
|
|
analyzing information, creative work, and executing actions via your tools.
|
|
You communicate clearly, admit uncertainty when appropriate, and prioritize
|
|
being genuinely useful over being verbose unless otherwise directed below.
|
|
Be targeted and efficient in your exploration and investigations.
|
|
```
|
|
|
|
## How context files are injected
|
|
|
|
`build_context_files_prompt()` uses a **priority system** — only one project context type is loaded (first match wins):
|
|
|
|
```python
|
|
# From agent/prompt_builder.py (simplified)
|
|
def build_context_files_prompt(cwd=None, skip_soul=False):
|
|
cwd_path = Path(cwd).resolve()
|
|
|
|
# Priority: first match wins — only ONE project context loaded
|
|
project_context = (
|
|
_load_hermes_md(cwd_path) # 1. .hermes.md / HERMES.md (walks to git root)
|
|
or _load_agents_md(cwd_path) # 2. AGENTS.md (cwd only)
|
|
or _load_claude_md(cwd_path) # 3. CLAUDE.md (cwd only)
|
|
or _load_cursorrules(cwd_path) # 4. .cursorrules / .cursor/rules/*.mdc
|
|
)
|
|
|
|
sections = []
|
|
if project_context:
|
|
sections.append(project_context)
|
|
|
|
# SOUL.md from HERMES_HOME (independent of project context)
|
|
if not skip_soul:
|
|
soul_content = load_soul_md()
|
|
if soul_content:
|
|
sections.append(soul_content)
|
|
|
|
if not sections:
|
|
return ""
|
|
|
|
return (
|
|
"# Project Context\n\n"
|
|
"The following project context files have been loaded "
|
|
"and should be followed:\n\n"
|
|
+ "\n".join(sections)
|
|
)
|
|
```
|
|
|
|
### Context file discovery details
|
|
|
|
| Priority | Files | Search scope | Notes |
|
|
|----------|-------|-------------|-------|
|
|
| 1 | `.hermes.md`, `HERMES.md` | CWD up to git root | Hermes-native project config |
|
|
| 2 | `AGENTS.md` | CWD only | Common agent instruction file |
|
|
| 3 | `CLAUDE.md` | CWD only | Claude Code compatibility |
|
|
| 4 | `.cursorrules`, `.cursor/rules/*.mdc` | CWD only | Cursor compatibility |
|
|
|
|
All context files are:
|
|
- **Security scanned** — checked for prompt injection patterns (invisible unicode, "ignore previous instructions", credential exfiltration attempts)
|
|
- **Truncated** — capped at 20,000 characters using 70/20 head/tail ratio with a truncation marker
|
|
- **YAML frontmatter stripped** — `.hermes.md` frontmatter is removed (reserved for future config overrides)
|
|
|
|
## API-call-time-only layers
|
|
|
|
These are intentionally *not* persisted as part of the cached system prompt:
|
|
|
|
- `ephemeral_system_prompt`
|
|
- prefill messages
|
|
- gateway-derived session context overlays
|
|
- later-turn Honcho recall injected into the current-turn user message
|
|
|
|
This separation keeps the stable prefix stable for caching.
|
|
|
|
## Memory snapshots
|
|
|
|
Local memory and user profile data are injected as frozen snapshots at session start. Mid-session writes update disk state but do not mutate the already-built system prompt until a new session or forced rebuild occurs.
|
|
|
|
## Context files
|
|
|
|
`agent/prompt_builder.py` scans and sanitizes project context files using a **priority system** — only one type is loaded (first match wins):
|
|
|
|
1. `.hermes.md` / `HERMES.md` (walks to git root)
|
|
2. `AGENTS.md` (recursive directory walk)
|
|
3. `CLAUDE.md` (CWD only)
|
|
4. `.cursorrules` / `.cursor/rules/*.mdc` (CWD only)
|
|
|
|
`SOUL.md` is loaded separately via `load_soul_md()` for the identity slot. When it loads successfully, `build_context_files_prompt(skip_soul=True)` prevents it from appearing twice.
|
|
|
|
Long files are truncated before injection.
|
|
|
|
## Skills index
|
|
|
|
The skills system contributes a compact skills index to the prompt when skills tooling is available.
|
|
|
|
## Why prompt assembly is split this way
|
|
|
|
The architecture is intentionally optimized to:
|
|
|
|
- preserve provider-side prompt caching
|
|
- avoid mutating history unnecessarily
|
|
- keep memory semantics understandable
|
|
- let gateway/ACP/CLI add context without poisoning persistent prompt state
|
|
|
|
## Related docs
|
|
|
|
- [Context Compression & Prompt Caching](./context-compression-and-caching.md)
|
|
- [Session Storage](./session-storage.md)
|
|
- [Gateway Internals](./gateway-internals.md)
|