Files
hermes-agent/website/docs/developer-guide/prompt-assembly.md
Teknium ee3f3e756d docs: fix stale and incorrect documentation across 18 files
Cross-referenced all 84 docs pages against the actual codebase and
corrected every discrepancy found.

Reference docs:
- faq.md: Fix non-existent commands (/stats→/usage, /context→/usage,
  hermes models→hermes model, hermes config get→hermes config show,
  hermes gateway logs→cat gateway.log, async→sync chat() call)
- cli-commands.md: Fix --provider choices list (remove providers not
  in argparse), add undocumented -s/--skills flag
- slash-commands.md: Add missing /queue and /resume commands, fix
  /approve args_hint to show [session|always]
- tools-reference.md: Remove duplicate vision and web toolset sections
- environment-variables.md: Fix HERMES_INFERENCE_PROVIDER list (add
  copilot-acp, remove alibaba to match actual argparse choices)

Configuration & user guide:
- configuration.md: Fix approval_mode→approvals.mode (manual not ask),
  checkpoints.enabled default true not false, human_delay defaults
  (500/2000→800/2500), remove non-existent delegation.max_iterations
  and delegation.default_toolsets, fix website_blocklist nesting
  under security:, add .hermes.md and CLAUDE.md to context files
  table with priority system explanation
- security.md: Fix website_blocklist nesting under security:
- context-files.md: Add .hermes.md/HERMES.md and CLAUDE.md support,
  document priority-based first-match-wins loading behavior
- cli.md: Fix personalities config nesting (top-level, not under agent:)
- delegation.md: Fix model override docs (config-level, not per-call
  tool parameter)
- rl-training.md: Fix log directory (tinker-atropos/logs/→
  ~/.hermes/logs/rl_training/)
- tts.md: Fix Discord delivery format (voice bubble with fallback,
  not just file attachment)
- git-worktrees.md: Remove outdated v0.2.0 version reference

Developer guide:
- prompt-assembly.md: Add .hermes.md, CLAUDE.md, document priority
  system for context files
- agent-loop.md: Fix callback list (remove non-existent
  message_callback, add stream_delta_callback, tool_gen_callback,
  status_callback)

Messaging & guides:
- webhooks.md: Fix command (hermes setup gateway→hermes gateway setup)
- tips.md: Fix session idle timeout (120min→24h), config file
  (gateway.json→config.yaml)
- build-a-hermes-plugin.md: Fix plugin.yaml provides: format
  (provides_tools/provides_hooks as lists), note register_command()
  as not yet implemented
2026-03-24 07:53:07 -07:00

3.0 KiB

sidebar_position, title, description
sidebar_position title description
5 Prompt Assembly How Hermes builds the system prompt, preserves cache stability, and injects ephemeral layers

Prompt Assembly

Hermes deliberately separates:

  • cached system prompt state
  • ephemeral API-call-time additions

This is one of the most important design choices in the project because it affects:

  • token usage
  • prompt caching effectiveness
  • session continuity
  • memory correctness

Primary files:

  • run_agent.py
  • agent/prompt_builder.py
  • tools/memory_tool.py

Cached system prompt layers

The cached system prompt is assembled in roughly this order:

  1. agent identity — SOUL.md from HERMES_HOME when available, otherwise falls back to DEFAULT_AGENT_IDENTITY in prompt_builder.py
  2. tool-aware behavior guidance
  3. Honcho static block (when active)
  4. optional system message
  5. frozen MEMORY snapshot
  6. frozen USER profile snapshot
  7. skills index
  8. context files (AGENTS.md, .cursorrules, .cursor/rules/*.mdc) — SOUL.md is not included here when it was already loaded as the identity in step 1
  9. timestamp / optional session ID
  10. platform hint

When skip_context_files is set (e.g., subagent delegation), SOUL.md is not loaded and the hardcoded DEFAULT_AGENT_IDENTITY is used instead.

API-call-time-only layers

These are intentionally not persisted as part of the cached system prompt:

  • ephemeral_system_prompt
  • prefill messages
  • gateway-derived session context overlays
  • later-turn Honcho recall injected into the current-turn user message

This separation keeps the stable prefix stable for caching.

Memory snapshots

Local memory and user profile data are injected as frozen snapshots at session start. Mid-session writes update disk state but do not mutate the already-built system prompt until a new session or forced rebuild occurs.

Context files

agent/prompt_builder.py scans and sanitizes project context files using a priority system — only one type is loaded (first match wins):

  1. .hermes.md / HERMES.md (walks to git root)
  2. AGENTS.md (recursive directory walk)
  3. CLAUDE.md (CWD only)
  4. .cursorrules / .cursor/rules/*.mdc (CWD only)

SOUL.md is loaded separately via load_soul_md() for the identity slot. When it loads successfully, build_context_files_prompt(skip_soul=True) prevents it from appearing twice.

Long files are truncated before injection.

Skills index

The skills system contributes a compact skills index to the prompt when skills tooling is available.

Why prompt assembly is split this way

The architecture is intentionally optimized to:

  • preserve provider-side prompt caching
  • avoid mutating history unnecessarily
  • keep memory semantics understandable
  • let gateway/ACP/CLI add context without poisoning persistent prompt state