As the agent navigates into subdirectories via tool calls (read_file, terminal, search_files, etc.), automatically discover and load project context files (AGENTS.md, CLAUDE.md, .cursorrules) from those directories. Previously, context files were only loaded from the CWD at session start. If the agent moved into backend/, frontend/, or any subdirectory with its own AGENTS.md, those instructions were never seen. Now, SubdirectoryHintTracker watches tool call arguments for file paths and shell commands, resolves directories, and loads hint files on first access. Discovered hints are appended to the tool result so the model gets relevant context at the moment it starts working in a new area — without modifying the system prompt (preserving prompt caching). Features: - Extracts paths from tool args (path, workdir) and shell commands - Loads AGENTS.md, CLAUDE.md, .cursorrules (first match per directory) - Deduplicates — each directory loaded at most once per session - Ignores paths outside the working directory - Truncates large hint files at 8K chars - Works on both sequential and concurrent tool execution paths Inspired by Block/goose SubdirectoryHintTracker.
247 lines
8.9 KiB
Markdown
247 lines
8.9 KiB
Markdown
---
|
|
sidebar_position: 5
|
|
title: "Prompt Assembly"
|
|
description: "How Hermes builds the system prompt, preserves cache stability, and injects ephemeral layers"
|
|
---
|
|
|
|
# Prompt Assembly
|
|
|
|
Hermes deliberately separates:
|
|
|
|
- **cached system prompt state**
|
|
- **ephemeral API-call-time additions**
|
|
|
|
This is one of the most important design choices in the project because it affects:
|
|
|
|
- token usage
|
|
- prompt caching effectiveness
|
|
- session continuity
|
|
- memory correctness
|
|
|
|
Primary files:
|
|
|
|
- `run_agent.py`
|
|
- `agent/prompt_builder.py`
|
|
- `tools/memory_tool.py`
|
|
|
|
## Cached system prompt layers
|
|
|
|
The cached system prompt is assembled in roughly this order:
|
|
|
|
1. agent identity — `SOUL.md` from `HERMES_HOME` when available, otherwise falls back to `DEFAULT_AGENT_IDENTITY` in `prompt_builder.py`
|
|
2. tool-aware behavior guidance
|
|
3. Honcho static block (when active)
|
|
4. optional system message
|
|
5. frozen MEMORY snapshot
|
|
6. frozen USER profile snapshot
|
|
7. skills index
|
|
8. context files (`AGENTS.md`, `.cursorrules`, `.cursor/rules/*.mdc`) — SOUL.md is **not** included here when it was already loaded as the identity in step 1
|
|
9. timestamp / optional session ID
|
|
10. platform hint
|
|
|
|
When `skip_context_files` is set (e.g., subagent delegation), SOUL.md is not loaded and the hardcoded `DEFAULT_AGENT_IDENTITY` is used instead.
|
|
|
|
### Concrete example: assembled system prompt
|
|
|
|
Here is a simplified view of what the final system prompt looks like when all layers are present (comments show the source of each section):
|
|
|
|
```
|
|
# Layer 1: Agent Identity (from ~/.hermes/SOUL.md)
|
|
You are Hermes, an AI assistant created by Nous Research.
|
|
You are an expert software engineer and researcher.
|
|
You value correctness, clarity, and efficiency.
|
|
...
|
|
|
|
# Layer 2: Tool-aware behavior guidance
|
|
You have persistent memory across sessions. Save durable facts using
|
|
the memory tool: user preferences, environment details, tool quirks,
|
|
and stable conventions. Memory is injected into every turn, so keep
|
|
it compact and focused on facts that will still matter later.
|
|
...
|
|
When the user references something from a past conversation or you
|
|
suspect relevant cross-session context exists, use session_search
|
|
to recall it before asking them to repeat themselves.
|
|
|
|
# Tool-use enforcement (for GPT/Codex models only)
|
|
You MUST use your tools to take action — do not describe what you
|
|
would do or plan to do without actually doing it.
|
|
...
|
|
|
|
# Layer 3: Honcho static block (when active)
|
|
[Honcho personality/context data]
|
|
|
|
# Layer 4: Optional system message (from config or API)
|
|
[User-configured system message override]
|
|
|
|
# Layer 5: Frozen MEMORY snapshot
|
|
## Persistent Memory
|
|
- User prefers Python 3.12, uses pyproject.toml
|
|
- Default editor is nvim
|
|
- Working on project "atlas" in ~/code/atlas
|
|
- Timezone: US/Pacific
|
|
|
|
# Layer 6: Frozen USER profile snapshot
|
|
## User Profile
|
|
- Name: Alice
|
|
- GitHub: alice-dev
|
|
|
|
# Layer 7: Skills index
|
|
## Skills (mandatory)
|
|
Before replying, scan the skills below. If one clearly matches
|
|
your task, load it with skill_view(name) and follow its instructions.
|
|
...
|
|
<available_skills>
|
|
software-development:
|
|
- code-review: Structured code review workflow
|
|
- test-driven-development: TDD methodology
|
|
research:
|
|
- arxiv: Search and summarize arXiv papers
|
|
</available_skills>
|
|
|
|
# Layer 8: Context files (from project directory)
|
|
# Project Context
|
|
The following project context files have been loaded and should be followed:
|
|
|
|
## AGENTS.md
|
|
This is the atlas project. Use pytest for testing. The main
|
|
entry point is src/atlas/main.py. Always run `make lint` before
|
|
committing.
|
|
|
|
# Layer 9: Timestamp + session
|
|
Current time: 2026-03-30T14:30:00-07:00
|
|
Session: abc123
|
|
|
|
# Layer 10: Platform hint
|
|
You are a CLI AI Agent. Try not to use markdown but simple text
|
|
renderable inside a terminal.
|
|
```
|
|
|
|
## How SOUL.md appears in the prompt
|
|
|
|
`SOUL.md` lives at `~/.hermes/SOUL.md` and serves as the agent's identity — the very first section of the system prompt. The loading logic in `prompt_builder.py` works as follows:
|
|
|
|
```python
|
|
# From agent/prompt_builder.py (simplified)
|
|
def load_soul_md() -> Optional[str]:
|
|
soul_path = get_hermes_home() / "SOUL.md"
|
|
if not soul_path.exists():
|
|
return None
|
|
content = soul_path.read_text(encoding="utf-8").strip()
|
|
content = _scan_context_content(content, "SOUL.md") # Security scan
|
|
content = _truncate_content(content, "SOUL.md") # Cap at 20k chars
|
|
return content
|
|
```
|
|
|
|
When `load_soul_md()` returns content, it replaces the hardcoded `DEFAULT_AGENT_IDENTITY`. The `build_context_files_prompt()` function is then called with `skip_soul=True` to prevent SOUL.md from appearing twice (once as identity, once as a context file).
|
|
|
|
If `SOUL.md` doesn't exist, the system falls back to:
|
|
|
|
```
|
|
You are Hermes Agent, an intelligent AI assistant created by Nous Research.
|
|
You are helpful, knowledgeable, and direct. You assist users with a wide
|
|
range of tasks including answering questions, writing and editing code,
|
|
analyzing information, creative work, and executing actions via your tools.
|
|
You communicate clearly, admit uncertainty when appropriate, and prioritize
|
|
being genuinely useful over being verbose unless otherwise directed below.
|
|
Be targeted and efficient in your exploration and investigations.
|
|
```
|
|
|
|
## How context files are injected
|
|
|
|
`build_context_files_prompt()` uses a **priority system** — only one project context type is loaded (first match wins):
|
|
|
|
```python
|
|
# From agent/prompt_builder.py (simplified)
|
|
def build_context_files_prompt(cwd=None, skip_soul=False):
|
|
cwd_path = Path(cwd).resolve()
|
|
|
|
# Priority: first match wins — only ONE project context loaded
|
|
project_context = (
|
|
_load_hermes_md(cwd_path) # 1. .hermes.md / HERMES.md (walks to git root)
|
|
or _load_agents_md(cwd_path) # 2. AGENTS.md (cwd only)
|
|
or _load_claude_md(cwd_path) # 3. CLAUDE.md (cwd only)
|
|
or _load_cursorrules(cwd_path) # 4. .cursorrules / .cursor/rules/*.mdc
|
|
)
|
|
|
|
sections = []
|
|
if project_context:
|
|
sections.append(project_context)
|
|
|
|
# SOUL.md from HERMES_HOME (independent of project context)
|
|
if not skip_soul:
|
|
soul_content = load_soul_md()
|
|
if soul_content:
|
|
sections.append(soul_content)
|
|
|
|
if not sections:
|
|
return ""
|
|
|
|
return (
|
|
"# Project Context\n\n"
|
|
"The following project context files have been loaded "
|
|
"and should be followed:\n\n"
|
|
+ "\n".join(sections)
|
|
)
|
|
```
|
|
|
|
### Context file discovery details
|
|
|
|
| Priority | Files | Search scope | Notes |
|
|
|----------|-------|-------------|-------|
|
|
| 1 | `.hermes.md`, `HERMES.md` | CWD up to git root | Hermes-native project config |
|
|
| 2 | `AGENTS.md` | CWD only | Common agent instruction file |
|
|
| 3 | `CLAUDE.md` | CWD only | Claude Code compatibility |
|
|
| 4 | `.cursorrules`, `.cursor/rules/*.mdc` | CWD only | Cursor compatibility |
|
|
|
|
All context files are:
|
|
- **Security scanned** — checked for prompt injection patterns (invisible unicode, "ignore previous instructions", credential exfiltration attempts)
|
|
- **Truncated** — capped at 20,000 characters using 70/20 head/tail ratio with a truncation marker
|
|
- **YAML frontmatter stripped** — `.hermes.md` frontmatter is removed (reserved for future config overrides)
|
|
|
|
## API-call-time-only layers
|
|
|
|
These are intentionally *not* persisted as part of the cached system prompt:
|
|
|
|
- `ephemeral_system_prompt`
|
|
- prefill messages
|
|
- gateway-derived session context overlays
|
|
- later-turn Honcho recall injected into the current-turn user message
|
|
|
|
This separation keeps the stable prefix stable for caching.
|
|
|
|
## Memory snapshots
|
|
|
|
Local memory and user profile data are injected as frozen snapshots at session start. Mid-session writes update disk state but do not mutate the already-built system prompt until a new session or forced rebuild occurs.
|
|
|
|
## Context files
|
|
|
|
`agent/prompt_builder.py` scans and sanitizes project context files using a **priority system** — only one type is loaded (first match wins):
|
|
|
|
1. `.hermes.md` / `HERMES.md` (walks to git root)
|
|
2. `AGENTS.md` (CWD at startup; subdirectories discovered progressively during the session via `agent/subdirectory_hints.py`)
|
|
3. `CLAUDE.md` (CWD only)
|
|
4. `.cursorrules` / `.cursor/rules/*.mdc` (CWD only)
|
|
|
|
`SOUL.md` is loaded separately via `load_soul_md()` for the identity slot. When it loads successfully, `build_context_files_prompt(skip_soul=True)` prevents it from appearing twice.
|
|
|
|
Long files are truncated before injection.
|
|
|
|
## Skills index
|
|
|
|
The skills system contributes a compact skills index to the prompt when skills tooling is available.
|
|
|
|
## Why prompt assembly is split this way
|
|
|
|
The architecture is intentionally optimized to:
|
|
|
|
- preserve provider-side prompt caching
|
|
- avoid mutating history unnecessarily
|
|
- keep memory semantics understandable
|
|
- let gateway/ACP/CLI add context without poisoning persistent prompt state
|
|
|
|
## Related docs
|
|
|
|
- [Context Compression & Prompt Caching](./context-compression-and-caching.md)
|
|
- [Session Storage](./session-storage.md)
|
|
- [Gateway Internals](./gateway-internals.md)
|