Comprehensive guide covering the Hermes memory system: - Built-in memory (MEMORY.md / USER.md) with frozen snapshot pattern - Session search (FTS5 + Gemini Flash summarization) - Skills as procedural memory - External memory providers (8 plugins) - System interaction flow and data lifecycle - Best practices for what to save/skip - Privacy and data locality guarantees - Configuration reference (char limits, nudge interval, flush settings) - Troubleshooting common issues Closes #258
15 KiB
Memory Architecture Guide
How Hermes Agent remembers things across sessions — the stores, the tools, the data flow, and how to configure it all.
Overview
Hermes has a multi-layered memory system. It is not one thing — it is several independent systems that complement each other:
- Persistent Memory (MEMORY.md / USER.md) — bounded, curated notes injected into every system prompt
- Session Search — full-text search across all past conversation transcripts
- Skills — procedural memory: reusable workflows stored as SKILL.md files
- External Memory Providers — optional plugins (Honcho, Holographic, Mem0, etc.) for deeper recall
All built-in memory lives on disk under ~/.hermes/ (or $HERMES_HOME). No memory data leaves the machine unless you explicitly configure an external cloud provider.
Memory Types in Detail
1. Persistent Memory (MEMORY.md and USER.md)
The core memory system. Two files in ~/.hermes/memories/:
| File | Purpose | Default Char Limit |
|---|---|---|
MEMORY.md |
Agent's personal notes — environment facts, project conventions, tool quirks, lessons learned | 2,200 chars (~800 tokens) |
USER.md |
User profile — name, preferences, communication style, pet peeves | 1,375 chars (~500 tokens) |
How it works:
- Loaded from disk at session start and injected into the system prompt as a frozen snapshot
- The agent uses the
memorytool to add, replace, or remove entries during a session - Mid-session writes go to disk immediately (durable) but do NOT update the system prompt — this preserves the LLM's prefix cache for performance
- The snapshot refreshes on the next session start
- Entries are delimited by
§(section sign) and can be multiline
System prompt appearance:
══════════════════════════════════════════════
MEMORY (your personal notes) [67% — 1,474/2,200 chars]
══════════════════════════════════════════════
User's project is a Rust web service at ~/code/myapi using Axum + SQLx
§
This machine runs Ubuntu 22.04, has Docker and Podman installed
§
User prefers concise responses, dislikes verbose explanations
Memory tool actions:
add— append a new entry (rejected if it would exceed the char limit)replace— find an entry by substring match and replace itremove— find an entry by substring match and delete it
Substring matching means you only need a unique fragment of the entry, not the full text. If the fragment matches multiple entries, the tool returns an error asking for a more specific match.
2. Session Search
Cross-session conversation recall via SQLite FTS5 full-text search.
- All CLI and messaging sessions are stored in
~/.hermes/state.db - The
session_searchtool finds relevant past conversations by keyword - Top matching sessions are summarized by Gemini Flash (cheap, fast) before being returned to the main model
- Returns focused summaries, not raw transcripts
When to use session_search vs. memory:
| Feature | Persistent Memory | Session Search |
|---|---|---|
| Capacity | ~3,575 chars total | Unlimited (all sessions) |
| Speed | Instant (in system prompt) | Requires search + LLM summarization |
| Use case | Key facts always in context | "What did we discuss about X last week?" |
| Management | Manually curated by the agent | Automatic — all sessions stored |
| Token cost | Fixed per session (~1,300 tokens) | On-demand (searched when needed) |
Rule of thumb: Memory is for facts that should always be available. Session search is for recalling specific past conversations on demand. Don't save task progress or session outcomes to memory — use session_search to find those.
3. Skills (Procedural Memory)
Skills are reusable workflows stored as SKILL.md files in ~/.hermes/skills/ (and optionally external skill directories).
- Organized by category:
skills/github/github-pr-workflow/SKILL.md - YAML frontmatter with name, description, version, platform restrictions
- Progressive disclosure: metadata shown in skill list, full content loaded on demand via
skill_view - The agent creates skills proactively after complex tasks (5+ tool calls) using the
skill_managetool - Skills can be patched when found outdated — stale skills are a liability
Skills are not injected into the system prompt by default. The agent sees a compact index of available skills and loads them on demand. This keeps the prompt lean while giving access to deep procedural knowledge.
Skills vs. Memory:
- Memory: compact facts ("User's project uses Go 1.22 with chi router")
- Skills: detailed procedures ("How to deploy the staging server: step 1, step 2, ...")
4. External Memory Providers
Optional plugins that add deeper, structured memory alongside the built-in system. Only one external provider can be active at a time.
| Provider | Storage | Key Feature |
|---|---|---|
| Honcho | Cloud | Dialectic user modeling with semantic search |
| OpenViking | Self-hosted | Filesystem-style knowledge hierarchy |
| Mem0 | Cloud | Server-side LLM fact extraction |
| Hindsight | Cloud/Local | Knowledge graph with entity resolution |
| Holographic | Local SQLite | HRR algebraic reasoning + trust scoring |
| RetainDB | Cloud | Hybrid search with delta compression |
| ByteRover | Local/Cloud | Hierarchical knowledge tree with CLI |
| Supermemory | Cloud | Context fencing + session graph ingest |
External providers run alongside built-in memory (never replacing it). They receive hooks for:
- System prompt injection (provider context)
- Pre-turn memory prefetch
- Post-turn conversation sync
- Session-end extraction
- Built-in memory write mirroring
Setup: hermes memory setup or set memory.provider in ~/.hermes/config.yaml.
See website/docs/user-guide/features/memory-providers.md for full provider details.
How the Systems Interact
Session Start
|
+--> Load MEMORY.md + USER.md from disk --> frozen snapshot into system prompt
+--> Provider: system_prompt_block() --> injected into system prompt
+--> Skills index --> injected into system prompt (compact metadata only)
|
v
Each Turn
|
+--> Provider: prefetch(query) --> relevant recalled context
+--> Agent sees: system prompt (memory + provider context + skills index)
+--> Agent can call: memory tool, session_search tool, skill tools, provider tools
|
v
After Each Response
|
+--> Provider: sync_turn(user, assistant) --> persist conversation
|
v
Periodic (every N turns, default 10)
|
+--> Memory nudge: agent prompted to review and update memory
|
v
Session End / Compression
|
+--> Memory flush: agent saves important facts before context is discarded
+--> Provider: on_session_end(messages) --> final extraction
+--> Provider: on_pre_compress(messages) --> save insights before compression
Best Practices
What to Save
Save proactively — don't wait for the user to ask:
- User preferences: "I prefer TypeScript over JavaScript" →
usertarget - Corrections: "Don't use sudo for Docker, I'm in the docker group" →
memorytarget - Environment facts: "This server runs Debian 12 with PostgreSQL 16" →
memorytarget - Conventions: "Project uses tabs, 120-char lines, Google docstrings" →
memorytarget - Explicit requests: "Remember that my API key rotation is monthly" →
memorytarget
What NOT to Save
- Task progress or session outcomes — use session_search to recall these
- Trivially re-discoverable facts — "Python 3.12 supports f-strings" (web search this)
- Raw data dumps — large code blocks, log files, data tables
- Session-specific ephemera — temporary file paths, one-off debugging context
- Content already in SOUL.md or AGENTS.md — those are already in context
Writing Good Entries
Compact, information-dense entries work best:
# Good — packs multiple related facts
User runs macOS 14 Sonoma, uses Homebrew, has Docker Desktop and Podman. Shell: zsh. Editor: VS Code with Vim bindings.
# Good — specific, actionable convention
Project ~/code/api uses Go 1.22, sqlc for DB, chi router. Tests: make test. CI: GitHub Actions.
# Bad — too vague
User has a project.
# Bad — too verbose
On January 5th, 2026, the user asked me to look at their project which is
located at ~/code/api. I discovered it uses Go version 1.22 and...
Capacity Management
When memory is above 80% capacity (visible in the system prompt header), consolidate before adding. Merge related entries into shorter, denser versions. The tool will reject additions that would exceed the limit — use replace to consolidate first.
Priority order for what stays in memory:
- User preferences and corrections (highest — prevents repeated steering)
- Environment facts and project conventions
- Tool quirks and workarounds
- Lessons learned (lowest — can often be rediscovered)
Memory Nudge
Every N turns (default: 10), the agent receives a nudge prompting it to review and update its memory. This is a lightweight prompt injected into the conversation — not a separate API call. The agent can choose to update memory or skip if nothing has changed.
Privacy and Data Locality
Built-in memory is fully local. MEMORY.md and USER.md are plain text files in ~/.hermes/memories/. No network calls are made in the memory read/write path. The memory tool scans entries for prompt injection and exfiltration patterns before accepting them.
Session search is local. The SQLite database (~/.hermes/state.db) stays on disk. FTS5 search is a local operation. However, the summarization step uses Gemini Flash (via the auxiliary LLM client) — conversation snippets are sent to Google's API for summarization. If this is a concern, session_search can be disabled.
External providers may send data off-machine. Cloud providers (Honcho, Mem0, RetainDB, Supermemory) send data to their respective APIs. Self-hosted providers (OpenViking, Hindsight local mode, Holographic, ByteRover local mode) keep everything on your machine. Check the provider's documentation for specifics.
Security scanning. All content written to memory (via the memory tool) is scanned for:
- Prompt injection patterns ("ignore previous instructions", role hijacking, etc.)
- Credential exfiltration attempts (curl/wget with secrets, reading .env files)
- SSH backdoor patterns
- Invisible unicode characters (used for steganographic injection)
Blocked content is rejected with a descriptive error message.
Configuration
In ~/.hermes/config.yaml:
memory:
# Enable/disable the two built-in memory stores
memory_enabled: true # MEMORY.md
user_profile_enabled: true # USER.md
# Character limits (not tokens — model-independent)
memory_char_limit: 2200 # ~800 tokens at 2.75 chars/token
user_char_limit: 1375 # ~500 tokens at 2.75 chars/token
# External memory provider (empty string = built-in only)
# Options: "honcho", "openviking", "mem0", "hindsight",
# "holographic", "retaindb", "byterover", "supermemory"
provider: ""
Additional settings are read from run_agent.py defaults:
| Setting | Default | Description |
|---|---|---|
nudge_interval |
10 | Turns between memory review nudges (0 = disabled) |
flush_min_turns |
6 | Minimum user turns before memory flush on session end/compression (0 = never flush) |
These are set under the memory key in config.yaml:
memory:
nudge_interval: 10
flush_min_turns: 6
Disabling Memory
To disable memory entirely, set both to false:
memory:
memory_enabled: false
user_profile_enabled: false
The memory tool will not appear in the tool list, and no memory blocks are injected into the system prompt.
You can also disable memory per-invocation with skip_memory=True in the AIAgent constructor (used by cron jobs and flush agents).
File Locations
~/.hermes/
├── memories/
│ ├── MEMORY.md # Agent's persistent notes
│ ├── USER.md # User profile
│ ├── MEMORY.md.lock # File lock (auto-created)
│ └── USER.md.lock # File lock (auto-created)
├── state.db # SQLite session store (FTS5)
├── config.yaml # Memory config + provider selection
└── .env # API keys for external providers
All paths respect $HERMES_HOME — if you use Hermes profiles, each profile has its own isolated memory directory.
Troubleshooting
"Memory full" errors
The tool returns an error when adding would exceed the character limit. The response includes current entries so the agent can consolidate. Fix by:
- Replacing multiple related entries with one denser entry
- Removing entries that are no longer relevant
- Increasing
memory_char_limitin config (at the cost of larger system prompts)
Stale memory entries
If the agent seems to have outdated information:
- Check
~/.hermes/memories/MEMORY.mddirectly — you can edit it by hand - The frozen snapshot pattern means changes only take effect on the next session start
- If the agent wrote something wrong mid-session, it persists on disk but won't affect the current session's system prompt
Memory not appearing in system prompt
- Verify
memory_enabled: truein config.yaml - Check that
~/.hermes/memories/MEMORY.mdexists and has content - The file might be empty if all entries were removed — add entries with the
memorytool
Session search returns no results
- Session search requires sessions to be stored in
state.db— new installations have no history - FTS5 indexes are built automatically but may lag behind on very large databases
- The summarization step requires the auxiliary LLM client to be configured (API key for Gemini Flash)
Skill drift
Skills that haven't been updated can become wrong or incomplete. The agent is prompted to patch skills when it finds them outdated during use (skill_manage(action='patch')). If you notice stale skills:
- Use
/skillsto browse and review installed skills - Delete or update skills in
~/.hermes/skills/directly - The agent creates skills after complex tasks — review and prune periodically
Provider not activating
- Run
hermes memory statusto check provider state - Verify the provider plugin is installed in
~/.hermes/plugins/memory/ - Check that required API keys are set in
~/.hermes/.env - Start a new session after changing provider config — existing sessions use the old provider
Concurrent write conflicts
The memory tool uses file locking (fcntl.flock) and atomic file replacement (os.replace) to handle concurrent writes from multiple sessions. If you see corrupted memory files:
- Check for stale
.lockfiles in~/.hermes/memories/ - Restart any hung Hermes processes
- The atomic write pattern means readers always see either the old or new file — never a partial write