Memory Architecture Guide
Developer-facing guide to the Hermes Agent memory system. Covers all four memory tiers, data lifecycle, security guarantees, and extension points.
Overview
Hermes has four distinct memory systems, each serving a different purpose:
| Tier | System | Scope | Cost | Persistence |
|---|---|---|---|---|
| 1 | Built-in Memory (MEMORY.md / USER.md) | Current session, curated facts | ~1,300 tokens fixed per session | File-backed, cross-session |
| 2 | Session Search (FTS5) | All past conversations | On-demand (search + summarize) | SQLite (state.db) |
| 3 | Skills (procedural memory) | How to do specific tasks | Loaded on match only | File-backed (~/.hermes/skills/) |
| 4 | External Providers (plugins) | Deep persistent knowledge | Provider-dependent | Provider-specific |
The four tiers operate independently. Built-in memory is on by default and loaded in every session; the others are opt-in or on-demand.
Tier 1: Built-in Memory (MEMORY.md / USER.md)
File Layout
~/.hermes/memories/
├── MEMORY.md — Agent's notes (environment facts, conventions, lessons learned)
└── USER.md — User profile (preferences, communication style, identity)
Profile-aware: when running under a profile (hermes -p coder), the memories directory resolves to ~/.hermes/profiles/<name>/memories/.
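The path rule above can be sketched in a few lines. This is only an illustration of the described layout: resolve_memories_dir and its home parameter are hypothetical names, not functions from the Hermes codebase.

```python
from pathlib import Path
from typing import Optional


def resolve_memories_dir(profile: Optional[str] = None,
                         home: Optional[Path] = None) -> Path:
    """Return the memories directory for the default or a named profile.

    Hypothetical helper that mirrors the layout described above.
    `home` exists only so the function can be exercised without touching
    the real home directory.
    """
    base = (home or Path.home()) / ".hermes"
    if profile:
        return base / "profiles" / profile / "memories"
    return base / "memories"
```

With this sketch, resolve_memories_dir("coder") points at ~/.hermes/profiles/coder/memories, while calling it without a profile falls back to ~/.hermes/memories.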
Frozen Snapshot Pattern
This is the most important architectural decision in the memory system.
- Session start: MemoryStore.load_for_prompt() reads both files from disk, parses entries delimited by § (section sign), and injects them into the system prompt as a frozen block.
- During session: The memory tool writes to disk immediately (durable), but does not update the system prompt. This preserves the LLM's prefix cache for the entire session.
- Next session: The snapshot refreshes from disk.
Why frozen? Any change to the system prompt invalidates the cached prefix (KV cache), so the next API call has to reprocess the full prompt. With a ~30K token system prompt, that's expensive. Freezing memory at session start keeps the cache warm for the entire conversation. The tradeoff: memory writes made mid-session don't take effect until the next session. Tool responses show the live state so the agent can verify that writes succeeded.
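A minimal sketch of the frozen snapshot idea, assuming a single memory file. FrozenMemory and its methods are hypothetical and much simpler than the real MemoryStore (no locking, limits, or scanning); the point is only that the snapshot is read once while writes go straight to disk.

```python
from pathlib import Path


class FrozenMemory:
    """Toy store: the prompt snapshot is read once, writes go to disk only."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Read once at session start; never refreshed during the session.
        self.snapshot = self._read_disk()

    def _read_disk(self) -> str:
        if not self.path.exists():
            return ""
        return self.path.read_text(encoding="utf-8")

    def add(self, entry: str) -> str:
        """Append an entry on disk and return the live (not frozen) state."""
        current = self._read_disk()
        updated = f"{current}\n§\n{entry}" if current else entry
        self.path.write_text(updated, encoding="utf-8")
        return updated

    def prompt_block(self) -> str:
        """Text injected into the system prompt: always the frozen snapshot."""
        return self.snapshot
```

After add() returns, prompt_block() still yields the text captured at construction time. Only a new FrozenMemory instance, standing in for the next session, sees the added entry.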
Character Limits
| Store | Default Limit | Approx Tokens | Typical Entries |
|---|---|---|---|
| MEMORY.md | 2,200 chars | ~800 | 8-15 |
| USER.md | 1,375 chars | ~500 | 5-10 |
Limits are in characters (not tokens) because character counts are model-independent. Configurable in config.yaml:
memory:
  memory_char_limit: 2200
  user_char_limit: 1375
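The arithmetic behind the table can be sketched as follows. The 2.75 characters-per-token ratio is simply what the table implies (2,200 chars for roughly 800 tokens); counting delimiters toward the limit is an assumption, and can_add is a hypothetical helper rather than the tool's actual check.

```python
DELIM = "\n§\n"
CHARS_PER_TOKEN = 2.75  # implied by the table: 2,200 chars ≈ 800 tokens


def store_size(entries: list[str]) -> int:
    """Characters the store occupies, delimiters included (assumption)."""
    return len(DELIM.join(entries))


def can_add(entries: list[str], new_entry: str, char_limit: int) -> bool:
    """True if the store stays within char_limit after adding new_entry."""
    return store_size(entries + [new_entry]) <= char_limit


def approx_tokens(chars: int) -> int:
    """Rough token estimate; real counts depend on the model's tokenizer."""
    return round(chars / CHARS_PER_TOKEN)
```

Working in characters keeps the check deterministic across models. The token figure is only an estimate for budgeting the system prompt.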
Entry Format
Entries are separated by \n§\n. Each entry can be multiline. Example MEMORY.md:
User runs macOS 14 Sonoma, uses Homebrew, has Docker Desktop
§
Project ~/code/api uses Go 1.22, chi router, sqlc. Tests: 'make test'
§
Staging server 10.0.1.50 uses SSH port 2222, key at ~/.ssh/staging_ed25519
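A short sketch of reading and writing this format, assuming entries are trimmed and empty entries are dropped. parse_entries and serialize_entries are illustrative names, not the functions in tools/memory_tool.py.

```python
DELIM = "\n§\n"


def parse_entries(text: str) -> list[str]:
    """Split file content on the § delimiter line, dropping empty entries."""
    return [part.strip() for part in text.split(DELIM) if part.strip()]


def serialize_entries(entries: list[str]) -> str:
    """Join entries back into the on-disk format."""
    return DELIM.join(entries)


sample = (
    "User runs macOS 14 Sonoma, uses Homebrew\n"
    "§\n"
    "Project ~/code/api uses Go 1.22\n"
    "Tests: 'make test'\n"
)
entries = parse_entries(sample)  # two entries; the second spans two lines
```

Because the delimiter includes the surrounding newlines, a § character inside a line of text does not split an entry.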
Tool Interface
The memory tool (defined in tools/memory_tool.py) supports:
- add: Append new entry. Rejects exact duplicates.
- replace: Find entry by unique substring (old_text), replace with content.
- remove: Find entry by unique substring, delete it.
- read: Return current entries from disk (live state, not frozen snapshot).
Substring matching: old_text must match exactly one entry. If it matches multiple, the tool returns an error asking for more specificity.
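The unique-substring rule might look like the sketch below. replace_entry is a hypothetical stand-in for the tool's replace action, and the error wording is invented for illustration.

```python
def replace_entry(entries: list[str], old_text: str, content: str) -> list[str]:
    """Replace the single entry containing old_text.

    Raises ValueError when no entry or more than one entry matches.
    """
    matches = [i for i, entry in enumerate(entries) if old_text in entry]
    if not matches:
        raise ValueError(f"no entry contains {old_text!r}")
    if len(matches) > 1:
        raise ValueError(
            f"{len(matches)} entries contain {old_text!r}; "
            "use a more specific substring"
        )
    updated = list(entries)  # leave the caller's list untouched
    updated[matches[0]] = content
    return updated
```

Requiring exactly one match means an ambiguous old_text never silently edits the wrong entry; the caller has to retry with a longer substring.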
Security Scanning
Every memory entry is scanned against _MEMORY_THREAT_PATTERNS before acceptance:
- Prompt injection patterns (ignore previous instructions, you are now...)
- Credential exfiltration (curl/wget with env vars, .env file reads)
- SSH backdoor attempts (authorized_keys, .ssh writes)
- Invisible Unicode characters (zero-width spaces, BOM)
Matches are rejected with an error message. Source: _scan_memory_content() in tools/memory_tool.py.
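To make the scanning step concrete, here is a rough sketch of pattern-based rejection. The regexes below are invented for illustration and are not the contents of _MEMORY_THREAT_PATTERNS; scan_entry is likewise a hypothetical name.

```python
import re
from typing import Optional

# Illustrative patterns only; the real list lives in tools/memory_tool.py.
THREAT_PATTERNS = [
    ("prompt_injection",
     re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.I)),
    ("prompt_injection", re.compile(r"\byou\s+are\s+now\b", re.I)),
    ("credential_exfiltration",
     re.compile(r"\b(curl|wget)\b[^\n]*\$\{?[A-Za-z_]+", re.I)),
    ("ssh_backdoor", re.compile(r"authorized_keys", re.I)),
    ("invisible_unicode", re.compile("[\u200b\u200c\u200d\u2060\ufeff]")),
]


def scan_entry(content: str) -> Optional[str]:
    """Return the label of the first matching threat pattern, or None."""
    for label, pattern in THREAT_PATTERNS:
        if pattern.search(content):
            return label
    return None
```

A caller would reject the write whenever scan_entry returns a label. Pattern lists like this trade some false positives for a cheap check that runs on every write.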
Code Path
agent/prompt_builder.py
└── assembles system prompt pieces
└── MemoryStore.load_for_prompt() → frozen snapshot injection
tools/memory_tool.py
├── MemoryStore class (file I/O, locking, parsing)
├── memory_tool() function (add/replace/remove/read dispatch)
└── _scan_memory_content() (threat scanning)
hermes_cli/memory_setup.py
└── Interactive first-run memory setup
Tier 2: Session Search (FTS5)
How It Works
- Every CLI and gateway session stores its full message history in SQLite (~/.hermes/state.db)
- The messages_fts FTS5 virtual table enables fast full-text search
- The session_search tool finds relevant messages, groups them by session, and loads the top N sessions
- Each matching session is summarized by Gemini Flash (an auxiliary LLM, not the main model)
- Summaries are returned to the main agent as context
Why Gemini Flash for Summarization
Raw session transcripts can exceed 50K characters. Feeding them to the main model wastes context window and tokens. Gemini Flash is fast, cheap, and good enough for summarization that only needs to extract the relevant parts. web_extract uses the same pattern.
Schema
-- Core tables
sessions (id, source, user_id, model, system_prompt, parent_session_id, ...)
messages (id, session_id, role, content, tool_name, timestamp, ...)
-- Full-text search
messages_fts -- FTS5 virtual table on messages.content
-- Schema tracking
schema_version
The database runs in WAL mode, which allows concurrent readers alongside a single writer. The gateway relies on this for multi-platform support.
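The sketch below shows one way an FTS5 index over messages.content can be wired up and queried with Python's sqlite3 module. The column list is trimmed, only the insert trigger is shown, and open_db and search_sessions are hypothetical names; it assumes the local SQLite build includes FTS5.

```python
import sqlite3

SCHEMA = """
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    session_id TEXT NOT NULL,
    role TEXT NOT NULL,
    content TEXT NOT NULL
);
-- External-content FTS5 table indexing messages.content.
CREATE VIRTUAL TABLE messages_fts USING fts5(
    content, content='messages', content_rowid='id'
);
-- Keep the index in sync on insert (update/delete triggers omitted).
CREATE TRIGGER messages_ai AFTER INSERT ON messages BEGIN
    INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
END;
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Request WAL; SQLite ignores this for in-memory databases.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.executescript(SCHEMA)
    return conn


def search_sessions(conn: sqlite3.Connection, query: str,
                    top_n: int = 3) -> list[tuple[str, int]]:
    """Return (session_id, hit_count) for the sessions with the most matches."""
    return conn.execute(
        """
        SELECT m.session_id, COUNT(*) AS hits
        FROM messages_fts
        JOIN messages AS m ON m.id = messages_fts.rowid
        WHERE messages_fts MATCH ?
        GROUP BY m.session_id
        ORDER BY hits DESC, m.session_id
        LIMIT ?
        """,
        (query, top_n),
    ).fetchall()
```

Ranking sessions by hit count is a simplification chosen for the example. How the real tool scores and selects the top N sessions is not specified here.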
Session Lineage
When context compression triggers a session split, parent_session_id chains the old and new sessions. This lets session search follow the thread across compression boundaries.
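Following the chain can be sketched with a recursive query. This assumes the newer session stores the older session's id in parent_session_id and that the chain has no cycles; session_chain is a hypothetical helper, and the sessions table it reads is reduced to the id and parent_session_id columns.

```python
import sqlite3


def session_chain(conn: sqlite3.Connection, session_id: str) -> list[str]:
    """Return session ids from session_id back to the root, newest first."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(id, parent_session_id, depth) AS (
            SELECT id, parent_session_id, 0
            FROM sessions WHERE id = ?
            UNION ALL
            SELECT s.id, s.parent_session_id, c.depth + 1
            FROM sessions AS s
            JOIN chain AS c ON s.id = c.parent_session_id
        )
        SELECT id FROM chain ORDER BY depth
        """,
        (session_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

A search hit in a compressed continuation can then be traced back through every earlier segment of the same conversation.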
Code Path
tools/session_search_tool.py
├── FTS5 query against messages_fts
├── Groups results by session_id
├── Loads top N sessions (MAX_SESSION_CHARS = 100K per session)
├── Sends to Gemini Flash via auxiliary_client.async_call_llm()
└── Returns per-session summaries
hermes_state.py (SessionDB class)
├── SQLite WAL mode database
├── FTS5 triggers for message insert/update/delete
└── Session CRUD operations
Memory vs Session Search
| | Memory | Session Search |
|---|---|---|
| Capacity | ~1,300 tokens total | Unlimited (all stored sessions) |
| Latency | Instant (in system prompt) | Requires FTS query + LLM call |
| When to use | Critical facts always in context | "What did we discuss about X?" |
| Management | Agent-curated | Automatic |
| Token cost | Fixed per session | On-demand per search |
Tier 3: Skills (Procedural Memory)
What Skills Are
Skills capture how to do a specific type of task based on proven experience. Where memory is broad and declarative, skills are narrow and actionable.
A skill is a directory with a SKILL.md (markdown instructions) and optional supporting files:
~/.hermes/skills/
├── my-skill/
│ ├── SKILL.md — Instructions, steps, pitfalls
│ ├── references/ — API docs, specs
│ ├── templates/ — Code templates, config files
│ ├── scripts/ — Helper scripts
│ └── assets/ — Images, data files
How Skills Load
At the start of each turn, the agent's system prompt includes available skills. When a skill matches the current task, the agent loads it with skill_view(name) and follows its instructions. Skills are not injected wholesale — they're loaded on demand to preserve context window.
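One way to picture the on-demand design is a catalog that holds only names and descriptions, with the full SKILL.md read later. The sketch assumes frontmatter is a block of simple key: value lines between --- markers, which is a guess at the format. read_frontmatter and build_catalog are hypothetical helpers, not Hermes functions.

```python
from pathlib import Path


def read_frontmatter(skill_md: Path) -> dict[str, str]:
    """Parse simple `key: value` lines between leading --- markers."""
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def build_catalog(skills_dir: Path) -> list[dict[str, str]]:
    """Collect name and description for every skill with valid frontmatter."""
    catalog = []
    for skill_md in sorted(skills_dir.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_md)
        if "name" in meta and "description" in meta:
            catalog.append(
                {"name": meta["name"], "description": meta["description"]}
            )
    return catalog
```

Only the short catalog would sit in the system prompt. The body of a skill costs tokens solely in the turns where it is actually loaded.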
Skill Lifecycle
- Creation: After a complex task (5+ tool calls), the agent offers to save the approach as a skill using skill_manage(action='create').
- Usage: On future matching tasks, the agent loads the skill with skill_view(name).
- Maintenance: If a skill is outdated or incomplete when used, the agent patches it immediately with skill_manage(action='patch').
- Deletion: Obsolete skills are removed with skill_manage(action='delete').
Skills vs Memory
| | Memory | Skills |
|---|---|---|
| Format | Free-text entries | Structured markdown (steps, pitfalls, examples) |
| Scope | Facts and preferences | Procedures and workflows |
| Loading | Always in system prompt | On-demand when matched |
| Size | ~1,300 tokens total | Variable (loaded individually) |
Code Path
tools/skill_manager_tool.py — Create, edit, patch, delete skills
agent/skill_commands.py — Slash commands for skill management
skills_hub.py — Browse, search, install skills from hub
Tier 4: External Memory Providers
Plugin Architecture
plugins/memory/
├── __init__.py — Provider registry and base interface
├── honcho/ — Dialectic Q&A, cross-session user modeling
├── openviking/ — Knowledge graph memory
├── mem0/ — Semantic memory with auto-extraction
├── hindsight/ — Retrospective memory analysis
├── holographic/ — Distributed holographic memory
├── retaindb/ — Vector-based retention
├── byterover/ — Byte-level memory compression
└── supermemory/ — Cloud-hosted semantic memory
Only one external provider can be active at a time. Built-in memory (Tier 1) always runs alongside it.
Integration Points
When a provider is active, Hermes:
- Injects provider context into the system prompt
- Prefetches relevant memories before each turn (background, non-blocking)
- Syncs conversation turns to the provider after each response
- Extracts memories on session end (for providers that support it)
- Mirrors built-in memory writes to the provider
- Adds provider-specific tools for search and management
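The integration points above suggest a provider interface roughly like the following. Every class and method name here is hypothetical; the actual base interface lives in plugins/memory/__init__.py and may differ in shape and naming.

```python
from abc import ABC, abstractmethod
from typing import Optional


class MemoryProvider(ABC):
    """Hypothetical base interface; method names are illustrative only."""

    name: str = "base"

    @abstractmethod
    def prompt_context(self) -> str:
        """Text injected into the system prompt."""

    @abstractmethod
    def prefetch(self, user_message: str) -> list[str]:
        """Memories relevant to the upcoming turn."""

    @abstractmethod
    def sync_turn(self, user_message: str, reply: str) -> None:
        """Record a completed conversation turn."""

    def mirror_write(self, entry: str) -> None:
        """Receive a copy of a built-in memory write (optional hook)."""

    def on_session_end(self) -> None:
        """Extract final memories (optional hook)."""


class ListProvider(MemoryProvider):
    """Toy provider that keeps everything in a Python list."""

    name = "list"

    def __init__(self) -> None:
        self.items: list[str] = []

    def prompt_context(self) -> str:
        return f"{len(self.items)} stored memories"

    def prefetch(self, user_message: str) -> list[str]:
        words = set(user_message.lower().split())
        return [m for m in self.items if words & set(m.lower().split())]

    def sync_turn(self, user_message: str, reply: str) -> None:
        self.items.append(f"user: {user_message}")
        self.items.append(f"assistant: {reply}")

    def mirror_write(self, entry: str) -> None:
        self.items.append(entry)


class ProviderRegistry:
    """Holds registered providers and at most one active provider."""

    def __init__(self) -> None:
        self._providers: dict[str, MemoryProvider] = {}
        self.active: Optional[MemoryProvider] = None

    def register(self, provider: MemoryProvider) -> None:
        self._providers[provider.name] = provider

    def activate(self, name: Optional[str]) -> None:
        """Select one provider by name; None disables external memory."""
        self.active = None if name is None else self._providers[name]
```

Keeping a single active slot in the registry matches the one-provider-at-a-time rule, and the optional hooks let providers skip features they do not support.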
Configuration
memory:
  provider: openviking  # or honcho, mem0, hindsight, etc.
Setup: hermes memory setup (interactive picker).
Data Lifecycle
Session Start
│
├── Load MEMORY.md + USER.md from disk → frozen snapshot in system prompt
├── Load skills catalog (names + descriptions)
├── Initialize session search (SQLite connection)
└── Initialize external provider (if configured)
│
▼
Each Turn
│
├── Agent sees frozen memory in system prompt
├── Agent can call memory tool → writes to disk, returns live state
├── Agent can call session_search → FTS5 + Gemini Flash summarization
├── Agent can load skills → reads SKILL.md from disk
└── External provider prefetches context (if active)
│
▼
Session End
│
├── All memory writes already on disk (immediate persistence)
├── Session transcript saved to SQLite (messages + FTS5 index)
├── External provider extracts final memories (if supported)
└── Skill updates persisted (if any were patched)
Privacy and Data Locality
| Component | Location | Network |
|---|---|---|
| MEMORY.md / USER.md | ~/.hermes/memories/ | Local only |
| Session DB | ~/.hermes/state.db | Local only |
| Skills | ~/.hermes/skills/ | Local only |
| External provider | Provider-dependent | Provider API calls |
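Data for Tiers 1-3 is stored on the local machine. Two paths do send content over the network: session search passes matching transcripts to the auxiliary LLM (Gemini Flash) for summarization, and an active external provider (Tier 4) receives conversation turns and mirrored built-in memory writes by design. The agent logs all provider API calls in the session transcript for auditability.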
Configuration Reference
# ~/.hermes/config.yaml
memory:
  memory_enabled: true        # Enable MEMORY.md
  user_profile_enabled: true  # Enable USER.md
  memory_char_limit: 2200     # MEMORY.md char limit (~800 tokens)
  user_char_limit: 1375       # USER.md char limit (~500 tokens)
  nudge_interval: 10          # Turns between memory nudge reminders
  provider: null              # External provider name (null = disabled)
Environment variables (in ~/.hermes/.env):
- Provider-specific API keys (e.g., HONCHO_API_KEY, MEM0_API_KEY)
Troubleshooting
Memory not appearing in system prompt
- Check ~/.hermes/memories/MEMORY.md exists and has content
- Verify memory.memory_enabled: true in config
- Check for file lock issues (WAL mode, concurrent access)
Memory writes not taking effect
- Writes are durable to disk immediately but frozen in system prompt until next session
- Tool response shows live state — verify the write succeeded there
- Start a new session to see the updated snapshot
Session search returns nothing
- Verify state.db has sessions: sqlite3 ~/.hermes/state.db "SELECT count(*) FROM sessions"
- Check FTS5 index: sqlite3 ~/.hermes/state.db "SELECT count(*) FROM messages_fts"
- Ensure auxiliary LLM (Gemini Flash) is configured and reachable
Skills not loading
- Check ~/.hermes/skills/ directory exists
- Verify SKILL.md has valid frontmatter (name, description)
- Skills load by name match: check the skill name matches what the agent expects
External provider errors
- Check API key in ~/.hermes/.env
- Verify provider is installed: pip install <provider-package>
- Run hermes memory status for diagnostic info