Files
hermes-agent/docs/memory-architecture.md
Alexander Whitestone 511eacb573
All checks were successful
Forge CI / smoke-and-build (pull_request) Successful in 47s
docs: add Memory Architecture Guide
Comprehensive guide covering the Hermes memory system:
- Built-in memory (MEMORY.md / USER.md) with frozen snapshot pattern
- Session search (FTS5 + Gemini Flash summarization)
- Skills as procedural memory
- External memory providers (8 plugins)
- System interaction flow and data lifecycle
- Best practices for what to save/skip
- Privacy and data locality guarantees
- Configuration reference (char limits, nudge interval, flush settings)
- Troubleshooting common issues

Closes #258
2026-04-09 12:45:48 -04:00

15 KiB

Memory Architecture Guide

How Hermes Agent remembers things across sessions — the stores, the tools, the data flow, and how to configure it all.

Overview

Hermes has a multi-layered memory system. It is not one thing — it is several independent systems that complement each other:

  1. Persistent Memory (MEMORY.md / USER.md) — bounded, curated notes injected into every system prompt
  2. Session Search — full-text search across all past conversation transcripts
  3. Skills — procedural memory: reusable workflows stored as SKILL.md files
  4. External Memory Providers — optional plugins (Honcho, Holographic, Mem0, etc.) for deeper recall

All built-in memory lives on disk under ~/.hermes/ (or $HERMES_HOME). No memory data leaves the machine unless you explicitly configure an external cloud provider.

Memory Types in Detail

1. Persistent Memory (MEMORY.md and USER.md)

The core memory system. Two files in ~/.hermes/memories/:

File Purpose Default Char Limit
MEMORY.md Agent's personal notes — environment facts, project conventions, tool quirks, lessons learned 2,200 chars (~800 tokens)
USER.md User profile — name, preferences, communication style, pet peeves 1,375 chars (~500 tokens)

How it works:

  • Loaded from disk at session start and injected into the system prompt as a frozen snapshot
  • The agent uses the memory tool to add, replace, or remove entries during a session
  • Mid-session writes go to disk immediately (durable) but do NOT update the system prompt — this preserves the LLM's prefix cache for performance
  • The snapshot refreshes on the next session start
  • Entries are delimited by § (section sign) and can be multiline

System prompt appearance:

══════════════════════════════════════════════
MEMORY (your personal notes) [67% — 1,474/2,200 chars]
══════════════════════════════════════════════
User's project is a Rust web service at ~/code/myapi using Axum + SQLx
§
This machine runs Ubuntu 22.04, has Docker and Podman installed
§
User prefers concise responses, dislikes verbose explanations

Memory tool actions:

  • add — append a new entry (rejected if it would exceed the char limit)
  • replace — find an entry by substring match and replace it
  • remove — find an entry by substring match and delete it

Substring matching means you only need a unique fragment of the entry, not the full text. If the fragment matches multiple entries, the tool returns an error asking for a more specific match.

Cross-session conversation recall via SQLite FTS5 full-text search.

  • All CLI and messaging sessions are stored in ~/.hermes/state.db
  • The session_search tool finds relevant past conversations by keyword
  • Top matching sessions are summarized by Gemini Flash (cheap, fast) before being returned to the main model
  • Returns focused summaries, not raw transcripts

When to use session_search vs. memory:

Feature Persistent Memory Session Search
Capacity ~3,575 chars total Unlimited (all sessions)
Speed Instant (in system prompt) Requires search + LLM summarization
Use case Key facts always in context "What did we discuss about X last week?"
Management Manually curated by the agent Automatic — all sessions stored
Token cost Fixed per session (~1,300 tokens) On-demand (searched when needed)

Rule of thumb: Memory is for facts that should always be available. Session search is for recalling specific past conversations on demand. Don't save task progress or session outcomes to memory — use session_search to find those.

3. Skills (Procedural Memory)

Skills are reusable workflows stored as SKILL.md files in ~/.hermes/skills/ (and optionally external skill directories).

  • Organized by category: skills/github/github-pr-workflow/SKILL.md
  • YAML frontmatter with name, description, version, platform restrictions
  • Progressive disclosure: metadata shown in skill list, full content loaded on demand via skill_view
  • The agent creates skills proactively after complex tasks (5+ tool calls) using the skill_manage tool
  • Skills can be patched when found outdated — stale skills are a liability

Skills are not injected into the system prompt by default. The agent sees a compact index of available skills and loads them on demand. This keeps the prompt lean while giving access to deep procedural knowledge.

Skills vs. Memory:

  • Memory: compact facts ("User's project uses Go 1.22 with chi router")
  • Skills: detailed procedures ("How to deploy the staging server: step 1, step 2, ...")

4. External Memory Providers

Optional plugins that add deeper, structured memory alongside the built-in system. Only one external provider can be active at a time.

Provider Storage Key Feature
Honcho Cloud Dialectic user modeling with semantic search
OpenViking Self-hosted Filesystem-style knowledge hierarchy
Mem0 Cloud Server-side LLM fact extraction
Hindsight Cloud/Local Knowledge graph with entity resolution
Holographic Local SQLite HRR algebraic reasoning + trust scoring
RetainDB Cloud Hybrid search with delta compression
ByteRover Local/Cloud Hierarchical knowledge tree with CLI
Supermemory Cloud Context fencing + session graph ingest

External providers run alongside built-in memory (never replacing it). They receive hooks for:

  • System prompt injection (provider context)
  • Pre-turn memory prefetch
  • Post-turn conversation sync
  • Session-end extraction
  • Built-in memory write mirroring

Setup: hermes memory setup or set memory.provider in ~/.hermes/config.yaml.

See website/docs/user-guide/features/memory-providers.md for full provider details.

How the Systems Interact

Session Start
  |
  +--> Load MEMORY.md + USER.md from disk --> frozen snapshot into system prompt
  +--> Provider: system_prompt_block() --> injected into system prompt
  +--> Skills index --> injected into system prompt (compact metadata only)
  |
  v
Each Turn
  |
  +--> Provider: prefetch(query) --> relevant recalled context
  +--> Agent sees: system prompt (memory + provider context + skills index)
  +--> Agent can call: memory tool, session_search tool, skill tools, provider tools
  |
  v
After Each Response
  |
  +--> Provider: sync_turn(user, assistant) --> persist conversation
  |
  v
Periodic (every N turns, default 10)
  |
  +--> Memory nudge: agent prompted to review and update memory
  |
  v
Session End / Compression
  |
  +--> Memory flush: agent saves important facts before context is discarded
  +--> Provider: on_session_end(messages) --> final extraction
  +--> Provider: on_pre_compress(messages) --> save insights before compression

Best Practices

What to Save

Save proactively — don't wait for the user to ask:

  • User preferences: "I prefer TypeScript over JavaScript" → user target
  • Corrections: "Don't use sudo for Docker, I'm in the docker group" → memory target
  • Environment facts: "This server runs Debian 12 with PostgreSQL 16" → memory target
  • Conventions: "Project uses tabs, 120-char lines, Google docstrings" → memory target
  • Explicit requests: "Remember that my API key rotation is monthly" → memory target

What NOT to Save

  • Task progress or session outcomes — use session_search to recall these
  • Trivially re-discoverable facts — "Python 3.12 supports f-strings" (web search this)
  • Raw data dumps — large code blocks, log files, data tables
  • Session-specific ephemera — temporary file paths, one-off debugging context
  • Content already in SOUL.md or AGENTS.md — those are already in context

Writing Good Entries

Compact, information-dense entries work best:

# Good — packs multiple related facts
User runs macOS 14 Sonoma, uses Homebrew, has Docker Desktop and Podman. Shell: zsh. Editor: VS Code with Vim bindings.

# Good — specific, actionable convention
Project ~/code/api uses Go 1.22, sqlc for DB, chi router. Tests: make test. CI: GitHub Actions.

# Bad — too vague
User has a project.

# Bad — too verbose
On January 5th, 2026, the user asked me to look at their project which is
located at ~/code/api. I discovered it uses Go version 1.22 and...

Capacity Management

When memory is above 80% capacity (visible in the system prompt header), consolidate before adding. Merge related entries into shorter, denser versions. The tool will reject additions that would exceed the limit — use replace to consolidate first.

Priority order for what stays in memory:

  1. User preferences and corrections (highest — prevents repeated steering)
  2. Environment facts and project conventions
  3. Tool quirks and workarounds
  4. Lessons learned (lowest — can often be rediscovered)

Memory Nudge

Every N turns (default: 10), the agent receives a nudge prompting it to review and update its memory. This is a lightweight prompt injected into the conversation — not a separate API call. The agent can choose to update memory or skip if nothing has changed.

Privacy and Data Locality

Built-in memory is fully local. MEMORY.md and USER.md are plain text files in ~/.hermes/memories/. No network calls are made in the memory read/write path. The memory tool scans entries for prompt injection and exfiltration patterns before accepting them.

Session search is local. The SQLite database (~/.hermes/state.db) stays on disk. FTS5 search is a local operation. However, the summarization step uses Gemini Flash (via the auxiliary LLM client) — conversation snippets are sent to Google's API for summarization. If this is a concern, session_search can be disabled.

External providers may send data off-machine. Cloud providers (Honcho, Mem0, RetainDB, Supermemory) send data to their respective APIs. Self-hosted providers (OpenViking, Hindsight local mode, Holographic, ByteRover local mode) keep everything on your machine. Check the provider's documentation for specifics.

Security scanning. All content written to memory (via the memory tool) is scanned for:

  • Prompt injection patterns ("ignore previous instructions", role hijacking, etc.)
  • Credential exfiltration attempts (curl/wget with secrets, reading .env files)
  • SSH backdoor patterns
  • Invisible unicode characters (used for steganographic injection)

Blocked content is rejected with a descriptive error message.

Configuration

In ~/.hermes/config.yaml:

memory:
  # Enable/disable the two built-in memory stores
  memory_enabled: true         # MEMORY.md
  user_profile_enabled: true   # USER.md

  # Character limits (not tokens — model-independent)
  memory_char_limit: 2200      # ~800 tokens at 2.75 chars/token
  user_char_limit: 1375        # ~500 tokens at 2.75 chars/token

  # External memory provider (empty string = built-in only)
  # Options: "honcho", "openviking", "mem0", "hindsight",
  #          "holographic", "retaindb", "byterover", "supermemory"
  provider: ""

Additional settings are read from run_agent.py defaults:

Setting Default Description
nudge_interval 10 Turns between memory review nudges (0 = disabled)
flush_min_turns 6 Minimum user turns before memory flush on session end/compression (0 = never flush)

These are set under the memory key in config.yaml:

memory:
  nudge_interval: 10
  flush_min_turns: 6

Disabling Memory

To disable memory entirely, set both to false:

memory:
  memory_enabled: false
  user_profile_enabled: false

The memory tool will not appear in the tool list, and no memory blocks are injected into the system prompt.

You can also disable memory per-invocation with skip_memory=True in the AIAgent constructor (used by cron jobs and flush agents).

File Locations

~/.hermes/
├── memories/
│   ├── MEMORY.md        # Agent's persistent notes
│   ├── USER.md          # User profile
│   ├── MEMORY.md.lock   # File lock (auto-created)
│   └── USER.md.lock     # File lock (auto-created)
├── state.db             # SQLite session store (FTS5)
├── config.yaml          # Memory config + provider selection
└── .env                 # API keys for external providers

All paths respect $HERMES_HOME — if you use Hermes profiles, each profile has its own isolated memory directory.

Troubleshooting

"Memory full" errors

The tool returns an error when adding would exceed the character limit. The response includes current entries so the agent can consolidate. Fix by:

  1. Replacing multiple related entries with one denser entry
  2. Removing entries that are no longer relevant
  3. Increasing memory_char_limit in config (at the cost of larger system prompts)

Stale memory entries

If the agent seems to have outdated information:

  • Check ~/.hermes/memories/MEMORY.md directly — you can edit it by hand
  • The frozen snapshot pattern means changes only take effect on the next session start
  • If the agent wrote something wrong mid-session, it persists on disk but won't affect the current session's system prompt

Memory not appearing in system prompt

  • Verify memory_enabled: true in config.yaml
  • Check that ~/.hermes/memories/MEMORY.md exists and has content
  • The file might be empty if all entries were removed — add entries with the memory tool

Session search returns no results

  • Session search requires sessions to be stored in state.db — new installations have no history
  • FTS5 indexes are built automatically but may lag behind on very large databases
  • The summarization step requires the auxiliary LLM client to be configured (API key for Gemini Flash)

Skill drift

Skills that haven't been updated can become wrong or incomplete. The agent is prompted to patch skills when it finds them outdated during use (skill_manage(action='patch')). If you notice stale skills:

  • Use /skills to browse and review installed skills
  • Delete or update skills in ~/.hermes/skills/ directly
  • The agent creates skills after complex tasks — review and prune periodically

Provider not activating

  • Run hermes memory status to check provider state
  • Verify the provider plugin is installed in ~/.hermes/plugins/memory/
  • Check that required API keys are set in ~/.hermes/.env
  • Start a new session after changing provider config — existing sessions use the old provider

Concurrent write conflicts

The memory tool uses file locking (fcntl.flock) and atomic file replacement (os.replace) to handle concurrent writes from multiple sessions. If you see corrupted memory files:

  • Check for stale .lock files in ~/.hermes/memories/
  • Restart any hung Hermes processes
  • The atomic write pattern means readers always see either the old or new file — never a partial write