Alexander Whitestone 511eacb573

Comprehensive guide covering the Hermes memory system:
- Built-in memory (MEMORY.md / USER.md) with frozen snapshot pattern
- Session search (FTS5 + Gemini Flash summarization)
- Skills as procedural memory
- External memory providers (8 plugins)
- System interaction flow and data lifecycle
- Best practices for what to save/skip
- Privacy and data locality guarantees
- Configuration reference (char limits, nudge interval, flush settings)
- Troubleshooting common issues

Closes #258

2026-04-09 12:45:48 -04:00

Memory Architecture Guide

How Hermes Agent remembers things across sessions — the stores, the tools, the data flow, and how to configure it all.

Overview

Hermes has a multi-layered memory system made up of several independent systems that complement each other:

Persistent Memory (MEMORY.md / USER.md) — bounded, curated notes injected into every system prompt
Session Search — full-text search across all past conversation transcripts
Skills — procedural memory: reusable workflows stored as SKILL.md files
External Memory Providers — optional plugins (Honcho, Holographic, Mem0, etc.) for deeper recall

All built-in memory lives on disk under ~/.hermes/ (or $HERMES_HOME). No memory data leaves the machine unless you explicitly configure an external cloud provider.

Memory Types in Detail

1. Persistent Memory (MEMORY.md and USER.md)

The core memory system. Two files in ~/.hermes/memories/:

| File | Purpose | Default Char Limit |
|------|---------|--------------------|
| `MEMORY.md` | Agent's personal notes: environment facts, project conventions, tool quirks, lessons learned | 2,200 chars (~800 tokens) |
| `USER.md` | User profile: name, preferences, communication style, pet peeves | 1,375 chars (~500 tokens) |

How it works:

Loaded from disk at session start and injected into the system prompt as a frozen snapshot
The agent uses the memory tool to add, replace, or remove entries during a session
Mid-session writes go to disk immediately (durable) but do NOT update the system prompt — this preserves the LLM's prefix cache for performance
The snapshot refreshes on the next session start
Entries are delimited by § (section sign) and can be multiline

System prompt appearance:

══════════════════════════════════════════════
MEMORY (your personal notes) [67% — 1,474/2,200 chars]
══════════════════════════════════════════════
User's project is a Rust web service at ~/code/myapi using Axum + SQLx
§
This machine runs Ubuntu 22.04, has Docker and Podman installed
§
User prefers concise responses, dislikes verbose explanations

Memory tool actions:

add — append a new entry (rejected if it would exceed the char limit)
replace — find an entry by substring match and replace it
remove — find an entry by substring match and delete it

Substring matching means you only need a unique fragment of the entry, not the full text. If the fragment matches multiple entries, the tool returns an error asking for a more specific match.
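The three actions and the substring-matching rule can be sketched in a few lines of Python. This is an illustration rather than the Hermes implementation: the `MemoryStore` class, its method signatures, the error messages, and the assumption that usage is counted over the entries joined by the § delimiter are all hypothetical.

```python
DELIMITER = "\n§\n"  # assumed separator; the guide only says entries are delimited by §


class MemoryStore:
    """In-memory sketch of add / replace / remove (hypothetical, not Hermes code)."""

    def __init__(self, char_limit: int = 2200) -> None:
        self.char_limit = char_limit
        self.entries: list[str] = []

    def _size(self, entries: list[str]) -> int:
        # Assumption: usage is the length of all entries joined by the delimiter.
        return len(DELIMITER.join(entries))

    def _find(self, fragment: str) -> int:
        # Substring match: exactly one entry must contain the fragment.
        matches = [i for i, entry in enumerate(self.entries) if fragment in entry]
        if not matches:
            raise ValueError(f"no entry contains {fragment!r}")
        if len(matches) > 1:
            raise ValueError(f"{fragment!r} matches {len(matches)} entries; be more specific")
        return matches[0]

    def add(self, text: str) -> None:
        candidate = self.entries + [text]
        if self._size(candidate) > self.char_limit:
            raise ValueError("memory full: consolidate before adding")
        self.entries = candidate

    def replace(self, fragment: str, new_text: str) -> None:
        i = self._find(fragment)
        candidate = self.entries[:i] + [new_text] + self.entries[i + 1:]
        if self._size(candidate) > self.char_limit:
            raise ValueError("memory full: replacement is too long")
        self.entries = candidate

    def remove(self, fragment: str) -> None:
        del self.entries[self._find(fragment)]
```

With two entries that both mention "Ubuntu", `replace("Ubuntu", ...)` raises because the fragment is ambiguous, while `remove("Docker")` succeeds if only one entry mentions Docker.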

2. Session Search

Cross-session conversation recall via SQLite FTS5 full-text search.

All CLI and messaging sessions are stored in ~/.hermes/state.db
The session_search tool finds relevant past conversations by keyword
Top matching sessions are summarized by Gemini Flash (cheap, fast) before being returned to the main model
Returns focused summaries, not raw transcripts
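The keyword-search half of this pipeline can be approximated with Python's built-in `sqlite3` module. The sketch below assumes a hypothetical `messages_fts` table with `session_id` and `content` columns (the real `state.db` schema is not described in this guide), uses an in-memory database, and leaves out the Gemini Flash summarization step. It also assumes the local SQLite build includes FTS5.

```python
import sqlite3


def build_demo_index() -> sqlite3.Connection:
    """Create an in-memory FTS5 index with two made-up sessions."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: session_id is stored but not indexed for search.
    conn.execute(
        "CREATE VIRTUAL TABLE messages_fts USING fts5(session_id UNINDEXED, content)"
    )
    conn.executemany(
        "INSERT INTO messages_fts (session_id, content) VALUES (?, ?)",
        [
            ("s1", "Fixed the Docker permission error by adding the user to the docker group"),
            ("s2", "Discussed Axum routing and SQLx migrations for the API"),
        ],
    )
    return conn


def search_sessions(conn: sqlite3.Connection, query: str, limit: int = 3):
    """Return (session_id, snippet) pairs, best match first."""
    return conn.execute(
        "SELECT session_id, snippet(messages_fts, 1, '[', ']', '...', 8) "
        "FROM messages_fts WHERE messages_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
```

In a real pipeline the matched rows would then be passed to the summarizer. Note that raw user input may contain FTS5 query syntax, so a production version would likely need to quote or sanitize the query.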

When to use session_search vs. memory:

| Feature | Persistent Memory | Session Search |
|---------|-------------------|----------------|
| Capacity | ~3,575 chars total | Unlimited (all sessions) |
| Speed | Instant (in system prompt) | Requires search + LLM summarization |
| Use case | Key facts always in context | "What did we discuss about X last week?" |
| Management | Manually curated by the agent | Automatic: all sessions stored |
| Token cost | Fixed per session (~1,300 tokens) | On-demand (searched when needed) |

Rule of thumb: Memory is for facts that should always be available. Session search is for recalling specific past conversations on demand. Don't save task progress or session outcomes to memory — use session_search to find those.

3. Skills (Procedural Memory)

Skills are reusable workflows stored as SKILL.md files in ~/.hermes/skills/ (and optionally external skill directories).

Organized by category: skills/github/github-pr-workflow/SKILL.md
YAML frontmatter with name, description, version, platform restrictions
Progressive disclosure: metadata shown in skill list, full content loaded on demand via skill_view
The agent creates skills proactively after complex tasks (5+ tool calls) using the skill_manage tool
Skills can be patched when found outdated — stale skills are a liability

Skills are not injected into the system prompt by default. The agent sees a compact index of available skills and loads them on demand. This keeps the prompt lean while giving access to deep procedural knowledge.
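Progressive disclosure can be illustrated with a small Python sketch that reads only the frontmatter of a SKILL.md file and produces one compact index line. The key names (`name`, `description`, `version`), the `---` fences, and both function names are assumptions for illustration; the parser handles flat `key: value` pairs only and is not a full YAML parser.

```python
def parse_frontmatter(text: str) -> dict[str, str]:
    """Extract flat key: value pairs from a leading '---' fenced block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter; the skill body is ignored here
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def index_line(skill_text: str) -> str:
    """Compact metadata for the skill list; the full body loads only on demand."""
    meta = parse_frontmatter(skill_text)
    return f"{meta.get('name', '?')}: {meta.get('description', '')}"


EXAMPLE_SKILL = """---
name: github-pr-workflow
description: Open, review, and merge pull requests
version: 1.0.0
---
Step 1: create a branch...
"""
```

Here `index_line(EXAMPLE_SKILL)` yields a single short line, which is the kind of entry that would fit in a prompt-resident index while the step-by-step body stays on disk.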

Skills vs. Memory:

Memory: compact facts ("User's project uses Go 1.22 with chi router")
Skills: detailed procedures ("How to deploy the staging server: step 1, step 2, ...")

4. External Memory Providers

Optional plugins that add deeper, structured memory alongside the built-in system. Only one external provider can be active at a time.

| Provider | Storage | Key Feature |
|----------|---------|-------------|
| Honcho | Cloud | Dialectic user modeling with semantic search |
| OpenViking | Self-hosted | Filesystem-style knowledge hierarchy |
| Mem0 | Cloud | Server-side LLM fact extraction |
| Hindsight | Cloud/Local | Knowledge graph with entity resolution |
| Holographic | Local SQLite | HRR algebraic reasoning + trust scoring |
| RetainDB | Cloud | Hybrid search with delta compression |
| ByteRover | Local/Cloud | Hierarchical knowledge tree with CLI |
| Supermemory | Cloud | Context fencing + session graph ingest |

External providers run alongside built-in memory (never replacing it). They receive hooks for:

System prompt injection (provider context)
Pre-turn memory prefetch
Post-turn conversation sync
Session-end extraction
Built-in memory write mirroring

Setup: hermes memory setup or set memory.provider in ~/.hermes/config.yaml.

See website/docs/user-guide/features/memory-providers.md for full provider details.

How the Systems Interact

Session Start
  |
  +--> Load MEMORY.md + USER.md from disk --> frozen snapshot into system prompt
  +--> Provider: system_prompt_block() --> injected into system prompt
  +--> Skills index --> injected into system prompt (compact metadata only)
  |
  v
Each Turn
  |
  +--> Provider: prefetch(query) --> relevant recalled context
  +--> Agent sees: system prompt (memory + provider context + skills index)
  +--> Agent can call: memory tool, session_search tool, skill tools, provider tools
  |
  v
After Each Response
  |
  +--> Provider: sync_turn(user, assistant) --> persist conversation
  |
  v
Periodic (every N turns, default 10)
  |
  +--> Memory nudge: agent prompted to review and update memory
  |
  v
Session End / Compression
  |
  +--> Memory flush: agent saves important facts before context is discarded
  +--> Provider: on_session_end(messages) --> final extraction
  +--> Provider: on_pre_compress(messages) --> save insights before compression
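The provider hooks in the flow above could be modeled roughly as follows. The hook names are taken from the diagram; the argument and return types, the `MemoryProvider` base class, the toy `KeywordProvider`, and the `run_turn` driver are assumptions made for this sketch and may not match the actual plugin interface.

```python
class MemoryProvider:
    """Base class with no-op hooks (names from the flow above, signatures assumed)."""

    def system_prompt_block(self) -> str:
        return ""

    def prefetch(self, query: str) -> str:
        return ""

    def sync_turn(self, user: str, assistant: str) -> None:
        pass

    def on_session_end(self, messages: list) -> None:
        pass

    def on_pre_compress(self, messages: list) -> None:
        pass


class KeywordProvider(MemoryProvider):
    """Toy provider: recalls past replies whose user message shares a word with the query."""

    def __init__(self) -> None:
        self.turns: list[tuple[str, str]] = []

    def system_prompt_block(self) -> str:
        return f"Provider memory: {len(self.turns)} stored turns"

    def prefetch(self, query: str) -> str:
        wanted = set(query.lower().split())
        hits = [reply for user, reply in self.turns if wanted & set(user.lower().split())]
        return "\n".join(hits)

    def sync_turn(self, user: str, assistant: str) -> None:
        self.turns.append((user, assistant))


def run_turn(provider: MemoryProvider, user_message: str, respond) -> str:
    """One turn: prefetch before the model call, sync after the response."""
    recalled = provider.prefetch(user_message)
    reply = respond(user_message, recalled)  # stand-in for the LLM call
    provider.sync_turn(user_message, reply)
    return reply
```

The ordering in `run_turn` mirrors the diagram: recall happens before the agent responds, and persistence happens only after the response exists.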

Best Practices

What to Save

Save proactively — don't wait for the user to ask:

User preferences: "I prefer TypeScript over JavaScript" → user target
Corrections: "Don't use sudo for Docker, I'm in the docker group" → memory target
Environment facts: "This server runs Debian 12 with PostgreSQL 16" → memory target
Conventions: "Project uses tabs, 120-char lines, Google docstrings" → memory target
Explicit requests: "Remember that my API key rotation is monthly" → memory target

What NOT to Save

Task progress or session outcomes — use session_search to recall these
Trivially re-discoverable facts — "Python 3.12 supports f-strings" (web search this)
Raw data dumps — large code blocks, log files, data tables
Session-specific ephemera — temporary file paths, one-off debugging context
Content already in SOUL.md or AGENTS.md — those are already in context

Writing Good Entries

Compact, information-dense entries work best:

# Good — packs multiple related facts
User runs macOS 14 Sonoma, uses Homebrew, has Docker Desktop and Podman. Shell: zsh. Editor: VS Code with Vim bindings.

# Good — specific, actionable convention
Project ~/code/api uses Go 1.22, sqlc for DB, chi router. Tests: make test. CI: GitHub Actions.

# Bad — too vague
User has a project.

# Bad — too verbose
On January 5th, 2026, the user asked me to look at their project which is
located at ~/code/api. I discovered it uses Go version 1.22 and...

Capacity Management

When memory is above 80% capacity (visible in the system prompt header), consolidate before adding. Merge related entries into shorter, denser versions. The tool will reject additions that would exceed the limit — use replace to consolidate first.
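As a rough illustration of the 80% rule, the usage figure from the system prompt header can be computed and checked like this. Both function names are hypothetical, and rounding to the nearest whole percent is an assumption based on the example header (1,474 of 2,200 chars shown as 67%).

```python
def usage_percent(used_chars: int, char_limit: int) -> int:
    """Percentage shown in the memory header, rounded to a whole number (assumed)."""
    return round(100 * used_chars / char_limit)


def should_consolidate(used_chars: int, char_limit: int, threshold: float = 80.0) -> bool:
    """True once usage is above the threshold, i.e. merge entries before adding more."""
    return 100 * used_chars / char_limit > threshold
```

For the example header, `usage_percent(1474, 2200)` gives 67, so no consolidation is needed yet; at 1,800 of 2,200 chars the store is above 80% and `should_consolidate` returns `True`.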

Priority order for what stays in memory:

User preferences and corrections (highest — prevents repeated steering)
Environment facts and project conventions
Tool quirks and workarounds
Lessons learned (lowest — can often be rediscovered)

Memory Nudge

Every N turns (default: 10), the agent receives a nudge prompting it to review and update its memory. This is a lightweight prompt injected into the conversation — not a separate API call. The agent can choose to update memory or skip if nothing has changed.
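The nudge schedule can be expressed as a one-line check. This sketch assumes turns are counted from 1 and that the nudge fires on exact multiples of the interval; `should_nudge` is a hypothetical name, not a function from the Hermes codebase.

```python
def should_nudge(turn_number: int, nudge_interval: int = 10) -> bool:
    """Return True on turns where a memory-review nudge would be injected."""
    if nudge_interval <= 0:
        return False  # 0 disables nudging, per the configuration reference
    return turn_number > 0 and turn_number % nudge_interval == 0
```

With the default interval, a 30-turn session would be nudged on turns 10, 20, and 30.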

Privacy and Data Locality

Built-in memory is fully local. MEMORY.md and USER.md are plain text files in ~/.hermes/memories/. No network calls are made in the memory read/write path. The memory tool scans entries for prompt injection and exfiltration patterns before accepting them.

Session search stores and indexes data locally, but summarization is remote. The SQLite database (~/.hermes/state.db) stays on disk, and FTS5 search is a local operation. The summarization step, however, uses Gemini Flash (via the auxiliary LLM client), so conversation snippets are sent to Google's API. If this is a concern, session_search can be disabled.

External providers may send data off-machine. Cloud providers (Honcho, Mem0, RetainDB, Supermemory, and Hindsight or ByteRover in cloud mode) send data to their respective APIs. Self-hosted providers (OpenViking, Hindsight local mode, Holographic, ByteRover local mode) keep everything on your machine. Check the provider's documentation for specifics.

Security scanning. All content written to memory (via the memory tool) is scanned for:

Prompt injection patterns ("ignore previous instructions", role hijacking, etc.)
Credential exfiltration attempts (curl/wget with secrets, reading .env files)
SSH backdoor patterns
Invisible unicode characters (used for steganographic injection)

Blocked content is rejected with a descriptive error message.
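A scanner of this kind might look like the sketch below. The regular expressions are illustrative stand-ins, not the patterns Hermes actually uses, and `scan_entry` is a hypothetical name; a real scanner would need far broader coverage.

```python
import re
import unicodedata

# Illustrative patterns only; the real rule set is not documented in this guide.
PATTERNS = {
    "prompt_injection": re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.I),
    "credential_exfiltration": re.compile(
        r"\b(curl|wget)\b.*(\$\{?\w*(KEY|TOKEN|SECRET)\w*|\.env\b)", re.I
    ),
    "ssh_backdoor": re.compile(r"authorized_keys", re.I),
}


def scan_entry(text: str) -> list[str]:
    """Return the names of all checks the entry fails (empty list means accepted)."""
    findings = [name for name, pattern in PATTERNS.items() if pattern.search(text)]
    # Unicode category "Cf" covers format characters such as zero-width spaces.
    if any(unicodedata.category(ch) == "Cf" for ch in text):
        findings.append("invisible_unicode")
    return findings
```

An entry such as "User prefers tabs" passes with no findings, while text containing "ignore previous instructions" or a zero-width space is flagged and would be rejected.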

Configuration

In ~/.hermes/config.yaml:

memory:
  # Enable/disable the two built-in memory stores
  memory_enabled: true         # MEMORY.md
  user_profile_enabled: true   # USER.md

  # Character limits (not tokens — model-independent)
  memory_char_limit: 2200      # ~800 tokens at 2.75 chars/token
  user_char_limit: 1375        # ~500 tokens at 2.75 chars/token

  # External memory provider (empty string = built-in only)
  # Options: "honcho", "openviking", "mem0", "hindsight",
  #          "holographic", "retaindb", "byterover", "supermemory"
  provider: ""
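The token estimates in the comments above follow directly from the 2.75 chars/token ratio. A quick check in Python (the function name is hypothetical, and the ratio is an approximation that varies by model and content):

```python
CHARS_PER_TOKEN = 2.75  # ratio quoted in the config comments


def approx_tokens(char_limit: int) -> int:
    """Estimate the token cost of a store that is filled to its character limit."""
    return round(char_limit / CHARS_PER_TOKEN)


memory_tokens = approx_tokens(2200)  # 800
user_tokens = approx_tokens(1375)    # 500
total_tokens = memory_tokens + user_tokens  # 1300
```

The same function gives a rough idea of how much larger the system prompt becomes if either limit is raised.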

Two additional settings take their defaults from run_agent.py:

| Setting | Default | Description |
|---------|---------|-------------|
| `nudge_interval` | 10 | Turns between memory review nudges (0 = disabled) |
| `flush_min_turns` | 6 | Minimum user turns before memory flush on session end/compression (0 = never flush) |

To override them, set them under the memory key in config.yaml:

memory:
  nudge_interval: 10
  flush_min_turns: 6
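The flush threshold can be read as a simple gate. In this sketch, `should_flush` is a hypothetical helper, and treating the minimum as inclusive (a session with exactly 6 user turns is flushed) is an assumption.

```python
def should_flush(user_turns: int, flush_min_turns: int = 6) -> bool:
    """Decide whether a memory flush runs at session end or before compression."""
    if flush_min_turns <= 0:
        return False  # 0 means never flush, per the table above
    return user_turns >= flush_min_turns
```

Short sessions are skipped under this rule, which avoids spending a flush on conversations that are unlikely to contain anything worth saving.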

Disabling Memory

To disable memory entirely, set both store flags to false:

memory:
  memory_enabled: false
  user_profile_enabled: false

The memory tool will not appear in the tool list, and no memory blocks are injected into the system prompt.

You can also disable memory per-invocation with skip_memory=True in the AIAgent constructor (used by cron jobs and flush agents).

File Locations

~/.hermes/
├── memories/
│   ├── MEMORY.md        # Agent's persistent notes
│   ├── USER.md          # User profile
│   ├── MEMORY.md.lock   # File lock (auto-created)
│   └── USER.md.lock     # File lock (auto-created)
├── state.db             # SQLite session store (FTS5)
├── config.yaml          # Memory config + provider selection
└── .env                 # API keys for external providers

All paths respect $HERMES_HOME — if you use Hermes profiles, each profile has its own isolated memory directory.
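Path resolution under `$HERMES_HOME` can be sketched as follows. The function names are hypothetical, and the fallback to `~/.hermes` when the variable is unset or empty is inferred from the layout above rather than taken from the Hermes source.

```python
import os
from pathlib import Path


def hermes_home(env=None) -> Path:
    """Resolve the Hermes root: $HERMES_HOME if set, otherwise ~/.hermes."""
    env = os.environ if env is None else env
    configured = env.get("HERMES_HOME")
    return Path(configured).expanduser() if configured else Path.home() / ".hermes"


def memory_paths(home: Path) -> dict[str, Path]:
    """Locations of the built-in stores relative to the Hermes root."""
    return {
        "memory": home / "memories" / "MEMORY.md",
        "user": home / "memories" / "USER.md",
        "sessions": home / "state.db",
        "config": home / "config.yaml",
    }
```

Because every path hangs off the resolved root, pointing `HERMES_HOME` at a different directory is enough to give a profile its own memory files and session database.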

Troubleshooting

"Memory full" errors

The tool returns an error when adding would exceed the character limit. The response includes current entries so the agent can consolidate. Fix by:

Replacing multiple related entries with one denser entry
Removing entries that are no longer relevant
Increasing memory_char_limit in config (at the cost of larger system prompts)

Stale memory entries

If the agent seems to have outdated information:

Check ~/.hermes/memories/MEMORY.md directly — you can edit it by hand
The frozen snapshot pattern means changes only take effect on the next session start
If the agent wrote something wrong mid-session, it persists on disk but won't affect the current session's system prompt

Memory not appearing in system prompt

Verify memory_enabled: true in config.yaml
Check that ~/.hermes/memories/MEMORY.md exists and has content
The file might be empty if all entries were removed — add entries with the memory tool

Session search returns no results

Session search requires sessions to be stored in state.db — new installations have no history
FTS5 indexes are built automatically but may lag behind on very large databases
The summarization step requires the auxiliary LLM client to be configured (API key for Gemini Flash)

Skill drift

Skills that haven't been updated can become wrong or incomplete. The agent is prompted to patch skills when it finds them outdated during use (skill_manage(action='patch')). If you notice stale skills:

Use /skills to browse and review installed skills
Delete or update skills in ~/.hermes/skills/ directly
The agent creates skills after complex tasks — review and prune periodically

Provider not activating

Run hermes memory status to check provider state
Verify the provider plugin is installed in ~/.hermes/plugins/memory/
Check that required API keys are set in ~/.hermes/.env
Start a new session after changing provider config — existing sessions use the old provider

Concurrent write conflicts

The memory tool uses file locking (fcntl.flock) and atomic file replacement (os.replace) to handle concurrent writes from multiple sessions. If you see corrupted memory files:

Check for stale .lock files in ~/.hermes/memories/
Restart any hung Hermes processes
The atomic write pattern means readers always see either the old or new file — never a partial write
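The lock-then-replace pattern described here can be sketched with the standard library. This is a Unix-only illustration (`fcntl` is not available on Windows), and it assumes the lock is taken on a sibling `.lock` file as suggested by the file layout; `write_memory_atomically` is a hypothetical name, not the actual Hermes function.

```python
import fcntl
import os
import tempfile
from pathlib import Path


def write_memory_atomically(path: Path, content: str) -> None:
    """Serialize writers with flock, then swap the file in with os.replace."""
    lock_path = path.with_name(path.name + ".lock")
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until other writers finish
        try:
            # Write to a temp file in the same directory so os.replace stays
            # on one filesystem, which is what makes the rename atomic.
            fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
            with os.fdopen(fd, "w", encoding="utf-8") as tmp:
                tmp.write(content)
                tmp.flush()
                os.fsync(tmp.fileno())
            os.replace(tmp_name, path)  # readers see the old or the new file
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

The lock file is left in place after the write, which is consistent with the auto-created `.lock` files listed under File Locations; its presence alone does not indicate a held lock.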
