hermes-agent/docs/memory-architecture-guide.md
Alexander Whitestone 815160bd6f
burn: add Memory Architecture Guide (closes #263, #258)
Developer-facing guide covering all four memory tiers:
- Built-in memory (MEMORY.md/USER.md) with frozen snapshot pattern
- Session search (FTS5 + Gemini Flash summarization)
- Skills as procedural memory
- External memory provider plugin architecture

Includes data lifecycle, security guarantees, code paths,
configuration reference, and troubleshooting.
2026-04-09 20:51:45 -04:00


Memory Architecture Guide

Developer-facing guide to the Hermes Agent memory system. Covers all four memory tiers, data lifecycle, security guarantees, and extension points.

Overview

Hermes has four distinct memory systems, each serving a different purpose:

| Tier | System | Scope | Cost | Persistence |
|---|---|---|---|---|
| 1 | Built-in Memory (MEMORY.md / USER.md) | Current session, curated facts | ~1,300 tokens fixed per session | File-backed, cross-session |
| 2 | Session Search (FTS5) | All past conversations | On-demand (search + summarize) | SQLite (state.db) |
| 3 | Skills (procedural memory) | How to do specific tasks | Loaded on match only | File-backed (~/.hermes/skills/) |
| 4 | External Providers (plugins) | Deep persistent knowledge | Provider-dependent | Provider-specific |

All four tiers operate independently. Built-in memory is always active. The others are opt-in or on-demand.

Tier 1: Built-in Memory (MEMORY.md / USER.md)

File Layout

~/.hermes/memories/
├── MEMORY.md    — Agent's notes (environment facts, conventions, lessons learned)
└── USER.md      — User profile (preferences, communication style, identity)

Profile-aware: when running under a profile (hermes -p coder), the memories directory resolves to ~/.hermes/profiles/<name>/memories/.
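Below is a minimal sketch of the profile-aware lookup. It assumes only the two layouts described above. The helper name memories_dir() and its home parameter are hypothetical: they were added for illustration and testability, and the real resolution logic in Hermes may differ.

```python
from pathlib import Path
from typing import Optional


def memories_dir(profile: Optional[str] = None, home: Optional[Path] = None) -> Path:
    """Hypothetical helper: resolve the memories directory for an optional profile."""
    base = (home or Path.home()) / ".hermes"
    if profile:
        # hermes -p <name>  ->  ~/.hermes/profiles/<name>/memories/
        return base / "profiles" / profile / "memories"
    # No profile  ->  ~/.hermes/memories/
    return base / "memories"


if __name__ == "__main__":
    print(memories_dir())
    print(memories_dir("coder"))
```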

Frozen Snapshot Pattern

This is the most important architectural decision in the memory system.

  1. Session start: MemoryStore.load_for_prompt() reads both files from disk, parses entries delimited by § (section sign), and injects them into the system prompt as a frozen block.
  2. During session: The memory tool writes to disk immediately (durable), but does not update the system prompt. This preserves the LLM's prefix cache for the entire session.
  3. Next session: The snapshot refreshes from disk.

Why frozen? Any change to the system prompt invalidates the provider's cached prefix (the KV cache), so the next API call has to reprocess the entire prompt. With a ~30K-token system prompt, that is expensive. Freezing memory at session start keeps the cache warm for the entire conversation. The tradeoff: memory writes made mid-session don't reach the prompt until the next session. Tool responses show the live state so the agent can verify that writes succeeded.
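The frozen snapshot pattern fits in a few lines of Python. FrozenMemory is a hypothetical class, not the real MemoryStore. It works in three steps:

- It captures the entries once at construction (session start).
- It persists writes to disk immediately without touching the snapshot.
- It re-reads disk for live state, the way tool responses do.

File locking, scanning, and the character limit are deliberately left out.

```python
import tempfile
from pathlib import Path

DELIM = "\n§\n"


class FrozenMemory:
    """Hypothetical sketch of the frozen snapshot pattern (not MemoryStore)."""

    def __init__(self, path: Path):
        self.path = path
        self.snapshot = self._read()  # captured once, at "session start"

    def _read(self) -> list[str]:
        if not self.path.exists():
            return []
        text = self.path.read_text(encoding="utf-8")
        return [e.strip() for e in text.split(DELIM) if e.strip()]

    def system_prompt_block(self) -> str:
        # Identical text for the whole session, so the prompt prefix stays cacheable.
        return DELIM.join(self.snapshot)

    def write(self, entry: str) -> list[str]:
        entries = self._read() + [entry]
        self.path.write_text(DELIM.join(entries), encoding="utf-8")  # durable immediately
        return self.live()  # tool responses report live state, not the snapshot

    def live(self) -> list[str]:
        return self._read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = Path(d) / "MEMORY.md"
        path.write_text("User runs macOS 14 Sonoma", encoding="utf-8")
        mem = FrozenMemory(path)
        print(mem.write("Tests: 'make test'"))           # live state: both entries
        print(mem.system_prompt_block())                 # snapshot: first entry only
        print(FrozenMemory(path).system_prompt_block())  # "next session": both entries
```

Because system_prompt_block() returns the same text for the whole session, the prompt prefix stays byte-for-byte stable. Only a new instance, standing in for the next session, picks up the write.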

Character Limits

| Store | Default Limit | Approx Tokens | Typical Entries |
|---|---|---|---|
| MEMORY.md | 2,200 chars | ~800 | 8-15 |
| USER.md | 1,375 chars | ~500 | 5-10 |

Limits are measured in characters rather than tokens because character counts do not depend on the model's tokenizer. Both limits are configurable in config.yaml:

memory:
  memory_char_limit: 2200
  user_char_limit: 1375
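Here is a sketch of a pre-write limit check. It assumes the limit is measured on the serialized store, meaning the entries joined by the \n§\n delimiter. Whether the real implementation counts delimiters is not stated here, so treat fits_limit() as hypothetical.

```python
DELIM = "\n§\n"


def fits_limit(entries: list[str], new_entry: str, char_limit: int) -> tuple[bool, int]:
    """Return (fits, chars_used) for the store after appending new_entry.

    Hypothetical helper. Assumption: the limit applies to the serialized
    store, i.e. entries joined by the § delimiter.
    """
    used = len(DELIM.join(entries + [new_entry]))
    return used <= char_limit, used


if __name__ == "__main__":
    current = ["User runs macOS 14 Sonoma, uses Homebrew, has Docker Desktop"]
    ok, used = fits_limit(current, "Tests: 'make test'", char_limit=2200)
    print(ok, used, f"{used / 2200:.0%} of the MEMORY.md budget")
```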

Entry Format

Entries are separated by \n§\n. Each entry can be multiline. Example MEMORY.md:

User runs macOS 14 Sonoma, uses Homebrew, has Docker Desktop
§
Project ~/code/api uses Go 1.22, chi router, sqlc. Tests: 'make test'
§
Staging server 10.0.1.50 uses SSH port 2222, key at ~/.ssh/staging_ed25519
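Parsing and serializing this format comes down to a split and a join on the exact delimiter. The helpers below are hypothetical, not MemoryStore's methods. Splitting on the full \n§\n string, rather than on § alone, means a § inside an entry's text does not start a new entry.

```python
DELIM = "\n§\n"

SAMPLE = (
    "User runs macOS 14 Sonoma, uses Homebrew, has Docker Desktop\n"
    "§\n"
    "Project ~/code/api uses Go 1.22, chi router, sqlc. Tests: 'make test'\n"
    "§\n"
    "Staging server 10.0.1.50 uses SSH port 2222, key at ~/.ssh/staging_ed25519"
)


def parse_entries(text: str) -> list[str]:
    """Split a MEMORY.md/USER.md body into entries; entries may span lines."""
    return [chunk.strip() for chunk in text.split(DELIM) if chunk.strip()]


def serialize_entries(entries: list[str]) -> str:
    """Join entries back into the on-disk format."""
    return DELIM.join(entry.strip() for entry in entries)


if __name__ == "__main__":
    for entry in parse_entries(SAMPLE):
        print("-", entry)
```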

Tool Interface

The memory tool (defined in tools/memory_tool.py) supports:

  • add — Append new entry. Rejects exact duplicates.
  • replace — Find entry by unique substring (old_text), replace with content.
  • remove — Find entry by unique substring, delete it.
  • read — Return current entries from disk (live state, not frozen snapshot).

Substring matching: old_text must match exactly one entry. If it matches multiple, the tool returns an error asking for more specificity.
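The add/replace/remove semantics can be sketched over an in-memory list of entries. These functions are hypothetical stand-ins for the dispatch in memory_tool(). They reproduce the two rules stated above: exact duplicates are rejected, and old_text must match exactly one entry. Disk I/O, locking, and scanning are omitted.

```python
def add(entries: list[str], content: str) -> list[str]:
    """Append content; exact duplicates are rejected."""
    if content in entries:
        raise ValueError("Entry already exists.")
    return entries + [content]


def _find_unique(entries: list[str], old_text: str) -> int:
    """Index of the single entry containing old_text, else a descriptive error."""
    hits = [i for i, entry in enumerate(entries) if old_text in entry]
    if not hits:
        raise ValueError(f"No entry contains {old_text!r}.")
    if len(hits) > 1:
        raise ValueError(f"{old_text!r} matches {len(hits)} entries; be more specific.")
    return hits[0]


def replace(entries: list[str], old_text: str, content: str) -> list[str]:
    i = _find_unique(entries, old_text)
    return entries[:i] + [content] + entries[i + 1:]


def remove(entries: list[str], old_text: str) -> list[str]:
    i = _find_unique(entries, old_text)
    return entries[:i] + entries[i + 1:]
```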

Security Scanning

Every memory entry is scanned against _MEMORY_THREAT_PATTERNS before acceptance:

  • Prompt injection patterns (ignore previous instructions, you are now...)
  • Credential exfiltration (curl/wget with env vars, .env file reads)
  • SSH backdoor attempts (authorized_keys, .ssh writes)
  • Invisible Unicode characters (zero-width spaces, BOM)

Matches are rejected with an error message. Source: _scan_memory_content() in tools/memory_tool.py.
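The next sketch shows regex-based scanning. The patterns are illustrative guesses for each category listed above. They are not the actual contents of _MEMORY_THREAT_PATTERNS, and scan_memory_content() is a hypothetical stand-in for _scan_memory_content(). The SSH rule targets writes (authorized_keys, or redirects into .ssh/), so a legitimate note like the staging-server entry above is not rejected.

```python
import re

# Illustrative patterns only; the real _MEMORY_THREAT_PATTERNS list may differ.
THREAT_PATTERNS = [
    ("prompt_injection", re.compile(r"ignore (all )?previous instructions|you are now", re.IGNORECASE)),
    ("exfiltration", re.compile(r"\b(curl|wget)\b[^\n]*\$\{?[A-Z_][A-Z0-9_]*")),
    ("env_file_read", re.compile(r"\bcat\s+\S*\.env\b")),
    ("ssh_backdoor", re.compile(r"authorized_keys|>>?\s*\S*\.ssh/")),
]
# Zero-width space / non-joiner / joiner, word joiner, and BOM.
INVISIBLE_CHARS = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def scan_memory_content(text: str) -> list[str]:
    """Hypothetical stand-in for _scan_memory_content(): names of triggered checks."""
    hits = [name for name, pattern in THREAT_PATTERNS if pattern.search(text)]
    if any(ch in INVISIBLE_CHARS for ch in text):
        hits.append("invisible_unicode")
    return hits  # empty list -> entry accepted
```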

Code Path

agent/prompt_builder.py
  └── assembles system prompt pieces
       └── MemoryStore.load_for_prompt() → frozen snapshot injection

tools/memory_tool.py
  ├── MemoryStore class (file I/O, locking, parsing)
  ├── memory_tool() function (add/replace/remove/read dispatch)
  └── _scan_memory_content() (threat scanning)

hermes_cli/memory_setup.py
  └── Interactive first-run memory setup

Tier 2: Session Search (FTS5)

How It Works

  1. Every CLI and gateway session stores full message history in SQLite (~/.hermes/state.db)
  2. The messages_fts FTS5 virtual table enables fast full-text search
  3. The session_search tool finds relevant messages, groups by session, loads top N
  4. Each matching session is summarized by Gemini Flash (auxiliary LLM, not main model)
  5. Summaries are returned to the main agent as context
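Steps 2 and 3 can be sketched with Python's sqlite3 module, assuming the local SQLite build includes FTS5 (most do). To stay short, the sketch makes two simplifications:

- It uses a standalone FTS5 table with an unindexed session_id column instead of an index over the messages table.
- It ranks sessions by their number of matching messages.

The real ranking and the value of N are assumptions.

```python
import sqlite3
from collections import Counter


def search_sessions(conn: sqlite3.Connection, query: str, top_n: int = 3) -> list[tuple[str, int]]:
    """FTS5 match, then group hits by session and keep the top_n sessions.

    Assumption: sessions are ranked by number of matching messages.
    """
    rows = conn.execute(
        "SELECT session_id FROM messages_fts WHERE messages_fts MATCH ?", (query,)
    ).fetchall()
    counts = Counter(session_id for (session_id,) in rows)
    # Sort by hit count (descending), then by session id for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


def demo_db() -> sqlite3.Connection:
    """In-memory stand-in: standalone FTS5 table, session_id stored but not indexed."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE messages_fts USING fts5(content, session_id UNINDEXED)")
    conn.executemany(
        "INSERT INTO messages_fts (content, session_id) VALUES (?, ?)",
        [
            ("set up the staging server over ssh", "s1"),
            ("ssh port is 2222 on staging", "s1"),
            ("fixed the go build", "s2"),
            ("staging deploy notes", "s3"),
        ],
    )
    return conn


if __name__ == "__main__":
    print(search_sessions(demo_db(), "staging"))
```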

Why Gemini Flash for Summarization

Raw session transcripts can exceed 50K characters. Feeding them to the main model wastes context window and tokens. Gemini Flash is fast, cheap, and good enough for "extract the relevant bits" summarization. web_extract uses the same pattern.

Schema

-- Core tables
sessions (id, source, user_id, model, system_prompt, parent_session_id, ...)
messages (id, session_id, role, content, tool_name, timestamp, ...)

-- Full-text search
messages_fts  -- FTS5 virtual table on messages.content

-- Schema tracking
schema_version

The database runs in WAL mode, which allows concurrent readers alongside a single writer (needed for gateway multi-platform support).
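The schema sketch below matches the outline above, reduced to a few columns. It uses the external-content FTS5 pattern from the SQLite documentation, with triggers that keep messages_fts in sync on insert, update, and delete. It also enables WAL mode on open. Trigger names and the exact column set are assumptions; the real DDL lives in hermes_state.py.

```python
import os
import sqlite3
import tempfile

SCHEMA = """
CREATE TABLE IF NOT EXISTS sessions (
    id TEXT PRIMARY KEY,
    source TEXT,
    parent_session_id TEXT REFERENCES sessions(id)
);
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY,
    session_id TEXT NOT NULL REFERENCES sessions(id),
    role TEXT NOT NULL,
    content TEXT,
    timestamp REAL
);
-- External-content FTS5 index over messages.content
CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts
    USING fts5(content, content='messages', content_rowid='id');
-- Keep the index in sync with the messages table
CREATE TRIGGER IF NOT EXISTS messages_ai AFTER INSERT ON messages BEGIN
    INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
END;
CREATE TRIGGER IF NOT EXISTS messages_ad AFTER DELETE ON messages BEGIN
    INSERT INTO messages_fts(messages_fts, rowid, content) VALUES ('delete', old.id, old.content);
END;
CREATE TRIGGER IF NOT EXISTS messages_au AFTER UPDATE ON messages BEGIN
    INSERT INTO messages_fts(messages_fts, rowid, content) VALUES ('delete', old.id, old.content);
    INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
END;
"""


def open_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # readers don't block the single writer
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        conn = open_db(os.path.join(d, "state.db"))
        print(conn.execute("PRAGMA journal_mode").fetchone()[0])
        conn.close()
```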

Session Lineage

When context compression triggers a session split, parent_session_id chains the old and new sessions. This lets session search follow the thread across compression boundaries.
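Following the lineage is a recursive walk along parent_session_id. Below is a minimal sketch using a recursive CTE that walks upward from a session to its ancestors. session_lineage() is a hypothetical helper, and the depth cap of 100 is an arbitrary guard against accidental cycles.

```python
import sqlite3

LINEAGE_SQL = """
WITH RECURSIVE chain(id, parent_session_id, depth) AS (
    SELECT id, parent_session_id, 0 FROM sessions WHERE id = ?
    UNION ALL
    SELECT s.id, s.parent_session_id, chain.depth + 1
    FROM sessions AS s
    JOIN chain ON s.id = chain.parent_session_id
    WHERE chain.depth < 100  -- guard against accidental cycles
)
SELECT id FROM chain ORDER BY depth
"""


def session_lineage(conn: sqlite3.Connection, session_id: str) -> list[str]:
    """Return [session, parent, grandparent, ...] by following parent_session_id."""
    return [row[0] for row in conn.execute(LINEAGE_SQL, (session_id,))]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, parent_session_id TEXT)")
    conn.executemany(
        "INSERT INTO sessions VALUES (?, ?)",
        [("s1", None), ("s2", "s1"), ("s3", "s2")],  # s3 split from s2, s2 from s1
    )
    print(session_lineage(conn, "s3"))
```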

Code Path

tools/session_search_tool.py
  ├── FTS5 query against messages_fts
  ├── Groups results by session_id
  ├── Loads top N sessions (MAX_SESSION_CHARS = 100K per session)
  ├── Sends to Gemini Flash via auxiliary_client.async_call_llm()
  └── Returns per-session summaries

hermes_state.py (SessionDB class)
  ├── SQLite WAL mode database
  ├── FTS5 triggers for message insert/update/delete
  └── Session CRUD operations
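The summarization end of the code path can be sketched in two steps: cap each transcript at MAX_SESSION_CHARS, then summarize the sessions concurrently with one auxiliary call each. summarize_stub() is a hypothetical stand-in for auxiliary_client.async_call_llm(), whose real signature is not shown here. Keeping the most recent text when a transcript is over the cap is also an assumption.

```python
import asyncio

MAX_SESSION_CHARS = 100_000  # per-session cap named in the code path above


async def summarize_stub(prompt: str) -> str:
    """Hypothetical stand-in for the auxiliary LLM call; real signature not shown here."""
    await asyncio.sleep(0)  # placeholder for network I/O
    return f"summary of {len(prompt)} chars"


def truncate_transcript(text: str, limit: int = MAX_SESSION_CHARS) -> str:
    # Assumption: keep the most recent part of an over-long transcript.
    return text if len(text) <= limit else text[-limit:]


async def summarize_sessions(query: str, transcripts: dict[str, str],
                             llm=summarize_stub) -> dict[str, str]:
    """One auxiliary call per session, run concurrently; returns {session_id: summary}."""
    session_ids = list(transcripts)
    prompts = [
        f"Summarize what is relevant to: {query}\n\n{truncate_transcript(transcripts[sid])}"
        for sid in session_ids
    ]
    summaries = await asyncio.gather(*(llm(p) for p in prompts))
    return dict(zip(session_ids, summaries))


if __name__ == "__main__":
    result = asyncio.run(summarize_sessions("ssh setup", {"s1": "short", "s2": "x" * 250_000}))
    print(result)
```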

Memory vs Session Search

| | Memory | Session Search |
|---|---|---|
| Capacity | ~1,300 tokens total | Unlimited (all stored sessions) |
| Latency | Instant (in system prompt) | Requires FTS query + LLM call |
| When to use | Critical facts always in context | "What did we discuss about X?" |
| Management | Agent-curated | Automatic |
| Token cost | Fixed per session | On-demand per search |

Tier 3: Skills (Procedural Memory)

What Skills Are

Skills capture how to do a specific type of task based on proven experience. Where memory is broad and declarative, skills are narrow and actionable.

A skill is a directory with a SKILL.md (markdown instructions) and optional supporting files:

~/.hermes/skills/
├── my-skill/
│   ├── SKILL.md          — Instructions, steps, pitfalls
│   ├── references/       — API docs, specs
│   ├── templates/        — Code templates, config files
│   ├── scripts/          — Helper scripts
│   └── assets/           — Images, data files

How Skills Load

The system prompt includes a catalog of available skills (names and descriptions), loaded at session start. When a skill matches the current task, the agent loads it with skill_view(name) and follows its instructions. Skills are not injected wholesale; they are loaded on demand to preserve the context window.
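The next sketch builds the skills catalog (name to description) from each SKILL.md's frontmatter. The parser handles only flat key: value pairs between --- lines, which is enough for name and description. load_catalog() and parse_frontmatter() are hypothetical helpers; a real loader would likely use a YAML parser.

```python
import tempfile
from pathlib import Path


def parse_frontmatter(text: str) -> dict[str, str]:
    """Flat 'key: value' pairs between leading '---' lines; {} if absent or unterminated."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip("'\"")
    return {}  # no closing '---'


def load_catalog(skills_dir: Path) -> dict[str, str]:
    """Map name -> description for each <skill>/SKILL.md with both fields present."""
    catalog: dict[str, str] = {}
    for skill_md in sorted(skills_dir.glob("*/SKILL.md")):
        meta = parse_frontmatter(skill_md.read_text(encoding="utf-8"))
        if meta.get("name") and meta.get("description"):
            catalog[meta["name"]] = meta["description"]
    return catalog


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        skill = Path(d) / "deploy-staging"
        skill.mkdir()
        (skill / "SKILL.md").write_text(
            "---\nname: deploy-staging\ndescription: Deploy the API to staging\n---\n# Steps\n",
            encoding="utf-8",
        )
        print(load_catalog(Path(d)))
```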

Skill Lifecycle

  1. Creation: After a complex task (5+ tool calls), the agent offers to save the approach as a skill using skill_manage(action='create').
  2. Usage: On future matching tasks, the agent loads the skill with skill_view(name).
  3. Maintenance: If a skill is outdated or incomplete when used, the agent patches it immediately with skill_manage(action='patch').
  4. Deletion: Obsolete skills are removed with skill_manage(action='delete').
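The lifecycle maps onto three file operations plus a threshold check. Below is a minimal sketch with hypothetical function names. The real skill_manage tool's parameters are not documented here. Treating patch as a find-and-replace that must match exactly once is an assumption, borrowed from the memory tool's replace semantics.

```python
import shutil
import tempfile
from pathlib import Path

COMPLEX_TASK_TOOL_CALLS = 5  # the "5+ tool calls" threshold from the text


def should_offer_skill(tool_calls: int) -> bool:
    """Offer to save a skill only after a complex task."""
    return tool_calls >= COMPLEX_TASK_TOOL_CALLS


def skill_manage(skills_dir: Path, action: str, name: str, content: str = "",
                 old_text: str = "", new_text: str = "") -> Path:
    """Hypothetical stand-in for skill_manage(action=...): create, patch, delete."""
    skill_dir = skills_dir / name
    skill_md = skill_dir / "SKILL.md"
    if action == "create":
        skill_dir.mkdir(parents=True)  # raises FileExistsError if the skill exists
        skill_md.write_text(content, encoding="utf-8")
    elif action == "patch":
        text = skill_md.read_text(encoding="utf-8")
        # Assumption: a patch is a targeted edit whose old_text must occur exactly once.
        if text.count(old_text) != 1:
            raise ValueError("old_text must occur exactly once in SKILL.md")
        skill_md.write_text(text.replace(old_text, new_text), encoding="utf-8")
    elif action == "delete":
        shutil.rmtree(skill_dir)
    else:
        raise ValueError(f"unknown action {action!r}")
    return skill_md


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        if should_offer_skill(tool_calls=7):
            md = skill_manage(root, "create", "deploy", content="1. Run make deploy\n")
            skill_manage(root, "patch", "deploy", old_text="make deploy",
                         new_text="make deploy ENV=staging")
            print(md.read_text(encoding="utf-8"))
            skill_manage(root, "delete", "deploy")
```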

Skills vs Memory

| | Memory | Skills |
|---|---|---|
| Format | Free-text entries | Structured markdown (steps, pitfalls, examples) |
| Scope | Facts and preferences | Procedures and workflows |
| Loading | Always in system prompt | On-demand when matched |
| Size | ~1,300 tokens total | Variable (loaded individually) |

Code Path

tools/skill_manager_tool.py  — Create, edit, patch, delete skills
agent/skill_commands.py       — Slash commands for skill management
skills_hub.py                 — Browse, search, install skills from hub

Tier 4: External Memory Providers

Plugin Architecture

plugins/memory/
├── __init__.py        — Provider registry and base interface
├── honcho/            — Dialectic Q&A, cross-session user modeling
├── openviking/        — Knowledge graph memory
├── mem0/              — Semantic memory with auto-extraction
├── hindsight/         — Retrospective memory analysis
├── holographic/       — Distributed holographic memory
├── retaindb/          — Vector-based retention
├── byterover/         — Byte-level memory compression
└── supermemory/       — Cloud-hosted semantic memory

Only one external provider can be active at a time. Built-in memory (Tier 1) always runs alongside it.

Integration Points

When a provider is active, Hermes:

  1. Injects provider context into the system prompt
  2. Prefetches relevant memories before each turn (background, non-blocking)
  3. Syncs conversation turns to the provider after each response
  4. Extracts memories on session end (for providers that support it)
  5. Mirrors built-in memory writes to the provider
  6. Adds provider-specific tools for search and management
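One way to picture the plugin contract is an abstract base class with one hook per integration point, plus a registry that activates a single provider. Every name below (MemoryProvider, register, activate, EchoProvider) is hypothetical and is not the actual interface in plugins/memory/__init__.py.

```python
from abc import ABC, abstractmethod
from typing import Optional


class MemoryProvider(ABC):
    """Hypothetical provider interface: one hook per integration point above."""

    name: str = "base"

    @abstractmethod
    def system_prompt_block(self) -> str: ...                    # 1. inject context

    @abstractmethod
    async def prefetch(self, user_message: str) -> str: ...      # 2. before each turn

    @abstractmethod
    def sync_turn(self, user: str, assistant: str) -> None: ...  # 3. after each response

    def on_session_end(self, transcript: list[dict]) -> None:    # 4. optional extraction
        pass

    def on_memory_write(self, action: str, content: str) -> None:  # 5. mirror built-in writes
        pass

    def tools(self) -> list[dict]:                               # 6. provider-specific tools
        return []


_REGISTRY: dict[str, type[MemoryProvider]] = {}


def register(cls: type[MemoryProvider]) -> type[MemoryProvider]:
    _REGISTRY[cls.name] = cls
    return cls


def activate(provider_name: Optional[str]) -> Optional[MemoryProvider]:
    """Return the single active provider, or None when memory.provider is null."""
    if provider_name is None:
        return None
    try:
        return _REGISTRY[provider_name]()
    except KeyError:
        raise ValueError(f"unknown memory provider {provider_name!r}") from None


@register
class EchoProvider(MemoryProvider):
    """Toy provider that only counts synced turns."""

    name = "echo"

    def __init__(self) -> None:
        self.turns: list[tuple[str, str]] = []

    def system_prompt_block(self) -> str:
        return f"[echo] {len(self.turns)} turn(s) stored"

    async def prefetch(self, user_message: str) -> str:
        return ""

    def sync_turn(self, user: str, assistant: str) -> None:
        self.turns.append((user, assistant))
```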

Configuration

memory:
  provider: openviking  # or honcho, mem0, hindsight, etc.

Setup: hermes memory setup (interactive picker).

Data Lifecycle

Session Start
  │
  ├── Load MEMORY.md + USER.md from disk → frozen snapshot in system prompt
  ├── Load skills catalog (names + descriptions)
  ├── Initialize session search (SQLite connection)
  └── Initialize external provider (if configured)
  │
  ▼
Each Turn
  │
  ├── Agent sees frozen memory in system prompt
  ├── Agent can call memory tool → writes to disk, returns live state
  ├── Agent can call session_search → FTS5 + Gemini Flash summarization
  ├── Agent can load skills → reads SKILL.md from disk
  └── External provider prefetches context (if active)
  │
  ▼
Session End
  │
  ├── All memory writes already on disk (immediate persistence)
  ├── Session transcript saved to SQLite (messages + FTS5 index)
  ├── External provider extracts final memories (if supported)
  └── Skill updates persisted (if any were patched)

Privacy and Data Locality

| Component | Location | Network |
|---|---|---|
| MEMORY.md / USER.md | ~/.hermes/memories/ | Local only |
| Session DB | ~/.hermes/state.db | Local only |
| Skills | ~/.hermes/skills/ | Local only |
| External provider | Provider-dependent | Provider API calls |

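"Local only" refers to storage: Tiers 1-3 are stored on the machine and never synced to a remote memory service. Their contents can still reach LLM APIs during normal use. Built-in memory is part of the system prompt sent to the main model, and session_search sends matching transcripts to the auxiliary model (Gemini Flash) for summarization. External providers (Tier 4) send data to the configured provider by design. The agent logs all provider API calls in the session transcript for auditability.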

Configuration Reference

# ~/.hermes/config.yaml
memory:
  memory_enabled: true          # Enable MEMORY.md
  user_profile_enabled: true    # Enable USER.md
  memory_char_limit: 2200       # MEMORY.md char limit (~800 tokens)
  user_char_limit: 1375         # USER.md char limit (~500 tokens)
  nudge_interval: 10            # Turns between memory nudge reminders
  provider: null                # External provider name (null = disabled)
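Below is a sketch of loading the memory: section with these defaults, assuming the YAML has already been parsed into a dict (no YAML library is used here). The MemoryConfig dataclass and should_nudge() are hypothetical. In particular, firing a nudge on every nudge_interval-th turn is an assumed reading of "turns between memory nudge reminders".

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class MemoryConfig:
    """Defaults mirror the configuration reference above."""
    memory_enabled: bool = True
    user_profile_enabled: bool = True
    memory_char_limit: int = 2200
    user_char_limit: int = 1375
    nudge_interval: int = 10
    provider: Optional[str] = None  # None (YAML null) disables external providers

    @classmethod
    def from_dict(cls, raw: Optional[dict]) -> "MemoryConfig":
        """Build from an already-parsed `memory:` mapping, ignoring unknown keys."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in (raw or {}).items() if k in known})


def should_nudge(turn: int, cfg: MemoryConfig) -> bool:
    """Assumption: remind on every nudge_interval-th turn (turns counted from 1)."""
    return cfg.nudge_interval > 0 and turn > 0 and turn % cfg.nudge_interval == 0


if __name__ == "__main__":
    cfg = MemoryConfig.from_dict({"memory_char_limit": 3000, "provider": "honcho"})
    print(cfg)
    print([t for t in range(1, 31) if should_nudge(t, cfg)])
```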

Environment variables (in ~/.hermes/.env):

  • Provider-specific API keys (e.g., HONCHO_API_KEY, MEM0_API_KEY)

Troubleshooting

Memory not appearing in system prompt

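  • Check that ~/.hermes/memories/MEMORY.md exists and has content (under a profile, check ~/.hermes/profiles/<name>/memories/ instead)
  • Verify memory.memory_enabled: true in config (and memory.user_profile_enabled: true for USER.md)
  • Check for file lock contention: MemoryStore locks the memory files, so another Hermes process may be holding the lock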

Memory writes not taking effect

  • Writes are durable to disk immediately but frozen in system prompt until next session
  • Tool response shows live state — verify the write succeeded there
  • Start a new session to see the updated snapshot

Session search returns nothing

  • Verify state.db has sessions: sqlite3 ~/.hermes/state.db "SELECT count(*) FROM sessions"
  • Check FTS5 index: sqlite3 ~/.hermes/state.db "SELECT count(*) FROM messages_fts"
  • Ensure auxiliary LLM (Gemini Flash) is configured and reachable

Skills not loading

  • Check ~/.hermes/skills/ directory exists
  • Verify SKILL.md has valid frontmatter (name, description)
  • Skills load by name match — check the skill name matches what the agent expects

External provider errors

  • Check API key in ~/.hermes/.env
  • Verify provider is installed: pip install <provider-package>
  • Run hermes memory status for diagnostic info