Alexander Whitestone 815160bd6f

burn: add Memory Architecture Guide (closes #263, #258)

Developer-facing guide covering all four memory tiers:
- Built-in memory (MEMORY.md/USER.md) with frozen snapshot pattern
- Session search (FTS5 + Gemini Flash summarization)
- Skills as procedural memory
- External memory provider plugin architecture

Includes data lifecycle, security guarantees, code paths,
configuration reference, and troubleshooting.

2026-04-09 20:51:45 -04:00

Memory Architecture Guide

Developer-facing guide to the Hermes Agent memory system. Covers all four memory tiers, data lifecycle, security guarantees, and extension points.

Overview

Hermes has four distinct memory systems, each serving a different purpose:

Tier	System	Scope	Cost	Persistence
1	Built-in Memory (MEMORY.md / USER.md)	Current session, curated facts	~1,300 tokens fixed per session	File-backed, cross-session
2	Session Search (FTS5)	All past conversations	On-demand (search + summarize)	SQLite (state.db)
3	Skills (procedural memory)	How to do specific tasks	Loaded on match only	File-backed (~/.hermes/skills/)
4	External Providers (plugins)	Deep persistent knowledge	Provider-dependent	Provider-specific

All four tiers operate independently. Built-in memory is active by default (see memory_enabled in the Configuration Reference). The others are opt-in or on-demand.

Tier 1: Built-in Memory (MEMORY.md / USER.md)

File Layout

~/.hermes/memories/
├── MEMORY.md    — Agent's notes (environment facts, conventions, lessons learned)
└── USER.md      — User profile (preferences, communication style, identity)

Profile-aware: when running under a profile (hermes -p coder), the memories directory resolves to ~/.hermes/profiles/<name>/memories/.
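
The path rule above can be sketched in a few lines. This is an illustration only: `resolve_memories_dir` and its `home` parameter are hypothetical names, not taken from the Hermes codebase.

```python
from pathlib import Path
from typing import Optional


def resolve_memories_dir(profile: Optional[str] = None,
                         home: Optional[Path] = None) -> Path:
    """Return the memories directory for the default setup or a named profile.

    Hypothetical helper; `home` exists only to make the function testable.
    """
    base = (home or Path.home()) / ".hermes"
    if profile:
        # Profiles get their own subtree under ~/.hermes/profiles/<name>/
        base = base / "profiles" / profile
    return base / "memories"
```

Calling `resolve_memories_dir("coder")` yields the profile-specific directory, while calling it with no arguments yields the default one.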

Frozen Snapshot Pattern

This is the most important architectural decision in the memory system.

- Session start: MemoryStore.load_for_prompt() reads both files from disk, parses entries delimited by § (section sign), and injects them into the system prompt as a frozen block.
- During session: The memory tool writes to disk immediately (durable), but does not update the system prompt. This preserves the LLM's prefix cache for the entire session.
- Next session: The snapshot refreshes from disk.

Why frozen? Any change to the system prompt invalidates the prefix (KV) cache, so the next API call has to reprocess the whole prompt. With a ~30K token system prompt, that is expensive. Freezing memory at session start keeps the cache warm for the entire conversation. The tradeoff: memory writes made mid-session do not reach the prompt until the next session. Tool responses show the live state, so the agent can still verify that a write succeeded.
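
To make the pattern concrete, here is a minimal in-memory sketch. It is not the real MemoryStore: the class name `FrozenSnapshotStore` is hypothetical, a list stands in for the files on disk, and only the `load_for_prompt` method name is borrowed from the guide.

```python
class FrozenSnapshotStore:
    """Toy stand-in for a file-backed store (hypothetical, for illustration)."""

    def __init__(self, entries):
        self._live = list(entries)   # stands in for the file on disk
        self._snapshot = None        # captured once per session

    def load_for_prompt(self) -> str:
        # Capture the snapshot on first use; later calls return the same text,
        # so the system prompt never changes within a session.
        if self._snapshot is None:
            self._snapshot = "\n§\n".join(self._live)
        return self._snapshot

    def add(self, entry: str) -> list:
        # Writes hit the live state immediately and return it for verification.
        self._live.append(entry)
        return list(self._live)
```

A write made after the first `load_for_prompt()` call shows up in the value returned by `add()`, but not in the prompt text until a new store (a new session) is created.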

Character Limits

Store	Default Limit	Approx Tokens	Typical Entries
MEMORY.md	2,200 chars	~800	8-15
USER.md	1,375 chars	~500	5-10

Limits are in characters (not tokens) because character counts are model-independent. Configurable in config.yaml:

memory:
  memory_char_limit: 2200
  user_char_limit: 1375
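
The table implies a ratio of roughly 2.75 characters per token (2,200 / 800 and 1,375 / 500). A capacity check against the character limit might look like the sketch below. Whether the real store counts the § delimiters toward the limit is not stated in this guide, so that is an assumption here, and `fits_limit` is a hypothetical name.

```python
DELIMITER = "\n§\n"


def fits_limit(entries, new_entry, char_limit=2200):
    """Return True if the store stays within char_limit after adding new_entry.

    Assumption: delimiters count toward the limit.
    """
    total = len(DELIMITER.join([*entries, new_entry]))
    return total <= char_limit
```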

Entry Format

Entries are separated by \n§\n. Each entry can be multiline. Example MEMORY.md:

User runs macOS 14 Sonoma, uses Homebrew, has Docker Desktop
§
Project ~/code/api uses Go 1.22, chi router, sqlc. Tests: 'make test'
§
Staging server 10.0.1.50 uses SSH port 2222, key at ~/.ssh/staging_ed25519
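
Parsing this format comes down to splitting on the delimiter. The following is a sketch rather than the actual MemoryStore parser; `parse_entries` and `serialize_entries` are hypothetical names, and stripping whitespace around entries is an assumption.

```python
DELIMITER = "\n§\n"


def parse_entries(text: str) -> list[str]:
    """Split file contents into entries, dropping empty ones."""
    return [entry.strip() for entry in text.split(DELIMITER) if entry.strip()]


def serialize_entries(entries: list[str]) -> str:
    """Join entries back into the on-disk format."""
    return DELIMITER.join(entries)
```

Because the delimiter includes the surrounding newlines, an entry may span several lines, and a § that appears mid-line is not treated as a separator.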

Tool Interface

The memory tool (defined in tools/memory_tool.py) supports:

- add: Append a new entry. Rejects exact duplicates.
- replace: Find an entry by unique substring (old_text) and replace it with content.
- remove: Find an entry by unique substring and delete it.
- read: Return current entries from disk (live state, not the frozen snapshot).

Substring matching: old_text must match exactly one entry. If it matches multiple, the tool returns an error asking for more specificity.
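
The matching rule can be sketched as follows. The function names and error messages are hypothetical; only the behavior (exactly one match required, exact duplicates rejected on add) comes from the description above.

```python
def find_unique(entries, old_text):
    """Return the index of the single entry containing old_text."""
    matches = [i for i, entry in enumerate(entries) if old_text in entry]
    if not matches:
        raise ValueError(f"no entry contains {old_text!r}")
    if len(matches) > 1:
        raise ValueError(f"{old_text!r} matches {len(matches)} entries; be more specific")
    return matches[0]


def add_entry(entries, content):
    """Append content, rejecting exact duplicates."""
    if content in entries:
        raise ValueError("duplicate entry")
    return [*entries, content]


def replace_entry(entries, old_text, content):
    """Replace the uniquely matched entry with content."""
    i = find_unique(entries, old_text)
    return entries[:i] + [content] + entries[i + 1:]


def remove_entry(entries, old_text):
    """Delete the uniquely matched entry."""
    i = find_unique(entries, old_text)
    return entries[:i] + entries[i + 1:]
```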

Security Scanning

Every memory entry is scanned against _MEMORY_THREAT_PATTERNS before acceptance:

- Prompt injection patterns (ignore previous instructions, you are now...)
- Credential exfiltration (curl/wget with env vars, .env file reads)
- SSH backdoor attempts (authorized_keys, .ssh writes)
- Invisible Unicode characters (zero-width spaces, BOM)

Matches are rejected with an error message. Source: _scan_memory_content() in tools/memory_tool.py.
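
A scanner of this kind is typically a list of labeled regular expressions. The patterns below are illustrative guesses at the four categories, not the actual contents of _MEMORY_THREAT_PATTERNS, and `scan` is a hypothetical stand-in for _scan_memory_content().

```python
import re

# Illustrative patterns only; the real list lives in tools/memory_tool.py.
ILLUSTRATIVE_PATTERNS = [
    ("prompt_injection", re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.I)),
    ("prompt_injection", re.compile(r"\byou\s+are\s+now\b", re.I)),
    ("credential_exfiltration", re.compile(r"\b(curl|wget)\b.*\$\{?\w+", re.I)),
    ("credential_exfiltration", re.compile(r"\bcat\s+\S*\.env\b", re.I)),
    ("ssh_backdoor", re.compile(r"authorized_keys", re.I)),
    ("invisible_unicode", re.compile("[\u200b\u200c\u200d\u2060\ufeff]")),
]


def scan(content: str):
    """Return the label of the first matching pattern, or None if clean."""
    for label, pattern in ILLUSTRATIVE_PATTERNS:
        if pattern.search(content):
            return label
    return None
```

Under these sketch patterns, the example entry that mentions a key path under ~/.ssh passes, because only authorized_keys is flagged. The real scanner may be stricter.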

Code Path

agent/prompt_builder.py
  └── assembles system prompt pieces
       └── MemoryStore.load_for_prompt() → frozen snapshot injection

tools/memory_tool.py
  ├── MemoryStore class (file I/O, locking, parsing)
  ├── memory_tool() function (add/replace/remove/read dispatch)
  └── _scan_memory_content() (threat scanning)

hermes_cli/memory_setup.py
  └── Interactive first-run memory setup

Tier 2: Session Search (FTS5)

How It Works

1. Every CLI and gateway session stores its full message history in SQLite (~/.hermes/state.db).
2. The messages_fts FTS5 virtual table enables fast full-text search.
3. The session_search tool finds relevant messages, groups them by session, and loads the top N sessions.
4. Each matching session is summarized by Gemini Flash (auxiliary LLM, not the main model).
5. Summaries are returned to the main agent as context.
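
The last three steps (group, load, summarize) can be sketched with the database and the LLM passed in as plain callables. Everything here is hypothetical except the MAX_SESSION_CHARS name, which appears in the code path for this tier. Ranking sessions by number of matching messages is an assumption.

```python
from collections import Counter

MAX_SESSION_CHARS = 100_000  # per-session cap


def summarize_matches(hits, load_transcript, summarize, top_n=3):
    """Summarize the sessions with the most search hits.

    hits: iterable of (session_id, message_id) pairs from the search step.
    load_transcript: callable returning the full transcript for a session id.
    summarize: callable standing in for the auxiliary LLM call.
    """
    counts = Counter(session_id for session_id, _ in hits)
    summaries = {}
    for session_id, _ in counts.most_common(top_n):
        # Truncate before summarizing so one long session cannot blow the budget.
        transcript = load_transcript(session_id)[:MAX_SESSION_CHARS]
        summaries[session_id] = summarize(transcript)
    return summaries
```

Injecting `load_transcript` and `summarize` keeps the sketch testable without a database or network access.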

Why Gemini Flash for Summarization

Raw session transcripts can be 50K+ chars. Feeding them to the main model wastes context window and tokens. Gemini Flash is fast, cheap, and good enough for "extract the relevant bits" summarization. The web_extract tool uses the same pattern.

Schema

-- Core tables
sessions (id, source, user_id, model, system_prompt, parent_session_id, ...)
messages (id, session_id, role, content, tool_name, timestamp, ...)

-- Full-text search
messages_fts  -- FTS5 virtual table on messages.content

-- Schema tracking
schema_version

The database runs in WAL mode, which allows concurrent readers alongside a single writer. The gateway relies on this to serve multiple platforms at once.
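
For readers unfamiliar with FTS5, the sketch below builds a cut-down version of this layout in an in-memory database using Python's standard sqlite3 module. The column list is reduced, and the external-content table and trigger name are assumptions about how messages_fts might be wired up, not a copy of the real schema. It requires an SQLite build with FTS5 enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# On a file-backed database, WAL would be enabled with:
#   conn.execute("PRAGMA journal_mode=WAL")
conn.executescript("""
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    session_id TEXT NOT NULL,
    role TEXT,
    content TEXT
);
-- External-content FTS5 index over messages.content
CREATE VIRTUAL TABLE messages_fts USING fts5(
    content, content='messages', content_rowid='id'
);
-- Keep the index in sync on insert (update/delete triggers omitted here)
CREATE TRIGGER messages_ai AFTER INSERT ON messages BEGIN
    INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
END;
""")
conn.executemany(
    "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
    [
        ("s1", "user", "deploy the staging server"),
        ("s1", "assistant", "staging deploy finished"),
        ("s2", "user", "write unit tests"),
    ],
)
# Full-text query joined back to the source rows
rows = conn.execute(
    """
    SELECT m.session_id, m.content
    FROM messages_fts
    JOIN messages AS m ON m.id = messages_fts.rowid
    WHERE messages_fts MATCH ?
    ORDER BY m.id
    """,
    ("staging",),
).fetchall()
```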

Session Lineage

When context compression triggers a session split, parent_session_id chains the old and new sessions. This lets session search follow the thread across compression boundaries.
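
Following the chain is a simple walk up the parent pointers. In this sketch a dict stands in for the sessions table, and `lineage` is a hypothetical helper name.

```python
def lineage(parents, session_id):
    """Return the chain of session ids from the oldest ancestor to session_id.

    parents maps session_id -> parent_session_id (None for a root session).
    """
    chain = [session_id]
    seen = {session_id}
    while (parent := parents.get(chain[-1])) is not None:
        if parent in seen:
            break  # guard against a corrupted, cyclic chain
        chain.append(parent)
        seen.add(parent)
    return chain[::-1]
```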

Code Path

tools/session_search_tool.py
  ├── FTS5 query against messages_fts
  ├── Groups results by session_id
  ├── Loads top N sessions (MAX_SESSION_CHARS = 100K per session)
  ├── Sends to Gemini Flash via auxiliary_client.async_call_llm()
  └── Returns per-session summaries

hermes_state.py (SessionDB class)
  ├── SQLite WAL mode database
  ├── FTS5 triggers for message insert/update/delete
  └── Session CRUD operations

Memory vs Session Search

	Memory	Session Search
Capacity	~1,300 tokens total	Unlimited (all stored sessions)
Latency	Instant (in system prompt)	Requires FTS query + LLM call
When to use	Critical facts always in context	"What did we discuss about X?"
Management	Agent-curated	Automatic
Token cost	Fixed per session	On-demand per search

Tier 3: Skills (Procedural Memory)

What Skills Are

Skills capture how to do a specific type of task based on proven experience. Where memory is broad and declarative, skills are narrow and actionable.

A skill is a directory with a SKILL.md (markdown instructions) and optional supporting files:

~/.hermes/skills/
├── my-skill/
│   ├── SKILL.md          — Instructions, steps, pitfalls
│   ├── references/       — API docs, specs
│   ├── templates/        — Code templates, config files
│   ├── scripts/          — Helper scripts
│   └── assets/           — Images, data files

How Skills Load

At the start of each turn, the agent's system prompt includes available skills. When a skill matches the current task, the agent loads it with skill_view(name) and follows its instructions. Skills are not injected wholesale — they're loaded on demand to preserve context window.
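
As an illustration of what a skills catalog could contain, the sketch below scans a skills directory and collects the name and description of each SKILL.md. The frontmatter layout (a block of `key: value` lines between `---` markers) is an assumption, and both function names are hypothetical.

```python
from pathlib import Path


def read_frontmatter(skill_md: Path) -> dict:
    """Parse simple 'key: value' frontmatter delimited by '---' lines."""
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    meta = {}
    if not lines or lines[0].strip() != "---":
        return meta
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def build_catalog(skills_dir: Path) -> list[dict]:
    """Collect name and description for every skill with valid frontmatter."""
    catalog = []
    for skill_md in sorted(skills_dir.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_md)
        if "name" in meta and "description" in meta:
            catalog.append({"name": meta["name"], "description": meta["description"]})
    return catalog
```

Only the short catalog would go into the prompt. The body of SKILL.md stays on disk until the skill is actually loaded.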

Skill Lifecycle

- Creation: After a complex task (5+ tool calls), the agent offers to save the approach as a skill using skill_manage(action='create').
- Usage: On future matching tasks, the agent loads the skill with skill_view(name).
- Maintenance: If a skill is outdated or incomplete when used, the agent patches it immediately with skill_manage(action='patch').
- Deletion: Obsolete skills are removed with skill_manage(action='delete').

Skills vs Memory

	Memory	Skills
Format	Free-text entries	Structured markdown (steps, pitfalls, examples)
Scope	Facts and preferences	Procedures and workflows
Loading	Always in system prompt	On-demand when matched
Size	~1,300 tokens total	Variable (loaded individually)

Code Path

tools/skill_manager_tool.py  — Create, edit, patch, delete skills
agent/skill_commands.py       — Slash commands for skill management
skills_hub.py                 — Browse, search, install skills from hub

Tier 4: External Memory Providers

Plugin Architecture

plugins/memory/
├── __init__.py        — Provider registry and base interface
├── honcho/            — Dialectic Q&A, cross-session user modeling
├── openviking/        — Knowledge graph memory
├── mem0/              — Semantic memory with auto-extraction
├── hindsight/         — Retrospective memory analysis
├── holographic/       — Distributed holographic memory
├── retaindb/          — Vector-based retention
├── byterover/         — Byte-level memory compression
└── supermemory/       — Cloud-hosted semantic memory

Only one external provider can be active at a time. Built-in memory (Tier 1) always runs alongside it.

Integration Points

When a provider is active, Hermes:

- Injects provider context into the system prompt
- Prefetches relevant memories before each turn (background, non-blocking)
- Syncs conversation turns to the provider after each response
- Extracts memories on session end (for providers that support it)
- Mirrors built-in memory writes to the provider
- Adds provider-specific tools for search and management
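
These integration points suggest a provider interface with one hook per bullet. The sketch below is a guess at the shape of such an interface. The real base interface lives in plugins/memory/__init__.py and may look quite different, and every class and method name here is hypothetical.

```python
from abc import ABC, abstractmethod


class MemoryProvider(ABC):
    """Hypothetical provider interface; method names are illustrative only."""

    @abstractmethod
    def system_prompt_block(self) -> str:
        """Text injected into the system prompt."""

    @abstractmethod
    def prefetch(self, query: str) -> list[str]:
        """Memories relevant to the upcoming turn."""

    @abstractmethod
    def sync_turn(self, user_msg: str, assistant_msg: str) -> None:
        """Record a completed conversation turn."""

    def on_memory_write(self, entry: str) -> None:
        """Mirror a built-in memory write (optional hook)."""

    def on_session_end(self) -> None:
        """Extract final memories (optional hook)."""


class KeywordProvider(MemoryProvider):
    """Toy in-process provider that matches on shared words."""

    def __init__(self) -> None:
        self.items: list[str] = []

    def system_prompt_block(self) -> str:
        return f"External memory holds {len(self.items)} items."

    def prefetch(self, query: str) -> list[str]:
        words = set(query.lower().split())
        return [item for item in self.items if words & set(item.lower().split())]

    def sync_turn(self, user_msg: str, assistant_msg: str) -> None:
        self.items.append(f"{user_msg} -> {assistant_msg}")

    def on_memory_write(self, entry: str) -> None:
        self.items.append(entry)
```

Optional hooks default to no-ops, which mirrors the note that session-end extraction applies only to providers that support it.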

Configuration

memory:
  provider: openviking  # or honcho, mem0, hindsight, etc.

Setup: hermes memory setup (interactive picker).

Data Lifecycle

Session Start
  │
  ├── Load MEMORY.md + USER.md from disk → frozen snapshot in system prompt
  ├── Load skills catalog (names + descriptions)
  ├── Initialize session search (SQLite connection)
  └── Initialize external provider (if configured)
  │
  ▼
Each Turn
  │
  ├── Agent sees frozen memory in system prompt
  ├── Agent can call memory tool → writes to disk, returns live state
  ├── Agent can call session_search → FTS5 + Gemini Flash summarization
  ├── Agent can load skills → reads SKILL.md from disk
  └── External provider prefetches context (if active)
  │
  ▼
Session End
  │
  ├── All memory writes already on disk (immediate persistence)
  ├── Session transcript saved to SQLite (messages + FTS5 index)
  ├── External provider extracts final memories (if supported)
  └── Skill updates persisted (if any were patched)

Privacy and Data Locality

Component	Location	Network
MEMORY.md / USER.md	`~/.hermes/memories/`	Local only
Session DB	`~/.hermes/state.db`	Local only
Skills	`~/.hermes/skills/`	Local only
External provider	Provider-dependent	Provider API calls

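Tiers 1-3 are stored only on the local machine. Their contents can still leave the machine at inference time: the memory snapshot is part of the system prompt sent to the model, and session search sends matching transcripts to the auxiliary LLM (Gemini Flash) for summarization. External providers (Tier 4) send data to the configured provider by design. The agent logs all provider API calls in the session transcript for auditability.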

Configuration Reference

# ~/.hermes/config.yaml
memory:
  memory_enabled: true          # Enable MEMORY.md
  user_profile_enabled: true    # Enable USER.md
  memory_char_limit: 2200       # MEMORY.md char limit (~800 tokens)
  user_char_limit: 1375         # USER.md char limit (~500 tokens)
  nudge_interval: 10            # Turns between memory nudge reminders
  provider: null                # External provider name (null = disabled)

Environment variables (in ~/.hermes/.env):

Provider-specific API keys (e.g., HONCHO_API_KEY, MEM0_API_KEY)
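
One way to apply these settings is to merge the user's memory block over the defaults listed above. This is a sketch under assumptions: `resolve_memory_config` is a hypothetical name, the input is assumed to be config.yaml already parsed into a dict, and rejecting unknown keys is a choice made for this example, not documented Hermes behavior.

```python
DEFAULTS = {
    "memory_enabled": True,
    "user_profile_enabled": True,
    "memory_char_limit": 2200,
    "user_char_limit": 1375,
    "nudge_interval": 10,
    "provider": None,
}


def resolve_memory_config(user_config: dict) -> dict:
    """Merge the 'memory' block of a parsed config over the defaults."""
    overrides = user_config.get("memory") or {}
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        # Catch typos such as 'memory_limit' early instead of ignoring them.
        raise KeyError(f"unknown memory settings: {sorted(unknown)}")
    return {**DEFAULTS, **overrides}
```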

Troubleshooting

Memory not appearing in system prompt

- Check that ~/.hermes/memories/MEMORY.md exists and has content
- Verify memory.memory_enabled: true in config
- Check for file lock issues (concurrent access to the memory files by another Hermes process)

Memory writes not taking effect

- Writes reach disk immediately, but the system prompt keeps the session-start snapshot until the next session
- The tool response shows the live state, so verify there that the write succeeded
- Start a new session to see the updated snapshot

Session search returns nothing

- Verify state.db has sessions: sqlite3 ~/.hermes/state.db "SELECT count(*) FROM sessions"
- Check the FTS5 index: sqlite3 ~/.hermes/state.db "SELECT count(*) FROM messages_fts"
- Ensure the auxiliary LLM (Gemini Flash) is configured and reachable

Skills not loading

- Check that the ~/.hermes/skills/ directory exists
- Verify SKILL.md has valid frontmatter (name, description)
- Skills load by name match, so check that the skill name matches what the agent expects

External provider errors

- Check the API key in ~/.hermes/.env
- Verify the provider is installed: pip install <provider-package>
- Run hermes memory status for diagnostic info
