burn: add Memory Architecture Guide (closes #263, #258)
All checks were successful
Forge CI / smoke-and-build (pull_request) Successful in 1m3s

Developer-facing guide covering all four memory tiers:
- Built-in memory (MEMORY.md/USER.md) with frozen snapshot pattern
- Session search (FTS5 + Gemini Flash summarization)
- Skills as procedural memory
- External memory provider plugin architecture

Includes data lifecycle, security guarantees, code paths,
configuration reference, and troubleshooting.
This commit is contained in:
Alexander Whitestone
2026-04-09 20:51:45 -04:00
parent 7d2421a15f
commit 815160bd6f

View File

@@ -0,0 +1,335 @@
# Memory Architecture Guide
Developer-facing guide to the Hermes Agent memory system. Covers all four memory tiers, data lifecycle, security guarantees, and extension points.
## Overview
Hermes has four distinct memory systems, each serving a different purpose:
| Tier | System | Scope | Cost | Persistence |
|------|--------|-------|------|-------------|
| 1 | **Built-in Memory** (MEMORY.md / USER.md) | Current session, curated facts | ~1,300 tokens fixed per session | File-backed, cross-session |
| 2 | **Session Search** (FTS5) | All past conversations | On-demand (search + summarize) | SQLite (state.db) |
| 3 | **Skills** (procedural memory) | How to do specific tasks | Loaded on match only | File-backed (~/.hermes/skills/) |
| 4 | **External Providers** (plugins) | Deep persistent knowledge | Provider-dependent | Provider-specific |
All four tiers operate independently. Built-in memory is always active. The others are opt-in or on-demand.
## Tier 1: Built-in Memory (MEMORY.md / USER.md)
### File Layout
```
~/.hermes/memories/
├── MEMORY.md — Agent's notes (environment facts, conventions, lessons learned)
└── USER.md — User profile (preferences, communication style, identity)
```
Profile-aware: when running under a profile (`hermes -p coder`), the memories directory resolves to `~/.hermes/profiles/<name>/memories/`.
### Frozen Snapshot Pattern
This is the most important architectural decision in the memory system.
1. **Session start:** `MemoryStore.load_for_prompt()` reads both files from disk, parses entries delimited by `§` (section sign), and injects them into the system prompt as a frozen block.
2. **During session:** The `memory` tool writes to disk immediately (durable), but does **not** update the system prompt. This preserves the LLM's prefix cache for the entire session.
3. **Next session:** The snapshot refreshes from disk.
**Why frozen?** System prompt changes invalidate the KV cache on every API call. With a ~30K token system prompt, that's expensive. Freezing memory at session start means the cache stays warm for the entire conversation. The tradeoff: memory writes made mid-session don't take effect until next session. Tool responses show the live state so the agent can verify writes succeeded.
### Character Limits
| Store | Default Limit | Approx Tokens | Typical Entries |
|-------|--------------|---------------|-----------------|
| MEMORY.md | 2,200 chars | ~800 | 8-15 |
| USER.md | 1,375 chars | ~500 | 5-10 |
Limits are in characters (not tokens) because character counts are model-independent. Configurable in `config.yaml`:
```yaml
memory:
memory_char_limit: 2200
user_char_limit: 1375
```
### Entry Format
Entries are separated by `\n§\n`. Each entry can be multiline. Example MEMORY.md:
```
User runs macOS 14 Sonoma, uses Homebrew, has Docker Desktop
§
Project ~/code/api uses Go 1.22, chi router, sqlc. Tests: 'make test'
§
Staging server 10.0.1.50 uses SSH port 2222, key at ~/.ssh/staging_ed25519
```
### Tool Interface
The `memory` tool (defined in `tools/memory_tool.py`) supports:
- **`add`** — Append new entry. Rejects exact duplicates.
- **`replace`** — Find entry by unique substring (`old_text`), replace with `content`.
- **`remove`** — Find entry by unique substring, delete it.
- **`read`** — Return current entries from disk (live state, not frozen snapshot).
Substring matching: `old_text` must match exactly one entry. If it matches multiple, the tool returns an error asking for more specificity.
### Security Scanning
Every memory entry is scanned against `_MEMORY_THREAT_PATTERNS` before acceptance:
- Prompt injection patterns (`ignore previous instructions`, `you are now...`)
- Credential exfiltration (`curl`/`wget` with env vars, `.env` file reads)
- SSH backdoor attempts (`authorized_keys`, `.ssh` writes)
- Invisible Unicode characters (zero-width spaces, BOM)
Matches are rejected with an error message. Source: `_scan_memory_content()` in `tools/memory_tool.py`.
### Code Path
```
agent/prompt_builder.py
└── assembles system prompt pieces
└── MemoryStore.load_for_prompt() → frozen snapshot injection
tools/memory_tool.py
├── MemoryStore class (file I/O, locking, parsing)
├── memory_tool() function (add/replace/remove/read dispatch)
└── _scan_memory_content() (threat scanning)
hermes_cli/memory_setup.py
└── Interactive first-run memory setup
```
## Tier 2: Session Search (FTS5)
### How It Works
1. Every CLI and gateway session stores full message history in SQLite (`~/.hermes/state.db`)
2. The `messages_fts` FTS5 virtual table enables fast full-text search
3. The `session_search` tool finds relevant messages, groups by session, loads top N
4. Each matching session is summarized by Gemini Flash (auxiliary LLM, not main model)
5. Summaries are returned to the main agent as context
### Why Gemini Flash for Summarization
Raw session transcripts can be 50K+ chars. Feeding them to the main model wastes context window and tokens. Gemini Flash is fast, cheap, and good enough for "extract the relevant bits" summarization. Same pattern used by `web_extract`.
### Schema
```sql
-- Core tables
sessions (id, source, user_id, model, system_prompt, parent_session_id, ...)
messages (id, session_id, role, content, tool_name, timestamp, ...)
-- Full-text search
messages_fts -- FTS5 virtual table on messages.content
-- Schema tracking
schema_version
```
WAL mode for concurrent readers + one writer (gateway multi-platform support).
### Session Lineage
When context compression triggers a session split, `parent_session_id` chains the old and new sessions. This lets session search follow the thread across compression boundaries.
### Code Path
```
tools/session_search_tool.py
├── FTS5 query against messages_fts
├── Groups results by session_id
├── Loads top N sessions (MAX_SESSION_CHARS = 100K per session)
├── Sends to Gemini Flash via auxiliary_client.async_call_llm()
└── Returns per-session summaries
hermes_state.py (SessionDB class)
├── SQLite WAL mode database
├── FTS5 triggers for message insert/update/delete
└── Session CRUD operations
```
### Memory vs Session Search
| | Memory | Session Search |
|---|--------|---------------|
| **Capacity** | ~1,300 tokens total | Unlimited (all stored sessions) |
| **Latency** | Instant (in system prompt) | Requires FTS query + LLM call |
| **When to use** | Critical facts always in context | "What did we discuss about X?" |
| **Management** | Agent-curated | Automatic |
| **Token cost** | Fixed per session | On-demand per search |
## Tier 3: Skills (Procedural Memory)
### What Skills Are
Skills capture **how to do a specific type of task** based on proven experience. Where memory is broad and declarative, skills are narrow and actionable.
A skill is a directory with a `SKILL.md` (markdown instructions) and optional supporting files:
```
~/.hermes/skills/
├── my-skill/
│ ├── SKILL.md — Instructions, steps, pitfalls
│ ├── references/ — API docs, specs
│ ├── templates/ — Code templates, config files
│ ├── scripts/ — Helper scripts
│ └── assets/ — Images, data files
```
### How Skills Load
At the start of each turn, the agent's system prompt includes available skills. When a skill matches the current task, the agent loads it with `skill_view(name)` and follows its instructions. Skills are **not** injected wholesale — they're loaded on demand to preserve context window.
### Skill Lifecycle
1. **Creation:** After a complex task (5+ tool calls), the agent offers to save the approach as a skill using `skill_manage(action='create')`.
2. **Usage:** On future matching tasks, the agent loads the skill with `skill_view(name)`.
3. **Maintenance:** If a skill is outdated or incomplete when used, the agent patches it immediately with `skill_manage(action='patch')`.
4. **Deletion:** Obsolete skills are removed with `skill_manage(action='delete')`.
### Skills vs Memory
| | Memory | Skills |
|---|--------|--------|
| **Format** | Free-text entries | Structured markdown (steps, pitfalls, examples) |
| **Scope** | Facts and preferences | Procedures and workflows |
| **Loading** | Always in system prompt | On-demand when matched |
| **Size** | ~1,300 tokens total | Variable (loaded individually) |
### Code Path
```
tools/skill_manager_tool.py — Create, edit, patch, delete skills
agent/skill_commands.py — Slash commands for skill management
skills_hub.py — Browse, search, install skills from hub
```
## Tier 4: External Memory Providers
### Plugin Architecture
```
plugins/memory/
├── __init__.py — Provider registry and base interface
├── honcho/ — Dialectic Q&A, cross-session user modeling
├── openviking/ — Knowledge graph memory
├── mem0/ — Semantic memory with auto-extraction
├── hindsight/ — Retrospective memory analysis
├── holographic/ — Distributed holographic memory
├── retaindb/ — Vector-based retention
├── byterover/ — Byte-level memory compression
└── supermemory/ — Cloud-hosted semantic memory
```
Only one external provider can be active at a time. Built-in memory (Tier 1) always runs alongside it.
### Integration Points
When a provider is active, Hermes:
1. Injects provider context into the system prompt
2. Prefetches relevant memories before each turn (background, non-blocking)
3. Syncs conversation turns to the provider after each response
4. Extracts memories on session end (for providers that support it)
5. Mirrors built-in memory writes to the provider
6. Adds provider-specific tools for search and management
### Configuration
```yaml
memory:
provider: openviking # or honcho, mem0, hindsight, etc.
```
Setup: `hermes memory setup` (interactive picker).
## Data Lifecycle
```
Session Start
├── Load MEMORY.md + USER.md from disk → frozen snapshot in system prompt
├── Load skills catalog (names + descriptions)
├── Initialize session search (SQLite connection)
└── Initialize external provider (if configured)
Each Turn
├── Agent sees frozen memory in system prompt
├── Agent can call memory tool → writes to disk, returns live state
├── Agent can call session_search → FTS5 + Gemini Flash summarization
├── Agent can load skills → reads SKILL.md from disk
└── External provider prefetches context (if active)
Session End
├── All memory writes already on disk (immediate persistence)
├── Session transcript saved to SQLite (messages + FTS5 index)
├── External provider extracts final memories (if supported)
└── Skill updates persisted (if any were patched)
```
## Privacy and Data Locality
| Component | Location | Network |
|-----------|----------|---------|
| MEMORY.md / USER.md | `~/.hermes/memories/` | Local only |
| Session DB | `~/.hermes/state.db` | Local only |
| Skills | `~/.hermes/skills/` | Local only |
| External provider | Provider-dependent | Provider API calls |
Built-in memory (Tiers 1-3) never leaves the machine. External providers (Tier 4) send data to the configured provider by design. The agent logs all provider API calls in the session transcript for auditability.
## Configuration Reference
```yaml
# ~/.hermes/config.yaml
memory:
memory_enabled: true # Enable MEMORY.md
user_profile_enabled: true # Enable USER.md
memory_char_limit: 2200 # MEMORY.md char limit (~800 tokens)
user_char_limit: 1375 # USER.md char limit (~500 tokens)
nudge_interval: 10 # Turns between memory nudge reminders
provider: null # External provider name (null = disabled)
```
Environment variables (in `~/.hermes/.env`):
- Provider-specific API keys (e.g., `HONCHO_API_KEY`, `MEM0_API_KEY`)
## Troubleshooting
### Memory not appearing in system prompt
- Check `~/.hermes/memories/MEMORY.md` exists and has content
- Verify `memory.memory_enabled: true` in config
- Check for file lock issues (WAL mode, concurrent access)
### Memory writes not taking effect
- Writes are durable to disk immediately but frozen in system prompt until next session
- Tool response shows live state — verify the write succeeded there
- Start a new session to see the updated snapshot
### Session search returns nothing
- Verify `state.db` has sessions: `sqlite3 ~/.hermes/state.db "SELECT count(*) FROM sessions"`
- Check FTS5 index: `sqlite3 ~/.hermes/state.db "SELECT count(*) FROM messages_fts"`
- Ensure auxiliary LLM (Gemini Flash) is configured and reachable
### Skills not loading
- Check `~/.hermes/skills/` directory exists
- Verify SKILL.md has valid frontmatter (name, description)
- Skills load by name match — check the skill name matches what the agent expects
### External provider errors
- Check API key in `~/.hermes/.env`
- Verify provider is installed: `pip install <provider-package>`
- Run `hermes memory status` for diagnostic info