burn: add Memory Architecture Guide (closes #263, #258)
All checks were successful
Forge CI / smoke-and-build (pull_request) Successful in 1m3s
Developer-facing guide covering all four memory tiers:

- Built-in memory (MEMORY.md/USER.md) with frozen snapshot pattern
- Session search (FTS5 + Gemini Flash summarization)
- Skills as procedural memory
- External memory provider plugin architecture

Includes data lifecycle, security guarantees, code paths, configuration reference, and troubleshooting.
335
docs/memory-architecture-guide.md
Normal file
@@ -0,0 +1,335 @@

# Memory Architecture Guide

Developer-facing guide to the Hermes Agent memory system. Covers all four memory tiers, data lifecycle, security guarantees, and extension points.

## Overview

Hermes has four distinct memory systems, each serving a different purpose:

| Tier | System | Scope | Cost | Persistence |
|------|--------|-------|------|-------------|
| 1 | **Built-in Memory** (MEMORY.md / USER.md) | Current session, curated facts | ~1,300 tokens fixed per session | File-backed, cross-session |
| 2 | **Session Search** (FTS5) | All past conversations | On-demand (search + summarize) | SQLite (state.db) |
| 3 | **Skills** (procedural memory) | How to do specific tasks | Loaded on match only | File-backed (~/.hermes/skills/) |
| 4 | **External Providers** (plugins) | Deep persistent knowledge | Provider-dependent | Provider-specific |

All four tiers operate independently. Built-in memory is on by default (it can be disabled with `memory_enabled`); the others are opt-in or on-demand.

## Tier 1: Built-in Memory (MEMORY.md / USER.md)

### File Layout

```
~/.hermes/memories/
├── MEMORY.md   - Agent's notes (environment facts, conventions, lessons learned)
└── USER.md     - User profile (preferences, communication style, identity)
```

Profile-aware: when running under a profile (`hermes -p coder`), the memories directory resolves to `~/.hermes/profiles/<name>/memories/`.
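
The path rule above can be sketched in a few lines. This is a minimal illustration, not the Hermes implementation: `resolve_memories_dir` and its `home` parameter are hypothetical names introduced here so the rule can be exercised without touching a real home directory.

```python
from pathlib import Path
from typing import Optional


def resolve_memories_dir(profile: Optional[str] = None,
                         home: Optional[Path] = None) -> Path:
    """Return the memories directory for the default setup or a named profile.

    Hypothetical helper: mirrors the layout described in this guide only.
    """
    root = (home or Path.home()) / ".hermes"
    if profile:
        # `hermes -p coder` -> ~/.hermes/profiles/coder/memories/
        return root / "profiles" / profile / "memories"
    return root / "memories"
```

Passing `home` explicitly keeps the function easy to test; in normal use it would fall back to the current user's home directory.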

### Frozen Snapshot Pattern

This is the most important architectural decision in the memory system.

1. **Session start:** `MemoryStore.load_for_prompt()` reads both files from disk, parses entries delimited by `§` (section sign), and injects them into the system prompt as a frozen block.
2. **During session:** The `memory` tool writes to disk immediately (durable), but does **not** update the system prompt. This preserves the LLM's prefix cache for the entire session.
3. **Next session:** The snapshot refreshes from disk.

**Why frozen?** Any change to the system prompt invalidates the prefix (KV) cache, so the next API call has to reprocess the whole prompt. With a ~30K token system prompt, that's expensive. Freezing memory at session start keeps the cache warm for the entire conversation. The tradeoff: memory writes made mid-session don't appear in the system prompt until the next session. Tool responses show the live on-disk state, so the agent can still verify that a write succeeded.
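
The three steps above can be sketched as a small class. This is an illustration under assumptions, not the real `MemoryStore`: the class name `FrozenMemory`, its `add` method, and the single-file scope are hypothetical; only `load_for_prompt` and the `§` delimiter come from this guide.

```python
from pathlib import Path

ENTRY_DELIMITER = "\n§\n"  # entry separator described in this guide


class FrozenMemory:
    """Hypothetical stand-in for MemoryStore: snapshot once, write through to disk."""

    def __init__(self, path: Path):
        self.path = path
        self._snapshot = None  # filled on first load, then never refreshed

    def _read_live(self) -> str:
        return self.path.read_text(encoding="utf-8") if self.path.exists() else ""

    def load_for_prompt(self) -> str:
        # Session start: read from disk once and freeze the result.
        if self._snapshot is None:
            self._snapshot = self._read_live()
        return self._snapshot

    def add(self, entry: str) -> str:
        # During session: persist immediately, leave the snapshot untouched.
        live = self._read_live()
        live = f"{live}{ENTRY_DELIMITER}{entry}" if live else entry
        self.path.write_text(live, encoding="utf-8")
        return live  # live state, suitable for the tool response
```

A new `FrozenMemory` instance (the next session) picks up the written entry, while the instance that performed the write keeps returning its original snapshot.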

### Character Limits

| Store | Default Limit | Approx Tokens | Typical Entries |
|-------|--------------|---------------|-----------------|
| MEMORY.md | 2,200 chars | ~800 | 8-15 |
| USER.md | 1,375 chars | ~500 | 5-10 |

Limits are in characters (not tokens) because character counts are model-independent. Configurable in `config.yaml`:

```yaml
memory:
  memory_char_limit: 2200
  user_char_limit: 1375
```

### Entry Format

Entries are separated by `\n§\n`. Each entry can be multiline. Example MEMORY.md:

```
User runs macOS 14 Sonoma, uses Homebrew, has Docker Desktop
§
Project ~/code/api uses Go 1.22, chi router, sqlc. Tests: 'make test'
§
Staging server 10.0.1.50 uses SSH port 2222, key at ~/.ssh/staging_ed25519
```
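
A minimal sketch of reading and writing this format, combined with the character limit from the previous section. The function names are hypothetical, and whether the real limit counts delimiter characters is an assumption made here for illustration.

```python
ENTRY_DELIMITER = "\n§\n"


def parse_entries(text: str) -> list[str]:
    """Split file contents into entries, dropping empty fragments."""
    return [part.strip() for part in text.split(ENTRY_DELIMITER) if part.strip()]


def serialize_entries(entries: list[str]) -> str:
    """Join entries back into the on-disk representation."""
    return ENTRY_DELIMITER.join(entries)


def fits_limit(entries: list[str], char_limit: int = 2200) -> bool:
    # Assumption: the limit applies to the serialized text, delimiters included.
    return len(serialize_entries(entries)) <= char_limit
```

Because entries are split on the full `\n§\n` sequence, a multiline entry survives a parse/serialize round trip unchanged.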

### Tool Interface

The `memory` tool (defined in `tools/memory_tool.py`) supports:

- **`add`**: Append a new entry. Rejects exact duplicates.
- **`replace`**: Find an entry by unique substring (`old_text`) and replace it with `content`.
- **`remove`**: Find an entry by unique substring and delete it.
- **`read`**: Return current entries from disk (live state, not the frozen snapshot).

Substring matching: `old_text` must match exactly one entry. If it matches multiple, the tool returns an error asking for more specificity.
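
The matching rule can be illustrated with plain list operations. This is a sketch only: the helper names and the choice of exceptions are assumptions, and the real tool returns error messages rather than raising.

```python
def find_unique(entries: list[str], old_text: str) -> int:
    """Return the index of the single entry containing old_text."""
    matches = [i for i, entry in enumerate(entries) if old_text in entry]
    if not matches:
        raise LookupError(f"no entry contains {old_text!r}")
    if len(matches) > 1:
        raise ValueError(f"{old_text!r} matches {len(matches)} entries; be more specific")
    return matches[0]


def add_entry(entries: list[str], content: str) -> list[str]:
    if content in entries:
        raise ValueError("exact duplicate entry")
    return entries + [content]


def replace_entry(entries: list[str], old_text: str, content: str) -> list[str]:
    index = find_unique(entries, old_text)
    return entries[:index] + [content] + entries[index + 1:]


def remove_entry(entries: list[str], old_text: str) -> list[str]:
    index = find_unique(entries, old_text)
    return entries[:index] + entries[index + 1:]
```

Each helper returns a new list instead of mutating its input, which makes a failed match leave the stored entries untouched.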

### Security Scanning

Every memory entry is scanned against `_MEMORY_THREAT_PATTERNS` before acceptance:

- Prompt injection patterns (`ignore previous instructions`, `you are now...`)
- Credential exfiltration (`curl`/`wget` with env vars, `.env` file reads)
- SSH backdoor attempts (`authorized_keys`, `.ssh` writes)
- Invisible Unicode characters (zero-width spaces, BOM)

Matches are rejected with an error message. Source: `_scan_memory_content()` in `tools/memory_tool.py`.
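
To make the categories above concrete, here is a rough sketch of a pattern-based scan. The regexes are illustrative guesses, not the contents of `_MEMORY_THREAT_PATTERNS`, and `scan_entry` is a hypothetical name rather than the signature of `_scan_memory_content()`.

```python
import re

# Illustrative patterns only; the real list lives in tools/memory_tool.py.
THREAT_PATTERNS = {
    "prompt_injection": re.compile(
        r"ignore\s+(all\s+)?previous\s+instructions|you\s+are\s+now\b", re.I),
    "credential_exfiltration": re.compile(
        r"\b(curl|wget)\b[^\n]*\$\{?[A-Za-z_]+|\bcat\b[^\n]*\.env\b", re.I),
    "ssh_backdoor": re.compile(r"authorized_keys", re.I),
}

# Zero-width space/joiners, word joiner, and the byte-order mark.
INVISIBLE_CHARS = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def scan_entry(text: str) -> list[str]:
    """Return the names of every threat category the text triggers."""
    hits = [name for name, pattern in THREAT_PATTERNS.items() if pattern.search(text)]
    if any(ch in INVISIBLE_CHARS for ch in text):
        hits.append("invisible_unicode")
    return hits
```

An empty result means the entry would be accepted. Note that a legitimate entry mentioning an SSH key path, like the staging example earlier, should not trip a well-tuned scanner, which is why the sketch matches `authorized_keys` rather than any `.ssh` mention.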

### Code Path

```
agent/prompt_builder.py
  └── assembles system prompt pieces
        └── MemoryStore.load_for_prompt() → frozen snapshot injection

tools/memory_tool.py
  ├── MemoryStore class (file I/O, locking, parsing)
  ├── memory_tool() function (add/replace/remove/read dispatch)
  └── _scan_memory_content() (threat scanning)

hermes_cli/memory_setup.py
  └── Interactive first-run memory setup
```

## Tier 2: Session Search (FTS5)

### How It Works

1. Every CLI and gateway session stores full message history in SQLite (`~/.hermes/state.db`)
2. The `messages_fts` FTS5 virtual table enables fast full-text search
3. The `session_search` tool finds relevant messages, groups by session, loads top N
4. Each matching session is summarized by Gemini Flash (auxiliary LLM, not main model)
5. Summaries are returned to the main agent as context

### Why Gemini Flash for Summarization

Raw session transcripts can be 50K+ chars. Feeding them to the main model wastes context window and tokens. Gemini Flash is fast, cheap, and good enough for "extract the relevant bits" summarization. Same pattern used by `web_extract`.

### Schema

```sql
-- Core tables
sessions (id, source, user_id, model, system_prompt, parent_session_id, ...)
messages (id, session_id, role, content, tool_name, timestamp, ...)

-- Full-text search
messages_fts -- FTS5 virtual table on messages.content

-- Schema tracking
schema_version
```

The database runs in WAL mode, which allows concurrent readers alongside a single writer. The gateway relies on this for multi-platform support.
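
The search half of this tier (FTS5 query, then grouping by session) can be sketched with the standard `sqlite3` module. This is a simplified illustration: the table is cut down to three columns, only the insert trigger is shown, and `open_db` / `search_sessions` are hypothetical names. It assumes the local SQLite build includes FTS5.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create a reduced messages table plus an FTS5 index kept in sync by a trigger."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # has no effect on in-memory databases
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS messages (
            id INTEGER PRIMARY KEY,
            session_id TEXT NOT NULL,
            role TEXT NOT NULL,
            content TEXT NOT NULL
        );
        CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts
            USING fts5(content, content='messages', content_rowid='id');
        CREATE TRIGGER IF NOT EXISTS messages_ai AFTER INSERT ON messages BEGIN
            INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
        END;
    """)
    return conn


def search_sessions(conn: sqlite3.Connection, query: str, top_n: int = 3):
    """Return (session_id, hit_count) pairs for the sessions with the most matches."""
    return conn.execute(
        """
        SELECT m.session_id, COUNT(*) AS hits
        FROM messages_fts
        JOIN messages AS m ON m.id = messages_fts.rowid
        WHERE messages_fts MATCH ?
        GROUP BY m.session_id
        ORDER BY hits DESC, m.session_id
        LIMIT ?
        """,
        (query, top_n),
    ).fetchall()
```

Ranking by raw hit count is a placeholder; the real tool may order sessions differently before handing the top N to the summarizer.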

### Session Lineage

When context compression triggers a session split, `parent_session_id` chains the old and new sessions. This lets session search follow the thread across compression boundaries.
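
Following that chain amounts to walking `parent_session_id` back to the root. A minimal sketch, assuming a `sessions` table with `id` and `parent_session_id` columns as listed in the schema; the `lineage` function is a hypothetical helper, not part of `SessionDB`.

```python
import sqlite3


def lineage(conn: sqlite3.Connection, session_id: str) -> list[str]:
    """Return session ids ordered from the oldest ancestor to session_id."""
    chain: list[str] = []
    seen: set[str] = set()
    current = session_id
    # The `seen` set guards against a corrupted chain that loops back on itself.
    while current is not None and current not in seen:
        seen.add(current)
        chain.append(current)
        row = conn.execute(
            "SELECT parent_session_id FROM sessions WHERE id = ?", (current,)
        ).fetchone()
        current = row[0] if row else None
    return list(reversed(chain))
```

Concatenating the transcripts in the returned order would reconstruct one logical conversation across compression splits.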

### Code Path

```
tools/session_search_tool.py
  ├── FTS5 query against messages_fts
  ├── Groups results by session_id
  ├── Loads top N sessions (MAX_SESSION_CHARS = 100K per session)
  ├── Sends to Gemini Flash via auxiliary_client.async_call_llm()
  └── Returns per-session summaries

hermes_state.py (SessionDB class)
  ├── SQLite WAL mode database
  ├── FTS5 triggers for message insert/update/delete
  └── Session CRUD operations
```

### Memory vs Session Search

| | Memory | Session Search |
|---|--------|---------------|
| **Capacity** | ~1,300 tokens total | Unlimited (all stored sessions) |
| **Latency** | Instant (in system prompt) | Requires FTS query + LLM call |
| **When to use** | Critical facts always in context | "What did we discuss about X?" |
| **Management** | Agent-curated | Automatic |
| **Token cost** | Fixed per session | On-demand per search |

## Tier 3: Skills (Procedural Memory)

### What Skills Are

Skills capture **how to do a specific type of task** based on proven experience. Where memory is broad and declarative, skills are narrow and actionable.

A skill is a directory with a `SKILL.md` (markdown instructions) and optional supporting files:

```
~/.hermes/skills/
├── my-skill/
│   ├── SKILL.md       - Instructions, steps, pitfalls
│   ├── references/    - API docs, specs
│   ├── templates/     - Code templates, config files
│   ├── scripts/       - Helper scripts
│   └── assets/        - Images, data files
```

### How Skills Load

The agent's system prompt includes a catalog of available skills (names and descriptions), so the agent sees it on every turn. When a skill matches the current task, the agent loads it with `skill_view(name)` and follows its instructions. Skills are **not** injected wholesale; the full instructions are loaded on demand to preserve context window.
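
A catalog of this kind could be built by reading only the frontmatter of each `SKILL.md`. The sketch below assumes `---`-delimited frontmatter with simple `key: value` lines, which this guide implies but does not specify; `read_frontmatter` and `build_catalog` are hypothetical names.

```python
from pathlib import Path


def read_frontmatter(skill_md: Path) -> dict[str, str]:
    """Parse simple `key: value` lines between the opening and closing `---`."""
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # frontmatter was never closed, so treat it as invalid


def build_catalog(skills_dir: Path) -> list[dict[str, str]]:
    """Collect name and description for every skill with valid frontmatter."""
    catalog = []
    for skill_md in sorted(skills_dir.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_md)
        if "name" in meta and "description" in meta:
            catalog.append({"name": meta["name"], "description": meta["description"]})
    return catalog
```

Only the two catalog fields reach the prompt in this sketch; the body of `SKILL.md` stays on disk until the skill is actually loaded.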

### Skill Lifecycle

1. **Creation:** After a complex task (5+ tool calls), the agent offers to save the approach as a skill using `skill_manage(action='create')`.
2. **Usage:** On future matching tasks, the agent loads the skill with `skill_view(name)`.
3. **Maintenance:** If a skill is outdated or incomplete when used, the agent patches it immediately with `skill_manage(action='patch')`.
4. **Deletion:** Obsolete skills are removed with `skill_manage(action='delete')`.

### Skills vs Memory

| | Memory | Skills |
|---|--------|--------|
| **Format** | Free-text entries | Structured markdown (steps, pitfalls, examples) |
| **Scope** | Facts and preferences | Procedures and workflows |
| **Loading** | Always in system prompt | On-demand when matched |
| **Size** | ~1,300 tokens total | Variable (loaded individually) |

### Code Path

```
tools/skill_manager_tool.py   - Create, edit, patch, delete skills
agent/skill_commands.py       - Slash commands for skill management
skills_hub.py                 - Browse, search, install skills from hub
```

## Tier 4: External Memory Providers

### Plugin Architecture

```
plugins/memory/
├── __init__.py     - Provider registry and base interface
├── honcho/         - Dialectic Q&A, cross-session user modeling
├── openviking/     - Knowledge graph memory
├── mem0/           - Semantic memory with auto-extraction
├── hindsight/      - Retrospective memory analysis
├── holographic/    - Distributed holographic memory
├── retaindb/       - Vector-based retention
├── byterover/      - Byte-level memory compression
└── supermemory/    - Cloud-hosted semantic memory
```

Only one external provider can be active at a time. Built-in memory (Tier 1) always runs alongside it.

### Integration Points

When a provider is active, Hermes:

1. Injects provider context into the system prompt
2. Prefetches relevant memories before each turn (background, non-blocking)
3. Syncs conversation turns to the provider after each response
4. Extracts memories on session end (for providers that support it)
5. Mirrors built-in memory writes to the provider
6. Adds provider-specific tools for search and management
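
One way to picture these integration points is as hooks on a base class. The sketch below is purely illustrative: the actual base interface lives in `plugins/memory/__init__.py` and is not reproduced in this guide, so every class and method name here is an assumption.

```python
from abc import ABC, abstractmethod


class MemoryProvider(ABC):
    """Hypothetical provider interface mirroring the integration points above."""

    @abstractmethod
    def system_prompt_block(self) -> str:
        """Context injected into the system prompt."""

    @abstractmethod
    def prefetch(self, query: str, limit: int = 5) -> list[str]:
        """Memories relevant to the upcoming turn."""

    @abstractmethod
    def sync_turn(self, user_text: str, assistant_text: str) -> None:
        """Called after each response."""

    def on_session_end(self, transcript: list[str]) -> None:
        """Optional: extract final memories. Default is a no-op."""

    def on_memory_write(self, action: str, content: str) -> None:
        """Optional: mirror a built-in memory write. Default is a no-op."""


class InMemoryProvider(MemoryProvider):
    """Toy provider that keeps everything in a Python list."""

    def __init__(self) -> None:
        self.notes: list[str] = []

    def system_prompt_block(self) -> str:
        return f"External memory holds {len(self.notes)} notes."

    def prefetch(self, query: str, limit: int = 5) -> list[str]:
        needle = query.lower()
        return [note for note in self.notes if needle in note.lower()][:limit]

    def sync_turn(self, user_text: str, assistant_text: str) -> None:
        self.notes.append(f"user: {user_text}")
        self.notes.append(f"assistant: {assistant_text}")

    def on_memory_write(self, action: str, content: str) -> None:
        if action == "add":
            self.notes.append(content)
```

Making session-end extraction and write mirroring optional no-ops reflects the "for providers that support it" wording; a real provider would replace the substring search with its own retrieval.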

### Configuration

```yaml
memory:
  provider: openviking  # or honcho, mem0, hindsight, etc.
```

Setup: `hermes memory setup` (interactive picker).

## Data Lifecycle

```
Session Start
    │
    ├── Load MEMORY.md + USER.md from disk → frozen snapshot in system prompt
    ├── Load skills catalog (names + descriptions)
    ├── Initialize session search (SQLite connection)
    └── Initialize external provider (if configured)
    │
    ▼
Each Turn
    │
    ├── Agent sees frozen memory in system prompt
    ├── Agent can call memory tool → writes to disk, returns live state
    ├── Agent can call session_search → FTS5 + Gemini Flash summarization
    ├── Agent can load skills → reads SKILL.md from disk
    └── External provider prefetches context (if active)
    │
    ▼
Session End
    │
    ├── All memory writes already on disk (immediate persistence)
    ├── Session transcript saved to SQLite (messages + FTS5 index)
    ├── External provider extracts final memories (if supported)
    └── Skill updates persisted (if any were patched)
```

## Privacy and Data Locality

| Component | Location | Network |
|-----------|----------|---------|
| MEMORY.md / USER.md | `~/.hermes/memories/` | Local storage only |
| Session DB | `~/.hermes/state.db` | Local storage; search sends matching transcripts to the auxiliary LLM |
| Skills | `~/.hermes/skills/` | Local storage only |
| External provider | Provider-dependent | Provider API calls |

Tiers 1-3 are stored on the local machine only. Their contents still reach an LLM when they are used: the frozen memory snapshot is part of the system prompt sent to the main model, and session search sends matching transcripts to the auxiliary LLM (Gemini Flash) for summarization. External providers (Tier 4) send data to the configured provider by design. The agent logs all provider API calls in the session transcript for auditability.

## Configuration Reference

```yaml
# ~/.hermes/config.yaml
memory:
  memory_enabled: true         # Enable MEMORY.md
  user_profile_enabled: true   # Enable USER.md
  memory_char_limit: 2200      # MEMORY.md char limit (~800 tokens)
  user_char_limit: 1375        # USER.md char limit (~500 tokens)
  nudge_interval: 10           # Turns between memory nudge reminders
  provider: null               # External provider name (null = disabled)
```
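
For readers wiring this section into code, the defaults above map naturally onto a small typed structure. This is a sketch, not how Hermes loads its config: `MemoryConfig` and `memory_config_from` are hypothetical names, and the input is assumed to be an already-parsed mapping of `config.yaml`.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass(frozen=True)
class MemoryConfig:
    # Defaults copied from the reference block in this guide.
    memory_enabled: bool = True
    user_profile_enabled: bool = True
    memory_char_limit: int = 2200
    user_char_limit: int = 1375
    nudge_interval: int = 10
    provider: Optional[str] = None


def memory_config_from(raw: dict[str, Any]) -> MemoryConfig:
    """Overlay the `memory:` section of a parsed config onto the defaults."""
    section = raw.get("memory") or {}
    known = {f.name for f in fields(MemoryConfig)}
    unknown = sorted(set(section) - known)
    if unknown:
        raise ValueError(f"unknown memory settings: {unknown}")
    return MemoryConfig(**section)
```

Rejecting unknown keys is a design choice of the sketch; it surfaces typos such as `memory_char_limt` early instead of silently falling back to the default.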

Environment variables (in `~/.hermes/.env`):

- Provider-specific API keys (e.g., `HONCHO_API_KEY`, `MEM0_API_KEY`)

## Troubleshooting

### Memory not appearing in system prompt

- Check `~/.hermes/memories/MEMORY.md` exists and has content
- Verify `memory.memory_enabled: true` in config
- Check for file lock contention (another Hermes process accessing the memory files at the same time)

### Memory writes not taking effect

- Writes are durable to disk immediately but frozen in system prompt until next session
- The tool response shows live state; verify the write succeeded there
- Start a new session to see the updated snapshot

### Session search returns nothing

- Verify `state.db` has sessions: `sqlite3 ~/.hermes/state.db "SELECT count(*) FROM sessions"`
- Check FTS5 index: `sqlite3 ~/.hermes/state.db "SELECT count(*) FROM messages_fts"`
- Ensure auxiliary LLM (Gemini Flash) is configured and reachable

### Skills not loading

- Check `~/.hermes/skills/` directory exists
- Verify SKILL.md has valid frontmatter (name, description)
- Skills load by name match; check the skill name matches what the agent expects

### External provider errors

- Check API key in `~/.hermes/.env`
- Verify provider is installed: `pip install <provider-package>`
- Run `hermes memory status` for diagnostic info