From 815160bd6f40337b523f0cf6d2b2e0757cfb59fb Mon Sep 17 00:00:00 2001
From: Alexander Whitestone
Date: Thu, 9 Apr 2026 20:51:45 -0400
Subject: [PATCH] burn: add Memory Architecture Guide (closes #263, #258)

Developer-facing guide covering all four memory tiers:
- Built-in memory (MEMORY.md/USER.md) with frozen snapshot pattern
- Session search (FTS5 + Gemini Flash summarization)
- Skills as procedural memory
- External memory provider plugin architecture

Includes data lifecycle, security guarantees, code paths,
configuration reference, and troubleshooting.
---
 docs/memory-architecture-guide.md | 335 ++++++++++++++++++++++++++++++
 1 file changed, 335 insertions(+)
 create mode 100644 docs/memory-architecture-guide.md

diff --git a/docs/memory-architecture-guide.md b/docs/memory-architecture-guide.md
new file mode 100644
index 000000000..f1c4ac894
--- /dev/null
+++ b/docs/memory-architecture-guide.md
@@ -0,0 +1,335 @@
+# Memory Architecture Guide
+
+Developer-facing guide to the Hermes Agent memory system. Covers all four memory tiers, data lifecycle, security guarantees, and extension points.
+
+## Overview
+
+Hermes has four distinct memory systems, each serving a different purpose:
+
+| Tier | System | Scope | Cost | Persistence |
+|------|--------|-------|------|-------------|
+| 1 | **Built-in Memory** (MEMORY.md / USER.md) | Current session, curated facts | ~1,300 tokens fixed per session | File-backed, cross-session |
+| 2 | **Session Search** (FTS5) | All past conversations | On-demand (search + summarize) | SQLite (state.db) |
+| 3 | **Skills** (procedural memory) | How to do specific tasks | Loaded on match only | File-backed (~/.hermes/skills/) |
+| 4 | **External Providers** (plugins) | Deep persistent knowledge | Provider-dependent | Provider-specific |
+
+All four tiers operate independently. Built-in memory is always active. The others are opt-in or on-demand.
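+
+The ~1,300-token fixed cost of Tier 1 is the sum of the two default character limits (2,200 for MEMORY.md and 1,375 for USER.md, see Character Limits under Tier 1) divided by the roughly 2.75 characters per token implied by the Tier 1 estimates of ~800 and ~500 tokens. That ratio is only an approximation; actual counts depend on the model's tokenizer.
+
+$$
+\frac{2200 + 1375}{2.75} = \frac{3575}{2.75} = 1300 \text{ tokens}
+$$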
+
+## Tier 1: Built-in Memory (MEMORY.md / USER.md)
+
+### File Layout
+
+```
+~/.hermes/memories/
+├── MEMORY.md — Agent's notes (environment facts, conventions, lessons learned)
+└── USER.md — User profile (preferences, communication style, identity)
+```
+
+Profile-aware: when running under a profile (`hermes -p coder`), the memories directory resolves to `~/.hermes/profiles//memories/`.
+
+### Frozen Snapshot Pattern
+
+This is the most important architectural decision in the memory system.
+
+1. **Session start:** `MemoryStore.load_for_prompt()` reads both files from disk, parses entries delimited by `§` (section sign), and injects them into the system prompt as a frozen block.
+2. **During session:** The `memory` tool writes to disk immediately (durable), but does **not** update the system prompt. This preserves the LLM's prefix cache for the entire session.
+3. **Next session:** The snapshot refreshes from disk.
+
+**Why frozen?** Any change to the system prompt invalidates the cached prompt prefix (KV cache), so the next API call has to reprocess the whole prompt. Updating memory in place would pay that cost after every write, which is expensive with a ~30K-token system prompt. Freezing memory at session start keeps the cache warm for the entire conversation. The tradeoff is that memory writes made mid-session don't take effect until the next session. Tool responses show the live state, so the agent can verify that writes succeeded.
+
+### Character Limits
+
+| Store | Default Limit | Approx Tokens | Typical Entries |
+|-------|---------------|---------------|-----------------|
+| MEMORY.md | 2,200 chars | ~800 | 8-15 |
+| USER.md | 1,375 chars | ~500 | 5-10 |
+
+Limits are in characters (not tokens) because character counts are model-independent. Both limits are configurable in `config.yaml`:
+
+```yaml
+memory:
+  memory_char_limit: 2200
+  user_char_limit: 1375
+```
+
+### Entry Format
+
+Entries are separated by `\n§\n`. An entry may span multiple lines. Example MEMORY.md:
+
+```
+User runs macOS 14 Sonoma, uses Homebrew, has Docker Desktop
+§
+Project ~/code/api uses Go 1.22, chi router, sqlc. Tests: 'make test'
+§
+Staging server 10.0.1.50 uses SSH port 2222, key at ~/.ssh/staging_ed25519
+```
+
+### Tool Interface
+
+The `memory` tool (defined in `tools/memory_tool.py`) supports:
+
+- **`add`** — Append new entry. Rejects exact duplicates.
+- **`replace`** — Find entry by unique substring (`old_text`), replace with `content`.
+- **`remove`** — Find entry by unique substring, delete it.
+- **`read`** — Return current entries from disk (live state, not frozen snapshot).
+
+Substring matching: `old_text` must match exactly one entry. If it matches more than one, the tool returns an error asking for a more specific substring.
+
+### Security Scanning
+
+Every memory entry is scanned against `_MEMORY_THREAT_PATTERNS` before it is accepted:
+
+- Prompt injection patterns (`ignore previous instructions`, `you are now...`)
+- Credential exfiltration (`curl`/`wget` with env vars, `.env` file reads)
+- SSH backdoor attempts (`authorized_keys`, `.ssh` writes)
+- Invisible Unicode characters (zero-width spaces, BOM)
+
+Entries that match are rejected with an error message. Source: `_scan_memory_content()` in `tools/memory_tool.py`.
+
+### Code Path
+
+```
+agent/prompt_builder.py
+  └── assembles system prompt pieces
+        └── MemoryStore.load_for_prompt() → frozen snapshot injection
+
+tools/memory_tool.py
+  ├── MemoryStore class (file I/O, locking, parsing)
+  ├── memory_tool() function (add/replace/remove/read dispatch)
+  └── _scan_memory_content() (threat scanning)
+
+hermes_cli/memory_setup.py
+  └── Interactive first-run memory setup
+```
+
+## Tier 2: Session Search (FTS5)
+
+### How It Works
+
+1. Every CLI and gateway session stores its full message history in SQLite (`~/.hermes/state.db`).
+2. The `messages_fts` FTS5 virtual table enables fast full-text search.
+3. The `session_search` tool finds relevant messages, groups them by session, and loads the top N sessions.
+4. Each matching session is summarized by Gemini Flash (an auxiliary LLM, not the main model).
+5. The summaries are returned to the main agent as context.
+
+### Why Gemini Flash for Summarization
+
+Raw session transcripts can exceed 50K characters. Feeding them to the main model wastes context window and tokens. Gemini Flash is fast, cheap, and good enough for "extract the relevant bits" summarization. `web_extract` uses the same pattern.
+
+### Schema
+
+```sql
+-- Core tables
+sessions (id, source, user_id, model, system_prompt, parent_session_id, ...)
+messages (id, session_id, role, content, tool_name, timestamp, ...)
+
+-- Full-text search
+messages_fts  -- FTS5 virtual table on messages.content
+
+-- Schema tracking
+schema_version
+```
+
+The database runs in WAL mode, which allows concurrent readers alongside a single writer (needed for multi-platform gateway support).
+
+### Session Lineage
+
+When context compression triggers a session split, `parent_session_id` chains the old and new sessions. This lets session search follow the thread across compression boundaries.
+
+### Code Path
+
+```
+tools/session_search_tool.py
+  ├── FTS5 query against messages_fts
+  ├── Groups results by session_id
+  ├── Loads top N sessions (MAX_SESSION_CHARS = 100K per session)
+  ├── Sends to Gemini Flash via auxiliary_client.async_call_llm()
+  └── Returns per-session summaries
+
+hermes_state.py (SessionDB class)
+  ├── SQLite WAL mode database
+  ├── FTS5 triggers for message insert/update/delete
+  └── Session CRUD operations
+```
+
+### Memory vs Session Search
+
+| | Memory | Session Search |
+|---|--------|----------------|
+| **Capacity** | ~1,300 tokens total | Unlimited (all stored sessions) |
+| **Latency** | Instant (in system prompt) | Requires FTS query + LLM call |
+| **When to use** | Critical facts always in context | "What did we discuss about X?" |
+| **Management** | Agent-curated | Automatic |
+| **Token cost** | Fixed per session | On-demand per search |
+
+## Tier 3: Skills (Procedural Memory)
+
+### What Skills Are
+
+Skills capture **how to do a specific type of task** based on proven experience. Where memory is broad and declarative, skills are narrow and actionable.
+
+A skill is a directory with a `SKILL.md` (markdown instructions) and optional supporting files:
+
+```
+~/.hermes/skills/
+├── my-skill/
+│   ├── SKILL.md — Instructions, steps, pitfalls
+│   ├── references/ — API docs, specs
+│   ├── templates/ — Code templates, config files
+│   ├── scripts/ — Helper scripts
+│   └── assets/ — Images, data files
+```
+
+### How Skills Load
+
+The system prompt includes a catalog of available skills (names and descriptions, loaded at session start). When a skill matches the current task, the agent loads it with `skill_view(name)` and follows its instructions. Skills are **not** injected wholesale; they are loaded on demand to preserve the context window.
+
+### Skill Lifecycle
+
+1. **Creation:** After a complex task (5+ tool calls), the agent offers to save the approach as a skill using `skill_manage(action='create')`.
+2. **Usage:** On future matching tasks, the agent loads the skill with `skill_view(name)`.
+3. **Maintenance:** If a skill is outdated or incomplete when used, the agent patches it immediately with `skill_manage(action='patch')`.
+4. **Deletion:** Obsolete skills are removed with `skill_manage(action='delete')`.
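+
+The catalog-then-load split described under How Skills Load can be sketched in a few lines. This is a minimal illustration, not Hermes code: `SkillEntry`, `parse_frontmatter`, `build_catalog`, and `view_skill` are hypothetical names, and the frontmatter is assumed to be flat `key: value` lines between `---` markers (a real implementation would more likely use a YAML parser). Only names and descriptions go into the catalog. A skill's body is read only when it is requested, which is what keeps unused skills out of the context window.
+
+```python
+"""Sketch: build a skills catalog from SKILL.md frontmatter (hypothetical names)."""
+from __future__ import annotations
+
+import tempfile
+from dataclasses import dataclass
+from pathlib import Path
+
+
+@dataclass
+class SkillEntry:
+    name: str
+    description: str
+    path: Path  # location of the skill's SKILL.md
+
+
+def parse_frontmatter(text: str) -> dict[str, str]:
+    """Parse 'key: value' lines between leading '---' markers.
+
+    Assumption: frontmatter is flat key/value pairs. Returns {} when the
+    block is missing or never closed.
+    """
+    lines = text.splitlines()
+    if not lines or lines[0].strip() != "---":
+        return {}
+    meta: dict[str, str] = {}
+    for line in lines[1:]:
+        if line.strip() == "---":
+            return meta
+        key, sep, value = line.partition(":")  # split on the first colon only
+        if sep:
+            meta[key.strip()] = value.strip()
+    return {}
+
+
+def build_catalog(skills_dir: Path) -> dict[str, SkillEntry]:
+    """Collect name + description per skill; skill bodies are not kept."""
+    catalog: dict[str, SkillEntry] = {}
+    for skill_md in sorted(skills_dir.glob("*/SKILL.md")):
+        meta = parse_frontmatter(skill_md.read_text(encoding="utf-8"))
+        if meta.get("name") and meta.get("description"):
+            catalog[meta["name"]] = SkillEntry(meta["name"], meta["description"], skill_md)
+    return catalog
+
+
+def view_skill(catalog: dict[str, SkillEntry], name: str) -> str:
+    """Read the full SKILL.md only when asked (compare skill_view(name))."""
+    return catalog[name].path.read_text(encoding="utf-8")
+
+
+if __name__ == "__main__":
+    with tempfile.TemporaryDirectory() as tmp:
+        root = Path(tmp)
+        (root / "deploy-staging").mkdir()
+        (root / "deploy-staging" / "SKILL.md").write_text(
+            "---\nname: deploy-staging\ndescription: Deploy the API to staging\n---\n"
+            "1. Run make test\n",
+            encoding="utf-8",
+        )
+        catalog = build_catalog(root)
+        for entry in catalog.values():
+            print(f"{entry.name}: {entry.description}")  # what the prompt would list
+        print(view_skill(catalog, "deploy-staging"))  # full body, loaded on demand
+```
+
+In this sketch, a `SKILL.md` without a `name` or `description` is silently skipped, which mirrors the frontmatter check listed under Troubleshooting.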
+
+### Skills vs Memory
+
+| | Memory | Skills |
+|---|--------|--------|
+| **Format** | Free-text entries | Structured markdown (steps, pitfalls, examples) |
+| **Scope** | Facts and preferences | Procedures and workflows |
+| **Loading** | Always in system prompt | On-demand when matched |
+| **Size** | ~1,300 tokens total | Variable (loaded individually) |
+
+### Code Path
+
+```
+tools/skill_manager_tool.py — Create, edit, patch, delete skills
+agent/skill_commands.py — Slash commands for skill management
+skills_hub.py — Browse, search, install skills from hub
+```
+
+## Tier 4: External Memory Providers
+
+### Plugin Architecture
+
+```
+plugins/memory/
+├── __init__.py — Provider registry and base interface
+├── honcho/ — Dialectic Q&A, cross-session user modeling
+├── openviking/ — Knowledge graph memory
+├── mem0/ — Semantic memory with auto-extraction
+├── hindsight/ — Retrospective memory analysis
+├── holographic/ — Distributed holographic memory
+├── retaindb/ — Vector-based retention
+├── byterover/ — Byte-level memory compression
+└── supermemory/ — Cloud-hosted semantic memory
+```
+
+Only one external provider can be active at a time. Built-in memory (Tier 1) always runs alongside it.
+
+### Integration Points
+
+When a provider is active, Hermes:
+
+1. Injects provider context into the system prompt
+2. Prefetches relevant memories before each turn (background, non-blocking)
+3. Syncs conversation turns to the provider after each response
+4. Extracts memories on session end (for providers that support it)
+5. Mirrors built-in memory writes to the provider
+6. Adds provider-specific tools for search and management
+
+### Configuration
+
+```yaml
+memory:
+  provider: openviking  # or honcho, mem0, hindsight, etc.
+```
+
+Setup: `hermes memory setup` (interactive picker).
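+
+The six integration points map naturally onto an abstract base class with one hook each. The sketch below is hypothetical: `MemoryProvider`, `register`, `activate`, and `InMemoryProvider` are illustrative names, not the actual interface in `plugins/memory/__init__.py`. It also simplifies the background, non-blocking prefetch to a synchronous call. `activate()` takes the single `memory.provider` value from config, and that single value is how the one-active-provider rule follows from the design.
+
+```python
+"""Sketch of a single-active memory-provider plugin contract (hypothetical names)."""
+from __future__ import annotations
+
+from abc import ABC, abstractmethod
+
+
+class MemoryProvider(ABC):
+    """One hook per integration point; numbers match the Integration Points list."""
+
+    name = "base"
+
+    @abstractmethod
+    def system_prompt_block(self) -> str:
+        """(1) Context injected into the system prompt."""
+
+    @abstractmethod
+    def prefetch(self, query: str) -> list[str]:
+        """(2) Memories fetched before a turn (synchronous in this sketch)."""
+
+    @abstractmethod
+    def sync_turn(self, user_msg: str, assistant_msg: str) -> None:
+        """(3) Push a completed turn to the provider."""
+
+    def on_session_end(self, transcript: list[str]) -> None:
+        """(4) Optional end-of-session extraction; no-op by default."""
+
+    def on_memory_write(self, action: str, content: str) -> None:
+        """(5) Mirror a built-in memory write; no-op by default."""
+
+    def tools(self) -> list[str]:
+        """(6) Names of provider-specific tools to expose."""
+        return []
+
+
+_REGISTRY: dict[str, type[MemoryProvider]] = {}
+
+
+def register(cls: type[MemoryProvider]) -> type[MemoryProvider]:
+    """Class decorator that makes a provider selectable by name."""
+    _REGISTRY[cls.name] = cls
+    return cls
+
+
+def activate(provider_name: str | None) -> MemoryProvider | None:
+    """Instantiate the one configured provider; None means Tier 4 is disabled."""
+    if provider_name is None:
+        return None
+    if provider_name not in _REGISTRY:
+        raise ValueError(f"unknown memory provider: {provider_name!r}")
+    return _REGISTRY[provider_name]()
+
+
+@register
+class InMemoryProvider(MemoryProvider):
+    """Toy provider that keeps synced turns in a list."""
+
+    name = "in-memory"
+
+    def __init__(self) -> None:
+        self.turns: list[str] = []
+
+    def system_prompt_block(self) -> str:
+        return f"[{self.name}] {len(self.turns)} stored turns"
+
+    def prefetch(self, query: str) -> list[str]:
+        # Naive case-insensitive substring match stands in for real retrieval.
+        return [t for t in self.turns if query.lower() in t.lower()]
+
+    def sync_turn(self, user_msg: str, assistant_msg: str) -> None:
+        self.turns.append(f"user: {user_msg} | assistant: {assistant_msg}")
+
+
+if __name__ == "__main__":
+    provider = activate("in-memory")  # as if config had memory.provider: in-memory
+    assert provider is not None
+    provider.sync_turn("Staging uses SSH port 2222", "Noted.")
+    print(provider.prefetch("ssh port"))
+    print(provider.system_prompt_block())
+```
+
+Hooks 4 to 6 get default implementations here because the guide marks end-of-session extraction as optional ("for providers that support it"). This sketch does not say which hooks the real base class requires.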
+
+## Data Lifecycle
+
+```
+Session Start
+  │
+  ├── Load MEMORY.md + USER.md from disk → frozen snapshot in system prompt
+  ├── Load skills catalog (names + descriptions)
+  ├── Initialize session search (SQLite connection)
+  └── Initialize external provider (if configured)
+  │
+  ▼
+Each Turn
+  │
+  ├── Agent sees frozen memory in system prompt
+  ├── Agent can call memory tool → writes to disk, returns live state
+  ├── Agent can call session_search → FTS5 + Gemini Flash summarization
+  ├── Agent can load skills → reads SKILL.md from disk
+  └── External provider prefetches context (if active)
+  │
+  ▼
+Session End
+  │
+  ├── All memory writes already on disk (immediate persistence)
+  ├── Session transcript saved to SQLite (messages + FTS5 index)
+  ├── External provider extracts final memories (if supported)
+  └── Skill updates persisted (if any were patched)
+```
+
+## Privacy and Data Locality
+
+| Component | Location | Network |
+|-----------|----------|---------|
+| MEMORY.md / USER.md | `~/.hermes/memories/` | Stored locally; sent to the main model as part of the system prompt |
+| Session DB | `~/.hermes/state.db` | Stored locally; matching sessions sent to the auxiliary LLM during search |
+| Skills | `~/.hermes/skills/` | Stored locally; loaded skills sent to the main model |
+| External provider | Provider-dependent | Provider API calls |
+
+Tiers 1-3 are stored only on the local machine, and none of them syncs to an external memory service. Their contents still reach the configured models during normal use: memory entries and loaded skills become part of the main model's context, and session search sends matching transcripts to the auxiliary model for summarization. External providers (Tier 4) send data to the configured provider by design. The agent logs all provider API calls in the session transcript for auditability.
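+
+## Frozen Snapshot Walkthrough
+
+The Session Start and Each Turn phases of the lifecycle depend on Tier 1's frozen snapshot. Below is a minimal sketch of that behavior. `SnapshotMemory` and its methods are hypothetical stand-ins for `MemoryStore`. The sketch assumes the character limit is checked against the joined file text (delimiters included), and it omits locking and threat scanning. It shows that `add()` persists immediately and returns live state, while `load_for_prompt()` keeps returning the block captured at session start.
+
+```python
+"""Sketch of Tier 1's frozen-snapshot pattern (hypothetical stand-in for MemoryStore)."""
+from __future__ import annotations
+
+import tempfile
+from pathlib import Path
+
+DELIM = "\n§\n"  # entry separator from "Entry Format"
+
+
+class SnapshotMemory:
+    def __init__(self, path: Path, char_limit: int = 2200) -> None:
+        self.path = path
+        self.char_limit = char_limit
+        self._snapshot: str | None = None  # captured once per session
+
+    def _entries(self) -> list[str]:
+        """Live entries, re-read from disk on every call."""
+        if not self.path.exists():
+            return []
+        raw = self.path.read_text(encoding="utf-8")
+        return [e.strip() for e in raw.split(DELIM) if e.strip()]
+
+    def _save(self, entries: list[str]) -> list[str]:
+        text = DELIM.join(entries)
+        # Assumption: the limit applies to the joined file text, delimiters included.
+        if len(text) > self.char_limit:
+            raise ValueError(f"{len(text)} chars exceeds limit of {self.char_limit}")
+        self.path.write_text(text, encoding="utf-8")  # durable immediately
+        return entries  # live state, as a tool response would report it
+
+    def load_for_prompt(self) -> str:
+        """Session start: freeze the block that goes into the system prompt."""
+        if self._snapshot is None:
+            self._snapshot = DELIM.join(self._entries())
+        return self._snapshot
+
+    def add(self, content: str) -> list[str]:
+        content = content.strip()
+        entries = self._entries()
+        if content in entries:
+            raise ValueError("exact duplicate entry")
+        return self._save(entries + [content])
+
+    def replace(self, old_text: str, content: str) -> list[str]:
+        entries = self._entries()
+        hits = [i for i, e in enumerate(entries) if old_text in e]
+        if len(hits) != 1:
+            raise ValueError(f"old_text matched {len(hits)} entries; need exactly 1")
+        entries[hits[0]] = content.strip()
+        return self._save(entries)
+
+
+if __name__ == "__main__":
+    with tempfile.TemporaryDirectory() as tmp:
+        memory_md = Path(tmp) / "MEMORY.md"
+        SnapshotMemory(memory_md).add("User runs macOS 14 Sonoma")  # earlier session
+        session = SnapshotMemory(memory_md)
+        frozen = session.load_for_prompt()  # session start
+        print(session.add("Staging SSH port is 2222"))  # mid-session write, on disk now
+        print(session.load_for_prompt() == frozen)  # prompt block is unchanged
+        print(SnapshotMemory(memory_md).load_for_prompt())  # next session sees both
+```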
+
+## Configuration Reference
+
+```yaml
+# ~/.hermes/config.yaml
+memory:
+  memory_enabled: true        # Enable MEMORY.md
+  user_profile_enabled: true  # Enable USER.md
+  memory_char_limit: 2200     # MEMORY.md char limit (~800 tokens)
+  user_char_limit: 1375       # USER.md char limit (~500 tokens)
+  nudge_interval: 10          # Turns between memory nudge reminders
+  provider: null              # External provider name (null = disabled)
+```
+
+Environment variables (in `~/.hermes/.env`):
+
+- Provider-specific API keys (e.g., `HONCHO_API_KEY`, `MEM0_API_KEY`)
+
+## Troubleshooting
+
+### Memory not appearing in system prompt
+
+- Check that `~/.hermes/memories/MEMORY.md` exists and has content (under a profile, check that profile's `memories/` directory instead)
+- Verify `memory.memory_enabled: true` in config
+- Check for file-lock contention when several sessions access the memory files at the same time
+
+### Memory writes not taking effect
+
+- Writes reach disk immediately, but the system prompt keeps the session-start snapshot until the next session
+- The tool response shows the live state; use it (or the `read` action) to confirm the write succeeded
+- Start a new session to see the updated snapshot
+
+### Session search returns nothing
+
+- Verify `state.db` has sessions: `sqlite3 ~/.hermes/state.db "SELECT count(*) FROM sessions"`
+- Check the FTS5 index: `sqlite3 ~/.hermes/state.db "SELECT count(*) FROM messages_fts"`
+- Ensure the auxiliary LLM (Gemini Flash) is configured and reachable
+
+### Skills not loading
+
+- Check that the `~/.hermes/skills/` directory exists
+- Verify that each `SKILL.md` has valid frontmatter (`name`, `description`)
+- Skills are loaded by name, so confirm the skill's name matches what the agent expects
+
+### External provider errors
+
+- Check the API key in `~/.hermes/.env`
+- Verify provider is installed: `pip install `
+- Run `hermes memory status` for diagnostic info
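+
+### Scripting the basic checks
+
+The file and database checks from the troubleshooting sections above can be bundled into a small script. This is a sketch, not part of the Hermes CLI (`hermes memory status` remains the supported diagnostic), and `check_memory_files` and `check_state_db` are hypothetical helpers. It assumes the default (non-profile) paths and the default character limits from the configuration reference. It opens `state.db` through SQLite's read-only URI mode (`mode=ro`), so the check never writes to the database.
+
+```python
+"""Sketch: automate the basic troubleshooting checks (hypothetical helpers)."""
+from __future__ import annotations
+
+import sqlite3
+from contextlib import closing
+from pathlib import Path
+
+# Default limits from the configuration reference; adjust if config.yaml overrides them.
+LIMITS = {"MEMORY.md": 2200, "USER.md": 1375}
+
+
+def check_memory_files(memories_dir: Path) -> dict[str, str]:
+    """Report missing, empty, or over-limit memory files."""
+    report: dict[str, str] = {}
+    for name, limit in LIMITS.items():
+        path = memories_dir / name
+        if not path.exists():
+            report[name] = "missing"
+            continue
+        size = len(path.read_text(encoding="utf-8"))  # characters, not bytes
+        if size == 0:
+            report[name] = "empty"
+        else:
+            report[name] = f"{size}/{limit} chars" + (" (over limit)" if size > limit else "")
+    return report
+
+
+def check_state_db(db_path: Path) -> dict[str, int | None]:
+    """Row counts for sessions and the FTS5 index; None means the table is missing."""
+    uri = f"{db_path.resolve().as_uri()}?mode=ro"  # read-only connection
+    counts: dict[str, int | None] = {}
+    with closing(sqlite3.connect(uri, uri=True)) as conn:
+        for table in ("sessions", "messages_fts"):  # fixed names, safe to interpolate
+            try:
+                counts[table] = conn.execute(f"SELECT count(*) FROM {table}").fetchone()[0]
+            except sqlite3.OperationalError:
+                counts[table] = None
+    return counts
+
+
+if __name__ == "__main__":
+    hermes_home = Path.home() / ".hermes"  # default, non-profile location
+    print(check_memory_files(hermes_home / "memories"))
+    state_db = hermes_home / "state.db"
+    print(check_state_db(state_db) if state_db.exists() else "state.db not found")
+```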