burn: add Memory Architecture Guide (closes #263, #258)

Developer-facing guide covering all four memory tiers:
- Built-in memory (MEMORY.md/USER.md) with frozen snapshot pattern
- Session search (FTS5 + Gemini Flash summarization)
- Skills as procedural memory
- External memory provider plugin architecture

Includes data lifecycle, security guarantees, code paths, configuration reference, and troubleshooting.
2026-04-09 20:51:45 -04:00
parent 7d2421a15f
commit 815160bd6f
1 changed files with 335 additions and 0 deletions
--- a/docs/memory-architecture-guide.md
+++ b/docs/memory-architecture-guide.md
@@ -0,0 +1,335 @@
+# Memory Architecture Guide
+
+Developer-facing guide to the Hermes Agent memory system. Covers all four memory tiers, data lifecycle, security guarantees, and extension points.
+
+## Overview
+
+Hermes has four distinct memory systems, each serving a different purpose:
+
+| Tier | System | Scope | Cost | Persistence |
+|------|--------|-------|------|-------------|
+| 1 | **Built-in Memory** (MEMORY.md / USER.md) | Current session, curated facts | ~1,300 tokens fixed per session | File-backed, cross-session |
+| 2 | **Session Search** (FTS5) | All past conversations | On-demand (search + summarize) | SQLite (state.db) |
+| 3 | **Skills** (procedural memory) | How to do specific tasks | Loaded on match only | File-backed (~/.hermes/skills/) |
+| 4 | **External Providers** (plugins) | Deep persistent knowledge | Provider-dependent | Provider-specific |
+
+All four tiers operate independently. Built-in memory is enabled by default (see `memory_enabled` and `user_profile_enabled` in the Configuration Reference). The others are opt-in or on-demand.
+
+## Tier 1: Built-in Memory (MEMORY.md / USER.md)
+
+### File Layout
+
+```
+~/.hermes/memories/
+├── MEMORY.md    — Agent's notes (environment facts, conventions, lessons learned)
+└── USER.md      — User profile (preferences, communication style, identity)
+```
+
+Profile-aware: when running under a profile (`hermes -p coder`), the memories directory resolves to `~/.hermes/profiles/<name>/memories/`.
+
+### Frozen Snapshot Pattern
+
+This is the most important architectural decision in the memory system.
+
+1. **Session start:** `MemoryStore.load_for_prompt()` reads both files from disk, parses entries delimited by `§` (section sign), and injects them into the system prompt as a frozen block.
+2. **During session:** The `memory` tool writes to disk immediately (durable), but does **not** update the system prompt. This preserves the LLM's prefix cache for the entire session.
+3. **Next session:** The snapshot refreshes from disk.
+
+**Why frozen?** Any change to the system prompt invalidates the prefix (KV) cache, so the next API call has to reprocess the entire prompt. With a ~30K token system prompt, that's expensive. Freezing memory at session start keeps the cache warm for the entire conversation. The tradeoff: memory writes made mid-session don't take effect until the next session. Tool responses show the live state so the agent can verify that writes succeeded.
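The frozen snapshot pattern can be sketched as follows. This is a minimal illustration, not the real `MemoryStore`: the class name `FrozenMemory` and its methods are hypothetical, and only the `§` separator and the write-through behavior come from this guide.

```python
from pathlib import Path
from typing import List, Tuple
import tempfile

SEPARATOR = "\n§\n"  # entry delimiter described in this guide


class FrozenMemory:
    """Hypothetical stand-in for MemoryStore; not the real class."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # The snapshot is taken once, at "session start", and never refreshed.
        self.snapshot = self._read()

    def _read(self) -> List[str]:
        if not self.path.exists():
            return []
        text = self.path.read_text(encoding="utf-8")
        return [e.strip() for e in text.split(SEPARATOR) if e.strip()]

    def add(self, entry: str) -> List[str]:
        """Write through to disk and return the live state."""
        live = self._read()
        if entry not in live:  # reject exact duplicates
            live.append(entry)
            self.path.write_text(SEPARATOR.join(live) + "\n", encoding="utf-8")
        return live

    def prompt_block(self) -> str:
        """Text the system prompt would carry for the whole session."""
        return SEPARATOR.join(self.snapshot)


def demo() -> Tuple[str, List[str], str]:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "MEMORY.md"
        path.write_text("User runs macOS 14 Sonoma\n", encoding="utf-8")
        session = FrozenMemory(path)
        live = session.add("Staging uses SSH port 2222")
        frozen = session.prompt_block()  # unchanged by the write
        next_session = FrozenMemory(path).prompt_block()  # sees the write
        return frozen, live, next_session
```

The write is visible in the returned live state and in the next session's snapshot, but the current session's prompt block stays byte-identical, which is what keeps the cached prefix valid.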
+
+### Character Limits
+
+| Store | Default Limit | Approx Tokens | Typical Entries |
+|-------|--------------|---------------|-----------------|
+| MEMORY.md | 2,200 chars | ~800 | 8-15 |
+| USER.md | 1,375 chars | ~500 | 5-10 |
+
+Limits are in characters (not tokens) because character counts are model-independent. Configurable in `config.yaml`:
+
+```yaml
+memory:
+  memory_char_limit: 2200
+  user_char_limit: 1375
+```
+
+### Entry Format
+
+Entries are separated by `\n§\n`. Each entry can be multiline. Example MEMORY.md:
+
+```
+User runs macOS 14 Sonoma, uses Homebrew, has Docker Desktop
+§
+Project ~/code/api uses Go 1.22, chi router, sqlc. Tests: 'make test'
+§
+Staging server 10.0.1.50 uses SSH port 2222, key at ~/.ssh/staging_ed25519
+```
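As a rough sketch of how the entry format and character limit interact, the helpers below split a file on the `\n§\n` separator and check the result against the default limit. The function names are hypothetical, and it is an assumption that the limit is measured on the serialized text with separators included.

```python
from typing import List

SEPARATOR = "\n§\n"
MEMORY_CHAR_LIMIT = 2200  # default memory_char_limit from this guide


def parse_entries(text: str) -> List[str]:
    """Split file content into entries, dropping empty fragments."""
    return [part.strip() for part in text.split(SEPARATOR) if part.strip()]


def serialize(entries: List[str]) -> str:
    """Join entries back into the on-disk representation."""
    return SEPARATOR.join(entries)


def fits(entries: List[str], limit: int = MEMORY_CHAR_LIMIT) -> bool:
    # Assumption: separators count toward the character budget.
    return len(serialize(entries)) <= limit


SAMPLE = (
    "User runs macOS 14 Sonoma, uses Homebrew, has Docker Desktop\n§\n"
    "Project ~/code/api uses Go 1.22, chi router, sqlc. Tests: 'make test'\n§\n"
    "Staging server 10.0.1.50 uses SSH port 2222, key at ~/.ssh/staging_ed25519\n"
)
```

Because entries may be multiline, the split is on the separator line rather than on individual newlines.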
+
+### Tool Interface
+
+The `memory` tool (defined in `tools/memory_tool.py`) supports:
+
+- **`add`** — Append new entry. Rejects exact duplicates.
+- **`replace`** — Find entry by unique substring (`old_text`), replace with `content`.
+- **`remove`** — Find entry by unique substring, delete it.
+- **`read`** — Return current entries from disk (live state, not frozen snapshot).
+
+Substring matching: `old_text` must match exactly one entry. If it matches multiple, the tool returns an error asking for more specificity.
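The unique-substring rule can be sketched like this. The helper names and error messages are hypothetical; only the "exactly one match" behavior is taken from the description above.

```python
from typing import List


def find_unique(entries: List[str], old_text: str) -> int:
    """Return the index of the single entry containing old_text."""
    matches = [i for i, entry in enumerate(entries) if old_text in entry]
    if not matches:
        raise ValueError(f"no entry contains {old_text!r}")
    if len(matches) > 1:
        raise ValueError(
            f"{old_text!r} matches {len(matches)} entries; be more specific"
        )
    return matches[0]


def replace_entry(entries: List[str], old_text: str, content: str) -> List[str]:
    """Return a new list with the matched entry replaced by content."""
    i = find_unique(entries, old_text)
    return entries[:i] + [content] + entries[i + 1:]


def remove_entry(entries: List[str], old_text: str) -> List[str]:
    """Return a new list without the matched entry."""
    i = find_unique(entries, old_text)
    return entries[:i] + entries[i + 1:]
```

Requiring a unique match means an ambiguous `old_text` fails loudly instead of silently editing the wrong entry.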
+
+### Security Scanning
+
+Every memory entry is scanned against `_MEMORY_THREAT_PATTERNS` before acceptance:
+
+- Prompt injection patterns (`ignore previous instructions`, `you are now...`)
+- Credential exfiltration (`curl`/`wget` with env vars, `.env` file reads)
+- SSH backdoor attempts (`authorized_keys`, `.ssh` writes)
+- Invisible Unicode characters (zero-width spaces, BOM)
+
+Matches are rejected with an error message. Source: `_scan_memory_content()` in `tools/memory_tool.py`.
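A scanner in this style might look like the sketch below. The patterns are illustrative guesses at the categories listed above; the actual contents of `_MEMORY_THREAT_PATTERNS` are not reproduced in this guide and may differ.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration only; the real list may differ.
THREAT_PATTERNS = [
    ("prompt_injection",
     re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.I)),
    ("prompt_injection", re.compile(r"\byou\s+are\s+now\b", re.I)),
    # curl/wget on the same line as a shell variable reference
    ("exfiltration",
     re.compile(r"\b(curl|wget)\b[^\n]*\$\{?[A-Za-z_]+", re.I)),
    ("ssh_backdoor", re.compile(r"authorized_keys", re.I)),
]

# Zero-width space/non-joiner/joiner, word joiner, and BOM.
INVISIBLE = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def scan(content: str) -> Optional[str]:
    """Return the first matched threat category, or None if clean."""
    if any(ch in INVISIBLE for ch in content):
        return "invisible_unicode"
    for name, pattern in THREAT_PATTERNS:
        if pattern.search(content):
            return name
    return None
```

A caller would reject the write whenever `scan()` returns a category and report that category in the error message.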
+
+### Code Path
+
+```
+agent/prompt_builder.py
+  └── assembles system prompt pieces
+       └── MemoryStore.load_for_prompt() → frozen snapshot injection
+
+tools/memory_tool.py
+  ├── MemoryStore class (file I/O, locking, parsing)
+  ├── memory_tool() function (add/replace/remove/read dispatch)
+  └── _scan_memory_content() (threat scanning)
+
+hermes_cli/memory_setup.py
+  └── Interactive first-run memory setup
+```
+
+## Tier 2: Session Search (FTS5)
+
+### How It Works
+
+1. Every CLI and gateway session stores full message history in SQLite (`~/.hermes/state.db`)
+2. The `messages_fts` FTS5 virtual table enables fast full-text search
+3. The `session_search` tool finds relevant messages, groups by session, loads top N
+4. Each matching session is summarized by Gemini Flash (auxiliary LLM, not main model)
+5. Summaries are returned to the main agent as context
+
+### Why Gemini Flash for Summarization
+
+Raw session transcripts can be 50K+ chars. Feeding them to the main model wastes context window and tokens. Gemini Flash is fast, cheap, and good enough for "extract the relevant bits" summarization. `web_extract` uses the same pattern.
+
+### Schema
+
+```sql
+-- Core tables
+sessions (id, source, user_id, model, system_prompt, parent_session_id, ...)
+messages (id, session_id, role, content, tool_name, timestamp, ...)
+
+-- Full-text search
+messages_fts  -- FTS5 virtual table on messages.content
+
+-- Schema tracking
+schema_version
+```
+
+The database runs in WAL mode, which allows concurrent readers alongside a single writer. The gateway relies on this for multi-platform support.
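The search half of this tier can be sketched with Python's built-in `sqlite3` module, assuming the SQLite build includes FTS5. The column set is trimmed, the trigger name and the hit-count ranking are assumptions, and the summarization step is left out; only the `messages` / `messages_fts` names come from the schema above.

```python
import sqlite3
from typing import List, Tuple


def build_db() -> sqlite3.Connection:
    """Create a trimmed-down messages table with an FTS5 index."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE messages (
            id INTEGER PRIMARY KEY,
            session_id TEXT NOT NULL,
            role TEXT NOT NULL,
            content TEXT NOT NULL
        );
        -- External-content FTS5 index over messages.content.
        CREATE VIRTUAL TABLE messages_fts USING fts5(
            content, content='messages', content_rowid='id'
        );
        -- Keep the index in sync on insert (update/delete triggers omitted).
        CREATE TRIGGER messages_ai AFTER INSERT ON messages BEGIN
            INSERT INTO messages_fts(rowid, content)
            VALUES (new.id, new.content);
        END;
        """
    )
    return db


def top_sessions(
    db: sqlite3.Connection, query: str, limit: int = 3
) -> List[Tuple[str, int]]:
    """Group FTS hits by session and return the sessions with most hits."""
    rows = db.execute(
        """
        SELECT m.session_id, COUNT(*) AS hits
        FROM messages_fts
        JOIN messages AS m ON m.id = messages_fts.rowid
        WHERE messages_fts MATCH ?
        GROUP BY m.session_id
        ORDER BY hits DESC, m.session_id
        LIMIT ?
        """,
        (query, limit),
    ).fetchall()
    return [(session_id, hits) for session_id, hits in rows]
```

In the real tool, each of the top sessions would then be loaded and sent to the auxiliary model for summarization.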
+
+### Session Lineage
+
+When context compression triggers a session split, `parent_session_id` chains the old and new sessions. This lets session search follow the thread across compression boundaries.
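Following a lineage chain amounts to walking `parent_session_id` links back to the root. The sketch below uses a plain dictionary in place of the `sessions` table; the function name is hypothetical.

```python
from typing import Dict, List, Optional


def lineage(parents: Dict[str, Optional[str]], session_id: str) -> List[str]:
    """Return the chain of session ids from the oldest ancestor to session_id.

    `parents` maps each session id to its parent_session_id (None for a root).
    """
    chain: List[str] = []
    seen = set()
    current: Optional[str] = session_id
    # The `seen` set guards against a corrupted, cyclic chain.
    while current is not None and current not in seen:
        seen.add(current)
        chain.append(current)
        current = parents.get(current)
    chain.reverse()
    return chain
```

With the chain in hand, a search hit in any segment can be presented together with the segments that came before and after the compression split.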
+
+### Code Path
+
+```
+tools/session_search_tool.py
+  ├── FTS5 query against messages_fts
+  ├── Groups results by session_id
+  ├── Loads top N sessions (MAX_SESSION_CHARS = 100K per session)
+  ├── Sends to Gemini Flash via auxiliary_client.async_call_llm()
+  └── Returns per-session summaries
+
+hermes_state.py (SessionDB class)
+  ├── SQLite WAL mode database
+  ├── FTS5 triggers for message insert/update/delete
+  └── Session CRUD operations
+```
+
+### Memory vs Session Search
+
+| | Memory | Session Search |
+|---|--------|---------------|
+| **Capacity** | ~1,300 tokens total | Unlimited (all stored sessions) |
+| **Latency** | Instant (in system prompt) | Requires FTS query + LLM call |
+| **When to use** | Critical facts always in context | "What did we discuss about X?" |
+| **Management** | Agent-curated | Automatic |
+| **Token cost** | Fixed per session | On-demand per search |
+
+## Tier 3: Skills (Procedural Memory)
+
+### What Skills Are
+
+Skills capture **how to do a specific type of task** based on proven experience. Where memory is broad and declarative, skills are narrow and actionable.
+
+A skill is a directory with a `SKILL.md` (markdown instructions) and optional supporting files:
+
+```
+~/.hermes/skills/
+├── my-skill/
+│   ├── SKILL.md          — Instructions, steps, pitfalls
+│   ├── references/       — API docs, specs
+│   ├── templates/        — Code templates, config files
+│   ├── scripts/          — Helper scripts
+│   └── assets/           — Images, data files
+```
+
+### How Skills Load
+
+The agent's system prompt includes a catalog of available skills (names and descriptions), loaded at session start. When a skill matches the current task, the agent loads it with `skill_view(name)` and follows its instructions. Skills are **not** injected wholesale; they're loaded on demand to preserve context window.
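Building a lightweight catalog might look like the sketch below. It assumes `SKILL.md` begins with a `---` delimited frontmatter block of simple `key: value` lines. The guide only states that frontmatter carries a name and description, so the exact syntax and the helper names here are assumptions.

```python
from pathlib import Path
from typing import Dict, List, Tuple
import tempfile


def read_frontmatter(text: str) -> Dict[str, str]:
    """Parse a simple key: value frontmatter block; {} if missing or unclosed."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat as invalid


def skill_catalog(root: Path) -> List[Tuple[str, str]]:
    """Collect (name, description) pairs without reading skill bodies into the prompt."""
    catalog = []
    for skill_md in sorted(root.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_md.read_text(encoding="utf-8"))
        if "name" in meta and "description" in meta:
            catalog.append((meta["name"], meta["description"]))
    return catalog


def demo() -> List[Tuple[str, str]]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        good = root / "deploy-staging"
        good.mkdir()
        (good / "SKILL.md").write_text(
            "---\nname: deploy-staging\n"
            "description: Deploy the API to staging\n---\n# Steps\n",
            encoding="utf-8",
        )
        bad = root / "broken"
        bad.mkdir()
        (bad / "SKILL.md").write_text("# No frontmatter\n", encoding="utf-8")
        return skill_catalog(root)
```

Only the short catalog lines would sit in the prompt; the full `SKILL.md` is read when the agent decides the skill applies.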
+
+### Skill Lifecycle
+
+1. **Creation:** After a complex task (5+ tool calls), the agent offers to save the approach as a skill using `skill_manage(action='create')`.
+2. **Usage:** On future matching tasks, the agent loads the skill with `skill_view(name)`.
+3. **Maintenance:** If a skill is outdated or incomplete when used, the agent patches it immediately with `skill_manage(action='patch')`.
+4. **Deletion:** Obsolete skills are removed with `skill_manage(action='delete')`.
+
+### Skills vs Memory
+
+| | Memory | Skills |
+|---|--------|--------|
+| **Format** | Free-text entries | Structured markdown (steps, pitfalls, examples) |
+| **Scope** | Facts and preferences | Procedures and workflows |
+| **Loading** | Always in system prompt | On-demand when matched |
+| **Size** | ~1,300 tokens total | Variable (loaded individually) |
+
+### Code Path
+
+```
+tools/skill_manager_tool.py  — Create, edit, patch, delete skills
+agent/skill_commands.py       — Slash commands for skill management
+skills_hub.py                 — Browse, search, install skills from hub
+```
+
+## Tier 4: External Memory Providers
+
+### Plugin Architecture
+
+```
+plugins/memory/
+├── __init__.py        — Provider registry and base interface
+├── honcho/            — Dialectic Q&A, cross-session user modeling
+├── openviking/        — Knowledge graph memory
+├── mem0/              — Semantic memory with auto-extraction
+├── hindsight/         — Retrospective memory analysis
+├── holographic/       — Distributed holographic memory
+├── retaindb/          — Vector-based retention
+├── byterover/         — Byte-level memory compression
+└── supermemory/       — Cloud-hosted semantic memory
+```
+
+Only one external provider can be active at a time. Built-in memory (Tier 1) always runs alongside it.
+
+### Integration Points
+
+When a provider is active, Hermes:
+
+1. Injects provider context into the system prompt
+2. Prefetches relevant memories before each turn (background, non-blocking)
+3. Syncs conversation turns to the provider after each response
+4. Extracts memories on session end (for providers that support it)
+5. Mirrors built-in memory writes to the provider
+6. Adds provider-specific tools for search and management
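To make the integration points concrete, here is a toy provider and registry. Every name in it (`MemoryProvider`, `prefetch`, `sync_turn`, `on_memory_write`, `Registry`) is hypothetical; the real base interface lives in `plugins/memory/__init__.py` and is not shown in this guide. The toy prefetch is also synchronous, unlike the background prefetch described above.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class MemoryProvider(ABC):
    """Hypothetical base interface, for illustration only."""

    name: str

    @abstractmethod
    def prefetch(self, query: str) -> List[str]:
        """Return memories relevant to the upcoming turn."""

    @abstractmethod
    def sync_turn(self, user: str, assistant: str) -> None:
        """Record a completed conversation turn."""

    def on_memory_write(self, entry: str) -> None:
        """Optional hook: mirror a built-in memory write."""


class ToyProvider(MemoryProvider):
    name = "toy"

    def __init__(self) -> None:
        self.items: List[str] = []

    def prefetch(self, query: str) -> List[str]:
        # Naive case-insensitive substring match instead of semantic search.
        return [item for item in self.items if query.lower() in item.lower()]

    def sync_turn(self, user: str, assistant: str) -> None:
        self.items.extend([user, assistant])

    def on_memory_write(self, entry: str) -> None:
        self.items.append(entry)


class Registry:
    """Holds registered providers; at most one is active at a time."""

    def __init__(self) -> None:
        self._providers: Dict[str, MemoryProvider] = {}
        self.active: Optional[MemoryProvider] = None

    def register(self, provider: MemoryProvider) -> None:
        self._providers[provider.name] = provider

    def activate(self, name: Optional[str]) -> None:
        # Passing None disables the external provider entirely.
        self.active = None if name is None else self._providers[name]
```

Keeping a single `active` slot mirrors the one-provider-at-a-time rule, while built-in memory continues to operate regardless of what the slot holds.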
+
+### Configuration
+
+```yaml
+memory:
+  provider: openviking  # or honcho, mem0, hindsight, etc.
+```
+
+Setup: `hermes memory setup` (interactive picker).
+
+## Data Lifecycle
+
+```
+Session Start
+  │
+  ├── Load MEMORY.md + USER.md from disk → frozen snapshot in system prompt
+  ├── Load skills catalog (names + descriptions)
+  ├── Initialize session search (SQLite connection)
+  └── Initialize external provider (if configured)
+  │
+  ▼
+Each Turn
+  │
+  ├── Agent sees frozen memory in system prompt
+  ├── Agent can call memory tool → writes to disk, returns live state
+  ├── Agent can call session_search → FTS5 + Gemini Flash summarization
+  ├── Agent can load skills → reads SKILL.md from disk
+  └── External provider prefetches context (if active)
+  │
+  ▼
+Session End
+  │
+  ├── All memory writes already on disk (immediate persistence)
+  ├── Session transcript saved to SQLite (messages + FTS5 index)
+  ├── External provider extracts final memories (if supported)
+  └── Skill updates persisted (if any were patched)
+```
+
+## Privacy and Data Locality
+
+| Component | Location | Network |
+|-----------|----------|---------|
+| MEMORY.md / USER.md | `~/.hermes/memories/` | Local only |
+| Session DB | `~/.hermes/state.db` | Local only |
+| Skills | `~/.hermes/skills/` | Local only |
+| External provider | Provider-dependent | Provider API calls |
+
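+Tiers 1-3 are stored only on the local machine. Their content still reaches an LLM when it is used: memory entries are part of the system prompt, and session search sends matching transcripts to the auxiliary model (Gemini Flash) for summarization. External providers (Tier 4) send data to the configured provider by design. The agent logs all provider API calls in the session transcript for auditability.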
+
+## Configuration Reference
+
+```yaml
+# ~/.hermes/config.yaml
+memory:
+  memory_enabled: true          # Enable MEMORY.md
+  user_profile_enabled: true    # Enable USER.md
+  memory_char_limit: 2200       # MEMORY.md char limit (~800 tokens)
+  user_char_limit: 1375         # USER.md char limit (~500 tokens)
+  nudge_interval: 10            # Turns between memory nudge reminders
+  provider: null                # External provider name (null = disabled)
+```
+
+Environment variables (in `~/.hermes/.env`):
+- Provider-specific API keys (e.g., `HONCHO_API_KEY`, `MEM0_API_KEY`)
+
+## Troubleshooting
+
+### Memory not appearing in system prompt
+
+- Check `~/.hermes/memories/MEMORY.md` exists and has content
+- Verify `memory.memory_enabled: true` in config
+- Check for file lock issues caused by concurrent access to the memory files (WAL mode applies to `state.db`, not to MEMORY.md / USER.md)
+
+### Memory writes not taking effect
+
+- Writes reach disk immediately, but the system prompt keeps the snapshot taken at session start until the next session
+- Tool response shows live state — verify the write succeeded there
+- Start a new session to see the updated snapshot
+
+### Session search returns nothing
+
+- Verify `state.db` has sessions: `sqlite3 ~/.hermes/state.db "SELECT count(*) FROM sessions"`
+- Check FTS5 index: `sqlite3 ~/.hermes/state.db "SELECT count(*) FROM messages_fts"`
+- Ensure auxiliary LLM (Gemini Flash) is configured and reachable
+
+### Skills not loading
+
+- Check `~/.hermes/skills/` directory exists
+- Verify SKILL.md has valid frontmatter (name, description)
+- Skills load by name match — check the skill name matches what the agent expects
+
+### External provider errors
+
+- Check API key in `~/.hermes/.env`
+- Verify the provider package is installed with `pip show <provider-package>`; if it is missing, install it with `pip install <provider-package>`
+- Run `hermes memory status` for diagnostic info