diff --git a/README.md b/README.md
index ce094e953..7247a8ba6 100644
--- a/README.md
+++ b/README.md
@@ -154,7 +154,35 @@ hermes --toolsets "web,terminal"
 hermes --list-tools
 ```
 
-**Available toolsets:** `web`, `terminal`, `file`, `browser`, `vision`, `image_gen`, `moa`, `skills`, `tts`, `todo`, `cronjob`, and more.
+**Available toolsets:** `web`, `terminal`, `file`, `browser`, `vision`, `image_gen`, `moa`, `skills`, `tts`, `todo`, `memory`, `session_search`, `cronjob`, and more.
+
+### 🧠 Persistent Memory
+
+Bounded, curated memory that persists across sessions:
+
+- **MEMORY.md** — agent's personal notes (environment facts, conventions, things learned). ~800 token budget.
+- **USER.md** — user profile (preferences, communication style, expectations). ~500 token budget.
+
+Both are injected into the system prompt as a frozen snapshot at session start. The agent manages its own memory via the `memory` tool (add/replace/remove/read). Character limits keep memory focused — when full, the agent consolidates or replaces entries.
+
+Configure in `~/.hermes/config.yaml`:
+```yaml
+memory:
+  memory_enabled: true
+  user_profile_enabled: true
+  memory_char_limit: 2200   # ~800 tokens
+  user_char_limit: 1375     # ~500 tokens
+```
+
+### 🗄️ Session Store
+
+All CLI and messaging sessions are stored in a SQLite database (`~/.hermes/state.db`) with full-text search:
+
+- **Full message history** stored per-session with model config and system prompt snapshots
+- **FTS5 search** via the `session_search` tool -- search past conversations with Gemini Flash summarization
+- **Compression-triggered session splitting** -- when context is compressed, a new session is created linked to the parent, giving clean trajectories
+- **Source tagging** -- each session is tagged with its origin (cli, telegram, discord, etc.)
+- Batch runner and RL trajectories are NOT stored here (separate systems)
 
 ### 🔊 Text-to-Speech
 
diff --git a/TODO.md b/TODO.md
index 61b7064e5..8cee076e8 100644
--- a/TODO.md
+++ b/TODO.md
@@ -6,12 +6,34 @@
 
 **43+ tools** across 13 toolsets: web (search, extract), terminal + process management, file ops (read, write, patch, search), vision, MoA reasoning, image gen, browser (10 tools via Browserbase), skills (41 skills), **todo (task planning)**, cronjobs, RL training (10 tools via Tinker-Atropos), TTS, cross-channel messaging.
 
+**Skills Hub**: search/install/inspect/audit/uninstall/publish/snapshot across 4 registries (GitHub, ClawHub, Claude Code marketplaces, LobeHub). Security scanner with trust-aware policy. CLI (`hermes skills ...`) and `/skills` slash command. agentskills.io spec compliant.
+
 **4 platform adapters**: Telegram, Discord, WhatsApp, Slack -- all with typing indicators, image/voice auto-analysis, dangerous command approval, interrupt support, background process watchers.
 
 **Other**: Context compression, context files (SOUL.md, AGENTS.md), session JSONL transcripts, batch runner with toolset distributions, 13 personalities, DM pairing auth, PTY mode, model metadata caching.
 
 ---
 
+## The Knowledge System (how Memory, Skills, Sessions, and Subagents interconnect)
+
+These four systems form a continuum of agent intelligence and should be designed together:
+
+**Types of agent knowledge:**
+- **Procedural memory (Skills)** -- reusable approaches for specific task types. "How to deploy a Docker container." "How to fine-tune with Axolotl." Created when the agent works through something difficult and succeeds.
+- **Declarative memory (MEMORY.md)** -- facts about the environment, projects, tools, conventions. "This repo uses Poetry, not pip." "The API key is stored in ~/.config/keys."
+- **Identity memory (USER.md / memory_summary.md)** -- who the user is, how they like to work, their preferences, communication style. Persists across all sessions.
+- **Error memory (Learnings)** -- what went wrong and the proven fix. "pip install fails on this system because of X; use conda instead."
+- **Episodic memory (Session summaries)** -- what happened in past sessions. Searchable for when the agent needs to recall prior conversations.
+
+**The feedback loop:** After complex tasks, especially ones involving difficulty or iteration, the agent should:
+1. Ask the user for feedback: "How was that? Did it work out?"
+2. If successful, offer to save: "Would you like me to save that as a skill for next time?"
+3. Update general memory with any durable insights (user preferences, environment facts, lessons learned)
+
+**Storage evolution:** Start with flat files (Phase 1), migrate to SQLite (Phase 2) when the volume of sessions and memories makes file-based storage unwieldy.
+
+---
+
 ## 1. Subagent Architecture (Context Isolation) 🎯
 
 **Status:** Not started
@@ -27,7 +49,7 @@ The main agent becomes an orchestrator that delegates context-heavy tasks to sub
 
 **What other agents do:**
 - **OpenClaw**: `sessions_spawn` + `subagents` tool with list/kill/steer actions, depth limits, rate limiting. Cross-session agent-to-agent coordination via `sessions_send`.
-- **Codex**: `spawn_agent` / `send_input` / `close_agent` / `wait_for_agent` with configurable timeouts. Thread manager for concurrent agents.
+- **Codex**: `spawn_agent` / `send_input` / `close_agent` / `wait_for_agent` with configurable timeouts. Thread manager for concurrent agents. Also uses subagents for memory consolidation (Phase 2 spawns a dedicated consolidation agent).
 - **Cline**: Up to 5 parallel subagents per invocation. Subagents get restricted tool access (read, list, search, bash, skill, attempt). Progress tracking with stats (tool calls, tokens, cost, context usage).
 - **OpenCode**: `TaskTool` creates subagent sessions with permission inheritance. Resumable tasks via `task_id`. Parent-child session relationships.
 
@@ -36,182 +58,155 @@ The main agent becomes an orchestrator that delegates context-heavy tasks to sub
 - Subagent gets: goal, context excerpt, restricted toolset, fresh conversation
 - Returns: summary string (success/failure + key findings + any file paths created)
 - Track active subagents so parent can reference them; limit concurrency to 3
+- Primary use cases: parallelizing distinct work (research two topics, work on two separate code changes), handling context-heavy tasks that would bloat the parent's context
 - Later: add `send_input` for interactive subagent steering (Codex-style)
 - Later: cross-session coordination for gateway (OpenClaw-style `sessions_send`)
 
 ---
 
-## 2. Planning & Task Management 📋 ✅
-
-**Status:** Implemented
-**Priority:** High -- every serious agent has this now
-
-A `todo` tool the agent uses to decompose complex tasks, track progress, and recover from failures. Must be **cache-friendly** -- no system prompt mutation, no injected messages that invalidate the KV cache prefix.
-
-**What other agents do:**
-- **Cursor (Claude)**: `TodoWrite` tool. Flat list, 4 states, `merge` flag. Todos injected into context every turn. Effective but **cache-hostile** -- mutating the system prompt on every turn invalidates the entire prefix cache for all prior tokens.
-- **OpenCode**: `todowrite` / `todoread` as separate tools. State lives server-side. Agent reads it back when needed. **Cache-friendly** -- no context mutation; the todo state appears only as tool call/response pairs in the normal conversation flow.
-- **Cline**: `focus_chain` tool for task management with sequential focus
-- **OpenClaw**: `/mesh <goal>` command auto-plans and runs multi-step workflows
-
-### The caching problem
-
-With OpenRouter / vLLM / any prefix-caching provider, the prompt is cached from the start of the sequence. If we inject a changing todo list into the system prompt (or anywhere before the most recent messages), we invalidate the cache for the entire conversation. On a 100K-token conversation, this means re-processing the full prefix on every turn instead of just the new tokens. That's a massive cost and latency penalty.
-
-**Rule: Never modify anything before the latest assistant/user turn.** Anything we add must go at the *end* of the conversation (as a tool response or appended to the latest message), or live entirely inside tool call/response pairs that the agent initiated.
-
-### Design: Pure tool-based, no context mutation
-
-**Two tools: `todo_write` and `todo_read`**
-
-`todo_write` -- create or update the task list:
-```
-Parameters:
-  todos: [{id, content, status}]   # the items to write
-  merge: bool (default false)      # false = replace list, true = update by id
-```
-Returns: the full current todo list (so the agent immediately sees the result)
-
-`todo_read` -- retrieve the current task list:
-```
-Parameters: (none)
-```
-Returns: the full current todo list, plus metadata (items completed, items remaining, tool calls since last update)
-
-This is how OpenCode does it and it's the right call. The agent's own tool call history is the "memory" -- the `todo_write` response containing the full list is right there in the conversation. No injection needed.
-
-**Item schema:**
-```
-{
-  "id": "1",           # unique string identifier
-  "content": "...",    # description of the task
-  "status": "pending"  # one of: pending, in_progress, completed, cancelled
-}
-```
-
-**No priority field.** Order in the list is the priority.
-
-**Only 4 states.** No "failed" or "blocked" -- if something fails, cancel it and add a revised item.
-
-### Where the behavior rules live: the tool description
-
-Instead of adding instructions to the system prompt, we put all the usage guidance directly in the tool description. This is part of the tool schema, which is sent once at the start and **is cached perfectly** -- it never changes mid-conversation.
-
-The `todo_write` tool description teaches the agent everything:
-
-```
-Manage your task list for the current session. Use this to plan and track
-multi-step work.
-
-When to use: Complex tasks with 3+ steps, when the user gives multiple tasks,
-or when you need to plan before acting. Skip for simple single-step requests.
-
-Behavior:
-- Set merge=false to create a fresh plan (replaces existing todos)
-- Set merge=true to update status or add follow-up items
-- Mark the first item in_progress immediately and start working on it
-- Only keep ONE item in_progress at a time
-- Mark items completed as soon as you finish them
-- If something fails, mark it cancelled and add a revised item
-- Don't add "test" or "verify" items unless the user asks for them
-- Update todos silently alongside your other work
-
-Returns the full current list after every write.
-```
-
-This is ~150 tokens in the tool schema. It's static, cached, and teaches the agent the same behavioral rules without touching the system prompt.
-
-### How it survives context compression
-
-The state lives server-side on the AIAgent instance (in-memory dict). The conversation history just has the tool call/response pairs as a trail of what happened.
-
-**On context compression, re-inject the current todo state.** Compression already invalidates the cache (the middle of the conversation is being rewritten anyway), so there's zero additional cache cost to appending the todo state once at that moment. When `run_agent.py` runs compression, if `_todo_state` is non-empty, it appends a synthetic tool response at the end of the compressed history:
-
-```
-[System: Your current task list was preserved across context compression]
-- [x] 1. Set up project structure (completed)
-- [>] 2. Implement the search endpoint (in_progress)
-- [ ] 3. Add error handling (pending)
-```
-
-This is the one place we do inject -- but only on compression events, which are rare (once every ~50-100 turns). The cache was already blown by compression itself, so it costs nothing extra.
-
-The agent does NOT need to "know" to call `todo_read` after compression -- it just sees its plan right there in the history. `todo_read` still exists as a tool for any time the agent wants to double-check, but it's not load-bearing for the compression case.
-
-### Progress checkpoints (cache-friendly approach)
-
-Instead of injecting system messages (which mutate the context), we **piggyback on existing tool responses.** When any tool returns its result, and the tool call counter has crossed a threshold, we append a one-line note to that tool's response:
-
-```
-[10+ tool calls since last todo update -- consider calling todo_read to review your plan]
-```
-
-This is a tiny addition to a response that's already at the end of the conversation -- zero cache impact on previous turns. It costs ~20 tokens and only appears once per threshold crossing.
-
-Similarly, on tool errors, we can append to the error response:
-```
-[This error may affect your plan -- call todo_read to review]
-```
-
-These hints go in the tool response, not in a new system message. The agent processes them naturally as part of reading the tool output.
-
-### Server-side state
-
-```python
-# On AIAgent class
-_todo_state: Dict[str, List[dict]]  # session_key -> list of todo items
-_todo_tool_calls_since_update: int  # counter for checkpoint nudges
-```
-
-- `todo_write` updates `_todo_state` and resets the counter
-- `todo_read` reads `_todo_state` and resets the counter
-- Every `handle_function_call` increments the counter
-- When counter > threshold (default 10), the next tool response gets the checkpoint hint appended, then the hint flag is cleared until the next threshold crossing
-
-### Summary: what we took from Cursor, what we changed
-
-| Aspect | Cursor's approach | Our approach |
-|--------|------------------|--------------|
-| State visibility | Injected into context every turn | Tool call/response pairs in normal conversation flow |
-| Behavioral rules | System prompt instructions | Tool description (static, cached) |
-| Checkpoints | Injected system messages | One-line hint appended to tool responses |
-| Cache impact | High (prefix invalidated every turn) | Near-zero (only re-injects on compression, which already blows cache) |
-| State persistence | Context injection survives compression | Server-side dict; re-injected once on compression events |
-| Core UX | Identical | Identical (same flat list, same 4 states, same merge semantics) |
-
-**Files:** `tools/todo_tool.py` (tool implementation), integration in `run_agent.py` (state dict + checkpoint counter)
-
----
-
-## 3. Dynamic Skills Expansion 📚
-
-**Status:** IMPLEMENTED — Skills Hub with search/install/publish/snapshot from 4 registries
-**Priority:** ~~Medium~~ Done
-
-Skills Hub implemented: search/install/inspect/audit/uninstall/publish/snapshot across GitHub repos, ClawHub, Claude Code marketplaces, and LobeHub. Security scanner with trust-aware policy (builtin/trusted/community). CLI (`hermes skills ...`) and `/skills` slash command. agentskills.io spec compliant.
-
-**What other agents do:**
-- **OpenClaw**: ClawHub registry -- bundled, managed, and workspace skills with install gating. Agent can auto-search and pull skills from a remote hub.
-- **OpenCode**: SKILL.md format with URL download support via `Discovery.pull`. Compatible with Claude Code's skill format.
-- **Pi**: Skills as npm-publishable packages. Prompt templates and themes alongside skills.
-- **Cline**: Global + project-level skill directories. `use_skill` tool + `new_rule` tool for creating rules.
-
-**Our approach:**
-- Add `skill_create`, `skill_edit`, `skill_delete` actions to the existing `skill_view` tool (or a new `skill_manage` tool)
-- New skills saved to `~/.hermes/skills/` (user-created, separate from bundled)
-- SKILL.md format stays the same (YAML frontmatter + markdown body)
-- Skill acquisition: after a successful multi-step task, offer to save the approach as a new skill (agent-initiated, user-confirmed)
-- Later: skill chaining with dependency graphs, parameterized templates
-- Later: remote skill registry (like ClawHub) for community-shared skills
-
----
-
-## 4. Interactive Clarifying Questions ❓
+## 2. Agent-Managed Skills (Create / Edit / Delete) 📚
 
 **Status:** Not started
-**Priority:** Medium
+**Priority:** High -- skills are the agent's procedural memory
 
-Allow the agent to present structured choices to the user when it needs clarification. Rich terminal UI in CLI mode, graceful fallback on messaging platforms.
+The Skills Hub (search/install/publish from 4 registries) is done. What's missing is the agent's ability to **create, edit, and delete its own skills** -- turning successful approaches into reusable task-specific knowledge.
+
+**Skills are a form of memory.** General memory (user profile, environment facts, preferences) is broad and declarative. Skills are narrow and procedural -- they capture *how to do a specific type of task* based on proven experience. Together they form the agent's knowledge system.
+
+**What other agents do:**
+- **Cline**: `new_rule` tool for creating rules from context. Global + project-level skill directories.
+- **OpenClaw**: Workspace skills that agents can write into during sessions.
+- **Codex**: Phase 2 consolidation agent automatically creates skills from recurring patterns it detects across rollout summaries. Skills include: triggers, inputs, procedure steps, efficiency plans, pitfalls, verification checklists, and optional scripts/templates/examples.
+
+### Architecture
+
+**New `skill_manage` tool** (separate from read-only `skill_view`):
+- Actions: `create`, `edit`, `delete`
+- User skills stored in `~/.hermes/skills/<name>/SKILL.md` (separate from bundled `skills/` in repo)
+- Same SKILL.md format (YAML frontmatter + markdown body), validated on write
+- Agent sees `source` field (`"bundled"`, `"hub"`, `"user"`) on every skill -- can only edit/delete `"user"` skills
+
+**Discovery merge**: `_find_all_skills()` and `skill_view()` search both `SKILLS_DIR` (bundled) and `USER_SKILLS_DIR` (`~/.hermes/skills/`).
+
+### Proactive skill creation (the feedback loop)
+
+The agent shouldn't just create skills when asked. It should **recognize when a skill is worth creating** and offer proactively. The behavior is taught via the tool description (cache-friendly, the same pattern as the todo tool):
+
+**When to trigger:**
+- The task involved 5+ tool calls with back-and-forth or iteration
+- The agent hit errors/obstacles and recovered successfully
+- The user corrected the agent's approach and the corrected version worked
+- The task type is likely to recur (deployment, data processing, config setup, etc.)
+
+**The interaction pattern:**
+1. After completing a difficult task successfully, the agent asks: *"That took some figuring out. How did the result turn out for you?"*
+2. If the user confirms success, the agent offers: *"Would you like me to save that approach as a skill so I can do it faster next time?"*
+3. If yes, the agent creates a skill capturing: the trigger conditions, the working procedure, the pitfalls encountered, and the verification steps
+
+This pattern doesn't require the `clarify` tool (#3) -- it works as normal conversational text. But `clarify` would make it cleaner on messaging platforms with structured choices.
+
+### Implementation steps
+
+**Step 1: Update discovery** (`tools/skills_tool.py`)
+- Add `USER_SKILLS_DIR = Path.home() / ".hermes" / "skills"`
+- Update `_find_all_skills()` to scan both dirs, tag each skill with `source`
+- Update `skill_view()` to search `USER_SKILLS_DIR` as a fallback
+
+**Step 2: Validation helper** (`tools/skills_tool.py`)
+- `_validate_skill_frontmatter()` -- enforce `name` (required, ≤64 chars, filesystem-safe), `description` (required, ≤1024 chars), valid YAML, non-empty body
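+
+A minimal sketch of the checks listed above, assuming the YAML frontmatter has already been parsed into a dict (parsing is left out). The public function name and the `_SAFE_NAME` pattern are illustrative assumptions; the plan does not define what "filesystem-safe" means.
+
+```python
+import re
+
+# Hypothetical definition of "filesystem-safe": lowercase letters, digits,
+# dots, underscores, and hyphens, starting with a letter or digit.
+_SAFE_NAME = re.compile(r"^[a-z0-9][a-z0-9._-]*$")
+
+
+def validate_skill_frontmatter(meta: dict, body: str) -> list[str]:
+    """Return a list of problems; an empty list means the skill is valid."""
+    errors = []
+    name = meta.get("name")
+    if not isinstance(name, str) or not name:
+        errors.append("name is required")
+    else:
+        if len(name) > 64:
+            errors.append("name must be at most 64 characters")
+        if not _SAFE_NAME.match(name):
+            errors.append("name must be filesystem-safe")
+    description = meta.get("description")
+    if not isinstance(description, str) or not description.strip():
+        errors.append("description is required")
+    elif len(description) > 1024:
+        errors.append("description must be at most 1024 characters")
+    if not body.strip():
+        errors.append("body must not be empty")
+    return errors
+```
+
+Returning a list of messages rather than raising on the first failure lets the tool report every problem to the agent in one response.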
+
+**Step 3: `skill_manage()` function** (`tools/skills_tool.py`)
+```
+skill_manage(action, name, description=None, content=None, tags=None)
+```
+- `create`: write `~/.hermes/skills/<name>/SKILL.md`; fail if the name collides with an existing skill from any source
+- `edit`: read existing user skill, merge updates, write back (refuse bundled/hub skills)
+- `delete`: remove user skill directory (refuse bundled/hub skills)
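+
+The three actions could be sketched roughly as follows. This is an illustration under stated assumptions, not the planned implementation: `user_dir` and `other_names` are hypothetical parameters standing in for `USER_SKILLS_DIR` and the names of bundled/hub skills, `edit` overwrites the file instead of merging updates, and frontmatter validation is omitted.
+
+```python
+import shutil
+from pathlib import Path
+
+
+def skill_manage(action, name, content=None, *, user_dir, other_names=()):
+    """Create, edit, or delete a user skill stored at user_dir/<name>/SKILL.md."""
+    if action not in ("create", "edit", "delete"):
+        return {"ok": False, "error": f"unknown action '{action}'"}
+    skill_dir = Path(user_dir) / name
+    skill_file = skill_dir / "SKILL.md"
+    if action == "create":
+        # Refuse collisions with user skills and with bundled/hub skills.
+        if name in other_names or skill_file.exists():
+            return {"ok": False, "error": f"skill '{name}' already exists"}
+        skill_dir.mkdir(parents=True, exist_ok=True)
+        skill_file.write_text(content or "", encoding="utf-8")
+        return {"ok": True, "path": str(skill_file)}
+    if not skill_file.exists():
+        # Bundled and hub skills never live in user_dir, so they are refused here.
+        return {"ok": False, "error": f"'{name}' is not a user skill"}
+    if action == "edit":
+        skill_file.write_text(content or "", encoding="utf-8")
+    else:
+        shutil.rmtree(skill_dir)
+    return {"ok": True, "path": str(skill_file)}
+```
+
+Keying edit/delete permission on where the file lives means the sketch does not need a separate `source` lookup to protect bundled and hub skills.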
+
+**Step 4: Register tool** (`model_tools.py`)
+- Tool definition with description teaching when/how to create skills AND the proactive feedback loop behavior
+- Route in `handle_skills_function_call()`
+- Add to toolset mappings and `TOOLSET_REQUIREMENTS`
+
+**Step 5: CLI commands** (`hermes_cli/skills_hub.py` + `hermes_cli/main.py`)
+- `hermes skills create <name>` -- interactive or `--from-file`
+- `hermes skills edit <name>` -- opens `$EDITOR` or accepts flags
+- `hermes skills delete <name>` -- with confirmation
+- Update `hermes skills list` to show `[user]`/`[bundled]`/`[hub]` tags (matching the `source` field)
+
+**Step 6: Slash command** -- extend `/skills` handler with `create`/`edit`/`delete`
+
+### Tool description (teaches the LLM when and how to use skills)
+```
+Create, edit, or delete user-managed skills. Skills are your procedural memory --
+they capture proven approaches for specific task types so you can do them faster
+and better next time.
+
+Actions:
+- create: Save a new skill to ~/.hermes/skills/. Provide name, description, content.
+- edit: Update an existing user skill. Only works on source="user" skills.
+- delete: Remove a user skill. Only works on source="user" skills.
+
+═══ WHEN TO CREATE A SKILL ═══
+
+Create a skill when ALL of these are true:
+1. The task type is likely to recur (not a one-off)
+2. The approach was non-obvious or required iteration to get right
+3. A future attempt would benefit from having the steps written down
+
+Common triggers:
+- You completed a complex task (5+ tool calls) and it succeeded
+- You hit errors or obstacles during the task and found the fix
+- The user corrected your approach and the corrected version worked
+- You discovered a non-trivial workflow (deployment, data pipeline, config, etc.)
+- The user explicitly asks you to remember how to do something
+
+═══ THE FEEDBACK LOOP ═══
+
+After completing a task that was difficult or iterative (errors encountered,
+multiple attempts, user corrections, or 5+ tool calls with back-and-forth):
+
+1. Ask the user for feedback: "That took some working through. How did the
+   result turn out?"
+2. If they confirm success: "Would you like me to save that approach as a
+   skill so I can do it faster next time?"
+3. If yes: create a skill with the working procedure, including:
+   - When to use this skill (trigger conditions)
+   - The steps that worked (numbered procedure)
+   - Pitfalls encountered and how to avoid them
+   - How to verify success
+
+Do NOT trigger this feedback loop for:
+- Simple tasks (single tool call, obvious approach)
+- Tasks where the user seems impatient or in a hurry
+- Tasks that are clearly one-off (e.g., "what time is it in Tokyo")
+
+═══ SKILL QUALITY ═══
+
+A good skill is specific and actionable, not generic advice. It should contain:
+- Concrete trigger conditions (when does this skill apply?)
+- Exact commands, file paths, or API calls that worked
+- Known failure modes and their fixes
+- Verification steps (how to confirm it worked)
+
+Always confirm with the user before creating a skill.
+```
+
+### Later
+- Skill chaining with dependency graphs, parameterized templates
+- Publishing user skills to remote registries (already have `hermes skills publish`)
+- Periodic skill review: subagent scans session summaries for recurring patterns that should become skills (Codex-style Phase 2 consolidation)
+
+**Files:** `tools/skills_tool.py` (core logic), `model_tools.py` (registration), `hermes_cli/skills_hub.py` + `hermes_cli/main.py` (CLI)
+
+---
+
+## 3. Interactive Clarifying Questions ❓
+
+**Status:** Not started
+**Priority:** Medium-High -- enables the knowledge system feedback loop
+
+Allow the agent to present structured choices to the user when it needs clarification or feedback. Rich terminal UI in CLI mode, graceful fallback on messaging platforms.
 
 **What other agents do:**
 - **Codex**: `request_user_input` tool for open-ended questions
@@ -223,70 +218,271 @@ Allow the agent to present structured choices to the user when it needs clarific
 - CLI mode: Rich-powered selection UI (arrow keys + number shortcuts)
 - Gateway/messaging mode: numbered list with "reply with number or type your answer"
 - Returns the user's selection as a string
-- Agent can use this before starting expensive operations ("Which approach do you prefer?")
+
+**Use cases (beyond simple clarification):**
+- Before starting expensive operations: "Which approach do you prefer?"
+- **Post-task feedback**: "How did that work out?" with choices like [Worked perfectly / Mostly good / Had issues / Didn't work]
+- **Skill creation offer**: "Want me to save that approach as a skill?" with [Yes / Yes, but let me review it first / No]
+- **Memory update prompt**: "I noticed you prefer X. Should I remember that for future sessions?" with [Yes / No / It depends]
+
+This tool is lightweight on its own but becomes critical for the proactive feedback loop in the knowledge system (skills, memory, learnings).
 
 **File:** `tools/clarify_tool.py` -- presentation layer differs per platform, core logic is simple
 
 ---
 
-## 5. Memory System 🧠
+## 4. Memory System 🧠
 
 **Status:** Not started
-**Priority:** High -- biggest gap vs. OpenClaw and Dash
+**Priority:** High -- biggest gap vs. OpenClaw, Dash, and Codex
 
-Persistent memory that survives across sessions. The agent remembers what it learned, who the user is, and what worked before.
+Persistent memory that survives across sessions. The agent remembers what it learned, who the user is, and what worked before. Memory is the general/declarative counterpart to skills (procedural memory) -- together they form the agent's knowledge system.
 
 **What other agents do:**
-- **OpenClaw**: 78+ file memory subsystem. SQLite + sqlite-vec for vector search. Embeddings via OpenAI/Voyage/Gemini. Hybrid search (vector + keyword). MMR for diversity. Temporal decay. File watcher for auto-indexing. Session-scoped memory with citations.
-- **Dash**: 6-layer context system. LearningMachine with agentic mode -- errors get diagnosed, fixed, and saved as learnings that are never repeated. Business rules injection.
-- **Codex**: 2-phase memory pipeline -- extract memories from rollouts with a dedicated model, then consolidate with global locking.
+- **OpenClaw**: SQLite + sqlite-vec for vector search. LRU eviction on embedding cache. Temporal decay on search results (half-life 30 days). Pre-compaction memory flush (model writes durable notes before context eviction).
+- **Dash**: 6-layer context system. LearningMachine with agentic mode -- errors get diagnosed, fixed, and saved as learnings that are never repeated.
+- **Codex**: 2-phase pipeline. Phase 1: dedicated model extracts raw memories from past sessions. Phase 2: consolidation subagent produces `memory_summary.md` (always in prompt), `MEMORY.md` (searchable), `skills/`. Retention cap: keep N most recent, drop rest.
 
-**Our approach (phased):**
+### Our architecture: bounded, curated, always-visible
 
-### Phase 1: File-based memory (MVP)
-- `~/.hermes/MEMORY.md` -- curated long-term memory, agent can read/append/edit
-- `~/.hermes/USER.md` -- user profile the agent maintains (preferences, context, projects, communication style)
-- `memory` tool with actions: `read`, `append`, `search` (simple text search within the file)
-- Both files injected into system prompt (or summarized if too large)
-- Agent prompted to update memory at session end or before context compression
+Two small files, both injected into the system prompt every session. The agent always sees its full memory, so it can make informed decisions about what to update or consolidate without extra reads.
 
-### Phase 2: Learning store (inspired by Dash)
-- `~/.hermes/learnings.jsonl` -- structured error patterns and discovered fixes
-- `save_learning` + `search_learnings` tool actions
-- Each learning: `{pattern, fix, context, tags, created_at, times_used}`
-- Before executing risky operations, agent auto-searches learnings for known pitfalls
-- Learning deduplication and consolidation over time
+**`~/.hermes/memories/MEMORY.md`** -- agent's personal notes and observations (2,200 character limit, ~800 tokens)
+- Environment facts, project conventions, tool quirks, things that took effort to figure out
+- One entry per line: `conda preferred over pip on this machine`
 
-### Phase 3: Semantic search (later)
-- SQLite + sqlite-vec (or ChromaDB) for vector storage
-- Embed session transcripts, memory entries, and learnings
-- Hybrid search: keyword (ripgrep) + vector (embeddings) + temporal decay
-- Auto-index new sessions on write
-- This is the "full OpenClaw" level -- significant infrastructure
+**`~/.hermes/memories/USER.md`** -- what the agent knows about the user (1,375 character limit, ~500 tokens)
+- Preferences, communication style, expectations, workflow habits, corrections
+- One entry per line: `Prefers plans before implementation`
+
+Limits are expressed in characters rather than tokens because character counts are model-independent -- users can switch models without breaking budgets. Conversion: ~2.75 characters per token. Dates are stored internally for eviction tiebreaking but not shown in the system prompt.
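+
+The conversion can be checked with a few lines of arithmetic. The helper names below are illustrative, and rounding to the nearest whole number is an assumption; the plan only gives the approximate ratio.
+
+```python
+CHARS_PER_TOKEN = 2.75  # approximate ratio stated in this plan
+
+
+def approx_tokens(char_limit: int) -> int:
+    """Estimate the token budget implied by a character limit."""
+    return round(char_limit / CHARS_PER_TOKEN)
+
+
+def usage_percent(used_chars: int, char_limit: int) -> int:
+    """Usage as a whole percentage of the character limit."""
+    return round(100 * used_chars / char_limit)
+
+
+print(approx_tokens(2200), approx_tokens(1375))  # 800 500
+```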
+
+### System prompt injection
+
+Both files are injected into the system prompt with clear separators:
+
+```
+══════════════════════════════════════════════
+MEMORY (your personal notes) [68% — 1,496/2,200 chars]
+══════════════════════════════════════════════
+User's name is Teknium, founder of Nous Research
+§
+This machine runs Ubuntu, conda preferred over pip
+§
+When working on Hermes-Agent, always test with run_agent.py before gateway
+§
+User strongly prefers batch approach for RL training over online
+
+══════════════════════════════════════════════
+USER PROFILE (who the user is) [72% — 990/1,375 chars]
+══════════════════════════════════════════════
+Prefers detailed technical discussions, no hand-holding
+§
+Communication style: direct, concise, expects expertise
+§
+Likes to discuss plans before implementation, wants to approve approach first
+§
+Dislikes unnecessary emoji or filler language
+```
+
+Entries are separated by `§` (section sign). The model references entries by quoting their content via the `old_text` parameter for replace/remove operations -- content-based matching, not position-based.
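+
+Content-based matching could look something like this. It is a sketch only: the function names are hypothetical, and it assumes entries never contain the `§` character themselves.
+
+```python
+SEPARATOR = "\n§\n"
+
+
+def parse_entries(text: str) -> list[str]:
+    """Split a memory file into entries, dropping empty fragments."""
+    return [part.strip() for part in text.split("§") if part.strip()]
+
+
+def find_entry(entries: list[str], old_text: str) -> int:
+    """Return the index of the single entry containing old_text."""
+    matches = [i for i, entry in enumerate(entries) if old_text in entry]
+    if len(matches) != 1:
+        raise ValueError(f"expected exactly 1 match for {old_text!r}, found {len(matches)}")
+    return matches[0]
+
+
+def replace_entry(text: str, old_text: str, new_content: str) -> str:
+    entries = parse_entries(text)
+    entries[find_entry(entries, old_text)] = new_content.strip()
+    return SEPARATOR.join(entries)
+
+
+def remove_entry(text: str, old_text: str) -> str:
+    entries = parse_entries(text)
+    del entries[find_entry(entries, old_text)]
+    return SEPARATOR.join(entries)
+```
+
+Raising on zero or multiple matches keeps an ambiguous snippet from silently rewriting the wrong entry.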
+
+Injected after SOUL.md/AGENTS.md but before skills. Only injected when the respective memory is enabled.
+
+### Bounded memory: how pruning works
+
+**The model does the pruning, but it's cheap because it already sees everything.** Since both files are in the system prompt, the model always knows exactly what's in memory. When at the character limit:
+
+1. Model calls `memory(action="add", target="memory", content="new thing")`
+2. Tool checks: would this exceed the 2,200 char limit?
+3. If yes: tool returns an error with the current usage (e.g., "2,150/2,200 chars used — consolidate or replace entries first")
+4. Model (which already sees all entries + usage % in its system prompt) decides what to consolidate or replace
+5. Model calls `replace` or `remove`, then retries the `add`
+
+This costs one extra tool call when at the limit, but the model makes an **informed decision** rather than blindly evicting the oldest entry. The user's name won't get evicted because the model knows it's important. Stale or redundant entries get consolidated.
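+
+Steps 2 and 3 of the flow above might be sketched like this. The function name, the return shape, and counting the `§` separators toward the limit are all assumptions made for illustration.
+
+```python
+SEPARATOR = "\n§\n"
+
+
+def add_entry(entries: list[str], content: str, char_limit: int = 2200) -> dict:
+    """Append content unless the store would then exceed char_limit."""
+    new_entry = content.strip()
+    used_after = len(SEPARATOR.join(entries + [new_entry]))
+    if used_after > char_limit:
+        used_now = len(SEPARATOR.join(entries))
+        return {
+            "ok": False,
+            "error": f"{used_now:,}/{char_limit:,} chars used; "
+                     "consolidate or replace entries first",
+        }
+    entries.append(new_entry)
+    return {"ok": True, "usage": f"{used_after:,}/{char_limit:,}"}
+```
+
+The rejected call leaves `entries` untouched, so the model can `replace` or `remove` and then retry the same `add`.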
+
+**Why not auto-evict oldest?** Because some early memories are the most important (user's name, core preferences, critical environment facts). The model, which sees everything, is the right judge of what to prune.
+
+### The `memory` tool
+
+**Actions:**
+- `add(target, content)` -- append a new dated entry. Fails with guidance if the entry would push the store over its character limit.
+- `replace(target, old_text, new_content)` -- find entry containing `old_text`, replace it. For updates and consolidation.
+- `remove(target, old_text)` -- remove the entry containing `old_text`.
+- `read(target)` -- return current contents. Useful after context compression, when the system prompt snapshot may be stale.
+
+**Tool description (teaches the model everything):**
+```
+Manage your persistent memory. You have two memory stores, both visible in
+your system prompt every session:
+
+MEMORY — your personal notes and observations (2,200 character limit)
+  Things worth remembering: environment facts, project conventions, tool quirks,
+  things that took effort to figure out, recurring patterns.
+
+USER PROFILE — what you know about the user (1,375 character limit)
+  Preferences, communication style, expectations, workflow habits, corrections
+  they've given you.
+
+Actions:
+  add(target, content)                   — append a new entry
+  replace(target, old_text, new_content) — find entry matching old_text,
+                                           replace it
+  remove(target, old_text)               — remove entry matching old_text
+  read(target)                           — return current live contents
+
+For replace/remove, old_text is a short unique substring that identifies the
+target entry — just a few words, not the full text. If your snippet matches
+multiple entries, you'll get an error showing the matches so you can retry
+with something more specific.
+
+Rules:
+- You can always see your current memories in the system prompt, along with
+  a usage indicator showing how close you are to the limit (e.g. [68% — 1,496/2,200 chars]).
+- Each entry is one line.
+- When usage is high (>80%), consolidate or replace entries before adding new
+  ones. Merge related entries. Remove stale or redundant info.
+- Prefer REPLACING over REMOVING — update with better info rather than delete
+  and re-add.
+- Do not store secrets, tokens, or passwords.
+- Only store things that will meaningfully help you in future sessions.
+
+When to write memories:
+- You learned something non-obvious about the environment or a project
+- The user corrected you or expressed a preference
+- You discovered a tool quirk, workaround, or convention
+- The user explicitly asks you to remember something
+- You completed something difficult (consider a skill instead if it's a full
+  reusable procedure)
+
+When NOT to write memories:
+- Trivial or one-off facts that won't matter next session
+- Things already captured in a skill
+- Things you can easily re-discover (file contents, command outputs)
+```
+
+### Config
+
+```yaml
+memory:
+  memory_enabled: true          # MEMORY.md - agent's personal notes
+  user_profile_enabled: true    # USER.md - user preferences/identity
+  memory_char_limit: 2200       # ~800 tokens at 2.75 chars/token
+  user_char_limit: 1375         # ~500 tokens at 2.75 chars/token
+```
+
+`memory_enabled` and `user_profile_enabled` both default to `false` in batch_runner and RL environments (checked programmatically). Both are configurable per environment.
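The character limits follow from the 2.75 chars/token estimate in the config comments. A small sketch of that arithmetic, plus one way the defaults could be resolved: `resolve_memory_config` and `approx_tokens` are hypothetical helpers, and `skip_memory` mirrors the flag that `batch_runner.py` passes in this diff.

```python
CHARS_PER_TOKEN = 2.75  # estimate taken from the config comments

DEFAULTS = {
    "memory_enabled": True,
    "user_profile_enabled": True,
    "memory_char_limit": 2200,
    "user_char_limit": 1375,
}


def approx_tokens(char_limit, chars_per_token=CHARS_PER_TOKEN):
    """Convert a character budget into an approximate token budget."""
    return round(char_limit / chars_per_token)


def resolve_memory_config(user_config=None, skip_memory=False):
    """Merge user overrides onto defaults; skip_memory disables both stores."""
    config = {**DEFAULTS, **(user_config or {})}
    if skip_memory:
        config["memory_enabled"] = False
        config["user_profile_enabled"] = False
    return config
```

With these numbers, `approx_tokens(2200)` gives 800 and `approx_tokens(1375)` gives 500, matching the stated budgets.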
+
+### Long-term recall (session search)
+
+The bounded memory is the curated layer. For unbounded "long-term memory" -- searching past session transcripts -- see SQLite State Store (#5). A separate `session_search` tool provides full-text (FTS5) search over the full session history. Search results are never injected into the system prompt; the agent searches on demand.
+
+### Later (optional)
+- Periodic consolidation via cronjob/subagent: reviews recent session summaries, suggests memory updates. Needs subagents (#1).
+- Memory import/export: `hermes memory export` / `hermes memory import` for backup/migration.
+
+**Files:** `tools/memory_tool.py` (tool logic + file I/O), `model_tools.py` (registration), system prompt injection in `run_agent.py`
 
 ---
 
-## 6. Session Transcript Search 🔍
+## 5. SQLite State Store & Session Search 🔍
 
 **Status:** Not started
-**Priority:** Medium-High -- low-hanging fruit, very useful
+**Priority:** High -- foundational infrastructure for memory, search, and scale
 
-Search across past session transcripts to find previous conversations, solutions, and tool outputs.
+Replace the current JSONL-per-session file approach with a SQLite database. This is infrastructure that makes everything else work better at scale.
+
+**The problem with JSONL files:**
+- Currently: one `.jsonl` file per session in `~/.hermes/sessions/` and `logs/`
+- At 5-10 sessions per day across 4 platforms, that's roughly 1,800-3,650 files per year
+- Searching across sessions requires ripgrep over thousands of files (slow, no filtering)
+- No relational queries (e.g., "show me all sessions about Docker from last month")
+- No way to store structured metadata alongside transcripts (summaries, tags, memory references)
+- File system overhead: inode limits, directory listing performance, backup complexity
+
+**What Codex does:**
+- SQLite state database with tables for threads, stage1_outputs (extracted memories), and jobs (background processing queue with leases/heartbeats/watermarks)
+- All session metadata, memory extraction outputs, and job coordination in one DB
+- File system only used for human-readable artifacts (MEMORY.md, rollout_summaries/, skills/)
 
 **Our approach:**
-- **CLI command**: `hermes sessions search <query>` -- uses ripgrep over `~/.hermes/sessions/*.jsonl` and `logs/*.jsonl`
-- **Agent tool**: `session_search(query, role_filter, limit, offset)` -- same search, returns structured JSON
-- Ripgrep for speed with Python fallback for environments without rg
-- Filter by role (user/assistant/tool), date range, platform
-- Results: session_id, line number, role, content preview centered on match
-- Pagination for large result sets
-- Later: integrate with Phase 3 memory (vector search over transcripts)
 
-**Files:** `tools/session_search_tool.py`, `hermes_cli/sessions.py` (CLI command handler)
+### Schema: `~/.hermes/state.db`
+
+```sql
+-- Core session data
+CREATE TABLE sessions (
+    id TEXT PRIMARY KEY,
+    platform TEXT,           -- telegram, discord, whatsapp, slack, cli
+    user_id TEXT,
+    started_at INTEGER,
+    ended_at INTEGER,
+    summary TEXT,            -- agent-written session summary (nullable)
+    tags TEXT,               -- comma-separated tags
+    message_count INTEGER,
+    tool_call_count INTEGER
+);
+
+-- Full message history (replaces JSONL)
+CREATE TABLE messages (
+    id INTEGER PRIMARY KEY,
+    session_id TEXT REFERENCES sessions(id),
+    role TEXT,               -- user, assistant, tool, system
+    content TEXT,
+    tool_name TEXT,          -- nullable, for tool calls
+    timestamp INTEGER,
+    tokens_used INTEGER
+);
+
+-- FTS5 virtual table for fast text search across messages
+CREATE VIRTUAL TABLE messages_fts USING fts5(content, content='messages', content_rowid='id');
+
+-- Session summaries (written by agent at session end)
+CREATE TABLE session_summaries (
+    session_id TEXT PRIMARY KEY REFERENCES sessions(id),
+    summary TEXT,
+    keywords TEXT,
+    created_at INTEGER
+);
+
+-- Learnings from errors (item #15)
+CREATE TABLE learnings (
+    id INTEGER PRIMARY KEY,
+    pattern TEXT,
+    error_type TEXT,
+    fix TEXT,
+    context TEXT,
+    tags TEXT,
+    created_at INTEGER,
+    times_used INTEGER DEFAULT 0
+);
+```
+
+### Migration path from JSONL
+- New sessions go directly to SQLite
+- Existing JSONL files imported on first run (background migration)
+- `hermes migrate-sessions` CLI command for manual migration
+- Keep JSONL export as an option (`hermes sessions export <id>`)
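A rough sketch of what importing one legacy transcript could look like. It assumes each JSONL line is an object with `role` and `content` keys, which the plan does not specify, and it uses a trimmed-down version of the schema above. `import_jsonl_session` is a hypothetical helper, not the planned `hermes migrate-sessions` implementation.

```python
import json
import sqlite3

# Trimmed-down schema: only the columns this sketch needs.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, platform TEXT);
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY,
    session_id TEXT REFERENCES sessions(id),
    role TEXT,
    content TEXT
);
"""


def import_jsonl_session(conn, session_id, jsonl_lines, platform="cli"):
    """Insert one JSONL transcript into SQLite; returns messages imported."""
    conn.execute(
        "INSERT OR IGNORE INTO sessions (id, platform) VALUES (?, ?)",
        (session_id, platform),
    )
    count = 0
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in old files
        record = json.loads(line)
        content = record.get("content")
        if not isinstance(content, str):
            # Structured content is stored as JSON text (an assumption).
            content = json.dumps(content)
        conn.execute(
            "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
            (session_id, record.get("role"), content),
        )
        count += 1
    conn.commit()
    return count
```

`INSERT OR IGNORE` on the session row means the session itself is not duplicated on a rerun, but messages would be, so a real migration would also need to record which files were already imported.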
+
+### Session search (replaces the old plan)
+- **Agent tool**: `session_search(query, role_filter, date_range, platform, limit)` -- FTS5 search across messages table
+- **CLI command**: `hermes sessions search <query>` with filters
+- FTS5 gives us: ranking, phrase matching, boolean operators, prefix queries
+- Much faster than ripgrep over thousands of files
+- Filter by platform, date range, role -- impossible with flat files
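To illustrate the kind of query `session_search` could run, here is a minimal sketch using Python's built-in `sqlite3` module, assuming the local SQLite build includes FTS5. The trigger that keeps the index in sync and the `search_messages` signature are assumptions made for this example, not the planned tool interface.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory DB with a messages table and an FTS5 index."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE messages (
            id INTEGER PRIMARY KEY,
            session_id TEXT,
            role TEXT,
            content TEXT
        );
        CREATE VIRTUAL TABLE messages_fts USING fts5(
            content, content='messages', content_rowid='id'
        );
        -- External-content FTS tables are not updated automatically,
        -- so mirror each insert into the index.
        CREATE TRIGGER messages_ai AFTER INSERT ON messages BEGIN
            INSERT INTO messages_fts(rowid, content)
            VALUES (new.id, new.content);
        END;
    """)
    return conn


def search_messages(conn, query, role=None, limit=10):
    """Full-text search with an optional role filter, best matches first."""
    sql = (
        "SELECT m.session_id, m.role, m.content "
        "FROM messages_fts JOIN messages m ON m.id = messages_fts.rowid "
        "WHERE messages_fts MATCH ?"
    )
    params = [query]
    if role is not None:
        sql += " AND m.role = ?"
        params.append(role)
    sql += " ORDER BY bm25(messages_fts) LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()
```

The role filter is an ordinary SQL predicate on the joined table, which is the part flat files cannot offer. Platform and date filters would presumably join through `sessions` in the same way.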
+
+### Benefits for other systems
+- **Memory**: session summaries stored in `session_summaries` table, searchable by keyword
+- **Learnings**: structured storage with `times_used` counter and tag search
+- **Subagents**: parent-child session relationships trackable via foreign keys
+- **Analytics**: token usage over time, tool call frequency, session duration -- trivially queryable
+- **Cleanup**: `hermes sessions prune --older-than 90d` becomes a SQL DELETE
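As a sketch of the cleanup case, pruning reduces to two DELETE statements inside one transaction. This assumes `started_at` holds Unix epoch seconds, which the schema does not state. `prune_sessions` is a hypothetical helper, not the actual `hermes sessions prune` handler.

```python
import sqlite3
import time


def prune_sessions(conn, older_than_days, now=None):
    """Delete sessions (and their messages) older than the cutoff.

    Returns the number of sessions removed.
    """
    now = int(time.time()) if now is None else now
    cutoff = now - older_than_days * 86400
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "DELETE FROM messages WHERE session_id IN "
            "(SELECT id FROM sessions WHERE started_at < ?)",
            (cutoff,),
        )
        cursor = conn.execute(
            "DELETE FROM sessions WHERE started_at < ?", (cutoff,)
        )
    return cursor.rowcount
```

Messages are deleted before their sessions so that the subquery still sees the rows being pruned.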
+
+**Files:** `hermes_state.py` (SQLite wrapper, schema, migrations), `tools/session_search_tool.py` (agent tool), `hermes_cli/sessions.py` (CLI)
 
 ---
 
-## 7. Local Browser Control via CDP 🌐
+## 6. Local Browser Control via CDP 🌐
 
 **Status:** Not started (currently Browserbase cloud only)
 **Priority:** Medium
@@ -308,7 +504,7 @@ Support local Chrome/Chromium via Chrome DevTools Protocol alongside existing Br
 
 ---
 
-## 8. Signal Integration 📡
+## 7. Signal Integration 📡
 
 **Status:** Not started
 **Priority:** Low
@@ -319,7 +515,7 @@ New platform adapter using signal-cli daemon (JSON-RPC HTTP + SSE). Requires Jav
 
 ---
 
-## 9. Plugin/Extension System 🔌
+## 8. Plugin/Extension System 🔌
 
 **Status:** Partially implemented (event hooks exist in `gateway/hooks.py`)
 **Priority:** Medium
@@ -354,7 +550,7 @@ Full Python plugin interface that goes beyond the current hook system.
 
 ---
 
-## 10. Native Companion Apps 📱
+## 9. Native Companion Apps 📱
 
 **Status:** Not started
 **Priority:** Low
@@ -374,7 +570,7 @@ macOS (Swift/SwiftUI), iOS, Android apps connecting via WebSocket.
 
 ---
 
-## 11. Evaluation System 📏
+## 10. Evaluation System 📏
 
 **Status:** Not started
 **Priority:** Medium
@@ -393,7 +589,7 @@ Systematic evaluation of agent performance for batch_runner and RL training.
 
 ---
 
-## 12. Layered Context Architecture 📊
+## 11. Layered Context Architecture 📊
 
 **Status:** Partially implemented (context files, skills, compression exist)
 **Priority:** Medium
@@ -406,11 +602,11 @@ Structured hierarchy for what goes into the system prompt, with clear priority o
 - Define explicit layers with token budgets: `project context (AGENTS.md) > skills > user profile (USER.md) > learnings > memory > session context > runtime introspection`
 - Each layer has a max token budget; when total exceeds limit, lower-priority layers get summarized first
 - Runtime introspection layer: current working directory, active processes, git status, recent file changes
-- This becomes the backbone for the memory system (item 5) and subagent architecture (item 1)
+- This becomes the backbone for the memory system (item 4) and subagent architecture (item 1)
 
 ---
 
-## 13. Tools Wishlist 🧰
+## 12. Tools Wishlist 🧰
 
 **Status:** Various
 **Priority:** Mixed
@@ -428,7 +624,7 @@ Structured hierarchy for what goes into the system prompt, with clear priority o
 ### Canvas / Visual Workspace
 - **OpenClaw has this**: A2UI (Agent-to-UI) -- agent pushes visual content to a Canvas surface
 - For us: could be a web-based canvas (HTML/JS) that the agent can draw on
-- Depends on companion app / web UI (item 10)
+- Depends on companion app / web UI (item 9)
 
 ### Coding Agent Skill
 - Orchestrate Codex CLI or Claude Code via PTY mode (already supported!)
@@ -448,7 +644,7 @@ Structured hierarchy for what goes into the system prompt, with clear priority o
 
 ---
 
-## 14. MCP (Model Context Protocol) Support 🔗
+## 13. MCP (Model Context Protocol) Support 🔗
 
 **Status:** Not started
 **Priority:** High -- this is becoming an industry standard
@@ -465,11 +661,11 @@ MCP is the protocol that Codex, Cline, and OpenCode all support for connecting t
 - Config: list of MCP servers in `~/.hermes/config.yaml` with transport type and connection details
 - Each MCP server's tools auto-registered as a dynamic toolset
 - Start with stdio transport (most common), then add SSE and HTTP
-- Could also be part of the Plugin system (item 9, Phase 3) since MCP is essentially a plugin protocol
+- Could also be part of the Plugin system (item 8, Phase 3) since MCP is essentially a plugin protocol
 
 ---
 
-## 15. Permission / Safety System 🛡️
+## 14. Permission / Safety System 🛡️
 
 **Status:** Partially implemented (dangerous command approval in gateway)
 **Priority:** Medium
@@ -489,12 +685,12 @@ Formalize the tool permission system beyond the current ad-hoc dangerous command
 
 ---
 
-## 16. Self-Learning from Errors 📖
+## 15. Self-Learning from Errors 📖
 
 **Status:** Not started
-**Priority:** Medium-High -- unique differentiator from Dash
+**Priority:** Medium-High -- the "error memory" layer of the knowledge system
 
-Automatic learning loop: when tool calls fail, the agent diagnoses the error, fixes it, and saves the pattern so it never repeats the same mistake.
+Automatic learning loop: when tool calls fail, the agent diagnoses the error, fixes it, and saves the pattern so it never repeats the same mistake. This is the error-specific counterpart to skills (procedural memory) and MEMORY.md (declarative memory).
 
 **What Dash does:**
 - LearningMachine with agentic mode
@@ -503,16 +699,18 @@ Automatic learning loop: when tool calls fail, the agent diagnoses the error, fi
 - 6 layers of context including institutional knowledge and runtime schema
 
 **Our approach:**
-- Part of the Memory System (item 5, Phase 2)
-- `~/.hermes/learnings.jsonl` stores: `{pattern, error_type, fix, context, tags, created_at, times_used}`
-- Before executing operations that have failed before, auto-inject relevant learnings
+- Part of the knowledge system, stored in the SQLite state store (#5) once available, or `~/.hermes/learnings.jsonl` as fallback
+- Each learning: `{pattern, error_type, fix, context, tags, created_at, times_used}`
+- `learning` tool with actions: `save`, `search`, `list`
+- Before executing operations that have failed before, auto-inject relevant learnings (appended to tool responses, same pattern as todo checkpoint nudges -- cache-friendly)
 - Agent prompted: "This is similar to a previous error. Here's what worked last time: ..."
 - Consolidation: periodically merge similar learnings and increment `times_used`
-- Could be triggered automatically on tool call errors or manually by the agent
+- Triggered automatically on tool call errors OR manually by the agent
+- **Relationship to skills**: if the same error pattern appears 3+ times, the agent should consider creating a skill that includes the fix as a "pitfalls" section, rather than keeping it as a standalone learning
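The "3+ times" rule could be expressed as a simple threshold on `times_used`. This sketch assumes `times_used` is incremented each time the pattern recurs; the `Learning` dataclass and `record_use` are hypothetical and cover only a subset of the planned fields.

```python
from dataclasses import dataclass

SKILL_THRESHOLD = 3  # "3+ times" from the plan above


@dataclass
class Learning:
    pattern: str
    error_type: str
    fix: str
    times_used: int = 0


def record_use(learning):
    """Count one more occurrence; return True once a skill is worth considering."""
    learning.times_used += 1
    return learning.times_used >= SKILL_THRESHOLD
```

The return value is only a hint: per the plan, the agent still decides whether the fix belongs in a skill's "pitfalls" section or stays a standalone learning.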
 
 ---
 
-## 17. Session Branching / Checkpoints 🌿
+## 16. Session Branching / Checkpoints 🌿
 
 **Status:** Not started
 **Priority:** Low-Medium
@@ -534,7 +732,7 @@ Save and restore conversation state at any point. Branch off to explore alternat
 
 ---
 
-## 18. File Watcher / Project Awareness 👁️
+## 17. File Watcher / Project Awareness 👁️
 
 **Status:** Not started
 **Priority:** Low
@@ -554,7 +752,7 @@ Monitor the working directory for changes and notify the agent of relevant updat
 
 ---
 
-## 19. Heartbeat System 💓
+## 18. Heartbeat System 💓
 
 **Status:** Not started
 **Priority:** Low-Medium
@@ -574,7 +772,7 @@ Periodic agent wake-up for checking reminders, monitoring tasks, and running sch
 
 ---
 
-## 20. Programmatic Tool Calling (Code-Mediated Tool Use) 🧬
+## 19. Programmatic Tool Calling (Code-Mediated Tool Use) 🧬
 
 **Status:** Not started
 **Priority:** High -- potentially the single biggest efficiency win for agent loops
@@ -922,7 +1120,7 @@ This goes in the tool description:
 
 ### Relationship to other items
 - **Subagent Architecture (#1)**: A code sandbox that calls tools IS a lightweight subagent without its own LLM inference. This handles many of the "mechanical multi-step" cases (search+filter, bulk file ops, browser loops) at near-zero LLM cost. Full subagents are still needed for tasks requiring LLM reasoning at each step.
-- **Browser automation (#7)**: Biggest win. Browser workflows are 10+ round trips today. A script that navigates, clicks, extracts, paginates in a loop collapses that to 1 LLM turn.
+- **Browser automation (#6)**: Biggest win. Browser workflows are 10+ round trips today. A script that navigates, clicks, extracts, paginates in a loop collapses that to 1 LLM turn.
 - **Web search**: Directly matches Anthropic's dynamic filtering results.
 - **File ops**: Bulk read-search-patch workflows become one call.
 
@@ -932,30 +1130,62 @@ This goes in the tool description:
 
 ## Implementation Priority Order
 
-**Tier 1 (High impact, foundation for everything else):**
-1. Programmatic Tool Calling (code-mediated tool use) -- #20
-2. Memory System (Phase 1: MEMORY.md + USER.md) -- #5
-3. ~~Planning & Task Management (todo tool) -- #2~~ **DONE**
-4. Session Transcript Search -- #6
-5. Self-Learning from Errors -- #16
+### Tier 1: The Knowledge System + Agent Efficiency
 
-**Tier 2 (High impact, more complex):**
-6. Subagent Architecture -- #1 (partially solved by #20)
-7. MCP Support -- #14
-8. Interactive Clarifying Questions -- #4
-9. ~~Dynamic Skills Expansion~~ -- #3 (DONE: Skills Hub)
+These form two parallel tracks. The Knowledge System items depend on each other (build in order). Programmatic Tool Calling is independent and can be built in parallel.
 
-**Tier 3 (Quality of life, polish):**
-10. Permission / Safety System -- #15
-11. Local Browser Control via CDP -- #7
-12. Layered Context Architecture -- #12
-13. Plugin/Extension System (enhanced hooks first) -- #9
-14. Evaluation System -- #11
+**Track A: The Knowledge System (build in this order):**
+1. **Memory System Phase 1** (file-based: MEMORY.md + USER.md) -- #4
+   - No infrastructure dependency. Just a new `memory` tool + prompt guidance + file read/write.
+   - Gives the agent persistent identity memory (user profile) and declarative memory (facts, preferences).
+   - Both files always in system prompt = immediate value every session.
 
-**Tier 4 (Nice to have, longer term):**
-15. Heartbeat System -- #19
-16. Session Branching / Checkpoints -- #17
-17. File Watcher -- #18
-18. Signal Integration -- #8
-19. Tools Wishlist items -- #13
-20. Native Companion Apps -- #10
+2. **Agent-Managed Skills** (create/edit/delete + proactive creation) -- #2
+   - Depends on: nothing (but better with memory, since the agent understands what it has learned)
+   - Gives the agent procedural memory. Combined with memory, the agent now has both "what I know" and "how I do things."
+   - The proactive feedback loop ("How was that? Want me to save it as a skill?") is taught via tool description.
+
+3. **Interactive Clarifying Questions** -- #3
+   - Makes the feedback loop cleaner with structured choices (especially on messaging platforms).
+   - Also useful independently for pre-task clarification ("Which approach?").
+
+4. **SQLite State Store** -- #5
+   - Migrate sessions from JSONL to SQLite. Enables fast session search, structured metadata, scales to thousands of sessions.
+   - Memory Phase 2 and Learnings depend on this.
+
+5. **Self-Learning from Errors** -- #15
+   - Depends on: SQLite (#5) for storage, or fallback to learnings.jsonl
+   - The "error memory" layer. Auto-saves error patterns and fixes.
+
+**Track B: Agent Efficiency (independent, build anytime):**
+6. **Programmatic Tool Calling** (code-mediated tool use) -- #19
+   - No dependency on the Knowledge System. Biggest efficiency win for agent loops.
+   - Can be built in parallel with Track A items.
+
+### Tier 2: Scaling & Ecosystem
+
+7. **Subagent Architecture** -- #1
+   - Benefits from the Knowledge System (subagents can read memory/skills) but doesn't require it.
+   - Partially solved by Programmatic Tool Calling (#19) for mechanical multi-step tasks.
+   - Once built, enables periodic memory consolidation (an optional subagent that reviews recent session summaries and updates MEMORY.md/skills).
+
+8. **MCP Support** -- #13
+   - Industry standard protocol. Instant access to hundreds of community tool servers.
+   - Independent of Knowledge System.
+
+### Tier 3: Quality of Life
+
+9. Permission / Safety System -- #14
+10. Local Browser Control via CDP -- #6
+11. Layered Context Architecture -- #11 (becomes more important as memory/skills grow in size)
+12. Plugin/Extension System (enhanced hooks first) -- #8
+13. Evaluation System -- #10
+
+### Tier 4: Nice to Have
+
+14. Heartbeat System -- #18 (useful for periodic memory consolidation once subagents exist)
+15. Session Branching / Checkpoints -- #16
+16. File Watcher -- #17
+17. Signal Integration -- #7
+18. Tools Wishlist items -- #12
+19. Native Companion Apps -- #9
diff --git a/batch_runner.py b/batch_runner.py
index f3ba448f3..2d1ba40c0 100644
--- a/batch_runner.py
+++ b/batch_runner.py
@@ -277,6 +277,7 @@ def _process_single_prompt(
             reasoning_config=config.get("reasoning_config"),
             prefill_messages=config.get("prefill_messages"),
             skip_context_files=True,  # Don't pollute trajectories with SOUL.md/AGENTS.md
+            skip_memory=True,  # Don't use persistent memory in batch runs
         )
 
         # Run the agent with task_id to ensure each task gets its own isolated VM
diff --git a/cli-config.yaml.example b/cli-config.yaml.example
index 91297be0b..7fe334ed2 100644
--- a/cli-config.yaml.example
+++ b/cli-config.yaml.example
@@ -142,6 +142,26 @@ compression:
   # This model compresses the middle turns into a concise summary
   summary_model: "google/gemini-3-flash-preview"
 
+# =============================================================================
+# Persistent Memory
+# =============================================================================
+# Bounded curated memory injected into the system prompt every session.
+# Two stores: MEMORY.md (agent's notes) and USER.md (user profile).
+# Character limits keep the memory small and focused. The agent manages
+# pruning -- when at the limit, it must consolidate or replace entries.
+# Disabled by default in batch_runner and RL environments.
+#
+memory:
+  # Agent's personal notes: environment facts, conventions, things learned
+  memory_enabled: true
+  
+  # User profile: preferences, communication style, expectations
+  user_profile_enabled: true
+  
+  # Character limits (~2.75 chars per token, model-independent)
+  memory_char_limit: 2200   # ~800 tokens
+  user_char_limit: 1375     # ~500 tokens
+
 # =============================================================================
 # Agent Behavior
 # =============================================================================
@@ -274,6 +294,8 @@ platform_toolsets:
 #   skills       - Load skill documents (skills_list, skill_view)
 #   moa          - Mixture of Agents reasoning (mixture_of_agents)
 #   todo         - Task planning and tracking for multi-step work
+#   memory       - Persistent memory across sessions (personal notes + user profile)
+#   session_search - Search and recall past conversations (FTS5 + Gemini Flash summarization)
 #   tts          - Text-to-speech (Edge TTS free, ElevenLabs, OpenAI)
 #   cronjob      - Schedule and manage automated tasks (CLI-only)
 #   rl           - RL training tools (Tinker-Atropos)
diff --git a/cli.py b/cli.py
index aba0ed0d0..8d32e2cbc 100755
--- a/cli.py
+++ b/cli.py
@@ -723,6 +723,14 @@ class HermesCLI:
         if self.agent is not None:
             return True
         
+        # Initialize SQLite session store for CLI sessions
+        self._session_db = None
+        try:
+            from hermes_state import SessionDB
+            self._session_db = SessionDB()
+        except Exception:
+            pass  # SQLite session store is optional
+        
         try:
             self.agent = AIAgent(
                 model=self.model,
@@ -735,6 +743,7 @@ class HermesCLI:
                 ephemeral_system_prompt=self.system_prompt if self.system_prompt else None,
                 session_id=self.session_id,  # Pass CLI's session ID to agent
                 platform="cli",  # CLI interface — agent uses terminal-friendly formatting
+                session_db=self._session_db,
             )
             return True
         except Exception as e:
@@ -1775,6 +1784,12 @@ class HermesCLI:
             pass
         finally:
             self._should_exit = True
+            # Close session in SQLite
+            if hasattr(self, '_session_db') and self._session_db and self.agent:
+                try:
+                    self._session_db.end_session(self.agent.session_id, "cli_close")
+                except Exception:
+                    pass
             _run_cleanup()
             print("\nGoodbye! ⚕")
 
diff --git a/docs/tools.md b/docs/tools.md
index 1bc50a065..0f82af9ad 100644
--- a/docs/tools.md
+++ b/docs/tools.md
@@ -48,6 +48,8 @@ async def web_search(query: str) -> dict:
 | **Reasoning** | `mixture_of_agents_tool.py` | `mixture_of_agents` |
 | **Skills** | `skills_tool.py` | `skills_list`, `skill_view` |
 | **Todo** | `todo_tool.py` | `todo` (read/write task list for multi-step planning) |
+| **Memory** | `memory_tool.py` | `memory` (persistent notes + user profile across sessions) |
+| **Session Search** | `session_search_tool.py` | `session_search` (search + summarize past conversations) |
 | **Cronjob** | `cronjob_tools.py` | `schedule_cronjob`, `list_cronjobs`, `remove_cronjob` |
 | **RL Training** | `rl_training_tool.py` | `rl_list_environments`, `rl_start_training`, `rl_check_status`, etc. |
 
@@ -90,6 +92,10 @@ TOOLSETS = {
         "description": "Task planning and tracking for multi-step work",
         "tools": ["todo"]
     },
+    "memory": {
+        "description": "Persistent memory across sessions (personal notes + user profile)",
+        "tools": ["memory"]
+    },
     # ...
 }
 ```
diff --git a/environments/agent_loop.py b/environments/agent_loop.py
index 61286c255..62ab08d61 100644
--- a/environments/agent_loop.py
+++ b/environments/agent_loop.py
@@ -328,6 +328,12 @@ class HermesAgentLoop:
                                     store=_todo_store,
                                 )
                                 tool_elapsed = _time.monotonic() - tool_submit_time
+                            elif tool_name == "memory":
+                                tool_result = json.dumps({"error": "Memory is not available in RL environments."})
+                                tool_elapsed = _time.monotonic() - tool_submit_time
+                            elif tool_name == "session_search":
+                                tool_result = json.dumps({"error": "Session search is not available in RL environments."})
+                                tool_elapsed = _time.monotonic() - tool_submit_time
                             else:
                                 # Run tool calls in a thread pool so backends that
                                 # use asyncio.run() internally (modal, docker) get
diff --git a/gateway/run.py b/gateway/run.py
index 69d45c86f..e0af6026a 100644
--- a/gateway/run.py
+++ b/gateway/run.py
@@ -904,6 +904,8 @@ class GatewayRunner:
                 "skill_view": "📚",
                 "skills_list": "📋",
                 "todo": "📋",
+                "memory": "🧠",
+                "session_search": "🔍",
                 "send_message": "📨",
                 "schedule_cronjob": "⏰",
                 "list_cronjobs": "⏰",
diff --git a/gateway/session.py b/gateway/session.py
index ec392cfb7..d102f8b5f 100644
--- a/gateway/session.py
+++ b/gateway/session.py
@@ -265,9 +265,8 @@ class SessionStore:
     """
     Manages session storage and retrieval.
     
-    Sessions are stored in:
-    - sessions.json: Index mapping session keys to session IDs
-    - {session_id}.jsonl: Conversation transcripts
+    Uses SQLite (via SessionDB) for session metadata and message transcripts.
+    Falls back to legacy JSONL files if SQLite is unavailable.
     """
     
     def __init__(self, sessions_dir: Path, config: GatewayConfig,
@@ -276,12 +275,18 @@ class SessionStore:
         self.config = config
         self._entries: Dict[str, SessionEntry] = {}
         self._loaded = False
-        # Optional callback to check if a session has active background processes.
-        # When set, sessions with running processes are exempt from reset.
         self._has_active_processes_fn = has_active_processes_fn
+        
+        # Initialize SQLite session database
+        self._db = None
+        try:
+            from hermes_state import SessionDB
+            self._db = SessionDB()
+        except Exception as e:
+            print(f"[gateway] Warning: SQLite session store unavailable, falling back to JSONL: {e}")
     
     def _ensure_loaded(self) -> None:
-        """Load sessions from disk if not already loaded."""
+        """Load sessions index from disk if not already loaded."""
         if self._loaded:
             return
         
@@ -300,7 +305,7 @@ class SessionStore:
         self._loaded = True
     
     def _save(self) -> None:
-        """Save sessions index to disk."""
+        """Save sessions index to disk (kept for session key -> ID mapping)."""
         self.sessions_dir.mkdir(parents=True, exist_ok=True)
         sessions_file = self.sessions_dir / "sessions.json"
         
@@ -313,20 +318,16 @@ class SessionStore:
         platform = source.platform.value
         
         if source.chat_type == "dm":
-            # DMs share the main session per platform
             return f"agent:main:{platform}:dm"
         else:
-            # Groups/channels get their own keys
             return f"agent:main:{platform}:{source.chat_type}:{source.chat_id}"
     
     def _should_reset(self, entry: SessionEntry, source: SessionSource) -> bool:
         """
         Check if a session should be reset based on policy.
         
-        Returns True if the session is stale and should start fresh.
         Sessions with active background processes are never reset.
         """
-        # Don't reset sessions that have active background processes
         if self._has_active_processes_fn:
             session_key = self._generate_session_key(source)
             if self._has_active_processes_fn(session_key):
@@ -339,15 +340,12 @@ class SessionStore:
         
         now = datetime.now()
         
-        # Check idle timeout
         if policy.mode in ("idle", "both"):
             idle_deadline = entry.updated_at + timedelta(minutes=policy.idle_minutes)
             if now > idle_deadline:
                 return True
         
-        # Check daily reset
         if policy.mode in ("daily", "both"):
-            # Find the most recent reset boundary
             today_reset = now.replace(
                 hour=policy.at_hour, 
                 minute=0, 
@@ -355,7 +353,6 @@ class SessionStore:
                 microsecond=0
             )
             if now.hour < policy.at_hour:
-                # Reset boundary was yesterday
                 today_reset -= timedelta(days=1)
             
             if entry.updated_at < today_reset:
@@ -372,22 +369,27 @@ class SessionStore:
         Get an existing session or create a new one.
         
         Evaluates reset policy to determine if the existing session is stale.
+        Creates a session record in SQLite when a new session starts.
         """
         self._ensure_loaded()
         
         session_key = self._generate_session_key(source)
         now = datetime.now()
         
-        # Check for existing session
         if session_key in self._entries and not force_new:
             entry = self._entries[session_key]
             
-            # Check if session should be reset
             if not self._should_reset(entry, source):
-                # Update timestamp and return existing
                 entry.updated_at = now
                 self._save()
                 return entry
+            else:
+                # Session is being reset -- end the old one in SQLite
+                if self._db:
+                    try:
+                        self._db.end_session(entry.session_id, "session_reset")
+                    except Exception:
+                        pass
         
         # Create new session
         session_id = f"{now.strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:8]}"
@@ -406,6 +408,17 @@ class SessionStore:
         self._entries[session_key] = entry
         self._save()
         
+        # Create session in SQLite
+        if self._db:
+            try:
+                self._db.create_session(
+                    session_id=session_id,
+                    source=source.platform.value,
+                    user_id=source.user_id,
+                )
+            except Exception as e:
+                print(f"[gateway] Warning: Failed to create SQLite session: {e}")
+        
         return entry
     
     def update_session(
@@ -424,6 +437,14 @@ class SessionStore:
             entry.output_tokens += output_tokens
             entry.total_tokens = entry.input_tokens + entry.output_tokens
             self._save()
+            
+            if self._db:
+                try:
+                    self._db.update_token_counts(
+                        entry.session_id, input_tokens, output_tokens
+                    )
+                except Exception:
+                    pass
     
     def reset_session(self, session_key: str) -> Optional[SessionEntry]:
         """Force reset a session, creating a new session ID."""
@@ -433,6 +454,14 @@ class SessionStore:
             return None
         
         old_entry = self._entries[session_key]
+        
+        # End old session in SQLite
+        if self._db:
+            try:
+                self._db.end_session(old_entry.session_id, "session_reset")
+            except Exception:
+                pass
+        
         now = datetime.now()
         session_id = f"{now.strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:8]}"
         
@@ -450,15 +479,21 @@ class SessionStore:
         self._entries[session_key] = new_entry
         self._save()
         
+        # Create new session in SQLite
+        if self._db:
+            try:
+                self._db.create_session(
+                    session_id=session_id,
+                    source=old_entry.platform.value if old_entry.platform else "unknown",
+                    user_id=old_entry.origin.user_id if old_entry.origin else None,
+                )
+            except Exception:
+                pass
+        
         return new_entry
     
     def list_sessions(self, active_minutes: Optional[int] = None) -> List[SessionEntry]:
-        """
-        List all sessions, optionally filtered by activity.
-        
-        Args:
-            active_minutes: If provided, only return sessions updated within this many minutes
-        """
+        """List all sessions, optionally filtered by activity."""
         self._ensure_loaded()
         
         entries = list(self._entries.values())
@@ -467,24 +502,47 @@ class SessionStore:
             cutoff = datetime.now() - timedelta(minutes=active_minutes)
             entries = [e for e in entries if e.updated_at >= cutoff]
         
-        # Sort by most recently updated
         entries.sort(key=lambda e: e.updated_at, reverse=True)
         
         return entries
     
     def get_transcript_path(self, session_id: str) -> Path:
-        """Get the path to a session's transcript file."""
+        """Get the path to a session's legacy transcript file."""
         return self.sessions_dir / f"{session_id}.jsonl"
     
     def append_to_transcript(self, session_id: str, message: Dict[str, Any]) -> None:
-        """Append a message to a session's transcript."""
-        transcript_path = self.get_transcript_path(session_id)
+        """Append a message to a session's transcript (SQLite + legacy JSONL)."""
+        # Write to SQLite
+        if self._db:
+            try:
+                self._db.append_message(
+                    session_id=session_id,
+                    role=message.get("role", "unknown"),
+                    content=message.get("content"),
+                    tool_name=message.get("tool_name"),
+                    tool_calls=message.get("tool_calls"),
+                    tool_call_id=message.get("tool_call_id"),
+                )
+            except Exception:
+                pass
         
+        # Also write legacy JSONL (keeps existing tooling working during transition)
+        transcript_path = self.get_transcript_path(session_id)
         with open(transcript_path, "a") as f:
             f.write(json.dumps(message, ensure_ascii=False) + "\n")
     
     def load_transcript(self, session_id: str) -> List[Dict[str, Any]]:
         """Load all messages from a session's transcript."""
+        # Try SQLite first
+        if self._db:
+            try:
+                messages = self._db.get_messages_as_conversation(session_id)
+                if messages:
+                    return messages
+            except Exception:
+                pass
+        
+        # Fall back to legacy JSONL
         transcript_path = self.get_transcript_path(session_id)
         
         if not transcript_path.exists():
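Reviewer note: the daily reset check in `_should_reset` (top of this file's diff) can be read in isolation as the small sketch below. The helper names `most_recent_reset_boundary` and `is_stale` are hypothetical and exist only for illustration; the arithmetic mirrors the `today_reset` logic shown in the hunk, assuming the elided `replace(...)` argument zeroes the seconds.

```python
from datetime import datetime, timedelta


def most_recent_reset_boundary(now: datetime, at_hour: int) -> datetime:
    """Return the latest daily reset instant that is not after `now`."""
    boundary = now.replace(hour=at_hour, minute=0, second=0, microsecond=0)
    if now.hour < at_hour:
        # Today's reset hour has not been reached, so the boundary was yesterday.
        boundary -= timedelta(days=1)
    return boundary


def is_stale(updated_at: datetime, now: datetime, at_hour: int) -> bool:
    """A session is stale if it was last updated before the latest boundary."""
    return updated_at < most_recent_reset_boundary(now, at_hour)
```

With `at_hour=4`, a session last touched at 03:59 is stale when checked at 05:00 the same day, while one touched at 04:01 is not.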
diff --git a/hermes_cli/config.py b/hermes_cli/config.py
index c8af3e3b6..8183d5fe7 100644
--- a/hermes_cli/config.py
+++ b/hermes_cli/config.py
@@ -64,6 +64,7 @@ def ensure_hermes_home():
     (home / "cron").mkdir(parents=True, exist_ok=True)
     (home / "sessions").mkdir(parents=True, exist_ok=True)
     (home / "logs").mkdir(parents=True, exist_ok=True)
+    (home / "memories").mkdir(parents=True, exist_ok=True)
 
 
 # =============================================================================
@@ -128,11 +129,19 @@ DEFAULT_CONFIG = {
         "max_ms": 2500,
     },
     
+    # Persistent memory -- bounded curated memory injected into system prompt
+    "memory": {
+        "memory_enabled": True,
+        "user_profile_enabled": True,
+        "memory_char_limit": 2200,   # ~800 tokens at 2.75 chars/token
+        "user_char_limit": 1375,     # ~500 tokens at 2.75 chars/token
+    },
+    
     # Permanently allowed dangerous command patterns (added via "always" approval)
     "command_allowlist": [],
     
     # Config schema version - bump this when adding new required fields
-    "_config_version": 2,
+    "_config_version": 3,
 }
 
 # =============================================================================
diff --git a/hermes_cli/doctor.py b/hermes_cli/doctor.py
index a5c675e03..83462e639 100644
--- a/hermes_cli/doctor.py
+++ b/hermes_cli/doctor.py
@@ -196,6 +196,43 @@ def run_doctor(args):
     else:
         check_warn("logs/ not found", "(will be created on first use)")
     
+    # Check memory directory
+    memories_dir = hermes_home / "memories"
+    if memories_dir.exists():
+        check_ok("~/.hermes/memories/ directory exists")
+        memory_file = memories_dir / "MEMORY.md"
+        user_file = memories_dir / "USER.md"
+        if memory_file.exists():
+            size = len(memory_file.read_text(encoding="utf-8").strip())
+            check_ok(f"MEMORY.md exists ({size} chars)")
+        else:
+            check_info("MEMORY.md not created yet (will be created when the agent first writes a memory)")
+        if user_file.exists():
+            size = len(user_file.read_text(encoding="utf-8").strip())
+            check_ok(f"USER.md exists ({size} chars)")
+        else:
+            check_info("USER.md not created yet (will be created when the agent first writes a memory)")
+    else:
+        check_warn("~/.hermes/memories/ not found", "(will be created on first use)")
+        if should_fix:
+            memories_dir.mkdir(parents=True, exist_ok=True)
+            check_ok("Created ~/.hermes/memories/")
+    
+    # Check SQLite session store
+    state_db_path = hermes_home / "state.db"
+    if state_db_path.exists():
+        try:
+            import sqlite3
+            conn = sqlite3.connect(str(state_db_path))
+            cursor = conn.execute("SELECT COUNT(*) FROM sessions")
+            count = cursor.fetchone()[0]
+            conn.close()
+            check_ok(f"~/.hermes/state.db exists ({count} sessions)")
+        except Exception as e:
+            check_warn(f"~/.hermes/state.db exists but has issues: {e}")
+    else:
+        check_info("~/.hermes/state.db not created yet (will be created on first session)")
+    
     # =========================================================================
     # Check: External tools
     # =========================================================================
diff --git a/hermes_cli/main.py b/hermes_cli/main.py
index 1d90e1858..377fd6b5b 100644
--- a/hermes_cli/main.py
+++ b/hermes_cli/main.py
@@ -533,6 +533,121 @@ For more help on a command:
 
     skills_parser.set_defaults(func=cmd_skills)
 
+    # =========================================================================
+    # sessions command
+    # =========================================================================
+    sessions_parser = subparsers.add_parser(
+        "sessions",
+        help="Manage session history (list, export, delete, prune, stats)",
+        description="View and manage the SQLite session store"
+    )
+    sessions_subparsers = sessions_parser.add_subparsers(dest="sessions_action")
+
+    sessions_list = sessions_subparsers.add_parser("list", help="List recent sessions")
+    sessions_list.add_argument("--source", help="Filter by source (cli, telegram, discord, etc.)")
+    sessions_list.add_argument("--limit", type=int, default=20, help="Max sessions to show")
+
+    sessions_export = sessions_subparsers.add_parser("export", help="Export sessions to a JSONL file")
+    sessions_export.add_argument("output", help="Output JSONL file path")
+    sessions_export.add_argument("--source", help="Filter by source")
+    sessions_export.add_argument("--session-id", help="Export a specific session")
+
+    sessions_delete = sessions_subparsers.add_parser("delete", help="Delete a specific session")
+    sessions_delete.add_argument("session_id", help="Session ID to delete")
+    sessions_delete.add_argument("--yes", "-y", action="store_true", help="Skip confirmation")
+
+    sessions_prune = sessions_subparsers.add_parser("prune", help="Delete old sessions")
+    sessions_prune.add_argument("--older-than", type=int, default=90, help="Delete sessions older than N days (default: 90)")
+    sessions_prune.add_argument("--source", help="Only prune sessions from this source")
+    sessions_prune.add_argument("--yes", "-y", action="store_true", help="Skip confirmation")
+
+    sessions_stats = sessions_subparsers.add_parser("stats", help="Show session store statistics")
+
+    def cmd_sessions(args):
+        import json as _json
+        try:
+            from hermes_state import SessionDB
+            db = SessionDB()
+        except Exception as e:
+            print(f"Error: Could not open session database: {e}")
+            return
+
+        action = args.sessions_action
+
+        if action == "list":
+            sessions = db.search_sessions(source=args.source, limit=args.limit)
+            if not sessions:
+                print("No sessions found.")
+                return
+            print(f"{'ID':<30} {'Source':<12} {'Model':<30} {'Messages':>8} {'Started'}")
+            print("─" * 100)
+            from datetime import datetime
+            for s in sessions:
+                started = datetime.fromtimestamp(s["started_at"]).strftime("%Y-%m-%d %H:%M") if s["started_at"] else "?"
+                model = (s.get("model") or "?")[:28]
+                ended = " (ended)" if s.get("ended_at") else ""
+                print(f"{s['id']:<30} {s['source']:<12} {model:<30} {s['message_count']:>8} {started}{ended}")
+
+        elif action == "export":
+            if args.session_id:
+                data = db.export_session(args.session_id)
+                if not data:
+                    print(f"Session '{args.session_id}' not found.")
+                    return
+                with open(args.output, "w") as f:
+                    f.write(_json.dumps(data, ensure_ascii=False) + "\n")
+                print(f"Exported 1 session to {args.output}")
+            else:
+                sessions = db.export_all(source=args.source)
+                with open(args.output, "w") as f:
+                    for s in sessions:
+                        f.write(_json.dumps(s, ensure_ascii=False) + "\n")
+                print(f"Exported {len(sessions)} sessions to {args.output}")
+
+        elif action == "delete":
+            if not args.yes:
+                confirm = input(f"Delete session '{args.session_id}' and all its messages? [y/N] ")
+                if confirm.lower() not in ("y", "yes"):
+                    print("Cancelled.")
+                    return
+            if db.delete_session(args.session_id):
+                print(f"Deleted session '{args.session_id}'.")
+            else:
+                print(f"Session '{args.session_id}' not found.")
+
+        elif action == "prune":
+            days = args.older_than
+            source_msg = f" from '{args.source}'" if args.source else ""
+            if not args.yes:
+                confirm = input(f"Delete all ended sessions older than {days} days{source_msg}? [y/N] ")
+                if confirm.lower() not in ("y", "yes"):
+                    print("Cancelled.")
+                    return
+            count = db.prune_sessions(older_than_days=days, source=args.source)
+            print(f"Pruned {count} session(s).")
+
+        elif action == "stats":
+            total = db.session_count()
+            msgs = db.message_count()
+            print(f"Total sessions: {total}")
+            print(f"Total messages: {msgs}")
+            for src in ["cli", "telegram", "discord", "whatsapp", "slack"]:
+                c = db.session_count(source=src)
+                if c > 0:
+                    print(f"  {src}: {c} sessions")
+            import os
+            db_path = db.db_path
+            if db_path.exists():
+                size_mb = os.path.getsize(db_path) / (1024 * 1024)
+                print(f"Database size: {size_mb:.1f} MB")
+
+        else:
+            sessions_parser.print_help()
+
+        db.close()
+
+    sessions_parser.set_defaults(func=cmd_sessions)
+
     # =========================================================================
     # version command
     # =========================================================================
diff --git a/hermes_state.py b/hermes_state.py
new file mode 100644
index 000000000..7929020bb
--- /dev/null
+++ b/hermes_state.py
@@ -0,0 +1,505 @@
+#!/usr/bin/env python3
+"""
+SQLite State Store for Hermes Agent.
+
+Provides persistent session storage with FTS5 full-text search, replacing
+the per-session JSONL file approach. Stores session metadata, full message
+history, and model configuration for CLI and gateway sessions.
+
+Key design decisions:
+- WAL mode for concurrent readers + one writer (gateway multi-platform)
+- FTS5 virtual table for fast text search across all session messages
+- Compression-triggered session splitting via parent_session_id chains
+- Batch runner and RL trajectories are NOT stored here (separate systems)
+- Session source tagging ('cli', 'telegram', 'discord', etc.) for filtering
+"""
+
+import json
+import os
+import sqlite3
+import time
+from pathlib import Path
+from typing import Dict, Any, List, Optional
+
+
+DEFAULT_DB_PATH = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes")) / "state.db"
+
+SCHEMA_VERSION = 1
+
+SCHEMA_SQL = """
+CREATE TABLE IF NOT EXISTS schema_version (
+    version INTEGER NOT NULL
+);
+
+CREATE TABLE IF NOT EXISTS sessions (
+    id TEXT PRIMARY KEY,
+    source TEXT NOT NULL,
+    user_id TEXT,
+    model TEXT,
+    model_config TEXT,
+    system_prompt TEXT,
+    parent_session_id TEXT,
+    started_at REAL NOT NULL,
+    ended_at REAL,
+    end_reason TEXT,
+    message_count INTEGER DEFAULT 0,
+    tool_call_count INTEGER DEFAULT 0,
+    input_tokens INTEGER DEFAULT 0,
+    output_tokens INTEGER DEFAULT 0,
+    FOREIGN KEY (parent_session_id) REFERENCES sessions(id)
+);
+
+CREATE TABLE IF NOT EXISTS messages (
+    id INTEGER PRIMARY KEY AUTOINCREMENT,
+    session_id TEXT NOT NULL REFERENCES sessions(id),
+    role TEXT NOT NULL,
+    content TEXT,
+    tool_call_id TEXT,
+    tool_calls TEXT,
+    tool_name TEXT,
+    timestamp REAL NOT NULL,
+    token_count INTEGER
+);
+
+CREATE INDEX IF NOT EXISTS idx_sessions_source ON sessions(source);
+CREATE INDEX IF NOT EXISTS idx_sessions_parent ON sessions(parent_session_id);
+CREATE INDEX IF NOT EXISTS idx_sessions_started ON sessions(started_at DESC);
+CREATE INDEX IF NOT EXISTS idx_messages_session ON messages(session_id, timestamp);
+"""
+
+FTS_SQL = """
+CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts USING fts5(
+    content,
+    content=messages,
+    content_rowid=id
+);
+
+CREATE TRIGGER IF NOT EXISTS messages_fts_insert AFTER INSERT ON messages BEGIN
+    INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
+END;
+
+CREATE TRIGGER IF NOT EXISTS messages_fts_delete AFTER DELETE ON messages BEGIN
+    INSERT INTO messages_fts(messages_fts, rowid, content) VALUES('delete', old.id, old.content);
+END;
+
+CREATE TRIGGER IF NOT EXISTS messages_fts_update AFTER UPDATE ON messages BEGIN
+    INSERT INTO messages_fts(messages_fts, rowid, content) VALUES('delete', old.id, old.content);
+    INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
+END;
+"""
+
+
+class SessionDB:
+    """
+    SQLite-backed session storage with FTS5 search.
+
+    Thread-safe for the common gateway pattern (multiple reader threads,
+    single writer via WAL mode). Each method opens its own cursor.
+    """
+
+    def __init__(self, db_path: Path = None):
+        self.db_path = db_path or DEFAULT_DB_PATH
+        self.db_path.parent.mkdir(parents=True, exist_ok=True)
+
+        self._conn = sqlite3.connect(
+            str(self.db_path),
+            check_same_thread=False,
+            timeout=10.0,
+        )
+        self._conn.row_factory = sqlite3.Row
+        self._conn.execute("PRAGMA journal_mode=WAL")
+        self._conn.execute("PRAGMA foreign_keys=ON")
+
+        self._init_schema()
+
+    def _init_schema(self):
+        """Create tables and FTS if they don't exist, run migrations."""
+        cursor = self._conn.cursor()
+
+        cursor.executescript(SCHEMA_SQL)
+
+        # Check schema version
+        cursor.execute("SELECT version FROM schema_version LIMIT 1")
+        row = cursor.fetchone()
+        if row is None:
+            cursor.execute("INSERT INTO schema_version (version) VALUES (?)", (SCHEMA_VERSION,))
+        # Future migrations would go here: if row["version"] < 2: ...
+
+        # FTS5 setup is kept separate: probe for the table first, since CREATE VIRTUAL TABLE IF NOT EXISTS is not reliable inside the main executescript
+        try:
+            cursor.execute("SELECT * FROM messages_fts LIMIT 0")
+        except sqlite3.OperationalError:
+            cursor.executescript(FTS_SQL)
+
+        self._conn.commit()
+
+    def close(self):
+        """Close the database connection."""
+        if self._conn:
+            self._conn.close()
+            self._conn = None
+
+    # =========================================================================
+    # Session lifecycle
+    # =========================================================================
+
+    def create_session(
+        self,
+        session_id: str,
+        source: str,
+        model: str = None,
+        model_config: Dict[str, Any] = None,
+        system_prompt: str = None,
+        user_id: str = None,
+        parent_session_id: str = None,
+    ) -> str:
+        """Create a new session record. Returns the session_id."""
+        self._conn.execute(
+            """INSERT INTO sessions (id, source, user_id, model, model_config,
+               system_prompt, parent_session_id, started_at)
+               VALUES (?, ?, ?, ?, ?, ?, ?, ?)""",
+            (
+                session_id,
+                source,
+                user_id,
+                model,
+                json.dumps(model_config) if model_config else None,
+                system_prompt,
+                parent_session_id,
+                time.time(),
+            ),
+        )
+        self._conn.commit()
+        return session_id
+
+    def end_session(self, session_id: str, end_reason: str) -> None:
+        """Mark a session as ended."""
+        self._conn.execute(
+            "UPDATE sessions SET ended_at = ?, end_reason = ? WHERE id = ?",
+            (time.time(), end_reason, session_id),
+        )
+        self._conn.commit()
+
+    def update_system_prompt(self, session_id: str, system_prompt: str) -> None:
+        """Store the full assembled system prompt snapshot."""
+        self._conn.execute(
+            "UPDATE sessions SET system_prompt = ? WHERE id = ?",
+            (system_prompt, session_id),
+        )
+        self._conn.commit()
+
+    def update_token_counts(
+        self, session_id: str, input_tokens: int = 0, output_tokens: int = 0
+    ) -> None:
+        """Increment token counters on a session."""
+        self._conn.execute(
+            """UPDATE sessions SET
+               input_tokens = input_tokens + ?,
+               output_tokens = output_tokens + ?
+               WHERE id = ?""",
+            (input_tokens, output_tokens, session_id),
+        )
+        self._conn.commit()
+
+    def get_session(self, session_id: str) -> Optional[Dict[str, Any]]:
+        """Get a session by ID."""
+        cursor = self._conn.execute(
+            "SELECT * FROM sessions WHERE id = ?", (session_id,)
+        )
+        row = cursor.fetchone()
+        return dict(row) if row else None
+
+    # =========================================================================
+    # Message storage
+    # =========================================================================
+
+    def append_message(
+        self,
+        session_id: str,
+        role: str,
+        content: str = None,
+        tool_name: str = None,
+        tool_calls: Any = None,
+        tool_call_id: str = None,
+        token_count: int = None,
+    ) -> int:
+        """
+        Append a message to a session. Returns the message row ID.
+
+        Also increments the session's message_count (and tool_call_count
+        if role is 'tool' or tool_calls is present).
+        """
+        cursor = self._conn.execute(
+            """INSERT INTO messages (session_id, role, content, tool_call_id,
+               tool_calls, tool_name, timestamp, token_count)
+               VALUES (?, ?, ?, ?, ?, ?, ?, ?)""",
+            (
+                session_id,
+                role,
+                content,
+                tool_call_id,
+                json.dumps(tool_calls) if tool_calls else None,
+                tool_name,
+                time.time(),
+                token_count,
+            ),
+        )
+        msg_id = cursor.lastrowid
+
+        # Update counters
+        is_tool_related = role == "tool" or bool(tool_calls)
+        if is_tool_related:
+            self._conn.execute(
+                """UPDATE sessions SET message_count = message_count + 1,
+                   tool_call_count = tool_call_count + 1 WHERE id = ?""",
+                (session_id,),
+            )
+        else:
+            self._conn.execute(
+                "UPDATE sessions SET message_count = message_count + 1 WHERE id = ?",
+                (session_id,),
+            )
+
+        self._conn.commit()
+        return msg_id
+
+    def get_messages(self, session_id: str) -> List[Dict[str, Any]]:
+        """Load all messages for a session, ordered by timestamp and then row ID."""
+        cursor = self._conn.execute(
+            "SELECT * FROM messages WHERE session_id = ? ORDER BY timestamp, id",
+            (session_id,),
+        )
+        rows = cursor.fetchall()
+        result = []
+        for row in rows:
+            msg = dict(row)
+            if msg.get("tool_calls"):
+                try:
+                    msg["tool_calls"] = json.loads(msg["tool_calls"])
+                except (json.JSONDecodeError, TypeError):
+                    pass
+            result.append(msg)
+        return result
+
+    def get_messages_as_conversation(self, session_id: str) -> List[Dict[str, Any]]:
+        """
+        Load messages in the OpenAI conversation format (role + content dicts).
+        Used by the gateway to restore conversation history.
+        """
+        cursor = self._conn.execute(
+            "SELECT role, content, tool_call_id, tool_calls, tool_name "
+            "FROM messages WHERE session_id = ? ORDER BY timestamp, id",
+            (session_id,),
+        )
+        messages = []
+        for row in cursor.fetchall():
+            msg = {"role": row["role"], "content": row["content"]}
+            if row["tool_call_id"]:
+                msg["tool_call_id"] = row["tool_call_id"]
+            if row["tool_name"]:
+                msg["tool_name"] = row["tool_name"]
+            if row["tool_calls"]:
+                try:
+                    msg["tool_calls"] = json.loads(row["tool_calls"])
+                except (json.JSONDecodeError, TypeError):
+                    pass
+            messages.append(msg)
+        return messages
+
+    # =========================================================================
+    # Search
+    # =========================================================================
+
+    def search_messages(
+        self,
+        query: str,
+        source_filter: List[str] = None,
+        role_filter: List[str] = None,
+        limit: int = 20,
+        offset: int = 0,
+    ) -> List[Dict[str, Any]]:
+        """
+        Full-text search across session messages using FTS5.
+
+        Supports FTS5 query syntax:
+          - Simple keywords: "docker deployment"
+          - Phrases: '"exact phrase"'
+          - Boolean: "docker OR kubernetes", "python NOT java"
+          - Prefix: "deploy*"
+
+        Returns matching messages with session metadata, a content snippet,
+        and surrounding context (same-session rows with IDs adjacent to the match).
+        """
+        if not query or not query.strip():
+            return []
+
+        if source_filter is None:
+            source_filter = ["cli", "telegram", "discord", "whatsapp", "slack"]
+
+        # Build WHERE clauses dynamically
+        where_clauses = ["messages_fts MATCH ?"]
+        params: list = [query]
+
+        source_placeholders = ",".join("?" for _ in source_filter)
+        where_clauses.append(f"s.source IN ({source_placeholders})")
+        params.extend(source_filter)
+
+        if role_filter:
+            role_placeholders = ",".join("?" for _ in role_filter)
+            where_clauses.append(f"m.role IN ({role_placeholders})")
+            params.extend(role_filter)
+
+        where_sql = " AND ".join(where_clauses)
+        params.extend([limit, offset])
+
+        sql = f"""
+            SELECT
+                m.id,
+                m.session_id,
+                m.role,
+                snippet(messages_fts, 0, '>>>', '<<<', '...', 40) AS snippet,
+                m.content,
+                m.timestamp,
+                m.tool_name,
+                s.source,
+                s.model,
+                s.started_at AS session_started
+            FROM messages_fts
+            JOIN messages m ON m.id = messages_fts.rowid
+            JOIN sessions s ON s.id = m.session_id
+            WHERE {where_sql}
+            ORDER BY rank
+            LIMIT ? OFFSET ?
+        """
+
+        cursor = self._conn.execute(sql, params)
+        matches = [dict(row) for row in cursor.fetchall()]
+
+        # Add surrounding context: same-session rows whose IDs are adjacent to the match
+        for match in matches:
+            try:
+                ctx_cursor = self._conn.execute(
+                    """SELECT role, content FROM messages
+                       WHERE session_id = ? AND id >= ? - 1 AND id <= ? + 1
+                       ORDER BY id""",
+                    (match["session_id"], match["id"], match["id"]),
+                )
+                context_msgs = [
+                    {"role": r["role"], "content": (r["content"] or "")[:200]}
+                    for r in ctx_cursor.fetchall()
+                ]
+                match["context"] = context_msgs
+            except Exception:
+                match["context"] = []
+
+            # Remove full content from result (snippet is enough, saves tokens)
+            match.pop("content", None)
+
+        return matches
+
+    def search_sessions(
+        self,
+        source: str = None,
+        limit: int = 20,
+        offset: int = 0,
+    ) -> List[Dict[str, Any]]:
+        """List sessions, optionally filtered by source."""
+        if source:
+            cursor = self._conn.execute(
+                "SELECT * FROM sessions WHERE source = ? ORDER BY started_at DESC LIMIT ? OFFSET ?",
+                (source, limit, offset),
+            )
+        else:
+            cursor = self._conn.execute(
+                "SELECT * FROM sessions ORDER BY started_at DESC LIMIT ? OFFSET ?",
+                (limit, offset),
+            )
+        return [dict(row) for row in cursor.fetchall()]
+
+    # =========================================================================
+    # Utility
+    # =========================================================================
+
+    def session_count(self, source: str = None) -> int:
+        """Count sessions, optionally filtered by source."""
+        if source:
+            cursor = self._conn.execute(
+                "SELECT COUNT(*) FROM sessions WHERE source = ?", (source,)
+            )
+        else:
+            cursor = self._conn.execute("SELECT COUNT(*) FROM sessions")
+        return cursor.fetchone()[0]
+
+    def message_count(self, session_id: str = None) -> int:
+        """Count messages, optionally for a specific session."""
+        if session_id:
+            cursor = self._conn.execute(
+                "SELECT COUNT(*) FROM messages WHERE session_id = ?", (session_id,)
+            )
+        else:
+            cursor = self._conn.execute("SELECT COUNT(*) FROM messages")
+        return cursor.fetchone()[0]
+
+    # =========================================================================
+    # Export and cleanup
+    # =========================================================================
+
+    def export_session(self, session_id: str) -> Optional[Dict[str, Any]]:
+        """Export a single session with all its messages as a dict."""
+        session = self.get_session(session_id)
+        if not session:
+            return None
+        messages = self.get_messages(session_id)
+        return {**session, "messages": messages}
+
+    def export_all(self, source: str = None) -> List[Dict[str, Any]]:
+        """
+        Export sessions (with messages) as a list of dicts, capped at 100,000 sessions.
+        Suitable for writing to a JSONL file for backup/analysis.
+        """
+        sessions = self.search_sessions(source=source, limit=100000)
+        results = []
+        for session in sessions:
+            messages = self.get_messages(session["id"])
+            results.append({**session, "messages": messages})
+        return results
+
+    def delete_session(self, session_id: str) -> bool:
+        """Delete a session and all its messages. Returns True if found."""
+        cursor = self._conn.execute(
+            "SELECT COUNT(*) FROM sessions WHERE id = ?", (session_id,)
+        )
+        if cursor.fetchone()[0] == 0:
+            return False
+        self._conn.execute("DELETE FROM messages WHERE session_id = ?", (session_id,))
+        self._conn.execute("DELETE FROM sessions WHERE id = ?", (session_id,))
+        self._conn.commit()
+        return True
+
+    def prune_sessions(self, older_than_days: int = 90, source: str = None) -> int:
+        """
+        Delete ended sessions that started more than N days ago. Returns the count deleted.
+        Active sessions (ended_at IS NULL) are never pruned.
+        """
+        # Cutoff as a Unix timestamp; sessions that started before it are eligible.
+        cutoff = time.time() - (older_than_days * 86400)
+
+        if source:
+            cursor = self._conn.execute(
+                """SELECT id FROM sessions
+                   WHERE started_at < ? AND ended_at IS NOT NULL AND source = ?""",
+                (cutoff, source),
+            )
+        else:
+            cursor = self._conn.execute(
+                "SELECT id FROM sessions WHERE started_at < ? AND ended_at IS NOT NULL",
+                (cutoff,),
+            )
+        session_ids = [row["id"] for row in cursor.fetchall()]
+
+        for sid in session_ids:
+            self._conn.execute("DELETE FROM messages WHERE session_id = ?", (sid,))
+            self._conn.execute("DELETE FROM sessions WHERE id = ?", (sid,))
+
+        self._conn.commit()
+        return len(session_ids)
diff --git a/model_tools.py b/model_tools.py
index ecf1ee187..c1bd7d547 100644
--- a/model_tools.py
+++ b/model_tools.py
@@ -87,6 +87,10 @@ from tools.browser_tool import (
 from tools.tts_tool import text_to_speech_tool, check_tts_requirements
 # Planning & task management tool
 from tools.todo_tool import todo_tool, check_todo_requirements, TODO_SCHEMA
+# Persistent memory tool
+from tools.memory_tool import memory_tool, check_memory_requirements, MEMORY_SCHEMA
+# Session search tool (past conversation recall with summarization)
+from tools.session_search_tool import session_search, check_session_search_requirements, SESSION_SEARCH_SCHEMA
 from toolsets import (
     get_toolset, resolve_toolset, resolve_multiple_toolsets,
     get_all_toolsets, get_toolset_names, validate_toolset,
@@ -183,6 +187,20 @@ TOOLSET_REQUIREMENTS = {
         "setup_url": None,
         "tools": ["todo"],
     },
+    "memory": {
+        "name": "Persistent Memory",
+        "env_vars": [],  # File-based, no external deps
+        "check_fn": check_memory_requirements,
+        "setup_url": None,
+        "tools": ["memory"],
+    },
+    "session_search": {
+        "name": "Session History Search",
+        "env_vars": ["OPENROUTER_API_KEY"],  # Needs summarizer model
+        "check_fn": check_session_search_requirements,
+        "setup_url": "https://openrouter.ai/keys",
+        "tools": ["session_search"],
+    },
 }
 
 
@@ -936,6 +954,26 @@ def get_todo_tool_definitions() -> List[Dict[str, Any]]:
     return [{"type": "function", "function": TODO_SCHEMA}]
 
 
+def get_memory_tool_definitions() -> List[Dict[str, Any]]:
+    """
+    Get tool definitions for the persistent memory tool.
+    
+    Returns:
+        List[Dict]: List containing the memory tool definition, compatible with the OpenAI API
+    """
+    return [{"type": "function", "function": MEMORY_SCHEMA}]
+
+
+def get_session_search_tool_definitions() -> List[Dict[str, Any]]:
+    """
+    Get tool definitions for the session history search tool.
+    
+    Returns:
+        List[Dict]: List containing the session_search tool definition, compatible with the OpenAI API
+    """
+    return [{"type": "function", "function": SESSION_SEARCH_SCHEMA}]
+
+
 def get_send_message_tool_definitions():
     """Tool definitions for cross-channel messaging."""
     return [
@@ -1093,6 +1131,14 @@ def get_all_tool_names() -> List[str]:
     if check_todo_requirements():
         tool_names.extend(["todo"])
     
+    # Persistent memory (file-based, no external deps)
+    if check_memory_requirements():
+        tool_names.extend(["memory"])
+    
+    # Session history search
+    if check_session_search_requirements():
+        tool_names.extend(["session_search"])
+    
     # Cross-channel messaging (always available on messaging platforms)
     tool_names.extend(["send_message"])
     
@@ -1150,6 +1196,10 @@ TOOL_TO_TOOLSET_MAP = {
     "send_message": "messaging_tools",
     # Planning & task management
     "todo": "todo_tools",
+    # Persistent memory
+    "memory": "memory_tools",
+    # Session history search
+    "session_search": "session_search_tools",
 }
 
 
@@ -1261,6 +1311,16 @@ def get_tool_definitions(
         for tool in get_todo_tool_definitions():
             all_available_tools_map[tool["function"]["name"]] = tool
     
+    # Persistent memory tool
+    if check_memory_requirements():
+        for tool in get_memory_tool_definitions():
+            all_available_tools_map[tool["function"]["name"]] = tool
+    
+    # Session history search tool
+    if check_session_search_requirements():
+        for tool in get_session_search_tool_definitions():
+            all_available_tools_map[tool["function"]["name"]] = tool
+    
     # Cross-channel messaging (always available on messaging platforms)
     for tool in get_send_message_tool_definitions():
         all_available_tools_map[tool["function"]["name"]] = tool
@@ -2001,6 +2061,14 @@ def handle_function_call(
         elif function_name == "todo":
             return json.dumps({"error": "todo must be handled by the agent loop"})
 
+        # Memory tool -- normally handled by the agent loop (needs MemoryStore instance); this is the fallback.
+        elif function_name == "memory":
+            return json.dumps({"error": "Memory is not available. It may be disabled in config or this environment."})
+
+        # Session search -- normally handled by the agent loop (needs SessionDB instance); this is the fallback.
+        elif function_name == "session_search":
+            return json.dumps({"error": "Session search is not available. The session database may not be initialized."})
+
         else:
             error_msg = f"Unknown function: {function_name}"
             print(f"❌ {error_msg}")
@@ -2090,6 +2158,18 @@ def get_available_toolsets() -> Dict[str, Dict[str, Any]]:
             "tools": ["todo"],
             "description": "Planning & task management: in-memory todo list for multi-step work",
             "requirements": []
+        },
+        "memory_tools": {
+            "available": check_memory_requirements(),
+            "tools": ["memory"],
+            "description": "Persistent memory: bounded MEMORY.md + USER.md injected into system prompt",
+            "requirements": []
+        },
+        "session_search_tools": {
+            "available": check_session_search_requirements(),
+            "tools": ["session_search"],
+            "description": "Session history search: FTS5 search + Gemini Flash summarization of past conversations",
+            "requirements": ["OPENROUTER_API_KEY", "~/.hermes/state.db"]
         }
     }
     
diff --git a/run_agent.py b/run_agent.py
index f57c8647c..9366d49bb 100644
--- a/run_agent.py
+++ b/run_agent.py
@@ -875,6 +875,24 @@ def _build_tool_preview(tool_name: str, args: dict, max_len: int = 40) -> str:
         else:
             return f"planning {len(todos_arg)} task(s)"
     
+    if tool_name == "session_search":
+        query = args.get("query", "")
+        return f"recall: \"{query[:25]}{'...' if len(query) > 25 else ''}\""
+
+    if tool_name == "memory":
+        action = args.get("action", "")
+        target = args.get("target", "")
+        if action == "add":
+            content = args.get("content", "")
+            return f"+{target}: \"{content[:25]}{'...' if len(content) > 25 else ''}\""
+        elif action == "replace":
+            return f"~{target}: \"{args.get('old_text', '')[:20]}\""
+        elif action == "remove":
+            return f"-{target}: \"{args.get('old_text', '')[:20]}\""
+        elif action == "read":
+            return f"read {target}"
+        return action
+    
     if tool_name == "send_message":
         target = args.get("target", "?")
         msg = args.get("message", "")
@@ -1061,6 +1079,8 @@ class AIAgent:
         prefill_messages: List[Dict[str, Any]] = None,
         platform: str = None,
         skip_context_files: bool = False,
+        skip_memory: bool = False,
+        session_db=None,
     ):
         """
         Initialize the AI Agent.
@@ -1269,10 +1289,51 @@ class AIAgent:
         # Track conversation messages for session logging
         self._session_messages: List[Dict[str, Any]] = []
         
+        # Cached system prompt -- built once per session, only rebuilt on compression
+        self._cached_system_prompt: Optional[str] = None
+        
+        # SQLite session store (optional -- provided by CLI or gateway)
+        self._session_db = session_db
+        if self._session_db:
+            try:
+                self._session_db.create_session(
+                    session_id=self.session_id,
+                    source=self.platform or "cli",
+                    model=self.model,
+                    model_config={
+                        "max_iterations": self.max_iterations,
+                        "reasoning_config": reasoning_config,
+                        "max_tokens": max_tokens,
+                    },
+                    user_id=None,
+                )
+            except Exception:
+                pass  # Session store is optional -- don't break agent init
+        
         # In-memory todo list for task planning (one per agent/session)
         from tools.todo_tool import TodoStore
         self._todo_store = TodoStore()
         
+        # Persistent memory (MEMORY.md + USER.md) -- loaded from disk
+        self._memory_store = None
+        self._memory_enabled = False
+        self._user_profile_enabled = False
+        if not skip_memory:
+            try:
+                from hermes_cli.config import load_config as _load_mem_config
+                mem_config = _load_mem_config().get("memory", {})
+                self._memory_enabled = mem_config.get("memory_enabled", False)
+                self._user_profile_enabled = mem_config.get("user_profile_enabled", False)
+                if self._memory_enabled or self._user_profile_enabled:
+                    from tools.memory_tool import MemoryStore
+                    self._memory_store = MemoryStore(
+                        memory_char_limit=mem_config.get("memory_char_limit", 2200),
+                        user_char_limit=mem_config.get("user_char_limit", 1375),
+                    )
+                    self._memory_store.load_from_disk()
+            except Exception:
+                pass  # Memory is optional -- don't break agent init
+        
         # Initialize context compressor for automatic context management
         # Compresses conversation when approaching model's context limit
         # Configuration via environment variables (can be set in .env or cli-config.yaml)
@@ -1452,6 +1513,32 @@ class AIAgent:
             else:
                 return f"┊ 📋 plan      {len(todos_arg)} task(s)  {dur}"
 
+        # ── Session Search ──
+        if tool_name == "session_search":
+            query = _trunc(args.get("query", ""), 35)
+            return f"┊ 🔍 recall    \"{query}\"  {dur}"
+
+        # ── Memory ──
+        if tool_name == "memory":
+            action = args.get("action", "?")
+            target = args.get("target", "")
+            if action == "add":
+                preview = _trunc(args.get("content", ""), 30)
+                return f"┊ 🧠 memory    +{target}: \"{preview}\"  {dur}"
+            elif action == "replace":
+                snippet = _trunc(args.get("old_text", ""), 20)
+                return f"┊ 🧠 memory    ~{target}: \"{snippet}\"  {dur}"
+            elif action == "remove":
+                snippet = _trunc(args.get("old_text", ""), 20)
+                return f"┊ 🧠 memory    -{target}: \"{snippet}\"  {dur}"
+            elif action == "read":
+                return f"┊ 🧠 memory    read {target}  {dur}"
+            elif action == "search_sessions":
+                query = _trunc(args.get("content", ""), 30)
+                return f"┊ 🧠 recall    \"{query}\"  {dur}"
+            else:
+                return f"┊ 🧠 memory    {action}  {dur}"
+
         # ── Skills ──
         if tool_name == "skills_list":
             return f"┊ 📚 skills    list {args.get('category', 'all')}  {dur}"
@@ -2041,6 +2128,70 @@ class AIAgent:
         """Check if an interrupt has been requested."""
         return self._interrupt_requested
     
+    def _build_system_prompt(self, system_message: str = None) -> str:
+        """
+        Assemble the full system prompt from all layers.
+        
+        Called once per session (cached on self._cached_system_prompt) and only
+        rebuilt after context compression events. This ensures the system prompt
+        is stable across all turns in a session, maximizing prefix cache hits.
+        """
+        # Layers (in order):
+        #   1. Default agent identity (always present)
+        #   2. User / gateway system prompt (if provided)
+        #   3. Persistent memory (frozen snapshot)
+        #   4. Skills guidance (if skills tools are loaded)
+        #   5. Context files (SOUL.md, AGENTS.md, .cursorrules)
+        #   6. Current date & time (frozen at build time)
+        #   7. Platform-specific formatting hint
+        prompt_parts = [DEFAULT_AGENT_IDENTITY]
+
+        caller_prompt = system_message if system_message is not None else self.ephemeral_system_prompt
+        if caller_prompt:
+            prompt_parts.append(caller_prompt)
+
+        if self._memory_store:
+            if self._memory_enabled:
+                mem_block = self._memory_store.format_for_system_prompt("memory")
+                if mem_block:
+                    prompt_parts.append(mem_block)
+            if self._user_profile_enabled:
+                user_block = self._memory_store.format_for_system_prompt("user")
+                if user_block:
+                    prompt_parts.append(user_block)
+
+        has_skills_tools = any(name in self.valid_tool_names for name in ['skills_list', 'skill_view'])
+        skills_prompt = build_skills_system_prompt() if has_skills_tools else ""
+        if skills_prompt:
+            prompt_parts.append(skills_prompt)
+
+        if not self.skip_context_files:
+            context_files_prompt = build_context_files_prompt()
+            if context_files_prompt:
+                prompt_parts.append(context_files_prompt)
+
+        now = datetime.now()
+        prompt_parts.append(
+            f"Conversation started: {now.strftime('%A, %B %d, %Y %I:%M %p')}"
+        )
+
+        platform_key = (self.platform or "").lower().strip()
+        if platform_key in PLATFORM_HINTS:
+            prompt_parts.append(PLATFORM_HINTS[platform_key])
+
+        return "\n\n".join(prompt_parts)
+    
+    def _invalidate_system_prompt(self):
+        """
+        Invalidate the cached system prompt, forcing a rebuild on the next turn.
+        
+        Called after context compression events. Also reloads memory from disk
+        so the rebuilt prompt captures any writes from this session.
+        """
+        self._cached_system_prompt = None
+        if self._memory_store:
+            self._memory_store.load_from_disk()
+    
     def run_conversation(
         self,
         user_message: str,
@@ -2093,47 +2244,27 @@ class AIAgent:
         if not self.quiet_mode:
             print(f"💬 Starting conversation: '{user_message[:60]}{'...' if len(user_message) > 60 else ''}'")
         
-        # ── Build the full system prompt ──
-        # Layers (in order):
-        #   1. Default agent identity (always present)
-        #   2. User / gateway system prompt (if provided)
-        #   3. Skills guidance (if skills tools are loaded)
-        #   4. Context files (SOUL.md, AGENTS.md, .cursorrules)
-        #   5. Current date & time
-        #   6. Platform-specific formatting hint
-        prompt_parts = [DEFAULT_AGENT_IDENTITY]
+        # ── System prompt (cached per session for prefix caching) ──
+        # Built once on first call, reused for all subsequent calls.
+        # Only rebuilt after context compression events (which invalidate
+        # the cache and reload memory from disk).
+        if self._cached_system_prompt is None:
+            self._cached_system_prompt = self._build_system_prompt(system_message)
+            # Store the system prompt snapshot in SQLite
+            if self._session_db:
+                try:
+                    self._session_db.update_system_prompt(self.session_id, self._cached_system_prompt)
+                except Exception:
+                    pass
 
-        # Layer in the caller-supplied system prompt (explicit > ephemeral).
-        caller_prompt = system_message if system_message is not None else self.ephemeral_system_prompt
-        if caller_prompt:
-            prompt_parts.append(caller_prompt)
+        active_system_prompt = self._cached_system_prompt
 
-        # Auto-include skills guidance if skills tools are available.
-        has_skills_tools = any(name in self.valid_tool_names for name in ['skills_list', 'skill_view'])
-        skills_prompt = build_skills_system_prompt() if has_skills_tools else ""
-        if skills_prompt:
-            prompt_parts.append(skills_prompt)
-
-        # Auto-include context files (SOUL.md, AGENTS.md, .cursorrules).
-        # Skipped for batch processing / data generation to avoid polluting trajectories.
-        if not self.skip_context_files:
-            context_files_prompt = build_context_files_prompt()
-            if context_files_prompt:
-                prompt_parts.append(context_files_prompt)
-
-        # Current local date and time so the model is never confused about
-        # what day/time it is (LLM training cutoffs can otherwise mislead it).
-        now = datetime.now()
-        prompt_parts.append(
-            f"Current local date and time: {now.strftime('%A, %B %d, %Y %I:%M %p')}"
-        )
-
-        # Platform-specific formatting hint (no markdown on WhatsApp, etc.).
-        platform_key = (self.platform or "").lower().strip()
-        if platform_key in PLATFORM_HINTS:
-            prompt_parts.append(PLATFORM_HINTS[platform_key])
-
-        active_system_prompt = "\n\n".join(prompt_parts)
+        # Log user message to SQLite
+        if self._session_db:
+            try:
+                self._session_db.append_message(self.session_id, "user", user_message)
+            except Exception:
+                pass
 
         # Main conversation loop
         api_call_count = 0
@@ -2510,6 +2641,25 @@ class AIAgent:
                             todo_snapshot = self._todo_store.format_for_injection()
                             if todo_snapshot:
                                 messages.append({"role": "user", "content": todo_snapshot})
+                            # Rebuild system prompt with fresh date/time + memory
+                            self._invalidate_system_prompt()
+                            active_system_prompt = self._build_system_prompt(system_message)
+                            self._cached_system_prompt = active_system_prompt
+                            # Split session in SQLite (close old, open new with parent link)
+                            if self._session_db:
+                                try:
+                                    self._session_db.end_session(self.session_id, "compression")
+                                    old_session_id = self.session_id
+                                    self.session_id = f"{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:6]}"
+                                    self._session_db.create_session(
+                                        session_id=self.session_id,
+                                        source=self.platform or "cli",
+                                        model=self.model,
+                                        parent_session_id=old_session_id,
+                                    )
+                                    self._session_db.update_system_prompt(self.session_id, active_system_prompt)
+                                except Exception:
+                                    pass
                             print(f"{self.log_prefix}   🗜️  Compressed {original_len} → {len(messages)} messages, retrying...")
                             continue  # Retry with compressed messages
                         else:
@@ -2769,9 +2919,33 @@ class AIAgent:
                                 store=self._todo_store,
                             )
                             tool_duration = time.time() - tool_start_time
-                            # Show clean output in quiet mode (no spinner needed -- instant)
                             if self.quiet_mode:
                                 print(f"  {self._get_cute_tool_message('todo', function_args, tool_duration)}")
+                        # Session search -- handle directly (needs SessionDB instance)
+                        elif function_name == "session_search" and self._session_db:
+                            from tools.session_search_tool import session_search as _session_search
+                            function_result = _session_search(
+                                query=function_args.get("query", ""),
+                                role_filter=function_args.get("role_filter"),
+                                limit=function_args.get("limit", 3),
+                                db=self._session_db,
+                            )
+                            tool_duration = time.time() - tool_start_time
+                            if self.quiet_mode:
+                                print(f"  {self._get_cute_tool_message('session_search', function_args, tool_duration)}")
+                        # Memory tool -- handle directly (needs agent's MemoryStore instance)
+                        elif function_name == "memory":
+                            from tools.memory_tool import memory_tool as _memory_tool
+                            function_result = _memory_tool(
+                                action=function_args.get("action"),
+                                target=function_args.get("target", "memory"),
+                                content=function_args.get("content"),
+                                old_text=function_args.get("old_text"),
+                                store=self._memory_store,
+                            )
+                            tool_duration = time.time() - tool_start_time
+                            if self.quiet_mode:
+                                print(f"  {self._get_cute_tool_message('memory', function_args, tool_duration)}")
                         # Execute other tools - with animated kawaii spinner in quiet mode
                         # The face is "alive" while the tool works, then vanishes
                         # and is replaced by the clean result line.
@@ -2790,7 +2964,7 @@ class AIAgent:
                                 'vision_analyze': '👁️', 'mixture_of_agents': '🧠',
                                 'skills_list': '📚', 'skill_view': '📚',
                                 'schedule_cronjob': '⏰', 'list_cronjobs': '⏰', 'remove_cronjob': '⏰',
-                                'send_message': '📨', 'todo': '📋',
+                                'send_message': '📨', 'todo': '📋', 'memory': '🧠', 'session_search': '🔍',
                             }
                             emoji = tool_emoji_map.get(function_name, '⚡')
                             preview = _build_tool_preview(function_name, function_args) or function_name
@@ -2852,10 +3026,29 @@ class AIAgent:
                             messages, 
                             current_tokens=self.context_compressor.last_prompt_tokens
                         )
-                        # Re-inject todo state after compression (cache already invalidated)
+                        # Re-inject todo state after compression
                         todo_snapshot = self._todo_store.format_for_injection()
                         if todo_snapshot:
                             messages.append({"role": "user", "content": todo_snapshot})
+                        # Rebuild system prompt with fresh date/time + memory
+                        self._invalidate_system_prompt()
+                        active_system_prompt = self._build_system_prompt(system_message)
+                        self._cached_system_prompt = active_system_prompt
+                        # Split session in SQLite (close old, open new with parent link)
+                        if self._session_db:
+                            try:
+                                self._session_db.end_session(self.session_id, "compression")
+                                old_session_id = self.session_id
+                                self.session_id = f"{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:6]}"
+                                self._session_db.create_session(
+                                    session_id=self.session_id,
+                                    source=self.platform or "cli",
+                                    model=self.model,
+                                    parent_session_id=old_session_id,
+                                )
+                                self._session_db.update_system_prompt(self.session_id, active_system_prompt)
+                            except Exception:
+                                pass
                     
                     # Save session log incrementally (so progress is visible even if interrupted)
                     self._session_messages = messages
@@ -3042,6 +3235,32 @@ class AIAgent:
         self._session_messages = messages
         self._save_session_log(messages)
         
+        # Log new messages to SQLite session store (everything after the user message we already logged)
+        if self._session_db:
+            try:
+                # Skip messages that were in the conversation history before this call
+                # (the user message was already logged at the start of run_conversation)
+                start_idx = (len(conversation_history) if conversation_history else 0) + 1  # +1 for the user msg
+                for msg in messages[start_idx:]:
+                    role = msg.get("role", "unknown")
+                    content = msg.get("content")
+                    # Extract tool call info from assistant messages
+                    tool_calls_data = None
+                    if hasattr(msg, "tool_calls") and msg.tool_calls:
+                        tool_calls_data = [{"name": tc.function.name, "arguments": tc.function.arguments} for tc in msg.tool_calls]
+                    elif isinstance(msg.get("tool_calls"), list):
+                        tool_calls_data = msg["tool_calls"]
+                    self._session_db.append_message(
+                        session_id=self.session_id,
+                        role=role,
+                        content=content,
+                        tool_name=msg.get("tool_name"),
+                        tool_calls=tool_calls_data,
+                        tool_call_id=msg.get("tool_call_id"),
+                    )
+            except Exception:
+                pass
+        
         # Build result with interrupt info if applicable
         result = {
             "final_response": final_response,
diff --git a/scripts/install.sh b/scripts/install.sh
index ef1eff2c2..a0c56ce89 100755
--- a/scripts/install.sh
+++ b/scripts/install.sh
@@ -614,7 +614,7 @@ copy_config_templates() {
     log_info "Setting up configuration files..."
     
     # Create ~/.hermes directory structure (config at top level, code in subdir)
-    mkdir -p "$HERMES_HOME"/{cron,sessions,logs,pairing,hooks,image_cache,audio_cache}
+    mkdir -p "$HERMES_HOME"/{cron,sessions,logs,pairing,hooks,image_cache,audio_cache,memories}
     
     # Create .env at ~/.hermes/.env (top level, easy to find)
     if [ ! -f "$HERMES_HOME/.env" ]; then
diff --git a/tools/memory_tool.py b/tools/memory_tool.py
new file mode 100644
index 000000000..ad303c531
--- /dev/null
+++ b/tools/memory_tool.py
@@ -0,0 +1,385 @@
+#!/usr/bin/env python3
+"""
+Memory Tool Module - Persistent Curated Memory
+
+Provides bounded, file-backed memory that persists across sessions. Two stores:
+  - MEMORY.md: agent's personal notes and observations (environment facts, project
+    conventions, tool quirks, things learned)
+  - USER.md: what the agent knows about the user (preferences, communication style,
+    expectations, workflow habits)
+
+Both are injected into the system prompt as a frozen snapshot at session start.
+Mid-session writes update files on disk immediately (durable) but do NOT change
+the system prompt -- this preserves the prefix cache for the entire session.
+The snapshot refreshes on the next session start or after context compression.
+
+Entry delimiter: § (section sign). Entries can be multiline.
+Character limits (not tokens) because char counts are model-independent.
+
+Design:
+- Single `memory` tool with action parameter: add, replace, remove, read
+- replace/remove use short unique substring matching (not full text or IDs)
+- Behavioral guidance lives in the tool schema description
+- Frozen snapshot pattern: system prompt is stable, tool responses show live state
+"""
+
+import json
+import os
+import fcntl
+from pathlib import Path
+from typing import Dict, Any, List, Optional
+
+# Where memory files live
+MEMORY_DIR = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes")) / "memories"
+
+ENTRY_DELIMITER = "\n§\n"
+
+
+class MemoryStore:
+    """
+    Bounded curated memory with file persistence. One instance per AIAgent.
+
+    Maintains two parallel states:
+      - _system_prompt_snapshot: frozen at load time, used for system prompt injection.
+        Never mutated mid-session. Keeps prefix cache stable.
+      - memory_entries / user_entries: live state, mutated by tool calls, persisted to disk.
+        Tool responses always reflect this live state.
+    """
+
+    def __init__(self, memory_char_limit: int = 2200, user_char_limit: int = 1375):
+        self.memory_entries: List[str] = []
+        self.user_entries: List[str] = []
+        self.memory_char_limit = memory_char_limit
+        self.user_char_limit = user_char_limit
+        # Frozen snapshot for system prompt -- refreshed on each load_from_disk() call
+        self._system_prompt_snapshot: Dict[str, str] = {"memory": "", "user": ""}
+
+    def load_from_disk(self):
+        """Load entries from MEMORY.md and USER.md, capture system prompt snapshot."""
+        MEMORY_DIR.mkdir(parents=True, exist_ok=True)
+
+        self.memory_entries = self._read_file(MEMORY_DIR / "MEMORY.md")
+        self.user_entries = self._read_file(MEMORY_DIR / "USER.md")
+
+        # Capture frozen snapshot for system prompt injection
+        self._system_prompt_snapshot = {
+            "memory": self._render_block("memory", self.memory_entries),
+            "user": self._render_block("user", self.user_entries),
+        }
+
+    def save_to_disk(self, target: str):
+        """Persist entries to the appropriate file. Called after every mutation."""
+        MEMORY_DIR.mkdir(parents=True, exist_ok=True)
+
+        if target == "memory":
+            self._write_file(MEMORY_DIR / "MEMORY.md", self.memory_entries)
+        elif target == "user":
+            self._write_file(MEMORY_DIR / "USER.md", self.user_entries)
+
+    def _entries_for(self, target: str) -> List[str]:
+        if target == "user":
+            return self.user_entries
+        return self.memory_entries
+
+    def _set_entries(self, target: str, entries: List[str]):
+        if target == "user":
+            self.user_entries = entries
+        else:
+            self.memory_entries = entries
+
+    def _char_count(self, target: str) -> int:
+        entries = self._entries_for(target)
+        if not entries:
+            return 0
+        return len(ENTRY_DELIMITER.join(entries))
+
+    def _char_limit(self, target: str) -> int:
+        if target == "user":
+            return self.user_char_limit
+        return self.memory_char_limit
+
+    def add(self, target: str, content: str) -> Dict[str, Any]:
+        """Append a new entry. Returns error if it would exceed the char limit."""
+        content = content.strip()
+        if not content:
+            return {"success": False, "error": "Content cannot be empty."}
+
+        entries = self._entries_for(target)
+        limit = self._char_limit(target)
+
+        # Calculate what the new total would be
+        new_entries = entries + [content]
+        new_total = len(ENTRY_DELIMITER.join(new_entries))
+
+        if new_total > limit:
+            current = self._char_count(target)
+            return {
+                "success": False,
+                "error": (
+                    f"Memory at {current:,}/{limit:,} chars. "
+                    f"Adding this entry ({len(content)} chars) would exceed the limit. "
+                    f"Replace or remove existing entries first."
+                ),
+                "current_entries": entries,
+                "usage": f"{current:,}/{limit:,}",
+            }
+
+        entries.append(content)
+        self._set_entries(target, entries)
+        self.save_to_disk(target)
+
+        return self._success_response(target, "Entry added.")
+
+    def replace(self, target: str, old_text: str, new_content: str) -> Dict[str, Any]:
+        """Find entry containing old_text substring, replace it with new_content."""
+        old_text = old_text.strip()
+        new_content = new_content.strip()
+        if not old_text:
+            return {"success": False, "error": "old_text cannot be empty."}
+        if not new_content:
+            return {"success": False, "error": "new_content cannot be empty. Use 'remove' to delete entries."}
+
+        entries = self._entries_for(target)
+        matches = [(i, e) for i, e in enumerate(entries) if old_text in e]
+
+        if len(matches) == 0:
+            return {"success": False, "error": f"No entry matched '{old_text}'."}
+
+        if len(matches) > 1:
+            previews = [e[:80] + ("..." if len(e) > 80 else "") for _, e in matches]
+            return {
+                "success": False,
+                "error": f"Multiple entries matched '{old_text}'. Be more specific.",
+                "matches": previews,
+            }
+
+        idx = matches[0][0]
+        limit = self._char_limit(target)
+
+        # Check that replacement doesn't blow the budget
+        test_entries = entries.copy()
+        test_entries[idx] = new_content
+        new_total = len(ENTRY_DELIMITER.join(test_entries))
+
+        if new_total > limit:
+            return {
+                "success": False,
+                "error": (
+                    f"Replacement would put memory at {new_total:,}/{limit:,} chars. "
+                    f"Shorten the new content or remove other entries first."
+                ),
+            }
+
+        entries[idx] = new_content
+        self._set_entries(target, entries)
+        self.save_to_disk(target)
+
+        return self._success_response(target, "Entry replaced.")
+
+    def remove(self, target: str, old_text: str) -> Dict[str, Any]:
+        """Remove the entry containing old_text substring."""
+        old_text = old_text.strip()
+        if not old_text:
+            return {"success": False, "error": "old_text cannot be empty."}
+
+        entries = self._entries_for(target)
+        matches = [(i, e) for i, e in enumerate(entries) if old_text in e]
+
+        if len(matches) == 0:
+            return {"success": False, "error": f"No entry matched '{old_text}'."}
+
+        if len(matches) > 1:
+            previews = [e[:80] + ("..." if len(e) > 80 else "") for _, e in matches]
+            return {
+                "success": False,
+                "error": f"Multiple entries matched '{old_text}'. Be more specific.",
+                "matches": previews,
+            }
+
+        idx = matches[0][0]
+        entries.pop(idx)
+        self._set_entries(target, entries)
+        self.save_to_disk(target)
+
+        return self._success_response(target, "Entry removed.")
+
+    def read(self, target: str) -> Dict[str, Any]:
+        """Return live current entries and usage stats."""
+        return self._success_response(target)
+
+    def format_for_system_prompt(self, target: str) -> Optional[str]:
+        """
+        Return the frozen snapshot for system prompt injection.
+
+        This returns the state captured at load_from_disk() time, NOT the live
+        state. Mid-session writes do not affect this. This keeps the system
+        prompt stable across all turns, preserving the prefix cache.
+
+        Returns None if the snapshot is empty (no entries at load time).
+        """
+        block = self._system_prompt_snapshot.get(target, "")
+        return block if block else None
+
+    # -- Internal helpers --
+
+    def _success_response(self, target: str, message: Optional[str] = None) -> Dict[str, Any]:
+        entries = self._entries_for(target)
+        current = self._char_count(target)
+        limit = self._char_limit(target)
+        pct = int((current / limit) * 100) if limit > 0 else 0
+
+        resp = {
+            "success": True,
+            "target": target,
+            "entries": entries,
+            "usage": f"{pct}% — {current:,}/{limit:,} chars",
+            "entry_count": len(entries),
+        }
+        if message:
+            resp["message"] = message
+        return resp
+
+    def _render_block(self, target: str, entries: List[str]) -> str:
+        """Render a system prompt block with header and usage indicator."""
+        if not entries:
+            return ""
+
+        limit = self._char_limit(target)
+        content = ENTRY_DELIMITER.join(entries)
+        current = len(content)
+        pct = int((current / limit) * 100) if limit > 0 else 0
+
+        if target == "user":
+            header = f"USER PROFILE (who the user is) [{pct}% — {current:,}/{limit:,} chars]"
+        else:
+            header = f"MEMORY (your personal notes) [{pct}% — {current:,}/{limit:,} chars]"
+
+        separator = "═" * 46
+        return f"{separator}\n{header}\n{separator}\n{content}"
+
+    @staticmethod
+    def _read_file(path: Path) -> List[str]:
+        """Read a memory file and split into entries."""
+        if not path.exists():
+            return []
+        try:
+            with open(path, "r", encoding="utf-8") as f:
+                fcntl.flock(f, fcntl.LOCK_SH)
+                try:
+                    raw = f.read()
+                finally:
+                    fcntl.flock(f, fcntl.LOCK_UN)
+        except OSError:  # IOError is an alias of OSError in Python 3
+            return []
+
+        if not raw.strip():
+            return []
+
+        entries = [e.strip() for e in raw.split("§")]
+        return [e for e in entries if e]
+
+    @staticmethod
+    def _write_file(path: Path, entries: List[str]):
+        """Write entries to a memory file under an exclusive lock."""
+        content = ENTRY_DELIMITER.join(entries)
+        try:
+            with open(path, "a", encoding="utf-8") as f:
+                # Append mode avoids truncating before the lock is held; closing the
+                # file flushes the buffered write and then releases the lock.
+                fcntl.flock(f, fcntl.LOCK_EX)
+                f.truncate(0)
+                f.write(content)
+        except OSError as e:
+            raise RuntimeError(f"Failed to write memory file {path}: {e}") from e
+
+
+def memory_tool(
+    action: str,
+    target: str = "memory",
+    content: Optional[str] = None,
+    old_text: Optional[str] = None,
+    store: Optional[MemoryStore] = None,
+) -> str:
+    """
+    Single entry point for the memory tool. Dispatches to MemoryStore methods.
+
+    Returns JSON string with results.
+    """
+    if store is None:
+        return json.dumps({"success": False, "error": "Memory is not available. It may be disabled in config or this environment."}, ensure_ascii=False)
+
+    if target not in ("memory", "user"):
+        return json.dumps({"success": False, "error": f"Invalid target '{target}'. Use 'memory' or 'user'."}, ensure_ascii=False)
+
+    if action == "add":
+        if not content:
+            return json.dumps({"success": False, "error": "Content is required for 'add' action."}, ensure_ascii=False)
+        result = store.add(target, content)
+
+    elif action == "replace":
+        if not old_text:
+            return json.dumps({"success": False, "error": "old_text is required for 'replace' action."}, ensure_ascii=False)
+        if not content:
+            return json.dumps({"success": False, "error": "content is required for 'replace' action."}, ensure_ascii=False)
+        result = store.replace(target, old_text, content)
+
+    elif action == "remove":
+        if not old_text:
+            return json.dumps({"success": False, "error": "old_text is required for 'remove' action."}, ensure_ascii=False)
+        result = store.remove(target, old_text)
+
+    elif action == "read":
+        result = store.read(target)
+
+    else:
+        return json.dumps({"success": False, "error": f"Unknown action '{action}'. Use: add, replace, remove, read"}, ensure_ascii=False)
+
+    return json.dumps(result, ensure_ascii=False)
+
+
+def check_memory_requirements() -> bool:
+    """Memory tool has no external requirements -- always available."""
+    return True
+
+
+# =============================================================================
+# OpenAI Function-Calling Schema
+# =============================================================================
+
+MEMORY_SCHEMA = {
+    "name": "memory",
+    "description": (
+        "Manage persistent memory (visible in system prompt). Targets: "
+        "'memory' (your notes) or 'user' (user profile).\n"
+        "Actions: add, replace, remove, read. For replace/remove, old_text "
+        "is a short unique snippet to identify the entry.\n"
+        "Usage indicator in system prompt shows capacity. When >80%, "
+        "consolidate/replace before adding. Prefer replacing over removing.\n"
+        "Write: non-obvious facts, user preferences, tool quirks. "
+        "Skip: trivial info, things in skills, re-discoverable content."
+    ),
+    "parameters": {
+        "type": "object",
+        "properties": {
+            "action": {
+                "type": "string",
+                "enum": ["add", "replace", "remove", "read"],
+                "description": "The action to perform."
+            },
+            "target": {
+                "type": "string",
+                "enum": ["memory", "user"],
+                "description": "Which memory store: 'memory' for personal notes, 'user' for user profile."
+            },
+            "content": {
+                "type": "string",
+                "description": "The entry content. Required for 'add' and 'replace'."
+            },
+            "old_text": {
+                "type": "string",
+                "description": "Short unique substring identifying the entry to replace or remove."
+            },
+        },
+        "required": ["action", "target"],
+    },
+}
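Illustration (not part of the patch): the budget check in `MemoryStore.add` measures the joined text, so the delimiter counts toward the limit and two entries that fit individually can still be rejected together. The sketch below is a standalone approximation of that check and of the unique-substring lookup used by `replace` and `remove`. `DELIM`, `try_add`, and `find_unique` are hypothetical names, and the delimiter value is an assumption because `ENTRY_DELIMITER` is defined outside this section.

```python
from typing import Any, Dict, List

# Assumption: stand-in for ENTRY_DELIMITER, whose real value is defined elsewhere.
DELIM = "\n§\n"


def try_add(entries: List[str], content: str, limit: int) -> Dict[str, Any]:
    """Append content only if the joined text stays within `limit` characters."""
    content = content.strip()
    if not content:
        return {"success": False, "error": "Content cannot be empty."}
    # The delimiter between entries is part of the measured total.
    new_total = len(DELIM.join(entries + [content]))
    if new_total > limit:
        return {"success": False, "error": f"would reach {new_total}/{limit} chars"}
    entries.append(content)
    return {"success": True, "usage": f"{new_total}/{limit}"}


def find_unique(entries: List[str], old_text: str) -> int:
    """Return the index of the single entry containing old_text, else raise."""
    hits = [i for i, e in enumerate(entries) if old_text in e]
    if len(hits) != 1:
        raise LookupError(f"{len(hits)} entries matched {old_text!r}")
    return hits[0]


notes: List[str] = []
first = try_add(notes, "uses zsh", limit=20)       # 8 chars, fits
second = try_add(notes, "prefers tabs", limit=20)  # 8 + 3 + 12 = 23 chars, rejected
```

A rejected add leaves the list unchanged, which mirrors how the tool asks the agent to replace or remove entries before adding more.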
diff --git a/tools/session_search_tool.py b/tools/session_search_tool.py
new file mode 100644
index 000000000..4ba686241
--- /dev/null
+++ b/tools/session_search_tool.py
@@ -0,0 +1,319 @@
+#!/usr/bin/env python3
+"""
+Session Search Tool - Long-Term Conversation Recall
+
+Searches past session transcripts in SQLite via FTS5, then summarizes the top
+matching sessions using a cheap/fast model (same pattern as web_extract).
+Returns focused summaries of past conversations rather than raw transcripts,
+keeping the main model's context window clean.
+
+Flow:
+  1. FTS5 search finds matching messages ranked by relevance
+  2. Groups by session, takes the top N unique sessions (default 3)
+  3. Loads each session's conversation, truncates to ~100k chars centered on matches
+  4. Sends to Gemini Flash with a focused summarization prompt
+  5. Returns per-session summaries with metadata
+"""
+
+import asyncio
+import concurrent.futures
+import json
+import os
+import logging
+from typing import Dict, Any, List, Optional
+
+from openai import AsyncOpenAI
+
+SUMMARIZER_MODEL = "google/gemini-3-flash-preview"
+MAX_SESSION_CHARS = 100_000
+MAX_SUMMARY_TOKENS = 2000
+
+_summarizer_client = None
+
+
+def _get_client() -> AsyncOpenAI:
+    """Lazily initialize the shared summarizer client (same pattern as web_tools)."""
+    global _summarizer_client
+    if _summarizer_client is None:
+        api_key = os.getenv("OPENROUTER_API_KEY")
+        if not api_key:
+            raise ValueError("OPENROUTER_API_KEY not set")
+        _summarizer_client = AsyncOpenAI(
+            api_key=api_key,
+            base_url="https://openrouter.ai/api/v1",
+        )
+    return _summarizer_client
+
+
+def _format_conversation(messages: List[Dict[str, Any]]) -> str:
+    """Format session messages into a readable transcript for summarization."""
+    parts = []
+    for msg in messages:
+        role = msg.get("role", "unknown").upper()
+        content = msg.get("content") or ""
+        tool_name = msg.get("tool_name")
+
+        if role == "TOOL" and tool_name:
+            # Truncate long tool outputs
+            if len(content) > 500:
+                content = content[:250] + "\n...[truncated]...\n" + content[-250:]
+            parts.append(f"[TOOL:{tool_name}]: {content}")
+        elif role == "ASSISTANT":
+            # Include tool call names if present
+            tool_calls = msg.get("tool_calls")
+            if tool_calls and isinstance(tool_calls, list):
+                tc_names = []
+                for tc in tool_calls:
+                    if isinstance(tc, dict):
+                        name = tc.get("name") or tc.get("function", {}).get("name", "?")
+                        tc_names.append(name)
+                if tc_names:
+                    parts.append(f"[ASSISTANT]: [Called: {', '.join(tc_names)}]")
+                if content:
+                    parts.append(f"[ASSISTANT]: {content}")
+            else:
+                parts.append(f"[ASSISTANT]: {content}")
+        else:
+            parts.append(f"[{role}]: {content}")
+
+    return "\n\n".join(parts)
+
+
+def _truncate_around_matches(
+    full_text: str, query: str, max_chars: int = MAX_SESSION_CHARS
+) -> str:
+    """
+    Truncate a conversation transcript to max_chars, centered around
+    where the query terms appear. Keeps content near matches, trims the edges.
+    """
+    if len(full_text) <= max_chars:
+        return full_text
+
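+    # Find the first occurrence of any query term, ignoring FTS5 operators and quote/prefix syntax
+    query_terms = [t.strip('"*()').lower() for t in query.split() if t not in ("AND", "OR", "NOT")]
+    text_lower = full_text.lower()
+    first_match = len(full_text)
+    for term in filter(None, query_terms):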
+        pos = text_lower.find(term)
+        if pos != -1 and pos < first_match:
+            first_match = pos
+
+    if first_match == len(full_text):
+        # No match found, take from the start
+        first_match = 0
+
+    # Center the window around the first match
+    half = max_chars // 2
+    start = max(0, first_match - half)
+    end = min(len(full_text), start + max_chars)
+    if end - start < max_chars:
+        start = max(0, end - max_chars)
+
+    truncated = full_text[start:end]
+    prefix = "...[earlier conversation truncated]...\n\n" if start > 0 else ""
+    suffix = "\n\n...[later conversation truncated]..." if end < len(full_text) else ""
+    return prefix + truncated + suffix
+
+
+async def _summarize_session(
+    conversation_text: str, query: str, session_meta: Dict[str, Any]
+) -> Optional[str]:
+    """Summarize a single session conversation focused on the search query."""
+    system_prompt = (
+        "You are reviewing a past conversation transcript to help recall what happened. "
+        "Summarize the conversation with a focus on the search topic. Include:\n"
+        "1. What the user asked about or wanted to accomplish\n"
+        "2. What actions were taken and what the outcomes were\n"
+        "3. Key decisions, solutions found, or conclusions reached\n"
+        "4. Any specific commands, files, URLs, or technical details that were important\n"
+        "5. Anything left unresolved or notable\n\n"
+        "Be thorough but concise. Preserve specific details (commands, paths, error messages) "
+        "that would be useful to recall. Write in past tense as a factual recap."
+    )
+
+    source = session_meta.get("source", "unknown")
+    started = session_meta.get("started_at", "unknown")
+
+    user_prompt = (
+        f"Search topic: {query}\n"
+        f"Session source: {source}\n"
+        f"Session started: {started}\n\n"
+        f"CONVERSATION TRANSCRIPT:\n{conversation_text}\n\n"
+        f"Summarize this conversation with focus on: {query}"
+    )
+
+    max_retries = 3
+    for attempt in range(max_retries):
+        try:
+            response = await _get_client().chat.completions.create(
+                model=SUMMARIZER_MODEL,
+                messages=[
+                    {"role": "system", "content": system_prompt},
+                    {"role": "user", "content": user_prompt},
+                ],
+                temperature=0.1,
+                max_tokens=MAX_SUMMARY_TOKENS,
+            )
+            return response.choices[0].message.content.strip()
+        except Exception as e:
+            if attempt < max_retries - 1:
+                await asyncio.sleep(1 * (attempt + 1))
+            else:
+                logging.warning(f"Session summarization failed after {max_retries} attempts: {e}")
+                return None
+
+
+def session_search(
+    query: str,
+    role_filter: Optional[str] = None,
+    limit: int = 3,
+    db=None,
+) -> str:
+    """
+    Search past sessions and return focused summaries of matching conversations.
+
+    Uses FTS5 to find matches, then summarizes the top sessions with Gemini Flash.
+    """
+    if db is None:
+        return json.dumps({"success": False, "error": "Session database not available."}, ensure_ascii=False)
+
+    if not query or not query.strip():
+        return json.dumps({"success": False, "error": "Query cannot be empty."}, ensure_ascii=False)
+
+    query = query.strip()
+    limit = max(1, min(limit, 5))  # Clamp to 1-5 sessions to avoid excessive LLM calls
+
+    try:
+        # Parse role filter
+        role_list = None
+        if role_filter and role_filter.strip():
+            role_list = [r.strip() for r in role_filter.split(",") if r.strip()]
+
+        # FTS5 search -- get matches ranked by relevance
+        raw_results = db.search_messages(
+            query=query,
+            role_filter=role_list,
+            limit=50,  # Get more matches to find unique sessions
+            offset=0,
+        )
+
+        if not raw_results:
+            return json.dumps({
+                "success": True,
+                "query": query,
+                "results": [],
+                "count": 0,
+                "message": "No matching sessions found.",
+            }, ensure_ascii=False)
+
+        # Group by session_id, keep order (highest ranked first)
+        seen_sessions = {}
+        for result in raw_results:
+            sid = result["session_id"]
+            if sid not in seen_sessions:
+                seen_sessions[sid] = result
+            if len(seen_sessions) >= limit:
+                break
+
+        # Summarize each matching session
+        summaries = []
+        for session_id, match_info in seen_sessions.items():
+            try:
+                # Load full conversation
+                messages = db.get_messages_as_conversation(session_id)
+                if not messages:
+                    continue
+
+                # Get session metadata
+                session_meta = db.get_session(session_id) or {}
+
+                # Format and truncate
+                conversation_text = _format_conversation(messages)
+                conversation_text = _truncate_around_matches(conversation_text, query)
+
+                # Summarize with Gemini Flash (handle both async and sync contexts)
+                coro = _summarize_session(conversation_text, query, session_meta)
+                try:
+                    asyncio.get_running_loop()
+                    # Already in an async context (gateway) -- run in a thread
+                    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
+                        summary = pool.submit(asyncio.run, coro).result(timeout=30)
+                except RuntimeError:
+                    # No running loop (normal CLI) -- use asyncio.run directly
+                    summary = asyncio.run(coro)
+
+                if summary:
+                    summaries.append({
+                        "session_id": session_id,
+                        "source": match_info.get("source", "unknown"),
+                        "model": match_info.get("model"),
+                        "session_started": match_info.get("session_started"),
+                        "summary": summary,
+                    })
+
+            except Exception as e:
+                logging.warning(f"Failed to summarize session {session_id}: {e}")
+                continue
+
+        return json.dumps({
+            "success": True,
+            "query": query,
+            "results": summaries,
+            "count": len(summaries),
+            "sessions_searched": len(seen_sessions),
+        }, ensure_ascii=False)
+
+    except Exception as e:
+        return json.dumps({"success": False, "error": f"Search failed: {str(e)}"}, ensure_ascii=False)
+
+
+def check_session_search_requirements() -> bool:
+    """Requires SQLite state database and OpenRouter API key."""
+    if not os.getenv("OPENROUTER_API_KEY"):
+        return False
+    try:
+        from hermes_state import DEFAULT_DB_PATH
+        return DEFAULT_DB_PATH.parent.exists()
+    except ImportError:
+        return False
+
+
+SESSION_SEARCH_SCHEMA = {
+    "name": "session_search",
+    "description": (
+        "Search and recall past conversations. Finds matching sessions using "
+        "full-text search, then provides a focused summary of each matching "
+        "conversation.\n\n"
+        "Use this when you need to recall:\n"
+        "- A solution or approach from a previous session\n"
+        "- Something the user said or asked about before\n"
+        "- A command, file path, or technical detail from past work\n"
+        "- The outcome of a previous task\n\n"
+        "Supports search syntax:\n"
+        "  Keywords: docker deployment\n"
+        "  Phrases: '\"exact phrase\"'\n"
+        "  Boolean: docker OR kubernetes, python NOT java\n"
+        "  Prefix: deploy*\n\n"
+        "Returns summaries (not raw transcripts) of the top matching sessions, "
+        "focused on your search topic. Summarizes up to 3 sessions by default (max 5)."
+    ),
+    "parameters": {
+        "type": "object",
+        "properties": {
+            "query": {
+                "type": "string",
+                "description": "Search query — keywords, phrases, or boolean expressions to find in past sessions.",
+            },
+            "role_filter": {
+                "type": "string",
+                "description": "Optional: only search messages from specific roles (comma-separated). E.g. 'user,assistant' to skip tool outputs.",
+            },
+            "limit": {
+                "type": "integer",
+                "description": "Max sessions to summarize (default: 3, max: 5).",
+                "default": 3,
+            },
+        },
+        "required": ["query"],
+    },
+}
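Illustration (not part of the patch): two steps of the search flow are easy to check in isolation, namely picking the first hit per session in rank order and computing the truncation window around the first match. The sketch below restates both on plain data; `top_sessions` and `window` are hypothetical helper names, and the hit dictionaries carry only the `session_id` key.

```python
from typing import Any, Dict, List, Tuple


def top_sessions(hits: List[Dict[str, Any]], limit: int) -> List[str]:
    """Keep the first (highest-ranked) hit per session, up to `limit` sessions."""
    seen: Dict[str, Dict[str, Any]] = {}
    for hit in hits:
        seen.setdefault(hit["session_id"], hit)
        if len(seen) >= limit:
            break
    return list(seen)  # dicts preserve insertion order, so rank order is kept


def window(text_len: int, first_match: int, max_chars: int) -> Tuple[int, int]:
    """Return (start, end) of a max_chars slice centered on first_match."""
    half = max_chars // 2
    start = max(0, first_match - half)
    end = min(text_len, start + max_chars)
    if end - start < max_chars:
        # Window ran past the end of the text: shift it back to keep its size.
        start = max(0, end - max_chars)
    return start, end


ranked = [{"session_id": s} for s in ("a", "b", "a", "c")]
picked = top_sessions(ranked, limit=2)
middle = window(1000, 500, 100)
near_start = window(1000, 10, 100)
near_end = window(1000, 990, 100)
```

Shifting the window back near the end of the transcript keeps the slice at full size instead of returning a shorter excerpt.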
diff --git a/toolsets.py b/toolsets.py
index 1f0013c6a..0e17dd03a 100644
--- a/toolsets.py
+++ b/toolsets.py
@@ -120,6 +120,19 @@ TOOLSETS = {
         "includes": []
     },
     
+    "memory": {
+        "description": "Persistent memory across sessions (personal notes + user profile)",
+        "tools": ["memory"],
+        "includes": []
+    },
+    
+    "session_search": {
+        "description": "Search and recall past conversations with summarization",
+        "tools": ["session_search"],
+        "includes": []
+    },
+    
+    
     # Scenario-specific toolsets
     
     "debugging": {
@@ -164,6 +177,10 @@ TOOLSETS = {
             "text_to_speech",
             # Planning & task management
             "todo",
+            # Persistent memory
+            "memory",
+            # Session history search
+            "session_search",
             # Cronjob management (CLI-only)
             "schedule_cronjob", "list_cronjobs", "remove_cronjob"
         ],
@@ -198,6 +215,10 @@ TOOLSETS = {
             "skills_list", "skill_view",
             # Planning & task management
             "todo",
+            # Persistent memory
+            "memory",
+            # Session history search
+            "session_search",
             # Cronjob management - let users schedule tasks
             "schedule_cronjob", "list_cronjobs", "remove_cronjob",
             # Cross-channel messaging
@@ -230,6 +251,10 @@ TOOLSETS = {
             "skills_list", "skill_view",
             # Planning & task management
             "todo",
+            # Persistent memory
+            "memory",
+            # Session history search
+            "session_search",
             # Cronjob management - let users schedule tasks
             "schedule_cronjob", "list_cronjobs", "remove_cronjob",
             # Cross-channel messaging
@@ -262,6 +287,10 @@ TOOLSETS = {
             "skills_list", "skill_view",
             # Planning & task management
             "todo",
+            # Persistent memory
+            "memory",
+            # Session history search
+            "session_search",
             # Cronjob management
             "schedule_cronjob", "list_cronjobs", "remove_cronjob",
             # Cross-channel messaging
@@ -294,6 +323,10 @@ TOOLSETS = {
             "skills_list", "skill_view",
             # Planning & task management
             "todo",
+            # Persistent memory
+            "memory",
+            # Session history search
+            "session_search",
             # Cronjob management - let users schedule tasks
             "schedule_cronjob", "list_cronjobs", "remove_cronjob",
             # Cross-channel messaging