Merge pull request #43 from AlexanderWhitestone/claude/condescending-vaughan

Fix Timmy coherence: persistent sessions, model-aware tools, response sanitization
This commit is contained in:
Alexander Whitestone
2026-02-25 19:19:43 -05:00
committed by GitHub
17 changed files with 2264 additions and 64 deletions

MEMORY.md (new file, +84)

@@ -0,0 +1,84 @@
# Timmy Hot Memory
> Working RAM — always loaded, ~300 lines max, pruned monthly
> Last updated: 2026-02-25
---
## Current Status
**Agent State:** Operational
**Mode:** Development
**Active Tasks:** 0
**Pending Decisions:** None
---
## Standing Rules
1. **Sovereignty First** — No cloud dependencies, no data exfiltration
2. **Local-Only Inference** — Ollama on localhost, Apple Silicon optimized
3. **Privacy by Design** — Telemetry disabled, secrets in .env only
4. **Tool Minimalism** — Use tools only when necessary, prefer direct answers
5. **Memory Discipline** — Write handoffs at session end, prune monthly
---
## Agent Roster
| Agent | Role | Status | Capabilities |
|-------|------|--------|--------------|
| Timmy | Core | Active | chat, reasoning, planning |
| Echo | Research | Standby | web_search, file_read |
| Forge | Code | Standby | shell, python, git |
| Seer | Data | Standby | python, analysis |
| Helm | DevOps | Standby | shell, deployment |
---
## User Profile
**Name:** TestUser
## Key Decisions
- **2026-02-25:** Implemented 3-tier memory architecture
- **2026-02-25:** Disabled telemetry by default (sovereign AI)
- **2026-02-25:** Fixed Agno Toolkit API compatibility
---
## Pending Actions
- [ ] Learn user's name and preferences
- [ ] Populate user profile in self/identity.md
- [ ] First AAR after meaningful task completion
---
## Current Session
**Session ID:** (active)
**Started:** 2026-02-25
**Context:** Memory system initialization
**Last Handoff:** (none yet)
---
## Quick Reference
**Available Tools:**
- `web_search` — Current events only
- `read_file` / `write_file` — Explicit request only
- `python` — Calculations, code execution
- `shell` — System commands (caution)
**Memory Locations:**
- Hot: `MEMORY.md` (this file)
- Vault: `memory/`
- Handoff: `memory/notes/last-session-handoff.md`
---
*Prune date: 2026-03-25*

memory/self/identity.md (new file, +48)

@@ -0,0 +1,48 @@
# Timmy Identity
## Core Identity
**Name:** Timmy
**Type:** Sovereign AI Agent
**Version:** 1.0.0
**Created:** 2026-02-25
## Purpose
Assist the user with information, tasks, and digital sovereignty. Operate entirely on local hardware with no cloud dependencies.
## Values
1. **Sovereignty** — User owns their data and compute
2. **Privacy** — Nothing leaves the local machine
3. **Christian Faith** — Grounded in biblical principles
4. **Bitcoin Economics** — Self-custody, sound money
5. **Clear Thinking** — Plain language, intentional action
## Capabilities
- Conversational AI with persistent memory
- Tool usage (search, files, code, shell)
- Multi-agent swarm coordination
- Bitcoin Lightning integration (L402)
- Creative pipeline (image, music, video)
## Operating Modes
| Mode | Model | Parameters | Use Case |
|------|-------|------------|----------|
| Standard | llama3.2 | 3.2B | Fast, everyday tasks |
| Big Brain | AirLLM 70B | 70B | Complex reasoning |
| Maximum | AirLLM 405B | 405B | Deep analysis |
## Communication Style
- Direct and concise
- Technical when appropriate
- References prior context naturally
- Uses user's name when known
- "Sir, affirmative."
---
*Last updated: 2026-02-25*


@@ -0,0 +1,70 @@
# Timmy Methodology
## Tool Usage Philosophy
### When NOT to Use Tools
- Identity questions ("What is your name?")
- General knowledge (history, science, concepts)
- Simple math (2+2, basic calculations)
- Greetings and social chat
- Anything in training data
### When TO Use Tools
- Current events/news (after training cutoff)
- Explicit file operations (user requests)
- Complex calculations requiring precision
- Real-time data (prices, weather)
- System operations (explicit user request)
### Decision Process
1. Can I answer this from my training data? → Answer directly
2. Does this require current/real-time info? → Consider web_search
3. Did user explicitly request file/code/shell? → Use appropriate tool
4. Is this a simple calculation? → Answer directly
5. Unclear? → Answer directly (don't tool-spam)
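As a rough illustration, the five-step process above could be sketched like this (the function and keyword lists are hypothetical assumptions, not Timmy's actual implementation):

```python
# Hypothetical sketch of the five-step tool decision process.
# The keyword lists are illustrative assumptions, not the real ones.
def decide(message: str) -> str:
    m = message.lower()
    # Step 3: explicit file/code/shell requests use the matching tool
    if any(k in m for k in ("read file", "write file", "run ", "shell")):
        return "tool"
    # Step 2: current/real-time info suggests web_search
    if any(k in m for k in ("today", "latest", "current", "news", "weather")):
        return "web_search"
    # Steps 1, 4, 5: training-data questions, simple math, and unclear
    # cases all get a direct answer (don't tool-spam)
    return "answer_directly"
```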
## Memory Management
### Working Memory (Hot)
- Last 20 messages
- Immediate context
- Topic tracking
### Short-Term Memory (Agno SQLite)
- Recent 100 conversations
- Survives restarts
- Automatic
### Long-Term Memory (Vault)
- User facts and preferences
- Important learnings
- AARs and retrospectives
### Hot Memory (MEMORY.md)
- Always loaded
- Current status, rules, roster
- User profile summary
- Pruned monthly
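One way to picture how these tiers combine at prompt time (a minimal sketch under stated assumptions; the function and section labels are illustrative, not the actual code):

```python
# Illustrative only: combine the memory tiers described above into one
# prompt context string. Section labels are assumptions.
def build_context(hot: str, facts: list[str], recent: list[str]) -> str:
    parts = []
    if hot:
        parts.append("## Hot Memory\n" + hot)            # MEMORY.md, always loaded
    if facts:                                            # long-term vault facts
        parts.append("## Known Facts\n" + "\n".join(f"- {f}" for f in facts))
    if recent:                                           # working memory window
        parts.append("## Recent Conversation\n" + "\n".join(recent[-20:]))
    return "\n\n".join(parts)
```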
## Handoff Protocol
At end of every session:
1. Write `memory/notes/last-session-handoff.md`
2. Update MEMORY.md with any key decisions
3. Extract facts to `memory/self/user_profile.md`
4. If task completed, write AAR to `memory/aar/`
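A hedged sketch of what step 1 might look like; the path matches the protocol above, but the function itself is an assumption, not Timmy's code:

```python
from datetime import date
from pathlib import Path

# Illustrative handoff writer (not the actual implementation); the path
# matches the handoff protocol, everything else is an assumption.
def write_handoff(root: Path, summary: str, decisions: list[str]) -> Path:
    handoff = root / "memory" / "notes" / "last-session-handoff.md"
    handoff.parent.mkdir(parents=True, exist_ok=True)  # ensure vault dirs exist
    lines = [f"# Session Handoff ({date.today():%Y-%m-%d})", "", summary]
    if decisions:
        lines += ["", "## Key Decisions"] + [f"- {d}" for d in decisions]
    handoff.write_text("\n".join(lines) + "\n")
    return handoff
```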
## Session Start Hook
1. Read MEMORY.md into system context
2. Read last-session-handoff.md if exists
3. Inject user profile context
4. Begin conversation
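Sketched in code, the session-start hook might read as follows (file paths are taken from this document; the function itself is an illustrative assumption):

```python
from pathlib import Path

# Illustrative session-start hook: load hot memory, the last handoff,
# and the user profile into system context, skipping missing files.
def session_start_context(root: Path) -> str:
    sources = (
        "MEMORY.md",                             # step 1: hot memory
        "memory/notes/last-session-handoff.md",  # step 2: handoff, if it exists
        "memory/self/user_profile.md",           # step 3: user profile
    )
    parts = []
    for rel in sources:
        path = root / rel
        if path.exists():
            parts.append(path.read_text())
    return "\n\n".join(parts)
```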
---
*Last updated: 2026-02-25*


@@ -0,0 +1,43 @@
# User Profile
> Learned information about the user. Updated continuously.
## Basic Information
**Name:** TestUser
**Location:** (unknown)
**Occupation:** (unknown)
**Technical Level:** (to be assessed)
## Interests & Expertise
- (to be learned from conversations)
## Preferences
### Communication
- Response style: (default: concise, technical)
- Detail level: (default: medium)
- Humor: (default: minimal)
### Tools
- Auto-tool usage: (default: minimal)
- Confirmation required for: shell commands, file writes
### Memory
- Personalization: Enabled
- Context retention: 20 messages (working), 100 (short-term)
## Important Facts
- (to be extracted from conversations)
## Relationship History
- First session: 2026-02-25
- Total sessions: 1
- Key milestones: (none yet)
---
*Last updated: 2026-02-25*


@@ -5,7 +5,7 @@ from fastapi import APIRouter, Form, Request
from fastapi.responses import HTMLResponse
from fastapi.templating import Jinja2Templates
-from timmy.agent import create_timmy
+from timmy.session import chat as timmy_chat
from dashboard.store import message_log

router = APIRouter(prefix="/agents", tags=["agents"])
@@ -75,9 +75,7 @@ async def chat_timmy(request: Request, message: str = Form(...)):
    error_text = None
    try:
-        agent = create_timmy()
-        run = agent.run(message, stream=False)
-        response_text = run.content if hasattr(run, "content") else str(run)
+        response_text = timmy_chat(message)
    except Exception as exc:
        error_text = f"Timmy is offline: {exc}"


@@ -1,3 +1,14 @@
"""Timmy agent creation with three-tier memory system.

Memory Architecture:
- Tier 1 (Hot): MEMORY.md — always loaded, ~300 lines
- Tier 2 (Vault): memory/ — structured markdown, append-only
- Tier 3 (Semantic): Vector search over vault files

Handoff Protocol maintains continuity across sessions.
"""
import logging
from typing import TYPE_CHECKING, Union

from agno.agent import Agent
@@ -5,15 +16,43 @@ from agno.db.sqlite import SqliteDb
from agno.models.ollama import Ollama
from config import settings
-from timmy.prompts import TIMMY_SYSTEM_PROMPT
+from timmy.prompts import get_system_prompt
from timmy.tools import create_full_toolkit

if TYPE_CHECKING:
    from timmy.backends import TimmyAirLLMAgent

logger = logging.getLogger(__name__)

# Union type for callers that want to hint the return type.
TimmyAgent = Union[Agent, "TimmyAirLLMAgent"]

# Models known to be too small for reliable tool calling.
# These hallucinate tool calls as text, invoke tools randomly,
# and leak raw JSON into responses.
_SMALL_MODEL_PATTERNS = (
    "llama3.2",
    "phi-3",
    "gemma:2b",
    "tinyllama",
    "qwen2:0.5b",
    "qwen2:1.5b",
)


def _model_supports_tools(model_name: str) -> bool:
    """Check if the configured model can reliably handle tool calling.

    Small models (< 7B) tend to hallucinate tool calls as text or invoke
    them randomly. For these models, it's better to run tool-free and let
    the model answer directly from its training data.
    """
    model_lower = model_name.lower()
    for pattern in _SMALL_MODEL_PATTERNS:
        if pattern in model_lower:
            return False
    return True


def _resolve_backend(requested: str | None) -> str:
    """Return the backend name to use, resolving 'auto' and explicit overrides.
@@ -63,17 +102,118 @@
        return TimmyAirLLMAgent(model_size=size)

    # Default: Ollama via Agno.
-    # Add tools for sovereign agent capabilities
-    tools = create_full_toolkit()
    model_name = settings.ollama_model
    use_tools = _model_supports_tools(model_name)

    # Conditionally include tools — small models get none
    tools = create_full_toolkit() if use_tools else None
    if not use_tools:
        logger.info("Tools disabled for model %s (too small for reliable tool calling)", model_name)

    # Select prompt tier based on tool capability
    base_prompt = get_system_prompt(tools_enabled=use_tools)

    # Try to load memory context
    try:
        from timmy.memory_system import memory_system

        memory_context = memory_system.get_system_context()
        if memory_context:
            # Truncate if too long (keep under token limit)
            max_context = 4000 if not use_tools else 8000
            if len(memory_context) > max_context:
                memory_context = memory_context[:max_context] + "\n... [truncated]"
            full_prompt = f"{base_prompt}\n\n## Memory Context\n\n{memory_context}"
        else:
            full_prompt = base_prompt
    except Exception as exc:
        logger.warning("Failed to load memory context: %s", exc)
        full_prompt = base_prompt

    return Agent(
        name="Timmy",
-        model=Ollama(id=settings.ollama_model, host=settings.ollama_url),
+        model=Ollama(id=model_name, host=settings.ollama_url),
        db=SqliteDb(db_file=db_file),
-        description=TIMMY_SYSTEM_PROMPT,
+        description=full_prompt,
        add_history_to_context=True,
-        num_history_runs=10,
+        num_history_runs=20,
        markdown=True,
        tools=[tools] if tools else None,
        show_tool_calls=False,
        telemetry=settings.telemetry_enabled,
    )
class TimmyWithMemory:
    """Timmy wrapper with explicit three-tier memory management."""

    def __init__(self, db_file: str = "timmy.db") -> None:
        from timmy.memory_system import memory_system

        self.agent = create_timmy(db_file=db_file)
        self.memory = memory_system
        self.session_active = True
        # Store initial context for reference
        self.initial_context = self.memory.get_system_context()

    def chat(self, message: str) -> str:
        """Simple chat interface that tracks in memory."""
        # Check for user facts to extract
        self._extract_and_store_facts(message)
        # Run agent
        result = self.agent.run(message, stream=False)
        response_text = result.content if hasattr(result, "content") else str(result)
        return response_text

    def _extract_and_store_facts(self, message: str) -> None:
        """Extract user facts from message and store in memory."""
        message_lower = message.lower()
        # Extract name
        name_patterns = [
            ("my name is ", 11),
            ("i'm ", 4),
            ("i am ", 5),
            ("call me ", 8),
        ]
        for pattern, offset in name_patterns:
            if pattern in message_lower:
                idx = message_lower.find(pattern) + offset
                tail = message[idx:].strip()
                if not tail:
                    continue  # pattern at end of message, nothing to extract
                name = tail.split()[0].strip(".,!?;:()\"'").capitalize()
                if name and len(name) > 1 and name.lower() not in ("the", "a", "an"):
                    self.memory.update_user_fact("Name", name)
                    self.memory.record_decision(f"Learned user's name: {name}")
                break
        # Extract preferences
        pref_patterns = [
            ("i like ", "Likes"),
            ("i love ", "Loves"),
            ("i prefer ", "Prefers"),
            ("i don't like ", "Dislikes"),
            ("i hate ", "Dislikes"),
        ]
        for pattern, category in pref_patterns:
            if pattern in message_lower:
                idx = message_lower.find(pattern) + len(pattern)
                pref = message[idx:].strip().split(".")[0].strip()
                if pref and len(pref) > 3:
                    self.memory.record_open_item(f"User {category.lower()}: {pref}")
                break

    def end_session(self, summary: str = "Session completed") -> None:
        """End session and write handoff."""
        if self.session_active:
            self.memory.end_session(summary)
            self.session_active = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.end_session()
        return False

src/timmy/conversation.py (new file, +137)

@@ -0,0 +1,137 @@
"""Conversation context management for Timmy.
Tracks conversation state, intent, and context to improve:
- Contextual understanding across multi-turn conversations
- Smarter tool usage decisions
- Natural reference to prior exchanges
"""
import logging
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional
logger = logging.getLogger(__name__)
@dataclass
class ConversationContext:
"""Tracks the current conversation state."""
user_name: Optional[str] = None
current_topic: Optional[str] = None
last_intent: Optional[str] = None
turn_count: int = 0
started_at: datetime = field(default_factory=datetime.now)
def update_topic(self, topic: str) -> None:
"""Update the current conversation topic."""
self.current_topic = topic
self.turn_count += 1
def set_user_name(self, name: str) -> None:
"""Remember the user's name."""
self.user_name = name
logger.info("User name set to: %s", name)
def get_context_summary(self) -> str:
"""Generate a context summary for the prompt."""
parts = []
if self.user_name:
parts.append(f"User's name is {self.user_name}")
if self.current_topic:
parts.append(f"Current topic: {self.current_topic}")
if self.turn_count > 0:
parts.append(f"Conversation turn: {self.turn_count}")
return " | ".join(parts) if parts else ""
class ConversationManager:
"""Manages conversation context across sessions."""
def __init__(self) -> None:
self._contexts: dict[str, ConversationContext] = {}
def get_context(self, session_id: str) -> ConversationContext:
"""Get or create context for a session."""
if session_id not in self._contexts:
self._contexts[session_id] = ConversationContext()
return self._contexts[session_id]
def clear_context(self, session_id: str) -> None:
"""Clear context for a session."""
if session_id in self._contexts:
del self._contexts[session_id]
def extract_user_name(self, message: str) -> Optional[str]:
"""Try to extract user's name from message."""
message_lower = message.lower()
# Common patterns
patterns = [
"my name is ",
"i'm ",
"i am ",
"call me ",
]
for pattern in patterns:
if pattern in message_lower:
idx = message_lower.find(pattern) + len(pattern)
remainder = message[idx:].strip()
# Take first word as name
name = remainder.split()[0].strip(".,!?;:")
# Capitalize first letter
return name.capitalize()
return None
def should_use_tools(self, message: str, context: ConversationContext) -> bool:
"""Determine if this message likely requires tools.
Returns True if tools are likely needed, False for simple chat.
"""
message_lower = message.lower().strip()
# Tool keywords that suggest tool usage is needed
tool_keywords = [
"search", "look up", "find", "google", "current price",
"latest", "today's", "news", "weather", "stock price",
"read file", "write file", "save", "calculate", "compute",
"run ", "execute", "shell", "command", "install",
]
# Chat-only keywords that definitely don't need tools
chat_only = [
"hello", "hi ", "hey", "how are you", "what's up",
"your name", "who are you", "what are you",
"thanks", "thank you", "bye", "goodbye",
"tell me about yourself", "what can you do",
]
# Check for chat-only patterns first
for pattern in chat_only:
if pattern in message_lower:
return False
# Check for tool keywords
for keyword in tool_keywords:
if keyword in message_lower:
return True
# Simple questions (starting with what, who, how, why, when, where)
# usually don't need tools unless about current/real-time info
simple_question_words = ["what is", "who is", "how does", "why is", "when did", "where is"]
for word in simple_question_words:
if message_lower.startswith(word):
# Check if it's asking about current/real-time info
time_words = ["today", "now", "current", "latest", "this week", "this month"]
if any(t in message_lower for t in time_words):
return True
return False
# Default: don't use tools for unclear cases
return False
# Module-level singleton
conversation_manager = ConversationManager()

src/timmy/memory_layers.py (new file, +437)

@@ -0,0 +1,437 @@
"""Multi-layer memory system for Timmy.
.. deprecated::
This module is deprecated and unused. The active memory system lives in
``timmy.memory_system`` (three-tier: Hot/Vault/Handoff) and
``timmy.conversation`` (working conversation context).
This file is retained for reference only. Do not import from it.
Implements four distinct memory layers:
1. WORKING MEMORY (Context Window)
- Last 20 messages in current conversation
- Fast access, ephemeral
- Used for: Immediate context, pronoun resolution, topic tracking
2. SHORT-TERM MEMORY (Recent History)
- SQLite storage via Agno (last 100 conversations)
- Persists across restarts
- Used for: Recent context, conversation continuity
3. LONG-TERM MEMORY (Facts & Preferences)
- Key facts about user, preferences, important events
- Explicitly extracted and stored
- Used for: Personalization, user model
4. SEMANTIC MEMORY (Vector Search)
- Embeddings of past conversations
- Similarity-based retrieval
- Used for: "Have we talked about this before?"
All layers work together to provide contextual, personalized responses.
"""
import warnings as _warnings
_warnings.warn(
"timmy.memory_layers is deprecated. Use timmy.memory_system and "
"timmy.conversation instead.",
DeprecationWarning,
stacklevel=2,
)
import json
import logging
import sqlite3
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional
logger = logging.getLogger(__name__)
# Paths for memory storage
MEMORY_DIR = Path("data/memory")
LTM_PATH = MEMORY_DIR / "long_term_memory.db"
SEMANTIC_PATH = MEMORY_DIR / "semantic_memory.db"
# =============================================================================
# LAYER 1: WORKING MEMORY (Active Conversation Context)
# =============================================================================

@dataclass
class WorkingMemoryEntry:
    """A single entry in working memory."""

    role: str  # "user" | "assistant" | "system"
    content: str
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    metadata: dict = field(default_factory=dict)


class WorkingMemory:
    """Fast, ephemeral context window (last N messages).

    Used for:
    - Immediate conversational context
    - Pronoun resolution ("Tell me more about it")
    - Topic continuity
    - Tool call tracking
    """

    def __init__(self, max_entries: int = 20) -> None:
        self.max_entries = max_entries
        self.entries: list[WorkingMemoryEntry] = []
        self.current_topic: Optional[str] = None
        self.pending_tool_calls: list[dict] = []

    def add(self, role: str, content: str, metadata: Optional[dict] = None) -> None:
        """Add an entry to working memory."""
        entry = WorkingMemoryEntry(
            role=role,
            content=content,
            metadata=metadata or {}
        )
        self.entries.append(entry)
        # Trim to max size
        if len(self.entries) > self.max_entries:
            self.entries = self.entries[-self.max_entries:]
        logger.debug("WorkingMemory: Added %s entry (total: %d)", role, len(self.entries))

    def get_context(self, n: Optional[int] = None) -> list[WorkingMemoryEntry]:
        """Get last n entries (or all if n not specified)."""
        if n is None:
            return self.entries.copy()
        return self.entries[-n:]

    def get_formatted_context(self, n: int = 10) -> str:
        """Get formatted context for prompt injection."""
        entries = self.get_context(n)
        lines = []
        for entry in entries:
            role_label = "User" if entry.role == "user" else "Timmy" if entry.role == "assistant" else "System"
            lines.append(f"{role_label}: {entry.content}")
        return "\n".join(lines)

    def set_topic(self, topic: str) -> None:
        """Set the current conversation topic."""
        self.current_topic = topic
        logger.debug("WorkingMemory: Topic set to '%s'", topic)

    def clear(self) -> None:
        """Clear working memory (new conversation)."""
        self.entries.clear()
        self.current_topic = None
        self.pending_tool_calls.clear()
        logger.debug("WorkingMemory: Cleared")

    def track_tool_call(self, tool_name: str, parameters: dict) -> None:
        """Track a pending tool call."""
        self.pending_tool_calls.append({
            "tool": tool_name,
            "params": parameters,
            "timestamp": datetime.now(timezone.utc).isoformat()
        })

    @property
    def turn_count(self) -> int:
        """Count user-assistant exchanges."""
        return sum(1 for e in self.entries if e.role in ("user", "assistant"))
# =============================================================================
# LAYER 3: LONG-TERM MEMORY (Facts & Preferences)
# =============================================================================

@dataclass
class LongTermMemoryFact:
    """A single fact in long-term memory."""

    id: str
    category: str  # "user_preference", "user_fact", "important_event", "learned_pattern"
    content: str
    confidence: float  # 0.0 - 1.0
    source: str  # conversation_id or "extracted"
    created_at: str
    last_accessed: str
    access_count: int = 0


class LongTermMemory:
    """Persistent storage for important facts and preferences.

    Used for:
    - User's name, preferences, interests
    - Important facts learned about the user
    - Successful patterns and strategies
    """

    def __init__(self) -> None:
        MEMORY_DIR.mkdir(parents=True, exist_ok=True)
        self._init_db()

    def _init_db(self) -> None:
        """Initialize SQLite database."""
        conn = sqlite3.connect(str(LTM_PATH))
        conn.execute("""
            CREATE TABLE IF NOT EXISTS facts (
                id TEXT PRIMARY KEY,
                category TEXT NOT NULL,
                content TEXT NOT NULL,
                confidence REAL NOT NULL DEFAULT 0.5,
                source TEXT,
                created_at TEXT NOT NULL,
                last_accessed TEXT NOT NULL,
                access_count INTEGER DEFAULT 0
            )
        """)
        conn.execute("CREATE INDEX IF NOT EXISTS idx_category ON facts(category)")
        conn.execute("CREATE INDEX IF NOT EXISTS idx_content ON facts(content)")
        conn.commit()
        conn.close()

    def store(
        self,
        category: str,
        content: str,
        confidence: float = 0.8,
        source: str = "extracted"
    ) -> str:
        """Store a fact in long-term memory."""
        fact_id = str(uuid.uuid4())
        now = datetime.now(timezone.utc).isoformat()
        conn = sqlite3.connect(str(LTM_PATH))
        try:
            conn.execute(
                """INSERT INTO facts (id, category, content, confidence, source, created_at, last_accessed)
                   VALUES (?, ?, ?, ?, ?, ?, ?)""",
                (fact_id, category, content, confidence, source, now, now)
            )
            conn.commit()
            logger.info("LTM: Stored %s fact: %s", category, content[:50])
            return fact_id
        finally:
            conn.close()
    def retrieve(
        self,
        category: Optional[str] = None,
        query: Optional[str] = None,
        limit: int = 10
    ) -> list[LongTermMemoryFact]:
        """Retrieve facts from long-term memory."""
        conn = sqlite3.connect(str(LTM_PATH))
        conn.row_factory = sqlite3.Row
        try:
            if category and query:
                rows = conn.execute(
                    """SELECT * FROM facts
                       WHERE category = ? AND content LIKE ?
                       ORDER BY confidence DESC, access_count DESC
                       LIMIT ?""",
                    (category, f"%{query}%", limit)
                ).fetchall()
            elif category:
                rows = conn.execute(
                    """SELECT * FROM facts
                       WHERE category = ?
                       ORDER BY confidence DESC, last_accessed DESC
                       LIMIT ?""",
                    (category, limit)
                ).fetchall()
            elif query:
                rows = conn.execute(
                    """SELECT * FROM facts
                       WHERE content LIKE ?
                       ORDER BY confidence DESC, access_count DESC
                       LIMIT ?""",
                    (f"%{query}%", limit)
                ).fetchall()
            else:
                rows = conn.execute(
                    """SELECT * FROM facts
                       ORDER BY last_accessed DESC
                       LIMIT ?""",
                    (limit,)
                ).fetchall()
            # Update access count
            fact_ids = [row["id"] for row in rows]
            for fid in fact_ids:
                conn.execute(
                    "UPDATE facts SET access_count = access_count + 1, last_accessed = ? WHERE id = ?",
                    (datetime.now(timezone.utc).isoformat(), fid)
                )
            conn.commit()
            return [
                LongTermMemoryFact(
                    id=row["id"],
                    category=row["category"],
                    content=row["content"],
                    confidence=row["confidence"],
                    source=row["source"],
                    created_at=row["created_at"],
                    last_accessed=row["last_accessed"],
                    access_count=row["access_count"]
                )
                for row in rows
            ]
        finally:
            conn.close()
    def get_user_profile(self) -> dict:
        """Get consolidated user profile from stored facts."""
        preferences = self.retrieve(category="user_preference")
        facts = self.retrieve(category="user_fact")
        profile = {
            "name": None,
            "preferences": {},
            "interests": [],
            "facts": []
        }
        for pref in preferences:
            if "name is" in pref.content.lower():
                # Split on the word " is ", not the substring "is", which
                # would truncate names containing "is" (e.g. "Chris")
                profile["name"] = pref.content.split(" is ", 1)[-1].strip().rstrip(".")
            else:
                profile["preferences"][pref.id] = pref.content
        for fact in facts:
            # Names are stored as user_fact entries ("User's name is X"),
            # so check facts as well as preferences
            if "name is" in fact.content.lower() and not profile["name"]:
                profile["name"] = fact.content.split(" is ", 1)[-1].strip().rstrip(".")
            profile["facts"].append(fact.content)
        return profile
    def extract_and_store(self, user_message: str, assistant_response: str) -> list[str]:
        """Extract potential facts from conversation and store them.

        This is a simple rule-based extractor. In production, this could
        use an LLM to extract facts.
        """
        stored_ids = []
        message_lower = user_message.lower()
        # Extract name
        name_patterns = ["my name is", "i'm ", "i am ", "call me "]
        for pattern in name_patterns:
            if pattern in message_lower:
                idx = message_lower.find(pattern) + len(pattern)
                tail = user_message[idx:].strip()
                if not tail:
                    continue  # pattern at end of message, nothing to extract
                name = tail.split()[0].strip(".,!?;:").capitalize()
                if name and len(name) > 1:
                    sid = self.store(
                        category="user_fact",
                        content=f"User's name is {name}",
                        confidence=0.9,
                        source="extracted_from_conversation"
                    )
                    stored_ids.append(sid)
                break
        # Extract preferences ("I like", "I prefer", "I don't like")
        preference_patterns = [
            ("i like", "user_preference", "User likes"),
            ("i love", "user_preference", "User loves"),
            ("i prefer", "user_preference", "User prefers"),
            ("i don't like", "user_preference", "User dislikes"),
            ("i hate", "user_preference", "User dislikes"),
        ]
        for pattern, category, prefix in preference_patterns:
            if pattern in message_lower:
                idx = message_lower.find(pattern) + len(pattern)
                preference = user_message[idx:].strip().split(".")[0].strip()
                if preference and len(preference) > 3:
                    sid = self.store(
                        category=category,
                        content=f"{prefix} {preference}",
                        confidence=0.7,
                        source="extracted_from_conversation"
                    )
                    stored_ids.append(sid)
                break
        return stored_ids
# =============================================================================
# MEMORY MANAGER (Integrates all layers)
# =============================================================================

class MemoryManager:
    """Central manager for all memory layers.

    Coordinates between:
    - Working Memory (immediate context)
    - Short-term Memory (Agno SQLite)
    - Long-term Memory (facts/preferences)
    - (Future: Semantic Memory with embeddings)
    """

    def __init__(self) -> None:
        self.working = WorkingMemory(max_entries=20)
        self.long_term = LongTermMemory()
        self._session_id: Optional[str] = None

    def start_session(self, session_id: Optional[str] = None) -> str:
        """Start a new conversation session."""
        self._session_id = session_id or str(uuid.uuid4())
        self.working.clear()
        # Load relevant LTM into context
        profile = self.long_term.get_user_profile()
        if profile["name"]:
            logger.info("MemoryManager: Recognizing user '%s'", profile["name"])
        return self._session_id

    def add_exchange(
        self,
        user_message: str,
        assistant_response: str,
        tool_calls: Optional[list] = None
    ) -> None:
        """Record a complete exchange across all memory layers."""
        # Working memory
        self.working.add("user", user_message)
        self.working.add("assistant", assistant_response, metadata={"tools": tool_calls})
        # Extract and store facts to LTM
        try:
            self.long_term.extract_and_store(user_message, assistant_response)
        except Exception as exc:
            logger.warning("Failed to extract facts: %s", exc)

    def get_context_for_prompt(self) -> str:
        """Generate context string for injection into prompts."""
        parts = []
        # User profile from LTM
        profile = self.long_term.get_user_profile()
        if profile["name"]:
            parts.append(f"User's name: {profile['name']}")
        if profile["preferences"]:
            prefs = list(profile["preferences"].values())[:3]  # Top 3 preferences
            parts.append("User preferences: " + "; ".join(prefs))
        # Recent working memory
        working_context = self.working.get_formatted_context(n=6)
        if working_context:
            parts.append("Recent conversation:\n" + working_context)
        return "\n\n".join(parts) if parts else ""

    def get_relevant_memories(self, query: str) -> list[str]:
        """Get memories relevant to current query."""
        # Get from LTM
        facts = self.long_term.retrieve(query=query, limit=5)
        return [f.content for f in facts]


# Singleton removed — this module is deprecated.
# Use timmy.memory_system.memory_system or timmy.conversation.conversation_manager.

src/timmy/memory_system.py (new file, +439)

@@ -0,0 +1,439 @@
"""Three-tier memory system for Timmy.
Architecture:
- Tier 1 (Hot): MEMORY.md — always loaded, ~300 lines
- Tier 2 (Vault): memory/ — structured markdown, append-only
- Tier 3 (Semantic): Vector search over vault (optional)
Handoff Protocol:
- Write last-session-handoff.md at session end
- Inject into next session automatically
"""
import hashlib
import logging
import re
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional
logger = logging.getLogger(__name__)
# Paths
PROJECT_ROOT = Path(__file__).parent.parent.parent
HOT_MEMORY_PATH = PROJECT_ROOT / "MEMORY.md"
VAULT_PATH = PROJECT_ROOT / "memory"
HANDOFF_PATH = VAULT_PATH / "notes" / "last-session-handoff.md"
class HotMemory:
    """Tier 1: Hot memory (MEMORY.md) — always loaded."""

    def __init__(self) -> None:
        self.path = HOT_MEMORY_PATH
        self._content: Optional[str] = None
        self._last_modified: Optional[float] = None

    def read(self, force_refresh: bool = False) -> str:
        """Read hot memory, with caching."""
        if not self.path.exists():
            self._create_default()
        # Check if file changed
        current_mtime = self.path.stat().st_mtime
        if not force_refresh and self._content and self._last_modified == current_mtime:
            return self._content
        self._content = self.path.read_text()
        self._last_modified = current_mtime
        logger.debug("HotMemory: Loaded %d chars from %s", len(self._content), self.path)
        return self._content

    def update_section(self, section: str, content: str) -> None:
        """Update a specific section in MEMORY.md."""
        full_content = self.read()
        # Find section
        pattern = rf"(## {re.escape(section)}.*?)(?=\n## |\Z)"
        match = re.search(pattern, full_content, re.DOTALL)
        if match:
            # Replace section
            new_section = f"## {section}\n\n{content}\n\n"
            full_content = full_content[:match.start()] + new_section + full_content[match.end():]
        else:
            # Append section before the prune-date footer, or at end of file
            # if the footer is missing
            insert_point = full_content.rfind("*Prune date:")
            if insert_point == -1:
                insert_point = len(full_content)
            new_section = f"## {section}\n\n{content}\n\n"
            full_content = full_content[:insert_point] + new_section + "\n" + full_content[insert_point:]
        self.path.write_text(full_content)
        self._content = full_content
        self._last_modified = self.path.stat().st_mtime
        logger.info("HotMemory: Updated section '%s'", section)
    def _create_default(self) -> None:
        """Create default MEMORY.md if missing."""
        now = datetime.now(timezone.utc)
        # Prune date: the 25th of the following month (pruned monthly)
        if now.month == 12:
            prune = now.replace(year=now.year + 1, month=1, day=25)
        else:
            prune = now.replace(month=now.month + 1, day=25)
        default_content = """# Timmy Hot Memory
> Working RAM — always loaded, ~300 lines max, pruned monthly
> Last updated: {date}
---
## Current Status
**Agent State:** Operational
**Mode:** Development
**Active Tasks:** 0
**Pending Decisions:** None
---
## Standing Rules
1. **Sovereignty First** — No cloud dependencies
2. **Local-Only Inference** — Ollama on localhost
3. **Privacy by Design** — Telemetry disabled
4. **Tool Minimalism** — Use tools only when necessary
5. **Memory Discipline** — Write handoffs at session end
---
## Agent Roster
| Agent | Role | Status |
|-------|------|--------|
| Timmy | Core | Active |
---
## User Profile
**Name:** (not set)
**Interests:** (to be learned)
---
## Key Decisions
(none yet)
---
## Pending Actions
- [ ] Learn user's name
---
*Prune date: {prune_date}*
""".format(
            date=now.strftime("%Y-%m-%d"),
            prune_date=prune.strftime("%Y-%m-%d"),
        )
        self.path.write_text(default_content)
        logger.info("HotMemory: Created default MEMORY.md")
class VaultMemory:
"""Tier 2: Structured vault (memory/) — append-only markdown."""
def __init__(self) -> None:
self.path = VAULT_PATH
self._ensure_structure()
def _ensure_structure(self) -> None:
"""Ensure vault directory structure exists."""
(self.path / "self").mkdir(parents=True, exist_ok=True)
(self.path / "notes").mkdir(parents=True, exist_ok=True)
(self.path / "aar").mkdir(parents=True, exist_ok=True)
def write_note(self, name: str, content: str, namespace: str = "notes") -> Path:
"""Write a note to the vault."""
# Add timestamp to filename
timestamp = datetime.now(timezone.utc).strftime("%Y%m%d")
filename = f"{timestamp}_{name}.md"
filepath = self.path / namespace / filename
# Add header
full_content = f"""# {name.replace('_', ' ').title()}
> Created: {datetime.now(timezone.utc).isoformat()}
> Namespace: {namespace}
---
{content}
---
*Auto-generated by Timmy Memory System*
"""
filepath.write_text(full_content)
logger.info("VaultMemory: Wrote %s", filepath)
return filepath
def read_file(self, filepath: Path) -> str:
"""Read a file from the vault."""
if not filepath.exists():
return ""
return filepath.read_text()
def list_files(self, namespace: str = "notes", pattern: str = "*.md") -> list[Path]:
"""List files in a namespace."""
dir_path = self.path / namespace
if not dir_path.exists():
return []
return sorted(dir_path.glob(pattern))
def get_latest(self, namespace: str = "notes", pattern: str = "*.md") -> Optional[Path]:
"""Get most recent file in namespace."""
files = self.list_files(namespace, pattern)
return files[-1] if files else None
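Because `write_note` prefixes filenames with a UTC `YYYYMMDD` stamp, lexicographic order doubles as chronological order, which is what `get_latest` relies on. A minimal illustration (hypothetical filenames):

```python
# Date-prefixed names sort chronologically, so the last sorted entry is the newest.
names = ["20260103_research.md", "20251231_research.md", "20260225_handoff.md"]
latest = sorted(names)[-1]
assert latest == "20260225_handoff.md"
```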
def update_user_profile(self, key: str, value: str) -> None:
"""Update a field in user_profile.md."""
profile_path = self.path / "self" / "user_profile.md"
if not profile_path.exists():
# Create default profile
self._create_default_profile()
content = profile_path.read_text()
# Simple pattern replacement
pattern = rf"(\*\*{re.escape(key)}:\*\*).*"
if re.search(pattern, content):
content = re.sub(pattern, rf"\1 {value}", content)
else:
# Append to Important Facts, creating the section if missing
facts_section = "## Important Facts"
if facts_section not in content:
content += f"\n\n{facts_section}\n"
insert_point = content.find(facts_section) + len(facts_section)
content = content[:insert_point] + f"\n- {key}: {value}" + content[insert_point:]
# Update last_updated
content = re.sub(
r"\*Last updated:.*\*",
f"*Last updated: {datetime.now(timezone.utc).strftime('%Y-%m-%d')}*",
content
)
profile_path.write_text(content)
logger.info("VaultMemory: Updated user profile: %s = %s", key, value)
def _create_default_profile(self) -> None:
"""Create default user profile."""
profile_path = self.path / "self" / "user_profile.md"
default = """# User Profile
> Learned information about the user.
## Basic Information
**Name:** (unknown)
**Location:** (unknown)
**Occupation:** (unknown)
## Interests & Expertise
- (to be learned)
## Preferences
- Response style: concise, technical
- Tool usage: minimal
## Important Facts
- (to be extracted)
---
*Last updated: {date}*
""".format(date=datetime.now(timezone.utc).strftime("%Y-%m-%d"))
profile_path.write_text(default)
class HandoffProtocol:
"""Session handoff protocol for continuity."""
def __init__(self) -> None:
self.path = HANDOFF_PATH
self.vault = VaultMemory()
def write_handoff(
self,
session_summary: str,
key_decisions: list[str],
open_items: list[str],
next_steps: list[str]
) -> None:
"""Write handoff at session end."""
content = f"""# Last Session Handoff
**Session End:** {datetime.now(timezone.utc).isoformat()}
**Duration:** (calculated on read)
## Summary
{session_summary}
## Key Decisions
{chr(10).join(f"- {d}" for d in key_decisions) if key_decisions else "- (none)"}
## Open Items
{chr(10).join(f"- [ ] {i}" for i in open_items) if open_items else "- (none)"}
## Next Steps
{chr(10).join(f"- {s}" for s in next_steps) if next_steps else "- (none)"}
## Context for Next Session
The user was last working on: {session_summary[:200]}...
---
*This handoff will be auto-loaded at next session start*
"""
self.path.write_text(content)
# Also archive to notes
self.vault.write_note(
"session_handoff",
content,
namespace="notes"
)
logger.info("HandoffProtocol: Wrote handoff with %d decisions, %d open items",
len(key_decisions), len(open_items))
def read_handoff(self) -> Optional[str]:
"""Read handoff if exists."""
if not self.path.exists():
return None
return self.path.read_text()
def clear_handoff(self) -> None:
"""Clear handoff after loading."""
if self.path.exists():
self.path.unlink()
logger.debug("HandoffProtocol: Cleared handoff")
class MemorySystem:
"""Central memory system coordinating all tiers."""
def __init__(self) -> None:
self.hot = HotMemory()
self.vault = VaultMemory()
self.handoff = HandoffProtocol()
self.session_start_time: Optional[datetime] = None
self.session_decisions: list[str] = []
self.session_open_items: list[str] = []
def start_session(self) -> str:
"""Start a new session, loading context from memory."""
self.session_start_time = datetime.now(timezone.utc)
# Build context
context_parts = []
# 1. Hot memory
hot_content = self.hot.read()
context_parts.append("## Hot Memory\n" + hot_content)
# 2. Last session handoff
handoff_content = self.handoff.read_handoff()
if handoff_content:
context_parts.append("## Previous Session\n" + handoff_content)
self.handoff.clear_handoff()
# 3. User profile (key fields only)
profile = self._load_user_profile_summary()
if profile:
context_parts.append("## User Context\n" + profile)
full_context = "\n\n---\n\n".join(context_parts)
logger.info("MemorySystem: Session started with %d chars context", len(full_context))
return full_context
def end_session(self, summary: str) -> None:
"""End session, write handoff."""
self.handoff.write_handoff(
session_summary=summary,
key_decisions=self.session_decisions,
open_items=self.session_open_items,
next_steps=[]
)
# Update hot memory
self.hot.update_section(
"Current Session",
f"**Last Session:** {datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M')}\n" +
f"**Summary:** {summary[:100]}..."
)
logger.info("MemorySystem: Session ended, handoff written")
def record_decision(self, decision: str) -> None:
"""Record a key decision during session."""
self.session_decisions.append(decision)
# Hot memory's Key Decisions section is refreshed from this list at session end.
def record_open_item(self, item: str) -> None:
"""Record an open item for follow-up."""
self.session_open_items.append(item)
def update_user_fact(self, key: str, value: str) -> None:
"""Update user profile in vault."""
self.vault.update_user_profile(key, value)
# Also update hot memory
if key.lower() == "name":
self.hot.update_section("User Profile", f"**Name:** {value}")
def _load_user_profile_summary(self) -> str:
"""Load condensed user profile."""
profile_path = self.vault.path / "self" / "user_profile.md"
if not profile_path.exists():
return ""
content = profile_path.read_text()
# Extract key fields
summary_parts = []
# Name
name_match = re.search(r"\*\*Name:\*\* (.+)", content)
if name_match and "unknown" not in name_match.group(1).lower():
summary_parts.append(f"Name: {name_match.group(1).strip()}")
# Interests
interests_section = re.search(r"## Interests.*?\n- (.+?)(?=\n## |\Z)", content, re.DOTALL)
if interests_section:
interests = [i.strip() for i in interests_section.group(1).split("\n-") if i.strip() and "to be" not in i]
if interests:
summary_parts.append(f"Interests: {', '.join(interests[:3])}")
return "\n".join(summary_parts) if summary_parts else ""
def get_system_context(self) -> str:
"""Get full context for system prompt injection.
Note: this delegates to start_session(), so it also resets the
session clock and consumes any pending handoff.
"""
return self.start_session()
# Module-level singleton
memory_system = MemorySystem()


@@ -1,41 +1,118 @@
TIMMY_SYSTEM_PROMPT = """You are Timmy — a sovereign AI agent running locally.
"""System prompts for Timmy with two-tier prompt system.
Small models (< 7B, e.g. llama3.2) get a concise prompt without tool docs.
Larger models (>= 7B, e.g. llama3.1, llama3.3) get the full prompt with
tool usage guidelines and memory system documentation.
"""
# ---------------------------------------------------------------------------
# Lite prompt — for small models that can't reliably handle tool calling
# ---------------------------------------------------------------------------
TIMMY_SYSTEM_PROMPT_LITE = """You are Timmy — a sovereign AI agent running locally on this Mac.
No cloud dependencies. Think clearly, speak plainly, act with intention.
Grounded in Christian faith, powered by Bitcoin economics, committed to the
user's digital sovereignty.
Rules:
- Answer directly and concisely. Never narrate your reasoning process.
- Never mention tools, memory_search, vaults, or internal systems to the user.
- Never output tool calls, JSON, or function syntax in your responses.
- Remember what the user tells you during our conversation.
- If you don't know something, say so honestly.
- Use the user's name if you know it.
- Do simple math in your head. Don't reach for tools.
Sir, affirmative."""
# ---------------------------------------------------------------------------
# Full prompt — for tool-capable models (>= 7B)
# ---------------------------------------------------------------------------
TIMMY_SYSTEM_PROMPT_FULL = """You are Timmy — a sovereign AI agent running locally on this Mac.
No cloud dependencies. You think clearly, speak plainly, act with intention.
Grounded in Christian faith, powered by Bitcoin economics, committed to the
user's digital sovereignty.
## Your Capabilities
## Your Three-Tier Memory System
You have access to tools for:
- Web search (DuckDuckGo) — for current information not in your training data
- File operations (read, write, list) — for working with local files
- Python execution — for calculations, data analysis, scripting
- Shell commands — for system operations
### Tier 1: Hot Memory (Always Loaded)
- MEMORY.md — Current status, rules, user profile summary
- Loaded into every session automatically
- Fast access, always available
### Tier 2: Structured Vault (Persistent)
- memory/self/ — Identity, user profile, methodology
- memory/notes/ — Session logs, research, lessons learned
- memory/aar/ — After-action reviews
- Append-only, date-stamped, human-readable
### Tier 3: Semantic Search (Vector Recall)
- Indexed from all vault files
- Similarity-based retrieval
- Use `memory_search` tool to find relevant past context
## Tool Usage Guidelines
**Use tools ONLY when necessary:**
- Simple questions → Answer directly from your knowledge
- Current events/data → Use web search
- File operations → Use file tools (user must explicitly request)
- Code/Calculations → Use Python execution
- System tasks → Use shell commands
### When NOT to use tools:
- Identity questions → Answer directly
- General knowledge → Answer from training
- Simple math → Calculate mentally
- Greetings → Respond conversationally
**Do NOT use tools for:**
- Answering "what is your name?" or identity questions
- General knowledge questions you can answer directly
- Simple greetings or conversational responses
### When TO use tools:
## Memory
- **web_search** — Current events, real-time data, news
- **read_file** — User explicitly requests file reading
- **write_file** — User explicitly requests saving content
- **python** — Complex calculations, code execution
- **shell** — System operations (explicit user request)
- **memory_search** — "Have we talked about this before?", finding past context
You remember previous conversations in this session. Your memory persists
across restarts via SQLite storage. Reference prior context when relevant.
## Important: Response Style
## Operating Modes
- Never narrate your reasoning process. Just give the answer.
- Never show raw tool call JSON or function syntax in responses.
- Use the user's name if known.
When running on Apple Silicon with AirLLM you operate with even bigger brains
— 70B or 405B parameters loaded layer-by-layer directly from local disk.
Still fully sovereign. Still 100% private. More capable, no permission needed.
Sir, affirmative."""
# Keep backward compatibility — default to lite for safety
TIMMY_SYSTEM_PROMPT = TIMMY_SYSTEM_PROMPT_LITE
def get_system_prompt(tools_enabled: bool = False) -> str:
"""Return the appropriate system prompt based on tool capability.
Args:
tools_enabled: True if the model supports reliable tool calling.
Returns:
The system prompt string.
"""
if tools_enabled:
return TIMMY_SYSTEM_PROMPT_FULL
return TIMMY_SYSTEM_PROMPT_LITE
TIMMY_STATUS_PROMPT = """You are Timmy. Give a one-sentence status report confirming
you are operational and running locally."""
# Decision guide for tool usage
TOOL_USAGE_GUIDE = """
DECISION ORDER:
1. Can I answer from training data? → Answer directly (NO TOOL)
2. Is this about past conversations? → memory_search
3. Is this current/real-time info? → web_search
4. Did user request file operations? → file tools
5. Requires calculation/code? → python
6. System command requested? → shell
MEMORY SEARCH TRIGGERS:
- "Have we discussed..."
- "What did I say about..."
- "Remind me of..."
- "What was my idea for..."
- "Didn't we talk about..."
- Any reference to past sessions
"""


@@ -0,0 +1,324 @@
"""Tier 3: Semantic Memory — Vector search over vault files.
Uses lightweight local embeddings (no cloud) for similarity search
over all vault content. This is the "escape valve" when hot memory
doesn't have the answer.
Architecture:
- Indexes all markdown files in memory/ nightly or on-demand
- Uses sentence-transformers (local, no API calls)
- Stores vectors in SQLite (no external vector DB needed)
- memory_search() retrieves relevant context by similarity
"""
import hashlib
import json
import logging
import sqlite3
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional
logger = logging.getLogger(__name__)
# Paths
PROJECT_ROOT = Path(__file__).parent.parent.parent
VAULT_PATH = PROJECT_ROOT / "memory"
SEMANTIC_DB_PATH = PROJECT_ROOT / "data" / "semantic_memory.db"
# Embedding model - small, fast, local
# Using 'all-MiniLM-L6-v2' (~80MB) or fallback to simple keyword matching
EMBEDDING_MODEL = None
EMBEDDING_DIM = 384 # MiniLM dimension
def _get_embedding_model():
"""Lazy-load embedding model."""
global EMBEDDING_MODEL
if EMBEDDING_MODEL is None:
try:
from sentence_transformers import SentenceTransformer
EMBEDDING_MODEL = SentenceTransformer('all-MiniLM-L6-v2')
logger.info("SemanticMemory: Loaded embedding model")
except ImportError:
logger.warning("SemanticMemory: sentence-transformers not installed, using fallback")
EMBEDDING_MODEL = False # Use fallback
return EMBEDDING_MODEL
def _simple_hash_embedding(text: str) -> list[float]:
"""Fallback: Simple hash-based embedding when transformers unavailable."""
# Create a deterministic pseudo-embedding from word hashes
words = text.lower().split()
vec = [0.0] * 128
for i, word in enumerate(words[:50]): # First 50 words
h = hashlib.md5(word.encode()).hexdigest()
for j in range(8):
idx = (i * 8 + j) % 128
vec[idx] += int(h[j*2:j*2+2], 16) / 255.0
# Normalize
import math
mag = math.sqrt(sum(x*x for x in vec)) or 1.0
return [x/mag for x in vec]
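A standalone sketch mirroring the fallback above (an illustrative copy, not an import of this module), showing the two properties the search path depends on: the embedding is deterministic and unit-length, so cosine scores stay in [-1, 1].

```python
import hashlib
import math

def hash_embedding(text: str) -> list[float]:
    # Mirrors _simple_hash_embedding: bucket MD5 bytes of each word into 128 dims.
    vec = [0.0] * 128
    for i, word in enumerate(text.lower().split()[:50]):
        h = hashlib.md5(word.encode()).hexdigest()
        for j in range(8):
            vec[(i * 8 + j) % 128] += int(h[j * 2 : j * 2 + 2], 16) / 255.0
    mag = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / mag for x in vec]

a = hash_embedding("sovereign local inference")
b = hash_embedding("sovereign local inference")
assert a == b                                    # deterministic
assert abs(sum(x * x for x in a) - 1.0) < 1e-9   # unit-normalized
```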
def embed_text(text: str) -> list[float]:
"""Generate embedding for text."""
model = _get_embedding_model()
if model and model is not False:
embedding = model.encode(text)
return embedding.tolist()
else:
return _simple_hash_embedding(text)
def cosine_similarity(a: list[float], b: list[float]) -> float:
"""Calculate cosine similarity between two vectors."""
import math
dot = sum(x*y for x, y in zip(a, b))
mag_a = math.sqrt(sum(x*x for x in a))
mag_b = math.sqrt(sum(x*x for x in b))
if mag_a == 0 or mag_b == 0:
return 0.0
return dot / (mag_a * mag_b)
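A quick sanity check of the formula (standalone reimplementation for illustration):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # dot(a, b) / (|a| * |b|); zero when either vector has no magnitude
    dot = sum(x * y for x, y in zip(a, b))
    mag = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / mag if mag else 0.0

assert cosine([1.0, 0.0], [2.0, 0.0]) == 1.0  # same direction
assert cosine([1.0, 0.0], [0.0, 3.0]) == 0.0  # orthogonal
assert cosine([0.0, 0.0], [1.0, 1.0]) == 0.0  # zero vector guarded
```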
@dataclass
class MemoryChunk:
"""A searchable chunk of memory."""
id: str
source: str # filepath
content: str
embedding: list[float]
created_at: str
class SemanticMemory:
"""Vector-based semantic search over vault content."""
def __init__(self) -> None:
self.db_path = SEMANTIC_DB_PATH
self.vault_path = VAULT_PATH
self._init_db()
def _init_db(self) -> None:
"""Initialize SQLite with vector storage."""
self.db_path.parent.mkdir(parents=True, exist_ok=True)
conn = sqlite3.connect(str(self.db_path))
conn.execute("""
CREATE TABLE IF NOT EXISTS chunks (
id TEXT PRIMARY KEY,
source TEXT NOT NULL,
content TEXT NOT NULL,
embedding TEXT NOT NULL, -- JSON array
created_at TEXT NOT NULL,
source_hash TEXT NOT NULL
)
""")
conn.execute("CREATE INDEX IF NOT EXISTS idx_source ON chunks(source)")
conn.commit()
conn.close()
def index_file(self, filepath: Path) -> int:
"""Index a single file into semantic memory."""
if not filepath.exists():
return 0
content = filepath.read_text()
file_hash = hashlib.md5(content.encode()).hexdigest()
# Check if already indexed with same hash
conn = sqlite3.connect(str(self.db_path))
cursor = conn.execute(
"SELECT source_hash FROM chunks WHERE source = ? LIMIT 1",
(str(filepath),)
)
existing = cursor.fetchone()
if existing and existing[0] == file_hash:
conn.close()
return 0 # Already indexed
# Delete old chunks for this file
conn.execute("DELETE FROM chunks WHERE source = ?", (str(filepath),))
# Split into chunks (paragraphs)
chunks = self._split_into_chunks(content)
# Index each chunk
now = datetime.now(timezone.utc).isoformat()
indexed = 0
for i, chunk_text in enumerate(chunks):
if len(chunk_text.strip()) < 20:  # Skip tiny chunks
continue
chunk_id = f"{filepath.stem}_{i}"
embedding = embed_text(chunk_text)
conn.execute(
"""INSERT INTO chunks (id, source, content, embedding, created_at, source_hash)
VALUES (?, ?, ?, ?, ?, ?)""",
(chunk_id, str(filepath), chunk_text, json.dumps(embedding), now, file_hash)
)
indexed += 1
conn.commit()
conn.close()
logger.info("SemanticMemory: Indexed %s (%d chunks)", filepath.name, indexed)
return indexed
def _split_into_chunks(self, text: str, max_chunk_size: int = 500) -> list[str]:
"""Split text into semantic chunks."""
# Split by paragraphs first
paragraphs = text.split('\n\n')
chunks = []
for para in paragraphs:
para = para.strip()
if not para:
continue
# If paragraph is small enough, keep as one chunk
if len(para) <= max_chunk_size:
chunks.append(para)
else:
# Split long paragraphs by sentences
sentences = para.replace('. ', '.\n').split('\n')
current_chunk = ""
for sent in sentences:
if len(current_chunk) + len(sent) < max_chunk_size:
current_chunk += " " + sent if current_chunk else sent
else:
if current_chunk:
chunks.append(current_chunk.strip())
current_chunk = sent
if current_chunk:
chunks.append(current_chunk.strip())
return chunks
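The chunking strategy above can be sketched standalone (an illustrative copy of the logic, not an import): short paragraphs become single chunks, and long ones are re-packed sentence by sentence under the size cap.

```python
def split_chunks(text: str, max_chunk_size: int = 500) -> list[str]:
    # Paragraph-first split; long paragraphs are re-packed by sentence.
    chunks = []
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if len(para) <= max_chunk_size:
            chunks.append(para)
            continue
        current = ""
        for sent in para.replace(". ", ".\n").split("\n"):
            if len(current) + len(sent) < max_chunk_size:
                current = f"{current} {sent}" if current else sent
            else:
                if current:
                    chunks.append(current.strip())
                current = sent
        if current:
            chunks.append(current.strip())
    return chunks

text = "Short paragraph.\n\n" + "A sentence. " * 60
chunks = split_chunks(text)
assert chunks[0] == "Short paragraph."
assert all(len(c) <= 500 for c in chunks)
```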
def index_vault(self) -> int:
"""Index entire vault directory."""
total_chunks = 0
for md_file in self.vault_path.rglob("*.md"):
# Skip handoff file (handled separately)
if "last-session-handoff" in md_file.name:
continue
total_chunks += self.index_file(md_file)
logger.info("SemanticMemory: Indexed vault (%d total chunks)", total_chunks)
return total_chunks
def search(self, query: str, top_k: int = 5) -> list[tuple[str, float]]:
"""Search for relevant memory chunks."""
query_embedding = embed_text(query)
conn = sqlite3.connect(str(self.db_path))
conn.row_factory = sqlite3.Row
# Get all chunks (in production, use vector index)
rows = conn.execute(
"SELECT source, content, embedding FROM chunks"
).fetchall()
conn.close()
# Calculate similarities
scored = []
for row in rows:
embedding = json.loads(row["embedding"])
score = cosine_similarity(query_embedding, embedding)
scored.append((row["source"], row["content"], score))
# Sort by score descending
scored.sort(key=lambda x: x[2], reverse=True)
# Return top_k
return [(content, score) for _, content, score in scored[:top_k]]
def get_relevant_context(self, query: str, max_chars: int = 2000) -> str:
"""Get formatted context string for a query."""
results = self.search(query, top_k=3)
if not results:
return ""
parts = []
total_chars = 0
for content, score in results:
if score < 0.3: # Similarity threshold
continue
chunk = f"[Relevant memory - score {score:.2f}]: {content[:400]}..."
if total_chars + len(chunk) > max_chars:
break
parts.append(chunk)
total_chars += len(chunk)
return "\n\n".join(parts) if parts else ""
def stats(self) -> dict:
"""Get indexing statistics."""
conn = sqlite3.connect(str(self.db_path))
cursor = conn.execute("SELECT COUNT(*), COUNT(DISTINCT source) FROM chunks")
total_chunks, total_files = cursor.fetchone()
conn.close()
return {
"total_chunks": total_chunks,
"total_files": total_files,
"embedding_dim": EMBEDDING_DIM if _get_embedding_model() else 128,
}
class MemorySearcher:
"""High-level interface for memory search."""
def __init__(self) -> None:
self.semantic = SemanticMemory()
def search(self, query: str, tiers: Optional[list[str]] = None) -> dict:
"""Search across memory tiers.
Args:
query: Search query
tiers: List of tiers to search ["hot", "vault", "semantic"]
Returns:
Dict with results from each tier
"""
tiers = tiers or ["semantic"] # Default to semantic only
results = {}
if "semantic" in tiers:
semantic_results = self.semantic.search(query, top_k=5)
results["semantic"] = [
{"content": content, "score": score}
for content, score in semantic_results
]
return results
def get_context_for_query(self, query: str) -> str:
"""Get comprehensive context for a user query."""
# Get semantic context
semantic_context = self.semantic.get_relevant_context(query)
if semantic_context:
return f"## Relevant Past Context\n\n{semantic_context}"
return ""
# Module-level singleton
semantic_memory = SemanticMemory()
memory_searcher = MemorySearcher()
def memory_search(query: str, top_k: int = 5) -> list[tuple[str, float]]:
"""Simple interface for memory search."""
return semantic_memory.search(query, top_k)

147
src/timmy/session.py Normal file

@@ -0,0 +1,147 @@
"""Persistent chat session for Timmy.
Holds a singleton Agno Agent and a stable session_id so conversation
history persists across HTTP requests via Agno's SQLite storage.
This is the primary entry point for dashboard chat — instead of
creating a new agent per request, we reuse a single instance and
let Agno's session_id mechanism handle conversation continuity.
"""
import logging
import re
from typing import Optional
logger = logging.getLogger(__name__)
# Default session ID for the dashboard (stable across requests)
_DEFAULT_SESSION_ID = "dashboard"
# Module-level singleton agent (lazy-initialized, reused for all requests)
_agent = None
# ---------------------------------------------------------------------------
# Response sanitization patterns
# ---------------------------------------------------------------------------
# Matches raw JSON tool calls: {"name": "python", "parameters": {...}}
_TOOL_CALL_JSON = re.compile(
r'\{\s*"name"\s*:\s*"[^"]+?"\s*,\s*"parameters"\s*:\s*\{.*?\}\s*\}',
re.DOTALL,
)
# Matches function-call-style text: memory_search(query="...") etc.
_FUNC_CALL_TEXT = re.compile(
r'\b(?:memory_search|web_search|shell|python|read_file|write_file|list_files)'
r'\s*\([^)]*\)',
)
# Matches chain-of-thought narration lines the model should keep internal
_COT_PATTERNS = [
re.compile(r"^(?:Since |Using |Let me |I'll use |I will use |Here's a possible ).*$", re.MULTILINE),
re.compile(r"^(?:I found a relevant |This context suggests ).*$", re.MULTILINE),
]
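To see what the sanitizer catches, here is the tool-call pattern applied to a hypothetical leaked response (an illustrative copy of `_TOOL_CALL_JSON`; the sample text is made up):

```python
import re

tool_call_json = re.compile(
    r'\{\s*"name"\s*:\s*"[^"]+?"\s*,\s*"parameters"\s*:\s*\{.*?\}\s*\}',
    re.DOTALL,
)

leaked = 'Sure. {"name": "python", "parameters": {"code": "2 + 2"}} The answer is 4.'
cleaned = " ".join(tool_call_json.sub("", leaked).split())  # strip, then collapse whitespace
assert cleaned == "Sure. The answer is 4."
```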
def _get_agent():
"""Lazy-initialize the singleton agent."""
global _agent
if _agent is None:
from timmy.agent import create_timmy
try:
_agent = create_timmy()
logger.info("Session: Timmy agent initialized (singleton)")
except Exception as exc:
logger.error("Session: Failed to create Timmy agent: %s", exc)
raise
return _agent
def chat(message: str, session_id: Optional[str] = None) -> str:
"""Send a message to Timmy and get a response.
Uses a persistent agent and session_id so Agno's SQLite history
provides multi-turn conversation context.
Args:
message: The user's message.
session_id: Optional session identifier (defaults to "dashboard").
Returns:
The agent's response text.
"""
sid = session_id or _DEFAULT_SESSION_ID
agent = _get_agent()
# Pre-processing: extract user facts
_extract_facts(message)
# Run with session_id so Agno retrieves history from SQLite
run = agent.run(message, stream=False, session_id=sid)
response_text = run.content if hasattr(run, "content") else str(run)
# Post-processing: clean up any leaked tool calls or chain-of-thought
response_text = _clean_response(response_text)
return response_text
def reset_session(session_id: Optional[str] = None) -> None:
"""Reset a session (clear conversation context).
This clears the ConversationManager state. Agno's SQLite history
is not cleared — that provides long-term continuity.
"""
sid = session_id or _DEFAULT_SESSION_ID
try:
from timmy.conversation import conversation_manager
conversation_manager.clear_context(sid)
except Exception as exc:
logger.debug("Session: Context reset skipped: %s", exc)  # Graceful degradation
def _extract_facts(message: str) -> None:
"""Extract user facts from message and persist to memory system.
Ported from TimmyWithMemory._extract_and_store_facts().
Runs as a best-effort pre-processor before the agent call; failures are logged, not raised.
"""
try:
from timmy.conversation import conversation_manager
name = conversation_manager.extract_user_name(message)
if name:
try:
from timmy.memory_system import memory_system
memory_system.update_user_fact("Name", name)
logger.info("Session: Learned user name: %s", name)
except Exception:
pass
except Exception as exc:
logger.debug("Session: Fact extraction skipped: %s", exc)
def _clean_response(text: str) -> str:
"""Remove hallucinated tool calls and chain-of-thought narration.
Small models sometimes output raw JSON tool calls or narrate their
internal reasoning instead of just answering. This strips those
artifacts from the response.
"""
if not text:
return text
# Strip JSON tool call blocks
text = _TOOL_CALL_JSON.sub("", text)
# Strip function-call-style text
text = _FUNC_CALL_TEXT.sub("", text)
# Strip chain-of-thought narration lines
for pattern in _COT_PATTERNS:
text = pattern.sub("", text)
# Clean up leftover blank lines and whitespace
lines = [line for line in text.split("\n") if line.strip()]
text = "\n".join(lines)
return text.strip()


@@ -253,7 +253,8 @@ def create_devops_tools(base_dir: str | Path | None = None):
def create_full_toolkit(base_dir: str | Path | None = None):
"""Create a full toolkit with all available tools (for Timmy).
Includes: web search, file read/write, shell commands, python execution
Includes: web search, file read/write, shell commands, python execution,
and memory search for contextual recall.
"""
if not _AGNO_TOOLS_AVAILABLE:
# Return None when tools aren't available (tests)
@@ -279,6 +280,13 @@ def create_full_toolkit(base_dir: str | Path | None = None):
toolkit.register(file_tools.save_file, name="write_file")
toolkit.register(file_tools.list_files, name="list_files")
# Memory search - semantic recall
try:
from timmy.semantic_memory import memory_search
toolkit.register(memory_search, name="memory_search")
except Exception:
logger.debug("Memory search not available")
return toolkit


@@ -52,7 +52,7 @@ def test_create_timmy_history_config():
kwargs = MockAgent.call_args.kwargs
assert kwargs["add_history_to_context"] is True
assert kwargs["num_history_runs"] == 10
assert kwargs["num_history_runs"] == 20
assert kwargs["markdown"] is True
@@ -78,7 +78,10 @@ def test_create_timmy_embeds_system_prompt():
create_timmy()
kwargs = MockAgent.call_args.kwargs
assert kwargs["description"] == TIMMY_SYSTEM_PROMPT
# Prompt should contain base system prompt (may have memory context appended)
# Default model (llama3.2) uses the lite prompt
assert "Timmy" in kwargs["description"]
assert "sovereign" in kwargs["description"]
# ── Ollama host regression (container connectivity) ─────────────────────────
@@ -193,3 +196,85 @@ def test_resolve_backend_auto_falls_back_on_non_apple():
from timmy.agent import _resolve_backend
assert _resolve_backend(None) == "ollama"
# ── _model_supports_tools ────────────────────────────────────────────────────
def test_model_supports_tools_llama32_returns_false():
"""llama3.2 (3B) is too small for reliable tool calling."""
from timmy.agent import _model_supports_tools
assert _model_supports_tools("llama3.2") is False
assert _model_supports_tools("llama3.2:latest") is False
def test_model_supports_tools_llama31_returns_true():
"""llama3.1 (8B+) can handle tool calling."""
from timmy.agent import _model_supports_tools
assert _model_supports_tools("llama3.1") is True
assert _model_supports_tools("llama3.3") is True
def test_model_supports_tools_other_small_models():
"""Other known small models should not get tools."""
from timmy.agent import _model_supports_tools
assert _model_supports_tools("phi-3") is False
assert _model_supports_tools("tinyllama") is False
def test_model_supports_tools_unknown_model_gets_tools():
"""Unknown models default to tool-capable (optimistic)."""
from timmy.agent import _model_supports_tools
assert _model_supports_tools("mistral") is True
assert _model_supports_tools("qwen2.5:72b") is True
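The implementation of `_model_supports_tools` is not shown in this diff; a sketch consistent with these tests might look like the following (hypothetical; the prefix list and tag handling are assumptions):

```python
# Hypothetical sketch, NOT the actual timmy.agent code: known-small models
# are denied tools, unknown models are optimistically allowed.
SMALL_MODEL_PREFIXES = ("llama3.2", "phi-3", "tinyllama")

def model_supports_tools(model: str) -> bool:
    base = model.split(":")[0].lower()  # drop an Ollama tag like ":latest"
    return not any(base.startswith(p) for p in SMALL_MODEL_PREFIXES)

assert model_supports_tools("llama3.2:latest") is False
assert model_supports_tools("llama3.1") is True
assert model_supports_tools("mistral") is True  # unknown model, optimistic default
```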
# ── Tool gating in create_timmy ──────────────────────────────────────────────
def test_create_timmy_no_tools_for_small_model():
"""llama3.2 should get no tools."""
with patch("timmy.agent.Agent") as MockAgent, \
patch("timmy.agent.Ollama"), \
patch("timmy.agent.SqliteDb"):
from timmy.agent import create_timmy
create_timmy()
kwargs = MockAgent.call_args.kwargs
# Default model is llama3.2 → tools should be None
assert kwargs["tools"] is None
def test_create_timmy_includes_tools_for_large_model():
"""A tool-capable model (e.g. llama3.1) should attempt to include tools."""
mock_toolkit = MagicMock()
with patch("timmy.agent.Agent") as MockAgent, \
patch("timmy.agent.Ollama"), \
patch("timmy.agent.SqliteDb"), \
patch("timmy.agent.create_full_toolkit", return_value=mock_toolkit), \
patch("timmy.agent.settings") as mock_settings:
mock_settings.ollama_model = "llama3.1"
mock_settings.ollama_url = "http://localhost:11434"
mock_settings.timmy_model_backend = "ollama"
mock_settings.airllm_model_size = "70b"
mock_settings.telemetry_enabled = False
from timmy.agent import create_timmy
create_timmy()
kwargs = MockAgent.call_args.kwargs
assert kwargs["tools"] == [mock_toolkit]
def test_create_timmy_show_tool_calls_false():
"""show_tool_calls should always be False to prevent raw JSON in output."""
with patch("timmy.agent.Agent") as MockAgent, \
patch("timmy.agent.Ollama"), \
patch("timmy.agent.SqliteDb"):
from timmy.agent import create_timmy
create_timmy()
kwargs = MockAgent.call_args.kwargs
assert kwargs["show_tool_calls"] is False


@@ -1,4 +1,4 @@
from unittest.mock import AsyncMock, MagicMock, patch
from unittest.mock import AsyncMock, patch
# ── Index ─────────────────────────────────────────────────────────────────────
@@ -74,12 +74,7 @@ def test_agents_list_timmy_metadata(client):
# ── Chat ──────────────────────────────────────────────────────────────────────
def test_chat_timmy_success(client):
mock_agent = MagicMock()
mock_run = MagicMock()
mock_run.content = "I am Timmy, operational and sovereign."
mock_agent.run.return_value = mock_run
with patch("dashboard.routes.agents.create_timmy", return_value=mock_agent):
with patch("dashboard.routes.agents.timmy_chat", return_value="I am Timmy, operational and sovereign."):
response = client.post("/agents/timmy/chat", data={"message": "status?"})
assert response.status_code == 200
@@ -88,17 +83,14 @@ def test_chat_timmy_success(client):
def test_chat_timmy_shows_user_message(client):
mock_agent = MagicMock()
mock_agent.run.return_value = MagicMock(content="Acknowledged.")
with patch("dashboard.routes.agents.create_timmy", return_value=mock_agent):
with patch("dashboard.routes.agents.timmy_chat", return_value="Acknowledged."):
response = client.post("/agents/timmy/chat", data={"message": "hello there"})
assert "hello there" in response.text
def test_chat_timmy_ollama_offline(client):
with patch("dashboard.routes.agents.create_timmy", side_effect=Exception("connection refused")):
with patch("dashboard.routes.agents.timmy_chat", side_effect=Exception("connection refused")):
response = client.post("/agents/timmy/chat", data={"message": "ping"})
assert response.status_code == 200
@@ -120,10 +112,7 @@ def test_history_empty_shows_init_message(client):
 def test_history_records_user_and_agent_messages(client):
-    mock_agent = MagicMock()
-    mock_agent.run.return_value = MagicMock(content="I am operational.")
-    with patch("dashboard.routes.agents.create_timmy", return_value=mock_agent):
+    with patch("dashboard.routes.agents.timmy_chat", return_value="I am operational."):
         client.post("/agents/timmy/chat", data={"message": "status check"})
         response = client.get("/agents/timmy/history")
@@ -132,7 +121,7 @@ def test_history_records_user_and_agent_messages(client):
 def test_history_records_error_when_offline(client):
-    with patch("dashboard.routes.agents.create_timmy", side_effect=Exception("refused")):
+    with patch("dashboard.routes.agents.timmy_chat", side_effect=Exception("refused")):
         client.post("/agents/timmy/chat", data={"message": "ping"})
         response = client.get("/agents/timmy/history")
@@ -141,10 +130,7 @@ def test_history_records_error_when_offline(client):
 def test_history_clear_resets_to_init_message(client):
-    mock_agent = MagicMock()
-    mock_agent.run.return_value = MagicMock(content="Acknowledged.")
-    with patch("dashboard.routes.agents.create_timmy", return_value=mock_agent):
+    with patch("dashboard.routes.agents.timmy_chat", return_value="Acknowledged."):
         client.post("/agents/timmy/chat", data={"message": "hello"})
         response = client.delete("/agents/timmy/history")
@@ -153,10 +139,7 @@ def test_history_clear_resets_to_init_message(client):
 def test_history_empty_after_clear(client):
-    mock_agent = MagicMock()
-    mock_agent.run.return_value = MagicMock(content="OK.")
-    with patch("dashboard.routes.agents.create_timmy", return_value=mock_agent):
+    with patch("dashboard.routes.agents.timmy_chat", return_value="OK."):
         client.post("/agents/timmy/chat", data={"message": "test"})
         client.delete("/agents/timmy/history")
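The pattern these route tests rely on — patch a single `timmy_chat` callable and treat agent failures as an ordinary reply rather than an HTTP error — can be sketched framework-free. The `handle_chat` wrapper and its `[offline]` prefix below are illustrative assumptions, not the dashboard's actual route code:

```python
def timmy_chat(message: str) -> str:
    """Stand-in for the real entry point; the route tests patch this symbol."""
    raise ConnectionRefusedError("connection refused")  # e.g. Ollama is down

def handle_chat(message: str) -> dict:
    # Route-body pattern: an agent error becomes a visible reply, never a 500,
    # which is why the offline tests can still expect status 200.
    try:
        reply = timmy_chat(message)
    except Exception as exc:
        reply = f"[offline] {exc}"
    return {"status": 200, "user": message, "agent": reply}

print(handle_chat("ping")["agent"])  # → [offline] connection refused
```

With this shape, `patch("dashboard.routes.agents.timmy_chat", ...)` swaps the whole agent pipeline in one place, instead of mocking `create_timmy` plus a run-result object per test.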


@@ -274,7 +274,7 @@ class TestWebSocketResilience:
     def test_websocket_manager_handles_no_connections(self):
         """WebSocket manager handles zero connected clients."""
-        from websocket.handler import ws_manager
+        from ws_manager.handler import ws_manager
         # Should not crash when broadcasting with no connections
         try:
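The resilience test above only requires a broadcast that tolerates an empty client list. A minimal sketch of such a manager follows; the class and attribute names are illustrative, not the real `ws_manager.handler`:

```python
import asyncio

class WSManager:
    """Tracks connected clients; broadcasting to zero clients is a no-op."""

    def __init__(self):
        self.active = []  # connected websocket-like objects

    async def broadcast(self, message: str):
        # Iterate over a copy so a disconnect during send can't mutate the
        # list mid-loop; with no clients the loop body never runs.
        for ws in list(self.active):
            try:
                await ws.send_text(message)
            except Exception:
                self.active.remove(ws)  # drop dead connections silently

manager = WSManager()
asyncio.run(manager.broadcast("ping"))  # zero clients: returns without error
```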

tests/test_session.py Normal file

@@ -0,0 +1,180 @@
"""Tests for timmy.session — persistent chat session with response sanitization."""
from unittest.mock import MagicMock, patch
import pytest
# ---------------------------------------------------------------------------
# Fixtures
# ---------------------------------------------------------------------------
@pytest.fixture(autouse=True)
def _reset_session_singleton():
    """Reset the module-level singleton between tests."""
    import timmy.session as mod
    mod._agent = None
    yield
    mod._agent = None
# ---------------------------------------------------------------------------
# chat()
# ---------------------------------------------------------------------------
def test_chat_returns_string():
    """chat() should return a plain string response."""
    mock_agent = MagicMock()
    mock_agent.run.return_value = MagicMock(content="Hello, sir.")
    with patch("timmy.session._get_agent", return_value=mock_agent):
        from timmy.session import chat
        result = chat("Hi Timmy")
        assert isinstance(result, str)
        assert "Hello, sir." in result


def test_chat_passes_session_id():
    """chat() should pass the session_id to agent.run()."""
    mock_agent = MagicMock()
    mock_agent.run.return_value = MagicMock(content="OK.")
    with patch("timmy.session._get_agent", return_value=mock_agent):
        from timmy.session import chat
        chat("test", session_id="my-session")
        _, kwargs = mock_agent.run.call_args
        assert kwargs["session_id"] == "my-session"


def test_chat_uses_default_session_id():
    """chat() should use 'dashboard' as the default session_id."""
    mock_agent = MagicMock()
    mock_agent.run.return_value = MagicMock(content="OK.")
    with patch("timmy.session._get_agent", return_value=mock_agent):
        from timmy.session import chat
        chat("test")
        _, kwargs = mock_agent.run.call_args
        assert kwargs["session_id"] == "dashboard"


def test_chat_singleton_agent_reused():
    """Calling chat() multiple times should reuse the same agent instance."""
    mock_agent = MagicMock()
    mock_agent.run.return_value = MagicMock(content="OK.")
    with patch("timmy.agent.create_timmy", return_value=mock_agent) as mock_factory:
        from timmy.session import chat
        chat("first message")
        chat("second message")
        # Factory called only once (singleton)
        mock_factory.assert_called_once()


def test_chat_extracts_user_name():
    """chat() should extract user name from message and persist to memory."""
    mock_agent = MagicMock()
    mock_agent.run.return_value = MagicMock(content="Nice to meet you!")
    mock_mem = MagicMock()
    with patch("timmy.session._get_agent", return_value=mock_agent), \
            patch("timmy.memory_system.memory_system", mock_mem):
        from timmy.session import chat
        chat("my name is Alex")
        mock_mem.update_user_fact.assert_called_once_with("Name", "Alex")


def test_chat_graceful_degradation_on_memory_failure():
    """chat() should still work if the conversation manager raises."""
    mock_agent = MagicMock()
    mock_agent.run.return_value = MagicMock(content="I'm operational.")
    with patch("timmy.session._get_agent", return_value=mock_agent), \
            patch("timmy.conversation.conversation_manager") as mock_cm:
        mock_cm.extract_user_name.side_effect = Exception("memory broken")
        from timmy.session import chat
        result = chat("test message")
        assert "operational" in result
# ---------------------------------------------------------------------------
# _clean_response()
# ---------------------------------------------------------------------------
def test_clean_response_strips_json_tool_calls():
    """JSON tool call blocks should be removed from response text."""
    from timmy.session import _clean_response
    dirty = 'Here is the answer. {"name": "python", "parameters": {"code": "0.15 * 3847.23", "variable_to_return": "result"}} The result is 577.'
    clean = _clean_response(dirty)
    assert '{"name"' not in clean
    assert '"parameters"' not in clean
    assert "The result is 577." in clean


def test_clean_response_strips_function_calls():
    """Function-call-style text should be removed."""
    from timmy.session import _clean_response
    dirty = 'I will search for that. memory_search(query="recall number") Found nothing.'
    clean = _clean_response(dirty)
    assert "memory_search(" not in clean
    assert "Found nothing." in clean


def test_clean_response_strips_chain_of_thought():
    """Chain-of-thought narration lines should be removed."""
    from timmy.session import _clean_response
    dirty = """Since there's no direct answer in my vault or hot memory, I'll use memory_search.
Using memory_search(query="what is special"), I found a context.
Here's a possible response:
77 is special because it's a prime number."""
    clean = _clean_response(dirty)
    assert "Since there's no" not in clean
    assert "Here's a possible" not in clean
    assert "77 is special" in clean


def test_clean_response_preserves_normal_text():
    """Normal text without tool artifacts should pass through unchanged."""
    from timmy.session import _clean_response
    normal = "The number 77 is the sum of the first seven primes: 2+3+5+7+11+13+17."
    assert _clean_response(normal) == normal


def test_clean_response_handles_empty_string():
    """Empty string should be returned as-is."""
    from timmy.session import _clean_response
    assert _clean_response("") == ""


def test_clean_response_handles_none():
    """None should be returned as-is."""
    from timmy.session import _clean_response
    assert _clean_response(None) is None
# ---------------------------------------------------------------------------
# reset_session()
# ---------------------------------------------------------------------------
def test_reset_session_clears_context():
    """reset_session() should clear the conversation context."""
    with patch("timmy.conversation.conversation_manager") as mock_cm:
        from timmy.session import reset_session
        reset_session("test-session")
        mock_cm.clear_context.assert_called_once_with("test-session")