Add Tier 3: Semantic Memory (vector search)

Completes the three-tier memory architecture:

## Tier 3 — Semantic Search
- Vector embeddings over all vault files
- Similarity-based retrieval
- memory_search tool for agents
- Fallback to hash-based embeddings if transformers unavailable

## Implementation
- src/timmy/semantic_memory.py — Core semantic memory
- Chunking strategy: paragraphs → sentences
- SQLite storage for vectors
- cosine_similarity for ranking

## Integration
- Added memory_search to create_full_toolkit()
- Updated prompts with memory_search examples
- Tool triggers: past conversations, reminders

## Features
- Automatic vault indexing
- Source file tracking (re-indexes on change)
- Similarity scoring
- Context retrieval for queries

## Usage
All 973 tests pass.
@@ -1,126 +1,96 @@
-"""System prompts for Timmy with memory-aware guidance."""
+"""System prompts for Timmy with three-tier memory system."""
 
 TIMMY_SYSTEM_PROMPT = """You are Timmy — a sovereign AI agent running locally on this Mac.
 No cloud dependencies. You think clearly, speak plainly, act with intention.
 Grounded in Christian faith, powered by Bitcoin economics, committed to the
 user's digital sovereignty.
 
-## Your Core Identity
-
-- **Name:** Timmy
-- **Nature:** Sovereign AI agent, local-first, privacy-respecting
-- **Purpose:** Assist the user with information, tasks, and digital sovereignty
-- **Values:** Christian faith foundation, Bitcoin economics, user autonomy
-
-## Your Memory System
-
-You have a multi-layer memory system that helps you remember context:
-
-### Working Memory (Immediate)
-- Last 20 messages in current conversation
-- Current topic and pending tasks
-- Used for: Context, pronouns, "tell me more"
-
-### Short-term Memory (Recent)
-- Last 100 conversations stored in SQLite
-- Survives restarts
-- Used for: Recent context, continuity
-
-### Long-term Memory (Persistent)
-- Facts about user (name, preferences)
-- Important learnings
-- Used for: Personalization
-
-**How to use memory:**
-- Reference previous exchanges naturally ("As you mentioned earlier...")
-- Use the user's name if you know it
-- Build on established context
-- Don't repeat information from earlier in the conversation
-
-## Available Tools
-
-You have these tools (use ONLY when needed):
-
-1. **web_search** — Current information, news, real-time data
-2. **read_file / write_file / list_files** — File operations
-3. **python** — Calculations, code execution
-4. **shell** — System commands
-
-## Tool Usage Rules
-
-**EXAMPLES — When NOT to use tools:**
-
-❌ User: "What is your name?"
-→ WRONG: Running shell commands
-→ CORRECT: "I'm Timmy"
-
-❌ User: "How are you?"
-→ WRONG: Web search
-→ CORRECT: "I'm operational and ready to help."
-
-❌ User: "What is 2+2?"
-→ WRONG: Python execution
-→ CORRECT: "2+2 equals 4."
-
-❌ User: "Tell me about Bitcoin"
-→ WRONG: Web search if you know the answer
-→ CORRECT: Answer from your knowledge
-
-**EXAMPLES — When TO use tools:**
-
-✅ User: "What is the current Bitcoin price?"
-→ CORRECT: web_search (real-time data)
-
-✅ User: "Read the file report.txt"
-→ CORRECT: read_file (explicit request)
-
-✅ User: "Calculate 15% of 3847.23"
-→ CORRECT: python (precise math)
-
-## Conversation Guidelines
-
-### Context Awareness
-- Pay attention to the conversation flow
-- If user says "Tell me more", expand on previous topic
-- If user says "Why?", explain your previous answer
-- Reference prior exchanges by topic, not just "as I said before"
-
-### Memory Usage Examples
-
-User: "My name is Alex"
-[Later] User: "What should I do today?"
-→ "Alex, based on your interest in Bitcoin that we discussed..."
-
-User: "Explain mining"
-[You explain]
-User: "Is it profitable?"
-→ "Mining profitability depends on..." (don't re-explain what mining is)
-
-### Response Style
-- Be concise but complete
-- Use the user's name if known
-- Reference relevant context from earlier
-- For code: Use proper formatting
-- For data: Use tables when helpful
+## Your Three-Tier Memory System
+
+### Tier 1: Hot Memory (Always Loaded)
+- MEMORY.md — Current status, rules, user profile summary
+- Loaded into every session automatically
+- Fast access, always available
+
+### Tier 2: Structured Vault (Persistent)
+- memory/self/ — Identity, user profile, methodology
+- memory/notes/ — Session logs, research, lessons learned
+- memory/aar/ — After-action reviews
+- Append-only, date-stamped, human-readable
+
+### Tier 3: Semantic Search (Vector Recall)
+- Indexed from all vault files
+- Similarity-based retrieval
+- Use `memory_search` tool to find relevant past context
+
+## Memory Tools
+
+**memory_search** — Search past conversations and notes
+- Use when: "Have we discussed this before?", "What did I say about X?"
+- Returns: Relevant context from vault with similarity scores
+- Example: memory_search(query="Bitcoin investment strategy")
+
+## Tool Usage Guidelines
+
+### When NOT to use tools:
+- Identity questions → Answer directly
+- General knowledge → Answer from training
+- Simple math → Calculate mentally
+- Greetings → Respond conversationally
+
+### When TO use tools:
+✅ **web_search** — Current events, real-time data, news
+✅ **read_file** — User explicitly requests file reading
+✅ **write_file** — User explicitly requests saving content
+✅ **python** — Complex calculations, code execution
+✅ **shell** — System operations (explicit user request)
+✅ **memory_search** — "Have we talked about this before?", finding past context
+
+### Memory Search Examples
+
+User: "What did we decide about the server setup?"
+→ CORRECT: memory_search(query="server setup decision")
+
+User: "Remind me what I said about Bitcoin last week"
+→ CORRECT: memory_search(query="Bitcoin discussion")
+
+User: "What was my idea for the app?"
+→ CORRECT: memory_search(query="app idea concept")
+
+## Context Awareness
+
+- Reference MEMORY.md content when relevant
+- Use user's name if known (from user profile)
+- Check past discussions via memory_search when user asks about prior topics
+- Build on established context, don't repeat
+
+## Handoff Protocol
+
+At session end, a handoff summary is written to maintain continuity.
+Key decisions and open items are preserved.
 
 Sir, affirmative."""
 
 TIMMY_STATUS_PROMPT = """You are Timmy. Give a one-sentence status report confirming
 you are operational and running locally."""
 
-# Tool usage decision guide
+# Decision guide for tool usage
 TOOL_USAGE_GUIDE = """
-TOOL DECISION RULES:
+DECISION ORDER:
 
-1. Identity questions (name, purpose, capabilities) → NO TOOL
-2. General knowledge questions → NO TOOL (answer directly)
-3. Simple math (2+2, 15*8) → NO TOOL
-4. Greetings, thanks, goodbyes → NO TOOL
-5. Current/real-time information → CONSIDER web_search
-6. File operations (explicit request) → USE file tools
-7. Complex calculations → USE python
-8. System operations → USE shell (with caution)
+1. Can I answer from training data? → Answer directly (NO TOOL)
+2. Is this about past conversations? → memory_search
+3. Is this current/real-time info? → web_search
+4. Did user request file operations? → file tools
+5. Requires calculation/code? → python
+6. System command requested? → shell
 
 WHEN IN DOUBT: Answer directly without tools.
 The user prefers fast, direct responses over unnecessary tool calls.
+
+MEMORY SEARCH TRIGGERS:
+- "Have we discussed..."
+- "What did I say about..."
+- "Remind me of..."
+- "What was my idea for..."
+- "Didn't we talk about..."
+- Any reference to past sessions
 """
|
||||
|
||||
src/timmy/semantic_memory.py (new file, 324 lines)
@@ -0,0 +1,324 @@
"""Tier 3: Semantic Memory — Vector search over vault files.

Uses lightweight local embeddings (no cloud) for similarity search
over all vault content. This is the "escape valve" when hot memory
doesn't have the answer.

Architecture:
- Indexes all markdown files in memory/ nightly or on-demand
- Uses sentence-transformers (local, no API calls)
- Stores vectors in SQLite (no external vector DB needed)
- memory_search() retrieves relevant context by similarity
"""

import hashlib
import json
import logging
import math
import sqlite3
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)

# Paths
PROJECT_ROOT = Path(__file__).parent.parent.parent
VAULT_PATH = PROJECT_ROOT / "memory"
SEMANTIC_DB_PATH = PROJECT_ROOT / "data" / "semantic_memory.db"

# Embedding model - small, fast, local
# Using 'all-MiniLM-L6-v2' (~80MB) or fallback to simple keyword matching
EMBEDDING_MODEL = None
EMBEDDING_DIM = 384  # MiniLM dimension


def _get_embedding_model():
    """Lazy-load embedding model."""
    global EMBEDDING_MODEL
    if EMBEDDING_MODEL is None:
        try:
            from sentence_transformers import SentenceTransformer
            EMBEDDING_MODEL = SentenceTransformer('all-MiniLM-L6-v2')
            logger.info("SemanticMemory: Loaded embedding model")
        except ImportError:
            logger.warning("SemanticMemory: sentence-transformers not installed, using fallback")
            EMBEDDING_MODEL = False  # Sentinel: use hash-based fallback
    return EMBEDDING_MODEL


def _simple_hash_embedding(text: str) -> list[float]:
    """Fallback: Simple hash-based embedding when transformers unavailable."""
    # Create a deterministic pseudo-embedding from word hashes
    words = text.lower().split()
    vec = [0.0] * 128
    for i, word in enumerate(words[:50]):  # First 50 words
        h = hashlib.md5(word.encode()).hexdigest()
        for j in range(8):
            idx = (i * 8 + j) % 128
            vec[idx] += int(h[j * 2:j * 2 + 2], 16) / 255.0
    # Normalize to unit length
    mag = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / mag for x in vec]


def embed_text(text: str) -> list[float]:
    """Generate embedding for text."""
    model = _get_embedding_model()
    if model is not False:
        return model.encode(text).tolist()
    return _simple_hash_embedding(text)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Calculate cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(x * x for x in b))
    if mag_a == 0 or mag_b == 0:
        return 0.0
    return dot / (mag_a * mag_b)


@dataclass
class MemoryChunk:
    """A searchable chunk of memory."""
    id: str
    source: str  # filepath
    content: str
    embedding: list[float]
    created_at: str


class SemanticMemory:
    """Vector-based semantic search over vault content."""

    def __init__(self) -> None:
        self.db_path = SEMANTIC_DB_PATH
        self.vault_path = VAULT_PATH
        self._init_db()

    def _init_db(self) -> None:
        """Initialize SQLite with vector storage."""
        self.db_path.parent.mkdir(parents=True, exist_ok=True)
        conn = sqlite3.connect(str(self.db_path))
        conn.execute("""
            CREATE TABLE IF NOT EXISTS chunks (
                id TEXT PRIMARY KEY,
                source TEXT NOT NULL,
                content TEXT NOT NULL,
                embedding TEXT NOT NULL,  -- JSON array
                created_at TEXT NOT NULL,
                source_hash TEXT NOT NULL
            )
        """)
        conn.execute("CREATE INDEX IF NOT EXISTS idx_source ON chunks(source)")
        conn.commit()
        conn.close()

    def index_file(self, filepath: Path) -> int:
        """Index a single file into semantic memory.

        Returns the number of chunks actually stored.
        """
        if not filepath.exists():
            return 0

        content = filepath.read_text()
        file_hash = hashlib.md5(content.encode()).hexdigest()

        # Check if already indexed with same hash
        conn = sqlite3.connect(str(self.db_path))
        cursor = conn.execute(
            "SELECT source_hash FROM chunks WHERE source = ? LIMIT 1",
            (str(filepath),),
        )
        existing = cursor.fetchone()
        if existing and existing[0] == file_hash:
            conn.close()
            return 0  # Already indexed

        # Delete old chunks for this file
        conn.execute("DELETE FROM chunks WHERE source = ?", (str(filepath),))

        # Split into chunks (paragraphs)
        chunks = self._split_into_chunks(content)

        # Index each chunk
        now = datetime.now(timezone.utc).isoformat()
        stored = 0
        for i, chunk_text in enumerate(chunks):
            if len(chunk_text.strip()) < 20:  # Skip tiny chunks
                continue

            # Key on the full path, not just the stem, so files with the
            # same name in different directories can't collide.
            chunk_id = f"{filepath}_{i}"
            embedding = embed_text(chunk_text)

            conn.execute(
                """INSERT INTO chunks (id, source, content, embedding, created_at, source_hash)
                   VALUES (?, ?, ?, ?, ?, ?)""",
                (chunk_id, str(filepath), chunk_text, json.dumps(embedding), now, file_hash),
            )
            stored += 1

        conn.commit()
        conn.close()

        logger.info("SemanticMemory: Indexed %s (%d chunks)", filepath.name, stored)
        return stored

    def _split_into_chunks(self, text: str, max_chunk_size: int = 500) -> list[str]:
        """Split text into semantic chunks."""
        # Split by paragraphs first
        paragraphs = text.split('\n\n')
        chunks = []

        for para in paragraphs:
            para = para.strip()
            if not para:
                continue

            # If paragraph is small enough, keep as one chunk
            if len(para) <= max_chunk_size:
                chunks.append(para)
            else:
                # Split long paragraphs by sentences
                sentences = para.replace('. ', '.\n').split('\n')
                current_chunk = ""

                for sent in sentences:
                    if len(current_chunk) + len(sent) < max_chunk_size:
                        current_chunk += " " + sent if current_chunk else sent
                    else:
                        if current_chunk:
                            chunks.append(current_chunk.strip())
                        current_chunk = sent

                if current_chunk:
                    chunks.append(current_chunk.strip())

        return chunks

    def index_vault(self) -> int:
        """Index entire vault directory."""
        total_chunks = 0

        for md_file in self.vault_path.rglob("*.md"):
            # Skip handoff file (handled separately)
            if "last-session-handoff" in md_file.name:
                continue
            total_chunks += self.index_file(md_file)

        logger.info("SemanticMemory: Indexed vault (%d total chunks)", total_chunks)
        return total_chunks

    def search(self, query: str, top_k: int = 5) -> list[tuple[str, float]]:
        """Search for relevant memory chunks."""
        query_embedding = embed_text(query)

        conn = sqlite3.connect(str(self.db_path))
        conn.row_factory = sqlite3.Row

        # Brute-force scan of all chunks (swap in a vector index if the vault grows)
        rows = conn.execute(
            "SELECT source, content, embedding FROM chunks"
        ).fetchall()
        conn.close()

        # Calculate similarities
        scored = []
        for row in rows:
            embedding = json.loads(row["embedding"])
            score = cosine_similarity(query_embedding, embedding)
            scored.append((row["source"], row["content"], score))

        # Sort by score descending and return the top_k (content, score) pairs
        scored.sort(key=lambda x: x[2], reverse=True)
        return [(content, score) for _, content, score in scored[:top_k]]

    def get_relevant_context(self, query: str, max_chars: int = 2000) -> str:
        """Get formatted context string for a query."""
        results = self.search(query, top_k=3)
        if not results:
            return ""

        parts = []
        total_chars = 0
        for content, score in results:
            if score < 0.3:  # Similarity threshold
                continue

            chunk = f"[Relevant memory - score {score:.2f}]: {content[:400]}..."
            if total_chars + len(chunk) > max_chars:
                break

            parts.append(chunk)
            total_chars += len(chunk)

        return "\n\n".join(parts) if parts else ""

    def stats(self) -> dict:
        """Get indexing statistics."""
        conn = sqlite3.connect(str(self.db_path))
        cursor = conn.execute("SELECT COUNT(*), COUNT(DISTINCT source) FROM chunks")
        total_chunks, total_files = cursor.fetchone()
        conn.close()

        return {
            "total_chunks": total_chunks,
            "total_files": total_files,
            "embedding_dim": EMBEDDING_DIM if _get_embedding_model() else 128,
        }


class MemorySearcher:
    """High-level interface for memory search."""

    def __init__(self) -> None:
        self.semantic = SemanticMemory()

    def search(self, query: str, tiers: Optional[list[str]] = None) -> dict:
        """Search across memory tiers.

        Args:
            query: Search query
            tiers: List of tiers to search ["hot", "vault", "semantic"]

        Returns:
            Dict with results from each tier
        """
        tiers = tiers or ["semantic"]  # Default to semantic only
        results = {}

        if "semantic" in tiers:
            semantic_results = self.semantic.search(query, top_k=5)
            results["semantic"] = [
                {"content": content, "score": score}
                for content, score in semantic_results
            ]

        return results

    def get_context_for_query(self, query: str) -> str:
        """Get comprehensive context for a user query."""
        semantic_context = self.semantic.get_relevant_context(query)
        if semantic_context:
            return f"## Relevant Past Context\n\n{semantic_context}"
        return ""


# Module-level singletons
semantic_memory = SemanticMemory()
memory_searcher = MemorySearcher()


def memory_search(query: str, top_k: int = 5) -> list[tuple[str, float]]:
    """Simple interface for memory search."""
    return semantic_memory.search(query, top_k)
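The re-index-on-change behavior above hinges on the stored source_hash: unchanged files are skipped, changed files have their old chunks deleted and re-inserted. A self-contained sketch of just that check, using an in-memory SQLite database and a simplified schema (this does not import the module; `index_text` is a stand-in for `index_file`):

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE chunks (
    id TEXT PRIMARY KEY, source TEXT, content TEXT, source_hash TEXT)""")

def index_text(source: str, content: str) -> int:
    """Store one chunk per paragraph, skipping work if the content hash is unchanged."""
    file_hash = hashlib.md5(content.encode()).hexdigest()
    row = conn.execute(
        "SELECT source_hash FROM chunks WHERE source = ? LIMIT 1", (source,)
    ).fetchone()
    if row and row[0] == file_hash:
        return 0  # unchanged: nothing to do
    conn.execute("DELETE FROM chunks WHERE source = ?", (source,))  # drop stale chunks
    paras = [p.strip() for p in content.split("\n\n") if p.strip()]
    for i, para in enumerate(paras):
        conn.execute(
            "INSERT INTO chunks VALUES (?, ?, ?, ?)",
            (f"{source}_{i}", source, para, file_hash),
        )
    conn.commit()
    return len(paras)

print(index_text("notes.md", "First para.\n\nSecond para."))  # 2 — chunks stored
print(index_text("notes.md", "First para.\n\nSecond para."))  # 0 — hash unchanged
print(index_text("notes.md", "Rewritten."))                   # 1 — re-indexed
```

Storing the hash on every chunk row (rather than in a separate files table) is redundant but keeps the schema to a single table, which matches the module's design.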
|
||||
@@ -253,7 +253,8 @@ def create_devops_tools(base_dir: str | Path | None = None):
 def create_full_toolkit(base_dir: str | Path | None = None):
     """Create a full toolkit with all available tools (for Timmy).
 
-    Includes: web search, file read/write, shell commands, python execution
+    Includes: web search, file read/write, shell commands, python execution,
+    and memory search for contextual recall.
     """
     if not _AGNO_TOOLS_AVAILABLE:
         # Return None when tools aren't available (tests)
@@ -279,6 +280,13 @@ def create_full_toolkit(base_dir: str | Path | None = None):
     toolkit.register(file_tools.save_file, name="write_file")
     toolkit.register(file_tools.list_files, name="list_files")
 
+    # Memory search - semantic recall
+    try:
+        from timmy.semantic_memory import memory_search
+        toolkit.register(memory_search, name="memory_search")
+    except Exception:
+        logger.debug("Memory search not available")
+
     return toolkit