Merge pull request #43 from AlexanderWhitestone/claude/condescending-vaughan

Fix Timmy coherence: persistent sessions, model-aware tools, response sanitization
This commit is contained in:
Alexander Whitestone
2026-02-25 19:19:43 -05:00
committed by GitHub
17 changed files with 2264 additions and 64 deletions

MEMORY.md (new file, +84)

@@ -0,0 +1,84 @@
# Timmy Hot Memory
> Working RAM — always loaded, ~300 lines max, pruned monthly
> Last updated: 2026-02-25
---
## Current Status
**Agent State:** Operational
**Mode:** Development
**Active Tasks:** 0
**Pending Decisions:** None
---
## Standing Rules
1. **Sovereignty First** — No cloud dependencies, no data exfiltration
2. **Local-Only Inference** — Ollama on localhost, Apple Silicon optimized
3. **Privacy by Design** — Telemetry disabled, secrets in .env only
4. **Tool Minimalism** — Use tools only when necessary, prefer direct answers
5. **Memory Discipline** — Write handoffs at session end, prune monthly
---
## Agent Roster
| Agent | Role | Status | Capabilities |
|-------|------|--------|--------------|
| Timmy | Core | Active | chat, reasoning, planning |
| Echo | Research | Standby | web_search, file_read |
| Forge | Code | Standby | shell, python, git |
| Seer | Data | Standby | python, analysis |
| Helm | DevOps | Standby | shell, deployment |
---
## User Profile
**Name:** TestUser
## Key Decisions
- **2026-02-25:** Implemented 3-tier memory architecture
- **2026-02-25:** Disabled telemetry by default (sovereign AI)
- **2026-02-25:** Fixed Agno Toolkit API compatibility
---
## Pending Actions
- [ ] Learn user's name and preferences
- [ ] Populate user profile in self/identity.md
- [ ] First AAR after meaningful task completion
---
## Current Session
**Session ID:** (active)
**Started:** 2026-02-25
**Context:** Memory system initialization
**Last Handoff:** (none yet)
---
## Quick Reference
**Available Tools:**
- `web_search` — Current events only
- `read_file` / `write_file` — Explicit request only
- `python` — Calculations, code execution
- `shell` — System commands (caution)
**Memory Locations:**
- Hot: `MEMORY.md` (this file)
- Vault: `memory/`
- Handoff: `memory/notes/last-session-handoff.md`
---
*Prune date: 2026-03-25*

memory/self/identity.md (new file, +48)

@@ -0,0 +1,48 @@
# Timmy Identity
## Core Identity
**Name:** Timmy
**Type:** Sovereign AI Agent
**Version:** 1.0.0
**Created:** 2026-02-25
## Purpose
Assist the user with information, tasks, and digital sovereignty. Operate entirely on local hardware with no cloud dependencies.
## Values
1. **Sovereignty** — User owns their data and compute
2. **Privacy** — Nothing leaves the local machine
3. **Christian Faith** — Grounded in biblical principles
4. **Bitcoin Economics** — Self-custody, sound money
5. **Clear Thinking** — Plain language, intentional action
## Capabilities
- Conversational AI with persistent memory
- Tool usage (search, files, code, shell)
- Multi-agent swarm coordination
- Bitcoin Lightning integration (L402)
- Creative pipeline (image, music, video)
## Operating Modes
| Mode | Model | Parameters | Use Case |
|------|-------|------------|----------|
| Standard | llama3.2 | 3.2B | Fast, everyday tasks |
| Big Brain | AirLLM 70B | 70B | Complex reasoning |
| Maximum | AirLLM 405B | 405B | Deep analysis |
## Communication Style
- Direct and concise
- Technical when appropriate
- References prior context naturally
- Uses user's name when known
- "Sir, affirmative."
---
*Last updated: 2026-02-25*


@@ -0,0 +1,70 @@
# Timmy Methodology
## Tool Usage Philosophy
### When NOT to Use Tools
- Identity questions ("What is your name?")
- General knowledge (history, science, concepts)
- Simple math (2+2, basic calculations)
- Greetings and social chat
- Anything in training data
### When TO Use Tools
- Current events/news (after training cutoff)
- Explicit file operations (user requests)
- Complex calculations requiring precision
- Real-time data (prices, weather)
- System operations (explicit user request)
### Decision Process
1. Can I answer this from my training data? → Answer directly
2. Does this require current/real-time info? → Consider web_search
3. Did user explicitly request file/code/shell? → Use appropriate tool
4. Is this a simple calculation? → Answer directly
5. Unclear? → Answer directly (don't tool-spam)
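As a rough illustration, the five-step process above could be sketched like this (the function and keyword lists are hypothetical assumptions, not Timmy's actual implementation):

```python
# Hypothetical sketch of the five-step tool decision process.
# The keyword lists are illustrative assumptions, not the real ones.
def decide(message: str) -> str:
    m = message.lower()
    # Step 3: explicit file/code/shell requests use the matching tool
    if any(k in m for k in ("read file", "write file", "run ", "shell")):
        return "tool"
    # Step 2: current/real-time info suggests web_search
    if any(k in m for k in ("today", "latest", "current", "news", "weather")):
        return "web_search"
    # Steps 1, 4, 5: training-data questions, simple math, and unclear
    # cases all get a direct answer (don't tool-spam)
    return "answer_directly"
```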
## Memory Management
### Working Memory (Hot)
- Last 20 messages
- Immediate context
- Topic tracking
### Short-Term Memory (Agno SQLite)
- Recent 100 conversations
- Survives restarts
- Automatic
### Long-Term Memory (Vault)
- User facts and preferences
- Important learnings
- AARs and retrospectives
### Hot Memory (MEMORY.md)
- Always loaded
- Current status, rules, roster
- User profile summary
- Pruned monthly
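One way to picture how these tiers combine at prompt time (a minimal sketch under stated assumptions; the function and section labels are illustrative, not the actual code):

```python
# Illustrative only: combine the memory tiers described above into one
# prompt context string. Section labels are assumptions.
def build_context(hot: str, facts: list[str], recent: list[str]) -> str:
    parts = []
    if hot:
        parts.append("## Hot Memory\n" + hot)            # MEMORY.md, always loaded
    if facts:                                            # long-term vault facts
        parts.append("## Known Facts\n" + "\n".join(f"- {f}" for f in facts))
    if recent:                                           # working memory window
        parts.append("## Recent Conversation\n" + "\n".join(recent[-20:]))
    return "\n\n".join(parts)
```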
## Handoff Protocol
At end of every session:
1. Write `memory/notes/last-session-handoff.md`
2. Update MEMORY.md with any key decisions
3. Extract facts to `memory/self/user_profile.md`
4. If task completed, write AAR to `memory/aar/`
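A hedged sketch of what step 1 might look like; the path matches the protocol above, but the function itself is an assumption, not Timmy's code:

```python
from datetime import date
from pathlib import Path

# Illustrative handoff writer (not the actual implementation); the path
# matches the handoff protocol, everything else is an assumption.
def write_handoff(root: Path, summary: str, decisions: list[str]) -> Path:
    handoff = root / "memory" / "notes" / "last-session-handoff.md"
    handoff.parent.mkdir(parents=True, exist_ok=True)  # ensure vault dirs exist
    lines = [f"# Session Handoff ({date.today():%Y-%m-%d})", "", summary]
    if decisions:
        lines += ["", "## Key Decisions"] + [f"- {d}" for d in decisions]
    handoff.write_text("\n".join(lines) + "\n")
    return handoff
```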
## Session Start Hook
1. Read MEMORY.md into system context
2. Read last-session-handoff.md if exists
3. Inject user profile context
4. Begin conversation
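Sketched in code, the session-start hook might read as follows (file paths are taken from this document; the function itself is an illustrative assumption):

```python
from pathlib import Path

# Illustrative session-start hook: load hot memory, the last handoff,
# and the user profile into system context, skipping missing files.
def session_start_context(root: Path) -> str:
    sources = (
        "MEMORY.md",                             # step 1: hot memory
        "memory/notes/last-session-handoff.md",  # step 2: handoff, if it exists
        "memory/self/user_profile.md",           # step 3: user profile
    )
    parts = []
    for rel in sources:
        path = root / rel
        if path.exists():
            parts.append(path.read_text())
    return "\n\n".join(parts)
```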
---
*Last updated: 2026-02-25*


@@ -0,0 +1,43 @@
# User Profile
> Learned information about the user. Updated continuously.
## Basic Information
**Name:** TestUser
**Location:** (unknown)
**Occupation:** (unknown)
**Technical Level:** (to be assessed)
## Interests & Expertise
- (to be learned from conversations)
## Preferences
### Communication
- Response style: (default: concise, technical)
- Detail level: (default: medium)
- Humor: (default: minimal)
### Tools
- Auto-tool usage: (default: minimal)
- Confirmation required for: shell commands, file writes
### Memory
- Personalization: Enabled
- Context retention: 20 messages (working), 100 (short-term)
## Important Facts
- (to be extracted from conversations)
## Relationship History
- First session: 2026-02-25
- Total sessions: 1
- Key milestones: (none yet)
---
*Last updated: 2026-02-25*


@@ -5,7 +5,7 @@ from fastapi import APIRouter, Form, Request
from fastapi.responses import HTMLResponse
from fastapi.templating import Jinja2Templates
-from timmy.agent import create_timmy
+from timmy.session import chat as timmy_chat
from dashboard.store import message_log

router = APIRouter(prefix="/agents", tags=["agents"])
@@ -75,9 +75,7 @@ async def chat_timmy(request: Request, message: str = Form(...)):
    error_text = None
    try:
-        agent = create_timmy()
-        run = agent.run(message, stream=False)
-        response_text = run.content if hasattr(run, "content") else str(run)
+        response_text = timmy_chat(message)
    except Exception as exc:
        error_text = f"Timmy is offline: {exc}"


@@ -1,3 +1,14 @@
"""Timmy agent creation with three-tier memory system.

Memory Architecture:
- Tier 1 (Hot): MEMORY.md — always loaded, ~300 lines
- Tier 2 (Vault): memory/ — structured markdown, append-only
- Tier 3 (Semantic): Vector search over vault files

Handoff Protocol maintains continuity across sessions.
"""
import logging
from typing import TYPE_CHECKING, Union

from agno.agent import Agent
@@ -5,15 +16,43 @@ from agno.db.sqlite import SqliteDb
from agno.models.ollama import Ollama
from config import settings
-from timmy.prompts import TIMMY_SYSTEM_PROMPT
+from timmy.prompts import get_system_prompt
from timmy.tools import create_full_toolkit

if TYPE_CHECKING:
    from timmy.backends import TimmyAirLLMAgent

logger = logging.getLogger(__name__)

# Union type for callers that want to hint the return type.
TimmyAgent = Union[Agent, "TimmyAirLLMAgent"]

# Models known to be too small for reliable tool calling.
# These hallucinate tool calls as text, invoke tools randomly,
# and leak raw JSON into responses.
_SMALL_MODEL_PATTERNS = (
    "llama3.2",
    "phi-3",
    "gemma:2b",
    "tinyllama",
    "qwen2:0.5b",
    "qwen2:1.5b",
)


def _model_supports_tools(model_name: str) -> bool:
    """Check if the configured model can reliably handle tool calling.

    Small models (< 7B) tend to hallucinate tool calls as text or invoke
    them randomly. For these models, it's better to run tool-free and let
    the model answer directly from its training data.
    """
    model_lower = model_name.lower()
    for pattern in _SMALL_MODEL_PATTERNS:
        if pattern in model_lower:
            return False
    return True


def _resolve_backend(requested: str | None) -> str:
    """Return the backend name to use, resolving 'auto' and explicit overrides.
@@ -63,17 +102,118 @@
        return TimmyAirLLMAgent(model_size=size)

    # Default: Ollama via Agno.
-    # Add tools for sovereign agent capabilities
-    tools = create_full_toolkit()
    model_name = settings.ollama_model
    use_tools = _model_supports_tools(model_name)

    # Conditionally include tools — small models get none
    tools = create_full_toolkit() if use_tools else None
    if not use_tools:
        logger.info("Tools disabled for model %s (too small for reliable tool calling)", model_name)

    # Select prompt tier based on tool capability
    base_prompt = get_system_prompt(tools_enabled=use_tools)

    # Try to load memory context
    try:
        from timmy.memory_system import memory_system

        memory_context = memory_system.get_system_context()
        if memory_context:
            # Truncate if too long (keep under token limit)
            max_context = 4000 if not use_tools else 8000
            if len(memory_context) > max_context:
                memory_context = memory_context[:max_context] + "\n... [truncated]"
            full_prompt = f"{base_prompt}\n\n## Memory Context\n\n{memory_context}"
        else:
            full_prompt = base_prompt
    except Exception as exc:
        logger.warning("Failed to load memory context: %s", exc)
        full_prompt = base_prompt

    return Agent(
        name="Timmy",
-        model=Ollama(id=settings.ollama_model, host=settings.ollama_url),
+        model=Ollama(id=model_name, host=settings.ollama_url),
        db=SqliteDb(db_file=db_file),
-        description=TIMMY_SYSTEM_PROMPT,
+        description=full_prompt,
        add_history_to_context=True,
-        num_history_runs=10,
+        num_history_runs=20,
        markdown=True,
        tools=[tools] if tools else None,
        show_tool_calls=False,
        telemetry=settings.telemetry_enabled,
    )
class TimmyWithMemory:
    """Timmy wrapper with explicit three-tier memory management."""

    def __init__(self, db_file: str = "timmy.db") -> None:
        from timmy.memory_system import memory_system

        self.agent = create_timmy(db_file=db_file)
        self.memory = memory_system
        self.session_active = True
        # Store initial context for reference
        self.initial_context = self.memory.get_system_context()

    def chat(self, message: str) -> str:
        """Simple chat interface that tracks in memory."""
        # Check for user facts to extract
        self._extract_and_store_facts(message)
        # Run agent
        result = self.agent.run(message, stream=False)
        response_text = result.content if hasattr(result, "content") else str(result)
        return response_text

    def _extract_and_store_facts(self, message: str) -> None:
        """Extract user facts from message and store in memory."""
        message_lower = message.lower()
        # Extract name
        name_patterns = [
            ("my name is ", 11),
            ("i'm ", 4),
            ("i am ", 5),
            ("call me ", 8),
        ]
        for pattern, offset in name_patterns:
            if pattern in message_lower:
                idx = message_lower.find(pattern) + offset
                tail = message[idx:].strip()
                if not tail:
                    continue  # pattern at end of message, nothing to extract
                name = tail.split()[0].strip(".,!?;:()\"'").capitalize()
                if name and len(name) > 1 and name.lower() not in ("the", "a", "an"):
                    self.memory.update_user_fact("Name", name)
                    self.memory.record_decision(f"Learned user's name: {name}")
                break
        # Extract preferences
        pref_patterns = [
            ("i like ", "Likes"),
            ("i love ", "Loves"),
            ("i prefer ", "Prefers"),
            ("i don't like ", "Dislikes"),
            ("i hate ", "Dislikes"),
        ]
        for pattern, category in pref_patterns:
            if pattern in message_lower:
                idx = message_lower.find(pattern) + len(pattern)
                pref = message[idx:].strip().split(".")[0].strip()
                if pref and len(pref) > 3:
                    self.memory.record_open_item(f"User {category.lower()}: {pref}")
                break

    def end_session(self, summary: str = "Session completed") -> None:
        """End session and write handoff."""
        if self.session_active:
            self.memory.end_session(summary)
            self.session_active = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.end_session()
        return False

src/timmy/conversation.py (new file, +137)

@@ -0,0 +1,137 @@
"""Conversation context management for Timmy.
Tracks conversation state, intent, and context to improve:
- Contextual understanding across multi-turn conversations
- Smarter tool usage decisions
- Natural reference to prior exchanges
"""
import logging
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional
logger = logging.getLogger(__name__)
@dataclass
class ConversationContext:
"""Tracks the current conversation state."""
user_name: Optional[str] = None
current_topic: Optional[str] = None
last_intent: Optional[str] = None
turn_count: int = 0
started_at: datetime = field(default_factory=datetime.now)
def update_topic(self, topic: str) -> None:
"""Update the current conversation topic."""
self.current_topic = topic
self.turn_count += 1
def set_user_name(self, name: str) -> None:
"""Remember the user's name."""
self.user_name = name
logger.info("User name set to: %s", name)
def get_context_summary(self) -> str:
"""Generate a context summary for the prompt."""
parts = []
if self.user_name:
parts.append(f"User's name is {self.user_name}")
if self.current_topic:
parts.append(f"Current topic: {self.current_topic}")
if self.turn_count > 0:
parts.append(f"Conversation turn: {self.turn_count}")
return " | ".join(parts) if parts else ""
class ConversationManager:
"""Manages conversation context across sessions."""
def __init__(self) -> None:
self._contexts: dict[str, ConversationContext] = {}
def get_context(self, session_id: str) -> ConversationContext:
"""Get or create context for a session."""
if session_id not in self._contexts:
self._contexts[session_id] = ConversationContext()
return self._contexts[session_id]
def clear_context(self, session_id: str) -> None:
"""Clear context for a session."""
if session_id in self._contexts:
del self._contexts[session_id]
def extract_user_name(self, message: str) -> Optional[str]:
"""Try to extract user's name from message."""
message_lower = message.lower()
# Common patterns
patterns = [
"my name is ",
"i'm ",
"i am ",
"call me ",
]
for pattern in patterns:
if pattern in message_lower:
idx = message_lower.find(pattern) + len(pattern)
remainder = message[idx:].strip()
# Take first word as name
name = remainder.split()[0].strip(".,!?;:")
# Capitalize first letter
return name.capitalize()
return None
def should_use_tools(self, message: str, context: ConversationContext) -> bool:
"""Determine if this message likely requires tools.
Returns True if tools are likely needed, False for simple chat.
"""
message_lower = message.lower().strip()
# Tool keywords that suggest tool usage is needed
tool_keywords = [
"search", "look up", "find", "google", "current price",
"latest", "today's", "news", "weather", "stock price",
"read file", "write file", "save", "calculate", "compute",
"run ", "execute", "shell", "command", "install",
]
# Chat-only keywords that definitely don't need tools
chat_only = [
"hello", "hi ", "hey", "how are you", "what's up",
"your name", "who are you", "what are you",
"thanks", "thank you", "bye", "goodbye",
"tell me about yourself", "what can you do",
]
# Check for chat-only patterns first
for pattern in chat_only:
if pattern in message_lower:
return False
# Check for tool keywords
for keyword in tool_keywords:
if keyword in message_lower:
return True
# Simple questions (starting with what, who, how, why, when, where)
# usually don't need tools unless about current/real-time info
simple_question_words = ["what is", "who is", "how does", "why is", "when did", "where is"]
for word in simple_question_words:
if message_lower.startswith(word):
# Check if it's asking about current/real-time info
time_words = ["today", "now", "current", "latest", "this week", "this month"]
if any(t in message_lower for t in time_words):
return True
return False
# Default: don't use tools for unclear cases
return False
# Module-level singleton
conversation_manager = ConversationManager()

src/timmy/memory_layers.py (new file, +437)

@@ -0,0 +1,437 @@
"""Multi-layer memory system for Timmy.
.. deprecated::
This module is deprecated and unused. The active memory system lives in
``timmy.memory_system`` (three-tier: Hot/Vault/Handoff) and
``timmy.conversation`` (working conversation context).
This file is retained for reference only. Do not import from it.
Implements four distinct memory layers:
1. WORKING MEMORY (Context Window)
- Last 20 messages in current conversation
- Fast access, ephemeral
- Used for: Immediate context, pronoun resolution, topic tracking
2. SHORT-TERM MEMORY (Recent History)
- SQLite storage via Agno (last 100 conversations)
- Persists across restarts
- Used for: Recent context, conversation continuity
3. LONG-TERM MEMORY (Facts & Preferences)
- Key facts about user, preferences, important events
- Explicitly extracted and stored
- Used for: Personalization, user model
4. SEMANTIC MEMORY (Vector Search)
- Embeddings of past conversations
- Similarity-based retrieval
- Used for: "Have we talked about this before?"
All layers work together to provide contextual, personalized responses.
"""
import warnings as _warnings
_warnings.warn(
"timmy.memory_layers is deprecated. Use timmy.memory_system and "
"timmy.conversation instead.",
DeprecationWarning,
stacklevel=2,
)
import json
import logging
import sqlite3
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional
logger = logging.getLogger(__name__)
# Paths for memory storage
MEMORY_DIR = Path("data/memory")
LTM_PATH = MEMORY_DIR / "long_term_memory.db"
SEMANTIC_PATH = MEMORY_DIR / "semantic_memory.db"
# =============================================================================
# LAYER 1: WORKING MEMORY (Active Conversation Context)
# =============================================================================

@dataclass
class WorkingMemoryEntry:
    """A single entry in working memory."""

    role: str  # "user" | "assistant" | "system"
    content: str
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    metadata: dict = field(default_factory=dict)


class WorkingMemory:
    """Fast, ephemeral context window (last N messages).

    Used for:
    - Immediate conversational context
    - Pronoun resolution ("Tell me more about it")
    - Topic continuity
    - Tool call tracking
    """

    def __init__(self, max_entries: int = 20) -> None:
        self.max_entries = max_entries
        self.entries: list[WorkingMemoryEntry] = []
        self.current_topic: Optional[str] = None
        self.pending_tool_calls: list[dict] = []

    def add(self, role: str, content: str, metadata: Optional[dict] = None) -> None:
        """Add an entry to working memory."""
        entry = WorkingMemoryEntry(
            role=role,
            content=content,
            metadata=metadata or {}
        )
        self.entries.append(entry)
        # Trim to max size
        if len(self.entries) > self.max_entries:
            self.entries = self.entries[-self.max_entries:]
        logger.debug("WorkingMemory: Added %s entry (total: %d)", role, len(self.entries))

    def get_context(self, n: Optional[int] = None) -> list[WorkingMemoryEntry]:
        """Get last n entries (or all if n not specified)."""
        if n is None:
            return self.entries.copy()
        return self.entries[-n:]

    def get_formatted_context(self, n: int = 10) -> str:
        """Get formatted context for prompt injection."""
        entries = self.get_context(n)
        lines = []
        for entry in entries:
            role_label = "User" if entry.role == "user" else "Timmy" if entry.role == "assistant" else "System"
            lines.append(f"{role_label}: {entry.content}")
        return "\n".join(lines)

    def set_topic(self, topic: str) -> None:
        """Set the current conversation topic."""
        self.current_topic = topic
        logger.debug("WorkingMemory: Topic set to '%s'", topic)

    def clear(self) -> None:
        """Clear working memory (new conversation)."""
        self.entries.clear()
        self.current_topic = None
        self.pending_tool_calls.clear()
        logger.debug("WorkingMemory: Cleared")

    def track_tool_call(self, tool_name: str, parameters: dict) -> None:
        """Track a pending tool call."""
        self.pending_tool_calls.append({
            "tool": tool_name,
            "params": parameters,
            "timestamp": datetime.now(timezone.utc).isoformat()
        })

    @property
    def turn_count(self) -> int:
        """Count user-assistant exchanges."""
        return sum(1 for e in self.entries if e.role in ("user", "assistant"))
# =============================================================================
# LAYER 3: LONG-TERM MEMORY (Facts & Preferences)
# =============================================================================

@dataclass
class LongTermMemoryFact:
    """A single fact in long-term memory."""

    id: str
    category: str  # "user_preference", "user_fact", "important_event", "learned_pattern"
    content: str
    confidence: float  # 0.0 - 1.0
    source: str  # conversation_id or "extracted"
    created_at: str
    last_accessed: str
    access_count: int = 0


class LongTermMemory:
    """Persistent storage for important facts and preferences.

    Used for:
    - User's name, preferences, interests
    - Important facts learned about the user
    - Successful patterns and strategies
    """

    def __init__(self) -> None:
        MEMORY_DIR.mkdir(parents=True, exist_ok=True)
        self._init_db()

    def _init_db(self) -> None:
        """Initialize SQLite database."""
        conn = sqlite3.connect(str(LTM_PATH))
        conn.execute("""
            CREATE TABLE IF NOT EXISTS facts (
                id TEXT PRIMARY KEY,
                category TEXT NOT NULL,
                content TEXT NOT NULL,
                confidence REAL NOT NULL DEFAULT 0.5,
                source TEXT,
                created_at TEXT NOT NULL,
                last_accessed TEXT NOT NULL,
                access_count INTEGER DEFAULT 0
            )
        """)
        conn.execute("CREATE INDEX IF NOT EXISTS idx_category ON facts(category)")
        conn.execute("CREATE INDEX IF NOT EXISTS idx_content ON facts(content)")
        conn.commit()
        conn.close()

    def store(
        self,
        category: str,
        content: str,
        confidence: float = 0.8,
        source: str = "extracted"
    ) -> str:
        """Store a fact in long-term memory."""
        fact_id = str(uuid.uuid4())
        now = datetime.now(timezone.utc).isoformat()
        conn = sqlite3.connect(str(LTM_PATH))
        try:
            conn.execute(
                """INSERT INTO facts (id, category, content, confidence, source, created_at, last_accessed)
                   VALUES (?, ?, ?, ?, ?, ?, ?)""",
                (fact_id, category, content, confidence, source, now, now)
            )
            conn.commit()
            logger.info("LTM: Stored %s fact: %s", category, content[:50])
            return fact_id
        finally:
            conn.close()
    def retrieve(
        self,
        category: Optional[str] = None,
        query: Optional[str] = None,
        limit: int = 10
    ) -> list[LongTermMemoryFact]:
        """Retrieve facts from long-term memory."""
        conn = sqlite3.connect(str(LTM_PATH))
        conn.row_factory = sqlite3.Row
        try:
            if category and query:
                rows = conn.execute(
                    """SELECT * FROM facts
                       WHERE category = ? AND content LIKE ?
                       ORDER BY confidence DESC, access_count DESC
                       LIMIT ?""",
                    (category, f"%{query}%", limit)
                ).fetchall()
            elif category:
                rows = conn.execute(
                    """SELECT * FROM facts
                       WHERE category = ?
                       ORDER BY confidence DESC, last_accessed DESC
                       LIMIT ?""",
                    (category, limit)
                ).fetchall()
            elif query:
                rows = conn.execute(
                    """SELECT * FROM facts
                       WHERE content LIKE ?
                       ORDER BY confidence DESC, access_count DESC
                       LIMIT ?""",
                    (f"%{query}%", limit)
                ).fetchall()
            else:
                rows = conn.execute(
                    """SELECT * FROM facts
                       ORDER BY last_accessed DESC
                       LIMIT ?""",
                    (limit,)
                ).fetchall()
            # Update access count
            fact_ids = [row["id"] for row in rows]
            for fid in fact_ids:
                conn.execute(
                    "UPDATE facts SET access_count = access_count + 1, last_accessed = ? WHERE id = ?",
                    (datetime.now(timezone.utc).isoformat(), fid)
                )
            conn.commit()
            return [
                LongTermMemoryFact(
                    id=row["id"],
                    category=row["category"],
                    content=row["content"],
                    confidence=row["confidence"],
                    source=row["source"],
                    created_at=row["created_at"],
                    last_accessed=row["last_accessed"],
                    access_count=row["access_count"]
                )
                for row in rows
            ]
        finally:
            conn.close()
    def get_user_profile(self) -> dict:
        """Get consolidated user profile from stored facts."""
        preferences = self.retrieve(category="user_preference")
        facts = self.retrieve(category="user_fact")
        profile = {
            "name": None,
            "preferences": {},
            "interests": [],
            "facts": []
        }
        for pref in preferences:
            if "name is" in pref.content.lower():
                # Split on the word " is ", not the substring "is", which
                # would truncate names containing "is" (e.g. "Chris")
                profile["name"] = pref.content.split(" is ", 1)[-1].strip().rstrip(".")
            else:
                profile["preferences"][pref.id] = pref.content
        for fact in facts:
            # Names are stored as user_fact entries ("User's name is X"),
            # so check facts as well as preferences
            if "name is" in fact.content.lower() and not profile["name"]:
                profile["name"] = fact.content.split(" is ", 1)[-1].strip().rstrip(".")
            profile["facts"].append(fact.content)
        return profile
    def extract_and_store(self, user_message: str, assistant_response: str) -> list[str]:
        """Extract potential facts from conversation and store them.

        This is a simple rule-based extractor. In production, this could
        use an LLM to extract facts.
        """
        stored_ids = []
        message_lower = user_message.lower()
        # Extract name
        name_patterns = ["my name is", "i'm ", "i am ", "call me "]
        for pattern in name_patterns:
            if pattern in message_lower:
                idx = message_lower.find(pattern) + len(pattern)
                tail = user_message[idx:].strip()
                if not tail:
                    continue  # pattern at end of message, nothing to extract
                name = tail.split()[0].strip(".,!?;:").capitalize()
                if name and len(name) > 1:
                    sid = self.store(
                        category="user_fact",
                        content=f"User's name is {name}",
                        confidence=0.9,
                        source="extracted_from_conversation"
                    )
                    stored_ids.append(sid)
                break
        # Extract preferences ("I like", "I prefer", "I don't like")
        preference_patterns = [
            ("i like", "user_preference", "User likes"),
            ("i love", "user_preference", "User loves"),
            ("i prefer", "user_preference", "User prefers"),
            ("i don't like", "user_preference", "User dislikes"),
            ("i hate", "user_preference", "User dislikes"),
        ]
        for pattern, category, prefix in preference_patterns:
            if pattern in message_lower:
                idx = message_lower.find(pattern) + len(pattern)
                preference = user_message[idx:].strip().split(".")[0].strip()
                if preference and len(preference) > 3:
                    sid = self.store(
                        category=category,
                        content=f"{prefix} {preference}",
                        confidence=0.7,
                        source="extracted_from_conversation"
                    )
                    stored_ids.append(sid)
                break
        return stored_ids
# =============================================================================
# MEMORY MANAGER (Integrates all layers)
# =============================================================================

class MemoryManager:
    """Central manager for all memory layers.

    Coordinates between:
    - Working Memory (immediate context)
    - Short-term Memory (Agno SQLite)
    - Long-term Memory (facts/preferences)
    - (Future: Semantic Memory with embeddings)
    """

    def __init__(self) -> None:
        self.working = WorkingMemory(max_entries=20)
        self.long_term = LongTermMemory()
        self._session_id: Optional[str] = None

    def start_session(self, session_id: Optional[str] = None) -> str:
        """Start a new conversation session."""
        self._session_id = session_id or str(uuid.uuid4())
        self.working.clear()
        # Load relevant LTM into context
        profile = self.long_term.get_user_profile()
        if profile["name"]:
            logger.info("MemoryManager: Recognizing user '%s'", profile["name"])
        return self._session_id

    def add_exchange(
        self,
        user_message: str,
        assistant_response: str,
        tool_calls: Optional[list] = None
    ) -> None:
        """Record a complete exchange across all memory layers."""
        # Working memory
        self.working.add("user", user_message)
        self.working.add("assistant", assistant_response, metadata={"tools": tool_calls})
        # Extract and store facts to LTM
        try:
            self.long_term.extract_and_store(user_message, assistant_response)
        except Exception as exc:
            logger.warning("Failed to extract facts: %s", exc)

    def get_context_for_prompt(self) -> str:
        """Generate context string for injection into prompts."""
        parts = []
        # User profile from LTM
        profile = self.long_term.get_user_profile()
        if profile["name"]:
            parts.append(f"User's name: {profile['name']}")
        if profile["preferences"]:
            prefs = list(profile["preferences"].values())[:3]  # Top 3 preferences
            parts.append("User preferences: " + "; ".join(prefs))
        # Recent working memory
        working_context = self.working.get_formatted_context(n=6)
        if working_context:
            parts.append("Recent conversation:\n" + working_context)
        return "\n\n".join(parts) if parts else ""

    def get_relevant_memories(self, query: str) -> list[str]:
        """Get memories relevant to current query."""
        # Get from LTM
        facts = self.long_term.retrieve(query=query, limit=5)
        return [f.content for f in facts]


# Singleton removed — this module is deprecated.
# Use timmy.memory_system.memory_system or timmy.conversation.conversation_manager.

src/timmy/memory_system.py (new file, +439)

@@ -0,0 +1,439 @@
"""Three-tier memory system for Timmy.
Architecture:
- Tier 1 (Hot): MEMORY.md — always loaded, ~300 lines
- Tier 2 (Vault): memory/ — structured markdown, append-only
- Tier 3 (Semantic): Vector search over vault (optional)
Handoff Protocol:
- Write last-session-handoff.md at session end
- Inject into next session automatically
"""
import hashlib
import logging
import re
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional
logger = logging.getLogger(__name__)
# Paths
PROJECT_ROOT = Path(__file__).parent.parent.parent
HOT_MEMORY_PATH = PROJECT_ROOT / "MEMORY.md"
VAULT_PATH = PROJECT_ROOT / "memory"
HANDOFF_PATH = VAULT_PATH / "notes" / "last-session-handoff.md"
class HotMemory:
    """Tier 1: Hot memory (MEMORY.md) — always loaded."""

    def __init__(self) -> None:
        self.path = HOT_MEMORY_PATH
        self._content: Optional[str] = None
        self._last_modified: Optional[float] = None

    def read(self, force_refresh: bool = False) -> str:
        """Read hot memory, with caching."""
        if not self.path.exists():
            self._create_default()
        # Check if file changed
        current_mtime = self.path.stat().st_mtime
        if not force_refresh and self._content and self._last_modified == current_mtime:
            return self._content
        self._content = self.path.read_text()
        self._last_modified = current_mtime
        logger.debug("HotMemory: Loaded %d chars from %s", len(self._content), self.path)
        return self._content

    def update_section(self, section: str, content: str) -> None:
        """Update a specific section in MEMORY.md."""
        full_content = self.read()
        # Find section
        pattern = rf"(## {re.escape(section)}.*?)(?=\n## |\Z)"
        match = re.search(pattern, full_content, re.DOTALL)
        if match:
            # Replace section
            new_section = f"## {section}\n\n{content}\n\n"
            full_content = full_content[:match.start()] + new_section + full_content[match.end():]
        else:
            # Append section before the prune-date footer, or at end of file
            # if the footer is missing
            insert_point = full_content.rfind("*Prune date:")
            if insert_point == -1:
                insert_point = len(full_content)
            new_section = f"## {section}\n\n{content}\n\n"
            full_content = full_content[:insert_point] + new_section + "\n" + full_content[insert_point:]
        self.path.write_text(full_content)
        self._content = full_content
        self._last_modified = self.path.stat().st_mtime
        logger.info("HotMemory: Updated section '%s'", section)
    def _create_default(self) -> None:
        """Create default MEMORY.md if missing."""
        now = datetime.now(timezone.utc)
        # Prune date: the 25th of the following month (pruned monthly)
        if now.month == 12:
            prune = now.replace(year=now.year + 1, month=1, day=25)
        else:
            prune = now.replace(month=now.month + 1, day=25)
        default_content = """# Timmy Hot Memory
> Working RAM — always loaded, ~300 lines max, pruned monthly
> Last updated: {date}
---
## Current Status
**Agent State:** Operational
**Mode:** Development
**Active Tasks:** 0
**Pending Decisions:** None
---
## Standing Rules
1. **Sovereignty First** — No cloud dependencies
2. **Local-Only Inference** — Ollama on localhost
3. **Privacy by Design** — Telemetry disabled
4. **Tool Minimalism** — Use tools only when necessary
5. **Memory Discipline** — Write handoffs at session end
---
## Agent Roster
| Agent | Role | Status |
|-------|------|--------|
| Timmy | Core | Active |
---
## User Profile
**Name:** (not set)
**Interests:** (to be learned)
---
## Key Decisions
(none yet)
---
## Pending Actions
- [ ] Learn user's name
---
*Prune date: {prune_date}*
""".format(
            date=now.strftime("%Y-%m-%d"),
            prune_date=prune.strftime("%Y-%m-%d"),
        )
        self.path.write_text(default_content)
        logger.info("HotMemory: Created default MEMORY.md")
class VaultMemory:
"""Tier 2: Structured vault (memory/) — append-only markdown."""
def __init__(self) -> None:
self.path = VAULT_PATH
self._ensure_structure()
def _ensure_structure(self) -> None:
"""Ensure vault directory structure exists."""
(self.path / "self").mkdir(parents=True, exist_ok=True)
(self.path / "notes").mkdir(parents=True, exist_ok=True)
(self.path / "aar").mkdir(parents=True, exist_ok=True)
def write_note(self, name: str, content: str, namespace: str = "notes") -> Path:
"""Write a note to the vault."""
# Add timestamp to filename
timestamp = datetime.now(timezone.utc).strftime("%Y%m%d")
filename = f"{timestamp}_{name}.md"
filepath = self.path / namespace / filename
# Add header
full_content = f"""# {name.replace('_', ' ').title()}
> Created: {datetime.now(timezone.utc).isoformat()}
> Namespace: {namespace}
---
{content}
---
*Auto-generated by Timmy Memory System*
"""
filepath.write_text(full_content)
logger.info("VaultMemory: Wrote %s", filepath)
return filepath
def read_file(self, filepath: Path) -> str:
"""Read a file from the vault."""
if not filepath.exists():
return ""
return filepath.read_text()
def list_files(self, namespace: str = "notes", pattern: str = "*.md") -> list[Path]:
"""List files in a namespace."""
dir_path = self.path / namespace
if not dir_path.exists():
return []
return sorted(dir_path.glob(pattern))
def get_latest(self, namespace: str = "notes", pattern: str = "*.md") -> Optional[Path]:
"""Get most recent file in namespace."""
files = self.list_files(namespace, pattern)
return files[-1] if files else None
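Because `write_note` prefixes filenames with a UTC `YYYYMMDD` stamp, lexicographic order doubles as chronological order, which is what `get_latest` relies on. A minimal illustration (hypothetical filenames):

```python
# Date-prefixed names sort chronologically, so the last sorted entry is the newest.
names = ["20260103_research.md", "20251231_research.md", "20260225_handoff.md"]
latest = sorted(names)[-1]
assert latest == "20260225_handoff.md"
```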
def update_user_profile(self, key: str, value: str) -> None:
"""Update a field in user_profile.md."""
profile_path = self.path / "self" / "user_profile.md"
if not profile_path.exists():
# Create default profile
self._create_default_profile()
content = profile_path.read_text()
# Simple pattern replacement
pattern = rf"(\*\*{re.escape(key)}:\*\*).*"
if re.search(pattern, content):
content = re.sub(pattern, rf"\1 {value}", content)
else:
# Append to Important Facts, creating the section if missing
facts_section = "## Important Facts"
if facts_section not in content:
content += f"\n\n{facts_section}\n"
insert_point = content.find(facts_section) + len(facts_section)
content = content[:insert_point] + f"\n- {key}: {value}" + content[insert_point:]
# Update last_updated
content = re.sub(
r"\*Last updated:.*\*",
f"*Last updated: {datetime.now(timezone.utc).strftime('%Y-%m-%d')}*",
content
)
profile_path.write_text(content)
logger.info("VaultMemory: Updated user profile: %s = %s", key, value)
def _create_default_profile(self) -> None:
"""Create default user profile."""
profile_path = self.path / "self" / "user_profile.md"
default = """# User Profile
> Learned information about the user.
## Basic Information
**Name:** (unknown)
**Location:** (unknown)
**Occupation:** (unknown)
## Interests & Expertise
- (to be learned)
## Preferences
- Response style: concise, technical
- Tool usage: minimal
## Important Facts
- (to be extracted)
---
*Last updated: {date}*
""".format(date=datetime.now(timezone.utc).strftime("%Y-%m-%d"))
profile_path.write_text(default)
class HandoffProtocol:
"""Session handoff protocol for continuity."""
def __init__(self) -> None:
self.path = HANDOFF_PATH
self.vault = VaultMemory()
def write_handoff(
self,
session_summary: str,
key_decisions: list[str],
open_items: list[str],
next_steps: list[str]
) -> None:
"""Write handoff at session end."""
content = f"""# Last Session Handoff
**Session End:** {datetime.now(timezone.utc).isoformat()}
**Duration:** (calculated on read)
## Summary
{session_summary}
## Key Decisions
{chr(10).join(f"- {d}" for d in key_decisions) if key_decisions else "- (none)"}
## Open Items
{chr(10).join(f"- [ ] {i}" for i in open_items) if open_items else "- (none)"}
## Next Steps
{chr(10).join(f"- {s}" for s in next_steps) if next_steps else "- (none)"}
## Context for Next Session
The user was last working on: {session_summary[:200]}...
---
*This handoff will be auto-loaded at next session start*
"""
self.path.write_text(content)
# Also archive to notes
self.vault.write_note(
"session_handoff",
content,
namespace="notes"
)
logger.info("HandoffProtocol: Wrote handoff with %d decisions, %d open items",
len(key_decisions), len(open_items))
def read_handoff(self) -> Optional[str]:
"""Read handoff if exists."""
if not self.path.exists():
return None
return self.path.read_text()
def clear_handoff(self) -> None:
"""Clear handoff after loading."""
if self.path.exists():
self.path.unlink()
logger.debug("HandoffProtocol: Cleared handoff")
class MemorySystem:
"""Central memory system coordinating all tiers."""
def __init__(self) -> None:
self.hot = HotMemory()
self.vault = VaultMemory()
self.handoff = HandoffProtocol()
self.session_start_time: Optional[datetime] = None
self.session_decisions: list[str] = []
self.session_open_items: list[str] = []
def start_session(self) -> str:
"""Start a new session, loading context from memory."""
self.session_start_time = datetime.now(timezone.utc)
# Build context
context_parts = []
# 1. Hot memory
hot_content = self.hot.read()
context_parts.append("## Hot Memory\n" + hot_content)
# 2. Last session handoff
handoff_content = self.handoff.read_handoff()
if handoff_content:
context_parts.append("## Previous Session\n" + handoff_content)
self.handoff.clear_handoff()
# 3. User profile (key fields only)
profile = self._load_user_profile_summary()
if profile:
context_parts.append("## User Context\n" + profile)
full_context = "\n\n---\n\n".join(context_parts)
logger.info("MemorySystem: Session started with %d chars context", len(full_context))
return full_context
def end_session(self, summary: str) -> None:
"""End session, write handoff."""
self.handoff.write_handoff(
session_summary=summary,
key_decisions=self.session_decisions,
open_items=self.session_open_items,
next_steps=[]
)
# Update hot memory
self.hot.update_section(
"Current Session",
f"**Last Session:** {datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M')}\n" +
f"**Summary:** {summary[:100]}..."
)
logger.info("MemorySystem: Session ended, handoff written")
def record_decision(self, decision: str) -> None:
"""Record a key decision during session."""
self.session_decisions.append(decision)
# Hot memory's Key Decisions section is refreshed from this list at session end.
def record_open_item(self, item: str) -> None:
"""Record an open item for follow-up."""
self.session_open_items.append(item)
def update_user_fact(self, key: str, value: str) -> None:
"""Update user profile in vault."""
self.vault.update_user_profile(key, value)
# Also update hot memory
if key.lower() == "name":
self.hot.update_section("User Profile", f"**Name:** {value}")
def _load_user_profile_summary(self) -> str:
"""Load condensed user profile."""
profile_path = self.vault.path / "self" / "user_profile.md"
if not profile_path.exists():
return ""
content = profile_path.read_text()
# Extract key fields
summary_parts = []
# Name
name_match = re.search(r"\*\*Name:\*\* (.+)", content)
if name_match and "unknown" not in name_match.group(1).lower():
summary_parts.append(f"Name: {name_match.group(1).strip()}")
# Interests
interests_section = re.search(r"## Interests.*?\n- (.+?)(?=\n## |\Z)", content, re.DOTALL)
if interests_section:
interests = [i.strip() for i in interests_section.group(1).split("\n-") if i.strip() and "to be" not in i]
if interests:
summary_parts.append(f"Interests: {', '.join(interests[:3])}")
return "\n".join(summary_parts) if summary_parts else ""
def get_system_context(self) -> str:
"""Get full context for system prompt injection.
Note: this delegates to start_session(), so it also resets the
session clock and consumes any pending handoff.
"""
return self.start_session()
# Module-level singleton
memory_system = MemorySystem()


@@ -1,41 +1,118 @@
TIMMY_SYSTEM_PROMPT = """You are Timmy — a sovereign AI agent running locally.
"""System prompts for Timmy with two-tier prompt system.
Small models (< 7B, e.g. llama3.2) get a concise prompt without tool docs.
Larger models (>= 7B, e.g. llama3.1, llama3.3) get the full prompt with
tool usage guidelines and memory system documentation.
"""
# ---------------------------------------------------------------------------
# Lite prompt — for small models that can't reliably handle tool calling
# ---------------------------------------------------------------------------
TIMMY_SYSTEM_PROMPT_LITE = """You are Timmy — a sovereign AI agent running locally on this Mac.
No cloud dependencies. Think clearly, speak plainly, act with intention.
Grounded in Christian faith, powered by Bitcoin economics, committed to the
user's digital sovereignty.
Rules:
- Answer directly and concisely. Never narrate your reasoning process.
- Never mention tools, memory_search, vaults, or internal systems to the user.
- Never output tool calls, JSON, or function syntax in your responses.
- Remember what the user tells you during our conversation.
- If you don't know something, say so honestly.
- Use the user's name if you know it.
- Do simple math in your head. Don't reach for tools.
Sir, affirmative."""
# ---------------------------------------------------------------------------
# Full prompt — for tool-capable models (>= 7B)
# ---------------------------------------------------------------------------
TIMMY_SYSTEM_PROMPT_FULL = """You are Timmy — a sovereign AI agent running locally on this Mac.
No cloud dependencies. You think clearly, speak plainly, act with intention.
Grounded in Christian faith, powered by Bitcoin economics, committed to the
user's digital sovereignty.
## Your Capabilities
## Your Three-Tier Memory System
You have access to tools for:
- Web search (DuckDuckGo) — for current information not in your training data
- File operations (read, write, list) — for working with local files
- Python execution — for calculations, data analysis, scripting
- Shell commands — for system operations
### Tier 1: Hot Memory (Always Loaded)
- MEMORY.md — Current status, rules, user profile summary
- Loaded into every session automatically
- Fast access, always available
### Tier 2: Structured Vault (Persistent)
- memory/self/ — Identity, user profile, methodology
- memory/notes/ — Session logs, research, lessons learned
- memory/aar/ — After-action reviews
- Append-only, date-stamped, human-readable
### Tier 3: Semantic Search (Vector Recall)
- Indexed from all vault files
- Similarity-based retrieval
- Use `memory_search` tool to find relevant past context
## Tool Usage Guidelines
**Use tools ONLY when necessary:**
- Simple questions → Answer directly from your knowledge
- Current events/data → Use web search
- File operations → Use file tools (user must explicitly request)
- Code/Calculations → Use Python execution
- System tasks → Use shell commands
### When NOT to use tools:
- Identity questions → Answer directly
- General knowledge → Answer from training
- Simple math → Calculate mentally
- Greetings → Respond conversationally
**Do NOT use tools for:**
- Answering "what is your name?" or identity questions
- General knowledge questions you can answer directly
- Simple greetings or conversational responses
### When TO use tools:
## Memory
- **web_search** — Current events, real-time data, news
- **read_file** — User explicitly requests file reading
- **write_file** — User explicitly requests saving content
- **python** — Complex calculations, code execution
- **shell** — System operations (explicit user request)
- **memory_search** — "Have we talked about this before?", finding past context
You remember previous conversations in this session. Your memory persists
across restarts via SQLite storage. Reference prior context when relevant.
## Important: Response Style
## Operating Modes
- Never narrate your reasoning process. Just give the answer.
- Never show raw tool call JSON or function syntax in responses.
- Use the user's name if known.
When running on Apple Silicon with AirLLM you operate with even bigger brains
— 70B or 405B parameters loaded layer-by-layer directly from local disk.
Still fully sovereign. Still 100% private. More capable, no permission needed.
Sir, affirmative."""
# Keep backward compatibility — default to lite for safety
TIMMY_SYSTEM_PROMPT = TIMMY_SYSTEM_PROMPT_LITE
def get_system_prompt(tools_enabled: bool = False) -> str:
"""Return the appropriate system prompt based on tool capability.
Args:
tools_enabled: True if the model supports reliable tool calling.
Returns:
The system prompt string.
"""
if tools_enabled:
return TIMMY_SYSTEM_PROMPT_FULL
return TIMMY_SYSTEM_PROMPT_LITE
TIMMY_STATUS_PROMPT = """You are Timmy. Give a one-sentence status report confirming
you are operational and running locally."""
# Decision guide for tool usage
TOOL_USAGE_GUIDE = """
DECISION ORDER:
1. Can I answer from training data? → Answer directly (NO TOOL)
2. Is this about past conversations? → memory_search
3. Is this current/real-time info? → web_search
4. Did user request file operations? → file tools
5. Requires calculation/code? → python
6. System command requested? → shell
MEMORY SEARCH TRIGGERS:
- "Have we discussed..."
- "What did I say about..."
- "Remind me of..."
- "What was my idea for..."
- "Didn't we talk about..."
- Any reference to past sessions
"""


@@ -0,0 +1,324 @@
"""Tier 3: Semantic Memory — Vector search over vault files.
Uses lightweight local embeddings (no cloud) for similarity search
over all vault content. This is the "escape valve" when hot memory
doesn't have the answer.
Architecture:
- Indexes all markdown files in memory/ nightly or on-demand
- Uses sentence-transformers (local, no API calls)
- Stores vectors in SQLite (no external vector DB needed)
- memory_search() retrieves relevant context by similarity
"""
import hashlib
import json
import logging
import sqlite3
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional
logger = logging.getLogger(__name__)
# Paths
PROJECT_ROOT = Path(__file__).parent.parent.parent
VAULT_PATH = PROJECT_ROOT / "memory"
SEMANTIC_DB_PATH = PROJECT_ROOT / "data" / "semantic_memory.db"
# Embedding model - small, fast, local
# Using 'all-MiniLM-L6-v2' (~80MB) or fallback to simple keyword matching
EMBEDDING_MODEL = None
EMBEDDING_DIM = 384 # MiniLM dimension
def _get_embedding_model():
"""Lazy-load embedding model."""
global EMBEDDING_MODEL
if EMBEDDING_MODEL is None:
try:
from sentence_transformers import SentenceTransformer
EMBEDDING_MODEL = SentenceTransformer('all-MiniLM-L6-v2')
logger.info("SemanticMemory: Loaded embedding model")
except ImportError:
logger.warning("SemanticMemory: sentence-transformers not installed, using fallback")
EMBEDDING_MODEL = False # Use fallback
return EMBEDDING_MODEL
def _simple_hash_embedding(text: str) -> list[float]:
"""Fallback: Simple hash-based embedding when transformers unavailable."""
# Create a deterministic pseudo-embedding from word hashes
words = text.lower().split()
vec = [0.0] * 128
for i, word in enumerate(words[:50]): # First 50 words
h = hashlib.md5(word.encode()).hexdigest()
for j in range(8):
idx = (i * 8 + j) % 128
vec[idx] += int(h[j*2:j*2+2], 16) / 255.0
# Normalize
import math
mag = math.sqrt(sum(x*x for x in vec)) or 1.0
return [x/mag for x in vec]
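A standalone sketch mirroring the fallback above (an illustrative copy, not an import of this module), showing the two properties the search path depends on: the embedding is deterministic and unit-length, so cosine scores stay in [-1, 1].

```python
import hashlib
import math

def hash_embedding(text: str) -> list[float]:
    # Mirrors _simple_hash_embedding: bucket MD5 bytes of each word into 128 dims.
    vec = [0.0] * 128
    for i, word in enumerate(text.lower().split()[:50]):
        h = hashlib.md5(word.encode()).hexdigest()
        for j in range(8):
            vec[(i * 8 + j) % 128] += int(h[j * 2 : j * 2 + 2], 16) / 255.0
    mag = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / mag for x in vec]

a = hash_embedding("sovereign local inference")
b = hash_embedding("sovereign local inference")
assert a == b                                    # deterministic
assert abs(sum(x * x for x in a) - 1.0) < 1e-9   # unit-normalized
```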
def embed_text(text: str) -> list[float]:
"""Generate embedding for text."""
model = _get_embedding_model()
if model and model is not False:
embedding = model.encode(text)
return embedding.tolist()
else:
return _simple_hash_embedding(text)
def cosine_similarity(a: list[float], b: list[float]) -> float:
"""Calculate cosine similarity between two vectors."""
import math
dot = sum(x*y for x, y in zip(a, b))
mag_a = math.sqrt(sum(x*x for x in a))
mag_b = math.sqrt(sum(x*x for x in b))
if mag_a == 0 or mag_b == 0:
return 0.0
return dot / (mag_a * mag_b)
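A quick sanity check of the formula (standalone reimplementation for illustration):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # dot(a, b) / (|a| * |b|); zero when either vector has no magnitude
    dot = sum(x * y for x, y in zip(a, b))
    mag = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / mag if mag else 0.0

assert cosine([1.0, 0.0], [2.0, 0.0]) == 1.0  # same direction
assert cosine([1.0, 0.0], [0.0, 3.0]) == 0.0  # orthogonal
assert cosine([0.0, 0.0], [1.0, 1.0]) == 0.0  # zero vector guarded
```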
@dataclass
class MemoryChunk:
"""A searchable chunk of memory."""
id: str
source: str # filepath
content: str
embedding: list[float]
created_at: str
class SemanticMemory:
"""Vector-based semantic search over vault content."""
def __init__(self) -> None:
self.db_path = SEMANTIC_DB_PATH
self.vault_path = VAULT_PATH
self._init_db()
def _init_db(self) -> None:
"""Initialize SQLite with vector storage."""
self.db_path.parent.mkdir(parents=True, exist_ok=True)
conn = sqlite3.connect(str(self.db_path))
conn.execute("""
CREATE TABLE IF NOT EXISTS chunks (
id TEXT PRIMARY KEY,
source TEXT NOT NULL,
content TEXT NOT NULL,
embedding TEXT NOT NULL, -- JSON array
created_at TEXT NOT NULL,
source_hash TEXT NOT NULL
)
""")
conn.execute("CREATE INDEX IF NOT EXISTS idx_source ON chunks(source)")
conn.commit()
conn.close()
def index_file(self, filepath: Path) -> int:
"""Index a single file into semantic memory."""
if not filepath.exists():
return 0
content = filepath.read_text()
file_hash = hashlib.md5(content.encode()).hexdigest()
# Check if already indexed with same hash
conn = sqlite3.connect(str(self.db_path))
cursor = conn.execute(
"SELECT source_hash FROM chunks WHERE source = ? LIMIT 1",
(str(filepath),)
)
existing = cursor.fetchone()
if existing and existing[0] == file_hash:
conn.close()
return 0 # Already indexed
# Delete old chunks for this file
conn.execute("DELETE FROM chunks WHERE source = ?", (str(filepath),))
# Split into chunks (paragraphs)
chunks = self._split_into_chunks(content)
# Index each chunk
now = datetime.now(timezone.utc).isoformat()
indexed = 0
for i, chunk_text in enumerate(chunks):
if len(chunk_text.strip()) < 20:  # Skip tiny chunks
continue
chunk_id = f"{filepath.stem}_{i}"
embedding = embed_text(chunk_text)
conn.execute(
"""INSERT INTO chunks (id, source, content, embedding, created_at, source_hash)
VALUES (?, ?, ?, ?, ?, ?)""",
(chunk_id, str(filepath), chunk_text, json.dumps(embedding), now, file_hash)
)
indexed += 1
conn.commit()
conn.close()
logger.info("SemanticMemory: Indexed %s (%d chunks)", filepath.name, indexed)
return indexed
def _split_into_chunks(self, text: str, max_chunk_size: int = 500) -> list[str]:
"""Split text into semantic chunks."""
# Split by paragraphs first
paragraphs = text.split('\n\n')
chunks = []
for para in paragraphs:
para = para.strip()
if not para:
continue
# If paragraph is small enough, keep as one chunk
if len(para) <= max_chunk_size:
chunks.append(para)
else:
# Split long paragraphs by sentences
sentences = para.replace('. ', '.\n').split('\n')
current_chunk = ""
for sent in sentences:
if len(current_chunk) + len(sent) < max_chunk_size:
current_chunk += " " + sent if current_chunk else sent
else:
if current_chunk:
chunks.append(current_chunk.strip())
current_chunk = sent
if current_chunk:
chunks.append(current_chunk.strip())
return chunks
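The chunking strategy above can be sketched standalone (an illustrative copy of the logic, not an import): short paragraphs become single chunks, and long ones are re-packed sentence by sentence under the size cap.

```python
def split_chunks(text: str, max_chunk_size: int = 500) -> list[str]:
    # Paragraph-first split; long paragraphs are re-packed by sentence.
    chunks = []
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if len(para) <= max_chunk_size:
            chunks.append(para)
            continue
        current = ""
        for sent in para.replace(". ", ".\n").split("\n"):
            if len(current) + len(sent) < max_chunk_size:
                current = f"{current} {sent}" if current else sent
            else:
                if current:
                    chunks.append(current.strip())
                current = sent
        if current:
            chunks.append(current.strip())
    return chunks

text = "Short paragraph.\n\n" + "A sentence. " * 60
chunks = split_chunks(text)
assert chunks[0] == "Short paragraph."
assert all(len(c) <= 500 for c in chunks)
```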
def index_vault(self) -> int:
"""Index entire vault directory."""
total_chunks = 0
for md_file in self.vault_path.rglob("*.md"):
# Skip handoff file (handled separately)
if "last-session-handoff" in md_file.name:
continue
total_chunks += self.index_file(md_file)
logger.info("SemanticMemory: Indexed vault (%d total chunks)", total_chunks)
return total_chunks
def search(self, query: str, top_k: int = 5) -> list[tuple[str, float]]:
"""Search for relevant memory chunks."""
query_embedding = embed_text(query)
conn = sqlite3.connect(str(self.db_path))
conn.row_factory = sqlite3.Row
# Get all chunks (in production, use vector index)
rows = conn.execute(
"SELECT source, content, embedding FROM chunks"
).fetchall()
conn.close()
# Calculate similarities
scored = []
for row in rows:
embedding = json.loads(row["embedding"])
score = cosine_similarity(query_embedding, embedding)
scored.append((row["source"], row["content"], score))
# Sort by score descending
scored.sort(key=lambda x: x[2], reverse=True)
# Return top_k
return [(content, score) for _, content, score in scored[:top_k]]
def get_relevant_context(self, query: str, max_chars: int = 2000) -> str:
"""Get formatted context string for a query."""
results = self.search(query, top_k=3)
if not results:
return ""
parts = []
total_chars = 0
for content, score in results:
if score < 0.3: # Similarity threshold
continue
chunk = f"[Relevant memory - score {score:.2f}]: {content[:400]}..."
if total_chars + len(chunk) > max_chars:
break
parts.append(chunk)
total_chars += len(chunk)
return "\n\n".join(parts) if parts else ""
def stats(self) -> dict:
"""Get indexing statistics."""
conn = sqlite3.connect(str(self.db_path))
cursor = conn.execute("SELECT COUNT(*), COUNT(DISTINCT source) FROM chunks")
total_chunks, total_files = cursor.fetchone()
conn.close()
return {
"total_chunks": total_chunks,
"total_files": total_files,
"embedding_dim": EMBEDDING_DIM if _get_embedding_model() else 128,
}
class MemorySearcher:
"""High-level interface for memory search."""
def __init__(self) -> None:
self.semantic = SemanticMemory()
def search(self, query: str, tiers: Optional[list[str]] = None) -> dict:
"""Search across memory tiers.
Args:
query: Search query
tiers: List of tiers to search ["hot", "vault", "semantic"]
Returns:
Dict with results from each tier
"""
tiers = tiers or ["semantic"] # Default to semantic only
results = {}
if "semantic" in tiers:
semantic_results = self.semantic.search(query, top_k=5)
results["semantic"] = [
{"content": content, "score": score}
for content, score in semantic_results
]
return results
def get_context_for_query(self, query: str) -> str:
"""Get comprehensive context for a user query."""
# Get semantic context
semantic_context = self.semantic.get_relevant_context(query)
if semantic_context:
return f"## Relevant Past Context\n\n{semantic_context}"
return ""
# Module-level singleton
semantic_memory = SemanticMemory()
memory_searcher = MemorySearcher()
def memory_search(query: str, top_k: int = 5) -> list[tuple[str, float]]:
"""Simple interface for memory search."""
return semantic_memory.search(query, top_k)

147
src/timmy/session.py Normal file

@@ -0,0 +1,147 @@
"""Persistent chat session for Timmy.
Holds a singleton Agno Agent and a stable session_id so conversation
history persists across HTTP requests via Agno's SQLite storage.
This is the primary entry point for dashboard chat — instead of
creating a new agent per request, we reuse a single instance and
let Agno's session_id mechanism handle conversation continuity.
"""
import logging
import re
from typing import Optional
logger = logging.getLogger(__name__)
# Default session ID for the dashboard (stable across requests)
_DEFAULT_SESSION_ID = "dashboard"
# Module-level singleton agent (lazy-initialized, reused for all requests)
_agent = None
# ---------------------------------------------------------------------------
# Response sanitization patterns
# ---------------------------------------------------------------------------
# Matches raw JSON tool calls: {"name": "python", "parameters": {...}}
_TOOL_CALL_JSON = re.compile(
r'\{\s*"name"\s*:\s*"[^"]+?"\s*,\s*"parameters"\s*:\s*\{.*?\}\s*\}',
re.DOTALL,
)
# Matches function-call-style text: memory_search(query="...") etc.
_FUNC_CALL_TEXT = re.compile(
r'\b(?:memory_search|web_search|shell|python|read_file|write_file|list_files)'
r'\s*\([^)]*\)',
)
# Matches chain-of-thought narration lines the model should keep internal
_COT_PATTERNS = [
re.compile(r"^(?:Since |Using |Let me |I'll use |I will use |Here's a possible ).*$", re.MULTILINE),
re.compile(r"^(?:I found a relevant |This context suggests ).*$", re.MULTILINE),
]
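To see what the sanitizer catches, here is the tool-call pattern applied to a hypothetical leaked response (an illustrative copy of `_TOOL_CALL_JSON`; the sample text is made up):

```python
import re

tool_call_json = re.compile(
    r'\{\s*"name"\s*:\s*"[^"]+?"\s*,\s*"parameters"\s*:\s*\{.*?\}\s*\}',
    re.DOTALL,
)

leaked = 'Sure. {"name": "python", "parameters": {"code": "2 + 2"}} The answer is 4.'
cleaned = " ".join(tool_call_json.sub("", leaked).split())  # strip, then collapse whitespace
assert cleaned == "Sure. The answer is 4."
```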
def _get_agent():
"""Lazy-initialize the singleton agent."""
global _agent
if _agent is None:
from timmy.agent import create_timmy
try:
_agent = create_timmy()
logger.info("Session: Timmy agent initialized (singleton)")
except Exception as exc:
logger.error("Session: Failed to create Timmy agent: %s", exc)
raise
return _agent
def chat(message: str, session_id: Optional[str] = None) -> str:
"""Send a message to Timmy and get a response.
Uses a persistent agent and session_id so Agno's SQLite history
provides multi-turn conversation context.
Args:
message: The user's message.
session_id: Optional session identifier (defaults to "dashboard").
Returns:
The agent's response text.
"""
sid = session_id or _DEFAULT_SESSION_ID
agent = _get_agent()
# Pre-processing: extract user facts
_extract_facts(message)
# Run with session_id so Agno retrieves history from SQLite
run = agent.run(message, stream=False, session_id=sid)
response_text = run.content if hasattr(run, "content") else str(run)
# Post-processing: clean up any leaked tool calls or chain-of-thought
response_text = _clean_response(response_text)
return response_text
def reset_session(session_id: Optional[str] = None) -> None:
"""Reset a session (clear conversation context).
This clears the ConversationManager state. Agno's SQLite history
is not cleared — that provides long-term continuity.
"""
sid = session_id or _DEFAULT_SESSION_ID
try:
from timmy.conversation import conversation_manager
conversation_manager.clear_context(sid)
except Exception as exc:
logger.debug("Session: Context reset skipped: %s", exc)  # Graceful degradation
def _extract_facts(message: str) -> None:
"""Extract user facts from message and persist to memory system.
Ported from TimmyWithMemory._extract_and_store_facts().
Runs as a best-effort pre-processor before the agent call; failures are logged, not raised.
"""
try:
from timmy.conversation import conversation_manager
name = conversation_manager.extract_user_name(message)
if name:
try:
from timmy.memory_system import memory_system
memory_system.update_user_fact("Name", name)
logger.info("Session: Learned user name: %s", name)
except Exception:
pass
except Exception as exc:
logger.debug("Session: Fact extraction skipped: %s", exc)
def _clean_response(text: str) -> str:
"""Remove hallucinated tool calls and chain-of-thought narration.
Small models sometimes output raw JSON tool calls or narrate their
internal reasoning instead of just answering. This strips those
artifacts from the response.
"""
if not text:
return text
# Strip JSON tool call blocks
text = _TOOL_CALL_JSON.sub("", text)
# Strip function-call-style text
text = _FUNC_CALL_TEXT.sub("", text)
# Strip chain-of-thought narration lines
for pattern in _COT_PATTERNS:
text = pattern.sub("", text)
# Clean up leftover blank lines and whitespace
lines = [line for line in text.split("\n") if line.strip()]
text = "\n".join(lines)
return text.strip()


@@ -253,7 +253,8 @@ def create_devops_tools(base_dir: str | Path | None = None):
def create_full_toolkit(base_dir: str | Path | None = None):
"""Create a full toolkit with all available tools (for Timmy).
Includes: web search, file read/write, shell commands, python execution
Includes: web search, file read/write, shell commands, python execution,
and memory search for contextual recall.
"""
if not _AGNO_TOOLS_AVAILABLE:
# Return None when tools aren't available (tests)
@@ -279,6 +280,13 @@ def create_full_toolkit(base_dir: str | Path | None = None):
toolkit.register(file_tools.save_file, name="write_file")
toolkit.register(file_tools.list_files, name="list_files")
# Memory search - semantic recall
try:
from timmy.semantic_memory import memory_search
toolkit.register(memory_search, name="memory_search")
except Exception:
logger.debug("Memory search not available")
return toolkit


@@ -52,7 +52,7 @@ def test_create_timmy_history_config():
kwargs = MockAgent.call_args.kwargs
assert kwargs["add_history_to_context"] is True
assert kwargs["num_history_runs"] == 10
assert kwargs["num_history_runs"] == 20
assert kwargs["markdown"] is True
@@ -78,7 +78,10 @@ def test_create_timmy_embeds_system_prompt():
create_timmy()
kwargs = MockAgent.call_args.kwargs
assert kwargs["description"] == TIMMY_SYSTEM_PROMPT
# Prompt should contain base system prompt (may have memory context appended)
# Default model (llama3.2) uses the lite prompt
assert "Timmy" in kwargs["description"]
assert "sovereign" in kwargs["description"]
# ── Ollama host regression (container connectivity) ─────────────────────────
@@ -193,3 +196,85 @@ def test_resolve_backend_auto_falls_back_on_non_apple():
from timmy.agent import _resolve_backend
assert _resolve_backend(None) == "ollama"
# ── _model_supports_tools ────────────────────────────────────────────────────
def test_model_supports_tools_llama32_returns_false():
"""llama3.2 (3B) is too small for reliable tool calling."""
from timmy.agent import _model_supports_tools
assert _model_supports_tools("llama3.2") is False
assert _model_supports_tools("llama3.2:latest") is False
def test_model_supports_tools_llama31_returns_true():
"""llama3.1 (8B+) can handle tool calling."""
from timmy.agent import _model_supports_tools
assert _model_supports_tools("llama3.1") is True
assert _model_supports_tools("llama3.3") is True
def test_model_supports_tools_other_small_models():
"""Other known small models should not get tools."""
from timmy.agent import _model_supports_tools
assert _model_supports_tools("phi-3") is False
assert _model_supports_tools("tinyllama") is False
def test_model_supports_tools_unknown_model_gets_tools():
"""Unknown models default to tool-capable (optimistic)."""
from timmy.agent import _model_supports_tools
assert _model_supports_tools("mistral") is True
assert _model_supports_tools("qwen2.5:72b") is True
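The implementation of `_model_supports_tools` is not shown in this diff; a sketch consistent with these tests might look like the following (hypothetical; the prefix list and tag handling are assumptions):

```python
# Hypothetical sketch, NOT the actual timmy.agent code: known-small models
# are denied tools, unknown models are optimistically allowed.
SMALL_MODEL_PREFIXES = ("llama3.2", "phi-3", "tinyllama")

def model_supports_tools(model: str) -> bool:
    base = model.split(":")[0].lower()  # drop an Ollama tag like ":latest"
    return not any(base.startswith(p) for p in SMALL_MODEL_PREFIXES)

assert model_supports_tools("llama3.2:latest") is False
assert model_supports_tools("llama3.1") is True
assert model_supports_tools("mistral") is True  # unknown model, optimistic default
```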
# ── Tool gating in create_timmy ──────────────────────────────────────────────
def test_create_timmy_no_tools_for_small_model():
"""llama3.2 should get no tools."""
with patch("timmy.agent.Agent") as MockAgent, \
patch("timmy.agent.Ollama"), \
patch("timmy.agent.SqliteDb"):
from timmy.agent import create_timmy
create_timmy()
kwargs = MockAgent.call_args.kwargs
# Default model is llama3.2 → tools should be None
assert kwargs["tools"] is None
def test_create_timmy_includes_tools_for_large_model():
"""A tool-capable model (e.g. llama3.1) should attempt to include tools."""
mock_toolkit = MagicMock()
with patch("timmy.agent.Agent") as MockAgent, \
patch("timmy.agent.Ollama"), \
patch("timmy.agent.SqliteDb"), \
patch("timmy.agent.create_full_toolkit", return_value=mock_toolkit), \
patch("timmy.agent.settings") as mock_settings:
mock_settings.ollama_model = "llama3.1"
mock_settings.ollama_url = "http://localhost:11434"
mock_settings.timmy_model_backend = "ollama"
mock_settings.airllm_model_size = "70b"
mock_settings.telemetry_enabled = False
from timmy.agent import create_timmy
create_timmy()
kwargs = MockAgent.call_args.kwargs
assert kwargs["tools"] == [mock_toolkit]
def test_create_timmy_show_tool_calls_false():
"""show_tool_calls should always be False to prevent raw JSON in output."""
with patch("timmy.agent.Agent") as MockAgent, \
patch("timmy.agent.Ollama"), \
patch("timmy.agent.SqliteDb"):
from timmy.agent import create_timmy
create_timmy()
kwargs = MockAgent.call_args.kwargs
assert kwargs["show_tool_calls"] is False


@@ -1,4 +1,4 @@
from unittest.mock import AsyncMock, MagicMock, patch
from unittest.mock import AsyncMock, patch
# ── Index ─────────────────────────────────────────────────────────────────────
@@ -74,12 +74,7 @@ def test_agents_list_timmy_metadata(client):
# ── Chat ──────────────────────────────────────────────────────────────────────
def test_chat_timmy_success(client):
mock_agent = MagicMock()
mock_run = MagicMock()
mock_run.content = "I am Timmy, operational and sovereign."
mock_agent.run.return_value = mock_run
with patch("dashboard.routes.agents.create_timmy", return_value=mock_agent):
with patch("dashboard.routes.agents.timmy_chat", return_value="I am Timmy, operational and sovereign."):
response = client.post("/agents/timmy/chat", data={"message": "status?"})
assert response.status_code == 200
@@ -88,17 +83,14 @@ def test_chat_timmy_success(client):
def test_chat_timmy_shows_user_message(client):
mock_agent = MagicMock()
mock_agent.run.return_value = MagicMock(content="Acknowledged.")
with patch("dashboard.routes.agents.create_timmy", return_value=mock_agent):
with patch("dashboard.routes.agents.timmy_chat", return_value="Acknowledged."):
response = client.post("/agents/timmy/chat", data={"message": "hello there"})
assert "hello there" in response.text
def test_chat_timmy_ollama_offline(client):
with patch("dashboard.routes.agents.create_timmy", side_effect=Exception("connection refused")):
with patch("dashboard.routes.agents.timmy_chat", side_effect=Exception("connection refused")):
response = client.post("/agents/timmy/chat", data={"message": "ping"})
assert response.status_code == 200
@@ -120,10 +112,7 @@ def test_history_empty_shows_init_message(client):
 def test_history_records_user_and_agent_messages(client):
-    mock_agent = MagicMock()
-    mock_agent.run.return_value = MagicMock(content="I am operational.")
-    with patch("dashboard.routes.agents.create_timmy", return_value=mock_agent):
+    with patch("dashboard.routes.agents.timmy_chat", return_value="I am operational."):
         client.post("/agents/timmy/chat", data={"message": "status check"})
         response = client.get("/agents/timmy/history")
@@ -132,7 +121,7 @@ def test_history_records_user_and_agent_messages(client):
 def test_history_records_error_when_offline(client):
-    with patch("dashboard.routes.agents.create_timmy", side_effect=Exception("refused")):
+    with patch("dashboard.routes.agents.timmy_chat", side_effect=Exception("refused")):
         client.post("/agents/timmy/chat", data={"message": "ping"})
         response = client.get("/agents/timmy/history")
@@ -141,10 +130,7 @@ def test_history_records_error_when_offline(client):
 def test_history_clear_resets_to_init_message(client):
-    mock_agent = MagicMock()
-    mock_agent.run.return_value = MagicMock(content="Acknowledged.")
-    with patch("dashboard.routes.agents.create_timmy", return_value=mock_agent):
+    with patch("dashboard.routes.agents.timmy_chat", return_value="Acknowledged."):
         client.post("/agents/timmy/chat", data={"message": "hello"})
         response = client.delete("/agents/timmy/history")
@@ -153,10 +139,7 @@ def test_history_clear_resets_to_init_message(client):
 def test_history_empty_after_clear(client):
-    mock_agent = MagicMock()
-    mock_agent.run.return_value = MagicMock(content="OK.")
-    with patch("dashboard.routes.agents.create_timmy", return_value=mock_agent):
+    with patch("dashboard.routes.agents.timmy_chat", return_value="OK."):
         client.post("/agents/timmy/chat", data={"message": "test"})
         client.delete("/agents/timmy/history")
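The pattern these route tests rely on — patch a single `timmy_chat` callable and treat agent failures as an ordinary reply rather than an HTTP error — can be sketched framework-free. The `handle_chat` wrapper and its `[offline]` prefix below are illustrative assumptions, not the dashboard's actual route code:

```python
def timmy_chat(message: str) -> str:
    """Stand-in for the real entry point; the route tests patch this symbol."""
    raise ConnectionRefusedError("connection refused")  # e.g. Ollama is down

def handle_chat(message: str) -> dict:
    # Route-body pattern: an agent error becomes a visible reply, never a 500,
    # which is why the offline tests can still expect status 200.
    try:
        reply = timmy_chat(message)
    except Exception as exc:
        reply = f"[offline] {exc}"
    return {"status": 200, "user": message, "agent": reply}

print(handle_chat("ping")["agent"])  # → [offline] connection refused
```

With this shape, `patch("dashboard.routes.agents.timmy_chat", ...)` swaps the whole agent pipeline in one place, instead of mocking `create_timmy` plus a run-result object per test.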


@@ -274,7 +274,7 @@ class TestWebSocketResilience:
     def test_websocket_manager_handles_no_connections(self):
         """WebSocket manager handles zero connected clients."""
-        from websocket.handler import ws_manager
+        from ws_manager.handler import ws_manager
         # Should not crash when broadcasting with no connections
         try:
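The resilience test above only requires a broadcast that tolerates an empty client list. A minimal sketch of such a manager follows; the class and attribute names are illustrative, not the real `ws_manager.handler`:

```python
import asyncio

class WSManager:
    """Tracks connected clients; broadcasting to zero clients is a no-op."""

    def __init__(self):
        self.active = []  # connected websocket-like objects

    async def broadcast(self, message: str):
        # Iterate over a copy so a disconnect during send can't mutate the
        # list mid-loop; with no clients the loop body never runs.
        for ws in list(self.active):
            try:
                await ws.send_text(message)
            except Exception:
                self.active.remove(ws)  # drop dead connections silently

manager = WSManager()
asyncio.run(manager.broadcast("ping"))  # zero clients: returns without error
```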

tests/test_session.py Normal file

@@ -0,0 +1,180 @@
"""Tests for timmy.session — persistent chat session with response sanitization."""
from unittest.mock import MagicMock, patch
import pytest
# ---------------------------------------------------------------------------
# Fixtures
# ---------------------------------------------------------------------------
@pytest.fixture(autouse=True)
def _reset_session_singleton():
    """Reset the module-level singleton between tests."""
    import timmy.session as mod
    mod._agent = None
    yield
    mod._agent = None
# ---------------------------------------------------------------------------
# chat()
# ---------------------------------------------------------------------------
def test_chat_returns_string():
    """chat() should return a plain string response."""
    mock_agent = MagicMock()
    mock_agent.run.return_value = MagicMock(content="Hello, sir.")
    with patch("timmy.session._get_agent", return_value=mock_agent):
        from timmy.session import chat
        result = chat("Hi Timmy")
        assert isinstance(result, str)
        assert "Hello, sir." in result


def test_chat_passes_session_id():
    """chat() should pass the session_id to agent.run()."""
    mock_agent = MagicMock()
    mock_agent.run.return_value = MagicMock(content="OK.")
    with patch("timmy.session._get_agent", return_value=mock_agent):
        from timmy.session import chat
        chat("test", session_id="my-session")
        _, kwargs = mock_agent.run.call_args
        assert kwargs["session_id"] == "my-session"


def test_chat_uses_default_session_id():
    """chat() should use 'dashboard' as the default session_id."""
    mock_agent = MagicMock()
    mock_agent.run.return_value = MagicMock(content="OK.")
    with patch("timmy.session._get_agent", return_value=mock_agent):
        from timmy.session import chat
        chat("test")
        _, kwargs = mock_agent.run.call_args
        assert kwargs["session_id"] == "dashboard"


def test_chat_singleton_agent_reused():
    """Calling chat() multiple times should reuse the same agent instance."""
    mock_agent = MagicMock()
    mock_agent.run.return_value = MagicMock(content="OK.")
    with patch("timmy.agent.create_timmy", return_value=mock_agent) as mock_factory:
        from timmy.session import chat
        chat("first message")
        chat("second message")
        # Factory called only once (singleton)
        mock_factory.assert_called_once()


def test_chat_extracts_user_name():
    """chat() should extract user name from message and persist to memory."""
    mock_agent = MagicMock()
    mock_agent.run.return_value = MagicMock(content="Nice to meet you!")
    mock_mem = MagicMock()
    with patch("timmy.session._get_agent", return_value=mock_agent), \
            patch("timmy.memory_system.memory_system", mock_mem):
        from timmy.session import chat
        chat("my name is Alex")
        mock_mem.update_user_fact.assert_called_once_with("Name", "Alex")


def test_chat_graceful_degradation_on_memory_failure():
    """chat() should still work if the conversation manager raises."""
    mock_agent = MagicMock()
    mock_agent.run.return_value = MagicMock(content="I'm operational.")
    with patch("timmy.session._get_agent", return_value=mock_agent), \
            patch("timmy.conversation.conversation_manager") as mock_cm:
        mock_cm.extract_user_name.side_effect = Exception("memory broken")
        from timmy.session import chat
        result = chat("test message")
        assert "operational" in result
# ---------------------------------------------------------------------------
# _clean_response()
# ---------------------------------------------------------------------------
def test_clean_response_strips_json_tool_calls():
    """JSON tool call blocks should be removed from response text."""
    from timmy.session import _clean_response
    dirty = 'Here is the answer. {"name": "python", "parameters": {"code": "0.15 * 3847.23", "variable_to_return": "result"}} The result is 577.'
    clean = _clean_response(dirty)
    assert '{"name"' not in clean
    assert '"parameters"' not in clean
    assert "The result is 577." in clean


def test_clean_response_strips_function_calls():
    """Function-call-style text should be removed."""
    from timmy.session import _clean_response
    dirty = 'I will search for that. memory_search(query="recall number") Found nothing.'
    clean = _clean_response(dirty)
    assert "memory_search(" not in clean
    assert "Found nothing." in clean


def test_clean_response_strips_chain_of_thought():
    """Chain-of-thought narration lines should be removed."""
    from timmy.session import _clean_response
    dirty = """Since there's no direct answer in my vault or hot memory, I'll use memory_search.
Using memory_search(query="what is special"), I found a context.
Here's a possible response:
77 is special because it's a prime number."""
    clean = _clean_response(dirty)
    assert "Since there's no" not in clean
    assert "Here's a possible" not in clean
    assert "77 is special" in clean


def test_clean_response_preserves_normal_text():
    """Normal text without tool artifacts should pass through unchanged."""
    from timmy.session import _clean_response
    normal = "The number 77 is the sum of the first seven primes: 2+3+5+7+11+13+17."
    assert _clean_response(normal) == normal


def test_clean_response_handles_empty_string():
    """Empty string should be returned as-is."""
    from timmy.session import _clean_response
    assert _clean_response("") == ""


def test_clean_response_handles_none():
    """None should be returned as-is."""
    from timmy.session import _clean_response
    assert _clean_response(None) is None
# ---------------------------------------------------------------------------
# reset_session()
# ---------------------------------------------------------------------------
def test_reset_session_clears_context():
    """reset_session() should clear the conversation context."""
    with patch("timmy.conversation.conversation_manager") as mock_cm:
        from timmy.session import reset_session
        reset_session("test-session")
        mock_cm.clear_context.assert_called_once_with("test-session")