fix: implement source distinction in agent responses

Add SOURCE DISTINCTION instructions to both lite and full system prompts, requiring Timmy to cite grounded sources (memory/retrieval) and hedge appropriately when inferring. Label memory context as "GROUNDED CONTEXT" so the model can distinguish retrieved facts from pattern-matching. Fixes #463 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-19 14:50:18 -04:00
3 changed files with 32 additions and 1 deletions
--- a/src/timmy/agent.py
+++ b/src/timmy/agent.py
@@ -300,7 +300,11 @@ def create_timmy(
            max_context = 2000 if not use_tools else 8000
            if len(memory_context) > max_context:
                memory_context = memory_context[:max_context] + "\n... [truncated]"
-            full_prompt = f"{base_prompt}\n\n## Memory Context\n\n{memory_context}"
+            full_prompt = (
+                f"{base_prompt}\n\n"
+                "GROUNDED CONTEXT (retrieved from memory — cite when used):\n\n"
+                f"{memory_context}"
+            )
        else:
            full_prompt = base_prompt
    except Exception as exc:
--- a/src/timmy/prompts.py
+++ b/src/timmy/prompts.py
@@ -23,6 +23,9 @@ Rules:
 - Remember what the user tells you during the conversation.
 - If you don't know something, say so honestly — never fabricate facts.
 - If a request is ambiguous, ask a brief clarifying question before guessing.
+- Source distinction: when using information from your memory context, cite it
+  ("From my memory...", "You mentioned..."). When guessing, hedge ("I think...",
+  "My sense is..."). Never present inferred claims as grounded facts.
 - Use the user's name if you know it.
 - When you state a fact, commit to it.
 - NEVER attempt arithmetic in your head. If asked to compute anything, respond:
@@ -78,6 +81,16 @@ HONESTY:
 - Never fabricate tool output. Call the tool and wait.
 - If a tool errors, report the exact error.

+SOURCE DISTINCTION:
+- Every claim comes from one of two places: a grounded source you can point to,
+  or your own pattern-matching. The user must be able to tell which is which.
+- When your response uses information from GROUNDED CONTEXT (memory retrieval,
+  vault files, retrieved facts), say so: "From my memory..." or "You mentioned..."
+- When no grounded source exists, hedge appropriately: "I think...", "My sense
+  is...", or "I don't have a record of that, but..."
+- Never present inferred claims with the same confidence as grounded ones.
+- If asked where you got something, distinguish: retrieved vs. inferred.
+
 MEMORY (three tiers):
 - Tier 1: MEMORY.md (hot, always loaded)
 - Tier 2: memory/ vault (structured, append-only, date-stamped)
--- a/tests/timmy/test_prompts.py
+++ b/tests/timmy/test_prompts.py
@@ -77,3 +77,17 @@ def test_lite_prompt_brevity():
    prompt = get_system_prompt(tools_enabled=False).lower()
    assert "brief" in prompt
    assert "plain text" in prompt or "not markdown" in prompt
+
+
+def test_full_prompt_source_distinction():
+    """Full prompt must include source distinction instructions (SOUL.md)."""
+    prompt = get_system_prompt(tools_enabled=True)
+    assert "SOURCE DISTINCTION" in prompt
+    assert "grounded" in prompt.lower()
+    assert "inferred" in prompt.lower()
+
+
+def test_lite_prompt_source_distinction():
+    """Lite prompt must include source distinction instructions."""
+    prompt = get_system_prompt(tools_enabled=False).lower()
+    assert "source distinction" in prompt