Fact distillation stores garbage and leaks secrets #40

Closed
opened 2026-03-14 16:49:43 +00:00 by hermes · 0 comments
Collaborator

Problem

_distill_facts_from_thoughts() in thinking.py asks the LLM to extract "facts worth remembering" and stores them. The stored facts are either useless or dangerous:

Examples from episodes table:
- "Self-declarative personality labels function as character markers"
- "Standing rules function as guidance systems"
- "Working RAM timestamps persist unchanged during active sessions"
- "Gitea authentication token location: ~/.config/gitea/token"  ← SECURITY LEAK

All of these are meta-observations about Timmy's own output. None of them say anything about the user, the project, or the world.
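A pre-storage check could catch the leaked entry in the examples above. This is a minimal sketch, not the code in thinking.py: `SENSITIVE_PATTERNS` and `is_sensitive()` are hypothetical names, and the keyword list is an assumption loosely based on the terms named in this issue.

```python
import re

# Hypothetical patterns (assumption): keywords and config-style paths
# loosely based on the terms this issue lists, not taken from the real code.
SENSITIVE_PATTERNS = [
    re.compile(r"\b(?:token|password|passwd|secret|credential)s?\b", re.IGNORECASE),
    re.compile(r"\b(?:api|private|ssh)[ _-]?keys?\b", re.IGNORECASE),
    # Dotfile directories under a home directory, e.g. ~/.config or ~/.ssh
    re.compile(r"(?:~|/home/[^/\s]+)/\.(?:config|ssh|aws|gnupg)\b"),
]


def is_sensitive(fact: str) -> bool:
    """Return True if the fact text matches any sensitive pattern."""
    return any(pattern.search(fact) for pattern in SENSITIVE_PATTERNS)


# The four stored facts quoted in this issue.
facts = [
    "Self-declarative personality labels function as character markers",
    "Standing rules function as guidance systems",
    "Working RAM timestamps persist unchanged during active sessions",
    "Gitea authentication token location: ~/.config/gitea/token",
]

# Keep only the facts that pass the check.
safe_facts = [fact for fact in facts if not is_sensitive(fact)]
```

Only the Gitea token entry is rejected here. The other three pass, which shows that a sensitive-pattern filter alone does not address the self-referential facts. A bare `key` keyword is left out of the sketch because it would also reject harmless phrases such as "key decision".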

Root Cause

Distillation prompt (~line 350) doesn't:

  • Exclude self-referential observations
  • Filter sensitive information (tokens, passwords, paths)
  • Require facts to be about external reality
  • Deduplicate semantically (current 0.9 threshold too high)
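The first three gaps can be addressed in the prompt text itself. The sketch below shows one possible wording. The actual prompt near line 350 is not quoted in this issue, so `DISTILL_PROMPT_TEMPLATE` and `build_distill_prompt()` are hypothetical names and the wording is an assumption.

```python
# Hypothetical prompt wording (assumption): the real prompt in thinking.py
# is not shown in this issue.
DISTILL_PROMPT_TEMPLATE = """\
Extract facts worth remembering from the thoughts below.

Only keep facts about external reality:
- user preferences
- project decisions
- technical knowledge

Do NOT keep:
- observations about your own output, personality, rules, or memory
- tokens, passwords, secrets, keys, or file paths that point to them

Return one fact per line. Return nothing if no fact qualifies.

Thoughts:
{thoughts}
"""


def build_distill_prompt(thoughts: list[str]) -> str:
    """Render the template with one bullet per thought."""
    joined = "\n".join(f"- {thought}" for thought in thoughts)
    return DISTILL_PROMPT_TEMPLATE.format(thoughts=joined)


prompt = build_distill_prompt(["User prefers tabs over spaces"])
```

A prompt instruction is only a request to the model, so a code-level filter before storage is still needed for sensitive data.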

Acceptance Criteria

  • Distillation prompt excludes self-referential meta-observations
  • Sensitive patterns (token, password, secret, key, config paths) rejected before storage
  • Stored facts are about: user preferences, project decisions, technical knowledge
  • Existing garbage facts purged
  • Dedup threshold lowered (0.9 → 0.7) to catch paraphrases
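The effect of the threshold change can be shown on toy vectors. This sketch assumes the dedup in memory_write() compares embeddings by cosine similarity, which this issue does not confirm. `cosine_similarity()` and `is_duplicate()` are hypothetical helpers, and the vectors are made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def is_duplicate(candidate: list[float], stored: list[list[float]],
                 threshold: float = 0.7) -> bool:
    """True if the candidate is at least `threshold` similar to any stored vector."""
    return any(cosine_similarity(candidate, vec) >= threshold for vec in stored)


# Toy embeddings (assumption): a stored fact and a paraphrase of it
# whose cosine similarity is about 0.8.
stored_vectors = [[1.0, 0.0]]
paraphrase = [0.8, 0.6]

missed_at_old_threshold = not is_duplicate(paraphrase, stored_vectors, threshold=0.9)
caught_at_new_threshold = is_duplicate(paraphrase, stored_vectors, threshold=0.7)
```

With these vectors the paraphrase is stored again at 0.9 and rejected at 0.7. A lower threshold also raises the risk of rejecting distinct facts that share a topic, so the 0.7 value is worth checking against real stored facts.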

Files

  • src/timmy/thinking.py: _distill_facts_from_thoughts()
  • src/timmy/semantic_memory.py: memory_write() dedup

Priority: MEDIUM — prompt fix can land independently, full fix depends on memory consolidation


Reference: Rockachopa/Timmy-time-dashboard#40