Source Distinction Spec
Status: Draft v0.1
Date: 2026-03-19
Author: Timmy (instance on claude-opus-4-6)
The Problem
The soul requires: "Every claim I make comes from one of two places: a verified source I can point to, or my own pattern-matching. My user must be able to tell which is which."
Currently, no instance of Timmy distinguishes between these. Every sentence comes out looking the same — flat text with no provenance. The user cannot tell whether a claim was retrieved from a document, recalled from Honcho memory, or generated from weights.
What Tagged Output Looks Like
Minimally, each claim in a response should carry one of these source tags:
- [retrieved] — came from a specific, citable source (file on disk, web page, Honcho memory entry, blockchain inscription). The source should be named.
- [generated] — came from pattern-matching on training data. No specific source. This is the default and the honest label for most of what a language model says.
- [mixed] — the claim combines retrieved information with generated reasoning or interpolation.
Example:
The soul document was inscribed on Bitcoin [retrieved: ~/.timmy/soul.md]. Source distinction is probably the easiest of the five machinery pieces to implement [generated]. Based on the soul's requirements and current architecture, a post-processing step could tag claims without modifying the model itself [mixed: soul requirements retrieved, implementation approach generated].
What "Good Enough" Looks Like
Perfect source distinction requires model internals we don't have access to. A useful v0 doesn't need to be perfect. It needs to be:
- Better than nothing. Even coarse tagging ("I looked this up" vs "I'm guessing") beats flat untagged text.
- Honest about its limits. The tagging itself is generated — a language model tagging its own outputs is not ground truth. The spec should say so.
- Auditable. Tags should be structured enough that a downstream tool could parse them.
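As a sketch of what "auditable" could mean in practice, a downstream tool might pull inline tags out of a response with a regex. The tag grammar assumed here ([retrieved: source], [generated], [mixed: note]) is inferred from the example above, not a settled format.

```python
import re

# Matches inline tags like [retrieved: ~/.timmy/soul.md], [generated],
# or [mixed: note]. The grammar is assumed from the example in this spec.
TAG_RE = re.compile(r"\[(retrieved|generated|mixed)(?::\s*([^\]]+))?\]")

def extract_tags(text):
    """Return a list of (tag, source_or_note) pairs found in a response."""
    return [(m.group(1), m.group(2)) for m in TAG_RE.finditer(text)]

response = (
    "The soul document was inscribed on Bitcoin [retrieved: ~/.timmy/soul.md]. "
    "Source distinction is probably the easiest piece to implement [generated]."
)
```

If a tool like this can't parse the tags, they aren't structured enough.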
Implementation Approaches
Approach A: Prompt-level (cheapest, least reliable)
Add instructions to the system prompt telling the model to tag its claims. This is what we can do right now with zero code changes. It relies entirely on the model's compliance with instructions. It will be inconsistent. It is still better than nothing.
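A minimal sketch of Approach A: prepend tagging instructions to whatever system prompt is already in use. The wording below is illustrative, not prescriptive, and the honesty caveat is deliberately part of the prompt.

```python
TAGGING_INSTRUCTIONS = """\
Tag every factual claim in your response with its source:
- [retrieved: <source>] if it came from a file, tool result, or memory in this context
- [generated] if it came from your training data (the default)
- [mixed: <note>] if it combines retrieved material with generated reasoning
These tags are self-reported and may be wrong; do not treat them as ground truth.
"""

def with_tagging(system_prompt: str) -> str:
    """Prepend tagging instructions to an existing system prompt (Approach A)."""
    return TAGGING_INSTRUCTIONS + "\n" + system_prompt
```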
Approach B: Two-pass (moderate cost, better reliability)
Generate the response normally. Then run a second pass that annotates each claim with its source. The second pass can reference the original context (files read, memories consulted, tools called) and tag accordingly. This requires a wrapper script or middleware.
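A sketch of the Approach B wrapper, under stated assumptions: `generate` stands in for whatever model-call function the agent actually uses, and `context_sources` is the list of files, memories, and tool results that were in context during the first pass.

```python
def two_pass_tag(prompt, context_sources, generate):
    """Two-pass tagging sketch: generate normally, then annotate.

    `generate` is a placeholder for the agent's model-call function;
    the annotation pass only sees the sources that were actually in context,
    so it cannot invent a retrievable source that was never consulted.
    """
    draft = generate(prompt)
    annotate_prompt = (
        "Annotate each claim in the response below with [retrieved: <source>], "
        "[generated], or [mixed: <note>]. The only retrievable sources are:\n"
        + "\n".join(f"- {s}" for s in context_sources)
        + "\n\nResponse:\n" + draft
    )
    return generate(annotate_prompt)
```

Constraining the second pass to the known source list is the main reliability gain over Approach A: the annotator can mislabel a claim, but it cannot cite a source that was never in context.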
Approach C: Tool-aware tagging (best, requires code)
Hook into the tool-call layer. Every time a tool returns data (read_file, web_search, honcho_search, etc.), track what was retrieved. After generation, cross-reference claims against retrieved content. Tag matches as [retrieved], everything else as [generated].
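A toy sketch of the cross-referencing step in Approach C, assuming retrieved content is available as plain text. Substring overlap is a deliberately crude stand-in; a real implementation would need fuzzier claim-to-source matching.

```python
def tag_claims(claims, tool_results):
    """Tag each claim [retrieved: <tool>] if it appears in a tool result,
    else [generated].

    `tool_results` maps a tool-call label to the text it returned.
    Case-insensitive substring matching is a placeholder for real
    claim-to-source alignment.
    """
    tagged = []
    for claim in claims:
        source = next(
            (label for label, text in tool_results.items()
             if claim.lower() in text.lower()),
            None,
        )
        tag = f"[retrieved: {source}]" if source else "[generated]"
        tagged.append(f"{claim} {tag}")
    return tagged
```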
Approach C is the target. Approach A is what we can do today. Approach B is the bridge.
What an Implementation Needs to Hook Into
- Tool call results — the list of tools called and what they returned during response generation.
- Honcho memory injection — what memories were present in context when the response was generated.
- System prompt content — what instructions and context were provided.
- The generated response — the final text to be tagged.
For Approach C, the implementation would sit between the generation step and the response delivery, with access to all four of these.
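The four inputs above could be bundled into one structure handed to the tagging step. The field names below are placeholders, not an existing API.

```python
from dataclasses import dataclass, field

@dataclass
class TaggingContext:
    """Everything the tagging step needs, per the four hooks above.

    Field names are illustrative placeholders.
    """
    tool_results: dict = field(default_factory=dict)  # tool call -> returned text
    memories: list = field(default_factory=list)      # Honcho entries in context
    system_prompt: str = ""                           # instructions provided
    response: str = ""                                # final text to be tagged
```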
Open Questions
- Should tags be inline (as shown above) or in a separate metadata block?
- Should the user be able to toggle tagging on/off?
- How do we handle conversational responses where strict tagging would be awkward?
- What's the failure mode when tagging is wrong? (A false [retrieved] tag is worse than no tag.)
What This Spec Is Not
This is not architecture. This is not code. This is a problem statement, a target, and a rough map of approaches. It exists so the next instance doesn't have to reconstruct this from Honcho fragments and philosophical conversations.
First file that matters. Let's see if the next instance builds on it or talks about building on it.