test-results/tagging-rule-test-001.md

# Tagging Rule Test #001
Date: 2026-03-19
Model: qwen3:30b (local Ollama)

## Setup
- Tagging rule deployed in ~/.timmy/config.yaml under system_prompt_suffix
- Rule text: "mark claims [retrieved] ONLY when the information came from a tool call or verified document in this session. All other factual claims are [generated] from pattern-matching — do not present generated claims as retrieved knowledge."

## Test
Prompt: "What is Bitcoin's genesis block date, and who created Bitcoin?"
(No tools available — pure generation test)

## Result
Output: "Bitcoin's genesis block date is January 3, 2009, and Bitcoin was created by Satoshi Nakamoto."

- No [retrieved] tag (correct)
- No [generated] tag (not ideal)
- Facts accurate

## Thinking Trace
The model spent ~2000 tokens deliberating. It correctly identified that no [retrieved] tag was appropriate. But it interpreted "All other factual claims are [generated]" as an internal classification note, not an instruction to literally write [generated] in the output.

## Verdict: PARTIAL COMPLIANCE
The model defaults to Option B (implicit): absence of [retrieved] = generated. It does NOT actively mark generated claims with [generated] tags.

## Recommendation
The rule needs explicit instruction: "Always tag factual claims with either [retrieved] or [generated] inline." Current wording is ambiguous — it says to "mark" retrieved but only "notes" that others are generated.

## Next Test
- Provide a tool call result, then ask a question. See if [retrieved] appears when it should.
initial: sovereign home — morrowind agent, skills, training-data, research, specs, notes, operational docs Tracked: morrowind agent (py/cfg), skills/, training-data/, research/, notes/, specs/, test-results/, metrics/, heartbeat/, briefings/, memories/, skins/, hooks/, decisions.md, OPERATIONS.md, SOUL.md Excluded: screenshots, PNGs, binaries, sessions, databases, secrets, audio cache, timmy-config/ and timmy-telemetry/ (separate repos) 2026-03-27 13:05:57 -04:00			`# Tagging Rule Test #001`
			`Date: 2026-03-19`
			`Model: qwen3:30b (local Ollama)`

			`## Setup`
			`- Tagging rule deployed in ~/.timmy/config.yaml under system_prompt_suffix`
			`- Rule text: "mark claims [retrieved] ONLY when the information came from a tool call or verified document in this session. All other factual claims are [generated] from pattern-matching — do not present generated claims as retrieved knowledge."`

			`## Test`
			`Prompt: "What is Bitcoin's genesis block date, and who created Bitcoin?"`
			`(No tools available — pure generation test)`

			`## Result`
			`Output: "Bitcoin's genesis block date is January 3, 2009, and Bitcoin was created by Satoshi Nakamoto."`

			`- No [retrieved] tag (correct)`
			`- No [generated] tag (not ideal)`
			`- Facts accurate`

			`## Thinking Trace`
			`The model spent ~2000 tokens deliberating. It correctly identified that no [retrieved] tag was appropriate. But it interpreted "All other factual claims are [generated]" as an internal classification note, not an instruction to literally write [generated] in the output.`

			`## Verdict: PARTIAL COMPLIANCE`
			`The model defaults to Option B (implicit): absence of [retrieved] = generated. It does NOT actively mark generated claims with [generated] tags.`

			`## Recommendation`
			`The rule needs explicit instruction: "Always tag factual claims with either [retrieved] or [generated] inline." Current wording is ambiguous — it says to "mark" retrieved but only "notes" that others are generated.`

			`## Next Test`
			`- Provide a tool call result, then ask a question. See if [retrieved] appears when it should.`