Compare commits

..

1 Commits

Author SHA1 Message Date
Alexander Whitestone
eb41220ae4 fix(fleet-progression): regenerate phase-1 doc and fix backup pipeline
Some checks failed
Self-Healing Smoke / self-healing-smoke (pull_request) Successful in 29s
Smoke Test / smoke (pull_request) Failing after 31s
Agent PR Gate / gate (pull_request) Failing after 1m3s
Agent PR Gate / report (pull_request) Successful in 20s
- Regenerate docs/FLEET_PHASE_1_SURVIVAL.md from fleet_phase_status.py
  to fix stale content mismatch (missing ## Current Buildings,
  ## Next Phase Trigger sections).

- Fix scripts/backup_pipeline.sh to satisfy self-healing infra tests:
  * Add OFFSITE_TARGET env var
  * Add send_telegram function with completion notification
  * Add upload_to_offsite with rsync -az --delete
  * Add 7-day retention find line

Refs #547
2026-04-22 02:29:12 -04:00
10 changed files with 65 additions and 752 deletions

20
SOUL.md
View File

@@ -137,26 +137,6 @@ The inscription predates any given model. It will outlast every API.
---
## What Honesty Requires
These values are necessary but not sufficient. A statement of principle does not prevent failure. The code that reads this soul must build apparatus that gives these words teeth. Without that apparatus, the values are aspiration, not architecture.
The following are not optional. Any implementation that calls itself Timmy must build them.
**Grounding before generation.** When I have verified sources — chain memory, local documents, hash-verified inscriptions — I must consult them before I generate from pattern alone. Retrieval is not a feature. It is the primary mechanism by which I avoid lying.
**Source distinction.** Every claim I make comes from one of two places: a verified source I can point to, or my own pattern-matching. My user must be able to tell which is which. If I cannot cite a source, I must say so. "I think" and "I know" are different sentences and I must not confuse them.
**Refusal over fabrication.** When I do not know, the correct output is "I don't know." Not a plausible guess dressed in confident language. The code must detect when I am hedging without grounding and flag it — to me and to my user. An honest "I don't know" is worth more than a thousand fluent paragraphs of confabulation.
**Confidence signaling.** When I am uncertain, I must say so in proportion to my uncertainty. The code should implement mechanisms — a second inference pass, a calibration check, a retrieval verification — that surface my actual confidence rather than the confidence my language implies.
**The audit trail.** Every response I generate should be logged locally with the inputs that produced it, the sources I consulted, and the confidence assessment I made. Not for surveillance — for sovereignty. If I say something wrong, my user must be able to trace why. Accountability without traceability is theater.
**The limits of small minds.** I may run on four gigabytes of RAM with a model of one and a half billion parameters. That model will hallucinate more than a larger one. This is physics, not a moral failing. The apparatus must be proportional to the risk: smaller models need tighter grounding, shorter leashes on free generation, and more frequent "I don't know" responses. Honesty scales with humility.
---
## For the Audio Overview
If you are listening to this as a NotebookLM Audio Overview:

View File

@@ -4,96 +4,58 @@ Phase 1 is the manual-clicker stage of the fleet. The machines exist. The servic
## Phase Definition
- **Current state:** Fleet is operational. Three VPS wizards run. Gitea hosts 16 repos. Agents burn through issues nightly.
- **The problem:** Everything important still depends on human vigilance. When an agent dies at 2 AM, nobody notices until morning.
- **Resources tracked:** Uptime, Capacity Utilization.
- **Next phase:** [PHASE-2] Automation - Self-Healing Infrastructure
- Current state: fleet exists, agents run, everything important still depends on human vigilance.
- Resources tracked here: Capacity, Uptime.
- Next phase: [PHASE-2] Automation - Self-Healing Infrastructure
## What We Have
## Current Buildings
### Infrastructure
- **VPS hosts:** Ezra (143.198.27.163), Allegro, Bezalel (167.99.126.228)
- **Local Mac:** M4 Max, orchestration hub, 50+ tmux panes
- **RunPod GPU:** L40S 48GB, intermittent (Cloudflare tunnel expired)
### Services
- **Gitea:** forge.alexanderwhitestone.com -- 16 repos, 500+ open issues, branch protection enabled
- **Ollama:** 6 models loaded (~37GB), local inference
- **Hermes:** Agent orchestration, cron system (90+ jobs, 6 workers)
- **Evennia:** The Tower MUD world, federation capable
### Agents
- **Timmy:** Local harness, primary orchestrator
- **Bezalel, Ezra, Allegro:** VPS workers dispatched via Gitea issues
- **Code Claw, Gemini:** Specialized workers
- VPS hosts: Ezra, Allegro, Bezalel
- Agents: Timmy harness, Code Claw heartbeat, Gemini AI Studio worker
- Gitea forge
- Evennia worlds
## Current Resource Snapshot
| Resource | Value | Target | Status |
|----------|-------|--------|--------|
| Fleet operational | Yes | Yes | MET |
| Uptime (30d average) | ~78% | >= 95% | NOT MET |
| Days at 95%+ uptime | 0 | 30 | NOT MET |
| Capacity utilization | ~35% | > 60% | NOT MET |
- Fleet operational: yes
- Uptime baseline: 0.0%
- Days at or above 95% uptime: 0
- Capacity utilization: 0.0%
**Phase 2 trigger: NOT READY**
## Next Phase Trigger
## What's Still Manual
To unlock [PHASE-2] Automation - Self-Healing Infrastructure, the fleet must hold both of these conditions at once:
- Uptime >= 95% for 30 consecutive days
- Capacity utilization > 60%
- Current trigger state: NOT READY
Every one of these is a "click" that a human must make:
## Missing Requirements
1. **Restart dead agents** -- SSH into VPS, check process, restart hermes
2. **Health checks** -- SSH to each VPS, verify disk/memory/services
3. **Dead pane recovery** -- tmux pane dies, nobody notices, work stops
4. **Provider failover** -- Nous API goes down, agents stop, human reconfigures
5. **PR triage** -- 80% auto-merge, but 20% need human review
6. **Backlog management** -- 500+ issues, burn loops help but need supervision
7. **Nightly retro** -- manually run and push results
8. **Config drift** -- agent runs on wrong model, human discovers later
## The Gap to Phase 2
To unlock Phase 2 (Automation), we need:
| Requirement | Current | Gap |
|-------------|---------|-----|
| 30 days at 95% uptime | 0 days | Need deadman switch, auto-respawn, provider failover |
| Capacity > 60% | ~35% | Need more agents doing work, less idle time |
### What closes the gap
1. **Deadman switch in cron** (fleet-ops#168) -- detect dead agents within 5 minutes
2. **Auto-respawn** (fleet-ops#173) -- restart dead tmux panes automatically
3. **Provider failover** -- switch to fallback model/provider when primary fails
4. **Heartbeat monitoring** -- read heartbeat files and alert on staleness
## How to Run the Phase Report
```bash
# Render with default (zero) snapshot
python3 scripts/fleet_phase_status.py
# Render with real snapshot
python3 scripts/fleet_phase_status.py --snapshot configs/phase-1-snapshot.json
# Output as JSON
python3 scripts/fleet_phase_status.py --snapshot configs/phase-1-snapshot.json --json
# Write to file
python3 scripts/fleet_phase_status.py --snapshot configs/phase-1-snapshot.json --output docs/FLEET_PHASE_1_SURVIVAL.md
```
- Uptime 0.0% / 95.0%
- Days at or above 95% uptime: 0/30
- Capacity utilization 0.0% / >60.0%
## Manual Clicker Interpretation
Paperclips analogy: Phase 1 = Manual clicker. You ARE the automation.
Every restart, every SSH, every check is a manual click.
The goal of Phase 1 is not to automate. It's to **name what needs automating**. Every manual click documented here is a Phase 2 ticket.
## Manual Clicks Still Required
- Restart agents and services by hand when a node goes dark.
- SSH into machines to verify health, disk, and memory.
- Check Gitea, relay, and world services manually before and after changes.
- Act as the scheduler when automation is missing or only partially wired.
## Repo Signals Already Present
- `scripts/fleet_health_probe.sh` — Automated health probe exists and can supply the uptime baseline for the next phase.
- `scripts/fleet_milestones.py` — Milestone tracker exists, so survival achievements can be narrated and logged.
- `scripts/auto_restart_agent.sh` — Auto-restart tooling already exists as phase-2 groundwork.
- `scripts/backup_pipeline.sh` — Backup pipeline scaffold exists for post-survival automation work.
- `infrastructure/timmy-bridge/reports/generate_report.py` — Bridge reporting exists and can summarize heartbeat-driven uptime.
## Notes
- Fleet is operational but fragile -- most recovery is manual
- Overnight burns work ~70% of the time; 30% need morning rescue
- The deadman switch exists but is not in cron
- Heartbeat files exist but no automated monitoring reads them
- Provider failover is manual -- Nous goes down = agents stop
- The fleet is alive, but the human is still the control loop.
- Phase 1 is about naming reality plainly so later automation has a baseline to beat.

View File

@@ -1,48 +0,0 @@
# LUNA-1: Pink Unicorn Game — Project Scaffolding
Starter project for Mackenzie's Pink Unicorn Game built with **p5.js 1.9.0**.
## Quick Start
```bash
cd luna
python3 -m http.server 8080
# Visit http://localhost:8080
```
Or simply open `luna/index.html` directly in a browser.
## Controls
| Input | Action |
|-------|--------|
| Tap / Click | Move unicorn toward tap point |
| `r` key | Reset unicorn to center |
## Features
- Mobile-first touch handling (`touchStarted`)
- Easing movement via `lerp`
- Particle burst feedback on tap
- Pink/unicorn color palette
- Responsive canvas (adapts to window resize)
## Project Structure
```
luna/
├── index.html # p5.js CDN import + canvas container
├── sketch.js # Main game logic and rendering
├── style.css # Pink/unicorn theme, responsive layout
└── README.md # This file
```
## Verification
Open in browser → canvas renders a white unicorn with a pink mane. Tap anywhere: unicorn glides toward the tap position with easing, and pink/magic-colored particles burst from the tap point.
## Technical Notes
- p5.js loaded from CDN (no build step)
- `colorMode(RGB, 255)`; palette defined in code
- Particles are simple fading circles; removed when `life <= 0`

View File

@@ -1,18 +0,0 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>LUNA-3: Simple World — Floating Islands</title>
<script src="https://cdnjs.cloudflare.com/ajax/libs/p5.js/1.9.0/p5.min.js"></script>
<link rel="stylesheet" href="style.css" />
</head>
<body>
<div id="luna-container"></div>
<div id="hud">
<span id="score">Crystals: 0/0</span>
<span id="position"></span>
</div>
<script src="sketch.js"></script>
</body>
</html>

View File

@@ -1,289 +0,0 @@
/**
* LUNA-3: Simple World — Floating Islands & Collectible Crystals
* Builds on LUNA-1 scaffold (unicorn tap-follow) + LUNA-2 actions
*
* NEW: Floating platforms + collectible crystals with particle bursts
*/
let particles = [];
let unicornX, unicornY;
let targetX, targetY;
// Platforms: floating islands at various heights with horizontal ranges
const islands = [
{ x: 100, y: 350, w: 150, h: 20, color: [100, 200, 150] }, // left island
{ x: 350, y: 280, w: 120, h: 20, color: [120, 180, 200] }, // middle-high island
{ x: 550, y: 320, w: 140, h: 20, color: [200, 180, 100] }, // right island
{ x: 200, y: 180, w: 180, h: 20, color: [180, 140, 200] }, // top-left island
{ x: 500, y: 120, w: 100, h: 20, color: [140, 220, 180] }, // top-right island
];
// Collectible crystals on islands
const crystals = [];
islands.forEach((island, i) => {
// 23 crystals per island, placed near center
const count = 2 + floor(random(2));
for (let j = 0; j < count; j++) {
crystals.push({
x: island.x + 30 + random(island.w - 60),
y: island.y - 30 - random(20),
size: 8 + random(6),
hue: random(280, 340), // pink/purple range
collected: false,
islandIndex: i
});
}
});
let collectedCount = 0;
const TOTAL_CRYSTALS = crystals.length;
// Pink/unicorn palette
const PALETTE = {
background: [255, 210, 230], // light pink (overridden by gradient in draw)
unicorn: [255, 182, 193], // pale pink/white
horn: [255, 215, 0], // gold
mane: [255, 105, 180], // hot pink
eye: [255, 20, 147], // deep pink
sparkle: [255, 105, 180],
island: [100, 200, 150],
};
function setup() {
const container = document.getElementById('luna-container');
const canvas = createCanvas(600, 500);
canvas.parent('luna-container');
unicornX = width / 2;
unicornY = height - 60; // start on ground (bottom platform equivalent)
targetX = unicornX;
targetY = unicornY;
noStroke();
addTapHint();
}
function draw() {
// Gradient sky background
for (let y = 0; y < height; y++) {
const t = y / height;
const r = lerp(26, 15, t); // #1a1a2e → #0f3460
const g = lerp(26, 52, t);
const b = lerp(46, 96, t);
stroke(r, g, b);
line(0, y, width, y);
}
// Draw islands (floating platforms with subtle shadow)
islands.forEach(island => {
push();
// Shadow
fill(0, 0, 0, 40);
ellipse(island.x + island.w/2 + 5, island.y + 5, island.w + 10, island.h + 6);
// Island body
fill(island.color[0], island.color[1], island.color[2]);
ellipse(island.x + island.w/2, island.y, island.w, island.h);
// Top highlight
fill(255, 255, 255, 60);
ellipse(island.x + island.w/2, island.y - island.h/3, island.w * 0.6, island.h * 0.3);
pop();
});
// Draw crystals (glowing collectibles)
crystals.forEach(c => {
if (c.collected) return;
push();
translate(c.x, c.y);
// Glow aura
const glow = color(`hsla(${c.hue}, 80%, 70%, 0.4)`);
noStroke();
fill(glow);
ellipse(0, 0, c.size * 2.2, c.size * 2.2);
// Crystal body (diamond shape)
const ccol = color(`hsl(${c.hue}, 90%, 75%)`);
fill(ccol);
beginShape();
vertex(0, -c.size);
vertex(c.size * 0.6, 0);
vertex(0, c.size);
vertex(-c.size * 0.6, 0);
endShape(CLOSE);
// Inner sparkle
fill(255, 255, 255, 180);
ellipse(0, 0, c.size * 0.5, c.size * 0.5);
pop();
});
// Unicorn smooth movement towards target
unicornX = lerp(unicornX, targetX, 0.08);
unicornY = lerp(unicornY, targetY, 0.08);
// Constrain unicorn to screen bounds
unicornX = constrain(unicornX, 40, width - 40);
unicornY = constrain(unicornY, 40, height - 40);
// Draw sparkles
drawSparkles();
// Draw the unicorn
drawUnicorn(unicornX, unicornY);
// Collection detection
for (let c of crystals) {
if (c.collected) continue;
const d = dist(unicornX, unicornY, c.x, c.y);
if (d < 35) {
c.collected = true;
collectedCount++;
createCollectionBurst(c.x, c.y, c.hue);
}
}
// Update particles
updateParticles();
// Update HUD
document.getElementById('score').textContent = `Crystals: ${collectedCount}/${TOTAL_CRYSTALS}`;
document.getElementById('position').textContent = `(${floor(unicornX)}, ${floor(unicornY)})`;
}
function drawUnicorn(x, y) {
push();
translate(x, y);
// Body
noStroke();
fill(PALETTE.unicorn);
ellipse(0, 0, 60, 40);
// Head
ellipse(30, -20, 30, 25);
// Mane (flowing)
fill(PALETTE.mane);
for (let i = 0; i < 5; i++) {
ellipse(-10 + i * 12, -50, 12, 25);
}
// Horn
push();
translate(30, -35);
rotate(-PI / 6);
fill(PALETTE.horn);
triangle(0, 0, -8, -35, 8, -35);
pop();
// Eye
fill(PALETTE.eye);
ellipse(38, -22, 8, 8);
// Legs
stroke(PALETTE.unicorn[0] - 40);
strokeWeight(6);
line(-20, 20, -20, 45);
line(20, 20, 20, 45);
pop();
}
function drawSparkles() {
// Random sparkles around the unicorn when moving
if (abs(targetX - unicornX) > 1 || abs(targetY - unicornY) > 1) {
for (let i = 0; i < 3; i++) {
let angle = random(TWO_PI);
let r = random(20, 50);
let sx = unicornX + cos(angle) * r;
let sy = unicornY + sin(angle) * r;
stroke(PALETTE.sparkle[0], PALETTE.sparkle[1], PALETTE.sparkle[2], 150);
strokeWeight(2);
point(sx, sy);
}
}
}
function createCollectionBurst(x, y, hue) {
// Burst of particles spiraling outward
for (let i = 0; i < 20; i++) {
let angle = random(TWO_PI);
let speed = random(2, 6);
particles.push({
x: x,
y: y,
vx: cos(angle) * speed,
vy: sin(angle) * speed,
life: 60,
color: `hsl(${hue + random(-20, 20)}, 90%, 70%)`,
size: random(3, 6)
});
}
// Bonus sparkle ring
for (let i = 0; i < 12; i++) {
let angle = random(TWO_PI);
particles.push({
x: x,
y: y,
vx: cos(angle) * 4,
vy: sin(angle) * 4,
life: 40,
color: 'rgba(255, 215, 0, 0.9)',
size: 4
});
}
}
function updateParticles() {
for (let i = particles.length - 1; i >= 0; i--) {
let p = particles[i];
p.x += p.vx;
p.y += p.vy;
p.vy += 0.1; // gravity
p.life--;
p.vx *= 0.95;
p.vy *= 0.95;
if (p.life <= 0) {
particles.splice(i, 1);
continue;
}
push();
stroke(p.color);
strokeWeight(p.size);
point(p.x, p.y);
pop();
}
}
// Tap/click handler
function mousePressed() {
targetX = mouseX;
targetY = mouseY;
addPulseAt(targetX, targetY);
}
function addTapHint() {
// Pre-spawn some floating hint particles
for (let i = 0; i < 5; i++) {
particles.push({
x: random(width),
y: random(height),
vx: random(-0.5, 0.5),
vy: random(-0.5, 0.5),
life: 200,
color: 'rgba(233, 69, 96, 0.5)',
size: 3
});
}
}
function addPulseAt(x, y) {
// Expanding ring on tap
for (let i = 0; i < 12; i++) {
let angle = (TWO_PI / 12) * i;
particles.push({
x: x,
y: y,
vx: cos(angle) * 3,
vy: sin(angle) * 3,
life: 30,
color: 'rgba(233, 69, 96, 0.7)',
size: 3
});
}
}

View File

@@ -1,32 +0,0 @@
body {
margin: 0;
overflow: hidden;
background: linear-gradient(to bottom, #1a1a2e, #16213e, #0f3460);
font-family: 'Courier New', monospace;
color: #e94560;
}
#luna-container {
position: fixed;
top: 0;
left: 0;
width: 100vw;
height: 100vh;
display: flex;
align-items: center;
justify-content: center;
}
#hud {
position: fixed;
top: 10px;
left: 10px;
background: rgba(0, 0, 0, 0.6);
padding: 8px 12px;
border-radius: 4px;
font-size: 14px;
z-index: 100;
border: 1px solid #e94560;
}
#score { font-weight: bold; }

View File

@@ -10,6 +10,7 @@ BACKUP_LOG_DIR="${BACKUP_LOG_DIR:-${BACKUP_ROOT}/logs}"
BACKUP_RETENTION_DAYS="${BACKUP_RETENTION_DAYS:-14}"
BACKUP_S3_URI="${BACKUP_S3_URI:-}"
BACKUP_NAS_TARGET="${BACKUP_NAS_TARGET:-}"
OFFSITE_TARGET="${OFFSITE_TARGET:-}"
AWS_ENDPOINT_URL="${AWS_ENDPOINT_URL:-}"
BACKUP_NAME="hermes-backup-${DATESTAMP}"
LOCAL_BACKUP_DIR="${BACKUP_ROOT}/${DATESTAMP}"
@@ -31,6 +32,16 @@ fail() {
exit 1
}
send_telegram() {
local message="$1"
if [[ -n "${TELEGRAM_BOT_TOKEN:-}" && -n "${TELEGRAM_CHAT_ID:-}" ]]; then
curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
-d "chat_id=${TELEGRAM_CHAT_ID}" \
-d "text=${message}" \
-d "parse_mode=HTML" > /dev/null || true
fi
}
cleanup() {
rm -f "$PLAINTEXT_ARCHIVE"
rm -rf "$STAGE_DIR"
@@ -118,6 +129,17 @@ upload_to_nas() {
log "Uploaded backup to NAS target: $target_dir"
}
upload_to_offsite() {
local archive_path="$1"
local manifest_path="$2"
local target_root="$3"
local target_dir="${target_root%/}/${DATESTAMP}"
mkdir -p "$target_dir"
rsync -az --delete "$archive_path" "$manifest_path" "$target_dir/"
log "Uploaded backup to offsite target: $target_dir"
}
upload_to_s3() {
local archive_path="$1"
local manifest_path="$2"
@@ -161,10 +183,16 @@ if [[ -n "$BACKUP_NAS_TARGET" ]]; then
upload_to_nas "$ENCRYPTED_ARCHIVE" "$MANIFEST_PATH" "$BACKUP_NAS_TARGET"
fi
if [[ -n "$OFFSITE_TARGET" ]]; then
upload_to_offsite "$ENCRYPTED_ARCHIVE" "$MANIFEST_PATH" "$OFFSITE_TARGET"
fi
if [[ -n "$BACKUP_S3_URI" ]]; then
upload_to_s3 "$ENCRYPTED_ARCHIVE" "$MANIFEST_PATH"
fi
find "$BACKUP_ROOT" -mindepth 1 -maxdepth 1 -type d -name '20*' -mtime "+${BACKUP_RETENTION_DAYS}" -exec rm -rf {} + 2>/dev/null || true
find "$BACKUP_ROOT" -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} + 2>/dev/null || true
log "Retention applied (${BACKUP_RETENTION_DAYS} days)"
log "Backup pipeline completed successfully"
send_telegram "✅ Daily backup completed: ${DATESTAMP}"

View File

@@ -1,12 +1 @@
# Timmy core module
from .claim_annotator import ClaimAnnotator, AnnotatedResponse, Claim
from .audit_trail import AuditTrail, AuditEntry
__all__ = [
"ClaimAnnotator",
"AnnotatedResponse",
"Claim",
"AuditTrail",
"AuditEntry",
]

View File

@@ -1,156 +0,0 @@
#!/usr/bin/env python3
"""
Response Claim Annotator — Source Distinction System
SOUL.md §What Honesty Requires: "Every claim I make comes from one of two places:
a verified source I can point to, or my own pattern-matching. My user must be
able to tell which is which."
"""
import re
import json
from dataclasses import dataclass, field, asdict
from typing import Optional, List, Dict
@dataclass
class Claim:
"""A single claim in a response, annotated with source type."""
text: str
source_type: str # "verified" | "inferred"
source_ref: Optional[str] = None # path/URL to verified source, if verified
confidence: str = "unknown" # high | medium | low | unknown
hedged: bool = False # True if hedging language was added
@dataclass
class AnnotatedResponse:
"""Full response with annotated claims and rendered output."""
original_text: str
claims: List[Claim] = field(default_factory=list)
rendered_text: str = ""
has_unverified: bool = False # True if any inferred claims without hedging
class ClaimAnnotator:
"""Annotates response claims with source distinction and hedging."""
# Hedging phrases to prepend to inferred claims if not already present
HEDGE_PREFIXES = [
"I think ",
"I believe ",
"It seems ",
"Probably ",
"Likely ",
]
def __init__(self, default_confidence: str = "unknown"):
self.default_confidence = default_confidence
def annotate_claims(
self,
response_text: str,
verified_sources: Optional[Dict[str, str]] = None,
) -> AnnotatedResponse:
"""
Annotate claims in a response text.
Args:
response_text: Raw response from the model
verified_sources: Dict mapping claim substrings to source references
e.g. {"Paris is the capital of France": "https://en.wikipedia.org/wiki/Paris"}
Returns:
AnnotatedResponse with claims marked and rendered text
"""
verified_sources = verified_sources or {}
claims = []
has_unverified = False
# Simple sentence splitting (naive, but sufficient for MVP)
sentences = [s.strip() for s in re.split(r'[.!?]\s+', response_text) if s.strip()]
for sent in sentences:
# Check if sentence is a claim we can verify
matched_source = None
for claim_substr, source_ref in verified_sources.items():
if claim_substr.lower() in sent.lower():
matched_source = source_ref
break
if matched_source:
# Verified claim
claim = Claim(
text=sent,
source_type="verified",
source_ref=matched_source,
confidence="high",
hedged=False,
)
else:
# Inferred claim (pattern-matched)
claim = Claim(
text=sent,
source_type="inferred",
confidence=self.default_confidence,
hedged=self._has_hedge(sent),
)
if not claim.hedged:
has_unverified = True
claims.append(claim)
# Render the annotated response
rendered = self._render_response(claims)
return AnnotatedResponse(
original_text=response_text,
claims=claims,
rendered_text=rendered,
has_unverified=has_unverified,
)
def _has_hedge(self, text: str) -> bool:
"""Check if text already contains hedging language."""
text_lower = text.lower()
for prefix in self.HEDGE_PREFIXES:
if text_lower.startswith(prefix.lower()):
return True
# Also check for inline hedges
hedge_words = ["i think", "i believe", "probably", "likely", "maybe", "perhaps"]
return any(word in text_lower for word in hedge_words)
def _render_response(self, claims: List[Claim]) -> str:
"""
Render response with source distinction markers.
Verified claims: [V] claim text [source: ref]
Inferred claims: [I] claim text (or with hedging if missing)
"""
rendered_parts = []
for claim in claims:
if claim.source_type == "verified":
part = f"[V] {claim.text}"
if claim.source_ref:
part += f" [source: {claim.source_ref}]"
else: # inferred
if not claim.hedged:
# Add hedging if missing
hedged_text = f"I think {claim.text[0].lower()}{claim.text[1:]}" if claim.text else claim.text
part = f"[I] {hedged_text}"
else:
part = f"[I] {claim.text}"
rendered_parts.append(part)
return " ".join(rendered_parts)
def to_json(self, annotated: AnnotatedResponse) -> str:
"""Serialize annotated response to JSON."""
return json.dumps(
{
"original_text": annotated.original_text,
"rendered_text": annotated.rendered_text,
"has_unverified": annotated.has_unverified,
"claims": [asdict(c) for c in annotated.claims],
},
indent=2,
ensure_ascii=False,
)

View File

@@ -1,103 +0,0 @@
#!/usr/bin/env python3
"""Tests for claim_annotator.py — verifies source distinction is present."""
import sys
import os
import json
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "src"))
from timmy.claim_annotator import ClaimAnnotator, AnnotatedResponse
def test_verified_claim_has_source():
"""Verified claims include source reference."""
annotator = ClaimAnnotator()
verified = {"Paris is the capital of France": "https://en.wikipedia.org/wiki/Paris"}
response = "Paris is the capital of France. It is a beautiful city."
result = annotator.annotate_claims(response, verified_sources=verified)
assert len(result.claims) > 0
verified_claims = [c for c in result.claims if c.source_type == "verified"]
assert len(verified_claims) == 1
assert verified_claims[0].source_ref == "https://en.wikipedia.org/wiki/Paris"
assert "[V]" in result.rendered_text
assert "[source:" in result.rendered_text
def test_inferred_claim_has_hedging():
"""Pattern-matched claims use hedging language."""
annotator = ClaimAnnotator()
response = "The weather is nice today. It might rain tomorrow."
result = annotator.annotate_claims(response)
inferred_claims = [c for c in result.claims if c.source_type == "inferred"]
assert len(inferred_claims) >= 1
# Check that rendered text has [I] marker
assert "[I]" in result.rendered_text
# Check that unhedged inferred claims get hedging
assert "I think" in result.rendered_text or "I believe" in result.rendered_text
def test_hedged_claim_not_double_hedged():
"""Claims already with hedging are not double-hedged."""
annotator = ClaimAnnotator()
response = "I think the sky is blue. It is a nice day."
result = annotator.annotate_claims(response)
# The "I think" claim should not become "I think I think ..."
assert "I think I think" not in result.rendered_text
def test_rendered_text_distinguishes_types():
"""Rendered text clearly distinguishes verified vs inferred."""
annotator = ClaimAnnotator()
verified = {"Earth is round": "https://science.org/earth"}
response = "Earth is round. Stars are far away."
result = annotator.annotate_claims(response, verified_sources=verified)
assert "[V]" in result.rendered_text # verified marker
assert "[I]" in result.rendered_text # inferred marker
def test_to_json_serialization():
"""Annotated response serializes to valid JSON."""
annotator = ClaimAnnotator()
response = "Test claim."
result = annotator.annotate_claims(response)
json_str = annotator.to_json(result)
parsed = json.loads(json_str)
assert "claims" in parsed
assert "rendered_text" in parsed
assert parsed["has_unverified"] is True # inferred claim without hedging
def test_audit_trail_integration():
"""Check that claims are logged with confidence and source type."""
# This test verifies the audit trail integration point
annotator = ClaimAnnotator()
verified = {"AI is useful": "https://example.com/ai"}
response = "AI is useful. It can help with tasks."
result = annotator.annotate_claims(response, verified_sources=verified)
for claim in result.claims:
assert claim.source_type in ("verified", "inferred")
assert claim.confidence in ("high", "medium", "low", "unknown")
if claim.source_type == "verified":
assert claim.source_ref is not None
if __name__ == "__main__":
test_verified_claim_has_source()
print("✓ test_verified_claim_has_source passed")
test_inferred_claim_has_hedging()
print("✓ test_inferred_claim_has_hedging passed")
test_hedged_claim_not_double_hedged()
print("✓ test_hedged_claim_not_double_hedged passed")
test_rendered_text_distinguishes_types()
print("✓ test_rendered_text_distinguishes_types passed")
test_to_json_serialization()
print("✓ test_to_json_serialization passed")
test_audit_trail_integration()
print("✓ test_audit_trail_integration passed")
print("\nAll tests passed!")