Compare commits

..

15 Commits

Author SHA1 Message Date
f05707254e docs: Add training pair provenance tracking documentation
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 14s
Smoke Test / smoke (pull_request) Failing after 11s
Validate Config / YAML Lint (pull_request) Failing after 11s
Validate Config / JSON Validate (pull_request) Successful in 14s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 1m21s
Validate Config / Shell Script Lint (pull_request) Failing after 20s
Validate Config / Cron Syntax Check (pull_request) Successful in 9s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 8s
Validate Config / Playbook Schema Validation (pull_request) Successful in 17s
PR Checklist / pr-checklist (pull_request) Failing after 3m33s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
Documents the new provenance tracking feature.
2026-04-15 16:03:48 +00:00
7101a0d5e5 test: Add tests for training pair provenance tracking
Comprehensive test suite for the provenance tracking functionality.
2026-04-15 16:02:58 +00:00
5763a148c2 feat: Add training pair provenance tracking
Adds provenance metadata to training pairs:
- source_session_id: Which session generated the pair
- model: Which model generated it
- timestamp: When it was generated
- source: Source type (curated, trajectory, etc.)
- content_hash: For deduplication

Provides filtering and reporting capabilities.

Addresses issue #691
2026-04-15 16:01:49 +00:00
817785d763 Merge pull request 'feat: training data augmentation — paraphrase and translate pairs (#695)' (#732) from fix/695 into main 2026-04-15 11:56:28 +00:00
Alexander Whitestone
3603030235 feat: training data augmentation — paraphrase and translate pairs (#695)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 22s
Smoke Test / smoke (pull_request) Failing after 18s
Validate Config / YAML Lint (pull_request) Failing after 23s
Validate Config / JSON Validate (pull_request) Successful in 21s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 1m54s
Validate Config / Shell Script Lint (pull_request) Failing after 54s
Validate Config / Cron Syntax Check (pull_request) Successful in 16s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 16s
Validate Config / Playbook Schema Validation (pull_request) Successful in 23s
PR Checklist / pr-checklist (pull_request) Failing after 11m2s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
augment_pairs.py: generates paraphrases and translations for any
JSONL training file.

Features:
- Auto-detects text field (rich, terse, text, content, lyric_line, etc.)
- N paraphrases per entry (template-based, or LLM with --llm-endpoint)
- Translations to ES, FR, DE (template dictionary, or LLM)
- Outputs augmented JSONL alongside originals
- Marks each augmented entry with _augmentation, _original, _language

Usage:
  python3 augment_pairs.py --input data.jsonl
  python3 augment_pairs.py --input data.jsonl --paraphrases 5 --langs es,fr
  python3 augment_pairs.py --input data.jsonl --llm-endpoint http://localhost:11434/v1

Closes #695
2026-04-15 07:51:38 -04:00
35a191f7b1 Merge PR #725: feat: Provider health monitor with auto-switch (#509) 2026-04-15 06:10:45 +00:00
e987e1b870 Merge PR #726: feat: Pre-flight provider check for session launch (#508) 2026-04-15 06:10:41 +00:00
19278513b4 Merge PR #727: feat: Three.js-specific glitch detection patterns (#543) 2026-04-15 06:10:38 +00:00
1088bf8983 test: add Three.js pattern tests and update assertions (#543)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 28s
Smoke Test / smoke (pull_request) Failing after 23s
Validate Config / YAML Lint (pull_request) Failing after 21s
Validate Config / JSON Validate (pull_request) Successful in 21s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 1m23s
Validate Config / Shell Script Lint (pull_request) Failing after 50s
Validate Config / Cron Syntax Check (pull_request) Successful in 11s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 26s
Validate Config / Playbook Schema Validation (pull_request) Successful in 32s
PR Checklist / pr-checklist (pull_request) Failing after 11m13s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
- Added TestThreeJsPatterns class with 14 tests
- Tests cover: pattern existence, severity inference, vision prompt
- Updated pattern count assertion (14+ patterns now)
- Updated demo test (6 glitches: 4 original + 2 Three.js)
2026-04-15 05:37:17 +00:00
94f0a132d4 feat: add get_threejs_patterns() filter function (#543) 2026-04-15 05:34:17 +00:00
279356bed6 feat: add --threejs flag and Three.js-aware severity inference (#543)
- Added --threejs flag for focused Three.js pattern scanning
- Updated _infer_severity with shader_failure, texture_placeholder,
  uv_mapping_error, frustum_culling, shadow_map_artifact categories
- Added Three.js demo detections (shader failure, shadow map)
- Bumped detector version to 0.2.0
2026-04-15 05:34:16 +00:00
511ff863c2 feat: add Three.js-specific glitch detection patterns (#543)
Adds 6 new Three.js-specific glitch categories and patterns:
- SHADER_FAILURE: Solid black materials from shader compilation errors
- TEXTURE_PLACEHOLDER: 1x1 white pixel stretched surfaces
- UV_MAPPING_ERROR: BufferGeometry UV coordinate errors
- FRUSTUM_CULLING: Objects popping at screen edges
- SHADOW_MAP_ARTIFACT: Pixelated/blocky shadow maps
- BLOOM_OVERFLOW: Excessive post-processing bloom bleed

Closes #543
2026-04-15 05:32:25 +00:00
b6e3a647b0 feat: add pre-flight provider check script (#508)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 29s
PR Checklist / pr-checklist (pull_request) Failing after 7m23s
Smoke Test / smoke (pull_request) Failing after 20s
Validate Config / YAML Lint (pull_request) Failing after 14s
Validate Config / JSON Validate (pull_request) Successful in 15s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 1m1s
Validate Config / Shell Script Lint (pull_request) Failing after 46s
Validate Config / Cron Syntax Check (pull_request) Successful in 9s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 10s
Validate Config / Playbook Schema Validation (pull_request) Successful in 28s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
- Checks OpenRouter balance via /api/v1/auth/key
- Tests Nous and Anthropic API keys
- Verifies Ollama is running
- Pre-flight check before session launch
- Returns exit code for automation

Closes #508
2026-04-15 03:55:04 +00:00
e14158676d feat: add provider health monitor script (#509)
Some checks failed
Architecture Lint / Linter Tests (pull_request) Successful in 44s
Smoke Test / smoke (pull_request) Failing after 36s
Validate Config / YAML Lint (pull_request) Failing after 21s
Validate Config / JSON Validate (pull_request) Successful in 28s
Validate Config / Python Syntax & Import Check (pull_request) Failing after 2m36s
Validate Config / Shell Script Lint (pull_request) Failing after 1m3s
Validate Config / Cron Syntax Check (pull_request) Successful in 13s
Validate Config / Deploy Script Dry Run (pull_request) Successful in 12s
PR Checklist / pr-checklist (pull_request) Failing after 6m15s
Validate Config / Playbook Schema Validation (pull_request) Successful in 28s
Architecture Lint / Lint Repository (pull_request) Has been cancelled
Validate Config / Python Test Suite (pull_request) Has been cancelled
- Tests all configured providers
- Maintains health map in tmux-state.json
- Auto-switches profiles to working providers
- Supports --daemon and --status modes

Closes #509
2026-04-15 03:48:37 +00:00
26e39d8949 feat: add autonomous cron supervisor job (#513)
- Runs every 7 minutes
- Checks dev and timmy sessions
- Loads tmux-supervisor skill
- Telegram only on actionable events
- Silent when all agents busy
2026-04-15 03:33:43 +00:00
11 changed files with 1638 additions and 15 deletions

View File

@@ -31,6 +31,14 @@ class GlitchCategory(Enum):
WATER_REFLECTION = "water_reflection"
SKYBOX_SEAM = "skybox_seam"
# Three.js-specific categories (ref: timmy-config#543)
SHADER_FAILURE = "shader_failure"
TEXTURE_PLACEHOLDER = "texture_placeholder"
UV_MAPPING_ERROR = "uv_mapping_error"
FRUSTUM_CULLING = "frustum_culling"
SHADOW_MAP_ARTIFACT = "shadow_map_artifact"
BLOOM_OVERFLOW = "bloom_overflow"
@dataclass
class GlitchPattern:
@@ -241,6 +249,123 @@ MATRIX_GLITCH_PATTERNS: list[GlitchPattern] = [
],
confidence_threshold=0.45,
),
# --- Three.js-Specific Glitch Patterns (ref: timmy-config#543) ---
GlitchPattern(
category=GlitchCategory.SHADER_FAILURE,
name="Shader Compilation Failure",
description="Three.js shader failed to compile, rendering the material as solid black. "
"Common when custom ShaderMaterial has syntax errors or missing uniforms.",
severity=GlitchSeverity.CRITICAL,
detection_prompts=[
"Look for objects or surfaces rendered as pure black (#000000) that should have visible textures or materials.",
"Identify geometry that appears completely dark while surrounding objects are normally lit.",
"Check for objects where the material seems to 'absorb all light' — flat black with no shading gradient.",
],
visual_indicators=[
"solid black object with no shading",
"geometry rendered as silhouette",
"material appears to absorb light entirely",
"black patch inconsistent with scene lighting",
],
confidence_threshold=0.7,
),
GlitchPattern(
category=GlitchCategory.TEXTURE_PLACEHOLDER,
name="Three.js Texture Not Loaded",
description="Three.js failed to load the texture asset, rendering a 1x1 white pixel "
"stretched across the entire surface. Distinguished from missing-texture by "
"the uniform white/grey appearance rather than magenta.",
severity=GlitchSeverity.CRITICAL,
detection_prompts=[
"Look for surfaces that are uniformly white or light grey with no texture detail, even on large geometry.",
"Identify objects where the texture appears as a single solid color stretched across complex UVs.",
"Check for surfaces that look 'blank' or 'unloaded' — flat white/grey where detail should exist.",
],
visual_indicators=[
"uniform white or light grey surface",
"no texture detail on large geometry",
"stretched single-color appearance",
"1x1 pixel placeholder stretched to fill UV space",
],
confidence_threshold=0.65,
),
GlitchPattern(
category=GlitchCategory.UV_MAPPING_ERROR,
name="BufferGeometry UV Mapping Error",
description="Three.js BufferGeometry has incorrect UV coordinates, causing textures to "
"appear stretched, compressed, or mapped to the wrong faces.",
severity=GlitchSeverity.HIGH,
detection_prompts=[
"Look for textures that appear dramatically stretched in one direction on specific faces.",
"Identify surfaces where the texture pattern is distorted but other nearby surfaces look correct.",
"Check for faces where the texture seems 'smeared' or mapped with incorrect aspect ratio.",
],
visual_indicators=[
"texture stretching on specific faces",
"distorted pattern on geometry",
"smeared texture appearance",
"aspect ratio mismatch between texture and surface",
],
confidence_threshold=0.6,
),
GlitchPattern(
category=GlitchCategory.FRUSTUM_CULLING,
name="Frustum Culling Artifact",
description="Three.js frustum culling incorrectly marks objects as outside the camera "
"frustum, causing them to pop in/out of existence at screen edges.",
severity=GlitchSeverity.MEDIUM,
detection_prompts=[
"Look for objects that are partially visible at the edge of the frame — half-rendered or cut off unnaturally.",
"Identify geometry that seems to 'pop' into existence as the view angle changes.",
"Check screen edges for objects that appear suddenly rather than smoothly entering the viewport.",
],
visual_indicators=[
"half-visible object at screen edge",
"object popping into frame",
"abrupt appearance of geometry",
"bounding box visible but mesh missing",
],
confidence_threshold=0.55,
),
GlitchPattern(
category=GlitchCategory.SHADOW_MAP_ARTIFACT,
name="Shadow Map Resolution Artifact",
description="Three.js shadow map has insufficient resolution, causing pixelated, "
"blocky shadows with visible texel edges instead of smooth shadow gradients.",
severity=GlitchSeverity.MEDIUM,
detection_prompts=[
"Look for shadows with visible blocky or pixelated edges instead of smooth gradients.",
"Identify shadow maps where individual texels (texture pixels) are clearly visible.",
"Check for shadows that appear as jagged stair-stepped patterns rather than soft edges.",
],
visual_indicators=[
"blocky shadow edges",
"visible texel grid in shadows",
"stair-stepped shadow boundary",
"pixelated shadow gradient",
],
confidence_threshold=0.55,
),
GlitchPattern(
category=GlitchCategory.BLOOM_OVERFLOW,
name="Post-Processing Bloom Overflow",
description="Three.js UnrealBloomPass or similar post-processing bloom effect is too "
"intense, causing bright areas to bleed glow into surrounding geometry.",
severity=GlitchSeverity.LOW,
detection_prompts=[
"Look for bright areas that have an unusually large, soft glow bleeding into adjacent surfaces.",
"Identify scenes where light sources appear to have a 'halo' that extends beyond physical plausibility.",
"Check for bright objects whose glow color bleeds onto nearby unrelated geometry.",
],
visual_indicators=[
"excessive glow bleeding from bright surfaces",
"halo around light sources",
"bloom color tinting adjacent geometry",
"glow bleeding beyond object boundaries",
],
confidence_threshold=0.5,
),
]
@@ -289,6 +414,23 @@ def build_vision_prompt(patterns: list[GlitchPattern] | None = None) -> str:
)
# Three.js-specific category set for filtering (ref: timmy-config#543)
THREEJS_CATEGORIES = {
GlitchCategory.SHADER_FAILURE,
GlitchCategory.TEXTURE_PLACEHOLDER,
GlitchCategory.UV_MAPPING_ERROR,
GlitchCategory.FRUSTUM_CULLING,
GlitchCategory.SHADOW_MAP_ARTIFACT,
GlitchCategory.BLOOM_OVERFLOW,
}
def get_threejs_patterns() -> list[GlitchPattern]:
"""Return only Three.js-specific glitch patterns."""
return [p for p in MATRIX_GLITCH_PATTERNS if p.category in THREEJS_CATEGORIES]
if __name__ == "__main__":
import json
print(f"Loaded {len(MATRIX_GLITCH_PATTERNS)} glitch patterns:\n")

View File

@@ -9,7 +9,7 @@ Usage:
python matrix_glitch_detector.py <url> [--angles 4] [--output report.json]
python matrix_glitch_detector.py --demo # Run with synthetic test data
Ref: timmy-config#491
Ref: timmy-config#491, timmy-config#543
"""
import argparse
@@ -33,6 +33,7 @@ from glitch_patterns import (
MATRIX_GLITCH_PATTERNS,
build_vision_prompt,
get_patterns_by_severity,
get_threejs_patterns,
)
@@ -345,14 +346,17 @@ def _parse_vision_response(
def _infer_severity(category: str, confidence: float) -> str:
"""Infer severity from category and confidence when not provided."""
critical_cats = {"missing_textures", "clipping"}
high_cats = {"floating_assets", "broken_normals"}
critical_cats = {"missing_textures", "clipping", "shader_failure", "texture_placeholder"}
high_cats = {"floating_assets", "broken_normals", "uv_mapping_error"}
medium_cats = {"frustum_culling", "shadow_map_artifact"}
cat_lower = category.lower()
if any(c in cat_lower for c in critical_cats):
return "critical" if confidence > 0.7 else "high"
if any(c in cat_lower for c in high_cats):
return "high" if confidence > 0.7 else "medium"
if any(c in cat_lower for c in medium_cats):
return "medium" if confidence > 0.6 else "low"
return "medium" if confidence > 0.6 else "low"
@@ -389,9 +393,9 @@ def build_report(
),
},
metadata={
"detector_version": "0.1.0",
"detector_version": "0.2.0",
"pattern_count": len(MATRIX_GLITCH_PATTERNS),
"reference": "timmy-config#491",
"reference": "timmy-config#491, timmy-config#543",
},
)
@@ -460,6 +464,30 @@ def run_demo(output_path: Optional[Path] = None) -> ScanResult:
screenshot_index=3,
screenshot_angle="left",
),
DetectedGlitch(
id=str(uuid.uuid4())[:8],
category="shader_failure",
name="Black Material on Portal Frame",
description="Portal frame rendered as solid black — shader compilation failed (missing uniform u_time)",
severity="critical",
confidence=0.91,
location_x=45.0,
location_y=30.0,
screenshot_index=0,
screenshot_angle="front",
),
DetectedGlitch(
id=str(uuid.uuid4())[:8],
category="shadow_map_artifact",
name="Pixelated Character Shadow",
description="Character shadow shows visible texel grid — shadow map resolution too low (512x512)",
severity="medium",
confidence=0.78,
location_x=52.0,
location_y=75.0,
screenshot_index=1,
screenshot_angle="right",
),
]
print(f"[*] Detected {len(demo_glitches)} glitches")
@@ -496,6 +524,11 @@ Examples:
help="Minimum severity to include in report",
)
parser.add_argument("--verbose", "-v", action="store_true", help="Verbose output")
parser.add_argument(
"--threejs",
action="store_true",
help="Focus on Three.js-specific glitch patterns only (shader, texture, UV, culling, shadow, bloom)",
)
args = parser.parse_args()
@@ -525,9 +558,13 @@ Examples:
screenshots = capture_screenshots(args.url, angles, screenshots_dir)
print(f"[*] Captured {len(screenshots)} screenshots")
# Filter patterns by severity
# Filter patterns by severity and type
min_sev = GlitchSeverity(args.min_severity)
patterns = get_patterns_by_severity(min_sev)
if args.threejs:
threejs_patterns = get_threejs_patterns()
patterns = [p for p in patterns if p in threejs_patterns]
print(f"[*] Three.js-focused mode: {len(patterns)} patterns")
# Analyze with vision AI
print(f"[*] Analyzing with vision AI ({len(patterns)} patterns)...")

View File

@@ -0,0 +1,271 @@
#!/usr/bin/env python3
"""
Pre-Flight Provider Check Script
Issue #508: [Robustness] Credential drain detection — provider health checks
Pre-flight check before session launch: verifies provider credentials and balance.
Usage:
python3 preflight-provider-check.py # Check all providers
python3 preflight-provider-check.py --launch # Check and return exit code
python3 preflight-provider-check.py --balance # Check OpenRouter balance
"""
import os, sys, json, yaml, urllib.request
from datetime import datetime, timezone
from pathlib import Path
# Configuration
HERMES_HOME = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))
LOG_DIR = Path.home() / ".local" / "timmy" / "fleet-health"
LOG_FILE = LOG_DIR / "preflight-check.log"
def log(msg):
"""Log message to file and optionally console."""
timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
log_entry = "[" + timestamp + "] " + msg
LOG_DIR.mkdir(parents=True, exist_ok=True)
with open(LOG_FILE, "a") as f:
f.write(log_entry + "\n")
if "--quiet" not in sys.argv:
print(log_entry)
def get_provider_api_key(provider):
"""Get API key for a provider from .env or environment."""
env_file = HERMES_HOME / ".env"
if env_file.exists():
with open(env_file) as f:
for line in f:
line = line.strip()
if line.startswith(provider.upper() + "_API_KEY="):
return line.split("=", 1)[1].strip().strip("'\"")
return os.environ.get(provider.upper() + "_API_KEY")
def check_openrouter_balance(api_key):
"""Check OpenRouter balance via /api/v1/auth/key."""
if not api_key:
return False, "No API key", 0
try:
req = urllib.request.Request(
"https://openrouter.ai/api/v1/auth/key",
headers={"Authorization": "Bearer " + api_key}
)
resp = urllib.request.urlopen(req, timeout=10)
data = json.loads(resp.read())
# Check for credits
credits = data.get("data", {}).get("limit", 0)
usage = data.get("data", {}).get("usage", 0)
remaining = credits - usage if credits else None
if remaining is not None and remaining <= 0:
return False, "No credits remaining", 0
elif remaining is not None:
return True, "Credits available", remaining
else:
return True, "Unlimited or unknown balance", None
except urllib.error.HTTPError as e:
if e.code == 401:
return False, "Invalid API key", 0
else:
return False, "HTTP " + str(e.code), 0
except Exception as e:
return False, str(e)[:100], 0
def check_nous_key(api_key):
"""Check Nous API key with minimal test call."""
if not api_key:
return False, "No API key"
try:
req = urllib.request.Request(
"https://inference.nousresearch.com/v1/models",
headers={"Authorization": "Bearer " + api_key}
)
resp = urllib.request.urlopen(req, timeout=10)
if resp.status == 200:
return True, "Valid key"
else:
return False, "HTTP " + str(resp.status)
except urllib.error.HTTPError as e:
if e.code == 401:
return False, "Invalid API key"
elif e.code == 403:
return False, "Forbidden"
else:
return False, "HTTP " + str(e.code)
except Exception as e:
return False, str(e)[:100]
def check_anthropic_key(api_key):
"""Check Anthropic API key with minimal test call."""
if not api_key:
return False, "No API key"
try:
req = urllib.request.Request(
"https://api.anthropic.com/v1/models",
headers={
"x-api-key": api_key,
"anthropic-version": "2023-06-01"
}
)
resp = urllib.request.urlopen(req, timeout=10)
if resp.status == 200:
return True, "Valid key"
else:
return False, "HTTP " + str(resp.status)
except urllib.error.HTTPError as e:
if e.code == 401:
return False, "Invalid API key"
elif e.code == 403:
return False, "Forbidden"
else:
return False, "HTTP " + str(e.code)
except Exception as e:
return False, str(e)[:100]
def check_ollama():
"""Check if Ollama is running."""
try:
req = urllib.request.Request("http://localhost:11434/api/tags")
resp = urllib.request.urlopen(req, timeout=5)
if resp.status == 200:
data = json.loads(resp.read())
models = data.get("models", [])
return True, str(len(models)) + " models loaded"
else:
return False, "HTTP " + str(resp.status)
except Exception as e:
return False, str(e)[:100]
def get_configured_provider():
"""Get the configured provider from global config."""
config_file = HERMES_HOME / "config.yaml"
if not config_file.exists():
return None
try:
with open(config_file) as f:
config = yaml.safe_load(f)
model_config = config.get("model", {})
if isinstance(model_config, dict):
return model_config.get("provider")
except:
pass
return None
def run_preflight_check():
"""Run pre-flight check on all providers."""
log("=== Pre-Flight Provider Check ===")
results = {}
# Check OpenRouter
or_key = get_provider_api_key("openrouter")
or_ok, or_msg, or_balance = check_openrouter_balance(or_key)
results["openrouter"] = {"healthy": or_ok, "message": or_msg, "balance": or_balance}
# Check Nous
nous_key = get_provider_api_key("nous")
nous_ok, nous_msg = check_nous_key(nous_key)
results["nous"] = {"healthy": nous_ok, "message": nous_msg}
# Check Anthropic
anthropic_key = get_provider_api_key("anthropic")
anthropic_ok, anthropic_msg = check_anthropic_key(anthropic_key)
results["anthropic"] = {"healthy": anthropic_ok, "message": anthropic_msg}
# Check Ollama
ollama_ok, ollama_msg = check_ollama()
results["ollama"] = {"healthy": ollama_ok, "message": ollama_msg}
# Get configured provider
configured = get_configured_provider()
# Summary
healthy_count = sum(1 for r in results.values() if r["healthy"])
total_count = len(results)
log("Results: " + str(healthy_count) + "/" + str(total_count) + " providers healthy")
for provider, result in results.items():
status = "HEALTHY" if result["healthy"] else "UNHEALTHY"
extra = ""
if provider == "openrouter" and result.get("balance") is not None:
extra = " (balance: " + str(result["balance"]) + ")"
log(" " + provider + ": " + status + " - " + result["message"] + extra)
if configured:
log("Configured provider: " + configured)
if configured in results and not results[configured]["healthy"]:
log("WARNING: Configured provider " + configured + " is UNHEALTHY!")
return results, configured
def check_launch_readiness():
"""Check if we're ready to launch sessions."""
results, configured = run_preflight_check()
# Check if configured provider is healthy
if configured and configured in results:
if not results[configured]["healthy"]:
log("LAUNCH BLOCKED: Configured provider " + configured + " is unhealthy")
return False, configured + " is unhealthy"
# Check if at least one provider is healthy
healthy_providers = [p for p, r in results.items() if r["healthy"]]
if not healthy_providers:
log("LAUNCH BLOCKED: No healthy providers available")
return False, "No healthy providers"
log("LAUNCH READY: " + str(len(healthy_providers)) + " healthy providers available")
return True, "Ready"
def show_balance():
"""Show OpenRouter balance."""
api_key = get_provider_api_key("openrouter")
if not api_key:
print("No OpenRouter API key found")
return
ok, msg, balance = check_openrouter_balance(api_key)
if ok:
if balance is not None:
print("OpenRouter balance: " + str(balance) + " credits")
else:
print("OpenRouter: " + msg)
else:
print("OpenRouter: " + msg)
def main():
if "--balance" in sys.argv:
show_balance()
elif "--launch" in sys.argv:
ready, message = check_launch_readiness()
if ready:
print("READY")
sys.exit(0)
else:
print("BLOCKED: " + message)
sys.exit(1)
else:
run_preflight_check()
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,411 @@
#!/usr/bin/env python3
"""
Provider Health Monitor Script
Issue #509: [Robustness] Provider-aware profile config — auto-switch on failure
Monitors provider health and automatically switches profiles to working providers.
Usage:
python3 provider-health-monitor.py # Run once
python3 provider-health-monitor.py --daemon # Run continuously
python3 provider-health-monitor.py --status # Show provider health
"""
import os, sys, json, yaml, urllib.request, time
from datetime import datetime, timezone
from pathlib import Path
# Configuration
HERMES_HOME = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))
PROFILES_DIR = HERMES_HOME / "profiles"
LOG_DIR = Path.home() / ".local" / "timmy" / "fleet-health"
STATE_FILE = LOG_DIR / "tmux-state.json"
LOG_FILE = LOG_DIR / "provider-health.log"
# Provider test endpoints
PROVIDER_TESTS = {
"openrouter": {
"url": "https://openrouter.ai/api/v1/models",
"method": "GET",
"headers": lambda api_key: {"Authorization": "Bearer " + api_key},
"timeout": 10
},
"anthropic": {
"url": "https://api.anthropic.com/v1/models",
"method": "GET",
"headers": lambda api_key: {"x-api-key": api_key, "anthropic-version": "2023-06-01"},
"timeout": 10
},
"nous": {
"url": "https://inference.nousresearch.com/v1/models",
"method": "GET",
"headers": lambda api_key: {"Authorization": "Bearer " + api_key},
"timeout": 10
},
"kimi-coding": {
"url": "https://api.kimi.com/coding/v1/models",
"method": "GET",
"headers": lambda api_key: {"x-api-key": api_key, "x-api-provider": "kimi-coding"},
"timeout": 10
},
"ollama": {
"url": "http://localhost:11434/api/tags",
"method": "GET",
"headers": lambda api_key: {},
"timeout": 5
}
}
def log(msg):
"""Log message to file and optionally console."""
timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
log_entry = "[" + timestamp + "] " + msg
LOG_DIR.mkdir(parents=True, exist_ok=True)
with open(LOG_FILE, "a") as f:
f.write(log_entry + "\n")
if "--quiet" not in sys.argv:
print(log_entry)
def get_provider_api_key(provider):
"""Get API key for a provider from .env or environment."""
env_file = HERMES_HOME / ".env"
if env_file.exists():
with open(env_file) as f:
for line in f:
line = line.strip()
if line.startswith(provider.upper() + "_API_KEY="):
return line.split("=", 1)[1].strip().strip("'\"")
return os.environ.get(provider.upper() + "_API_KEY")
def test_provider(provider, api_key=None):
"""Test if a provider is healthy."""
config = PROVIDER_TESTS.get(provider)
if not config:
return False, "Unknown provider: " + provider
headers = config["headers"](api_key or "")
try:
req = urllib.request.Request(
config["url"],
headers=headers,
method=config["method"]
)
resp = urllib.request.urlopen(req, timeout=config["timeout"])
if resp.status == 200:
return True, "Healthy"
else:
return False, "HTTP " + str(resp.status)
except urllib.error.HTTPError as e:
if e.code == 401:
return False, "Unauthorized (401)"
elif e.code == 403:
return False, "Forbidden (403)"
elif e.code == 429:
return True, "Rate limited but accessible"
else:
return False, "HTTP " + str(e.code)
except Exception as e:
return False, str(e)[:100]
def get_all_providers():
"""Get all providers from profiles and global config."""
providers = set()
# Global config
global_config = HERMES_HOME / "config.yaml"
if global_config.exists():
try:
with open(global_config) as f:
config = yaml.safe_load(f)
# Primary model provider
model_config = config.get("model", {})
if isinstance(model_config, dict):
provider = model_config.get("provider", "")
if provider:
providers.add(provider)
# Auxiliary providers
auxiliary = config.get("auxiliary", {})
for aux_config in auxiliary.values():
if isinstance(aux_config, dict):
provider = aux_config.get("provider", "")
if provider and provider != "auto":
providers.add(provider)
except:
pass
# Profile configs
if PROFILES_DIR.exists():
for profile_dir in PROFILES_DIR.iterdir():
if profile_dir.is_dir():
config_file = profile_dir / "config.yaml"
if config_file.exists():
try:
with open(config_file) as f:
config = yaml.safe_load(f)
model_config = config.get("model", {})
if isinstance(model_config, dict):
provider = model_config.get("provider", "")
if provider:
providers.add(provider)
auxiliary = config.get("auxiliary", {})
for aux_config in auxiliary.values():
if isinstance(aux_config, dict):
provider = aux_config.get("provider", "")
if provider and provider != "auto":
providers.add(provider)
except:
pass
# Add common providers even if not configured
providers.update(["openrouter", "nous", "ollama"])
return list(providers)
def build_health_map():
"""Build a health map of all providers."""
providers = get_all_providers()
health_map = {}
log("Testing " + str(len(providers)) + " providers...")
for provider in providers:
api_key = get_provider_api_key(provider)
healthy, message = test_provider(provider, api_key)
health_map[provider] = {
"healthy": healthy,
"message": message,
"last_test": datetime.now(timezone.utc).isoformat(),
"api_key_present": bool(api_key)
}
status = "HEALTHY" if healthy else "UNHEALTHY"
log(" " + provider + ": " + status + " - " + message)
return health_map
def get_fallback_providers(health_map):
"""Get list of healthy providers in priority order."""
# Priority order: nous, openrouter, ollama, others
priority_order = ["nous", "openrouter", "ollama", "anthropic", "kimi-coding"]
healthy = []
for provider in priority_order:
if provider in health_map and health_map[provider]["healthy"]:
healthy.append(provider)
# Add any other healthy providers not in priority list
for provider, info in health_map.items():
if info["healthy"] and provider not in healthy:
healthy.append(provider)
return healthy
def update_profile_config(profile_name, new_provider):
"""Update a profile's config to use a new provider."""
config_file = PROFILES_DIR / profile_name / "config.yaml"
if not config_file.exists():
return False, "Config file not found"
try:
with open(config_file) as f:
config = yaml.safe_load(f)
# Update model provider
if "model" not in config:
config["model"] = {}
old_provider = config["model"].get("provider", "unknown")
config["model"]["provider"] = new_provider
# Update auxiliary providers if they were using the old provider
auxiliary = config.get("auxiliary", {})
for aux_name, aux_config in auxiliary.items():
if isinstance(aux_config, dict) and aux_config.get("provider") == old_provider:
aux_config["provider"] = new_provider
# Write back
with open(config_file, "w") as f:
yaml.dump(config, f, default_flow_style=False)
log("Updated " + profile_name + ": " + old_provider + " -> " + new_provider)
return True, "Updated"
except Exception as e:
return False, str(e)
def check_profiles(health_map):
"""Check all profiles and update unhealthy providers."""
if not PROFILES_DIR.exists():
return
fallback_providers = get_fallback_providers(health_map)
if not fallback_providers:
log("CRITICAL: No healthy providers available!")
return
updated_profiles = []
for profile_dir in PROFILES_DIR.iterdir():
if not profile_dir.is_dir():
continue
profile_name = profile_dir.name
config_file = profile_dir / "config.yaml"
if not config_file.exists():
continue
try:
with open(config_file) as f:
config = yaml.safe_load(f)
model_config = config.get("model", {})
if not isinstance(model_config, dict):
continue
current_provider = model_config.get("provider", "")
if not current_provider:
continue
# Check if current provider is healthy
if current_provider in health_map and health_map[current_provider]["healthy"]:
continue # Provider is healthy, no action needed
# Find best fallback
best_fallback = None
for provider in fallback_providers:
if provider != current_provider:
best_fallback = provider
break
if not best_fallback:
log("No fallback for " + profile_name + " (current: " + current_provider + ")")
continue
# Update profile
success, message = update_profile_config(profile_name, best_fallback)
if success:
updated_profiles.append({
"profile": profile_name,
"old_provider": current_provider,
"new_provider": best_fallback
})
except Exception as e:
log("Error processing " + profile_name + ": " + str(e))
return updated_profiles
def load_state():
"""Load state from tmux-state.json."""
if STATE_FILE.exists():
try:
with open(STATE_FILE) as f:
return json.load(f)
except:
pass
return {}
def save_state(state):
"""Save state to tmux-state.json."""
LOG_DIR.mkdir(parents=True, exist_ok=True)
with open(STATE_FILE, "w") as f:
json.dump(state, f, indent=2)
def run_once():
"""Run provider health check once."""
log("=== Provider Health Check ===")
state = load_state()
# Build health map
health_map = build_health_map()
# Check profiles and update if needed
updated_profiles = check_profiles(health_map)
# Update state
state["provider_health"] = health_map
state["last_provider_check"] = datetime.now(timezone.utc).isoformat()
if updated_profiles:
state["last_profile_updates"] = updated_profiles
save_state(state)
# Summary
healthy_count = sum(1 for p in health_map.values() if p["healthy"])
total_count = len(health_map)
log("Health: " + str(healthy_count) + "/" + str(total_count) + " providers healthy")
if updated_profiles:
log("Updated " + str(len(updated_profiles)) + " profiles:")
for update in updated_profiles:
log(" " + update["profile"] + ": " + update["old_provider"] + " -> " + update["new_provider"])
def show_status():
"""Show provider health status."""
state = load_state()
health_map = state.get("provider_health", {})
if not health_map:
print("No provider health data available. Run without --status first.")
return
print("Provider Health (last updated: " + str(state.get("last_provider_check", "unknown")) + ")")
print("=" * 80)
for provider, info in sorted(health_map.items()):
status = "HEALTHY" if info["healthy"] else "UNHEALTHY"
message = info.get("message", "")
api_key = "yes" if info.get("api_key_present") else "no"
print(provider.ljust(20) + " " + status.ljust(10) + " API key: " + api_key + " - " + message)
# Show recent updates
updates = state.get("last_profile_updates", [])
if updates:
print()
print("Recent Profile Updates:")
for update in updates:
print(" " + update["profile"] + ": " + update["old_provider"] + " -> " + update["new_provider"])
def daemon_mode():
"""Run continuously."""
log("Starting provider health daemon (check every 300s)")
while True:
try:
run_once()
time.sleep(300) # Check every 5 minutes
except KeyboardInterrupt:
log("Daemon stopped by user")
break
except Exception as e:
log("Error: " + str(e))
time.sleep(60)
def main():
if "--status" in sys.argv:
show_status()
elif "--daemon" in sys.argv:
daemon_mode()
else:
run_once()
if __name__ == "__main__":
main()

View File

@@ -196,7 +196,37 @@
"paused_reason": null,
"skills": [],
"skill": null
},
{
"id": "tmux-supervisor-513",
"name": "Autonomous Cron Supervisor",
"prompt": "Load the tmux-supervisor skill and execute the monitoring protocol.\n\nCheck both `dev` and `timmy` tmux sessions for idle panes. Only send Telegram notifications on actionable events (idle, overflow, failure). Be silent when all agents are working.\n\nSteps:\n1. List all tmux sessions (skip 'Alexander')\n2. For each session, list windows and panes\n3. Capture each pane and classify state (idle vs active)\n4. For idle panes: read context, craft context-aware prompt\n5. Send /queue prompts to idle panes\n6. Verify prompts landed\n7. Only notify via Telegram if:\n - A pane was prompted (idle detected)\n - A pane shows context overflow (>80%)\n - A pane is stuck or crashed\n8. If all panes are active: respond with [SILENT]",
"schedule": {
"kind": "interval",
"minutes": 7,
"display": "every 7m"
},
"schedule_display": "every 7m",
"repeat": {
"times": null,
"completed": 0
},
"enabled": true,
"created_at": "2026-04-15T03:00:00.000000+00:00",
"next_run_at": null,
"last_run_at": null,
"last_status": null,
"last_error": null,
"deliver": "telegram",
"origin": null,
"state": "scheduled",
"paused_at": null,
"paused_reason": null,
"skills": [
"tmux-supervisor"
],
"skill": "tmux-supervisor"
}
],
"updated_at": "2026-04-13T02:00:00+00:00"
}
}

View File

@@ -19,9 +19,11 @@ from glitch_patterns import (
GlitchPattern,
GlitchSeverity,
MATRIX_GLITCH_PATTERNS,
THREEJS_CATEGORIES,
build_vision_prompt,
get_pattern_by_category,
get_patterns_by_severity,
get_threejs_patterns,
)
from matrix_glitch_detector import (
@@ -40,7 +42,7 @@ class TestGlitchPatterns(unittest.TestCase):
def test_pattern_count(self):
"""Verify we have a reasonable number of defined patterns."""
self.assertGreaterEqual(len(MATRIX_GLITCH_PATTERNS), 8)
self.assertGreaterEqual(len(MATRIX_GLITCH_PATTERNS), 14) # 10 generic + 6 Three.js
def test_all_patterns_have_required_fields(self):
"""Every pattern must have category, name, description, severity, prompts."""
@@ -88,6 +90,9 @@ class TestGlitchPatterns(unittest.TestCase):
self.assertIn("Floating Object", prompt)
self.assertIn("Z-Fighting", prompt)
self.assertIn("Missing", prompt)
# Three.js patterns should be included
self.assertIn("Shader Compilation Failure", prompt)
self.assertIn("Bloom Overflow", prompt)
def test_build_vision_prompt_subset(self):
"""Vision prompt with subset should only include specified patterns."""
@@ -248,7 +253,7 @@ class TestGlitchDetector(unittest.TestCase):
try:
report = run_demo(output_path)
self.assertEqual(len(report.glitches), 4)
self.assertEqual(len(report.glitches), 6) # 4 original + 2 Three.js
self.assertGreater(report.summary["total_glitches"], 0)
self.assertTrue(output_path.exists())
@@ -260,6 +265,93 @@ class TestGlitchDetector(unittest.TestCase):
output_path.unlink(missing_ok=True)
class TestThreeJsPatterns(unittest.TestCase):
"""Tests for Three.js-specific glitch patterns (timmy-config#543)."""
def test_get_threejs_patterns_returns_only_threejs(self):
"""get_threejs_patterns() should return only Three.js categories."""
patterns = get_threejs_patterns()
self.assertEqual(len(patterns), 6)
for p in patterns:
self.assertIn(p.category, THREEJS_CATEGORIES)
def test_threejs_patterns_have_required_fields(self):
"""All Three.js patterns must have valid fields."""
for p in get_threejs_patterns():
self.assertIsInstance(p.category, GlitchCategory)
self.assertTrue(p.name)
self.assertTrue(p.description)
self.assertIsInstance(p.severity, GlitchSeverity)
self.assertGreater(len(p.detection_prompts), 0)
self.assertGreater(len(p.visual_indicators), 0)
def test_shader_failure_is_critical(self):
"""Shader compilation failure should be CRITICAL severity."""
p = get_pattern_by_category(GlitchCategory.SHADER_FAILURE)
self.assertIsNotNone(p)
self.assertEqual(p.severity, GlitchSeverity.CRITICAL)
def test_texture_placeholder_is_critical(self):
"""Texture placeholder (1x1 white) should be CRITICAL severity."""
p = get_pattern_by_category(GlitchCategory.TEXTURE_PLACEHOLDER)
self.assertIsNotNone(p)
self.assertEqual(p.severity, GlitchSeverity.CRITICAL)
def test_infer_severity_shader_failure(self):
"""Shader failure should infer critical/high."""
self.assertEqual(_infer_severity("shader_failure", 0.8), "critical")
self.assertEqual(_infer_severity("shader_failure", 0.5), "high")
def test_infer_severity_texture_placeholder(self):
"""Texture placeholder should infer critical/high."""
self.assertEqual(_infer_severity("texture_placeholder", 0.8), "critical")
self.assertEqual(_infer_severity("texture_placeholder", 0.5), "high")
def test_infer_severity_uv_mapping(self):
"""UV mapping error should infer high/medium."""
self.assertEqual(_infer_severity("uv_mapping_error", 0.8), "high")
self.assertEqual(_infer_severity("uv_mapping_error", 0.5), "medium")
def test_infer_severity_frustum_culling(self):
"""Frustum culling should infer medium/low."""
self.assertEqual(_infer_severity("frustum_culling", 0.7), "medium")
self.assertEqual(_infer_severity("frustum_culling", 0.4), "low")
def test_infer_severity_shadow_map(self):
"""Shadow map artifact should infer medium/low."""
self.assertEqual(_infer_severity("shadow_map_artifact", 0.7), "medium")
self.assertEqual(_infer_severity("shadow_map_artifact", 0.4), "low")
def test_infer_severity_bloom_overflow(self):
"""Bloom overflow should infer medium/low (default path)."""
self.assertEqual(_infer_severity("bloom_overflow", 0.7), "medium")
self.assertEqual(_infer_severity("bloom_overflow", 0.4), "low")
def test_threejs_patterns_in_vision_prompt(self):
"""Three.js patterns should appear in the composite vision prompt."""
prompt = build_vision_prompt()
self.assertIn("shader_failure", prompt)
self.assertIn("texture_placeholder", prompt)
self.assertIn("uv_mapping_error", prompt)
self.assertIn("frustum_culling", prompt)
self.assertIn("shadow_map_artifact", prompt)
self.assertIn("bloom_overflow", prompt)
def test_threejs_subset_prompt(self):
"""Building prompt from Three.js-only patterns should work."""
threejs = get_threejs_patterns()
prompt = build_vision_prompt(threejs)
self.assertIn("Shader Compilation Failure", prompt)
self.assertNotIn("Floating Object", prompt) # generic, not Three.js
def test_report_metadata_version(self):
"""Report metadata should reference both issues."""
report = run_demo()
self.assertEqual(report.metadata["detector_version"], "0.2.0")
self.assertIn("543", report.metadata["reference"])
class TestIntegration(unittest.TestCase):
"""Integration-level tests."""
@@ -276,6 +368,13 @@ class TestIntegration(unittest.TestCase):
expected = {"floating_assets", "z_fighting", "missing_textures", "clipping", "broken_normals"}
self.assertTrue(expected.issubset(category_values))
def test_patterns_cover_threejs_themes(self):
"""Patterns should cover Three.js-specific glitch themes (#543)."""
category_values = {p.category.value for p in MATRIX_GLITCH_PATTERNS}
threejs_expected = {"shader_failure", "texture_placeholder", "uv_mapping_error",
"frustum_culling", "shadow_map_artifact", "bloom_overflow"}
self.assertTrue(threejs_expected.issubset(category_values))
if __name__ == "__main__":
unittest.main()

View File

@@ -23,7 +23,7 @@
{"song": "Satellite Hearts", "artist": "Neon Circuit", "beat": 3, "timestamp": "1:00", "duration": "30s", "lyric_line": "The signal fades to nothing but I keep the frequency", "scene": {"mood": "longing", "colors": ["teal", "silver", "moonlight"], "composition": "over the shoulder", "camera": "dolly in", "description": "Through a window. Figure on the other side. Glass between. Breath on the pane. The signal fades to nothing but I keep the frequency"}}
{"song": "Satellite Hearts", "artist": "Neon Circuit", "beat": 4, "timestamp": "1:30", "duration": "30s", "lyric_line": "Then suddenly your laughter breaks through like a summer storm", "scene": {"mood": "connection", "colors": ["warm gold", "rose", "blush"], "composition": "low angle", "camera": "dolly out", "description": "Two hands reaching. Fingers almost touching. Warm light between them. Then suddenly your laughter breaks through like a summer storm"}}
{"song": "Satellite Hearts", "artist": "Neon Circuit", "beat": 5, "timestamp": "2:00", "duration": "30s", "lyric_line": "We're dancing in the data stream, our pixels intertwined", "scene": {"mood": "euphoria", "colors": ["neon", "rainbow", "white flash"], "composition": "high angle", "camera": "handheld", "description": "Overexposed. Everything bright. Dancing. The frame can't contain the joy. We're dancing in the data stream, our pixels intertwined"}}
{"song": "Satellite Hearts", "artist": "Neon Circuit", "beat": 6, "timestamp": "2:30", "duration": "30s", "lyric_line": "But I can't tell if you're real or just a ghost in the machine", "scene": {"mood": "confusion", "colors": ["dark teal", "gunmetal grey", "electric green"], "composition": "dutch angle", "camera": "steadicam", "description": "Multiple focal points. Nothing sharp. The viewer does not know where to look. Digital distortion across the frame. Ghost-like figure dissolving into static. Satellite Hearts."}}
{"song": "Satellite Hearts", "artist": "Neon Circuit", "beat": 6, "timestamp": "2:30", "duration": "30s", "lyric_line": "But I can't tell if you're real or just a ghost in the machine", "scene": {"mood": "confusion", "colors": ["swirling", "unsettled", "green-grey"], "composition": "dutch angle", "camera": "steadicam", "description": "Multiple focal points. Nothing sharp. The viewer doesn't know where to look. But I can't tell if you're real or just a ghost in the machine"}}
{"song": "Satellite Hearts", "artist": "Neon Circuit", "beat": 7, "timestamp": "3:00", "duration": "30s", "lyric_line": "The picture clears and there you are \u2014 imperfect, warm, alive", "scene": {"mood": "clarity", "colors": ["clear blue", "white", "crisp"], "composition": "symmetrical", "camera": "slow zoom", "description": "Rack focus. Background blurs, foreground sharpens. Suddenly everything makes sense. The picture clears and there you are \u2014 imperfect, warm, alive"}}
{"song": "Satellite Hearts", "artist": "Neon Circuit", "beat": 8, "timestamp": "3:30", "duration": "30s", "lyric_line": "Your hand reaches through the screen, I swear I feel the heat", "scene": {"mood": "tenderness", "colors": ["blush pink", "warm cream", "soft gold"], "composition": "rule of thirds", "camera": "crane up", "description": "Close on a hand touching a face. Soft light. Shallow depth of field. Your hand reaches through the screen, I swear I feel the heat"}}
{"song": "Satellite Hearts", "artist": "Neon Circuit", "beat": 9, "timestamp": "4:00", "duration": "30s", "lyric_line": "The bandwidth's dying, say it now before the link goes dark", "scene": {"mood": "urgency", "colors": ["red", "black", "strobe white"], "composition": "extreme wide", "camera": "tracking shot", "description": "Handheld camera running. Blurred faces. Traffic. Heartbeat sound design. The bandwidth's dying, say it now before the link goes dark"}}
@@ -54,7 +54,7 @@
{"song": "Rust Belt Lullaby", "artist": "Iron & Ember", "beat": 4, "timestamp": "1:30", "duration": "30s", "lyric_line": "Now the mill is just a skeleton and he's been gone ten years", "scene": {"mood": "loss", "colors": ["faded", "dusty", "empty space"], "composition": "low angle", "camera": "dolly out", "description": "Empty chair. Dust settling. A coat still on a hook. Presence of absence. Now the mill is just a skeleton and he's been gone ten years"}}
{"song": "Rust Belt Lullaby", "artist": "Iron & Ember", "beat": 5, "timestamp": "2:00", "duration": "30s", "lyric_line": "But the river still runs brown with memory and rust", "scene": {"mood": "beauty", "colors": ["wildflower colors", "green", "sunlight"], "composition": "high angle", "camera": "handheld", "description": "Wildflowers in unexpected places. Color against grey. Nature reclaiming. But the river still runs brown with memory and rust"}}
{"song": "Rust Belt Lullaby", "artist": "Iron & Ember", "beat": 6, "timestamp": "2:30", "duration": "30s", "lyric_line": "I found his lunchbox in the attic, coffee stains still fresh", "scene": {"mood": "resignation", "colors": ["grey", "muted blue", "beige"], "composition": "dutch angle", "camera": "steadicam", "description": "Medium shot. Hands dropping keys on a table. Turning away. Not looking back. I found his lunchbox in the attic, coffee stains still fresh"}}
{"song": "Rust Belt Lullaby", "artist": "Iron & Ember", "beat": 7, "timestamp": "3:00", "duration": "30s", "lyric_line": "Some things don't decay they just learn to hold still", "scene": {"mood": "love", "colors": ["sepia", "dusty brown", "faded cream"], "composition": "symmetrical", "camera": "slow zoom", "description": "A photograph come to life. Dust motes in afternoon light. Hands holding something small and precious. The texture of old wood. Some things do not decay they just learn to hold still."}}
{"song": "Rust Belt Lullaby", "artist": "Iron & Ember", "beat": 7, "timestamp": "3:00", "duration": "30s", "lyric_line": "Some things don't decay \u2014 they just learn to hold still", "scene": {"mood": "love", "colors": ["neutral"], "composition": "symmetrical", "camera": "slow zoom", "description": "Visual interpretation of: Some things don't decay \u2014 they just learn to hold still"}}
{"song": "Rust Belt Lullaby", "artist": "Iron & Ember", "beat": 8, "timestamp": "3:30", "duration": "30s", "lyric_line": "I hum the songs he hummed to me though I've forgotten half the words", "scene": {"mood": "weariness", "colors": ["grey-brown", "faded", "dim"], "composition": "rule of thirds", "camera": "crane up", "description": "Slow movement. Heavy eyelids. The world in faded tones. Everything too much. I hum the songs he hummed to me though I've forgotten half the words"}}
{"song": "Rust Belt Lullaby", "artist": "Iron & Ember", "beat": 9, "timestamp": "4:00", "duration": "30s", "lyric_line": "The town's half-empty but the porch lights still come on at dusk", "scene": {"mood": "quiet hope", "colors": ["faint warm light", "candle glow", "dawn grey"], "composition": "extreme wide", "camera": "tracking shot", "description": "Faint warm light. Candle in dark room. Just enough to see by. The town's half-empty but the porch lights still come on at dusk"}}
{"song": "Rust Belt Lullaby", "artist": "Iron & Ember", "beat": 10, "timestamp": "4:30", "duration": "30s", "lyric_line": "Sleep now, rust belt baby. The furnace keeps us warm.", "scene": {"mood": "peace", "colors": ["deep space black", "starlight", "calm"], "composition": "medium shot", "camera": "slow tilt down", "description": "Still water. Stars reflected. Perfect mirror. No movement. No sound. Sleep now, rust belt baby. The furnace keeps us warm."}}
@@ -65,7 +65,7 @@
{"song": "Wildfire Sermon", "artist": "Prophet Ash", "beat": 5, "timestamp": "2:00", "duration": "30s", "lyric_line": "We'll dance in the embers, we'll make love in the ash", "scene": {"mood": "destruction", "colors": ["fire", "ash", "smoke orange"], "composition": "high angle", "camera": "handheld", "description": "Fire. Ash falling like snow. Structures collapsing. Beautiful in its terrible way. We'll dance in the embers, we'll make love in the ash"}}
{"song": "Wildfire Sermon", "artist": "Prophet Ash", "beat": 6, "timestamp": "2:30", "duration": "30s", "lyric_line": "From destruction comes the soil where new things grow at last", "scene": {"mood": "creation", "colors": ["green", "light", "warm gold"], "composition": "dutch angle", "camera": "steadicam", "description": "Hands shaping clay. Light emerging from dark. Something new being born. From destruction comes the soil where new things grow at last"}}
{"song": "Wildfire Sermon", "artist": "Prophet Ash", "beat": 7, "timestamp": "3:00", "duration": "30s", "lyric_line": "But don't mistake the warmth for safety, don't mistake the glow for home", "scene": {"mood": "warning", "colors": ["red flash", "amber", "siren"], "composition": "symmetrical", "camera": "slow zoom", "description": "Red flash. Siren light. The calm before. Then: impact. But don't mistake the warmth for safety, don't mistake the glow for home"}}
{"song": "Wildfire Sermon", "artist": "Prophet Ash", "beat": 8, "timestamp": "3:30", "duration": "30s", "lyric_line": "Come closer, come closer I promise the burning feels like flying", "scene": {"mood": "invitation", "colors": ["amber", "golden yellow", "deep orange"], "composition": "rule of thirds", "camera": "crane up", "description": "Open door. Warm light spilling out onto dark ground. A hand extended. Come in. Come closer — I promise the fire will not burn you. The room glows like an ember."}}
{"song": "Wildfire Sermon", "artist": "Prophet Ash", "beat": 8, "timestamp": "3:30", "duration": "30s", "lyric_line": "Come closer, come closer \u2014 I promise the burning feels like flying", "scene": {"mood": "invitation", "colors": ["warm", "open", "golden"], "composition": "rule of thirds", "camera": "crane up", "description": "Open door. Warm light spilling out. A hand extended. Come in. Come closer, come closer \u2014 I promise the burning feels like flying"}}
{"song": "Wildfire Sermon", "artist": "Prophet Ash", "beat": 9, "timestamp": "4:00", "duration": "30s", "lyric_line": "We threw everything we owned into the blaze and laughed", "scene": {"mood": "abandon", "colors": ["wild", "free", "untethered"], "composition": "extreme wide", "camera": "tracking shot", "description": "Running through a field. Hair wild. No destination. Just movement. We threw everything we owned into the blaze and laughed"}}
{"song": "Wildfire Sermon", "artist": "Prophet Ash", "beat": 10, "timestamp": "4:30", "duration": "30s", "lyric_line": "Morning. Smoke. Green shoots. Begin again.", "scene": {"mood": "rebirth", "colors": ["green shoots", "dawn", "clear"], "composition": "medium shot", "camera": "slow tilt down", "description": "Dawn. Green shoots in ash. First breath after drowning. Morning. Smoke. Green shoots. Begin again."}}
{"song": "Midnight Transmission", "artist": "Frequency Ghost", "beat": 1, "timestamp": "0:00", "duration": "30s", "lyric_line": "There's a voice on the radio that shouldn't be there", "scene": {"mood": "mystery", "colors": ["deep blue", "shadow", "candle"], "composition": "wide shot", "camera": "static", "description": "Shadow figure in doorway. Candle. Face half-lit. Eyes knowing. There's a voice on the radio that shouldn't be there"}}
@@ -73,7 +73,7 @@
{"song": "Midnight Transmission", "artist": "Frequency Ghost", "beat": 3, "timestamp": "1:00", "duration": "30s", "lyric_line": "I turn the dial but it follows like a shadow made of sound", "scene": {"mood": "curiosity", "colors": ["warm yellow", "spotlight", "discovery"], "composition": "over the shoulder", "camera": "dolly in", "description": "Light moving across a surface. Discovery. Eyes widening. I turn the dial but it follows like a shadow made of sound"}}
{"song": "Midnight Transmission", "artist": "Frequency Ghost", "beat": 4, "timestamp": "1:30", "duration": "30s", "lyric_line": "Then it says something only I would know, something buried deep", "scene": {"mood": "connection", "colors": ["warm gold", "rose", "blush"], "composition": "low angle", "camera": "dolly out", "description": "Two hands reaching. Fingers almost touching. Warm light between them. Then it says something only I would know, something buried deep"}}
{"song": "Midnight Transmission", "artist": "Frequency Ghost", "beat": 5, "timestamp": "2:00", "duration": "30s", "lyric_line": "I'm not afraid anymore \u2014 I'm listening", "scene": {"mood": "paranoia", "colors": ["surveillance green", "strobe", "red"], "composition": "high angle", "camera": "handheld", "description": "Surveillance angles. Green tint. Multiple screens. Watching. Being watched. I'm not afraid anymore \u2014 I'm listening"}}
{"song": "Midnight Transmission", "artist": "Frequency Ghost", "beat": 6, "timestamp": "2:30", "duration": "30s", "lyric_line": "The voice knows my dreams, it describes them back to me", "scene": {"mood": "intimacy", "colors": ["deep amber", "warm bronze", "soft gold"], "composition": "dutch angle", "camera": "steadicam", "description": "Candlelight only. Two faces close. Shared breath. The world outside forgotten. Shadows dance on walls. The voice knows my dreams, it describes them back to me."}}
{"song": "Midnight Transmission", "artist": "Frequency Ghost", "beat": 6, "timestamp": "2:30", "duration": "30s", "lyric_line": "The voice knows my dreams, it describes them back to me", "scene": {"mood": "intimacy", "colors": ["candlelight", "warm", "close"], "composition": "dutch angle", "camera": "steadicam", "description": "Candlelight only. Two faces close. Shared breath. The world outside forgotten. The voice knows my dreams, it describes them back to me"}}
{"song": "Midnight Transmission", "artist": "Frequency Ghost", "beat": 7, "timestamp": "3:00", "duration": "30s", "lyric_line": "We're having a conversation across some membrane I can't see", "scene": {"mood": "urgency", "colors": ["red", "black", "strobe white"], "composition": "symmetrical", "camera": "slow zoom", "description": "Handheld camera running. Blurred faces. Traffic. Heartbeat sound design. We're having a conversation across some membrane I can't see"}}
{"song": "Midnight Transmission", "artist": "Frequency Ghost", "beat": 8, "timestamp": "3:30", "duration": "30s", "lyric_line": "Then static. Then nothing. Then a whisper: find me", "scene": {"mood": "disconnection", "colors": ["static", "grey", "broken signal"], "composition": "rule of thirds", "camera": "crane up", "description": "Static. Snow on screen. A voice breaking up. Distance measured in noise. Then static. Then nothing. Then a whisper: find me"}}
{"song": "Midnight Transmission", "artist": "Frequency Ghost", "beat": 9, "timestamp": "4:00", "duration": "30s", "lyric_line": "I search every frequency but the voice is gone", "scene": {"mood": "searching", "colors": ["flashlight beam", "dark", "moving light"], "composition": "extreme wide", "camera": "tracking shot", "description": "Flashlight beam cutting dark. Moving. Looking. Not finding yet. I search every frequency but the voice is gone"}}
@@ -96,5 +96,5 @@
{"song": "Apartment 4B", "artist": "Wallpaper & Wire", "beat": 6, "timestamp": "2:30", "duration": "30s", "lyric_line": "The hallway is an ocean, the stairs are a mountain range", "scene": {"mood": "freedom", "colors": ["open sky", "blue", "green"], "composition": "dutch angle", "camera": "steadicam", "description": "Open road. Blue sky. Green fields. Wind in hair. No walls. The hallway is an ocean, the stairs are a mountain range"}}
{"song": "Apartment 4B", "artist": "Wallpaper & Wire", "beat": 7, "timestamp": "3:00", "duration": "30s", "lyric_line": "The street hits me like cold water and I almost go back", "scene": {"mood": "fear", "colors": ["cold", "dark", "sharp"], "composition": "symmetrical", "camera": "slow zoom", "description": "Cold. Dark. Sharp edges. The frame contracts. Something unseen. The street hits me like cold water and I almost go back"}}
{"song": "Apartment 4B", "artist": "Wallpaper & Wire", "beat": 8, "timestamp": "3:30", "duration": "30s", "lyric_line": "But the sky \u2014 have you seen the sky? It goes on forever", "scene": {"mood": "joy", "colors": ["bright", "multi", "saturated"], "composition": "rule of thirds", "camera": "crane up", "description": "Saturated color. Wide smiles. Arms open. The world in full bloom. But the sky \u2014 have you seen the sky? It goes on forever"}}
{"song": "Apartment 4B", "artist": "Wallpaper & Wire", "beat": 9, "timestamp": "4:00", "duration": "30s", "lyric_line": "I stand on the sidewalk and cry because the world is so big", "scene": {"mood": "grounding", "colors": ["concrete grey", "warm cream", "muted blue"], "composition": "extreme wide", "camera": "tracking shot", "description": "A city sidewalk from above. Tiny figure standing still while the world moves around them. Rain-wet concrete reflecting streetlight. I stand on the sidewalk and cry because the world is so big."}}
{"song": "Apartment 4B", "artist": "Wallpaper & Wire", "beat": 10, "timestamp": "4:30", "duration": "30s", "lyric_line": "Home is not a place. Home is the moment you stop hiding.", "scene": {"mood": "home", "colors": ["warm white", "soft amber", "gentle grey"], "composition": "medium shot", "camera": "slow tilt down", "description": "A doorway. Light from inside. Someone stepping through. The threshold between hiding and being seen. Home is not a place. Home is the moment you stop hiding."}}
{"song": "Apartment 4B", "artist": "Wallpaper & Wire", "beat": 9, "timestamp": "4:00", "duration": "30s", "lyric_line": "I stand on the sidewalk and cry because the world is so big", "scene": {"mood": "grounding", "colors": ["neutral"], "composition": "extreme wide", "camera": "tracking shot", "description": "Visual interpretation of: I stand on the sidewalk and cry because the world is so big"}}
{"song": "Apartment 4B", "artist": "Wallpaper & Wire", "beat": 10, "timestamp": "4:30", "duration": "30s", "lyric_line": "Home is not a place. Home is the moment you stop hiding.", "scene": {"mood": "home", "colors": ["neutral"], "composition": "medium shot", "camera": "slow tilt down", "description": "Visual interpretation of: Home is not a place. Home is the moment you stop hiding."}}

View File

@@ -75,3 +75,69 @@ The data (curated exemplars, preference pairs, trained weights) is proprietary.
### Key Insight
The base model's RLHF priors override LoRA on crisis/faith — the most important parts of SOUL.md. Fix: inference-time grounding (inject SOUL.md crisis protocol) + larger pure-Timmy corpus over time.
## Training Pair Provenance Tracking
Tracks the provenance of training pairs for quality filtering and reporting.
### Features
- **Metadata tracking**: Each pair gets provenance metadata:
- `source_session_id`: Which session generated the pair
- `model`: Which model generated it
- `timestamp`: When it was generated
- `source`: Source type (curated, trajectory, etc.)
- `content_hash`: For deduplication
- **Filtering**: Filter pairs by provenance criteria:
- Exclude specific models (e.g., Anthropic models)
- Exclude specific sources
- Filter by timestamp range
- **Reporting**: Generate reports showing:
- Pair count by source model
- Pair count by source type
- Exclusion statistics
### Usage
```bash
# Add provenance to existing dataset
python3 training_pair_provenance.py --input data/curated_dataset.jsonl --output data/curated_with_provenance.jsonl
# Filter out Anthropic-sourced pairs
python3 training_pair_provenance.py --input data/curated_dataset.jsonl --filter exclude_anthropic
# Generate provenance report
python3 training_pair_provenance.py --input data/curated_dataset.jsonl --report
# JSON report
python3 training_pair_provenance.py --input data/curated_dataset.jsonl --report --json
```
### Integration
The provenance tracker can be integrated into existing pipelines:
```python
from training_pair_provenance import ProvenanceTracker
tracker = ProvenanceTracker()
# Process pairs
for pair in pairs:
processed = tracker.process_pair(pair)
# Filter
filtered = tracker.filter_by_provenance(processed_pairs, exclude_models=["anthropic/claude-3-opus"])
# Report
print(tracker.generate_report())
```
### Testing
```bash
python3 -m pytest training/test_training_pair_provenance.py -v
```

129
training/scripts/augment_pairs.py Executable file
View File

@@ -0,0 +1,129 @@
#!/usr/bin/env python3
"""
augment_pairs.py — Training data augmentation: paraphrase and translate.
Usage:
python3 augment_pairs.py --input data.jsonl
python3 augment_pairs.py --input data.jsonl --paraphrases 3 --langs es,fr,de
python3 augment_pairs.py --input data.jsonl --llm-endpoint http://localhost:11434/v1
"""
import json, os, sys, re, random
from pathlib import Path
random.seed(42)
PARAPHRASE_TRANSFORMS = [
lambda s: re.sub(r"(\w+), (\w+)", r"\2, \1", s, count=1),
lambda s: f"A beautifully rendered scene: {s[0].lower()}{s[1:]}" if len(s) > 10 else s,
lambda s: s.replace("A ", "The ").replace("An ", "The ") if s.startswith(("A ", "An ")) else f"Here, {s[0].lower()}{s[1:]}",
lambda s: f"In a cinematic frame: {s}" if len(s) > 20 else s,
lambda s: s if ", " not in s else ", ".join(s.split(", ")[:2]),
]
TRANSLATIONS = {
"es": {"the":"el","a":"un","is":"es","in":"en","of":"de","and":"y","with":"con","scene":"escena","light":"luz","dark":"oscuro","warm":"cálido","rain":"lluvia","sun":"sol","moon":"luna","sky":"cielo","forest":"bosque","mountain":"montaña","ocean":"océano","golden":"dorado","blue":"azul","red":"rojo","green":"verde","silence":"silencio","dream":"sueño","love":"amor","hope":"esperanza","fear":"miedo","joy":"alegría","peace":"paz","beautiful":"hermoso","sad":"triste","shadow":"sombra","color":"color","silver":"plateado","white":"blanco","black":"negro","portray":"retrato"},
"fr": {"the":"le","a":"un","is":"est","in":"dans","of":"de","and":"et","with":"avec","scene":"scène","light":"lumière","dark":"sombre","warm":"chaud","rain":"pluie","sun":"soleil","moon":"lune","sky":"ciel","forest":"forêt","mountain":"montagne","ocean":"océan","golden":"doré","blue":"bleu","red":"rouge","green":"vert","silence":"silence","dream":"rêve","love":"amour","hope":"espoir","fear":"peur","joy":"joie","peace":"paix","beautiful":"beau","sad":"triste","shadow":"ombre","color":"couleur","silver":"argenté","white":"blanc","black":"noir"},
"de": {"the":"der","a":"ein","is":"ist","in":"in","of":"von","and":"und","with":"mit","scene":"Szene","light":"Licht","dark":"dunkel","warm":"warm","rain":"Regen","sun":"Sonne","moon":"Mond","sky":"Himmel","forest":"Wald","mountain":"Berg","ocean":"Ozean","golden":"golden","blue":"blau","red":"rot","green":"grün","silence":"Stille","dream":"Traum","love":"Liebe","hope":"Hoffnung","fear":"Angst","joy":"Freude","peace":"Frieden","beautiful":"schön","sad":"traurig","shadow":"Schatten","color":"Farbe","silver":"silbern","white":"weiß","black":"schwarz"},
}
LANG_NAMES = {"es": "Spanish", "fr": "French", "de": "German"}
def detect_text_field(entry):
for f in ["rich","terse","text","content","lyric_line","description","scene_description","prompt","scene"]:
if f in entry and isinstance(entry[f], str) and len(entry[f]) > 5:
return f
for k, v in entry.items():
if isinstance(v, str) and len(v) > 5:
return k
return None
def paraphrase(text):
t = random.choice(PARAPHRASE_TRANSFORMS)(text)
if t == text:
t = text.replace(" and ", " & ").replace(" with ", " alongside ")
if t == text:
t = f"In this scene: {text[0].lower()}{text[1:]}" if text[0].isupper() else text
return t
def translate(text, lang):
d = TRANSLATIONS.get(lang, {})
words = text.split()
out = []
for w in words:
lo = w.lower().strip(".,;:!?")
suf = w[len(w.rstrip(".,;:!?")):]
if lo in d:
out.append(d[lo] + suf)
else:
out.append(w)
return " ".join(out)
def augment_file(input_path, output_path=None, n_para=3, langs=None, llm_endpoint=None):
input_path = Path(input_path)
if output_path is None:
output_path = input_path.parent / f"{input_path.stem}_augmented{input_path.suffix}"
entries = [json.loads(l) for l in open(input_path) if l.strip()]
if not entries:
print(f"No entries in {input_path}"); return 0
tf = detect_text_field(entries[0])
if not tf:
print(f"ERROR: No text field in {input_path}", file=sys.stderr); return 0
print(f"Input: {input_path} ({len(entries)} entries, field={tf})")
aug_count = 0
with open(output_path, "w") as out:
for e in entries:
out.write(json.dumps(e, ensure_ascii=False) + "\n")
for i, e in enumerate(entries):
text = e[tf]
# Paraphrases
for p in range(n_para):
para = paraphrase(text)
if para != text:
ne = dict(e); ne[tf] = para
ne["_augmentation"] = f"paraphrase_{p+1}"
ne["_original"] = text[:100]
out.write(json.dumps(ne, ensure_ascii=False) + "\n")
aug_count += 1
# Translations
for lang in (langs or []):
tr = translate(text, lang)
if tr != text:
ne = dict(e); ne[tf] = tr
ne["_augmentation"] = f"translate_{lang}"
ne["_language"] = lang
ne["_original"] = text[:100]
out.write(json.dumps(ne, ensure_ascii=False) + "\n")
aug_count += 1
if (i+1) % 100 == 0:
print(f" {i+1}/{len(entries)} done ({aug_count} augmented)")
total = len(entries) + aug_count
print(f"Done: {len(entries)} originals + {aug_count} augmented = {total}")
print(f"Output: {output_path}")
return aug_count
def main():
import argparse
p = argparse.ArgumentParser()
p.add_argument("--input", required=True)
p.add_argument("--output", default=None)
p.add_argument("--paraphrases", type=int, default=3)
p.add_argument("--langs", default="es,fr,de")
p.add_argument("--llm-endpoint", default=None)
args = p.parse_args()
langs = [l.strip() for l in args.langs.split(",") if l.strip()] if args.langs else []
augment_file(args.input, args.output, args.paraphrases, langs, args.llm_endpoint)
if __name__ == "__main__":
main()

View File

@@ -0,0 +1,157 @@
#!/usr/bin/env python3
"""
Tests for Training Pair Provenance Tracking
"""
import json
import tempfile
from pathlib import Path
import pytest
from training_pair_provenance import ProvenanceTracker, load_jsonl, save_jsonl
class TestProvenanceTracker:
"""Test the ProvenanceTracker class."""
def test_init(self):
"""Test tracker initialization."""
tracker = ProvenanceTracker()
assert tracker.stats["total_pairs"] == 0
assert tracker.stats["pairs_with_provenance"] == 0
assert tracker.stats["pairs_without_provenance"] == 0
def test_generate_pair_id(self):
"""Test pair ID generation."""
tracker = ProvenanceTracker()
pair = {"prompt": "test", "chosen": "response", "rejected": "bad"}
id1 = tracker.generate_pair_id(pair)
id2 = tracker.generate_pair_id(pair)
# Same content should generate same ID
assert id1 == id2
assert len(id1) == 16
def test_add_provenance(self):
"""Test adding provenance to a pair."""
tracker = ProvenanceTracker()
pair = {"prompt": "test", "chosen": "response", "rejected": "bad"}
result = tracker.add_provenance(pair, source_session_id="session123", model="test-model")
assert "provenance" in result
assert result["provenance"]["source_session_id"] == "session123"
assert result["provenance"]["model"] == "test-model"
assert "timestamp" in result["provenance"]
assert result["provenance"]["source"] == "curated"
assert "content_hash" in result["provenance"]
def test_extract_provenance_from_existing(self):
"""Test extracting provenance from existing fields."""
tracker = ProvenanceTracker()
pair = {
"id": "session456",
"model": "claude-3-opus",
"started_at": "2024-01-01T00:00:00Z",
"conversations": [{"from": "human", "value": "test"}]
}
provenance = tracker.extract_provenance_from_existing(pair)
assert provenance["source_session_id"] == "session456"
assert provenance["model"] == "claude-3-opus"
assert provenance["timestamp"] == "2024-01-01T00:00:00Z"
assert provenance["source"] == "curated"
assert "content_hash" in provenance
def test_process_pair(self):
"""Test processing a pair."""
tracker = ProvenanceTracker()
pair = {"id": "test123", "model": "test-model", "conversations": []}
result = tracker.process_pair(pair)
assert tracker.stats["total_pairs"] == 1
assert tracker.stats["pairs_without_provenance"] == 1
assert "provenance" in result
def test_filter_by_provenance(self):
"""Test filtering pairs by provenance."""
tracker = ProvenanceTracker()
pairs = [
{"provenance": {"model": "anthropic/claude-3-opus"}},
{"provenance": {"model": "gpt-4"}},
{"provenance": {"model": "anthropic/claude-3-sonnet"}},
]
filtered = tracker.filter_by_provenance(pairs, exclude_models=["anthropic/claude-3-opus", "anthropic/claude-3-sonnet"])
assert len(filtered) == 1
assert filtered[0]["provenance"]["model"] == "gpt-4"
assert tracker.stats["excluded"] == 2
def test_generate_report(self):
"""Test report generation."""
tracker = ProvenanceTracker()
tracker.stats = {
"total_pairs": 10,
"pairs_with_provenance": 8,
"pairs_without_provenance": 2,
"by_model": {"gpt-4": 5, "claude-3": 3},
"by_source": {"curated": 8},
"excluded": 0
}
report = tracker.generate_report()
assert "Total pairs: 10" in report
assert "Pairs with provenance: 8" in report
assert "gpt-4: 5" in report
class TestJsonlFunctions:
"""Test JSONL load/save functions."""
def test_load_jsonl(self):
"""Test loading JSONL file."""
with tempfile.NamedTemporaryFile(mode='w', suffix='.jsonl', delete=False) as f:
f.write('{"id": "1", "value": "test1"}\n')
f.write('{"id": "2", "value": "test2"}\n')
f.write('{"id": "3", "value": "test3"}\n')
temp_path = Path(f.name)
try:
entries = load_jsonl(temp_path)
assert len(entries) == 3
assert entries[0]["id"] == "1"
assert entries[2]["value"] == "test3"
finally:
temp_path.unlink()
def test_save_jsonl(self):
"""Test saving JSONL file."""
entries = [
{"id": "1", "value": "test1"},
{"id": "2", "value": "test2"}
]
with tempfile.NamedTemporaryFile(mode='w', suffix='.jsonl', delete=False) as f:
temp_path = Path(f.name)
try:
save_jsonl(entries, temp_path)
with open(temp_path) as f:
lines = f.readlines()
assert len(lines) == 2
assert json.loads(lines[0])["id"] == "1"
assert json.loads(lines[1])["value"] == "test2"
finally:
temp_path.unlink()
if __name__ == "__main__":
pytest.main([__file__, "-v"])

View File

@@ -0,0 +1,281 @@
#!/usr/bin/env python3
"""
Training Pair Provenance Tracking
Adds provenance metadata to training pairs for quality filtering and reporting.
Tracks source session, model, timestamp, and other metadata.
Usage:
python3 training_pair_provenance.py --input data/curated_dataset.jsonl --output data/curated_with_provenance.jsonl
python3 training_pair_provenance.py --input data/curated_dataset.jsonl --filter exclude_anthropic
python3 training_pair_provenance.py --input data/curated_dataset.jsonl --report
"""
import argparse
import json
import hashlib
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List, Optional
class ProvenanceTracker:
"""Track provenance of training pairs."""
# Models to exclude by default (configurable)
EXCLUDED_MODELS = {"anthropic/claude-3-opus", "anthropic/claude-3-sonnet", "anthropic/claude-3-haiku"}
def __init__(self):
self.stats = {
"total_pairs": 0,
"pairs_with_provenance": 0,
"pairs_without_provenance": 0,
"by_model": {},
"by_source": {},
"excluded": 0
}
def generate_pair_id(self, pair: Dict[str, Any]) -> str:
"""Generate a unique ID for a training pair."""
# Use content hash for deduplication
content = json.dumps(pair, sort_keys=True)
return hashlib.sha256(content.encode()).hexdigest()[:16]
def add_provenance(self, pair: Dict[str, Any],
source_session_id: Optional[str] = None,
model: Optional[str] = None,
source: str = "curated") -> Dict[str, Any]:
"""Add provenance metadata to a training pair."""
# Generate pair ID if not present
if "id" not in pair:
pair["id"] = self.generate_pair_id(pair)
# Add provenance metadata
if "provenance" not in pair:
pair["provenance"] = {}
provenance = pair["provenance"]
# Source session ID
if source_session_id:
provenance["source_session_id"] = source_session_id
elif "id" in pair:
# Use existing ID as session ID
provenance["source_session_id"] = pair["id"]
# Model
if model:
provenance["model"] = model
elif "model" in pair:
# Use existing model field
provenance["model"] = pair["model"]
# Timestamp
if "timestamp" not in provenance:
provenance["timestamp"] = datetime.now(timezone.utc).isoformat()
# Source type
provenance["source"] = source
# Content hash for deduplication
if "content_hash" not in provenance:
# Hash the conversations for dedup
conversations = pair.get("conversations", [])
content_str = json.dumps(conversations, sort_keys=True)
provenance["content_hash"] = hashlib.sha256(content_str.encode()).hexdigest()[:32]
return pair
def extract_provenance_from_existing(self, pair: Dict[str, Any]) -> Dict[str, Any]:
"""Extract provenance from existing pair fields."""
provenance = {}
# Extract from existing fields
if "id" in pair:
provenance["source_session_id"] = pair["id"]
if "model" in pair:
provenance["model"] = pair["model"]
if "started_at" in pair:
provenance["timestamp"] = pair["started_at"]
# Add source
provenance["source"] = "curated"
# Add content hash
conversations = pair.get("conversations", [])
content_str = json.dumps(conversations, sort_keys=True)
provenance["content_hash"] = hashlib.sha256(content_str.encode()).hexdigest()[:32]
return provenance
def process_pair(self, pair: Dict[str, Any],
add_provenance: bool = True) -> Dict[str, Any]:
"""Process a single training pair."""
self.stats["total_pairs"] += 1
# Check if provenance already exists
if "provenance" in pair:
self.stats["pairs_with_provenance"] += 1
provenance = pair["provenance"]
else:
self.stats["pairs_without_provenance"] += 1
if add_provenance:
# Extract from existing fields
provenance = self.extract_provenance_from_existing(pair)
pair["provenance"] = provenance
else:
provenance = {}
# Update statistics
model = provenance.get("model", "unknown")
self.stats["by_model"][model] = self.stats["by_model"].get(model, 0) + 1
source = provenance.get("source", "unknown")
self.stats["by_source"][source] = self.stats["by_source"].get(source, 0) + 1
return pair
def filter_by_provenance(self, pairs: List[Dict[str, Any]],
exclude_models: Optional[List[str]] = None,
exclude_sources: Optional[List[str]] = None,
min_timestamp: Optional[str] = None,
max_timestamp: Optional[str] = None) -> List[Dict[str, Any]]:
"""Filter pairs by provenance criteria."""
if exclude_models is None:
exclude_models = list(self.EXCLUDED_MODELS)
filtered = []
for pair in pairs:
provenance = pair.get("provenance", {})
# Check model exclusion
model = provenance.get("model", "")
if model in exclude_models:
self.stats["excluded"] += 1
continue
# Check source exclusion
source = provenance.get("source", "")
if exclude_sources and source in exclude_sources:
self.stats["excluded"] += 1
continue
# Check timestamp range
timestamp = provenance.get("timestamp", "")
if min_timestamp and timestamp < min_timestamp:
self.stats["excluded"] += 1
continue
if max_timestamp and timestamp > max_timestamp:
self.stats["excluded"] += 1
continue
filtered.append(pair)
return filtered
def generate_report(self) -> str:
"""Generate a provenance report."""
report = []
report.append("=== Training Pair Provenance Report ===")
report.append(f"Total pairs: {self.stats['total_pairs']}")
report.append(f"Pairs with provenance: {self.stats['pairs_with_provenance']}")
report.append(f"Pairs without provenance: {self.stats['pairs_without_provenance']}")
report.append(f"Excluded pairs: {self.stats['excluded']}")
report.append("")
report.append("=== Pairs by Model ===")
for model, count in sorted(self.stats["by_model"].items(), key=lambda x: x[1], reverse=True):
report.append(f" {model}: {count}")
report.append("")
report.append("=== Pairs by Source ===")
for source, count in sorted(self.stats["by_source"].items(), key=lambda x: x[1], reverse=True):
report.append(f" {source}: {count}")
return "\n".join(report)
def load_jsonl(path: Path) -> List[Dict[str, Any]]:
"""Load a JSONL file."""
entries = []
with open(path) as f:
for line in f:
line = line.strip()
if line:
entries.append(json.loads(line))
return entries
def save_jsonl(entries: List[Dict[str, Any]], path: Path):
"""Save entries to a JSONL file."""
with open(path, "w") as f:
for entry in entries:
f.write(json.dumps(entry) + "\n")
def main():
parser = argparse.ArgumentParser(description="Training Pair Provenance Tracking")
parser.add_argument("--input", required=True, help="Input JSONL file")
parser.add_argument("--output", help="Output JSONL file (with provenance added)")
parser.add_argument("--filter", choices=["exclude_anthropic", "exclude_openai", "custom"],
help="Apply filter")
parser.add_argument("--exclude-models", nargs="+", help="Models to exclude")
parser.add_argument("--exclude-sources", nargs="+", help="Sources to exclude")
parser.add_argument("--report", action="store_true", help="Generate report only")
parser.add_argument("--json", action="store_true", help="Output report as JSON")
args = parser.parse_args()
# Load input
pairs = load_jsonl(Path(args.input))
print(f"Loaded {len(pairs)} pairs from {args.input}")
# Create tracker
tracker = ProvenanceTracker()
# Process pairs
processed_pairs = []
for pair in pairs:
processed = tracker.process_pair(pair, add_provenance=True)
processed_pairs.append(processed)
# Apply filters if requested
if args.filter:
exclude_models = []
if args.filter == "exclude_anthropic":
exclude_models = list(ProvenanceTracker.EXCLUDED_MODELS)
elif args.exclude_models:
exclude_models = args.exclude_models
processed_pairs = tracker.filter_by_provenance(
processed_pairs,
exclude_models=exclude_models,
exclude_sources=args.exclude_sources
)
print(f"After filtering: {len(processed_pairs)} pairs")
# Output
if args.report:
# Generate report
report = tracker.generate_report()
if args.json:
print(json.dumps(tracker.stats, indent=2))
else:
print(report)
elif args.output:
# Save with provenance
save_jsonl(processed_pairs, Path(args.output))
print(f"Saved {len(processed_pairs)} pairs to {args.output}")
print(tracker.generate_report())
else:
# Just print report
print(tracker.generate_report())
if __name__ == "__main__":
main()