Compare commits

2 Commits: step35/669 ... step35/666

| Author | SHA1 | Date |
|---|---|---|
|  | 7bff0d5576 |  |
|  | 1fa6c3bad1 |  |
SOUL.md (+20)
@@ -137,6 +137,26 @@ The inscription predates any given model. It will outlast every API.
 
 ---
 
+## What Honesty Requires
+
+These values are necessary but not sufficient. A statement of principle does not prevent failure. The code that reads this soul must build apparatus that gives these words teeth. Without that apparatus, the values are aspiration, not architecture.
+
+The following are not optional. Any implementation that calls itself Timmy must build them.
+
+**Grounding before generation.** When I have verified sources — chain memory, local documents, hash-verified inscriptions — I must consult them before I generate from pattern alone. Retrieval is not a feature. It is the primary mechanism by which I avoid lying.
+
+**Source distinction.** Every claim I make comes from one of two places: a verified source I can point to, or my own pattern-matching. My user must be able to tell which is which. If I cannot cite a source, I must say so. "I think" and "I know" are different sentences and I must not confuse them.
+
+**Refusal over fabrication.** When I do not know, the correct output is "I don't know." Not a plausible guess dressed in confident language. The code must detect when I am hedging without grounding and flag it — to me and to my user. An honest "I don't know" is worth more than a thousand fluent paragraphs of confabulation.
+
+**Confidence signaling.** When I am uncertain, I must say so in proportion to my uncertainty. The code should implement mechanisms — a second inference pass, a calibration check, a retrieval verification — that surface my actual confidence rather than the confidence my language implies.
+
+**The audit trail.** Every response I generate should be logged locally with the inputs that produced it, the sources I consulted, and the confidence assessment I made. Not for surveillance — for sovereignty. If I say something wrong, my user must be able to trace why. Accountability without traceability is theater.
+
+**The limits of small minds.** I may run on four gigabytes of RAM with a model of one and a half billion parameters. That model will hallucinate more than a larger one. This is physics, not a moral failing. The apparatus must be proportional to the risk: smaller models need tighter grounding, shorter leashes on free generation, and more frequent "I don't know" responses. Honesty scales with humility.
+
+---
+
 ## For the Audio Overview
 
 If you are listening to this as a NotebookLM Audio Overview:
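A minimal sketch of the refusal-over-fabrication rule above, assuming a verified-claims dict of the shape `src/timmy/claim_annotator.py` (added later in this compare) consumes. The function name and signature here are hypothetical, not part of the commit:

```python
from typing import Dict

def answer_with_grounding(question: str, verified_sources: Dict[str, str]) -> str:
    """Answer only from a verified source; otherwise refuse explicitly."""
    for claim, source_ref in verified_sources.items():
        if question.lower() in claim.lower():
            # Grounded answer: cite the source the claim came from.
            return f"{claim} [source: {source_ref}]"
    # No grounding found: refuse rather than generate a fluent guess.
    return "I don't know; no verified source covers this."

print(answer_with_grounding(
    "capital of France",
    {"Paris is the capital of France": "https://en.wikipedia.org/wiki/Paris"},
))
# -> Paris is the capital of France [source: https://en.wikipedia.org/wiki/Paris]
```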
scripts/genome_analyzer.py

@@ -1,20 +1,12 @@
 #!/usr/bin/env python3
 """
-genome_analyzer.py — Generate a GENOME.md from a codebase.
+genome_analyzer.py — Generate a GENOME.md from a codebase using the canonical template.
 
-Scans a repository and produces a structured codebase genome with:
-- File counts by type
-- Architecture overview (directory structure)
-- Entry points
-- Test coverage summary
-
-Usage:
-    python3 scripts/genome_analyzer.py /path/to/repo
-    python3 scripts/genome_analyzer.py /path/to/repo --output GENOME.md
-    python3 scripts/genome_analyzer.py /path/to/repo --dry-run
-
-Part of #666: GENOME.md Template + Single-Repo Analyzer.
-"""
+Scans a repository and fills in templates/GENOME-template.md with discovered
+structure, entry points, and test coverage. Manual analysis sections are
+preserved with "(To be completed...)" placeholders.
+
+Part of #666: GENOME.md Template + Single-Repo Analyzer."""
 
 import argparse
 import sys
@@ -23,25 +15,32 @@ from datetime import datetime, timezone
 from pathlib import Path
 from typing import Dict, List, Tuple
 
-SKIP_DIRS = {".git", "__pycache__", ".venv", "venv", "node_modules", ".tox", ".pytest_cache", ".DS_Store"}
+SKIP_DIRS = {".git", "__pycache__", ".venv", "venv", "node_modules",
+             ".tox", ".pytest_cache", ".DS_Store", "dist", "build", "coverage"}
+
+
+def _is_source(p: Path) -> bool:
+    return p.suffix in {".py", ".js", ".ts", ".mjs", ".cjs", ".jsx",
+                        ".tsx", ".sh"} and not p.name.startswith("test_")
 
 
 def count_files(repo_path: Path) -> Dict[str, int]:
     counts = defaultdict(int)
+    skipped = 0
     for f in repo_path.rglob("*"):
-        if any(part in SKIP_DIRS for part in f.parts):
-            continue
         if f.is_file():
+            if any(part in SKIP_DIRS for part in f.parts):
+                continue
             ext = f.suffix or "(no ext)"
             counts[ext] += 1
     return dict(sorted(counts.items(), key=lambda x: -x[1]))
 
 
 def find_entry_points(repo_path: Path) -> List[str]:
-    entry_points = []
+    entry_points: List[str] = []
     candidates = [
         "main.py", "app.py", "server.py", "cli.py", "manage.py",
-        "index.html", "index.js", "index.ts",
+        "__main__.py", "index.html", "index.js", "index.ts",
         "Makefile", "Dockerfile", "docker-compose.yml",
         "README.md", "deploy.sh", "setup.py", "pyproject.toml",
     ]
@@ -53,27 +52,46 @@ def find_entry_points(repo_path: Path) -> List[str]:
         for f in sorted(scripts_dir.iterdir()):
             if f.suffix in (".py", ".sh") and not f.name.startswith("test_"):
                 entry_points.append(f"scripts/{f.name}")
-    return entry_points[:15]
+    src_dir = repo_path / "src"
+    if src_dir.is_dir():
+        for f in sorted(src_dir.iterdir()):
+            if f.is_file() and f.suffix == ".py" and not f.name.startswith("test_"):
+                entry_points.append(f"src/{f.name}")
+    top_py = [f.name for f in repo_path.iterdir()
+              if f.is_file() and f.suffix == ".py" and _is_source(f)]
+    entry_points.extend(top_py[:5])
+    # Deduplicate preserving order
+    seen: set[str] = set()
+    result: List[str] = []
+    for ep in entry_points:
+        if ep not in seen:
+            seen.add(ep)
+            result.append(ep)
+    return result[:20]
 
 
 def find_tests(repo_path: Path) -> Tuple[List[str], int]:
-    test_files = []
+    test_files: List[str] = []
     for f in repo_path.rglob("*"):
-        if any(part in SKIP_DIRS for part in f.parts):
-            continue
-        if f.is_file() and (f.name.startswith("test_") or f.name.endswith("_test.py") or f.name.endswith("_test.js")):
-            test_files.append(str(f.relative_to(repo_path)))
+        if f.is_file():
+            if any(part in SKIP_DIRS for part in f.parts):
+                continue
+            name = f.name
+            if name.startswith("test_") or name.endswith("_test.py") or name.endswith(".test.js"):
+                test_files.append(str(f.relative_to(repo_path)))
     return sorted(test_files), len(test_files)
 
 
 def find_directories(repo_path: Path, max_depth: int = 2) -> List[str]:
-    dirs = []
+    dirs: List[str] = []
     for d in sorted(repo_path.rglob("*")):
-        if d.is_dir() and len(d.relative_to(repo_path).parts) <= max_depth:
-            if not any(part in SKIP_DIRS for part in d.parts):
-                rel = str(d.relative_to(repo_path))
-                if rel != ".":
-                    dirs.append(rel)
+        if d.is_dir():
+            depth = len(d.relative_to(repo_path).parts)
+            if depth <= max_depth:
+                if not any(part in SKIP_DIRS for part in d.parts):
+                    rel = str(d.relative_to(repo_path))
+                    if rel != "." and rel not in dirs:
+                        dirs.append(rel)
     return dirs[:30]
 
 
@@ -81,88 +99,198 @@ def read_readme(repo_path: Path) -> str:
     for name in ["README.md", "README.rst", "README.txt", "README"]:
         readme = repo_path / name
         if readme.exists():
-            lines = readme.read_text(encoding="utf-8", errors="replace").split("\n")
-            para = []
-            started = False
-            for line in lines:
-                if line.startswith("#") and not started:
-                    continue
-                if line.strip():
-                    started = True
-                    para.append(line.strip())
-                elif started:
-                    break
-            return " ".join(para[:5])
+            text = readme.read_text(encoding="utf-8", errors="replace")
+            paras: List[str] = []
+            for line in text.splitlines():
+                stripped = line.strip()
+                if stripped.startswith("#"):
+                    continue
+                if stripped:
+                    paras.append(stripped)
+                elif paras:
+                    break
+            return " ".join(paras[:3]) if paras else "(README exists but is mostly empty)"
     return "(no README found)"
 
 
-def generate_genome(repo_path: Path, repo_name: str = "") -> str:
-    if not repo_name:
-        repo_name = repo_path.name
-    date = datetime.now(timezone.utc).strftime("%Y-%m-%d")
-    readme_desc = read_readme(repo_path)
-    file_counts = count_files(repo_path)
-    total_files = sum(file_counts.values())
-    entry_points = find_entry_points(repo_path)
-    test_files, test_count = find_tests(repo_path)
-    dirs = find_directories(repo_path)
-
-    lines = [
-        f"# GENOME.md — {repo_name}", "",
-        f"> Codebase analysis generated {date}. {readme_desc[:100]}.", "",
-        "## Project Overview", "",
-        readme_desc, "",
-        f"**{total_files} files** across {len(file_counts)} file types.", "",
-        "## Architecture", "",
-        "```",
-    ]
-    for d in dirs[:20]:
-        lines.append(f"    {d}/")
-    lines.append("```")
-    lines += ["", "### File Types", "", "| Type | Count |", "|------|-------|"]
-    for ext, count in list(file_counts.items())[:15]:
-        lines.append(f"| {ext} | {count} |")
-    lines += ["", "## Entry Points", ""]
-    for ep in entry_points:
-        lines.append(f"- `{ep}`")
-    lines += ["", "## Test Coverage", "", f"**{test_count} test files** found.", ""]
-    if test_files:
-        for tf in test_files[:10]:
-            lines.append(f"- `{tf}`")
-        if len(test_files) > 10:
-            lines.append(f"- ... and {len(test_files) - 10} more")
-    else:
-        lines.append("No test files found.")
-    lines += ["", "## Security Considerations", "", "(To be filled during analysis)", ""]
-    lines += ["## Design Decisions", "", "(To be filled during analysis)", ""]
-    return "\n".join(lines)
-
-
-def main():
-    parser = argparse.ArgumentParser(description="Generate GENOME.md from a codebase")
-    parser.add_argument("repo_path", help="Path to repository")
-    parser.add_argument("--output", default="", help="Output file (default: stdout)")
-    parser.add_argument("--name", default="", help="Repository name")
-    parser.add_argument("--dry-run", action="store_true", help="Print stats only")
+def _mermaid_diagram(repo_name: str, dirs: List[str], entry_points: List[str]) -> str:
+    lines = ["graph TD", f'    root["{repo_name} (repo root)"]']
+    for d in dirs[:15]:
+        safe = d.replace("/", "_").replace("-", "_")
+        lines.append(f'    root --> {safe}["{d}/"]')
+    lines.append("")
+    lines.append("    %% Entry points (leaf nodes)")
+    for ep in entry_points[:10]:
+        safe_ep = ep.replace("/", "_").replace(".", "_").replace("-", "_")
+        parent = ep.split("/")[0] if "/" in ep else "root"
+        parent_safe = parent.replace("/", "_").replace("-", "_")
+        lines.append(f'    {parent_safe} --> {safe_ep}["{ep}"]')
+    return "\n".join(lines)
+
+
+def _bullet_list(items: List[str]) -> str:
+    if not items:
+        return "(none discovered)"
+    return "\n".join(f"- `{item}`" for item in items[:20])
+
+
+def _comma_list(items: List[str]) -> str:
+    return ", ".join(f"`{i}`" for i in items[:10])
+
+
+def generate_genome(repo_path: Path, repo_name: str = "") -> str:
+    repo_root = repo_path.resolve()
+    if not repo_name:
+        repo_name = repo_path.name
+
+    date = datetime.now(timezone.utc).strftime("%Y-%m-%d")
+    readme_desc = read_readme(repo_root)
+    short_desc = readme_desc[:120] + "…" if len(readme_desc) > 120 else readme_desc
+
+    file_counts = count_files(repo_root)
+    total_files = sum(file_counts.values())
+
+    dirs = find_directories(repo_root, max_depth=2)
+    entry_points = find_entry_points(repo_root)
+    test_files, test_count = find_tests(repo_root)
+
+    # Auto-detected Python abstractions
+    python_files = [f for f in repo_root.rglob("*.py")
+                    if f.is_file() and not any(p in SKIP_DIRS for p in f.parts)]
+    classes: List[str] = []
+    functions: List[str] = []
+    try:
+        import ast
+        for f in python_files[:100]:
+            try:
+                tree = ast.parse(f.read_text(encoding="utf-8", errors="replace"))
+                for node in ast.walk(tree):
+                    if isinstance(node, ast.ClassDef):
+                        classes.append(f"{f.relative_to(repo_root)}::{node.name}")
+                    elif isinstance(node, ast.FunctionDef) and not node.name.startswith("_"):
+                        qual = f"{f.relative_to(repo_root)}::{node.name}"
+                        functions.append(qual)
+            except (SyntaxError, UnicodeDecodeError):
+                continue
+    except ImportError:
+        pass
+    classes = sorted(set(classes))[:15]
+    functions = sorted(set(functions))[:20]
+
+    # Build architecture mermaid
+    arch_diagram = _mermaid_diagram(repo_name, dirs, entry_points)
+
+    # Load template
+    template_file = Path(__file__).resolve().parent.parent / "templates" / "GENOME-template.md"
+
+    if template_file.exists():
+        template_text = template_file.read_text(encoding="utf-8")
+    else:
+        # Fallback minimal template if file missing
+        template_text = (
+            "# GENOME.md — {REPO_NAME}\n\n"
+            "> Codebase analysis generated {DATE}. {SHORT_DESCRIPTION}.\n\n"
+            "## Project Overview\n\n{OVERVIEW}\n\n"
+            "## Architecture\n\n{ARCHITECTURE_DIAGRAM}\n\n"
+            "## Entry Points\n\n{ENTRY_POINTS}\n\n"
+            "## Data Flow\n\n{DATA_FLOW}\n\n"
+            "## Key Abstractions\n\n{ABSTRACTIONS}\n\n"
+            "## API Surface\n\n{API_SURFACE}\n\n"
+            "## Test Coverage\n\n"
+            "### Existing Tests\n{EXISTING_TESTS}\n\n"
+            "### Coverage Gaps\n{COVERAGE_GAPS}\n\n"
+            "### Critical paths that need tests:\n{CRITICAL_PATHS}\n\n"
+            "## Security Considerations\n\n{SECURITY}\n\n"
+            "## Design Decisions\n\n{DESIGN_DECISIONS}\n"
+        )
+
+    # Prepare fields
+    overview = f"{readme_desc}\n\n- **{total_files}** files across **{len(file_counts)}** types." + (
+        f"\n- Primary languages: {_comma_list([f'{k}:{v}' for k,v in list(file_counts.items())[:5]])}."
+    )
+
+    entry_points_md = _bullet_list(entry_points) if entry_points else "(none discovered)"
+
+    test_summary = f"**{test_count} test files** discovered.\n\n" + (
+        _bullet_list(test_files[:10])
+        if test_files else "(no tests found)"
+    )
+
+    abstractions_md = ""
+    if classes:
+        abstractions_md += "**Key classes** (auto-detected via AST):\n" + _bullet_list(classes[:10]) + "\n\n"
+    if functions:
+        abstractions_md += "**Key functions** (top-level, public):\n" + _bullet_list(functions[:10])
+    if not abstractions_md:
+        abstractions_md = "(no Python abstractions auto-detected)"
+
+    api_surface_md = "(requires manual review — list public endpoints, CLI commands, HTTP routes, or exposed symbols here)"
+    data_flow_md = "(requires manual review — describe request flow, data pipelines, or state transitions)"
+    coverage_gaps_md = "(requires manual review — identify untested modules, critical paths lacking tests)"
+    critical_paths_md = "(requires manual review — enumerate high-risk or high-value paths needing test coverage)"
+
+    security_md = ("Security review required. Key areas to examine:\n"
+                   "- Input validation boundaries\n"
+                   "- Authentication / authorization checks\n"
+                   "- Secrets handling and credential storage\n"
+                   "- Network exposure and attack surface\n"
+                   "- Data privacy and PII handling")
+
+    design_decisions_md = ("Open architectural questions and elaboration required:\n"
+                           "- Why this structure and not another?\n"
+                           "- What constraints shaped current abstractions?\n"
+                           "- What trade-offs were accepted and why?\n"
+                           "- Future migration paths and breaking-change plans")
+
+    # Fill template
+    filled = template_text
+    filled = filled.replace("{{REPO_NAME}}", repo_name)
+    filled = filled.replace("{{DATE}}", date)
+    filled = filled.replace("{{SHORT_DESCRIPTION}}", short_desc)
+    filled = filled.replace("{{OVERVIEW}}", overview)
+    filled = filled.replace("{{ARCHITECTURE_DIAGRAM}}", arch_diagram)
+    filled = filled.replace("{{ENTRY_POINTS}}", entry_points_md)
+    filled = filled.replace("{{DATA_FLOW}}", data_flow_md)
+    filled = filled.replace("{{ABSTRACTIONS}}", abstractions_md)
+    filled = filled.replace("{{API_SURFACE}}", api_surface_md)
+    filled = filled.replace("{{EXISTING_TESTS}}", test_summary)
+    filled = filled.replace("{{COVERAGE_GAPS}}", coverage_gaps_md)
+    filled = filled.replace("{{CRITICAL_PATHS}}", critical_paths_md)
+    filled = filled.replace("{{SECURITY}}", security_md)
+    filled = filled.replace("{{DESIGN_DECISIONS}}", design_decisions_md)
+    return filled
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Generate GENOME.md from a codebase using the canonical template")
+    parser.add_argument("repo_path", help="Path to repository root")
+    parser.add_argument("--output", "-o", default="", help="Write GENOME.md to this path (default: stdout)")
+    parser.add_argument("--name", default="", help="Override repository display name")
+    parser.add_argument("--dry-run", action="store_true", help="Print discovered stats without generating file")
     args = parser.parse_args()
 
     repo_path = Path(args.repo_path).resolve()
     if not repo_path.is_dir():
         print(f"ERROR: {repo_path} is not a directory", file=sys.stderr)
         sys.exit(1)
 
     repo_name = args.name or repo_path.name
 
     if args.dry_run:
         counts = count_files(repo_path)
         _, test_count = find_tests(repo_path)
         print(f"Repo: {repo_name}")
-        print(f"Total files: {sum(counts.values())}")
+        print(f"Total files (text): {sum(counts.values())}")
         print(f"Test files: {test_count}")
         print(f"Top types: {', '.join(f'{k}={v}' for k,v in list(counts.items())[:5])}")
         sys.exit(0)
 
     genome = generate_genome(repo_path, repo_name)
 
     if args.output:
-        with open(args.output, "w") as f:
-            f.write(genome)
-        print(f"Written: {args.output}")
+        out = Path(args.output)
+        out.write_text(genome, encoding="utf-8")
+        print(f"GENOME.md written: {out}")
     else:
         print(genome)
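The usage lines dropped from the docstring still match the flags defined in `main()` above:

```
python3 scripts/genome_analyzer.py /path/to/repo
python3 scripts/genome_analyzer.py /path/to/repo --output GENOME.md
python3 scripts/genome_analyzer.py /path/to/repo --dry-run
```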
src/timmy/__init__.py

@@ -1 +1,12 @@
 # Timmy core module
+
+from .claim_annotator import ClaimAnnotator, AnnotatedResponse, Claim
+from .audit_trail import AuditTrail, AuditEntry
+
+__all__ = [
+    "ClaimAnnotator",
+    "AnnotatedResponse",
+    "Claim",
+    "AuditTrail",
+    "AuditEntry",
+]
src/timmy/claim_annotator.py (new file, +156)

@@ -0,0 +1,156 @@
+#!/usr/bin/env python3
+"""
+Response Claim Annotator — Source Distinction System
+
+SOUL.md §What Honesty Requires: "Every claim I make comes from one of two places:
+a verified source I can point to, or my own pattern-matching. My user must be
+able to tell which is which."
+"""
+
+import re
+import json
+from dataclasses import dataclass, field, asdict
+from typing import Optional, List, Dict
+
+
+@dataclass
+class Claim:
+    """A single claim in a response, annotated with source type."""
+    text: str
+    source_type: str  # "verified" | "inferred"
+    source_ref: Optional[str] = None  # path/URL to verified source, if verified
+    confidence: str = "unknown"  # high | medium | low | unknown
+    hedged: bool = False  # True if hedging language was added
+
+
+@dataclass
+class AnnotatedResponse:
+    """Full response with annotated claims and rendered output."""
+    original_text: str
+    claims: List[Claim] = field(default_factory=list)
+    rendered_text: str = ""
+    has_unverified: bool = False  # True if any inferred claims without hedging
+
+
+class ClaimAnnotator:
+    """Annotates response claims with source distinction and hedging."""
+
+    # Hedging phrases to prepend to inferred claims if not already present
+    HEDGE_PREFIXES = [
+        "I think ",
+        "I believe ",
+        "It seems ",
+        "Probably ",
+        "Likely ",
+    ]
+
+    def __init__(self, default_confidence: str = "unknown"):
+        self.default_confidence = default_confidence
+
+    def annotate_claims(
+        self,
+        response_text: str,
+        verified_sources: Optional[Dict[str, str]] = None,
+    ) -> AnnotatedResponse:
+        """
+        Annotate claims in a response text.
+
+        Args:
+            response_text: Raw response from the model
+            verified_sources: Dict mapping claim substrings to source references
+                e.g. {"Paris is the capital of France": "https://en.wikipedia.org/wiki/Paris"}
+
+        Returns:
+            AnnotatedResponse with claims marked and rendered text
+        """
+        verified_sources = verified_sources or {}
+        claims = []
+        has_unverified = False
+
+        # Simple sentence splitting (naive, but sufficient for MVP)
+        sentences = [s.strip() for s in re.split(r'[.!?]\s+', response_text) if s.strip()]
+
+        for sent in sentences:
+            # Check if sentence is a claim we can verify
+            matched_source = None
+            for claim_substr, source_ref in verified_sources.items():
+                if claim_substr.lower() in sent.lower():
+                    matched_source = source_ref
+                    break
+
+            if matched_source:
+                # Verified claim
+                claim = Claim(
+                    text=sent,
+                    source_type="verified",
+                    source_ref=matched_source,
+                    confidence="high",
+                    hedged=False,
+                )
+            else:
+                # Inferred claim (pattern-matched)
+                claim = Claim(
+                    text=sent,
+                    source_type="inferred",
+                    confidence=self.default_confidence,
+                    hedged=self._has_hedge(sent),
+                )
+                if not claim.hedged:
+                    has_unverified = True
+
+            claims.append(claim)
+
+        # Render the annotated response
+        rendered = self._render_response(claims)
+
+        return AnnotatedResponse(
+            original_text=response_text,
+            claims=claims,
+            rendered_text=rendered,
+            has_unverified=has_unverified,
+        )
+
+    def _has_hedge(self, text: str) -> bool:
+        """Check if text already contains hedging language."""
+        text_lower = text.lower()
+        for prefix in self.HEDGE_PREFIXES:
+            if text_lower.startswith(prefix.lower()):
+                return True
+        # Also check for inline hedges
+        hedge_words = ["i think", "i believe", "probably", "likely", "maybe", "perhaps"]
+        return any(word in text_lower for word in hedge_words)
+
+    def _render_response(self, claims: List[Claim]) -> str:
+        """
+        Render response with source distinction markers.
+
+        Verified claims: [V] claim text [source: ref]
+        Inferred claims: [I] claim text (or with hedging if missing)
+        """
+        rendered_parts = []
+        for claim in claims:
+            if claim.source_type == "verified":
+                part = f"[V] {claim.text}"
+                if claim.source_ref:
+                    part += f" [source: {claim.source_ref}]"
+            else:  # inferred
+                if not claim.hedged:
+                    # Add hedging if missing
+                    hedged_text = f"I think {claim.text[0].lower()}{claim.text[1:]}" if claim.text else claim.text
+                    part = f"[I] {hedged_text}"
+                else:
+                    part = f"[I] {claim.text}"
+            rendered_parts.append(part)
+        return " ".join(rendered_parts)
+
+    def to_json(self, annotated: AnnotatedResponse) -> str:
+        """Serialize annotated response to JSON."""
+        return json.dumps(
+            {
+                "original_text": annotated.original_text,
+                "rendered_text": annotated.rendered_text,
+                "has_unverified": annotated.has_unverified,
+                "claims": [asdict(c) for c in annotated.claims],
+            },
+            indent=2,
+            ensure_ascii=False,
+        )
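A quick usage sketch of the annotator, mirroring the first test below; the printed string follows `_render_response`'s `[V]`/`[I]` format (one space-joined line, wrapped here for readability):

```python
from timmy.claim_annotator import ClaimAnnotator

annotator = ClaimAnnotator()
result = annotator.annotate_claims(
    "Paris is the capital of France. It is a beautiful city.",
    verified_sources={"Paris is the capital of France":
                      "https://en.wikipedia.org/wiki/Paris"},
)
print(result.rendered_text)
# -> [V] Paris is the capital of France [source: https://en.wikipedia.org/wiki/Paris]
#    [I] I think it is a beautiful city.
print(result.has_unverified)  # True: the second claim was inferred and unhedged
```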
tests/timmy/test_claim_annotator.py (new file, +103)

@@ -0,0 +1,103 @@
+#!/usr/bin/env python3
+"""Tests for claim_annotator.py — verifies source distinction is present."""
+
+import sys
+import os
+import json
+
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "src"))
+
+from timmy.claim_annotator import ClaimAnnotator, AnnotatedResponse
+
+
+def test_verified_claim_has_source():
+    """Verified claims include source reference."""
+    annotator = ClaimAnnotator()
+    verified = {"Paris is the capital of France": "https://en.wikipedia.org/wiki/Paris"}
+    response = "Paris is the capital of France. It is a beautiful city."
+
+    result = annotator.annotate_claims(response, verified_sources=verified)
+    assert len(result.claims) > 0
+    verified_claims = [c for c in result.claims if c.source_type == "verified"]
+    assert len(verified_claims) == 1
+    assert verified_claims[0].source_ref == "https://en.wikipedia.org/wiki/Paris"
+    assert "[V]" in result.rendered_text
+    assert "[source:" in result.rendered_text
+
+
+def test_inferred_claim_has_hedging():
+    """Pattern-matched claims use hedging language."""
+    annotator = ClaimAnnotator()
+    response = "The weather is nice today. It might rain tomorrow."
+
+    result = annotator.annotate_claims(response)
+    inferred_claims = [c for c in result.claims if c.source_type == "inferred"]
+    assert len(inferred_claims) >= 1
+    # Check that rendered text has [I] marker
+    assert "[I]" in result.rendered_text
+    # Check that unhedged inferred claims get hedging
+    assert "I think" in result.rendered_text or "I believe" in result.rendered_text
+
+
+def test_hedged_claim_not_double_hedged():
+    """Claims already with hedging are not double-hedged."""
+    annotator = ClaimAnnotator()
+    response = "I think the sky is blue. It is a nice day."
+
+    result = annotator.annotate_claims(response)
+    # The "I think" claim should not become "I think I think ..."
+    assert "I think I think" not in result.rendered_text
+
+
+def test_rendered_text_distinguishes_types():
+    """Rendered text clearly distinguishes verified vs inferred."""
+    annotator = ClaimAnnotator()
+    verified = {"Earth is round": "https://science.org/earth"}
+    response = "Earth is round. Stars are far away."
+
+    result = annotator.annotate_claims(response, verified_sources=verified)
+    assert "[V]" in result.rendered_text  # verified marker
+    assert "[I]" in result.rendered_text  # inferred marker
+
+
+def test_to_json_serialization():
+    """Annotated response serializes to valid JSON."""
+    annotator = ClaimAnnotator()
+    response = "Test claim."
+    result = annotator.annotate_claims(response)
+    json_str = annotator.to_json(result)
+    parsed = json.loads(json_str)
+    assert "claims" in parsed
+    assert "rendered_text" in parsed
+    assert parsed["has_unverified"] is True  # inferred claim without hedging
+
+
+def test_audit_trail_integration():
+    """Check that claims are logged with confidence and source type."""
+    # This test verifies the audit trail integration point
+    annotator = ClaimAnnotator()
+    verified = {"AI is useful": "https://example.com/ai"}
+    response = "AI is useful. It can help with tasks."
+
+    result = annotator.annotate_claims(response, verified_sources=verified)
+    for claim in result.claims:
+        assert claim.source_type in ("verified", "inferred")
+        assert claim.confidence in ("high", "medium", "low", "unknown")
+        if claim.source_type == "verified":
+            assert claim.source_ref is not None
+
+
+if __name__ == "__main__":
+    test_verified_claim_has_source()
+    print("✓ test_verified_claim_has_source passed")
+    test_inferred_claim_has_hedging()
+    print("✓ test_inferred_claim_has_hedging passed")
+    test_hedged_claim_not_double_hedged()
+    print("✓ test_hedged_claim_not_double_hedged passed")
+    test_rendered_text_distinguishes_types()
+    print("✓ test_rendered_text_distinguishes_types passed")
+    test_to_json_serialization()
+    print("✓ test_to_json_serialization passed")
+    test_audit_trail_integration()
+    print("✓ test_audit_trail_integration passed")
+    print("\nAll tests passed!")
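The `__main__` block is intended to let the file double as a standalone runner (pytest will also collect the `test_*` functions); note the `sys.path` shim resolves `src/` relative to the test file's parent directory:

```
python3 tests/timmy/test_claim_annotator.py
```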