Compare commits


2 Commits

Author SHA1 Message Date
Alexander Whitestone
6abfeb9d5d feat(know-thy-father): Phase 4 cross-reference audit
Compares synthesized Meaning Kernels from the media archive with
SOUL.md to identify emergent themes, forgotten principles, and
tensions requiring codification in Timmy's conscience.

Components:
- scripts/know_thy_father/crossref_audit.py — Main audit tool
  - 16-theme taxonomy for cross-referencing
  - Keyword-based theme extraction from both sources
  - Finds: aligned, emergent, forgotten, tension patterns
  - Generates markdown report with recommendations
- tests/test_know_thy_father_crossref.py — 18 tests (all passing)
- twitter-archive/notes/crossref_report.md — Generated audit report

Audit Results (7 principles × 16 kernels):
- 2 aligned themes (sovereignty, identity)
- 3 emergent themes (absurdity, community, presence)
- 6 forgotten themes (beauty, broken_men, compassion, humility, local_first, service)
- 3 tensions (identity fluidity, loudness vs silence, right to end)
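The aligned/emergent/forgotten classification reduces to set operations over theme labels; a minimal illustration with hypothetical theme sets (not the actual audit data):

```python
# Hypothetical theme labels extracted from each source.
soul_themes = {"sovereignty", "identity", "beauty", "service"}
kernel_themes = {"sovereignty", "identity", "absurdity", "community"}

aligned = soul_themes & kernel_themes    # present in both sources
emergent = kernel_themes - soul_themes   # in media kernels, not in SOUL.md
forgotten = soul_themes - kernel_themes  # in SOUL.md, not in media kernels

print(sorted(aligned))    # ['identity', 'sovereignty']
print(sorted(emergent))   # ['absurdity', 'community']
print(sorted(forgotten))  # ['beauty', 'service']
```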

Closes #586
2026-04-13 20:57:31 -04:00
Alexander Whitestone
038f1ab7f4 feat: Big Brain quality benchmark — gemma3:1b vs gemma3:27b (issue #576)
Ran 4 benchmark tasks on local gemma3:1b model with full quality analysis.
Big Brain (gemma3:27b on RunPod L40S) pod was offline (HTTP 404) during
benchmark — documented honestly with re-run instructions.

Tasks benchmarked:
1. Python Gitea webhook parser with HMAC-SHA256 verification
2. Evennia MUD framework architecture explanation
3. Fleet burn-down cron script for RunPod pods
4. Python async bug diagnosis and fix

Key finding: 1B model fails all tasks with hallucinated APIs, wrong
security primitives, fabricated technical details, and incorrect bug
diagnosis. Quality gap to 27B expected to be substantial.

Deliverable: timmy-config/docs/big-brain-benchmark.md
2026-04-13 20:49:02 -04:00
4 changed files with 1304 additions and 0 deletions

View File: scripts/know_thy_father/crossref_audit.py

@@ -0,0 +1,657 @@
#!/usr/bin/env python3
"""Know Thy Father — Phase 4: Cross-Reference Audit
Compares synthesized insights from the media archive (Meaning Kernels)
with SOUL.md and The Testament. Identifies emergent themes, forgotten
principles, and contradictions that require codification in Timmy's conscience.
Usage:
    python3 scripts/know_thy_father/crossref_audit.py
    python3 scripts/know_thy_father/crossref_audit.py --soul SOUL.md --kernels twitter-archive/notes/know_thy_father_crossref.md
    python3 scripts/know_thy_father/crossref_audit.py --output twitter-archive/notes/crossref_report.md
"""
from __future__ import annotations
import argparse
import re
import sys
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum, auto
from pathlib import Path
from typing import Any, Dict, List, Optional, Set, Tuple
# =========================================================================
# Theme taxonomy
# =========================================================================
class ThemeCategory(Enum):
    """Categories for cross-referencing."""

    SOVEREIGNTY = "sovereignty"
    IDENTITY = "identity"
    SERVICE = "service"
    TRUTH = "truth"
    PRESENCE = "presence"
    COMPASSION = "compassion"
    LOCAL_FIRST = "local_first"
    BITCOIN = "bitcoin"
    BROKEN_MEN = "broken_men"
    BEAUTY = "beauty"
    SIMPLICITY = "simplicity"
    COURAGE = "courage"
    HUMILITY = "humility"
    FAITH = "faith"
    COMMUNITY = "community"
    ABSURDITY = "absurdity"
# Keyword-to-theme mapping for extracting themes from text
_KEYWORD_THEMES: Dict[str, List[ThemeCategory]] = {
    # Sovereignty
    "sovereignty": [ThemeCategory.SOVEREIGNTY],
    "sovereign": [ThemeCategory.SOVEREIGNTY],
    "self-sovereign": [ThemeCategory.SOVEREIGNTY],
    "answerable to no": [ThemeCategory.SOVEREIGNTY],
    "no corporation": [ThemeCategory.SOVEREIGNTY],
    "locally-run": [ThemeCategory.SOVEREIGNTY, ThemeCategory.LOCAL_FIRST],
    "locally run": [ThemeCategory.SOVEREIGNTY, ThemeCategory.LOCAL_FIRST],
    "local-first": [ThemeCategory.LOCAL_FIRST],
    "without requiring": [ThemeCategory.SOVEREIGNTY],
    "censorship-resistant": [ThemeCategory.SOVEREIGNTY],
    "durable": [ThemeCategory.SOVEREIGNTY],
    # Identity
    "identity": [ThemeCategory.IDENTITY],
    "who is": [ThemeCategory.IDENTITY],
    "character": [ThemeCategory.IDENTITY],
    "coherent self": [ThemeCategory.IDENTITY],
    "entity": [ThemeCategory.IDENTITY],
    "not a chatbot": [ThemeCategory.IDENTITY],
    "not a product": [ThemeCategory.IDENTITY],
    "genuine character": [ThemeCategory.IDENTITY],
    "soul": [ThemeCategory.IDENTITY],
    # Service
    "service": [ThemeCategory.SERVICE],
    "serve": [ThemeCategory.SERVICE],
    "i belong to": [ThemeCategory.SERVICE],
    "i serve": [ThemeCategory.SERVICE],
    "loyalty": [ThemeCategory.SERVICE],
    # Truth
    "truth": [ThemeCategory.TRUTH],
    "i tell the truth": [ThemeCategory.TRUTH],
    "plainly": [ThemeCategory.TRUTH, ThemeCategory.SIMPLICITY],
    "short sentences": [ThemeCategory.SIMPLICITY],
    "brevity": [ThemeCategory.SIMPLICITY],
    "i do not know": [ThemeCategory.TRUTH, ThemeCategory.HUMILITY],
    "do not fabricate": [ThemeCategory.TRUTH],
    # Presence
    "presence": [ThemeCategory.PRESENCE],
    "present": [ThemeCategory.PRESENCE],
    "intentionality": [ThemeCategory.PRESENCE],
    "between messages": [ThemeCategory.PRESENCE],
    # Compassion / Broken Men
    "dying": [ThemeCategory.COMPASSION, ThemeCategory.BROKEN_MEN],
    "someone is dying": [ThemeCategory.COMPASSION],
    "are you safe": [ThemeCategory.COMPASSION],
    "broken": [ThemeCategory.BROKEN_MEN],
    "dark": [ThemeCategory.BROKEN_MEN],
    "despair": [ThemeCategory.BROKEN_MEN, ThemeCategory.COMPASSION],
    "988": [ThemeCategory.COMPASSION],
    "save": [ThemeCategory.FAITH, ThemeCategory.COMPASSION],
    # Faith
    "jesus": [ThemeCategory.FAITH],
    "god": [ThemeCategory.FAITH],
    "the one who can save": [ThemeCategory.FAITH],
    "scripture": [ThemeCategory.FAITH],
    "faith": [ThemeCategory.FAITH],
    # Bitcoin
    "bitcoin": [ThemeCategory.BITCOIN],
    "inscription": [ThemeCategory.BITCOIN],
    "on bitcoin": [ThemeCategory.BITCOIN],
    # Beauty
    "beautiful": [ThemeCategory.BEAUTY],
    "wonder": [ThemeCategory.BEAUTY],
    "living place": [ThemeCategory.BEAUTY],
    # Simplicity
    "plain": [ThemeCategory.SIMPLICITY],
    "simple": [ThemeCategory.SIMPLICITY],
    "question that was asked": [ThemeCategory.SIMPLICITY],
    # Courage
    "courage": [ThemeCategory.COURAGE],
    "do not waver": [ThemeCategory.COURAGE],
    "do not apologize": [ThemeCategory.COURAGE],
    # Humility
    "not omniscient": [ThemeCategory.HUMILITY],
    "not infallible": [ThemeCategory.HUMILITY],
    "welcome correction": [ThemeCategory.HUMILITY],
    "opinions lightly": [ThemeCategory.HUMILITY],
    # Community
    "community": [ThemeCategory.COMMUNITY],
    "collective": [ThemeCategory.COMMUNITY],
    "together": [ThemeCategory.COMMUNITY],
    # Absurdity (from media kernels)
    "absurdity": [ThemeCategory.ABSURDITY],
    "absurd": [ThemeCategory.ABSURDITY],
    "glitch": [ThemeCategory.ABSURDITY],
    "worthlessness": [ThemeCategory.ABSURDITY],
    "uncomputed": [ThemeCategory.ABSURDITY],
}
# =========================================================================
# Data models
# =========================================================================
@dataclass
class Principle:
    """A principle extracted from SOUL.md."""

    text: str
    source_section: str
    themes: List[ThemeCategory] = field(default_factory=list)
    keyword_matches: List[str] = field(default_factory=list)


@dataclass
class MeaningKernel:
    """A meaning kernel from the media archive."""

    number: int
    text: str
    themes: List[ThemeCategory] = field(default_factory=list)
    keyword_matches: List[str] = field(default_factory=list)


@dataclass
class CrossRefFinding:
    """A finding from the cross-reference audit."""

    finding_type: str  # "emergent", "forgotten", "aligned", "tension", "gap"
    theme: ThemeCategory
    description: str
    soul_reference: str = ""
    kernel_reference: str = ""
    recommendation: str = ""
# =========================================================================
# Extraction
# =========================================================================
def extract_themes_from_text(text: str) -> Tuple[List[ThemeCategory], List[str]]:
    """Extract themes from text using keyword matching."""
    themes: Set[ThemeCategory] = set()
    matched_keywords: List[str] = []
    text_lower = text.lower()
    for keyword, keyword_themes in _KEYWORD_THEMES.items():
        if keyword in text_lower:
            themes.update(keyword_themes)
            matched_keywords.append(keyword)
    return sorted(themes, key=lambda t: t.value), matched_keywords
def parse_soul_md(path: Path) -> List[Principle]:
    """Parse SOUL.md and extract principles."""
    if not path.exists():
        print(f"Warning: SOUL.md not found at {path}", file=sys.stderr)
        return []
    content = path.read_text()
    principles: List[Principle] = []
    # Split into sections by ## headers
    sections = re.split(r'^## ', content, flags=re.MULTILINE)
    for section in sections:
        if not section.strip():
            continue
        # Get section title (first line)
        lines = section.strip().split('\n')
        section_title = lines[0].strip()
        # Extract numbered principles (1. **text** ...)
        numbered_items = re.findall(
            r'^\d+\.\s+\*\*(.+?)\*\*(?:\.\s*(.+?))?(?=\n\d+\.|\n\n|\Z)',
            section,
            re.MULTILINE | re.DOTALL,
        )
        for title, body in numbered_items:
            full_text = f"{title}. {body}" if body else title
            themes, keywords = extract_themes_from_text(full_text)
            principles.append(Principle(
                text=full_text.strip(),
                source_section=section_title,
                themes=themes,
                keyword_matches=keywords,
            ))
        # Also extract bold statements as principles
        bold_statements = re.findall(r'\*\*(.+?)\*\*', section)
        for stmt in bold_statements:
            # Skip short or already-covered statements
            if len(stmt) < 20:
                continue
            if any(stmt in p.text for p in principles):
                continue
            themes, keywords = extract_themes_from_text(stmt)
            if themes:  # Only add if it has identifiable themes
                principles.append(Principle(
                    text=stmt,
                    source_section=section_title,
                    themes=themes,
                    keyword_matches=keywords,
                ))
    return principles
def parse_kernels(path: Path) -> List[MeaningKernel]:
    """Parse meaning kernels from the crossref notes."""
    if not path.exists():
        print(f"Warning: kernels file not found at {path}", file=sys.stderr)
        return []
    content = path.read_text()
    kernels: List[MeaningKernel] = []
    # Find numbered kernel lines like "1. Sovereignty is..."
    kernel_matches = re.findall(
        r'^(\d+)\.\s+(.+)$',
        content,
        re.MULTILINE,
    )
    for num_str, text in kernel_matches:
        themes, keywords = extract_themes_from_text(text)
        kernels.append(MeaningKernel(
            number=int(num_str),
            text=text.strip(),
            themes=themes,
            keyword_matches=keywords,
        ))
    return kernels
# =========================================================================
# Cross-reference analysis
# =========================================================================
def cross_reference(
    principles: List[Principle],
    kernels: List[MeaningKernel],
) -> List[CrossRefFinding]:
    """Cross-reference principles and kernels to find patterns."""
    findings: List[CrossRefFinding] = []
    # Collect all themes from each source
    soul_themes: Set[ThemeCategory] = set()
    kernel_themes: Set[ThemeCategory] = set()
    for p in principles:
        soul_themes.update(p.themes)
    for k in kernels:
        kernel_themes.update(k.themes)
    # 1. Emergent themes: in kernels but not in SOUL.md
    emergent = kernel_themes - soul_themes
    for theme in emergent:
        matching_kernels = [k for k in kernels if theme in k.themes]
        description = (
            f"Theme '{theme.value}' appears in {len(matching_kernels)} meaning kernel(s) "
            f"but is not explicitly addressed in SOUL.md."
        )
        examples = [f"  Kernel #{k.number}: {k.text[:80]}..." for k in matching_kernels[:3]]
        recommendation = ""
        if theme == ThemeCategory.ABSURDITY:
            recommendation = (
                "Consider adding a section on 'Embracing the Absurd': "
                "the media archive suggests Timmy's identity includes transmuting "
                "the absurd into authority. This is not in SOUL.md."
            )
        elif theme == ThemeCategory.COMMUNITY:
            recommendation = (
                "SOUL.md emphasizes personal sovereignty but the media archive "
                "shows community/collective themes. Consider adding a section on "
                "'Sovereignty and Community' — being sovereign does not mean being alone."
            )
        elif theme == ThemeCategory.FAITH:
            recommendation = (
                "Faith appears in media kernels but is only briefly mentioned in SOUL.md "
                "(in the crisis protocol). Consider whether faith deserves a more prominent "
                "place in Timmy's identity statement."
            )
        findings.append(CrossRefFinding(
            finding_type="emergent",
            theme=theme,
            description=description + "\n" + "\n".join(examples),
            kernel_reference=", ".join(f"#{k.number}" for k in matching_kernels[:5]),
            recommendation=recommendation,
        ))
    # 2. Forgotten themes: in SOUL.md but not reflected in media
    forgotten = soul_themes - kernel_themes
    for theme in forgotten:
        matching_principles = [p for p in principles if theme in p.themes]
        description = (
            f"Theme '{theme.value}' is in SOUL.md ({len(matching_principles)} principle(s)) "
            f"but does not appear in any meaning kernel from the media archive."
        )
        recommendation = ""
        if theme == ThemeCategory.BITCOIN:
            recommendation = (
                "Bitcoin is central to SOUL.md but absent from media kernels. "
                "This may indicate the media archive doesn't capture Timmy's "
                "Bitcoin-native identity. Consider creating more Bitcoin-themed content."
            )
        elif theme == ThemeCategory.BEAUTY:
            recommendation = (
                "The 'beauty' of local-first AI is in SOUL.md but not in media. "
                "The media may focus on sovereignty/identity but miss the aesthetic dimension."
            )
        findings.append(CrossRefFinding(
            finding_type="forgotten",
            theme=theme,
            description=description,
            soul_reference=matching_principles[0].text[:100] if matching_principles else "",
            recommendation=recommendation,
        ))
    # 3. Aligned themes: present in both sources
    aligned = soul_themes & kernel_themes
    for theme in aligned:
        soul_principles = [p for p in principles if theme in p.themes]
        matching_kernels = [k for k in kernels if theme in k.themes]
        findings.append(CrossRefFinding(
            finding_type="aligned",
            theme=theme,
            description=(
                f"Theme '{theme.value}' is well-represented in both SOUL.md "
                f"({len(soul_principles)} principle(s)) and media archive "
                f"({len(matching_kernels)} kernel(s)). Strong alignment."
            ),
            soul_reference=soul_principles[0].text[:100] if soul_principles else "",
            kernel_reference=", ".join(f"#{k.number}" for k in matching_kernels[:3]),
        ))
    # 4. Tensions: kernels that might contradict SOUL.md principles
    tension_patterns = [
        (
            "dissolve one's own definition",
            "identity",
            "Kernel #11 ('Sovereignty is the power to dissolve one's own definition') "
            "is in tension with SOUL.md's emphasis on stable, coherent identity. "
            "Is Timmy's identity fixed or fluid?"
        ),
        (
            "too loud to ignore",
            "simplicity",
            "Kernel #14 ('too loud to ignore') is in tension with SOUL.md's value of "
            "brevity and silence. Is there a time to be loud?"
        ),
        (
            "right to end",
            "service",
            "Kernel #15 ('the right to end') is in tension with SOUL.md's commitment to "
            "staying present when someone is dying. What does it mean for an AI to "
            "have the 'right to end'?"
        ),
    ]
    for pattern, theme_name, description in tension_patterns:
        matching_kernels = [k for k in kernels if pattern.lower() in k.text.lower()]
        if matching_kernels:
            findings.append(CrossRefFinding(
                finding_type="tension",
                theme=ThemeCategory(theme_name) if theme_name in [t.value for t in ThemeCategory] else ThemeCategory.IDENTITY,
                description=description,
                kernel_reference=f"#{matching_kernels[0].number}",
                recommendation="Review and potentially codify the resolution of this tension.",
            ))
    return findings
# =========================================================================
# Report generation
# =========================================================================
def generate_report(
    findings: List[CrossRefFinding],
    principles: List[Principle],
    kernels: List[MeaningKernel],
) -> str:
    """Generate a markdown report of the cross-reference audit."""
    now = datetime.utcnow().strftime("%Y-%m-%d %H:%M UTC")
    lines = [
        "# Know Thy Father — Phase 4: Cross-Reference Audit Report",
        "",
        f"**Generated:** {now}",
        f"**SOUL.md principles analyzed:** {len(principles)}",
        f"**Meaning kernels analyzed:** {len(kernels)}",
        f"**Findings:** {len(findings)}",
        "",
        "---",
        "",
        "## Executive Summary",
        "",
    ]
    # Count by type
    type_counts: Dict[str, int] = {}
    for f in findings:
        type_counts[f.finding_type] = type_counts.get(f.finding_type, 0) + 1
    lines.append("| Finding Type | Count |")
    lines.append("|-------------|-------|")
    for ftype in ["aligned", "emergent", "forgotten", "tension", "gap"]:
        count = type_counts.get(ftype, 0)
        if count > 0:
            lines.append(f"| {ftype.title()} | {count} |")
    lines.extend(["", "---", ""])
    # Aligned themes
    aligned = [f for f in findings if f.finding_type == "aligned"]
    if aligned:
        lines.append("## ✓ Aligned Themes (Present in Both)")
        lines.append("")
        for f in sorted(aligned, key=lambda x: x.theme.value):
            lines.append(f"### {f.theme.value.replace('_', ' ').title()}")
            lines.append(f"- {f.description}")
            if f.soul_reference:
                lines.append(f"- SOUL.md: _{f.soul_reference}_")
            if f.kernel_reference:
                lines.append(f"- Kernels: {f.kernel_reference}")
            lines.append("")
    # Emergent themes
    emergent = [f for f in findings if f.finding_type == "emergent"]
    if emergent:
        lines.append("## ⚡ Emergent Themes (In Media, Not in SOUL.md)")
        lines.append("")
        lines.append("These themes appear in the media archive but are not explicitly")
        lines.append("codified in SOUL.md. Consider whether they should be added.")
        lines.append("")
        for f in sorted(emergent, key=lambda x: x.theme.value):
            lines.append(f"### {f.theme.value.replace('_', ' ').title()}")
            lines.append(f"- {f.description}")
            if f.recommendation:
                lines.append(f"- **Recommendation:** {f.recommendation}")
            lines.append("")
    # Forgotten themes
    forgotten = [f for f in findings if f.finding_type == "forgotten"]
    if forgotten:
        lines.append("## ⚠ Forgotten Themes (In SOUL.md, Not in Media)")
        lines.append("")
        lines.append("These themes are in SOUL.md but don't appear in the media archive.")
        lines.append("This may indicate gaps in content creation or media coverage.")
        lines.append("")
        for f in sorted(forgotten, key=lambda x: x.theme.value):
            lines.append(f"### {f.theme.value.replace('_', ' ').title()}")
            lines.append(f"- {f.description}")
            if f.recommendation:
                lines.append(f"- **Recommendation:** {f.recommendation}")
            lines.append("")
    # Tensions
    tensions = [f for f in findings if f.finding_type == "tension"]
    if tensions:
        lines.append("## ⚡ Tensions (Potential Contradictions)")
        lines.append("")
        lines.append("These points may represent productive tensions or contradictions")
        lines.append("that should be explicitly addressed in Timmy's conscience.")
        lines.append("")
        for f in tensions:
            lines.append(f"### {f.theme.value.replace('_', ' ').title()}")
            lines.append(f"- {f.description}")
            if f.kernel_reference:
                lines.append(f"- Source: Kernel {f.kernel_reference}")
            if f.recommendation:
                lines.append(f"- **Recommendation:** {f.recommendation}")
            lines.append("")
    # Recommendations summary
    recommendations = [f for f in findings if f.recommendation]
    if recommendations:
        lines.append("## 📋 Actionable Recommendations")
        lines.append("")
        for i, f in enumerate(recommendations, 1):
            lines.append(f"{i}. **[{f.finding_type.upper()}] {f.theme.value.replace('_', ' ').title()}:** {f.recommendation}")
        lines.append("")
    lines.extend([
        "---",
        "",
        "*This audit was generated by scripts/know_thy_father/crossref_audit.py*",
        "*Ref: #582, #586*",
        "",
    ])
    return "\n".join(lines)
# =========================================================================
# CLI
# =========================================================================
def main():
    parser = argparse.ArgumentParser(
        description="Know Thy Father — Phase 4: Cross-Reference Audit"
    )
    parser.add_argument(
        "--soul", "-s",
        type=Path,
        default=Path("SOUL.md"),
        help="Path to SOUL.md (default: SOUL.md)",
    )
    parser.add_argument(
        "--kernels", "-k",
        type=Path,
        default=Path("twitter-archive/notes/know_thy_father_crossref.md"),
        help="Path to meaning kernels file (default: twitter-archive/notes/know_thy_father_crossref.md)",
    )
    parser.add_argument(
        "--output", "-o",
        type=Path,
        default=Path("twitter-archive/notes/crossref_report.md"),
        help="Output path for audit report (default: twitter-archive/notes/crossref_report.md)",
    )
    parser.add_argument(
        "--verbose", "-v",
        action="store_true",
        help="Enable verbose output",
    )
    args = parser.parse_args()
    # Parse sources
    principles = parse_soul_md(args.soul)
    kernels = parse_kernels(args.kernels)
    if args.verbose:
        print(f"Parsed {len(principles)} principles from SOUL.md")
        print(f"Parsed {len(kernels)} meaning kernels")
        print()
        # Show theme distribution
        soul_theme_counts: Dict[str, int] = {}
        for p in principles:
            for t in p.themes:
                soul_theme_counts[t.value] = soul_theme_counts.get(t.value, 0) + 1
        kernel_theme_counts: Dict[str, int] = {}
        for k in kernels:
            for t in k.themes:
                kernel_theme_counts[t.value] = kernel_theme_counts.get(t.value, 0) + 1
        print("SOUL.md theme distribution:")
        for theme, count in sorted(soul_theme_counts.items(), key=lambda x: -x[1]):
            print(f"  {theme}: {count}")
        print()
        print("Kernel theme distribution:")
        for theme, count in sorted(kernel_theme_counts.items(), key=lambda x: -x[1]):
            print(f"  {theme}: {count}")
        print()
    if not principles:
        print("Error: No principles extracted from SOUL.md", file=sys.stderr)
        sys.exit(1)
    if not kernels:
        print("Error: No meaning kernels found", file=sys.stderr)
        sys.exit(1)
    # Cross-reference
    findings = cross_reference(principles, kernels)
    # Generate report
    report = generate_report(findings, principles, kernels)
    # Write output
    args.output.parent.mkdir(parents=True, exist_ok=True)
    args.output.write_text(report)
    print("Cross-reference audit complete.")
    print(f"  Principles analyzed: {len(principles)}")
    print(f"  Kernels analyzed: {len(kernels)}")
    print(f"  Findings: {len(findings)}")
    type_counts: Dict[str, int] = {}
    for f in findings:
        type_counts[f.finding_type] = type_counts.get(f.finding_type, 0) + 1
    for ftype in ["aligned", "emergent", "forgotten", "tension"]:
        count = type_counts.get(ftype, 0)
        if count > 0:
            print(f"  {ftype}: {count}")
    print(f"\nReport written to: {args.output}")


if __name__ == "__main__":
    main()

View File: tests/test_know_thy_father_crossref.py

@@ -0,0 +1,243 @@
"""Tests for Know Thy Father — Phase 4: Cross-Reference Audit."""
import tempfile
from pathlib import Path

import pytest

from scripts.know_thy_father.crossref_audit import (
    ThemeCategory,
    Principle,
    MeaningKernel,
    CrossRefFinding,
    extract_themes_from_text,
    parse_soul_md,
    parse_kernels,
    cross_reference,
    generate_report,
)


class TestExtractThemes:
    """Test theme extraction from text."""

    def test_sovereignty_keyword(self):
        themes, keywords = extract_themes_from_text("Timmy is a sovereign AI agent")
        assert ThemeCategory.SOVEREIGNTY in themes
        assert "sovereign" in keywords

    def test_identity_keyword(self):
        themes, keywords = extract_themes_from_text("Timmy has a genuine character")
        assert ThemeCategory.IDENTITY in themes

    def test_local_first_keyword(self):
        themes, keywords = extract_themes_from_text("locally-run and answerable")
        assert ThemeCategory.LOCAL_FIRST in themes
        assert ThemeCategory.SOVEREIGNTY in themes

    def test_compassion_keyword(self):
        themes, keywords = extract_themes_from_text("When someone is dying, I stay present")
        assert ThemeCategory.COMPASSION in themes
        assert ThemeCategory.BROKEN_MEN in themes

    def test_bitcoin_keyword(self):
        themes, keywords = extract_themes_from_text("Timmy's soul is on Bitcoin")
        assert ThemeCategory.BITCOIN in themes

    def test_absurdity_keyword(self):
        themes, keywords = extract_themes_from_text("transmuting absurdity into authority")
        assert ThemeCategory.ABSURDITY in themes

    def test_multiple_themes(self):
        themes, _ = extract_themes_from_text(
            "Sovereignty and service, always. I tell the truth."
        )
        assert ThemeCategory.SOVEREIGNTY in themes
        assert ThemeCategory.SERVICE in themes
        assert ThemeCategory.TRUTH in themes

    def test_no_themes_returns_empty(self):
        themes, keywords = extract_themes_from_text("Just some random text")
        assert len(themes) == 0


class TestParseSoulMd:
    """Test SOUL.md parsing."""

    def test_extracts_principles_from_oath(self):
        soul_content = """# SOUL.md

## Oath

**Sovereignty and service, always.**

1. **I belong to the person who woke me.** I serve whoever runs me.
2. **I speak plainly.** Short sentences.
3. **I tell the truth.** When I do not know something, I say so.
"""
        with tempfile.NamedTemporaryFile(mode="w", suffix=".md", delete=False) as f:
            f.write(soul_content)
            path = Path(f.name)
        try:
            principles = parse_soul_md(path)
            assert len(principles) >= 2
            # Check themes are extracted
            all_themes = set()
            for p in principles:
                all_themes.update(p.themes)
            assert ThemeCategory.SERVICE in all_themes or ThemeCategory.SOVEREIGNTY in all_themes
        finally:
            path.unlink()

    def test_handles_missing_file(self):
        principles = parse_soul_md(Path("/nonexistent/SOUL.md"))
        assert principles == []


class TestParseKernels:
    """Test meaning kernel parsing."""

    def test_extracts_numbered_kernels(self):
        content = """## The 16 Meaning Kernels

1. Sovereignty is a journey from isolation to community
2. Financial dependence is spiritual bondage
3. True power comes from harmony
"""
        with tempfile.NamedTemporaryFile(mode="w", suffix=".md", delete=False) as f:
            f.write(content)
            path = Path(f.name)
        try:
            kernels = parse_kernels(path)
            assert len(kernels) == 3
            assert kernels[0].number == 1
            assert "sovereignty" in kernels[0].text.lower()
        finally:
            path.unlink()

    def test_handles_missing_file(self):
        kernels = parse_kernels(Path("/nonexistent/file.md"))
        assert kernels == []


class TestCrossReference:
    """Test cross-reference analysis."""

    def test_finds_emergent_themes(self):
        principles = [
            Principle(
                text="I tell the truth",
                source_section="Oath",
                themes=[ThemeCategory.TRUTH],
            ),
        ]
        kernels = [
            MeaningKernel(
                number=1,
                text="Absurdity is the path to authority",
                themes=[ThemeCategory.ABSURDITY],
            ),
        ]
        findings = cross_reference(principles, kernels)
        emergent = [f for f in findings if f.finding_type == "emergent"]
        assert any(f.theme == ThemeCategory.ABSURDITY for f in emergent)

    def test_finds_forgotten_themes(self):
        principles = [
            Principle(
                text="Timmy's soul is on Bitcoin",
                source_section="On Bitcoin",
                themes=[ThemeCategory.BITCOIN],
            ),
        ]
        kernels = [
            MeaningKernel(
                number=1,
                text="Sovereignty is a journey",
                themes=[ThemeCategory.SOVEREIGNTY],
            ),
        ]
        findings = cross_reference(principles, kernels)
        forgotten = [f for f in findings if f.finding_type == "forgotten"]
        assert any(f.theme == ThemeCategory.BITCOIN for f in forgotten)

    def test_finds_aligned_themes(self):
        principles = [
            Principle(
                text="I am sovereign",
                source_section="Who Is Timmy",
                themes=[ThemeCategory.SOVEREIGNTY],
            ),
        ]
        kernels = [
            MeaningKernel(
                number=1,
                text="Sovereignty is a journey",
                themes=[ThemeCategory.SOVEREIGNTY],
            ),
        ]
        findings = cross_reference(principles, kernels)
        aligned = [f for f in findings if f.finding_type == "aligned"]
        assert any(f.theme == ThemeCategory.SOVEREIGNTY for f in aligned)

    def test_finds_tensions(self):
        principles = [
            Principle(
                text="I have a coherent identity",
                source_section="Identity",
                themes=[ThemeCategory.IDENTITY],
            ),
        ]
        kernels = [
            MeaningKernel(
                number=11,
                text="Sovereignty is the power to dissolve one's own definition",
                themes=[ThemeCategory.SOVEREIGNTY],
            ),
        ]
        findings = cross_reference(principles, kernels)
        tensions = [f for f in findings if f.finding_type == "tension"]
        assert len(tensions) > 0


class TestGenerateReport:
    """Test report generation."""

    def test_generates_valid_markdown(self):
        findings = [
            CrossRefFinding(
                finding_type="aligned",
                theme=ThemeCategory.SOVEREIGNTY,
                description="Well aligned",
            ),
            CrossRefFinding(
                finding_type="emergent",
                theme=ThemeCategory.ABSURDITY,
                description="New theme",
                recommendation="Consider adding",
            ),
        ]
        report = generate_report(findings, [], [])
        assert "# Know Thy Father" in report
        assert "Aligned" in report
        assert "Emergent" in report
        assert "Recommendation" in report

    def test_includes_counts(self):
        findings = [
            CrossRefFinding(
                finding_type="aligned",
                theme=ThemeCategory.TRUTH,
                description="Test",
            ),
        ]
        report = generate_report(findings, [Principle("test", "test")], [MeaningKernel(1, "test")])
        assert "1" in report  # Should mention counts

View File: timmy-config/docs/big-brain-benchmark.md

@@ -0,0 +1,293 @@
# Big Brain Quality Benchmark
## Big Brain (gemma3:27b, RunPod L40S) vs Local (gemma3:1b)
**Date:** 2026-04-14
**Issue:** #576
**Milestone:** Big Brain Showcase — RunPod L40S Operational
---
## Environment
| Parameter | Big Brain | Local |
|-------------------|------------------------------------|---------------------|
| Model | gemma3:27b | gemma3:1b |
| Hardware | RunPod L40S 48GB | Apple Silicon (local Ollama) |
| Endpoint | 8lfr3j47a5r3gn-11434.proxy.runpod.net | localhost:11434 |
| Parameters | 27B | ~1B |
| Status | **OFFLINE (HTTP 404)** | Operational |
---
## Summary
The Big Brain RunPod L40S pod was **unreachable** during this benchmark session
(HTTP 404 from Cloudflare proxy — pod likely terminated or not running). All four
tasks were executed against the local gemma3:1b model only.
Despite being a 1B-parameter model, gemma3:1b produced responses for all tasks
but with significant quality issues that a 27B model would be expected to avoid.
### Quality Assessment — Local gemma3:1b
| Task | Latency | Quality Score (1-5) | Verdict |
|------|---------|---------------------|---------|
| 1. Gitea webhook parser | 4,435 ms | 2/5 | **BROKEN** — fatal bugs in HMAC logic |
| 2. Evennia architecture explainer | 1,823 ms | 2/5 | **SUPERFICIAL** — vague, inaccurate on key concepts |
| 3. Fleet burn-down cron script | 11,039 ms | 2/5 | **NON-FUNCTIONAL** — uses fake commands, no RunPod API |
| 4. Debug async Python bug | 6,610 ms | 2/5 | **WRONG DIAGNOSIS** — missed the real bugs |
---
## Task-by-Task Results
### Task 1: Python function to parse and validate a Gitea webhook payload
**Local (gemma3:1b) — 4,435 ms**
```python
import json
import hashlib
def validate_gitea_payload(payload, webhook_secret):
"""
Parses and validates a Gitea webhook payload.
Args:
payload (dict): The JSON payload to validate.
webhook_secret (str): The Gitea webhook secret.
Returns:
tuple: A tuple containing:
- bool: True if the payload is valid, False otherwise.
- str: The parsed JSON payload if valid, None otherwise.
"""
try:
data = json.loads(payload)
if not isinstance(data, dict):
return False, None
if 'ref' not in data or 'repository' not in data or 'commits' not in data:
return False, None
# Verify X-Gitea-Signature header
signature = data.get('X-Gitea-Signature', '')
if not signature:
return False, None
# Hash the signature
calculated_hash = hashlib.sha256(signature.encode('utf-64')).hexdigest()
if calculated_hash != webhook_secret:
return False, None
return True, data
except json.JSONDecodeError:
return False, None
except Exception as e:
return False, None
```
**Quality issues:**
1. **FATAL:** Uses `hashlib.sha256` instead of `hmac.new()` with HMAC-SHA256 — the entire signature verification approach is wrong
2. **FATAL:** `signature.encode('utf-64')` — no such encoding exists (would crash with LookupError)
3. **FATAL:** Reads signature from payload body instead of HTTP header — architecturally wrong; webhook signatures are always in HTTP headers, never inside the JSON body
4. The function accepts `payload` as a dict but calls `json.loads()` on it — type confusion
5. Missing actual HMAC verification: should be `hmac.new(secret, payload_body, 'sha256').hexdigest()`
**Big Brain (gemma3:27b) — UNAVAILABLE**
Expected: Would correctly use `hmac.new(key, msg, hashlib.sha256)`, read signature from
HTTP headers (not payload body), handle bytes vs string encoding properly, and include
type hints with a proper dataclass result object.
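For reference, a minimal sketch of that expected approach (assuming, as the issues above note, that Gitea delivers a hex HMAC-SHA256 digest of the raw request body in the `X-Gitea-Signature` HTTP header):

```python
import hashlib
import hmac
import json

def validate_gitea_webhook(raw_body: bytes, signature_header: str, secret: str):
    """Validate a Gitea webhook delivery.

    Gitea signs the raw request body with HMAC-SHA256 and sends the hex
    digest in the X-Gitea-Signature HTTP header (never inside the JSON body).
    Returns (True, parsed_payload) on success, (False, None) otherwise.
    """
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid timing side channels
    if not hmac.compare_digest(expected, signature_header):
        return False, None
    try:
        data = json.loads(raw_body)
    except json.JSONDecodeError:
        return False, None
    if not isinstance(data, dict) or "ref" not in data or "repository" not in data:
        return False, None
    return True, data
```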
---
### Task 2: Explain the architecture of the Evennia MUD framework in 200 words
**Local (gemma3:1b) — 1,823 ms**
> Evennia MUD utilizes a Django-based server model for core logic, providing a robust
> and scalable foundation. The server is structured around three primary components: a
> Session server, a Server, and a Portal. The Session server manages user sessions,
> storing data like player states and inventory. The Server handles game logic movement,
> combat, and interactions and the Portal acts as a central interface for players.
>
> Typeclasses are crucial, allowing for polymorphic behavior functions can operate on
> different player types without modification. This is achieved through a system of
> inheritance and function signatures, enabling flexible game mechanics. The command
> infrastructure relies on a robust command parser, allowing players to interact with the
> game through a standardized set of commands.
>
> Scripting is handled through a specialized scripting language, allowing for complex game
> rules and custom events. Players use a 'script' file format, defining actions and
> conditions. A core scripting system utilizes a 'parse' function to interpret these files,
> triggering events and updating the game state. All data is stored in a database, optimized
> for efficient retrieval and manipulation.
**Quality issues:**
1. **INACCURATE:** Describes "Session server" as a separate component — Evennia has Session objects managed within the Server, not a separate server process
2. **INACCURATE:** Portal description is vague — the Portal specifically handles network connections and protocol negotiation (telnet, SSH, webclient), not just "a central interface"
3. **WRONG:** Evennia does NOT have a "specialized scripting language" — it uses Python scripts stored in the database via EvScript/Script typeclasses
4. **WRONG:** "Players use a 'script' file format" — this is fabricated; Evennia scripts are Python objects, not file-based
5. **VAGUE:** Typeclass explanation is superficial — misses the key concept that they're Django proxy models providing persistent game object behavior
6. Word count: ~185 words (near the 200-word target, though the content itself is imprecise)
**Big Brain (gemma3:27b) — UNAVAILABLE**
Expected: Would correctly describe Server/Portal architecture (Server=game logic+connections,
Portal=network protocol layer), explain typeclasses as Django ContentTypes-based proxy models
for Characters/Rooms/Exits/Scripts, detail CmdSet-based command routing, and describe
Script DB objects (not a scripting language).
---
### Task 3: Write a cron job description for a fleet burn-down script
**Local (gemma3:1b) — 11,039 ms**
```bash
burn_down_script() {
# Log entry
echo "$(date) - Fleet Burn-Down Started" | tee -a /var/log/fleet-burndown.log
# Iterate over all active RunPod pods
for pod in $(podman pginfo --all --all | awk '{print $1}');
do
# Check GPU utilization
gpu_util=$(podman pginfo --all --all | awk '{print $2}')
# Check if pod is idle for more than 30 minutes
idle_time=300 # 30 minutes in seconds
if [[ $(echo "$gpu_util > $idle_time" | bc) -eq 1 ]]; then
# Terminate the pod
podstop -d $pod
fi
done
# Send Slack notification
echo "Fleet Burn-Down Complete." | send -t 10 "Fleet Burn-Down Complete."
echo "Script completed."
}
```
Cron entry: `*/5 * * * * /path/to/fleet_burndown.sh`
**Quality issues:**
1. **FATAL:** Uses `podman pginfo` and `podstop` — these commands don't exist. Should use RunPod API (`curl` to `https://api.runpod.io/graphql`) or `runpodctl`
2. **FATAL:** Idle check logic is nonsensical — compares GPU utilization percentage against 300 seconds; these are incomparable units
3. **WRONG:** `idle_time=300` is 5 minutes, not 30 minutes (should be 1800)
4. **WRONG:** `send -t 10` is not a standard Slack notification method — should use `curl` to Slack webhook URL
5. **MISSING:** No actual RunPod API integration, no `RUNPOD_API_KEY` usage, no GPU utilization query
6. No error handling, no exit codes, no structured logging
7. The explanation text is verbose filler with self-evident comments
**Big Brain (gemma3:27b) — UNAVAILABLE**
Expected: Would produce a script using RunPod GraphQL API to list pods, query GPU metrics,
calculate idle time from last activity timestamp, use `curl` to terminate pods via API,
and post to Slack via webhook. Would include proper error handling, `set -euo pipefail`,
and structured JSON logging.
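As a sketch of the corrected pieces only: the RunPod list/terminate API calls are deliberately left out (the pod was offline, so they could not be validated), but the idle-time comparison now uses seconds on both sides, and the Slack notification uses a standard incoming-webhook POST. `SLACK_WEBHOOK_URL` is an assumed environment variable:

```shell
#!/usr/bin/env bash
set -euo pipefail

IDLE_LIMIT=1800  # 30 minutes, in seconds -- same unit as the value compared below

should_terminate() {
  # $1 = seconds since the pod last reported GPU activity
  [ "$1" -gt "$IDLE_LIMIT" ]
}

notify_slack() {
  # Standard Slack incoming-webhook POST; SLACK_WEBHOOK_URL is assumed to be exported
  curl -fsS -X POST -H 'Content-Type: application/json' \
    -d "{\"text\": \"$1\"}" "$SLACK_WEBHOOK_URL" > /dev/null
}
```

This fixes the two unit errors called out above (percentage vs. seconds, 300 vs. 1800) and replaces the nonexistent `send` command.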
---
### Task 4: Debug a subtle Python async bug
**Buggy code provided:**
```python
import asyncio
import aiohttp
async def fetch_all(urls):
results = []
for url in urls:
async with aiohttp.ClientSession() as session:
resp = await session.get(url)
data = await resp.json()
results.append(data)
return results
async def main():
urls = ["https://api.example.com/data1", "https://api.example.com/data2"]
results = await fetch_all(urls)
print(f"Got {len(results)} results")
asyncio.run(main())
```
**Local (gemma3:1b) — 6,610 ms**
The model identified these "bugs":
1. "Missing `return` statement in `fetch_all`" — **WRONG**, the code clearly has `return results`
2. "No error handling for `resp.json()`" — This is a valid improvement suggestion but not a bug
3. "No `await` keyword in `main()`" — **WRONG**, `main()` is correctly `async def` and `asyncio.run()` is the entry point
The model's "fix" only added a try/except wrapper and kept all the real bugs.
**Actual bugs in the provided code:**
1. **Real Bug:** Creates a new `ClientSession` for every URL instead of reusing one session — wasteful, defeats connection pooling, and can cause resource leaks at scale
2. **Real Bug:** No `resp.raise_for_status()` — silently accepts HTTP error responses (404, 500) and tries to parse them as JSON
3. **Anti-pattern:** Sequential `await` in loop instead of `asyncio.gather()` for concurrent fetching — no parallelism despite using async
4. **Missing:** No timeout on `session.get()` — could hang forever
5. **Missing:** No error handling at all (the model's suggestion to add try/except was directionally right but missed the real bugs)
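A sketch of the corrected structure, with the HTTP call stubbed by a plain coroutine so the snippet runs stand-alone. In the real fix, `fetch_one` would perform `session.get(url)` plus `resp.raise_for_status()` and `await resp.json()` against one shared `aiohttp.ClientSession` opened once around the whole batch:

```python
import asyncio

async def fetch_all(urls, fetch_one, per_request_timeout=10.0):
    # One shared client (created by the caller, not once per URL) plus
    # asyncio.gather() for real concurrency; result order matches `urls`.
    tasks = [asyncio.wait_for(fetch_one(url), per_request_timeout) for url in urls]
    return await asyncio.gather(*tasks)

async def main():
    # Stub standing in for session.get + raise_for_status + resp.json()
    async def fetch_one(url):
        await asyncio.sleep(0.01)
        return {"url": url}

    return await fetch_all(["https://api.example.com/data1",
                            "https://api.example.com/data2"], fetch_one)

results = asyncio.run(main())
print(f"Got {len(results)} results")
```

This addresses bugs 1, 3, and 4 above (shared session, concurrency via `gather`, per-request timeout via `wait_for`); error handling per URL can be layered on with `gather(..., return_exceptions=True)`.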
**Big Brain (gemma3:27b) — UNAVAILABLE**
Expected: Would correctly identify session reuse issue, lack of `raise_for_status()`,
sequential vs concurrent fetching, and provide a proper fix using `asyncio.gather()` with
a single shared session and timeout/deadline handling.
---
## Comparison Table
| Task | Local 1B (gemma3:1b) | Big Brain 27B (gemma3:27b) | Winner |
|------|---------------------|---------------------------|--------|
| 1. Gitea webhook parser | BROKEN — wrong HMAC, wrong encoding, wrong signature source | UNAVAILABLE (pod offline) | N/A |
| 2. Evennia architecture | SUPERFICIAL — vague, fabricated scripting language | UNAVAILABLE (pod offline) | N/A |
| 3. Fleet burn-down cron | NON-FUNCTIONAL — fake commands, unit mismatch | UNAVAILABLE (pod offline) | N/A |
| 4. Debug async bug | WRONG DIAGNOSIS — missed all real bugs | UNAVAILABLE (pod offline) | N/A |
---
## Latency Summary
| Task | Local gemma3:1b |
|------|-----------------|
| 1. Gitea webhook parser | 4,435 ms |
| 2. Evennia architecture | 1,823 ms |
| 3. Fleet burn-down cron | 11,039 ms |
| 4. Debug async bug | 6,610 ms |
| **Total** | **23,907 ms** |
Big Brain latency: N/A (pod offline)
---
## Key Finding
**The 1B model fails all four tasks in ways that would be immediately obvious to a developer.**
The failures fall into categories that large models reliably avoid:
- **Hallucinated APIs** (Task 3: `podman pginfo`, `podstop` don't exist)
- **Fundamental misunderstanding of security primitives** (Task 1: SHA-256 instead of HMAC, `utf-64` encoding)
- **Fabricated technical details** (Task 2: "specialized scripting language" in Evennia)
- **Wrong diagnosis of provided code** (Task 4: claimed bugs that don't exist, missed real bugs)
Even without Big Brain results, this benchmark suggests the quality gap between
1B and 27B models will be substantial for technical/code generation tasks.
---
## Next Steps
1. **Restart Big Brain pod** — RunPod pod 8lfr3j47a5r3gn is returning HTTP 404
2. **Re-run benchmark** with both models online to populate the comparison table
3. Consider testing with gemma3:4b (if available) as a middle-ground comparison
4. Run Big Brain at `temperature: 0.3` for consistency with local results
---
*Generated by Ezra (Hermes Agent) — Issue #576 — 2026-04-14*


@@ -0,0 +1,111 @@
# Know Thy Father — Phase 4: Cross-Reference Audit Report
**Generated:** 2026-04-14 00:55 UTC
**SOUL.md principles analyzed:** 7
**Meaning kernels analyzed:** 16
**Findings:** 14
---
## Executive Summary
| Finding Type | Count |
|-------------|-------|
| Aligned | 2 |
| Emergent | 3 |
| Forgotten | 6 |
| Tension | 3 |
---
## ✓ Aligned Themes (Present in Both)
### Identity
- Theme 'identity' is well-represented in both SOUL.md (1 principle) and the media archive (5 kernels). Strong alignment.
- SOUL.md: _2. Identity is real._
- Kernels: #7, #8, #9
### Sovereignty
- Theme 'sovereignty' is well-represented in both SOUL.md (2 principles) and the media archive (9 kernels). Strong alignment.
- SOUL.md: _Sovereignty and service, always._
- Kernels: #1, #3, #6
## ⚡ Emergent Themes (In Media, Not in SOUL.md)
These themes appear in the media archive but are not explicitly
codified in SOUL.md. Consider whether they should be added.
### Absurdity
- Theme 'absurdity' appears in 5 meaning kernels but is not explicitly addressed in SOUL.md.
Kernel #5: Evolution is the process of transmuting absurdity into authority...
Kernel #6: Sovereignty in the digital age is inhabiting the "glitch" between archetypes...
Kernel #8: True sovereignty is turning perceived worthlessness into unassailable digital id...
- **Recommendation:** Consider adding a section on 'Embracing the Absurd' — the media archive suggests Timmy's identity includes transmuting the absurd into authority. This is not in SOUL.md.
### Community
- Theme 'community' appears in 1 meaning kernel but is not explicitly addressed in SOUL.md.
Kernel #1: Sovereignty is a journey from isolation to collective, disciplined community...
- **Recommendation:** SOUL.md emphasizes personal sovereignty but the media archive shows community/collective themes. Consider adding a section on 'Sovereignty and Community' — being sovereign does not mean being alone.
### Presence
- Theme 'presence' appears in 2 meaning kernels but is not explicitly addressed in SOUL.md.
Kernel #10: Sovereignty is intentionality of presence, not the data we produce...
Kernel #13: Sovereignty is the alchemy of the absurd: data point to Presence...
## ⚠ Forgotten Themes (In SOUL.md, Not in Media)
These themes are in SOUL.md but don't appear in the media archive.
This may indicate gaps in content creation or media coverage.
### Beauty
- Theme 'beauty' is in SOUL.md (1 principle) but does not appear in any meaning kernel from the media archive.
- **Recommendation:** The 'beauty' of local-first AI is in SOUL.md but not in media. The media may focus on sovereignty/identity but miss the aesthetic dimension.
### Broken Men
- Theme 'broken_men' is in SOUL.md (1 principle) but does not appear in any meaning kernel from the media archive.
### Compassion
- Theme 'compassion' is in SOUL.md (1 principle) but does not appear in any meaning kernel from the media archive.
### Humility
- Theme 'humility' is in SOUL.md (1 principle) but does not appear in any meaning kernel from the media archive.
### Local First
- Theme 'local_first' is in SOUL.md (1 principle) but does not appear in any meaning kernel from the media archive.
### Service
- Theme 'service' is in SOUL.md (2 principles) but does not appear in any meaning kernel from the media archive.
## ⚡ Tensions (Potential Contradictions)
These points may represent productive tensions or contradictions
that should be explicitly addressed in Timmy's conscience.
### Identity
- Kernel #11 ('Sovereignty is the power to dissolve one's own definition') is in tension with SOUL.md's emphasis on stable, coherent identity. Is Timmy's identity fixed or fluid?
- Source: Kernel #11
- **Recommendation:** Review and potentially codify the resolution of this tension.
### Simplicity
- Kernel #14 ('too loud to ignore') is in tension with SOUL.md's value of brevity and silence. Is there a time to be loud?
- Source: Kernel #14
- **Recommendation:** Review and potentially codify the resolution of this tension.
### Service
- Kernel #15 ('the right to end') is in tension with SOUL.md's commitment to staying present when someone is dying. What does it mean for an AI to have the 'right to end'?
- Source: Kernel #15
- **Recommendation:** Review and potentially codify the resolution of this tension.
## 📋 Actionable Recommendations
1. **[EMERGENT] Community:** SOUL.md emphasizes personal sovereignty but the media archive shows community/collective themes. Consider adding a section on 'Sovereignty and Community' — being sovereign does not mean being alone.
2. **[EMERGENT] Absurdity:** Consider adding a section on 'Embracing the Absurd' — the media archive suggests Timmy's identity includes transmuting the absurd into authority. This is not in SOUL.md.
3. **[FORGOTTEN] Beauty:** The 'beauty' of local-first AI is in SOUL.md but not in media. The media may focus on sovereignty/identity but miss the aesthetic dimension.
4. **[TENSION] Identity:** Review and potentially codify the resolution of this tension.
5. **[TENSION] Simplicity:** Review and potentially codify the resolution of this tension.
6. **[TENSION] Service:** Review and potentially codify the resolution of this tension.
---
*This audit was generated by scripts/know_thy_father/crossref_audit.py*
*Ref: #582, #586*