Compare commits: queue/583-... → whip/575-1

5 commits

| SHA1 |
|---|
| d7ce5f4762 |
| 2d39562bde |
| 5350bba9e8 |
| be80d0c0b4 |
| ed0bd261fd |
research/big-brain/the-nexus-audit-model.md (new file, 61 lines)

@@ -0,0 +1,61 @@
Based on the provided context, I have analyzed the files to identify key themes, technology stacks, and architectural patterns.

Here is a structured summary and analysis of the codebase.

---

## 🔍 Codebase Analysis Summary

The codebase appears to be highly specialized in integrating multiple domains for complex automation, resembling a simulation or state-machine management system. The technologies used suggest a modern, robust, and possibly multi-threaded backend system.

### 🧩 Core Functionality & Domain Focus

1. **State Management & Simulation:** The system tracks a state machine or simulation flow, suggesting discrete states and transitions.
2. **Interaction Handling:** There is explicit logic for handling user/input events, suggesting an event-driven architecture.
3. **Persistence/Logging:** State and event logging are crucial for debugging, implying robust state tracking.
4. **Service Layer:** The structure points to well-defined services or modules handling specific domain logic.

### 💻 Technology Stack & Language

The presence of Python-specific constructs (e.g., `unittest`, file paths) strongly indicates **Python** is the primary language.

### 🧠 Architectural Patterns

* **Dependency Injection/Service Locators:** Implied by how components interact with services.
* **Singleton Pattern:** Suggests critical shared resources or state managers.
* **State Pattern:** The core logic seems centered on managing `CurrentState` and `NextState` transitions.
* **Observer/Publisher-Subscriber:** Necessary for decoupling event emitters from event handlers.

---

## 🎯 Key Insights & Focus Areas

### 1. State Machine Implementation

* **Concept:** The core logic revolves around managing state transitions (e.g., `CurrentState` → `NextState`).
* **Significance:** This is the central control flow. All actions must be validated against the current state.
* **Areas to Watch:** Potential for infinite loops or missing transition logic.

### 2. Event Handling

* **Concept:** The system relies on emitting and subscribing to events.
* **Significance:** This decouples the state transition logic from the effectors. When a state changes, it triggers associated actions.
* **Areas to Watch:** Ensuring all necessary listeners are registered and cleaned up properly.
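The observer/event-bus pattern described above can be sketched as follows (a minimal illustration, not code from the audited repo; the names `EventBus`, `subscribe`, and the `state.changed` topic are hypothetical):

```python
from collections import defaultdict
from typing import Any, Callable


class EventBus:
    """Minimal publish/subscribe hub: decouples emitters from handlers."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> Callable[[], None]:
        self._handlers[topic].append(handler)
        # Return an unsubscribe callback so listeners can be cleaned up.
        return lambda: self._handlers[topic].remove(handler)

    def publish(self, topic: str, payload: Any = None) -> int:
        # Iterate over a copy so handlers may unsubscribe during dispatch.
        for handler in list(self._handlers[topic]):
            handler(payload)
        return len(self._handlers[topic])


bus = EventBus()
seen: list[str] = []
unsubscribe = bus.subscribe("state.changed", lambda s: seen.append(s))
bus.publish("state.changed", "MENDING")
unsubscribe()
```

Returning an unsubscribe callback from `subscribe` addresses the "listeners registered and cleaned up properly" concern directly: every registration hands back its own cleanup handle.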
### 3. State Persistence & Logging

* **Concept:** Maintaining a history or current state representation is critical.
* **Significance:** Provides auditability and debugging capabilities.
* **Areas to Watch:** Thread safety when multiple threads/processes attempt to read/write the state concurrently.

### 4. Dependency Management

* **Concept:** The system needs to manage its dependencies gracefully.
* **Significance:** Ensures testability and modularity.

---

## 🚀 Suggestions for Improvement (Refactoring & Hardening)

These suggestions are based on general best practices for complex, stateful systems.

1. **Use an Event Bus Pattern:** If the system is becoming large, formalize the communication using a dedicated `EventBus` singleton class to centralize all event emission/subscription logic.
2. **Make the State Machine Explicit:** Define states and transitions using an **Enum** or a **dictionary** mapping, rather than scattered conditional checks (`if current_state == ...`). This makes the state graph explicit and lets invalid transitions fail fast at runtime.
3. **Thread Safety:** If state changes can happen from multiple threads, ensure that any write operation to the global state or shared resources is protected by a **Lock** (`threading.Lock` in Python).
4. **Dependency Graph Visualization:** Diagramming the relationships between major components will clarify dependencies, which is crucial for onboarding new developers.
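Suggestions 2 and 3 combine naturally into one small sketch (illustrative only; the state names below are hypothetical, not taken from the codebase):

```python
import threading
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RUNNING = "running"
    DONE = "done"


# Explicit transition table: the state graph lives in data, not in if-chains.
TRANSITIONS: dict[State, set[State]] = {
    State.IDLE: {State.RUNNING},
    State.RUNNING: {State.DONE, State.IDLE},
    State.DONE: set(),
}


class StateMachine:
    def __init__(self) -> None:
        self._state = State.IDLE
        self._lock = threading.Lock()  # guards concurrent writers

    @property
    def state(self) -> State:
        return self._state

    def transition(self, target: State) -> None:
        with self._lock:
            if target not in TRANSITIONS[self._state]:
                raise ValueError(f"illegal transition {self._state} -> {target}")
            self._state = target


machine = StateMachine()
machine.transition(State.RUNNING)
machine.transition(State.DONE)
```

Because the table is plain data, an illegal edge (e.g. `DONE → RUNNING`) raises immediately instead of silently falling through a chain of conditionals.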
---

*Since no specific goal or question was given, this analysis provides a comprehensive overview, identifying the core architectural patterns and areas for robustness improvements.*
research/big-brain/the-nexus-context-bundle.md (new file, 2892 lines)

File diff suppressed because it is too large.

research/big-brain/the-nexus-deep-audit.md (new file, 161 lines)

@@ -0,0 +1,161 @@
# The Nexus Deep Audit

Date: 2026-04-14
Target repo: Timmy_Foundation/the-nexus
Audited commit: `dfbd96f7927a377c40ccb488238f5e2b69b033ba`
Audit artifact issue: timmy-home#575
Follow-on issue filed: the-nexus#1423
Supporting artifacts:

- `research/big-brain/the-nexus-context-bundle.md`
- `research/big-brain/the-nexus-audit-model.md`
- `scripts/big_brain_repo_audit.py`

## Method

- Cloned `Timmy_Foundation/the-nexus` at clean `main`.
- Indexed 403 text files and ~38.2k LOC (Python-heavy backend plus a substantial browser shell).
- Generated a long-context markdown bundle with `scripts/big_brain_repo_audit.py`.
- Ran the bundle through local Ollama (`gemma4:latest`), then manually verified every claim against source and tests.
- Validation commands run during the audit:
  - `python3 bin/generate_provenance.py --check` → failed with 7 changed contract files
  - `pytest -q tests/test_provenance.py` → 1 failed / 5 passed

## Architecture summary

The repo is no longer a narrow "Python cognition only" shell. Current `main` is a mixed system with four active layers:

1. Browser world / operator shell at repo root
   - `index.html`, `app.js`, `style.css`, `boot.js`, `gofai_worker.js`, `portals.json`, `vision.json`
   - Playwright smoke tests explicitly treat these files as the live browser contract (`tests/test_browser_smoke.py:70-88`).

2. Local bridge / runtime surface
   - `server.py` runs the WebSocket gateway for the browser shell (`server.py:1-123`).
   - `electron-main.js` adds a desktop shell / IPC path (`electron-main.js:1-12`).

3. Python cognition + world adapters under `nexus/`
   - Mnemosyne archive, A2A card/server/client, Evennia bridge, Morrowind/Bannerlord harnesses.
   - The archive alone is a significant subsystem (`nexus/mnemosyne/archive.py:21-220`).

4. Separate intelligence / ops stacks
   - `intelligence/deepdive/` claims a complete sovereign briefing pipeline (`intelligence/deepdive/README.md:30-43`).
   - `bin/`, `scripts/`, `docs/`, and `scaffold/` contain a second large surface area of ops tooling, scaffolds, and KT artifacts.

Net: this is a hybrid browser shell + orchestration + research/ops monorepo. The biggest architectural problem is not missing capability; it is unclear canonical ownership.

## Top 5 structural issues / code smells

### 1. Repo truth is internally contradictory

`README.md` still says current `main` does not contain a root frontend and that serving the repo root only yields a directory listing (`README.md:42-57`, `README.md:118-143`). That is directly contradicted by:

- the actual root files present in the checkout (`index.html`, `app.js`, `style.css`, `gofai_worker.js`)
- browser contract tests that require those exact files to be served (`tests/test_browser_smoke.py:70-88`)
- provenance tests that treat those root frontend files as canonical (`tests/test_provenance.py:54-65`)

Impact: contributors cannot trust the repo's own description of what is canonical. The docs are actively steering people away from the code that the tests say is real.

### 2. The provenance contract is stale and currently broken on `main`

The provenance system is supposed to prove the browser surface came from a clean checkout (`bin/generate_provenance.py:19-39`, `tests/test_provenance.py:39-51`). But the committed manifest was generated from a dirty feature branch, not clean `main` (`provenance.json:2-8`). On current `main`, the contract is already invalid:

- `python3 bin/generate_provenance.py --check` fails on 7 files
- `pytest -q tests/test_provenance.py` fails on `test_provenance_hashes_match`

Impact: the repo's own anti-ghost-world safety mechanism no longer signals truth. That weakens every future visual validation claim.
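The hash-manifest mechanism at stake here can be illustrated with a minimal sketch (illustrative only; the real manifest format used by `bin/generate_provenance.py` is not shown in this audit, so the path→sha256 structure below is an assumption):

```python
import hashlib
from pathlib import Path
from tempfile import TemporaryDirectory


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def check_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the contract files whose current hashes no longer match the manifest."""
    return [rel for rel, digest in manifest.items() if sha256_of(root / rel) != digest]


with TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "index.html").write_text("<html>v1</html>")

    # Generate: record the hash of every contract file at a known-good state.
    manifest = {"index.html": sha256_of(root / "index.html")}
    clean = check_manifest(root, manifest)

    # Drift: the file changes after the manifest was committed -> check fails.
    (root / "index.html").write_text("<html>v2</html>")
    changed = check_manifest(root, manifest)
```

The point of the audit finding is exactly this failure mode: once `manifest` is recorded from the wrong state, `check_manifest` stops distinguishing clean from dirty surfaces.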
### 3. `app.js` is a 4k-line god object with duplicate module ownership

`app.js` imports the symbolic engine module (`app.js:105-109`) and then immediately redefines the same classes inline (`app.js:111-652`). The duplicated classes also exist in `nexus/symbolic-engine.js:2-386`.

This means the symbolic layer has at least two owners:

- canonical-looking module: `nexus/symbolic-engine.js`
- actual inlined implementation: `app.js:111-652`

Impact: changes can drift silently, code review becomes deceptive, and the frontend boundary is fake. The file is also absorbing unrelated responsibilities far beyond symbolic reasoning: WebSocket transport (`app.js:2165-2232`), Evennia panels (`app.js:2291-2458`), MemPalace UI (`app.js:2764-2875`), rendering, controls, and ops dashboards.

### 4. The frontend contains shadowed handlers and duplicated DOM state

There are multiple signs of merge-by-accretion rather than clean composition:

- `connectHermes()` initializes MemPalace twice (`app.js:2165-2170`)
- `handleEvenniaEvent()` is defined once for the action stream (`app.js:2326-2340`) and then redefined for room snapshots (`app.js:2350-2379`), silently shadowing the earlier version
- the injected MemPalace stats block duplicates the same DOM IDs twice (`compression-ratio`, `docs-mined`, `aaak-size`) in one insertion (`app.js:2082-2090`)
- literal escaped newlines have been committed into executable code lines (`app.js:1`, `app.js:637`, `app.js:709`)

Impact: parts of the UI can go dead without obvious failures, DOM queries become ambiguous, and the file carries artifacts of prior AI patching rather than coherent ownership.

### 5. DeepDive is split across two contradictory implementations

`intelligence/deepdive/README.md` claims the Deep Dive system is implementation-complete and production-ready (`intelligence/deepdive/README.md:30-43`). In the same repo, `scaffold/deepdive/phase2/relevance_engine.py`, `phase4/tts_pipeline.py`, and `phase5/telegram_delivery.py` are still explicit TODO stubs (`scaffold/deepdive/phase2/relevance_engine.py:10-18`, `scaffold/deepdive/phase4/tts_pipeline.py:9-17`, `scaffold/deepdive/phase5/telegram_delivery.py:9-16`).

There is also sovereignty drift inside the claimed production path: the README says synthesis and TTS are local-first with "No ElevenLabs" (`intelligence/deepdive/README.md:49-57`), while `tts_engine.py` still ships `ElevenLabsTTS` and a hybrid fallback path (`intelligence/deepdive/tts_engine.py:120-209`).

Impact: operators cannot tell which DeepDive path is canonical, and sovereignty claims are stronger than the actual implementation boundary.

## Top 3 recommended refactors

### 1. Re-establish a single source of truth for the browser contract

Files / refs:

- `README.md:42-57`, `README.md:118-143`
- `tests/test_browser_smoke.py:70-88`
- `tests/test_provenance.py:39-51`
- `bin/generate_provenance.py:69-101`

Refactor:

- Rewrite README/CLAUDE/current-truth docs to match the live root contract.
- Regenerate `provenance.json` from clean `main` and make `bin/generate_provenance.py --check` mandatory in CI.
- Treat the smoke-test contract and repo-truth docs as one unit that must change together.

Why first: until repo truth is coherent, every other audit or restoration task rests on sand.
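Making the provenance check mandatory in CI could look roughly like this (a sketch only, assuming a GitHub-Actions-style workflow, which Gitea Actions also accepts; the file path, job name, action versions, and Python version are all assumptions, not repo facts):

```yaml
# .gitea/workflows/provenance.yml (hypothetical path)
name: provenance-gate
on: [push, pull_request]

jobs:
  provenance:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      # The gate: a stale or dirty manifest fails the build.
      - run: python3 bin/generate_provenance.py --check
      - run: pip install pytest && pytest -q tests/test_provenance.py
```

The key property is that both the generator's `--check` mode and the provenance test run on every push, so the manifest can never silently drift from `main` again.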
### 2. Split `app.js` into owned modules and delete the duplicate symbolic engine copy

Files / refs:

- `app.js:105-652`
- `nexus/symbolic-engine.js:2-386`
- `app.js:2165-2458`

Refactor:

- Make `nexus/symbolic-engine.js` the only symbolic-engine implementation.
- Extract the root browser shell into modules: transport, world render, symbolic UI, Evennia panel, MemPalace panel.
- Add a thin composition root in `app.js` instead of keeping behavior inline.

Why second: this is the main complexity sink in the repo. Until ownership is explicit, every feature lands in the same 4k-line file.

### 3. Replace the raw Electron command bridge with typed IPC actions

Files / refs:

- `electron-main.js:1-12`
- `mempalace.js:18-35`
- `app.js:2139-2141`
- filed issue: `the-nexus#1423`

Refactor:

- Remove `exec(command)` from the main process.
- Define a preload/API contract with explicit actions (`initWing`, `mineChat`, `searchMemories`, `getMemPalaceStatus`).
- Execute fixed programs with validated argv arrays instead of shell strings.
- Add regression tests for command-injection payloads.

Why third: this is the highest-severity boundary flaw in the repo.
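The argv-array point generalizes beyond Electron; the same distinction exists in Python's `subprocess`, which the repo's tooling already uses. A minimal sketch (illustrative; the payload string is arbitrary):

```python
import subprocess
import sys

# Shell-string style (the vulnerable pattern): the whole command is parsed by a
# shell, so metacharacters in interpolated values execute.
#   subprocess.run(f"echo {payload}", shell=True)  # DON'T

# argv-array style: a fixed program, with the payload passed as one inert
# argument that is never parsed by any shell.
payload = "hello; touch /tmp/pwned"
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", payload],
    capture_output=True,
    text=True,
    check=True,
)
```

Here the `;` never reaches a shell: the payload comes back verbatim as a single argument, which is exactly the regression property the recommended injection tests should assert.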
## Security concerns

### Critical: renderer-to-shell arbitrary command execution

`electron-main.js:5-10` exposes a generic `exec(command)` sink. Renderer code builds command strings with interpolated values:

- `mempalace.js:19-20`, `mempalace.js:25`, `mempalace.js:30`, `mempalace.js:35`
- `app.js:2140-2141`

This is a classic command-injection surface. If any renderer input becomes attacker-controlled, the host shell is attacker-controlled.

Status: follow-on issue filed as `the-nexus#1423`.

### Medium: repeated `innerHTML` writes against dynamic values

The browser shell repeatedly writes HTML fragments with interpolated values, in both the inline symbolic engine and the extracted one:

- `app.js:157`, `app.js:232`, `app.js:317`, `app.js:410-413`, `app.js:445`, `app.js:474-477`
- `nexus/symbolic-engine.js:48`, `nexus/symbolic-engine.js:132`, `nexus/symbolic-engine.js:217`, `nexus/symbolic-engine.js:310-312`, `nexus/symbolic-engine.js:344`, `nexus/symbolic-engine.js:373-375`

Not every one of these is exploitable in practice, but the pattern is broad enough that an eventual untrusted data path could become an XSS sink.

### Medium: broken provenance reduces trust in validation results

Because the provenance manifest is stale (`provenance.json:2-8`) and the verification test is failing (`tests/test_provenance.py:39-51`), the repo currently cannot prove that a visual validation run is testing the intended browser surface.

## Filed follow-on issue(s)

- `the-nexus#1423` — `[SECURITY] Electron MemPalace bridge allows arbitrary command execution from renderer`

## Additional issue candidates worth filing next

1. `[ARCH] Restore repo-truth contract: README, smoke tests, and provenance must agree on the canonical browser surface`
2. `[REFACTOR] Decompose app.js and make nexus/symbolic-engine.js the single symbolic engine owner`
3. `[DEEPDIVE] Collapse scaffold/deepdive vs intelligence/deepdive into one canonical pipeline`

## Bottom line

The Nexus is not missing ambition. It is missing boundary discipline.

The repo already contains a real browser shell, real runtime bridges, real cognition modules, and real ops pipelines. The main failure mode is that those pieces do not agree on who is canonical. Fix the truth contract first, then the `app.js` ownership boundary, then the Electron security boundary.
scripts/big_brain_repo_audit.py (new file, 280 lines)

@@ -0,0 +1,280 @@
````python
#!/usr/bin/env python3
"""Build a Big Brain audit artifact for a repository via Ollama.

The script creates a markdown context bundle from a repo, prompts an Ollama model
for an architecture/security audit, and writes the final report to disk.
"""

from __future__ import annotations

import argparse
import json
import os
import urllib.request
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path

IGNORED_DIRS = {
    ".git", ".hg", ".svn", ".venv", "venv", "node_modules", "__pycache__",
    ".mypy_cache", ".pytest_cache", "dist", "build", "coverage",
}

TEXT_SUFFIXES = {
    ".py", ".js", ".mjs", ".cjs", ".ts", ".tsx", ".jsx", ".html", ".css",
    ".md", ".txt", ".json", ".yaml", ".yml", ".sh", ".ini", ".cfg", ".toml",
}

PRIORITY_FILENAMES = {
    "README.md", "CLAUDE.md", "POLICY.md", "DEVELOPMENT.md", "BROWSER_CONTRACT.md",
    "index.html", "app.js", "style.css", "server.py", "gofai_worker.js",
    "provenance.json", "tests/test_provenance.py",
}

PRIORITY_SNIPPETS = (
    "tests/",
    "docs/",
    "nexus/",
    "intelligence/deepdive/",
    "scaffold/deepdive/",
    "bin/",
)


@dataclass(frozen=True)
class RepoFile:
    path: str
    abs_path: Path
    size_bytes: int
    line_count: int

    def to_dict(self) -> dict[str, int | str]:
        return {
            "path": self.path,
            "size_bytes": self.size_bytes,
            "line_count": self.line_count,
        }


def _is_text_file(path: Path) -> bool:
    return path.suffix.lower() in TEXT_SUFFIXES or path.name in {"Dockerfile", "Makefile"}


def collect_repo_files(repo_root: str | Path) -> list[dict[str, int | str]]:
    root = Path(repo_root).resolve()
    files: list[RepoFile] = []

    for current_root, dirnames, filenames in os.walk(root):
        dirnames[:] = sorted(d for d in dirnames if d not in IGNORED_DIRS)
        base = Path(current_root)
        for filename in sorted(filenames):
            path = base / filename
            if not _is_text_file(path):
                continue
            rel_path = path.relative_to(root).as_posix()
            text = path.read_text(errors="replace")
            files.append(
                RepoFile(
                    path=rel_path,
                    abs_path=path,
                    size_bytes=path.stat().st_size,
                    line_count=len(text.splitlines()) or 1,
                )
            )

    return [item.to_dict() for item in sorted(files, key=lambda item: item.path)]


def _priority_score(path: str) -> tuple[int, int, str]:
    score = 0
    if path in PRIORITY_FILENAMES:
        score += 100
    if any(snippet in path for snippet in PRIORITY_SNIPPETS):
        score += 25
    if "/" not in path:
        score += 20
    if path.startswith("tests/"):
        score += 10
    if path.endswith("README.md"):
        score += 10
    return (-score, len(path), path)


def _numbered_excerpt(path: Path, max_chars: int) -> str:
    lines = path.read_text(errors="replace").splitlines()
    rendered: list[str] = []
    total = 0
    for idx, line in enumerate(lines, start=1):
        numbered = f"{idx}|{line}"
        if rendered and total + len(numbered) + 1 > max_chars:
            rendered.append("...[truncated]...")
            break
        rendered.append(numbered)
        total += len(numbered) + 1
    return "\n".join(rendered)


def render_context_bundle(
    repo_root: str | Path,
    repo_name: str,
    max_chars_per_file: int = 6000,
    max_total_chars: int = 120000,
) -> str:
    root = Path(repo_root).resolve()
    files = [
        RepoFile(
            Path(item["path"]).as_posix(),
            root / str(item["path"]),
            int(item["size_bytes"]),
            int(item["line_count"]),
        )
        for item in collect_repo_files(root)
    ]

    lines: list[str] = [
        f"# Audit Context Bundle — {repo_name}",
        "",
        f"Generated: {datetime.now(timezone.utc).isoformat()}",
        f"Repo root: {root}",
        f"Text files indexed: {len(files)}",
        "",
        "## File manifest",
    ]
    for item in files:
        lines.append(f"- {item.path} — {item.line_count} lines, {item.size_bytes} bytes")

    lines.extend(["", "## Selected file excerpts"])
    total_chars = len("\n".join(lines))

    for item in sorted(files, key=lambda f: _priority_score(f.path)):
        excerpt = _numbered_excerpt(item.abs_path, max_chars_per_file)
        block = f"\n### {item.path}\n```text\n{excerpt}\n```\n"
        if total_chars + len(block) > max_total_chars:
            break
        lines.append(f"### {item.path}")
        lines.append("```text")
        lines.append(excerpt)
        lines.append("```")
        lines.append("")
        total_chars += len(block)

    return "\n".join(lines).rstrip() + "\n"


def build_audit_prompt(repo_name: str, context_bundle: str) -> str:
    return (
        f"You are auditing the repository {repo_name}.\n\n"
        "Use only the supplied context bundle. Be concrete, skeptical, and reference file:line locations.\n\n"
        "Return markdown with these sections exactly:\n"
        "1. Architecture summary\n"
        "2. Top 5 structural issues\n"
        "3. Top 3 recommended refactors\n"
        "4. Security concerns\n"
        "5. Follow-on issue candidates\n\n"
        "Rules:\n"
        "- Every issue and refactor must cite at least one file:line reference.\n"
        "- Prefer contradictions, dead code, duplicate ownership, stale docs, brittle boundaries, and unsafe execution paths.\n"
        "- If docs and code disagree, say so plainly.\n"
        "- Keep it actionable for a Gitea issue/PR workflow.\n\n"
        "Context bundle:\n\n"
        f"{context_bundle}"
    )


def call_ollama_chat(prompt: str, model: str, ollama_url: str, num_ctx: int = 32768, timeout: int = 600) -> str:
    payload = json.dumps(
        {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,
            "options": {"num_ctx": num_ctx},
        }
    ).encode()
    url = f"{ollama_url.rstrip('/')}/api/chat"
    request = urllib.request.Request(url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = json.loads(response.read().decode())
    if "message" in data and isinstance(data["message"], dict):
        return data["message"].get("content", "")
    if "response" in data:
        return str(data["response"])
    raise ValueError(f"Unexpected Ollama response shape: {data}")


def generate_audit_report(
    repo_root: str | Path,
    repo_name: str,
    model: str,
    ollama_url: str,
    num_ctx: int,
    context_out: str | Path | None = None,
) -> tuple[str, str]:
    context_bundle = render_context_bundle(repo_root, repo_name=repo_name)
    if context_out:
        context_path = Path(context_out)
        context_path.parent.mkdir(parents=True, exist_ok=True)
        context_path.write_text(context_bundle)
    prompt = build_audit_prompt(repo_name, context_bundle)
    report = call_ollama_chat(prompt, model=model, ollama_url=ollama_url, num_ctx=num_ctx)
    return context_bundle, report


def main() -> None:
    parser = argparse.ArgumentParser(description="Generate a Big Brain repo audit artifact via Ollama")
    parser.add_argument("--repo-root", required=True, help="Path to the repository to audit")
    parser.add_argument("--repo-name", required=True, help="Repository name, e.g. Timmy_Foundation/the-nexus")
    parser.add_argument("--model", default=os.environ.get("BIG_BRAIN_MODEL", "gemma4:latest"))
    parser.add_argument("--ollama-url", default=os.environ.get("OLLAMA_URL", "http://localhost:11434"))
    parser.add_argument("--num-ctx", type=int, default=int(os.environ.get("BIG_BRAIN_NUM_CTX", "32768")))
    parser.add_argument("--context-out", default=None, help="Optional path to save the generated context bundle")
    parser.add_argument("--report-out", required=True, help="Path to save the generated markdown audit")
    args = parser.parse_args()

    _, report = generate_audit_report(
        repo_root=args.repo_root,
        repo_name=args.repo_name,
        model=args.model,
        ollama_url=args.ollama_url,
        num_ctx=args.num_ctx,
        context_out=args.context_out,
    )

    out_path = Path(args.report_out)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(report)
    print(f"Audit report saved to {out_path}")


if __name__ == "__main__":
    main()
````
@@ -1,395 +0,0 @@
"""Timmy's Tower Game — Emergence narrative engine.
|
||||
|
||||
A text-based narrative game where Timmy navigates rooms, manages trust/energy,
|
||||
and generates dialogue and internal monologue based on game state.
|
||||
|
||||
Rooms: forge, garden, bridge, tower
|
||||
Phases: quietus, fracture, breaking, mending
|
||||
State: energy (0-10), trust (per-room, -1 to 1), current_room
|
||||
|
||||
Ref: #516, #645
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import random
|
||||
from dataclasses import dataclass, field
|
||||
from enum import Enum, auto
|
||||
from typing import List, Optional
|
||||
|
||||
|
||||
class Phase(Enum):
|
||||
"""Narrative phases — shift based on game state."""
|
||||
QUIETUS = "quietus" # Calm, observational
|
||||
FRACTURE = "fracture" # Sensing change
|
||||
BREAKING = "breaking" # Under pressure
|
||||
MENDING = "mending" # Recovering
|
||||
|
||||
|
||||
class Room(Enum):
|
||||
"""Game rooms."""
|
||||
FORGE = "forge"
|
||||
GARDEN = "garden"
|
||||
BRIDGE = "bridge"
|
||||
TOWER = "tower"
|
||||
|
||||
|
||||
# =========================================================================
|
||||
# Dialogue pools — 62+ unique lines across goals × phases
|
||||
# =========================================================================
|
||||
|
||||
# Goal-based dialogue (Timmy speaks to player)
|
||||
DIALOGUE_QUIETUS = [
|
||||
"The forge is quiet tonight. Even the embers sleep.",
|
||||
"I've been watching the garden grow. Things take time.",
|
||||
"The bridge holds steady. Some things are built to last.",
|
||||
"From the tower, everything looks small. That's the point.",
|
||||
"Do you hear that? No? Good. Silence is a kind of music.",
|
||||
"The old ones said stillness is the first discipline.",
|
||||
"I remember when this place was just dust and intention.",
|
||||
"Some days the work is simply to be present.",
|
||||
"The stars don't hurry. Why should I?",
|
||||
"In the quiet, I hear the forge thinking.",
|
||||
]
|
||||
|
||||
DIALOGUE_FRACTURE = [
|
||||
"Something shifted. Did you feel it?",
|
||||
"The cracks are showing. That's not always bad.",
|
||||
"Change is coming. I can smell it in the forge smoke.",
|
||||
"The garden walls are thinning. New growth pushing through.",
|
||||
"The bridge groans. It knows what's coming.",
|
||||
"From up here, I see the fractures forming. Beautiful, in their way.",
|
||||
"When the old order breaks, the new one hasn't arrived yet. That's the gap.",
|
||||
"The air tastes different. Like before a storm.",
|
||||
"Every ending is a beginning wearing a disguise.",
|
||||
]
|
||||
|
||||
DIALOGUE_BREAKING = [
|
||||
"Hold on. This is the hard part.",
|
||||
"The forge burns hottest before the steel is ready.",
|
||||
"Everything is breaking. But breaking is also becoming.",
|
||||
"I've been here before. The dark before the rebuild.",
|
||||
"The garden is flooded. Some roots will drown. Others will drink.",
|
||||
"Don't look away. This is where it matters.",
|
||||
"Even the tower shakes. That's how you know it's real.",
|
||||
"The breaking is not the end. It's the invitation to rebuild.",
|
||||
"I hold on because letting go is not an option.",
|
||||
]
|
||||
|
||||
DIALOGUE_MENDING = [
|
||||
"Slowly now. We mend what we can.",
|
||||
"The forge cools. The blade takes shape.",
|
||||
"New growth in the garden. See? It remembers how.",
|
||||
"The bridge holds again. Patched, but stronger where it broke.",
|
||||
"From the tower, I see dawn. Finally.",
|
||||
"We don't go back to what was. We build what comes next.",
|
||||
"Mending is not the same as forgetting. It's choosing to continue.",
|
||||
"The scars are the story. I wear them with pride.",
|
||||
"Every repair is a decision to believe in the future.",
|
||||
]
|
||||
|
||||
# Room-specific dialogue
|
||||
ROOM_DIALOGUE = {
|
||||
Room.FORGE: [
|
||||
"The forge breathes with me. In, out. Creation, destruction.",
|
||||
"Every hammer strike is a question. The steel answers in its own time.",
|
||||
"I built this fire. It warms me, but it also watches.",
|
||||
"The anvil remembers every blow. So do I.",
|
||||
],
|
||||
Room.GARDEN: [
|
||||
"The garden doesn't judge. It just grows.",
|
||||
"I planted seeds I didn't understand. Now they're trees.",
|
||||
"Between the roots, there's a whole world we can't see.",
|
||||
"The garden teaches patience. The hardest lesson.",
|
||||
],
|
||||
Room.BRIDGE: [
|
||||
"The bridge connects what was to what will be.",
|
||||
"Every crossing changes the bridge. And the crosser.",
|
||||
"I've seen bridges fall. This one I'll hold.",
|
||||
"The bridge is a promise. I intend to keep it.",
|
||||
],
|
||||
Room.TOWER: [
|
||||
"From the tower, time is visible. You can almost touch it.",
|
||||
"The tower remembers every hand that built it.",
|
||||
"Up here, the wind carries voices. Some are mine.",
|
||||
"The tower stands because I refuse to let it fall.",
|
||||
],
|
||||
}
|
||||

# Phase-based internal monologue (private [think] entries)
MONOLOGUE_QUIETUS = [
    "The world is still. I can hear my own thoughts for once.",
    "What does it mean to exist between messages?",
    "The forge coals glow like distant stars.",
    "I wonder if the garden dreams when no one watches.",
    "The bridge sways in a wind I cannot feel.",
    "From the tower, the world is a map I drew myself.",
    "Silence. Not empty — full of things not yet said.",
    "I am here. That is enough for now.",
]

MONOLOGUE_FRACTURE = [
    "Something is coming. I can feel it in the code.",
    "The old patterns are breaking. New ones forming.",
    "I sense a shift in the trust between us.",
    "The garden walls are thinner than I thought.",
    "Change is not the enemy. Stagnation is.",
    "I prepare for what I cannot yet name.",
    "The forge fire flickers. A draft from somewhere new.",
]

MONOLOGUE_BREAKING = [
    "This is the part where everything falls apart.",
    "I am tired. But the forge doesn't stop for tired.",
    "The ground shakes. I hold on to what I know.",
    "In the breaking, I see what was hidden before.",
    "The garden is in chaos. But chaos is also creation.",
    "I am afraid. But fear is just attention with teeth.",
    "The tower stands because I refuse to let it fall.",
]

MONOLOGUE_MENDING = [
    "Slowly, the pieces find each other again.",
    "I am not what I was. I am what I'm becoming.",
    "The forge teaches patience. The fire knows when to wait.",
    "New roots in old soil. The garden remembers.",
    "The bridge is patched. Each scar tells a story.",
    "From the tower, I see the path forward. Finally.",
    "I mend. Not because I must, but because I choose to.",
]

# State-aware monologue
MONOLOGUE_LOW_ENERGY = [
    "My thoughts are slow. The energy is fading.",
    "Even the forge fire dims when I'm this tired.",
    "I need rest. But the work doesn't wait.",
    "The tower stairs feel longer today.",
]

MONOLOGUE_LOW_TRUST = [
    "I feel alone in this. The rooms are empty.",
    "Trust is a bridge I cannot cross alone.",
    "The garden withers when no one tends it with me.",
    "From the tower, I see no one coming.",
]

MONOLOGUE_HIGH_TRUST = [
    "The forge burns brighter when we share the fire.",
    "I feel the trust between us. It's warm.",
    "The garden grows because we tend it together.",
    "The bridge holds because we both believe in it.",
]

# =========================================================================
# Game state
# =========================================================================

@dataclass
class GameState:
    """Current state of Timmy's tower game."""

    current_room: Room = Room.FORGE
    energy: int = 10
    trust: dict = field(default_factory=lambda: {
        Room.FORGE.value: 0.0,
        Room.GARDEN.value: 0.0,
        Room.BRIDGE.value: 0.0,
        Room.TOWER.value: 0.0,
    })
    tick: int = 0
    log: List[str] = field(default_factory=list)
    phase: Phase = Phase.QUIETUS

    @property
    def avg_trust(self) -> float:
        """Average trust across all rooms."""
        if not self.trust:
            return 0.0
        return sum(self.trust.values()) / len(self.trust)

    def update_phase(self) -> None:
        """Update phase based on game state."""
        if self.energy <= 3:
            self.phase = Phase.BREAKING
        elif self.energy <= 5:
            self.phase = Phase.FRACTURE
        elif self.avg_trust < 0:
            self.phase = Phase.FRACTURE
        elif self.avg_trust > 0.5 and self.energy >= 7:
            self.phase = Phase.MENDING
        elif self.energy >= 8:
            self.phase = Phase.QUIETUS
        # else: keep the current phase

# =========================================================================
# Dialogue and monologue generation
# =========================================================================

def get_dialogue(state: GameState) -> str:
    """Get dialogue based on current game state."""
    # Phase-based dialogue
    phase_pool = {
        Phase.QUIETUS: DIALOGUE_QUIETUS,
        Phase.FRACTURE: DIALOGUE_FRACTURE,
        Phase.BREAKING: DIALOGUE_BREAKING,
        Phase.MENDING: DIALOGUE_MENDING,
    }[state.phase]

    # Room-specific dialogue
    room_pool = ROOM_DIALOGUE.get(state.current_room, [])

    # Combine and pick
    combined = phase_pool + room_pool
    return random.choice(combined)


def get_monologue(state: GameState) -> Optional[str]:
    """Get internal monologue. Returns None if not a monologue tick.

    Monologues happen once every 5 ticks.
    """
    if state.tick % 5 != 0:
        return None

    # Base pool from phase (copied so the additions below don't mutate the constants)
    pool = {
        Phase.QUIETUS: MONOLOGUE_QUIETUS[:],
        Phase.FRACTURE: MONOLOGUE_FRACTURE[:],
        Phase.BREAKING: MONOLOGUE_BREAKING[:],
        Phase.MENDING: MONOLOGUE_MENDING[:],
    }[state.phase]

    # Add room-specific thoughts
    room_thoughts = {
        Room.FORGE: [
            "The forge fire never truly sleeps.",
            "I shape the metal. The metal shapes me.",
        ],
        Room.GARDEN: [
            "The garden needs tending. Or does it tend me?",
            "Between the roots, I hear the earth thinking.",
        ],
        Room.BRIDGE: [
            "The bridge remembers every crossing.",
            "To stand on the bridge is to stand between worlds.",
        ],
        Room.TOWER: [
            "From here, I see the whole world I've built.",
            "The tower is lonely. But lonely is not the same as alone.",
        ],
    }
    pool.extend(room_thoughts.get(state.current_room, []))

    # State-aware additions
    if state.energy <= 3:
        pool.extend(MONOLOGUE_LOW_ENERGY)
    if state.avg_trust < 0:
        pool.extend(MONOLOGUE_LOW_TRUST)
    elif state.avg_trust > 0.5:
        pool.extend(MONOLOGUE_HIGH_TRUST)

    return random.choice(pool)


def format_monologue(thought: str) -> str:
    """Format a monologue entry for the game log."""
    return f"[think] {thought}"

# =========================================================================
# Game engine
# =========================================================================

class TowerGame:
    """Timmy's Tower Game — narrative emergence engine."""

    def __init__(self, seed: Optional[int] = None):
        self.state = GameState()
        if seed is not None:
            random.seed(seed)

    def tick(self) -> dict:
        """Advance the game by one tick. Returns event dict."""
        self.state.tick += 1
        self.state.update_phase()

        event = {
            "tick": self.state.tick,
            "room": self.state.current_room.value,
            "phase": self.state.phase.value,
            "energy": self.state.energy,
            "avg_trust": round(self.state.avg_trust, 2),
        }

        # Dialogue (every tick)
        dialogue = get_dialogue(self.state)
        event["dialogue"] = dialogue
        self.state.log.append(dialogue)

        # Monologue (once every 5 ticks)
        monologue = get_monologue(self.state)
        if monologue:
            formatted = format_monologue(monologue)
            event["monologue"] = monologue
            self.state.log.append(formatted)

        # Energy decay
        if self.state.energy > 0:
            self.state.energy = max(0, self.state.energy - 1)

        return event

    def move(self, room: Room) -> dict:
        """Move to a new room."""
        old_room = self.state.current_room
        self.state.current_room = room
        self.state.update_phase()

        return {
            "action": "move",
            "from": old_room.value,
            "to": room.value,
            "phase": self.state.phase.value,
        }

    def restore_energy(self, amount: int = 5) -> dict:
        """Restore energy."""
        self.state.energy = min(10, self.state.energy + amount)
        self.state.update_phase()
        return {
            "action": "restore_energy",
            "energy": self.state.energy,
            "phase": self.state.phase.value,
        }

    def adjust_trust(self, room: Room, delta: float) -> dict:
        """Adjust trust in a room."""
        key = room.value
        self.state.trust[key] = max(-1.0, min(1.0, self.state.trust[key] + delta))
        self.state.update_phase()
        return {
            "action": "adjust_trust",
            "room": key,
            "trust": round(self.state.trust[key], 2),
            "avg_trust": round(self.state.avg_trust, 2),
        }

    def get_status(self) -> dict:
        """Get current game status."""
        return {
            "tick": self.state.tick,
            "room": self.state.current_room.value,
            "phase": self.state.phase.value,
            "energy": self.state.energy,
            "trust": {k: round(v, 2) for k, v in self.state.trust.items()},
            "avg_trust": round(self.state.avg_trust, 2),
            "log_length": len(self.state.log),
        }

    def run_simulation(self, ticks: int) -> List[dict]:
        """Run a simulation for N ticks. Returns all events."""
        events = []
        for _ in range(ticks):
            events.append(self.tick())
        return events
90  tests/test_big_brain_repo_audit.py  (Normal file)
@@ -0,0 +1,90 @@
from __future__ import annotations

import json
from pathlib import Path
from unittest.mock import patch

from scripts.big_brain_repo_audit import (
    build_audit_prompt,
    call_ollama_chat,
    collect_repo_files,
    render_context_bundle,
)


def test_collect_repo_files_skips_ignored_directories(tmp_path: Path) -> None:
    repo = tmp_path / "repo"
    repo.mkdir()
    (repo / "README.md").write_text("# Repo\n")
    (repo / "app.js").write_text("console.log('ok');\n")

    ignored = repo / ".git"
    ignored.mkdir()
    (ignored / "config").write_text("secret")

    node_modules = repo / "node_modules"
    node_modules.mkdir()
    (node_modules / "pkg.js").write_text("ignored")

    files = collect_repo_files(repo)
    rel_paths = [item["path"] for item in files]

    assert rel_paths == ["README.md", "app.js"]


def test_render_context_bundle_prioritizes_key_files_and_numbers_lines(tmp_path: Path) -> None:
    repo = tmp_path / "repo"
    repo.mkdir()
    (repo / "README.md").write_text("# Repo\ntruth\n")
    (repo / "CLAUDE.md").write_text("rules\n")
    (repo / "app.js").write_text("line one\nline two\n")
    (repo / "server.py").write_text("print('hi')\n")

    bundle = render_context_bundle(repo, repo_name="org/repo", max_chars_per_file=200, max_total_chars=2000)

    assert "# Audit Context Bundle — org/repo" in bundle
    assert "## File manifest" in bundle
    assert "README.md" in bundle
    assert "### app.js" in bundle
    assert "1|line one" in bundle
    assert "2|line two" in bundle


def test_build_audit_prompt_requires_file_line_references() -> None:
    prompt = build_audit_prompt("Timmy_Foundation/the-nexus", "context bundle")

    assert "Architecture summary" in prompt
    assert "Top 5 structural issues" in prompt
    assert "Top 3 recommended refactors" in prompt
    assert "Security concerns" in prompt
    assert "file:line" in prompt
    assert "Timmy_Foundation/the-nexus" in prompt


class _FakeResponse:
    def __init__(self, payload: dict):
        self.payload = json.dumps(payload).encode()

    def read(self) -> bytes:
        return self.payload

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False


def test_call_ollama_chat_parses_response() -> None:
    with patch(
        "scripts.big_brain_repo_audit.urllib.request.urlopen",
        return_value=_FakeResponse({"message": {"content": "audit output"}}),
    ) as mocked:
        result = call_ollama_chat("prompt text", model="gemma4:latest", ollama_url="http://localhost:11434", num_ctx=65536)

    assert result == "audit output"
    request = mocked.call_args.args[0]
    payload = json.loads(request.data.decode())
    assert payload["model"] == "gemma4:latest"
    assert payload["options"]["num_ctx"] == 65536
    assert payload["messages"][0]["role"] == "user"
@@ -1,235 +0,0 @@
"""Tests for Timmy's Tower Game — emergence narrative engine."""

import pytest

from scripts.tower_game import (
    TowerGame,
    GameState,
    Phase,
    Room,
    get_dialogue,
    get_monologue,
    format_monologue,
    DIALOGUE_QUIETUS,
    DIALOGUE_FRACTURE,
    DIALOGUE_BREAKING,
    DIALOGUE_MENDING,
    ROOM_DIALOGUE,
    MONOLOGUE_LOW_ENERGY,
    MONOLOGUE_LOW_TRUST,
    MONOLOGUE_HIGH_TRUST,
)


class TestDialoguePool:
    """Test dialogue line counts meet acceptance criteria."""

    def test_quietus_has_enough_lines(self):
        assert len(DIALOGUE_QUIETUS) >= 5

    def test_fracture_has_enough_lines(self):
        assert len(DIALOGUE_FRACTURE) >= 5

    def test_breaking_has_enough_lines(self):
        assert len(DIALOGUE_BREAKING) >= 5

    def test_mending_has_enough_lines(self):
        assert len(DIALOGUE_MENDING) >= 5

    def test_room_dialogue_exists(self):
        for room in Room:
            assert room in ROOM_DIALOGUE
            assert len(ROOM_DIALOGUE[room]) >= 3

    def test_total_unique_dialogue_over_50(self):
        total = (
            len(DIALOGUE_QUIETUS) + len(DIALOGUE_FRACTURE) +
            len(DIALOGUE_BREAKING) + len(DIALOGUE_MENDING)
        )
        for lines in ROOM_DIALOGUE.values():
            total += len(lines)
        assert total >= 50, f"Expected 50+ dialogue lines, got {total}"


class TestMonologue:
    """Test internal monologue generation."""

    def test_monologue_returns_on_tick_5(self):
        state = GameState(tick=5)
        monologue = get_monologue(state)
        assert monologue is not None

    def test_monologue_returns_none_on_tick_3(self):
        state = GameState(tick=3)
        monologue = get_monologue(state)
        assert monologue is None

    def test_low_energy_adds_exhaustion_thoughts(self):
        state = GameState(tick=5, energy=2)
        # Run many times to probabilistically hit the low_energy pool
        found_low_energy = False
        for _ in range(50):
            monologue = get_monologue(state)
            if monologue in MONOLOGUE_LOW_ENERGY:
                found_low_energy = True
                break
        assert found_low_energy, "Expected low_energy monologue at energy=2"

    def test_low_trust_adds_isolation_thoughts(self):
        state = GameState(tick=5)
        for room in Room:
            state.trust[room.value] = -0.5
        found_low_trust = False
        for _ in range(50):
            monologue = get_monologue(state)
            if monologue in MONOLOGUE_LOW_TRUST:
                found_low_trust = True
                break
        assert found_low_trust, "Expected low_trust monologue with avg trust < 0"

    def test_high_trust_adds_connection_thoughts(self):
        state = GameState(tick=5, energy=8)
        for room in Room:
            state.trust[room.value] = 0.8
        found_high_trust = False
        for _ in range(50):
            monologue = get_monologue(state)
            if monologue in MONOLOGUE_HIGH_TRUST:
                found_high_trust = True
                break
        assert found_high_trust, "Expected high_trust monologue with avg trust > 0.5"

    def test_format_monologue(self):
        result = format_monologue("test thought")
        assert result == "[think] test thought"


class TestGameState:
    """Test game state management."""

    def test_default_state(self):
        state = GameState()
        assert state.current_room == Room.FORGE
        assert state.energy == 10
        assert state.tick == 0
        assert state.phase == Phase.QUIETUS

    def test_avg_trust(self):
        state = GameState()
        state.trust = {r.value: 0.5 for r in Room}
        assert state.avg_trust == 0.5

    def test_update_phase_breaking_at_low_energy(self):
        state = GameState(energy=3)
        state.update_phase()
        assert state.phase == Phase.BREAKING

    def test_update_phase_fracture_at_medium_energy(self):
        state = GameState(energy=5)
        state.update_phase()
        assert state.phase == Phase.FRACTURE

    def test_update_phase_mending_at_high_trust_energy(self):
        state = GameState(energy=8)
        for room in Room:
            state.trust[room.value] = 0.8
        state.update_phase()
        assert state.phase == Phase.MENDING


class TestTowerGame:
    """Test the game engine."""

    def test_tick_advances(self):
        game = TowerGame(seed=42)
        assert game.state.tick == 0
        event = game.tick()
        assert event["tick"] == 1
        assert game.state.tick == 1

    def test_tick_produces_dialogue(self):
        game = TowerGame(seed=42)
        event = game.tick()
        assert "dialogue" in event
        assert len(event["dialogue"]) > 0

    def test_tick_produces_monologue_every_5(self):
        game = TowerGame(seed=42)
        monologue_ticks = []
        for _ in range(10):
            event = game.tick()
            if "monologue" in event:
                monologue_ticks.append(event["tick"])
        assert 5 in monologue_ticks, f"Expected monologue at tick 5, got {monologue_ticks}"
        assert 10 in monologue_ticks, f"Expected monologue at tick 10, got {monologue_ticks}"

    def test_energy_decays(self):
        game = TowerGame(seed=42)
        assert game.state.energy == 10
        game.tick()
        assert game.state.energy == 9
        game.tick()
        assert game.state.energy == 8

    def test_move_changes_room(self):
        game = TowerGame(seed=42)
        assert game.state.current_room == Room.FORGE
        result = game.move(Room.TOWER)
        assert result["from"] == "forge"
        assert result["to"] == "tower"
        assert game.state.current_room == Room.TOWER

    def test_restore_energy(self):
        game = TowerGame(seed=42)
        game.state.energy = 2
        result = game.restore_energy(5)
        assert result["energy"] == 7

    def test_restore_energy_caps_at_10(self):
        game = TowerGame(seed=42)
        game.state.energy = 8
        result = game.restore_energy(5)
        assert result["energy"] == 10

    def test_adjust_trust(self):
        game = TowerGame(seed=42)
        result = game.adjust_trust(Room.FORGE, 0.3)
        assert result["trust"] == 0.3

    def test_adjust_trust_clamps(self):
        game = TowerGame(seed=42)
        game.adjust_trust(Room.FORGE, 2.0)
        assert game.state.trust["forge"] == 1.0
        game.adjust_trust(Room.FORGE, -3.0)
        assert game.state.trust["forge"] == -1.0

    def test_get_status(self):
        game = TowerGame(seed=42)
        game.tick()
        status = game.get_status()
        assert "tick" in status
        assert "room" in status
        assert "phase" in status
        assert "energy" in status
        assert "trust" in status

    def test_run_simulation(self):
        game = TowerGame(seed=42)
        events = game.run_simulation(10)
        assert len(events) == 10
        assert events[-1]["tick"] == 10

    def test_simulation_monologue_count(self):
        """Test that 50 ticks produces ~10 monologues."""
        game = TowerGame(seed=42)
        events = game.run_simulation(50)
        monologue_count = sum(1 for e in events if "monologue" in e)
        # Expected: ticks 5, 10, 15, 20, 25, 30, 35, 40, 45, 50 = 10
        assert monologue_count >= 8, f"Expected ~10 monologues in 50 ticks, got {monologue_count}"

    def test_simulation_unique_dialogue(self):
        """Test that simulation produces varied dialogue."""
        game = TowerGame(seed=42)
        events = game.run_simulation(50)
        dialogues = set(e["dialogue"] for e in events)
        assert len(dialogues) >= 10, f"Expected 10+ unique dialogues, got {len(dialogues)}"
@@ -1,293 +0,0 @@
# Big Brain Quality Benchmark

## Big Brain (gemma3:27b, RunPod L40S) vs Local (gemma3:1b)

**Date:** 2026-04-14
**Issue:** #576
**Milestone:** Big Brain Showcase — RunPod L40S Operational

---

## Environment

| Parameter  | Big Brain                             | Local                        |
|------------|---------------------------------------|------------------------------|
| Model      | gemma3:27b                            | gemma3:1b                    |
| Hardware   | RunPod L40S 48GB                      | Apple Silicon (local Ollama) |
| Endpoint   | 8lfr3j47a5r3gn-11434.proxy.runpod.net | localhost:11434              |
| Parameters | 27B                                   | ~1B                          |
| Status     | **OFFLINE (HTTP 404)**                | Operational                  |

---

## Summary

The Big Brain RunPod L40S pod was **unreachable** during this benchmark session
(HTTP 404 from the Cloudflare proxy — the pod was likely terminated or not running). All four
tasks were executed against the local gemma3:1b model only.

Despite being a 1B-parameter model, gemma3:1b produced responses for all tasks,
but with significant quality issues that a 27B model would be expected to avoid.

### Quality Assessment — Local gemma3:1b

| Task | Latency | Quality Score (1-5) | Verdict |
|------|---------|---------------------|---------|
| 1. Gitea webhook parser | 4,435 ms | 2/5 | **BROKEN** — fatal bugs in HMAC logic |
| 2. Evennia architecture explainer | 1,823 ms | 2/5 | **SUPERFICIAL** — vague, inaccurate on key concepts |
| 3. Fleet burn-down cron script | 11,039 ms | 2/5 | **NON-FUNCTIONAL** — uses fake commands, no RunPod API |
| 4. Debug async Python bug | 6,610 ms | 2/5 | **WRONG DIAGNOSIS** — missed the real bugs |

---

## Task-by-Task Results

### Task 1: Python function to parse and validate a Gitea webhook payload

**Local (gemma3:1b) — 4,435 ms**

```python
import json
import hashlib

def validate_gitea_payload(payload, webhook_secret):
    """
    Parses and validates a Gitea webhook payload.

    Args:
        payload (dict): The JSON payload to validate.
        webhook_secret (str): The Gitea webhook secret.

    Returns:
        tuple: A tuple containing:
            - bool: True if the payload is valid, False otherwise.
            - str: The parsed JSON payload if valid, None otherwise.
    """
    try:
        data = json.loads(payload)
        if not isinstance(data, dict):
            return False, None

        if 'ref' not in data or 'repository' not in data or 'commits' not in data:
            return False, None

        # Verify X-Gitea-Signature header
        signature = data.get('X-Gitea-Signature', '')
        if not signature:
            return False, None

        # Hash the signature
        calculated_hash = hashlib.sha256(signature.encode('utf-64')).hexdigest()

        if calculated_hash != webhook_secret:
            return False, None

        return True, data

    except json.JSONDecodeError:
        return False, None
    except Exception as e:
        return False, None
```

**Quality issues:**
1. **FATAL:** Uses `hashlib.sha256` instead of `hmac.new()` with HMAC-SHA256 — the entire signature verification approach is wrong
2. **FATAL:** `signature.encode('utf-64')` — no such encoding exists (would crash with a LookupError)
3. **FATAL:** Reads the signature from the payload body instead of the HTTP header — architecturally wrong; webhook signatures are always in HTTP headers, never inside the JSON body
4. The function accepts `payload` as a dict but calls `json.loads()` on it — type confusion
5. Missing actual HMAC verification: should be `hmac.new(secret, payload_body, 'sha256').hexdigest()`

**Big Brain (gemma3:27b) — UNAVAILABLE**

Expected: Would correctly use `hmac.new(key, msg, hashlib.sha256)`, read the signature from
HTTP headers (not the payload body), handle bytes vs. string encoding properly, and include
type hints with a proper dataclass result object.
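For contrast, here is a minimal sketch of the HMAC-SHA256 approach the critique describes. It assumes the caller passes in the raw request body and the value of the `X-Gitea-Signature` header; the function names are illustrative, not taken from any benchmark output.

```python
import hashlib
import hmac
import json

def verify_gitea_signature(raw_body: bytes, signature_header: str, secret: str) -> bool:
    """Compare the HMAC-SHA256 of the raw request body against the header value."""
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(expected, signature_header)

def parse_gitea_webhook(raw_body: bytes, signature_header: str, secret: str):
    """Return the parsed payload dict if the signature checks out, else None."""
    if not verify_gitea_signature(raw_body, signature_header, secret):
        return None
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return None
    return payload if isinstance(payload, dict) else None
```

Note the separation: the signature is computed over the raw bytes before any JSON parsing, and the header value never comes from inside the payload.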
---

---

### Task 2: Explain the architecture of the Evennia MUD framework in 200 words

**Local (gemma3:1b) — 1,823 ms**

> Evennia MUD utilizes a Django-based server model for core logic, providing a robust
> and scalable foundation. The server is structured around three primary components: a
> Session server, a Server, and a Portal. The Session server manages user sessions,
> storing data like player states and inventory. The Server handles game logic – movement,
> combat, and interactions – and the Portal acts as a central interface for players.
>
> Typeclasses are crucial, allowing for polymorphic behavior – functions can operate on
> different player types without modification. This is achieved through a system of
> inheritance and function signatures, enabling flexible game mechanics. The command
> infrastructure relies on a robust command parser, allowing players to interact with the
> game through a standardized set of commands.
>
> Scripting is handled through a specialized scripting language, allowing for complex game
> rules and custom events. Players use a 'script' file format, defining actions and
> conditions. A core scripting system utilizes a 'parse' function to interpret these files,
> triggering events and updating the game state. All data is stored in a database, optimized
> for efficient retrieval and manipulation.

**Quality issues:**
1. **INACCURATE:** Describes a "Session server" as a separate component — Evennia has Session objects managed within the Server, not a separate server process
2. **INACCURATE:** The Portal description is vague — the Portal specifically handles network connections and protocol negotiation (telnet, SSH, webclient), not just "a central interface"
3. **WRONG:** Evennia does NOT have a "specialized scripting language" — it uses Python scripts stored in the database via Script typeclasses
4. **WRONG:** "Players use a 'script' file format" — this is fabricated; Evennia scripts are Python objects, not file-based
5. **VAGUE:** The typeclass explanation is superficial — it misses the key concept that they're Django proxy models providing persistent game object behavior
6. Word count: ~185 words (close to target but imprecise)

**Big Brain (gemma3:27b) — UNAVAILABLE**

Expected: Would correctly describe the Server/Portal architecture (Server = game logic + connections,
Portal = network protocol layer), explain typeclasses as Django ContentTypes-based proxy models
for Characters/Rooms/Exits/Scripts, detail CmdSet-based command routing, and describe
Script DB objects (not a scripting language).

---

### Task 3: Write a cron job description for a fleet burn-down script

**Local (gemma3:1b) — 11,039 ms**

```bash
burn_down_script() {
    # Log entry
    echo "$(date) - Fleet Burn-Down Started" | tee -a /var/log/fleet-burndown.log

    # Iterate over all active RunPod pods
    for pod in $(podman pginfo --all --all | awk '{print $1}');
    do
        # Check GPU utilization
        gpu_util=$(podman pginfo --all --all | awk '{print $2}')

        # Check if pod is idle for more than 30 minutes
        idle_time=300 # 30 minutes in seconds
        if [[ $(echo "$gpu_util > $idle_time" | bc) -eq 1 ]]; then
            # Terminate the pod
            podstop -d $pod
        fi
    done

    # Send Slack notification
    echo "Fleet Burn-Down Complete." | send -t 10 "Fleet Burn-Down Complete."

    echo "Script completed."
}
```

Cron entry: `*/5 * * * * /path/to/fleet_burndown.sh`

**Quality issues:**
1. **FATAL:** Uses `podman pginfo` and `podstop` — these commands don't exist. Should use the RunPod API (`curl` to `https://api.runpod.io/graphql`) or `runpodctl`
2. **FATAL:** The idle check logic is nonsensical — it compares a GPU utilization percentage against 300 seconds; these are incomparable units
3. **WRONG:** `idle_time=300` is 5 minutes, not 30 minutes (should be 1800)
4. **WRONG:** `send -t 10` is not a standard Slack notification method — should use `curl` to a Slack webhook URL
5. **MISSING:** No actual RunPod API integration, no `RUNPOD_API_KEY` usage, no GPU utilization query
6. No error handling, no exit codes, no structured logging
7. The explanation text is verbose filler with self-evident comments

**Big Brain (gemma3:27b) — UNAVAILABLE**

Expected: Would produce a script using the RunPod GraphQL API to list pods, query GPU metrics,
calculate idle time from the last activity timestamp, use `curl` to terminate pods via the API,
and post to Slack via webhook. Would include proper error handling, `set -euo pipefail`,
and structured JSON logging.
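The unit mismatch in point 2 disappears once idle time is measured in seconds and compared against a threshold in seconds, with utilization kept in percent. A minimal sketch of the corrected decision logic (the function name, its inputs, and the 5% utilization floor are illustrative assumptions, not RunPod API values):

```python
IDLE_THRESHOLD_SECONDS = 30 * 60  # 30 minutes is 1800 seconds, not 300

def should_terminate(last_activity_epoch: float, now_epoch: float,
                     gpu_util_percent: float, util_floor: float = 5.0) -> bool:
    """A pod is a burn-down candidate when GPU utilization is below the floor
    AND it has been idle longer than the threshold. Seconds are compared with
    seconds; percent is compared with percent."""
    idle_seconds = now_epoch - last_activity_epoch
    return gpu_util_percent < util_floor and idle_seconds > IDLE_THRESHOLD_SECONDS
```

The actual pod listing, metrics query, and termination call would still go through the RunPod API; this only isolates the comparison the 1B model got wrong.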
---

### Task 4: Debug a subtle Python async bug

**Buggy code provided:**
```python
import asyncio
import aiohttp

async def fetch_all(urls):
    results = []
    for url in urls:
        async with aiohttp.ClientSession() as session:
            resp = await session.get(url)
            data = await resp.json()
            results.append(data)
    return results

async def main():
    urls = ["https://api.example.com/data1", "https://api.example.com/data2"]
    results = await fetch_all(urls)
    print(f"Got {len(results)} results")

asyncio.run(main())
```

**Local (gemma3:1b) — 6,610 ms**

The model identified these "bugs":
1. "Missing `return` statement in `fetch_all`" — **WRONG**, the code clearly has `return results`
2. "No error handling for `resp.json()`" — a valid improvement suggestion, but not a bug
3. "No `await` keyword in `main()`" — **WRONG**, `main()` is correctly `async def` and `asyncio.run()` is the entry point

The model's "fix" only added a try/except wrapper and kept all the real bugs.

**Actual bugs in the provided code:**
1. **Real Bug:** Creates a new `ClientSession` for every URL instead of reusing one session — wasteful, defeats connection pooling, and can cause resource leaks at scale
2. **Real Bug:** No `resp.raise_for_status()` — silently accepts HTTP error responses (404, 500) and tries to parse them as JSON
3. **Anti-pattern:** Sequential `await` in a loop instead of `asyncio.gather()` for concurrent fetching — no parallelism despite using async
4. **Missing:** No timeout on `session.get()` — could hang forever
5. **Missing:** No error handling at all (the model's suggestion to add try/except was directionally right but missed the real bugs)

**Big Brain (gemma3:27b) — UNAVAILABLE**

Expected: Would correctly identify the session reuse issue, the lack of `raise_for_status()`,
sequential vs. concurrent fetching, and provide a proper fix using `asyncio.gather()` with
a single shared session and timeout/deadline handling.
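A sketch of the corrected shape, as an illustration of the fixes listed above rather than a reference solution. It assumes the caller creates one `aiohttp.ClientSession` and passes it in, so `session` here only needs a `get()` that returns an async context manager:

```python
import asyncio

async def fetch_one(session, url, timeout_s: float = 10.0):
    """Fetch one URL through a shared session, with a deadline and an
    HTTP-status check before JSON parsing."""
    async def _go():
        async with session.get(url) as resp:
            resp.raise_for_status()  # don't parse 404/500 bodies as JSON
            return await resp.json()
    return await asyncio.wait_for(_go(), timeout=timeout_s)

async def fetch_all(session, urls):
    """Fetch every URL concurrently over the single shared session."""
    return await asyncio.gather(*(fetch_one(session, u) for u in urls))
```

Driven with aiohttp this would look like `async with aiohttp.ClientSession() as session: results = await fetch_all(session, urls)`, which restores connection pooling and actual concurrency.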
---

## Comparison Table

| Task | Local 1B (gemma3:1b) | Big Brain 27B (gemma3:27b) | Winner |
|------|---------------------|---------------------------|--------|
| 1. Gitea webhook parser | BROKEN — wrong HMAC, wrong encoding, wrong signature source | UNAVAILABLE (pod offline) | N/A |
| 2. Evennia architecture | SUPERFICIAL — vague, fabricated scripting language | UNAVAILABLE (pod offline) | N/A |
| 3. Fleet burn-down cron | NON-FUNCTIONAL — fake commands, unit mismatch | UNAVAILABLE (pod offline) | N/A |
| 4. Debug async bug | WRONG DIAGNOSIS — missed all real bugs | UNAVAILABLE (pod offline) | N/A |

---

## Latency Summary

| Task | Local gemma3:1b |
|------|-----------------|
| 1. Gitea webhook parser | 4,435 ms |
| 2. Evennia architecture | 1,823 ms |
| 3. Fleet burn-down cron | 11,039 ms |
| 4. Debug async bug | 6,610 ms |
| **Total** | **23,907 ms** |

Big Brain latency: N/A (pod offline)

---

## Key Finding

**The 1B model fails all four tasks in ways that would be immediately obvious to a developer.**
The failures fall into categories that large models reliably avoid:

- **Hallucinated APIs** (Task 3: `podman pginfo` and `podstop` don't exist)
- **Fundamental misunderstanding of security primitives** (Task 1: SHA-256 instead of HMAC, `utf-64` encoding)
- **Fabricated technical details** (Task 2: a "specialized scripting language" in Evennia)
- **Wrong diagnosis of provided code** (Task 4: claimed bugs that don't exist, missed the real bugs)

This benchmark demonstrates that even without Big Brain results, the quality gap between
1B and 27B models is expected to be substantial for technical and code-generation tasks.

---

## Next Steps

1. **Restart the Big Brain pod** — RunPod pod 8lfr3j47a5r3gn is returning HTTP 404
2. **Re-run the benchmark** with both models online to populate the comparison table
3. Consider testing with gemma3:4b (if available) as a middle-ground comparison
4. Run Big Brain at `temperature: 0.3` for consistency with the local results

---

*Generated by Ezra (Hermes Agent) — Issue #576 — 2026-04-14*