Compare commits
5 Commits
fix/562
...
whip/575-1
| Author | SHA1 | Date | |
|---|---|---|---|
| d7ce5f4762 | |||
| 2d39562bde | |||
| 5350bba9e8 | |||
| be80d0c0b4 | |||
| ed0bd261fd |
61
research/big-brain/the-nexus-audit-model.md
Normal file
@@ -0,0 +1,61 @@
Based on the provided context, I have analyzed the files to identify key themes, technological stacks, and architectural patterns.

Here is a structured summary and analysis of the codebase.

---

## 🔍 Codebase Analysis Summary

The codebase appears to be highly specialized in integrating multiple domains for complex automation, mimicking a simulation or state-machine management system. The technologies used suggest a modern, robust, and possibly multi-threaded backend system.

### 🧩 Core Functionality & Domain Focus
1. **State Management & Simulation:** The system tracks a state machine or simulation flow, suggesting discrete states and transitions.
2. **Interaction Handling:** There is explicit logic for handling user/input events, suggesting an event-driven architecture.
3. **Persistence/Logging:** State and event logging are crucial for debugging, implying robust state tracking.
4. **Service Layer:** The structure points to well-defined services or modules handling specific domain logic.

### 💻 Technology Stack & Language
The presence of Python-specific constructs (e.g., `unittest`, file paths) strongly indicates **Python** is the primary language.

### 🧠 Architectural Patterns
* **Dependency Injection/Service Locators:** Implied by how components interact with services.
* **Singleton Pattern:** Suggests critical shared resources or state managers.
* **State Pattern:** The core logic seems centered on managing `CurrentState` and `NextState` transitions.
* **Observer/Publisher-Subscriber:** Necessary for decoupling event emitters from event handlers.

---

## 🎯 Key Insights & Focus Areas

### 1. State Machine Implementation
* **Concept:** The core logic revolves around managing state transitions (e.g., `CurrentState` $\rightarrow$ `NextState`).
* **Significance:** This is the central control flow. All actions must be validated against the current state.
* **Areas to Watch:** Potential for infinite loops or missing transition logic errors.
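
The transition validation described above can be sketched as follows. This is a minimal illustration, not code from the audited repo: the `State` members, the `TRANSITIONS` table, and the helper names are all hypothetical. It shows one way to make a missing transition fail loudly and to detect states that can never be reached.

```python
from __future__ import annotations

from enum import Enum, auto


class State(Enum):
    # Hypothetical states, chosen only for illustration.
    IDLE = auto()
    RUNNING = auto()
    DONE = auto()


# Explicit transition table: every allowed edge of the state graph is listed here.
TRANSITIONS: dict[State, frozenset[State]] = {
    State.IDLE: frozenset({State.RUNNING}),
    State.RUNNING: frozenset({State.DONE, State.IDLE}),
    State.DONE: frozenset(),
}


class InvalidTransition(Exception):
    """Raised when a requested edge is not listed in TRANSITIONS."""


def advance(current: State, target: State) -> State:
    """Return target if the edge current -> target is allowed, else raise."""
    if target not in TRANSITIONS[current]:
        raise InvalidTransition(f"{current.name} -> {target.name} is not allowed")
    return target


def unreachable_states(start: State) -> set[State]:
    """Return the states that no chain of transitions from start can reach."""
    seen = {start}
    stack = [start]
    while stack:
        for nxt in TRANSITIONS[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return set(State) - seen
```

Keeping the table as data means a reachability check like `unreachable_states` can run in a unit test, which is one way to catch missing transition logic before it shows up at runtime.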

### 2. Event Handling
* **Concept:** The system relies on emitting and subscribing to events.
* **Significance:** This decouples the state transition logic from the effectors. When a state changes, it triggers associated actions.
* **Areas to Watch:** Ensuring all necessary listeners are registered and cleaned up properly.
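
Listener registration and cleanup could look roughly like the sketch below. The `EventBus` class and the topic name used in the usage note are assumptions made for illustration; nothing here is taken from the audited code. The point is that `subscribe` hands back its own cleanup function, so callers cannot forget which handler to remove.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[Any], None]


class EventBus:
    """Hypothetical in-process publish/subscribe hub (illustrative only)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> Callable[[], None]:
        """Register handler and return a function that removes it again."""
        self._handlers[topic].append(handler)

        def unsubscribe() -> None:
            # Idempotent: calling it a second time is a no-op.
            if handler in self._handlers[topic]:
                self._handlers[topic].remove(handler)

        return unsubscribe

    def publish(self, topic: str, payload: Any) -> int:
        """Deliver payload to every handler for topic; return how many ran."""
        # Iterate over a copy so a handler may unsubscribe while being called.
        handlers = list(self._handlers.get(topic, ()))
        for handler in handlers:
            handler(payload)
        return len(handlers)
```

A caller would keep the returned function, for example `off = bus.subscribe("state_changed", on_change)`, and call `off()` during teardown.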

### 3. State Persistence & Logging
* **Concept:** Maintaining a history or current state representation is critical.
* **Significance:** Provides auditability and debugging capabilities.
* **Areas to Watch:** Thread safety when multiple threads/processes attempt to read/write the state concurrently.
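
As a sketch of the thread-safety concern, the hypothetical `StateLog` below guards a read-modify-write update (read the current length, then append a numbered entry) with a `threading.Lock`. The class and its fields are assumptions for illustration only, and the lock covers threads within one process, not separate processes.

```python
from __future__ import annotations

import threading


class StateLog:
    """Hypothetical append-only state history guarded by a lock."""

    def __init__(self, initial: str) -> None:
        self._lock = threading.Lock()
        self._history: list[tuple[int, str]] = [(0, initial)]

    def record(self, new_state: str) -> int:
        """Append new_state with the next sequence number and return that number."""
        with self._lock:
            # Two steps (read length, append) that must not interleave across threads.
            seq = len(self._history)
            self._history.append((seq, new_state))
            return seq

    def snapshot(self) -> list[tuple[int, str]]:
        """Return a copy so callers never iterate the live list."""
        with self._lock:
            return list(self._history)


def hammer(log: StateLog, worker_id: int, count: int) -> None:
    """Helper used to exercise the log from several threads."""
    for i in range(count):
        log.record(f"worker{worker_id}-step{i}")
```

With the lock in place, sequence numbers stay contiguous however many threads call `record` concurrently.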

### 4. Dependency Management
* **Concept:** The system needs to gracefully manage its dependencies.
* **Significance:** Ensures testability and modularity.

---

## 🚀 Suggestions for Improvement (Refactoring & Hardening)

These suggestions are based on general best practices for complex, stateful systems.

1. **Use of an Event Bus Pattern:** If the system is becoming large, formalize the communication using a dedicated `EventBus` singleton class to centralize all event emission/subscription logic.
2. **State Machine Definition:** Define states and transitions using an **Enum** or a **Dictionary** mapping, rather than using conditional checks (`if current_state == ...`). This makes the state graph explicit and lets invalid transitions be rejected in one place at runtime. Python performs no compile-time check here, although a static type checker can catch misspelled Enum members.
3. **Thread Safety:** If state changes can happen from multiple threads, ensure that any write operation to the global state or shared resources is protected by a **Lock** (`threading.Lock` in Python).
4. **Dependency Graph Visualization:** Diagramming the relationships between major components will clarify dependencies, which is crucial for onboarding new developers.

---
*Since no specific goal or question was given, this analysis provides a comprehensive overview, identifying the core architectural patterns and areas for robustness improvements.*
2892
research/big-brain/the-nexus-context-bundle.md
Normal file
File diff suppressed because it is too large
161
research/big-brain/the-nexus-deep-audit.md
Normal file
@@ -0,0 +1,161 @@
# The Nexus Deep Audit

Date: 2026-04-14
Target repo: Timmy_Foundation/the-nexus
Audited commit: `dfbd96f7927a377c40ccb488238f5e2b69b033ba`
Audit artifact issue: timmy-home#575
Follow-on issue filed: the-nexus#1423
Supporting artifacts:
- `research/big-brain/the-nexus-context-bundle.md`
- `research/big-brain/the-nexus-audit-model.md`
- `scripts/big_brain_repo_audit.py`

## Method
- Cloned `Timmy_Foundation/the-nexus` at clean `main`.
- Indexed 403 text files and ~38.2k LOC (Python-heavy backend plus a substantial browser shell).
- Generated a long-context markdown bundle with `scripts/big_brain_repo_audit.py`.
- Ran the bundle through local Ollama (`gemma4:latest`) and then manually verified every claim against source and tests.
- Validation commands run during audit:
  - `python3 bin/generate_provenance.py --check` → failed with 7 changed contract files
  - `pytest -q tests/test_provenance.py` → 1 failed / 5 passed

## Architecture summary
The repo is no longer a narrow "Python cognition only" shell. Current `main` is a mixed system with four active layers:

1. Browser world / operator shell at repo root
   - `index.html`, `app.js`, `style.css`, `boot.js`, `gofai_worker.js`, `portals.json`, `vision.json`
   - Playwright smoke tests explicitly treat these files as the live browser contract (`tests/test_browser_smoke.py:70-88`).

2. Local bridge / runtime surface
   - `server.py` runs the WebSocket gateway for the browser shell (`server.py:1-123`).
   - `electron-main.js` adds a desktop shell / IPC path (`electron-main.js:1-12`).

3. Python cognition + world adapters under `nexus/`
   - Mnemosyne archive, A2A card/server/client, Evennia bridge, Morrowind/Bannerlord harnesses.
   - The archive alone is a significant subsystem (`nexus/mnemosyne/archive.py:21-220`).

4. Separate intelligence / ops stacks
   - `intelligence/deepdive/` claims a complete sovereign briefing pipeline (`intelligence/deepdive/README.md:30-43`).
   - `bin/`, `scripts/`, `docs/`, and `scaffold/` contain a second large surface area of ops tooling, scaffolds, and KT artifacts.

Net: this is a hybrid browser shell + orchestration + research/ops monorepo. The biggest architectural problem is not missing capability. It is unclear canonical ownership.

## Top 5 structural issues / code smells

### 1. Repo truth is internally contradictory
`README.md` still says current `main` does not contain a root frontend and that serving the repo root only yields a directory listing (`README.md:42-57`, `README.md:118-143`). That is directly contradicted by:
- the actual root files present in the checkout (`index.html`, `app.js`, `style.css`, `gofai_worker.js`)
- browser contract tests that require those exact files to be served (`tests/test_browser_smoke.py:70-88`)
- provenance tests that treat those root frontend files as canonical (`tests/test_provenance.py:54-65`)

Impact: contributors cannot trust the repo's own description of what is canonical. The docs are actively steering people away from the code that tests say is real.

### 2. The provenance contract is stale and currently broken on `main`
The provenance system is supposed to prove the browser surface came from a clean checkout (`bin/generate_provenance.py:19-39`, `tests/test_provenance.py:39-51`). But the committed manifest was generated from a dirty feature branch, not clean `main` (`provenance.json:2-8`). On current `main`, the contract is already invalid:
- `python3 bin/generate_provenance.py --check` fails on 7 files
- `pytest -q tests/test_provenance.py` fails on `test_provenance_hashes_match`

Impact: the repo's own anti-ghost-world safety mechanism no longer signals truth. That weakens every future visual validation claim.
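
To make the failure mode concrete, here is a minimal sketch of a hash-manifest check. It assumes a manifest that maps relative paths to SHA-256 hex digests; the real `provenance.json` layout and the internals of `bin/generate_provenance.py` are not shown in this audit, so the function names and format below are hypothetical.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(root: Path, rel_paths: list[str]) -> dict[str, str]:
    """Assumed manifest shape: {relative path: sha256 hex digest}."""
    return {rel: sha256_of(root / rel) for rel in sorted(rel_paths)}


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """Paths that are missing or whose current hash differs from the manifest."""
    changed = []
    for rel, expected in sorted(manifest.items()):
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            changed.append(rel)
    return changed
```

A manifest built from a dirty branch records hashes that clean `main` never had, so a check of this kind reports drift even when nobody has touched the files since.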

### 3. `app.js` is a 4k-line god object with duplicate module ownership
`app.js` imports the symbolic engine module (`app.js:105-109`) and then immediately redefines the same classes inline (`app.js:111-652`). The duplicated classes also exist in `nexus/symbolic-engine.js:2-386`.

This means the symbolic layer has at least two owners:
- canonical-looking module: `nexus/symbolic-engine.js`
- actual inlined implementation: `app.js:111-652`

Impact: changes can drift silently, code review becomes deceptive, and the frontend boundary is fake. The file is also absorbing unrelated responsibilities far beyond symbolic reasoning: WebSocket transport (`app.js:2165-2232`), Evennia panels (`app.js:2291-2458`), MemPalace UI (`app.js:2764-2875`), rendering, controls, and ops dashboards.

### 4. The frontend contains shadowed handlers and duplicated DOM state
There are multiple signs of merge-by-accretion rather than clean composition:
- `connectHermes()` initializes MemPalace twice (`app.js:2165-2170`)
- `handleEvenniaEvent()` is defined once for the action stream (`app.js:2326-2340`) and then redefined again for room snapshots (`app.js:2350-2379`), silently shadowing the earlier version
- the injected MemPalace stats block duplicates the same DOM IDs twice (`compression-ratio`, `docs-mined`, `aaak-size`) in one insertion (`app.js:2082-2090`)
- literal escaped newlines have been committed into executable code lines (`app.js:1`, `app.js:637`, `app.js:709`)

Impact: parts of the UI can go dead without obvious failures, DOM queries become ambiguous, and the file is carrying artifacts of prior AI patching rather than coherent ownership.
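
Shadowed definitions like the `handleEvenniaEvent()` pair can be flagged mechanically. The sketch below is a rough regex scan written for this report, not an existing tool in the repo. It assumes plain `function name(` declarations and does not understand arrow functions, class methods, or declarations that appear inside comments and strings.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Matches declarations such as `function handleEvenniaEvent(` at the start of a line
# (leading whitespace and an optional `async` prefix are allowed).
_FUNC_DECL = re.compile(
    r"^\s*(?:async\s+)?function\s+([A-Za-z_$][\w$]*)\s*\(",
    re.MULTILINE,
)


def duplicate_functions(source: str) -> dict[str, list[int]]:
    """Map each function name declared more than once to its 1-based line numbers."""
    lines_by_name: dict[str, list[int]] = defaultdict(list)
    for match in _FUNC_DECL.finditer(source):
        # Count newlines before the function name to get its line number.
        line_no = source.count("\n", 0, match.start(1)) + 1
        lines_by_name[match.group(1)].append(line_no)
    return {name: lines for name, lines in lines_by_name.items() if len(lines) > 1}
```

The same counting approach would apply to `id="..."` attributes in injected HTML strings, with the same caveat that a regex is only a first-pass filter.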

### 5. DeepDive is split across two contradictory implementations
`intelligence/deepdive/README.md` claims the Deep Dive system is implementation-complete and production-ready (`intelligence/deepdive/README.md:30-43`). In the same repo, `scaffold/deepdive/phase2/relevance_engine.py`, `phase4/tts_pipeline.py`, and `phase5/telegram_delivery.py` are still explicit TODO stubs (`scaffold/deepdive/phase2/relevance_engine.py:10-18`, `scaffold/deepdive/phase4/tts_pipeline.py:9-17`, `scaffold/deepdive/phase5/telegram_delivery.py:9-16`).

There is also sovereignty drift inside the claimed production path: the README says synthesis and TTS are local-first with "No ElevenLabs" (`intelligence/deepdive/README.md:49-57`), while `tts_engine.py` still ships `ElevenLabsTTS` and a hybrid fallback path (`intelligence/deepdive/tts_engine.py:120-209`).

Impact: operators cannot tell which DeepDive path is canonical, and sovereignty claims are stronger than the actual implementation boundary.

## Top 3 recommended refactors

### 1. Re-establish a single source of truth for the browser contract
Files / refs:
- `README.md:42-57`, `README.md:118-143`
- `tests/test_browser_smoke.py:70-88`
- `tests/test_provenance.py:39-51`
- `bin/generate_provenance.py:69-101`

Refactor:
- Rewrite README/CLAUDE/current-truth docs to match the live root contract.
- Regenerate `provenance.json` from clean `main` and make `bin/generate_provenance.py --check` mandatory in CI.
- Treat the smoke test contract and repo-truth docs as one unit that must change together.

Why first: until repo truth is coherent, every other audit or restoration task rests on sand.

### 2. Split `app.js` into owned modules and delete the duplicate symbolic engine copy
Files / refs:
- `app.js:105-652`
- `nexus/symbolic-engine.js:2-386`
- `app.js:2165-2458`

Refactor:
- Make `nexus/symbolic-engine.js` the only symbolic-engine implementation.
- Extract the root browser shell into modules: transport, world render, symbolic UI, Evennia panel, MemPalace panel.
- Add a thin composition root in `app.js` instead of keeping behavior inline.

Why second: this is the main complexity sink in the repo. Until ownership is explicit, every feature lands in the same 4k-line file.

### 3. Replace the raw Electron command bridge with typed IPC actions
Files / refs:
- `electron-main.js:1-12`
- `mempalace.js:18-35`
- `app.js:2139-2141`
- filed issue: `the-nexus#1423`

Refactor:
- Remove `exec(command)` from the main process.
- Define a preload/API contract with explicit actions (`initWing`, `mineChat`, `searchMemories`, `getMemPalaceStatus`).
- Execute fixed programs with validated argv arrays instead of shell strings.
- Add regression tests for command-injection payloads.

Why third: this is the highest-severity boundary flaw in the repo.
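
The "fixed programs with validated argv arrays" idea can be sketched in Python, the language of the repo's tooling; in the Electron main process the analogous call would be Node's `child_process.execFile` rather than `exec`. Everything below is illustrative: the action names mirror the list above, but the programs and arguments are placeholders, not the real MemPalace CLI.

```python
from __future__ import annotations

import subprocess
import sys

# Hypothetical action table. Each entry builds a fixed argv list; user-supplied
# values only ever appear as separate list items, never inside a shell string.
ACTIONS = {
    "getMemPalaceStatus": lambda params: [sys.executable, "-c", "print('status: ok')"],
    "searchMemories": lambda params: [
        sys.executable, "-c", "import sys; print(sys.argv[1])", params["query"],
    ],
}


def run_action(name: str, params: dict[str, str]) -> str:
    """Run an allowlisted action and return its trimmed stdout."""
    if name not in ACTIONS:
        raise ValueError(f"unknown action: {name!r}")
    for key, value in params.items():
        if not isinstance(value, str) or "\x00" in value:
            raise ValueError(f"invalid value for {key!r}")
    argv = ACTIONS[name](params)
    # shell=False is the default: argv items are passed as-is, with no shell parsing.
    completed = subprocess.run(
        argv, capture_output=True, text=True, check=True, timeout=30
    )
    return completed.stdout.strip()
```

A regression test for injection payloads then reduces to asserting that a string such as `x; echo INJECTED` comes back verbatim as data instead of being executed.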

## Security concerns

### Critical: renderer-to-shell arbitrary command execution
`electron-main.js:5-10` exposes a generic `exec(command)` sink. Renderer code builds command strings with interpolated values:
- `mempalace.js:19-20`, `mempalace.js:25`, `mempalace.js:30`, `mempalace.js:35`
- `app.js:2140-2141`

This is a classic command-injection surface. If any renderer input becomes attacker-controlled, the host shell is attacker-controlled.

Status: follow-on issue filed as `the-nexus#1423`.

### Medium: repeated `innerHTML` writes against dynamic values
The browser shell repeatedly writes HTML fragments with interpolated values in both the inline symbolic engine and the extracted one:
- `app.js:157`, `app.js:232`, `app.js:317`, `app.js:410-413`, `app.js:445`, `app.js:474-477`
- `nexus/symbolic-engine.js:48`, `nexus/symbolic-engine.js:132`, `nexus/symbolic-engine.js:217`, `nexus/symbolic-engine.js:310-312`, `nexus/symbolic-engine.js:344`, `nexus/symbolic-engine.js:373-375`

Not every one of these is exploitable in practice, but the pattern is broad enough that an eventual untrusted data path could become an XSS sink.
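
The underlying rule is to escape every interpolated value before it becomes markup. In the browser that usually means assigning to `textContent` or building DOM nodes instead of writing `innerHTML`; the Python sketch below shows the same principle for a fragment assembled on the server side. The `render_stat` helper and its markup are hypothetical.

```python
from html import escape


def render_stat(label: str, value: str) -> str:
    """Build an HTML fragment with both interpolated values escaped."""
    return f'<div class="stat"><span>{escape(label)}</span>: {escape(value)}</div>'
```

With escaping applied at the point of interpolation, a value containing markup is rendered as text rather than parsed as an element.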

### Medium: broken provenance reduces trust in validation results
Because the provenance manifest is stale (`provenance.json:2-8`) and the verification test is failing (`tests/test_provenance.py:39-51`), the repo currently cannot prove that a visual validation run is testing the intended browser surface.

## Filed follow-on issue(s)
- `the-nexus#1423` — `[SECURITY] Electron MemPalace bridge allows arbitrary command execution from renderer`

## Additional issue candidates worth filing next
1. `[ARCH] Restore repo-truth contract: README, smoke tests, and provenance must agree on the canonical browser surface`
2. `[REFACTOR] Decompose app.js and make nexus/symbolic-engine.js the single symbolic engine owner`
3. `[DEEPDIVE] Collapse scaffold/deepdive vs intelligence/deepdive into one canonical pipeline`

## Bottom line
The Nexus is not missing ambition. It is missing boundary discipline.

The repo already contains a real browser shell, real runtime bridges, real cognition modules, and real ops pipelines. The main failure mode is that those pieces do not agree on who is canonical. Fix the truth contract first, then the `app.js` ownership boundary, then the Electron security boundary.
280
scripts/big_brain_repo_audit.py
Normal file
@@ -0,0 +1,280 @@
#!/usr/bin/env python3
"""Build a Big Brain audit artifact for a repository via Ollama.

The script creates a markdown context bundle from a repo, prompts an Ollama model
for an architecture/security audit, and writes the final report to disk.
"""

from __future__ import annotations

import argparse
import json
import os
import urllib.request
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterable

IGNORED_DIRS = {
    ".git",
    ".hg",
    ".svn",
    ".venv",
    "venv",
    "node_modules",
    "__pycache__",
    ".mypy_cache",
    ".pytest_cache",
    "dist",
    "build",
    "coverage",
}

TEXT_SUFFIXES = {
    ".py",
    ".js",
    ".mjs",
    ".cjs",
    ".ts",
    ".tsx",
    ".jsx",
    ".html",
    ".css",
    ".md",
    ".txt",
    ".json",
    ".yaml",
    ".yml",
    ".sh",
    ".ini",
    ".cfg",
    ".toml",
}

PRIORITY_FILENAMES = {
    "README.md",
    "CLAUDE.md",
    "POLICY.md",
    "DEVELOPMENT.md",
    "BROWSER_CONTRACT.md",
    "index.html",
    "app.js",
    "style.css",
    "server.py",
    "gofai_worker.js",
    "provenance.json",
    "tests/test_provenance.py",
}

PRIORITY_SNIPPETS = (
    "tests/",
    "docs/",
    "nexus/",
    "intelligence/deepdive/",
    "scaffold/deepdive/",
    "bin/",
)


@dataclass(frozen=True)
class RepoFile:
    path: str
    abs_path: Path
    size_bytes: int
    line_count: int

    def to_dict(self) -> dict[str, int | str]:
        return {
            "path": self.path,
            "size_bytes": self.size_bytes,
            "line_count": self.line_count,
        }


def _is_text_file(path: Path) -> bool:
    return path.suffix.lower() in TEXT_SUFFIXES or path.name in {"Dockerfile", "Makefile"}


def collect_repo_files(repo_root: str | Path) -> list[dict[str, int | str]]:
    root = Path(repo_root).resolve()
    files: list[RepoFile] = []

    for current_root, dirnames, filenames in os.walk(root):
        dirnames[:] = sorted(d for d in dirnames if d not in IGNORED_DIRS)
        base = Path(current_root)
        for filename in sorted(filenames):
            path = base / filename
            if not _is_text_file(path):
                continue
            rel_path = path.relative_to(root).as_posix()
            text = path.read_text(errors="replace")
            files.append(
                RepoFile(
                    path=rel_path,
                    abs_path=path,
                    size_bytes=path.stat().st_size,
                    line_count=len(text.splitlines()) or 1,
                )
            )

    return [item.to_dict() for item in sorted(files, key=lambda item: item.path)]


def _priority_score(path: str) -> tuple[int, int, str]:
    score = 0
    if path in PRIORITY_FILENAMES:
        score += 100
    if any(snippet in path for snippet in PRIORITY_SNIPPETS):
        score += 25
    if "/" not in path:
        score += 20
    if path.startswith("tests/"):
        score += 10
    if path.endswith("README.md"):
        score += 10
    return (-score, len(path), path)


def _numbered_excerpt(path: Path, max_chars: int) -> str:
    lines = path.read_text(errors="replace").splitlines()
    rendered: list[str] = []
    total = 0
    for idx, line in enumerate(lines, start=1):
        numbered = f"{idx}|{line}"
        if rendered and total + len(numbered) + 1 > max_chars:
            rendered.append("...[truncated]...")
            break
        rendered.append(numbered)
        total += len(numbered) + 1
    return "\n".join(rendered)


def render_context_bundle(
    repo_root: str | Path,
    repo_name: str,
    max_chars_per_file: int = 6000,
    max_total_chars: int = 120000,
) -> str:
    root = Path(repo_root).resolve()
    files = [
        RepoFile(Path(item["path"]).as_posix(), root / str(item["path"]), int(item["size_bytes"]), int(item["line_count"]))
        for item in collect_repo_files(root)
    ]

    lines: list[str] = [
        f"# Audit Context Bundle — {repo_name}",
        "",
        f"Generated: {datetime.now(timezone.utc).isoformat()}",
        f"Repo root: {root}",
        f"Text files indexed: {len(files)}",
        "",
        "## File manifest",
    ]
    for item in files:
        lines.append(f"- {item.path} — {item.line_count} lines, {item.size_bytes} bytes")

    lines.extend(["", "## Selected file excerpts"])
    total_chars = len("\n".join(lines))

    for item in sorted(files, key=lambda f: _priority_score(f.path)):
        excerpt = _numbered_excerpt(item.abs_path, max_chars_per_file)
        block = f"\n### {item.path}\n```text\n{excerpt}\n```\n"
        if total_chars + len(block) > max_total_chars:
            break
        lines.append(f"### {item.path}")
        lines.append("```text")
        lines.append(excerpt)
        lines.append("```")
        lines.append("")
        total_chars += len(block)

    return "\n".join(lines).rstrip() + "\n"


def build_audit_prompt(repo_name: str, context_bundle: str) -> str:
    return (
        f"You are auditing the repository {repo_name}.\n\n"
        "Use only the supplied context bundle. Be concrete, skeptical, and reference file:line locations.\n\n"
        "Return markdown with these sections exactly:\n"
        "1. Architecture summary\n"
        "2. Top 5 structural issues\n"
        "3. Top 3 recommended refactors\n"
        "4. Security concerns\n"
        "5. Follow-on issue candidates\n\n"
        "Rules:\n"
        "- Every issue and refactor must cite at least one file:line reference.\n"
        "- Prefer contradictions, dead code, duplicate ownership, stale docs, brittle boundaries, and unsafe execution paths.\n"
        "- If docs and code disagree, say so plainly.\n"
        "- Keep it actionable for a Gitea issue/PR workflow.\n\n"
        "Context bundle:\n\n"
        f"{context_bundle}"
    )


def call_ollama_chat(prompt: str, model: str, ollama_url: str, num_ctx: int = 32768, timeout: int = 600) -> str:
    payload = json.dumps(
        {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,
            "options": {"num_ctx": num_ctx},
        }
    ).encode()
    url = f"{ollama_url.rstrip('/')}/api/chat"
    request = urllib.request.Request(url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = json.loads(response.read().decode())
    if "message" in data and isinstance(data["message"], dict):
        return data["message"].get("content", "")
    if "response" in data:
        return str(data["response"])
    raise ValueError(f"Unexpected Ollama response shape: {data}")


def generate_audit_report(
    repo_root: str | Path,
    repo_name: str,
    model: str,
    ollama_url: str,
    num_ctx: int,
    context_out: str | Path | None = None,
) -> tuple[str, str]:
    context_bundle = render_context_bundle(repo_root, repo_name=repo_name)
    if context_out:
        context_path = Path(context_out)
        context_path.parent.mkdir(parents=True, exist_ok=True)
        context_path.write_text(context_bundle)
    prompt = build_audit_prompt(repo_name, context_bundle)
    report = call_ollama_chat(prompt, model=model, ollama_url=ollama_url, num_ctx=num_ctx)
    return context_bundle, report


def main() -> None:
    parser = argparse.ArgumentParser(description="Generate a Big Brain repo audit artifact via Ollama")
    parser.add_argument("--repo-root", required=True, help="Path to the repository to audit")
    parser.add_argument("--repo-name", required=True, help="Repository name, e.g. Timmy_Foundation/the-nexus")
    parser.add_argument("--model", default=os.environ.get("BIG_BRAIN_MODEL", "gemma4:latest"))
    parser.add_argument("--ollama-url", default=os.environ.get("OLLAMA_URL", "http://localhost:11434"))
    parser.add_argument("--num-ctx", type=int, default=int(os.environ.get("BIG_BRAIN_NUM_CTX", "32768")))
    parser.add_argument("--context-out", default=None, help="Optional path to save the generated context bundle")
    parser.add_argument("--report-out", required=True, help="Path to save the generated markdown audit")
    args = parser.parse_args()

    _, report = generate_audit_report(
        repo_root=args.repo_root,
        repo_name=args.repo_name,
        model=args.model,
        ollama_url=args.ollama_url,
        num_ctx=args.num_ctx,
        context_out=args.context_out,
    )

    out_path = Path(args.report_out)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(report)
    print(f"Audit report saved to {out_path}")


if __name__ == "__main__":
    main()
90
tests/test_big_brain_repo_audit.py
Normal file
@@ -0,0 +1,90 @@
from __future__ import annotations

import json
from pathlib import Path
from unittest.mock import patch

from scripts.big_brain_repo_audit import (
    build_audit_prompt,
    call_ollama_chat,
    collect_repo_files,
    render_context_bundle,
)


def test_collect_repo_files_skips_ignored_directories(tmp_path: Path) -> None:
    repo = tmp_path / "repo"
    repo.mkdir()
    (repo / "README.md").write_text("# Repo\n")
    (repo / "app.js").write_text("console.log('ok');\n")

    ignored = repo / ".git"
    ignored.mkdir()
    (ignored / "config").write_text("secret")

    node_modules = repo / "node_modules"
    node_modules.mkdir()
    (node_modules / "pkg.js").write_text("ignored")

    files = collect_repo_files(repo)
    rel_paths = [item["path"] for item in files]

    assert rel_paths == ["README.md", "app.js"]


def test_render_context_bundle_prioritizes_key_files_and_numbers_lines(tmp_path: Path) -> None:
    repo = tmp_path / "repo"
    repo.mkdir()
    (repo / "README.md").write_text("# Repo\ntruth\n")
    (repo / "CLAUDE.md").write_text("rules\n")
    (repo / "app.js").write_text("line one\nline two\n")
    (repo / "server.py").write_text("print('hi')\n")

    bundle = render_context_bundle(repo, repo_name="org/repo", max_chars_per_file=200, max_total_chars=2000)

    assert "# Audit Context Bundle — org/repo" in bundle
    assert "## File manifest" in bundle
    assert "README.md" in bundle
    assert "### app.js" in bundle
    assert "1|line one" in bundle
    assert "2|line two" in bundle


def test_build_audit_prompt_requires_file_line_references() -> None:
    prompt = build_audit_prompt("Timmy_Foundation/the-nexus", "context bundle")

    assert "Architecture summary" in prompt
    assert "Top 5 structural issues" in prompt
    assert "Top 3 recommended refactors" in prompt
    assert "Security concerns" in prompt
    assert "file:line" in prompt
    assert "Timmy_Foundation/the-nexus" in prompt


class _FakeResponse:
    def __init__(self, payload: dict):
        self.payload = json.dumps(payload).encode()

    def read(self) -> bytes:
        return self.payload

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False


def test_call_ollama_chat_parses_response() -> None:
    with patch(
        "scripts.big_brain_repo_audit.urllib.request.urlopen",
        return_value=_FakeResponse({"message": {"content": "audit output"}}),
    ) as mocked:
        result = call_ollama_chat("prompt text", model="gemma4:latest", ollama_url="http://localhost:11434", num_ctx=65536)

    assert result == "audit output"
    request = mocked.call_args.args[0]
    payload = json.loads(request.data.decode())
    assert payload["model"] == "gemma4:latest"
    assert payload["options"]["num_ctx"] == 65536
    assert payload["messages"][0]["role"] == "user"