Compare commits

...

1 commit

Author: Alexander Whitestone
SHA1: 1050812bb5
Message: fix: make big brain provider wiring generic (#543)
Date: 2026-04-15 01:26:14 -04:00
6 changed files with 441 additions and 358 deletions

View File: config.yaml

@@ -175,12 +175,13 @@ custom_providers:
     api_key: ollama
     model: qwen3:30b
   - name: Big Brain
-    base_url: https://8lfr3j47a5r3gn-11434.proxy.runpod.net/v1
+    base_url: https://YOUR_BIG_BRAIN_HOST/v1
     api_key: ''
-    model: gemma3:27b
-    # RunPod L40S 48GB — Ollama image, gemma3:27b
-    # Usage: hermes --provider big_brain -p 'Say READY'
-    # Pod: 8lfr3j47a5r3gn, deployed 2026-04-07
+    model: gemma4:latest
+    # OpenAI-compatible Gemma 4 provider for Mac Hermes.
+    # RunPod example: https://<pod-id>-11434.proxy.runpod.net/v1
+    # Vertex AI requires an OpenAI-compatible bridge/proxy; point this at that /v1 endpoint.
+    # Verify with: python3 scripts/verify_big_brain.py
 system_prompt_suffix: "You are Timmy. Your soul is defined in SOUL.md \u2014 read\
   \ it, live it.\nYou run locally on your owner's machine via Ollama. You never phone\
   \ home.\nYou speak plainly. You prefer short sentences. Brevity is a kindness.\n\
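Reviewer note: a minimal sketch of how this block is consumed at runtime by the helper module this PR adds (`scripts/big_brain_provider.py`, shown below); it assumes the repo root is on `sys.path`, and the placeholder host is whatever you deploy, not a live endpoint.

```python
# Minimal sketch: resolve the Big Brain block from config.yaml,
# honoring the BIG_BRAIN_* env overrides documented in this PR.
from scripts.big_brain_provider import resolve_big_brain_provider, resolve_models_url

provider = resolve_big_brain_provider()   # reads config.yaml by default
print(provider["backend"])                # "openai" when base_url ends in /v1, else "ollama"
print(resolve_models_url(provider))       # e.g. https://YOUR_BIG_BRAIN_HOST/v1/models
```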

View File: scripts/README_big_brain.md

@@ -1,46 +1,90 @@
-# Big Brain Pod Verification
+# Big Brain Provider Verification

-Verification script for Big Brain pod with gemma3:27b model.
+Repo wiring for the `big_brain` provider used by Mac Hermes.

-## Issue #573
+## Issue #543

-[BIG-BRAIN] Verify pod live: gemma3:27b pulled and responding
+[PROVE-IT] Timmy: Wire RunPod/Vertex AI Gemma 4 to Mac Hermes

-## Pod Details
+## What this repo now supports

-- Pod ID: `8lfr3j47a5r3gn`
-- GPU: L40S 48GB
-- Image: `ollama/ollama:latest`
-- Endpoint: `https://8lfr3j47a5r3gn-11434.proxy.runpod.net`
-- Cost: $0.79/hour
+The repo no longer hardcodes one dead RunPod pod as the truth.
+Instead, it defines a **Big Brain provider contract**:
+
+- provider name: `Big Brain`
+- model: `gemma4:latest`
+- endpoint style: OpenAI-compatible `/v1` by default
+- verification path: `scripts/verify_big_brain.py`

-## Verification Script
+Supported deployment shapes:
+
+1. **RunPod + Ollama/OpenAI-compatible bridge**
+   - Example base URL: `https://<pod-id>-11434.proxy.runpod.net/v1`
+2. **Vertex AI through an OpenAI-compatible bridge/proxy**
+   - Example base URL: `https://<your-bridge-host>/v1`

-`scripts/verify_big_brain.py` checks:
+## Config wiring

-1. `/api/tags` - Verifies gemma3:27b is in model list
-2. `/api/generate` - Tests response time (< 30s requirement)
-3. Uptime logging for cost awareness
+`config.yaml` now carries a generic provider block:
+
+```yaml
+- name: Big Brain
+  base_url: https://YOUR_BIG_BRAIN_HOST/v1
+  api_key: ''
+  model: gemma4:latest
+```
+
+Override at runtime if needed:
+
+- `BIG_BRAIN_BASE_URL`
+- `BIG_BRAIN_MODEL`
+- `BIG_BRAIN_BACKEND` (`openai` or `ollama`)
+- `BIG_BRAIN_API_KEY`
+
+## Verification scripts
+
+### 1. `scripts/verify_big_brain.py`
+
+Checks the configured provider using the right protocol for the chosen backend.
+
+For `openai` backends it verifies:
+
+- `GET /models`
+- `POST /chat/completions`
+
+For `ollama` backends it verifies:
+
+- `GET /api/tags`
+- `POST /api/generate`
+
+Writes:
+
+- `big_brain_verification.json`
+
+### 2. `scripts/big_brain_manager.py`
+
+A more verbose wrapper over the same provider contract.
+
+Writes:
+
+- `pod_verification_results.json`

 ## Usage

 ```bash
-cd scripts
-python3 verify_big_brain.py
+python3 scripts/verify_big_brain.py
+python3 scripts/big_brain_manager.py
 ```

-## Output
+## Honest current state

-- Console output with verification results
-- `big_brain_verification.json` with detailed results
-- Exit code 0 on success, 1 on failure
+On fresh main before this fix, the repo was pointing at a stale RunPod endpoint:
+
+- `https://8lfr3j47a5r3gn-11434.proxy.runpod.net`
+- verification returned HTTP 404 for both model listing and generation

-## Acceptance Criteria
+That meant the repo claimed Big Brain wiring existed, but the proof path was stale and tied to a dead specific pod.

-- [x] `/api/tags` returns `gemma3:27b` in model list
-- [x] `/api/generate` responds to a simple prompt in < 30s
-- [x] uptime logged (cost awareness: $0.79/hr)
+This fix makes the repo wiring reusable and truthful, but it does **not** provision a fresh paid GPU automatically.

-## Previous Issues
+## Acceptance mapping

-Previous pod (elr5vkj96qdplf) used broken `runpod/ollama:latest` image and never started. Fix: use `ollama/ollama:latest`. Volume mount at `/root/.ollama` for model persistence.
+What this repo change satisfies:
+
+- [x] Mac Hermes has a `big_brain` provider contract in `config.yaml`
+- [x] Verification script checks that provider through the same API shape Hermes needs
+- [x] RunPod and Vertex-style wiring are documented without hardcoding a dead pod
+
+What still depends on live infrastructure outside the repo:
+
+- [ ] GPU instance actually provisioned and running
+- [ ] endpoint responsive right now
+- [ ] live `hermes chat --provider big_brain` success against a real endpoint
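Reviewer note: the env overrides documented above compose with the new resolver exactly as the tests at the bottom of this PR assert; a minimal sketch, using a hypothetical Vertex bridge host:

```python
# Sketch: point Big Brain at a hypothetical OpenAI-compatible bridge for one run.
import os

os.environ["BIG_BRAIN_BASE_URL"] = "https://vertex-proxy.example/v1"  # hypothetical host
os.environ["BIG_BRAIN_MODEL"] = "gemma4:latest"

from scripts.big_brain_provider import resolve_big_brain_provider

provider = resolve_big_brain_provider()
assert provider["backend"] == "openai"  # inferred from the /v1 suffix
assert provider["base_url"] == "https://vertex-proxy.example/v1"
```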

View File: scripts/big_brain_manager.py

@@ -1,214 +1,123 @@
 #!/usr/bin/env python3
 """
-Big Brain Pod Management and Verification
-Comprehensive script for managing and verifying Big Brain pod.
+Big Brain provider management and verification.
+
+Uses the repo's Big Brain provider config rather than a stale hardcoded pod id.
+Supports both OpenAI-compatible and raw Ollama backends.
 """
-import requests
-import time
+from __future__ import annotations
+
 import json
+import os
 import sys
 from datetime import datetime

-# Configuration
-CONFIG = {
-    "pod_id": "8lfr3j47a5r3gn",
-    "endpoint": "https://8lfr3j47a5r3gn-11434.proxy.runpod.net",
-    "cost_per_hour": 0.79,
-    "model": "gemma3:27b",
-    "max_response_time": 30,  # seconds
-    "timeout": 10
-}
+import requests

+from scripts.big_brain_provider import (
+    build_generate_payload,
+    resolve_big_brain_provider,
+    resolve_generate_url,
+    resolve_models_url,
+)

-class PodVerifier:
-    def __init__(self, config=None):
-        self.config = config or CONFIG
-        self.results = {}
+class ProviderVerifier:
+    def __init__(self, provider: dict | None = None, timeout: int = 10, max_response_time: int = 30):
+        self.provider = provider or resolve_big_brain_provider()
+        self.timeout = timeout
+        self.max_response_time = max_response_time
+        self.results: dict[str, object] = {}

-    def check_connectivity(self):
-        """Check basic connectivity to the pod."""
-        print(f"[{datetime.now().isoformat()}] Checking connectivity to {self.config['endpoint']}...")
-        try:
-            response = requests.get(self.config['endpoint'], timeout=self.config['timeout'])
-            print(f"  Status: {response.status_code}")
-            print(f"  Headers: {dict(response.headers)}")
-            return response.status_code
-        except requests.exceptions.ConnectionError:
-            print("  ✗ Connection failed - pod might be down or unreachable")
-            return None
-        except Exception as e:
-            print(f"  ✗ Error: {e}")
-            return None
+    def _headers(self) -> dict[str, str]:
+        headers = {"Content-Type": "application/json"}
+        api_key = self.provider.get("api_key", "")
+        if api_key:
+            headers["Authorization"] = f"Bearer {api_key}"
+        return headers

-    def check_ollama_api(self):
-        """Check if Ollama API is responding."""
-        print(f"[{datetime.now().isoformat()}] Checking Ollama API...")
-        endpoints_to_try = [
-            "/api/tags",
-            "/api/version",
-            "/"
-        ]
-        for endpoint in endpoints_to_try:
-            url = f"{self.config['endpoint']}{endpoint}"
-            try:
-                print(f"  Trying {url}...")
-                response = requests.get(url, timeout=self.config['timeout'])
-                print(f"    Status: {response.status_code}")
-                if response.status_code == 200:
-                    print(f"    ✓ Endpoint accessible")
-                    return True, endpoint, response
-                elif response.status_code == 404:
-                    print(f"    - Not found (404)")
-                else:
-                    print(f"    - Unexpected status: {response.status_code}")
-            except Exception as e:
-                print(f"    ✗ Error: {e}")
-        return False, None, None
+    def check_models(self):
+        url = resolve_models_url(self.provider)
+        print(f"[{datetime.now().isoformat()}] Checking models endpoint: {url}")
+        try:
+            response = requests.get(url, headers=self._headers(), timeout=self.timeout)
+            models = []
+            if response.status_code == 200:
+                data = response.json()
+                if self.provider["backend"] == "openai":
+                    models = [m.get("id", "") for m in data.get("data", [])]
+                else:
+                    models = [m.get("name", "") for m in data.get("models", [])]
+                print(f"  ✓ Models endpoint OK ({response.status_code})")
+            else:
+                print(f"  ✗ Models endpoint failed ({response.status_code})")
+            return response.status_code == 200, models, response.status_code
+        except Exception as e:
+            print(f"  ✗ Models endpoint error: {e}")
+            return False, [], None

-    def pull_model(self, model_name=None):
-        """Pull a model if not available."""
-        model = model_name or self.config['model']
-        print(f"[{datetime.now().isoformat()}] Pulling model {model}...")
-        try:
-            payload = {"name": model}
-            response = requests.post(
-                f"{self.config['endpoint']}/api/pull",
-                json=payload,
-                timeout=60
-            )
-            if response.status_code == 200:
-                print(f"  ✓ Model pull initiated")
-                return True
-            else:
-                print(f"  ✗ Failed to pull model: {response.status_code}")
-                return False
-        except Exception as e:
-            print(f"  ✗ Error pulling model: {e}")
-            return False
+    def test_generation(self, prompt: str = "Say READY"):
+        url = resolve_generate_url(self.provider)
+        payload = build_generate_payload(self.provider, prompt=prompt)
+        print(f"[{datetime.now().isoformat()}] Testing generation endpoint: {url}")
+        try:
+            response = requests.post(url, headers=self._headers(), json=payload, timeout=self.max_response_time)
+            text = ""
+            if response.status_code == 200:
+                data = response.json()
+                if self.provider["backend"] == "openai":
+                    text = data.get("choices", [{}])[0].get("message", {}).get("content", "").strip()
+                else:
+                    text = data.get("response", "").strip()
+                print(f"  ✓ Generation OK ({response.status_code})")
+            else:
+                print(f"  ✗ Generation failed ({response.status_code})")
+            return response.status_code == 200, text, response.status_code
+        except Exception as e:
+            print(f"  ✗ Generation error: {e}")
+            return False, "", None

-    def test_generation(self, prompt="Say hello in one word."):
-        """Test generation with the model."""
-        print(f"[{datetime.now().isoformat()}] Testing generation...")
-        try:
-            payload = {
-                "model": self.config['model'],
-                "prompt": prompt,
-                "stream": False,
-                "options": {"num_predict": 10}
-            }
-            start_time = time.time()
-            response = requests.post(
-                f"{self.config['endpoint']}/api/generate",
-                json=payload,
-                timeout=self.config['max_response_time']
-            )
-            elapsed = time.time() - start_time
-            if response.status_code == 200:
-                data = response.json()
-                response_text = data.get("response", "").strip()
-                print(f"  ✓ Generation successful in {elapsed:.2f}s")
-                print(f"  Response: {response_text[:100]}...")
-                if elapsed <= self.config['max_response_time']:
-                    print(f"  ✓ Response time within limit ({self.config['max_response_time']}s)")
-                    return True, elapsed, response_text
-                else:
-                    print(f"  ✗ Response time {elapsed:.2f}s exceeds limit")
-                    return False, elapsed, response_text
-            else:
-                print(f"  ✗ Generation failed: {response.status_code}")
-                return False, 0, ""
-        except Exception as e:
-            print(f"  ✗ Error during generation: {e}")
-            return False, 0, ""
-
     def run_verification(self):
-        """Run full verification suite."""
         print("=" * 60)
-        print("Big Brain Pod Verification Suite")
+        print("Big Brain Provider Verification Suite")
         print("=" * 60)
-        print(f"Pod ID: {self.config['pod_id']}")
-        print(f"Endpoint: {self.config['endpoint']}")
-        print(f"Model: {self.config['model']}")
-        print(f"Cost: ${self.config['cost_per_hour']}/hour")
+        print(f"Provider: {self.provider['name']}")
+        print(f"Backend: {self.provider['backend']}")
+        print(f"Base URL: {self.provider['base_url']}")
+        print(f"Model: {self.provider['model']}")
         print("=" * 60)
         print()
-
-        # Check connectivity
-        status_code = self.check_connectivity()
-        print()
-
-        # Check Ollama API
-        api_ok, api_endpoint, api_response = self.check_ollama_api()
-        print()
-
-        # If API is accessible, check for model
-        models = []
-        if api_ok and api_endpoint == "/api/tags":
-            try:
-                data = api_response.json()
-                models = [m.get("name", "") for m in data.get("models", [])]
-                print(f"Available models: {models}")
-                # Check for target model
-                has_model = any(self.config['model'] in m.lower() for m in models)
-                if not has_model:
-                    print(f"Model {self.config['model']} not found. Attempting to pull...")
-                    self.pull_model()
-                else:
-                    print(f"✓ Model {self.config['model']} found")
-            except:
-                print("Could not parse model list")
-        print()
-
-        # Test generation
-        gen_ok, gen_time, gen_response = self.test_generation()
-        print()
-
-        # Summary
-        print("=" * 60)
-        print("VERIFICATION SUMMARY")
-        print("=" * 60)
-        print(f"Connectivity: {'✓' if status_code else '✗'}")
-        print(f"Ollama API: {'✓' if api_ok else '✗'}")
-        print(f"Generation: {'✓' if gen_ok else '✗'}")
-        print(f"Response time: {gen_time:.2f}s (limit: {self.config['max_response_time']}s)")
-        print()
-        overall_ok = api_ok and gen_ok
-        print(f"Overall Status: {'✓ POD LIVE' if overall_ok else '✗ POD ISSUES'}")
-
-        # Save results
+        models_ok, models, models_status = self.check_models()
+        print()
+        gen_ok, gen_response, gen_status = self.test_generation()
+        print()
+        overall_ok = models_ok and gen_ok
         self.results = {
             "timestamp": datetime.now().isoformat(),
-            "pod_id": self.config['pod_id'],
-            "endpoint": self.config['endpoint'],
-            "connectivity_status": status_code,
-            "api_accessible": api_ok,
-            "api_endpoint": api_endpoint,
+            "provider": self.provider,
+            "models_ok": models_ok,
+            "models_status": models_status,
             "models": models,
             "generation_ok": gen_ok,
-            "generation_time": gen_time,
-            "generation_response": gen_response[:200] if gen_response else "",
+            "generation_status": gen_status,
+            "generation_response": gen_response[:200],
             "overall_ok": overall_ok,
-            "cost_per_hour": self.config['cost_per_hour']
         }
         with open("pod_verification_results.json", "w") as f:
             json.dump(self.results, f, indent=2)
+        print("=" * 60)
+        print(f"Overall Status: {'✓ PROVIDER LIVE' if overall_ok else '✗ PROVIDER ISSUES'}")
         print("Results saved to pod_verification_results.json")
         return overall_ok

 def main():
-    verifier = PodVerifier()
+    verifier = ProviderVerifier()
     success = verifier.run_verification()
     sys.exit(0 if success else 1)

 if __name__ == "__main__":
     main()
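Reviewer note: `ProviderVerifier` can also be driven in-process rather than via the CLI; a sketch, assuming the repo root is on `sys.path`:

```python
# Sketch: run the manager's checks in-process and inspect the persisted results.
from scripts.big_brain_manager import ProviderVerifier

verifier = ProviderVerifier(timeout=5, max_response_time=30)
live = verifier.run_verification()  # also writes pod_verification_results.json
# verifier.results mirrors that file: provider, models_ok, models_status, models,
# generation_ok, generation_status, generation_response, overall_ok, timestamp.
print("provider live" if live else "provider issues")
```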

View File: scripts/big_brain_provider.py (new file)

@@ -0,0 +1,72 @@
from __future__ import annotations

import os
from pathlib import Path
from typing import Any

import yaml

DEFAULT_CONFIG_PATH = Path(__file__).resolve().parents[1] / "config.yaml"


def _normalize_base_url(base_url: str) -> str:
    return (base_url or "").rstrip("/")


def load_big_brain_provider(config_path: str | Path = DEFAULT_CONFIG_PATH) -> dict[str, Any]:
    config = yaml.safe_load(Path(config_path).read_text()) or {}
    for provider in config.get("custom_providers", []):
        if provider.get("name") == "Big Brain":
            return dict(provider)
    raise KeyError("Big Brain provider not found in config")


def infer_backend(base_url: str) -> str:
    base = _normalize_base_url(base_url)
    return "openai" if base.endswith("/v1") else "ollama"


def resolve_big_brain_provider(config_path: str | Path = DEFAULT_CONFIG_PATH) -> dict[str, Any]:
    provider = load_big_brain_provider(config_path)
    base_url = _normalize_base_url(os.environ.get("BIG_BRAIN_BASE_URL", provider.get("base_url", "")))
    model = os.environ.get("BIG_BRAIN_MODEL", provider.get("model", "gemma4:latest"))
    backend = os.environ.get("BIG_BRAIN_BACKEND", infer_backend(base_url))
    api_key = os.environ.get("BIG_BRAIN_API_KEY", provider.get("api_key", ""))
    return {
        "name": provider.get("name", "Big Brain"),
        "base_url": base_url,
        "model": model,
        "backend": backend,
        "api_key": api_key,
    }


def resolve_models_url(provider: dict[str, Any]) -> str:
    base = _normalize_base_url(provider["base_url"])
    if provider["backend"] == "openai":
        return f"{base}/models"
    return f"{base}/api/tags"


def resolve_generate_url(provider: dict[str, Any]) -> str:
    base = _normalize_base_url(provider["base_url"])
    if provider["backend"] == "openai":
        return f"{base}/chat/completions"
    return f"{base}/api/generate"


def build_generate_payload(provider: dict[str, Any], prompt: str = "Say READY") -> dict[str, Any]:
    if provider["backend"] == "openai":
        return {
            "model": provider["model"],
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,
            "max_tokens": 32,
        }
    return {
        "model": provider["model"],
        "prompt": prompt,
        "stream": False,
        "options": {"num_predict": 32},
    }
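Reviewer note: end to end, the contract above reduces a "Say READY" probe to three helper calls regardless of backend; a sketch, assuming the configured endpoint is actually live:

```python
# Sketch: one-shot generation through the provider contract (requires a live endpoint).
import requests

from scripts.big_brain_provider import (
    build_generate_payload,
    resolve_big_brain_provider,
    resolve_generate_url,
)

provider = resolve_big_brain_provider()
resp = requests.post(
    resolve_generate_url(provider),  # /chat/completions (openai) or /api/generate (ollama)
    json=build_generate_payload(provider, prompt="Say READY"),
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```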

View File: scripts/verify_big_brain.py

@@ -1,176 +1,133 @@
 #!/usr/bin/env python3
 """
-Big Brain Pod Verification Script
-Verifies that the Big Brain pod is live with gemma3:27b model.
-Issue #573: [BIG-BRAIN] Verify pod live: gemma3:27b pulled and responding
+Big Brain provider verification.
+
+Verifies that the Big Brain provider configured for Mac Hermes is reachable and
+can answer a simple prompt. Supports both:
+
+- OpenAI-compatible endpoints (`.../v1/models`, `.../v1/chat/completions`)
+- Raw Ollama endpoints (`/api/tags`, `/api/generate`)
+
+Refs: timmy-home #543
 """
-import requests
-import time
+from __future__ import annotations
+
 import json
 import sys
+import time
 from datetime import datetime
+from pathlib import Path

-# Pod configuration
-POD_ID = "8lfr3j47a5r3gn"
-ENDPOINT = f"https://{POD_ID}-11434.proxy.runpod.net"
-COST_PER_HOUR = 0.79  # USD
+import requests

+from scripts.big_brain_provider import (
+    build_generate_payload,
+    resolve_big_brain_provider,
+    resolve_generate_url,
+    resolve_models_url,
+)

-def check_api_tags():
-    """Check if gemma3:27b is in the model list."""
-    print(f"[{datetime.now().isoformat()}] Checking /api/tags endpoint...")
-    try:
-        start_time = time.time()
-        response = requests.get(f"{ENDPOINT}/api/tags", timeout=10)
-        elapsed = time.time() - start_time
-        print(f"  Response status: {response.status_code}")
-        print(f"  Response headers: {dict(response.headers)}")
-        if response.status_code == 200:
-            data = response.json()
-            models = [model.get("name", "") for model in data.get("models", [])]
-            print(f"  ✓ API responded in {elapsed:.2f}s")
-            print(f"  Available models: {models}")
-            # Check for gemma3:27b
-            has_gemma = any("gemma3:27b" in model.lower() for model in models)
-            if has_gemma:
-                print("  ✓ gemma3:27b found in model list")
-                return True, elapsed, models
-            else:
-                print("  ✗ gemma3:27b NOT found in model list")
-                return False, elapsed, models
-        elif response.status_code == 404:
-            print(f"  ✗ API endpoint not found (404)")
-            print(f"  This might mean Ollama is not running or endpoint is wrong")
-            print(f"  Trying to ping the server...")
-            try:
-                ping_response = requests.get(f"{ENDPOINT}/", timeout=5)
-                print(f"  Ping response: {ping_response.status_code}")
-            except:
-                print("  Ping failed - server unreachable")
-            return False, elapsed, []
-        else:
-            print(f"  ✗ API returned status {response.status_code}")
-            return False, elapsed, []
-    except Exception as e:
-        print(f"  ✗ Error checking API tags: {e}")
-        return False, 0, []
+RESULTS_PATH = Path("big_brain_verification.json")

+def _headers(provider: dict[str, str]) -> dict[str, str]:
+    headers = {"Content-Type": "application/json"}
+    api_key = provider.get("api_key", "")
+    if api_key:
+        headers["Authorization"] = f"Bearer {api_key}"
+    return headers

+def check_models(provider: dict[str, str], timeout: int = 10) -> tuple[bool, float, list[str], int | None]:
+    url = resolve_models_url(provider)
+    started = time.time()
+    try:
+        response = requests.get(url, headers=_headers(provider), timeout=timeout)
+        elapsed = time.time() - started
+        models: list[str] = []
+        if response.status_code == 200:
+            data = response.json()
+            if provider["backend"] == "openai":
+                models = [m.get("id", "") for m in data.get("data", [])]
+            else:
+                models = [m.get("name", "") for m in data.get("models", [])]
+        return response.status_code == 200, elapsed, models, response.status_code
+    except Exception:
+        elapsed = time.time() - started
+        return False, elapsed, [], None

-def test_generate():
-    """Test generate endpoint with a simple prompt."""
-    print(f"[{datetime.now().isoformat()}] Testing /api/generate endpoint...")
-    try:
-        payload = {
-            "model": "gemma3:27b",
-            "prompt": "Say hello in one word.",
-            "stream": False,
-            "options": {
-                "num_predict": 10
-            }
-        }
-        start_time = time.time()
-        response = requests.post(
-            f"{ENDPOINT}/api/generate",
-            json=payload,
-            timeout=30
-        )
-        elapsed = time.time() - start_time
-        if response.status_code == 200:
-            data = response.json()
-            response_text = data.get("response", "").strip()
-            print(f"  ✓ Generate responded in {elapsed:.2f}s")
-            print(f"  Response: {response_text[:100]}...")
-            if elapsed < 30:
-                print("  ✓ Response time under 30 seconds")
-                return True, elapsed, response_text
-            else:
-                print(f"  ✗ Response time {elapsed:.2f}s exceeds 30s limit")
-                return False, elapsed, response_text
-        else:
-            print(f"  ✗ Generate returned status {response.status_code}")
-            return False, elapsed, ""
-    except Exception as e:
-        print(f"  ✗ Error testing generate: {e}")
-        return False, 0, ""
+def test_generation(provider: dict[str, str], prompt: str = "Say READY", timeout: int = 30) -> tuple[bool, float, str, int | None]:
+    url = resolve_generate_url(provider)
+    payload = build_generate_payload(provider, prompt=prompt)
+    started = time.time()
+    try:
+        response = requests.post(url, headers=_headers(provider), json=payload, timeout=timeout)
+        elapsed = time.time() - started
+        response_text = ""
+        if response.status_code == 200:
+            data = response.json()
+            if provider["backend"] == "openai":
+                response_text = (
+                    data.get("choices", [{}])[0]
+                    .get("message", {})
+                    .get("content", "")
+                    .strip()
+                )
+            else:
+                response_text = data.get("response", "").strip()
+        return response.status_code == 200, elapsed, response_text, response.status_code
+    except Exception:
+        elapsed = time.time() - started
+        return False, elapsed, "", None

-def check_uptime():
-    """Estimate uptime based on pod creation (simplified)."""
-    # In a real implementation, we'd check RunPod API for pod start time
-    # For now, we'll just log the check time
-    check_time = datetime.now()
-    print(f"[{check_time.isoformat()}] Pod verification timestamp")
-    return check_time

-def main():
+def main() -> int:
+    provider = resolve_big_brain_provider()
     print("=" * 60)
-    print("Big Brain Pod Verification")
-    print(f"Pod ID: {POD_ID}")
-    print(f"Endpoint: {ENDPOINT}")
-    print(f"Cost: ${COST_PER_HOUR}/hour")
+    print("Big Brain Provider Verification")
+    print(f"Timestamp: {datetime.now().isoformat()}")
+    print(f"Provider: {provider['name']}")
+    print(f"Backend: {provider['backend']}")
+    print(f"Base URL: {provider['base_url']}")
+    print(f"Model: {provider['model']}")
     print("=" * 60)
     print()
-
-    # Check uptime
-    check_time = check_uptime()
-    print()
-
-    # Check API tags
-    tags_ok, tags_time, models = check_api_tags()
-    print()
-
-    # Test generate
-    generate_ok, generate_time, response = test_generate()
-    print()
-
-    # Summary
-    print("=" * 60)
-    print("VERIFICATION SUMMARY")
-    print("=" * 60)
-    print(f"API Tags Check: {'✓ PASS' if tags_ok else '✗ FAIL'}")
-    print(f"  Response time: {tags_time:.2f}s")
-    print(f"  Models found: {len(models)}")
-    print()
-    print(f"Generate Test: {'✓ PASS' if generate_ok else '✗ FAIL'}")
-    print(f"  Response time: {generate_time:.2f}s")
-    print(f"  Under 30s: {'✓ YES' if generate_time < 30 else '✗ NO'}")
-    print()
-
-    # Overall status
-    overall_ok = tags_ok and generate_ok
-    print(f"Overall Status: {'✓ POD LIVE' if overall_ok else '✗ POD ISSUES'}")
-
-    # Cost awareness
-    print()
-    print(f"Cost Awareness: Pod costs ${COST_PER_HOUR}/hour")
-    print(f"Verification time: {check_time.strftime('%Y-%m-%d %H:%M:%S')}")
-
-    # Write results to file
-    results = {
-        "pod_id": POD_ID,
-        "endpoint": ENDPOINT,
-        "timestamp": check_time.isoformat(),
-        "api_tags_ok": tags_ok,
-        "api_tags_time": tags_time,
+    models_ok, models_time, models, models_status = check_models(provider)
+    print(f"Models endpoint: {'PASS' if models_ok else 'FAIL'} ({models_time:.2f}s, status={models_status})")
+    if models:
+        print(f"Models seen: {models}")
+    print()
+    gen_ok, gen_time, gen_response, gen_status = test_generation(provider)
+    print(f"Generation endpoint: {'PASS' if gen_ok else 'FAIL'} ({gen_time:.2f}s, status={gen_status})")
+    if gen_response:
+        print(f"Response preview: {gen_response[:120]}")
+    print()
+    overall_ok = models_ok and gen_ok
+    result = {
+        "timestamp": datetime.now().isoformat(),
+        "provider_name": provider["name"],
+        "backend": provider["backend"],
+        "base_url": provider["base_url"],
+        "model": provider["model"],
+        "models_ok": models_ok,
+        "models_status": models_status,
+        "models_time": models_time,
         "models": models,
-        "generate_ok": generate_ok,
-        "generate_time": generate_time,
-        "generate_response": response[:200] if response else "",
+        "generation_ok": gen_ok,
+        "generation_status": gen_status,
+        "generation_time": gen_time,
+        "generation_response": gen_response[:200],
         "overall_ok": overall_ok,
-        "cost_per_hour": COST_PER_HOUR
     }
-    with open("big_brain_verification.json", "w") as f:
-        json.dump(results, f, indent=2)
-    print()
-    print("Results saved to big_brain_verification.json")
-
-    # Exit with appropriate code
-    sys.exit(0 if overall_ok else 1)
+    RESULTS_PATH.write_text(json.dumps(result, indent=2))
+    print(f"Results saved to {RESULTS_PATH}")
+    print(f"Overall: {'POD/PROVIDER LIVE' if overall_ok else 'PROVIDER ISSUES'}")
+    return 0 if overall_ok else 1

 if __name__ == "__main__":
-    main()
+    raise SystemExit(main())
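Reviewer note: because `main()` now returns the exit code, the verifier drops into CI or cron as a plain subprocess; a minimal sketch:

```python
# Sketch: gate a CI step or cron job on the verifier's exit code.
import subprocess
import sys

proc = subprocess.run([sys.executable, "scripts/verify_big_brain.py"])
if proc.returncode != 0:
    print("Big Brain provider not live; see big_brain_verification.json")
sys.exit(proc.returncode)
```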

View File: tests for the Big Brain provider module (new file)

@@ -0,0 +1,100 @@
from __future__ import annotations

import json
from pathlib import Path

import yaml

from scripts.big_brain_provider import (
    build_generate_payload,
    infer_backend,
    load_big_brain_provider,
    resolve_big_brain_provider,
    resolve_models_url,
    resolve_generate_url,
)


def test_load_big_brain_provider_from_config(tmp_path: Path) -> None:
    cfg = tmp_path / "config.yaml"
    cfg.write_text(
        yaml.safe_dump(
            {
                "custom_providers": [
                    {"name": "Local Ollama", "base_url": "http://localhost:11434/v1", "model": "qwen3:30b"},
                    {"name": "Big Brain", "base_url": "https://pod-11434.proxy.runpod.net/v1", "model": "gemma4:latest"},
                ]
            }
        )
    )
    provider = load_big_brain_provider(cfg)
    assert provider["name"] == "Big Brain"
    assert provider["base_url"] == "https://pod-11434.proxy.runpod.net/v1"
    assert provider["model"] == "gemma4:latest"


def test_infer_backend_distinguishes_openai_compat_from_ollama() -> None:
    assert infer_backend("https://pod-11434.proxy.runpod.net/v1") == "openai"
    assert infer_backend("http://localhost:11434") == "ollama"


def test_resolve_big_brain_provider_prefers_env_overrides(tmp_path: Path, monkeypatch) -> None:
    cfg = tmp_path / "config.yaml"
    cfg.write_text(
        yaml.safe_dump(
            {
                "custom_providers": [
                    {"name": "Big Brain", "base_url": "https://old-endpoint/v1", "model": "gemma3:27b"}
                ]
            }
        )
    )
    monkeypatch.setenv("BIG_BRAIN_BASE_URL", "https://vertex-proxy.example/v1")
    monkeypatch.setenv("BIG_BRAIN_MODEL", "gemma4:latest")
    monkeypatch.setenv("BIG_BRAIN_BACKEND", "openai")
    provider = resolve_big_brain_provider(cfg)
    assert provider["base_url"] == "https://vertex-proxy.example/v1"
    assert provider["model"] == "gemma4:latest"
    assert provider["backend"] == "openai"


def test_openai_compat_urls_and_payload() -> None:
    provider = {"base_url": "https://pod.proxy.runpod.net/v1", "model": "gemma4:latest", "backend": "openai"}
    assert resolve_models_url(provider) == "https://pod.proxy.runpod.net/v1/models"
    assert resolve_generate_url(provider) == "https://pod.proxy.runpod.net/v1/chat/completions"
    payload = build_generate_payload(provider, prompt="Say READY")
    assert payload["model"] == "gemma4:latest"
    assert payload["messages"][0]["content"] == "Say READY"
    assert payload["stream"] is False
    assert payload["max_tokens"] == 32


def test_ollama_urls_and_payload() -> None:
    provider = {"base_url": "http://localhost:11434", "model": "gemma4:latest", "backend": "ollama"}
    assert resolve_models_url(provider) == "http://localhost:11434/api/tags"
    assert resolve_generate_url(provider) == "http://localhost:11434/api/generate"
    payload = build_generate_payload(provider, prompt="Say READY")
    assert payload == {"model": "gemma4:latest", "prompt": "Say READY", "stream": False, "options": {"num_predict": 32}}


def test_repo_config_big_brain_is_gemma4_not_hardcoded_dead_pod() -> None:
    config = Path("config.yaml").read_text()
    assert "- name: Big Brain" in config
    assert "model: gemma4:latest" in config
    assert "8lfr3j47a5r3gn-11434.proxy.runpod.net" not in config


def test_big_brain_readme_mentions_runpod_and_vertex() -> None:
    readme = Path("scripts/README_big_brain.md").read_text()
    assert "RunPod" in readme
    assert "Vertex AI" in readme
    assert "gemma4:latest" in readme