# Compare commits

1 commit, SHA1 `1050812bb5`

## config.yaml (11 changed lines)
```diff
@@ -175,12 +175,13 @@ custom_providers:
     api_key: ollama
     model: qwen3:30b
   - name: Big Brain
-    base_url: https://8lfr3j47a5r3gn-11434.proxy.runpod.net/v1
+    base_url: https://YOUR_BIG_BRAIN_HOST/v1
     api_key: ''
-    model: gemma3:27b
-    # RunPod L40S 48GB — Ollama image, gemma3:27b
-    # Usage: hermes --provider big_brain -p 'Say READY'
-    # Pod: 8lfr3j47a5r3gn, deployed 2026-04-07
+    model: gemma4:latest
+    # OpenAI-compatible Gemma 4 provider for Mac Hermes.
+    # RunPod example: https://<pod-id>-11434.proxy.runpod.net/v1
+    # Vertex AI requires an OpenAI-compatible bridge/proxy; point this at that /v1 endpoint.
+    # Verify with: python3 scripts/verify_big_brain.py
 system_prompt_suffix: "You are Timmy. Your soul is defined in SOUL.md \u2014 read\
   \ it, live it.\nYou run locally on your owner's machine via Ollama. You never phone\
   \ home.\nYou speak plainly. You prefer short sentences. Brevity is a kindness.\n\
```
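The provider block shown above is just one entry in a `custom_providers` list; picking it out of the parsed config is a simple name lookup. A minimal self-contained sketch (the dict below stands in for the output of `yaml.safe_load` on `config.yaml`, and `pick_provider` is a hypothetical helper, not part of the repo):

```python
# Stand-in for yaml.safe_load(open("config.yaml")); values mirror the diff above.
config = {
    "custom_providers": [
        {
            "name": "Big Brain",
            "base_url": "https://YOUR_BIG_BRAIN_HOST/v1",
            "api_key": "",
            "model": "gemma4:latest",
        },
    ]
}


def pick_provider(config: dict, name: str) -> dict:
    """Return the first custom provider whose name matches, else raise KeyError."""
    for provider in config.get("custom_providers", []):
        if provider.get("name") == name:
            return provider
    raise KeyError(name)


big_brain = pick_provider(config, "Big Brain")
```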
## scripts/README_big_brain.md

````diff
@@ -1,46 +1,90 @@
-# Big Brain Pod Verification
+# Big Brain Provider Verification
 
-Verification script for Big Brain pod with gemma3:27b model.
+Repo wiring for the `big_brain` provider used by Mac Hermes.
 
-## Issue #573
+## Issue #543
 
-[BIG-BRAIN] Verify pod live: gemma3:27b pulled and responding
+[PROVE-IT] Timmy: Wire RunPod/Vertex AI Gemma 4 to Mac Hermes
 
-## Pod Details
+## What this repo now supports
 
-- Pod ID: `8lfr3j47a5r3gn`
-- GPU: L40S 48GB
-- Image: `ollama/ollama:latest`
-- Endpoint: `https://8lfr3j47a5r3gn-11434.proxy.runpod.net`
-- Cost: $0.79/hour
+The repo no longer hardcodes one dead RunPod pod as the truth.
+Instead, it defines a **Big Brain provider contract**:
+- provider name: `Big Brain`
+- model: `gemma4:latest`
+- endpoint style: OpenAI-compatible `/v1` by default
+- verification path: `scripts/verify_big_brain.py`
 
-## Verification Script
+Supported deployment shapes:
+1. **RunPod + Ollama/OpenAI-compatible bridge**
+   - Example base URL: `https://<pod-id>-11434.proxy.runpod.net/v1`
+2. **Vertex AI through an OpenAI-compatible bridge/proxy**
+   - Example base URL: `https://<your-bridge-host>/v1`
 
-`scripts/verify_big_brain.py` checks:
+## Config wiring
 
-1. `/api/tags` - Verifies gemma3:27b is in model list
-2. `/api/generate` - Tests response time (< 30s requirement)
-3. Uptime logging for cost awareness
+`config.yaml` now carries a generic provider block:
+
+```yaml
+- name: Big Brain
+  base_url: https://YOUR_BIG_BRAIN_HOST/v1
+  api_key: ''
+  model: gemma4:latest
+```
+
+Override at runtime if needed:
+- `BIG_BRAIN_BASE_URL`
+- `BIG_BRAIN_MODEL`
+- `BIG_BRAIN_BACKEND` (`openai` or `ollama`)
+- `BIG_BRAIN_API_KEY`
+
+## Verification scripts
+
+### 1. `scripts/verify_big_brain.py`
+
+Checks the configured provider using the right protocol for the chosen backend.
+
+For `openai` backends it verifies:
+- `GET /models`
+- `POST /chat/completions`
+
+For `ollama` backends it verifies:
+- `GET /api/tags`
+- `POST /api/generate`
+
+Writes:
+- `big_brain_verification.json`
+
+### 2. `scripts/big_brain_manager.py`
+
+A more verbose wrapper over the same provider contract.
+
+Writes:
+- `pod_verification_results.json`
 
 ## Usage
 
 ```bash
-cd scripts
-python3 verify_big_brain.py
+python3 scripts/verify_big_brain.py
+python3 scripts/big_brain_manager.py
 ```
 
-## Output
+## Honest current state
 
-- Console output with verification results
-- `big_brain_verification.json` with detailed results
-- Exit code 0 on success, 1 on failure
+On fresh main before this fix, the repo was pointing at a stale RunPod endpoint:
+- `https://8lfr3j47a5r3gn-11434.proxy.runpod.net`
+- verification returned HTTP 404 for both model listing and generation
 
-## Acceptance Criteria
+That meant the repo claimed Big Brain wiring existed, but the proof path was stale and tied to a dead specific pod.
 
-- [x] `/api/tags` returns `gemma3:27b` in model list
-- [x] `/api/generate` responds to a simple prompt in < 30s
-- [x] uptime logged (cost awareness: $0.79/hr)
+This fix makes the repo wiring reusable and truthful, but it does **not** provision a fresh paid GPU automatically.
 
-## Previous Issues
+## Acceptance mapping
 
-Previous pod (elr5vkj96qdplf) used broken `runpod/ollama:latest` image and never started. Fix: use `ollama/ollama:latest`. Volume mount at `/root/.ollama` for model persistence.
+What this repo change satisfies:
+- [x] Mac Hermes has a `big_brain` provider contract in `config.yaml`
+- [x] Verification script checks that provider through the same API shape Hermes needs
+- [x] RunPod and Vertex-style wiring are documented without hardcoding a dead pod
+
+What still depends on live infrastructure outside the repo:
+- [ ] GPU instance actually provisioned and running
+- [ ] endpoint responsive right now
+- [ ] live `hermes chat --provider big_brain` success against a real endpoint
````
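The backend-to-endpoint mapping the README describes (OpenAI-compatible vs raw Ollama) is a pure function of the base URL and backend name. A self-contained sketch of that contract; `endpoint_urls` is a hypothetical helper for illustration, not part of the repo:

```python
def endpoint_urls(base_url: str, backend: str) -> dict:
    """Map a provider backend to its model-list and generation URLs."""
    base = base_url.rstrip("/")
    if backend == "openai":
        # OpenAI-compatible: GET /models, POST /chat/completions
        return {"models": f"{base}/models", "generate": f"{base}/chat/completions"}
    # Raw Ollama: GET /api/tags, POST /api/generate
    return {"models": f"{base}/api/tags", "generate": f"{base}/api/generate"}


urls = endpoint_urls("https://YOUR_BIG_BRAIN_HOST/v1", "openai")
```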
## scripts/big_brain_manager.py

Hunk `@@ -1,214 +1,123 @@` replaces the script wholesale. The old version hardcoded a `CONFIG` dict (pod id `8lfr3j47a5r3gn`, endpoint `https://8lfr3j47a5r3gn-11434.proxy.runpod.net`, $0.79/hour, model `gemma3:27b`, 30-second response limit, 10-second timeout) and a `PodVerifier` class whose `check_connectivity`, `check_ollama_api`, `pull_model`, `test_generation`, and `run_verification` methods talked only to that one pod's raw Ollama API. The new version:

```python
#!/usr/bin/env python3
"""
Big Brain provider management and verification.

Uses the repo's Big Brain provider config rather than a stale hardcoded pod id.
Supports both OpenAI-compatible and raw Ollama backends.
"""
from __future__ import annotations

import json
import os
import sys
from datetime import datetime

import requests

from scripts.big_brain_provider import (
    build_generate_payload,
    resolve_big_brain_provider,
    resolve_generate_url,
    resolve_models_url,
)


class ProviderVerifier:
    def __init__(self, provider: dict | None = None, timeout: int = 10, max_response_time: int = 30):
        self.provider = provider or resolve_big_brain_provider()
        self.timeout = timeout
        self.max_response_time = max_response_time
        self.results: dict[str, object] = {}

    def _headers(self) -> dict[str, str]:
        headers = {"Content-Type": "application/json"}
        api_key = self.provider.get("api_key", "")
        if api_key:
            headers["Authorization"] = f"Bearer {api_key}"
        return headers

    def check_models(self):
        url = resolve_models_url(self.provider)
        print(f"[{datetime.now().isoformat()}] Checking models endpoint: {url}")
        try:
            response = requests.get(url, headers=self._headers(), timeout=self.timeout)
            models = []
            if response.status_code == 200:
                data = response.json()
                if self.provider["backend"] == "openai":
                    models = [m.get("id", "") for m in data.get("data", [])]
                else:
                    models = [m.get("name", "") for m in data.get("models", [])]
                print(f" ✓ Models endpoint OK ({response.status_code})")
            else:
                print(f" ✗ Models endpoint failed ({response.status_code})")
            return response.status_code == 200, models, response.status_code
        except Exception as e:
            print(f" ✗ Models endpoint error: {e}")
            return False, [], None

    def test_generation(self, prompt: str = "Say READY"):
        url = resolve_generate_url(self.provider)
        payload = build_generate_payload(self.provider, prompt=prompt)
        print(f"[{datetime.now().isoformat()}] Testing generation endpoint: {url}")
        try:
            response = requests.post(url, headers=self._headers(), json=payload, timeout=self.max_response_time)
            text = ""
            if response.status_code == 200:
                data = response.json()
                if self.provider["backend"] == "openai":
                    text = data.get("choices", [{}])[0].get("message", {}).get("content", "").strip()
                else:
                    text = data.get("response", "").strip()
                print(f" ✓ Generation OK ({response.status_code})")
            else:
                print(f" ✗ Generation failed ({response.status_code})")
            return response.status_code == 200, text, response.status_code
        except Exception as e:
            print(f" ✗ Generation error: {e}")
            return False, "", None

    def run_verification(self):
        print("=" * 60)
        print("Big Brain Provider Verification Suite")
        print("=" * 60)
        print(f"Provider: {self.provider['name']}")
        print(f"Backend: {self.provider['backend']}")
        print(f"Base URL: {self.provider['base_url']}")
        print(f"Model: {self.provider['model']}")
        print("=" * 60)

        models_ok, models, models_status = self.check_models()
        print()
        gen_ok, gen_response, gen_status = self.test_generation()
        print()

        overall_ok = models_ok and gen_ok
        self.results = {
            "timestamp": datetime.now().isoformat(),
            "provider": self.provider,
            "models_ok": models_ok,
            "models_status": models_status,
            "models": models,
            "generation_ok": gen_ok,
            "generation_status": gen_status,
            "generation_response": gen_response[:200],
            "overall_ok": overall_ok,
        }

        with open("pod_verification_results.json", "w") as f:
            json.dump(self.results, f, indent=2)

        print("=" * 60)
        print(f"Overall Status: {'✓ PROVIDER LIVE' if overall_ok else '✗ PROVIDER ISSUES'}")
        print("Results saved to pod_verification_results.json")
        return overall_ok


def main():
    verifier = ProviderVerifier()
    success = verifier.run_verification()
    sys.exit(0 if success else 1)


if __name__ == "__main__":
    main()
```
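The two backends return JSON with the generated text nested differently, which is why the verifier branches on `provider["backend"]`. A self-contained sketch of that extraction step; the sample payloads are illustrative, not captured from a live endpoint:

```python
def extract_text(data: dict, backend: str) -> str:
    """Pull the generated text out of an OpenAI-style or Ollama-style response."""
    if backend == "openai":
        # OpenAI shape: {"choices": [{"message": {"content": ...}}]}
        return data.get("choices", [{}])[0].get("message", {}).get("content", "").strip()
    # Ollama shape: {"response": ...}
    return data.get("response", "").strip()


openai_style = {"choices": [{"message": {"role": "assistant", "content": "READY\n"}}]}
ollama_style = {"response": " READY "}
```

The defensive `.get(..., default)` chain means an empty or malformed body yields `""` instead of raising, which keeps the verifier's pass/fail decision on the HTTP status code.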
## scripts/big_brain_provider.py (new file, 72 lines)

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Any

import yaml


DEFAULT_CONFIG_PATH = Path(__file__).resolve().parents[1] / "config.yaml"


def _normalize_base_url(base_url: str) -> str:
    return (base_url or "").rstrip("/")


def load_big_brain_provider(config_path: str | Path = DEFAULT_CONFIG_PATH) -> dict[str, Any]:
    config = yaml.safe_load(Path(config_path).read_text()) or {}
    for provider in config.get("custom_providers", []):
        if provider.get("name") == "Big Brain":
            return dict(provider)
    raise KeyError("Big Brain provider not found in config")


def infer_backend(base_url: str) -> str:
    base = _normalize_base_url(base_url)
    return "openai" if base.endswith("/v1") else "ollama"


def resolve_big_brain_provider(config_path: str | Path = DEFAULT_CONFIG_PATH) -> dict[str, Any]:
    provider = load_big_brain_provider(config_path)
    base_url = _normalize_base_url(os.environ.get("BIG_BRAIN_BASE_URL", provider.get("base_url", "")))
    model = os.environ.get("BIG_BRAIN_MODEL", provider.get("model", "gemma4:latest"))
    backend = os.environ.get("BIG_BRAIN_BACKEND", infer_backend(base_url))
    api_key = os.environ.get("BIG_BRAIN_API_KEY", provider.get("api_key", ""))
    return {
        "name": provider.get("name", "Big Brain"),
        "base_url": base_url,
        "model": model,
        "backend": backend,
        "api_key": api_key,
    }


def resolve_models_url(provider: dict[str, Any]) -> str:
    base = _normalize_base_url(provider["base_url"])
    if provider["backend"] == "openai":
        return f"{base}/models"
    return f"{base}/api/tags"


def resolve_generate_url(provider: dict[str, Any]) -> str:
    base = _normalize_base_url(provider["base_url"])
    if provider["backend"] == "openai":
        return f"{base}/chat/completions"
    return f"{base}/api/generate"


def build_generate_payload(provider: dict[str, Any], prompt: str = "Say READY") -> dict[str, Any]:
    if provider["backend"] == "openai":
        return {
            "model": provider["model"],
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,
            "max_tokens": 32,
        }
    return {
        "model": provider["model"],
        "prompt": prompt,
        "stream": False,
        "options": {"num_predict": 32},
    }
```
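The environment-over-config precedence in `resolve_big_brain_provider` can be illustrated without touching `config.yaml`. A self-contained stand-in for that resolution step; `resolve` is a hypothetical helper and the values are illustrative:

```python
import os


def resolve(config_value: str, env_name: str) -> str:
    """An environment variable, when set, wins over the config.yaml value."""
    return os.environ.get(env_name, config_value)


# Make the demo deterministic regardless of the surrounding environment.
os.environ.pop("BIG_BRAIN_BASE_URL", None)
os.environ["BIG_BRAIN_MODEL"] = "gemma4:latest"  # simulate an operator override

model = resolve("gemma3:27b", "BIG_BRAIN_MODEL")                       # env wins
base_url = resolve("https://YOUR_BIG_BRAIN_HOST/v1", "BIG_BRAIN_BASE_URL")  # unset, config wins
```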
## scripts/verify_big_brain.py

Hunk `@@ -1,176 +1,133 @@` rewrites the verifier. The old version hardcoded `POD_ID = "8lfr3j47a5r3gn"`, the derived RunPod proxy `ENDPOINT`, and `COST_PER_HOUR = 0.79`; its `check_api_tags()` looked for `gemma3:27b` in `/api/tags` (pinging the server on a 404), `test_generate()` sent "Say hello in one word." with a 30-second limit, and `check_uptime()` merely logged a timestamp (a comment admitted a real implementation would query the RunPod API for pod start time) before a verbose summary, cost printout, and `sys.exit`. The new version:

```python
#!/usr/bin/env python3
"""
Big Brain provider verification.

Verifies that the Big Brain provider configured for Mac Hermes is reachable and
can answer a simple prompt. Supports both:
- OpenAI-compatible endpoints (`.../v1/models`, `.../v1/chat/completions`)
- Raw Ollama endpoints (`/api/tags`, `/api/generate`)

Refs: timmy-home #543
"""
from __future__ import annotations

import json
import sys
import time
from datetime import datetime
from pathlib import Path

import requests

from scripts.big_brain_provider import (
    build_generate_payload,
    resolve_big_brain_provider,
    resolve_generate_url,
    resolve_models_url,
)

RESULTS_PATH = Path("big_brain_verification.json")


def _headers(provider: dict[str, str]) -> dict[str, str]:
    headers = {"Content-Type": "application/json"}
    api_key = provider.get("api_key", "")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers


def check_models(provider: dict[str, str], timeout: int = 10) -> tuple[bool, float, list[str], int | None]:
    url = resolve_models_url(provider)
    started = time.time()
    try:
        response = requests.get(url, headers=_headers(provider), timeout=timeout)
        elapsed = time.time() - started
        models: list[str] = []
        if response.status_code == 200:
            data = response.json()
            if provider["backend"] == "openai":
                models = [m.get("id", "") for m in data.get("data", [])]
            else:
                models = [m.get("name", "") for m in data.get("models", [])]
        return response.status_code == 200, elapsed, models, response.status_code
    except Exception:
        elapsed = time.time() - started
        return False, elapsed, [], None


def test_generation(provider: dict[str, str], prompt: str = "Say READY", timeout: int = 30) -> tuple[bool, float, str, int | None]:
    url = resolve_generate_url(provider)
    payload = build_generate_payload(provider, prompt=prompt)
    started = time.time()
    try:
        response = requests.post(url, headers=_headers(provider), json=payload, timeout=timeout)
        elapsed = time.time() - started
        response_text = ""
        if response.status_code == 200:
            data = response.json()
            if provider["backend"] == "openai":
                response_text = (
                    data.get("choices", [{}])[0]
                    .get("message", {})
                    .get("content", "")
                    .strip()
                )
            else:
                response_text = data.get("response", "").strip()
        return response.status_code == 200, elapsed, response_text, response.status_code
    except Exception:
        elapsed = time.time() - started
        return False, elapsed, "", None


def main() -> int:
    provider = resolve_big_brain_provider()

    print("=" * 60)
    print("Big Brain Provider Verification")
    print(f"Timestamp: {datetime.now().isoformat()}")
    print(f"Provider: {provider['name']}")
    print(f"Backend: {provider['backend']}")
    print(f"Base URL: {provider['base_url']}")
    print(f"Model: {provider['model']}")
    print("=" * 60)
    print()

    models_ok, models_time, models, models_status = check_models(provider)
    print(f"Models endpoint: {'PASS' if models_ok else 'FAIL'} ({models_time:.2f}s, status={models_status})")
    if models:
        print(f"Models seen: {models}")
    print()

    gen_ok, gen_time, gen_response, gen_status = test_generation(provider)
    print(f"Generation endpoint: {'PASS' if gen_ok else 'FAIL'} ({gen_time:.2f}s, status={gen_status})")
    if gen_response:
        print(f"Response preview: {gen_response[:120]}")
    print()

    overall_ok = models_ok and gen_ok
    result = {
        "timestamp": datetime.now().isoformat(),
        "provider_name": provider["name"],
        "backend": provider["backend"],
        "base_url": provider["base_url"],
        "model": provider["model"],
        "models_ok": models_ok,
        "models_status": models_status,
        "models_time": models_time,
        "models": models,
        "generation_ok": gen_ok,
        "generation_status": gen_status,
        "generation_time": gen_time,
        "generation_response": gen_response[:200],
        "overall_ok": overall_ok,
    }

    RESULTS_PATH.write_text(json.dumps(result, indent=2))
    print(f"Results saved to {RESULTS_PATH}")
    print(f"Overall: {'PROVIDER LIVE' if overall_ok else 'PROVIDER ISSUES'}")
    return 0 if overall_ok else 1


if __name__ == "__main__":
    raise SystemExit(main())
```
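The `big_brain_verification.json` file the script writes can gate downstream tooling on `overall_ok`, mirroring the script's own exit code. A minimal self-contained sketch; the JSON is built inline here rather than read from disk, and the failing values are illustrative (they echo the 404s described in the README's "Honest current state" section):

```python
import json

# Stand-in for the contents of big_brain_verification.json after a failed run.
result_json = json.dumps({
    "overall_ok": False,
    "models_ok": False,
    "generation_ok": False,
    "models_status": 404,
    "generation_status": 404,
})


def exit_code_for(raw: str) -> int:
    """0 when the provider verified end to end, 1 otherwise."""
    return 0 if json.loads(raw).get("overall_ok") else 1


code = exit_code_for(result_json)
```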
## tests/test_big_brain_provider.py (new file, 100 lines)

```python
from __future__ import annotations

import json
from pathlib import Path

import yaml

from scripts.big_brain_provider import (
    build_generate_payload,
    infer_backend,
    load_big_brain_provider,
    resolve_big_brain_provider,
    resolve_models_url,
    resolve_generate_url,
)


def test_load_big_brain_provider_from_config(tmp_path: Path) -> None:
    cfg = tmp_path / "config.yaml"
    cfg.write_text(
        yaml.safe_dump(
            {
                "custom_providers": [
                    {"name": "Local Ollama", "base_url": "http://localhost:11434/v1", "model": "qwen3:30b"},
                    {"name": "Big Brain", "base_url": "https://pod-11434.proxy.runpod.net/v1", "model": "gemma4:latest"},
                ]
            }
        )
    )

    provider = load_big_brain_provider(cfg)

    assert provider["name"] == "Big Brain"
    assert provider["base_url"] == "https://pod-11434.proxy.runpod.net/v1"
    assert provider["model"] == "gemma4:latest"


def test_infer_backend_distinguishes_openai_compat_from_ollama() -> None:
    assert infer_backend("https://pod-11434.proxy.runpod.net/v1") == "openai"
    assert infer_backend("http://localhost:11434") == "ollama"


def test_resolve_big_brain_provider_prefers_env_overrides(tmp_path: Path, monkeypatch) -> None:
    cfg = tmp_path / "config.yaml"
    cfg.write_text(
        yaml.safe_dump(
            {
                "custom_providers": [
                    {"name": "Big Brain", "base_url": "https://old-endpoint/v1", "model": "gemma3:27b"}
                ]
            }
        )
    )

    monkeypatch.setenv("BIG_BRAIN_BASE_URL", "https://vertex-proxy.example/v1")
    monkeypatch.setenv("BIG_BRAIN_MODEL", "gemma4:latest")
    monkeypatch.setenv("BIG_BRAIN_BACKEND", "openai")

    provider = resolve_big_brain_provider(cfg)

    assert provider["base_url"] == "https://vertex-proxy.example/v1"
    assert provider["model"] == "gemma4:latest"
    assert provider["backend"] == "openai"


def test_openai_compat_urls_and_payload() -> None:
    provider = {"base_url": "https://pod.proxy.runpod.net/v1", "model": "gemma4:latest", "backend": "openai"}

    assert resolve_models_url(provider) == "https://pod.proxy.runpod.net/v1/models"
    assert resolve_generate_url(provider) == "https://pod.proxy.runpod.net/v1/chat/completions"

    payload = build_generate_payload(provider, prompt="Say READY")
    assert payload["model"] == "gemma4:latest"
    assert payload["messages"][0]["content"] == "Say READY"
    assert payload["stream"] is False
    assert payload["max_tokens"] == 32


def test_ollama_urls_and_payload() -> None:
    provider = {"base_url": "http://localhost:11434", "model": "gemma4:latest", "backend": "ollama"}

    assert resolve_models_url(provider) == "http://localhost:11434/api/tags"
    assert resolve_generate_url(provider) == "http://localhost:11434/api/generate"

    payload = build_generate_payload(provider, prompt="Say READY")
    assert payload == {"model": "gemma4:latest", "prompt": "Say READY", "stream": False, "options": {"num_predict": 32}}


def test_repo_config_big_brain_is_gemma4_not_hardcoded_dead_pod() -> None:
    config = Path("config.yaml").read_text()
    assert "- name: Big Brain" in config
    assert "model: gemma4:latest" in config
    assert "8lfr3j47a5r3gn-11434.proxy.runpod.net" not in config


def test_big_brain_readme_mentions_runpod_and_vertex() -> None:
    readme = Path("scripts/README_big_brain.md").read_text()
    assert "RunPod" in readme
    assert "Vertex AI" in readme
    assert "gemma4:latest" in readme
```