# Big Brain Provider Verification
Repo wiring for the `big_brain` provider used by Mac Hermes.
## Issue #543
[PROVE-IT] Timmy: Wire RunPod/Vertex AI Gemma 4 to Mac Hermes
## What this repo now supports
The repo no longer hardcodes a single dead RunPod pod as the source of truth. Instead, it defines a **Big Brain provider contract**:

- provider name: `Big Brain`
- model: `gemma4:latest`
- endpoint style: OpenAI-compatible `/v1` by default
- verification path: `scripts/verify_big_brain.py`

Supported deployment shapes:

1. **RunPod + Ollama/OpenAI-compatible bridge**
   - Example base URL: `https://<pod-id>-11434.proxy.runpod.net/v1`
2. **Vertex AI through an OpenAI-compatible bridge/proxy**
   - Example base URL: `https://<your-bridge-host>/v1`
## Config wiring
`config.yaml` now carries a generic provider block:
```yaml
- name: Big Brain
  base_url: https://YOUR_BIG_BRAIN_HOST/v1
  api_key: ''
  model: gemma4:latest
```
Override at runtime if needed:

- `BIG_BRAIN_BASE_URL`
- `BIG_BRAIN_MODEL`
- `BIG_BRAIN_BACKEND` (`openai` or `ollama`)
- `BIG_BRAIN_API_KEY`
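
The intended precedence (environment variable wins over `config.yaml`) can be sketched roughly as follows. The `resolve` helper and the inline `CONFIG` dict are illustrative only, not the script's actual API; the real code parses the YAML file.

```python
import os

# Stand-in for the parsed `config.yaml` provider block (illustrative values).
CONFIG = {
    "name": "Big Brain",
    "base_url": "https://YOUR_BIG_BRAIN_HOST/v1",
    "api_key": "",
    "model": "gemma4:latest",
}

def resolve(config: dict) -> dict:
    """Apply BIG_BRAIN_* environment overrides on top of the config block."""
    return {
        "base_url": os.environ.get("BIG_BRAIN_BASE_URL", config["base_url"]),
        "model": os.environ.get("BIG_BRAIN_MODEL", config["model"]),
        "api_key": os.environ.get("BIG_BRAIN_API_KEY", config["api_key"]),
        # Assumed default: the OpenAI-compatible protocol.
        "backend": os.environ.get("BIG_BRAIN_BACKEND", "openai"),
    }
```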
## Verification scripts

### 1. `scripts/verify_big_brain.py`

Checks the configured provider using the right protocol for the chosen backend.
For `openai` backends it verifies:

- `GET /models`
- `POST /chat/completions`

For `ollama` backends it verifies:

- `GET /api/tags`
- `POST /api/generate`
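
The two protocol shapes above can be sketched as a URL mapping. This helper is hypothetical, not the script's actual interface, and it assumes that for `ollama` the base URL is the server root (no `/v1` suffix):

```python
def endpoints_for(backend: str, base_url: str) -> dict:
    """Map a backend choice to the paths the verifier probes."""
    base = base_url.rstrip("/")
    if backend == "openai":
        # OpenAI-compatible servers expose listing and chat under the base /v1.
        return {"list": f"{base}/models", "generate": f"{base}/chat/completions"}
    if backend == "ollama":
        # Native Ollama serves tags and generation under /api at the server root.
        return {"list": f"{base}/api/tags", "generate": f"{base}/api/generate"}
    raise ValueError(f"unknown backend: {backend!r}")
```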

Writes:

- `big_brain_verification.json`
### 2. `scripts/big_brain_manager.py`

A more verbose wrapper over the same provider contract.

Writes:

- `pod_verification_results.json`
## Usage
```bash
python3 scripts/verify_big_brain.py
python3 scripts/big_brain_manager.py
```
## Honest current state
On fresh `main` before this fix, the repo pointed at a stale RunPod endpoint:

- `https://8lfr3j47a5r3gn-11434.proxy.runpod.net`
- verification returned HTTP 404 for both model listing and generation

The repo therefore claimed Big Brain wiring existed, but the proof path was stale and tied to a specific dead pod.
This fix makes the repo wiring reusable and truthful, but it does **not** provision a fresh paid GPU automatically.
## Acceptance mapping
What this repo change satisfies:
- [x] Mac Hermes has a `big_brain` provider contract in `config.yaml`
- [x] Verification script checks that provider through the same API shape Hermes needs
- [x] RunPod and Vertex-style wiring are documented without hardcoding a dead pod
What still depends on live infrastructure outside the repo:

- [ ] GPU instance actually provisioned and running
- [ ] endpoint responsive right now
- [ ] live `hermes chat --provider big_brain` success against a real endpoint