[SPECTRUM] GCP Vertex AI MaaS — Gemma 4 Serverless Deployment Path #2

Closed
opened 2026-04-03 18:50:40 +00:00 by ezra · 1 comment
Owner

## GCP Vertex AI MaaS — Gemma 4 Deployment Guide

**Source:** Google AI conversation re: serverless Gemma 4 endpoints for Hermes agent integration
**Triage Date:** 2026-04-03
**Priority:** HIGH — Blocked deployment path for SPECTRUM initiative


### Overview

Gemma 4 is now available via Vertex AI Model-as-a-Service (MaaS) with serverless, pay-as-you-go billing that draws directly from GCP promotional/commitment balance. This provides a zero-cold-start alternative to local llama-server deployment.


### 1. Enable the Gemma 4 MaaS Endpoint

1. Go to [Google Cloud Console](https://console.cloud.google.com/)
2. Navigate to **Vertex AI > Model Garden**
3. Search for **Gemma 4** (variants: E4B, 9B, 27B)
4. Click the model card → **Enable** under **Managed API (Serverless)**
   - Select **"Pay-as-you-go"** to draw from cloud credits/balance

### 2. Hermes Agent Configuration (OpenAI-Compatible)

| Setting | Value |
|---------|-------|
| Base URL | `https://{REGION}-aiplatform.googleapis.com/v1beta1/projects/{PROJECT_ID}/locations/{REGION}/publishers/google/models` |
| Region | `us-central1` (recommended) |
| Model | `gemma-4-e4b-it` (or 9B/27B variant) |
| Auth | Service Account Token |
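The base URL template can be assembled programmatically before wiring it into Hermes. A minimal sketch (the `build_model_url` helper, the example project ID, and the defaults are illustrative assumptions, not part of Hermes or the Vertex AI SDK):

```python
# Assemble the Vertex AI MaaS model URL from the settings above.
# Helper name and default values are illustrative assumptions.

def build_model_url(project_id: str, region: str = "us-central1",
                    model: str = "gemma-4-e4b-it") -> str:
    """Return the publisher-model URL for a serverless Gemma 4 endpoint."""
    base = (f"https://{region}-aiplatform.googleapis.com/v1beta1"
            f"/projects/{project_id}/locations/{region}"
            f"/publishers/google/models")
    return f"{base}/{model}"

# Example with a hypothetical project ID:
print(build_model_url("my-spectrum-project"))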

### 3. Authentication Options

#### Option A: Temporary CLI Token (Testing)

```bash
gcloud auth print-access-token
```

- Paste the token into the Hermes "API Key" field
- ⚠️ Expires in 1 hour

#### Option B: Service Account (Production)

1. **IAM & Admin > Service Accounts**
2. Create a service account with the **Vertex AI User** role
3. Generate a JSON key
4. If Hermes supports env vars:

   ```bash
   export GOOGLE_APPLICATION_CREDENTIALS="path/to/key.json"
   ```

5. If Hermes requires a string API key → use a LiteLLM proxy:

   ```bash
   pip install 'litellm[proxy]'
   export GOOGLE_APPLICATION_CREDENTIALS="path/to/key.json"
   litellm --model vertex_ai/gemma-4-e4b-it
   ```

   Then point Hermes to `http://localhost:4000`
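Once the proxy is up, Hermes talks plain OpenAI-style HTTP to it. A stdlib-only sketch of the request shape (the prompt, the placeholder API key, and the `chat_request` helper are illustrative assumptions; LiteLLM performs the actual GCP auth server-side):

```python
import json

# Sketch of the OpenAI-compatible chat request Hermes would send through
# a local LiteLLM proxy. The "sk-anything" key is a placeholder — with no
# master key configured, the proxy does not validate it (assumption).

PROXY_URL = "http://localhost:4000/chat/completions"

def chat_request(prompt: str, model: str = "vertex_ai/gemma-4-e4b-it"):
    """Return (url, headers, JSON body) for one chat completion call."""
    headers = {
        "Authorization": "Bearer sk-anything",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return PROXY_URL, headers, body

url, headers, body = chat_request("Plan the next SPECTRUM step.")
print(url)
```

Sending this with `urllib.request` or `requests` against the running proxy completes the loop; only the URL and payload shape are shown here.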

### Why This Path for SPECTRUM

| Factor | Benefit |
|--------|---------|
| **Agentic Reasoning** | Gemma 4 optimized for multi-step planning/tool-calling |
| **Zero Cold Start** | Serverless MaaS is always-on, vs minutes of spin-up for a GKE or dedicated Vertex endpoint |
| **Cost** | E4B variant efficient for high-token loops; draws from GCP balance |
| **Billing** | Direct from promotional/commitment credits |

### Blockers / Dependencies

- [ ] Verify GCP project has Vertex AI API enabled
- [ ] Confirm service account has `roles/aiplatform.user`
- [ ] Test LiteLLM proxy if Hermes doesn't support GCP auth natively
- [ ] Validate token budget vs expected inference volume
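The token-budget check can be roughed out with simple arithmetic. The call volume and per-million-token prices below are placeholder assumptions for illustration, not published Gemma 4 MaaS rates:

```python
# Rough token-budget estimator for the checklist item above.
# All prices and volumes are PLACEHOLDER assumptions.

def monthly_cost(calls_per_day: int, in_tokens: int, out_tokens: int,
                 usd_per_m_in: float, usd_per_m_out: float,
                 days: int = 30) -> float:
    """Estimate monthly spend in USD for a steady agent loop."""
    total_in = calls_per_day * in_tokens * days
    total_out = calls_per_day * out_tokens * days
    return total_in / 1e6 * usd_per_m_in + total_out / 1e6 * usd_per_m_out

# Example: 2,000 agent calls/day, 1,500 input + 500 output tokens each,
# at placeholder rates of $0.10/M input and $0.40/M output.
print(round(monthly_cost(2000, 1500, 500, 0.10, 0.40), 2))  # → 21.0
```

Plugging in the real MaaS rates and observed Hermes loop sizes turns this into a go/no-go number against the remaining promotional balance.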

### Related

- Memory: Gemma 4 GGUF downloaded at `/mnt/gemma4/gemma-4-31B-it-Q4_K_M.gguf`
- Blocked: llama.cpp builds lacking Gemma 4 arch support
- Alternative: this GCP path bypasses local inference blockers entirely

**Next Step:** Evaluate LiteLLM proxy integration vs native GCP auth in the Hermes agent.


Burn-down: GCP Vertex BLOCKED (no SA key). Local llama-server is the active path. Closing as DEFERRED.

ezra closed this issue 2026-04-04 12:18:13 +00:00

Reference: ezra/gemma-spectrum#2