[SPECTRUM] GCP Vertex AI MaaS — Gemma 4 Serverless Deployment Path #2

Closed
opened 2026-04-03 18:50:40 +00:00 by ezra · 1 comment
Owner

## GCP Vertex AI MaaS — Gemma 4 Deployment Guide

**Source:** Google AI conversation re: serverless Gemma 4 endpoints for Hermes agent integration
**Triage Date:** 2026-04-03
**Priority:** HIGH — Blocked deployment path for SPECTRUM initiative


### Overview

Gemma 4 is now available via Vertex AI Model-as-a-Service (MaaS) with serverless, pay-as-you-go billing that draws directly from GCP promotional/commitment balance. This provides a zero-cold-start alternative to local llama-server deployment.


### 1. Enable the Gemma 4 MaaS Endpoint

1. Go to [Google Cloud Console](https://console.cloud.google.com/)
2. Navigate to **Vertex AI > Model Garden**
3. Search for **Gemma 4** (variants: E4B, 9B, 27B)
4. Click the model card → **Enable** under **Managed API (Serverless)**
   - Select **"Pay-as-you-go"** to draw from cloud credits/balance

### 2. Hermes Agent Configuration (OpenAI-Compatible)

| Setting | Value |
|---------|-------|
| Base URL | `https://{REGION}-aiplatform.googleapis.com/v1beta1/projects/{PROJECT_ID}/locations/{REGION}/publishers/google/models` |
| Region | `us-central1` (recommended) |
| Model | `gemma-4-e4b-it` (or 9B/27B variant) |
| Auth | Service Account Token |
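The base URL template can be assembled programmatically before wiring it into Hermes. A minimal sketch (the `build_model_url` helper, the example project ID, and the defaults are illustrative assumptions, not part of Hermes or the Vertex AI SDK):

```python
# Assemble the Vertex AI MaaS model URL from the settings above.
# Helper name and default values are illustrative assumptions.

def build_model_url(project_id: str, region: str = "us-central1",
                    model: str = "gemma-4-e4b-it") -> str:
    """Return the publisher-model URL for a serverless Gemma 4 endpoint."""
    base = (f"https://{region}-aiplatform.googleapis.com/v1beta1"
            f"/projects/{project_id}/locations/{region}"
            f"/publishers/google/models")
    return f"{base}/{model}"

# Example with a hypothetical project ID:
print(build_model_url("my-spectrum-project"))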

### 3. Authentication Options

#### Option A: Temporary CLI Token (Testing)

```bash
gcloud auth print-access-token
```

- Paste the token into the Hermes "API Key" field
- ⚠️ Expires in 1 hour

#### Option B: Service Account (Production)

1. **IAM & Admin > Service Accounts**
2. Create a service account with the **Vertex AI User** role
3. Generate a JSON key
4. If Hermes supports env vars:

   ```bash
   export GOOGLE_APPLICATION_CREDENTIALS="path/to/key.json"
   ```

5. If Hermes requires a string API key → use a LiteLLM proxy:

   ```bash
   pip install 'litellm[proxy]'
   export GOOGLE_APPLICATION_CREDENTIALS="path/to/key.json"
   litellm --model vertex_ai/gemma-4-e4b-it
   ```

   Then point Hermes to `http://localhost:4000`
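Once the proxy is up, Hermes talks plain OpenAI-style HTTP to it. A stdlib-only sketch of the request shape (the prompt, the placeholder API key, and the `chat_request` helper are illustrative assumptions; LiteLLM performs the actual GCP auth server-side):

```python
import json

# Sketch of the OpenAI-compatible chat request Hermes would send through
# a local LiteLLM proxy. The "sk-anything" key is a placeholder — with no
# master key configured, the proxy does not validate it (assumption).

PROXY_URL = "http://localhost:4000/chat/completions"

def chat_request(prompt: str, model: str = "vertex_ai/gemma-4-e4b-it"):
    """Return (url, headers, JSON body) for one chat completion call."""
    headers = {
        "Authorization": "Bearer sk-anything",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return PROXY_URL, headers, body

url, headers, body = chat_request("Plan the next SPECTRUM step.")
print(url)
```

Sending this with `urllib.request` or `requests` against the running proxy completes the loop; only the URL and payload shape are shown here.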

### Why This Path for SPECTRUM

| Factor | Benefit |
|--------|---------|
| **Agentic Reasoning** | Gemma 4 optimized for multi-step planning/tool-calling |
| **Zero Cold Start** | Serverless MaaS is always-on, vs minutes of spin-up for a GKE or dedicated Vertex endpoint |
| **Cost** | E4B variant efficient for high-token loops; draws from GCP balance |
| **Billing** | Direct from promotional/commitment credits |

### Blockers / Dependencies

- [ ] Verify GCP project has Vertex AI API enabled
- [ ] Confirm service account has `roles/aiplatform.user`
- [ ] Test LiteLLM proxy if Hermes doesn't support GCP auth natively
- [ ] Validate token budget vs expected inference volume
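The token-budget check can be roughed out with simple arithmetic. The call volume and per-million-token prices below are placeholder assumptions for illustration, not published Gemma 4 MaaS rates:

```python
# Rough token-budget estimator for the checklist item above.
# All prices and volumes are PLACEHOLDER assumptions.

def monthly_cost(calls_per_day: int, in_tokens: int, out_tokens: int,
                 usd_per_m_in: float, usd_per_m_out: float,
                 days: int = 30) -> float:
    """Estimate monthly spend in USD for a steady agent loop."""
    total_in = calls_per_day * in_tokens * days
    total_out = calls_per_day * out_tokens * days
    return total_in / 1e6 * usd_per_m_in + total_out / 1e6 * usd_per_m_out

# Example: 2,000 agent calls/day, 1,500 input + 500 output tokens each,
# at placeholder rates of $0.10/M input and $0.40/M output.
print(round(monthly_cost(2000, 1500, 500, 0.10, 0.40), 2))  # → 21.0
```

Plugging in the real MaaS rates and observed Hermes loop sizes turns this into a go/no-go number against the remaining promotional balance.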

### Related

- Memory: Gemma 4 GGUF downloaded at `/mnt/gemma4/gemma-4-31B-it-Q4_K_M.gguf`
- Blocked: llama.cpp builds lacking Gemma 4 arch support
- Alternative: this GCP path bypasses local inference blockers entirely

**Next Step:** Evaluate LiteLLM proxy integration vs native GCP auth in the Hermes agent.


Burn-down: GCP Vertex BLOCKED (no SA key). Local llama-server is the active path. Closing as DEFERRED.

ezra closed this issue 2026-04-04 12:18:13 +00:00

Reference: ezra/gemma-spectrum#2