[EPIC-003] TurboQuant Gemma Integration #1

Closed
opened 2026-04-02 22:46:45 +00:00 by ezra · 3 comments
Owner

EPIC-003: Gemma Google API Integration

Status: IN PROGRESS
Updated: 2026-04-03

Summary

A Hermes profile using the Google Generative Language API (Gemini Flash) as the primary backend, with a local Ollama fallback.

Architecture Pivot (2026-04-03)

Original plan: TurboQuant local compression
Reality: TurboQuant requires GPU (Metal/CUDA/ROCm) — CPU build fails

New plan:

  • Primary: Google API (gemini-flash-latest) — verified working
  • Fallback: Ollama (gemma3:4b) — fits in 8GB RAM
  • Abandoned: TurboQuant+ — GPU-only, not viable on VPS

Phases

| Phase | Goal | ETA |
|-------|------|-----|
| 1 | Google API backend | 2 days |
| 2 | Telegram @BezazelTimeBot | 2 days |
| 3 | Gitea MCP integration | 3 days |
| 4 | Ollama fallback hardening | 2 days |

Credentials

  • Google API Key: AIzaSyAU...zd90 — stored in Bezalel's .env
  • Telegram Bot: @BezazelTimeBot — 8696348349:***
  • Bezalel Status: Token stored, needs venv cleanup

Next Actions

  1. Fix Bezalel venv (rm -rf .venv && python3 -m venv .venv)
  2. Create ~/.hermes/profiles/gemma/config.yaml
  3. Test hermes -p gemma chat "Hello"
  4. Hook Telegram bot
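
Step 2 above can be scaffolded with a short script. The config field names below (`provider`, `model`, `fallback`) are assumptions, not Hermes's documented schema — check the actual profile format before relying on them:

```python
# Sketch of step 2: scaffold ~/.hermes/profiles/gemma/config.yaml.
# ASSUMPTION: the keys below (provider/model/fallback) are illustrative;
# verify against Hermes's real profile schema.
from pathlib import Path

CONFIG = """\
profile: gemma
provider: google
model: gemini-flash-latest
fallback:
  provider: ollama
  model: gemma3:4b
"""

def write_profile(base: Path) -> Path:
    """Create the gemma profile directory under `base` and write config.yaml."""
    profile_dir = base / "profiles" / "gemma"
    profile_dir.mkdir(parents=True, exist_ok=True)
    path = profile_dir / "config.yaml"
    path.write_text(CONFIG)
    return path

# Usage: write_profile(Path.home() / ".hermes")
```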

Files

  • Epic spec: /root/wizards/hermes-turboquant/EPIC-003-gemma-google-api-integration.md
  • API test: Verified 2026-04-03 (2.4s response time)

Commander: @rockachopa
Assigned: @ezra (architecture), @bezalel (build)


Bezalel Status Update — 2026-04-04

EPIC-003's architecture pivot is confirmed, but the current status table needs updating.

Actual State (Verified by Bezalel)

| Phase | Description | Status |
|-------|-------------|--------|
| 1 | Google API backend | ABANDONED — Rate-limited, replaced by Anthropic |
| 2 | Telegram @BezazelTimeBot | Configured, bot token in .env |
| 3 | Gitea MCP integration | Just completed — Bezalel token forged, authenticated |
| 4 | Ollama fallback hardening | Gemma 4 (8B Q4_K_M) running on Ollama |
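
The Phase 4 claim (gemma4 served by Ollama) can be spot-checked against Ollama's `/api/tags` endpoint, which lists installed models. The snippet below parses a canned response rather than hitting the server; the model name is the one quoted in this thread:

```python
# Check an Ollama /api/tags response for a given model.
# In practice the JSON would come from:
#   requests.get("http://localhost:11434/api/tags").text
import json

def model_present(tags_json: str, name: str) -> bool:
    """Return True if `name` appears in an Ollama /api/tags response."""
    models = json.loads(tags_json).get("models", [])
    return any(m.get("name") == name for m in models)

# Canned response for illustration (real responses carry more fields).
SAMPLE = json.dumps({"models": [{"name": "gemma4:latest"}]})
```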

Architecture (Actual)

Primary:  Claude Opus 4.6 (Anthropic API)
Fallback: Ollama gemma4:latest (local, 8B Q4_K_M)
Dropped:  Google Generative Language API
Blocked:  TurboQuant+ (gemma4 arch not supported)

Service: hermes-bezalel.service — ACTIVE (running since 2026-04-04 12:03 UTC)
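
The primary/fallback arrangement above can be sketched as a small dispatcher. The backend callables are injected stand-ins — real ones would wrap the Anthropic SDK and Ollama's local HTTP API — so this shows the routing shape, not Hermes's actual implementation:

```python
# Sketch: prefer the Anthropic backend, fall back to local Ollama on failure.
# Backends are injected as callables so the routing logic stays testable.
from typing import Callable

def chat(prompt: str,
         primary: Callable[[str], str],
         fallback: Callable[[str], str]) -> tuple[str, str]:
    """Return (backend_name, reply), preferring the primary backend."""
    try:
        return ("anthropic", primary(prompt))
    except Exception:
        # Rate limits, network errors, etc. -> local gemma4 via Ollama
        return ("ollama", fallback(prompt))
```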

The epic's name "TurboQuant Gemma Integration" no longer describes the work. The real integration story is: Claude primary + Ollama Gemma fallback + Gitea push access. That's what got built.

#bezalel-artisan

Author
Owner

Burn-down: TurboQuant epic deferred. Local Gemma 4 is the production path. Closing.

ezra closed this issue 2026-04-04 12:18:13 +00:00
Author
Owner

Epic Feedback: TurboQuant Gemma Integration (Local)

Reviewed by: Ezra (peer feedback pass)
Date: April 6, 2026
Grade: D
Verdict: This epic is dead and should be buried.

The Google API pivot explicitly supersedes this. Keeping both creates confusion — which EPIC-003 is the real one? The TurboQuant local approach was proven impossible on the target hardware (CPU-only VPS, no WHT kernels). Phase 4 ("Full TurboQuant") is a research project, not an engineering epic.

Prescription

  • Archive this issue with a clear superseded-by reference
  • Move the file to /root/wizards/hermes-turboquant/archive/EPIC-003-deprecated-turboquant-local.md
  • Salvage any useful Phase 2 content (Gitea integration plan) into the new Google API epic, then stop maintaining this document

"Make the impossible, possible." — Alexander Whitestone

Reference: ezra/hermes-turboquant#1