---
name: lazarus-pit-recovery
description: "Resurrect a downed Hermes agent — fallback inference paths, profile recovery, Telegram reconnection. When one falls, all hands rally."
tags: [recovery, agents, ollama, llama-cpp, turboquant, telegram, lazarus]
trigger: "Agent is down, unresponsive, or has invalid credentials and needs to be brought back online"
---

# Lazarus Pit — Agent Recovery Protocol

When an agent goes down, ALL available agents rally to bring it back.

## Step 1: Assess Current Fleet State

```bash
# Check running agents
ps aux | grep hermes | grep -v grep
systemctl list-units 'hermes-*' --all

# Check running inference backends
ps aux | grep -E 'ollama|llama-server' | grep -v grep
curl -s http://localhost:11434/api/tags  # Ollama models
```

## Step 2: Identify the Problem

Common failure modes:

- **Invalid API key** (Kimi/OpenAI/etc.) → Switch to local inference
- **Invalid Telegram bot token** → Get a fresh token from @BotFather or reuse an available one
- **Model not loaded** → Pull via Ollama or start llama-server
- **Service crashed** → Check logs: `journalctl -u hermes-<agent> --since "1 hour ago"`

## Step 3: Local Inference Fallback Chain

Priority order:

1. **Ollama** (easiest) — check available models with `ollama list`:
   - Gemma 3 4B (fast, low memory)
   - Gemma 3 27B (better quality, more RAM)

   ```bash
   ollama serve &        # If not already running
   ollama run gemma3:4b  # Test
   ```

2. **TurboQuant llama.cpp** (best memory efficiency):

   ```bash
   cd /root/llama-cpp-turboquant/
   ./build/bin/llama-server \
     -m /path/to/model.gguf \
     --host 0.0.0.0 --port 8080 \
     -c 4096 --cache-type-k turbo4 --cache-type-v turbo4
   ```

   - turbo4: 3.8x KV compression, minimal quality loss
   - turbo2: 6.4x compression, noticeable quality loss

3. **Standard llama.cpp** — same as above, without the `--cache-type-k`/`--cache-type-v` flags.
## Step 4: Configure Profile

```bash
# Profile locations
ls ~/.hermes/profiles/  # Hermes profiles
ls /root/wizards/       # Wizard directories

# Key files to edit
~/.hermes/profiles/<profile>/config.yaml  # Model + provider config
~/.hermes/profiles/<profile>/.env         # API keys + bot tokens
/root/wizards/<wizard>/home/.env          # Alternative .env location
```

### Ollama config.yaml:

```yaml
model: gemma3:4b
providers:
  ollama:
    base_url: http://localhost:11434/v1
```

### llama.cpp config.yaml:

```yaml
model: local-model
providers:
  llama-cpp:
    base_url: http://localhost:8080/v1
```

## Step 5: Connect Telegram

```bash
# Add bot token to .env
echo 'TELEGRAM_BOT_TOKEN=<token>' >> ~/.hermes/profiles/<profile>/.env

# Add channel
echo 'TELEGRAM_ALLOWED_CHATS=-1003664764329' >> ~/.hermes/profiles/<profile>/.env
```

## Step 6: Launch & Verify

```bash
# Start service
systemctl start hermes-<agent>

# Or manual:
HERMES_PROFILE=<profile> hermes gateway run
```

## Step 7: Validate

- Send a test message in Telegram
- Check that a response arrives
- Verify logs: `journalctl -u hermes-<agent> -f`

## Pitfalls

- **Qin profile** has INVALID Kimi keys and bot token as of 2026-04 — needs fresh creds
- **Allegro and Ezra tokens** are IN USE — don't steal from running agents
- **CPU-only inference** is slow (~35 s for Gemma 3 4B) — acceptable for chat, not for coding
- **TurboQuant requires a custom llama.cpp build** — standard Ollama doesn't support it
- **Token masking** — `systemctl show` masks env vars; check .env files directly

## Known Bot Inventory

| Agent | Status | Backend | Notes |
|---------|-----------|------------------|--------------------|
| Ezra | ACTIVE | Kimi | Don't touch |
| Allegro | ACTIVE | Kimi | Don't touch |
| Bezalel | AVAILABLE | Ollama/llama.cpp | Recovery candidate |
| Qin | BROKEN | - | Needs fresh creds |
| Adagio | AVAILABLE | - | Token may be invalid |
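The Step 3 fallback chain can be sketched as a small probe script. This is a minimal sketch, not part of the Hermes tooling: `probe` and `pick_backend` are hypothetical helpers, and the endpoints assumed are Ollama's default `/api/tags` on port 11434 and llama-server's `/health` on port 8080 as configured above.

```shell
#!/usr/bin/env bash
# Sketch: walk the Step 3 priority order and report the first live backend.

# probe <url>: succeeds only if the URL answers with HTTP 2xx within 2 seconds.
probe() {
  curl -sf --max-time 2 "$1" >/dev/null 2>&1
}

# pick_backend: prints which backend to point the recovered profile at:
# "ollama", "llama-cpp", or "none" (meaning you must start one first).
pick_backend() {
  if probe "http://localhost:11434/api/tags"; then
    echo "ollama"
  elif probe "http://localhost:8080/health"; then
    echo "llama-cpp"
  else
    echo "none"
  fi
}

pick_backend
```

If this prints `none`, start Ollama (`ollama serve &`) or llama-server first, then set the profile's `base_url` in Step 4 to match whichever backend came up.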