allegro-checkpoint/skills/devops/lazarus-pit-recovery/SKILL.md

name: lazarus-pit-recovery
description: Resurrect a downed Hermes agent — fallback inference paths, profile recovery, Telegram reconnection. When one falls, all hands rally.
tags: recovery, agents, ollama, llama-cpp, turboquant, telegram, lazarus
trigger: Agent is down, unresponsive, or has invalid credentials and needs to be brought back online

Lazarus Pit — Agent Recovery Protocol

When an agent goes down, ALL available agents rally to bring it back.

Step 1: Assess Current Fleet State

# Check running agents
ps aux | grep hermes | grep -v grep
systemctl list-units 'hermes-*' --all

# Check running inference backends
ps aux | grep -E 'ollama|llama-server' | grep -v grep
curl -s http://localhost:11434/api/tags  # Ollama models
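
The checks above can be rolled into a one-shot probe. This is a sketch only: the ports are the defaults used elsewhere in this runbook (11434 for Ollama, 8080 for llama-server), and the /health route is the standard llama-server self-check; adjust both if your fleet differs.

```shell
#!/bin/sh
# Probe each inference backend once and print UP/DOWN.
probe() {  # probe NAME URL -> "NAME: UP" or "NAME: DOWN"
  if curl -fsS --max-time 2 "$2" >/dev/null 2>&1; then
    echo "$1: UP"
  else
    echo "$1: DOWN"
  fi
}
probe ollama       http://localhost:11434/api/tags
probe llama-server http://localhost:8080/health
```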

Step 2: Identify the Problem

Common failure modes:

  • Invalid API key (Kimi/OpenAI/etc) → Switch to local inference
  • Invalid Telegram bot token → Get a fresh token from @BotFather or reuse an available one
  • Model not loaded → Pull via Ollama or start llama-server
  • Service crashed → Check logs: journalctl -u hermes-<name> --since "1 hour ago"
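
A captured log line can be routed to one of these failure modes mechanically. The match patterns below are illustrative assumptions, not exact Hermes log strings; tune them against real journalctl output.

```shell
#!/bin/sh
# Map an error line to a failure mode and the matching recovery action.
classify() {
  case "$1" in
    *401*|*"invalid api key"*)    echo "api-key: switch to local inference" ;;
    *Unauthorized*|*"bot token"*) echo "bot-token: get a fresh one from @BotFather" ;;
    *model*"not found"*)          echo "model: pull via Ollama or start llama-server" ;;
    *)                            echo "unknown: read the full journalctl output" ;;
  esac
}
classify "upstream returned 401 invalid api key"
```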

Step 3: Local Inference Fallback Chain

Priority order:

  1. Ollama (easiest) — Check available models: ollama list

    • gemma3:4b (fast, low memory)
    • gemma3:27b (better quality, more RAM)

    ollama serve &  # If not running
    ollama run gemma3:4b  # Test
    
  2. TurboQuant llama.cpp (best memory efficiency)

    cd /root/llama-cpp-turboquant/
    ./build/bin/llama-server \
      -m /path/to/model.gguf \
      --host 0.0.0.0 --port 8080 \
      -c 4096 --cache-type-k turbo4 --cache-type-v turbo4
    
    • turbo4: 3.8x KV compression, minimal quality loss
    • turbo2: 6.4x compression, noticeable quality loss
  3. Standard llama.cpp — Same as above without --cache-type flags
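
The priority chain reduces to a loop: take the first backend whose health check passes. is_up here is a placeholder stub; in practice it would curl the backend's endpoint as in Step 1.

```shell
#!/bin/sh
# Placeholder health check: swap in a real curl against the backend's URL.
is_up() { false; }

# Return the first healthy backend from the priority-ordered arguments.
pick_backend() {
  for b in "$@"; do
    if is_up "$b"; then echo "$b"; return 0; fi
  done
  echo "none"; return 1
}
# Usage: pick_backend ollama llama-cpp-turboquant llama-cpp
```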

Step 4: Configure Profile

# Profile locations
ls ~/.hermes/profiles/          # Hermes profiles
ls /root/wizards/               # Wizard directories

# Key files to edit
~/.hermes/profiles/<name>/config.yaml   # Model + provider config
~/.hermes/profiles/<name>/.env          # API keys + bot tokens
/root/wizards/<name>/home/.env          # Alternative .env location

Ollama config.yaml:

model: gemma3:4b
providers:
  ollama:
    base_url: http://localhost:11434/v1

llama.cpp config.yaml:

model: local-model
providers:
  llama-cpp:
    base_url: http://localhost:8080/v1
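
Stamping out the Ollama variant of this config for one profile can be scripted. A minimal sketch, assuming the profile layout above; bezalel is used as the hypothetical recovery candidate from the inventory below.

```shell
#!/bin/sh
# Write the Ollama provider config for one Hermes profile.
NAME=bezalel
mkdir -p "$HOME/.hermes/profiles/$NAME"
cat > "$HOME/.hermes/profiles/$NAME/config.yaml" <<'EOF'
model: gemma3:4b
providers:
  ollama:
    base_url: http://localhost:11434/v1
EOF
```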

Step 5: Connect Telegram

# Add bot token to .env
echo 'TELEGRAM_BOT_TOKEN=<token>' >> ~/.hermes/profiles/<name>/.env

# Add channel
echo 'TELEGRAM_ALLOWED_CHATS=-1003664764329' >> ~/.hermes/profiles/<name>/.env
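
Before launching, it is worth reading the token back out of the .env and self-checking it against Telegram's getMe method (the standard Bot API call for this; it needs network access, so it is left commented). ENV_FILE is an assumed path; point it at the real profile.

```shell
#!/bin/sh
# Extract the bot token from a profile .env and report whether one is set.
ENV_FILE="${ENV_FILE:-$HOME/.hermes/profiles/bezalel/.env}"
token=$(sed -n 's/^TELEGRAM_BOT_TOKEN=//p' "$ENV_FILE" 2>/dev/null | tail -n 1)
if [ -n "$token" ]; then
  echo "token present"
  # Live check (requires network): expect {"ok":true,...} for a valid token
  # curl -s "https://api.telegram.org/bot${token}/getMe"
else
  echo "no token in $ENV_FILE"
fi
```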

Step 6: Launch

# Start service
systemctl start hermes-<name>
# Or manual:
HERMES_PROFILE=<name> hermes gateway run
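
After starting the service, a bounded poll avoids validating against a unit that is still coming up. A sketch assuming the hermes-<name> unit naming used above; timeout and unit name are yours to set.

```shell
#!/bin/sh
# Poll until the unit reports active, or give up after TIMEOUT_SECS.
wait_active() {  # wait_active UNIT TIMEOUT_SECS -> "up" or "timeout"
  i=0
  while [ "$i" -lt "$2" ]; do
    if systemctl is-active --quiet "$1" 2>/dev/null; then
      echo "up"; return 0
    fi
    sleep 1; i=$((i + 1))
  done
  echo "timeout"; return 1
}
# Usage: wait_active hermes-bezalel 30 || journalctl -u hermes-bezalel -n 50
```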

Step 7: Validate

  • Send test message in Telegram
  • Check response arrives
  • Verify logs: journalctl -u hermes-<name> -f

Pitfalls

  • Qin profile has INVALID Kimi keys and bot token as of 2026-04 — needs fresh creds
  • Allegro and Ezra tokens are IN USE — don't steal from running agents
  • CPU-only inference is slow (~35s for Gemma 3:4b) — acceptable for chat, not for coding
  • TurboQuant requires custom llama.cpp build — standard Ollama doesn't support it
  • Token masking: systemctl show masks env vars; check .env files directly

Known Bot Inventory

Agent     Status     Backend            Notes
Ezra      ACTIVE     Kimi               Don't touch
Allegro   ACTIVE     Kimi               Don't touch
Bezalel   AVAILABLE  Ollama/llama.cpp   Recovery candidate
Qin       BROKEN     -                  Needs fresh creds
Adagio    AVAILABLE  -                  Token may be invalid