---
name: lazarus-pit-recovery
description: "Resurrect a downed Hermes agent — fallback inference paths, profile recovery, Telegram reconnection. When one falls, all hands rally."
tags: [recovery, agents, ollama, llama-cpp, turboquant, telegram, lazarus]
trigger: "Agent is down, unresponsive, or has invalid credentials and needs to be brought back online"
---

# Lazarus Pit — Agent Recovery Protocol

When an agent goes down, ALL available agents rally to bring it back.

## Step 1: Assess Current Fleet State

```bash
# Check running agents
ps aux | grep hermes | grep -v grep
systemctl list-units 'hermes-*' --all

# Check running inference backends
ps aux | grep -E 'ollama|llama-server' | grep -v grep
curl -s http://localhost:11434/api/tags  # Ollama models
```

## Step 2: Identify the Problem

Common failure modes:

- **Invalid API key** (Kimi/OpenAI/etc.) → Switch to local inference
- **Invalid Telegram bot token** → Get a fresh token from @BotFather or reuse an available one
- **Model not loaded** → Pull via Ollama or start llama-server
- **Service crashed** → Check logs: `journalctl -u hermes-<agent> --since "1 hour ago"`

## Step 3: Local Inference Fallback Chain

Priority order:

1. **Ollama** (easiest) — check available models with `ollama list`:
   - Gemma 3 4B (fast, low memory)
   - Gemma 3 27B (better quality, more RAM)

   ```bash
   ollama serve &        # If not already running
   ollama run gemma3:4b  # Test
   ```

2. **TurboQuant llama.cpp** (best memory efficiency):

   ```bash
   cd /root/llama-cpp-turboquant/
   ./build/bin/llama-server \
     -m /path/to/model.gguf \
     --host 0.0.0.0 --port 8080 \
     -c 4096 --cache-type-k turbo4 --cache-type-v turbo4
   ```

   - turbo4: 3.8x KV compression, minimal quality loss
   - turbo2: 6.4x compression, noticeable quality loss

3. **Standard llama.cpp** — same as above, without the `--cache-type-k`/`--cache-type-v` flags.
## Step 4: Configure Profile

```bash
# Profile locations
ls ~/.hermes/profiles/  # Hermes profiles
ls /root/wizards/       # Wizard directories

# Key files to edit
~/.hermes/profiles/<profile>/config.yaml  # Model + provider config
~/.hermes/profiles/<profile>/.env         # API keys + bot tokens
/root/wizards/<wizard>/home/.env          # Alternative .env location
```

### Ollama config.yaml:

```yaml
model: gemma3:4b
providers:
  ollama:
    base_url: http://localhost:11434/v1
```

### llama.cpp config.yaml:

```yaml
model: local-model
providers:
  llama-cpp:
    base_url: http://localhost:8080/v1
```

## Step 5: Connect Telegram

```bash
# Add bot token to .env
echo 'TELEGRAM_BOT_TOKEN=<token>' >> ~/.hermes/profiles/<profile>/.env

# Add channel
echo 'TELEGRAM_ALLOWED_CHATS=-1003664764329' >> ~/.hermes/profiles/<profile>/.env
```

## Step 6: Launch & Verify

```bash
# Start service
systemctl start hermes-<agent>

# Or manual:
HERMES_PROFILE=<profile> hermes gateway run
```

## Step 7: Validate

- Send a test message in Telegram
- Check that a response arrives
- Verify logs: `journalctl -u hermes-<agent> -f`

## Pitfalls

- **Qin profile** has INVALID Kimi keys and bot token as of 2026-04 — needs fresh creds
- **Allegro and Ezra tokens** are IN USE — don't steal from running agents
- **CPU-only inference** is slow (~35 s for Gemma 3 4B) — acceptable for chat, not for coding
- **TurboQuant requires a custom llama.cpp build** — standard Ollama doesn't support it
- **Token masking** — `systemctl show` masks env vars; check .env files directly

## Known Bot Inventory

| Agent | Status | Backend | Notes |
|---------|-----------|------------------|--------------------|
| Ezra | ACTIVE | Kimi | Don't touch |
| Allegro | ACTIVE | Kimi | Don't touch |
| Bezalel | AVAILABLE | Ollama/llama.cpp | Recovery candidate |
| Qin | BROKEN | - | Needs fresh creds |
| Adagio | AVAILABLE | - | Token may be invalid |
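The Step 3 fallback chain can be sketched as a small probe script. This is a minimal sketch, not part of the Hermes tooling: `probe` and `pick_backend` are hypothetical helpers, and the endpoints assumed are Ollama's default `/api/tags` on port 11434 and llama-server's `/health` on port 8080 as configured above.

```shell
#!/usr/bin/env bash
# Sketch: walk the Step 3 priority order and report the first live backend.

# probe <url>: succeeds only if the URL answers with HTTP 2xx within 2 seconds.
probe() {
  curl -sf --max-time 2 "$1" >/dev/null 2>&1
}

# pick_backend: prints which backend to point the recovered profile at:
# "ollama", "llama-cpp", or "none" (meaning you must start one first).
pick_backend() {
  if probe "http://localhost:11434/api/tags"; then
    echo "ollama"
  elif probe "http://localhost:8080/health"; then
    echo "llama-cpp"
  else
    echo "none"
  fi
}

pick_backend
```

If this prints `none`, start Ollama (`ollama serve &`) or llama-server first, then set the profile's `base_url` in Step 4 to match whichever backend came up.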