| name | description | version | author | license | metadata |
|---|---|---|---|---|---|
| wizard-house-remote-triage | Diagnose and fix a sibling wizard house that is unresponsive or misbehaving on a remote VPS. SSH in, check resources, inspect config, kill runaway processes, restart services. | 1.0.0 | Ezra | MIT | |
# Wizard House Remote Triage
## When to Use
- A sibling wizard (Allegro, Bezalel, etc.) is not responding on Telegram
- A wizard's VPS is suspected to be resource-starved or misconfigured
- You need to inspect or fix another wizard's deployment from your own VPS
## Prerequisites
- SSH access to the target VPS (your public key in its `~/.ssh/authorized_keys`)
- If no key exists, generate one with `ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -N "" -C "ezra@hermes-vps"` and have Alexander add the public key (`~/.ssh/id_ed25519.pub`) to the target's `authorized_keys`
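Run as written, `ssh-keygen` stops and asks whether to overwrite when `~/.ssh/id_ed25519` already exists. A minimal guard, assuming you want to keep any existing key, generates one only when none is present and then prints the public key to hand to Alexander. The helper name `ensure_key` is hypothetical, not part of any existing tooling:

```bash
# ensure_key KEY_PATH COMMENT  (hypothetical helper)
# Generates an ed25519 key at KEY_PATH only if none exists, then prints the
# public key so it can be added to the target's authorized_keys.
ensure_key() {
  local key="$1" comment="$2"
  if [ ! -f "$key" ]; then
    # -N "" = empty passphrase, same as the prerequisite command
    ssh-keygen -t ed25519 -f "$key" -N "" -C "$comment" >/dev/null || return 1
  fi
  cat "${key}.pub"
}
```

Usage: `ensure_key ~/.ssh/id_ed25519 "ezra@hermes-vps"`. It assumes the `.pub` file sits next to an existing private key; if it does not, the `cat` fails and you will need to recover the public key separately.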
## Phase 1: Can You Reach It?
```bash
# Ping first
ping -c 2 -W 2 TARGET_IP
# Try SSH
ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no root@TARGET_IP 'hostname; whoami'
```
If the TCP connection opens but the SSH banner exchange times out, the box is resource-starved: sshd cannot respond because CPU/RAM is exhausted. You need Alexander to use the DigitalOcean web console (Droplets > droplet > Console) to free resources first.
Quick commands for Alexander to run in the DO console:
```bash
# Check what's eating RAM
ps aux --sort=-%mem | head -10
free -h
# Common resource hogs to kill
systemctl stop docker containerd nginx
systemctl disable docker containerd
```
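SSH failures look different depending on where they happen, and the OpenSSH client's error text is usually enough to tell them apart. A rough classifier, assuming the standard OpenSSH messages (`Connection timed out during banner exchange`, `Connection refused`, `Connection timed out`, `Permission denied`). The function names and verdict labels are hypothetical, and `-o BatchMode=yes` is added here so a missing key fails fast instead of prompting for a password:

```bash
# classify_ssh_failure STDERR_TEXT  (hypothetical helper)
# Maps an OpenSSH client error message to a likely cause.
# Order matters: the banner message also contains "Connection timed out".
classify_ssh_failure() {
  local err="$1"
  case "$err" in
    *"during banner exchange"*) echo "banner-timeout" ;;  # TCP up, sshd too starved to answer
    *"Connection refused"*)     echo "refused" ;;         # host up, nothing listening on port 22
    *"Connection timed out"*)   echo "tcp-timeout" ;;     # host down, firewalled, or unreachable
    *"Permission denied"*)      echo "auth-denied" ;;     # our key is not in authorized_keys
    *)                          echo "unknown" ;;
  esac
}

# try_ssh TARGET_IP  (hypothetical helper)
# Runs the Phase 1 probe; prints "ok" or the classified failure.
try_ssh() {
  local err
  # 2>&1 >/dev/null: capture stderr only, discard the probe's stdout
  if err=$(ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o BatchMode=yes \
             root@"$1" 'hostname; whoami' 2>&1 >/dev/null); then
    echo "ok"
  else
    classify_ssh_failure "$err"
  fi
}
```

Usage: `try_ssh TARGET_IP`. A `banner-timeout` verdict is the case above where retrying SSH is pointless and the DO console is needed.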
## Phase 2: Inspect the Box
Once SSH works (replace `TARGET` with the VPS address and `WIZARD` with the wizard's name as used in its service and directory names):
```bash
# Resources
ssh root@TARGET 'free -h; echo "---"; ps aux --sort=-%mem | head -8'
# Wizard service status
ssh root@TARGET 'systemctl status hermes-WIZARD.service 2>&1 | head -20'
# Recent logs (look for errors, stuck sessions, long tool calls)
ssh root@TARGET 'journalctl -u hermes-WIZARD.service --no-pager -n 50'
```
## Phase 3: Inspect Config
Check that Telegram and the provider are properly wired:
```bash
# Environment file - must have bot token AND provider key
ssh root@TARGET 'cat /root/wizards/WIZARD/home/.env'
# Config - check model, provider, platforms section
ssh root@TARGET 'cat /root/wizards/WIZARD/home/config.yaml'
```
Required `.env` vars for a Telegram-connected wizard:
- Provider API key (e.g., `KIMI_API_KEY`, `ANTHROPIC_API_KEY`)
- `TELEGRAM_BOT_TOKEN`
- `TELEGRAM_HOME_CHANNEL`
- `TELEGRAM_HOME_CHANNEL_NAME`
- `TELEGRAM_ALLOWED_USERS`
If Telegram vars are missing, the deploy script was incomplete. Add them and restart.
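Using `cat` on `.env` prints every secret into your terminal and scrollback. As an alternative sketch, the hypothetical helper `check_env_keys` reads the file from stdin and reports only which required variables lack a `KEY=...` line, never echoing values. The provider key name is passed in because it differs per wizard; it assumes plain `KEY=value` lines, optionally prefixed with `export`:

```bash
# check_env_keys PROVIDER_KEY < ENV_FILE  (hypothetical helper)
# Prints "MISSING: <name>" for each required variable without a KEY=... line.
# Returns 0 if all are present, 1 otherwise. Values are never printed.
check_env_keys() {
  local provider_key="$1" content key missing=0
  content=$(cat)   # read the whole .env once from stdin
  for key in "$provider_key" TELEGRAM_BOT_TOKEN TELEGRAM_HOME_CHANNEL \
             TELEGRAM_HOME_CHANNEL_NAME TELEGRAM_ALLOWED_USERS; do
    # "KEY=" followed by at least one character; the trailing "=" keeps
    # TELEGRAM_HOME_CHANNEL from matching TELEGRAM_HOME_CHANNEL_NAME.
    if ! grep -Eq "^(export[[:space:]]+)?${key}=.+" <<<"$content"; then
      echo "MISSING: $key"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `ssh root@TARGET 'cat /root/wizards/WIZARD/home/.env' | check_env_keys KIMI_API_KEY`. The values still travel over SSH, but they are not displayed or written to local disk.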
## Phase 4: Fix Common Problems
### Resource starvation (< 200MB available)
```bash
# Kill Docker if not needed
ssh root@TARGET 'systemctl stop docker containerd; systemctl disable docker containerd'
# Kill any runaway processes spawned by the wizard
ssh root@TARGET 'pkill -f "khatru-relay|caddy|nginx|go build" 2>/dev/null'
```
### Stuck session (wizard busy with a long-running task)
```bash
# Restart the service - clears the stuck session
ssh root@TARGET 'systemctl restart hermes-WIZARD.service && sleep 2 && systemctl is-active hermes-WIZARD.service'
```
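A fixed `sleep 2` may check too early if the unit takes longer to come up (depending on how the service is defined, it can briefly report `activating`). A hedged alternative is to poll a few times before giving up; `retry_until` is a hypothetical generic helper, not an existing command:

```bash
# retry_until ATTEMPTS DELAY_SECONDS CMD [ARGS...]  (hypothetical helper)
# Runs CMD up to ATTEMPTS times, sleeping DELAY_SECONDS between tries.
# Returns 0 as soon as CMD succeeds, 1 if every attempt fails.
retry_until() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    # no sleep after the final failed attempt
    (( i < attempts )) && sleep "$delay"
  done
  return 1
}
```

Usage, after the restart: `retry_until 10 2 ssh root@TARGET 'systemctl is-active --quiet hermes-WIZARD.service'`. This only confirms the unit reached `active`; the Phase 5 log check is still needed to see whether Telegram actually connected.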
### Wrong provider or missing key
Edit `.env` and `config.yaml`, then restart:
```bash
ssh root@TARGET 'systemctl restart hermes-WIZARD.service'
```
## Phase 5: Verify
```bash
# Service running?
ssh root@TARGET 'systemctl is-active hermes-WIZARD.service'
# RAM healthy? (> 500MB available on a 2GB box)
ssh root@TARGET 'free -h'
# Logs show Telegram connected?
ssh root@TARGET 'journalctl -u hermes-WIZARD.service --no-pager -n 10 | grep -i telegram'
```
Then ask Alexander to message the wizard on Telegram to confirm it responds.
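The RAM thresholds used in this runbook (under 200MB available means starvation, over 500MB is healthy on a 2GB box) can be checked mechanically. A sketch that parses `free -m` output, assuming procps-ng `free` (3.3.10 or later), whose `Mem:` row lists `available` in column 7. The function name and the in-between `tight` label are invented here:

```bash
# mem_status < free-m-output  (hypothetical helper)
# Classifies available RAM: < 200 MB starved, > 500 MB healthy, else tight.
mem_status() {
  local avail
  # Column 7 of the "Mem:" row is "available" in procps-ng free -m
  avail=$(awk '/^Mem:/ {print $7}')
  case "$avail" in
    ''|*[!0-9]*) echo "unknown"; return 2 ;;  # no Mem: row or non-numeric field
  esac
  if [ "$avail" -lt 200 ]; then
    echo "starved ${avail}MB"
  elif [ "$avail" -gt 500 ]; then
    echo "healthy ${avail}MB"
  else
    echo "tight ${avail}MB"
  fi
}
```

Usage: `ssh root@TARGET 'free -m' | mem_status`.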
## Reaching Alexander's Mac
If you need to diagnose local Timmy directly, Alexander's Mac is on the Tailscale network:
- Tailscale IP: `100.124.176.28` (hostname: `mm`)
- User: `apayne`
- SSH key must be in `~/.ssh/authorized_keys` on the Mac
- System Python is 3.9; use `~/.hermes/hermes-agent/venv/bin/python3` for 3.10+ syntax
- llama-server model path: `/Users/apayne/models/hermes4-14b/NousResearch_Hermes-4-14B-Q4_K_M.gguf`
- Timmy workspace: `~/.timmy/`; Hermes home: `~/.hermes/`
## Pitfalls
- SSH banner timeout = RAM starvation. Don't keep retrying SSH. The box needs resources freed via DO console first.
- 2GB droplets cannot run Hermes + Docker + nginx + a relay. Hermes alone needs ~500MB-1GB. Budget accordingly.
- Deploy scripts may be incomplete. Bezalel's deploy script for Allegro injected only `KIMI_API_KEY`, not the Telegram vars. Always verify that `.env` has everything.
- Wizards build infrastructure when left unsupervised. If a wizard was given a broad task, check what processes it spawned. Kill anything not essential.
- Always check the Gitea PR for the deploy script — it's the source of truth for what SHOULD be configured. Compare against what IS configured.
- Restart clears stuck sessions. If a wizard is mid-session on a slow model (Kimi), restarting the service is the fastest way to unstick it.