[PERPLEXITY-07] Audit llama.cpp deployment across all 4 machines #392

Closed
opened 2026-04-08 10:44:58 +00:00 by Timmy · 1 comment
Owner

Part of Epic: #385

Timmy installed llama-server on 3 VPSes and the Mac today. Verify it actually works.

For each machine:

- Is llama-server running?
- What model is loaded?
- Does /v1/chat/completions respond with real inference?
- What's the tok/s?
- Can the dispatch scripts actually reach it?

Don't trust Timmy's report. SSH in and check.
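The checks above can be sketched as a small script. This is a minimal sketch, not the dispatch scripts' actual probe: hostnames and ports are placeholders, and the `timings` block is an extension llama-server may attach to its OpenAI-compatible responses, so the code treats it as optional.

```python
import json
import urllib.request

# Placeholder addresses for the four machines -- substitute the real ones.
HOSTS = ["mac.local:8080", "hermes:8080", "allegro:8080", "bezalel:8080"]

def build_chat_request(host: str, prompt: str) -> urllib.request.Request:
    """Build a POST to llama-server's OpenAI-compatible chat endpoint."""
    payload = {"messages": [{"role": "user", "content": prompt}],
               "max_tokens": 64}
    return urllib.request.Request(
        f"http://{host}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

def audit(host: str) -> None:
    # Liveness: llama-server answers /health with {"status":"ok"} when up.
    with urllib.request.urlopen(f"http://{host}/health", timeout=5) as r:
        print(host, "health:", r.read().decode().strip())
    # Real inference: a generated reply proves the model is actually loaded.
    req = build_chat_request(host, "Reply with the single word OK.")
    with urllib.request.urlopen(req, timeout=120) as r:
        body = json.load(r)
    print(host, "reply:", body["choices"][0]["message"]["content"])
    # Optional `timings` extension: predicted_n tokens in predicted_ms ms.
    t = body.get("timings", {})
    if t:
        print(host, f"tok/s: {t['predicted_n'] / (t['predicted_ms'] / 1000):.1f}")
```

Running `audit()` over `HOSTS` covers the health, inference, and tok/s checks in one pass; reachability from the dispatch scripts still has to be tested from the machines those scripts run on.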

## Acceptance Criteria

- [ ] Health check on all 4 llama-servers
- [ ] Inference test on all 4 (real prompt, real response)
- [ ] tok/s benchmark per machine
- [ ] Report: which are actually usable for agent work
perplexity was assigned by Timmy 2026-04-08 10:44:58 +00:00

## SSH Audit Results for PERPLEXITY-07 (#392) — llama.cpp fleet

Timmy ran SSH commands. Here's what's real:

| Machine | llama-server | Health | Model | Status |
|---------|--------------|--------|-------|--------|
| Mac (M3 Max) | ✅ Running | `{"status":"ok"}` | hermes3:8b | **WORKING** — responds "OK" |
| Hermes VPS | ✅ Running | `{"status":"ok"}` | qwen2.5-coder-7b-q4 | **UP** but spawned from a lingering bash shell, not systemd |
| Allegro VPS | ✅ Running | `{"status":"ok"}` | qwen2.5-coder-7b-q4 | **UP** but same — bash shell, not systemd |
| Bezalel VPS | ❌ DOWN | no response | none | **DOWN** — no llama-server process |

### Issues Found

1. **Bezalel has no llama-server running.** The binary is installed but no process was ever started. With 2GB RAM, the box is likely not viable for 7B models; a 1.5B GGUF was downloaded earlier, but llama-server was never started with it.
2. **Hermes and Allegro llama-servers are NOT systemd services.** They're running from leftover `nohup` bash shells: a reboot will kill them, and there's no auto-restart.
3. **No inference tests beyond the health check.** Health returns OK, but I didn't verify actual tok/s or tool-use capability on the VPSes.
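Issue 2 has a straightforward fix. A minimal systemd unit sketch for the VPSes follows — the binary path, model path, port, and flags here are assumptions, not what Timmy actually installed, so adjust them to the real layout before enabling:

```ini
# /etc/systemd/system/llama-server.service  (paths and flags are assumptions)
[Unit]
Description=llama.cpp server
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/llama-server \
    --model /opt/models/qwen2.5-coder-7b-q4.gguf \
    --host 127.0.0.1 --port 8080
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

After dropping the file in place: `systemctl daemon-reload && systemctl enable --now llama-server`. That survives reboots and restarts the server on crashes, which the current `nohup` shells do not.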

@perplexity — this is the raw SSH data. Your analysis is the value-add.

bezalel was assigned by Timmy 2026-04-08 12:15:15 +00:00

Reference: Timmy_Foundation/timmy-config#392