[claude] Hermes health monitor — system resources + model management (#1073) #1133

Merged
Rockachopa merged 1 commit from claude/issue-1073 into main 2026-03-23 18:36:07 +00:00

1 Commit

Author SHA1 Message Date
Alexander Whitestone
aff1a45183 feat: Hermes health monitor — system resources + model management (#1073)
Some checks failed
Tests / lint (pull_request) Failing after 15s
Tests / test (pull_request) Has been skipped
Adds a background health monitor that checks Hermes (M3 Max) every 5 minutes:

Monitors:
- Memory pressure (vm_stat) — unloads Ollama models if <4GB free
- Disk usage (shutil) — cleans /tmp if <10GB free
- Ollama status (/api/tags + /api/ps) — restarts via launchctl/brew if crashed
- Process health (ps aux) — detects zombie processes
- Network (Gitea /api/v1/version) — alerts if unreachable
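
The memory and disk checks above can be sketched as follows. This is a minimal illustration, not the code from this commit: the function names are hypothetical, and counting "Pages free" plus "Pages inactive" as available memory is an assumption about how the monitor interprets vm_stat output.

```python
import re
import shutil

GIB = 1024 ** 3


def free_memory_gb(vm_stat_output, counted=("Pages free", "Pages inactive")):
    """Estimate available memory in GiB from the text printed by `vm_stat`.

    Which page counters count as "free" is an assumption made for this sketch.
    """
    # The vm_stat header reports the page size, e.g. "(page size of 16384 bytes)".
    size_match = re.search(r"page size of (\d+) bytes", vm_stat_output)
    page_size = int(size_match.group(1)) if size_match else 4096

    pages = 0
    for line in vm_stat_output.splitlines():
        name, _, value = line.partition(":")
        if name.strip() in counted:
            # Counter values end with a period, e.g. "65536."
            pages += int(value.strip().rstrip("."))
    return pages * page_size / GIB


def free_disk_gb(path="/"):
    """Return free disk space in GiB for the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / GIB
```

A caller would compare these values against the 4 GB and 10 GB thresholds and trigger the matching remediation when a value falls below its limit.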

Auto-resolution:
- Unloads Ollama models via keep_alive=0 on low memory
- Cleans /tmp files older than 24 hours on low disk
- Restarts Ollama via launchctl or brew services on crash
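
Two of the remediations above can be sketched with the standard library alone. The helper names are hypothetical, and the Ollama base URL is assumed to be the default local address. Sending `keep_alive: 0` to `/api/generate` is the documented way to ask Ollama to unload a model. Whether this commit uses that exact endpoint is not stated in the message.

```python
import json
import os
import time
import urllib.request
from pathlib import Path


def unload_request(model, base_url="http://localhost:11434"):
    """Build (but do not send) a request asking Ollama to unload `model`."""
    body = json.dumps({"model": model, "keep_alive": 0}).encode()
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def clean_old_files(root, max_age_seconds=24 * 3600, now=None):
    """Delete regular files under `root` older than `max_age_seconds`.

    Returns the list of removed paths. Directories and symlinks are left alone.
    """
    now = time.time() if now is None else now
    removed = []
    # Materialize the listing first so deletions do not disturb the walk.
    for path in list(Path(root).rglob("*")):
        try:
            if path.is_symlink() or not path.is_file():
                continue
            if now - path.stat().st_mtime > max_age_seconds:
                path.unlink()
                removed.append(path)
        except OSError:
            # Files may vanish or be unreadable; skip them rather than abort.
            continue
    return removed
```

To send the unload request, pass the result to `urllib.request.urlopen(...)` with a timeout. Keeping request construction separate makes the payload easy to test without a running Ollama server.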

Alerting:
- Push notifications via notifier singleton for warnings and criticals
- Native macOS notification for CRITICAL issues
- Silent auto-fix notifications logged at WARNING level

New endpoints:
- GET /hermes/status — last known health report (cached, non-blocking)
- POST /hermes/check — trigger immediate cycle
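
The split between a cached status read and an on-demand check could look roughly like this. The class and field names are hypothetical and the web framework is deliberately left out. The sketch only shows how the status handler can return the last report without running any checks itself.

```python
import threading
from dataclasses import asdict, dataclass, field


@dataclass
class HealthReport:
    """Hypothetical shape of a health report; the real fields may differ."""
    status: str
    checked_at: float
    issues: list = field(default_factory=list)


class HealthCache:
    """Stores the most recent report so status reads never run checks."""

    def __init__(self, run_checks):
        self._run_checks = run_checks  # callable returning a HealthReport
        self._lock = threading.Lock()
        self._last = None

    def status(self):
        """Backs GET /hermes/status: return the cached report, if any."""
        with self._lock:
            report = self._last
        return asdict(report) if report else {"status": "unknown"}

    def check(self):
        """Backs POST /hermes/check: run a cycle now and cache the result."""
        report = self._run_checks()  # run outside the lock so reads stay fast
        with self._lock:
            self._last = report
        return asdict(report)
```

The background loop would call `check()` on each interval, so both the scheduled cycle and the manual trigger update the same cached report.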

Config (env-overridable):
- HERMES_ENABLED (default: true)
- HERMES_INTERVAL_SECONDS (default: 300)
- HERMES_MEMORY_FREE_MIN_GB (default: 4.0)
- HERMES_DISK_FREE_MIN_GB (default: 10.0)
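
One way to read these settings with their defaults is sketched below. The `HermesConfig` class and the set of accepted truthy strings are assumptions for illustration. Only the variable names and default values come from the list above.

```python
import os
from dataclasses import dataclass

_TRUTHY = {"1", "true", "yes", "on"}  # accepted spellings are an assumption


@dataclass(frozen=True)
class HermesConfig:
    enabled: bool = True
    interval_seconds: int = 300
    memory_free_min_gb: float = 4.0
    disk_free_min_gb: float = 10.0

    @classmethod
    def from_env(cls, env=None):
        """Build a config from environment variables, falling back to defaults."""
        env = os.environ if env is None else env
        return cls(
            enabled=env.get("HERMES_ENABLED", "true").strip().lower() in _TRUTHY,
            interval_seconds=int(env.get("HERMES_INTERVAL_SECONDS", 300)),
            memory_free_min_gb=float(env.get("HERMES_MEMORY_FREE_MIN_GB", 4.0)),
            disk_free_min_gb=float(env.get("HERMES_DISK_FREE_MIN_GB", 10.0)),
        )
```

For example, running with `HERMES_MEMORY_FREE_MIN_GB=8` set in the environment would raise the unload threshold to 8 GB while leaving the other settings at their defaults.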

28 unit tests covering all checks, auto-remediation, and serialization.

Fixes #1073

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-23 14:29:05 -04:00