[claude] Hermes health monitor — system resources + model management (#1073) #1133

Merged
Rockachopa merged 1 commit from claude/issue-1073 into main 2026-03-23 18:36:07 +00:00

Fixes #1073

Summary

  • src/infrastructure/hermes/monitor.py: HermesMonitor class with 5 checks (memory via vm_stat, disk via shutil, Ollama via /api/tags + /api/ps, processes via ps aux, network via Gitea ping)
  • src/dashboard/routes/hermes.py: GET /hermes/status (cached) and POST /hermes/check (immediate)
  • src/dashboard/app.py: _hermes_scheduler() background task running every 5 minutes
  • src/config.py: 4 new settings (hermes_enabled, hermes_interval_seconds, hermes_memory_free_min_gb, hermes_disk_free_min_gb)
  • tests/unit/test_hermes_monitor.py: 28 unit tests, all passing
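
For readers unfamiliar with the pattern, a background task like _hermes_scheduler() could look roughly like the sketch below. This is not the code from src/dashboard/app.py: the function name run_periodically and the max_cycles parameter are hypothetical, added so the loop can be exercised in a test, and the sketch assumes the health cycle is an async callable.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger("hermes.scheduler")


async def run_periodically(
    run_cycle: Callable[[], Awaitable[object]],
    interval_seconds: float = 300.0,
    max_cycles: Optional[int] = None,
) -> int:
    """Call run_cycle repeatedly, sleeping interval_seconds after each call.

    max_cycles is a hypothetical knob for testing; a real background task
    would normally loop until it is cancelled.
    """
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        try:
            await run_cycle()
        except Exception:
            # A failing health cycle should not stop the scheduler itself.
            logger.exception("health cycle failed")
        cycles += 1
        await asyncio.sleep(interval_seconds)
    return cycles
```

Catching exceptions inside the loop is a design choice of this sketch: one bad cycle is logged and the next one still runs after the interval.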

Auto-remediation

  • Low memory → unloads Ollama models via keep_alive=0
  • Low disk → cleans /tmp files older than 24 hours
  • Ollama down → restarts via launchctl or brew services
  • Push notifications via notifier for warnings/criticals; native macOS notification for CRITICAL
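
As an illustration of the low-memory remediation, the sketch below builds a request that asks Ollama to unload a model by setting keep_alive to 0. It assumes the documented Ollama behaviour that a POST to /api/generate with a model name, no prompt, and keep_alive 0 unloads that model, and it assumes the default local address http://localhost:11434. The helper names build_unload_request and unload_model are hypothetical; the PR does not show which endpoint or client library the monitor actually uses.

```python
import json
import urllib.request


def build_unload_request(
    model: str, base_url: str = "http://localhost:11434"
) -> urllib.request.Request:
    """Build a POST request asking Ollama to unload `model` immediately."""
    # keep_alive=0 tells Ollama not to keep the model resident in memory.
    payload = json.dumps({"model": model, "keep_alive": 0}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def unload_model(
    model: str, base_url: str = "http://localhost:11434", timeout: float = 10.0
) -> bool:
    """Send the unload request; return False if Ollama is unreachable or errors."""
    try:
        request = build_unload_request(model, base_url)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses.
        return False
```

The names of the currently loaded models would come from /api/ps, which the monitor already queries.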

Test plan

  • 28 unit tests pass (tox -e unit)
  • Full suite: 3295 passed, 1 skipped
  • Manual: verify GET /hermes/status returns {"status": "no_data"} before first cycle
  • Manual: POST /hermes/check runs and returns full report
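
The two manual checks rely on the split between a cached status and an immediate check. A framework-free sketch of that split follows; the ReportCache class and the {"status": "ok"} report shape in the usage note are hypothetical, and only the {"status": "no_data"} placeholder comes from this PR.

```python
from threading import Lock
from typing import Callable, Optional


class ReportCache:
    """Hold the most recent health report and serve it without blocking."""

    def __init__(self, run_cycle: Callable[[], dict]) -> None:
        self._run_cycle = run_cycle
        self._last: Optional[dict] = None
        self._lock = Lock()

    def status(self) -> dict:
        """What GET /hermes/status would return: the cached report, if any."""
        with self._lock:
            if self._last is None:
                return {"status": "no_data"}
            return self._last

    def check(self) -> dict:
        """What POST /hermes/check would do: run a cycle now and cache it."""
        report = self._run_cycle()
        with self._lock:
            self._last = report
        return report
```

With a stub cycle such as `lambda: {"status": "ok"}`, status() returns the no_data placeholder until check() has run once, and the fresh report afterwards.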
claude added 1 commit 2026-03-23 18:29:38 +00:00
feat: Hermes health monitor — system resources + model management (#1073)
Some checks failed
Tests / lint (pull_request) Failing after 15s
Tests / test (pull_request) Has been skipped
aff1a45183
Adds a background health monitor that checks Hermes (M3 Max) every 5 minutes:

Monitors:
- Memory pressure (vm_stat) — unloads Ollama models if <4GB free
- Disk usage (shutil) — cleans /tmp if <10GB free
- Ollama status (/api/tags + /api/ps) — restarts via launchctl/brew if crashed
- Process health (ps aux) — detects zombie processes
- Network (Gitea /api/v1/version) — alerts if unreachable
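
Illustrative sketch of the memory check (not the code from monitor.py): it parses vm_stat text and converts page counts to GiB. Treating "Pages free" plus "Pages inactive" as free memory is an assumption of this sketch, as is the function name free_memory_gb; the commit does not say which counters the monitor sums.

```python
import re
from typing import Iterable

_PAGE_SIZE_RE = re.compile(r"page size of (\d+) bytes")
_COUNTER_RE = re.compile(r"^(?P<name>[^:]+):\s+(?P<pages>\d+)\.?$")


def free_memory_gb(
    vm_stat_output: str,
    counters: Iterable[str] = ("Pages free", "Pages inactive"),
) -> float:
    """Sum the selected vm_stat page counters and return the total in GiB."""
    match = _PAGE_SIZE_RE.search(vm_stat_output)
    if match is None:
        raise ValueError("page size not found in vm_stat output")
    page_size = int(match.group(1))
    wanted = set(counters)

    pages = 0
    for line in vm_stat_output.splitlines():
        counter = _COUNTER_RE.match(line.strip())
        # The header line does not match because no digits follow its colon.
        if counter and counter.group("name").strip() in wanted:
            pages += int(counter.group("pages"))
    return pages * page_size / 1024**3
```

The result would then be compared against the configured minimum (4 GB by default) to decide whether to unload models.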

Auto-resolution:
- Unloads Ollama models via keep_alive=0 on low memory
- Cleans /tmp files older than 24 hours on low disk
- Restarts Ollama via launchctl or brew services on crash
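
Illustrative sketch of the disk cleanup step (not the code from monitor.py): it deletes regular files whose modification time is older than a cutoff. Looking only at the top level of the directory, skipping symlinks, and the name remove_old_files are assumptions of this sketch.

```python
import time
from pathlib import Path
from typing import List, Optional


def remove_old_files(
    root: str,
    max_age_seconds: float = 24 * 3600,
    now: Optional[float] = None,
) -> List[str]:
    """Delete top-level regular files in root older than max_age_seconds.

    Returns the sorted names of the files that were removed.
    """
    cutoff = (time.time() if now is None else now) - max_age_seconds
    removed: List[str] = []
    for path in Path(root).iterdir():
        try:
            if path.is_symlink() or not path.is_file():
                continue
            if path.stat().st_mtime < cutoff:
                path.unlink()
                removed.append(path.name)
        except OSError:
            # Files may vanish or be unreadable; skip them rather than fail.
            continue
    return sorted(removed)
```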

Alerting:
- Push notifications via notifier singleton for warnings and criticals
- Native macOS notification for CRITICAL issues
- Silent auto-fix notifications logged at WARNING level

New endpoints:
- GET /hermes/status — last known health report (cached, non-blocking)
- POST /hermes/check — trigger immediate cycle

Config (env-overridable):
- HERMES_ENABLED (default: true)
- HERMES_INTERVAL_SECONDS (default: 300)
- HERMES_MEMORY_FREE_MIN_GB (default: 4.0)
- HERMES_DISK_FREE_MIN_GB (default: 10.0)
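
Illustrative sketch of env-overridable settings with the defaults listed above, using only the standard library. The real src/config.py may use a settings framework instead; HermesSettings, load_settings, and the set of strings accepted as true are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

_TRUE_VALUES = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class HermesSettings:
    hermes_enabled: bool = True
    hermes_interval_seconds: int = 300
    hermes_memory_free_min_gb: float = 4.0
    hermes_disk_free_min_gb: float = 10.0


def load_settings(env: Optional[Mapping[str, str]] = None) -> HermesSettings:
    """Read HERMES_* variables from env, falling back to the defaults."""
    env = os.environ if env is None else env
    defaults = HermesSettings()

    enabled_raw = env.get("HERMES_ENABLED")
    if enabled_raw is None:
        enabled = defaults.hermes_enabled
    else:
        enabled = enabled_raw.strip().lower() in _TRUE_VALUES

    return HermesSettings(
        hermes_enabled=enabled,
        hermes_interval_seconds=int(
            env.get("HERMES_INTERVAL_SECONDS", defaults.hermes_interval_seconds)
        ),
        hermes_memory_free_min_gb=float(
            env.get("HERMES_MEMORY_FREE_MIN_GB", defaults.hermes_memory_free_min_gb)
        ),
        hermes_disk_free_min_gb=float(
            env.get("HERMES_DISK_FREE_MIN_GB", defaults.hermes_disk_free_min_gb)
        ),
    )
```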

28 unit tests covering all checks, auto-remediation, and serialization.

Fixes #1073

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rockachopa merged commit 1c1bfb6407 into main 2026-03-23 18:36:07 +00:00
Rockachopa deleted branch claude/issue-1073 2026-03-23 18:36:08 +00:00
Reference: Rockachopa/Timmy-time-dashboard#1133