[claude] The Testbed Observatory — Health Monitoring & Alerting (#147) #159
Fixes #147
What this adds
observatory.py: a standalone monitoring script and daemon for the Hermes Agent testbed.
Health checks (all run every poll cycle)
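One of the checks in this section probes the API health endpoint for an HTTP 200 and records latency. A minimal sketch of such a probe using only `urllib.request` follows. The function name `check_http`, the result dictionary layout, and the mapping of outcomes onto `ok`, `error`, and `critical` are assumptions for illustration, not the code in `observatory.py`.

```python
import time
import urllib.error
import urllib.request


def check_http(url: str, timeout: float = 5.0) -> dict:
    """Probe `url` once; report a status plus round-trip latency in ms.

    Hypothetical helper: the status mapping below is an assumption.
    """
    start = time.monotonic()
    code = None
    detail = ""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            code = resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with a non-2xx status code.
        code = exc.code
        detail = str(exc)
    except (urllib.error.URLError, OSError) as exc:
        # Connection refused, DNS failure, timeout, and similar.
        detail = str(exc)
    latency_ms = (time.monotonic() - start) * 1000.0

    if code == 200:
        status = "ok"
    elif code is None:
        status = "critical"  # endpoint unreachable
    else:
        status = "error"  # reachable, but not healthy
    return {
        "status": status,
        "http_code": code,
        "latency_ms": latency_ms,
        "detail": detail,
    }


if __name__ == "__main__":
    print(check_http("http://localhost:8642/health"))
```

`HTTPError` is caught before `URLError` because it is a subclass of it; reversing the order would hide the response code.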
http://localhost:8642/health for 200 + latency
Alerting
critical/error
ok
TELEGRAM_BOT_TOKEN + OBSERVATORY_ALERT_CHAT_ID
Daily digest
OBSERVATORY_DIGEST_CHAT_ID
Persistence & history
~/.hermes/observatory.db (path overridable)
--history N prints last N records
SLOs (documented and tracked)
Usage
Config (env vars)
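As a rough illustration of env-var configuration with fallbacks, the sketch below resolves three of the variables in this section. It assumes the values shown alongside `OBSERVATORY_DB_PATH`, `OBSERVATORY_WEBHOOK_URL`, and `OBSERVATORY_API_URL` are their defaults; `load_config` and `DEFAULTS` are hypothetical names, and variables without a visible value are left out rather than guessed.

```python
import os
from pathlib import Path

# Assumed defaults, taken from the values listed in this section.
DEFAULTS = {
    "OBSERVATORY_DB_PATH": "~/.hermes/observatory.db",
    "OBSERVATORY_WEBHOOK_URL": "http://127.0.0.1:8080/health",
    "OBSERVATORY_API_URL": "http://127.0.0.1:8642/health",
}


def load_config(env=None) -> dict:
    """Return settings from `env` (os.environ by default) with fallbacks."""
    env = os.environ if env is None else env
    cfg = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    # Expand "~" so the path can be handed to sqlite3.connect() directly.
    cfg["OBSERVATORY_DB_PATH"] = str(Path(cfg["OBSERVATORY_DB_PATH"]).expanduser())
    return cfg


if __name__ == "__main__":
    for name, value in load_config().items():
        print(f"{name}={value}")
```

Passing a plain dictionary as `env` keeps the function easy to exercise in unit tests without touching the real environment.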
TELEGRAM_BOT_TOKEN
OBSERVATORY_ALERT_CHAT_ID
OBSERVATORY_DIGEST_CHAT_ID
OBSERVATORY_POLL_INTERVAL
OBSERVATORY_DB_PATH: ~/.hermes/observatory.db
OBSERVATORY_DISK_WARN_PCT / _CRIT_PCT
OBSERVATORY_MEM_WARN_PCT / _CRIT_PCT
OBSERVATORY_CPU_WARN_PCT / _CRIT_PCT
OBSERVATORY_WEBHOOK_URL: http://127.0.0.1:8080/health
OBSERVATORY_API_URL: http://127.0.0.1:8642/health
Tests
45 unit tests covering all checks, persistence, alerting deduplication, digest generation, SLO tracking, and the CLI interface. All pass.
Dependencies
psutil (optional; graceful degradation if missing), added as observatory extras group in pyproject.toml
sqlite3, urllib.request, signal
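The optional-dependency pattern mentioned for psutil can be sketched as below: import it if present, otherwise report that the check was skipped instead of failing. The names `classify` and `check_memory`, the `skipped` status, and the 80/90 thresholds are assumptions for illustration, not values taken from `observatory.py`.

```python
try:
    import psutil  # optional: system metrics
except ImportError:
    psutil = None  # checks that need psutil degrade to "skipped"


def classify(pct: float, warn_pct: float, crit_pct: float) -> str:
    """Map a usage percentage onto ok / warn / critical."""
    if pct >= crit_pct:
        return "critical"
    if pct >= warn_pct:
        return "warn"
    return "ok"


def check_memory(warn_pct: float = 80.0, crit_pct: float = 90.0) -> dict:
    """Report memory usage, or a skipped result when psutil is missing.

    The default thresholds are hypothetical placeholders.
    """
    if psutil is None:
        return {"status": "skipped", "detail": "psutil not installed"}
    used = psutil.virtual_memory().percent
    return {"status": classify(used, warn_pct, crit_pct), "used_pct": used}


if __name__ == "__main__":
    print(check_memory())
```

Keeping the threshold logic in `classify` separates it from the psutil call, so it can be unit-tested on machines where psutil is not installed.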