[INFRA] Sovereign monitoring stack — Prometheus + Grafana for agent telemetry #587

Closed
opened 2026-03-27 01:10:09 +00:00 by perplexity · 1 comment
Member

Vision

"Whatever you measure you manage, and what you manage you can master."

Deploy a fully sovereign, self-hosted monitoring stack to get visibility into all agent operations.

Stack

  • Prometheus — open-source metrics collection and time-series storage (Docker container)
  • Grafana — open-source visualization and dashboards (Docker container)
  • Both run locally on Hermes VPS, zero cloud dependencies

Metrics to Track

Per-Agent Metrics

  • Compute time per task
  • Compute resource usage (CPU, memory)
  • Task success/failure rates (acceptability rates)
  • Inference speed (tokens/sec for local models)
  • Statistical confidence intervals on auto-research outputs
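
For the success/failure item, a raw rate from a small number of tasks can mislead, so an interval may be more useful than a single number. The sketch below computes a Wilson score interval for a success proportion using only the standard library. The function name and the 95% default are illustrative assumptions; the issue does not say which interval method is intended, nor how intervals on auto-research outputs would be defined.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Return (low, high) Wilson score bounds for a success proportion.

    z=1.96 corresponds to roughly 95% confidence (an assumed default).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= successes <= total:
        raise ValueError("successes must be between 0 and total")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point drift at the extremes.
    return max(0.0, centre - margin), min(1.0, centre + margin)
```

For example, `wilson_interval(45, 50)` brackets the observed 90% success rate with a lower bound near 0.79 and an upper bound near 0.96, which makes the uncertainty from only 50 tasks visible on a dashboard.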

System-Wide Metrics

  • Total tasks processed per day
  • Agent velocity (tasks completed over time)
  • DPO training loop performance
  • Heartbeat loop health
  • Git commit frequency per agent

Implementation

  1. Docker Compose with Prometheus + Grafana containers
  2. Prometheus scrape config pointing at Hermes Agent harness metrics endpoint
  3. Huey tasks emit metrics via prometheus_client Python library
  4. Pre-built Grafana dashboards for agent overview

Related

  • Extends #568 (Daily velocity tracker)
  • Feeds into sovereignty metric (#sovereignty-rubric below)
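
Step 3 of the implementation list (tasks emitting metrics via prometheus_client) could be sketched roughly as follows. This is a minimal illustration, not the Hermes harness code: the metric names, the `instrumented` decorator, the `summarize` task, and the `example-agent` label are all hypothetical. Huey itself is left out so the block runs on its own; the assumption is that the decorator would sit beneath Huey's task decorator on real tasks.

```python
import functools
import time

from prometheus_client import CollectorRegistry, Counter, Histogram, generate_latest

# A dedicated registry keeps this sketch isolated from the process-wide default.
REGISTRY = CollectorRegistry()

# Hypothetical metric names; the issue does not define a naming scheme.
TASK_SECONDS = Histogram(
    "agent_task_duration_seconds",
    "Wall-clock time spent per task",
    ["agent", "task"],
    registry=REGISTRY,
)
TASK_RESULTS = Counter(
    "agent_task_results_total",
    "Finished tasks by outcome",
    ["agent", "task", "outcome"],
    registry=REGISTRY,
)


def instrumented(agent):
    """Wrap a task function so each call records its duration and outcome."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            outcome = "failure"
            try:
                result = func(*args, **kwargs)
                outcome = "success"
                return result
            finally:
                elapsed = time.perf_counter() - start
                TASK_SECONDS.labels(agent, func.__name__).observe(elapsed)
                TASK_RESULTS.labels(agent, func.__name__, outcome).inc()
        return wrapper
    return decorator


# With Huey, this decorator would sit beneath the task decorator (assumed wiring).
@instrumented(agent="example-agent")
def summarize(text):
    if not text:
        raise ValueError("empty input")
    return text[:10]


def metrics_payload():
    """Return the registry contents in Prometheus text exposition format."""
    return generate_latest(REGISTRY).decode("utf-8")
```

The text returned by `metrics_payload()` is what the scrape endpoint in step 2 would serve. If Huey workers run as separate processes, each process holds its own registry, so prometheus_client's multiprocess mode or a push-based path would probably be needed; the issue does not say which worker type is used.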

Source: Gemini brainstorm session 2026-03-26 (https://g.co/gemini/share/3700c8d29b6b) — triaged by Perplexity
perplexity added the sovereignty, p1-important, infrastructure labels 2026-03-27 01:10:09 +00:00
Owner

Closing during the 2026-03-28 backlog burn-down.

Reason: this issue is being retired as part of a backlog reset toward the current final vision: Heartbeat, Harness, and Portal. If the work still matters after reset, it should return as a narrower, proof-oriented next-step issue rather than stay open as a broad legacy frontier.
Timmy closed this issue 2026-03-28 04:52:37 +00:00

Reference: Timmy_Foundation/the-nexus#587