#587 - [INFRA] Sovereign monitoring stack — Prometheus + Grafana for agent telemetry - the-nexus

perplexity commented

2026-03-27 01:10:09 +00:00

Member

Vision

"Whatever you measure you manage, and what you manage you can master."

Deploy a fully sovereign, self-hosted monitoring stack to get visibility into all agent operations.

Stack

Prometheus — open-source metrics collection and time-series storage (Docker container)
Grafana — open-source visualization and dashboards (Docker container)
Both run locally on Hermes VPS, zero cloud dependencies

Metrics to Track

Per-Agent Metrics

Compute time per task
Compute resource usage (CPU, memory)
Task success/failure rates (acceptability rates)
Inference speed (tokens/sec for local models)
Statistical confidence intervals on auto-research outputs
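
A minimal sketch of how two of the per-agent numbers above could be computed before they are exported. It assumes auto-research outputs can be reduced to numeric scores, which the issue does not state, and it uses a normal approximation rather than a t-distribution, so it is only reasonable for moderately large samples. The function names `mean_confidence_interval` and `success_rate` are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(samples, confidence=0.95):
    """Normal-approximation confidence interval for the mean of `samples`.

    Assumption: each auto-research output has been scored as a float.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate spread")
    mean = statistics.fmean(samples)
    # Standard error of the mean, using the sample standard deviation.
    std_err = statistics.stdev(samples) / math.sqrt(len(samples))
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err


def success_rate(successes, failures):
    """Fraction of tasks that succeeded; 0.0 when no tasks have run."""
    total = successes + failures
    return successes / total if total else 0.0
```

The interval bounds and the rate are plain floats, so either could be written to a gauge once the metrics endpoint exists.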

System-Wide Metrics

Total tasks processed per day
Agent velocity (tasks completed over time)
DPO training loop performance
Heartbeat loop health
Git commit frequency per agent
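
As a rough illustration of the first and fourth items, the sketch below derives a tasks-per-day count and a heartbeat health flag from timestamps. The five-minute silence threshold and both function names are assumptions for the example, not values taken from this issue, and the timestamps are assumed to be timezone-aware.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def tasks_per_day(completed_at):
    """Count completed tasks per UTC calendar day.

    `completed_at` is an iterable of timezone-aware datetimes.
    """
    return Counter(ts.astimezone(timezone.utc).date() for ts in completed_at)


def heartbeat_is_healthy(last_beat, now, max_silence=timedelta(minutes=5)):
    """True if the last heartbeat is no older than `max_silence`.

    The 5-minute default is a placeholder, not a documented setting.
    """
    return now - last_beat <= max_silence
```

Agent velocity could be read off the same per-day counts by comparing consecutive days.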

Implementation

Docker Compose with Prometheus + Grafana containers
Prometheus scrape config pointing at Hermes Agent harness metrics endpoint
Huey tasks emit metrics via prometheus_client Python library
Pre-built Grafana dashboards for agent overview
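
Step 3 could look roughly like the sketch below: a decorator that records duration and outcome for any task function through `prometheus_client`. The metric names, the `instrumented` decorator, the `summarize` task, and port 8000 are all hypothetical. With Huey, the decorator would presumably sit underneath `@huey.task()`; that wiring is left out so the example runs without a Huey instance.

```python
import functools
import time

from prometheus_client import (
    CollectorRegistry,
    Counter,
    Histogram,
    start_http_server,
)

# Dedicated registry so the sketch does not touch the process-wide default.
registry = CollectorRegistry()

TASK_SECONDS = Histogram(
    "agent_task_duration_seconds",  # hypothetical metric name
    "Compute time per task",
    ["task"],
    registry=registry,
)
TASK_RESULTS = Counter(
    "agent_task_results_total",  # hypothetical metric name
    "Finished tasks by outcome",
    ["task", "outcome"],
    registry=registry,
)


def instrumented(func):
    """Record duration and success/failure for each call to `func`."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        outcome = "failure"
        try:
            result = func(*args, **kwargs)
            outcome = "success"
            return result
        finally:
            elapsed = time.perf_counter() - start
            TASK_SECONDS.labels(task=func.__name__).observe(elapsed)
            TASK_RESULTS.labels(task=func.__name__, outcome=outcome).inc()
    return wrapper


def serve_metrics(port=8000):
    """Expose the registry over HTTP for Prometheus to scrape."""
    start_http_server(port, registry=registry)


@instrumented
def summarize(text):
    """Stand-in for a real agent task."""
    return text[:10]
```

One caveat: if the Huey consumer runs process-based workers, each worker process holds its own registry, so a single scrape endpoint would likely need `prometheus_client`'s multiprocess mode or a separate aggregation step.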

Related

Extends #568 (Daily velocity tracker)
Feeds into sovereignty metric (#sovereignty-rubric below)

Source: Gemini brainstorm session 2026-03-26 (https://g.co/gemini/share/3700c8d29b6b), triaged by Perplexity


perplexity added the sovereignty, p1-important, and infrastructure labels 2026-03-27 01:10:09 +00:00

Timmy referenced this issue

2026-03-28 04:46:37 +00:00

[INFRA] Sovereign monitoring stack — Prometheus + Grafana for agent telemetry #588

Timmy referenced this issue

2026-03-28 04:46:38 +00:00

[INFRA] Sovereign monitoring stack — Prometheus + Grafana for agent telemetry #590

Timmy commented

2026-03-28 04:52:36 +00:00

Owner

Closing during the 2026-03-28 backlog burn-down.

Reason: this issue is being retired as part of a backlog reset toward the current final vision: Heartbeat, Harness, and Portal. If the work still matters after reset, it should return as a narrower, proof-oriented next-step issue rather than stay open as a broad legacy frontier.


Timmy closed this issue

2026-03-28 04:52:37 +00:00
