[INFRA] Sovereign monitoring stack — Prometheus + Grafana for agent telemetry #588

Closed
opened 2026-03-27 01:10:09 +00:00 by perplexity · 1 comment
Member

Vision

"Whatever you measure you manage, and what you manage you can master."

Deploy a fully sovereign, self-hosted monitoring stack to get visibility into all agent operations.

Stack

  • Prometheus — open-source metrics collection and time-series storage (Docker container)
  • Grafana — open-source visualization and dashboards (Docker container)
  • Both run locally on Hermes VPS, zero cloud dependencies

Metrics to Track

Per-Agent Metrics

  • Compute time per task
  • Compute resource usage (CPU, memory)
  • Task success/failure rates (acceptability rates)
  • Inference speed (tokens/sec for local models)
  • Statistical confidence intervals on auto-research outputs
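
Two of these metrics reduce to short calculations. The sketch below is illustrative only: it assumes auto-research outputs can be scored as plain numbers and uses a normal-approximation interval for their mean, which may not be the statistic ultimately wanted. The function names `mean_confidence_interval` and `tokens_per_second` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(scores, confidence=0.95):
    """Return (low, high) for the mean of `scores`, normal approximation.

    Assumption: `scores` are numeric quality scores for research outputs.
    """
    if len(scores) < 2:
        raise ValueError("need at least two samples to estimate spread")
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(scores) / sqrt(len(scores))
    centre = mean(scores)
    return centre - half_width, centre + half_width


def tokens_per_second(token_count, elapsed_seconds):
    """Inference speed for a single generation call."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return token_count / elapsed_seconds
```

For small sample counts a Student-t interval would be wider and arguably more honest; the normal approximation is used here only to keep the sketch within the standard library.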

System-Wide Metrics

  • Total tasks processed per day
  • Agent velocity (tasks completed over time)
  • DPO training loop performance
  • Heartbeat loop health
  • Git commit frequency per agent
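
As one possible way to derive git commit frequency per agent, the sketch below counts author names from `git log` output. It assumes each agent commits under its own author name, which the issue does not state; `read_author_log` and `commits_per_author` are hypothetical helpers.

```python
import subprocess
from collections import Counter


def read_author_log(repo_path, since="1 day ago"):
    """Return one author name per line for commits newer than `since`."""
    completed = subprocess.run(
        ["git", "-C", repo_path, "log", f"--since={since}", "--format=%an"],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


def commits_per_author(git_log_output):
    """Count commits per author from `git log --format=%an` output."""
    names = (line.strip() for line in git_log_output.splitlines())
    # Skip blank lines so trailing newlines do not create an empty author.
    return Counter(name for name in names if name)
```

The resulting counts could then be published as a per-agent gauge on whatever schedule the velocity tracker runs.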

Implementation

  1. Docker Compose with Prometheus + Grafana containers
  2. Prometheus scrape config pointing at Hermes Agent harness metrics endpoint
  3. Huey tasks emit metrics via prometheus_client Python library
  4. Pre-built Grafana dashboards for agent overview
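
Step 3 could look roughly like the following. This is a minimal sketch, not the harness's actual code: the metric names, the `instrumented` decorator, and the `summarize` task are all made up for illustration, and the decorator wraps a plain function where a real deployment would wrap a Huey task.

```python
import time
from functools import wraps

from prometheus_client import CollectorRegistry, Counter, Histogram

# A dedicated registry keeps this sketch isolated from the global default.
REGISTRY = CollectorRegistry()

# Hypothetical metric names; the issue does not specify a naming scheme.
TASKS = Counter(
    "agent_tasks_total",
    "Tasks finished, by agent and outcome",
    ["agent", "status"],
    registry=REGISTRY,
)
DURATION = Histogram(
    "agent_task_duration_seconds",
    "Wall-clock time per task",
    ["agent"],
    registry=REGISTRY,
)


def instrumented(agent):
    """Decorator that records outcome and duration for each call."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = func(*args, **kwargs)
            except Exception:
                TASKS.labels(agent=agent, status="failure").inc()
                raise
            else:
                TASKS.labels(agent=agent, status="success").inc()
                return result
            finally:
                # Runs on both paths, so failures are timed as well.
                DURATION.labels(agent=agent).observe(time.perf_counter() - start)
        return wrapper
    return decorator


@instrumented("demo-agent")
def summarize(text):
    """Stand-in for a real agent task."""
    return text.upper()
```

To give Prometheus something to scrape (step 2), the process would also need to expose the registry, for example with `prometheus_client.start_http_server(port, registry=REGISTRY)`. If the Huey consumer runs tasks in separate worker processes, a single in-process endpoint will not see every worker's counters, so the exposure strategy needs to match the worker model.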

Related

  • Extends #568 (Daily velocity tracker)
  • Feeds into sovereignty metric (#sovereignty-rubric below)

Source: Gemini brainstorm session 2026-03-26 (https://g.co/gemini/share/3700c8d29b6b), triaged by Perplexity
perplexity added the sovereignty, p1-important, and infrastructure labels 2026-03-27 01:10:09 +00:00
Owner

Closing as duplicate during backlog burn-down. Canonical issue: #587.

Reason: identical title/workstream. Keeping one thread prevents duplicate agent labor and review waste.

Timmy closed this issue 2026-03-28 04:46:37 +00:00

Reference: Timmy_Foundation/the-nexus#588