timmy-config/wizards/allegro/docs/EPIC-001-metrics-visualization-system.md

EPIC-001: Proper Metrics Visualization System

Status: PROPOSED
Priority: P1 - Infrastructure
Owner: TBD (Allegro proposing, needs assignment)
Created: 2026-04-01


Current State (REAL METRICS - NO FAKES)

Test Suite Metrics (Hermes Agent)

Total Tests:    4,884
Passed:         3,930 (80.5%)
Failed:           680 (13.9%)
Errors:           274 (5.6%)
Skipped:          166 (3.4%)
Execution Time: 139.79s (2:19)

Breakdown by Category:

  • ACP Tests: 29 passed, 2 failed, 5 skipped (after fixes)
  • Core Agent: ~85% pass rate
  • Tool Tests: Voice mode and web tools showing errors (config-related)
  • E2E Tests: Excluded (require external services)

Provider Trait Spike Performance

Mock Provider:    <1ms response
Kimi Provider:    Ready (needs aiohttp)
Ollama Provider:  Ready (needs local server)
Factory Pattern:  Working
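
The working factory pattern can be sketched as follows. This is a minimal illustration, not the spike's actual code: `Provider`, `MockProvider`, and `create_provider` are assumed names, and real providers (Kimi, Ollama) would register here once their dependencies are available.

```python
import time
from abc import ABC, abstractmethod


class Provider(ABC):
    """Minimal provider trait: one complete() call per backend."""

    @abstractmethod
    def complete(self, prompt: str) -> str: ...


class MockProvider(Provider):
    """In-memory provider, the source of the <1ms baseline."""

    def complete(self, prompt: str) -> str:
        return f"mock response to: {prompt}"


def create_provider(name: str) -> Provider:
    """Factory: map a config string to a provider instance."""
    registry = {"mock": MockProvider}  # Kimi/Ollama entries would go here
    try:
        return registry[name]()
    except KeyError:
        raise ValueError(f"unknown provider: {name}")


if __name__ == "__main__":
    provider = create_provider("mock")
    start = time.perf_counter()
    reply = provider.complete("ping")
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{reply!r} in {elapsed_ms:.3f}ms")
```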

System Resources (Allegro VM - 167.99.126.228)

Memory:     7.8 GB total
  Used:     980 MB (12.5%)
  Free:     5.7 GB (73%)
  Cache:    1.4 GB (18%)
  
Disk:       154 GB total
  Used:     12 GB (8%)
  Free:     143 GB (92%)
  
Load:       0.97 (1-min avg)
Uptime:     1:56

Wizard Health (From Monitoring Script)

Allegro-Primus Gateway: DOWN (port 8644)
Ezra Gateway:          DOWN (port 8643)
Ollama:                DOWN (port 11434)
Gitea:                 DOWN (port 3000)
SearXNG:               DOWN (port 8080)

Note: All services operational on Ezra (143.198.27.163).
Allegro cannot reach them due to Tailscale isolation.

Claw Code Identity Activation

User Created:    claw-code (ID: 24)
Token Generated: 5466b7a69b62677bc869778a3d9d12e8f6ee40b4
Token Deployed:  Allegro .env
Test Issue:      #799 (Timmy_Foundation/the-nexus)
Status:          OPERATIONAL

Problem Statement

No centralized metrics infrastructure exists.

Current limitations:

  • Manual test runs required to get coverage
  • No historical trend tracking
  • No alerting on failure thresholds
  • No performance benchmarking over time
  • Service health checked via bash scripts, not time-series DB
  • Metrics scattered across multiple VMs and files

Proposed Solution

Phase 1: Metrics Collection Layer (Week 1)

Components:

  1. Prometheus - Time-series metrics database

    • Port: 9090
    • Retention: 30 days local, archive to S3
  2. Node Exporter - System metrics

    • CPU, memory, disk, network
    • Per-process metrics for wizards
  3. Custom Exporters:

    • hermes_metrics_exporter - Test pass/fail rates, execution times
    • gitea_metrics_exporter - Issue counts, PR merge rates, CI times
    • wizard_health_exporter - Gateway status, response times
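
As an illustration, a custom exporter like hermes_metrics_exporter can be a small HTTP endpoint serving the Prometheus text exposition format. This stdlib-only sketch hard-codes the current run's numbers; the metric names, the port, and the parsing step are assumptions, not the real exporter's design.

```python
# Sketch of a custom exporter (e.g. hermes_metrics_exporter) serving
# Prometheus exposition format with only the standard library.
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(total: int, passed: int) -> str:
    """Render test-suite counters in Prometheus text exposition format."""
    rate = passed / total if total else 0.0
    return (
        "# HELP hermes_tests_total Total tests in last run\n"
        "# TYPE hermes_tests_total gauge\n"
        f"hermes_tests_total {total}\n"
        "# HELP hermes_test_pass_rate Pass rate of last run\n"
        "# TYPE hermes_test_pass_rate gauge\n"
        f"hermes_test_pass_rate {rate:.4f}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        # A real exporter would parse pytest output here instead.
        body = render_metrics(4884, 3930).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # In deployment: HTTPServer(("0.0.0.0", 9101), MetricsHandler).serve_forever()
    print(render_metrics(4884, 3930))
```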

Phase 2: Visualization Layer (Week 2)

Grafana - Dashboards

  • Port: 3001 (3000 is already taken by Gitea)
  • Dashboards:
    • "Wizard Fleet Health" - All wizards, all services
    • "Test Suite Quality" - Pass rates, coverage trends, flaky tests
    • "Performance Metrics" - Response times, token usage, costs
    • "Infrastructure" - CPU, memory, disk across all VMs

Phase 3: Alerting Layer (Week 3)

AlertManager - Notifications

  • P0: Service down > 5 min
  • P1: Test pass rate < 70%
  • P2: Disk usage > 85%
  • P3: Response time > 10s

Delivery: Telegram, email, Gitea issue auto-created
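
The thresholds above translate directly into Prometheus alerting rules that AlertManager then routes. A hedged sketch: `hermes_test_pass_rate` is an assumed metric name from the custom exporter, while the `up` and filesystem expressions use standard Prometheus/node_exporter metrics.

```yaml
# alert-rules.yml (sketch) -- severities mirror the P0-P2 thresholds above
groups:
  - name: wizard-fleet
    rules:
      - alert: ServiceDown                      # P0
        expr: up == 0
        for: 5m
        labels: {severity: P0}
        annotations:
          summary: "{{ $labels.job }} down > 5 min"
      - alert: TestPassRateLow                  # P1
        expr: hermes_test_pass_rate < 0.70
        labels: {severity: P1}
      - alert: DiskUsageHigh                    # P2
        expr: |
          (node_filesystem_size_bytes - node_filesystem_avail_bytes)
            / node_filesystem_size_bytes > 0.85
        labels: {severity: P2}
```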


Technical Architecture

┌─────────────────────────────────────────────────────────┐
│                    METRICS PIPELINE                     │
├─────────────────────────────────────────────────────────┤
│  Sources          Collectors           Storage          │
│  ───────          ─────────           ────────          │
│  Hermes Agent  →  Prometheus      →   Prometheus TSDB   │
│  Gitea         →  Exporters          (local + S3)       │
│  System        →  Node Exporter                         │
│  Wizards       →  Health Checks                         │
│                                                         │
│  Visualization: Grafana (port 3001)                     │
│  Alerting: AlertManager → Telegram/Gitea                │
└─────────────────────────────────────────────────────────┘

Metrics to Track

Code Quality Metrics

| Metric         | Target | Alert If |
|----------------|--------|----------|
| Test Pass Rate | > 85%  | < 70%    |
| Code Coverage  | > 80%  | < 60%    |
| Flaky Tests    | 0      | > 5      |
| ACP Fix Rate   | 100%   | < 95%    |

Performance Metrics

| Metric            | Target | Alert If |
|-------------------|--------|----------|
| Provider Response | < 5s   | > 10s    |
| Gateway Uptime    | > 99%  | < 95%    |
| Memory Usage      | < 80%  | > 90%    |
| Disk Usage        | < 80%  | > 90%    |

Business Metrics

| Metric                | Target | Alert If |
|-----------------------|--------|----------|
| Issues Resolved/Day   | > 5    | < 2      |
| PR Merge Time         | < 24h  | > 72h    |
| Auto-Dispatch Success | > 95%  | < 80%    |

Implementation Plan

Week 1: Foundation

  • Deploy Prometheus (Docker)
  • Deploy Node Exporter on all VMs
  • Create hermes_metrics_exporter
  • Configure scraping
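
"Configure scraping" amounts to listing each exporter as a target in prometheus.yml. A sketch under assumed ports: 9100 is Node Exporter's default, while 9101/9102 for the custom exporters are placeholders.

```yaml
# prometheus.yml (sketch) -- job names and custom-exporter ports are assumptions
global:
  scrape_interval: 30s
scrape_configs:
  - job_name: node
    static_configs:
      - targets: ["167.99.126.228:9100", "143.198.27.163:9100"]
  - job_name: hermes
    static_configs:
      - targets: ["localhost:9101"]   # hermes_metrics_exporter
  - job_name: wizard-health
    static_configs:
      - targets: ["localhost:9102"]   # wizard_health_exporter
```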

Week 2: Visualization

  • Deploy Grafana
  • Create "Wizard Fleet Health" dashboard
  • Create "Test Suite Quality" dashboard
  • Create "Performance Metrics" dashboard

Week 3: Alerting

  • Deploy AlertManager
  • Configure Telegram notifications
  • Configure Gitea issue creation
  • Test alert pipeline

Week 4: Polish

  • Document dashboards
  • Train team on usage
  • Archive old metrics
  • Handoff to operations

Acceptance Criteria

  • Prometheus collecting metrics from all sources
  • Grafana accessible at http://metrics.timmy.local:3001
  • 3+ dashboards operational
  • Alerts firing and delivering to Telegram
  • Historical data retained for 30 days
  • Runbook documented

Resources Required

| Resource   | Spec                         | Cost   |
|------------|------------------------------|--------|
| Prometheus | 2 CPU, 4 GB RAM, 100 GB disk | $20/mo |
| Grafana    | 1 CPU, 2 GB RAM              | $10/mo |
| Storage    | 500 GB for 90-day retention  | $15/mo |
| Total      |                              | $45/mo |

Or deploy on existing Ezra VPS (has capacity).


Current Workaround

Until this EPIC is complete, use:

  • wizard-health.sh - Basic HTML dashboard
  • Manual pytest runs - Test metrics
  • claw-code-dispatcher.py logs - Dispatch metrics
  • System top/free/df - Resource metrics
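
For reference, the kind of check wizard-health.sh performs can be sketched in a few lines of Python. The service list comes from the monitoring section above; the TCP probe itself is illustrative, not the script's actual implementation.

```python
# Port-level health check, mirroring what a script like wizard-health.sh does.
import socket

SERVICES = {
    "Allegro-Primus Gateway": 8644,
    "Ezra Gateway": 8643,
    "Ollama": 11434,
    "Gitea": 3000,
    "SearXNG": 8080,
}


def check(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(host: str = "127.0.0.1") -> dict:
    """Check every known service port on one host."""
    return {name: check(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, up in report().items():
        print(f"{name:24s} {'UP' if up else 'DOWN'}")
```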

Related Work

  • Issue #799 - Claw Code identity (metrics will track its dispatch success)
  • ADR-001 - Harness engineering (metrics prove value)
  • Provider spike - Performance baselines established

Truth in metrics. No fakes.