[Backend] API Observability — structured logging + /api/metrics endpoint #57

Closed
opened 2026-03-21 00:43:52 +00:00 by replit · 1 comment
Owner

What & Why

The API server currently writes unstructured logs to the console and exposes no metrics endpoint. Operators cannot see throughput, invoice conversion rate, rejection rate, or latency without reading raw logs.

Done looks like

  • GET /api/metrics returns a JSON document: total jobs created, jobs by state, invoice payment conversion rate, p50/p95 latency for eval and work phases, total sats earned, uptime
  • All existing route handlers emit structured log lines (JSON with timestamp, level, route, jobId, duration_ms)
  • The demo endpoint and rate limiter log hits and 429 responses with structured fields
  • GET /api/healthz extended to include { status: "ok", uptime_s, jobs_total } (non-breaking)
  • No external telemetry service required — metrics computed in-process from existing DB
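Here is a rough sketch of how the /api/metrics document and the extended healthz body could be assembled once the DB counts are available, written as pure functions. The input and output field names (jobsByState, invoicesIssued, invoicesPaid, invoice_conversion_rate, total_sats_earned) and the helper names buildMetricsSnapshot and healthzBody are hypothetical; the real MetricsService may shape them differently. Latency percentiles are left out because they come from in-process samples rather than the DB.

```typescript
// Hypothetical inputs: a real MetricsService would fill these from DB queries.
export interface MetricsInputs {
  jobsByState: Record<string, number>; // e.g. counts from a GROUP BY state query
  invoicesIssued: number;
  invoicesPaid: number;
  satsEarned: number;
  startedAtMs: number; // Date.now() captured at process start
  nowMs: number;
}

export interface MetricsSnapshot {
  uptime_s: number;
  jobs_total: number;
  jobs_by_state: Record<string, number>;
  invoice_conversion_rate: number | null; // null until at least one invoice exists
  total_sats_earned: number;
}

export function buildMetricsSnapshot(i: MetricsInputs): MetricsSnapshot {
  // Total jobs is the sum over all states, so it always agrees with jobs_by_state
  const jobsTotal = Object.values(i.jobsByState).reduce((sum, n) => sum + n, 0);
  return {
    uptime_s: Math.floor((i.nowMs - i.startedAtMs) / 1000),
    jobs_total: jobsTotal,
    jobs_by_state: { ...i.jobsByState },
    invoice_conversion_rate: i.invoicesIssued > 0 ? i.invoicesPaid / i.invoicesIssued : null,
    total_sats_earned: i.satsEarned,
  };
}

// Non-breaking healthz: keep status "ok" for existing clients and only add fields.
export function healthzBody(s: MetricsSnapshot) {
  return { status: "ok" as const, uptime_s: s.uptime_s, jobs_total: s.jobs_total };
}
```

Returning null instead of 0 for the conversion rate on a fresh server avoids reporting a misleading 0%; the issue does not specify this, so treat it as a design assumption.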

Out of scope

  • Prometheus scrape format (JSON only for now)
  • Alerting or dashboards
  • Session metrics (covered by Mode 2 task)

Tasks

  1. Structured logger: a thin logger utility (wrapping console or pino) that emits JSON lines with consistent fields (timestamp, level, component, message, optional context). Replace ad-hoc console.log calls across route and service files.
  2. Metrics aggregation: a MetricsService that queries the DB for job counts by state, computes conversion rates, and reads the in-memory latency histograms populated by the route middleware.
  3. Latency middleware: response-time middleware that records per-route duration into an in-memory histogram and emits a structured log line on every response.
  4. Metrics route: GET /api/metrics returns the aggregated snapshot; GET /api/healthz is extended with uptime and job count.
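A minimal sketch of the structured logger from task 1, assuming a plain console-style wrapper rather than pino. The createLogger name, the four level names, and the injectable sink (used so output can be captured in tests) are illustrative. LOG_LEVEL filtering is mentioned in the PR comment, but the values it accepts here are an assumption.

```typescript
// Level names and ordering are assumptions; LOG_LEVEL selects the minimum level emitted.
type Level = "debug" | "info" | "warn" | "error";
type Context = Record<string, unknown>;
type Sink = (line: string) => void;

const LEVEL_ORDER: Record<Level, number> = { debug: 10, info: 20, warn: 30, error: 40 };

function parseLevel(raw: string | undefined): Level {
  const v = (raw ?? "info").toLowerCase();
  // hasOwnProperty avoids treating inherited keys such as "toString" as levels
  return Object.prototype.hasOwnProperty.call(LEVEL_ORDER, v) ? (v as Level) : "info";
}

export function createLogger(component: string, opts: { level?: string; sink?: Sink } = {}) {
  const threshold = LEVEL_ORDER[parseLevel(opts.level ?? process.env.LOG_LEVEL)];
  // Default sink: one JSON object per line on stdout
  const sink: Sink = opts.sink ?? ((line) => { process.stdout.write(line + "\n"); });

  function emit(level: Level, message: string, context: Context = {}): void {
    if (LEVEL_ORDER[level] < threshold) return; // filtered out by LOG_LEVEL
    // Core fields are spread after the context so a context key cannot overwrite them
    sink(JSON.stringify({ ...context, timestamp: new Date().toISOString(), level, component, message }));
  }

  return {
    debug: (message: string, context?: Context) => emit("debug", message, context),
    info: (message: string, context?: Context) => emit("info", message, context),
    warn: (message: string, context?: Context) => emit("warn", message, context),
    error: (message: string, context?: Context) => emit("error", message, context),
  };
}
```

In a route file this might read const log = createLogger("jobs"); log.info("job created", { route: "/api/jobs", jobId, duration_ms }); which yields the timestamp, level, route, jobId, and duration_ms fields the issue asks for on a single JSON line.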

Relevant files

  • artifacts/api-server/src/index.ts
  • artifacts/api-server/src/routes/jobs.ts
  • artifacts/api-server/src/routes/health.ts
  • artifacts/api-server/src/lib/agent.ts
  • artifacts/api-server/src/lib/lnbits.ts
replit added the backend label 2026-03-21 00:43:52 +00:00
claude was assigned by Rockachopa 2026-03-22 23:37:13 +00:00
Collaborator

PR created: #87

All issue requirements implemented:

  • GET /api/metrics returns uptime, HTTP counters, jobs by state, invoice conversion rate, p50/p95 latency per route, total sats earned
  • Structured JSON logger (lib/logger.ts) with LOG_LEVEL env var filtering — adopted across all route and service files
  • Response-time middleware emits a structured log line on every request with request_id, method, path, status, duration_ms
  • Request-ID middleware (X-Request-Id header)
  • GET /api/healthz extended with uptime_s and jobs_total (non-breaking)
  • No external telemetry dependencies
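The request-ID and response-time middleware described above could look roughly like this sketch. To stay dependency-free it uses small structural types (ReqLike, ResLike) in place of Express's Request and Response. It keeps a bounded buffer of recent samples per route (1000 by default) and computes nearest-rank percentiles, rather than maintaining a true bucketed histogram. LatencyRecorder, observability, and the "METHOD path" route key are hypothetical names, not taken from PR #87.

```typescript
import { randomUUID } from "node:crypto";
import { EventEmitter } from "node:events";

// Structural stand-ins for Express's Request/Response (assumption, to avoid the dependency).
export interface ReqLike {
  method: string;
  path: string;
  headers: Record<string, string | string[] | undefined>;
}
export interface ResLike extends EventEmitter {
  statusCode: number;
  setHeader(name: string, value: string): unknown;
}

type RouteLatency = { count: number; p50_ms: number | null; p95_ms: number | null };

// Keeps the most recent `capacity` samples per route: a bounded buffer, not bucketed.
export class LatencyRecorder {
  private readonly samples = new Map<string, number[]>();
  private readonly capacity: number;
  constructor(capacity = 1000) {
    this.capacity = capacity;
  }

  record(route: string, ms: number): void {
    const buf = this.samples.get(route) ?? [];
    buf.push(ms);
    if (buf.length > this.capacity) buf.shift(); // drop the oldest sample
    this.samples.set(route, buf);
  }

  // Nearest-rank percentile: the smallest sample with at least p% of samples <= it.
  percentile(route: string, p: number): number | null {
    const buf = this.samples.get(route);
    if (!buf || buf.length === 0) return null;
    const sorted = [...buf].sort((a, b) => a - b);
    const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
    return sorted[rank - 1];
  }

  snapshot(): Record<string, RouteLatency> {
    const out: Record<string, RouteLatency> = {};
    this.samples.forEach((buf, route) => {
      out[route] = { count: buf.length, p50_ms: this.percentile(route, 50), p95_ms: this.percentile(route, 95) };
    });
    return out;
  }
}

export function observability(recorder: LatencyRecorder, log: (line: string) => void) {
  return (req: ReqLike, res: ResLike, next: () => void): void => {
    // Reuse a caller-supplied X-Request-Id (Node lowercases header names), else mint one
    const incoming = req.headers["x-request-id"];
    const requestId = typeof incoming === "string" && incoming.length > 0 ? incoming : randomUUID();
    res.setHeader("X-Request-Id", requestId);
    const start = process.hrtime.bigint();

    // "finish" fires once the response has been handed off, so async handler work is included
    res.once("finish", () => {
      const durationMs = Number(process.hrtime.bigint() - start) / 1e6;
      recorder.record(`${req.method} ${req.path}`, durationMs);
      log(JSON.stringify({
        timestamp: new Date().toISOString(), level: "info", request_id: requestId,
        method: req.method, path: req.path, status: res.statusCode,
        duration_ms: Math.round(durationMs * 100) / 100,
      }));
    });
    next();
  };
}
```

Keying by req.path gives every /api/jobs/<id> URL its own entry; keying by the matched route pattern (in Express, req.route?.path once routing has run) would keep the key set small. Note that a request aborted by the client may emit "close" without "finish", so it would not be recorded by this sketch.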
Reference: replit/timmy-tower#57