Monitoring & Metrics

DevGuard exposes health endpoints for Kubernetes probes and optionally integrates with Prometheus and Grafana via the Helm chart's built-in observability stack.


Health Endpoints

Both services expose health endpoints configured as Kubernetes liveness and readiness probes:

Endpoint          Service        Port
/api/v1/health    API            8080
/api/health       Web frontend   3000
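The endpoints can be probed directly with a port-forward; the service name and namespace below are assumptions, so adjust them to your deployment:

```shell
# Forward the API service locally (service name is an assumption;
# check yours with `kubectl get svc -n devguard`)
kubectl port-forward svc/devguard-api 8080:8080 -n devguard &

# A healthy API returns HTTP 200; -f makes curl fail on error status codes
curl -fsS http://localhost:8080/api/v1/health
```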

Distributed Tracing & Span Metrics

When api.tracing.enabled: true is set, the Helm chart injects an OTel Collector sidecar into the API pod. The sidecar handles three things:

  1. Receives traces from the API on localhost:4317 (gRPC) — the API is automatically configured to export traces to the sidecar
  2. Generates span metrics: converts spans into Prometheus-format histogram metrics exposed on port 8889
  3. Forwards traces to your backend (Jaeger, Grafana Tempo, or any OTLP-compatible collector)
API container  →  [localhost:4317]  →  OTel Collector sidecar  →  :8889 (Prometheus scrape)
                                                               →  otlpEndpoint (Jaeger / Tempo / ...)
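To verify the sidecar is producing span metrics, you can forward its Prometheus port and inspect the exposition output (the deployment name matches the one used in the log examples below; adapt if yours differs):

```shell
# Forward the sidecar's span-metrics port from the API pod
kubectl port-forward deployment/devguard-api-deployment 8889:8889 -n devguard &

# Span metrics appear in Prometheus text exposition format
curl -s http://localhost:8889/metrics | head -n 20
```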

Enable tracing

Add the following to your values.yaml:

api:
  tracing:
    enabled: true
    # Fraction of requests to trace (0.0 – 1.0)
    sampleRate: "0.1"
    # OTLP endpoint to forward traces to, e.g. Jaeger or Grafana Tempo
    otlpEndpoint: "http://jaeger-collector:4318"

For backends that require authentication, provide credentials via a secret:

api:
  tracing:
    enabled: true
    otlpEndpoint: "https://tempo.example.com:4318"
    existingSecretName: otlp-basic-auth   # keys: username, password

Create the secret:

kubectl create secret generic otlp-basic-auth \
  --from-literal=username="your-username" \
  --from-literal=password="your-password" \
  -n devguard
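Before pointing the chart at it, you can confirm the secret exists and carries both expected keys:

```shell
# List the keys stored in the secret; both "username" and "password" should appear
kubectl get secret otlp-basic-auth -n devguard -o jsonpath='{.data}'
```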

Sidecar image and resources

The sidecar uses otel/opentelemetry-collector-contrib. You can tune resource limits and the span metrics configuration:

api:
  tracing:
    spanMetrics:
      image:
        repository: otel/opentelemetry-collector-contrib
        tag: "0.147.0"
      # Latency histogram buckets
      histogram:
        buckets: [100us, 1ms, 2ms, 6ms, 10ms, 100ms, 250ms]
      # Span attributes to break metrics down by
      dimensions:
        - name: http.method
        - name: http.status_code
        - name: http.route
      resources:
        limits:
          cpu: 200m
          memory: 256Mi
        requests:
          cpu: 50m
          memory: 64Mi

Prometheus ServiceMonitor (optional)

When using the Prometheus Operator, enable ServiceMonitor resources for automatic scrape target discovery:

observability:
  serviceMonitor:
    enabled: true
    # Labels that your Prometheus Operator uses to discover ServiceMonitors
    additionalLabels: {}
    # Namespace where Prometheus runs (opens NetworkPolicy egress)
    prometheusNamespace: monitoring
    interval: 30s
    scrapeTimeout: 10s
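After upgrading the release, you can check that the ServiceMonitor resources were actually created (this requires the Prometheus Operator CRDs to be installed in the cluster):

```shell
# List ServiceMonitors created by the chart in the release namespace
kubectl get servicemonitors -n devguard
```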

The PostgreSQL ServiceMonitor is created whenever observability.serviceMonitor.enabled is true, regardless of whether tracing is enabled. It deploys a postgres-exporter sidecar and scrapes it on port 9187.


Grafana Dashboard (optional)

A Grafana dashboard ConfigMap for the span metrics can be deployed automatically:

observability:
  grafanaDashboard:
    enabled: true

Checking logs

For operational visibility without a full observability stack, check the API logs directly:

# Kubernetes
kubectl logs -f deployment/devguard-api-deployment -n devguard

# Docker Compose
docker logs -f devguard-api
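When looking for problems rather than tailing everything, it can help to filter the output; the log format and level keywords here are assumptions about how the API logs:

```shell
# Surface only warnings and errors from recent output
# (level keywords are an assumption about the log format)
kubectl logs deployment/devguard-api-deployment -n devguard --tail=500 \
  | grep -iE "warn|error"
```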