Getting Started with OpenTelemetry

Opinionated guide

This guide walks through one way to set up observability for Argo Workflows so you can see it working. It is not a reference architecture or a production recommendation.

It is neither security hardened nor kept up to date. Adapt the components and configuration to suit your environment.

Tracing is beta

Tracing is not considered finished and may change in incompatible ways in future minor releases. See Tracing for details.

This guide deploys an OpenTelemetry Collector, Grafana Tempo, Prometheus, and Grafana so you can see traces and metrics from Argo Workflows.

Prerequisites

  • A Kubernetes cluster with kubectl configured

Architecture

flowchart LR
    WC[workflow-controller] -- OTLP gRPC --> Collector[OTel Collector]
    AE[argoexec] -- OTLP gRPC --> Collector
    Collector -- OTLP HTTP --> Tempo
    Collector -- Prometheus Remote Write --> Prometheus
    Tempo --> Grafana
    Prometheus --> Grafana

The workflow-controller and argoexec send spans and metrics to an OpenTelemetry Collector over gRPC. The collector forwards traces to Tempo over OTLP HTTP and metrics to Prometheus via remote write. Grafana queries both Prometheus and Tempo.
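
As a rough illustration of the data flow described above, a collector configuration along the following lines would receive OTLP and fan out to Tempo and Prometheus. This is a sketch, not the configuration shipped in the quick-start manifest: the `tempo` and `prometheus` host names, the ports, and the output file name are assumptions, and the `prometheusremotewrite` exporter requires a collector distribution that includes it (such as the contrib distribution).

```shell
# Write an illustrative OpenTelemetry Collector config to a local file.
# Host names, ports, and the file name are assumptions, not values taken
# from the quick-start manifest.
cat > otel-collector-sketch.yaml <<'EOF'
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # workflow-controller and argoexec send here
      http:
        endpoint: 0.0.0.0:4318
exporters:
  otlphttp:
    endpoint: http://tempo:4318                       # traces to Tempo over OTLP HTTP
  prometheusremotewrite:
    endpoint: http://prometheus:9090/api/v1/write     # metrics via remote write
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlphttp]
    metrics:
      receivers: [otlp]
      exporters: [prometheusremotewrite]
EOF
```

Comparing this sketch with the collector ConfigMap in the manifest is a quick way to see which endpoints your own environment would need to change.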

Step 1: Deploy Argo Workflows with the Observability Stack

The telemetry quick-start manifest installs Argo Workflows together with an OpenTelemetry Collector, Tempo, Prometheus, and Grafana -- all configured to talk to each other:

kubectl create namespace argo
kubectl apply -n argo --server-side -f https://raw.githubusercontent.com/argoproj/argo-workflows/main/manifests/quick-start-telemetry.yaml

Wait for all pods to be ready:

kubectl wait -n argo --for=condition=Ready pod --all --timeout=120s

This single manifest includes:

  • Argo Workflows (controller, server, MinIO for artifacts)
  • OpenTelemetry Collector receiving OTLP gRPC/HTTP and forwarding to Tempo and Prometheus
  • Grafana Tempo for trace storage
  • Prometheus for metric storage (with remote write receiver enabled)
  • Grafana with Tempo and Prometheus data sources

The workflow-controller is already configured with OTEL_EXPORTER_OTLP_ENDPOINT pointing at the collector, and the executor ConfigMap includes OTEL environment variables so argoexec also sends traces.
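
To confirm that the controller really is pointed at the collector, you could inspect its environment. The helper below is a minimal sketch; it assumes the Deployment is named `workflow-controller`, and the function name `show_controller_otel_env` is made up for this example.

```shell
# Print the OTEL_* environment variables set on the workflow-controller.
# Assumes a Deployment named "workflow-controller"; the namespace defaults
# to "argo" and can be overridden with the first argument.
show_controller_otel_env() {
  kubectl get deployment workflow-controller -n "${1:-argo}" \
    -o jsonpath='{range .spec.template.spec.containers[*].env[*]}{.name}={.value}{"\n"}{end}' \
    | grep '^OTEL_'
}

# Usage: show_controller_otel_env argo
```

If the function prints nothing, either the Deployment has a different name in your install or the variables are injected some other way (for example through `envFrom`), which this sketch does not cover.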

Step 2: Access Grafana

Port-forward to the Grafana service:

kubectl port-forward svc/grafana -n argo 3000:3000

Open http://localhost:3000. Anonymous admin access is enabled -- no login required.

The Tempo and Prometheus data sources are already provisioned.
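
If the page does not load straight away, the port-forward may not be ready yet. A small polling helper such as the one below can check Grafana's `/api/health` endpoint from the command line; the function name `wait_for_grafana` and the default of ten attempts are arbitrary choices for this sketch.

```shell
# Poll Grafana's health endpoint until it responds or the attempts run out.
# Arguments: base URL (default http://localhost:3000), attempts (default 10).
wait_for_grafana() {
  local url="${1:-http://localhost:3000}"
  local attempts="${2:-10}"
  local i
  for i in $(seq 1 "$attempts"); do
    if curl -fsS "$url/api/health" >/dev/null 2>&1; then
      echo "Grafana is reachable at $url"
      return 0
    fi
    sleep 1
  done
  echo "Grafana did not respond at $url" >&2
  return 1
}

# Usage: wait_for_grafana http://localhost:3000 30
```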

Step 3: Run a Workflow and View Traces

Submit the DAG diamond example workflow:

argo submit -n argo --watch https://raw.githubusercontent.com/argoproj/argo-workflows/main/examples/dag-diamond.yaml

Once the workflow completes, find its traces in Grafana:

  1. Go to Explore
  2. Select the Tempo data source
  3. Choose the Search tab
  4. Under Service Name, select the workflow-controller service
  5. Click Run query to list recent traces
  6. Click a trace to open it

You should see a span hierarchy like:

  • workflow — the lifetime of the workflow
    • node (one per DAG node: A, B, C, D) — each node in the DAG
      • createWorkflowPod — pod creation
    • reconcileWorkflow — reconciliation loops

Each workflow pod also produces spans from argoexec: runInitContainer, runMainContainer, runWaitContainer, and their children. See Tracing for the full span reference.
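
The same search can be run without Grafana by calling Tempo's HTTP search API with a TraceQL query. The sketch below assumes Tempo's HTTP API has been made reachable on `localhost:3200`, for example with `kubectl port-forward svc/tempo -n argo 3200:3200`; the service name `tempo`, the port, and the function name `search_traces` are assumptions and may not match the quick-start manifest.

```shell
# List recent traces for a service through Tempo's HTTP search API.
# Arguments: service name (default workflow-controller),
#            Tempo base URL (default http://localhost:3200, an assumption).
search_traces() {
  local service="${1:-workflow-controller}"
  local base="${2:-http://localhost:3200}"
  curl -fsS -G "$base/api/search" \
    --data-urlencode "q={ resource.service.name = \"$service\" }" \
    --data-urlencode "limit=20"
}

# Usage: search_traces workflow-controller
```

The response is JSON; each entry carries a trace ID that can be pasted into the Tempo data source in Grafana to open the trace.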

Step 4: View Metrics

  1. Go to Explore
  2. Select the Prometheus data source
  3. Try these example PromQL queries:
# Workflows currently running
gauge{phase="Running"}

# Per-second rate of workflows entering the Error phase, averaged over 5 minutes
rate(total_count_total{phase="Error"}[5m])

# Operation duration (p95)
histogram_quantile(0.95, rate(operation_duration_seconds_bucket[5m]))

See Metrics for the full list of available metrics.
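
The example queries can also be run from a terminal through the Prometheus HTTP API. This is a minimal sketch that assumes Prometheus is reachable on `localhost:9090`, for example after `kubectl port-forward svc/prometheus -n argo 9090:9090`; the service name `prometheus`, the port, and the function name `promql` are assumptions.

```shell
# Run an instant PromQL query through the Prometheus HTTP API.
# Arguments: PromQL expression,
#            Prometheus base URL (default http://localhost:9090, an assumption).
promql() {
  local query="$1"
  local base="${2:-http://localhost:9090}"
  curl -fsS -G "$base/api/v1/query" --data-urlencode "query=$query"
}

# Usage: promql 'gauge{phase="Running"}'
```

Using `--data-urlencode` avoids having to escape the braces and quotes in the PromQL expression by hand.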

Cleanup

Remove all resources created in this guide:

kubectl delete namespace argo

Next Steps

