Getting Started with OpenTelemetry
Opinionated guide
This guide walks through one way to set up observability for Argo Workflows so that you can see it working. It is not a reference architecture or a production recommendation.
It is neither security hardened nor kept up to date. Adapt the components and configuration to suit your environment.
Tracing is beta
Tracing is not considered finished and may change in incompatible ways in future minor releases. See Tracing for details.
This guide deploys an OpenTelemetry Collector, Grafana Tempo, Prometheus, and Grafana so you can see traces and metrics from Argo Workflows.
Prerequisites
- A Kubernetes cluster with kubectl configured
- The argo CLI, used to submit the example workflow in Step 3
Architecture
flowchart LR
WC[workflow-controller] -- OTLP gRPC --> Collector[OTel Collector]
AE[argoexec] -- OTLP gRPC --> Collector
Collector -- OTLP HTTP --> Tempo
Collector -- Prometheus Remote Write --> Prometheus
Tempo --> Grafana
Prometheus --> Grafana
The workflow-controller and argoexec send spans and metrics to an OpenTelemetry Collector over gRPC. The collector forwards traces to Tempo over OTLP HTTP and metrics to Prometheus via remote write. Grafana queries both Prometheus and Tempo.
Step 1: Deploy Argo Workflows with the Observability Stack
The telemetry quick-start manifest installs Argo Workflows together with an OpenTelemetry Collector, Tempo, Prometheus, and Grafana -- all configured to talk to each other:
kubectl create namespace argo
kubectl apply -n argo --server-side -f https://raw.githubusercontent.com/argoproj/argo-workflows/main/manifests/quick-start-telemetry.yaml
Wait for all pods to be ready:
kubectl wait -n argo --for=condition=Ready pod --all --timeout=120s
This single manifest includes:
- Argo Workflows (controller, server, MinIO for artifacts)
- OpenTelemetry Collector receiving OTLP gRPC/HTTP and forwarding to Tempo and Prometheus
- Grafana Tempo for trace storage
- Prometheus for metric storage (with remote write receiver enabled)
- Grafana with Tempo and Prometheus data sources
The workflow-controller is already configured with OTEL_EXPORTER_OTLP_ENDPOINT pointing at the collector, and the executor ConfigMap includes OTEL environment variables so argoexec also sends traces.
Step 2: Access Grafana
Port-forward to the Grafana service:
kubectl port-forward svc/grafana -n argo 3000:3000
Open http://localhost:3000. Anonymous admin access is enabled -- no login required.
The Tempo and Prometheus data sources are already provisioned.
Step 3: Run a Workflow and View Traces
Submit the DAG diamond example workflow:
argo submit -n argo --watch https://raw.githubusercontent.com/argoproj/argo-workflows/main/examples/dag-diamond.yaml
Once the workflow completes, find its traces in Grafana:
- Go to Explore
- Select the Tempo data source
- Choose the Search tab
- Select Service Name: look for the workflow-controller service
- Click Run query to list recent traces
- Click a trace to open it
You should see a span hierarchy like:
- workflow: the lifetime of the workflow
  - node: one span per DAG node (A, B, C, D)
    - createWorkflowPod: pod creation
  - reconcileWorkflow: reconciliation loops
Each workflow pod also produces spans from argoexec: runInitContainer, runMainContainer, runWaitContainer, and their children.
See Tracing for the full span reference.
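To make the parent/child structure concrete, here is a minimal Python sketch of how a trace viewer could rebuild such a hierarchy from a flat list of spans, where each span carries the ID of its parent. Only the span names come from this guide; the span IDs and the parent links are illustrative assumptions, not output captured from Argo Workflows.

```python
from collections import defaultdict

# Illustrative spans as (span_id, parent_id, name).
# ASSUMPTION: the IDs and parent links are made up for this example.
spans = [
    ("1", None, "workflow"),
    ("2", "1", "node"),
    ("3", "2", "createWorkflowPod"),
    ("4", "1", "reconcileWorkflow"),
]


def render_tree(spans):
    """Return indented span names, one line per span, parents before children."""
    # Group spans by the ID of their parent; root spans have parent_id None.
    children = defaultdict(list)
    for span_id, parent_id, name in spans:
        children[parent_id].append((span_id, name))

    lines = []

    def walk(parent_id, depth):
        # Emit each child of parent_id, then recurse into its own children.
        for span_id, name in children.get(parent_id, []):
            lines.append("  " * depth + name)
            walk(span_id, depth + 1)

    walk(None, 0)
    return lines


print("\n".join(render_tree(spans)))
```

The sketch treats any span without a parent as a root. Real traces can also contain spans whose parent was emitted by another service, which this simplified version does not handle.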
Step 4: View Metrics
- Go to Explore
- Select the Prometheus data source
- Try these example PromQL queries:
# Workflows currently running
gauge{phase="Running"}
# Per-second rate of workflows entering the Error phase over the last 5 minutes
rate(total_count_total{phase="Error"}[5m])
# Operation duration (p95)
histogram_quantile(0.95, rate(operation_duration_seconds_bucket[5m]))
See Metrics for the full list of available metrics.
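For intuition about what the p95 query computes, the following Python sketch approximates the idea behind histogram_quantile: find the bucket that contains the target rank, then interpolate linearly inside it. It is a simplification rather than Prometheus's actual implementation, and the bucket bounds and counts are made-up values, not real operation_duration_seconds data.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets is a list of (upper_bound, cumulative_count) pairs sorted by
    upper_bound, ending with a math.inf bucket that holds the total count.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in the range (0, 1]")
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        raise ValueError("need at least two buckets, the last one being +Inf")

    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations, so no meaningful quantile

    rank = q * total  # how many observations lie at or below the quantile
    prev_bound, prev_count = 0.0, 0.0  # lowest bucket is assumed to start at 0
    for upper, count in buckets:
        if count >= rank:
            if upper == math.inf:
                # The rank falls in the open-ended bucket: report the
                # highest finite bound instead of extrapolating.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (upper - prev_bound) * fraction
        prev_bound, prev_count = upper, count
    return prev_bound


# ASSUMPTION: hypothetical bucket bounds (seconds) and cumulative counts.
example_buckets = [(0.1, 40), (0.5, 70), (1.0, 90), (2.5, 98), (math.inf, 100)]

print(histogram_quantile(0.95, example_buckets))
```

With these made-up buckets the estimate is about 1.94 seconds: the 95th-percentile rank (95 of 100 observations) lands in the bucket covering 1.0 to 2.5 seconds, and the result is interpolated within that range. The accuracy of such an estimate depends on how finely the bucket boundaries are chosen.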
Cleanup
Remove all resources created in this guide:
kubectl delete namespace argo
Next Steps
- Telemetry — overview of all telemetry signals
- Tracing — full span reference
- Metrics — available metrics and custom metrics
- Telemetry Configuration — environment variables and ConfigMap options
- Workflow Telemetry — custom metrics defined in workflows
- OpenTelemetry Collector docs
- OpenTelemetry Operator — alternative collector deployment with auto-instrumentation