
Tolerating Pod Deletion

v2.12 and after

In Kubernetes, pods are cattle and can be deleted at any time. A pod might be deleted manually via kubectl delete pod, during a node drain, or for other reasons.

This can be very inconvenient: your workflow will error, but for reasons outside of your control.

A pod disruption budget can reduce the likelihood of this happening, but it cannot entirely prevent it.
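
As a rough sketch, a budget can be requested in the workflow spec itself through the podDisruptionBudget field. The workflow name and the minAvailable value below are illustrative assumptions, not recommendations. Note that a budget only guards against voluntary evictions such as a node drain, so a direct kubectl delete pod still bypasses it.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: example-pdb        # hypothetical name, for illustration only
spec:
  entrypoint: main
  # Illustrative value: a number larger than the workflow's pod count
  # means no pod of this workflow may be voluntarily evicted.
  podDisruptionBudget:
    minAvailable: 9999
  templates:
    - name: main
      container:
        image: docker/whalesay:latest
        command:
          - sleep
          - 30s
```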

To retry pods that were deleted, set retryStrategy.retryPolicy: OnError.

This can be set at the workflow level, at the template level, or globally (using workflow defaults).
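
The two placements other than the workflow level might look roughly like the sketches below. Both are fragments under stated assumptions: the template name main and the limit of 1 are illustrative, and the ConfigMap is assumed to be the controller's workflow-controller-configmap in your installation's namespace.

At the template level, retryStrategy sits on the template instead of on spec:

```yaml
spec:
  entrypoint: main
  templates:
    - name: main
      # Applies only to this template, not to the whole workflow.
      retryStrategy:
        retryPolicy: OnError
        limit: 1
      container:
        image: docker/whalesay:latest
        command:
          - sleep
          - 30s
```

Globally, the same strategy can be placed under workflowDefaults in the controller ConfigMap:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap   # assumed default name
data:
  # Merged into every workflow that does not set its own values.
  workflowDefaults: |
    spec:
      retryStrategy:
        retryPolicy: OnError
        limit: 1
```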

Example

Run the following workflow (which will sleep for 30s):

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: example
spec:
  retryStrategy:
    retryPolicy: OnError
    limit: 1
  entrypoint: main
  templates:
    - name: main
      container:
        image: docker/whalesay:latest
        command:
          - sleep
          - 30s

Then, while the pod is still running, execute kubectl delete pod example. You'll see that the errored node is automatically retried.

💡 Read more on architecting workflows for reliability.
