Tolerating Pod Deletion

v2.12 and after

In Kubernetes, pods are cattle and can be deleted at any time. A pod might be deleted manually via kubectl delete pod, during a node drain, or for other reasons.

This can be very inconvenient: your workflow will error for reasons outside of your control.

A pod disruption budget can reduce the likelihood of this happening, but it cannot entirely prevent it.
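
As a rough sketch of what such a budget could look like, the fragment below sets podDisruptionBudget on the workflow spec. The workflow name and the minAvailable value are illustrative assumptions, not recommendations. A budget of this kind only guards against voluntary evictions such as a node drain; a direct kubectl delete pod is not blocked by it.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: example-with-pdb        # hypothetical name, for illustration only
spec:
  entrypoint: main
  # Assumed value: a minAvailable higher than the number of pods the
  # workflow runs is intended to block voluntary evictions while it runs.
  podDisruptionBudget:
    minAvailable: 9999
  templates:
    - name: main
      container:
        image: busybox
        command:
          - sleep
          - 30s
```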

To retry pods that were deleted, set retryStrategy.retryPolicy: OnError.

This can be set at the workflow level, at the template level, or globally (using workflow defaults).
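
For the template level, a minimal sketch might look like the following, assuming you only want the main template to be retried. The workflow name is hypothetical.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: example-template-retry  # hypothetical name, for illustration only
spec:
  entrypoint: main
  templates:
    - name: main
      # retryStrategy set here applies to this template only
      retryStrategy:
        retryPolicy: OnError
        limit: 1
      container:
        image: busybox
        command:
          - sleep
          - 30s
```

For the global case, one possible shape is a workflowDefaults entry in the workflow controller's ConfigMap. This sketch assumes the ConfigMap carries the default name workflow-controller-configmap; adjust the name and namespace to match your installation.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap   # assumed default name
data:
  # Defaults merged into every Workflow handled by this controller,
  # unless the Workflow overrides them.
  workflowDefaults: |
    spec:
      retryStrategy:
        retryPolicy: OnError
        limit: 1
```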

Example

Run the following workflow (which will sleep for 30s):

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: example
spec:
  retryStrategy:
    retryPolicy: OnError
    limit: 1
  entrypoint: main
  templates:
    - name: main
      container:
        image: busybox
        command:
          - sleep
          - 30s

Then, while the pod is still running, execute kubectl delete pod example. You'll see that the errored node is automatically retried.

💡 Read more on architecting workflows for reliability.
