Tolerating Pod Deletion
v2.12 and after
In Kubernetes, pods are cattle and can be deleted at any time. A pod might be deleted manually via kubectl delete pod, during a node drain, or for other reasons.
This can be very inconvenient: your workflow will error for reasons outside of your control.
A pod disruption budget can reduce the likelihood of this happening, but it cannot entirely prevent it.
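As a rough sketch of the pod disruption budget option, the workflow below asks the controller to create a budget for its pods through the spec.podDisruptionBudget field. The workflow name and the minAvailable value are illustrative assumptions, and the example assumes your Argo version supports this field. Note that a budget only limits voluntary evictions such as a node drain; a direct kubectl delete pod bypasses it.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: pdb-example-   # hypothetical name, for illustration only
spec:
  entrypoint: main
  # Illustrative value: a minAvailable larger than the number of pods
  # means no pod of this workflow may be voluntarily evicted.
  podDisruptionBudget:
    minAvailable: 9999
  templates:
    - name: main
      container:
        image: busybox
        command:
          - sleep
          - 30s
```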
To retry pods that were deleted, set retryStrategy.retryPolicy: OnError.
This can be set at the workflow level, at the template level, or globally (using workflow defaults).
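The two alternatives to a workflow-level setting might look like the following sketches. In the first, the retry strategy is attached to a single template, so only that template's pods are retried; the workflow name and the limit value are illustrative assumptions.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: template-retry-   # hypothetical name, for illustration only
spec:
  entrypoint: main
  templates:
    - name: main
      # Applies to this template only, not to the whole workflow.
      retryStrategy:
        retryPolicy: OnError
        limit: 1
      container:
        image: busybox
        command:
          - sleep
          - 30s
```

In the second, the same strategy is placed under workflowDefaults in the workflow controller ConfigMap so that it applies to workflows that do not set their own. This assumes the controller reads the default ConfigMap name, workflow-controller-configmap, in the namespace where it is installed.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
data:
  # Default values merged into every submitted workflow.
  workflowDefaults: |
    spec:
      retryStrategy:
        retryPolicy: OnError
        limit: 1
```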
Example
Run the following workflow (which will sleep for 30s):
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: example
spec:
  retryStrategy:
    retryPolicy: OnError
    limit: 1
  entrypoint: main
  templates:
    - name: main
      container:
        image: busybox
        command:
          - sleep
          - 30s
Then, while the pod is still running, execute kubectl delete pod example. You'll see that the errored node is automatically retried.
💡 Read more on architecting workflows for reliability.