Artifacts
Note: You will need to configure an artifact repository to run this example.
When running workflows, it is very common to have steps that generate or consume artifacts. Often, the output artifacts of one step may be used as input artifacts to a subsequent step.
The workflow spec below consists of two steps that run in sequence. The first step, named generate-artifact, generates an artifact using the hello-world-to-file template. The second step, named print-message-from-file, consumes that artifact.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-passing-
spec:
  entrypoint: artifact-example
  templates:
    - name: artifact-example
      steps:
        - - name: generate-artifact
            template: hello-world-to-file
        - - name: consume-artifact
            template: print-message-from-file
            arguments:
              artifacts:
                # bind message to the hello-art artifact
                # generated by the generate-artifact step
                - name: message
                  from: "{{steps.generate-artifact.outputs.artifacts.hello-art}}"

    - name: hello-world-to-file
      container:
        image: busybox
        command: [sh, -c]
        args: ["echo hello world | tee /tmp/hello_world.txt"]
      outputs:
        artifacts:
          # generate hello-art artifact from /tmp/hello_world.txt
          # artifacts can be directories as well as files
          - name: hello-art
            path: /tmp/hello_world.txt

    - name: print-message-from-file
      inputs:
        artifacts:
          # unpack the message input artifact
          # and put it at /tmp/message
          - name: message
            path: /tmp/message
      container:
        image: alpine:latest
        command: [sh, -c]
        args: ["cat /tmp/message"]
The hello-world-to-file template uses the echo command to generate a file named /tmp/hello_world.txt. It then outputs this file as an artifact named hello-art. In general, the artifact's path may be a directory rather than just a file. The print-message-from-file template takes an input artifact named message, unpacks it at the path /tmp/message, and then prints the contents of /tmp/message using the cat command.
The artifact-example template passes the hello-art artifact, generated as an output of the generate-artifact step, as the message input artifact to the print-message-from-file step. DAG templates use the tasks prefix to refer to another task, for example {{tasks.generate-artifact.outputs.artifacts.hello-art}}.
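As an illustrative sketch (not part of the original example), the same artifact hand-off written as a DAG template might look like the following, reusing the hello-world-to-file and print-message-from-file templates and the tasks prefix:

```yaml
# Illustrative DAG variant of the steps example above.
# Note the tasks prefix in "from", instead of the steps prefix.
- name: artifact-example-dag
  dag:
    tasks:
      - name: generate-artifact
        template: hello-world-to-file
      - name: consume-artifact
        template: print-message-from-file
        dependencies: [generate-artifact]
        arguments:
          artifacts:
            - name: message
              from: "{{tasks.generate-artifact.outputs.artifacts.hello-art}}"
```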
Optionally, for large artifacts, you can set podSpecPatch in the workflow spec to increase the resource requests of the init container and avoid out-of-memory issues.
<... snipped ...>
  - name: large-artifact
    # below patch gets merged with the actual pod spec and increases the memory
    # request of the init container.
    podSpecPatch: |
      initContainers:
        - name: init
          resources:
            requests:
              memory: 2Gi
              cpu: 300m
    inputs:
      artifacts:
        - name: data
          path: /tmp/large-file
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["cat /tmp/large-file"]
<... snipped ...>
Artifacts are packaged as tarballs and gzipped by default. You can customize this behavior by specifying an archive strategy using the archive field. For example:
<... snipped ...>
    outputs:
      artifacts:
        # default behavior - tar+gzip default compression.
        - name: hello-art-1
          path: /tmp/hello_world.txt

        # disable archiving entirely - upload the file / directory as is.
        # this is useful when the container layout matches the desired target repository layout.
        - name: hello-art-2
          path: /tmp/hello_world.txt
          archive:
            none: {}

        # customize the compression behavior (disabling it here).
        # this is useful for files with varying compression benefits,
        # e.g. disabling compression for a cached build workspace and large binaries,
        # or increasing compression for "perfect" textual data - like a json/xml export of a large database.
        - name: hello-art-3
          path: /tmp/hello_world.txt
          archive:
            tar:
              # no compression (also accepts the standard gzip 1 to 9 values)
              compressionLevel: 0
<... snipped ...>
Artifact Garbage Collection
As of version 3.4, you can configure your Workflow to automatically delete Artifacts that you no longer need (see the artifact repository capabilities for the currently supported storage engines).
Artifacts can be deleted OnWorkflowCompletion or OnWorkflowDeletion. You can specify your garbage collection strategy at both the Workflow level and the Artifact level, so, for example, you may have temporary artifacts that can be deleted right away but a final output that should be persisted:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-gc-
spec:
  entrypoint: main
  artifactGC:
    strategy: OnWorkflowDeletion # default Strategy set here applies to all Artifacts by default
  templates:
    - name: main
      container:
        image: argoproj/argosay:v2
        command:
          - sh
          - -c
        args:
          - |
            echo "can throw this away" > /tmp/temporary-artifact.txt
            echo "keep this" > /tmp/keep-this.txt
      outputs:
        artifacts:
          - name: temporary-artifact
            path: /tmp/temporary-artifact.txt
            s3:
              key: temporary-artifact.txt
          - name: keep-this
            path: /tmp/keep-this.txt
            s3:
              key: keep-this.txt
            artifactGC:
              strategy: Never # optional override for an Artifact
Artifact Naming
Consider parameterizing your S3 keys by {{workflow.uid}}, etc. (as shown in the example below) if there's a possibility that you could have concurrent Workflows of the same spec. This avoids a scenario in which the artifact from one Workflow is being deleted while the same S3 key is being generated for a different Workflow.
Service Accounts and Annotations
Does your S3 bucket require you to run with a special Service Account or IAM Role annotation? You can either reuse the ones you use for creating artifacts or create new ones that are specific to the deletion permission. Most users will have a single Service Account or IAM Role applied to all artifacts for the Workflow, but you can also customize at the artifact level if you need that:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-gc-
spec:
  entrypoint: main
  artifactGC:
    strategy: OnWorkflowDeletion
    ##############################################################################################
    # Workflow Level Service Account and Metadata
    ##############################################################################################
    serviceAccountName: my-sa
    podMetadata:
      annotations:
        eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/my-iam-role
  templates:
    - name: main
      container:
        image: argoproj/argosay:v2
        command:
          - sh
          - -c
        args:
          - |
            echo "can throw this away" > /tmp/temporary-artifact.txt
            echo "keep this" > /tmp/keep-this.txt
      outputs:
        artifacts:
          - name: temporary-artifact
            path: /tmp/temporary-artifact.txt
            s3:
              key: temporary-artifact-{{workflow.uid}}.txt
            artifactGC:
              ####################################################################################
              # Optional override capability
              ####################################################################################
              serviceAccountName: artifact-specific-sa
              podMetadata:
                annotations:
                  eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/artifact-specific-iam-role
          - name: keep-this
            path: /tmp/keep-this.txt
            s3:
              key: keep-this-{{workflow.uid}}.txt
            artifactGC:
              strategy: Never
If you do supply your own Service Account, you will need to create a RoleBinding that binds it with a role like this:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  annotations:
    workflows.argoproj.io/description: |
      This is the minimum recommended permissions needed if you want to use artifact GC.
  name: artifactgc
rules:
  - apiGroups:
      - argoproj.io
    resources:
      - workflowartifactgctasks
    verbs:
      - list
      - watch
  - apiGroups:
      - argoproj.io
    resources:
      - workflowartifactgctasks/status
    verbs:
      - patch
This is the artifactgc Role if you installed using one of the quick-start manifest files. If you installed with the install.yaml file for the release, then the same permissions are in the argo-cluster-role.

If you don't use your own ServiceAccount and are just using the default ServiceAccount, then the role needs a RoleBinding or ClusterRoleBinding to the default ServiceAccount.
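As an illustrative sketch, a RoleBinding that grants the artifactgc Role above to the default ServiceAccount might look like this (the binding name and namespace are assumptions; adjust them to your installation):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: artifactgc-default  # illustrative name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: artifactgc
subjects:
  - kind: ServiceAccount
    name: default
    namespace: argo  # assumption: the namespace where your Workflows run
```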
What happens if Garbage Collection fails?
If deletion of the artifact fails for some reason (other than the Artifact already having been deleted, which is not considered a failure), the Workflow's Status will be marked with a new Condition to indicate "Artifact GC Failure", a Kubernetes Event will be issued, and the Argo Server UI will also indicate the failure. For additional debugging, find one or more Pods named <wfName>-artgc-* and view their logs.
If the user needs to delete the Workflow and its child CRD objects, they will need to patch the Workflow to remove the finalizer preventing the deletion:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
finalizers:
- workflows.argoproj.io/artifact-gc
The finalizer can be removed by running:

kubectl patch workflow my-wf \
    --type json \
    --patch='[ { "op": "remove", "path": "/metadata/finalizers" } ]'

Alternatively, for simplicity, use the Argo CLI argo delete command with the --force flag, which under the hood removes the finalizer before performing the deletion.
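For example, assuming a Workflow named my-wf as in the kubectl command above:

```shell
# deletes the Workflow, removing the artifact GC finalizer first
argo delete my-wf --force
```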
Release Versions >= 3.5
A flag has been added to the Workflow Spec called forceFinalizerRemoval (see here) to force the finalizer's removal even if Artifact GC fails:
spec:
  artifactGC:
    strategy: OnWorkflowDeletion
    forceFinalizerRemoval: true