
Artifacts

Note

You will need to configure an artifact repository to run this example.

When running workflows, it is very common to have steps that generate or consume artifacts. Often, the output artifacts of one step may be used as input artifacts to a subsequent step.

The workflow spec below consists of two steps that run in sequence. The first step, named generate-artifact, uses the whalesay template to generate an artifact. The second step, named consume-artifact, uses the print-message template to consume that artifact.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-passing-
spec:
  entrypoint: artifact-example
  templates:
  - name: artifact-example
    steps:
    - - name: generate-artifact
        template: whalesay
    - - name: consume-artifact
        template: print-message
        arguments:
          artifacts:
          # bind message to the hello-art artifact
          # generated by the generate-artifact step
          - name: message
            from: "{{steps.generate-artifact.outputs.artifacts.hello-art}}"

  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [sh, -c]
      args: ["cowsay hello world | tee /tmp/hello_world.txt"]
    outputs:
      artifacts:
      # generate hello-art artifact from /tmp/hello_world.txt
      # artifacts can be directories as well as files
      - name: hello-art
        path: /tmp/hello_world.txt

  - name: print-message
    inputs:
      artifacts:
      # unpack the message input artifact
      # and put it at /tmp/message
      - name: message
        path: /tmp/message
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["cat /tmp/message"]

The whalesay template uses the cowsay command to generate a file named /tmp/hello_world.txt. It then outputs this file as an artifact named hello-art. In general, the artifact's path may be a directory rather than just a file. The print-message template takes an input artifact named message, unpacks it at the path /tmp/message, and then prints the contents of /tmp/message using the cat command. The artifact-example template passes the hello-art artifact generated as an output of the generate-artifact step as the message input artifact to the consume-artifact step, which runs the print-message template.

DAG templates use the tasks prefix to refer to another task, for example {{tasks.generate-artifact.outputs.artifacts.hello-art}}.
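As a rough illustration of the tasks prefix, the artifact-example template above could be rewritten as a DAG along the following lines. This is a sketch rather than part of the original example: it assumes the whalesay and print-message templates are defined exactly as shown earlier, and it adds a dependencies entry so that consume-artifact runs only after generate-artifact has finished.

```yaml
  - name: artifact-example
    dag:
      tasks:
      - name: generate-artifact
        template: whalesay
      - name: consume-artifact
        template: print-message
        # run only after the producing task has completed
        dependencies: [generate-artifact]
        arguments:
          artifacts:
          # same binding as in the steps version, but with the tasks prefix
          - name: message
            from: "{{tasks.generate-artifact.outputs.artifacts.hello-art}}"
```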

Optionally, for large artifacts, you can set podSpecPatch in the workflow spec to increase the resource requests of the init container and avoid out-of-memory issues.

<... snipped ...>
  - name: large-artifact
    # the patch below is merged with the actual pod spec and increases the
    # memory request of the init container.
    podSpecPatch: |
      initContainers:
        - name: init
          resources:
            requests:
              memory: 2Gi
              cpu: 300m
    inputs:
      artifacts:
      - name: data
        path: /tmp/large-file
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["cat /tmp/large-file"]
<... snipped ...>

Artifacts are packaged as tarballs and gzipped by default. You can customize this behavior by specifying an archive strategy in the archive field. For example:

<... snipped ...>
    outputs:
      artifacts:
        # default behavior - tar+gzip default compression.
      - name: hello-art-1
        path: /tmp/hello_world.txt

        # disable archiving entirely - upload the file / directory as is.
        # this is useful when the container layout matches the desired target repository layout.   
      - name: hello-art-2
        path: /tmp/hello_world.txt
        archive:
          none: {}

        # customize the compression behavior (disabling it here).
        # this is useful for files with varying compression benefits, 
        # e.g. disabling compression for a cached build workspace and large binaries, 
        # or increasing compression for "perfect" textual data - like a json/xml export of a large database.
      - name: hello-art-3
        path: /tmp/hello_world.txt
        archive:
          tar:
            # no compression (also accepts the standard gzip 1 to 9 values)
            compressionLevel: 0
<... snipped ...>
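Because an artifact path may be a directory, the none strategy can also be applied to a whole directory. The following is a minimal sketch under assumed names: the write-report template, the /tmp/report directory, and the report-dir artifact are all hypothetical and do not appear in the examples above.

```yaml
  - name: write-report          # hypothetical template name
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["mkdir -p /tmp/report && echo summary > /tmp/report/summary.txt && echo details > /tmp/report/details.txt"]
    outputs:
      artifacts:
      # the path is a directory; with archive none it is uploaded as is,
      # without being packed into a tarball
      - name: report-dir        # hypothetical artifact name
        path: /tmp/report
        archive:
          none: {}
```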

Artifact Garbage Collection

As of version 3.4, you can configure your Workflow to automatically delete artifacts that you no longer need (see the artifact repository capability documentation for the storage engines currently supported).

Artifacts can be deleted OnWorkflowCompletion or OnWorkflowDeletion. You can specify your garbage collection strategy at both the Workflow level and the artifact level. For example, you may have temporary artifacts that can be deleted right away but a final output that should be persisted:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-gc-
spec:
  entrypoint: main
  artifactGC:
    strategy: OnWorkflowDeletion  # default strategy; applies to every artifact unless overridden
  templates:
    - name: main
      container:
        image: argoproj/argosay:v2
        command:
          - sh
          - -c
        args:
          - |
            echo "can throw this away" > /tmp/temporary-artifact.txt
            echo "keep this" > /tmp/keep-this.txt
      outputs:
        artifacts:
          - name: temporary-artifact
            path: /tmp/temporary-artifact.txt
            s3:
              key: temporary-artifact.txt
          - name: keep-this
            path: /tmp/keep-this.txt
            s3:
              key: keep-this.txt
            artifactGC:
              strategy: Never   # optional override for an Artifact
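The example above only uses OnWorkflowDeletion and Never. As a sketch of the other strategy mentioned earlier, an individual artifact could be marked for deletion as soon as the Workflow completes. The workflow name prefix, the scratch artifact, its file path, and its S3 key are hypothetical names chosen for illustration.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-gc-completion-   # hypothetical name prefix
spec:
  entrypoint: main
  artifactGC:
    strategy: OnWorkflowDeletion          # workflow-level default
  templates:
    - name: main
      container:
        image: argoproj/argosay:v2
        command: [sh, -c]
        args: ["echo 'scratch data' > /tmp/scratch.txt"]
      outputs:
        artifacts:
          - name: scratch                 # hypothetical temporary artifact
            path: /tmp/scratch.txt
            s3:
              key: scratch-{{workflow.uid}}.txt
            artifactGC:
              # override: delete once the Workflow completes,
              # rather than waiting for the Workflow to be deleted
              strategy: OnWorkflowCompletion
```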

Artifact Naming

Consider parameterizing your S3 keys with {{workflow.uid}} or a similar variable (as the keys in the Service Accounts and Annotations example below do) if there is a possibility that concurrent Workflows could run from the same spec. This avoids a scenario in which the artifact from one Workflow is being deleted while the same S3 key is being generated for a different Workflow.

Service Accounts and Annotations

Does your S3 bucket require you to run with a special Service Account or IAM Role annotation? You can either reuse the ones you use for creating artifacts or create new ones that carry only deletion permissions. Most users will have a single Service Account or IAM Role that applies to all artifacts in the Workflow, but you can also override these at the artifact level if needed:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-gc-
spec:
  entrypoint: main
  artifactGC:
    strategy: OnWorkflowDeletion 
    ##############################################################################################
    #    Workflow Level Service Account and Metadata
    ##############################################################################################
    serviceAccountName: my-sa
    podMetadata:
      annotations:
        eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/my-iam-role
  templates:
    - name: main
      container:
        image: argoproj/argosay:v2
        command:
          - sh
          - -c
        args:
          - |
            echo "can throw this away" > /tmp/temporary-artifact.txt
            echo "keep this" > /tmp/keep-this.txt
      outputs:
        artifacts:
          - name: temporary-artifact
            path: /tmp/temporary-artifact.txt
            s3:
              key: temporary-artifact-{{workflow.uid}}.txt
            artifactGC:
              ####################################################################################
              #    Optional override capability
              ####################################################################################
              serviceAccountName: artifact-specific-sa
              podMetadata:
                annotations:
                  eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/artifact-specific-iam-role
          - name: keep-this
            path: /tmp/keep-this.txt
            s3:
              key: keep-this-{{workflow.uid}}.txt
            artifactGC:
              strategy: Never

If you do supply your own Service Account you will need to create a RoleBinding that binds it with a role like this:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  annotations:
    workflows.argoproj.io/description: |
      These are the minimum recommended permissions needed if you want to use artifact GC.
  name: artifactgc
rules:
- apiGroups:
  - argoproj.io
  resources:
  - workflowartifactgctasks
  verbs:
  - list
  - watch
- apiGroups:
  - argoproj.io
  resources:
  - workflowartifactgctasks/status
  verbs:
  - patch

This is the artifactgc role if you installed using one of the quick-start manifest files. If you installed with the install.yaml file for the release then the same permissions are in the argo-cluster-role.

If you don't supply your own ServiceAccount and rely on the default ServiceAccount instead, the role needs a RoleBinding or ClusterRoleBinding to the default ServiceAccount.
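A RoleBinding for the artifactgc role could look roughly like the sketch below. The binding name and the argo namespace are assumptions made for illustration; my-sa is the Service Account from the earlier example. To bind the default ServiceAccount instead, replace my-sa with default in the subjects entry.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: artifactgc-my-sa      # hypothetical binding name
  namespace: argo             # assumption: the namespace your Workflows run in
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: artifactgc            # the role shown above
subjects:
- kind: ServiceAccount
  name: my-sa                 # or "default" if you do not supply your own
  namespace: argo             # assumption: same namespace as the binding
```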

What happens if Garbage Collection fails?

If deletion of an artifact fails for some reason (other than the artifact already having been deleted, which is not considered a failure), the Workflow's Status will be marked with a new Condition to indicate "Artifact GC Failure", a Kubernetes Event will be issued, and the Argo Server UI will also indicate the failure. For additional debugging, the user should find one or more Pods named <wfName>-artgc-* and can view their logs.

If the user needs to delete the Workflow and its child CRD objects, they will need to patch the Workflow to remove the finalizer that prevents the deletion:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  finalizers:
  - workflows.argoproj.io/artifact-gc

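The finalizer can be removed with the following command. Note that this patch removes the entire finalizers list, not only the artifact GC entry: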

kubectl patch workflow my-wf \
    --type json \
    --patch='[ { "op": "remove", "path": "/metadata/finalizers" } ]'

Alternatively, for simplicity, use the Argo CLI argo delete command with the --force flag, which removes the finalizer under the hood before performing the deletion.

Release Versions >= 3.5

A field called forceFinalizerRemoval has been added under artifactGC in the Workflow spec to force the finalizer's removal even if Artifact GC fails:

spec:
  artifactGC:
    strategy: OnWorkflowDeletion 
    forceFinalizerRemoval: true
