Skip to main content

Fusion Snapshots for AWS Batch

info

Fusion Snapshots is in private preview. Contact Seqera to request access to this feature.

Fusion Snapshots enable you to create a checkpoint of a running Nextflow process and restore it on a different machine. Leveraging the Fusion file system, you can move the task between different instances connected to the same S3 bucket.

More specifically, the first use case for this feature is for Seqera Platform users to leverage AWS Spot instances without restarting an entire task from scratch if an instance is terminated. When a Spot instance interruption occurs, the task is restored from the last checkpoint on a new AWS instance, saving time and computational resources.

Seqera Platform compute environment requirements

Fusion Snapshots v1.0.0 requires the following Seqera compute environment configuration:

  • Provider: AWS Batch
  • Pipeline work directory: An S3 bucket located in the same region as your AWS Batch compute resources
  • Enable Wave containers
  • Enable Fusion v2
  • Enable fast instance storage
  • Config mode: Batch Forge
  • Provisioning model: Spot
  • Nextflow config: Set AWS Batch max spot attempts and the custom Fusion container URL received from Seqera shown below
  • Instance types: See recommended instance sizes below
  • AMI ID: Amazon Linux 2023 ECS Optimized

Enable Snapshots via Nextflow config

note

Fusion Snapshots is a private preview feature that requires setting a custom fusion.containerConfigUrl. Contact Seqera to request access to this feature.

To use Fusion Snapshots in a Seqera AWS Batch compute environment, set the following Nextflow config values in the Nextflow config field under Staging options:

aws.batch.maxSpotAttempts= 5
fusion.containerConfigUrl = '<CUSTOM_CONTAINER_URL>'

maxSpotAttempts must be a value higher than 0.

EC2 instance selection guidelines

  • Choose EC2 Spot instances with sufficient memory and network bandwidth to dump the cache of task intermediate files to S3 storage before AWS terminates an instance.
  • Select instances with guaranteed network bandwidth (not instances with bandwidth "up to" a maximum value).
  • Maintain a 5:1 ratio between memory (GiB) and network bandwidth (Gbps).
  • Recommended instance families: c6id, r6id, or m6id series instances work optimally with Fusion fast instance storage.
Example

A c6id.8xlarge instance provides 64 GiB memory and 12.5 Gbps guaranteed network bandwidth. This configuration can transfer the entire memory contents to S3 in approximately 70 seconds, well within the 2-minute reclamation window.

Instances with memory:bandwitdth ratios over 5:1 may not complete transfers before termination, potentially resulting in task failures.

Instance typeMemory (GiB)Network bandwidth (Gbps)Memory:bandwidth ratioEst. Snapshot time
c6id.4xlarge3212.52.56:1~45 seconds
c6id.8xlarge6412.55.12:1~70 seconds
r6id.2xlarge6412.55.12:1~70 seconds
m6id.4xlarge6412.55.12:1~70 seconds
c6id.12xlarge9618.755.12:1~70 seconds
r6id.4xlarge12812.510.24:1~105 seconds
m6id.8xlarge128255.12:1~70 seconds

(Seqera Enterprise only) Select an Amazon Linux 2023 ECS-optimized AMI

To obtain sufficient performance, Fusion Snapshots require instances with Amazon Linux 2023 (which ships with Linux Kernel 6.1), with an ECS Container-optimized AMI.

note

Selecting a custom Amazon Linux 2023 ECS-optimized AMI is only required for compute environments in Seqera Enterprise deployments. Seqera Cloud AWS Batch compute environments use Amazon Linux 2023 AMIs by default.

To find the recommended AL2023 ECS-optimized AMI for your region, run the following (replace eu-central-1 with your AWS region):

export REGION=eu-central-1
aws ssm get-parameter --name "/aws/service/ecs/optimized-ami/amazon-linux-2023/recommended" --region $REGION

The result for the eu-central-1 region is similar to the following:

{
"Parameter": {
"Name": "/aws/service/ecs/optimized-ami/amazon-linux-2023/recommended",
"Type": "String",
"Value": "{\"ecs_agent_version\":\"1.88.0\",\"ecs_runtime_version\":\"Docker version 25.0.6\",\"image_id\":\"ami-0281c9a5cd9de63bd\",\"image_name\":\"al2023-ami-ecs-hvm-2023.0.20241115-kernel-6.1-x86_64\",\"image_version\":\"2023.0.20241115\",\"os\":\"Amazon Linux 2023\",\"schema_version\":1,\"source_image_name\":\"al2023-ami-minimal-2023.6.20241111.0-kernel-6.1-x86_64\"}",
"Version": 61,
"LastModifiedDate": "2024-11-18T17:08:46.926000+01:00",
"ARN": "arn:aws:ssm:eu-central-1::parameter/aws/service/ecs/optimized-ami/amazon-linux-2023/recommended",
"DataType": "text"
}

Note the image_id in your result (in this example, ami-0281c9a5cd9de63bd). Specify this ID in the AMI ID field under Advanced options when you create your Seqera compute environment.