Azure Batch

Fusion simplifies and improves the efficiency of Nextflow pipelines in Azure Batch in several ways:

- No need to use the Azure CLI tool to copy data to and from Azure Blob Storage.
- No need to install the Azure CLI tool on the node machine.
- Replacing the Azure CLI with a native API client makes data transfer much more robust at scale.
- By streaming relevant data and monitoring the virtual machine storage, Fusion can work with more data than the attached storage drive can hold.

Platform Azure Batch compute environments

Seqera Platform supports Fusion in Batch Forge and manual Azure Batch compute environments.

See Azure Batch for compute and storage recommendations and instructions to enable Fusion.

Nextflow CLI

Tip

For large and long-lived production pipelines, we recommend selecting machine types with a local temp storage disk of at least 200 GB and a random read speed of 1000 MBps or more. The suffix d after the core number (e.g., Standard_E16*d*_v5) denotes a VM with a local temp disk. Select instances with Standard SSDs. Fusion does not support Azure network-attached storage (Premium SSDv2, Ultra Disk, etc.). Larger local storage increases Fusion's throughput and reduces the chance of overloading the machine. See Sizes for virtual machines in Azure for more information.

Add the following to your nextflow.config file:

```
process.executor = 'azure-batch'
wave.enabled = true
fusion.enabled = true
tower.accessToken = '<PLATFORM_ACCESS_TOKEN>'
```

Replace <PLATFORM_ACCESS_TOKEN> with your Platform access token.
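
If you prefer not to store the token in nextflow.config, Nextflow can also read it from the `TOWER_ACCESS_TOKEN` environment variable. The following is a minimal sketch, assuming a POSIX shell. The value shown is a placeholder, not a real token:

```shell
# Placeholder value (assumption): substitute your own Platform access token,
# or load it from whichever secret store you use.
export TOWER_ACCESS_TOKEN='<PLATFORM_ACCESS_TOKEN>'
```

With the variable exported in the shell that launches Nextflow, the `tower.accessToken` line can be left out of the configuration file.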

Run the pipeline with the Nextflow run command:
```
nextflow run <PIPELINE_SCRIPT> -w az://<BLOB_STORAGE>/scratch
```
Replace the following:
- <PIPELINE_SCRIPT>: your pipeline Git repository URI.
- <BLOB_STORAGE>: your Azure Blob Storage container name.
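
As an illustration of how the placeholders fit together, the sketch below assembles the run command from example values and prints it for review. Both values are assumptions chosen for this example: `https://github.com/nextflow-io/hello` stands in for your pipeline repository, and `my-container` is a hypothetical Blob Storage container name.

```shell
set -eu

# Example values (assumptions): replace with your own pipeline and container.
PIPELINE_SCRIPT="https://github.com/nextflow-io/hello"
BLOB_STORAGE="my-container"

# The work directory lives in Blob Storage, under a scratch prefix.
WORK_DIR="az://${BLOB_STORAGE}/scratch"

# Assemble the command and print it so it can be checked before launching.
RUN_CMD="nextflow run ${PIPELINE_SCRIPT} -w ${WORK_DIR}"
echo "${RUN_CMD}"

# To launch, run the printed command from the directory that
# contains your nextflow.config.
```

Printing the command first makes it easy to confirm the `az://` work directory path before any compute resources are requested.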
