Azure Batch
This guide assumes you already have an Azure account with a valid Azure subscription. For details, visit Azure Free Account. Ensure you have sufficient permissions to create resource groups, an Azure Storage account, and an Azure Batch account.
Concepts
Accounts
Seqera Platform relies on existing Azure Storage and Azure Batch accounts. You need at least one valid Storage account and one valid Batch account within your subscription.
Azure uses 'accounts' for each service. For example, an Azure Storage account houses a collection of blob containers, file shares, queues, and tables. While you can have multiple Azure Storage and Batch accounts in an Azure subscription, each compute environment in Platform can use only one of each (one Storage account and one Batch account). You can set up multiple compute environments in Platform with different credentials, Storage accounts, and Batch accounts.
Resource group
To create Batch and Storage accounts, first create a resource group in your preferred region.
You can also create a resource group while creating a Storage or Batch account.
Regions
Azure resources can operate across regions, but this incurs additional costs and security requirements. We recommend placing all resources in the same region. See the Azure product page on data residency for more information.
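Before you create a compute environment, you can check that all your resources share one region. The following Python sketch compares an inventory of resource names and regions against an expected region and reports any mismatches. The resource names reuse this guide's examples. The regions are made-up values: in practice you would read them from the Azure portal. The function name is hypothetical.

```python
from typing import Dict, List


def find_region_mismatches(resources: Dict[str, str], expected_region: str) -> List[str]:
    """Return the sorted names of resources whose region differs from expected_region."""
    # Compare case-insensitively so "eastus" and "EastUS" are treated as the same region.
    expected = expected_region.lower()
    return sorted(name for name, region in resources.items() if region.lower() != expected)


if __name__ == "__main__":
    # Hypothetical inventory: resource name -> region (example values only).
    inventory = {
        "seqeracompute": "eastus",           # resource group
        "seqeracomputestorage": "eastus",    # storage account
        "seqeracomputebatch": "westeurope",  # Batch account, deliberately mismatched
    }
    print(find_region_mismatches(inventory, "eastus"))  # ['seqeracomputebatch']
```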
Resource group
A resource group is a unit of related resources in Azure. As a rule of thumb, resources with a similar lifecycle should be in the same resource group, so that you can delete the resource group and all its associated components together. We recommend placing all Platform compute resources in the same resource group, but this is not required.
Create a resource group
- Log in to your Azure account, go to the Create Resource group page, and select Create new resource group.
- Enter a name for the resource group, such as seqeracompute.
- Choose the preferred region.
- Select Review and Create to proceed.
- Select Create.
Storage account
After creating a resource group, set up an Azure storage account.
Create a storage account
-
Log in to your Azure account, go to the Create storage account page, and select Create a storage account.
If you haven't created a resource group, you can do so now.
-
Enter a name for the storage account, such as seqeracomputestorage.
-
Choose the preferred region (same as the Batch account).
-
Platform supports any performance or redundancy settings. Select the most appropriate settings for your use case.
-
Select Next: Advanced.
-
Enable storage account key access.
-
Select Next: Networking.
- Enable public access from all networks. You can enable public access from selected virtual networks and IP addresses, but you will be unable to use Forge to create compute resources. Disabling public access is not supported.
-
Select Data protection.
- Configure appropriate settings. All settings are supported by the platform.
-
Select Encryption.
- Only Microsoft-managed keys (MMK) are supported.
-
In tags, add any required tags for the storage account.
-
Select Review and Create.
-
Select Create to create the Azure Storage account.
-
You will need at least one blob storage container to act as a working directory for Nextflow.
-
Go to your new storage account and select + Container to create a new Blob storage container. In the New container dialog, enter a suitable name, such as seqeracomputestorage-container.
-
Go to the Access Keys section of your new storage account (seqeracomputestorage in this example).
-
Store the access keys for your Azure Storage account. You need them when you create a Seqera compute environment.
Blob container storage credentials are associated with the Batch pool configuration. Avoid changing these credentials in your Seqera instance after you have created the compute environment.
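Azure enforces naming rules on both resource types. Storage account names must be 3 to 24 characters and use only lowercase letters and digits. Blob container names must be 3 to 63 characters and use only lowercase letters, digits, and hyphens. A container name must start with a letter or digit, and every hyphen must sit between letters or digits. The Python sketch below checks this section's example names against those rules. It also builds the az:// work directory URI that you enter later in the compute environment form. The helper names are hypothetical, and the work subfolder is only this guide's example path.

```python
import re

# Storage account names: 3-24 characters, lowercase letters and digits only.
STORAGE_ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")
# Container names: start with a letter or digit; every hyphen must be followed
# by a letter or digit (this also rules out "--" and a trailing "-").
CONTAINER_RE = re.compile(r"[a-z0-9](?:[a-z0-9]|-(?=[a-z0-9]))*")


def is_valid_storage_account_name(name: str) -> bool:
    return STORAGE_ACCOUNT_RE.fullmatch(name) is not None


def is_valid_container_name(name: str) -> bool:
    # Length is checked separately: 3-63 characters.
    return 3 <= len(name) <= 63 and CONTAINER_RE.fullmatch(name) is not None


def work_dir_uri(container: str, path: str = "work") -> str:
    """Build an az://<container>/<path> work directory URI for Nextflow."""
    if not is_valid_container_name(container):
        raise ValueError(f"invalid container name: {container!r}")
    return f"az://{container}/{path.strip('/')}"


if __name__ == "__main__":
    print(is_valid_storage_account_name("seqeracomputestorage"))      # True
    print(is_valid_container_name("seqeracomputestorage-container"))  # True
    print(work_dir_uri("seqeracomputestorage-container"))  # az://seqeracomputestorage-container/work
```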
Batch account
After you have created a resource group and storage account, create a Batch account.
Create a Batch account
-
Log in to your Azure account, go to the Create Batch account page, and select Create a Batch account.
-
Select the existing resource group or create a new one.
-
Enter a name for the Batch account, such as seqeracomputebatch.
-
Choose the preferred region (same as the storage account).
-
Select Advanced.
-
For Pool allocation mode, select Batch service.
-
For Authentication mode, ensure Shared Key is selected.
-
Select Networking. Ensure networking access is sufficient for Platform and any additional required resources.
-
In tags, add any required tags for the Batch account.
-
Select Review and Create.
-
Select Create.
-
Go to your new Batch account, then select Access Keys.
-
Store the access keys for your Batch account. You need them when you create a Seqera compute environment.
A newly created Azure Batch account may not be entitled to create virtual machines until you make a service request to Azure. See Azure Batch service quotas and limits for more information.
-
Select the + Quotas tab of the Azure Batch account to check and increase existing quotas if necessary.
-
Select + Request quota increase and add the quantity of resources you require. Here is a brief guideline:
- Active jobs and schedules: Each Nextflow process requires an active Batch job per pipeline while running, so set this number high. See the Azure Batch documentation to learn more about jobs in Batch.
- Pools: Each platform compute environment requires one Batch pool. Each pool is composed of multiple machines of one virtual machine size.
To use separate pools for head and compute nodes, see this FAQ entry.
- Batch accounts per region per subscription: Set this to the number of Azure Batch accounts per region per subscription. Only one is required.
- Spot/low-priority vCPUs: Platform does not support spot or low-priority machines when using Forge, so this number can be zero if you use Forge. When setting up a pool manually, select an appropriate number of concurrent vCPUs here.
- Total Dedicated vCPUs per VM series: See the Azure documentation for virtual machine sizes to help determine the machine size you need. We recommend the latest version of the ED series available in your region as a cost-effective and appropriately sized machine for running Nextflow. If your workloads have additional requirements, such as GPUs or faster storage, select an alternative machine series instead. Increase the quota by the number of concurrent vCPUs you require. Azure charges for machines per CPU minute, so a higher quota incurs no additional cost.
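You can use a rough calculation to size the Total Dedicated vCPUs per VM series quota. For each pool, multiply the maximum VM count by the vCPUs per VM, then sum the results per VM family. The sketch below assumes that every pool scales to its maximum size at the same time. The family labels are illustrative and may not match the exact quota names shown in the Azure portal. For reference, Standard_D4_v3, the Forge default VM type, has 4 vCPUs.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class PoolPlan(NamedTuple):
    vm_family: str     # quota bucket, e.g. "Dv3 family" (label is illustrative)
    vcpus_per_vm: int  # e.g. 4 for Standard_D4_v3
    vms_count: int     # maximum number of VMs the pool may scale to


def dedicated_vcpu_quota(plans: Iterable[PoolPlan]) -> Dict[str, int]:
    """Return dedicated vCPUs needed per VM family if every pool is at its maximum size."""
    totals: Dict[str, int] = defaultdict(int)
    for plan in plans:
        totals[plan.vm_family] += plan.vcpus_per_vm * plan.vms_count
    return dict(totals)


if __name__ == "__main__":
    plans = [
        PoolPlan("Dv3 family", vcpus_per_vm=4, vms_count=10),  # Standard_D4_v3 x 10
        PoolPlan("Dv3 family", vcpus_per_vm=4, vms_count=5),   # a second compute environment
    ]
    print(dedicated_vcpu_quota(plans))  # {'Dv3 family': 60}
```

Because the quota itself has no cost, rounding the result up to leave headroom is usually harmless.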
Compute environment
There are two ways to create an Azure Batch compute environment in Seqera Platform:
- Batch Forge: Automatically creates Batch resources.
- Manual: For using existing Batch resources.
Batch Forge
Batch Forge automatically creates resources that you may be charged for in your Azure account. See Cloud costs for guidelines to manage cloud resources effectively and prevent unexpected costs.
Create a Batch Forge compute environment:
-
In a workspace, select Compute Environments > New Environment.
-
Enter a descriptive name, such as Azure Batch (east-us).
-
Select Azure Batch as the target platform.
-
Choose existing Azure credentials or add a new credential. If you are using existing credentials, skip to step 7 (Select a Region).
You can create multiple credentials in your Seqera environment.
-
Enter a name for the credentials, e.g., Azure Credentials.
-
Add the Batch account and Blob Storage account names and access keys.
-
Select a Region, e.g., eastus.
-
In the Pipeline work directory field, enter the Azure blob container created previously, e.g., az://seqeracomputestorage-container/work. When you specify a Blob Storage container as your work directory, this container is used for the Nextflow cloud cache by default. You can specify an alternative cache location with the Nextflow config file field on the pipeline launch form.
-
Select Enable Wave containers to facilitate access to private container repositories and provision containers in your pipelines using the Wave containers service. See Wave containers for more information.
-
Select Enable Fusion v2 to allow access to your Azure Blob Storage data via the Fusion v2 virtual distributed file system. This speeds up most data operations. The Fusion v2 file system requires Wave containers to be enabled. See Fusion file system for configuration details.
-
Set the Config mode to Batch Forge.
-
Enter the default VMs type, depending on your quota limits set previously. The default is Standard_D4_v3.
-
Enter the VMs count. If autoscaling is enabled (default), this is the maximum number of VMs you wish the pool to scale up to. If autoscaling is disabled, this is the fixed number of virtual machines in the pool.
-
Enable Autoscale to scale up and down automatically, based on the number of pipeline tasks. The number of VMs will vary from 0 to VMs count.
-
Enable Dispose resources for Seqera to automatically delete the Batch pool if the compute environment is deleted on the platform.
-
Select or create Container registry credentials to authenticate a registry (used by the Wave containers service). We recommend using an Azure Container Registry in the same region for maximum performance.
-
Apply Resource labels. This will populate the Metadata fields of the Batch pool.
-
Expand Staging options to include:
- Optional pre- or post-run Bash scripts that execute before or after the Nextflow pipeline execution in your environment.
- Global Nextflow configuration settings for all pipeline runs launched with this compute environment. Values defined here are pre-filled in the Nextflow config file field in the pipeline launch form. These values can be overridden during pipeline launch.
Configuration settings in this field override the same values in the pipeline repository nextflow.config file. See Nextflow config file for more information on configuration priority.
-
Specify custom Environment variables for the Head job and/or Compute jobs.
-
Configure any advanced options you need:
- Use Jobs cleanup policy to control how Nextflow process jobs are deleted on completion. Active jobs consume the quota of the Azure Batch account. By default, jobs are terminated by Nextflow and removed from the quota when all tasks successfully complete. If set to Always, all jobs are deleted by Nextflow after pipeline completion. If set to Never, jobs are never deleted. If set to On success, successful tasks are removed but failed tasks are left for debugging purposes.
- Use Token duration to control the duration of the SAS token generated by Nextflow. This must be at least as long as the longest expected pipeline run.
-
Select Add to finalize the compute environment setup. It will take a few seconds for all the resources to be created before the compute environment is ready to launch pipelines.
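You can think of the relationship between VMs count and Autoscale as a simple bound. With autoscaling disabled, the pool stays at VMs count. With autoscaling enabled, the pool grows and shrinks with the pending task load, between 0 and VMs count. The Python function below is a hypothetical model for reasoning about pool size. It is not the autoscale formula that Batch Forge actually installs on the pool. The task_slots_per_vm parameter (how many tasks one VM runs concurrently) is an assumption.

```python
import math


def target_vm_count(pending_tasks: int, vms_count: int, autoscale: bool = True,
                    task_slots_per_vm: int = 1) -> int:
    """Hypothetical model of the VM count a pool would target.

    - Autoscale disabled: the pool is fixed at vms_count.
    - Autoscale enabled: enough VMs for the pending tasks, bounded to [0, vms_count].
    """
    if pending_tasks < 0 or vms_count < 0 or task_slots_per_vm < 1:
        raise ValueError("counts must be non-negative and task_slots_per_vm at least 1")
    if not autoscale:
        return vms_count
    needed = math.ceil(pending_tasks / task_slots_per_vm)
    return max(0, min(needed, vms_count))


if __name__ == "__main__":
    print(target_vm_count(0, vms_count=10))                        # 0: idle pool scales to zero
    print(target_vm_count(25, vms_count=10, task_slots_per_vm=4))  # 7
    print(target_vm_count(100, vms_count=10))                      # 10: capped at VMs count
```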
See Launch pipelines to start executing workflows in your Azure Batch compute environment.
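The SAS token generated by Nextflow must remain valid for the whole run, so check the Token duration value against your longest expected run time. This sketch assumes durations are written as whole numbers with s, m, h, or d units (for example, 48h or 1d 12h). That format and the helper names are assumptions and do not describe how Platform parses the field.

```python
import re
from datetime import timedelta

# Assumed duration format: whole numbers with s, m, h or d units, e.g. "48h" or "1d 12h".
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}
_PART_RE = re.compile(r"(\d+)\s*([smhd])")


def parse_duration(text: str) -> timedelta:
    parts = _PART_RE.findall(text)
    # Reject empty input and anything left over after removing recognised parts.
    if not parts or _PART_RE.sub("", text).strip():
        raise ValueError(f"unrecognised duration: {text!r}")
    total = timedelta()
    for amount, unit in parts:
        total += timedelta(**{_UNITS[unit]: int(amount)})
    return total


def token_covers_run(token_duration: str, longest_run: str) -> bool:
    """True if the SAS token stays valid at least as long as the longest expected run."""
    return parse_duration(token_duration) >= parse_duration(longest_run)


if __name__ == "__main__":
    print(token_covers_run("48h", "1d 12h"))  # True: 48 hours covers 36 hours
    print(token_covers_run("24h", "30h"))     # False: the token would expire mid-run
```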
Manual
This section is for users with a preconfigured Batch pool, which requires an existing Azure Batch account.
Your Seqera compute environment uses resources that you may be charged for in your Azure account. See Cloud costs for guidelines to manage cloud resources effectively and prevent unexpected costs.
Create a manual Seqera Azure Batch compute environment
-
In a workspace, select Compute Environments > New Environment.
-
Enter a descriptive name for this environment, such as Azure Batch (east-us).
-
Select Azure Batch as the target platform.
-
Select your existing Azure credentials or select + to add new credentials. If you choose to use existing credentials, skip to step 7 (Select a Region).
You can create multiple credentials in your Seqera environment.
-
Enter a name, e.g., Azure Credentials.
-
Add the Batch account and Blob Storage credentials you created previously.
-
Select a Region, e.g., eastus (East US).
-
In the Pipeline work directory field, add the Azure blob container created previously, e.g., az://seqeracomputestorage-container/work. When you specify a Blob Storage container as your work directory, this container is used for the Nextflow cloud cache by default. You can specify an alternative cache location with the Nextflow config file field on the pipeline launch form.
-
Set the Config mode to Manual.
-
Enter the Compute Pool name. This is the name of the Batch pool you created previously in the Azure Batch account.
The default Azure Batch implementation uses a single pool for head and compute nodes. To use separate pools for head and compute nodes (for example, to use low-priority VMs for compute jobs), see this FAQ entry.
-
Enter a user-assigned Managed identity client ID, if one is attached to your Batch pool. See Managed Identity below.
-
Apply Resource labels. This will populate the Metadata fields of the Azure Batch pool.
-
Expand Staging options to include:
- Optional pre- or post-run Bash scripts that execute before or after the Nextflow pipeline execution in your environment.
- Global Nextflow configuration settings for all pipeline runs launched with this compute environment. Values defined here are pre-filled in the Nextflow config file field in the pipeline launch form. These values can be overridden during pipeline launch.
Configuration settings in this field override the same values in the pipeline repository nextflow.config file. See Nextflow config file for more information on configuration priority.
To use managed identities, Platform requires Nextflow version 24.06.0-edge or later. Add export NXF_VER=24.06.0-edge to the Global Nextflow config field for your compute environment to use this Nextflow version by default.
-
Specify custom Environment variables for the Head job and/or Compute jobs.
-
Configure any advanced options you need:
- Use Jobs cleanup policy to control how Nextflow process jobs are deleted on completion. Active jobs consume the quota of the Batch account. By default, jobs are terminated by Nextflow and removed from the quota when all tasks successfully complete. If set to Always, all jobs are deleted by Nextflow after pipeline completion. If set to Never, jobs are never deleted. If set to On success, successful tasks are removed but failed tasks are left for debugging purposes.
- Use Token duration to control the duration of the SAS token generated by Nextflow. This must be at least as long as the longest expected pipeline run.
-
Select Add to finalize the compute environment setup. It will take a few seconds for all the resources to be created before you are ready to launch pipelines.
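The configuration priority described in the Staging options step works like a set of layered lookups. The launch form field starts with the compute environment's Global Nextflow config. Any edits made at launch replace those pre-filled values. The result overrides matching values from the pipeline repository's nextflow.config. The Python sketch below models this with collections.ChainMap over flat, hypothetical key/value pairs. Real Nextflow configuration is richer (scopes, profiles, includes), so treat this only as an illustration of precedence.

```python
from collections import ChainMap
from typing import Any, Dict, Mapping, Optional


def effective_config(repo_config: Mapping[str, Any],
                     global_config: Mapping[str, Any],
                     launch_edits: Optional[Mapping[str, Any]] = None) -> Dict[str, Any]:
    """Resolve settings: launch form (pre-filled from global config) > repository nextflow.config."""
    # The launch form field is pre-filled with the compute environment's global config,
    # and anything edited at launch replaces the pre-filled value.
    launch_form = {**global_config, **(launch_edits or {})}
    # ChainMap searches its maps left to right, so launch form values win.
    return dict(ChainMap(launch_form, repo_config))


if __name__ == "__main__":
    repo = {"process.cpus": 2, "process.memory": "4 GB"}  # pipeline repository nextflow.config
    global_cfg = {"process.cpus": 4}                       # compute environment global config
    print(effective_config(repo, global_cfg))
    print(effective_config(repo, global_cfg, {"process.cpus": 8}))
```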
See Launch pipelines to start executing workflows in your Azure Batch compute environment.
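Managed identities require Nextflow 24.06.0-edge or later. To confirm that the version pinned with NXF_VER meets this requirement, compare it with 24.06.0-edge. This Python sketch compares only the numeric year.month.patch part and ignores suffixes such as -edge. The helper names are hypothetical.

```python
import re
from typing import Tuple

MIN_MANAGED_IDENTITY_VERSION = "24.06.0-edge"
# year.month.patch with an optional suffix such as "-edge" (the suffix is ignored).
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:-[A-Za-z0-9.]+)?")


def parse_nxf_version(version: str) -> Tuple[int, int, int]:
    match = _VERSION_RE.fullmatch(version.strip())
    if match is None:
        raise ValueError(f"unrecognised Nextflow version: {version!r}")
    major, minor, patch = (int(g) for g in match.groups())
    return (major, minor, patch)


def version_from_export(line: str) -> str:
    """Extract the version from a line such as 'export NXF_VER=24.06.0-edge'."""
    match = re.fullmatch(r"\s*export\s+NXF_VER=(\S+)\s*", line)
    if match is None:
        raise ValueError(f"not an NXF_VER export: {line!r}")
    return match.group(1)


def supports_managed_identity(version: str) -> bool:
    return parse_nxf_version(version) >= parse_nxf_version(MIN_MANAGED_IDENTITY_VERSION)


if __name__ == "__main__":
    pinned = version_from_export("export NXF_VER=24.06.0-edge")
    print(pinned, supports_managed_identity(pinned))  # 24.06.0-edge True
    print(supports_managed_identity("24.04.4"))        # False
```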