Azure Cloud
This compute environment type is currently in public preview. Please consult this guide for the latest information on recommended configuration and limitations. This guide assumes you already have an Azure account with a valid Azure subscription.
Many current compute environments for cloud providers rely on batch services, such as AWS Batch, Azure Batch, and Google Batch, to execute and manage submitted jobs, including pipelines and Studio session environments. Batch services are suitable for large-scale workloads, but they add management complexity. In practical terms, using Azure Batch results in the following limitations:
- Complex setup: Azure Batch compute environments require users to independently configure their own Batch accounts.
- Long-lived credentials: Azure Batch uses access keys to authenticate with Batch and Storage accounts. These credentials are long-lived, and Azure has hard limits on the number of access keys that can be created per resource type.
- Quotas: Azure Batch accounts have limits for jobs, pools, and compute resources. If these limits are exceeded, no additional pipelines can run until the existing resources are removed.
The Azure Cloud compute environment addresses these pain points with:
- Simplified configuration: Fewer configurable options, with opinionated defaults chosen to provide the best execution environment for Nextflow pipelines and Studio sessions. Both Wave and Fusion are enabled.
- More secure credentials: Authenticate exclusively via Entra ID. This provides enhanced security by default, with automatic configuration for the user.
This type of compute environment is best suited to running Studios and small to medium-sized pipelines. It spins up a standalone virtual machine and executes the Nextflow pipeline or Studio session on that machine with a local executor. At the end of the execution, the instance is terminated. Because the instance type is fixed, compute pricing is more predictable.
Limitations
- The Nextflow pipeline runs entirely on a single virtual machine. If the instance does not have sufficient resources, the pipeline execution will fail. Because all tasks share that one machine, the number of tasks Nextflow can execute in parallel is limited by the number of cores of the selected instance type. If you need more computing resources, you must create a new compute environment with a larger instance type. This makes the compute environment less suited to larger, more complex pipelines.
- There is a considerable delay before streaming logs can be queried. This means that if your pipeline completes in under a minute, you might not see streaming logs during the execution.
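The single-VM limitation above can be checked before you choose an instance type. The following Python sketch is illustrative only and is not part of the platform or of Nextflow: the `Task` class and both helper functions are hypothetical names, and the memory check is a simplifying assumption about how a local executor accounts for resources. It flags tasks whose requests exceed the whole virtual machine and estimates how many copies of a task could run side by side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Resource request of one pipeline task (hypothetical model)."""
    name: str
    cpus: int
    memory_gb: float


def oversized_tasks(tasks, vm_cpus, vm_memory_gb):
    """Return the names of tasks that request more than the whole VM offers."""
    return [
        t.name
        for t in tasks
        if t.cpus > vm_cpus or t.memory_gb > vm_memory_gb
    ]


def max_parallel(task, vm_cpus, vm_memory_gb):
    """Upper bound on concurrent copies of `task` on a single VM.

    Assumes both cores and memory are partitioned between running tasks.
    """
    if task.cpus <= 0 or task.memory_gb <= 0:
        raise ValueError("task requests must be positive")
    by_cpu = vm_cpus // task.cpus
    by_memory = int(vm_memory_gb // task.memory_gb)
    return min(by_cpu, by_memory)


if __name__ == "__main__":
    # Hypothetical instance size and task requests, for illustration only.
    vm_cpus, vm_memory_gb = 8, 32.0
    tasks = [Task("align", 4, 8.0), Task("assemble", 16, 64.0)]

    print("Too large for this VM:", oversized_tasks(tasks, vm_cpus, vm_memory_gb))
    print("Parallel 'align' tasks:", max_parallel(tasks[0], vm_cpus, vm_memory_gb))
```

If any task appears in the oversized list, the pipeline is likely to fail on that instance type, and a compute environment with a larger instance type is the safer choice.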