Google Cloud Batch

note

This guide assumes you have an existing Google Cloud account. If you don't, sign up for a free account. Seqera Platform integrates with Google Cloud through the Batch API.

The guide is split into two parts:

How to configure your Google Cloud account to use the Batch API.
How to create a Google Cloud Batch compute environment in Seqera.

Configure Google Cloud

Create a project

Go to the Google Project Selector page and select an existing project, or select Create project.

Enter a name for your new project, e.g., tower-nf.

If you are part of an organization, the location will default to your organization.
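You can also create the project from the command line. The following is a minimal sketch. It assumes the gcloud CLI is installed and authenticated. The function name `create_seqera_project` and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a Google Cloud project for Seqera.
# Arguments (placeholders): project ID, display name, optional organization ID.
create_seqera_project() {
  local project_id="$1" display_name="$2" org_id="${3:-}"
  local args=(projects create "$project_id" --name="$display_name")
  # Pass --organization only when an organization ID is given;
  # omit it to create the project without an explicit parent.
  if [[ -n "$org_id" ]]; then
    args+=(--organization="$org_id")
  fi
  gcloud "${args[@]}"
}
```

For example, `create_seqera_project tower-nf tower-nf` creates a project with the ID and name used above.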

Enable billing

Enable billing for your project in your Google Cloud account. See the Google Cloud Billing documentation for instructions.
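If you manage billing from the command line, the following sketch links a project to an existing billing account. It assumes the gcloud CLI is authenticated as a user who can manage that billing account. `link_billing` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: link a project to an existing billing account.
# The billing account ID is a placeholder (find yours with
# `gcloud billing accounts list`).
link_billing() {
  local project_id="$1" billing_account_id="$2"
  gcloud billing projects link "$project_id" \
    --billing-account="$billing_account_id"
}
```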

Enable APIs

In the Google Cloud console, enable the following APIs for your project:

Batch API
Compute Engine API
Cloud Storage API

Select your project from the drop-down and select Enable.

Alternatively, you can enable each API manually: select your project in the navigation bar, then visit the page for each API listed above and enable it.
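You can also enable all three APIs in a single gcloud call. This minimal sketch assumes an authenticated gcloud CLI. `enable_seqera_apis` is a hypothetical helper name. `batch.googleapis.com`, `compute.googleapis.com`, and `storage.googleapis.com` are the service names for the APIs listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enable the Batch, Compute Engine, and Cloud Storage APIs.
enable_seqera_apis() {
  local project_id="$1"
  gcloud services enable \
    batch.googleapis.com \
    compute.googleapis.com \
    storage.googleapis.com \
    --project="$project_id"
}
```

To check the result, run `gcloud services list --enabled --project=tower-nf`.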

IAM

Seqera requires a service account with appropriate permissions to interact with your Google Cloud resources. As an IAM user, you must have access to the service account that submits Batch jobs.

caution

By default, Google Cloud Batch uses the default Compute Engine service account to submit jobs. This service account is granted the Editor (roles/editor) role. Although this role includes all the permissions Seqera needs, it is not recommended for production environments. Instead, control job access with a custom service account that has only the permissions Seqera needs to execute Batch jobs.

Service account permissions

Create a custom service account with at least the following permissions:

Batch Agent Reporter (roles/batch.agentReporter) on the project
Batch Job Editor (roles/batch.jobsEditor) on the project
Logs Writer (roles/logging.logWriter) on the project (to let jobs generate logs in Cloud Logging)
Logs Viewer (roles/logging.viewer) on the project (to view and retrieve logs from Cloud Logging)
Service Account User (roles/iam.serviceAccountUser)

If your Google Cloud project does not require access restrictions on any of its Cloud Storage buckets, you can simplify setup by granting your service account the Storage Admin (roles/storage.admin) role on the project. To grant access only to specific buckets, add the service account as a principal on each bucket individually. See Cloud Storage bucket below.
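The following sketch shows how to create the custom service account and grant it the project-level roles listed above with gcloud. It assumes an authenticated gcloud CLI with permission to manage IAM on the project. The helper name and the `seqera-batch` account name are hypothetical. Granting Service Account User at project scope is also an assumption, because the list above does not state a scope for that role.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a custom service account for Seqera and grant
# it the project-level roles listed above. All names are placeholders.
create_seqera_service_account() {
  local project_id="$1" sa_name="$2"
  local sa_email="${sa_name}@${project_id}.iam.gserviceaccount.com"

  gcloud iam service-accounts create "$sa_name" \
    --project="$project_id" \
    --display-name="Seqera Batch jobs"

  local role
  for role in \
    roles/batch.agentReporter \
    roles/batch.jobsEditor \
    roles/logging.logWriter \
    roles/logging.viewer \
    roles/iam.serviceAccountUser; do
    gcloud projects add-iam-policy-binding "$project_id" \
      --member="serviceAccount:${sa_email}" \
      --role="$role"
  done
}
```

This sketch grants bucket access separately. See Cloud Storage bucket below.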

User permissions

Ask your Google Cloud administrator to grant you the following IAM user permissions to interact with your custom service account:

Batch Job Editor (roles/batch.jobsEditor) on the project
Service Account User (roles/iam.serviceAccountUser) on the job's service account (default: Compute Engine service account)
View Service Accounts (roles/iam.serviceAccountViewer) on the project
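Your administrator can grant these user permissions with gcloud. The following is a minimal sketch. It assumes an authenticated gcloud CLI. `alice@example.com` is a placeholder for your Google account, and the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: grant a user the permissions listed above.
grant_user_permissions() {
  local project_id="$1" user_email="$2" sa_email="$3"
  local role
  # Project-level roles for the user.
  for role in roles/batch.jobsEditor roles/iam.serviceAccountViewer; do
    gcloud projects add-iam-policy-binding "$project_id" \
      --member="user:${user_email}" \
      --role="$role"
  done
  # Service Account User, scoped to the job's service account only.
  gcloud iam service-accounts add-iam-policy-binding "$sa_email" \
    --project="$project_id" \
    --member="user:${user_email}" \
    --role="roles/iam.serviceAccountUser"
}
```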

Authentication methods

Seqera supports two methods for authenticating with Google Cloud:

Service account keys

To authenticate using a service account key, create a service account JSON key file:

In the Google Cloud navigation menu, select IAM & Admin > Service Accounts.
Select the email address of the service account.

note
The Compute Engine default service account is not recommended for production environments due to its powerful permissions. To use a service account other than the Compute Engine default, specify the service account email address under Advanced options on the Seqera compute environment creation form.
Select Keys > Add key > Create new key.
Select JSON as the key type.
Select Create.

A JSON file is downloaded to your computer. This file contains the credential needed to configure the compute environment in Seqera.

You can manage your key from the Service Accounts page.
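You can also create the key with gcloud. This minimal sketch assumes an authenticated gcloud CLI. The helper name and the default file name `seqera-sa-key.json` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a JSON key for a service account.
# The key file name defaults to a placeholder.
create_sa_key() {
  local sa_email="$1" key_file="${2:-seqera-sa-key.json}"
  gcloud iam service-accounts keys create "$key_file" \
    --iam-account="$sa_email"
}
```

Treat the resulting file as a secret. To list existing keys, run `gcloud iam service-accounts keys list --iam-account=SA_EMAIL`.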

Workload Identity Federation

Workload Identity Federation (WIF) is the recommended authentication method for production and regulated environments because it eliminates the need for long-lived service account keys. WIF uses short-lived OIDC tokens for authentication, which are generated by Seqera Platform.

Platform's OIDC issuer is the issuer value advertised at https://cloud.seqera.io/api/.well-known/openid-configuration.

Setting up WIF requires the following steps in the GCP Console:

Create a Workload Identity Pool and Provider in your Google Cloud project.
Add Seqera as an OIDC provider within the pool. Set the Issuer URL to https://cloud.seqera.io/api.
Set the Allowed audiences. If left empty, GCP derives a default audience from the provider resource path in the format //iam.googleapis.com/projects/{PROJECT_NUMBER}/locations/global/workloadIdentityPools/{POOL}/providers/{PROVIDER}. If you specify a custom value, it must exactly match the value you enter in the Token audience field when you create the Google WIF credential in Seqera.
Define an attribute mapping and condition. At a minimum, set google.subject=assertion.sub. This maps the subject claim from Seqera's JWT to GCP's identity space. See the Google Cloud attribute mapping documentation for more information. You may see a pop-up asking you to configure your application and provide an OIDC ID token path. You can dismiss this pop-up.
On the service account that WIF will impersonate, grant roles/iam.workloadIdentityUser to the Workload Identity Pool principal. You can grant this role to all identities in the pool or to a specific workspace. If you have not yet created a service account, create one following the guidelines above.
If you use the same WIF credential for Data Explorer, grant roles/iam.serviceAccountTokenCreator on the service account to itself:
```
gcloud iam service-accounts add-iam-policy-binding SA_EMAIL \
  --member="serviceAccount:SA_EMAIL" \
  --role="roles/iam.serviceAccountTokenCreator"
```
Replace SA_EMAIL with the service account email. Without this role, viewing or downloading file contents in Data Explorer fails with a signing error. Pipeline runs are not affected.
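The following gcloud sketch approximates the Console steps above. It is a minimal sketch, not a definitive setup. It assumes Seqera Cloud's issuer (`https://cloud.seqera.io/api`) and grants the role to all identities in the pool rather than to a specific workspace. The helper and ID names are hypothetical. If your organization requires a condition, add `--attribute-condition` to the provider.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a Workload Identity Pool and an OIDC provider
# that trusts Seqera Cloud, then let pool identities impersonate the
# service account. All IDs are placeholders.
setup_seqera_wif() {
  local project_id="$1" project_number="$2" pool_id="$3" provider_id="$4" sa_email="$5"
  local audience="${6:-}"   # optional custom Allowed audiences value

  gcloud iam workload-identity-pools create "$pool_id" \
    --project="$project_id" \
    --location=global \
    --display-name="Seqera Platform"

  local provider_args=(
    iam workload-identity-pools providers create-oidc "$provider_id"
    --project="$project_id"
    --location=global
    --workload-identity-pool="$pool_id"
    --issuer-uri="https://cloud.seqera.io/api"
    --attribute-mapping="google.subject=assertion.sub"
  )
  # Leave Allowed audiences unset to accept GCP's default audience.
  if [[ -n "$audience" ]]; then
    provider_args+=(--allowed-audiences="$audience")
  fi
  gcloud "${provider_args[@]}"

  # Grant roles/iam.workloadIdentityUser to every identity in the pool.
  gcloud iam service-accounts add-iam-policy-binding "$sa_email" \
    --project="$project_id" \
    --role="roles/iam.workloadIdentityUser" \
    --member="principalSet://iam.googleapis.com/projects/${project_number}/locations/global/workloadIdentityPools/${pool_id}/*"
}
```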

After setting up WIF in the GCP Console, you need the following information to create a credential in Seqera Platform:

Service Account Email: The email address of the Google Cloud service account that WIF will impersonate.
Workload Identity Provider: The full resource path of the Workload Identity Provider (e.g., projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/POOL_ID/providers/PROVIDER_ID).
Token Audience (optional): The intended audience for the OIDC token. Configure this if your Workload Identity Provider requires a specific audience value. Ensure this matches what you have configured in the Allowed Audiences value in the GCP console.
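The provider path and the default audience follow a fixed pattern. You can therefore assemble them from your project number, pool ID, and provider ID. The following is a minimal sketch. The helper names and example IDs are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the values the Seqera WIF credential form asks for.

# Look up the numeric project number for a project ID (requires gcloud).
project_number_for() {
  gcloud projects describe "$1" --format='value(projectNumber)'
}

# Full resource path of the Workload Identity Provider.
wif_provider_path() {
  local project_number="$1" pool_id="$2" provider_id="$3"
  printf 'projects/%s/locations/global/workloadIdentityPools/%s/providers/%s\n' \
    "$project_number" "$pool_id" "$provider_id"
}

# Default audience GCP derives when Allowed audiences is left empty.
wif_default_audience() {
  printf '//iam.googleapis.com/%s\n' "$(wif_provider_path "$@")"
}
```

For example, `wif_provider_path 123456789 seqera-pool seqera-provider` prints the value for the Workload Identity Provider field. `wif_default_audience` with the same arguments prints the audience GCP expects when Allowed audiences is empty.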

caution

If WIF authentication fails at runtime, verify that:

The service account has the required roles (see Service account permissions).
The Workload Identity Pool principal has roles/iam.workloadIdentityUser on the service account.
The Issuer URL configured in the WIF provider matches Platform's OIDC issuer URL.
The Token Audience in the credential (if set) matches the Allowed Audiences in the WIF provider.
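The following sketch gathers the values to compare when you troubleshoot. It assumes Seqera Cloud and an authenticated gcloud CLI. If you use a different Platform API URL, pass it as the fifth argument. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the settings checked in the list above.
check_seqera_wif() {
  local project_id="$1" pool_id="$2" provider_id="$3" sa_email="$4"
  local platform_api="${5:-https://cloud.seqera.io/api}"

  # Issuer advertised by Platform (compare with the provider's issuer URI).
  curl -fsS "${platform_api}/.well-known/openid-configuration"

  # Issuer URL and allowed audiences configured on the WIF provider.
  gcloud iam workload-identity-pools providers describe "$provider_id" \
    --project="$project_id" \
    --location=global \
    --workload-identity-pool="$pool_id" \
    --format='value(oidc.issuerUri,oidc.allowedAudiences)'

  # IAM policy on the service account; look for roles/iam.workloadIdentityUser.
  gcloud iam service-accounts get-iam-policy "$sa_email" \
    --project="$project_id" \
    --format=json
}
```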

Cloud Storage bucket

Google Cloud Storage is a type of object storage. To access files and store the results for your pipelines, create a Cloud Storage bucket that your Seqera service account can access.

Create a Cloud Storage bucket

In the hamburger menu (≡), select Cloud Storage.
From the Buckets tab, select Create.
Enter a name for your bucket. You will reference this name when you create the compute environment in Platform.
Select Region for the Location type and select the Location for your bucket. You'll reference this location when you create the compute environment in Seqera.

note
The Batch API is available in a limited number of locations. These locations are only used to store metadata about the pipeline operations. The storage bucket and compute resources can be in any region.
Select Standard for the default storage class.
To restrict public access to your bucket data, select the Enforce public access prevention on this bucket checkbox.
Under Access control, select Uniform.
Select any additional object data protection tools, per your organization's data protection requirements.
Select Create.
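You can also create an equivalent bucket with `gcloud storage`. This minimal sketch assumes an authenticated gcloud CLI. The helper name is hypothetical. The flags mirror the Console choices above: Region location, Standard storage class, uniform access control, and public access prevention.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a work-directory bucket with the settings above.
# Bucket name and location are placeholders.
create_work_bucket() {
  local project_id="$1" bucket="$2" location="$3"
  gcloud storage buckets create "gs://${bucket}" \
    --project="$project_id" \
    --location="$location" \
    --default-storage-class=STANDARD \
    --uniform-bucket-level-access \
    --public-access-prevention
}
```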

Assign bucket permissions

After the bucket is created, you are redirected to the Bucket details page.
Select Permissions, then Grant access under View by principals.
Copy the email address of your service account into New principals.
Select the Storage Admin role, then select Save.
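You can add the same bucket-level binding with gcloud. The following is a minimal sketch. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: grant a service account Storage Admin on one bucket only.
grant_bucket_access() {
  local bucket="$1" sa_email="$2"
  gcloud storage buckets add-iam-policy-binding "gs://${bucket}" \
    --member="serviceAccount:${sa_email}" \
    --role="roles/storage.admin"
}
```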

tip

You've created a project, enabled the necessary Google APIs, created a bucket, and set up a service account with the required credentials (a JSON key file or Workload Identity Federation). You now have what you need to set up a new compute environment in Seqera.

Seqera compute environment

caution

Your Seqera compute environment uses resources that you may be charged for in your Google Cloud account. See Cloud costs for guidelines to manage cloud resources effectively and prevent unexpected costs.

After your Google Cloud resources have been created, create a new Platform compute environment:

In a workspace, select Compute Environments > New Environment.
Enter a descriptive name for this environment, e.g., Google Cloud Batch (europe-north1).
Select Google Cloud Batch as the target platform.

Credentials

From the Credentials drop-down, select existing Google credentials or select + to add new credentials. If you choose to use existing credentials, skip to the next section.
Enter a name for the credentials, e.g., Google Cloud Credentials.
Paste the contents of the JSON file created previously in the Service account key field.

Location and work directory

Select the Location where you will execute your pipelines. See Location to learn more.

In the Work directory field, enter your storage bucket URL, e.g., gs://my-bucket. This bucket must be accessible in the location selected in the previous step.
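To confirm where an existing bucket is located before you choose the Location, you can query the bucket with gcloud. The following is a minimal sketch. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the location of a bucket, e.g. gs://my-bucket.
bucket_location() {
  local bucket_url="$1"
  gcloud storage buckets describe "$bucket_url" --format='value(location)'
}
```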

note

When you specify a Cloud Storage bucket as your work directory, this bucket is used for the Nextflow cloud cache by default. You can specify an alternative cache location with the Nextflow config file field on the pipeline launch form.

Seqera features

Select Enable Wave containers to facilitate access to private container repositories and provision containers in your pipelines using the Wave containers service. See Wave containers for more information.
Select Enable Fusion v2 to allow access to your Google Cloud Storage data via the Fusion v2 virtual distributed file system. This speeds up most data operations. The Fusion v2 file system requires Wave containers to be enabled. See Fusion file system for configuration details.

note

The compute recommendations below are based on internal benchmarking performed by Seqera. Benchmark runs of nf-core/rnaseq used profile test_full, consisting of an input dataset with 16 FASTQ files and a total size of approximately 123.5 GB.

Use Seqera Platform version 23.1 or later.
Use a Google Cloud Storage bucket as the work directory.
Enable Wave containers and Fusion v2.
Specify suitable virtual machine types and local storage settings, or accept the default machine settings listed below. An n2-highmem-16-lssd VM or larger is recommended for production use.

note

To specify virtual machine settings in Platform during compute environment creation, use the Global Nextflow config field to apply custom Nextflow process directives to all pipeline runs launched with this compute environment.

To specify virtual machine settings per pipeline run in Platform, or as a persistent configuration in your Nextflow pipeline repository, use Nextflow process directives. See Google Cloud Batch process definition for more information.

When Fusion v2 is enabled, the following virtual machine settings are applied:

A 375 GB local NVMe SSD is selected for all compute jobs.
If you do not specify a machine type, a VM from families that support local SSDs is selected.
Any machine types you specify in the Nextflow config must support local SSDs.
Local SSDs are only offered in multiples of 375 GB. You can increment the number of SSDs used per process with the disk directive to request multiples of 375 GB. To work with files larger than 100 GB, use at least two SSDs (750 GB or more).
Fusion v2 can also use persistent disks for caching. To do so, override the disk requested by Fusion with the disk directive and type: 'pd-standard'.
The machineType directive can be used to specify a VM instance type, family, or custom machine type in a comma-separated list of patterns. For example, c2-*, n1-standard-1, custom-2-4, n*, m?-standard-*.
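A Global Nextflow config fragment that applies the settings above might look like the following. It is wrapped in a shell helper so you can print it or redirect it to a file. This is a sketch under stated assumptions. The machine type patterns, disk sizes, and the `LOW_IO_TASK` process name are placeholders. The map form of `disk` is assumed to accept the same `type` option as the process directive.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a Nextflow config fragment for Fusion v2 on
# Google Cloud Batch. Values are placeholders, not recommendations.
fusion_batch_config() {
  cat <<'EOF'
process {
    // With Fusion v2, machine types must support local SSDs.
    machineType = 'n2-highmem-*,n2-standard-*'
    // Two 375 GB local SSDs, for tasks that handle files larger than 100 GB.
    disk = 750.GB

    withName: 'LOW_IO_TASK' {
        // Use a persistent disk instead of local SSD for Fusion's cache.
        disk = [request: 200.GB, type: 'pd-standard']
    }
}
EOF
}
```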

note

Wave containers and Fusion v2 are recommended features for added capability and improved performance, but neither is required to execute workflows in your compute environment.

GCP resources

Enable Spot to use Spot instances, which have significantly reduced cost compared to On-Demand instances.

As of Nextflow version 24.10, the default Spot reclamation retry setting is 0 on AWS and Google Cloud, so by default no internal retries are attempted on these platforms. A Spot reclamation now causes an immediate failure, which is exposed to Nextflow in the same way as other generic failures (for example, by returning exit code 1 on AWS). Nextflow treats these failures like any other job failure unless you configure a retry strategy. For more information, see Spot instance failures and retries.

info

When a Spot instance is reclaimed by Google Cloud, Seqera Platform displays a human-readable description in the task details. Google Batch reserves exit codes in the 50001–59999 range for infrastructure events:

| Exit code | Description |
| --- | --- |
| 50001 | Spot instance was reclaimed by Google Cloud |
| 50002 | VM became unresponsive (host event or crash) |
| 50003 | VM unexpectedly rebooted during task execution |
| 50004 | Task reached unresponsive time limit and could not be cancelled |
| 50005 | Task exceeded maximum allowed runtime |

Exit codes 50006–59999 display a generic infrastructure failure message. Standard application exit codes (1–255) are displayed as before.
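One way to configure a retry strategy for Spot reclamations is a dynamic `errorStrategy` that retries on exit code 50001. The following shell helper prints this as a Nextflow config fragment. The sketch assumes Nextflow surfaces the Google Batch exit code as `task.exitStatus`, and the retry count is an arbitrary example. Nextflow also provides a `google.batch.maxSpotAttempts` setting for Batch-level retries. Check the Nextflow configuration reference before you rely on either approach.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a Nextflow config fragment that retries tasks
# reclaimed by Spot (exit code 50001) and fails fast on other errors.
spot_retry_config() {
  cat <<'EOF'
process {
    errorStrategy = { task.exitStatus == 50001 ? 'retry' : 'terminate' }
    maxRetries = 3
}
EOF
}
```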

Apply Resource labels to the cloud resources consumed by this compute environment. Workspace default resource labels are prefilled.

Scripting and environment variables

Expand Staging options to include:
- Optional pre- or post-run Bash scripts that execute before or after the Nextflow pipeline execution in your environment.
- Global Nextflow configuration settings for all pipeline runs launched with this compute environment. Values defined here are pre-filled in the Nextflow config file field in the pipeline launch form. These values can be overridden during pipeline launch.
info
Configuration settings in this field override the same values in the pipeline repository nextflow.config file. See Nextflow config file for more information on configuration priority.
Under Environment variables, add each variable with a Name, Value, and Target Environment:
- Head job: Adds the variable to the Nextflow head job container, which evaluates nextflow.config and submits tasks to the compute backend. Use this target for variables that Nextflow or its plugins read, such as NXF_OPTS, NXF_JVM_ARGS, NXF_PLUGINS_DEFAULT, or proxy settings the head node uses to reach external services.
- Compute job: Adds the variable to the worker containers that run individual pipeline tasks. Use this target for variables your pipeline tools read, such as OPENAI_API_KEY for a process that calls the OpenAI API, registry credentials needed inside the task container, or tool-specific settings like JAVA_HOME.
- Head and Compute jobs: Adds the variable to both the head job and the compute jobs. Use this target for values needed in both places, such as an HTTP proxy used by both Nextflow and task tools, or a credential needed in both the head job and individual compute tasks.
note
For sensitive values such as API keys and tokens, use pipeline secrets instead of custom environment variables. Custom environment variables are stored in the compute environment configuration and cannot be edited after creation. To rotate a value, recreate the compute environment.

Advanced options

note

If you use VM instance templates for the head or compute jobs (see below), resource allocation and networking values specified in the templates override any conflicting values you specify while creating your Seqera compute environment.

Enable Use Private Address to ensure that your Google Cloud VMs aren't accessible to the public internet.
Use Boot disk size to control the size of the persistent disk provided to each task and to the head job.
Use Boot Disk Image to select a specific boot disk image for the compute instances. The drop-down is populated with available images from the GCP Compute API and supports autocomplete filtering. This field is optional. If not set, Google Batch uses the default image.
Use Instance Type to select a specific machine type for the compute instances. The drop-down is populated with available instance types for the selected region and supports autocomplete filtering. This field is optional. If not set, Google Batch selects an appropriate machine type automatically.

note
The Instance Type field sets a default machine type at the compute environment level. You can override this for individual processes using the machineType process directive in your Nextflow configuration.
Use Head job CPUs and Head job memory to specify the CPUs and memory allocated for the head job.
Use Service Account email to specify a service account email address other than the Compute Engine default to execute workflows with this compute environment (recommended for production environments).
Use VPC and Subnet to specify the name of a VPC network and subnet to be used by this compute environment. You can apply network tags directly in the Network Tags field (see below) or through VM instance templates used for the Nextflow head and compute jobs.

note
You must specify both a VPC and Subnet for your compute environment to use either.
Use Network Tags to apply GCP network tags to the compute instances in this environment. Network tags control which firewall rules and routing policies apply to your instances within their VPC. Enter tags as free-text values. Tags must follow GCP format requirements: lowercase letters, numbers, and hyphens only, between 1 and 63 characters. You can add up to 64 tags per instance.

note
Network tags require a VPC and Subnet to be configured. This field is disabled when no VPC is set.
Use Head job instance template and Compute jobs instance template to specify the name or fully-qualified reference of a VM instance template, without the template:// prefix, to use for the head and compute jobs. VM instance templates allow you to define the resources allocated to Batch jobs. Configuration values defined in a VM instance template override any conflicting values you specify while creating your Seqera compute environment.

caution
Seqera does not validate the VM instance template you specify in these fields. Generally, use templates that define only the machine type, network, disk, and configuration values that will not change across multiple VM instances and Seqera compute environments. See Create instance templates for instructions to create your instance templates.
To prevent errors during workflow execution, ensure that the instance templates you use are suitably configured for your needs with an appropriate machine type. You can define multiple instance templates with varying machine type sizes in your Nextflow configuration using the machineType process directive (e.g., process.machineType = 'template://my-template-name'). You can use process selectors to assign separate templates to each of your processes.
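You can check network tags against the format rules above before you enter them or add them to a VM instance template. The following sketch validates tags and then creates a compute-job instance template that carries them. The helper names, template name, and machine type are hypothetical. The check implements only the rules stated above, and GCP may enforce additional naming constraints.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: validate network tags, then create an instance
# template that applies them.

# Returns 0 if there are at most 64 tags and each is 1-63 characters of
# lowercase letters, digits, or hyphens; prints the problem otherwise.
validate_network_tags() {
  if (( $# > 64 )); then
    echo "too many tags: $# (max 64)" >&2
    return 1
  fi
  local tag
  for tag in "$@"; do
    if [[ ! "$tag" =~ ^[a-z0-9-]{1,63}$ ]]; then
      echo "invalid tag: $tag" >&2
      return 1
    fi
  done
}

# Usage: create_compute_template NAME MACHINE_TYPE [TAG...]
create_compute_template() {
  local template_name="$1" machine_type="$2"
  shift 2
  validate_network_tags "$@" || return 1
  local args=(compute instance-templates create "$template_name"
    --machine-type="$machine_type")
  if (( $# > 0 )); then
    # gcloud expects tags as a comma-separated list.
    args+=(--tags="$(IFS=,; echo "$*")")
  fi
  gcloud "${args[@]}"
}
```

Reference the resulting template in Seqera without the template:// prefix, or in Nextflow config as `process.machineType = 'template://seqera-compute'`.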

Select Create to finalize the compute environment setup.

info

See Launch pipelines to start executing workflows in your Google Cloud Batch compute environment.
