Fusion with Google Cloud Batch and Google Cloud Storage

Fusion allows the use of Google Cloud Storage as a virtual distributed file system with Google Cloud Batch.

The minimal Nextflow configuration looks like the following:

fusion.enabled = true
wave.enabled = true
process.scratch = false
process.executor = 'google-batch'
google.location = '<YOUR GOOGLE LOCATION>'

In the above snippet, replace YOUR GOOGLE LOCATION with the Google Cloud region of your choice, e.g. europe-west2, and save it to a file named nextflow.config in the pipeline launch directory.
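For example, assuming the europe-west2 region is chosen, the resulting nextflow.config would look like this:

```groovy
// nextflow.config — example with the region placeholder filled in
fusion.enabled = true
wave.enabled = true
process.scratch = false
process.executor = 'google-batch'
google.location = 'europe-west2'
```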

Then launch the pipeline execution with the usual run command:

nextflow run <YOUR PIPELINE SCRIPT> -w gs://<YOUR-BUCKET>/work

Make sure to specify a Google Cloud Storage bucket to which you have read-write access as the work directory.

Google credentials should be provided via the GOOGLE_APPLICATION_CREDENTIALS environment variable or by running the gcloud auth application-default login command. You can find more details in the Nextflow documentation.
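For local development, either approach can be set up from the shell; the service-account key path below is only a hypothetical placeholder:

```bash
# Option 1: use your own Google account credentials (opens a browser)
gcloud auth application-default login

# Option 2: point to a service account key file (hypothetical path)
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json
```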

When Fusion is enabled, by default only machine types that allow the mounting of local SSD disks will be used. If you specify your own machine type or machine series, make sure they support local SSD disks, otherwise job scheduling will fail.
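If you do pin a machine type, pick one from a series that supports attaching local SSDs; the machine type below is only an illustration:

```groovy
// Hypothetical example: pin a machine type from a series that
// supports local SSD disks (e.g. n1 or n2). Series such as e2 do
// not support local SSDs and would fail to schedule with Fusion.
process.machineType = 'n1-standard-4'
```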