Installing and Configuring Nextflow on a RONIN Machine

If you’re working in bioinformatics, machine learning, or any domain where data pipelines are central, chances are you’ve heard of Nextflow. Built for portability, scalability, and reproducibility, Nextflow is a domain-specific language and workflow manager that makes it easy to define complex workflows using a simple script — and run them anywhere: from your laptop to high-performance clusters, cloud services, or Dockerised environments.

In this blog post, you’ll learn how to:

  • Install Nextflow on a fresh Ubuntu-based machine
  • Configure it to use persistent storage volumes (like EBS)
  • Run your first pipeline with ease

Let’s dive in and set up your Nextflow environment on your RONIN machine from scratch:

  1. Launch an Ubuntu 22.04 machine - you can keep the root drive small, but make sure you attach an additional SSD drive in step 4 of machine creation that is large enough for all of your inputs, outputs and temporary files. We will make sure Nextflow uses this drive to write all the input and output files so that your root drive doesn't fill up with temporary files during analysis and crash the operating system.
  2. Update apt:
sudo apt update && sudo apt upgrade -y
  3. Install Java - Nextflow relies on Java
sudo apt install openjdk-17-jdk -y
  4. Download and install the Nextflow binary
curl -s https://get.nextflow.io | bash
  5. Move the binary to a folder on your standard path so that you can call it from anywhere
sudo mv nextflow /usr/local/bin/
  6. Confirm Nextflow is installed correctly
nextflow -v
  7. Create Nextflow folders on your attached storage drive (modify /mnt/sdd as needed, but this is often the default mountpoint)
mkdir -p /mnt/sdd/nextflow-work
mkdir -p /mnt/sdd/tmp
  8. Set these directories as the default work and temporary directories for Nextflow by adding these variables to your .bashrc (this file is run each time you log in to the terminal)
echo 'export NXF_WORK=/mnt/sdd/nextflow-work' >> ~/.bashrc
echo 'export TMPDIR=/mnt/sdd/tmp' >> ~/.bashrc
source ~/.bashrc
  9. Create a Nextflow configuration file
nano ~/.nextflow/config
  10. Set the same directories as defaults in the Nextflow configuration file by pasting in the lines below, just to cover all bases (since some ways of launching Nextflow will not read the .bashrc file)
workDir = '/mnt/sdd/nextflow-work'

process {
    executor = 'local'
}

env.TMPDIR = '/mnt/sdd/tmp'
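  11. Run a test workflow to make sure everything is configured correctly. Nextflow keeps the work directory after a run by default, so you can check that files are being written to the directories we specified earlier; the -resume flag additionally lets Nextflow reuse cached task results if you run the workflow again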
nextflow run hello -resume

This should:

  • Store workflow tasks under /mnt/sdd/nextflow-work
  • Use /mnt/sdd/tmp for temporary files
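
One way to check this from the terminal is sketched below. It is only an illustration: count_entries is a hypothetical helper name (not part of Nextflow), and the fallback paths assume the /mnt/sdd mountpoint used in the steps above. It counts the top-level entries in each directory, so a non-zero count for the work directory after the test run suggests tasks were written there.

```shell
#!/usr/bin/env bash

# count_entries DIR (hypothetical helper, not a Nextflow command):
# prints the number of top-level entries in DIR, or "missing" if DIR
# does not exist.
count_entries() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing"
        return 1
    fi
    find "$dir" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' '
}

# Use the variables exported in .bashrc, falling back to the paths
# assumed in this post if they are not set.
WORK_DIR="${NXF_WORK:-/mnt/sdd/nextflow-work}"
TMP_DIR="${TMPDIR:-/mnt/sdd/tmp}"

echo "work dir $WORK_DIR: $(count_entries "$WORK_DIR") entries"
echo "tmp dir  $TMP_DIR: $(count_entries "$TMP_DIR") entries"
```

The temporary directory may well be empty after a small workflow like hello, since temporary files are usually cleaned up when a task finishes, so the work directory count is the more telling of the two.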

If you can see files being written in these directories, you are ready to start running your own Nextflow pipeline!
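
Before launching a larger pipeline, it may also be worth confirming that the work directory really sits on the attached drive and not on the root drive, since a folder created under an unmounted path would quietly live on the root filesystem. The sketch below is one possible check using df; mount_of and free_kb are hypothetical helper names, the fallback path assumes the /mnt/sdd mountpoint from this post, and the parsing assumes the mount point contains no spaces.

```shell
#!/usr/bin/env bash

# mount_of DIR (hypothetical helper): prints the mount point of the
# filesystem that holds DIR, taken from the last column of `df -P`.
mount_of() {
    df -P "$1" | awk 'NR==2 {print $6}'
}

# free_kb DIR (hypothetical helper): prints the available space on
# DIR's filesystem in 1024-byte blocks.
free_kb() {
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Fall back to the path assumed in this post if NXF_WORK is not set.
WORK_DIR="${NXF_WORK:-/mnt/sdd/nextflow-work}"

if [ ! -d "$WORK_DIR" ]; then
    echo "note: $WORK_DIR does not exist yet"
elif [ "$(mount_of "$WORK_DIR")" = "/" ]; then
    echo "warning: $WORK_DIR is on the root filesystem"
else
    echo "$WORK_DIR is on $(mount_of "$WORK_DIR") with $(free_kb "$WORK_DIR") KB free"
fi
```

If the warning appears, the attached drive is probably not mounted where you expected, and it is safer to fix the mountpoint before running anything large.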
