Installing and managing bioinformatics software with Conda

One of the first hurdles when moving from shared computing resources (where software is often installed and managed by the computing service providers) to cloud computing (where you are in complete control of setting up and managing your compute environment) is understanding how to best install and manage the software you wish to use for your analyses.

This can be particularly tricky with common bioinformatics tools, as they often share the same software dependencies but may require slightly different versions of them.

Fortunately, installing and managing most bioinformatics software can be a breeze thanks to the package management system known as Conda and the associated bioinformatics channel known as Bioconda.

Below we provide a simple guide to get you started with Conda on your new RONIN machine.

Installing Conda and Bioconda

Once you have created your machine in RONIN and connected to the terminal, you can easily install Miniconda (the minimal installer for Conda) using the following commands:

curl -O

You will need to follow the prompts to read and accept the license agreement and choose where you would like to install the main Miniconda directory.

Note: The default install location on the root SSD will be suitable in most situations, but if you would prefer to keep the Miniconda folder on an attached storage drive, you will need to specify that path instead.

Once Miniconda is installed, choose to initialise Conda when prompted (recent installers do this by running conda init, which adds Conda to your shell's startup file), then close and reopen the terminal for the changes to take effect. This ensures Conda is available whenever you open a new terminal session.
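To check that the initialisation worked, you could run something like the following sketch in the new terminal. It assumes a bash-compatible shell; conda --version reports the installed version and conda info --base prints the directory Miniconda was installed into.

```shell
# Check that conda is available in this new terminal session
if command -v conda >/dev/null 2>&1; then
    conda --version       # the installed conda version
    conda info --base     # the Miniconda installation directory
else
    echo "conda not found on PATH: reopen the terminal or repeat the initialisation step" >&2
fi
```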

Finally, you will need to add the Bioconda channel, along with the other Conda channels it depends on. Run these commands in the order shown, as the order determines each channel's priority:

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
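Because conda config --add places each new channel at the top of the list, running the three commands in the order above gives conda-forge the highest priority, followed by bioconda and then defaults. The sketch below prints the configured channels so you can confirm this. The last command is an optional extra that is not part of the steps above: it switches on strict channel priority, which the Bioconda documentation currently recommends, so that Conda only takes a package from a lower-priority channel if no higher-priority channel provides a package of that name.

```shell
# Print the configured channels, highest priority first
# (after the commands above: conda-forge, bioconda, defaults)
conda config --show channels

# Optional: ignore lower-priority channels for any package name
# that a higher-priority channel already provides
conda config --set channel_priority strict
```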

And voila! You can now easily install any of the thousands of bioinformatics tools available through Bioconda.

Installing software with Bioconda

All software packages available through Bioconda can be installed with a simple 3-word command. For example, to install the Burrows-Wheeler Aligner package, you can run:

conda install bwa

The install command will find the software package, list all the required dependencies that need to be installed and ask whether you wish to go ahead with the installation.

Note: The install command will always tell you whether any existing packages or dependencies will need to be updated or downgraded. Usually it is fine to accept these changes, unless another software package relies on a particular version of a dependency. This is where Conda environments come in (see the section below for more details).

If you wish to install a particular version of a software package, just add the version to the package name as follows:

conda install bwa=0.7.17

It's also just as easy to remove a software package:

conda remove bwa

To search for a particular software package and its available versions, use the Conda search command:

conda search bwa

To update a package to the latest compatible version, use the Conda update command:

conda update bwa

The Conda list command will show you all packages and respective versions already installed:

conda list
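conda list also accepts a package name, treated as a regular expression, to narrow its output, and its --export option writes the installed packages in a plain-text format that conda create --file can read back in later. A short sketch; the file name packages.txt is arbitrary.

```shell
# Show only installed packages whose names match "bwa"
conda list bwa

# Save every installed package and version to a text file
conda list --export > packages.txt
```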

Note: Conda packages are often not created by the original software developers, but by other helpful people trying to make your life easier. For this reason, you may occasionally encounter issues with certain packages. If a tool isn't running as it should, the easiest fix is usually to try installing a slightly different version of the package.

Managing software with Conda environments

Different bioinformatics programs often rely on the same dependencies, whether these are common bioinformatics tools such as bwa or common programming languages such as Python. However, some programs will only work with certain versions of those dependencies, meaning you may sometimes need two different versions of the same program installed at once.

This problem is easily overcome by creating a separate Conda environment for each analysis. A Conda environment is essentially a directory containing its own collection of Conda packages, completely separate from those in your main (base) installation.

You can easily create a new Conda environment called "myenvironment" using the following command:

conda create -n myenvironment

By default, this creates an empty environment: Python and any other packages are only installed if you list them. If you want a particular version of Python in the environment, simply add it to the create command:

conda create -n myenvironment python=2.7

Note: If you have configured a list of default packages (the create_default_packages setting in your .condarc file), Conda automatically installs them into every new environment. If you don't want these default packages in a particular environment, add the --no-default-packages flag to your conda create command.

Once you have created your environment, activate it so that you can install, manage and run software within it using the same commands as in the previous section:

conda activate myenvironment

conda install bwa

To deactivate your environment and go back to your standard Conda packages just run:

conda deactivate

To list all available environments run:

conda env list

To remove an environment run:

conda env remove -n myenvironment

And there you have it, everything you need to install and manage all of your bioinformatics software with just a few simple commands!