Installing and Managing Bioinformatics Software with Conda
One of the first hurdles when moving from shared computing resources (where software is often installed and managed by the computing service providers) to cloud computing (where you are in complete control of setting up and managing your compute environment) is understanding how to best install and manage the software you wish to use for your analyses.
This can be particularly tricky with common bioinformatics tools, which often share the same software dependencies but require slightly different versions of them.
Fortunately, installing and managing most bioinformatics software can be a breeze thanks to the package management system known as Conda and the associated bioinformatics channel known as Bioconda.
Below we provide a simple guide to get you started with Conda on your new RONIN machine.
Installing Conda and Bioconda
Once you have created your machine in RONIN and connected to the terminal, you can easily install Miniconda (the minimal installer for Conda) using the following commands:
curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sh Miniconda3-latest-Linux-x86_64.sh
You will need to follow the prompts to read and accept the license agreement and decide on where you would like to install the main Miniconda directory.
Note: The default install location on the root SSD drive will usually be suitable in most situations for single machines, but for clusters you should ensure the Miniconda directory is installed in the /apps or /shared directories so that it is available to all of the compute nodes. If you plan on installing a large amount of software with Miniconda on a single machine, you should specify the path to an attached storage drive, e.g. /mnt/volume1, so that your root drive does not run out of storage space.
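If you would rather skip the interactive prompts, the installer can also be run unattended. The following is a minimal sketch, assuming the installer's -b (batch mode, which treats the license as accepted) and -p (install prefix) options. The helper name install_miniconda and the example prefix are illustrative only and are not part of Conda.

```shell
#!/usr/bin/env bash
# Sketch: unattended Miniconda install into a chosen directory.
# install_miniconda is a hypothetical helper name, not part of Conda.

INSTALLER="Miniconda3-latest-Linux-x86_64.sh"
INSTALLER_URL="https://repo.anaconda.com/miniconda/${INSTALLER}"

install_miniconda() {
    # $1 = target directory, e.g. /apps/miniconda3 on a cluster
    local prefix="$1"
    if [ -z "$prefix" ]; then
        echo "usage: install_miniconda <install-prefix>" >&2
        return 1
    fi
    if [ -e "$prefix" ]; then
        echo "refusing to overwrite existing path: $prefix" >&2
        return 1
    fi
    curl -O "$INSTALLER_URL" || return 1
    # -b: batch mode (no prompts); -p: where to install
    sh "$INSTALLER" -b -p "$prefix"
}

# Example (not run automatically):
# install_miniconda /mnt/volume1/miniconda3
```

Batch mode does not modify your shell startup files, so you would still need to initialise Conda, or export its bin directory to your path, afterwards.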
Once Miniconda is installed, you should then initialise your Conda environment when prompted and then close and reopen the terminal for the changes to take effect. This process ensures Conda is always available whenever you open a new terminal session.
Note: Conda will only be available to the user that installed Conda. So if you run the Conda installation with the default ubuntu user, the root user will not automatically have access to Conda and vice versa. Similarly, compute nodes on a cluster will not automatically have Conda added to the default path (unless the cluster has been launched from a package where Conda was already installed and initialised). To temporarily give other users or compute nodes access to Conda, you will need to export the path to the Miniconda bin directory:
export PATH=$PATH:/path/to/miniconda3/bin
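If this export ends up in a startup file or job script that may be sourced more than once, a small guard avoids piling up duplicate entries. This is only a sketch: add_conda_to_path is a hypothetical helper name, and /path/to/miniconda3 remains a placeholder for your own install location.

```shell
# Sketch: append a Miniconda bin directory to PATH only if it is not already there.
# add_conda_to_path is a hypothetical helper; /path/to/miniconda3 is a placeholder.

add_conda_to_path() {
    local conda_bin="$1"
    case ":$PATH:" in
        *":$conda_bin:"*) ;;                   # already present, nothing to do
        *) export PATH="$PATH:$conda_bin" ;;   # append, as in the export command
    esac
}

# Example (not run automatically):
# add_conda_to_path /path/to/miniconda3/bin
```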
Finally, you will need to add the Bioconda channel as well as the other Conda channels it depends on:
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
And voila! You can now easily install the thousands of bioinformatic tools available through Bioconda.
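To double-check the result, you can print the channel list after adding the channels. The sketch below wraps the three commands in a hypothetical helper called setup_bioconda_channels; it assumes only the conda config --add and conda config --show options.

```shell
# Sketch: add the Bioconda channels in order, then print the resulting list.
# setup_bioconda_channels is a hypothetical helper name.

setup_bioconda_channels() {
    local channel
    # --add puts each channel at the top of the list, so the channel
    # added last (conda-forge) ends up with the highest priority.
    for channel in defaults bioconda conda-forge; do
        conda config --add channels "$channel" || return 1
    done
    # Show the channel list as Conda now sees it.
    conda config --show channels
}

# Example (not run automatically):
# setup_bioconda_channels
```

Because each added channel goes to the top of the list, the order of the commands matters: the printed list should show conda-forge first, then bioconda, then defaults.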
Installing software with Bioconda
All software packages available through Bioconda can be installed with a simple 3-word command. For example, to install the Burrows-Wheeler Aligner package, you can run:
conda install bwa
The install command will find the software package, list all the required dependencies that need to be installed and ask whether you wish to go ahead with the installation.
Note: The install command will always tell you whether any pre-existing software/dependencies will need to be updated or downgraded etc. Usually it is fine to accept these changes, unless you have another software package that is relying on a particular version of a dependency. This is where Conda environments come in - see section below for more details.
If you wish to install a particular version of a software package, just add the version to the package name as follows:
conda install bwa=0.7.17
It's also just as easy to remove a software package:
conda remove bwa
To search for a particular software package and available versions, use the Conda search command:
conda search bwa
To update a package to the latest compatible version, use the Conda update command:
conda update bwa
The Conda list command will show you all packages and respective versions already installed:
conda list
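When you only care about one package, the full listing can be filtered down to a single version number. This is a rough sketch that assumes the usual conda list layout (comment lines starting with #, then name, version, build and channel columns); installed_version is a hypothetical helper name.

```shell
# Sketch: print the installed version of a package in the active environment.
# installed_version is a hypothetical helper name.

installed_version() {
    local pkg="$1"
    # `conda list` prints comment lines starting with '#', then one row per
    # package: name, version, build, channel. Match the name exactly.
    conda list | awk -v pkg="$pkg" '$1 == pkg { print $2; found = 1 }
                                    END { exit !found }'
}

# Example (not run automatically):
# installed_version bwa || echo "bwa is not installed in this environment"
```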
Note: Conda packages often aren't created by the original software developers, but rather by some other helpful human trying to make your life easier. For this reason, you may sometimes (though rarely) encounter issues with certain packages. If any software isn't running as it should, usually the easiest fix is to try installing a slightly different version of the package.
Managing software with Conda environments
Different bioinformatics software will often utilise the same dependencies, whether these are common bioinformatics tools such as bwa or common programming languages such as Python. However, different software may only work with certain versions of those dependencies, meaning you may sometimes need two separate versions of the one program installed at once.
This problem can easily be overcome by creating separate Conda environments for each analysis. A Conda environment is basically a directory that contains a specific collection of Conda packages that are completely separate to those from your main installation directory.
You can easily create a new Conda environment called "myenvironment" using the following command:
conda create -n myenvironment
Depending on your Conda version and configuration, a newly created environment may not include Python by default. To make sure the environment contains a particular version of Python, add it to the create command:
conda create -n myenvironment python=2.7
Note: Any default packages configured for your Conda installation are automatically installed when creating a new Conda environment. If you don't want to install these default packages, you can add the --no-default-packages flag to your conda create command.
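Package specifications can also be passed straight to the create command, so an environment and its pinned software can be set up in one step. The sketch below does this only when the environment does not already exist; ensure_env is a hypothetical helper name, and it assumes the name of each environment appears in the first column of conda env list.

```shell
# Sketch: create an environment with pinned packages unless it already exists.
# ensure_env is a hypothetical helper name.

ensure_env() {
    local name="$1"
    shift   # remaining arguments are package specs, e.g. python=2.7 bwa=0.7.17
    # `conda env list` prints one row per environment, name in the first column.
    if conda env list | awk -v n="$name" '$1 == n { found = 1 } END { exit !found }'; then
        echo "environment '$name' already exists"
        return 0
    fi
    # -y answers the confirmation prompt automatically.
    conda create -y -n "$name" "$@"
}

# Example (not run automatically):
# ensure_env myenvironment python=2.7 bwa=0.7.17
```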
Once you have created your environment, you can activate it so that you can install, manage and run any software within the environment using the same commands from the previous section:
conda activate myenvironment
conda install bwa
To deactivate your environment and go back to your standard Conda packages just run:
conda deactivate
To list all available environments run:
conda env list
To remove an environment run:
conda env remove -n myenvironment
Note: If you need to activate a conda environment within a SLURM (or other scheduler) script when working with a cluster, you will need to add a couple of extra commands in order for the compute nodes to be able to find and work with your Conda environments:
# Add the path to your miniconda3 bin directory so that the nodes can find Conda.
export PATH=$PATH:/apps/miniconda3/bin
# Launch the base Conda environment.
eval "$(conda shell.bash hook)"
# Activate the required Conda environment.
conda activate myenvironment
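Put together, a complete job script might look something like the sketch below, which writes the script to a file ready for submission. The job name, the #SBATCH settings, the file names and the bwa index step are all placeholders chosen for illustration; adjust them to your own cluster and analysis.

```shell
# Sketch: write a minimal SLURM job script that activates a Conda environment.
# The job name, file names and bwa arguments are placeholders.

cat > bwa_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=bwa_index
#SBATCH --ntasks=1

# Make Conda visible on the compute node, then load its shell functions.
export PATH=$PATH:/apps/miniconda3/bin
eval "$(conda shell.bash hook)"
conda activate myenvironment

# Run the analysis inside the environment.
bwa index reference.fa
EOF

# Submit it from the head node (not run automatically):
# sbatch bwa_job.sh
```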
And there you have it, everything you need to install and manage all of your bioinformatics software with just a few simple commands!