
Creating and managing your own Galaxy server in RONIN

Galaxy is an open-source platform which allows users to use command-line tools via a graphical web interface. This blog post will teach you how to set up your own Galaxy server with RONIN.

Parice Brandies

10 May 2022 • 10 min read

What is Galaxy?

Galaxy is an open-source platform that allows users to run command-line tools through a graphical web interface. With thousands of tools available, Galaxy is a great option for users who need an intermediate solution between a desktop application and the command line. There are a number of places where you can use Galaxy for free, including the main UseGalaxy servers in the US, EU and Australia, as well as a number of other public servers for researchers (see a complete list here). These platforms are great for getting started with Galaxy and learning the basics, because all of the computational resources and administration (e.g. the availability of tools, workflows and reference datasets) are already taken care of.

However, you may have strict security requirements for protecting your data. You may be working with datasets that are larger than the default quota limits on the main servers. You may want to manage your own reference datasets, or control your computational resources so that jobs run more quickly. If any of these apply, this blog post will teach you how to set up your own Galaxy server with RONIN.

Setting up your own Galaxy server

1. Create your Galaxy Machine

Create a new Ubuntu machine in RONIN and connect to a terminal with RONIN LINK.

We recommend selecting a machine with at least 8 CPUs to begin with (or a larger machine if you are familiar with the compute requirements of your analyses). If you are working with large input and output files or reference datasets, make sure you add enough storage in Step 4 of the machine creation process to hold all of your analyses and Galaxy tools.
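Once the machine is running, you can double-check it from the terminal. Below is a minimal sketch of a hypothetical check_machine helper. It assumes GNU coreutils (nproc and df, both standard on Ubuntu). The 8 CPU minimum comes from the recommendation above, while the free-space figure is yours to choose.

```shell
# Hypothetical helper: check_machine MIN_CPUS MIN_FREE_GB [PATH]
# Prints the CPU count and free space on PATH, and returns 1 (with a warning)
# if either is below the minimum you pass in.
check_machine() {
  local min_cpus="$1" min_free_gb="$2" path="${3:-/}"
  local cpus free_gb
  cpus=$(nproc)
  # -P: one line per filesystem; -BG: sizes in whole gigabytes (e.g. "37G")
  free_gb=$(df -P -BG "$path" | awk 'NR == 2 { gsub("G", "", $4); print $4 }')
  echo "CPUs: $cpus, free space on $path: ${free_gb} GB"
  if [ "$cpus" -lt "$min_cpus" ] || [ "$free_gb" -lt "$min_free_gb" ]; then
    echo "WARNING: consider a larger machine type or more storage" >&2
    return 1
  fi
}
```

For example, check_machine 8 100 / checks for at least 8 CPUs and 100 GB free on the root drive. The 100 GB figure is only illustrative; use whatever your data needs.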

2. Check Dependencies

Check that you have Python 3.6 or higher installed:

python3 --version

If you do not have Python 3.6 or higher, run sudo apt update and then sudo apt install -y python3.
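You may want to script the same check, for example when preparing several machines. Below is a small sketch. version_at_least is a hypothetical helper that relies on GNU sort -V for version ordering, and 3.6 is the minimum stated above.

```shell
# Hypothetical helper: succeeds if version $1 is at least version $2.
# sort -V orders version strings numerically (3.6 < 3.10), so the smaller
# of the two ends up first.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# "python3 --version" prints e.g. "Python 3.8.10"; keep only the number.
# If python3 is missing, py_version is left empty.
py_version=$(python3 --version 2>/dev/null | awk '{print $2}')
if [ -n "$py_version" ] && version_at_least "$py_version" 3.6; then
  echo "Python $py_version is new enough for Galaxy"
else
  echo "Python 3.6+ not found; run: sudo apt update && sudo apt install -y python3" >&2
fi
```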

3. Download and Install Galaxy

Choose where you would like to download and install Galaxy. This location is where all of the associated data will be stored. If you plan on working with big files or lots of reference datasets and tools, we recommend downloading Galaxy to an attached storage drive (e.g. one that you added during Step 4 of your machine creation) rather than to your home folder on the root drive. To do so, make sure your attached storage drive is available and then change into that directory, e.g. cd /mnt/volume1.
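Before cloning, you can confirm that the directory really is your attached drive rather than an ordinary folder on the root drive. Below is a minimal sketch. It assumes the example path /mnt/volume1 from above and the mountpoint utility from util-linux, which is standard on Ubuntu. is_separate_drive is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds only if $1 is the root of a mounted filesystem.
is_separate_drive() {
  mountpoint -q "$1"
}

GALAXY_PARENT=/mnt/volume1   # example path from above; adjust to your drive
if is_separate_drive "$GALAXY_PARENT"; then
  df -h "$GALAXY_PARENT"     # show size and free space of the drive
  cd "$GALAXY_PARENT"
else
  echo "$GALAXY_PARENT is not a mounted drive; check your attached storage" >&2
fi
```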

Once you are in the directory where you would like Galaxy and all of its data to be installed, run the following command to download Galaxy:

git clone -b release_21.09 https://github.com/galaxyproject/galaxy.git

Move into the new Galaxy folder:

cd galaxy

Install and start Galaxy:

sh run.sh

Note: The first run of this step will take a while because Galaxy is installing and setting up key dependencies and default tools. When you see the following lines, your server should be ready to connect to (there may be some DEBUG messages below these lines):

Galaxy server instance 'main.web.1' is running
Starting server in PID 7552.
serving on http://127.0.0.1:8080

4. Connect to Galaxy

Check that Galaxy is running by connecting to port 8080 via the "LINK TO A CUSTOM APPLICATION" option in RONIN LINK.

You should then see the Galaxy home page in your browser (it might take a minute to load).
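You can also check from the terminal on the machine itself. Below is a minimal sketch that polls Galaxy's /api/version endpoint until it answers. It assumes Galaxy is listening on 127.0.0.1:8080, as in the log output above, and that curl is installed. wait_for_url is a hypothetical helper name.

```shell
# Hypothetical helper: wait_for_url URL [TIMEOUT_SECONDS]
# Retries every 5 seconds until URL returns a successful response,
# or gives up (returning 1) once the timeout has passed.
wait_for_url() {
  local url="$1" timeout="${2:-300}" waited=0
  # -s: silent, -f: treat HTTP errors as failures, --max-time: don't hang
  until curl -sf --max-time 10 -o /dev/null "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "No response from $url after ${timeout}s" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "$url is responding"
}
```

For example, wait_for_url http://127.0.0.1:8080/api/version && curl -s http://127.0.0.1:8080/api/version should print a short JSON description of the running Galaxy version once startup has finished.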

5. Create your user account

Navigate to "Login or Register" in the top bar. Even if you already have an account on one of the public Galaxy servers, you will need to register as a new user on your new server. Click the "Register here" link and fill out the details (keep these details handy).

6. Configure Galaxy

6.1 Administrator Access

Once you have registered your user account and logged in successfully, you will want to make yourself an Administrator so that you can install tools, manage reference datasets, and so on. To do this, you will need to create and modify a Galaxy config file on your machine. Before changing the config file, it's a good idea to stop Galaxy first and restart it once you are finished. Go back to the terminal window where you ran sh run.sh and press Ctrl+C to stop Galaxy.

Note: Closing the terminal window where you initially ran sh run.sh will usually stop Galaxy. If Galaxy is still running (i.e. refreshing Galaxy in your browser still works) but you can't find the terminal to stop it, you can find the running process and kill it with the following command. This command stops the oldest matching process, the main uWSGI process:

pkill -o -f "galaxy/.venv/bin/python3 .venv/bin/uwsgi"
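However you stop Galaxy, it can take a few seconds for all of its processes to exit. Below is a minimal sketch of a hypothetical wait_until_stopped helper that waits until no process matching a pattern remains. It uses pgrep -f, which matches against the full command line and never matches itself.

```shell
# Hypothetical helper: wait_until_stopped PATTERN [MAX_TRIES]
# Checks once per second; returns 1 if a matching process is still
# running after MAX_TRIES checks.
wait_until_stopped() {
  local pattern="$1" tries="${2:-30}"
  while pgrep -f "$pattern" >/dev/null; do
    tries=$((tries - 1))
    if [ "$tries" -le 0 ]; then
      echo "Still running: $pattern" >&2
      return 1
    fi
    sleep 1
  done
  echo "No process matches: $pattern"
}
```

For example, wait_until_stopped "galaxy/.venv/bin/python3 .venv/bin/uwsgi" uses the same pattern as the command above and returns once Galaxy has fully exited.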

Once Galaxy has stopped, create your Galaxy config file by running:

cp config/galaxy.yml.sample config/galaxy.yml

This config file can be used to configure many different options and settings in Galaxy. The file lists a description and default value for every option, which should help you find what you may want to adjust (there is also some extra documentation here).

To give your user account administrator access, you will need to set the admin_users option in config/galaxy.yml (line 1604 of the file in release 21.09). Uncomment it and change null to your registered email address. You can do this with a text editor or with the following command (replacing user@example.com with the email you used to register earlier). The command matches the option by its text rather than by line number, so it still works if the line has moved:

sed -i 's/#admin_users: null/admin_users: user@example.com/' config/galaxy.yml
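To confirm that the change worked, below is a small sketch that wraps the same sed edit in a hypothetical set_galaxy_admin helper and prints the resulting line. It assumes the sample file contains an (indented) #admin_users: null line.

```shell
# Hypothetical helper: set_galaxy_admin EMAIL [CONFIG]
# Uncomments admin_users and sets it to EMAIL, keeping the line's indentation,
# then prints the resulting line with its line number so you can check it.
set_galaxy_admin() {
  local email="$1" conf="${2:-config/galaxy.yml}"
  # "|" is used as the sed delimiter to avoid clashes with "/" in the value
  sed -i "s|#admin_users: null|admin_users: ${email}|" "$conf"
  grep -n "^ *admin_users:" "$conf"
}
```

Run set_galaxy_admin user@example.com from your galaxy folder. If it prints nothing (and returns a non-zero status), the option was not found and you should edit the file by hand.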

6.2 Resource Management

In the cloud, you can make your machine as big as it needs to be. If you select a machine type with multiple CPUs, you can either run lots of small jobs at once in Galaxy, or run one big job faster across multiple CPUs. Not all tools or jobs can run across multiple CPUs, but those that can (e.g. genomic alignment tools) can benefit greatly when this is enabled!

There are a number of ways you may want to configure Galaxy to use the resources on your machine, and you can change these at any time (just remember to stop and restart Galaxy when you do). How you configure Galaxy largely depends on what kinds of analyses you plan to run, and how many. For example, by default Galaxy runs each job on 1 CPU, with up to 4 jobs running simultaneously, so running 4 jobs at once needs at least 4 CPUs in total.

While some simple analyses are fine on just 1 CPU, others (like the genomic alignment tools mentioned earlier) would run very slowly on a single CPU, especially with large input or reference datasets. If you only had a machine with 4 CPUs, you would be much better off running 1 job at a time across all 4 CPUs, which would make your alignment roughly 4x faster. Alternatively, you could select a bigger machine with 32 CPUs and run either 8 simultaneous jobs that use 4 CPUs each, or 4 simultaneous jobs that use 8 CPUs each (you get the drift). Either way, you need to tell Galaxy that you want this kind of configuration.

Once you have decided how many jobs you want to run simultaneously (we will refer to this as the number of "workers") and how many CPUs each job may need (we will refer to this as the number of "slots"; 8 is a good start if you know your tool can run across multiple CPUs), make sure your machine has enough CPUs in total (workers * slots). To check, run the command nproc, which prints the number of available CPUs. If you need to scale your machine up or down to accommodate the number or type of jobs you wish to run, you can easily change your machine size in RONIN first.
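The arithmetic is simple, but a quick sketch makes it concrete. fits_on_machine is a hypothetical helper. Unless you pass a CPU count explicitly, it compares workers * slots with the output of nproc.

```shell
# Hypothetical helper: fits_on_machine WORKERS SLOTS [CPUS]
# Succeeds if WORKERS simultaneous jobs of SLOTS CPUs each fit on the machine.
fits_on_machine() {
  local workers="$1" slots="$2" cpus="${3:-$(nproc)}"
  local needed=$((workers * slots))
  echo "Need $needed CPUs (${workers} workers x ${slots} slots); machine has $cpus"
  [ "$needed" -le "$cpus" ]
}
```

On a 32-CPU machine, both fits_on_machine 8 4 and fits_on_machine 4 8 succeed, matching the examples above, while fits_on_machine 8 8 fails because it would need 64 CPUs.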

TIP:
If you aren't sure about the optimal number of CPUs for your tool (check the tool documentation first), you can try running your analysis a few times with different numbers of CPUs. While it runs, monitor CPU usage, walltime and other computational resources (such as RAM) with a tool such as Netdata. This will also help you determine which machine type is most suitable.

Then, to tell Galaxy how you would like to divide your machine's resources into "workers" and "slots" for your jobs, create a job config file by running:

cp config/job_conf.xml.sample_basic config/job_conf.xml

Open the config file with a text editor, e.g. nano config/job_conf.xml. The default number of workers is 4, which you should see on line 6. Change this value to however many jobs you want to run simultaneously. The default number of "slots" is 1, but this is not defined in the config file. To define it, replace line 9, which contains <destination id="local" runner="local"/>, with the following 3 lines:

        <destination id="local" runner="local">
            <param id="local_slots">1</param>
        </destination>

You can then replace 1 with your desired number of "slots" (CPUs per job) and save your changes.
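If you prefer not to edit lines 6 and 9 by hand, below is a minimal sketch that writes the whole file from a template instead. It assumes the basic sample contains only the local plugin and destination described above. Compare the output with your copy of config/job_conf.xml.sample_basic before relying on it, because the helper overwrites the target file. write_job_conf is a hypothetical helper name. Tools that can use multiple CPUs typically read the slot count from the GALAXY_SLOTS environment variable that Galaxy sets for each job.

```shell
# Hypothetical helper: write_job_conf WORKERS SLOTS [OUTPUT]
# WORKERS = jobs run at once, SLOTS = CPUs per job. Overwrites OUTPUT.
write_job_conf() {
  local workers="$1" slots="$2" out="${3:-config/job_conf.xml}"
  cat > "$out" <<EOF
<?xml version="1.0"?>
<job_conf>
    <plugins>
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="${workers}"/>
    </plugins>
    <destinations>
        <destination id="local" runner="local">
            <param id="local_slots">${slots}</param>
        </destination>
    </destinations>
</job_conf>
EOF
}
```

For example, write_job_conf 4 8 sets up 4 simultaneous jobs with 8 CPUs each, which suits a 32-CPU machine.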

You will then need to restart Galaxy for all of your config changes to take effect: sh run.sh.

TIP:
We recommend running Galaxy (i.e. the sh run.sh command) in a screen session so that your jobs are not interrupted if your terminal is disconnected. Click here to see how to launch and manage a screen session. Alternatively, you can set Galaxy to run automatically whenever your machine starts (see the "Running Galaxy on Startup" section below).
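As a starting point, below is a minimal sketch for launching Galaxy in a detached screen session, assuming screen is installed. start_galaxy_in_screen is a hypothetical helper, and the session name "galaxy" is arbitrary.

```shell
# Hypothetical helper: start_galaxy_in_screen GALAXY_DIR [SESSION_NAME]
# Starts "sh run.sh" inside a detached screen session.
start_galaxy_in_screen() {
  local galaxy_dir="$1" session="${2:-galaxy}"
  if [ ! -f "$galaxy_dir/run.sh" ]; then
    echo "No run.sh in $galaxy_dir" >&2
    return 1
  fi
  # -d -m: start the session detached; -S: give it a name.
  # The subshell keeps your current directory unchanged.
  (cd "$galaxy_dir" && screen -dmS "$session" sh run.sh)
}
```

For example, run start_galaxy_in_screen /mnt/volume1/galaxy, then screen -r galaxy to watch the log. Press Ctrl-a followed by d to detach again without stopping Galaxy.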

7. Installing Galaxy Tools and Managing Reference Datasets

After you have restarted Galaxy, connect to Galaxy again (navigate to localhost:8080 in your browser) and you should now see that you are an Admin (you may need to log in again). Click the Admin button in the top bar to see all of the admin options, including the ability to manage reference datasets (see below) and install the tools that you need.

If you would like to add and manage your own reference datasets within Galaxy, we highly recommend watching the video below, which demonstrates how to do so via data managers that can be installed like standard Galaxy tools.

8. Upload Your Input Data and Run Your Analyses

Once you have installed your required tools and reference datasets, you can upload any required input data from your computer or from a particular database using the "Get Data" option in the left menu of the homepage. You can then start running your analyses by searching for or selecting your tools from the left menu. You can even create your own workflows to link multiple tools and datasets together.

If you are not familiar with Galaxy and need some help getting started, there's a ton of helpful training material available at: https://training.galaxyproject.org/

Running Galaxy on Startup

If you would like Galaxy to start automatically when your machine starts, edit your crontab file (a list of commands that you want to run on a regular schedule). Run crontab -e and add the following line to the bottom of the file, replacing /path/to/galaxy with the location of your galaxy folder:

@reboot sh /path/to/galaxy/run.sh
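If you would rather not open an editor, for example when setting up several machines, below is a minimal sketch of a hypothetical with_reboot_entry filter. It takes your current crontab on standard input and outputs it with the @reboot line added exactly once.

```shell
# Hypothetical filter: with_reboot_entry GALAXY_DIR
# Copies stdin to stdout, dropping any identical @reboot line,
# then appends the @reboot line once.
with_reboot_entry() {
  local line="@reboot sh $1/run.sh"
  # -v: drop matches, -x: whole-line match, -F: fixed string (no regex)
  grep -vxF "$line" || true   # grep exits 1 when it prints nothing; ignore that
  echo "$line"
}
```

Usage: crontab -l 2>/dev/null | with_reboot_entry /path/to/galaxy | crontab - (replacing /path/to/galaxy as above). Because crontab - replaces the whole crontab, the existing entries are piped through unchanged.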

Galaxy will now run automatically whenever your machine is started.

Once you have your Galaxy server up and running, you may want others to be able to access it. For example, you may want students or other researchers in your lab to have access to the same tools and reference datasets, or you may wish to work collaboratively on projects with colleagues from a different institution. There are a variety of ways to share your Galaxy server with RONIN, which we describe below.

1. Create a Project Package of the Galaxy Server

Are you a RONIN Admin?
You can create and configure a Galaxy server, package it and make it available to all of your users via the RONIN Service Catalogue!

If you simply want other users within your RONIN project to be able to access the same Galaxy setup and tools, you can create a package of your machine. Other users within your project will then be able to select this project package when creating a new machine. This effectively clones the original, so the same configuration, tools and reference datasets are available. New users will just need to connect to port 8080 on their new machine with RONIN LINK (as long as Galaxy has been configured to run on startup; see the section above) and then register a user account to get started.

Note: This method can also be used to create a machine for a student or collaborator who may not have access to RONIN. Simply create a new machine for the student or collaborator and provide them with the machine key. We recommend setting up a suitable machine schedule, because the user will not be able to turn the machine on or off themselves.

Check out the video below to see how easy it is to create and connect to your own Galaxy server (at a cost of less than $1 / hour) using a RONIN package:

Instead of each user having their own Galaxy instance, multiple users can all connect to the same server. There are two ways to do this:

Sharing a single machine key with multiple users so that they can connect to the machine securely and register their own separate user account on the server (or in some cases you may prefer that users share a common login).
Opening port 8080 on the machine to a range of IP addresses (e.g. to make it available within an institution) or to all IP addresses (i.e. to make it available to anyone on the internet who navigates to the machine's IP address). This method requires the help of a RONIN Administrator, who can edit the security group rules on the Galaxy machine via the AWS console. It is not recommended without some additional security measures in place. For example, in your config/galaxy.yml config file you should set require_login to true and allow_user_creation to false. This prevents anyone from running jobs unless they have login credentials that a Galaxy Administrator has created. Some other options you may want to configure, such as user quotas and additional security features, are described here.

We don't usually recommend this method of sharing a Galaxy server, because the responsibility usually falls on a single administrator to monitor and manage storage, access and computational resources on the server. With RONIN, because machines can be packaged, individual users do not need to share a single Galaxy server the way they would an on-premise resource. If you do wish to use this method, make sure the machine type is large enough to handle the number of users and adjust the config/job_conf.xml parameters accordingly.
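If you do open up a shared server, below is a minimal sketch for applying the require_login and allow_user_creation settings mentioned above without hunting for them in the editor. set_galaxy_option is a hypothetical helper. It assumes each option appears in galaxy.yml as a single, possibly commented-out line of the form "  #key: value", as in the sample file.

```shell
# Hypothetical helper: set_galaxy_option KEY VALUE [CONFIG]
# Uncomments KEY if needed and sets it to VALUE, keeping the indentation,
# then prints the resulting line with its line number.
set_galaxy_option() {
  local key="$1" value="$2" conf="${3:-config/galaxy.yml}"
  sed -i -E "s/^( *)#?${key}: .*/\1${key}: ${value}/" "$conf"
  grep -n "^ *${key}:" "$conf"
}
```

From your galaxy folder, run set_galaxy_option require_login true and then set_galaxy_option allow_user_creation false. Restart Galaxy for the changes to take effect.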

With your very own Galaxy server the possibilities are endless, so go ahead and shoot for the stars!