A Beginner's Glossary for HPC in the Cloud
High Performance Computing (HPC) has its own terminology, which is complexified (a real word!) by cloud terminology. Our goal is to simply define these terms to make it easier for users who are not cloud or IT gurus to create and launch clusters using RONIN on AWS. (It's not actually to use as many jargon terms in as short a blog post as possible, though that's nice too). To do this, we enlisted the help of guest blogger Jim Maltby, a Real HPC Expert, who has been known to be both authoritative and witty, occasionally at the same time.
Think of a machine as your starting point. It is a computer very similar to a computer you would purchase as a laptop or a server. The difference is that RONIN machines are hosted remotely, through a cloud provider. To share computers among many people, cloud providers normally give you a "virtual machine". A virtual machine is a computer created in a software layer that runs on top of a physical machine (running directly on the hardware is called running "on bare metal" to make it sound tough). This is key to the flexibility of the cloud - you can get a very large and expensive machine for just a few hours, and then switch to a smaller machine, even on the same underlying hardware. The software layer that creates the virtual machine is called a hypervisor, and a machine created in the cloud on AWS is called an "instance".
Now let's get confusing
A CPU (Central Processing Unit) is the brain of the computer. Long ago, a CPU had one processing core within it that did all the computation. Today a modern CPU will have multiple cores within it. Each core can run programs independently, so the more cores you have, the more you can do at one time. However, you often want to run more things at once than you have cores.
One technique for doing that is multithreading. A program can split its work into multiple portions, called threads, which can be executed on multiple cores to speed up the overall program execution. It can often make sense to run multiple threads on a single core, to make sure the processor is always busy. It is also possible for the hardware to support running multiple threads directly, using a technique called Simultaneous MultiThreading (SMT), or hyperthreading (Intel's trademarked marketing term). This capability allows a single core to be split by the hypervisor (no relation) into virtual CPUs (vCPUs), which are simply shares of a physical CPU. There aren't really two full cores inside a hyperthreaded core, so two identical workloads will take a little longer to run on the two hyperthreads of a single core than they would on two separate physical cores.
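As a quick sketch, on any Linux machine (including a cloud instance) you can see how logical CPUs relate to physical cores using standard tools - no RONIN-specific commands involved:

```shell
# Number of logical CPUs (vCPUs on a cloud instance):
nproc

# Breakdown of sockets, cores, and threads per core.
# "Thread(s) per core: 2" means SMT/hyperthreading is active,
# so half of your logical CPUs are hyperthreads, not full cores.
lscpu | grep -E '(^CPU\(s\)|Thread\(s\) per core|Core\(s\) per socket)'
```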
A cluster is a collection of machines that are connected so that you can run software that is too large or compute intensive to run on a single machine. Even if the software can run on a big cloud machine, it can often be more cost-effective on the cloud to create a cluster of small machines to run hundreds of individual jobs rather than to create a very big machine that can run them all at once. This is because clusters are auto-scaling - they can grow and shrink in size to handle the individual jobs. Each machine (or instance) in a cluster is called a node, and you can select a minimum and a maximum number of nodes to bound the autoscaling. A cluster is structured so that one node, the head node, controls the cluster. The remaining nodes, which do the actual work, are called compute nodes (or worker nodes).
Note that on AWS, CPUs are vCPUs, and on a cluster you can disable hyperthreading. You must now ask yourself: is the slowdown from running two identical workloads on a hyperthreaded core bad enough to justify disabling hyperthreading and effectively paying twice the price per core? Cloud computing always has the extra dimension of optimizing cost, not just time as in traditional HPC.
Jobs (not Steve)
A job is a script or program that you need to run. However, on a cluster, you do not just run your job as you would on a machine. Instead, you write a script to "submit" it to a scheduler: the software that manages the nodes of a cluster and decides if and where each job can run. In a shared cluster, there may be many jobs in line before yours, so your job is forced to wait in a queue.
The Slurm Workload Manager is an open-source scheduler for Linux clusters that is popular in universities. We focus on Slurm because it's really fun to say, but keep in mind that what comes next is relevant to any job scheduler used on the cloud. If you are coming to RONIN with Slurm commands written for a shared HPC system, keep in mind that things work differently on the cloud. Your Slurm scripts will probably contain directives describing how much memory and how many cores you need, assuming a particular shared cluster configuration. When you design a cluster for your specific workload, you can often leave all of those directives out and simply think about running one job per node.
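To make this concrete, here is a minimal sketch of what a Slurm batch script can look like. The job name and program are hypothetical placeholders, and the body just echoes the command so the script is harmless to run outside Slurm; note how few directives you need when the cluster is sized for your workload:

```shell
#!/bin/bash
#SBATCH --job-name=my-analysis      # hypothetical job name
#SBATCH --nodes=1                   # one job per node keeps things simple
#SBATCH --output=slurm-%j.out       # %j expands to the Slurm job ID

# The body of a batch script is ordinary shell.
PROGRAM="./my_analysis"             # placeholder for your own program
echo "Would run: $PROGRAM input.dat"
```

You would submit this with `sbatch myscript.sh` and watch the queue with `squeue`.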
MPI (Message Passing Interface)
Sometimes you have to run a job that requires multiple nodes - breaking up the computation across the nodes to collectively work on a single problem. The different parts of the computation can communicate with each other using an implementation of the Message Passing Interface (MPI), which is a standard for what a message passing library should look like. For this to be at all efficient, you would want the nodes to be connected with as fast a network as possible.
On AWS, Elastic Fabric Adapter (EFA) is a network interface that provides fast and scalable network connectivity for applications that communicate across multiple nodes (e.g., tightly coupled applications). Only certain instance types support EFA, and you can see them in RONIN with the EFA label beside them.
Often a node has multiple vCPUs within it, and those vCPUs can communicate using shared memory rather than sending messages over a network. OpenMP and Pthreads (POSIX threads) are two models for writing threaded code. MPI and OpenMP/Pthreads can be combined to take advantage of threaded parallelism using shared memory within a node, and message passing over a network between nodes.
MPI has a concept of MPI processes. A job can launch any number of MPI processes (sometimes called ranks), and within each rank there can be multiple OpenMP threads or Pthreads.
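As a sketch of how ranks and threads combine (the directive values and program name are hypothetical): a job with 2 MPI ranks and 4 OpenMP threads per rank uses 2 × 4 = 8 cores in total. In a Slurm script, that arithmetic maps onto `--ntasks` (ranks) and `--cpus-per-task` (threads per rank):

```shell
#!/bin/bash
#SBATCH --ntasks=2            # number of MPI ranks in the job
#SBATCH --cpus-per-task=4     # OpenMP threads per rank

# Total cores used = ranks x threads per rank.
RANKS=2
THREADS_PER_RANK=4
TOTAL=$((RANKS * THREADS_PER_RANK))
echo "Total cores: $TOTAL"

# On a real cluster you would then tell OpenMP how many threads
# to use and launch the ranks with srun (placeholder program):
#   export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
#   srun ./my_hybrid_app
```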
All Clear Now?
I hope it's obvious now that HPC terminology is hopelessly overloaded, and that your confusion is therefore completely justified. Hopefully, though, this post has given you the words you need to bring HPC to your research workloads with RONIN.