How to Test Intel MPI and Slurm on AWS ParallelCluster

Running parallel jobs on an HPC cluster requires that your job scheduler (Slurm) and your MPI library (Intel MPI) are correctly integrated. Before launching a complex application, it's essential to run a simple test to ensure everything is working.

Here’s a quick guide on how to do it.

1. Create a Simple MPI Test Program

A classic "Hello World" program is the simplest way to test MPI. The C code below has every process report its unique ID (rank) and the name of the node it is running on, confirming that all processes can start and communicate.

ℹ️
First, source the setvars script so that the Intel oneAPI environment (including Intel MPI and its compiler wrappers) is loaded into your shell:
source /opt/intel/oneapi/setvars.sh
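
To confirm the environment loaded, you can check that the Intel MPI compiler wrapper is now on your PATH (the path in the comment is only indicative; it depends on your oneAPI version and install location):

which mpicc
# typically resolves to something under /opt/intel/oneapi/mpi/<version>/bin/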

Create a file named mpi_hello.c:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    // Initialize the MPI environment
    MPI_Init(&argc, &argv);

    int world_size, world_rank;
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;

    // Query the total number of ranks, this process's rank, and the host it runs on
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Get_processor_name(processor_name, &name_len);

    printf("Hello from node %s, rank %d of %d\n",
           processor_name, world_rank, world_size);

    // Clean up the MPI environment
    MPI_Finalize();
    return 0;
}

Compile the program using the Intel MPI compiler wrapper:

mpicc -o mpi_hello mpi_hello.c
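
Before involving Slurm at all, you can optionally sanity-check the binary by launching a couple of processes locally on the head node with Intel MPI's mpirun (this exercises MPI itself, not the Slurm integration):

mpirun -n 2 ./mpi_hello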

2. Create the Slurm Submission Script

The Slurm script is where the magic happens. Two lines are critical for integrating Intel MPI with Slurm on older ParallelCluster setups:

  1. source /opt/intel/oneapi/setvars.sh: This loads the necessary Intel MPI environment variables.
  2. export I_MPI_PMI_LIBRARY=/opt/slurm/lib/libpmi.so: This explicitly tells Intel MPI where to find Slurm's process management (PMI) library, which srun needs in order to wire up the MPI ranks. You can verify the path with the quick check below.
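
The PMI library path can vary between ParallelCluster and Slurm versions (some installs also ship libpmi2.so), so it's worth confirming the file actually exists on your cluster before relying on it:

ls /opt/slurm/lib/libpmi*

If nothing is listed, locate your Slurm installation's lib directory and adjust the export in the script accordingly.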

Here is the complete script. It requests 4 MPI tasks on each of 2 nodes, for a total of 8 processes. Create a file named mpi_test.sh (for example with nano mpi_test.sh) and paste the following:

#!/bin/bash
#SBATCH --job-name=MPI_Test
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --output=mpi_test.log

# Load the Intel oneAPI environment for the job
source /opt/intel/oneapi/setvars.sh

# Set the PMI library path for Slurm-MPI integration
export I_MPI_PMI_LIBRARY=/opt/slurm/lib/libpmi.so

# Launch the MPI executable using srun
srun ./mpi_hello
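
If srun runs into PMI-related errors on your particular ParallelCluster/Slurm combination, one thing worth trying is launching with Intel MPI's own mpirun instead, which uses its Hydra process manager to detect the Slurm allocation and start the ranks across the allocated nodes:

# Alternative to the srun line above
mpirun -n 8 ./mpi_hello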

3. Submit and Check

Submit the job and, once it has finished, view the output log:

sbatch mpi_test.sh
cat mpi_test.log
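
The first submission can take a few minutes if ParallelCluster needs to launch the compute nodes. While you wait, you can watch the job's state with standard Slurm commands (replace <jobid> with the ID printed by sbatch):

squeue -u $USER           # PD = pending (e.g. nodes still launching), R = running
scontrol show job <jobid> # detailed state for a specific job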

A successful run will show output from all 8 processes, each reporting a unique rank from 0 to 7.

Expected Output:
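
Something along these lines, where the hostnames are placeholders for your cluster's actual compute node names and the line order may differ from run to run:

Hello from node compute-node-01, rank 0 of 8
Hello from node compute-node-01, rank 1 of 8
Hello from node compute-node-01, rank 2 of 8
Hello from node compute-node-01, rank 3 of 8
Hello from node compute-node-02, rank 4 of 8
Hello from node compute-node-02, rank 5 of 8
Hello from node compute-node-02, rank 6 of 8
Hello from node compute-node-02, rank 7 of 8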

If you see this, your Slurm and Intel MPI setup is working perfectly. ✅