This guide provides an introduction to the SLURM job scheduler and its application on the C2B2 clusters. The primary, general-purpose compute cluster at C2B2 is now called "Ganesha." This HPC cluster is a Linux-based (Rocky 9.4) compute cluster consisting of 62 Dell servers, 2 head nodes, a virtualized pool of login (submit) nodes, and 8 Weka storage nodes. The nodes fit in a dense configuration in 9 high-density racks and are cooled by dedicated rack refrigeration systems.

The cluster comprises:

  • 20 compute nodes, each with 192-core processors and 768 GB of memory

  • 2 nodes with 192 cores and 1.5 TB of memory

  • 1 GPU node featuring 2 NVIDIA L40S GPU cards, 192-core processors, and 768 GB of memory

  • 1 GPU node with an NVIDIA GH200 Grace Hopper Superchip (ARM architecture), 1 GPU, and 570 GB of memory

This guide will help you get started with using SLURM on these clusters.

Introduction

Jobs are executed in batch mode, without user intervention. The typical process involves the following steps (see the example session after this list):

  • Logging in to a login node

  • Preparing a job script that specifies the work to be done and required computing resources

  • Submitting the job to the queue

  • Optionally logging out while the job runs

  • Returning to collect the output data once the job is complete
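
As a sketch, a typical session might look like the commands below; the script name matches Example 1 later in this guide, and the job ID is assigned by SLURM at submission time.

Code Block
# Submit the job script; sbatch prints the assigned job ID
sbatch MyHelloBatch.slurm

# Check the job's state in the queue (R = running, PD = pending)
squeue -u $USER

# When the job has finished, inspect the output file named in the script
cat job.<jobID>.out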

This guide covers the following topics:

  • Essential SLURM commands

  • Creating a job submission script

  • Understanding SLURM partitions

  • Submitting jobs to the queue

  • Monitoring job progress

  • Canceling jobs from the queue

  • Setting environment variables

  • Managing job dependencies and job arrays

Commands

The following table summarizes the most commonly used SLURM commands:

sbatch: Submits a job script for execution.

sinfo: Displays the status of SLURM-managed partitions and nodes, with customizable filtering, sorting, and formatting options.

squeue: Shows the status of jobs, with options for filtering, sorting, and formatting, defaulting to priority order for running and pending jobs.

srun: Runs a parallel job on the cluster.

scancel: Cancels a pending or running job.

sacct: Provides accounting information for active or completed jobs.

salloc: Allocates resources and starts an interactive job, allowing users to execute tasks in real time with manual input.

SLURM commands offer detailed documentation and guidance through their manual (man) pages, which can be accessed by typing, for example:

man sinfo
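
As an illustration, the commands below show typical day-to-day usage; the job ID 12345 and the script name are placeholders.

Code Block
sinfo                          # list partitions and the state of their nodes
squeue -u $USER                # show only your own jobs
sbatch MyHelloBatch.slurm      # submit a batch script; prints the assigned job ID
sacct -j 12345                 # accounting summary for job 12345
scancel 12345                  # cancel job 12345
salloc -N 1 -n 4 -t 01:00:00   # request an interactive allocation (1 node, 4 tasks, 1 hour)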

Submission script

A submission script is a shell script that outlines the computing tasks to be performed, including the application, input/output, and resource requirements (e.g., CPUs, memory). A basic example is a job that needs a single node with the following specifications:

  • Uses 1 node

  • Runs a single-process application

  • Has a maximum runtime of 100 hours

  • Is named "MyHelloBatch"

  • Sends email notifications to the user when the job starts, stops, or aborts

Example 1: job running on single node

Code Block
#!/bin/bash
#MyHelloBatch.slurm
#
#SBATCH -J test                           # Job name, any string
#SBATCH -o job.%j.out                     # Name of stdout output file (%j=jobId)
#SBATCH -N 1                              # Total number of nodes requested
#SBATCH -n 8                              # Total number of CPUs requested
#SBATCH -t 01:30:00                       # Run time (hh:mm:ss) - 1.5 hours
#SBATCH --mail-user=UNI@cumc.columbia.edu # use only Columbia address
#SBATCH --mail-type=ALL                   # send email alert on all events
 
module load anaconda/3.0                  # load the appropriate module(s) needed by
python hello.py                           # your program

A submission script begins with #!/bin/bash, indicating it's a Linux bash script. Comments start with #, while #SBATCH lines specify job scheduling resources for SLURM. Note that #SBATCH directives must be placed at the top of the script, before any other commands. The script requests resources, such as:

#SBATCH -N n or #SBATCH --nodes=n: specifies the number of compute nodes (only 1 in this case)

#SBATCH -t T or #SBATCH --time=T: sets the maximum walltime (hh:mm:ss format)

#SBATCH -J "name" or #SBATCH --job-name="name": assigns a job name

#SBATCH --mail-user=<email_address>: sends email notifications

#SBATCH --mail-type=<type>: sets notification options (BEGIN, END, FAIL, REQUEUE, or ALL)

The script's final section is a standard Linux bash script, outlining job operations. By default, the job starts in the submission folder with the same environment variables as the user. In this example, the script simply runs python hello.py.
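
The short and long option forms are interchangeable. As a sketch, the directives from Example 1 could equally be written in long form (this is a fragment, not a complete script):

Code Block
#SBATCH --job-name=test                   # same as -J test
#SBATCH --output=job.%j.out               # same as -o job.%j.out
#SBATCH --nodes=1                         # same as -N 1
#SBATCH --ntasks=8                        # same as -n 8
#SBATCH --time=01:30:00                   # same as -t 01:30:00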

Example 2: job running on multiple nodes

To execute an MPI application across multiple nodes, we need to modify the submission script to request additional resources and specify the MPI execution command:

Code Block
#!/bin/bash
#MyHelloBatch.slurm
#
#SBATCH -J test                           # Job name, any string
#SBATCH -o job.%j.out                     # Name of stdout output file (%j=jobId)
#SBATCH -N 2                              # Total number of nodes requested
#SBATCH --ntasks-per-node=16              # set the number of tasks (processes) per node
#SBATCH -t 01:30:00                       # Run time (hh:mm:ss) - 1.5 hours
#SBATCH -p highmem                        # Queue name. Specify gpu for the GPU node.
#SBATCH --mail-user=UNI@cumc.columbia.edu # use only Columbia address
#SBATCH --mail-type=ALL                   # send email alert on all events
 
module load openmpi4/4.1.1                # load the appropriate module(s) needed by
mpirun myMPICode                          # your program

The multi-node script is similar to the single-node one, with the key addition of #SBATCH --ntasks-per-node=m to reserve cores on each node and enable MPI parallel processing.

Each node has a 25 Gbps Ethernet connection and 100 Gbps HDR InfiniBand. Additionally, a set of login nodes running on Proxmox virtualization provides a pool of virtual login nodes for user access to this and other systems.
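
For jobs that target the GPU node, the overall script structure is the same. The sketch below makes several assumptions: the partition name gpu follows the comment in Example 2, the GPU request uses the standard SLURM --gres syntax, and the module and program names are placeholders; check sinfo and module avail for the exact names on the cluster.

Code Block
#!/bin/bash
#MyHelloGPU.slurm
#
#SBATCH -J gputest                        # Job name, any string
#SBATCH -o job.%j.out                     # Name of stdout output file (%j=jobId)
#SBATCH -N 1                              # One node
#SBATCH -n 8                              # Total number of CPUs requested
#SBATCH -p gpu                            # GPU partition (name assumed; confirm with sinfo)
#SBATCH --gres=gpu:1                      # Request one GPU (standard SLURM syntax)
#SBATCH -t 01:30:00                       # Run time (hh:mm:ss) - 1.5 hours
#SBATCH --mail-user=UNI@cumc.columbia.edu # use only Columbia address
#SBATCH --mail-type=ALL                   # send email alert on all events

module load cuda                          # module name is a placeholder; run "module avail" to check
python my_gpu_code.py                     # placeholder for your GPU application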

Like our previous clusters, this system is controlled by SLURM. Storage for the cluster is provided exclusively by our Weka parallel filesystem, with over 1 PB of total capacity.

For assistance with cluster-related issues, please email dsbit-help@cumc.columbia.edu, including the following details in your message:

  • Your Columbia University Network ID (UNI)

  • Job ID numbers, if your inquiry pertains to a specific job issue

This information will help ensure a prompt and accurate response to your cluster-related questions.
