This guide provides an introduction to the SLURM job scheduler and its application on the Ganesha c2b2 cluster.

Introduction

Jobs are executed in batch mode, without user intervention. The typical process involves

...

  • Essential SLURM commands

  • Creating a job submission script

  • Understanding SLURM partitions

  • Submitting jobs to the queue

  • Monitoring job progress

  • Canceling jobs from the queue

  • Setting environment variables

  • Managing job dependencies and job arrays

Commands

The following table summarizes the most commonly used SLURM commands:

...

Each SLURM command is documented in its manual (man) page, which can be opened by typing, for example:

man sinfo
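
Before reading the man pages, it can help to confirm that the SLURM client tools are available in the current session. The sketch below is only an illustration: `check_commands` is a hypothetical helper, not part of SLURM, and the command names passed to it (`sinfo`, `squeue`, `sbatch`, `scancel`) are assumed to be the ones of interest.

```shell
#!/bin/bash
# check_commands NAME... : report whether each named command is on the PATH.
# Hypothetical helper for illustration; it is not provided by SLURM.
check_commands() {
    local cmd
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd: found"
        else
            echo "$cmd: not found"
        fi
    done
}

# Assumed list of commonly used SLURM client commands.
check_commands sinfo squeue sbatch scancel
```

Any command reported as "found" should also have a man page that can be opened with `man` followed by the command name.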

Submission script

A submission script is a shell script that outlines the computing tasks to be performed, including the application, input/output, and resource requirements (e.g., CPUs, memory). A basic example is a job that needs a single node with the following specifications:

  • Uses 1 node

  • Requests 8 tasks (CPU cores)

  • Has a maximum runtime of 1.5 hours

  • Is named "test"

  • Sends email notifications to the user when the job starts, stops, or aborts

Example 1: job running on a single node

#!/bin/bash
#MyHelloBatch.slurm
#
#SBATCH -J test                           # Job name, any string
#SBATCH -o job.%j.out                     # Name of stdout output file (%j=jobId)
#SBATCH -N 1                              # Total number of nodes requested
#SBATCH -n 8                              # Total number of tasks (one CPU core each by default)
#SBATCH -t 01:30:00                       # Run time (hh:mm:ss) - 1.5 hours
#SBATCH --mail-user=UNI@cumc.columbia.edu # use only Columbia address
#SBATCH --mail-type=ALL                   # send email alert on all events
 
module load anaconda/3.0                  # load the module(s) needed by your program
python hello.py                           # run the program

...
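
The `-t` directive in Example 1 takes the run time as hh:mm:ss. As a minimal sketch, the helper below converts a number of minutes into that format. The name `to_slurm_time` is hypothetical and the function is not part of SLURM. It assumes a non-negative whole number of minutes written without leading zeros.

```shell
#!/bin/bash
# to_slurm_time MINUTES : print the run time as hh:mm:ss.
# Hypothetical helper for illustration; it is not provided by SLURM.
to_slurm_time() {
    local minutes=$1
    printf '%02d:%02d:00\n' $((minutes / 60)) $((minutes % 60))
}

to_slurm_time 90     # prints 01:30:00, the value used in Example 1
```

The printed value can be pasted after `#SBATCH -t` in the submission script.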

The final section of the script is ordinary bash and lists the commands the job runs. By default, the job starts in the directory from which it was submitted and inherits the environment variables of the submitting user. In this example, the script simply runs python hello.py.
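
To see the default working directory and environment from inside a job, a few lines like the following could be placed at the top of the bash section. This is a sketch, not part of the original example: `job_context` is a hypothetical helper name, while `SLURM_SUBMIT_DIR` and `SLURM_JOB_ID` are variables that SLURM sets inside a job. Outside a job they are unset, so the sketch prints a placeholder instead.

```shell
#!/bin/bash
# job_context : print the working directory and two SLURM-provided variables.
# Hypothetical helper for illustration. SLURM_SUBMIT_DIR and SLURM_JOB_ID are
# set by SLURM inside a job; when they are unset, "unset" is printed instead.
job_context() {
    echo "working_dir=$PWD"
    echo "submit_dir=${SLURM_SUBMIT_DIR:-unset}"
    echo "job_id=${SLURM_JOB_ID:-unset}"
}

job_context
```

Inside a job, `submit_dir` should show the directory where the job was submitted, and the output ends up in the file named by the `-o` directive.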

Example 2: job running on multiple nodes

To execute an MPI application across multiple nodes, we need to modify the submission script to request additional resources and specify the MPI execution command:

...
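
As a rough illustration of the kind of changes described above, the sketch below prints a multi-node submission script to stdout. It is not the cluster's reference script, and everything specific in it is an assumption: the helper name `make_mpi_script`, the job name, the module name `openmpi`, and the executable `./my_mpi_program` are all hypothetical placeholders. The `-N` and `-n` directives are the same ones used in Example 1, and `srun` is SLURM's own task launcher. The recommended launch command on a given cluster may differ.

```shell
#!/bin/bash
# make_mpi_script NODES TASKS : print a hypothetical multi-node MPI
# submission script. Job name, module name, and program name are placeholders.
make_mpi_script() {
    local nodes=$1 tasks=$2
    cat <<EOF
#!/bin/bash
#SBATCH -J mpi_test
#SBATCH -o job.%j.out
#SBATCH -N ${nodes}
#SBATCH -n ${tasks}
#SBATCH -t 01:30:00

module load openmpi        # hypothetical module name
srun ./my_mpi_program      # hypothetical MPI executable
EOF
}

# Example: 2 nodes, 16 tasks in total.
make_mpi_script 2 16
```

Redirecting the output to a file, for example `make_mpi_script 2 16 > mpi_job.slurm`, gives a script that could then be adapted and submitted with `sbatch`.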