...
This guide provides an introduction to the SLURM job scheduler and its application on the Ganesha c2b2 cluster.
Introduction
Jobs are executed in batch mode, without user intervention. The typical process involves
...
Essential SLURM commands
Creating a job submission script
Understanding SLURM partitions
Submitting jobs to the queue
Monitoring job progress
Canceling jobs from the queue
Setting environment variables
Managing job dependencies and job arrays
Commands
The following table summarizes the most commonly used SLURM commands:
...
SLURM commands are documented in detail in their manual (man) pages, which can be accessed by typing, for example, man sbatch.
Submission script
A submission script is a shell script that outlines the computing tasks to be performed, including the application, input/output, and resource requirements (e.g., CPUs, memory). A basic example is a job that needs a single node with the following specifications:
Uses 1 node
Requests 8 CPUs
Has a maximum runtime of 1 hour 30 minutes
Is named "test"
Sends email notifications to the user when the job starts, stops, or aborts
Example 1: job running on a single node
```bash
#!/bin/bash
# MyHelloBatch.slurm
#
#SBATCH -J test                             # Job name, any string
#SBATCH -o job.%j.out                       # Name of stdout output file (%j = job ID)
#SBATCH -N 1                                # Total number of nodes requested
#SBATCH -n 8                                # Total number of tasks requested (one CPU each by default)
#SBATCH -t 01:30:00                         # Run time (hh:mm:ss) - 1.5 hours
#SBATCH --mail-user=UNI@cumc.columbia.edu   # Use only a Columbia address
#SBATCH --mail-type=ALL                     # Send email alert on all events

module load anaconda/3.0                    # Load the module(s) your program needs
python hello.py                             # Your program
```
...
The script's final section is a standard Linux bash script that describes what the job does. By default, the job starts in the submission directory with the same environment variables as the submitting user. In this example, the script simply runs python hello.py.
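The default behavior described above can be checked with a small test job. The following is a minimal sketch, not a cluster-specific recipe: the file name show_env.slurm and the job name are hypothetical, while SLURM_JOB_ID and SLURM_SUBMIT_DIR are standard Slurm environment variables. The script is written to disk first and submitted only if sbatch is available on the current machine.

```shell
#!/bin/bash
# Write a small job script (hypothetical name: show_env.slurm).
# The quoted 'EOF' keeps variables unexpanded until the job actually runs.
cat > show_env.slurm <<'EOF'
#!/bin/bash
#SBATCH -J show_env             # Job name (hypothetical)
#SBATCH -o show_env.%j.out      # stdout file (%j = job ID)
#SBATCH -N 1                    # One node
#SBATCH -n 1                    # One task
#SBATCH -t 00:05:00             # Five minutes is plenty for this check

echo "Job ID:            ${SLURM_JOB_ID}"
echo "Submitted from:    ${SLURM_SUBMIT_DIR}"
echo "Working directory: $(pwd)"
EOF

# Submit and list your queued jobs only where Slurm is installed.
if command -v sbatch >/dev/null 2>&1; then
  sbatch show_env.slurm
  squeue -u "$USER"
else
  echo "sbatch not found; wrote show_env.slurm without submitting"
fi
```

If the job runs with default settings, the last two lines of its output file should show the same directory.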
Example 2: job running on multiple nodes
To execute an MPI application across multiple nodes, we need to modify the submission script to request additional resources and specify the MPI execution command:
...
The multi-node script is similar to the single-node one. The key addition is #SBATCH --ntasks-per-node=m, which reserves m cores on each node so that the MPI processes can run in parallel.
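As an illustration of the multi-node layout, here is a minimal sketch that requests 2 nodes with 4 tasks on each (m = 4). The module name openmpi, the executable ./mpi_hello, and the file name mpi_hello.slurm are hypothetical placeholders, and the use of srun as the MPI launcher is an assumption; some sites use mpirun instead, so check the cluster's own guidance.

```shell
#!/bin/bash
# Write a multi-node MPI job script (hypothetical name: mpi_hello.slurm).
cat > mpi_hello.slurm <<'EOF'
#!/bin/bash
#SBATCH -J mpi_hello            # Job name (hypothetical)
#SBATCH -o mpi_hello.%j.out     # stdout file (%j = job ID)
#SBATCH -N 2                    # Number of nodes
#SBATCH --ntasks-per-node=4     # Tasks (MPI ranks) per node, the "m" in the text
#SBATCH -t 00:30:00             # Run time (hh:mm:ss)

module load openmpi             # Hypothetical module name; adjust to what is installed
srun ./mpi_hello                # Hypothetical executable; starts one process per task
EOF

# Submit only where Slurm is installed.
if command -v sbatch >/dev/null 2>&1; then
  sbatch mpi_hello.slurm
else
  echo "sbatch not found; wrote mpi_hello.slurm without submitting"
fi
```

With these values the job runs 2 x 4 = 8 MPI processes in total.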
Interactive Jobs
An interactive job provides a command-line interface, so you can interact with the application or debug issues in real time instead of only running a script. To submit an interactive job, use the Slurm salloc command. Once the job begins, you get a command-line prompt on one of the assigned compute nodes, where you can execute commands and use the allocated resources directly.
...
By default, jobs submitted through salloc will be allocated 1 CPU and 4GB of memory, unless specified otherwise. If your job requires more resources, you can request them using additional options with the salloc command. For instance, the following example demonstrates how to allocate 2 nodes, each with 4 CPUs and 4GB of memory.
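A request of that shape could look like the sketch below. It assumes that --mem is interpreted per node (the standard Slurm meaning) and that each task uses one CPU, which is the Slurm default; the options are collected in a bash array only so that each one can carry a comment. The command runs only if salloc is available on the current machine.

```shell
#!/bin/bash
# Options for an interactive allocation: 2 nodes, 4 CPUs and 4 GB on each.
alloc_args=(
  --nodes=2             # Two compute nodes
  --ntasks-per-node=4   # Four tasks per node (one CPU each by default)
  --mem=4G              # Memory per node
)

# Show the full command, then run it only where Slurm is installed.
echo "salloc ${alloc_args[*]}"
if command -v salloc >/dev/null 2>&1; then
  salloc "${alloc_args[@]}"
fi
```

On a login node this is equivalent to typing salloc --nodes=2 --ntasks-per-node=4 --mem=4G directly.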
...
GPU Jobs
GPUs will not be assigned to jobs unless explicitly requested using specific options with sbatch or srun during the resource allocation process.
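One common way to make such a request is the generic resource option --gres=gpu:N. The sketch below asks for a single GPU in a batch job; the file name gpu_check.slurm and the job name are hypothetical, the use of nvidia-smi assumes NVIDIA GPUs on the compute nodes, and the cluster may additionally require a GPU-specific partition that is not shown here.

```shell
#!/bin/bash
# Write a small GPU test job (hypothetical name: gpu_check.slurm).
cat > gpu_check.slurm <<'EOF'
#!/bin/bash
#SBATCH -J gpu_check            # Job name (hypothetical)
#SBATCH -o gpu_check.%j.out     # stdout file (%j = job ID)
#SBATCH -N 1                    # One node
#SBATCH -n 1                    # One task
#SBATCH -t 00:10:00             # Run time (hh:mm:ss)
#SBATCH --gres=gpu:1            # Request one GPU; without this no GPU is assigned

nvidia-smi                      # Assumes NVIDIA GPUs; lists the GPU(s) visible to the job
EOF

# Submit only where Slurm is installed.
if command -v sbatch >/dev/null 2>&1; then
  sbatch gpu_check.slurm
else
  echo "sbatch not found; wrote gpu_check.slurm without submitting"
fi
```

The same option works with srun, for example srun --gres=gpu:1 --pty bash for an interactive shell with one GPU.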
...