We have made a simple interactive tutorial to demonstrate the different types of array batch jobs available.

To start the tutorial, log in to the submit node and run cp-demo.sh to copy the sample files, as shown below.

Slurm batch tutorial
$ ssh axon.rc.zi.columbia.edu
[aa3301@axon ~]$ cp-demo.sh
Copying Slurm Tutorial samples to slurm-tutorial1 in your home directory
[aa3301@axon ~]$ cd slurm-tutorial1/
[aa3301@axon slurm-tutorial1]$ ls -l
total 10
-rwxr-xr-x 1 aa3301 domain users 824 Sep 12 13:19 array_job.sh
-rwxr-xr-x 1 aa3301 domain users 507 Sep 12 13:19 bad_array_job.sh
-rw-r--r-- 1 aa3301 domain users 464 Sep 12 13:19 hello-world.sh
-rwxr-xr-x 1 aa3301 domain users  76 Sep 12 13:19 jobstep.slurm
-rw-r--r-- 1 aa3301 domain users 233 Sep 12 13:19 my-jobste-array.slurm

Let's take a look at some of the job files in the directory starting with array_job.sh.

array_job.sh
#!/bin/bash
#SBATCH --job-name=array_job_test   # Job name
#SBATCH --mail-type=FAIL            # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=myemail         # Where to send mail (e.g. uni123@columbia.edu)
#SBATCH --ntasks=1                  # Run a single task
#SBATCH --mem=1gb                   # Job Memory
#SBATCH --time=00:05:00             # Time limit hrs:min:sec
#SBATCH --output=array_%A-%a.log    # Standard output and error log
#SBATCH --array=1-5                 # Array range

echo "There are $(env | grep -c SLURM) slurm environmental variables set."
env | grep SLURM | sort

if [ -z "$SLURM_JOB_ID" ]
then
    echo "You're not running in a slurm job."
    exit
else
    FILE=/usr/share/dict/american-english
    WORD=$(sed -n "${SLURM_JOB_ID}p" "$FILE")
    echo
    echo "$WORD is word number $SLURM_JOB_ID in $FILE"
fi

This script shows an example of how the SLURM environmental variables change between the jobs in a single array, and how they can be leveraged so that each job does different work.

Another item of note is that we have made this script executable (e.g. chmod +x array_job.sh), so if we run it directly from the shell it functions as a normal shell script, ignoring the Slurm directives since the shell treats the #SBATCH lines as comments.

Running normally.
[aa3301@axon slurm-tutorial1]$ ./array_job.sh
There are 0 slurm environmental variables set.
You're not running in a slurm job.

In this case the #SBATCH directives are ignored and no SLURM environmental variables are detected, so the script can notice this and behave differently in this environment.
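
A related pattern, sketched below, is to give the Slurm variables a sensible default so the same script can also be tested interactively before it is submitted (the fallback value of 1 is only an illustration, not something the tutorial files do):

Local testing sketch
#!/bin/bash
# Fall back to task 1 when no Slurm variables are set, so the script
# can be exercised from an interactive shell before being submitted.
TASK_ID=${SLURM_ARRAY_TASK_ID:-1}
echo "Working on task $TASK_ID"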

Now we can try running array_job.sh through Slurm and see how it behaves.

Running as a submission script
[aa3301@axon slurm-tutorial1]$ sbatch array_job.sh
Submitted batch job 2759
[aa3301@axon slurm-tutorial1]$ cat array_2759-1.log
There are 40 slurm environmental variables set.
SLURM_ARRAY_JOB_ID=2759
SLURM_ARRAY_TASK_COUNT=5
SLURM_ARRAY_TASK_ID=1
SLURM_ARRAY_TASK_MAX=5
SLURM_ARRAY_TASK_MIN=1
SLURM_ARRAY_TASK_STEP=1
SLURM_CHECKPOINT_IMAGE_DIR=/var/slurm/checkpoint
SLURM_CLUSTER_NAME=axon
SLURM_CPUS_ON_NODE=1
SLURMD_NODENAME=ax04
SLURM_GTIDS=0
SLURM_JOB_ACCOUNT=zrc
SLURM_JOB_CPUS_PER_NODE=1
SLURM_JOB_GID=413600513
SLURM_JOB_ID=2760
SLURM_JOBID=2760
SLURM_JOB_NAME=array_job_test
SLURM_JOB_NODELIST=ax04
SLURM_JOB_NUM_NODES=1
SLURM_JOB_PARTITION=burst
SLURM_JOB_QOS=normal
SLURM_JOB_UID=413601236
SLURM_JOB_USER=aa3301
SLURM_LOCALID=0
SLURM_MEM_PER_NODE=1024
SLURM_NNODES=1
SLURM_NODE_ALIASES=(null)
SLURM_NODEID=0
SLURM_NODELIST=ax04
SLURM_NPROCS=1
SLURM_NTASKS=1
SLURM_PRIO_PROCESS=0
SLURM_PROCID=0
SLURM_SUBMIT_DIR=/axsys/home/aa3301/slurm-tutorial1
SLURM_SUBMIT_HOST=axon.rc.zi.columbia.edu
SLURM_TASK_PID=29726
SLURM_TASKS_PER_NODE=1
SLURM_TOPOLOGY_ADDR=ax04
SLURM_TOPOLOGY_ADDR_PATTERN=node
SLURM_WORKING_CLUSTER=axon:slurm:6817:8448

Achorn is word number 2760 in /usr/share/dict/words

Running the script through sbatch sets the SLURM environmental variables, which provide a good deal of information about the running job, including the directory it was submitted from and other characteristics of the environment. We put in some conditional logic that uses the SLURM_JOB_ID variable to read a specific line from a file. Typically it makes more sense to use one of the task IDs, such as SLURM_ARRAY_TASK_ID, but this gives us a little variety.
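
For example, a common pattern is to keep a plain text manifest with one input per line and let each task read its own line. Here is a minimal sketch of that idea (the file name inputs.txt and the array range are illustrative, not part of the tutorial files):

Per-task input sketch
#!/bin/bash
#SBATCH --job-name=per_task_input
#SBATCH --output=per_task_%A-%a.log   # one log per array task
#SBATCH --array=1-5

# Read line number SLURM_ARRAY_TASK_ID from the manifest so that each
# array task processes a different input.
INPUT=$(sed -n "${SLURM_ARRAY_TASK_ID}p" inputs.txt)
echo "Task $SLURM_ARRAY_TASK_ID will process $INPUT"   # replace the echo with the real command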

As opposed to the hello-world.sh script from before, we are now modifying the log file names via an SBATCH directive: #SBATCH --output=array_%A-%a.log (%A is the array's master job ID and %a is the array task index). Let's compare the output of two of the logs.

Comparing two tasks in the array
[aa3301@axon slurm-tutorial1]$ sdiff array_2759-1.log array_2759-2.log -s
SLURM_ARRAY_TASK_ID=1                                         | SLURM_ARRAY_TASK_ID=2
SLURM_JOB_ID=2760                                             | SLURM_JOB_ID=2761
SLURM_JOBID=2760                                              | SLURM_JOBID=2761
SLURM_TASK_PID=29726                                          | SLURM_TASK_PID=29727
Achorn is word number 2760 in /usr/share/dict/words           | Achras is word number 2761 in /usr/share/dict/words

Comparing the output you can see the differences in the SLURM environments between the two tasks. While 2759 was the job number assigned to the array at submission, the individual tasks ran as successive job IDs (2760, 2761, and so on). Again, SLURM_ARRAY_TASK_ID is probably the best variable to work with, but feel free to use any that you see fit.

The next example job, bad_array_job.sh, is a simple script that shows how Slurm interprets program crashes and job failures.

bad_array_job.sh
#!/bin/bash
#SBATCH --job-name=bad_array_test   # Job name
#SBATCH --output=bad_array_%A-%a.log    # Standard output and error log
#SBATCH --array=1-3                 # Array range

echo "This isn't going to turn out good!"
echo SLURM_ARRAY_TASK_ID=$SLURM_ARRAY_TASK_ID
let number=$SLURM_ARRAY_TASK_ID-2
let result=1000/$number
echo $result

In this job we subtract 2 from the SLURM_ARRAY_TASK_ID variable and then divide 1000 by the result, so for task 2 the divisor is 0 and the division fails. Let's run the job and cat the logs.

Crashes in the array
[aa3301@axon slurm-tutorial1]$ sbatch bad_array_job.sh
Submitted batch job 2764
[aa3301@axon slurm-tutorial1]$ cat bad_array_*.log
This isn't going to turn out good!
SLURM_ARRAY_TASK_ID=1
-1000
This isn't going to turn out good!
SLURM_ARRAY_TASK_ID=2
/var/spool/slurmd/job02766/slurm_script: line 12: let: result=1000/0: division by 0 (error token is "0")

This isn't going to turn out good!
SLURM_ARRAY_TASK_ID=3
1000

So all 3 tasks ran even though the middle task crashed. Let's take a look at how Slurm thought the job did.

Sacct info on our bad array
[aa3301@axon slurm-tutorial1]$ sacct -j 2764
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
2764_3       bad_array+      burst        zrc          1  COMPLETED      0:0
2764_3.batch      batch                   zrc          1  COMPLETED      0:0
2764_1       bad_array+      burst        zrc          1  COMPLETED      0:0
2764_1.batch      batch                   zrc          1  COMPLETED      0:0
2764_2       bad_array+      burst        zrc          1  COMPLETED      0:0
2764_2.batch      batch                   zrc          1  COMPLETED      0:0

As you can see in the output above from Slurm's sacct command, the job and all of its tasks registered as COMPLETED. Two other items to note here: the tasks are not listed in the order they ran, and the JobID column uses the array notation (2764_1, 2764_2, ...) rather than the separate job IDs the tasks actually ran under.

Slurm decides whether a job failed largely from the exit status of the batch script. In this case the division-by-zero in task 2 didn't register as a failure because the script kept running and its final echo succeeded, so the script itself exited with status 0 even though one of its commands failed.
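
If you want an error like this to show up in sacct, one option, sketched below rather than taken from the tutorial files, is to check for the bad case yourself and exit non-zero, since a non-zero exit status from the batch script is what makes Slurm mark a task as FAILED:

Strict version of the bad array job (sketch)
#!/bin/bash
#SBATCH --job-name=strict_array_test
#SBATCH --output=strict_array_%A-%a.log   # Standard output and error log
#SBATCH --array=1-3                       # Array range

number=$(( SLURM_ARRAY_TASK_ID - 2 ))
if [ "$number" -eq 0 ]
then
    echo "Refusing to divide by zero" >&2
    exit 1    # non-zero exit status => sacct reports this task as FAILED
fi
echo $(( 1000 / number ))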

Conversely, if we append a nonexistent command to the bottom of that script, Slurm can no longer run all of the commands in the submission script successfully and we get a different result.

Slurm failed job
[aa3301@axon slurm-tutorial1]$ echo bad-command >> bad_array_job.sh
[aa3301@axon slurm-tutorial1]$ sbatch bad_array_job.sh
Submitted batch job 2770
[aa3301@axon slurm-tutorial1]$ cat bad_array_2770-*.log
This isn't going to turn out good!
SLURM_ARRAY_TASK_ID=1
-1000
/var/spool/slurmd/job02771/slurm_script: line 14: bad-command: command not found
This isn't going to turn out good!
SLURM_ARRAY_TASK_ID=2
/var/spool/slurmd/job02772/slurm_script: line 12: let: result=1000/0: division by 0 (error token is "0")

/var/spool/slurmd/job02772/slurm_script: line 14: bad-command: command not found
This isn't going to turn out good!
SLURM_ARRAY_TASK_ID=3
1000
/var/spool/slurmd/job02770/slurm_script: line 14: bad-command: command not found
[aa3301@axon slurm-tutorial1]$ sacct -j 2770
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
2770_3       bad_array+      burst        zrc          1     FAILED    127:0
2770_3.batch      batch                   zrc          1     FAILED    127:0
2770_1       bad_array+      burst        zrc          1     FAILED    127:0
2770_1.batch      batch                   zrc          1     FAILED    127:0
2770_2       bad_array+      burst        zrc          1     FAILED    127:0
2770_2.batch      batch                   zrc          1     FAILED    127:0

Now the job shows as FAILED, with an accompanying exit code (127, "command not found"). Still, in this case the subsequent tasks continued executing even though the first one failed.

Additional resources

There is another sample array job in the tutorial directory consisting of two files, my-jobste-array.slurm and jobstep.slurm, which is invoked by running 'sbatch my-jobste-array.slurm' from the command line. This job uses a bash loop to launch its tasks, which has the interesting effect of directing the output and the success or failure of every step to a single file, which can be helpful in some circumstances.
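
The actual files are short and worth reading directly; as a rough sketch of the loop-driven pattern (the range and arguments here are illustrative, not the real contents of my-jobste-array.slurm), the wrapper amounts to something like this:

Loop-driven job sketch
#!/bin/bash
#SBATCH --job-name=jobstep_array_sketch
#SBATCH --output=jobstep-array.log   # every step writes to this single log
#SBATCH --ntasks=1

# Launch a series of job steps from one allocation so that their output
# and exit statuses all land in the same log file.
for i in $(seq 1 5)
do
    srun ./jobstep.slurm "$i"
    echo "step $i finished with exit status $?"
done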

There is a good example of how to clump up lots of tiny tasks into groups for more effective scheduling in this link from the University of Florida. While the goal of arrays is to run many jobs in parallel, jobs that are too small end up limited by the overhead of the scheduler and of starting new sessions, so it can be beneficial to break the work up into clumps.
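
As a quick illustration of the idea (a sketch only; the clump size and item numbers are made up), each array task can work through a block of items instead of handling just one:

Clumped array sketch
#!/bin/bash
#SBATCH --job-name=clumped_array
#SBATCH --output=clump_%A-%a.log
#SBATCH --array=0-9                 # 10 tasks instead of 1000 tiny jobs

# Each task processes a clump of 100 consecutive items.
PER_TASK=100
START=$(( SLURM_ARRAY_TASK_ID * PER_TASK + 1 ))
END=$(( START + PER_TASK ))

for (( i = START; i < END; i++ ))
do
    echo "processing item $i"       # replace with the real per-item command
done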

This tutorial didn't cover job dependencies; in all of these examples the array tasks ran regardless of whether the earlier ones succeeded. Dependencies can be a complicated topic which we may cover more in the future. In the interim there is some information near the bottom of the Slurm Array page as well as this page from the NIH.

