...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
Processes and Jobs (sbatch, srun, and more)
...
After logging in, you land on a login (aka "head") node, from which users normally launch their jobs. The login node places some restrictions on the processes that can be run on it. To allow for special projects and activities, these restrictions are currently quite lenient, so we rely on our users to keep the processes they launch on the head node to a strict minimum.
Warning |
---|
When a user ignores that recommendation and runs processes that are compute-intensive, more than momentary, and especially multi-core, the login node becomes overloaded and prevents other users from doing their regular work. In such cases, we typically terminate the processes and ask the user to run them within jobs instead. Please be aware that the cluster is a shared resource and help us keep computing activity on the head node to a minimum. |
If you need to run a CPU-intensive process, please start an interactive job as described below:
...
Cores can be requested using either -c or --cpus-per-task (in this and the following examples, "or" indicates an exactly equivalent alternative syntax).
Code Block |
---|
#SBATCH -c 1 or |
...
Code Block |
---|
#SBATCH --cpus-per-task=1 |
When using less than a full node, it is important to also specify your memory requirement, so that the scheduler can ensure enough memory is available on the node where your job runs.
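For example, a partial-node request might combine a core count with a per-core memory request; the values below are illustrative, not recommendations:
Code Block |
---|
#SBATCH -c 4
#SBATCH --mem-per-cpu=5gb
|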
...
To specify the number of nodes, use -N or --nodes.
Code Block |
---|
#SBATCH -N 1 or |
...
Code Block |
---|
#SBATCH --nodes=1 |
A large memory (1024 GB RAM) node can be requested with the mem1024 constraint.
...
Code Block |
---|
#SBATCH -C mem1024 or |
...
Code Block |
---|
#SBATCH --constraint=mem1024 |
Please keep in mind that the above directives only secure the corresponding type of node; they do not ensure that all of its memory is available to the job (even if --exclusive is specified). By default, 5,800 MB is allocated to each core.
Interactive Jobs
Interactive jobs allow user interaction during their execution. They provide the user with a new shell from which applications can be launched.
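As a minimal sketch, an interactive session on a Slurm cluster is typically started with srun --pty; the account name and time limit below are placeholders:
Code Block |
---|
$ srun --pty -t 0-01:00 -A <ACCOUNT> /bin/bash
|
When the job starts, you are placed in a shell on a compute node; exiting that shell ends the job.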
...
The following table lists some common directives used in Slurm submit scripts. Each should be preceded by #SBATCH when used in a submit script. Many directives have a short alternate name and these are listed where available. The examples sometimes use the long version of a given directive and sometimes the short version; in either case no preference is implied.
Directive | Short Version | Description | Example | Notes |
---|---|---|---|---|
--account=<account> | -A <account> | Account. | #SBATCH --account=stats | |
--job-name=<job name> | -J <job name> | Job name. | #SBATCH -J DiscoProject | |
--time=<time> | -t <time> | Time required. | #SBATCH --time=10:00:00 | The maximum time allowed is five days. |
--mem=<memory> | | Memory required per node. | #SBATCH --mem=16gb | |
--mem-per-cpu=<memory> | | Memory per CPU. | #SBATCH --mem-per-cpu=5G | |
--constraint=mem1024 | -C mem1024 | Request a large (1024 GB RAM) node. | #SBATCH -C mem1024 | |
--cpus-per-task=<cpus> | -c <cpus> | CPU cores per task. | #SBATCH -c 1 | Nodes have a maximum of 24 cores. |
--nodes=<nodes> | -N <nodes> | Nodes required for the job. | #SBATCH -N 4 | |
--array=<indexes> | -a <indexes> | Submit a job array. | #SBATCH -a 1-4 | See below for discussion of job arrays. |
--mail-type=<ALL,BEGIN,END,FAIL,NONE> | | Send email job notifications. | #SBATCH --mail-type=ALL | |
--mail-user=<email_address> | | Email address for notifications. | #SBATCH --mail-user=me@email.com | |
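Putting several of these directives together, a minimal submit script might look like the following sketch (the account name, job name, and command are placeholders):
Code Block |
---|
#!/bin/sh
#SBATCH --account=stats           # Replace with your group account
#SBATCH -J DiscoProject           # Job name
#SBATCH -t 10:00:00               # Walltime (10 hours)
#SBATCH -c 1                      # One CPU core
#SBATCH --mem-per-cpu=5gb         # Memory per core
#SBATCH --mail-type=ALL
#SBATCH --mail-user=me@email.com

echo "Hello World"
|
The script is then submitted with sbatch, e.g. sbatch helloworld.sh.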
Walltime
The walltime is specified with the "-t" flag. For example:
Code Block |
---|
#SBATCH -t 10:00:00 |
This walltime format translates to 10 hours (00 minutes and 00 seconds).
If you want to request just one hour of walltime, specify 1:00:00.
Acceptable time formats in the Slurm scheduler are: "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes", and "days-hours:minutes:seconds".
The maximum time allowed is five days.
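For instance, a walltime of two days and twelve hours can be written in the "days-hours" format:
Code Block |
---|
#SBATCH -t 2-12
|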
Memory Requests
There are two ways to ask for memory, and they are mutually exclusive. You can ask for either:
1) memory per cpu
or
2) memory per node
If you do not specify the memory requirement, by default you get 5,800 MB per CPU.
At this writing (February 2024), Insomnia has 25 Standard Nodes with 512 GB of memory (approx 510 GB usable).
Insomnia's 9 high memory nodes have 1 TB of memory each. They are otherwise identical to the standard nodes.
For example:
Code Block |
---|
--mem-per-cpu=5gb |
This sets the minimum memory required per allocated CPU. If you request 32 cores (one node), you will get 160 GB of memory (32 × 5 GB), on both standard and high-memory nodes.
If you specify the real memory required per node:
Code Block |
---|
--mem=160gb |
You will get the same amount of memory.
However, if you specify:
Code Block |
---|
#SBATCH --exclusive |
...
#SBATCH -C mem1024 |
...
#SBATCH --mem=900gb |
You will get 900 GB on a high memory node.
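Combined in a single submit script, these high-memory directives would appear together, for example:
Code Block |
---|
#SBATCH --exclusive
#SBATCH -C mem1024
#SBATCH --mem=900gb
|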
Job Arrays
Multiple copies of a job can be submitted by using a job array. The --array option can be used to specify the job indexes Slurm should apply.
...
Code Block |
---|
$ sbatch -a 1-5 helloworld.sh
Submitted batch job 629249
|
In this example the job IDs will be the number 629249 followed by _1, _2, etc., so the first job in the array can be accessed using the job ID 629249_1 and the last using 629249_5.
Code Block |
---|
$ scontrol show job 629249_1
|
Note: There is a limit of 1,001 elements per job array. If you try to submit more than 1,001 elements, the scheduler issues the following error:
"Batch job submission failed: Invalid job array specification".
The environment variable $SLURM_ARRAY_TASK_ID indicates the index of the array element (i.e. job) in the job array, and is accessible from within that job.
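As an illustrative sketch, a submit script can use this variable to give each array element its own work, for example selecting a (hypothetical) input file by index:
Code Block |
---|
#!/bin/sh
#SBATCH -J ArrayExample
#SBATCH -t 1:00:00
#SBATCH -c 1

# Each array element processes its own input file, e.g. input_1.txt
# for task 1, input_2.txt for task 2, and so on (file names are hypothetical).
echo "Processing input_${SLURM_ARRAY_TASK_ID}.txt"
|
Submitted with sbatch -a 1-5, this would produce five jobs, each seeing a different value of $SLURM_ARRAY_TASK_ID.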
Job scheduling basics
Note |
---|
The walltime limit on the cluster is 5 days (120 hours). Five-day jobs may only be run on nodes that your group owns. For all other nodes, the time limit is 12 hours. |
...
Note |
---|
Note: There is a limit of 500 jobs running per user. Any jobs that exceed this limit will remain in the queue until (some of) the user's other running jobs complete. Additionally, there is a limit of 5,000 submitted jobs per user at any one time. If you try to submit more than 5,000 jobs simultaneously, the scheduler displays the following error:
|
...