Processes and Jobs (sbatch, srun, and more)
In computing, a process is an instance of a computer program that is being executed. It contains the program code and its current activity. A process is normally launched by invoking it by the name of the executable (compiled code) associated with it, either directly at the Unix shell prompt, or within a shell script.
...
Warning |
---|
When a user ignores that recommendation and runs processes that are compute-intensive, longer than momentary, and especially ones requiring multiple cores, the login node becomes overloaded, preventing other users from doing their regular work. In such cases, we typically terminate the processes and inform the user, with a request to run the processes within jobs instead. Please be aware that the cluster is a shared resource, and cooperate with us in limiting computing activity on the head node to a minimum. |
If you need to run a CPU-intensive process, please start an interactive job as described below:
https://columbiauniversity.atlassian.net/wiki/display/rcs/Insomnia+-+Submitting+Jobs#Insomnia-SubmittingJobs-InteractiveJobs
Jobs can request compute resources on a per-core basis or a per-node basis.
...
Code Block |
---|
#SBATCH --nodes=1 |
You may specify a memory requirement when requesting use of a node.
Code Block |
---|
#SBATCH --mem=510G # Standard nodes have approximately 510G of total memory available |
...
To explicitly request a "standard node" with 400 GB RAM, you may specify the mem400 feature.
Code Block |
---|
#SBATCH -C mem400 |
If your job requires more than the standard 512 GB, you may optionally add a constraint to request one of the cluster's high memory nodes, each of which has 1024 GB (i.e. 1 TB) of memory. The feature to request is "mem1024".
Code Block |
---|
#SBATCH -C mem1024 |
or
Code Block |
---|
#SBATCH --constraint=mem1024 |
Please keep in mind that the above directives only secure the corresponding type of node; they do not ensure that all of its memory is available to the job (even if one specifies --exclusive usage). By default, 5,800 MB is allocated to each core.
Interactive Jobs
Interactive jobs allow user interaction during their execution. They deliver to the user a new shell from which applications can be launched.
...
Code Block |
---|
srun --pty -t 0-01:00 -C mem1024 -A <ACCOUNT> /bin/bash |
If a node is available, it will be picked for you automatically, and you will see a command line prompt on a shell running on it. If no nodes are available, your current shell will wait.
...
Directive | Short Version | Description | Example | Notes |
---|---|---|---|---|
--account=<account> | -A <account> | Account. | #SBATCH --account=stats | |
--job-name=<job name> | -J <job name> | Job name. | #SBATCH -J DiscoProject | |
--time=<time> | -t <time> | Time required. | #SBATCH --time=10:00:00 | The maximum time allowed is five days. |
--mem=<memory> | | Memory required per node. | #SBATCH --mem=16gb | |
--mem-per-cpu=<memory> | | Memory per CPU. | #SBATCH --mem-per-cpu=5G | |
--constraint=mem1024 | | Specifying a large (1024 GB RAM) node. | #SBATCH -C mem1024 | |
--cpus-per-task=<cpus> | -c <cpus> | CPU cores per task. | #SBATCH -c 1 | Nodes have a maximum of 24 cores. |
--nodes=<nodes> | -N <nodes> | Nodes required for the job. | #SBATCH -N 4 | |
--array=<indexes> | -a <indexes> | Submit a job array. | #SBATCH -a 1-4 | See below for discussion of job arrays. |
--mail-type=<ALL,BEGIN,END,FAIL,NONE> | | Send email job notifications. | #SBATCH --mail-type=ALL | |
--mail-user=<email_address> | | Email address. | #SBATCH --mail-user=me@email.com | |
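As an illustration of how the directives in the table combine, here is a minimal sketch of a submit script. The account name, job name, and email address are the example values from the table; the file name `example.sh` and the `TASK_CPUS` variable are hypothetical and chosen only for this sketch.

```shell
#!/bin/bash
# Example values taken from the directive table above; replace with your own.
#SBATCH --account=stats
#SBATCH --job-name=DiscoProject
#SBATCH --time=10:00:00
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=5G
#SBATCH --mail-type=END
#SBATCH --mail-user=me@email.com

# Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is given.
# Fall back to 1 so the script also runs outside of a job.
TASK_CPUS="${SLURM_CPUS_PER_TASK:-1}"
echo "Running on $(hostname) with ${TASK_CPUS} CPU core(s)"
```

Assuming the script is saved as `example.sh`, it would be submitted with `sbatch example.sh`.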
Walltime
The walltime is specified with the "-t" flag. For example:
#SBATCH -t 10:00:00
This format translates to 10 hours (00 minutes and 00 seconds).
If you want to request just 1 hour of walltime, you should request 1:00:00.
Acceptable time formats in the Slurm scheduler are: "minutes", "minutes:seconds", "hours:minutes:seconds",
"days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".
The maximum time allowed is five days.
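To make the accepted time formats concrete, the following sketch converts a walltime string into seconds. The function name `slurm_time_to_seconds` is hypothetical and not part of Slurm; it only mirrors the formats listed above and does no validation.

```shell
#!/bin/bash
# Convert a Slurm-style walltime string to seconds.
# Handles: M, M:S, H:M:S, D-H, D-H:M, D-H:M:S
slurm_time_to_seconds() {
    local spec="$1" days=0 hours=0 minutes=0 seconds=0
    if [[ "$spec" == *-* ]]; then
        # With a day prefix, the remainder reads hours[:minutes[:seconds]].
        days="${spec%%-*}"
        IFS=: read -r hours minutes seconds <<< "${spec#*-}"
    else
        local a b c
        IFS=: read -r a b c <<< "$spec"
        if [[ -n "$c" ]]; then
            hours="$a"; minutes="$b"; seconds="$c"   # hours:minutes:seconds
        elif [[ -n "$b" ]]; then
            minutes="$a"; seconds="$b"               # minutes:seconds
        else
            minutes="$a"                             # minutes
        fi
    fi
    # 10# forces base 10 so fields such as "08" are not read as octal.
    echo $(( 10#${days:-0} * 86400 + 10#${hours:-0} * 3600 \
           + 10#${minutes:-0} * 60 + 10#${seconds:-0} ))
}

slurm_time_to_seconds 10:00:00   # prints 36000 (10 hours)
slurm_time_to_seconds 0-01:00    # prints 3600 (1 hour)
slurm_time_to_seconds 5-0        # prints 432000 (five days)
```

Note that a bare number such as `90` is read as minutes, not seconds, which is a common source of surprise.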
Memory Requests
There are two ways to ask for memory, and they are mutually exclusive. You can ask either for
1) memory per cpu
or
2) memory per node
If you do not specify the memory requirement, by default you get 5,800 MB per CPU.
At this writing (February 2024), Insomnia has 25 standard nodes with 512 GB of memory (approximately 510 GB usable).
Insomnia's 9 high memory nodes have 1 TB of memory each. They are otherwise identical to the standard nodes.
For example,
--mem-per-cpu=5gb
Minimum memory required per allocated CPU. If you request 32 cores (one node), you will get 160gb of memory on either a standard node or a high-memory node.
If you specify the real memory required per node:
--mem=160gb
You will get the same.
However, if you specify
#SBATCH --exclusive
#SBATCH -C mem1024
#SBATCH --mem=900gb
You will get 900 gb on a high memory node.
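The per-CPU arithmetic above can be checked with a short sketch. The function name `total_mem_mb` is hypothetical; the 5800 MB default is the per-CPU figure quoted in this section, and writing 5 GB as 5120 MB assumes that Slurm counts 1 GB as 1024 MB.

```shell
#!/bin/bash
# Total memory (in MB) for a job that requests memory per CPU.
# Arguments: number of cores, memory per CPU in MB (default 5800,
# the per-CPU default quoted in this document).
total_mem_mb() {
    local cores="$1" per_cpu_mb="${2:-5800}"
    echo $(( cores * per_cpu_mb ))
}

total_mem_mb 32          # default allocation for 32 cores: prints 185600
total_mem_mb 32 5120     # --mem-per-cpu=5gb on 32 cores: prints 163840 (160 GB)
```

The second result matches the 160gb obtained with `--mem=160gb`, which is why the two requests are equivalent for a 32-core job.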
Job Arrays
Multiple copies of a job can be submitted by using a job array. The --array option can be used to specify the job indexes Slurm should apply.
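As a minimal sketch of the idea, the script below uses the index range from the directive table (`1-4`) and reads the `SLURM_ARRAY_TASK_ID` environment variable that Slurm sets for each array task. The `input_<index>.txt` naming scheme and the variable names are hypothetical, chosen only for illustration.

```shell
#!/bin/bash
#SBATCH --array=1-4

# Slurm sets SLURM_ARRAY_TASK_ID to this copy's index (1, 2, 3 or 4 here).
# Fall back to 1 so the script can also be tried outside of a job.
TASK_ID="${SLURM_ARRAY_TASK_ID:-1}"

# Hypothetical convention: each array task handles its own input file.
INPUT_FILE="input_${TASK_ID}.txt"
echo "Array task ${TASK_ID} would process ${INPUT_FILE}"
```

Each of the four copies runs the same script, so the index is the only thing that distinguishes their work.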
...