
Processes and Jobs (sbatch, srun, and more)

In computing, a process is an instance of a computer program that is being executed. It contains the program code and its current activity. A process is normally launched by invoking it by the name of the executable (compiled code) associated with it, either directly at the Unix shell prompt, or within a shell script.

...

Warning
When a user ignores that recommendation and runs processes that are compute-intensive, longer than momentary, or that require multiple cores, the login node becomes overloaded and prevents other users from doing their regular work. In such cases we typically terminate the processes and inform the user, with a request to run them within jobs instead. Please be aware that the cluster is a shared resource, and cooperate with us in keeping computing activity on the head node to a minimum.


If you need to run a CPU-intensive process, please start an interactive job as described below:

https://confluence.columbia.edu/confluence/display/rcs/Insomnia+-+Submitting+Jobs#Insomnia-SubmittingJobs-InteractiveJobs
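As a sketch (the account name, duration, and core count below are placeholder examples; substitute your own), an interactive session can be requested with `srun`:

```shell
# Request an interactive shell for one hour on one core.
# Replace "stats" with your own account name.
srun --pty -t 0-1:00 -A stats /bin/bash
```

When the job starts, you are placed in a shell on a compute node, and any CPU-intensive work you launch there runs on that node rather than on the login node.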

...

Please keep in mind that the above directives only secure the corresponding type of node; they do not ensure that all of its memory is available to the job (even if --exclusive is specified). By default, 5,800 MB is allocated to each core.

Interactive Jobs


Interactive jobs allow user interaction during their execution. They give the user a new shell from which applications can be launched.

...

| Directive | Short Version | Description | Example | Notes |
|---|---|---|---|---|
| --account=<account> | -A <account> | Account. | #SBATCH --account=stats | |
| --job-name=<job name> | -J <job name> | Job name. | #SBATCH -J DiscoProject | |
| --time=<time> | -t <time> | Time required. | #SBATCH --time=10:00:00 | The maximum time allowed is five days. |
| --mem=<memory> | | Memory required per node. | #SBATCH --mem=16gb | |
| --mem-per-cpu=<memory> | | Memory per CPU. | #SBATCH --mem-per-cpu=5G | |
| --constraint=mem768 | -C mem768 | Specifying a large (768 GB RAM) node. | #SBATCH -C mem768 | |
| --cpus-per-task=<cpus> | -c <cpus> | CPU cores per task. | #SBATCH -c 1 | Nodes have a maximum of 24 cores. |
| --nodes=<nodes> | -N <nodes> | Nodes required for the job. | #SBATCH -N 4 | |
| --array=<indexes> | -a <indexes> | Submit a job array. | #SBATCH -a 1-4 | See below for discussion of job arrays. |
| --mail-type=<ALL,BEGIN,END,FAIL,NONE> | | Send email job notifications. | #SBATCH --mail-type=ALL | |
| --mail-user=<email_address> | | Email address for notifications. | #SBATCH --mail-user=me@email.com | |
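Putting several of the directives above together, a minimal batch script might look like the following sketch (the account name is a placeholder; replace it with your own):

```shell
#!/bin/sh
#SBATCH --account=stats         # account name is a placeholder; use your own
#SBATCH --job-name=DiscoProject
#SBATCH -c 1                    # one CPU core
#SBATCH --time=10:00:00         # ten hours of walltime
#SBATCH --mem-per-cpu=5G

# The commands to run go below the directives.
echo "Running on $(hostname)"
```

The script is submitted with `sbatch <scriptname>`.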


Walltime

The walltime is specified with the "-t" flag. For example:

#SBATCH -t 10:00:00

This walltime format translates to 10 hours (00 minutes and 00 seconds). If you want to request just 1 hour of walltime, request 1:00:00.

Acceptable time formats in Slurm scheduler are: "minutes", "minutes:seconds", "hours:minutes:seconds", 
"days-hours", "days-hours:minutes" and "days-hours:minutes:seconds".

The maximum time allowed is five days.
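For illustration, the same kinds of requests written in several of the accepted formats (the durations themselves are arbitrary examples):

```shell
#SBATCH -t 120          # "minutes": 2 hours
#SBATCH -t 2:00:00      # "hours:minutes:seconds": 2 hours
#SBATCH -t 0-2          # "days-hours": also 2 hours
#SBATCH -t 2-12:30:00   # "days-hours:minutes:seconds": 2 days, 12 hours, 30 minutes
```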


Memory Requests


There are two ways to ask for memory, and they are mutually exclusive. You can ask either for
1) memory per CPU, or
2) memory per node.

If you do not specify the memory requirement, by default you get 5,800 MB per CPU. 

Insomnia has 191 Standard Nodes with 192 GB of memory (about 187 GB usable).

Insomnia's 56 high memory nodes have 768 GB of memory each. They are otherwise identical to the standard nodes.


For example:
--mem-per-cpu=5gb
This is the minimum memory required per allocated CPU. If you request 32 cores (one node), you will get 160 GB of memory, on both a standard node and a high-memory node.

If you instead specify the real memory required per node:
--mem=160gb

you will get the same.

However, if you specify 
#SBATCH --exclusive 
#SBATCH -C mem768 
#SBATCH --mem=700gb

you will get 700 GB on a high-memory node.
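If you want to confirm how much memory Slurm actually granted to a job, one way (the job ID below is a placeholder) is to inspect the job record:

```shell
# Show the memory-related fields of a job's record (123456 is a placeholder job ID).
scontrol show job 123456 | grep -i mem
```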

Job Arrays


Multiple copies of a job can be submitted by using a job array. The --array option can be used to specify the job indexes Slurm should apply.
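As a sketch of how the index is used inside the script, Slurm sets the environment variable SLURM_ARRAY_TASK_ID to each task's index; the input file naming scheme below is a hypothetical example:

```shell
#!/bin/sh
#SBATCH -a 1-4                  # four array tasks, indexes 1 through 4
#SBATCH -t 0:10:00

# Each array task sees its own index in SLURM_ARRAY_TASK_ID;
# here it selects a (hypothetical) per-task input file.
echo "Processing input_${SLURM_ARRAY_TASK_ID}.txt"
```

Submitting this script once creates four tasks, each processing a different index.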

...