
Note: This guide provides an introduction to the SLURM job scheduler and its application on the c2b2 clusters. The clusters comprise:

  • 8 compute nodes, each with 20-core processors and 128 GB of memory

  • Some nodes have 192 cores and 1.5 TB of memory

  • 1 GPU node featuring 2 NVIDIA L40S GPU cards

  • 1 GPU node with an NVIDIA GH200 Superchip (ARM architecture), 1 GPU, and 570 GB of memory

This guide will help you get started with using SLURM on these clusters.

Introduction

Jobs are executed in batch mode, without user intervention. The typical process involves:

  • Logging into the login node ((link unavailable))

  • Preparing a job script that specifies the work to be done and required computing resources

  • Submitting the job to the queue

  • Optionally logging out while the job runs

  • Returning to collect the output data once the job is complete
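The steps above can be sketched as a short shell session. This is a minimal illustration, not a site-specific recipe: the file name hello_job.sh, the job name, and all resource values are assumptions chosen for the example, and no partition is specified because partition names are not given here. The sketch writes a small job script and submits it only if sbatch is available on the machine.

```shell
# Write a minimal job script. The file name and all resource values
# below are illustrative assumptions; adjust them to your workload.
cat > hello_job.sh <<'EOF'
#!/bin/bash
# Name shown in the queue
#SBATCH --job-name=hello
# One task using one CPU core
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
# Memory for the job
#SBATCH --mem=1G
# Wall-clock limit (HH:MM:SS)
#SBATCH --time=00:05:00
# Output file; %j expands to the job ID
#SBATCH --output=hello_%j.out

# SLURM sets SLURM_JOB_ID inside a job; the fallback value lets the
# script also run outside SLURM for a quick local check.
echo "Job ${SLURM_JOB_ID:-none} running on $(hostname)"
EOF

# Submit only where SLURM is installed (for example, the login node).
if command -v sbatch >/dev/null 2>&1; then
    sbatch hello_job.sh
else
    echo "sbatch not found: submit hello_job.sh from the login node"
fi
```

Once the job finishes, its output should appear in a file named hello_<jobid>.out in the directory from which the job was submitted.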


This guide provides an introduction to submitting and monitoring jobs using SLURM. The covered topics include:

  • Essential SLURM commands

  • Creating a job submission script

  • Understanding SLURM partitions

  • Submitting jobs to the queue

  • Monitoring job progress

  • Canceling jobs from the queue

  • Setting environment variables

  • Managing job dependencies and job arrays

Commands

The following table summarizes the most commonly used SLURM commands:

Command    Description
sbatch     Submits a job script for execution.
sinfo      Displays the status of SLURM-managed partitions and nodes, with options for filtering, sorting, and formatting.
squeue     Shows the status of jobs, with options for filtering, sorting, and formatting. By default, running jobs are listed in priority order, followed by pending jobs in priority order.
srun       Runs a parallel job on the cluster.
scancel    Cancels a pending or running job.
sacct      Provides accounting information for active or completed jobs.
salloc     Allocates resources for an interactive job, allowing users to run tasks in real time with manual input.
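As a rough illustration of how these commands are typically invoked, the sketch below prints each command line instead of running it, so it can be read or tried anywhere. The run helper, the DRY_RUN variable, and the job ID 12345 are hypothetical names introduced for this example only; set DRY_RUN=0 on the cluster to execute the commands for real, and substitute a job ID reported by sbatch.

```shell
# Hypothetical helper: print each command by default; set DRY_RUN=0
# on the cluster to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

JOBID=12345                 # hypothetical job ID, as reported by sbatch
ME="${USER:-$(id -un)}"     # current user name

run sinfo                                                    # partition and node status
run squeue -u "$ME"                                          # your running and pending jobs
run sacct -j "$JOBID" --format=JobID,JobName,State,Elapsed   # accounting information for one job
run srun --ntasks=1 hostname                                 # run a single task on a compute node
run salloc --ntasks=1 --time=00:10:00                        # request an interactive allocation
run scancel "$JOBID"                                         # cancel the job
```

The options shown are only a small subset; each command accepts many more, which are described in its manual page.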

Each SLURM command has a manual (man) page with detailed documentation, which can be viewed by typing, for example:

man sinfo
