...

This program prints "Hello World!" when run on a GPU server, or "Hello Hello" when no GPU module is found.
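The idea can be sketched in a few lines of shell. This is a minimal illustration only, assuming the GPU check is done by looking for nvidia-smi; it is not a copy of the actual program:

```shell
#!/bin/bash
# Hypothetical sketch (not the original source): print one greeting when a
# GPU driver utility is visible to the job, and a different one when it is not.
if command -v nvidia-smi >/dev/null 2>&1; then
    echo "Hello World!"
else
    echo "Hello Hello"
fi
```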

Singularity 

Singularity is a software tool that brings Docker-like containers and reproducibility to scientific computing and HPC. Singularity has Docker container support and enables users to easily run different flavors of Linux with different software stacks. These containers provide a single universal on-ramp from the laptop, to HPC, to cloud.

Users can run Singularity containers just as they run any other program on our HPC clusters. Example usage of Singularity is listed below. For additional details on how to use Singularity, please contact us or refer to the Singularity User Guide.

Downloading Pre-Built Containers

Singularity makes it easy to quickly deploy and use software stacks or new versions of software. Since Singularity has Docker support, users can simply pull existing Docker images from Docker Hub or download Docker images directly from software repositories, which increasingly support the Docker format. The Singularity Container Library also provides a number of additional containers.


You can use the pull command to download pre-built images from an external resource into your current working directory. The docker:// URI prefix can be used to pull Docker images. Pulled Docker images are automatically converted to the Singularity container format.
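A generic pull might look like the following (ubuntu:18.04 is just an illustrative image name, not one referenced elsewhere on this page):

```shell
# Hypothetical example: pull an image from Docker Hub and convert it to the
# Singularity format; the resulting image file lands in the current directory.
$ singularity pull docker://ubuntu:18.04
```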

...

Here's an example of pulling the latest stable release of the TensorFlow Docker image and running it with Singularity. (Note: these pre-built versions may not be optimized for use with our CPUs.)

...

Singularity - Interactive Shell 

The shell command allows you to spawn a new shell within your container and interact with it as though it were a small virtual machine:
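For example, using the tensorflow.simg image from the session shown below:

```shell
# Spawn an interactive shell inside the container; your prompt changes to
# the "Singularity <image>:~>" prompt shown in the session that follows.
$ singularity shell tensorflow.simg
```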

...

Code Block
Singularity tensorflow.simg:~> python
>>> import tensorflow as tf
>>> print(tf.__version__)
1.13.1
>>> exit()


When done, you may exit the Singularity interactive shell with the "exit" command.


Singularity tensorflow.simg:~> exit

Singularity: Executing Commands

The exec command allows you to execute a custom command within a container by specifying the image file. This is also the way to invoke commands from a job submission script.
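For example, to run the same version check non-interactively (this mirrors the command used in the batch script further below):

```shell
# Run a single command inside the container and return to the host shell.
$ singularity exec tensorflow.simg python -c 'import tensorflow as tf; print(tf.__version__)'
```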

...

Singularity: Running a Batch Job

Below is an example of a job submission script named submit.sh that runs Singularity. Note that you may need to specify the full path to the Singularity image you wish to run.


Code Block
#!/bin/bash
# Singularity example submit script for Slurm.
#
# Replace <ACCOUNT> with your account name before submitting.
#
#SBATCH -A <ACCOUNT>           # Set Account name
#SBATCH --job-name=tensorflow  # The job name
#SBATCH -c 1                   # Number of cores
#SBATCH -t 0-0:30              # Runtime in D-HH:MM
#SBATCH --mem-per-cpu=4gb      # Memory per cpu core

module load singularity
singularity exec tensorflow.simg python -c 'import tensorflow as tf; print(tf.__version__)'


Then submit the job to the scheduler. This example prints out the TensorFlow version.


$ sbatch submit.sh

For additional details on how to use Singularity, please contact us or refer to the Singularity User Guide.


Example of R run

For this example, the R code below is used to generate a graph ''Rplot.pdf'' of discrete delta-hedging of a call option. It hedges along a path and repeats over many paths. Two R files are required:

...

This program will leave several files in the output directory: slurm-<jobid>.out, out.mat, and matoutfile.

Matlab (multi-threading)

...

Important note: On Yeti, where Matlab was single-threaded by default, more recent versions of Matlab appeared to grab all the cores within a node even when fewer cores (or only one) were specified as above. On Terremoto, we believe this has been addressed by a system mechanism that enforces the specified number of cores.

Matlab with Parallel Server

Matlab 2020b and 2022b on Terremoto now have access to Parallel Server, and the toolbox is installed. The first time you run Matlab, it can take a few minutes to fully open, especially over WiFi. To use Parallel Server, a cluster profile needs to be created that uses the Slurm job scheduler. You will also need to request the desired number of nodes, and may need to increase the amount of memory requested. Start with an interactive job requesting two nodes:

srun --pty -t 0-04:00  --nodes=2 --mem=10gb -A <your-account> /bin/bash

Step One

Using the Configure for Slurm MathWorks tutorial as a guide:

  1. On the Home tab, in the Environment area, select Parallel > Create and Manage Clusters. Click OK on the dialog box Product Required: MATLAB Parallel Server.
  2. Create a new profile in the Cluster Profile Manager by selecting Add Cluster Profile > Slurm.
  3. With the new profile selected in the list, click Rename and edit the profile name to something informative for future use, e.g., InstallTest. Press Enter.
  4. In the Properties tab, provide settings for the following fields:
    1. Set the Description field to something informative, e.g., For testing installation.
    2. Set the JobStorageLocation to the location where you want job and task data to be stored, e.g.,  /moto/home/<your-directory>.
      1. Note: JobStorageLocation should not be shared by parallel computing products running different versions; each version on your cluster should have its own JobStorageLocation.
  5. Set the NumWorkers field to the number of workers you want to run the validation tests on. This should not be more than the number specified by --nodes= in the interactive job request, i.e., srun.
  6. Set the ClusterMatlabRoot to the installation location of the MATLAB version, i.e., /moto/opt/matlab/R2020b or /moto/opt/matlab/R2022b.
  7. Within ADDITIONAL SLURM PROPERTIES add -A <your-account-name> (replace <your-account-name> accordingly).
  8. Click Done to save your cluster profile.

Step Two
In this step you verify your cluster profile, and thereby your installation. You can specify the number of workers to use when validating your profile. If you do not specify the number of workers in the Validation tab, then the validation will attempt to use as many workers as the value specified by the NumWorkers property on the Properties tab. You can specify a smaller number of workers to validate your configuration without occupying the whole cluster.

  1. If it is not already open, start the Cluster Profile Manager from the MATLAB desktop. On the Home tab, in the Environment area, select Parallel > Create and Manage Clusters.
  2. Select your cluster profile in the listing.
  3. Click the Validation tab.
  4. Use the checkboxes to choose all tests, or a subset of the validation stages, and specify the number of workers to use when validating your profile.
  5. Click Validate. Note that when the Parallel pool test (parpool) starts running, the screen flips back to Matlab, and in the very bottom left status bar you will see Starting Parallel Pool on the profile name you created in Step One.
  6. The Validation Results tab shows the output as shown in the MathWorks tutorial.
  7. If your validation passed, you now have a valid profile that you can use in other parallel applications. You can make any modifications to your profile appropriate for your applications, such as NumWorkersRange, AttachedFiles, AdditionalPaths, etc.

Python and Julia

To use Python, first request an interactive job with a GPU:

...

Code Block
$ srun --pty -t 0-02:00:00 --gres=gpu:1 -A <group_name> /bin/bash


Then load the Singularity environment module and run the TensorFlow container, which was built from the TensorFlow Docker image. You can start an interactive Singularity shell, specifying the --nv flag, which instructs Singularity to use the Nvidia GPU driver.


Code Block
$ module load singularity

$ singularity shell --nv /moto/opt/singularity/tensorflow-1.13-gpu-py3-moto.simg

Singularity tensorflow-1.13-gpu-py3-moto.simg:~> python
Python 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609] on linux
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
..
>>> exit()


You may type "exit" to exit when you're done with the Singularity shell.



Singularity tensorflow-1.13-gpu-py3-moto.simg:~> exit

Below is an example of a job submission script named submit.sh that runs TensorFlow with GPU support using Singularity.


Code Block
#!/bin/bash
# Tensorflow with GPU support example submit script for Slurm.
#
# Replace <ACCOUNT> with your account name before submitting.
#
#SBATCH -A <ACCOUNT>           # Set Account name
#SBATCH --job-name=tensorflow  # The job name
#SBATCH -c 1                   # Number of cores
#SBATCH -t 0-0:30              # Runtime in D-HH:MM
#SBATCH --gres=gpu:1           # Request a gpu module

module load singularity
singularity exec --nv /moto/opt/singularity/tensorflow-1.13-gpu-py3-moto.simg python -c 'import tensorflow as tf; print(tf.__version__)'


Then submit the job to the scheduler. This example prints out the TensorFlow version.


$ sbatch submit.sh

For additional details on how to use Singularity, please contact us, see our Singularity documentation, or refer to the Singularity User Guide.


Another option:

Please note that you should not run computations on the head node.

...