

In order for the scripts in these examples to work, you will need to replace <ACCOUNT> with your group's account name.

...

This program will print out "Hello World!" when run on a gpu server or print "Hello Hello" when no gpu module is found. 

Singularity 

Singularity is a software tool that brings Docker-like containers and reproducibility to scientific computing and HPC. Singularity has Docker container support and enables users to easily run different flavors of Linux with different software stacks. These containers provide a single universal on-ramp from the laptop, to HPC, to cloud.

Users can run Singularity containers just as they run any other program on our HPC clusters. Example usage of Singularity is listed below. For additional details on how to use Singularity, please contact us or refer to the Singularity User Guide.

Downloading Pre-Built Containers

Singularity makes it easy to quickly deploy and use software stacks or new versions of software. Since Singularity has Docker support, users can simply pull existing Docker images from Docker Hub or download Docker images directly from software repositories that increasingly support the Docker format. The Singularity Container Library also provides a number of additional containers.

You can use the pull command to download pre-built images from an external resource into your current working directory. The docker:// URI reference can be used to pull Docker images. Pulled Docker images will be automatically converted to the Singularity container format.

...

Here's an example of pulling the latest stable release of the Tensorflow Docker image and running it with Singularity. (Note: these pre-built versions may not be optimized for use with our CPUs.)

...

Singularity: Interactive Shell

The shell command allows you to spawn a new shell within your container and interact with it as though it were a small virtual machine:

...

Code Block
Singularity tensorflow.simg:~> python
>>> import tensorflow as tf
>>> print(tf.__version__)
1.13.1
>>> exit()


When done, you may exit the Singularity interactive shell with the "exit" command.


Singularity tensorflow.simg:~> exit

Singularity: Executing Commands

The exec command allows you to execute a custom command within a container by specifying the image file. This is the way to invoke commands in your job submission script.

...

Singularity: Running a Batch Job

Below is an example of a job submission script named submit.sh that runs Singularity. Note that you may need to specify the full path to the Singularity image you wish to run.


Code Block
#!/bin/bash
# Singularity example submit script for Slurm.
#
# Replace <ACCOUNT> with your account name before submitting.
#
#SBATCH -A <ACCOUNT>           # Set Account name
#SBATCH --job-name=tensorflow  # The job name
#SBATCH -c 1                   # Number of cores
#SBATCH -t 0-0:30              # Runtime in D-HH:MM
#SBATCH --mem-per-cpu=4gb      # Memory per cpu core

module load singularity
singularity exec tensorflow.simg python -c 'import tensorflow as tf; print(tf.__version__)'


Then submit the job to the scheduler. This example prints out the tensorflow version.


$ sbatch submit.sh
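
Because a job does not always start in the directory that holds the image, it can help to resolve and check the image path before calling singularity exec. The following is a minimal sketch under that assumption; resolve_image is a hypothetical helper written for this illustration and is not part of Singularity or Slurm.

```shell
#!/bin/bash
# Hypothetical helper (not part of Singularity or Slurm): print the absolute
# path of an image file, or fail with a message if the file is not readable.
resolve_image() {
    local image="$1"
    if [ ! -r "$image" ]; then
        echo "Singularity image not readable: $image" >&2
        return 1
    fi
    # readlink -f turns a relative path into an absolute one.
    readlink -f "$image"
}

# Possible use inside submit.sh (left commented out so that this sketch
# runs on machines without Singularity):
#   IMAGE=$(resolve_image tensorflow.simg) || exit 1
#   module load singularity
#   singularity exec "$IMAGE" python -c 'import tensorflow as tf; print(tf.__version__)'
```

With this check in place, a missing or misnamed image shows up as a clear message in the slurm-<jobid>.out file instead of an error from inside Singularity.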


...

Swak4FOAM in a Singularity container

Swak4FOAM (SWiss Army Knife for Foam) can be run inside a container. Using this Docker container as inspiration, here is a sample tutorial.

Code Block
module load singularity 
singularity pull docker://hfdresearch/swak4foamandpyfoam:latest-v7.0
singularity shell swak4foamandpyfoam_latest-v7.0.sif
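
The file name passed to singularity shell above follows from the pull URI. As a minimal sketch, assuming the <image>_<tag>.sif naming seen in this example, the hypothetical helper sif_name_for (not part of Singularity) derives that file name from a docker:// URI. This can be handy in scripts that pull an image and then run it. URIs pinned by digest (@sha256:...) are not handled.

```shell
#!/bin/bash
# Hypothetical helper: derive the image file name that "singularity pull"
# produced in the example above, i.e. <image>_<tag>.sif.
sif_name_for() {
    local uri="$1"
    local ref="${uri#docker://}"   # drop the docker:// scheme
    local base="${ref##*/}"        # keep the last path component
    local name="${base%%:*}"       # image name without the tag
    local tag="latest"             # tag assumed when none is given
    case "$base" in
        *:*) tag="${base#*:}" ;;
    esac
    printf '%s_%s.sif\n' "$name" "$tag"
}

sif_name_for docker://hfdresearch/swak4foamandpyfoam:latest-v7.0
# prints swak4foamandpyfoam_latest-v7.0.sif
```

Older Singularity releases used different default names (for example the .simg files elsewhere on this page), so treat the result as a convenience rather than a guarantee.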

Following the pulsedPitzDaily tutorial about halfway down that page, work in /local instead of /tmp, as it has more space:

Code Block
source /opt/openfoam7/etc/bashrc
cd /local
cp -r /opt/swak4Foam/Examples/groovyBC/pulsedPitzDaily .
cd pulsedPitzDaily
pyFoamPrepareCase.py .

You should see the following output:

Code Block
Looking for template values .

Used values

              Name - Value
----------------------------------------
          caseName - "pulsedPitzDaily"
          casePath - "/local/pulsedPitzDaily"
          foamFork - openfoam
       foamVersion - 7
numberOfProcessors - 1

No script ./derivedParameters.py for derived values
Clearing .
Clearing /local/pulsedPitzDaily/PyFoam.blockMesh.logfile
Clearing /local/pulsedPitzDaily/PyFoamPrepareCaseParameters
Writing parameters to ./PyFoamPrepareCaseParameters
Writing report to ./PyFoamPrepareCaseParameters.rst
Found 0.org . Clearing 0

Looking for templates with extension .template in  /local/pulsedPitzDaily
Looking for templates with extension .template in  /local/pulsedPitzDaily/0.org
Looking for templates with extension .template in  /local/pulsedPitzDaily/constant
Looking for templates with extension .template in  /local/pulsedPitzDaily/constant/polyMesh
Found template for /local/pulsedPitzDaily/constant/LESProperties
Found template for /local/pulsedPitzDaily/constant/turbulenceProperties
Looking for templates with extension .template in  /local/pulsedPitzDaily/system

No script for mesh creation found. Looking for 'blockMeshDict'
/local/pulsedPitzDaily/constant/polyMesh/blockMeshDict found. Executing 'blockMesh'
/*---------------------------------------------------------------------------*\
  =========                 |
  \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox
   \\    /   O peration     | Website:  https://openfoam.org
    \\  /    A nd           | Version:  7
     \\/     M anipulation  |
\*---------------------------------------------------------------------------*/
Build  : 7-1ff648926f77
Exec   : blockMesh -case /local/pulsedPitzDaily
Date   : May 03 2023
Time   : 16:23:59
PID    : 2082964
I/O    : uncollated
Case   : /local/pulsedPitzDaily
nProcs : 1
sigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).
fileModificationChecking : Monitoring run-time modified files using timeStampMaster (fileModificationSkew 10)
allowSystemOperations : Allowing user-supplied system call operations

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

Not deleting polyMesh directory
    "/local/pulsedPitzDaily/constant/polyMesh"
    because it contains blockMeshDict
Creating block mesh from
    "/local/pulsedPitzDaily/constant/polyMesh/blockMeshDict"
Creating block edges
No non-planar block faces defined
Creating topology blocks
Creating topology patches

Reading patches section

Creating block mesh topology

Reading physicalType from existing boundary file

Default patch type set to empty

Check topology

    Basic statistics
        Number of internal faces : 18
        Number of boundary faces : 42
        Number of defined boundary faces : 42
        Number of undefined boundary faces : 0
    Checking patch -> block consistency

Creating block offsets
Creating merge list .

Creating polyMesh from blockMesh
Creating patches
Creating cells
Creating points with scale 0.001
    Block 0 cell size :
        i : 0.00158284 .. 0.000791418
        j : 0.000313389 .. 0.000564101
        k : 0.001
    Block 1 cell size :
        i : 0.00158284 .. 0.000791418
        j : 0.000440611 .. 0.00176244
        k : 0.001
    Block 2 cell size :
        i : 0.00158284 .. 0.000791418
        j : 0.00178262 .. 0.000445655
        k : 0.001
    Block 3 cell size :
        i : 0.000528387 .. 0.00211355
        j : 0.00113333 .. 0.00113333 0.00113283 .. 0.00113283 0.00113333 .. 0.00113333 0.00113283 .. 0.00113283
        k : 0.001
    Block 4 cell size :
        i : 0.000528464 .. 0.00211385 0.000528454 .. 0.00211383 0.000528464 .. 0.00211385 0.000528454 .. 0.00211383
        j : 0.000766355 .. 0.000383178 0.000766938 .. 0.000384514 0.000766355 .. 0.000383178 0.000766938 .. 0.000384514
        k : 0.001
    Block 5 cell size :
        i : 0.000528387 .. 0.00211355
        j : 0.000313389 .. 0.000564101 0.000314853 .. 0.00056517 0.000313389 .. 0.000564101 0.000314853 .. 0.00056517
        k : 0.001
    Block 6 cell size :
        i : 0.000528464 .. 0.00211385 0.000528492 .. 0.00211397 0.000528464 .. 0.00211385 0.000528492 .. 0.00211397
        j : 0.000440611 .. 0.00176244 0.000442137 .. 0.00176067 0.000440611 .. 0.00176244 0.000442137 .. 0.00176067
        k : 0.001
    Block 7 cell size :
        i : 0.000528502 .. 0.00211401 0.000528472 .. 0.00211389 0.000528502 .. 0.00211401 0.000528472 .. 0.00211389
        j : 0.00178262 .. 0.000445655 0.00178107 .. 0.000445268 0.00178262 .. 0.000445655 0.00178107 .. 0.000445268
        k : 0.001
    Block 8 cell size :
        i : 0.0020578 .. 0.00514451 0.00205689 .. 0.00514223 0.0020578 .. 0.00514451 0.00205689 .. 0.00514223
        j : 0.000938889 0.000929955 .. 0.000929955 0.000938889 0.000929955 .. 0.000929955
        k : 0.001
    Block 9 cell size :
        i : 0.00204731 .. 0.00511826 0.00204716 .. 0.0051179 0.00204731 .. 0.00511826 0.00204716 .. 0.0051179
        j : 0.000944444 .. 0.000944444 0.000938489 .. 0.000938489 0.000944444 .. 0.000944444 0.000938489 .. 0.000938489
        k : 0.001
    Block 10 cell size :
        i : 0.0020466 .. 0.00511651
        j : 0.000928571 .. 0.000928571 0.00092161 .. 0.00092161 0.000928571 .. 0.000928571 0.00092161 .. 0.00092161
        k : 0.001
    Block 11 cell size :
        i : 0.00204718 .. 0.00511796 0.00204744 .. 0.0051186 0.00204718 .. 0.00511796 0.00204744 .. 0.0051186
        j : 0.00105 .. 0.00105 0.00104025 .. 0.00104025 0.00105 .. 0.00105 0.00104025 .. 0.00104025
        k : 0.001
    Block 12 cell size :
        i : 0.00205182 .. 0.00512954 0.00205252 .. 0.00513131 0.00205182 .. 0.00512954 0.00205252 .. 0.00513131
        j : 0.00117906 .. 0.000294764 0.00116948 .. 0.00029237 0.00117906 .. 0.000294764 0.00116948 .. 0.00029237
        k : 0.001

Adding cell zones
    0    center

Writing polyMesh
----------------
Mesh Information
----------------
  boundingBox: (-0.0206 -0.0254 -0.0005) (0.29 0.0254 0.0005)
  nPoints: 25012
  nCells: 12225
  nFaces: 49180
  nInternalFaces: 24170
----------------
Patches
----------------
  patch 0 (start: 24170 size: 30) name: inlet
  patch 1 (start: 24200 size: 57) name: outlet
  patch 2 (start: 24257 size: 223) name: upperWall
  patch 3 (start: 24480 size: 250) name: lowerWall
  patch 4 (start: 24730 size: 24450) name: frontAndBack

End

No mesh decomposition necessary
Looking for originals in /local/pulsedPitzDaily
Looking for originals in /local/pulsedPitzDaily/constant
Looking for originals in /local/pulsedPitzDaily/constant/polyMesh
Looking for originals in /local/pulsedPitzDaily/system
Copying /local/pulsedPitzDaily/0.org to /local/pulsedPitzDaily/0

No field decomposition necessary
Looking for templates with extension .postTemplate in  /local/pulsedPitzDaily
Looking for templates with extension .postTemplate in  /local/pulsedPitzDaily/0.org
Looking for templates with extension .postTemplate in  /local/pulsedPitzDaily/constant
Looking for templates with extension .postTemplate in  /local/pulsedPitzDaily/constant/polyMesh
Looking for templates with extension .postTemplate in  /local/pulsedPitzDaily/system
Looking for templates with extension .postTemplate in  /local/pulsedPitzDaily/0

No script for case-setup found. Nothing done

No case decomposition necessary
Looking for templates with extension .finalTemplate in  /local/pulsedPitzDaily
Looking for templates with extension .finalTemplate in  /local/pulsedPitzDaily/0.org
Looking for templates with extension .finalTemplate in  /local/pulsedPitzDaily/constant
Looking for templates with extension .finalTemplate in  /local/pulsedPitzDaily/constant/polyMesh
Looking for templates with extension .finalTemplate in  /local/pulsedPitzDaily/system
Looking for templates with extension .finalTemplate in  /local/pulsedPitzDaily/0
Clearing templates
Looking for extension .template in /local/pulsedPitzDaily/0
Looking for extension .postTemplate in /local/pulsedPitzDaily/0
Looking for extension .finalTemplate in /local/pulsedPitzDaily/0

Case setup finished

...

Since R will know where to look for libraries, a call to library(sm) will be successful (however, this line is not necessary per se for the install.packages(...) call, as the directory is already specified in it).

...

MATLAB

...

MATLAB (single thread)

The file linked below is a MATLAB M-file containing a single function, simPoissGLM, that takes one argument (lambda).

...

No Format
#!/bin/sh
#
# Simple MATLAB submit script for Slurm.
#
#
#SBATCH -A astro                 # The account name for the job.
#SBATCH -J SimpleMLJob           # The job name.
#SBATCH -t 1:00                  # The time the job will take to run.
#SBATCH --mem-per-cpu=1gb        # The memory the job will use per cpu core.

module load matlab
echo "Launching a MATLAB run"
date

#define parameter lambda
LAMBDA=10

#Command to execute MATLAB code
matlab -nosplash -nodisplay -nodesktop -r "simPoissGLM($LAMBDA)" # > matoutfile

# End of script

...

This program will leave several files in the output directory: slurm-<jobid>.out, out.mat, and matoutfile.

MATLAB (multi-threading)

MATLAB has built-in implicit multi-threading (even without its Parallel Computing Toolbox, PCT), which causes it to use several cores on the node it is running on. It consumes the number of cores assigned by Slurm. The user can activate explicit (PCT) multi-threading by also specifying the desired number of cores in the MATLAB program.

The Slurm submit script (simpoiss.sh) should contain the following line:

No Format
#SBATCH -c 6

The -c flag determines the number of cores (up to 24 are allowed).

For explicit multi-threading, users must include the following corresponding statement within their MATLAB program:

No Format
parpool('local', 6)

The second argument passed to parpool must equal the number specified with the -c flag. Users who are acquainted with commands like parfor need to specify explicit multi-threading with the help of the parpool command above.
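
Hard-coding 6 in both the submit script and the MATLAB program makes it easy for the two numbers to drift apart. The following is a minimal sketch of one way to avoid that. It assumes you are willing to open the pool from the command line instead of inside the program, which differs from the approach described above. Slurm sets SLURM_CPUS_PER_TASK when -c is given; matlab_statement is a hypothetical helper that builds the argument for -r from that variable.

```shell
#!/bin/bash
#SBATCH -c 6

# Hypothetical helper: build the MATLAB statement so that the pool size
# always matches the -c value. Slurm only sets SLURM_CPUS_PER_TASK when -c
# is specified, so fall back to 1 otherwise.
matlab_statement() {
    local lambda="$1"
    local cores="${SLURM_CPUS_PER_TASK:-1}"
    printf "parpool('local', %s); simPoissGLM(%s)\n" "$cores" "$lambda"
}

# Show the statement that would be passed to MATLAB.
matlab_statement 10

# Possible use in the submit script (commented out so that this sketch runs
# on machines without MATLAB):
#   matlab -nosplash -nodisplay -nodesktop -r "$(matlab_statement 10)"
```

Changing the -c line is then the only edit needed to resize the pool.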

Note: maxNumCompThreads() is being deprecated by MathWorks. It is being replaced by parpool.

The command to execute MATLAB code remains unchanged from the single thread example above.

Important note: On Yeti, where MATLAB was single-threaded by default, it appeared that the more recent versions of MATLAB took liberties to grab all the cores within a node even when fewer (or even only one) cores were specified as above. On Terremoto, we believe this has been addressed by implementing a system mechanism which enforces the proper usage of the number of specified cores.

Matlab with Parallel Server

MATLAB 2020b and 2022b on Terremoto now have access to Parallel Server, and the toolbox is installed. The first time you run MATLAB, it can take a few minutes to fully open, especially over WiFi. In order to use Parallel Server, a Cluster Profile needs to be created to use the Slurm job scheduler. You will also need to request the number of nodes desired and may need to increase the amount of memory requested. For an interactive job requesting two nodes, start with:

...

srun --pty -t 0-04:00  --nodes=2 --mem=10gb -A <your-account> /bin/bash

Step One

Using the Configure for Slurm MathWorks tutorial as a guide:

  1. On the Home tab, in the Environment area, select Parallel > Create and Manage Clusters. Click ok on the dialog box Product Required: MATLAB Parallel Server.
  2. Create a new profile in the Cluster Profile Manager by selecting Add Cluster Profile > Slurm.
  3. With the new profile selected in the list, click Rename and edit the profile name something informative for future use, e.g., InstallTest. Press Enter.
  4. In the Properties tab, provide settings for the following fields:
    1. Set the Description field to something informative, e.g., For testing installation.
    2. Set the JobStorageLocation to the location where you want job and task data to be stored, e.g.,  /moto/home/<your-directory>.
      1. Note: JobStorageLocation should not be shared by parallel computing products running different versions; each version on your cluster should have its own JobStorageLocation.
  5. Set the NumWorkers field to the number of workers you want to run the validation tests on. This should not be more than what is specified by --nodes= in the interactive job request, i.e., srun.
  6. Set the ClusterMatlabRoot to the installation location of the MATLAB version, i.e., /moto/opt/matlab/R2020b or /moto/opt/matlab/R2022b.
  7. Within ADDITIONAL SLURM PROPERTIES add -A <your account-name> (replace <your account-name> accordingly).
  8. Click Done to save your cluster profile.

Step Two
In this step you verify your cluster profile, and thereby your installation. You can specify the number of workers to use when validating your profile. If you do not specify the number of workers in the Validation tab, then the validation will attempt to use as many workers as the value specified by the NumWorkers property on the Properties tab. You can specify a smaller number of workers to validate your configuration without occupying the whole cluster.

...

MATLAB with Parallel Server

Running MATLAB via X11 Forwarding

MATLAB Parallel Server is now configured on Terremoto for R2020b and R2022a. Note that MATLAB 2023a and later cannot be installed due to the kernel and minimum version of Red Hat 7.9. X11 forwarding is available; XQuartz is recommended for Apple Mac computers and MobaXterm for Windows. The first time you run MATLAB via X11, it can take a few minutes to fully open, especially over WiFi. You can run one simple command to enable the Toolbox:

>> configCluster

You should see:

Must set AccountName before submitting jobs to TERREMOTO.  E.g.
    >> c = parcluster;
    >> c.AdditionalProperties.AccountName = 'group-account-name';
    >> c.saveProfile

Complete.  Default cluster profile set to "Terremoto".

Running MATLAB From Your Desktop/Laptop

You can now also install MATLAB on your laptop or desktop. Download it from the MathWorks Columbia page, where students can get it for free; currently only 2022b and 2020b are supported. You will also need to download a zip file which contains all the necessary integration scripts, including the license. You will need to be on the Columbia WiFi or VPN, and to copy the network.lic file into your device's MATLAB directory. On a Mac, use Finder: open Applications, Control-click MATLAB, choose Show Package Contents, then open licenses. Alternatively, you can run the userpath command.

In MATLAB, navigate to the Coumbia-University.Desktop folder. In the Command Window, type configCluster. You will be prompted to choose between Ginsburg and Terremoto; select 2 for Terremoto. Enter your UNI (without @columbia.edu). You should see:

>> c = parcluster;
    >> c.AdditionalProperties.AccountName = 'group-account-name';
    >> c.saveProfile

Complete.  Default cluster profile set to "Terremoto".

Inside the zip file is a Getting Started tutorial in a Word document. You can start with getting a handle to the cluster:

>> c = parcluster;

Submission to the remote cluster requires SSH credentials.  You will be prompted for your SSH username and password or identity file (private key).  The username and location of the private key will be stored in MATLAB for future sessions. Jobs will now default to the cluster rather than submit to the local machine.

Configuring Jobs

Prior to submitting the job, we can specify various parameters to pass to our jobs, such as queue, e-mail, walltime, etc. See AdditionalProperties for the complete list.  AccountName and MemPerCPU are the only fields that are mandatory.

>> % Specify the account
>> c.AdditionalProperties.AccountName = 'group-account-name';

>> % Specify memory to use, per core (default: 4gb)
>> c.AdditionalProperties.MemPerCPU = '6gb';

Python and JULIA

To use python you need to use:

...

Code Block
$ srun --pty -t 0-02:00:00 --gres=gpu:1 -A <group_name> /bin/bash


Then load the singularity environment module and run the tensorflow container, which was built from the Tensorflow docker image. You can start an interactive singularity shell and specify the --nv flag which instructs singularity to use the Nvidia GPU driver.


Code Block
$ module load singularity

$ singularity shell --nv /moto/opt/singularity/tensorflow-1.13-gpu-py3-moto.simg

Singularity tensorflow-1.13-gpu-py3-moto.simg:~> python
Python 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609] on linux
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
..
>>> exit()


You may type "exit" to exit when you're done with the Singularity shell.



Singularity tensorflow-1.13-gpu-py3-moto.simg:~> exit

Below is an example of a job submission script named submit.sh that runs Tensorflow with GPU support using Singularity.


Code Block
#!/bin/bash
# Tensorflow with GPU support example submit script for Slurm.
#
# Replace <ACCOUNT> with your account name before submitting.
#
#SBATCH -A <ACCOUNT>           # Set Account name
#SBATCH --job-name=tensorflow  # The job name
#SBATCH -c 1                   # Number of cores
#SBATCH -t 0-0:30              # Runtime in D-HH:MM
#SBATCH --gres=gpu:1           # Request a gpu module

module load singularity
singularity exec --nv /moto/opt/singularity/tensorflow-1.13-gpu-py3-moto.simg python -c 'import tensorflow as tf; print(tf.__version__)'


Then submit the job to the scheduler. This example prints out the tensorflow version.


$ sbatch submit.sh

For additional details on how to use Singularity, please contact us, see our Singularity documentation, or refer to the Singularity User Guide.


Another option:

Please note that you should not work on our head node.

...

This is one way to set up and run a Jupyter notebook on Terremoto. As your notebook will listen on a port that will be accessible to anyone logged in on the submit node, you should first create a password (as shown below).

Creating a Password

The following steps can be run on the submit node or in an interactive job.

...

Running a Jupyter Notebook

6. Log in to the submit node. Start an interactive job.

Code Block
$ srun --pty -t 0-01:00 -A <ACCOUNT> /bin/bash

OR, if you want the notebook to run on a GPU node

$ srun --pty -t 0-01:00 --gres=gpu:1 -A <ACCOUNT> /bin/bash

Please note that the example above specifies a time limit of one hour only. That can be set to a much higher value, and in fact the default (i.e. if not specified at all) is as long as 5 days.


7. Get rid of the XDG_RUNTIME_DIR environment variable.

Code Block
$ unset XDG_RUNTIME_DIR

8. Load the anaconda environment module.

Code Block
$ module load anaconda/3-2019.10

9. Look up the IP of the node your interactive job is running on.

Code Block
$ hostname -i
10.43.4.206

10. Start the Jupyter notebook, specifying the node IP.

Code Block
$ jupyter notebook --no-browser --ip=10.43.4.206

11. Look for the following line in the startup output to get the port number.

Code Block
The Jupyter Notebook is running at: http://10.43.4.206:8888/

12. From your local system, open a second connection to Terremoto that forwards a local port to the remote node and port. Replace UNI below with your UNI.

Code Block
$ ssh -L 8080:10.43.4.206:8888 UNI@moto.rcs.columbia.edu

13. Open a browser session on your desktop and enter the URL 'localhost:8080' (i.e. the string within the single quotes) into its address bar. You should now see the notebook.
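
The node IP and port from the startup output have to be copied by hand into the ssh command. As a minimal sketch, the hypothetical helper tunnel_command (not part of Jupyter or Slurm) takes the "running at" line and prints the matching ssh command to run on your local system. The local port 8080 and the host moto.rcs.columbia.edu are taken from the example above; everything else is illustrative.

```shell
#!/bin/bash
# Hypothetical helper: turn the Jupyter "running at" line into the ssh
# port-forwarding command to run on the local system.
# Arguments: startup line, UNI, optional local port (default 8080).
tunnel_command() {
    local line="$1" uni="$2" local_port="${3:-8080}"
    local hostport
    # Pull "IP:port" out of the http:// URL in the startup line.
    hostport=$(printf '%s\n' "$line" | sed -n 's|.*http://\([0-9.]*:[0-9]*\)/.*|\1|p')
    if [ -z "$hostport" ]; then
        echo "No notebook URL found in: $line" >&2
        return 1
    fi
    printf 'ssh -L %s:%s %s@moto.rcs.columbia.edu\n' "$local_port" "$hostport" "$uni"
}

tunnel_command "The Jupyter Notebook is running at: http://10.43.4.206:8888/" UNI
# prints ssh -L 8080:10.43.4.206:8888 UNI@moto.rcs.columbia.edu
```

If port 8080 is already in use on your local system, pass a different local port as the third argument and use that port in the browser URL.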