

...

This program prints "Hello World!" when run on a GPU server, or "Hello Hello" when no GPU module is found.

Singularity 

Singularity is a software tool that brings Docker-like containers and reproducibility to scientific computing and HPC. Singularity supports Docker containers and lets users easily run different flavors of Linux with different software stacks. These containers provide a single universal on-ramp from the laptop, to HPC, to the cloud.

Users can run Singularity containers just as they run any other program on our HPC clusters. Example usage of Singularity is listed below. For additional details on how to use Singularity, please contact us or refer to the Singularity User Guide.

Downloading Pre-Built Containers

Singularity makes it easy to quickly deploy and use software stacks or new versions of software. Because Singularity supports Docker, users can pull existing Docker images from Docker Hub or download Docker images directly from software repositories, which increasingly support the Docker format. The Singularity Container Library also provides a number of additional containers.


You can use the pull command to download pre-built images from an external resource into your current working directory. Use a docker:// URI to pull Docker images. Pulled Docker images are automatically converted to the Singularity container format.

...

Here's an example of pulling the latest stable release of the TensorFlow Docker image and running it with Singularity. (Note: these pre-built versions may not be optimized for use with our CPUs.)

...

Singularity: Interactive Shell

The shell command allows you to spawn a new shell within your container and interact with it as though it were a small virtual machine:

...

Code Block
Singularity tensorflow.simg:~> python
>>> import tensorflow as tf
>>> print(tf.__version__)
1.13.1
>>> exit()


When done, you may exit the Singularity interactive shell with the "exit" command.


Singularity tensorflow.simg:~> exit

Singularity: Executing Commands

The exec command allows you to execute a custom command within a container by specifying the image file. This is the way to invoke commands in your job submission script.

...

Singularity: Running a Batch Job

Below is an example of a job submission script named submit.sh that runs Singularity. Note that you may need to specify the full path to the Singularity image you wish to run.


Code Block
#!/bin/bash
# Singularity example submit script for Slurm.
#
# Replace <ACCOUNT> with your account name before submitting.
#
#SBATCH -A <ACCOUNT>           # Set Account name
#SBATCH --job-name=tensorflow  # The job name
#SBATCH -c 1                   # Number of cores
#SBATCH -t 0-0:30              # Runtime in D-HH:MM
#SBATCH --mem-per-cpu=4gb      # Memory per cpu core

module load singularity
singularity exec tensorflow.simg python -c 'import tensorflow as tf; print(tf.__version__)'


Then submit the job to the scheduler. This example prints the TensorFlow version.


$ sbatch submit.sh
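
The note above about full paths can be handled inside the submit script itself. The following is a minimal sketch, not part of the cluster's tooling: `image_path` is a hypothetical helper name, and it assumes the image is a regular file that the job can read. It prints the absolute path of the image, or fails with a message when the file is missing.

```shell
#!/bin/bash
# Hypothetical helper (not a Singularity or Slurm command):
# print the absolute path of an image file, or fail if it does not exist.
image_path() {
  local image="$1"
  if [ ! -f "$image" ]; then
    echo "Singularity image not found: $image" >&2
    return 1
  fi
  # Resolve the directory part so the result no longer depends on the
  # directory the job happens to start in.
  printf '%s/%s\n' "$(cd "$(dirname "$image")" && pwd)" "$(basename "$image")"
}

# Possible use in submit.sh, after "module load singularity":
#   IMAGE="$(image_path tensorflow.simg)" || exit 1
#   singularity exec "$IMAGE" python -c 'import tensorflow as tf; print(tf.__version__)'
```

Failing early this way makes a missing image show up as a clear message in the job output rather than as an error from Singularity.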

For additional details on how to use Singularity, please contact us or refer to the Singularity User Guide.

Swak4FOAM in a Singularity container

Swak4FOAM (SWiss Army Knife for Foam) can be run inside a container. Using this Docker container as inspiration, here is a sample tutorial.

...

Before submitting the job, we can specify various parameters to pass to it, such as queue, e-mail, and walltime. The following is a partial list of parameters. See AdditionalProperties for the complete list. AccountName and MemPerCPU are the only mandatory fields.

...

Code Block
$ srun --pty -t 0-02:00:00 --gres=gpu:1 -A <group_name> /bin/bash


Then load the singularity environment module and run the TensorFlow container, which was built from the TensorFlow Docker image. You can start an interactive Singularity shell with the --nv flag, which instructs Singularity to use the NVIDIA GPU driver.


Code Block
$ module load singularity

$ singularity shell --nv /moto/opt/singularity/tensorflow-1.13-gpu-py3-moto.simg

Singularity tensorflow-1.13-gpu-py3-moto.simg:~> python
Python 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609] on linux
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
..
>>> exit()


You may type "exit" to exit when you're done with the Singularity shell.



Singularity tensorflow-1.13-gpu-py3-moto.simg:~> exit

Below is an example of a job submission script named submit.sh that runs TensorFlow with GPU support using Singularity.


Code Block
#!/bin/bash
# Tensorflow with GPU support example submit script for Slurm.
#
# Replace <ACCOUNT> with your account name before submitting.
#
#SBATCH -A <ACCOUNT>           # Set Account name
#SBATCH --job-name=tensorflow  # The job name
#SBATCH -c 1                   # Number of cores
#SBATCH -t 0-0:30              # Runtime in D-HH:MM
#SBATCH --gres=gpu:1           # Request one GPU

module load singularity
singularity exec --nv /moto/opt/singularity/tensorflow-1.13-gpu-py3-moto.simg python -c 'import tensorflow as tf; print(tf.__version__)'


Then submit the job to the scheduler. This example prints the TensorFlow version.


$ sbatch submit.sh
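
If the same script should work for both CPU and GPU jobs, the --nv flag can be added only when a GPU was allocated. The sketch below rests on an assumption that should be verified on the cluster: that Slurm exports CUDA_VISIBLE_DEVICES for jobs that requested a GPU with --gres. The helper name `gpu_flag` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print "--nv" when the job appears to have a GPU.
# Assumption: Slurm exports CUDA_VISIBLE_DEVICES for jobs that requested
# a GPU with --gres=gpu:N. Check this on your cluster before relying on it.
gpu_flag() {
  case "${CUDA_VISIBLE_DEVICES:-}" in
    ""|NoDevFiles) ;;            # no GPU allocated: print nothing
    *) printf '%s\n' "--nv" ;;   # GPU allocated: enable NVIDIA support
  esac
}

# Possible use in submit.sh (left unquoted so an empty result adds no argument):
#   singularity exec $(gpu_flag) /moto/opt/singularity/tensorflow-1.13-gpu-py3-moto.simg \
#       python -c 'import tensorflow as tf; print(tf.__version__)'
```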

For additional details on how to use Singularity, please contact us, see our Singularity documentation, or refer to the Singularity User Guide.


Another option:

Please note that you should not work on our head node.

...

This is one way to set up and run a Jupyter notebook on Terremoto. Because your notebook listens on a port that is accessible to anyone logged in to the submit node, you should first create a password (as shown below).

Creating a Password

The following steps can be run on the submit node or in an interactive job.

...

Running a Jupyter Notebook

1. Log in to the submit node and start an interactive job.

Code Block
$ srun --pty -t 0-01:00 -A <ACCOUNT> /bin/bash

OR, if you want the notebook to run on a GPU node

$ srun --pty -t 0-01:00 --gres=gpu:1 -A <ACCOUNT> /bin/bash

Please note that the example above specifies a time limit of only one hour. You can set a much higher value; the default, if no time limit is specified, is 5 days.


2. Unset the XDG_RUNTIME_DIR environment variable.

Code Block
$ unset XDG_RUNTIME_DIR

3. Load the anaconda environment module.

Code Block
$ module load anaconda/3-2019.10

4. Look up the IP address of the node your interactive job is running on.

Code Block
$ hostname -i
10.43.4.206

5. Start the Jupyter notebook, specifying the node IP.

Code Block
$ jupyter notebook --no-browser --ip=10.43.4.206

6. Look for the following line in the startup output to get the port number.

Code Block
The Jupyter Notebook is running at: http://10.43.4.206:8888/

7. From your local system, open a second connection to Terremoto that forwards a local port to the remote node and port. Replace UNI below with your UNI.

Code Block
$ ssh -L 8080:10.43.4.206:8888 UNI@moto.rcs.columbia.edu

8. Open a browser on your desktop and enter 'localhost:8080' (the string within the single quotes) in its address bar. You should now see the notebook.
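
The port-forwarding command above can be derived from the URL that Jupyter prints at startup, which avoids copying the IP address and port by hand. The following is a minimal sketch; `first_address` and `tunnel_command` are hypothetical helper names, and the local port 8080 is the default only because the example above uses it.

```shell
#!/bin/bash
# Hypothetical helper: keep only the first address, in case
# `hostname -i` reports more than one on a node.
first_address() {
  printf '%s\n' "$1" | awk '{print $1}'
}

# Hypothetical helper: build the ssh port-forwarding command.
# Arguments: notebook URL, UNI, local port (optional, defaults to 8080).
tunnel_command() {
  local url="$1" uni="$2" local_port="${3:-8080}"
  local hostport="${url#*://}"    # drop the scheme, e.g. "http://"
  hostport="${hostport%%/*}"      # drop the path and any query string
  local host="${hostport%%:*}"    # text before the colon: node IP
  local port="${hostport##*:}"    # text after the colon: notebook port
  printf 'ssh -L %s:%s:%s %s@moto.rcs.columbia.edu\n' \
    "$local_port" "$host" "$port" "$uni"
}

# On the compute node:
#   jupyter notebook --no-browser --ip="$(first_address "$(hostname -i)")"
# On your local system, print the command to run:
#   tunnel_command "http://10.43.4.206:8888/" UNI
```

With the example URL above, `tunnel_command` prints the same ssh command shown in the port-forwarding step. If port 8080 is already in use locally, pass a different local port as the third argument and use that port in the browser.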