
...

If --partition=ocp_gpu is omitted, the scheduler will request any GPU across the cluster by default.
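For example, the partition can be requested explicitly in a job script like this (a minimal sketch; the account name, GPU count, and --gres syntax are placeholders and may differ on your cluster):

Code Block
#!/bin/bash
#SBATCH -A <ACCOUNT>            # Set Account name
#SBATCH --partition=ocp_gpu     # Request the ocp_gpu partition explicitly
#SBATCH --gres=gpu:1            # Request one GPU (placeholder syntax; may vary by cluster)
#SBATCH -t 0-0:30               # Runtime in D-HH:MM

nvidia-smi                      # Confirm which GPU was allocated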

Singularity 

Singularity is a software tool that brings Docker-like containers and reproducibility to scientific computing and HPC. Singularity supports Docker containers and enables users to easily run different flavors of Linux with different software stacks. These containers provide a single universal on-ramp from the laptop to HPC to the cloud.

Users can run Singularity containers just as they run any other program on our HPC clusters. Example usage of Singularity is listed below. For additional details on how to use Singularity, please contact us or refer to the Singularity User Guide.

Downloading Pre-Built Containers

Singularity makes it easy to quickly deploy and use software stacks or new versions of software. Since Singularity has Docker support, users can simply pull existing Docker images from Docker Hub or download Docker images directly from software repositories that increasingly support the Docker format. The Singularity Container Library also provides a number of additional containers.


You can use the pull command to download pre-built images from an external resource into your current working directory. The docker:// URI reference can be used to pull Docker images. Pulled Docker images will be automatically converted to the Singularity container format.
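For example (a minimal sketch; the image name is illustrative):

singularity pull docker://ubuntu:22.04    # downloads the image and converts it to ubuntu_22.04.sif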

...

Here's an example of pulling the latest stable release of the Tensorflow Docker image and running it with Singularity. (Note: these pre-built versions may not be optimized for use with our CPUs.)
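A sketch of what that looks like (the exact image tag and file name may differ from the ones used on this page):

singularity pull docker://tensorflow/tensorflow:latest    # creates tensorflow_latest.sif
singularity run tensorflow_latest.sif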

...

Singularity - Interactive Shell 

The shell command allows you to spawn a new shell within your container and interact with it as though it were a small virtual machine:
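For example, to open a shell in the Tensorflow image pulled above (the image file name is illustrative):

singularity shell tensorflow_latest.sif
Singularity>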

...

Code Block
Singularity> python
>>> import tensorflow as tf
>>> print(tf.__version__)
2.4.1
>>> exit()


When done, you may exit the Singularity interactive shell with the "exit" command.


Singularity> exit

Singularity: Executing Commands

The exec command allows you to execute a custom command within a container by specifying the image file. This is the way to invoke commands in your job submission script.
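For example, this runs the same Python one-liner used in the batch script below (the image name is illustrative):

singularity exec tensorflow.sif python -c 'import tensorflow as tf; print(tf.__version__)'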

...

Singularity: Running a Batch Job

Below is an example of a job submission script named submit.sh that runs Singularity. Note that you may need to specify the full path to the Singularity image you wish to run.


Code Block
#!/bin/bash
# Singularity example submit script for Slurm.
#
# Replace <ACCOUNT> with your account name before submitting.
#
#SBATCH -A <ACCOUNT>           # Set Account name
#SBATCH --job-name=tensorflow  # The job name
#SBATCH -c 1                   # Number of cores
#SBATCH -t 0-0:30              # Runtime in D-HH:MM
#SBATCH --mem-per-cpu=5gb      # Memory per cpu core

module load singularity
singularity exec tensorflow.sif python -c 'import tensorflow as tf; print(tf.__version__)'


Then submit the job to the scheduler. This example prints the Tensorflow version.

$ sbatch submit.sh

To run a similar job accessing a GPU, you would need to make the following changes. We will call this script "submit-GPU.sh".

...

Note that without the --nv flag on the singularity line, the container will not have GPU access.
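A sketch of such a script, based on the submit.sh example above (the partition name, --gres syntax, and account are placeholders; adapt them to your cluster and allocation):

Code Block
#!/bin/bash
# Singularity GPU example submit script for Slurm (submit-GPU.sh).
#
# Replace <ACCOUNT> with your account name before submitting.
#
#SBATCH -A <ACCOUNT>              # Set Account name
#SBATCH --job-name=tensorflow-gpu # The job name
#SBATCH -c 1                      # Number of cores
#SBATCH -t 0-0:30                 # Runtime in D-HH:MM
#SBATCH --mem-per-cpu=5gb         # Memory per cpu core
#SBATCH --partition=ocp_gpu       # GPU partition (placeholder; see the note near the top of this page)
#SBATCH --gres=gpu:1              # Request one GPU (placeholder syntax; may vary by cluster)

module load singularity
# --nv exposes the host NVIDIA driver and GPU devices inside the container
singularity exec --nv tensorflow.sif python -c 'import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'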

Using MAKER in a Singularity container

MAKER is an easy-to-use genome annotation pipeline designed to be usable by small research groups with little bioinformatics experience. It has many dependencies, especially for Perl, and using a container is a convenient way to have all the requirements in one place. The BioContainers website maintains a Singularity container (version 3.01.03 as of Dec. 2023). Here is a sample tutorial.

...

To use MAKER with OpenMPI, e.g., requesting 8 CPU ntasks (processes that a job executes in parallel on one or more nodes), you can use the following suggested options, which help reduce warnings. Start with an interactive session using the salloc command and increase the requested memory as needed:

salloc --ntasks=8 --account=test --mem=50GB srun -n1 -N1 --mem-per-cpu=0 --gres=NONE --pty --preserve-env --mpi=none $SHELL
module load openmpi/gcc/64/4.1.5a1 singularity
mpirun -np 2 --mca btl '^openib' --mca orte_base_help_aggregate 0 singularity run https://depot.galaxyproject.org/singularity/maker:3.01.03--pl5262h8f1cd36_2 bash -c "export LIBDIR=/usr/local/share/RepeatMasker && maker"

Additionally, samtools (used for reading/writing/editing/indexing/viewing files in SAM/BAM/CRAM format) is available in the container:

Singularity> samtools --version
samtools 1.7
Using htslib 1.7-2
Copyright (C) 2018 Genome Research Ltd.

Note: if you are testing maker and kill jobs or processes, look out for .NFSLock files, which you will likely need to delete before subsequent runs of maker. Use the -a option with ls, since files that start with a dot/period are hidden from ls by default.
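For example (the output directory name here is hypothetical; MAKER typically writes results to a <name>.maker.output directory):

ls -a test.maker.output/                              # -a also lists hidden .NFSLock files
find test.maker.output/ -name '.NFSLock*' -delete     # remove stale locks before re-running maker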

Using GATK in a Singularity container

GATK, the Genome Analysis Toolkit, has several dependencies and can run inside a container. Here is a sample tutorial:

...

-rw-r--r-- 1 rk3199 user 128 Nov 28 16:48 output.bai
-rw-r--r-- 1 rk3199 user 62571 Nov 28 16:48 output.bam

Using GeoChemFoam in a Singularity container

GeoChemFoam is an open source code based on the OpenFoam CFD toolbox, developed at the Institute of GeoEnergy Engineering, Heriot-Watt University.

Choose one of the Docker containers, and use Singularity/Apptainer to pull it down into .sif format.

singularity pull docker://jcmaes/geochemfoam-5.1

Some additional steps/tweaks are needed to get all of the features working. For this tutorial we assume GeoChemFoam version 5.1 and use the Test Case 01 Species transport in a Ketton Micro-CT image tutorial. You can choose your version of Anaconda Python, but note that Python 3.10 returns the following error when running the first script, createMesh.sh:

ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
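One workaround is to use an earlier Python in a dedicated conda environment (a sketch under that assumption; the environment name and Python version are illustrative):

module load anaconda
conda create -n geochemfoam python=3.9 numpy
conda activate geochemfoam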

...

For additional details on how to use Singularity, please contact us or refer to the Singularity User Guide.

Using Couenne in a Singularity container

Couenne (Convex Over and Under ENvelopes for Nonlinear Estimation) is a branch-and-bound algorithm to solve Mixed-Integer Nonlinear Programming (MINLP) problems. It includes a suite of programs with several dependencies. Fortunately there is a Docker container which can be used to access these programs, e.g., bonmin, couenne, Ipopt, Cgl, and Cbc, via Singularity. You can use these sample .nl files to test with Couenne.


singularity pull docker://coinor/coin-or-optimization-suite 
singularity shell coin-or-optimization-suite_latest.sif
Singularity> couenne hs015.nl 
Couenne 0.5 -- an Open-Source solver for Mixed Integer Nonlinear Optimization
Mailing list: couenne@list.coin-or.org
Instructions: http://www.coin-or.org/Couenne

NLP0012I 
              Num      Status      Obj             It       time                 Location
NLP0014I             1         OPT 306.49998       22 0.004007
Couenne: new cutoff value 3.0649997900e+02 (0.009883 seconds)
Loaded instance "hs015.nl"
Constraints:            2
Variables:              2 (0 integer)
Auxiliaries:            8 (0 integer)

Coin0506I Presolve 29 (-1) rows, 9 (-1) columns and 64 (-2) elements
Clp0006I 0  Obj 0.25 Primal inf 473.75936 (14)
Clp0006I 13  Obj 0.31728151
Clp0000I Optimal - objective value 0.31728151
Clp0032I Optimal objective 0.3172815065 - 13 iterations time 0.002, Presolve 0.00
Clp0000I Optimal - objective value 0.31728151
Cbc0012I Integer solution of 306.49998 found by Couenne Rounding NLP after 0 iterations and 0 nodes (0.00 seconds)
NLP Heuristic: NLP0014I             2         OPT 306.49998        5 0.001228
solution found, obj. 306.5
Clp0000I Optimal - objective value 0.31728151
Optimality Based BT: 3 improved bounds
Probing: 2 improved bounds
Cbc0031I 1 added rows had average density of 2
Cbc0013I At root node, 4 cuts changed objective from 0.31728151 to 306.49998 in 1 passes
Cbc0014I Cut generator 0 (Couenne convexifier cuts) - 4 row cuts average 2.0 elements, 3 column cuts (3 active)
Cbc0001I Search completed - best objective 306.4999790004336, took 0 iterations and 0 nodes (0.00 seconds)
Cbc0035I Maximum depth 0, 0 variables fixed on reduced cost

couenne: Optimal

     "Finished"

Linearization cuts added at root node:         30
Linearization cuts added in total:             30  (separation time: 2.4e-05s)
Total solve time:                        0.003242s (0.003242s in branch-and-bound)
Lower bound:                                306.5
Upper bound:                                306.5  (gap: 0.00%)
Branch-and-bound nodes:                         0
Performance of                           FBBT:        2.8e-05s,        4 runs. fix:          0 shrnk: 0.000103838 ubd:       2.75 2ubd:        0.5 infeas:          0
Performance of                           OBBT:       0.000742s,        1 runs. fix:          0 shrnk:    6.70203 ubd:          0 2ubd:          0 infeas:          0

...

Anaconda Python makes it easy to install Tensorflow, enabling your data science, machine learning, and artificial intelligence workflows.

https://docs.anaconda.com/anaconda/user-guide/tasks/tensorflow/

Tensorflow 

First, load the Anaconda Python module.

$ module load anaconda

You may need to run "conda init bash" to initialize your conda shell.
$ conda init bash
==> For changes to take effect, close and re-open your current shell. <==

To install the current release of CPU-only TensorFlow:

$ conda create -n tf tensorflow
$ conda activate tf

...

$ python
>>> import tensorflow as tf
>>> print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))


NetCDF

NetCDF (Network Common Data Form) is an interface for array-oriented data access and a library that provides an implementation of the interface. The NetCDF library also defines a machine-independent format for representing scientific data. Together, the interface, library, and format support the creation, access, and sharing of scientific data. 

To load the NetCDF Fortran Intel module:
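A sketch of what this typically looks like (the module name below is a placeholder; run module avail netcdf to find the exact name on the cluster):

module avail netcdf                 # list the available NetCDF modules
module load netcdf-fortran/intel    # placeholder name; use the exact module reported above
nf-config --all                     # show the NetCDF-Fortran build configuration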

...

13. From your local system, open a second connection to Ginsburg that forwards a local port to the remote node and port. Replace UNI below with your uni.

Code Block
$ ssh -f -L 8080:10.43.4.206:8888 -N UNI@burg.rcs.columbia.edu

(This step is not for Windows users; Windows users, see step 13B below.)

...

13B. Windows users generally use PuTTY rather than a native command line, so the step 13 instructions, which use port forwarding, may be hard to replicate. To accomplish step 13 using PuTTY, do the following:

I.    Open PuTTY.
II.   In the "Session" category on the left side, enter the hostname or IP address of the remote server in the "Host Name (or IP address)" field (in this case, burg.rcs.columbia.edu).
III.  Make sure the connection type is set to SSH.
IV.   In the "Connection" category, expand the "SSH" tab and select "Tunnels".
V.    In the "Source port" field, enter 8080.
VI.   In the "Destination" field, enter 10.43.4.206:8888 (remember, this is only an example IP; the one you use will be different).
VII.  Make sure the "Local" radio button is selected.
VIII. Click the "Add" button to add the port forwarding rule to the list.
IX.   Now, return to the "Session" category on the left side.
X.    Optionally, enter a name for this configuration in the "Saved Sessions" field, then
XI.   click "Save" to save these settings for future use.
XII.  Click "Open" to start the SSH connection with the port forwarding configured.


14. Open a browser session on your desktop and enter the URL 'localhost:8080' (i.e., the string within the single quotes) into its address bar. You should now see the notebook.

...