Containers

We support the use of Singularity/Apptainer on Axon, but not Docker (due to security concerns). Fortunately, it's possible to import Docker containers into Singularity/Apptainer.
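
For example, you can pull an image from Docker Hub and have it converted to a SIF file automatically (the image name below is just an illustration):

# Pull a Docker image and convert it to a Singularity/Apptainer SIF file.
# The image name is only an example; substitute the one you need.
singularity pull docker://ubuntu:22.04
# This produces ubuntu_22.04.sif in the current directory.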

We currently have some pre-built containers on Axon:

$ ls /share/singularity/
DeepMimic.def  DeepMimic-GPU.def  DeepMimic-GPU.sif  DeepMimic.sif  mmaction2.def  mmaction2.sif  mmaction.def  mmaction.sif  tsn_latest.sif

If you need a different container, your best bet is to build it on a machine where you have root or administrator rights, and then upload it to Axon. If you can't get that to work, you can reach out to us to build the container for you.
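
As a minimal sketch, building your own image from a definition file might look like this (the file names and base image here are hypothetical):

# mycontainer.def -- a minimal definition file (contents are hypothetical)
Bootstrap: docker
From: ubuntu:22.04

%post
    apt-get update && apt-get install -y python3

# Build the image (requires root on the build machine):
sudo singularity build mycontainer.sif mycontainer.def
# Then upload the resulting .sif file to Axon, e.g. with scp or rsync.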

There are also online repositories with pre-built containers, such as https://cloud.sylabs.io/library
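
Pulling from the Sylabs library follows the same pattern as pulling from Docker Hub (the image path below is just an example; browse the library for what you need):

# Pull a pre-built image from the Sylabs cloud library.
singularity pull library://alpine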

Interactive Session Usage

You can experiment with a container by starting one in an interactive session before writing a batch job. This example uses the mmaction2 container available on Axon and loads CUDA and cuDNN (the versions shown are just examples).

# On axon.rc, open an interactive session with 1 GPU.
srun --pty --gres=gpu:1 bash -i
# Load CUDA and CUDNN
ml load cuda/10.1.168
ml load cudnn/7.3.0
singularity run --nv /share/singularity/mmaction2.sif
# If you need access to data stored on Axon, you can bind an Axon path into the container. Replace the bracketed placeholders with your actual paths (omit the brackets themselves):
singularity run --nv --bind [Path to data on Axon]:[Path inside the container] /share/singularity/mmaction2.sif
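
For example, using the same CTN projects path that appears in the batch example below:

# Make /share/ctn/projects visible inside the container as /projects
singularity run --nv --bind /share/ctn/projects:/projects /share/singularity/mmaction2.sif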

Batch Script Usage

You can also use Slurm to submit batch jobs that run in containers. Here's an example that turns the interactive session above into a script. The paths to the data are examples only; be sure to use your actual paths. In this example, the CTN projects directory located at /share/ctn/projects is mounted as /projects within the container, so the config file lives within /share/ctn/projects during a normal Axon session but within /projects once the container is active. Note the use of singularity exec as opposed to singularity run: exec runs an arbitrary command inside the container, while run executes the container's default runscript.

#!/bin/bash
#SBATCH --job-name=mmaction      # The job name.
#SBATCH -c 16                    # The number of cores.
#SBATCH --mem-per-cpu=1gb        # The memory the job will use per CPU core.
#SBATCH --gres=gpu:1             # The number of GPUs.

ml load cuda/10.1.168
ml load cudnn/7.3.0
singularity exec --nv --bind /share/ctn/projects:/projects /share/singularity/mmaction2.sif bash -c "python /mmaction2/tools/train.py /projects/config.py [optional arguments]"
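
Assuming the script above is saved as mmaction.sh (the file name is hypothetical), you can submit it and check on the job:

# Submit the batch script and monitor the job queue.
sbatch mmaction.sh
squeue -u $USER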

Containers as a module: Singularity Registry HPC (shpc)

Singularity Registry HPC (shpc) allows us to install containers as modules. It is available in the Miniforge-24.7.1-2 module. What follows is a tutorial on using PyTorch 22.06 with Python 3.8.13 inside a container. We'll use an interactive session with srun; once we upgrade Slurm, salloc will be the preferred way to start interactive sessions.

srun --pty -t 06:00:00 --gres=gpu:1 /bin/bash 
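
# Note: after the Slurm upgrade, the equivalent salloc request will likely
# look something like this (the exact flags are an assumption):
#   salloc -t 06:00:00 --gres=gpu:1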

# Next load the Miniforge module that has shpc:
ml Miniforge-24.7.1-2

# Add the shpc module directory to the module path:
ml use /share/apps/Miniforge/lib/python3.12/site-packages/modules

# We'll see some new modules become available to load:

ml av
--------- /share/apps/Miniforge/lib/python3.12/site-packages/modules ---------
bids/freesurfer/V30-a43f1f/module.tcl
nvcr.io/nvidia/pytorch/22.06-py3/module.tcl
rocker/tidyverse/4.4.2/module.tcl
tensorflow/tensorflow/2.7.1-gpu/module.tcl
tensorflow/tensorflow/latest-gpu/module.tcl

# Let's load the PyTorch module:

ml nvcr.io/nvidia/pytorch/22.06-py3/module.tcl

# Next, open a shell inside the container:

pytorch-shell 
INFO: Environment variable SINGULARITY_SHELL is set, but APPTAINER_SHELL is preferred

# Then start Python and begin using PyTorch:

python
Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:10) 
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.device_count()
1
>>> print(torch.__version__)
1.13.0a0+340c412

# Now we can check that PyTorch is using a GPU:
>>> import torch
>>> 
>>> # Step 1: Check if CUDA is available
>>> device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
>>> print(f'Using device: {device}')
Using device: cuda
>>> 
>>> # Step 2: Create a sample tensor and move it to the GPU
>>> tensor_size = (10000, 10000) # Size of the tensor
>>> a = torch.randn(tensor_size, device=device) # Random tensor on GPU

>>> b = torch.randn(tensor_size, device=device) # Another random tensor on GPU
>>> 
>>> # Step 3: Perform operations on GPU
>>> c = a + b # Element-wise addition
>>> 
>>> # Print the result (moving back to CPU for printing)
>>> print("Result shape (moved to CPU for printing):", c.cpu().shape)
Result shape (moved to CPU for printing): torch.Size([10000, 10000])
>>> 
>>> # Optional: Check if GPU memory is being utilized
>>> print("Current GPU memory usage:")
Current GPU memory usage:
>>> print(f"Allocated: {torch.cuda.memory_allocated(device) / (1024 ** 2):.2f} MB")
Allocated: 1146.00 MB
>>> print(f"Cached: {torch.cuda.memory_reserved(device) / (1024 ** 2):.2f} MB")
Cached: 1146.00 MB
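
Back at the shell prompt, note that shpc-generated modules typically expose other aliases besides pytorch-shell, including an exec wrapper. Assuming this installation follows shpc's defaults, you could run a script non-interactively with something like the following (the alias behavior and the script name are assumptions):

# Assumes shpc's default exec alias; the script name is hypothetical.
pytorch-exec python my_training_script.py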
