Containers

We support the use of Singularity on Axon, but not Docker (due to security concerns). Fortunately, it is possible to import Docker containers into Singularity.
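For example, a public image from Docker Hub can be pulled and converted to a Singularity image in one step. This is only a sketch; the image name below is an arbitrary example:

# Pull a Docker image and convert it to a local .sif file (example image name)
singularity pull docker://ubuntu:20.04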

We currently have some pre-built containers available on Axon:

$ ls /share/singularity/
DeepMimic.def  DeepMimic-GPU.def  DeepMimic-GPU.sif  DeepMimic.sif  mmaction2.def  mmaction2.sif  mmaction.def  mmaction.sif  tsn_latest.sif 
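The .def files alongside each image show how it was built. You can also ask Singularity to print the definition file embedded in an image at build time, if one is present (a sketch, using one of the images above):

# Show the build definition embedded in the image, if present
singularity inspect --deffile /share/singularity/mmaction2.sif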

If you need a different container, your best bet is to build it on a machine where you have root or administrator rights, and then upload it to Axon. If you can't get that to work, you can reach out to us to build the container for you.
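As a rough sketch, the local build and upload might look like the following; the definition file name, username, hostname, and destination path are all placeholders:

# On your own machine (requires root):
sudo singularity build mycontainer.sif mycontainer.def
# Upload the image to Axon (username, hostname, and path are placeholders)
scp mycontainer.sif <user>@axon.rc:~/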

There are also online repositories with pre-built containers, such as the Sylabs library at https://cloud.sylabs.io/library.
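Images hosted there can be pulled directly with the library:// URI scheme (a sketch; the image name is just a common example and the exact library path may vary):

# Pull a pre-built image from the Sylabs container library (example image)
singularity pull library://alpine:latest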

Interactive Session Usage

You can experiment with a container by starting one in an interactive session before writing a batch job. This example uses the mmaction2 container that is available on Axon, and loads CUDA and CUDNN (the versions are just examples).

# On axon.rc, open an interactive session with 1 GPU.
srun --pty --gres=gpu:1 bash -i
# Load CUDA and CUDNN
ml load cuda/10.1.168
ml load cudnn/7.3.0
# Run the container with GPU support (--nv makes the host GPU drivers available)
singularity run --nv /share/singularity/mmaction2.sif
# If you need data stored on Axon, bind an Axon path so it is visible inside the container.
# Replace the bracketed placeholders with real paths; do not include the "[" and "]" characters.
singularity run --nv --bind [Path to data on Axon]:[Path where you would like the data to appear within the container] /share/singularity/mmaction2.sif
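Once the container is available, it is worth checking that the GPU is actually visible from inside it. A quick sanity check (a sketch; assumes nvidia-smi is reachable inside the container, which --nv normally provides):

# Run nvidia-smi inside the container to confirm the GPU is visible
singularity exec --nv /share/singularity/mmaction2.sif nvidia-smi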

Batch Script Usage

You can also use SLURM to submit batch jobs that run in containers. Here is an example that turns the interactive session above into a script. The data paths are examples only; be sure to use your actual paths. In this example, the CTN projects directory located at /share/ctn/projects is mounted as /projects within the container, so a config file that lives under /share/ctn/projects during a normal Axon session appears under /projects once the container is active. Note the use of singularity exec as opposed to singularity run: exec runs an arbitrary command inside the container rather than the container's default runscript.

#!/bin/bash
#SBATCH --job-name=mmaction      # The job name.
#SBATCH -c 16                    # The number of cores.
#SBATCH --mem-per-cpu=1gb        # The memory the job will use per CPU core.
#SBATCH --gres=gpu:1             # Request 1 GPU.

ml load cuda/10.1.168
ml load cudnn/7.3.0
singularity exec --nv --bind /share/ctn/projects:/projects /share/singularity/mmaction2.sif bash -c "python /mmaction2/tools/train.py /projects/config.py [optional arguments]"
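Assuming the script above is saved as run_mmaction.sh (the filename is a placeholder), it can be submitted and monitored with the usual SLURM commands:

# Submit the batch job and check its status in the queue
sbatch run_mmaction.sh
squeue -u $USER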