Containers
We support the use of Singularity on Axon, but not Docker (due to security concerns). Luckily, it's possible to import Docker containers into Singularity.
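For example, an image on Docker Hub can be pulled and converted to a Singularity SIF file in one step (the ubuntu image here is just an illustration; substitute the image you actually need):

singularity pull docker://ubuntu:20.04
# Produces ubuntu_20.04.sif in the current directory, which can then be run with "singularity run" or "singularity exec".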
Some pre-built containers are currently available on Axon:
$ ls /share/singularity/
DeepMimic.def  DeepMimic-GPU.def  DeepMimic-GPU.sif  DeepMimic.sif  mmaction2.def  mmaction2.sif  mmaction.def  mmaction.sif  tsn_latest.sif
If you need a different container, your best bet is to build it on a machine on which you have root or admin rights, and then upload it to Axon. If you can't get that to work, you can reach out to us to build the container for you.
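A minimal sketch of that workflow, assuming a definition file named my_container.def on your local machine; the filename, username, and destination path are placeholders:

# On a machine where you have root, build the SIF image from a definition file.
sudo singularity build my_container.sif my_container.def
# Copy the image to your space on Axon (replace <uni> and the destination path with your own).
scp my_container.sif <uni>@axon.rc:<destination path on Axon>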
There are also online repositories with pre-built containers, such as https://cloud.sylabs.io/library.
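Images hosted there can be pulled directly with the library:// URI; the lolcow demo image below is only an illustration, and the exact library path depends on the image you want:

singularity pull library://lolcow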
Interactive Session Usage
You can experiment with a container by starting one in an interactive session before writing a batch job. This example uses the mmaction2 container available on Axon and loads CUDA and CUDNN (the versions are just examples).
# On axon.rc, open an interactive session with 1 GPU.
srun --pty --gres=gpu:1 bash -i

# Load CUDA and CUDNN
ml load cuda/10.1.168
ml load cudnn/7.3.0

singularity run --nv /share/singularity/mmaction2.sif

# Note that if you require access to data available on Axon, you can bind an Axon path
# so that it is available in the container (don't include the "[" and "]", just the paths):
singularity run --nv --bind [Path to data on Axon]:[Path where you would like the data to be within the container] /share/singularity/mmaction2.sif
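For example, to make the CTN projects directory (the same bind used in the batch example below) visible as /projects inside the container:

singularity run --nv --bind /share/ctn/projects:/projects /share/singularity/mmaction2.sif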
Batch Script Usage
You can also use SLURM to submit batch jobs that run in containers. Here is an example that turns the interactive session above into a script. Paths to the data are examples only; be sure to use your actual paths. In this example, the CTN projects directory located at /share/ctn/projects is mounted as /projects within the container, so the config file lives under /share/ctn/projects during a normal Axon session but under /projects once the container is active. Note the use of singularity exec as opposed to singularity run.
#!/bin/bash
#SBATCH --job-name=mmaction      # The job name.
#SBATCH -c 16                    # The number of cores.
#SBATCH --mem-per-cpu=1gb        # The memory the job will use per cpu core.
#SBATCH --gres=gpu:1             # The number of GPUs.

ml load cuda/10.1.168
ml load cudnn/7.3.0

singularity exec --nv --bind /share/ctn/projects:/projects /share/singularity/mmaction2.sif bash -c "python /mmaction2/tools/train.py /projects/config.py [optional arguments]"
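Assuming the script above is saved as mmaction_train.sh (the filename is arbitrary), submit it with sbatch and check on it with squeue:

sbatch mmaction_train.sh
squeue -u $USER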