Available Software
The cluster currently provides Anaconda and CUDA for work with frameworks such as TensorFlow and PyTorch. The preferred way to install additional software is to use Anaconda (conda) environments or Singularity containers. If you need other software installed, please email rc@zi.columbia.edu to request it.
Environment Modules
Axon uses environment modules, managed by a tool called Lmod. Modules let users choose which version of a program to run, so several versions of the same program can be installed side by side on the cluster. The example below shows how environment modules work:
[aa3301@axon ~]$ python --version
Python 2.7.5
[aa3301@axon ~]$ which python
/usr/bin/python
[aa3301@axon ~]$ ml anaconda3-2019.03
[aa3301@axon ~]$ python --version
Python 3.7.3
[aa3301@axon ~]$ which python
/share/apps/anaconda3-2019.03/bin/python
In the example above, running python right after logging in gives the default system Python (2.7.5, at /usr/bin/python). After loading the Anaconda module with ml (short for module load), python resolves to the much newer Python 3.7.3 under /share/apps/anaconda3-2019.03, which comes with many packages already installed.
Running ml with no arguments (equivalent to module list) shows the currently loaded modules. Running ml av (equivalent to module avail) lists the modules available to load:
[aa3301@axon ~]$ ml
Currently Loaded Modules:
  1) anaconda3-2019.03
[aa3301@axon ~]$ ml av
------------------------------------------ /share/modulefiles ------------------------------------------
   anaconda3-2019.03 (L)    cuda/9.2.88      cuda/10.1.168 (D)    cudnn/7.5.1-10.0       cudnn/7.6.1.34-10.1 (D)    kilosort2       matlab/2017b    matlab/2018b (D)
   cuda/9.0.176             cuda/10.0.130    cudnn/7.3.0          cudnn/7.6.1.34-10.0    gcc/4.9.1                  matlab/2017a    matlab/2018a
  Where:
   L:  Module is loaded
   D:  Default Module
Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".
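The ml alias is a convenient shortcut for most module commands. Type ml followed by a space and press Tab to list every possible completion, including both subcommands and module names. Tab completion also works after ml load (or module load), where a partial name such as cud narrows the list: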
[aa3301@axon ~]$ ml
add                cuda/10.1.168  cudnn/7.5.1-10.0     gcc/4.9.1  load          matlab/2018b  show    unuse
anaconda3-2019.03  cuda/9.0.176   cudnn/7.6.1.34-10.0  help       matlab        purge         sl      update
avail              cuda/9.2.88    cudnn/7.6.1.34-10.1  keyword    matlab/2017a  restore       spider  use
cuda               cudnn          delete               kilosort2  matlab/2017b  rm            swap    whatis
cuda/10.0.130      cudnn/7.3.0    gcc                  list       matlab/2018a  save          unload
[aa3301@axon ~]$ ml load cud
cuda           cuda/10.1.168  cuda/9.2.88  cudnn/7.3.0       cudnn/7.6.1.34-10.0
cuda/10.0.130  cuda/9.0.176   cudnn        cudnn/7.5.1-10.0  cudnn/7.6.1.34-10.1
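Because module names encode versions, it can help to load a matching toolchain in one step. The sketch below defines a hypothetical shell function, load_gpu_stack, which is not provided by Lmod or Axon. It clears loaded modules with ml purge and then loads Anaconda with CUDA 10.1 and a cuDNN build. The CUDA/cuDNN pairing is an assumption based only on the "-10.1" suffix in the cuDNN module name, so confirm it with ml show cudnn/7.6.1.34-10.1 before relying on it.

```shell
# load_gpu_stack: hypothetical helper, not provided by Lmod or Axon.
# Assumes the module names from the "ml av" listing above; the CUDA/cuDNN
# pairing is inferred from the "-10.1" suffix on the cuDNN module name.
load_gpu_stack() {
    # Start from a clean slate so no other CUDA/cuDNN versions stay loaded
    ml purge
    # Load Anaconda plus a CUDA/cuDNN pair that appear to target CUDA 10.1
    ml load anaconda3-2019.03 cuda/10.1.168 cudnn/7.6.1.34-10.1
}
```

After adding the function to ~/.bashrc (or sourcing it), run load_gpu_stack and then ml to check what is loaded. Lmod can also remember a set of modules for you. For example, ml save gpu stores the currently loaded modules as a collection (gpu here is just an example name), and ml restore gpu loads them again later.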
Conda Environments
We strongly suggest using conda environments for jobs on Axon when the software you need is either not installed or not the right version. Even CUDA and cuDNN can be installed inside a conda environment instead of being loaded as modules. The example below, for instance, installs cudatoolkit=10.2 alongside PyTorch.
Here are the steps used to set up an example conda environment for a ZI researcher:
# It's generally advisable to perform installations in an interactive session
# (https://confluence.columbia.edu/confluence/display/zmbbi/Interactive+Sessions)
# on one of the nodes, since memory limits on the login node can occasionally
# lead to out-of-memory errors during the installation process.
# In this case I am requesting 2 CPUs on node ax08, 1 GPU, for 1 hour, with an interactive bash shell
srun --pty -c 2 --nodelist=ax08 --gres=gpu:1 -t 0-01:00 /bin/bash
# Next I activate conda by loading its module
ml load anaconda3-2019.03
# Next I create an environment. Note that I specify the version of Python that I want
conda create -n demo-env python=3.6
# Once the environment finishes building, I need to activate it
conda activate demo-env
# Then I can install software that is available in the conda repos
# N.B. the -c flag specifies a "channel", i.e. a specific repo
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
conda install -c conda-forge tensorboard
# You can also do pip installs for Python requirements, which will be local to this environment
pip install gym
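Before launching real work, it is worth confirming that the new environment can actually see the GPU. The sketch below is a hypothetical helper, check_torch_gpu (not an Axon or conda command). It asks PyTorch's torch.cuda.is_available() whether a CUDA device is usable. It assumes demo-env is active and that you are on a GPU node, such as the srun session above, because the login node may have no GPU.

```shell
# check_torch_gpu: hypothetical helper, not part of Axon or conda.
# Assumes the demo-env environment (with PyTorch) is already activated.
check_torch_gpu() {
    local available
    # Ask PyTorch whether it can use a CUDA device; Python prints "True" or "False".
    # Return 2 if Python or the torch import fails.
    available=$(python -c 'import torch; print(torch.cuda.is_available())') || return 2
    if [ "$available" = "True" ]; then
        echo "PyTorch can see a GPU"
        return 0
    fi
    echo "PyTorch cannot see a GPU" >&2
    return 1
}
```

The distinct exit statuses make the helper easy to use as a guard. It returns 1 when no GPU is visible and 2 when Python or the torch import fails.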
Once your conda environment is set up, you can use it in batch scripts by including the following lines at the beginning of the script:
ml load anaconda3-2019.03
conda activate demo-env
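A complete batch script built around those two lines might look like the sketch below. The #SBATCH directives mirror the resources from the interactive example: --cpus-per-task is the long form of -c, and --time is the long form of -t. The job name, output file pattern and train.py are hypothetical placeholders. The work is wrapped in a function that runs only when SLURM_JOB_ID is set, which Slurm does inside every job. That way, accidentally running the file with bash on the login node does not start the workload there. Submit it with sbatch, e.g. sbatch demo-job.sh, where the file name is also hypothetical.

```shell
#!/bin/bash
# Resources mirror the srun example: 2 CPUs, 1 GPU, 1 hour (days-hours:minutes).
# The job name and output pattern are hypothetical; %j expands to the job ID.
#SBATCH --job-name=demo-job
#SBATCH --cpus-per-task=2
#SBATCH --gres=gpu:1
#SBATCH --time=0-01:00
#SBATCH --output=demo-job-%j.out

run_job() {
    # Load Anaconda and activate the environment created earlier
    ml load anaconda3-2019.03 || return 1
    # If conda complains that the shell is not configured for "conda activate",
    # sourcing "$(conda info --base)/etc/profile.d/conda.sh" first usually helps
    conda activate demo-env || return 1
    # Hypothetical workload; replace with your own script
    python train.py
}

# Slurm sets SLURM_JOB_ID inside every job; refuse to do the work otherwise
if [ -n "${SLURM_JOB_ID:-}" ]; then
    run_job
else
    echo "Submit this script with sbatch instead of running it directly" >&2
fi
```

Because run_job returns the status of the last command that ran, a failed module load, activation or training run shows up as a failed job in Slurm's accounting.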