Available Software

At this time, the cluster has Anaconda and CUDA installed for work with frameworks such as TensorFlow and PyTorch. The preferred way to install software is to use Anaconda environments or Singularity containers. If you need additional software installed, please email rc@zi.columbia.edu to request it.

Environment Modules

Axon uses a system of environment modules, implemented with a piece of software called Lmod, which lets you specify the version of a program you want to run. This allows the cluster to offer multiple versions of a single program side by side. Below is an example of how environment modules work:

Python before and after loading a module
[aa3301@axon ~]$ python --version
Python 2.7.5
[aa3301@axon ~]$ which python
/usr/bin/python
[aa3301@axon ~]$ ml anaconda3-2019.03
[aa3301@axon ~]$ python --version
Python 3.7.3
[aa3301@axon ~]$ which python
/share/apps/anaconda3-2019.03/bin/python

In the example above, when we log in and run Python we get the default system Python (2.7.5). If we run ml with a module name (equivalent to module load) and load Anaconda, we get a much newer version (3.7.3) with many Python packages already installed.

Running ml with no arguments (or module list) shows the currently loaded modules, and ml av (or module avail) shows the modules available to load:

More module commands
[aa3301@axon ~]$ ml

Currently Loaded Modules:
  1) anaconda3-2019.03



[aa3301@axon ~]$ ml av

--------------------------------------------------------------------------- /share/modulefiles ----------------------------------------------------------------------------
   anaconda3-2019.03 (L)    cuda/9.2.88      cuda/10.1.168 (D)    cudnn/7.5.1-10.0       cudnn/7.6.1.34-10.1 (D)    kilosort2       matlab/2017b    matlab/2018b (D)
   cuda/9.0.176             cuda/10.0.130    cudnn/7.3.0          cudnn/7.6.1.34-10.0    gcc/4.9.1                  matlab/2017a    matlab/2018a

  Where:
   L:  Module is loaded
   D:  Default Module

Use "module spider" to find all possible modules.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".
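In a script it can be handy to check whether a module is already loaded rather than reading the ml output by eye. The following is a minimal sketch, assuming Lmod exports the colon-separated LOADEDMODULES variable as it normally does; the function name is_module_loaded is made up for this example and is not part of Lmod.

```shell
# Return 0 if the named module appears in $LOADEDMODULES, 1 otherwise.
# Accepts either a full name (cuda/10.1.168) or just the family (cuda).
# is_module_loaded is a hypothetical helper, not an Lmod command.
is_module_loaded() {
    local wanted="$1"
    local loaded
    local -a loaded_list
    # Split the colon-separated list into an array
    IFS=':' read -r -a loaded_list <<< "${LOADEDMODULES:-}"
    for loaded in "${loaded_list[@]}"; do
        if [ "$loaded" = "$wanted" ] || [ "${loaded%%/*}" = "$wanted" ]; then
            return 0
        fi
    done
    return 1
}

if is_module_loaded anaconda3-2019.03; then
    echo "anaconda3-2019.03 is loaded"
else
    echo "anaconda3-2019.03 is not loaded"
fi
```

With the session shown above, the check would report that anaconda3-2019.03 is loaded.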

The ml alias is a convenient shortcut for most module commands. If you type ml and press Tab, it shows all the possible completions; the same is true when completing a module name after load:

Use tab completion to navigate the modules
[aa3301@axon ~]$ ml
add                  cuda/10.1.168        cudnn/7.5.1-10.0     gcc/4.9.1            load                 matlab/2018b         show                 unuse
anaconda3-2019.03    cuda/9.0.176         cudnn/7.6.1.34-10.0  help                 matlab               purge                sl                   update
avail                cuda/9.2.88          cudnn/7.6.1.34-10.1  keyword              matlab/2017a         restore              spider               use
cuda                 cudnn                delete               kilosort2            matlab/2017b         rm                   swap                 whatis
cuda/10.0.130        cudnn/7.3.0          gcc                  list                 matlab/2018a         save                 unload
[aa3301@axon ~]$ ml load cud
cuda                 cuda/10.1.168        cuda/9.2.88          cudnn/7.3.0          cudnn/7.6.1.34-10.0
cuda/10.0.130        cuda/9.0.176         cudnn                cudnn/7.5.1-10.0     cudnn/7.6.1.34-10.1

Conda Environments

We strongly suggest using conda environments for jobs on Axon when the software you need either isn't installed or isn't the right version. Even CUDA and cuDNN versions can be installed in a conda environment.

Here are the steps used to set up an example conda environment for a ZI researcher:

# It's generally advisable to perform installations in an interactive session (https://confluence.columbia.edu/confluence/display/zmbbi/Interactive+Sessions)
# on one of the nodes, since there are memory limitations on the login node that can occasionally 
# lead to out-of-memory errors during the installation process.
# In this case I am requesting 2 CPUs and 1 GPU on node ax08 for 1 hour, with an interactive bash shell
srun --pty -c 2 --nodelist=ax08 --gres=gpu:1 -t 0-01:00 /bin/bash

# Next I activate conda by loading the module for it
ml load anaconda3-2019.03

# Next I create an environment. Note that I specify the version of Python that I want
conda create -n demo-env python=3.6

# Once the environment finishes building I need to activate it
conda activate demo-env

# Then I can install software that is available in the conda repos
# N.B. the -c flag specifies a "channel", i.e. a specific repo
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
conda install -c conda-forge tensorboard

# You can also use pip to install Python requirements, which will be local to this environment
pip install gym
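Before launching real work it can help to confirm that the environment you expect is actually active. The following is a minimal sketch, assuming conda sets the CONDA_DEFAULT_ENV variable to the active environment's name, which is its usual behavior for named environments; require_conda_env is a made-up helper name, not a conda command.

```shell
# Succeed only if the active conda environment is the one named in $1.
# require_conda_env is a hypothetical helper, not part of conda.
require_conda_env() {
    local expected="$1"
    if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
        echo "expected conda env '$expected', found '${CONDA_DEFAULT_ENV:-none}'" >&2
        return 1
    fi
}
```

For example, a line such as `require_conda_env demo-env || exit 1` placed after the activation step stops a script early if the activation did not take effect.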

Once you have set up your conda environment, you can use it in batch scripts by including the following lines at the beginning of your script:

ml load anaconda3-2019.03
conda activate demo-env
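To show where these two lines sit in a full job, here is a sketch of a batch script that mirrors the resources requested in the interactive example (2 CPUs, 1 GPU, 1 hour). The job name, the output file name, and train.py are placeholders chosen for illustration, and the commented-out conda.sh line is a common workaround that may not be needed on Axon.

```shell
#!/bin/bash
# Hypothetical job name
#SBATCH --job-name=demo-job
# 2 CPUs, equivalent to -c 2 in the srun example
#SBATCH --cpus-per-task=2
# 1 GPU
#SBATCH --gres=gpu:1
# 1 hour, in days-hours:minutes format
#SBATCH --time=0-01:00
# Output file; %j expands to the job ID
#SBATCH --output=demo-job-%j.out

# Load Anaconda and activate the environment
ml load anaconda3-2019.03
# If conda activate reports that the shell is not initialized, sourcing
# conda's profile script first is a common workaround (an assumption here):
# source "$(conda info --base)/etc/profile.d/conda.sh"
conda activate demo-env

# train.py is a placeholder for your own program
python train.py
```

A script like this would be submitted with sbatch followed by the script's file name.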