Comment: A few stylistic updates and fixing inaccuracies (e.g., cluster is no longer just 3 nodes)

An interactive session is, in research cluster parlance, a scheduler job where you can run commands in real-time (i.e., it is qualitatively identical to a remote session started via Secure Shell or a terminal session that you've started on your laptop). If you're developing code in the cluster environment, or your analysis workflow isn't well-defined and you need to experiment with different command line tools, then this is the option you want.

Warning

Similar to a Secure Shell (SSH) session or a terminal window, if you close your interactive session, the applications running in it will quit and the resources will be freed up for use by other jobs on the cluster. If you need your interactive session to persist, we recommend using the screen or tmux utilities on the login node. Tutorials for these tools can be found below:

To run an interactive job on the cluster, invoke the srun command on the login node (axon.rc) as demonstrated below:

Code Block
languagebash
themeFadeToGrey
titleSimple interactive job
srun --pty bash -i

This command would start an interactive job with the default limits. You would then be provided with a session on one of Axon's nodes with one CPU core (and no GPU) for the default job time limit (see here; currently 10 days as of 3/21).

Code Block
languagebash
themeFadeToGrey
titleComplicated interactive job
srun --pty -c 2 --nodelist=ax08 --gres=gpu:gtx1080:2 -t 0-01:00 /bin/bash

In the command above, the parameters are as follows:

--pty says to start a pseudo terminal interactive session (mandatory for interactive sessions).

...

/bin/bash is the program you wish to start in the interactive session, in this case the bash shell (mandatory: you need to specify a program, but it doesn't have to be a command line interpreter like bash; it could be python or matlab if that's what you want to run).
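As a rough illustration of the last two points, the sketch below prints (rather than runs) two srun commands: one that starts bash and one that starts python directly. It relies on Slurm's days-hours:minutes time format, as used in -t 0-01:00 above. The slurm_time helper is a hypothetical name introduced only for this example; it is not part of Slurm or Axon.

```bash
# Convert a whole number of hours into Slurm's days-hours:minutes form.
# slurm_time is a hypothetical helper, not part of Slurm or Axon.
slurm_time() {
    hours=$1
    printf '%d-%02d:00\n' $((hours / 24)) $((hours % 24))
}

# Print (rather than run) an interactive-job command for a given program.
# Remove the leading "echo" to actually start the job from the login node.
echo srun --pty -c 2 -t "$(slurm_time 1)" /bin/bash
echo srun --pty -c 2 -t "$(slurm_time 36)" python
```

The first line printed matches the time limit used in the example above (0-01:00); the second requests 36 hours (1-12:00) and would drop you straight into a Python prompt instead of a shell.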

Running Jupyter Notebook Sessions on Axon

Using sjupyter

Warning

The sjupyter command is meant to quickly start a Jupyter notebook in the default anaconda environment with a single command. For more complicated usage, running the jupyter command directly from an interactive session or a batch file is a better approach.

Axon has a command (sjupyter) that will start up a Jupyter notebook server automatically and give you the URL to connect. To use this command, first connect to the login node (e.g. ssh axon.rc.zi.columbia.edu), then navigate to the directory you want to start your notebook in, and finally run sjupyter.  For instance, if you wanted to run a Jupyter notebook within a directory named my-great-analysis that existed within your home directory, you might run the following commands:

Code Block
languagebash
themeFadeToGrey
titleJupyter helper script
jsp2205@Johns-MacBook-Pro [12:01:49]> ssh axon.rc.zi.columbia.edu
(base) [jsp2205@axon ~]$ cd ~/my-great-analysis
(base) [jsp2205@axon my-great-analysis]$ sjupyter
Waiting for the Jupyter Notebook SLURM job to start...
.
Jupyter Notebook is starting...
............................
Jupyter Notebook has started
    To access the notebook, open this file in a browser:
        file:///home/jsp2205/.local/share/jupyter/runtime/nbserver-86057-open.html
    Or copy and paste one of these URLs:
        http://10.198.24.59:8614/?token=9c53efc98ab7757cb102b875bfee4c5da858f801ae90b906
To turn off Jupyter notebook, run "scancel 45737".
To reprint the URL for the Jupyter notebook, run "/usr/local/bin/sjupyter --get-notebook-url=45737".

The notebook will stay running for the duration of the job, so you can reconnect to it if needed using the URL. If you forget the URL, you can reprint it by using the --get-notebook-url option along with the SLURM job ID as follows:

Code Block
languagebash
themeFadeToGrey
(base) [jsp2205@axon ~]$ sjupyter --get-notebook-url=45737
    To access the notebook, open this file in a browser:
        file:///home/jsp2205/.local/share/jupyter/runtime/nbserver-86057-open.html
    Or copy and paste one of these URLs:
        http://10.198.24.59:8614/?token=9c53efc98ab7757cb102b875bfee4c5da858f801ae90b906

To turn off a Jupyter notebook instance, you may use the standard SLURM job cancelling command (scancel).  Note that there is a time limit of 5 days for jobs that run on the shared or burst partitions (see Slurm Overview), so any Jupyter notebook servers running on these partitions will be cancelled automatically after the time limit is exceeded.

sjupyter can also be run with any of the standard sbatch flags, so if you want to specify exactly what resources should be available to the Jupyter notebook instance, you may do so by appending sbatch options to the command you run.  For instance, the command below requests that 3 GPUs and 16 GB of memory per CPU be available to the Jupyter notebook:

...

Manually starting a Jupyter notebook

Launching a Jupyter notebook from a SLURM session requires two things: a Python environment that has jupyter installed (the default anaconda environment from modules meets this requirement) and a special shell variable set to an empty value (XDG_RUNTIME_DIR="").

The command to start a Jupyter notebook is jupyter notebook --no-browser --port=${PORT} --ip=${IP}, where ${PORT} is the port that jupyter will listen on and ${IP} is the external IP address of the server. The port can't already be in use on that server, so either pick one at random or check whether it is in use first. If the IP address is omitted, jupyter defaults to the localhost address, which is only accessible from the compute node itself, and that machine doesn't have a web browser.
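One possible way to combine the two port strategies (random choice plus an in-use check) is sketched below. The pick_free_port function is a hypothetical helper, not an Axon command. It assumes bash, whose /dev/tcp redirection is used for the check, and it only detects listeners reachable on 127.0.0.1, so treat it as a best-effort check rather than a guarantee.

```bash
#!/usr/bin/env bash
# Pick a random port in a range that nothing on this node appears to be using.
# pick_free_port is a hypothetical helper name, not an Axon-provided command.
pick_free_port() {
    local low=${1:-8888}
    local high=${2:-9000}
    local port attempt
    for attempt in $(seq 1 50); do
        port=$(shuf -i "${low}-${high}" -n1)
        # In bash, a successful connect means something is already listening.
        if ! (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
            echo "${port}"
            return 0
        fi
    done
    return 1   # gave up after 50 tries
}

PORT=$(pick_free_port 8888 9000)
echo "selected port: ${PORT}"
```

The resulting ${PORT} can then be passed to jupyter notebook in place of a bare shuf call.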

Code Block
languagebash
themeMidnight
titleLaunching Jupyter notebook from an interactive session
[aa3301@axon ~]$ srun --pty -c 8 --gres=gpu:2 -t 0-01:00 /bin/bash
[aa3301@ax08 ~]$ ml anaconda3-2019.03
[aa3301@ax08 ~]$ XDG_RUNTIME_DIR=""
[aa3301@ax08 ~]$  jupyter notebook --no-browser --ip=$(hostname -I | awk '{print $1}') --port=$(shuf -i 8888-9000 -n1)
[I 11:43:24.018 NotebookApp] [nb_conda_kernels] enabled, 1 kernels found
[I 11:43:25.289 NotebookApp] JupyterLab extension loaded from /share/apps/anaconda3-2019.03/lib/python3.7/site-packages/jupyterlab
[I 11:43:25.289 NotebookApp] JupyterLab application directory is /share/apps/anaconda3-2019.03/share/jupyter/lab
[I 11:43:25.296 NotebookApp] [nb_conda] enabled
[I 11:43:25.296 NotebookApp] Serving notebooks from local directory: /share/zrc/aa3301
[I 11:43:25.296 NotebookApp] The Jupyter Notebook is running at:
[I 11:43:25.296 NotebookApp] http://10.198.24.59:8959/?token=9e73528834e240a01eeffe298d35a04d3d670bbd8b6b4b87
[I 11:43:25.297 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 11:43:25.307 NotebookApp]

    To access the notebook, open this file in a browser:
        file:///share/zrc/aa3301/.local/share/jupyter/runtime/nbserver-222039-open.html
    Or copy and paste one of these URLs:
        http://10.198.24.59:8959/?token=9e73528834e240a01eeffe298d35a04d3d670bbd8b6b4b87
[I 11:43:48.650 NotebookApp] 302 GET /?token=9e73528834e240a01eeffe298d35a04d3d670bbd8b6b4b87 (128.59.216.48) 1.55ms

In the example above, we use commands to determine the IP address and generate a random port number. The notebook is accessed via the URL near the bottom of the output, as instructed. With this approach, the notebook will end when the job expires (we asked for one hour in the first command) or when the application is quit (using CTRL + C) in the terminal.
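The same steps could also be placed in a batch file, as mentioned in the sjupyter warning above. The sketch below writes such a file; the file name jupyter_job.sh, the job name, and the log file name are all assumptions made for illustration. It also assumes that the ml command works inside batch jobs the same way it does in an interactive session.

```bash
# Write a hypothetical batch script (jupyter_job.sh is an assumed file name).
cat > jupyter_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=jupyter
#SBATCH -c 2
#SBATCH -t 0-01:00
#SBATCH --output=jupyter-%j.log

ml anaconda3-2019.03                      # same module as the interactive example
export XDG_RUNTIME_DIR=""                 # empty value, exported so jupyter sees it
IP=$(hostname -I | awk '{print $1}')      # first IP address of the compute node
PORT=$(shuf -i 8888-9000 -n1)             # random port; not checked for collisions
jupyter notebook --no-browser --ip="${IP}" --port="${PORT}"
EOF

# Submit from the login node with: sbatch jupyter_job.sh
echo "wrote jupyter_job.sh"
```

Because nothing is printed to your terminal in a batch job, the notebook URL would appear in the jupyter-<jobid>.log file instead. As with the interactive approach, the notebook ends when the time limit is reached or when the job is cancelled with scancel.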