
An interactive session is, in research cluster parlance, a scheduler job where you can run commands in real time (i.e., it is qualitatively identical to a remote session started via Secure Shell or a terminal session that you've started on your laptop). If you're developing code in the cluster environment, or your analysis workflow isn't well-defined and you need to experiment with different command line tools, then this is the option you want.

Warning

Similar to a Secure Shell (SSH) session or a terminal window, if you close your interactive session, the applications running in it will be terminated and the resources will be freed up for use by other jobs on the cluster. If you need your interactive session to persist, we recommend using the screen or tmux utilities on the login node. Tutorials for these tools can be found below:

To run an interactive job on the cluster, invoke the srun command on the login node (axon.rc) as demonstrated below:

Code Block
languagebash
themeFadeToGrey
titleSimple interactive job
srun --pty bash -i

This command would start an interactive job with the default limits. You would then be provided with a session on one of Axon's nodes with one CPU core (and no GPU) for the default job time limit (see here; currently 10 days as of 3/21).

Code Block
languagebash
themeFadeToGrey
titleComplicated interactive job
srun --pty -c 2 --nodelist=ax08 --gres=gpu:gtx1080:2 -t 0-01:00 /bin/bash

In the command above, the parameters are as follows:

--pty says to start a pseudo terminal interactive session (mandatory for interactive sessions).

...

/bin/bash is the program to start in the interactive session, here the bash shell (mandatory: you need to specify a program, but it doesn't have to be a command line interpreter like bash; it could be python or matlab if that's what you want to run).
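As an illustration of how these parameters combine, the following is a minimal sketch of a helper that assembles an srun command line and prints it for review instead of running it. The function name build_srun_cmd and its argument order are hypothetical (they are not part of Slurm or of Axon's tooling); only the srun options themselves come from the example above.

```shell
# Hypothetical helper (not part of Slurm or Axon's tooling): assemble an
# srun command line for an interactive job and print it for review.
build_srun_cmd() {
    cores=$1   # value passed to -c (CPU cores)
    gres=$2    # value passed to --gres, e.g. gpu:gtx1080:2; empty for no GPU
    limit=$3   # value passed to -t, in days-hours:minutes form

    cmd="srun --pty -c $cores"
    # Only request generic resources (GPUs) when a value was given.
    if [ -n "$gres" ]; then
        cmd="$cmd --gres=$gres"
    fi
    cmd="$cmd -t $limit /bin/bash"
    printf '%s\n' "$cmd"
}

# Print a two-core, two-GPU, one-hour request similar to the one above.
build_srun_cmd 2 gpu:gtx1080:2 0-01:00
```

Printing the command rather than executing it makes it easy to check the request before it is submitted; the printed line can then be pasted into the login node's shell.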

Jupyter notebooks from interactive session

To run a Jupyter notebook (server), please first start an interactive session (using the necessary number of GPUs), and then execute the following during the session:

Code Block
languagebash
themeFadeToGrey
titleJupyter job setting
ml anaconda3-2019.03;unset XDG_RUNTIME_DIR; jupyter notebook --port=8123 --ip=`hostname -i`

In the command above the components have the following meaning:

unset XDG_RUNTIME_DIR makes sure that the environment variable set by Slurm doesn't interfere with Jupyter

--port=8123 sets the port on which your notebook server can be reached (if you get an error message, the port you chose may already be in use; pick a different number, typically between 8100 and 8999)

--ip=`hostname -i` makes sure the notebook server isn't run on localhost, so you can connect to it from within the Columbia network address range
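To reduce the chance of picking a port someone else is already using, one option is to choose a random port in the suggested range. The sketch below assumes that a random choice is acceptable; the helper name pick_port is hypothetical, and the snippet does not check whether the chosen port is actually free. It only prints the resulting command so it can be reviewed first.

```shell
# Hypothetical helper: print a random port number between 8100 and 8999.
pick_port() {
    # rand() returns a value in [0, 1), so the result is 8100..8999.
    awk 'BEGIN { srand(); print 8100 + int(rand() * 900) }'
}

port=$(pick_port)

# Print (do not run) the notebook command with the chosen port.
# $(hostname -i) is equivalent to the backtick form used above.
echo "jupyter notebook --port=$port --ip=\$(hostname -i)"
```

If the port still turns out to be in use, running the snippet again yields a different candidate.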

Jupyter notebooks from a batch session

We also have a script that runs a Jupyter notebook automatically and gives you the URL to connect to. To use it, connect to the login node (e.g., ssh axon.rc.zi.columbia.edu), navigate to the directory you want to start your notebook in, and run the respective sjupyter command. Currently the script has three variants: sjupyter, sjupyter-1gpu, and sjupyter-2gpu. Each variant requests a different number of GPUs for the notebook, as its name indicates.

The notebook will stay running for the duration of the job (five days by default), so you can reconnect to it if needed using the token. However, the job will stop if your shell session dies, so run it in tmux or screen to keep it active if you think you will disconnect. Please stop the job by pressing CTRL+C in the session once you're done.
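A small safeguard before launching a long-running notebook is to check whether the current shell is already inside tmux or screen. The sketch below relies on the fact that tmux sets the TMUX environment variable and screen sets STY in the shells they start; the function name in_multiplexer is hypothetical.

```shell
# Return success (0) when the current shell runs inside tmux or screen.
# tmux sets TMUX and screen sets STY in the shells they start.
in_multiplexer() {
    [ -n "${TMUX:-}" ] || [ -n "${STY:-}" ]
}

if in_multiplexer; then
    echo "Inside tmux/screen: the session will survive a disconnect."
else
    echo "Not inside tmux/screen: the job will stop if this shell dies."
fi
```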

Code Block
languagebash
themeFadeToGrey
titleJupyter helper script
[aa3301@axon ~]$ sjupyter
Waiting for the jupyter notebook slurm job to start
Jupyter notebook is starting... (please wait)
[I 11:54:38.451 NotebookApp] Loading IPython parallel extension
[I 11:54:38.453 NotebookApp] Serving notebooks from local directory: /home/aa3301
[I 11:54:38.453 NotebookApp] The Jupyter Notebook is running at:
[I 11:54:38.453 NotebookApp] http://10.198.24.56:8946/?token=112d7083855a4e2d85071d2688df9d9fe20a634a344b16bd
[I 11:54:38.453 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 11:54:38.462 NotebookApp]

    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://10.198.24.56:8946/?token=112d7083855a4e2d85071d2688df9d9fe20a634a344b16bd
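When the output is long, the connection URL can be pulled out with a short filter. This is a sketch under the assumption that the script's output looks like the sample above; the function name extract_notebook_url is hypothetical, and the sample line is copied from the output shown here.

```shell
# Hypothetical helper: read notebook log text on stdin and print the
# first http URL that carries a login token.
extract_notebook_url() {
    sed -n 's|.*\(http://[^ ]*token=[0-9a-f]*\).*|\1|p' | head -n 1
}

# Demonstrate on one line taken from the sample output above.
extract_notebook_url <<'EOF'
[I 11:54:38.453 NotebookApp] The Jupyter Notebook is running at:
[I 11:54:38.453 NotebookApp] http://10.198.24.56:8946/?token=112d7083855a4e2d85071d2688df9d9fe20a634a344b16bd
EOF
```

The filter prints only the URL, which can then be pasted into a browser on a machine inside the Columbia network address range.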

...