An interactive session is, in research cluster parlance, a scheduler job in which you are free to run commands just as if you had logged into the server via SSH. If you're developing code in the cluster environment, want to check your code, or just want to run some commands on a cluster node, this is the option you want.

The job stays open much like an SSH session: if the session disconnects, the job ends. This limitation can be circumvented by running the job inside a program such as screen or tmux on the login node.
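
As a minimal sketch of the tmux approach, the helper below starts the interactive job inside a named tmux session on the login node, or reattaches to that session if it already exists. The function name interactive_in_tmux and the default session name are illustrative assumptions, not something provided by the cluster.

```shell
# Hypothetical helper (not installed on the cluster): run the interactive job
# inside tmux so that a dropped SSH connection does not end it.
interactive_in_tmux() {
    session="${1:-interactive}"   # session name is an arbitrary choice
    if tmux has-session -t "$session" 2>/dev/null; then
        # The session already exists: reconnect to the running job.
        tmux attach -t "$session"
    else
        # Create the session and start the interactive job in it.
        tmux new-session -s "$session" 'srun --pty bash -i'
    fi
}
```

After defining the function on the login node, run interactive_in_tmux to start the job. Detach with CTRL+B followed by D (the default tmux key binding), and run interactive_in_tmux again later to return to the same job.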

To run a simple interactive job on the cluster use the srun command as shown below.

Simple interactive job
srun --pty bash -i

Running the above command on the login node starts an interactive job with the default limits: you get a shell on one of the three GPU nodes with one core allocated (and no GPU), for up to five days.

Complicated interactive job
srun --pty -c 2 --nodelist=ax08 --gres=gpu:gtx1080:2 -t 0-01:00 /bin/bash

In the command above the parameters are as follows:

--pty says to start a pseudo terminal interactive session (mandatory for interactive sessions).

-c 2 means use 2 cores (optional, default is 1 core)

--nodelist=ax08 means run only on the node named ax08 (optional, default is to run on any available node)

--gres=gpu:gtx1080:2 means use 2 GPU boards of the gtx1080 variety (optional, by default no GPUs are allocated; note that allocated GPUs are numbered starting from id 0)

-t 0-01:00 sets the time limit for the session in days-hours:minutes format, in this case 1 hour (generally, the shorter the time you request, the higher the likelihood your job can be scheduled; this will become more important as the cluster fills up!)

/bin/bash is the program to start in the interactive session, here the bash shell (mandatory; you need to specify a program, but it doesn't have to be a command line interpreter like bash, it could be python or matlab if that's what you want to run)
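
Once the session has started, it can be useful to confirm what was actually allocated. The sketch below prints a few environment variables that Slurm normally sets inside a job; the function name show_allocation is an illustrative assumption, and the exact variables available depend on how the cluster is configured (for example, SLURM_CPUS_PER_TASK is typically only set when -c is given).

```shell
# Hypothetical helper: summarise the current allocation from the environment.
# Each value falls back to a placeholder when the variable is not set.
show_allocation() {
    echo "job: ${SLURM_JOB_ID:-not in a Slurm job}"
    echo "node: ${SLURM_JOB_NODELIST:-unknown}"
    echo "cores: ${SLURM_CPUS_PER_TASK:-1}"
    # GPU ids as seen by the job; they are renumbered starting from 0.
    echo "gpus: ${CUDA_VISIBLE_DEVICES:-none}"
}
```

Calling show_allocation inside the session started by the complicated example above would be expected to report node ax08, 2 cores, and two GPU ids.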

Jupyter notebooks from an interactive session

To run a Jupyter notebook (server), please first start an interactive session (requesting the number of GPUs you need), and then execute the following within that session:

Jupyter job setting
ml anaconda3-2019.03; unset XDG_RUNTIME_DIR; jupyter notebook --port=8123 --ip=`hostname -i`

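In the command above the components have the following meaning:

ml anaconda3-2019.03 loads the Anaconda environment module that provides Jupyter (ml is the Lmod shorthand for module load)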

unset XDG_RUNTIME_DIR makes sure that the environment variable set by Slurm doesn't interfere with Jupyter

--port=8123 sets the port on which your notebook server can be reached (if you get an error message, the port you chose may already be in use; pick a different number, typically between 8100 and 8999)

--ip=`hostname -i` makes sure the notebook server isn't run on localhost, so you can connect to it from within the Columbia network address range
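
Rather than guessing a port, you can probe the range for one that is free. The following is a minimal sketch, assuming bash (it relies on bash's /dev/tcp pseudo-device) and assuming that hostname -i returns a single address on the node; the function name pick_port is illustrative.

```shell
# Hypothetical helper: print the first port in 8100-8999 that nothing is
# listening on. Requires bash; the host defaults to this node's address.
pick_port() {
    host="${1:-$(hostname -i)}"
    port=8100
    while [ "$port" -le 8999 ]; do
        # A successful connection means something already listens on the port,
        # so a failed connection is treated as "free".
        if ! (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            echo "$port"
            return 0
        fi
        port=$((port + 1))
    done
    return 1   # every port in the range is taken
}
```

With the function defined in the session, the notebook could then be started with --port=`pick_port` in place of --port=8123. Another user may still claim the port between the check and the start of the server, in which case simply run it again.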

Jupyter notebooks from a batch session

We also have a script which runs a Jupyter notebook automatically and gives you the URL to connect to. To use it, connect to the login node (e.g. ssh axon.rc.zi.columbia.edu), navigate to the directory you want to start your notebook in, and run the appropriate sjupyter command. Currently the script has 3 variants: sjupyter, sjupyter-1gpu, and sjupyter-2gpu, which request no GPU, one GPU, and two GPUs for the notebook respectively.

The notebook keeps running for the duration of the job (five days by default), so you can reconnect to it using the token if needed. However, the job stops if your shell session dies, so run it in tmux or screen if you think you may disconnect. Please stop the job by hitting CTRL+C in the session once you're done.

Jupyter helper script
[aa3301@axon ~]$ sjupyter
Waiting for the jupyter notebook slurm job to start
Jupyter notebook is starting... (please wait)
[I 11:54:38.451 NotebookApp] Loading IPython parallel extension
[I 11:54:38.453 NotebookApp] Serving notebooks from local directory: /home/aa3301
[I 11:54:38.453 NotebookApp] The Jupyter Notebook is running at:
[I 11:54:38.453 NotebookApp] http://10.198.24.56:8946/?token=112d7083855a4e2d85071d2688df9d9fe20a634a344b16bd
[I 11:54:38.453 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 11:54:38.462 NotebookApp]

    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://10.198.24.56:8946/?token=112d7083855a4e2d85071d2688df9d9fe20a634a344b16bd
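
If you need the URL again later, it can be pulled back out of captured output. The sketch below assumes you saved the helper's output to a file yourself, for example with sjupyter 2>&1 | tee jupyter.log; the file name jupyter.log and the function name jupyter_url are illustrative assumptions.

```shell
# Hypothetical helper: read notebook server output on stdin and print the
# first URL that carries a login token.
jupyter_url() {
    grep -o 'http://[^ ]*?token=[0-9a-f]*' | head -n 1
}
```

For example, jupyter_url < jupyter.log prints the tokenized URL from the saved output.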
