...
The job stays open much like an ssh session: if you disconnect the session, the job ends. You can work around this limitation by running the session inside a program such as screen or tmux on the login node.
To run a simple interactive job on the cluster, use the srun command as shown below.
...
/bin/bash is the program to start in the interactive session, in this case the bash shell. This argument is mandatory, but the program doesn't have to be a command-line interpreter like bash; it could be python or matlab if that's what you want to run.
Jupyter notebooks from an interactive session
To run a Jupyter notebook server, first start an interactive session (requesting the number of GPUs you need), then execute the following during the session:
```
ml anaconda3-2019.03; unset XDG_RUNTIME_DIR; jupyter notebook --port=8123 --ip=`hostname -i`
```
In the command above the components have the following meaning:
unset XDG_RUNTIME_DIR makes sure that the environment variable set by Slurm doesn't interfere with Jupyter
--port=8123 sets the port on which your notebook server can be reached. If you get an error message, the port you chose may already be in use; pick a different number, typically between 8100 and 8999.
--ip=`hostname -i` makes sure the notebook server isn't run on localhost, so you can connect to it from within the Columbia network address range
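Picking another port by hand when the first choice is busy is trial and error. The sketch below shows one possible way to choose a free port in the suggested 8100 to 8999 range before starting the server. The helper names first_free_port and listening_ports are hypothetical (they are not part of the cluster tooling), and listening_ports assumes the ss utility is installed on the node.

```shell
# Hypothetical helper: print the first port in [start, end] that does not
# appear in the list of busy ports passed as the remaining arguments.
first_free_port() (
    start=$1
    end=$2
    shift 2
    port=$start
    while [ "$port" -le "$end" ]; do
        case " $* " in
            *" $port "*) ;;                  # busy, try the next one
            *) echo "$port"; return 0 ;;
        esac
        port=$((port + 1))
    done
    return 1                                 # every port in the range is busy
)

# Hypothetical helper: list ports with a listening TCP socket on this node.
# Assumes ss (iproute2) is available; prints nothing if it is not.
listening_ports() {
    ss -Htln 2>/dev/null | awk '{ n = split($4, a, ":"); print a[n] }' | sort -un
}

# Possible usage inside the interactive session:
#   port=$(first_free_port 8100 8999 $(listening_ports))
#   jupyter notebook --port="$port" --ip=`hostname -i`
```

Keeping the range search separate from the ss call means the selection logic can be checked without a live server.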
Jupyter notebooks from a batch session
...
Running Jupyter Notebook Sessions on Axon
Axon has a command (sjupyter) that starts a Jupyter notebook server automatically and gives you the URL to connect to. To use this command, first connect to the login node (e.g. ssh axon.rc.zi.columbia.edu), then navigate to the directory you want to start your notebook in, and finally run the appropriate sjupyter command. Currently the script has 3 variants: sjupyter, sjupyter-1gpu, and sjupyter-2gpu, which differ in the number of GPUs requested for the notebook. The notebook stays running for the duration of the job (five days by default), so you can reconnect to it using the token if needed. However, the job stops if your shell session dies, so run it in tmux or screen if you think you may disconnect. Please stop the job by pressing CTRL+C in the session once you're done. For instance, if you wanted to run a Jupyter notebook within a directory named my-great-analysis that existed within your home directory, you might run the following commands:
```
jsp2205@Johns-MacBook-Pro [12:01:49]> ssh axon.rc.zi.columbia.edu
(base) [jsp2205@axon ~]$ cd ~/my-great-analysis
(base) [jsp2205@axon my-great-analysis]$ sjupyter
Waiting for the Jupyter Notebook SLURM job to start...
.
Jupyter Notebook is starting...
............................
Jupyter Notebook has started
To access the notebook, open this file in a browser:
file:///home/jsp2205/.local/share/jupyter/runtime/nbserver-86057-open.html
Or copy and paste one of these URLs:
http://10.198.24.59:8614/?token=9c53efc98ab7757cb102b875bfee4c5da858f801ae90b906
To turn off Jupyter notebook, run "scancel 45737".
To reprint the URL for the Jupyter notebook, run "/usr/local/bin/sjupyter --get-notebook-url=45737".
```
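Choosing between the three variants can be wrapped in a small helper. The sketch below maps a GPU count to a command line; the function name sjupyter_command is hypothetical, and it assumes that plain sjupyter requests no GPU. For counts above 2 it falls back to the standard --gres=gpu:N sbatch option, which sjupyter accepts.

```shell
# Hypothetical helper: print the sjupyter command line for a given GPU count.
# Assumption: plain "sjupyter" requests no GPU.
sjupyter_command() {
    case $1 in
        0) echo "sjupyter" ;;
        1) echo "sjupyter-1gpu" ;;
        2) echo "sjupyter-2gpu" ;;
        *[!0-9]*|'')
            echo "usage: sjupyter_command <gpu-count>" >&2
            return 1 ;;
        *) echo "sjupyter --gres=gpu:$1" ;;   # no dedicated variant, pass an sbatch flag
    esac
}

# Possible usage on the login node, inside tmux so the job survives a disconnect:
#   tmux new -s notebook
#   cd ~/my-great-analysis
#   $(sjupyter_command 1)
```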
The notebook stays running for the duration of the job, so you can reconnect to it using the URL if needed. If you forget the URL, you can reprint it by using the --get-notebook-url option along with the SLURM job ID as follows:
```
(base) [jsp2205@axon ~]$ sjupyter --get-notebook-url=45737
To access the notebook, open this file in a browser:
file:///home/jsp2205/.local/share/jupyter/runtime/nbserver-86057-open.html
Or copy and paste one of these URLs:
http://10.198.24.59:8614/?token=9c53efc98ab7757cb102b875bfee4c5da858f801ae90b906
```
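If you save the sjupyter output to a file, the URL and the job ID can be pulled out of it later without rereading the whole log. The sketch below is one way to do that with standard text tools. The function names notebook_url and notebook_job_id are hypothetical, and the patterns assume the output looks like the examples on this page.

```shell
# Hypothetical helper: print the first http(s) URL carrying a token
# found on standard input.
notebook_url() {
    grep -Eo 'https?://[^[:space:]]+token=[0-9a-f]+' | head -n 1
}

# Hypothetical helper: print the job ID from the 'run "scancel <id>"' hint
# found on standard input.
notebook_job_id() {
    sed -n 's/.*scancel \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Possible usage, assuming the output was saved with: sjupyter | tee notebook.log
#   notebook_url < notebook.log
#   sjupyter --get-notebook-url="$(notebook_job_id < notebook.log)"
```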
To turn off a Jupyter notebook instance, you may use the standard SLURM job cancelling command (scancel). Note that there is a time limit of 5 days for jobs that run on the shared or burst partitions (see Slurm Overview), so any Jupyter notebook servers running on these partitions will be cancelled automatically after the time limit is exceeded.
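To see how close a notebook job is to its time limit, you can ask squeue for the time left (format code %L) and convert it to seconds. The sketch below does the conversion; the function name slurm_time_to_seconds is hypothetical, and it only handles the D-HH:MM:SS, HH:MM:SS, and MM:SS forms, returning a non-zero status for anything else (for example UNLIMITED).

```shell
# Hypothetical helper: convert a Slurm time string to seconds.
# Handles D-HH:MM:SS, HH:MM:SS and MM:SS; fails on any other form.
slurm_time_to_seconds() {
    printf '%s\n' "$1" | awk '{
        days = 0
        rest = $0
        if (index(rest, "-") > 0) {
            split(rest, d, "-")
            days = d[1]
            rest = d[2]
        }
        n = split(rest, p, ":")
        if (n == 3) {
            print days * 86400 + p[1] * 3600 + p[2] * 60 + p[3]
        } else if (n == 2) {
            print days * 86400 + p[1] * 60 + p[2]
        } else {
            exit 1
        }
    }'
}

# Possible usage with the job ID from the example above:
#   slurm_time_to_seconds "$(squeue -h -j 45737 -o %L)"
```

For the full five-day limit, slurm_time_to_seconds 5-00:00:00 prints 432000.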
sjupyter can also be run with any of the standard sbatch flags, so if you want to specify exactly what resources should be available to the Jupyter notebook instance, you may do so by appending sbatch options to the command you run. For instance, the command below requests that 3 GPUs and 16 GB of memory per CPU be available to the Jupyter notebook:
```
(base) [jsp2205@axon slurm]$ sjupyter --mem-per-cpu=16G --gres=gpu:3
Waiting for the Jupyter Notebook SLURM job to start...
.
Jupyter Notebook is starting...
..............................
Jupyter Notebook has started
To access the notebook, open this file in a browser:
file:///home/jsp2205/.local/share/jupyter/runtime/nbserver-187906-open.html
Or copy and paste one of these URLs:
http://10.198.24.59:8074/?token=112d7083855a4e2d85071d2688df9d9fe20a634a344b16bd
...
298fbd14329110b21ecb0d2765858fa836cd6694edabb502
To turn off Jupyter notebook, run "scancel 45747".
To reprint the URL for the Jupyter notebook, run "/usr/local/bin/sjupyter --get-notebook-url=45747".
```