Running Jupyter Notebook on Axon

Using the sjupyter script

The sjupyter command is meant to quickly start a Jupyter notebook in the default anaconda environment with a single command. For more complicated usage, running the jupyter command directly from an interactive session or a batch file (both described below) is a better approach.

Axon has a command (sjupyter) that starts a Jupyter notebook server automatically and gives you the URL to connect to. To use this command, first connect to the login node (e.g. ssh axon.rc.zi.columbia.edu), then navigate to the directory you want to start your notebook in, and finally run sjupyter. For instance, to run a Jupyter notebook in a directory named my-great-analysis inside your home directory, you might run the following commands:

Jupyter helper script
jsp2205@Johns-MacBook-Pro [12:01:49]> ssh axon.rc.zi.columbia.edu
(base) [jsp2205@axon ~]$ cd ~/my-great-analysis
(base) [jsp2205@axon my-great-analysis]$ sjupyter
Waiting for the Jupyter Notebook SLURM job to start...
.
Jupyter Notebook is starting...
............................
Jupyter Notebook has started
    To access the notebook, open this file in a browser:
        file:///home/jsp2205/.local/share/jupyter/runtime/nbserver-86057-open.html
    Or copy and paste one of these URLs:
        http://10.198.24.59:8614/?token=9c53efc98ab7757cb102b875bfee4c5da858f801ae90b906
To turn off Jupyter notebook, run "scancel 45737".
To reprint the URL for the Jupyter notebook, run "/usr/local/bin/sjupyter --get-notebook-url=45737".

The notebook will stay running for the duration of the job, so you can reconnect to it using the URL if needed. If you forget the URL, you can reprint it by using the --get-notebook-url option along with the SLURM job id as follows:

(base) [jsp2205@axon ~]$ sjupyter --get-notebook-url=45737
    To access the notebook, open this file in a browser:
        file:///home/jsp2205/.local/share/jupyter/runtime/nbserver-86057-open.html
    Or copy and paste one of these URLs:
        http://10.198.24.59:8614/?token=9c53efc98ab7757cb102b875bfee4c5da858f801ae90b906

To turn off a Jupyter notebook instance, you may use the standard SLURM job cancelling command (scancel).  Note that there is a time limit of 5 days for jobs that run on the shared or burst partitions (see Slurm Overview), so any Jupyter notebook servers running on these partitions will be cancelled automatically after the time limit is exceeded.

sjupyter can also be run with any of the standard sbatch flags, so if you want to specify exactly what resources should be available to the Jupyter notebook instance, you may do so by appending sbatch options to the command you run.  For instance, the command below requests that 3 GPUs and 16 GB of memory per CPU be available to the Jupyter notebook:

(base) [jsp2205@axon slurm]$ sjupyter --mem-per-cpu=16G --gres=gpu:3
Waiting for the Jupyter Notebook SLURM job to start...
.
Jupyter Notebook is starting...
..............................
Jupyter Notebook has started
    To access the notebook, open this file in a browser:
        file:///home/jsp2205/.local/share/jupyter/runtime/nbserver-187906-open.html
    Or copy and paste one of these URLs:
        http://10.198.24.59:8074/?token=298fbd14329110b21ecb0d2765858fa836cd6694edabb502
To turn off Jupyter notebook, run "scancel 45747".
To reprint the URL for the Jupyter notebook, run "/usr/local/bin/sjupyter --get-notebook-url=45747".

Manually starting a Jupyter notebook from an interactive session

Launching a Jupyter notebook from a Slurm session requires two things: a Python environment that has jupyter installed (the default anaconda environment from modules meets this requirement) and the shell variable XDG_RUNTIME_DIR set to an empty string (XDG_RUNTIME_DIR="").

The command to start a Jupyter notebook is jupyter notebook --no-browser --port=${PORT} --ip=${IP}, where ${PORT} is the port that jupyter will listen on and ${IP} is the external IP address of the server. The port must not already be in use on that server, so either pick one at random or check that it is free first. If --ip is omitted, jupyter binds to localhost, which is only reachable from the compute node itself, and the compute node has no web browser.
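The "check to see if the port is in use" option can be sketched as a small bash function. This is only an illustration: pick_free_port is a hypothetical helper, not something installed on Axon, and it only detects servers reachable through the loopback address of the node it runs on.

```shell
#!/bin/bash
# Hypothetical helper (not part of Axon's tooling): print a random port in
# the range [lo, hi] that nothing on this host appears to be listening on.
pick_free_port() {
    local lo="${1:-8888}" hi="${2:-9000}" port attempt
    for attempt in $(seq 1 50); do
        port=$(shuf -i "${lo}-${hi}" -n1)
        # bash's /dev/tcp pseudo-device opens a TCP connection. If the
        # connection fails, we assume no server is listening on that port.
        if ! (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
            echo "${port}"
            return 0
        fi
    done
    echo "no free port found in ${lo}-${hi}" >&2
    return 1
}
```

With the function defined in your session, the port argument becomes --port=$(pick_free_port). Another process could still claim the port between the check and the moment jupyter binds to it, so treat this as a way to reduce collisions rather than rule them out.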

Launching Jupyter notebook from an interactive session
[aa3301@axon ~]$ srun --pty -c 6 --gres=gpu:1 -t 01:00:00 /bin/bash
[aa3301@ax08 ~]$ ml anaconda3-2019.03
[aa3301@ax08 ~]$ XDG_RUNTIME_DIR=""
[aa3301@ax08 ~]$  jupyter notebook --no-browser --ip=$(hostname -I | awk '{print $1}') --port=$(shuf -i 8888-9000 -n1)
[I 11:43:24.018 NotebookApp] [nb_conda_kernels] enabled, 1 kernels found
[I 11:43:25.289 NotebookApp] JupyterLab extension loaded from /share/apps/anaconda3-2019.03/lib/python3.7/site-packages/jupyterlab
[I 11:43:25.289 NotebookApp] JupyterLab application directory is /share/apps/anaconda3-2019.03/share/jupyter/lab
[I 11:43:25.296 NotebookApp] [nb_conda] enabled
[I 11:43:25.296 NotebookApp] Serving notebooks from local directory: /share/zrc/aa3301
[I 11:43:25.296 NotebookApp] The Jupyter Notebook is running at:
[I 11:43:25.296 NotebookApp] http://10.198.24.59:8959/?token=9e73528834e240a01eeffe298d35a04d3d670bbd8b6b4b87
[I 11:43:25.297 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 11:43:25.307 NotebookApp]

    To access the notebook, open this file in a browser:
        file:///share/zrc/aa3301/.local/share/jupyter/runtime/nbserver-222039-open.html
    Or copy and paste one of these URLs:
        http://10.198.24.59:8959/?token=9e73528834e240a01eeffe298d35a04d3d670bbd8b6b4b87
[I 11:43:48.650 NotebookApp] 302 GET /?token=9e73528834e240a01eeffe298d35a04d3d670bbd8b6b4b87 (128.59.216.48) 1.55ms

In the example above, hostname -I | awk '{print $1}' determines the node's IP address and shuf -i 8888-9000 -n1 picks a random port number. The notebook is accessed via the URL near the bottom of the output. With this approach the notebook ends when the job expires (the srun command requested one hour) or when the application is quit with CTRL + C in the terminal.
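The IP lookup above takes the first address that hostname -I prints. The hostname man page cautions against relying on the order of those addresses, so on a node with several interfaces it may be safer to select the address by prefix. A minimal sketch, assuming the cluster network starts with 10.198. (inferred from the example URLs on this page) and using a hypothetical helper named pick_address:

```shell
# Hypothetical helper: read whitespace-separated addresses on stdin and
# print the first one that starts with the prefix given as $1.
pick_address() {
    awk -v prefix="$1" '{
        for (i = 1; i <= NF; i++)
            if (index($i, prefix) == 1) { print $i; exit }
    }'
}

# Made-up sample of `hostname -I` output on a node with two interfaces.
printf '192.168.122.1 10.198.24.59 \n' | pick_address "10.198."
```

On a compute node this would be used as --ip=$(hostname -I | pick_address "10.198."). If the prefix does not match any address, nothing is printed, so check the result before passing it to jupyter.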

Running a Jupyter notebook in a batch job

Jupyter notebooks can easily be run in a batch job. The only complication is discovering the server where the notebook is running, which can be resolved by looking at the Slurm output file or by using a predetermined port and looking up the node name in squeue.

SBATCH script to run Jupyter Notebook on Axon
#!/bin/sh
#
# running a Jupyter Notebook in Slurm.
#
#SBATCH --job-name=jupyter-notebook       # The job name.
#SBATCH -c 6                             # The number of cpu cores to use.
#SBATCH --time=5:00:00              # The time the job will take to run.
#SBATCH --mem-per-cpu=1gb        # The memory the job will use per cpu core.
#SBATCH --gres=gpu:1              # The number of GPUs (1) and the (optional) variety (gtx1080)

ml anaconda3-2019.03
# conda activate myenvironment
# The line above is where you would activate a custom conda environment (the environment must have jupyter installed; see https://jupyter.org/install).
XDG_RUNTIME_DIR=""
jupyter notebook --no-browser --ip=$(hostname -I | awk '{print $1}') --port=$(shuf -i 8888-9000 -n1)

# End of script

Using the submission script above, we can launch a Jupyter notebook and tail the Slurm output file to find the URL needed to access it. Note that it may take a minute for the notebook output to show up.

Launching a jupyter notebook from a submission script
[aa3301@axon test]$ sbatch sbatch-jupyter.sh
Submitted batch job 47255
[aa3301@axon test]$ tail -f slurm-47255.out
[I 16:33:21.425 NotebookApp] [nb_conda_kernels] enabled, 1 kernels found
[I 16:33:23.445 NotebookApp] JupyterLab extension loaded from /share/apps/anaconda3-2019.03/lib/python3.7/site-packages/jupyterlab
[I 16:33:23.445 NotebookApp] JupyterLab application directory is /share/apps/anaconda3-2019.03/share/jupyter/lab
[I 16:33:23.463 NotebookApp] [nb_conda] enabled
[I 16:33:23.464 NotebookApp] Serving notebooks from local directory: /share/zrc/aa3301/test
[I 16:33:23.464 NotebookApp] The Jupyter Notebook is running at:
[I 16:33:23.464 NotebookApp] http://10.198.24.68:8919/?token=8fced6a2fd7d9dac7f18533be3a10077174a8415e097d41b
[I 16:33:23.464 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 16:33:23.482 NotebookApp]

    To access the notebook, open this file in a browser:
        file:///share/zrc/aa3301/.local/share/jupyter/runtime/nbserver-415087-open.html
    Or copy and paste one of these URLs:
        http://10.198.24.68:8919/?token=8fced6a2fd7d9dac7f18533be3a10077174a8415e097d41b

Once you have pasted the URL into your browser, you can press CTRL + C to quit tail. Unlike in the interactive session above, where CTRL + C stops the notebook, the Jupyter notebook will continue running for the duration specified in the batch file, which is 5 hours in the example above.
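As an alternative to watching the output with tail, the URL can be pulled out of the Slurm output file with grep. The sketch below assumes the log lines look like the ones shown above. The function names notebook_url and wait_for_notebook_url are hypothetical, not Axon commands.

```shell
# Hypothetical helper: print the first tokenised notebook URL found in the
# log file given as $1 (prints nothing if there is none yet).
notebook_url() {
    grep -oE 'https?://[^[:space:]]+/\?token=[0-9a-f]+' "$1" | head -n1
}

# Hypothetical helper: poll the log file once per second, for up to $2
# seconds (default 60), until a URL shows up.
wait_for_notebook_url() {
    local log="$1" tries="${2:-60}" url
    while [ "$tries" -gt 0 ]; do
        url=$(notebook_url "$log" 2>/dev/null)
        if [ -n "$url" ]; then
            echo "$url"
            return 0
        fi
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}
```

For the job submitted above, the call would be wait_for_notebook_url slurm-47255.out.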

If you finish early and no longer need your notebook, please cancel your job so others can use the resources of the cluster. You can run jobstats.py to see your running jobs and then stop them using the scancel command.

Cancelling your running job when it's done
[aa3301@axon test]$ jobstats.py
User: aa3301
Default Account: zrc
User is part of the following slurm accounts ['zrc']
User Raw Share: 1
User Raw Usage: 476411
Number of Pending Jobs: 0
Number of Running Jobs: 1
Total Jobs Completed: 5
Total Jobs Completed Successfully: 0
Total Jobs Failed: 0
Total Jobs Cancelled: 0
Total Jobs Timeout: 0

                            Running Jobs
________________________________________________________________________________
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             47257    shared jupyter-   aa3301  R       0:48      1 ax08

                  Running + Pending Jobs
________________________________________________________________________________
             JOBID PARTITION PRIOR     NAME     USER    STATE       TIME  TIME_LIMIT  NODES CPUS TRES_P           START_TIME     NODELIST(REASON)      QOS
             47257    shared   108 jupyter-   aa3301  RUNNING       0:48     5:00:00      1    6  gpu:1  2020-03-05T16:50:33                 ax08   normal

[aa3301@axon test]$ scancel 47257
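If you have several notebook jobs, the job ids can be picked out of squeue's default output by job name instead of reading them off by eye. This is a sketch only: job_ids_by_name is a hypothetical helper, and it assumes the default squeue column layout shown in the "Running Jobs" table above (JOBID first, NAME third).

```shell
# Hypothetical helper: read default-format `squeue` output on stdin and
# print the ids of jobs whose NAME column starts with the prefix in $1.
job_ids_by_name() {
    # NR > 1 skips the header line; $1 is JOBID and $3 is NAME.
    awk -v prefix="$1" 'NR > 1 && index($3, prefix) == 1 { print $1 }'
}

# Made-up sample in the same layout as the table above.
printf '%s\n' \
    '             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)' \
    '             47257    shared jupyter-   aa3301  R       0:48      1 ax08' |
    job_ids_by_name jupyter
```

On the login node the input would come from squeue -u "$USER", and each id that is printed can then be passed to scancel once you have checked that it is a job you want to stop.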


