
Running Jupyter Notebook on Axon

...

...

Using the sjupyter script

...

Code Block
languagebash
themeFadeToGrey
(base) [jsp2205@axon slurm]$ sjupyter --mem-per-cpu=16G --gres=gpu:3
Waiting for the Jupyter Notebook SLURM job to start...
.
Jupyter Notebook is starting...
..............................
Jupyter Notebook has started
    To access the notebook, open this file in a browser:
        file:///home/jsp2205/.local/share/jupyter/runtime/nbserver-187906-open.html
    Or copy and paste one of these URLs:
        http://10.198.24.59:8074/?token=298fbd14329110b21ecb0d2765858fa836cd6694edabb502
To turn off Jupyter notebook, run "scancel 45747".
To reprint the URL for the Jupyter notebook, run "/usr/local/bin/sjupyter --get-notebook-url=45747".

...


Running a Jupyter notebook

...

via an

...

Launching a Jupyter notebook from a Slurm session requires two things: a Python environment that has Jupyter installed (the default anaconda environment from modules meets this requirement) and the shell variable XDG_RUNTIME_DIR set to an empty string (XDG_RUNTIME_DIR="").

...

SSH tunnel

If you're connecting to Axon without using the VPN, you won't be able to reach the Jupyter notebook on the compute node directly. To get around this, you can forward the traffic from the compute node through the login node to your local machine via an SSH tunnel.

Code Block
languagebash
themeMidnight
titleLaunching a Jupyter notebook from an interactive session
[aa3301@axon ~]$ srun --pty -c 6 --gres=gpu:1 -t 01:00:00 /bin/bash
[aa3301@ax08 ~]$ ml anaconda3-2019.03
[aa3301@ax08 ~]$ XDG_RUNTIME_DIR=""
[aa3301@ax08 ~]$ jupyter notebook --no-browser --ip=$(hostname -I | awk '{print $1}') --port=$(shuf -i 8888-9000 -n1)
[I 11:43:24.018 NotebookApp] [nb_conda_kernels] enabled, 1 kernels found
[I 11:43:25.289 NotebookApp] JupyterLab extension loaded from /share/apps/anaconda3-2019.03/lib/python3.7/site-packages/jupyterlab
[I 11:43:25.289 NotebookApp] JupyterLab application directory is /share/apps/anaconda3-2019.03/share/jupyter/lab
[I 11:43:25.296 NotebookApp] [nb_conda] enabled
[I 11:43:25.296 NotebookApp] Serving notebooks from local directory: /share/zrc/aa3301
[I 11:43:25.296 NotebookApp] The Jupyter Notebook is running at:
[I 11:43:25.296 NotebookApp] http://10.198.24.59:8959/?token=9e73528834e240a01eeffe298d35a04d3d670bbd8b6b4b87
[I 11:43:25.297 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 11:43:25.307 NotebookApp]

    To access the notebook, open this file in a browser:
        file:///share/zrc/aa3301/.local/share/jupyter/runtime/nbserver-222039-open.html
    Or copy and paste one of these URLs:
        http://10.198.24.59:8959/?token=9e73528834e240a01eeffe298d35a04d3d670bbd8b6b4b87
[I 11:43:48.650 NotebookApp] 302 GET /?token=9e73528834e240a01eeffe298d35a04d3d670bbd8b6b4b87 (128.59.216.48) 1.55ms

In the example above, hostname -I | awk '{print $1}' determines the IP address of the compute node and shuf -i 8888-9000 -n1 picks a random port number between 8888 and 9000. The notebook is accessed via the URL near the bottom of the output, as instructed. With this approach the notebook ends when the job expires (we asked for one hour in the first command) or when the application is quit with CTRL + C in the terminal.
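The two command substitutions can be tried on their own before launching anything. The following is a minimal sketch, not part of Axon's tooling: the helper names first_address and random_port are hypothetical, and the sample address list (including 192.168.122.1) is made up to stand in for real hostname -I output.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, not part of Axon's tooling.

# Print the first whitespace-separated field of `hostname -I`-style input.
first_address() {
    awk '{print $1}'
}

# Print one random integer between the two bounds (inclusive).
random_port() {
    shuf -i "$1-$2" -n1
}

# Made-up sample of what `hostname -I` might print on a node with two interfaces.
sample_addresses='10.198.24.59 192.168.122.1'

ip=$(printf '%s\n' "$sample_addresses" | first_address)
port=$(random_port 8888 9000)

# Show the command line that would be run; nothing is launched here.
echo "jupyter notebook --no-browser --ip=$ip --port=$port"
```

On a compute node you would pipe the real hostname -I output into first_address instead of the sample string.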

Running a Jupyter notebook in a batch job

Jupyter notebooks can easily be run in a batch session. The only complication is discovering the server where the notebook is running, which can be resolved by looking at the Slurm output file, or by using a predetermined port and looking up the running server name in squeue.

Code Block
languagebash
titleSBATCH script to run Jupyter Notebook on Axon
#!/bin/sh
#
# running a Jupyter Notebook in Slurm.
#
#SBATCH --job-name=jupyter-notebook       # The job name.
#SBATCH -c 6                             # The number of cpu cores to use.
#SBATCH --time=5:00:00              # The time the job will take to run.
#SBATCH --mem-per-cpu=1gb        # The memory the job will use per cpu core.
#SBATCH --gres=gpu:1              # The number of GPUs (1) and the (optional) variety (gtx1080)

ml anaconda3-2019.03
# conda activate myenvironment
# Uncomment the line above to activate your own conda environment. The environment must have Jupyter installed (see https://jupyter.org/install).
XDG_RUNTIME_DIR=""
jupyter notebook --no-browser --ip=$(hostname -I | awk '{print $1}') --port=$(shuf -i 8888-9000 -n1)

# End of script

Using the submission script above, we can launch a Jupyter notebook and tail the Slurm output file to find the URL we need to access it. Note that it may take a minute for the notebook output to show up.

Code Block
languagebash
themeMidnight
titleLaunching a jupyter notebook from a submission script
[aa3301@axon test]$ sbatch sbatch-jupyter.sh
Submitted batch job 47255
[aa3301@axon test]$ tail -f slurm-47255.out
[I 16:33:21.425 NotebookApp] [nb_conda_kernels] enabled, 1 kernels found
[I 16:33:23.445 NotebookApp] JupyterLab extension loaded from /share/apps/anaconda3-2019.03/lib/python3.7/site-packages/jupyterlab
[I 16:33:23.445 NotebookApp] JupyterLab application directory is /share/apps/anaconda3-2019.03/share/jupyter/lab
[I 16:33:23.463 NotebookApp] [nb_conda] enabled
[I 16:33:23.464 NotebookApp] Serving notebooks from local directory: /share/zrc/aa3301/test
[I 16:33:23.464 NotebookApp] The Jupyter Notebook is running at:
[I 16:33:23.464 NotebookApp] http://10.198.24.68:8919/?token=8fced6a2fd7d9dac7f18533be3a10077174a8415e097d41b
[I 16:33:23.464 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 16:33:23.482 NotebookApp]

    To access the notebook, open this file in a browser:
        file:///share/zrc/aa3301/.local/share/jupyter/runtime/nbserver-415087-open.html
    Or copy and paste one of these URLs:
        http://10.198.24.68:8919/?token=8fced6a2fd7d9dac7f18533be3a10077174a8415e097d41b

Once you have pasted the URL into your browser, you can press CTRL + C to quit the tail. Unlike the interactive session, the Jupyter notebook will continue running for the duration specified in the batch file, which is 5 hours in the example above.
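Instead of watching the tail output, the URL can also be pulled out of the Slurm output file with grep. This is a minimal sketch under assumptions: the function name notebook_url is hypothetical, and the temporary file below only stands in for a real slurm-<jobid>.out file, using two lines copied from the example output above.

```shell
#!/bin/bash
# Hypothetical helper: print the first notebook URL found in a Slurm output file.
notebook_url() {
    grep -o 'http://[0-9.]*:[0-9]*/?token=[0-9a-f]*' "$1" | head -n1
}

# Stand-in log for demonstration; on the cluster, pass slurm-<jobid>.out instead.
log=$(mktemp)
cat > "$log" <<'EOF'
[I 16:33:23.464 NotebookApp] The Jupyter Notebook is running at:
[I 16:33:23.464 NotebookApp] http://10.198.24.68:8919/?token=8fced6a2fd7d9dac7f18533be3a10077174a8415e097d41b
EOF

url=$(notebook_url "$log")
echo "$url"
rm -f "$log"
```

If the function prints nothing, the notebook has probably not finished starting yet; wait a moment and run it again.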

If you finish early and no longer need your notebook, please cancel your job so others can use the resources of the cluster. You can run jobstats.py to see your running jobs and then stop them with the scancel command.

Code Block
languagebash
themeMidnight
titleCancelling your running job when it's done
[aa3301@axon test]$ jobstats.py
User: aa3301
Default Account: zrc
User is part of the following slurm accounts ['zrc']
User Raw Share: 1
User Raw Usage: 476411
Number of Pending Jobs: 0
Number of Running Jobs: 1
Total Jobs Completed: 5
Total Jobs Completed Successfully: 0
Total Jobs Failed: 0
Total Jobs Cancelled: 0
Total Jobs Timeout: 0

                            Running Jobs
________________________________________________________________________________
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             47257    shared jupyter-   aa3301  R       0:48      1 ax08

                  Running + Pending Jobs
________________________________________________________________________________
             JOBID PARTITION PRIOR     NAME     USER    STATE       TIME  TIME_LIMIT  NODES CPUS TRES_P           START_TIME     NODELIST(REASON)      QOS
             47257    shared   108 jupyter-   aa3301  RUNNING       0:48     5:00:00      1    6  gpu:1  2020-03-05T16:50:33                 ax08   normal

[aa3301@axon test]$ scancel 47257

...

If you're connecting to Axon over the external SSH connection without the VPN, you won't be able to reach the Jupyter notebook on the compute node directly. To get around this, you can forward the traffic from the compute node through the login node to your local machine via an SSH tunnel.

Code Block
languagebash
themeMidnight
titleLaunch a Jupyter notebook
[aa3301@axon ~]$ srun --pty -c 6 --gres=gpu:1 -t 01:00:00 /bin/bash
[aa3301@ax01 ~]$ ml anaconda3-2019.03
[aa3301@ax01 ~]$ XDG_RUNTIME_DIR=""
[aa3301@ax01 ~]$ jupyter notebook --no-browser --ip=$(hostname -I | awk '{print$1}') --port=$(shuf -i 8888-9000 -n1)
[I 16:36:47.176 NotebookApp] [nb_conda_kernels] enabled, 1 kernels found
[I 16:36:51.677 NotebookApp] JupyterLab extension loaded from /share/apps/anaconda3-2019.03/lib/python3.7/site-packages/jupyterlab
[I 16:36:51.677 NotebookApp] JupyterLab application directory is /share/apps/anaconda3-2019.03/share/jupyter/lab
[I 16:36:51.711 NotebookApp] [nb_conda] enabled
[I 16:36:51.711 NotebookApp] Serving notebooks from local directory: /share/zrc/aa3301
[I 16:36:51.712 NotebookApp] The Jupyter Notebook is running at:
[I 16:36:51.712 NotebookApp] http://10.198.24.12:8944/?token=84cb9ff65b505f63f3e6ffcc03253adc7d133e3e5e063773
[I 16:36:51.712 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 16:36:51.758 NotebookApp]

    To access the notebook, open this file in a browser:
        file:///share/zrc/aa3301/.local/share/jupyter/runtime/nbserver-418198-open.html
    Or copy and paste one of these URLs:
        http://10.198.24.12:8944/?token=84cb9ff65b505f63f3e6ffcc03253adc7d133e3e5e063773

...


Once you have started the Jupyter notebook, make a note of the IP and port number in the URL listed on the last line of the launch output above. In this case the server IP is 10.198.24.12 and the port number is 8944.
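If you prefer not to read the IP and port off the URL by eye, shell parameter expansion can split them out. This is only an illustrative sketch; the variable names are hypothetical, and the URL is the one from the launch example (10.198.24.12:8944).

```shell
#!/bin/bash
# URL copied from the launch example; substitute the one from your own output.
url='http://10.198.24.12:8944/?token=84cb9ff65b505f63f3e6ffcc03253adc7d133e3e5e063773'

hostport=${url#http://}        # drop the scheme: 10.198.24.12:8944/?token=...
hostport=${hostport%%/*}       # drop the path and query: 10.198.24.12:8944
notebook_ip=${hostport%%:*}    # everything before the colon
notebook_port=${hostport##*:}  # everything after the colon

echo "ip=$notebook_ip port=$notebook_port"
```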

When you have these, you can open a tunnel from your machine to the server by running the following OpenSSH command from a terminal on your machine:

Code Block
languagebash
ssh -N -L 8080:<notebook ip>:<notebook port> -p 55 <uni>@axon-remote.rc.zi.columbia.edu
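To avoid typos when filling in the placeholders, the command can be assembled from variables first. The sketch below only prints the command rather than running it. The function name tunnel_command is hypothetical, the host name is copied from the template above, and the IP, port and UNI are the example values used on this page.

```shell
#!/bin/bash
# Hypothetical helper: print the tunnel command for a given notebook IP, port and UNI.
# Arguments: $1 = notebook IP, $2 = notebook port, $3 = UNI.
tunnel_command() {
    printf 'ssh -N -L 8080:%s:%s -p 55 %s@axon-remote.rc.zi.columbia.edu\n' "$1" "$2" "$3"
}

cmd=$(tunnel_command 10.198.24.12 8944 aa3301)
echo "$cmd"
```

Copy the printed line into a terminal on your own machine to open the tunnel.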

In the example below we run this command with the notebook IP 10.198.24.12, the notebook port 8944, and the UNI aa3301.

Code Block
languagebash
themeMidnight
titleStarting an ssh tunnel from your local machine
> ssh -N -L 8080:10.198.24.12:8944 -p 55 aa3301@axon-remote.rc.columbia.edu
aa3301@axon-remote.rc.columbia.edu's password:
Last login: Mon Apr  6 10:10:02 2020 from adm.rc.zi.columbia.edu
Welcome to the Axon
GPU Cluster!
...

...

The SSH command will appear to hang, but don't worry! SSH is holding the connection open and forwarding data between Axon and your local computer.

Now open the following URL in your web browser: http://localhost:8080. You will see the Jupyter login page with a "Password or token:" field.

The token was generated when we launched the Jupyter notebook and is embedded in the URL on the last line of the launch output. Take the portion after token= (84cb9ff65b505f63f3e6ffcc03253adc7d133e3e5e063773 in the example above) and paste it into the "Password or token:" field. Alternatively, take the full URL from the launch output and replace the IP address and port with localhost:8080 (e.g., http://localhost:8080/?token=84cb9ff65b505f63f3e6ffcc03253adc7d133e3e5e063773). You should now have access to Jupyter in the same manner as if you were on campus or using the VPN.
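Replacing the IP address and port with localhost:8080 while keeping the token can also be done with sed. This is a small sketch, assuming the tunnel listens on local port 8080 as in the ssh command above; the variable names are hypothetical, and the URL is the one from the launch example.

```shell
#!/bin/bash
# URL as printed by the notebook on the compute node (from the launch example).
url='http://10.198.24.12:8944/?token=84cb9ff65b505f63f3e6ffcc03253adc7d133e3e5e063773'

# Swap the host:port part for the local end of the tunnel; the token is left untouched.
local_url=$(printf '%s\n' "$url" | sed -E 's#^http://[^/]+#http://localhost:8080#')

echo "$local_url"
```

Paste the printed URL into your browser once the tunnel is open.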

Info

The terminal running the ssh tunnel command needs to stay open for as long as you're using the Jupyter notebook. It may look idle, but it is keeping the tunnel open. Because the command uses -N, this session does not give you a shell, so use a separate terminal for any other work. When the session closes, your tunnel to Axon will close as well.