Recently, I analyzed a few single cell RNA-seq datasets and experimented with several new tools from recent publication. While it was fun, most datasets were just too large for my poor laptop to process, and I relied a lot on our server.
I have to admit I am not too good an analyst and am spoiled by the
freedom interpreted languages provided — to try and error line by line.
However, this freedom would be gone if I have to do run my analysis like
Rscript my-analysis.R or
python my-analysis.py. Besides, an
interactive interface to play and receive feedback immediately would
also make learning new tools easier.
So, I decided to use Jupyter Notebook because I am not allowed to run RStudio Server here. Here’s a brief note of how I did it.
First, I connect to the server via SSH, and then run
jupyter notebook --no-browser --port 8889 (as this nice tutorial
This would start a Jupyter server, which could be then channelled back
to my own machine via SSH. In another terminal session, this time on my
ssh -N -L localhost:8888:localhost:8889 firstname.lastname@example.org would channel whatever from port 8889 of the remote server to localhost:8888.
Open the browser, connect to
localhost:8888, and I am ready to go.
Usually I would play with a smaller dataset to form my analysis. Then, I
load the real dataset to the notebook and run the whole thing. For this
one, I don’t really want to see it run but only am interested in the
result, so I would do it with
jupyter nbconvert --ExecutePreprocessor.timeout=-1 --to notebook --execute myanalysis.ipynb. It runs Jupyter notebook in command line, and the
--ExecutePreprocessor.timeout decides after how long
nbconvert should give up on running a cell. In my case, a cell could
run for hours, so I set it to -1, allowing a cell to run forever. This
is important because the default is something like 30 minutes.
Similar idea works for Rmarkdown. I draft my analysis with a smaller
dataset in my local RStudio, upload the .Rmd file, start R session on
the server, and
rmarkdown::render("myanalysis.Rmd") would do the
Just remember to use
tmux to before running your analysis
so it would keep going when you close your SSH connection, and when you
are done, both Jupyter notebook and SSH channelling could be turned off