Getting used to being a student again took longer than I expected, which led to a long hiatus in my posting, but it also put me in a different and stimulating environment.
This post was inspired by a question from a colleague, who wanted to first find some files and then merge them into a single file. The initial attempt was:
# There are 6 files in my example directory
> ls
1.
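A minimal sketch of the find-then-merge task, assuming the files to merge end in `.txt` (the directory, file names, and contents here are hypothetical). One pitfall worth guarding against is writing the merged file into the directory being searched, where it can match the pattern and end up among its own inputs; the sketch writes the output to a separate temporary file instead.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical setup: a scratch directory holding 6 small files.
workdir=$(mktemp -d)
for i in 1 2 3 4 5 6; do
  echo "line from file $i" > "$workdir/$i.txt"
done

# Write the merged file OUTSIDE the searched directory, so find
# can never pick up the output file as one of its inputs.
merged=$(mktemp)

# -print0 / sort -z / xargs -0 keep file names with spaces intact,
# and sorting makes the concatenation order predictable.
find "$workdir" -type f -name '*.txt' -print0 | sort -z | xargs -0 cat > "$merged"

cat "$merged"
```

Sorting is optional; without it, `find` returns files in whatever order the filesystem lists them, so the merged file's order may differ between machines.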
Openly shared data is invaluable. It lets others test the reproducibility of an analysis and reduces the need for repeated screening experiments. These data are also an excellent training ground for amateurs like me.
Sometimes the dataset I want consists of multiple samples. At first I clicked all the download links manually, but I soon got lost and forgot which ones I hadn’t downloaded yet. Thankfully, I realized that repetitive tasks like this can often be automated on a computer.
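A minimal sketch of automating multi-sample downloads, assuming you have (or can copy) one download URL per sample into a text file. The sample names and URLs are hypothetical; `file://` URLs are used only so the sketch runs offline, and you would swap in the real `https://` links. Files that already exist are skipped, so rerunning the loop picks up exactly the samples that are still missing.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-ins for sample files hosted on a data repository.
src=$(mktemp -d)
for s in sample1 sample2 sample3; do
  echo "reads of $s" > "$src/$s.fastq"
done

# One URL per line; this list doubles as a record of what should be fetched.
urls=$(mktemp)
for s in sample1 sample2 sample3; do
  echo "file://$src/$s.fastq" >> "$urls"
done

dest=$(mktemp -d)
while IFS= read -r url; do
  out="$dest/$(basename "$url")"
  # Skip samples that were already downloaded completely.
  if [ -s "$out" ]; then
    echo "skip $out"
    continue
  fi
  # Download to a .part file first, so an interrupted transfer
  # is never mistaken for a finished one on the next run.
  curl -fsS -o "$out.part" "$url"
  mv "$out.part" "$out"
done < "$urls"

ls "$dest"
```

Running the same loop a second time prints a `skip` line for each sample already on disk and downloads nothing else.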
What you are going to find here:
- A minimal introduction to the awk command on Linux and Mac. (Mac users may need to install GNU awk, which adds functions such as asort() for sorting an array.)
- An awk command that randomly subsamples k reads from a given FASTQ file of paired-end sequencing.

Why I am making this note: in single-cell RNA sequencing, there seems to be, to date, no good way of telling how deeply you should sequence.
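One way to subsample k read pairs with awk is reservoir sampling (Algorithm R), which picks k records uniformly at random in a single pass while holding only k records in memory. The sketch below is an illustration of that technique, not necessarily the command this note builds toward. It assumes two files, `R1.fastq` and `R2.fastq` (hypothetical names), with mates in the same order and strict 4-line records, and it relies on bash process substitution. Each mate pair is first joined into one tab-separated line, so the pair is sampled as a unit and the mates stay together. Only `rand()` and `srand()` are used, so it should also run with the BSD awk on a Mac.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical toy input: 10 read pairs.
tmp=$(mktemp -d)
for i in $(seq 1 10); do
  printf '@read%d/1\nACGT\n+\nIIII\n' "$i" >> "$tmp/R1.fastq"
  printf '@read%d/2\nTGCA\n+\nIIII\n' "$i" >> "$tmp/R2.fastq"
done

k=3      # number of read pairs to keep
seed=42  # fixed seed so the subsample is reproducible

# Join every 4 FASTQ lines into a single tab-separated line.
fq_to_line() { awk '{ printf "%s%s", $0, (NR % 4 ? "\t" : "\n") }' "$1"; }

# Place mate 1 and mate 2 side by side, reservoir-sample k lines,
# then split each sampled line back into two FASTQ files.
paste <(fq_to_line "$tmp/R1.fastq") <(fq_to_line "$tmp/R2.fastq") |
  awk -v k="$k" -v seed="$seed" '
    BEGIN { srand(seed) }
    NR <= k { keep[NR] = $0; next }      # fill the reservoir first
    {
      j = int(rand() * NR) + 1           # uniform integer in 1..NR
      if (j <= k) keep[j] = $0           # replace with probability k/NR
    }
    END { n = (NR < k ? NR : k); for (i = 1; i <= n; i++) print keep[i] }' |
  awk -F '\t' -v out1="$tmp/sub_R1.fastq" -v out2="$tmp/sub_R2.fastq" '{
      print $1 "\n" $2 "\n" $3 "\n" $4 > out1
      print $5 "\n" $6 "\n" $7 "\n" $8 > out2
    }'

head -n 4 "$tmp/sub_R1.fastq"
```

Tabs are a safe separator here because FASTQ quality strings never contain them. The sampled pairs come out in reservoir order rather than input order; pipe through `sort` on the read name if order matters.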
Recently, I analyzed a few single-cell RNA-seq datasets and experimented with several new tools from recent publications. While it was fun, most datasets were just too large for my poor laptop to process, so I relied heavily on our server.
I have to admit I am not a very good analyst, and I am spoiled by the freedom interpreted languages provide: trial and error, line by line. However, this freedom is gone if I have to run my analysis as Rscript my-analysis.
After a while of playing around, I’d say the best way to use R with a notebook-style interface on a server where you are not a superuser is to install Anaconda and run R inside it, installing whatever packages you need. Anaconda is designed to run as a normal user, so there’s no need for superuser permissions, and dependency issues are also taken care of most of the time.
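A minimal sketch of that setup, assuming conda (from Anaconda or Miniconda) is already installed in your home directory on the server and that packages come from the conda-forge channel, where R packages carry an `r-` prefix. The environment name `r-notebook`, the example package `r-ggplot2`, port 8888, and `user@server` are all hypothetical placeholders. The commands are meant to be typed into an interactive shell; `jupyter lab` keeps running until you stop it.

```shell
# Create an environment with R, the IRkernel Jupyter kernel, and JupyterLab.
# No root access is needed: everything lives under your home directory.
conda create -n r-notebook -c conda-forge r-base r-irkernel jupyterlab

# Activate it, then add R packages as needed (r-ggplot2 is just an example).
conda activate r-notebook
conda install -c conda-forge r-ggplot2

# Start JupyterLab on the server without opening a browser there.
jupyter lab --no-browser --port 8888

# On your own machine, forward the port over SSH, then open
# http://localhost:8888 in a local browser:
#   ssh -N -L 8888:localhost:8888 user@server
```

Installing R packages through conda rather than `install.packages()` lets conda resolve compiled system dependencies, which is usually where non-root installs break.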