NGS

Subsampling a fastq file with awk

What you are going to find here: a minimal introduction to the awk command on Linux and Mac (Mac users may need to install GNU awk, which adds some functions such as asort() for sorting an array), and an awk command that randomly subsamples k reads from a given fastq file from a paired-end sequencing run. Why I am making this note: in single-cell RNA sequencing, there still seems to be no good way to tell how deeply you should sequence.
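To make the subsampling idea concrete, here is a minimal sketch in Python, not the awk command from the post itself. It uses reservoir sampling (a technique swapped in for illustration) to pick k read pairs in one pass while keeping mates together. The function names `read_fastq` and `subsample_pairs`, the `seed` parameter, and the assumption that every record spans exactly four lines are all mine.

```python
import random
from itertools import islice


def read_fastq(lines):
    """Yield FASTQ records as 4-tuples of lines (assumes 4 lines per record)."""
    it = iter(lines)
    while True:
        record = tuple(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_pairs(r1_lines, r2_lines, k, seed=0):
    """Reservoir-sample up to k read pairs, keeping R1 and R2 mates together."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    reservoir = []
    pairs = zip(read_fastq(r1_lines), read_fastq(r2_lines))
    for i, pair in enumerate(pairs):
        if i < k:
            # Fill the reservoir with the first k pairs.
            reservoir.append(pair)
        else:
            # Replace a kept pair with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = pair
    return reservoir
```

With real data you would pass two open file handles (for example `open("sample_R1.fastq")` and `open("sample_R2.fastq")`, hypothetical file names) and write the returned records back out. Because only k pairs are held in memory at a time, the file size does not matter much.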

Using Limma to find differentially expressed genes

Ritchie, ME, Phipson, B, Wu, D, Hu, Y, Law, CW, Shi, W, and Smyth, GK (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43(7), e47. limma is an R package hosted on Bioconductor that finds differentially expressed genes in RNA-seq or microarray data. Recently I’ve been working on a PCR-based low-density array and noticed that I had forgotten how to use limma for the hundredth time, so I decided to make a note.

Wandering into next-generation sequencing

No Longer That “Next” Generation: When I was doing my undergraduate project, microarray was like black magic that turned the labyrinth of gene expression into a colorful heatmap and got your paper into top-notch journals. Several years later, when I came back to life science from my clinical internship, I was overwhelmed. Microarrays had become as common as microwaves, and people had started using next-generation sequencing to check everything. It’s as if every single word I learned in class now has to be followed by -seq.