What exactly is sequencing depth in RNAseq?
6.4 years ago
carmacae ▴ 20

I am new to learning about RNAseq analysis and am confused as to what exactly sequencing depth refers to. For example, if I need to calculate "how deep each sample was sequenced" does this refer to the total number of paired end reads that came out of the sequencer OR the number of paired end reads that mapped?

Tags: depth, RNA-Seq

Hi Carmacae,

If an answer was helpful, you should upvote it; if an answer resolved your question, you should mark it as accepted.

Cheers,
Wouter

6.4 years ago

Depth is a term commonly used for genome or exome sequencing, where it means the number of reads covering each position. For RNA-seq that definition is largely pointless, because coverage is very uneven: it depends on how strongly each gene is expressed.
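To make "number of reads covering each position" concrete, here is a minimal sketch that computes per-position depth from aligned reads. It assumes the reads are already aligned and represented as hypothetical half-open `(start, end)` intervals on one reference sequence; a real pipeline would read them from a BAM file. The toy data has one highly expressed and one weakly expressed gene, which shows why per-position depth is so uneven in RNA-seq.

```python
def per_base_depth(reads, ref_length):
    """Return a list with the number of reads covering each position.

    `reads` is a list of hypothetical half-open (start, end) intervals.
    Uses a difference array: +1 where a read starts, -1 where it ends.
    """
    diff = [0] * (ref_length + 1)
    for start, end in reads:
        diff[start] += 1   # coverage begins at `start`
        diff[end] -= 1     # and stops just before `end`
    depth, running = [], 0
    for pos in range(ref_length):
        running += diff[pos]
        depth.append(running)
    return depth


# Toy data: a highly expressed "gene" at 0-10, nothing at 10-20,
# and a weakly expressed "gene" at 20-30.
reads = [(0, 5), (2, 7), (3, 8), (5, 10), (0, 10), (22, 27)]
depth = per_base_depth(reads, 30)
print(depth)
```

The first gene reaches a depth of 4 while the second never exceeds 1, even though both come from the same sample; a single "depth" number for the whole RNA-seq library would hide that.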

In RNA-seq the term "number of reads" is more commonly used, for example, 10 million or 100 million reads. If all goes well, a high percentage of the reads (~90%) should align, which makes the number of reads out of the sequencer not so different from the number of mapped reads. I would report the number of reads sequenced unless it is very different from the number aligned, but in that case you should also figure out why the difference is so large.


Thank you for the clear explanation! I was asking because I have some data to play around with in which most of the samples have an alignment rate of around 70% (not too surprising, as the genome quality isn't very good). Say I have 30 million total reads, of which 70% mapped: would my number of reads then be 21 million?


The number of reads is still 30M, of which 70% mapped. Why the rest did not map is something you could investigate: they could be rRNA, contamination or who knows what (though that fraction should generally be very small).
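The arithmetic behind this answer, as a small sketch using the numbers from this thread (30M reads, 70% alignment rate); substitute your own values. Integer percent is used to avoid floating-point rounding in the read counts.

```python
# Numbers from the example in this thread; substitute your own.
total_reads = 30_000_000   # reads (or read pairs) out of the sequencer
alignment_pct = 70         # alignment rate reported by the aligner, in percent

mapped_reads = total_reads * alignment_pct // 100
unmapped_reads = total_reads - mapped_reads

print(f"number of reads (sequenced): {total_reads:,}")
print(f"mapped:                      {mapped_reads:,}")
print(f"unmapped (worth a look):     {unmapped_reads:,}")
```

The figure you report as "number of reads" is `total_reads`; the 9M unmapped reads are the ones worth investigating.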


Right. I'm asking, though, because I'm playing around with different packages, mostly just to learn, and one of them (the CQN package) specifically asks for a vector containing "... the sizeFactors which simply tells us how deep each sample was sequenced". So in my example, would I use 30M or 21M for this?

Here's the package in case you're curious: http://bioconductor.org/packages/release/bioc/vignettes/cqn/inst/doc/cqn.pdf


Those sizeFactors would only matter if there is a big difference between samples, say one sequenced to 20M reads and another to 80M reads. If the alignment fraction is similar for all samples, whether you use sequenced or aligned reads again doesn't really matter.
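A quick way to see whether depth differences are large enough to matter is to compare per-sample depths directly. The sketch below uses plain library-size scaling (each depth divided by the geometric mean of all depths) purely as an illustration; it is an assumption for this example, not necessarily what CQN or any other package does internally. The depth values are hypothetical, apart from the 20M vs 80M case mentioned above.

```python
from statistics import geometric_mean  # available in Python 3.8+


def relative_size_factors(depths):
    """Scale each sample's depth by the geometric mean of all depths.

    Illustrative library-size scaling only; the factors multiply to 1.
    """
    gm = geometric_mean(depths)
    return [d / gm for d in depths]


balanced = [30e6, 29e6, 31e6]   # hypothetical, similar depths
unbalanced = [20e6, 80e6]       # the 20M vs 80M case from the answer

for name, depths in [("balanced", balanced), ("unbalanced", unbalanced)]:
    factors = relative_size_factors(depths)
    spread = max(depths) / min(depths)
    print(name, [round(f, 2) for f in factors], f"max/min = {spread:.2f}")
```

For the balanced samples all factors are close to 1, so scaling changes little; for 20M vs 80M the factors are 0.5 and 2.0, a fourfold difference that has to be accounted for.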


Got it! My alignment rates range from around 69% to around 73%, so pretty similar? So for this specific example, I could set sizeFactors to NULL and it would be fine? Thank you so much!!


Seems pretty similar indeed. I don't know this particular package, so I can't say whether setting it to NULL is fine, but setting the sizeFactors to the total number of reads per sample (sequenced or aligned, either will do) might be the most accurate.


Since the vignette removes genes with 0 counts in all samples, it only considers genes that have mapped reads. As @Wouter points out, if the numbers are unbalanced across the dataset, you would need to account for that.
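To illustrate the point about counts only covering mapped reads, here is a minimal sketch with a hypothetical gene-by-sample count matrix (gene names and values are made up). It drops genes that are zero in every sample and then sums each sample's column, which gives a per-sample total of reads that mapped and were assigned to a gene.

```python
# Hypothetical count matrix: rows = genes, columns = samples.
counts = {
    "geneA": [120, 98, 143],
    "geneB": [0, 0, 0],      # zero in every sample -> dropped
    "geneC": [15, 22, 9],
    "geneD": [0, 3, 1],
}

# Keep genes with at least one read in at least one sample.
kept = {gene: row for gene, row in counts.items() if any(row)}

# Per-sample column sums over the kept genes. These only reflect reads
# that mapped and were assigned to a gene, not everything sequenced.
n_samples = len(next(iter(kept.values())))
mapped_totals = [sum(row[j] for row in kept.values()) for j in range(n_samples)]

print(sorted(kept))
print(mapped_totals)
```

Dropping the all-zero genes does not change these totals; the point is that anything derived from a count matrix reflects mapped reads, so what matters is whether those totals are balanced across samples.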


Ahhhh got it. So in my case, if mine are all around 69-73%, I could set sizeFactors to NULL and be okay? Thank you so much, this is so helpful.

6.4 years ago
Tao ▴ 530

I want to give you a very intuitive, but maybe not very accurate, explanation: imagine each base (A/G/C/T) is a grain of rice. Suppose you sequenced a bag of rice (say 100M bases) and then poured it into a very narrow but very long rice box whose width is one grain of rice. The height of the rice in the box would be the sequencing depth. For WGS, the length of the box is the length of the whole genome. For RNA-seq, people might only consider reads mapped onto exons, and take the length as the total length of the union of all exons.
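The rice-box picture corresponds to a simple formula: average depth = total bases / target length. A minimal sketch follows; the input numbers are hypothetical (100 Gb over a ~3.1 Gb genome for WGS, and 20M mapped 100-bp reads over an assumed 80 Mb of union exon length for RNA-seq), chosen only to show the calculation.

```python
def mean_depth(total_bases, target_length):
    """Average depth: total bases spread evenly over a target of the
    given length (the "height of rice in the box")."""
    return total_bases / target_length


# WGS-style, hypothetical: 100 Gb of sequence over a ~3.1 Gb genome.
print(f"{mean_depth(100e9, 3.1e9):.1f}x")

# RNA-seq-style, hypothetical: 20M mapped 100-bp reads over an
# assumed 80 Mb union of exons.
print(f"{mean_depth(20e6 * 100, 80e6):.1f}x")
```

As the other answer notes, for RNA-seq this average says little about any particular gene, since the "rice" piles up very unevenly along the box.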

6.4 years ago
Charles Plessy ★ 2.9k

"How deep each sample was sequenced" means how many reads (or read pairs) were sequenced for each sample. Some results will vary with the number of sequenced reads (the sequencing depth), so it is important information.
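One way to get the number of reads sequenced per sample is to count FASTQ records. This is a minimal sketch assuming standard 4-line FASTQ records (no wrapped sequence lines), optionally gzip-compressed. For paired-end data, counting the R1 file gives the number of pairs; R1 and R2 should contain the same number of records.

```python
import gzip


def count_records(lines):
    """Count 4-line FASTQ records in an iterable of lines."""
    n_lines = sum(1 for _ in lines)
    if n_lines % 4:
        raise ValueError(f"line count {n_lines} is not a multiple of 4")
    return n_lines // 4


def count_fastq_reads(path):
    """Count reads in a FASTQ file, transparently handling .gz files."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_records(handle)
```

Usage, with hypothetical file names: `count_fastq_reads("sample1_R1.fastq.gz")` gives the number of read pairs for `sample1`. Aligners and tools such as `samtools flagstat` also report these totals, which avoids a separate pass over the raw files.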
