Question

What exactly is sequencing depth in RNAseq?

carmacae, 6.4 years ago

I am new to learning about RNAseq analysis and am confused as to what exactly sequencing depth refers to. For example, if I need to calculate "how deep each sample was sequenced", does this refer to the total number of paired-end reads that came out of the sequencer, or to the number of paired-end reads that mapped?

Tags: depth, RNA-Seq

Hi Carmacae,

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

Cheers,
Wouter

Reply by WouterDeCoster, 6.4 years ago

score 7 · Answer 1 · 2017-11-09

WouterDeCoster, 6.4 years ago

Depth is a term commonly used for genome or exome sequencing, where it means the number of reads covering each position. For RNA-seq that definition is not very useful, because coverage is highly uneven across genes owing to differences in expression.
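
To make the genome-sequencing sense of "depth" concrete, here is a toy Python sketch that counts how many reads cover each position. It assumes reads are already aligned and given as hypothetical 0-based, half-open (start, end) intervals on a single reference; in practice a tool such as `samtools depth` computes this from a BAM file. The made-up reads pile up on one region, so the mean depth hides very uneven coverage, which is exactly the problem with this definition for RNA-seq.

```python
def per_base_depth(reads, length):
    """Number of reads covering each position 0..length-1.

    `reads` is a list of 0-based, half-open (start, end) intervals.
    """
    diff = [0] * (length + 1)          # difference array: +1 at start, -1 at end
    for start, end in reads:
        diff[start] += 1
        diff[end] -= 1
    depth, running = [], 0
    for pos in range(length):
        running += diff[pos]           # prefix sum gives coverage at this position
        depth.append(running)
    return depth


# Hypothetical toy reads on a 50-bp reference: four overlap one "expressed" region
reads = [(0, 10), (2, 12), (4, 14), (5, 15), (30, 40)]
depth = per_base_depth(reads, 50)

print("max depth:", max(depth))                  # max depth: 4
print("mean depth:", sum(depth) / len(depth))    # mean depth: 1.0
print("uncovered positions:", depth.count(0))    # uncovered positions: 25
```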

In RNA-seq, the term "number of reads" is more common, for example 10 million or 100 million reads. If all goes well, a high percentage of the reads (~90%) should align, so the number of reads out of the sequencer is not very different from the number of mapped reads. I would report the number of reads sequenced unless it's very different from the number aligned, and in that case you should also figure out why the difference is so large.
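
The check described above, comparing reads sequenced with reads aligned, can be sketched as follows. The sample names, read totals and the 85% cut-off are all hypothetical, chosen only for illustration (the answer's rule of thumb is ~90% when things go well); in practice the totals would come from your aligner's log.

```python
# Hypothetical per-sample totals: (reads sequenced, reads aligned)
samples = {
    "sample_A": (30_000_000, 27_300_000),
    "sample_B": (28_000_000, 25_000_000),
    "sample_C": (31_000_000, 18_600_000),
}

MIN_RATE = 0.85  # hypothetical cut-off, not a standard value


def alignment_report(samples, min_rate=MIN_RATE):
    """Return {sample: (alignment_rate, is_low)} for each sample."""
    report = {}
    for name, (sequenced, aligned) in samples.items():
        rate = aligned / sequenced
        report[name] = (rate, rate < min_rate)
    return report


report = alignment_report(samples)
for name, (rate, is_low) in report.items():
    note = "  <- investigate before reporting" if is_low else ""
    print(f"{name}: {samples[name][0]:,} reads sequenced, {rate:.1%} aligned{note}")
```

The number reported per sample stays the sequenced total; the alignment rate is only used to spot samples where the two numbers diverge enough to need an explanation.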

Thank you for the clear explanation! I was wondering because I've got some data to play around with in which most of the samples have around a 70% alignment rate (not too surprising, as the genome quality isn't very good). Say I have 30 million total reads, of which 70% mapped: would my number of reads then be 21 million?

Reply by carmacae, 6.4 years ago

The number of reads is still 30M, of which 70% mapped. Why the rest did not map is something you could investigate: they could be rRNA, contamination or just plain who knows what (though that fraction should generally be very small).

Reply by GenoMax, 6.4 years ago
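
For the 30M example in the question, the arithmetic is simply the following (integer maths to avoid floating-point rounding; the numbers are the ones from the thread). The sequenced total is unchanged by the alignment rate; only the split into mapped and unmapped reads depends on it.

```python
total_reads = 30_000_000   # reads that came off the sequencer for this sample
alignment_rate_pct = 70    # alignment rate reported by the aligner, in percent

mapped_reads = total_reads * alignment_rate_pct // 100
unmapped_reads = total_reads - mapped_reads

print(f"sequenced: {total_reads:,}")     # sequenced: 30,000,000 (the number usually reported)
print(f"mapped:    {mapped_reads:,}")    # mapped:    21,000,000
print(f"unmapped:  {unmapped_reads:,}")  # unmapped:  9,000,000 (e.g. rRNA or contamination)
```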

Right. I'm asking because I'm playing around with different packages, mostly just to learn, and one of them (the cqn package) specifically asks for a vector containing "... the sizeFactors which simply tells us how deep each sample was sequenced". So in my example, would I use 30M or 21M for this?

Here's the package in case you're curious: http://bioconductor.org/packages/release/bioc/vignettes/cqn/inst/doc/cqn.pdf

Reply by carmacae, 6.4 years ago

Those sizeFactors would only matter if there is a big difference between samples, say one sequenced to 20M reads and another to 80M reads. If the alignment fraction is similar for all samples, it again doesn't really matter whether you use sequenced or aligned reads.

Reply by WouterDeCoster, 6.4 years ago
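
A quick way to see whether depth differs enough between samples to matter is the ratio of the deepest to the shallowest library. This is a minimal sketch with hypothetical read totals; what counts as a "big" ratio is a judgment call rather than a fixed threshold.

```python
def depth_imbalance(library_sizes):
    """Ratio of the deepest to the shallowest library (1.0 = perfectly balanced)."""
    sizes = list(library_sizes.values())
    return max(sizes) / min(sizes)


# Hypothetical read totals per sample
similar = {"s1": 30_000_000, "s2": 29_000_000, "s3": 31_000_000}
lopsided = {"s1": 20_000_000, "s2": 80_000_000}   # the 20M vs 80M case above

print(f"{depth_imbalance(similar):.2f}")   # 1.07
print(f"{depth_imbalance(lopsided):.2f}")  # 4.00
```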

Got it! My alignment rates range from around 69% to around 73%, so pretty similar? So for this specific example, could I set the sizeFactors to NULL and it would be fine? Thank you so much!!

Reply by carmacae, 6.4 years ago

Seems pretty similar indeed. I'm not familiar with this particular package, so I can't say what setting it to NULL does, but setting the sizeFactors to the total number of reads (sequenced or aligned, whichever) might be the most accurate.

Reply by WouterDeCoster, 6.4 years ago

Since the vignette removes genes with zero counts in all samples, it only considers genes that have mapped reads. As @Wouter points out, if the numbers are unbalanced across the dataset, then you would need to account for that.

Reply by GenoMax, 6.4 years ago
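
If you want per-sample totals that reflect only counted (mapped) reads, one option is the column sums of the gene-by-sample count table. The sketch below uses a hypothetical toy table and assumes, based only on the vignette's wording, that cqn's sizeFactors are per-sample read totals; check the cqn reference manual for what it actually expects and what NULL does. Note that dropping genes with zero counts in every sample does not change the column sums; the totals reflect only reads that were assigned to genes.

```python
# Hypothetical gene-by-sample count table (rows: genes, columns: samples)
SAMPLES = ["s1", "s2", "s3"]
counts = {
    "geneA": [120, 98, 130],
    "geneB": [0, 0, 0],      # no reads in any sample
    "geneC": [45, 60, 51],
    "geneD": [0, 3, 1],
}


def drop_all_zero_genes(counts):
    """Keep only genes with at least one read in at least one sample."""
    return {gene: row for gene, row in counts.items() if any(row)}


def library_sizes(counts, samples):
    """Column sums: reads assigned to genes in each sample."""
    totals = [sum(column) for column in zip(*counts.values())]
    return dict(zip(samples, totals))


filtered = drop_all_zero_genes(counts)
sizes = library_sizes(filtered, SAMPLES)
print(sizes)                                     # {'s1': 165, 's2': 161, 's3': 182}
# Removing all-zero genes leaves the totals unchanged:
print(sizes == library_sizes(counts, SAMPLES))   # True
```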

Ahhhh got it. So in my case, if mine are all around 69-73%, I could set sizeFactors to NULL and be okay? Thank you so much, this is so helpful.

Reply by carmacae, 6.4 years ago

score 4 · Answer 2 · 2017-11-09

I want to give you a very intuitive, but maybe not very accurate, explanation. Imagine each base (A/G/C/T) is a grain of rice. Suppose you sequenced a bag of rice (say 100M bases) and then poured it into a very narrow but very long box, exactly one grain wide. The height of the rice in the box would be the sequencing depth. For WGS, the length of the box would be the length of the whole genome. For RNA-seq, people might consider only the reads mapped onto exons, and take the length to be the length of the union of all exons.
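
The rice-box picture corresponds to mean depth = total bases / box length. As a hedged sketch of the RNA-seq variant described above, the box length can be computed as the total length of the merged exon intervals, so that exons shared by several transcripts are not counted twice. The exon coordinates and base counts are hypothetical, and intervals are assumed to be 0-based, half-open and on a single chromosome.

```python
def union_length(intervals):
    """Total length of 0-based, half-open (start, end) intervals after merging overlaps."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping (or touching): extend the current run
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def mean_depth(total_bases, box_length):
    """Height of the rice in the box: bases poured in / length of the box."""
    return total_bases / box_length


# Rice box: 100M bases poured into a hypothetical 1 Mb box
print(mean_depth(100_000_000, 1_000_000))   # 100.0

# Hypothetical exons from two overlapping transcripts plus one separate exon
exons = [(100, 200), (150, 300), (500, 600)]
exon_length = union_length(exons)           # 200 + 100 = 300
print(mean_depth(30_000, exon_length))      # 100.0
```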

score 1 · Answer 3 · 2017-11-09

Charles Plessy, 6.4 years ago

"How deep each sample was sequenced" means how many reads (or read pairs) were sequenced for each sample. Some results will vary with the number of sequenced reads (the sequencing depth), so it is important information.
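
As a small illustration of why depth matters: raw counts scale with the number of reads sequenced, so the same gene can look four times more abundant in a library sequenced four times deeper. Dividing by library size, here with counts per million (one simple normalization, not necessarily what any given package does internally), puts both on the same scale. The gene name and counts are hypothetical.

```python
def counts_per_million(counts, library_size):
    """Scale raw gene counts to counts per million sequenced reads."""
    return {gene: count * 1_000_000 / library_size for gene, count in counts.items()}


# Hypothetical: the same gene in two libraries sequenced to different depths
shallow = counts_per_million({"geneX": 50}, library_size=10_000_000)
deep = counts_per_million({"geneX": 200}, library_size=40_000_000)

print(shallow["geneX"], deep["geneX"])  # 5.0 5.0
```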
