Question

How many reads should I expect for paired end reads when coverage = 30 million?

3

8.5 years ago

Kristin Muench ▴ 620

Hello,

My lab ordered paired-end sequencing, and the reported coverage was 30 million reads per sample.

Just to confirm: does this mean 30 million reads across both directions, i.e. 15 million per end? If so, after alignment with TopHat2 and counting with htseq-count, should I expect about 15 million reads (i.e., read pairs) for each sample?

Or should I expect to see 30 million reads, representing 30 million pairs (60 million total ends)?

Thank you for the sanity check!

RNA-Seq • 8.0k views

ADD COMMENT • link updated 5.6 years ago by Biostar 20 • written 8.5 years ago by Kristin Muench ▴ 620

0

Coverage usually has a different meaning.

ADD REPLY • link 8.5 years ago by h.mon 35k

0

Oh, excuse me! I meant that the total number of sequence reads = 30 million. It was unclear from the sequencing company if this meant 30 mil per direction, or 30 mil altogether.

ADD REPLY • link 8.5 years ago by Kristin Muench ▴ 620

0

It's a really important question to ask up front when you get contract sequencing done: "Is that reads, or read pairs?" For the same run, the number of read pairs is half the number of reads.
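To make the two readings of such a quote concrete, here is a minimal sketch of the arithmetic. The function name `interpret_quote` and the dictionary keys are made up for illustration; the only input is the quoted number (in millions) for a paired-end run.

```python
def interpret_quote(quoted_millions: float) -> dict:
    """Return both readings of an ambiguous 'N million reads' quote (paired-end)."""
    return {
        # Reading 1: the quote counts individual reads, both mates together.
        "quote_is_reads": {
            "read_pairs_M": quoted_millions / 2,
            "total_reads_M": quoted_millions,
        },
        # Reading 2: the quote counts read pairs (sequenced fragments).
        "quote_is_pairs": {
            "read_pairs_M": quoted_millions,
            "total_reads_M": quoted_millions * 2,
        },
    }


result = interpret_quote(30)
for reading, counts in result.items():
    print(reading, counts)
```

For a quote of 30 million, the first reading gives 15 million pairs and the second gives 30 million pairs (60 million ends), which are exactly the two cases in the question.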

ADD REPLY • link 8.5 years ago by User 59 13k

0

It's most likely 15M per end, which is on the low end. As reads can be of varying lengths, I prefer to measure, and be quoted, in gigabases (Gb) of sequence.
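As a rough sketch of why gigabases remove the ambiguity: total yield is pairs times mates times read length. The 100 bp read length below is an assumption for illustration only (the thread does not state one), and `gigabases` is a hypothetical helper name.

```python
def gigabases(read_pairs: int, read_length_bp: int, paired: bool = True) -> float:
    """Total sequenced bases in Gb, assuming every read has the same length."""
    mates = 2 if paired else 1
    return read_pairs * mates * read_length_bp / 1e9


# Assumed read length of 100 bp, for illustration only.
print(gigabases(15_000_000, 100))  # 15M pairs -> 3.0 Gb
print(gigabases(30_000_000, 100))  # 30M pairs -> 6.0 Gb
```

A quote in Gb plus the read length pins down the number of pairs, whichever way the facility counts "reads".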

ADD REPLY • link 5.7 years ago by Eric Lim ★ 2.1k

score 3 · Answer 1 · 2015-10-22

3

8.5 years ago

Antonio R. Franco ★ 5.1k

It should be 15 million per end.

ADD COMMENT • link 8.5 years ago by Antonio R. Franco ★ 5.1k

6

Since this post has been revived in 2018, I strongly argue against the word "should" in this context. Instead, call the facility and ask, making sure that everyone is on the same page. I have witnessed so much confusion, even within our group where we typically know each other's vocabulary, when talking about reads, coverage, depth, read number vs. fragment number, etc.

ADD REPLY • link 5.7 years ago by ATpoint 81k

1

Thanks! Since we now have some lived experience from after this post was first made, I can share it: there was indeed a miscommunication with the facility. What we meant was thirty million read pairs for analysis, i.e. sixty million total paired-end reads. We ended up with thirty million total reads, and so fifteen million pairs of functional coverage. We later re-sequenced the samples at the appropriate depth and the data made so, so much more sense. So, two votes for calling your facility and making sure everyone is on the same page!

ADD REPLY • link 5.7 years ago by Kristin Muench ▴ 620

score 2 · Answer 2 · 2018-08-20

2

5.7 years ago

grant.hovhannisyan ★ 2.6k

The post is quite old, but I see some confusion here. Read numbers may be defined differently from facility to facility. For example, here at CRG, if you order 30 million paired-end reads, you get 30 million reads for each mate. I think this approach makes more sense (especially for RNA-seq), since paired-end sequencing reads the same fragment from both ends, so the second mate does not add to the expression counts.
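The point about mates can be sketched in a few lines: two reads from the same fragment count once. The read names and the `/1`, `/2` mate suffix below are an assumed naming convention for illustration, not output from any particular tool, and `count_fragments` is a hypothetical helper.

```python
def count_fragments(read_names):
    """Count distinct fragments, treating both mates of a pair as one unit."""
    fragments = set()
    for name in read_names:
        # Strip a trailing /1 or /2 mate suffix (assumed naming convention).
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        fragments.add(name)
    return len(fragments)


# Toy input: two complete pairs and one read whose mate is missing.
names = ["frag_a/1", "frag_a/2", "frag_b/1", "frag_b/2", "frag_c/1"]
print(len(names), "reads ->", count_fragments(names), "fragments")
```

Here five reads collapse to three fragments, which is why the number of pairs, not the number of ends, is what matters for expression counts.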

ADD COMMENT • link 5.7 years ago by grant.hovhannisyan ★ 2.6k

0

Indeed, that was the case! (see above) Fortunately we were able to re-sequence this dataset at an appropriate depth.