Biostar Beta. Not for public use.
Question: Duplicate reads in RNA-seq

Hi everyone,

I have some paired-end RNA-seq samples with very high levels of duplication (in the worst cases, only 6% of reads remain after de-duplication). I think this is due to the low input RNA (~1 ng) and to a smaller subset of genes being expressed (the RNA is from a specific cell type isolated from brain). Even after poly-A selection, the most highly expressed gene in my samples was a ribosomal RNA transcript.

I used Picard's MarkDuplicates to remove duplicate reads from my samples and looked at how that affected counting. I was happy to see that the counts for the rRNA gene were greatly reduced, but the counts for almost every other gene were reduced as well. I had thought that only highly expressed genes would have duplicate reads. I also ran a correlation analysis between the regular samples and the de-duplicated samples and saw excellent correlation between them, so now I'm confused.
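For anyone who wants to repeat that comparison, here is a minimal sketch of it in Python. The gene names and counts are made up purely for illustration, and `compare_counts` is a hypothetical helper, not part of Picard or any counting tool. It reports, per gene, the fraction of counts that survive de-duplication and the Pearson correlation of the log2 counts.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def compare_counts(raw, dedup):
    """Hypothetical helper: compare per-gene counts before and after de-duplication.

    raw, dedup: dicts mapping gene name -> read count.
    Returns (fraction of counts retained per gene, Pearson r of log2(count + 1)).
    """
    genes = sorted(set(raw) & set(dedup))
    retained = {g: dedup[g] / raw[g] for g in genes if raw[g] > 0}
    log_raw = [math.log2(raw[g] + 1) for g in genes]
    log_dedup = [math.log2(dedup[g] + 1) for g in genes]
    return retained, pearson(log_raw, log_dedup)

# Made-up counts, for illustration only (not real data from this thread).
raw = {"geneA": 1000, "geneB": 200, "geneC": 50, "rRNA": 50000}
dedup = {"geneA": 100, "geneB": 40, "geneC": 20, "rRNA": 500}

retained, r = compare_counts(raw, dedup)
for gene, frac in retained.items():
    print(f"{gene}: {frac:.1%} of counts retained")
print(f"Pearson r on log2 counts: {r:.3f}")
```

A high correlation here only says that the ranking of genes is preserved; it does not by itself say which set of counts is closer to the truth.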

If basically every gene has duplicates, what does it mean? Should I only use de-duplicated samples for further analysis? I know there are lots of other threads on this issue but it seems like my duplication is more severe.

3.6 years ago mmrcksn • 50 • updated 3.6 years ago igor 7.7k

Someone with better experimental chops will need to confirm, but perhaps extra amplification cycles caused this problem?

If you feel that the experiment did not work as intended, then perhaps it is time to consider redoing it (at least the library part). That is easy for someone like me to say, so apologies in advance if this is an irreplaceable sample or a difficult experiment.

3.6 years ago

You definitely have more duplicates than usual. If you started with little RNA, then you must have amplified a lot, so it makes sense that you have a lot of duplicates. They would be found in all genes, since you are amplifying all genes. Thus, all genes would have fewer counts after duplicate removal.
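A toy simulation can make this concrete. It is only a sketch under simplifying assumptions: every starting molecule is amplified equally, sequencing is modelled as sampling reads with replacement from the amplified pool, and each original molecule is treated as perfectly distinguishable. Real duplicate marking works on alignment coordinates instead, so independent fragments from a very highly expressed gene can also be flagged. The gene names, molecule numbers, and the `simulate` function are hypothetical.

```python
import random

def simulate(molecules_per_gene, reads_per_molecule, seed=0):
    """Toy model of a low-input library: few unique molecules, many reads.

    molecules_per_gene: dict mapping gene -> number of unique starting molecules.
    reads_per_molecule: average sequencing depth per unique molecule.
    Returns (raw read counts, de-duplicated counts) per gene.
    """
    rng = random.Random(seed)
    # Every unique starting molecule is labelled (gene, index).
    pool = [(gene, i) for gene, n in molecules_per_gene.items() for i in range(n)]
    n_reads = int(len(pool) * reads_per_molecule)
    # Uniform amplification followed by sequencing: sample with replacement.
    reads = [rng.choice(pool) for _ in range(n_reads)]

    raw = {gene: 0 for gene in molecules_per_gene}
    for gene, _ in reads:
        raw[gene] += 1
    # De-duplication keeps one read per unique starting molecule.
    dedup = {gene: 0 for gene in molecules_per_gene}
    for gene, _ in set(reads):
        dedup[gene] += 1
    return raw, dedup

# Hypothetical low-, mid-, and high-expressed genes, sequenced ~15x per molecule.
molecules = {"low": 20, "mid": 200, "high": 2000}
raw, dedup = simulate(molecules, reads_per_molecule=15)
for gene in molecules:
    print(f"{gene}: raw={raw[gene]}, dedup={dedup[gene]}, "
          f"retained={dedup[gene] / raw[gene]:.1%}")
```

In this model the retained fraction is similar for low-, mid-, and high-expressed genes, because the duplication comes from the library as a whole and not from any one gene. That is also why the raw and de-duplicated counts stay well correlated.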

See previous extensive discussion on the topic here: How detrimental are duplicate reads in RNAseq experiments?

3.6 years ago igor 7.7k
