Question

high duplication levels in R1 reads of 10x Genomics samples

5.2 years ago

Assa Yeroslaviz ★ 1.8k

Hi, I'm not sure if there is a reason to worry, but I would like to understand the problem.

We have a 10x Genomics run sequenced on a NextSeq 500. I know that duplication can occur during sequencing, but for the first time we're now seeing it in the R1 reads, i.e. where the 10x barcode and the UMI are sequenced.

The run has almost 40 samples, but not all of them show this behavior. I'm attaching the images for the two reads of one sample. R2 duplication is normal and looks the way it always does, but R1 is strange.

Has anyone seen this before, and is there an explanation for it?

Thanks

[Images: duplication-level plots for R1 and R2]

10x duplication RNA-Seq fastqc • 3.1k views

updated 5.2 years ago by i.sudbery 19k • written 5.2 years ago by Assa Yeroslaviz ★ 1.8k

score 2 · Answer 1 · 2019-03-01

It's perfectly possible to have duplicate R1s without duplicate R2s in 10x. The 10x protocol attaches the cell barcode (CB) and UMI when cDNA first-strand synthesis is done, so each molecule gets a unique CB/UMI combination. There is then an amplification step, followed by fragmentation, and then another round of amplification. Because fragmentation follows the first amplification, you can have two copies of the same RNA molecule that have been fragmented at different positions; those will have different R2 sequences but the same R1 sequence.

Or, more concisely: duplicates in R2 come only from the second amplification step, while those in R1 come from both the first and the second step.
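To make that concrete, here is a toy simulation of the order of steps (tag, amplify, fragment, amplify again). It is only a sketch: the function names, the copy numbers, and the idea that each molecule comes from a different transcript are assumptions made for illustration, and real libraries also involve sampling, sequencing errors and so on.

```python
import random


def simulate_library(n_molecules=100, pcr1_copies=4, pcr2_copies=2,
                     cdna_length=1000, seed=0):
    """Return (r1, r2) pairs for a toy 10x-style library (hypothetical model)."""
    rng = random.Random(seed)
    reads = []
    for mol in range(n_molecules):
        # R1 carries the CB + UMI, fixed at first-strand synthesis.
        r1 = f"CB-UMI-{mol}"
        for _ in range(pcr1_copies):
            # Each copy from the first amplification is fragmented
            # independently, so R2 starts at its own random position.
            # Assumption: every molecule is a different transcript.
            r2 = f"tx{mol}:{rng.randrange(cdna_length)}"
            # The second amplification copies the whole fragment,
            # so these reads share both R1 and R2.
            reads.extend([(r1, r2)] * pcr2_copies)
    return reads


def duplication_rate(sequences):
    """Fraction of reads that repeat an already-seen sequence."""
    sequences = list(sequences)
    return 1 - len(set(sequences)) / len(sequences)


reads = simulate_library()
r1_rate = duplication_rate(r1 for r1, _ in reads)
r2_rate = duplication_rate(r2 for _, r2 in reads)
print(f"R1 duplication: {r1_rate:.3f}, R2 duplication: {r2_rate:.3f}")
```

In this model the R1 rate reflects both amplification rounds, while the R2 rate reflects (almost) only the second one, so R1 always looks more duplicated than R2.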

This is also why you cannot use the mapping coordinates of R2 to do deduplication, but must use only the identity of the gene the read maps to (along with the CB and UMI).
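A minimal sketch of what that means in practice, assuming reads have already been assigned to genes and are available as simple tuples. The record layout and function names are hypothetical, and real tools also handle things like UMI sequencing errors, which this ignores.

```python
from collections import Counter


def count_molecules_by_gene(records):
    """Count unique (CB, UMI, gene) combinations per (CB, gene).

    records: iterable of (cell_barcode, umi, gene, r2_position) tuples.
    The R2 position is deliberately ignored.
    """
    molecules = {(cb, umi, gene) for cb, umi, gene, _pos in records}
    return Counter((cb, gene) for cb, _umi, gene in molecules)


def count_molecules_by_position(records):
    """Naive alternative that also keys on the R2 position (overcounts)."""
    molecules = set(records)
    return Counter((cb, gene) for cb, _umi, gene, _pos in molecules)


# One original molecule, fragmented at three different positions,
# with one fragment also duplicated by the second amplification.
records = [
    ("AAAC", "TTGCA", "GeneA", 120),
    ("AAAC", "TTGCA", "GeneA", 480),
    ("AAAC", "TTGCA", "GeneA", 480),
    ("AAAC", "TTGCA", "GeneA", 731),
]

print(count_molecules_by_gene(records))
print(count_molecules_by_position(records))
```

Keying on the gene collapses all four reads into one molecule, whereas keying on the R2 position would report three.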