Question

Transferring UMI from paired-end read 2 to header of read 1

3.5 years ago

robert.gnuegge • 0

I would like to deduplicate reads using UMIs. I have paired-end data where read 1 contains the mappable sequence, while read 2 only contains a 12 bp UMI sequence. Is it possible to extract the UMIs from read 2 and add them to the header of read 1 (for deduplication after mapping of read 1)? UMI-tools seems to be great for UMI extraction and deduplication, but I didn't find the option of transferring the extracted UMI from read 2 to the header of read 1. Is there a work-around within UMI-tools or are there alternative tools to conveniently achieve this?

deduplication next-gen umi umitoools • 3.5k views

score 1 · Accepted Answer · 2020-11-09

3.5 years ago

i.sudbery 19k

It's easy to do this in UMI-tools. Simply pass read 2 to UMI-tools as its primary input and pass read 1 as the secondary input, so that the UMI sequence is also added to the read 1 headers:

umi_tools extract -I read2.fastq.gz -S read2_processed.fastq.gz --read2-in=read1.fastq.gz --read2-out=read1_processed.fastq.gz --bc-pattern=NNNNNNNNNNNN

Thanks for the quick answer. So, the UMIs from read 2 would then be added to the headers of both read 1 and read 2? Is that the default behavior of UMI-tools extract for paired-end reads with a UMI in just one of the reads?

3.5 years ago by robert.gnuegge • 0

It'll be on the read headers of both reads. For historical reasons, -S is short for standard out. The input from -I, with the UMI trimmed and added to the read header, is output there. Of course, because read 2 contains only the UMI, those reads will actually be empty. The output from --read2-out will be the reads from --read2-in with the same UMI attached to the header. This is the default behaviour of extract with paired data.
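To make the header rewrite described above concrete, here is a rough shell sketch that mimics it with paste and awk. It is not UMI-tools' implementation: the function name transfer_umi is hypothetical, and it assumes uncompressed, four-line FASTQ records in the same order in both files, a UMI at the start of read 2, and "_" as the separator between read name and UMI.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of UMI-tools): append the first N bases of each
# read 2 record to the name of the matching read 1 record, as "<name>_<UMI>".
# Assumes uncompressed 4-line FASTQ records, in the same order in both files.
transfer_umi() {
    local read1="$1" read2="$2" umi_len="${3:-12}"
    # paste puts line i of both files side by side, separated by a tab.
    paste "$read1" "$read2" | awk -F'\t' -v n="$umi_len" '
        NR % 4 == 1 { name = $1; next }        # remember the read 1 header
        NR % 4 == 2 {
            umi = substr($2, 1, n)             # UMI taken from the read 2 sequence
            sp = index(name, " ")              # keep any description after the name
            if (sp) name = substr(name, 1, sp - 1) "_" umi substr(name, sp)
            else    name = name "_" umi
            print name                         # rewritten read 1 header
            print $1                           # read 1 sequence, unchanged
            next
        }
        { print $1 }                           # "+" line and read 1 qualities
    '
}
```

For gzipped input, something like transfer_umi <(gzip -dc read1.fastq.gz) <(gzip -dc read2.fastq.gz) should work in bash. The umi_tools extract command above remains the supported route; this only shows where the UMI ends up in the header.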

3.5 years ago by i.sudbery 19k

No, only to read 1, based on --read2-out being the only output. You won't need read 2 at that point, since it only contains the UMI?

3.5 years ago by GenoMax 142k

Yes, I won't need read 2 after UMI extraction. I'm just not sure if I understand where the extracted UMIs will end up. With -I read2.fastq.gz -S read2_processed.fastq.gz --bc-pattern=NNNNNNNNNNNN the UMIs should be added to the read headers in read2_processed.fastq.gz, right? But will the same UMIs be added to the corresponding read headers in read1_processed.fastq.gz, too?

3.5 years ago by robert.gnuegge • 0

Considering @Ian Sudbery is the author of umi-tools, his advice is going to be accurate. Why don't you try the command out and see? I don't have an install of umi-tools to see what the -S option does.

3.5 years ago by GenoMax 142k

Why don't you try the command out and see?

Good point! I'm still in the library preparation process, but already want to figure out how to analyze the anticipated data. So, I created a small toy data set and Ian's suggested command works beautifully. The UMI tags are added to the read headers of both read 1 and read 2. Thank you so much for your answers and suggestions.
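For anyone repeating this kind of toy test, a check along the following lines can confirm that both output files carry the same UMI in their read names. This is only a sketch: check_umi_headers is a hypothetical helper, not a UMI-tools command, and it assumes uncompressed four-line FASTQ records, "_" as the name/UMI separator, and a 12-base UMI by default.

```shell
#!/usr/bin/env bash
# Hypothetical check (not a UMI-tools command): succeed only if every record in
# two uncompressed FASTQ files has the same read name, ending in "_" + N-base UMI.
check_umi_headers() {
    paste "$1" "$2" | awk -F'\t' -v n="${3:-12}" '
        NR % 4 == 1 {
            split($1, a, " "); split($2, b, " ")   # read name is the first word
            k = split(a[1], p, "_"); umi = p[k]    # text after the last underscore
            if (a[1] != b[1] || k < 2 || length(umi) != n || umi !~ /^[ACGTN]+$/)
                bad++
        }
        END { exit (bad > 0) }                     # non-zero status on any mismatch
    '
}
```

With the gzipped outputs from the command above, this could be called in bash as check_umi_headers <(gzip -dc read1_processed.fastq.gz) <(gzip -dc read2_processed.fastq.gz) && echo "headers match".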

3.5 years ago by robert.gnuegge • 0