Question

UMI read consensus calling

4.2 years ago

SemiQuant ▴ 80

Hi

I have amplicon sequencing data in which each read carries a UMI (unique molecular identifier) so that I can correct sequencing errors. I have removed the UMIs from the reads and stored them in a tag in the FASTQ files. I then aligned the reads to my reference and would now like to build consensus reads from reads that share a UMI, i.e., that arose from the same DNA molecule. The data are very noisy and the reads contain many indels.

I have tried fgbio, but it cannot handle indels. I have also tried gencore, which is designed for paired-end data but should work on single reads using the UMIs; however, it did nothing to the data, even on the least stringent settings possible. Does anyone know of a tool that can do what I need?
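To make the goal concrete, here is a minimal sketch of the kind of consensus step I have in mind. It is not what fgbio, gencore, or any other tool does; the function names (`project_to_reference`, `umi_consensus`) and the input format, a list of `(umi, pos, cigar, seq)` tuples, are made up for illustration. In practice these fields would come from the BAM file. Each read is projected onto reference coordinates using its CIGAR string, and a majority vote is taken at every reference position within each UMI group. Deletions are voted on as gaps, but inserted bases are simply dropped, which is a real limitation for noisy data.

```python
import re
from collections import Counter, defaultdict

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def project_to_reference(pos, cigar, seq):
    """Map one aligned read onto reference coordinates.

    Returns {ref_position: base}. Deleted (or skipped) reference
    positions get '-'. Inserted and soft-clipped bases are dropped,
    which is a simplification of this sketch.
    """
    bases = {}
    ref, query = pos, 0
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=X":        # consumes read and reference
            for i in range(length):
                bases[ref + i] = seq[query + i]
            ref += length
            query += length
        elif op in "IS":       # consumes read only
            query += length
        elif op in "DN":       # consumes reference only
            for i in range(length):
                bases[ref + i] = "-"
            ref += length
        # H and P consume neither read nor reference
    return bases

def umi_consensus(reads, min_reads=2):
    """Majority-vote consensus per UMI.

    reads: iterable of (umi, pos, cigar, seq) tuples (assumed format).
    Returns {umi: (first_ref_position, consensus_sequence)}.
    UMI groups with fewer than min_reads reads are skipped.
    """
    groups = defaultdict(list)
    for umi, pos, cigar, seq in reads:
        groups[umi].append(project_to_reference(pos, cigar, seq))

    result = {}
    for umi, projections in groups.items():
        if len(projections) < min_reads:
            continue
        votes = defaultdict(Counter)
        for projection in projections:
            for ref_pos, base in projection.items():
                votes[ref_pos][base] += 1
        positions = sorted(votes)
        called = [votes[p].most_common(1)[0][0] for p in positions]
        # Positions where the majority call is a deletion are left out.
        result[umi] = (positions[0], "".join(b for b in called if b != "-"))
    return result

# Toy example: three noisy reads from one molecule plus a singleton UMI.
reads = [
    ("AACGT", 0, "8M", "ACGTACCT"),        # substitution at position 6
    ("AACGT", 0, "3M1D4M", "ACGACGT"),     # spurious deletion at position 3
    ("AACGT", 0, "4M1I4M", "ACGTGACGT"),   # spurious insertion after position 3
    ("GGTCA", 0, "8M", "ACGTACGT"),        # singleton, skipped
]
print(umi_consensus(reads))  # {'AACGT': (0, 'ACGTACGT')}
```

With only three reads per molecule, each error is outvoted by the other two reads. Ties are not handled specially here, and true insertions relative to the reference would be lost, so a real solution would probably need a multiple alignment of the reads within each UMI group.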

next-gen umi unique molecular identifier nanopore • 2.6k views

updated 4.2 years ago by i.sudbery 19k • written 4.2 years ago by SemiQuant ▴ 80

score 0 · Answer 1 · 2020-02-01

4.2 years ago

i.sudbery 19k

You might try looking at Calib (https://academic.oup.com/bioinformatics/article-abstract/35/11/1829/5142725). We have applied for funding to employ someone to implement this in UMI-tools, but as of now it is not implemented.

4.2 years ago by i.sudbery 19k

Thanks, but Calib can only deal with paired-end reads (I wasn't clear in my initial question), so I can't use it without a lot of customization (or maybe I could filter by length and then just split the FASTQs?). I hope you get the funding to implement it in UMI-tools; I'm sure it would be used a lot given the increase in nanopore assays!

4.2 years ago by SemiQuant ▴ 80