Nanopore reads of simple repeats
1
0
5.0 years ago

I'm aware that nanopore has a problem accurately sequencing simple repeats (e.g. atatatata). But what about tetra-, penta-, or hexa-nucleotide repeats? At what repeat length is nanopore (sufficiently) accurate?

nanopore • 1.8k views
ADD COMMENT
2
5.0 years ago

As far as I know (citation needed) nanopore has (only) a problem with homopolymers, less so with anything with a larger repeat motif.

ADD COMMENT
0

I also thought only homopolymers were a problem in nanopore reads. But while doing structural variant calling, I found a huge number of calls in simple repeat regions, which made me suspicious, so I dug a bit deeper. Then I read this in the PacBio whitepaper on human structural variations:

The Oxford Nanopore Technologies (ONT) MinION produces continuous reads that span several kilobases, but due to systematic errors the technology is currently unable to capture simple repeats reliably and suffers from an extremely high false positive rate for deletion calling.

Edit: I've never heard anything about repeats being a problem in any ONT talks I've heard or papers I've read, so this is all news to me.

ADD REPLY
1

Homopolymers are definitely a known issue (the current signal doesn't change while the homopolymer is in the pore, so it's hard to figure out how many nucleotides have passed through). I wouldn't really trust a PacBio whitepaper on ONT performance :-)
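As a toy illustration of the homopolymer point (this is not how any basecaller works, just a simplification): if a run of identical bases produces an essentially constant signal, what is reliably observed is closer to the run-collapsed sequence, and the run lengths have to be inferred separately. The function names below are made up for this sketch.

```python
from itertools import groupby


def collapse_homopolymers(seq):
    """Return seq with every run of identical bases reduced to one base."""
    return "".join(base for base, _ in groupby(seq))


def run_lengths(seq):
    """Return (base, run length) pairs; the lengths are the part that is hard to call."""
    return [(base, len(list(run))) for base, run in groupby(seq)]


# Two sequences that differ only in homopolymer length...
a = "ACGGGGT"
b = "ACGGGGGGT"

# ...look identical once the runs are collapsed.
print(collapse_homopolymers(a) == collapse_homopolymers(b))
print(run_lengths(a))
print(run_lengths(b))
```

A dinucleotide repeat such as atatatata has no run longer than one base, so collapsing leaves it unchanged; whatever makes such repeats hard is a different effect from the homopolymer one.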

That said, it is also a known feature that repeats are highly polymorphic. Some of the calls in those repeat regions may well be real.

ADD REPLY
0

So there's clearly a conflict of interest when PacBio says ONT sucks! But I doubt they would outright lie (??)

Yes, repeats being highly polymorphic is a valid point and I did think about that, but the problem with my alignments is that they are all of different lengths. If they come from a clonal population, we should expect them to be of (somewhat) equal length?

See example: https://imgur.com/a/qHMR130

Edit: This is the NA12878 dataset from Miten Jain paper.

ADD REPLY
1

That looks roughly like what I would expect. Note that part of the noisiness is also because of the aligner shifting the deletion a bit more to the left or to the right.

I think things are slightly better using more recent base callers, i.e. Guppy Flipflop (I suspect your screenshot is from an older Guppy). If you give me the coordinates and the genome build of that repeat I can take a look in my data and show you a screenshot.

For what it's worth, in the context of SV calling I would remove everything that is smaller than 30 bp. Small indels aren't exactly what I'd use long read sequencing for.
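A minimal sketch of that size filter, assuming the SV caller writes the variant length as an `SVLEN` key in the VCF INFO column (check your caller's output, not all of them do). The function names and the choice to keep records without `SVLEN` (e.g. breakends) are assumptions of this example, not part of any tool.

```python
def sv_length(info):
    """Return the absolute SVLEN from a VCF INFO string, or None if absent."""
    for field in info.split(";"):
        if field.startswith("SVLEN="):
            # SVLEN can be negative for deletions and may be comma-separated.
            return abs(int(field.split("=", 1)[1].split(",")[0]))
    return None


def filter_small_svs(lines, min_len=30):
    """Yield header lines and records whose |SVLEN| is at least min_len."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        length = sv_length(cols[7]) if len(cols) > 7 else None
        # Records without SVLEN are kept rather than silently dropped.
        if length is None or length >= min_len:
            yield line


if __name__ == "__main__":
    import sys
    # Usage: python filter_svs.py < calls.vcf > calls.min30.vcf
    sys.stdout.writelines(filter_small_svs(sys.stdin))
```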

ADD REPLY
0

Yeah, this was basecalled using Albacore from way back in 2018. The shifting left or right would make sense if they were perfect repeats, but there are some slight variations in the repeat motifs, so if basecalling were not a problem, the breakpoints should align almost exactly like so: https://imgur.com/a/9mRt94E

I think it means tatataaataaa may be mistaken for tataataataa. These reads are long enough to be anchored on both sides to non-repeat regions, so this isn't an issue of the aligner not knowing where to place the reads either.
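To make the perfect-versus-imperfect repeat argument concrete, here is a small sketch (toy sequences and hypothetical function names, not taken from the actual locus) that lists every start position at which a deletion of a given length produces the same resulting sequence. The more positions there are, the more freedom an aligner has to shift the deletion left or right.

```python
def apply_deletion(ref, start, length):
    """Return ref with `length` bases removed beginning at 0-based `start`."""
    return ref[:start] + ref[start + length:]


def equivalent_deletion_starts(ref, start, length):
    """List all start positions whose deletion yields the same sequence."""
    target = apply_deletion(ref, start, length)
    return [
        s
        for s in range(len(ref) - length + 1)
        if apply_deletion(ref, s, length) == target
    ]


# Perfect (TA)5 repeat flanked by unique sequence.
perfect = "GC" + "TA" * 5 + "GC"
print(equivalent_deletion_starts(perfect, 2, 2))

# Non-repetitive sequence: the same kind of deletion has one placement.
unique = "ACGTTGCA"
print(equivalent_deletion_starts(unique, 2, 2))
```

In the perfect repeat the 2 bp deletion can sit at any of nine positions with an identical result, while in the non-repetitive sequence only one placement works. Motif variations inside a repeat reduce that freedom, which is why scattered breakpoints in an imperfect repeat point to sequence errors in the reads rather than to alignment ambiguity alone.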

I'd truly appreciate it if you could! hg38 chr5:52,029,030-52,029,135

ADD REPLY
1

This is what it looks like in our NA19240 PromethION data (Guppy Flipflop): https://imgur.com/ck60LZS (the data is available at https://www.ebi.ac.uk/ena/data/view/SAMEA5418551 )

ADD REPLY
1

That does look better. So it seems repeats shouldn't be a problem with the new basecaller.

Thanks Wouter, you truly are the patron saint of Nanopore! I'm a big fan of NanoPlot by the way.

ADD REPLY
