Question

Read Length Needed to Resolve DNA Fragments Containing Alu Elements in Vertebrate Genomes?


11.1 years ago

14134125465346445 ★ 3.6k

What is the minimum read length needed to uniquely resolve most DNA fragments that contain a repetitive region, such as an Alu element, in vertebrate genomes like the human genome?

According to the Alu element page on Wikipedia:

The Alu family is a family of repetitive elements in the human genome. Modern Alu elements are about 300 base pairs long and are therefore classified as short interspersed elements (SINEs) among the class of repetitive DNA elements.

Would a 300 bp short read be enough to map reads containing an Alu uniquely to their correct genomic location? Intuitively, reads longer than 300 bp would be needed, so that each read carries enough flanking sequence on both the 5' and 3' sides of a 300 bp Alu element to place it uniquely in the genome. Does paired-end sequencing change that minimum read length at all?
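The arithmetic behind that intuition is simple enough to sketch. This assumes a single read must cover the whole Alu plus `anchor` bases of unique flank on each side; the 20 bp anchor below is a hypothetical value chosen for illustration, not an established threshold.

```python
def required_read_length(alu_len: int, anchor: int) -> int:
    """Minimum single-read length that spans an entire Alu element and
    still carries `anchor` bases of (assumed unique) flanking sequence
    on both the 5' and 3' sides."""
    return alu_len + 2 * anchor


ALU_LEN = 300  # approximate length of a modern Alu element
ANCHOR = 20    # hypothetical flank length assumed sufficient for unique placement

needed = required_read_length(ALU_LEN, ANCHOR)
print(f"needed read length: {needed} bp")          # needed read length: 340 bp
print(f"300 bp read sufficient: {300 >= needed}")  # 300 bp read sufficient: False
```

Whatever anchor length is assumed, a single read that must contain the entire element is always longer than the element itself.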

Illumina's MiSeq now generates 2x250 bp reads and could reach 2x400 bp soon. Ion Torrent's PGM reaches 400 bp for a fraction of its reads but does not seem able to produce 400 bp paired-end reads.




Just a note to add that it's not only read length that matters, but also depth and quality. Since Alus from the same subfamily may differ by only one or a few base pairs, you have to be confident that you're getting reliable base calls at those positions.

Chris Miller (22k) · 11.1 years ago
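To put a rough number on "reliable information at those positions": by the standard Phred definition, a base with quality Q has error probability 10^(-Q/10). The sketch below assumes errors at different positions are independent. It computes the chance that every diagnostic position separating two Alu copies is called correctly. The quality values are hypothetical.

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability for a Phred quality score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def prob_all_correct(qualities: list[float]) -> float:
    """Probability that every listed base is called correctly,
    assuming errors at different positions are independent."""
    return math.prod(1 - phred_to_error(q) for q in qualities)


# Hypothetical read: two diagnostic positions separate it from a sister Alu copy.
diagnostic_quals = [30, 30]
print(prob_all_correct(diagnostic_quals))
```

With only one or two diagnostic bases, a single miscalled base can make a read look like a different copy. Higher depth helps because independent reads rarely share the same error.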

score 1 · Answer 1 · 2013-03-24

The question is a little ambiguous: any given Alu will, by definition, map to multiple locations, so there is no single correct location.

I am guessing that your question is really about a DNA fragment that contains a repetitive region, and how much of that fragment must be unique to place it correctly. That in turn depends on just how repetitive the sequence is, but the flanking sequence shouldn't need to be long: just a few bases would anchor the fragment at the right location. Note that this only anchors the borders, but most of the time that's all one needs; after all, the middle is repetitive and therefore already known.
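How few is "a few bases"? One rough estimate assumes the genome is uniformly random sequence. Real genomes are not, and sequence next to a repeat is often itself repetitive, so treat the result as a lower bound. Under that assumption, a specific k-mer is expected to occur by chance about G / 4^k times in a genome of G bases, or 2G / 4^k counting both strands. The sketch finds the smallest k for which that expectation drops below one. The 3.1 Gb human genome size used here is an approximate figure.

```python
def expected_chance_hits(genome_size: float, k: int, both_strands: bool = False) -> float:
    """Approximate expected random occurrences of one specific k-mer
    in a uniformly random genome of `genome_size` bases."""
    positions = genome_size * (2 if both_strands else 1)
    return positions / 4 ** k


def min_unique_k(genome_size: float, both_strands: bool = False) -> int:
    """Smallest k whose expected chance occurrences fall below 1."""
    k = 1
    while expected_chance_hits(genome_size, k, both_strands) >= 1:
        k += 1
    return k


HUMAN_GENOME = 3.1e9  # approximate haploid human genome size in bases

print(min_unique_k(HUMAN_GENOME))                     # 16
print(min_unique_k(HUMAN_GENOME, both_strands=True))  # 17
```

Under these idealized assumptions, roughly 16 to 17 unique bases on a side would already be expected to place a border. Real flanking sequence is not random, so more may be needed in practice.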

If you are interested in identifying DNA fragments that contain an entire Alu, then all you need is paired-end sequencing with insert sizes longer than the Alu.
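Here is a minimal sketch of that check for a single read pair. It assumes reference coordinates are known and the sequence outside the Alu is unique. It also assumes each mate needs at least `min_anchor` bases outside the Alu to map uniquely. The function names, coordinates, and the 20 bp threshold are hypothetical.

```python
def bases_outside(read_start: int, read_end: int, alu_start: int, alu_end: int) -> int:
    """Number of bases of the half-open read interval [read_start, read_end)
    that fall outside the half-open Alu interval [alu_start, alu_end)."""
    overlap = max(0, min(read_end, alu_end) - max(read_start, alu_start))
    return (read_end - read_start) - overlap


def pair_spans_alu(frag_start: int, insert_size: int, read_len: int,
                   alu_start: int, alu_end: int, min_anchor: int = 20) -> bool:
    """True if the fragment [frag_start, frag_start + insert_size) contains the
    whole Alu and both mates carry at least `min_anchor` non-Alu bases.
    Assumes insert_size >= read_len."""
    frag_end = frag_start + insert_size
    if frag_start > alu_start or frag_end < alu_end:
        return False  # fragment does not contain the entire Alu
    mate1 = (frag_start, frag_start + read_len)  # mate read from the 5' end
    mate2 = (frag_end - read_len, frag_end)      # mate read from the 3' end
    return all(bases_outside(s, e, alu_start, alu_end) >= min_anchor
               for s, e in (mate1, mate2))


# Hypothetical 300 bp Alu at reference positions [1000, 1300), 2x250 bp reads.
print(pair_spans_alu(800, 700, 250, 1000, 1300))   # True: both mates anchored
print(pair_spans_alu(1000, 500, 250, 1000, 1300))  # False: mate 1 lies inside the Alu
print(pair_spans_alu(990, 280, 250, 1000, 1300))   # False: fragment ends inside the Alu
```

The second case shows that an insert longer than the Alu is necessary but not always sufficient. Each mate must also reach past the Alu boundary into unique sequence.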