NGS: How Many Reads Should Be Expected To Map To The Reference Sequence?
14.0 years ago
Allpowerde ★ 1.3k

I was wondering how many reads can usually be expected to map to the reference sequence. For next-gen sequencing analysis, choosing the right parameters (for QC and mapping) is quite empirical, and the choice hugely influences the outcome.

I am trying to map 1 million paired-end Illumina reads to a specific region of the genome. I used the FASTX-Toolkit for QC and mapped with SOAP2. After QC I have about 140,000 reads left, of which only 200 pairs map to the reference sequence. This cannot be a normal outcome, can it?

(I know that I'm aligning to the right reference sequence because a quick de novo assembly of these reads maps exclusively to the region's locus when I run a BLAT search of the assembled sequence against the genome.)

Any suggestions (e.g. should I play around with the mapping parameters to get more reads to map)?

next-gen-sequencing short-read-aligner • 4.3k views
14.0 years ago

Seeing so few reads match indicates problems that may not be solvable by parameter tuning.

Possible explanations include:

  1. The reads include extra sequence (for example, indices used for multiplexing, or other adapters at the beginning or end).
  2. The reference genome is incorrect. You suggest that you checked this, but you could have just found a homologous region.
  3. The software or its installation is failing. Try a different aligner without filtering for quality first and see what happens. Do the same for the filtering step.
  4. The control lane was not set properly, or the sequences do not have a random base distribution over each index. The Illumina base caller requires this to function properly.
  5. The library preparation or sequencing has failed (reagents etc.). That is also an option, but I would keep it for last.
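The first explanation can be checked quickly before touching the aligner. Below is a minimal sketch in Python that counts how many reads contain the start of an adapter. The helper names and the `min_overlap` parameter are made up for illustration, the input is assumed to be plain 4-line FASTQ records, and the adapter string is only a placeholder that should be replaced with the one from your library prep.

```python
def read_sequences(fastq_lines):
    """Yield the sequence line of each record, assuming unwrapped 4-line FASTQ."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:
            yield line.strip()


def adapter_fraction(sequences, adapter, min_overlap=12):
    """Return the fraction of reads containing the first min_overlap adapter bases."""
    seed = adapter[:min_overlap]
    total = 0
    hits = 0
    for seq in sequences:
        total += 1
        if seed in seq:
            hits += 1
    return hits / total if total else 0.0


# Toy records for illustration only; not real data.
toy_fastq = [
    "@read1", "ACGTACGTAGATCGGAAGAGCACAC", "+", "IIIIIIIIIIIIIIIIIIIIIIIII",
    "@read2", "ACGTACGTACGTACGTACGT", "+", "IIIIIIIIIIIIIIIIIIII",
]
# Placeholder adapter prefix (an assumption); use the one for your kit.
ADAPTER = "AGATCGGAAGAGC"

fraction = adapter_fraction(read_sequences(toy_fastq), ADAPTER)
print(f"reads with adapter seed: {fraction:.0%}")
```

If a large fraction of reads carries adapter or index sequence, trimming it before mapping is worth trying before anything else.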

We usually see between 20% and 60% of reads match the genome.
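For comparison, here is the arithmetic on the figures from the question. This is only a sketch, and it assumes the 140,000 figure counts individual reads, so that 200 mapped pairs equal 400 mapped reads.

```python
total_reads = 1_000_000    # reads before QC (from the question)
reads_after_qc = 140_000   # reads left after QC (from the question)
mapped_pairs = 200         # pairs that mapped (from the question)

# Share of reads discarded by the QC step.
qc_rejected_fraction = 1 - reads_after_qc / total_reads

# Assumption: reads_after_qc counts single reads, so each pair is two reads.
mapped_reads = 2 * mapped_pairs
mapped_fraction = mapped_reads / reads_after_qc

print(f"rejected by QC:  {qc_rejected_fraction:.1%}")
print(f"mapped after QC: {mapped_fraction:.2%}")
```

Under that assumption, well under 1% of the filtered reads map, which is far below the range above.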

12.2 years ago

I think the most common problem is that you have Phred+64 quality scores and didn't tell your aligner.

If that's not it, please show us one read you think should have mapped but didn't.
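A rough way to check the quality encoding is to look at the range of characters in the quality lines. The sketch below is a heuristic, not a definitive test: the function name is made up, and the cut-offs assume that characters below `;` (ASCII 59) only occur with a +33 offset and characters above `J` (ASCII 74) only with a +64 offset, which holds for typical Illumina data but not for every platform.

```python
def guess_phred_offset(quality_strings):
    """Guess the quality offset (33 or 64) from FASTQ quality lines.

    Returns None when the observed characters fit both encodings.
    """
    codes = [ord(ch) for qual in quality_strings for ch in qual]
    if not codes:
        return None
    if min(codes) < 59:    # too low for any +64 encoding
        return 33
    if max(codes) > 74:    # too high for typical +33 Illumina data (assumption)
        return 64
    return None            # ambiguous: sample more reads


# Toy quality strings for illustration only.
print(guess_phred_offset(["IIII#5"]))    # contains '#', so +33
print(guess_phred_offset(["hhhhBBBB"]))  # contains 'h', so +64
```

Run it over a few thousand quality lines rather than a handful, since a small sample can easily fall in the ambiguous range.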

12.2 years ago
Darked89 4.6k

Something is wrong if your QC rejects 86% of the reads in the first place. For mapping, quality filtering does not make much sense, since you can map 96 bp Illumina reads that have a string of Bs as quality values with 0-2 mismatches. Check how your reads look quality-wise in FastQC.
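To see how much of the data carries such runs of Bs, something like the sketch below can help. It assumes a Phred+64 encoding in which `B` marks low-confidence bases at the end of a read; the function names are hypothetical and the quality strings are toy values.

```python
def trailing_b_length(quality):
    """Length of the run of 'B' characters at the end of a quality string."""
    return len(quality) - len(quality.rstrip("B"))


def all_b_fraction(quality_strings):
    """Fraction of reads whose quality string consists only of 'B'."""
    quals = list(quality_strings)
    if not quals:
        return 0.0
    all_b = sum(1 for q in quals if q and trailing_b_length(q) == len(q))
    return all_b / len(quals)


# Toy quality strings for illustration only.
toy_quals = ["hhhhhhBBBB", "BBBBBBBBBB", "hhhhhhhhhh", "hhBBBBBBBB"]
print([trailing_b_length(q) for q in toy_quals])
print(f"reads that are all B: {all_b_fraction(toy_quals):.0%}")
```

If a quality filter is discarding reads mainly because of these tails, trimming the tails or skipping the filter before mapping may keep far more reads.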

Also, SOAP2 is not that great at mapping compared to other programs. Check whether BWA gives you similar figures.
