What Is The Default Quality Encoding Expected By BWA?
12.0 years ago
Panos ★ 1.8k

What quality encoding does BWA expect in its input reads by default? Is it Sanger, Solexa, Illumina 1.3+, Illumina 1.5+ or Illumina 1.8+ (as per the "Encoding" section of the Wikipedia article on the FASTQ format)? Also, is it true that BWA doesn't really use the quality values for finding matches? If so, what is the usefulness of the "-I" parameter in bwa aln? How are the quality values used by BWA?

What if I have reads generated by the new Illumina 1.8 pipeline? Should I convert the qualities before feeding them to BWA? I'm asking because I saw that the quality range in 1.8 differs significantly from that of both 1.3 and 1.5.

bwa illumina • 5.2k views
12.0 years ago

Every tool has standardized on the Sanger encoding.

That being said, the quality scores are extremely rough estimates that do not really reflect the actual probabilities they supposedly stand for. In that light, whether or not they are off a bit does not really matter. As you note, most tools do not make use of the quality scores during alignment, thankfully so, since that might lead to a lot of confusion and would interfere with interpreting the alignments.

The only potential problem that you might run into is that some tools cannot deal with the variable ranges.

10.4 years ago

I am also having the same problem with all the messy Illumina formats. In summary, I think that:

  • bwa by default expects the Sanger format (Phred+33).
  • the -I option is needed to read the Illumina 1.3+ and 1.5+ formats (Phred+64).
  • the Illumina 1.8 format uses the same Phred+33 offset as Sanger, so you don't need the -I option for that.
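
If you are not sure which encoding a file uses, you can usually guess it from the range of characters in the quality lines. The following is only a rough sketch: the function name `guess_phred_offset` and the cutoffs are my own assumptions, based on the ASCII ranges in the Wikipedia FASTQ table (characters below ';' never occur in the +64 formats, and Illumina 1.8 data rarely goes above 'J').

```python
def guess_phred_offset(quality_strings):
    """Guess the ASCII offset (33 or 64) from FASTQ quality strings.

    Heuristic only: returns None when the characters seen are
    compatible with both offsets or when there is no input.
    """
    codes = [ord(c) for q in quality_strings for c in q]
    if not codes:
        return None
    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        # Below ';' (59): impossible for Solexa/Illumina +64 encodings.
        return 33
    if highest > 74:
        # Above 'J' (74): unusual for Phred+33 Illumina data, so assume +64.
        return 64
    return None  # ambiguous range; look at more reads


# Qualities taken from a few reads (every 4th line of a FASTQ file).
print(guess_phred_offset(["IIIIHHG#", "IIIIIII5"]))  # offset 33: no -I needed
print(guess_phred_offset(["hhhhggfB", "hhhhhhhc"]))  # offset 64: use -I
```

Feed it the quality lines of the first few thousand reads; a handful of reads may not cover enough of the range to decide.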

I've updated the FASTQ Wikipedia page with some sed scripts to convert Illumina 1.8 to 1.3 and vice versa, but in principle you don't need to use them.
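
For reference, the conversion itself is just a shift of every quality character by 31 (64 - 33). Here is a minimal Python sketch of the same idea; the function names are hypothetical and it only handles a single quality string, not a whole FASTQ file.

```python
OFFSET_SHIFT = 64 - 33  # difference between the Illumina 1.3+ and Sanger offsets


def phred64_to_phred33(qual):
    """Convert an Illumina 1.3+/1.5+ quality string to Sanger/Illumina 1.8+."""
    if any(ord(c) < 64 for c in qual):
        raise ValueError("quality string does not look like Phred+64")
    return "".join(chr(ord(c) - OFFSET_SHIFT) for c in qual)


def phred33_to_phred64(qual):
    """Convert a Sanger/Illumina 1.8+ quality string to Illumina 1.3+."""
    if any(ord(c) + OFFSET_SHIFT > 126 for c in qual):
        raise ValueError("quality too high to represent in Phred+64")
    return "".join(chr(ord(c) + OFFSET_SHIFT) for c in qual)


print(phred64_to_phred33("hhB"))  # II#
print(phred33_to_phred64("II#"))  # hhB
```

The range checks are there so that a string in the wrong encoding fails loudly instead of being shifted into garbage.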

What happens if you run bwa aln on an Illumina 1.8 dataset using the -I option? Unfortunately I don't know yet, but I think you will need to run bwa aln again.


For what happens if you incorrectly set -I, see "Seeing unexpected characters (^D,^Q) in the QUAL field of a SAM file".
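
To see where characters like ^D and ^Q would come from, here is a small sketch. It assumes that -I simply shifts every quality character down by 31 (64 - 33), which is my reading of the behaviour and not something confirmed in this thread; the helper name is hypothetical.

```python
def shift_like_dash_I(qual):
    """Subtract 31 from each character, as -I is assumed to do,
    and show control characters in caret notation (^D, ^Q, ...)."""
    shown = []
    for c in qual:
        code = ord(c) - 31
        if code < 32:
            # Non-printable control character: display as ^ plus a letter.
            shown.append("^" + chr(code + 64))
        else:
            shown.append(chr(code))
    return "".join(shown)


# Phred+64 input, as -I expects: the result is valid Phred+33.
print(shift_like_dash_I("hhB"))  # II#
# Phred+33 input wrongly run with -I: low qualities become control characters.
print(shift_like_dash_I("#0"))   # ^D^Q
```

Under that assumption, any Sanger quality below '?' (Q30) ends up as an unprintable character in the QUAL field.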


Hehe, nice pointer. We have an answer for everything.
