Converting Illumina fastq quality scores to phred
3
0
Entering edit mode
7.4 years ago
L. A. Liggett ▴ 120

I have been trying to understand how to calculate probabilities of correct calls in my Illumina dna sequencing results coming off of both the nextseq and the hiseq 4000. My understanding is that these can use different formats like illumina 1.7 or 1.8 and that the quality scores will translate to a phred score differently.

So, is there a simple ascii conversion that can be used to convert the illumina quality scores into a phred score which can then be used with a formula like Q=-10log10P to compute the probability that the base call is correct?

sequencing • 12k views
ADD COMMENT
0
Entering edit mode

Just so you know, you still have to know which version was used, else you'll be doing +64 to everything not +33. Perhaps the nextseq/hiseq 4000 are such new machines they never even came with anything <1.8

ADD REPLY
1
Entering edit mode

They are - Illumina's software for those platforms is ASCII-33 exclusively.

ADD REPLY
0
Entering edit mode

Ah awesome :)

ADD REPLY
3
Entering edit mode
7.4 years ago
L. A. Liggett ▴ 120

I came across the answer to this in the biostar handbook, it is super easy to calculate using the following code:

python -c 'print ord("A")-33'

And this can be easily converted to the probability of correct call as well:

python -c 'from math import*; print 10**-((ord("A")-33)/10.0)'

ADD COMMENT
0
Entering edit mode

That second line shouldn't be importing math for no reason - i will fix it

ADD REPLY
1
Entering edit mode
7.4 years ago

See the wikipedia page, noting that everything is phred+33 these days.

ADD COMMENT
0
Entering edit mode

I actually read through this and looked up an ASCII conversion table, but I still don't understand how to do the conversion. Could you explain?

ADD REPLY
1
Entering edit mode

Subtract 33 from each of the quality scores to get the Phred score. For example, 'A' is ASCII letter 65. 'A'-33 = 65-33 = 32. So a quality score of 'A' means Phred 32, or slightly better than 99.9%.

ADD REPLY
0
Entering edit mode
4.2 years ago
zubenel ▴ 120

Assuming phred+33 another easy solution is to use Perl oneliner like this:

perl -E 'say ord("A")-33'

ADD COMMENT

Login before adding your answer.

Traffic: 1695 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6