Biostar Beta. Not for public use.
BBDuk filters out good reads
15 months ago
dt • 30

Hi,

I seem to be missing something obvious. I have a read (one example out of many) that, judging by its average quality, shouldn't be filtered out by bbduk.sh, yet it is.

Read:

@NZ_AP014881.1_0_0/2
TGCAGCATTCTCCTGATGGCGGTCTTGATGAAGAGCTTTGTTACGGGGGTCATCCTCATCCATCAGGTCTGGTGCAGAAATAAAGCGCAAAGGCTTGGGGTTACCGCCTGCGCGCATGTCTGCCAGCATATCCTCTAGCGCGGCAGGCTCTGGGCAATTAATCTCAATCTGCTGACGGTCAGACTTTGGCAAATTGAGCAGGCGGTTGCGGGCCGACGTATCCAAAAGGCGATTGCACCAGCGCTGAACACGATATCCCGGACGATCTGGTAGCTGTTCTTCTTCCAGTTCCTCACGCAGA
+
<CCCCEGG6GGGGF,GEGGGC,FFGFFFCEGGCGGGGG@GGFCGGC<FCGGGGEFGFGGF@GCG,GGG@GGGGGG<GEGFGFGGGEGGFAEEC<GGGGFFFEECD<EFGGF8<G5GGGCDBGFFGEG<GCGFECG+FCGF,CC=,*DF5=,9<7C4E:EF,,=B@CGF>:GGECGG;;8G>C1,:,C:+E,9,FF<@,*6:,793G,4+13G*7*;3*=@6C7/85+C59C+<>***2C)*1*/**))<+<2)+**)4/)A)>)1+2)**51065.:091>1***0*)*).0+*(*2*90.

Command:

bbduk.sh in1=test.fq out1=test_out.fq maq=20

BBDuk version is 38.46. The average quality of the read seems to be ~27. I hope someone can help, thanks in advance.


Could it be that the reads underwent some trimming, causing them to fall below the maq threshold?

minavgquality=0 (maq) Reads with average quality (after trimming) below this will be discarded.


Not really, I provided the exact read and the exact command to reproduce the problem. Can you reproduce it?

15 months ago
dt • 30

OK, I believe I figured it out. It's explained in https://github.com/wdecoster/NanoPlot/issues/57:

BBDuk calculates average quality score by converting to probability scale, taking an average, and then converting back to Phred scale. So for example, a 2bp read with quality scores 10 and 20 would yield an average quality of (0.9+0.99)/2=0.945 -> Q12.6 rather than Q15 with a linear average.

Essentially, this means that, looking e.g. at the seqtk fqchk output, BBDuk uses the value reported in the errQ field rather than the one in the avgQ field. I believe this can be confusing and should be mentioned in the BBDuk documentation. If someone knows the developer, could you let him know? Thanks.
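
To make the difference concrete, here is a minimal Python sketch of the two ways of averaging described above. It is not BBDuk's actual code; the function names are made up for illustration, and the Phred+33 offset in `decode_quality` is an assumption about the FASTQ encoding.

```python
import math

def decode_quality(qual_string, offset=33):
    """Turn a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in qual_string]

def linear_mean_q(quals):
    """Plain arithmetic mean of the Phred scores."""
    return sum(quals) / len(quals)

def error_mean_q(quals):
    """Average on the probability scale, then convert back to Phred."""
    mean_error = sum(10 ** (-q / 10) for q in quals) / len(quals)
    return -10 * math.log10(mean_error)

# The 2 bp example from the quote: quality scores 10 and 20.
quals = [10, 20]
print(round(linear_mean_q(quals), 1))  # 15.0
print(round(error_mean_q(quals), 1))   # 12.6
```

Running `decode_quality` on the quality line of the read and passing the result to both functions should show whether the probability-scale average drops below the maq=20 threshold while the linear one stays above it.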


I believe, this can be confusing

This is not confusing at all. Please make yourself familiar with the mathematics of Q values.

Q values represent a logarithmic transformation of the error rates. Logarithmic transformations are often used in science and engineering, but they have some pitfalls, especially when it comes to calculating the "mean". Why do you want to calculate the arithmetic mean of Q values? The arithmetic mean of Q values is equivalent to the geometric mean of the error rates, which is most likely not what you want.
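
The equivalence can be checked numerically. The following is only an illustrative Python sketch with made-up Q values and hypothetical function names; it assumes the usual definition Q = -10 * log10(p).

```python
import math

def q_to_error(q):
    """Phred score -> error probability."""
    return 10 ** (-q / 10)

def error_to_q(p):
    """Error probability -> Phred score."""
    return -10 * math.log10(p)

def geometric_mean(values):
    """Geometric mean computed via the mean of the logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

quals = [10, 20, 30]  # illustrative scores, not taken from the read above
errors = [q_to_error(q) for q in quals]

arithmetic_mean_q = sum(quals) / len(quals)
q_of_geometric_error = error_to_q(geometric_mean(errors))
q_of_arithmetic_error = error_to_q(sum(errors) / len(errors))

# The arithmetic mean of Q matches the Q of the geometric mean error rate.
print(math.isclose(arithmetic_mean_q, q_of_geometric_error))  # True
# The arithmetic mean of the error rates gives a lower Q, because the
# worst bases dominate the average.
print(q_of_arithmetic_error < arithmetic_mean_q)  # True
```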


Nice, and that indeed probably explains it.

For suggestions and comments on the BBTools package, you can find a link on their webpage: https://jgi.doe.gov/data-and-tools/bbtools/bbtools-faq-support-forums/
