Question

Number of bases with a certain quality in FASTQ file

0

Entering edit mode

5.8 years ago

Denis ▴ 310

Hi, I'm wondering is there a convenient way to count quantity of bases in FASTQ file with Phred quality score equal or higher then 30? Thanks!

next-gen sequencing • 3.1k views

ADD COMMENT • link updated 5.8 years ago by h.mon 35k • written 5.8 years ago by Denis ▴ 310

0

Entering edit mode

FastQC provides a graphical quality per base position summary, and if you dig into its output files, you will find mean, median, lower quartile, upper quartile, 10th percentile and 90th percentile quality per base position.

Why do you need total count?

ADD REPLY • link 5.8 years ago by h.mon 35k

0

Entering edit mode

We often use the number of bases with quality score >=30 as a quick assay of data quality. Besides, our sequencing facilities commonly use that statistic.

ADD REPLY • link 5.8 years ago by Denis ▴ 310

1

Entering edit mode

5.8 years ago

h.mon 35k

reformat.sh from BBTools / BBMap package can also do this:

reformat.sh in=file.fq.gz qchist=file.qchist.txt

For paired end files

reformat.sh in1=file1.fq.gz in2=file2.fq.gz qchist=file.qchist.txt

ADD COMMENT • link 5.8 years ago by h.mon 35k

score 5 · Accepted Answer · 2018-07-18

5

Entering edit mode

5.8 years ago

Pierre Lindenbaum 161k

 gunzip -c  input.fastq.gz | paste - - - - | cut -f 4 | fold -w 1 | awk '($1>="?")' | wc -l

gunzip -c input.fastq.gz decompress input
paste - - - - linearize 4 lines
cut -f 4 get the quality column
fold -w 1 fold to one column
awk '($1>="?")' select strings greater than "?" (ASCII code=63 = 33(base fastq)+30.
wc -l count

ADD COMMENT • link 5.8 years ago by Pierre Lindenbaum 161k

0

Entering edit mode

Thanks! But from where awk may know that for example "=" is less than "?" ? How can it compare just two characters?

ADD REPLY • link 5.8 years ago by Denis ▴ 310

1

Entering edit mode

Based on ASCII code decimal values. It is comparing value contained in $1 (which would be individual Q score) for being more than or equal to to ?.

ADD REPLY • link 5.8 years ago by GenoMax 141k

0

Entering edit mode

Hi genomax! Thanks a lot! Now it's clear.

ADD REPLY • link 5.8 years ago by Denis ▴ 310

0

Entering edit mode

Should i add something like this tr -d '\n' somewhere in the pipe to remove new line characters? echo \n | awk '($0>="?")' results in n

ADD REPLY • link 5.8 years ago by Denis ▴ 310