Biostar Beta. Not for public use.
Guessing The Quality Scale In Fastq Files
12 months ago
Manuel • 370
Germany

Is there an easy way to guess the scale, given a sufficiently large FASTQ file?

The best would be some working code that I could learn from. However, both BioPerl and BioPython appear not to contain guessing code.

fastq quality • 8.8k views
12 months ago
brentp 23k
Salt Lake City, UT

Have you read the BioPython code here? It's the best explanation of the quality scores I've seen.

There's also a nice text graphic about two-thirds of the way down the Wikipedia page.
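For reference, a minimal sketch of the two score definitions behind those encodings: Phred is Q = -10·log10(p) and Solexa is Q = -10·log10(p/(1-p)), where p is the base-call error probability. The function names below are illustrative, not taken from BioPython or FastQC.

```python
import math


def phred_from_error(p):
    """Phred score for error probability p: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


def solexa_from_error(p):
    """Solexa score for error probability p: Q = -10 * log10(p / (1 - p))."""
    return -10 * math.log10(p / (1 - p))


def phred_from_solexa(q_solexa):
    """Map a Solexa score onto the Phred scale: 10 * log10(10**(Q/10) + 1)."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)
```

The two scales agree closely at high quality (p = 0.001 gives about 30 on both) but diverge at low quality (p = 0.5 is about Phred 3 but Solexa 0), which is why a Solexa file cannot simply be re-offset to Phred.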

Finally, FastQC guesses the encoding of your quality scores, so you could look at its Java code.


Thanks. BioPython does not have guessing code, though, right? FastQC just looks at the lowest quality character seen. I guess that's the most promising approach, then, maybe augmented by checking an upper limit too.
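A minimal sketch of that "lowest character plus upper limit" idea, assuming unwrapped 4-line FASTQ records. The ASCII cut-offs (59 for Solexa+64, 64 for Phred+64, 74 = 'J' as the top of typical Illumina 1.8+ Phred+33) are assumptions based on the commonly cited ranges; newer instruments or other platforms can exceed 'J'. The function names are hypothetical.

```python
from itertools import islice


def quality_strings(lines, max_records=None):
    """Yield the quality line of each 4-line FASTQ record.

    Assumes unwrapped FASTQ: header, sequence, '+', quality per record.
    """
    # Quality lines sit at 0-based line indices 3, 7, 11, ...
    for qual in islice(islice(lines, 3, None, 4), max_records):
        yield qual.rstrip("\r\n")


def guess_encoding(quals):
    """Heuristic guess from the lowest and highest ASCII codes seen."""
    lo = hi = None
    for qual in quals:
        if not qual:
            continue
        q_lo, q_hi = ord(min(qual)), ord(max(qual))
        lo = q_lo if lo is None else min(lo, q_lo)
        hi = q_hi if hi is None else max(hi, q_hi)
    if lo is None:
        raise ValueError("no quality characters seen")
    if lo < 59:
        return "phred+33"   # Sanger / Illumina 1.8+
    if lo < 64:
        return "solexa+64"  # Solexa / early Illumina
    if hi > 74:
        return "phred+64"   # above 'J': past the typical Phred+33 Illumina range
    return "ambiguous"      # all codes in 64..74: high-quality +33 or very low +64
```

Usage would be along the lines of opening a (hypothetical) `reads.fastq` and calling `guess_encoding(quality_strings(fh, max_records=100000))`; sampling a prefix keeps it fast but makes it even more dependent on low-quality bases appearing early.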

2.9 years ago
Stockholm

Here is a Perl script for guessing the quality scale

https://www.uppnex.uu.se/content/check-fastq-quality-score-format


Here is the new link for this Perl tool: http://www.uppmax.uu.se/userscript/check-fastq-quality-score-format

It has been improved recently.

Update: you can find it in this repository under the name fastq_guessMyFormat.pl:
https://github.com/NBISweden/GAAS/tree/master/annotation/Tools/Util

Here is a link to download it directly:
https://minhaskamal.github.io/DownGit/#/home?url=https://github.com/NBISweden/GAAS/tree/master/annotation/Tools/Util/fastq_guessMyFormat.pl


The link is now broken as well.


Thanks, updated now.

11 months ago
France/Nantes/Institut du Thorax - INSE…

Does the FASTX-Toolkit answer your needs? http://hannonlab.cshl.edu/fastx_toolkit/commandline.html#fastq_quality_boxplot_usage


Hm, I would like to do this programmatically. I think something like the FastQC guesser looks more promising. Thanks, though.

18 months ago
Ryan Thompson ♦ 3.4k
TSRI, La Jolla, CA

I wrote a Python-based FASTQ quality guesser: https://github.com/DarwinAwardWinner/fastqident It uses BioPython's FASTQ parser, so it will work on anything that is parsable by BioPython.


I am getting a 404.


Looks good, but it doesn't install correctly: the module "placsupport" cannot be found on PyPI.


The placsupport module can be found at https://github.com/DarwinAwardWinner/placsupport

6.0 years ago
Marvin • 850

Isn't that solving the wrong problem? The guessing code in FastQC looks fragile: it simply looks at the smallest character used for qualities, so it depends on actually seeing low-quality bases.
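A minimal sketch of why the lowest-character rule breaks when low-quality bases are missing: the same string can decode validly under several encodings. The per-encoding score ranges below (Phred+33: 0 to 93, Solexa+64: -5 to 62, Phred+64: 0 to 62, i.e. printable ASCII up to '~') are the usual conventions, treated here as assumptions; the names are hypothetical.

```python
# (ASCII offset, lowest valid score, highest valid score) per encoding.
# Ranges assume the usual conventions; adjust if your data differ.
ENCODINGS = {
    "phred+33": (33, 0, 93),
    "solexa+64": (64, -5, 62),
    "phred+64": (64, 0, 62),
}


def decode(qual, offset):
    """Turn a quality string into integer scores for a given ASCII offset."""
    return [ord(c) - offset for c in qual]


def plausible_encodings(qual):
    """List every encoding under which all characters decode to a valid score."""
    return [
        name
        for name, (offset, lo, hi) in ENCODINGS.items()
        if all(lo <= q <= hi for q in decode(qual, offset))
    ]


print(decode("JJII", 33))           # [41, 41, 40, 40]
print(decode("JJII", 64))           # [10, 10, 9, 9]
print(plausible_encodings("JJII"))  # ['phred+33', 'solexa+64', 'phred+64']
print(plausible_encodings("II#I"))  # ['phred+33']
```

Only a low character such as '#' rules encodings out; a high-quality read like "JJII" is valid under all three.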

I believe you should get the correct encoding from extra knowledge (i.e. knowing which version of which program generated the file, say from some log file) and then convert to a well-specified format (e.g. BAM) _once_. Please don't perpetuate the practice of guessing at the details of underspecified formats.
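As a simpler stand-in for the "convert once" step (BAM itself stores raw Phred values with no ASCII offset), here is a minimal sketch that re-encodes a file already known to be Phred+64 as Phred+33 FASTQ, assuming unwrapped 4-line records. The function names are hypothetical. Note the guard only catches characters below '@'; a Solexa file whose characters all happen to be at or above '@' would pass and be converted incorrectly, which is exactly why the encoding should come from provenance rather than guessing.

```python
def phred64_to_phred33(qual):
    """Re-encode a Phred+64 quality string as Phred+33 (subtract 31 per code)."""
    if qual and min(qual) < "@":  # '@' (ASCII 64) encodes Q0 in Phred+64
        raise ValueError("character below '@': not Phred+64 (Solexa or Phred+33?)")
    return "".join(chr(ord(c) - 31) for c in qual)


def convert_fastq(lines):
    """Yield FASTQ lines with every 4th (quality) line re-encoded."""
    for i, line in enumerate(lines):
        if i % 4 == 3:
            yield phred64_to_phred33(line.rstrip("\r\n")) + "\n"
        else:
            yield line


print(phred64_to_phred33("hhBh"))  # II#I
```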

6.0 years ago
Sequencegeek • 740
UCLA

In addition to Ryan's, I have a Python-based FASTQ quality guesser as well, if you would like to use it. It is just standard Python (no BioPython). PM me if interested.
