Forum: The struggle between fastq and fastq.gz, compressed vs. uncompressed file formats
0
2
6.8 years ago

Hi All,

I am starting this discussion to get a general viewpoint on the annoying struggle of compressing and decompressing fastq files (as all NGS analysis starts with these files). While compression is clearly important for saving space, I routinely run into problems where a considerable amount of time is wasted either compressing or decompressing fastq files.

Now, for basic analysis like trimming, cleaning, and computing fastq stats, tools can be classified into the categories below:

  • tools which only work on compressed fastq files (.gz)
  • tools which only work on decompressed fastq files
  • tools which work on decompressed fastq files but decompress compressed input themselves before analysing
  • tools which work on both compressed and decompressed files (e.g. Trimmomatic, FastQC)
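For tools in the second category, the "works on both" behaviour can be approximated from the shell. This is only a minimal sketch: `fastq_cat` is a made-up helper name, the demo record is invented, and the check relies on gzip streams starting with the magic bytes 0x1f 0x8b.

```shell
# Hypothetical helper: print a fastq file to stdout whether or not it is gzipped.
fastq_cat() {
    # Read the first two bytes as hex; gzip streams start with 1f 8b.
    magic=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    if [ "$magic" = "1f8b" ]; then
        gzip -dc "$1"
    else
        cat "$1"
    fi
}

# Demo on a tiny made-up record, stored both plain and compressed.
tmpdir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' > "$tmpdir/demo.fastq"
gzip -c "$tmpdir/demo.fastq" > "$tmpdir/demo.fastq.gz"

plain_lines=$(fastq_cat "$tmpdir/demo.fastq" | wc -l | tr -d ' ')
gz_lines=$(fastq_cat "$tmpdir/demo.fastq.gz" | wc -l | tr -d ' ')
echo "plain: $plain_lines lines, gzipped: $gz_lines lines"
rm -rf "$tmpdir"
```

The output of `fastq_cat` can then be piped into any tool that reads from standard input, so the same command line works for both kinds of file.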

Isn't there a need for a common protocol/guideline for designing tools that work on compressed fastq files?

fastq • 9.3k views
ADD COMMENT
4

Isn't there a need for a common protocol/guideline for designing tools that work on compressed fastq files?

33% of bioinformatics is just dealing with tool-quirks. We would all like standards, but then you know what happens:

[image]

ADD REPLY
3

All tools that work on compressed files will decompress them (in memory) before analyzing them, and the performance of that decompression may vary from tool to tool. One complication here is that one has to "pay" the cost of decompression each time a tool is run on compressed data. This may not be a problem, since the process is likely to be IO bound rather than CPU bound, though this may change when running tools in a highly parallel fashion.
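When two tools need the same data, one way to avoid paying for decompression twice is to decompress once and copy the stream to both. A rough sketch with a named pipe follows; the file names and the two "tools" (`wc` and `awk`) are stand-ins chosen for illustration, and it assumes plain 4-line fastq records.

```shell
tmpdir=$(mktemp -d)
# Four made-up reads, compressed.
for i in 1 2 3 4; do
    printf '@read%s\nACGT\n+\nIIII\n' "$i"
done | gzip -c > "$tmpdir/demo.fastq.gz"

mkfifo "$tmpdir/stream"

# Consumer 1 (background): count all lines arriving through the named pipe.
wc -l < "$tmpdir/stream" | tr -d ' ' > "$tmpdir/lines.txt" &

# Decompress once; tee copies the stream into the named pipe for consumer 1
# and passes it on to consumer 2, which counts header lines (one per read).
gzip -dc "$tmpdir/demo.fastq.gz" \
    | tee "$tmpdir/stream" \
    | awk 'NR % 4 == 1' | wc -l | tr -d ' ' > "$tmpdir/reads.txt"

wait  # let the background consumer finish

line_count=$(cat "$tmpdir/lines.txt")
read_count=$(cat "$tmpdir/reads.txt")
echo "lines: $line_count, reads: $read_count"
rm -rf "$tmpdir"
```

Whether this is worth the extra plumbing depends on the point above: if the job is IO bound anyway, simply running each tool on the .gz file may be just as fast.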

ADD REPLY
2

You should be working with pipes (if the tool accepts them) or with bash process substitution, like this:

$ tool_that_doesnt_support <(zcat myfasta.gz)
ADD REPLY
0

This is only occasionally supported by pre-processing tools, so it is not that helpful.

ADD REPLY
0

What do you mean by occasional support? The <(zcat myfasta.gz) construct works as if the unzipped content were coming from a normal file.

ADD REPLY
1

You can add uBAM files to this list, which should be more convenient than FASTQ files.

ADD REPLY
1

Some example tools for each category would give an idea of how frequently people use them. For some basic operations, unzipping | operation | zipping is the usual case. But having a tool that works with compressed files is a good idea.
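As an illustration of the unzipping | operation | zipping pattern, here is a small sketch in which nothing uncompressed is written to disk. The file names, the demo record, and the choice of "operation" (trimming every read to its first 3 bases with `awk`) are made up for the example, and it assumes plain 4-line fastq records.

```shell
tmpdir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > "$tmpdir/in.fastq.gz"

# unzip | operation | zip
# In a 4-line record, sequence and quality are the even-numbered lines,
# so both are cut to the same length.
gzip -dc "$tmpdir/in.fastq.gz" \
    | awk 'NR % 2 == 0 { $0 = substr($0, 1, 3) } { print }' \
    | gzip -c > "$tmpdir/out.fastq.gz"

trimmed_seq=$(gzip -dc "$tmpdir/out.fastq.gz" | sed -n '2p')
trimmed_qual=$(gzip -dc "$tmpdir/out.fastq.gz" | sed -n '4p')
echo "trimmed sequence: $trimmed_seq, quality: $trimmed_qual"
rm -rf "$tmpdir"
```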

ADD REPLY
