Forum: Has anyone tried the new LFQC compression scheme for FASTQ?
8.4 years ago
Dan D 7.4k

Has anyone used the tool described in this paper for compressing FASTQ data? I'm going to evaluate it when I get time, but I wanted to see if anyone has intel on it ahead of that. I'll report back here after giving it my assessment.

fastq compression • 3.0k views
This paper has been temporarily withdrawn by the authors.

Just for reference, the journal website says: "This manuscript has been temporarily withdrawn at the request of the authors. The authors report that they have identified an error in the software. This withdrawal is to provide the authors with an opportunity to determine to what extent the reported results are affected by this error."

It would have been nice of the journal to update the HTML version of the article with the same notice.

Frankly, this sounds like a more serious problem than a bug in the method, perhaps a methodological error in the evaluation: for example, the files were actually larger and the tool slower than before, and they switched up the comparison. That can happen easily.

(See the edit below.) Also, the fact that in 2015 the journal Bioinformatics publishes software that is only "available" from someone's web page is saddening.

--- Edit ---

Actually, scratch that (kind of): there is a GitHub repo here:

https://github.com/mariusmni/lfqc

Still, at the time of publication there was no repository.

Very useful info, thanks!

If you haven't seen it (I hadn't until a few days ago, coincidentally), there was a compression challenge recently that evaluated a few related tools: http://www.pistoiaalliance.org/projects/sequence-squeeze/

An article describing the results is here, in case you wanted to investigate alternatives to lfqc.

Is anyone out there familiar enough with Ruby to comment on the code? It looks like it might be a wrapper around Mahoney's lpaq and zpaq tools, but I'm not certain. I can't get behind the paywall to read the preprint, so maybe it is described in the paper?

I am not familiar with Ruby, but the code is easy to understand. I may be wrong, but I think the core algorithm is on line 15:

def initialize(filePath, storeQualNoEOL=false, nameCompMethod = $cm['zpaq'], dataCompMethod = $cm['lpaq'], qualCompMethod = $cm['zpaq'])

and line 99:

cmd = "tar cf #{archive} #{zName} #{zData} #{zQual}"

Edit: the GitHub code has an Apache license, but surprisingly the manuscript states that "the implementations are freely available for non-commercial purposes", which seems to me incompatible with both the zpaq and lpaq licenses.

The tar step is just packaging, not compression. It looks like it is a wrapper to Matt Mahoney's compression tools, applied on different pieces of a FASTQ record.
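
To make the "different pieces of a FASTQ record" idea concrete, here is a minimal sketch of the splitting step only. It assumes plain four-line FASTQ records with no blank lines; `split_fastq` is a hypothetical helper written for this illustration and is not taken from the lfqc repository. The per-stream compressors (zpaq, lpaq) are deliberately not invoked, since their command lines are not shown in this thread.

```ruby
# Hypothetical helper (not from lfqc): split FASTQ text into three streams
# so that each stream could be handed to a different compressor.
# Assumes strict four-line records: @name, sequence, +, quality.
def split_fastq(text)
  names = []
  seqs  = []
  quals = []
  text.each_line.map(&:chomp).each_slice(4) do |name, seq, plus, qual|
    # Reject incomplete or out-of-phase records instead of guessing.
    unless name&.start_with?("@") && plus&.start_with?("+") && qual
      raise ArgumentError, "malformed FASTQ record near #{name.inspect}"
    end
    names << name
    seqs  << seq
    quals << qual
  end
  { names: names, seqs: seqs, quals: quals }
end

example = "@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nII#I\n"
streams = split_fastq(example)
streams.each { |kind, lines| puts "#{kind}: #{lines.size} lines" }
```

The point of splitting is that names, bases, and qualities have very different statistics, so a compressor tuned to one stream tends to do better than a single pass over the interleaved file.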

Yes, the tar only bundles together files, but it is important because it keeps things tidy. :-)

I glanced over the paper. For sequence and quality, it is practically just a wrapper for lpaq8 and zpaq, respectively. There is some preprocessing (encoding # runs as a bit flag and removing newlines), but nothing original.
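
As a rough illustration of the "removing newlines" step, the sketch below joins quality strings into one blob and restores them by slicing. It assumes every read has the same length, which is what makes the newlines redundant; `pack_quals` and `unpack_quals` are hypothetical names, and this is not necessarily how lfqc's `storeQualNoEOL` option is implemented. The "# runs as a bit flag" step is not sketched because the thread does not describe it in enough detail.

```ruby
# Hypothetical helpers (not from lfqc): drop end-of-line characters from the
# quality stream when all reads share one length, keeping that length so the
# original lines can be recovered.
def pack_quals(quals)
  lengths = quals.map(&:length).uniq
  unless lengths.size == 1 && lengths.first > 0
    raise ArgumentError, "need non-empty reads of equal length"
  end
  [lengths.first, quals.join]
end

# Inverse of pack_quals: cut the blob back into fixed-width quality strings.
def unpack_quals(read_length, blob)
  blob.chars.each_slice(read_length).map(&:join)
end

read_length, blob = pack_quals(["IIII", "II#I", "HH##"])
puts unpack_quals(read_length, blob).inspect
```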

The header line is "tokenized" (split into pieces), and the tokens are compressed (run-length encoding, incremental encoding, or just reversing them, as they "observed that this tends to improve the compression ratio of the context mixing algorithm applied downstream"). The result is then compressed again with zpaq.
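
The header handling described above can be sketched roughly as follows. Everything here is an assumption made for illustration: the tokenizer simply alternates digit and non-digit runs, and "incremental encoding" is read as storing differences between consecutive numeric values; the paper may define both differently. None of the function names come from lfqc.

```ruby
# Hypothetical tokenizer: split a header into alternating digit and
# non-digit runs, e.g. "@SRR1.2" -> ["@SRR", "1", ".", "2"].
def tokenize(header)
  header.scan(/\d+|\D+/)
end

# Incremental (delta) encoding, as interpreted here: keep the difference
# from the previous value; the first value is a difference from 0.
def delta_encode(values)
  prev = 0
  values.map do |v|
    d = v - prev
    prev = v
    d
  end
end

def delta_decode(deltas)
  sum = 0
  deltas.map { |d| sum += d }
end

# Run-length encoding: collapse runs of equal values into [value, count].
def run_length_encode(values)
  values.chunk_while { |a, b| a == b }.map { |run| [run.first, run.size] }
end

def run_length_decode(pairs)
  pairs.flat_map { |value, count| [value] * count }
end

headers = ["@SRR1.1 len=36", "@SRR1.2 len=36", "@SRR1.3 len=36"]
columns = headers.map { |h| tokenize(h) }.transpose  # one array per token position
puts run_length_encode(columns[0]).inspect            # constant prefix column
puts delta_encode(columns[3].map(&:to_i)).inspect     # read counter column
```

Note that `transpose` only works when every header yields the same number of tokens, which holds for this toy input but would need handling in real data.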

Actually, that was my impression too. I did not spend substantial time on the details, but honestly it just seemed like running two existing methods, even invoking them as command-line applications. My first thought was: how is this a bioinformatics paper?

"How is this a bioinformatics paper?"

Reviewers don't actually evaluate the algorithms, implementation, or code health; instead they just glance over the figures.
8.4 years ago

Looks like the authors continued development and released another version (?) of their code/paper with a duplicate PMID. See the new primary author respond on PubMed Commons: http://1.usa.gov/1Qys6Nh.
