Comparison of SOAP and BWA
12.9 years ago

Hi,

Could you explain the differences between SOAP2 and BWA (the latest version of each)? Both use the BWT algorithm.

Which one is faster?

Which one uses less memory?

Which one is more accurate and which one is more widely used for human genome sequence alignment and why?

In both cases I want to compare single-end read alignment of a human query genome against the given reference genome.

bwa next-gen sequencing human read • 13k views
12.9 years ago
lh3 33k

From my evaluation and an internal evaluation done by 1000g:

On specificity (well, bwa is not far off):

novoalign~stampy+bwa>bwa>soap2>bowtie

On single-end speed:

bowtie~soap2>bwa>novoalign~stampy+bwa

On paired-end speed:

soap2>bwa~bowtie>novoalign

On paired-end sensitivity, I guess:

bwa~soap2>bowtie

On single-end sensitivity, I guess:

soap2>bwa~bowtie

On memory (">" means better or less memory):

bowtie>bwa>soap2>novoalign

On citations:

bowtie~bwa>soap2

People choose bowtie and bwa more often, probably because both natively support SAM output while soap2 does not. Bowtie is often seen in RNA-seq/ChIP-seq because it is extremely fast for single-end reads and because the whole tophat/cufflinks package is very useful. BWA is often seen for SNP/indel calling because it does gapped alignment and produces fewer false alignments. BWA/stampy/novoalign estimate mapping quality, which is at times useful. Bowtie/soap2 do not, which is why they are faster.

When you really come to very rare events (e.g. somatic mutations, structural variations, RNA editing and rare splice forms), you should probably consider novoalign/stampy, or even try two aligners at the same time.
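For anyone who finds the ">" and "~" strings hard to scan, here is a minimal Python sketch that splits each ranking from this answer into tiers. It assumes ">" means "better than" and "~" means "roughly tied", as used above. The helper names `parse_ranking` and `tier_of` are made up for this illustration and are not part of any aligner.

```python
from typing import List


def parse_ranking(ranking: str) -> List[List[str]]:
    """Split a ranking string into tiers, best tier first.

    '>' separates tiers (left is better); '~' joins tools that are roughly tied.
    """
    return [[tool.strip() for tool in tier.split("~")] for tier in ranking.split(">")]


def tier_of(ranking: str, tool: str) -> int:
    """Return the 1-based tier of `tool`; raise ValueError if it is not ranked."""
    for position, tier in enumerate(parse_ranking(ranking), start=1):
        if tool in tier:
            return position
    raise ValueError(f"{tool!r} not in ranking {ranking!r}")


# Rankings copied verbatim from the answer above.
RANKINGS = {
    "specificity": "novoalign~stampy+bwa>bwa>soap2>bowtie",
    "single-end speed": "bowtie~soap2>bwa>novoalign~stampy+bwa",
    "paired-end speed": "soap2>bwa~bowtie>novoalign",
    "paired-end sensitivity": "bwa~soap2>bowtie",
    "single-end sensitivity": "soap2>bwa~bowtie",
    "memory": "bowtie>bwa>soap2>novoalign",
    "citations": "bowtie~bwa>soap2",
}

if __name__ == "__main__":
    # Print one row per criterion with the tiers spaced out for readability.
    for criterion, ranking in RANKINGS.items():
        tiers = parse_ranking(ranking)
        print(f"{criterion:24s}", " > ".join(" ~ ".join(t) for t in tiers))
```

For example, `tier_of(RANKINGS["single-end speed"], "soap2")` places soap2 in the first tier and bwa in the second, which is the reading of the single-end speed line given in the answer.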


@lh3: I profiled the latest versions of both bwa (short read) and soap2 in Intel VTune for single-end reads on human chromosome 10 and chromosome X, and found soap2 to be faster. That doesn't seem to match your answer. Am I missing something?


But I am saying that, for single-end reads, soap2 is faster than bwa?


Probably just had trouble parsing that very useful and information-dense paragraph. It would be a bit more readable if you tabulated those results, or at least started each '>' string on its own line...


I just quickly edited the post because it was quite unreadable despite the interesting information.


I have a question regarding specificity: when comparing specificity for aligners and getting something below 100%, does that mean that some tools do indeed report 'false' alignments? How is that possible at all?


Essentially no aligner guarantees to find the "best" alignment in the human genome. Even if an aligner could find the best alignment, the "best" is not necessarily the correct one.
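A toy illustration of the second point, as a hedged sketch rather than how any real aligner works: the reference below is a made-up sequence with two near-identical repeat copies, and the read is assumed to come from the first copy with one sequencing error. An exhaustive ungapped search scored by mismatch count then reports the second copy as the unique best hit, which is the wrong location. All sequences and the `best_hits` helper are invented for this example.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_hits(reference: str, read: str):
    """Exhaustively score every ungapped placement of `read` on `reference`.

    Returns (fewest_mismatches, [0-based positions achieving that score]).
    """
    n = len(read)
    scores = {
        pos: hamming(reference[pos:pos + n], read)
        for pos in range(len(reference) - n + 1)
    }
    best = min(scores.values())
    return best, [pos for pos, score in scores.items() if score == best]


# Hypothetical reference: two repeat copies that differ at a single base.
COPY_A = "ACGTTGCAAG"
COPY_B = "ACGTTGCATG"          # differs from COPY_A at index 8 (A -> T)
SPACER = "C" * 12
REFERENCE = COPY_A + SPACER + COPY_B

TRUE_ORIGIN = 0                # the read is assumed to be sequenced from COPY_A
# One sequencing error at index 8 (A read as T) makes the read identical to COPY_B.
READ = COPY_A[:8] + "T" + COPY_A[9:]

mismatches, positions = best_hits(REFERENCE, READ)

if __name__ == "__main__":
    print("best score:", mismatches, "at positions", positions)
    print("mismatches at the true origin:", hamming(REFERENCE[:len(READ)], READ))
```

Here the best-scoring placement is a perfect match inside the second copy, while the true origin has one mismatch, so even an aligner that always finds the optimum would place this read incorrectly.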


You did not include novoalign in the sensitivity rankings. Do you know how it compares to soap2/bwa/bowtie?

12.9 years ago
Benm ▴ 710

Please check this paper: Bao S, et al., Evaluation of next-generation sequencing software in mapping and assembly. J Hum Genet. 2011 Apr 28. (PubMed)

BTW, the latest version of SOAPaligner is SOAP3: GPU-based Compressed Indexing and Ultra-fast Parallel Alignment of Short Reads.


FYI, that citation appears to have been retracted by the publisher.


This paper does not tell us about accuracy.


But is it available? PubMed indeed lists this as WITHDRAWN:

http://www.ncbi.nlm.nih.gov/pubmed/21677664.1#


Late reply because I just found this thread: the "Withdrawn" status is because of double publishing:

"The publisher is retracting this Review. The same Review was made available online on 28 April 2011 and published in the June issue of Journal of Human Genetics (doi:10.1038/jhg.2011.43)."

It's still available at the link.

11.2 years ago

I found Bowtie2GP to be 4 times faster than BWA on some human paired-end alignments, using less than half the memory with almost the same accuracy. See http://arxiv.org/abs/1301.5187

Bill


The following is my reply in email. GP is quite interesting anyway.

Thank you. Genetic improvement sounds very interesting. I will look into it further. On the other hand, for short-read alignment, there is more than speed and sensitivity. We have known for a long time that bowtie is several to 10 times faster than BWA with comparable sensitivity and I could make BWA much faster without reducing its sensitivity. However, I keep BWA as it is and BWA is still more widely used for variant discovery and cancer projects. This is firstly because of its accuracy (your Table 2) and secondly for its power to distinguish good and bad hits. For those applications, a tiny fraction of wrong alignments make up many false calls. It is critical to inform the caller which alignment to trust. Bowtie cannot do that.

Bowtie2 version beta5 or later (not earlier versions) competes well with bwa for 100bp reads and is likely to surpass bwa (not bwa-sw) for 200bp reads. I would be interested in the comparison for the typical 100bp and the upcoming 250bp reads, instead of the 36bp reads in your RN. Aligner performance/accuracy is greatly affected by the read length.

Thanks,

Heng
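To make the point about telling the caller which alignments to trust more concrete, here is a minimal Python sketch of how a downstream tool might use mapping quality. It relies on the SAM convention that MAPQ is the Phred-scaled probability that the mapping position is wrong (MAPQ = -10 log10 P(wrong)), stored in the fifth tab-separated column, with 255 meaning "not available". The cap of 60, the threshold of 20, the helper names and the toy SAM records are all assumptions made for this illustration, not taken from BWA or from this thread.

```python
import math


def mapq_to_error_prob(mapq: int) -> float:
    """Convert a Phred-scaled mapping quality to P(mapping position is wrong)."""
    return 10 ** (-mapq / 10)


def error_prob_to_mapq(p_wrong: float, cap: int = 60) -> int:
    """Convert P(wrong) to a rounded Phred-scaled quality (cap is an assumed value)."""
    if p_wrong <= 0:
        return cap
    return min(cap, round(-10 * math.log10(p_wrong)))


def filter_sam_by_mapq(lines, min_mapq: int):
    """Yield header lines and alignments whose MAPQ is known and >= min_mapq."""
    for line in lines:
        if line.startswith("@"):          # header lines pass through unchanged
            yield line
            continue
        mapq = int(line.rstrip("\n").split("\t")[4])   # MAPQ is column 5
        if mapq != 255 and mapq >= min_mapq:           # 255 = MAPQ not available
            yield line


# Toy SAM records (hypothetical reference name, positions and qualities).
SEQ, QUAL = "A" * 36, "I" * 36
TOY_SAM = [
    "@SQ\tSN:toy\tLN:1000",
    f"r1\t0\ttoy\t101\t37\t36M\t*\t0\t0\t{SEQ}\t{QUAL}",   # confident hit
    f"r2\t0\ttoy\t501\t0\t36M\t*\t0\t0\t{SEQ}\t{QUAL}",    # ambiguous (repeat) hit
    f"r3\t0\ttoy\t701\t255\t36M\t*\t0\t0\t{SEQ}\t{QUAL}",  # quality not reported
]

kept = list(filter_sam_by_mapq(TOY_SAM, min_mapq=20))

if __name__ == "__main__":
    for record in kept:
        print(record.split("\t")[0])
```

With these toy records only the header and `r1` survive the filter. An aligner that does not estimate mapping quality gives the caller no basis for this kind of filtering, so an ambiguous hit like `r2` would look just as trustworthy as a confident one.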


Dear Heng, I replied to your email before spotting your posting. So for everyone else here are my thoughts:

Yes, it does seem that people are moving to longer read lengths. We measured 36bp because that is what the Cancer Institute used. One of the longer-term goals of the project is to make it easier to tune software as its users' requirements change. So whilst we optimised Bowtie2GP for 36bp single-end reads, it was nice to see that the optimisation still held for paired-end reads. But bioinformatics, like many fields, sees pretty much continual change in data. At present people are forced to keep up by manual code changes. Whilst here we have an automated approach, it may be that people will want to run GP on the new data and inspect its suggested optimisations before allowing them to be implemented. I guess for BWA there might be a user-controlled switch which enables the GP code tweaks. Initially it could default to off, and only later (when users have more confidence in it) might it default to on. ....

If you use Linux, the 64bit binary is available via ftp http://www.cs.ucl.ac.uk/staff/W.Langdon/ftp/gp-code/bowtie2gp Alternatively I could [POST] the three GP optimised source files.

Thanks again

Bill
