Hey everyone! I'm a microbiology undergrad in the middle of completing a work term with a bioinformatics team. My project for the semester involves researching different read alignment / mapping programs for the development of a future training course. One point that's been emphasized is that I should look into the effects of using the default parameters vs custom settings.
So far, I've been looking into Bowtie2, BWA and BWA-MEM, SMALT, segemehl, and BBMap: reading the manuals and literature, trying to get a few test datasets, and figuring out the different options for each program. I've also just added Minimap2 to my list, and will be reading up on that one over the coming days.
If you perform read alignments as part of your research, or have any other unique expertise on the subject, I'd be grateful for whatever insight you can offer! Here's an idea of the type of questions I'd like feedback on:
- What sequencing technology do you use to obtain your reads? What is your preferred tool for performing read alignments, and why? What advantage does it have over other available programs?
- Are there any unique challenges presented by the organisms you study or the sequencing technology you use? What adjustments do you make to your workflow to combat this?
- What factors do you place the most weight on when performing read alignments? Speed, adjustable parameters, low memory requirements?
- Do you use default settings, or custom options? If you use custom settings, what options do you change? Why is this necessary? How does it affect the outcome of your alignment?
- Any other info you think is valuable and would like to share with me!
Thanks in advance!
This really boils down to personal preference. Most aligners (as long as you use them for their intended purposes, e.g. you can't use bowtie v.1 with spliced reads) should produce more or less similar answers for 90+% of data. They will likely perform differently (badly or not at all) on edge cases, but those may or may not be critical to the overall success of your analysis. If you are doing something non-standard to begin with, then you need to research your options for that specific application. The best take-home I can offer: research many aligners, identify one you like, become intimately familiar with its options, and stick with it.
For bulk (read volume) sequencing there is no alternative to Illumina. For long reads both Oxford Nanopore and PacBio will do. It depends on how much money you have to spend and what kinds of errors you are willing to tolerate.
The "accuracy" is often just a red herring; there are usually far more fundamental differences between aligners than what percent of reads they appear to be able to align. For example, the default behavior of bwa mem can be achieved with bowtie2 if you run it with the --very-sensitive-local parameter. At the same time, bowtie2 gives users far more options to filter their results (which alignments make it into the BAM file). In addition, different aligners may fill in different optional SAM tags that may be necessary for downstream analysis. Alas, the differences between aligners are not well documented, so this is more of a gray area for most.
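If you want to see the optional-tag differences for yourself, one rough way is to run the same reads through two aligners and compare which tags show up in each SAM output. Below is a minimal Python sketch of that idea. The function names (optional_tags, tag_difference) are my own, and the two example records are made up for illustration, not real aligner output. It only assumes the standard SAM layout: 11 mandatory tab-separated columns followed by optional TAG:TYPE:VALUE fields.

```python
from collections import Counter


def optional_tags(sam_lines):
    """Count optional tag names (e.g. NM, AS, XS) across SAM alignment records."""
    counts = Counter()
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):  # skip blank lines and header lines
            continue
        fields = line.split("\t")
        # The first 11 columns are mandatory; optional fields follow as TAG:TYPE:VALUE.
        for field in fields[11:]:
            tag = field.split(":", 1)[0]
            counts[tag] += 1
    return counts


def tag_difference(sam_a, sam_b):
    """Return (tags seen only in A, tags seen only in B), each sorted."""
    tags_a = set(optional_tags(sam_a))
    tags_b = set(optional_tags(sam_b))
    return sorted(tags_a - tags_b), sorted(tags_b - tags_a)


# Made-up records for illustration only (not real aligner output).
example_a = [
    "@HD\tVN:1.6\tSO:unsorted",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\tMD:Z:4\tAS:i:4\tXS:i:0",
]
example_b = [
    "r1\t0\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:8\tXN:i:0\tNM:i:0\tYT:Z:UU",
]

only_a, only_b = tag_difference(example_a, example_b)
print("only in A:", only_a)
print("only in B:", only_b)
```

On real data you would pass open file handles for the two SAM files instead of the example lists. BAM files would need converting to SAM text first (or a library that reads BAM), which this sketch does not cover.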