Inclusion of ALT contigs and decoys in bwa alignment
5.4 years ago

I am using hg38 as a reference for mapping with bwa-0.7.17, and I have already created the 5 BWA index files for my reference fasta file. Since it is advisable to also take the alternate contigs (hg38DH.fa.alt) into consideration, I wanted to ask:

  1. How do we include the file containing decoy sequences (hg38DH-extra.fa) in the alignment process?

  2. If the decoy file is needed, does it need to be in the same folder as the other index files?

  3. Do the bwa index files for the ALT contigs need to be present?

bwa NGS alignment hg38 bwa.kit • 5.8k views
5.4 years ago
ATpoint 81k

My reference includes the primary chromosomes (1-22, X, Y) plus chrM, all unplaced and random contigs, and the EBV decoy. See also "Which human reference genome should I use?". ALT contigs are typically not included unless you intend to use an ALT-aware alignment pipeline, such as the one in BWAkit. That is typically not necessary unless you are interested in exactly these regions and the variation going on there. Therefore, for standard purposes, do not include them in the reference. The reason is that they represent alternative sequences for regions that are already included in the primary assembly (highly variable regions like MHC), which leads to multimapping between the primary assembly and the ALTs. As most downstream tools typically exclude multimappers, including the ALTs in an ALT-unaware alignment pipeline eventually leads to a loss of these reads. See also "Reference Genome Components".
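
The selection described above can be sketched as a small name-based filter. This is only an illustration, assuming UCSC-style hg38 sequence names (for example `chr1`, `chrUn_...`, `..._random`, `..._alt`, `..._decoy`, `chrEBV`). The function name `keep_contig` is made up for this example, and dropping every `_decoy` sequence except EBV is one reading of the answer, so check the patterns against the headers of your own FASTA file.

```python
import re

# Assumed UCSC-style hg38 naming; adjust the patterns to your own FASTA headers.
PRIMARY = re.compile(r"^chr([1-9]|1[0-9]|2[0-2]|X|Y|M)$")
UNPLACED = re.compile(r"^chrUn_\w+$")
RANDOM = re.compile(r"^chr\w+_random$")


def keep_contig(name: str) -> bool:
    """Return True for contigs kept in an ALT-unaware reference (hypothetical helper)."""
    if name.endswith("_alt") or name.startswith("HLA-"):
        return False  # ALT contigs and HLA alleles duplicate primary regions
    if name.endswith("_decoy"):
        return False  # assumption: only the EBV decoy (chrEBV) is kept
    return bool(
        PRIMARY.match(name)
        or UNPLACED.match(name)
        or RANDOM.match(name)
        or name == "chrEBV"
    )
```

Filtering on names only works if the FASTA headers follow this convention; references from other sources (for example Ensembl or NCBI accessions) would need different patterns.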

Once you have all the chromosomes and contigs you want to include as fasta, cat them together into one file and index it with BWA. See again "Which human reference genome should I use?".
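
A plain `cat` is usually enough for the concatenation step. As a slightly more defensive sketch, the hypothetical helper below (`concat_fasta` is not part of BWA or any library) joins several FASTA files and stops if a sequence name appears twice, which can happen when a contig is present in more than one input file.

```python
def concat_fasta(inputs, output):
    """Concatenate FASTA files into one file, refusing duplicate sequence names."""
    names = []
    seen = set()
    with open(output, "w") as out:
        for path in inputs:
            last = "\n"
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        # The sequence name is the first word after '>'.
                        fields = line[1:].split()
                        name = fields[0] if fields else ""
                        if name in seen:
                            raise ValueError(f"duplicate sequence name: {name}")
                        seen.add(name)
                        names.append(name)
                    out.write(line)
                    last = line
            # A missing final newline would glue two records together.
            if not last.endswith("\n"):
                out.write("\n")
    return names
```

The combined file (say `ref.fa`, a placeholder name) could then be indexed with `bwa index ref.fa`, which writes the index files next to it.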


After aligning with a reference containing the decoys, should we filter the bam to remove the reads that have mapped to the decoy?


"That is typically not necessary unless you are interested in exactly these regions and the variation going on there."

But wouldn't including these regions expand the scope of my variant search and also help verify whether the variants I have found are indeed novel?


Hello,

I tried to explain this in a little more detail in the part of the tutorial ATpoint linked to. The most important conclusion is:

You do not only need an ALT aware aligner, you also need an ALT aware variant calling and reporting pipeline.

fin swimmer


Thank you. Are you aware of any ALT-aware variant callers? I mostly plan to use GATK HaplotypeCaller, FreeBayes, Pindel, and VarScan.


BWAkit from the BWA package
