I want to map samples (70bp paired-end) to a 30-gene panel using BWA (mem). They were generated using a capture kit so should have few if any off-target reads.
Should I map the reads to the whole genome and then filter on the gene panel, or restrict mapping to just those genes (± some margin, say 1000 bp)?
This is for variant calling downstream, so accurate mapping to the genes of interest is most important.
Alternatively, are there any better algorithms I could use? (BWA aln/sampe for instance)
Except that with short reads (70bp) and few target sequences (30), there are many more locations in the genome that the reads can map to which are not part of the target, so that answer really does not apply. If I were capturing all exons, or more than 10K targets as in that previous answer, then I would map to the whole genome and filter afterwards rather than map only to the 30-gene panel.
You have no control over which off-target sequences your capture array binds. Of course you design it to minimize off-target binding, but this is not a perfect process. Therefore, you always align to the entire genome.
Hopping in to provide support for ATpoint's comments - they're right. Map to the whole genome, then subset or restrict variant calling to regions of interest. You want your mapping qualities to reflect any uncertainty in genomic placement to produce the most accurate variant calls for your regions of interest. As a thought experiment, consider what might happen if your targets are similar to pseudogenes or contain any sort of duplicated/repetitive element.
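To make the "map to the whole genome, then subset" step concrete, here is a minimal sketch in plain Python. It assumes the alignments have already been parsed out of the BAM into (chrom, start, end, mapq) records. The gene coordinates, the 1000 bp padding and the MAPQ cutoff of 20 are illustrative assumptions, not recommendations from this thread.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


class Alignment(NamedTuple):
    chrom: str
    start: int
    end: int
    mapq: int


def pad_targets(targets, margin=1000):
    """Extend each target by `margin` bp on both sides (clamped at 0)."""
    return [Interval(t.chrom, max(0, t.start - margin), t.end + margin)
            for t in targets]


def overlaps(aln, target):
    """True if the half-open intervals share at least one base."""
    return (aln.chrom == target.chrom
            and aln.start < target.end
            and target.start < aln.end)


def keep_on_target(alignments, targets, min_mapq=20):
    """Keep alignments that hit a target AND are confidently placed."""
    return [a for a in alignments
            if a.mapq >= min_mapq and any(overlaps(a, t) for t in targets)]


# Hypothetical data: one target gene and three genome-wide alignments.
targets = pad_targets([Interval("chr1", 5000, 9000)])
alignments = [
    Alignment("chr1", 4500, 4570, 60),  # inside the padded margin: kept
    Alignment("chr1", 6000, 6070, 0),   # ambiguous placement (e.g. a pseudogene copy): dropped
    Alignment("chr7", 6000, 6070, 60),  # confidently off-target: dropped
]
kept = keep_on_target(alignments, targets)
print(kept)
```

The second read is the point of the thought experiment: because it was aligned against the whole genome, its low mapping quality exposes the ambiguity. Had it been aligned against the 30 genes only, it would have had nowhere else to go and would have looked confidently placed.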
I am getting on-target mapping rates of between 13% (from a particularly degraded sample) and 40%. Is this normal for such an experiment? I have no experience with this (my previous work being on whole genome, whole exome or RNA sequencing), so I have no intuition as to whether this is a good rate or not.
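For clarity on what is being measured, here is a small sketch of how an on-target rate like the one quoted above could be computed. It assumes the rate is defined as the fraction of aligned reads overlapping any target interval; the read and target coordinates are made up for illustration.

```python
def on_target_rate(alignments, targets):
    """Fraction of alignments overlapping at least one target.

    Both arguments hold (chrom, start, end) tuples with half-open coordinates.
    """
    total = 0
    on_target = 0
    for chrom, start, end in alignments:
        total += 1
        if any(chrom == t_chrom and start < t_end and t_start < end
               for t_chrom, t_start, t_end in targets):
            on_target += 1
    # Avoid dividing by zero when there are no alignments at all.
    return on_target / total if total else 0.0


# Hypothetical example: 2 of 5 reads fall on the single target.
targets = [("chr1", 4000, 10000)]
alignments = [
    ("chr1", 4500, 4570),
    ("chr1", 9990, 10060),
    ("chr1", 20000, 20070),
    ("chr2", 4500, 4570),
    ("chr7", 100, 170),
]
rate = on_target_rate(alignments, targets)
print(f"on-target rate: {rate:.0%}")
```

Whether unmapped reads and duplicates are counted in the denominator changes the number considerably, so it is worth stating which definition is being used when comparing rates.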