BWA mapping to multiple reference genomes
15 months ago

If the reference genome is very large (as in many plant species), we'd like to first split the reference into smaller chunks, for example chromosome by chromosome. We could then map the FASTQ reads to each chromosome separately and merge the results.

My worry is that this would completely change the alignment picture compared with running against the complete genome, because one read could map MANY TIMES. For example, a unique read that comes from chr1 will map to chr1 with the highest mapping score when the complete genome is used as the reference. But when we map against each chromosome separately, the same read could also align to similar sequences on other chromosomes, which introduces many false positives.

So after mapping to the separate chromosome references and merging, are there any tools to recalculate the mapping score? Maybe dedup tools?

But to me, dedup usually means finding mappings with the same sequence + start + end + orientation and removing potential PCR duplicates. So is there another type of "dedup" that retains only the best mapping for each read and removes the lower-scoring ones?

thx


Mapping against a reduced reference is always going to cause problems. bwa should be able to handle large genomes without having to split them.

18 months ago
swbarnes2 5.7k
United States

"we'd like to first split the ref into smaller chunks, for example, chromosome by chromosome."

Sorry, but this is a terrible idea. You need to let every read find its best mapping position from the entire genome, which means each read needs to be aligned to the entire genome.

You can split your fastq into chunks to do the alignments in parallel, but do not split the reference into chunks.
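
To illustrate splitting the reads rather than the reference, here is a minimal sketch. It assumes plain 4-line FASTQ records (no wrapped sequence lines); `split_fastq` is a hypothetical helper name, not part of bwa or any library.

```python
import itertools


def split_fastq(lines, reads_per_chunk):
    """Yield lists of FASTQ lines, each holding up to reads_per_chunk records.

    Assumes every record is exactly 4 lines (header, sequence, '+', qualities).
    """
    it = iter(lines)
    while True:
        chunk = list(itertools.islice(it, 4 * reads_per_chunk))
        if not chunk:
            return
        if len(chunk) % 4 != 0:
            raise ValueError("truncated FASTQ record at end of input")
        yield chunk


if __name__ == "__main__":
    # Tiny in-memory example; in practice you would pass an open file handle.
    demo = ["@r1", "ACGT", "+", "IIII",
            "@r2", "GGCC", "+", "IIII",
            "@r3", "TTAA", "+", "IIII"]
    for i, chunk in enumerate(split_fastq(demo, reads_per_chunk=2)):
        print(f"chunk {i}: {len(chunk) // 4} reads")
```

Each chunk would then be aligned against the same full-genome index (for example `bwa mem ref.fa chunk_0.fq > chunk_0.sam`, where the file names are placeholders) and the outputs merged afterwards. For paired-end data, both mate files need to be split at the same record boundaries. Note that `bwa mem -t` already runs multiple threads on one machine, so chunking mostly helps when spreading work across nodes.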


Just to make sure you won't go down that path: I did this years ago, when bowtie2 wasn't able to handle huge references.

You will face the very problem you are trying to avoid, probably raised to a higher power: reads with suboptimal alignments that are not reported against the full reference may be reported against the split references. You need to sort that kind of stuff out.

When I did it, it was an ugly mess and loads of pointless work, and I was so glad when the bowtie2 developers published an update that handled the problem. Check the manual; bwa has options to restrict reporting.
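
For a sense of what "sorting that stuff out" involves, here is a rough sketch that keeps only the highest-scoring record per read from merged SAM output. It assumes each record carries the `AS:i:` alignment-score tag, as bwa mem writes; the function names are hypothetical. It does not recompute MAPQ, so the mapping qualities still reflect each per-chromosome run rather than the whole genome, and ties simply keep the first record seen.

```python
def alignment_score(sam_line):
    """Return the integer AS:i: tag of a SAM record, or None if it is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    for tag in fields[11:]:  # optional tags start at column 12
        if tag.startswith("AS:i:"):
            return int(tag[5:])
    return None


def best_per_read(sam_lines):
    """Map (read name, mate bits) to the record with the highest AS score."""
    best = {}
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.split("\t")
        flag = int(fields[1])
        if flag & 0x4:  # skip unmapped records
            continue
        score = alignment_score(line)
        if score is None:
            continue
        # 0x40 / 0x80 distinguish first and second mate of a pair.
        key = (fields[0], flag & 0xC0)
        if key not in best or score > best[key][0]:
            best[key] = (score, line)
    return {key: line for key, (score, line) in best.items()}
```

Even with this filter, a read that would have been flagged as ambiguous against the full genome can look confidently placed, which is one more reason to align against the whole reference in the first place.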
