Polyploid genome and mapping
7.5 years ago
BioGeek ▴ 170

How does mapping software deal with reads when mapping against a polyploid genome? Do reads map randomly to any of the allelic locations? If so, that might affect structural variation studies, wouldn't it?

alignment next-gen • 2.9k views
7.5 years ago

The reference genome used for mapping is haploid, so mapping is no different than for haploid organisms. Alternative haplotypes are more complicated, but in the spirit of this question it's okay to ignore those.


What if the reference is tetraploid?


The reference is never tetraploid. The genome may be tetraploid or hexaploid or whateverploid, but the reference is always haploid. Multiple alleles are then mapped to the same location and it's up to the variant caller to take ploidy into account.
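
To make "taking ploidy into account" concrete, here is a minimal sketch of a ploidy-aware genotype call at a single biallelic site. It is a toy binomial model, not the method of any particular variant caller; the function names, the symmetric `error_rate`, and the read counts are all assumptions made for illustration.

```python
from math import comb


def genotype_likelihoods(alt_reads, total_reads, ploidy, error_rate=0.01):
    """Likelihood of the observed read counts for each alt-allele dosage 0..ploidy.

    Toy model (assumption): reads are sampled evenly from all chromosome
    copies and miscalled with a symmetric per-read error rate.
    """
    likelihoods = {}
    for dosage in range(ploidy + 1):
        true_fraction = dosage / ploidy
        # Probability that a single read shows the alt allele.
        p_alt = true_fraction * (1 - error_rate) + (1 - true_fraction) * error_rate
        likelihoods[dosage] = (
            comb(total_reads, alt_reads)
            * p_alt ** alt_reads
            * (1 - p_alt) ** (total_reads - alt_reads)
        )
    return likelihoods


def most_likely_dosage(alt_reads, total_reads, ploidy, error_rate=0.01):
    """Return the alt-allele dosage with the highest likelihood."""
    likelihoods = genotype_likelihoods(alt_reads, total_reads, ploidy, error_rate)
    return max(likelihoods, key=likelihoods.get)


# The same pileup (10 alt reads out of 40) is interpreted differently
# depending on the ploidy the caller assumes.
diploid_call = most_likely_dosage(10, 40, ploidy=2)
tetraploid_call = most_likely_dosage(10, 40, ploidy=4)
```

Under a diploid assumption the only heterozygous state expects 50% alt reads, whereas a tetraploid model also allows 25% and 75%, so an alt fraction of about 25% fits one alt copy out of four much better.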

5.0 years ago
Baoxu ▴ 10

Same question! When the reference is tetraploid or hexaploid, there are many duplications across the genome, so lots of reads will be multi-mapped. How to deal with it?


"How to deal with it?"

The first key to success is to open a new question instead of adding an answer to a 2.5-year-old question.

Besides this, I see two options:

  • Accept the consensus (and hence the multimapping) and identify CNVs for the duplications
  • Resolve everything in a massive sequencing and assembly project with many sequencing libraries, ideally including a mix of short and long reads

Many thanks for your reply! I don't understand what you mean by the first option. Maybe I didn't explain the question well. I mapped re-sequencing reads (100 bp paired-end) from hexaploid wheat with bwa-mem (default parameters). But I found that a lot of the reads that map to the reference have very low mapping quality (actually 0 for most mapped reads), which makes the number of uniquely mapped reads very low.
I was wondering if there is a method that can improve the accuracy, maybe by adding some parameters or switching to another mapping software.
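
Before changing parameters or aligners, it can help to quantify the problem. The sketch below tallies how many primary alignments have MAPQ 0 from SAM-formatted text lines. It relies only on the standard SAM column layout and flag bits; the function name and the demo records are made up for illustration.

```python
from collections import Counter


def mapq_summary(sam_lines):
    """Count mapped, unmapped, and MAPQ-0 primary alignments in SAM text."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        mapq = int(fields[4])
        if flag & 0x4:  # read is unmapped
            counts["unmapped"] += 1
            continue
        if flag & 0x900:  # skip secondary (0x100) and supplementary (0x800)
            continue
        counts["mapped"] += 1
        if mapq == 0:
            counts["mapq0"] += 1
    return counts


# Hypothetical records: one MAPQ-0 hit, one confident hit, one unmapped
# read, and one secondary alignment that is ignored.
demo = [
    "@SQ\tSN:chr1A\tLN:1000",
    "r1\t0\tchr1A\t100\t0\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t0\tchr1B\t200\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r1\t256\tchr1B\t100\t0\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
summary = mapq_summary(demo)
print(summary["mapq0"] / summary["mapped"])  # fraction of mapped reads with MAPQ 0
```

In practice the lines could come from any SAM text stream, for example the decompressed output of the aligner, rather than an in-memory list.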


See this thread for some details on the MAPQ field calculations of BWA. In my opinion, the field is often close to meaningless, and I have had my share of difficulties with big plant genomes, specifically the wheat (Triticum aestivum) genome.

You may need to play with the parameters and/or try other aligners such as bowtie2, though I'm really not enough of a Triticum specialist to evaluate whether the duplication rate you observe is expected or abnormal.

22 months ago

I suspect the real answer to these questions is that our standard procedure, i.e. mapping to a standard haploid reference and using well-mapped reads to call good SNPs, is not going to work well.

If we are looking at true diploid, triploid, or polyploid reference genomes, then mapping quality will very frequently be low (0 or close to 0) because many near-identical sequences are present.
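
A rough way to see why this happens: MAPQ is a Phred-scaled probability that the reported position is wrong, so a read with several equally good placements cannot score high. The sketch below computes that idealized value. The helper name and the cap of 60 are assumptions, and real aligners use their own heuristics (often reporting exactly 0 for ties), so treat the numbers as an upper bound on what to expect.

```python
from math import log10


def idealized_mapq(n_equal_hits, cap=60):
    """Phred-scaled mapping quality when a read has n equally good placements.

    Assumption: each placement is equally likely to be the true origin, so
    the chance that the reported one is wrong is 1 - 1/n.
    """
    if n_equal_hits < 1:
        raise ValueError("a mapped read needs at least one placement")
    p_wrong = 1 - 1 / n_equal_hits
    if p_wrong == 0:  # unique placement: report the cap (assumed to be 60)
        return cap
    return min(cap, round(-10 * log10(p_wrong)))


# Unique hit, two homoeologous copies, three copies, six copies.
for n in (1, 2, 3, 6):
    print(n, idealized_mapq(n))
```

Even with only two identical copies the idealized value is already in the low single digits, which is why a reference containing every subgenome or haplotype pushes most reads toward MAPQ 0.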

I suspect this is a perfect use case for pangenome software such as PGGB -> ODGI, rather than the BWA-against-a-haploid-reference approach we are all more familiar with.
