Question

Polyploid genome and mapping

1

7.5 years ago

BioGeek ▴ 170

How does mapping software deal with reads mapped against a polyploid genome? Are they placed randomly at any of the allelic locations? If so, that might affect structural variation studies, wouldn't it?

alignment next-gen • 2.9k views

ADD COMMENT • link updated 22 months ago by colindaven 6.3k • written 7.5 years ago by BioGeek ▴ 170

score 1 · Answer 1 · 2016-10-27

1

7.5 years ago

WouterDeCoster 47k

The reference genome for mapping is haploid so mapping is not different compared to haploid organisms. More complicated are alternative haplotypes, but for the spirit of this question it's okay to ignore those.

ADD COMMENT • link 7.5 years ago by WouterDeCoster 47k

0

What if the reference is tetraploid ?

ADD REPLY • link 7.5 years ago by BioGeek ▴ 170

1

The reference is never tetraploid. The genome may be tetraploid or hexaploid or whateverploid, but the reference is always haploid. Multiple alleles are then mapped to the same location and it's up to the variant caller to take ploidy into account.
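To illustrate what "taking ploidy into account" can mean on the variant-calling side, here is a minimal sketch, not how any particular caller is implemented. It assumes a biallelic site, independent reads, and a single symmetric sequencing error rate; the function names and the default `error_rate` are made up for illustration.

```python
from math import comb


def genotype_likelihoods(ref_reads, alt_reads, ploidy, error_rate=0.01):
    """Binomial likelihood of the read counts for each possible ALT dosage.

    Returns a dict {alt_copies: likelihood} for alt_copies = 0..ploidy.
    Assumes a biallelic site and a symmetric per-read error rate.
    """
    total = ref_reads + alt_reads
    likelihoods = {}
    for alt_copies in range(ploidy + 1):
        # Expected fraction of ALT-supporting reads for this genotype,
        # allowing for errors that flip REF to ALT and vice versa.
        frac = alt_copies / ploidy
        p_alt = frac * (1 - error_rate) + (1 - frac) * error_rate
        likelihoods[alt_copies] = (
            comb(total, alt_reads) * p_alt**alt_reads * (1 - p_alt) ** ref_reads
        )
    return likelihoods


def most_likely_dosage(ref_reads, alt_reads, ploidy):
    """Number of ALT copies with the highest likelihood."""
    lk = genotype_likelihoods(ref_reads, alt_reads, ploidy)
    return max(lk, key=lk.get)
```

With 30 reference and 10 alternate reads, `most_likely_dosage(30, 10, ploidy=4)` picks one ALT copy out of four, whereas a diploid model can only offer 0, 1 or 2 copies. The reads pile up at the same haploid reference position in both cases; only the genotype model changes.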

ADD REPLY • link 7.5 years ago by WouterDeCoster 47k

score 1 · Answer 2 · 2019-04-29

1

5.0 years ago

Baoxu ▴ 10

Same question! When the reference is tetraploid or hexaploid there are many duplications across the genome, so lots of reads will be multi-mapped. How to deal with it?

ADD COMMENT • link 5.0 years ago by Baoxu ▴ 10

0

How to deal with it?

The first key to success is to open a new question instead of adding an answer to a 2.5-year-old question.

Besides this, I see two options:

1. Accept the consensus, and hence the multimapping, and identify CNVs for the duplications.
2. Resolve everything in a massive sequencing and assembly project with many sequencing libraries, ideally including a mix of short and long reads.

ADD REPLY • link 5.0 years ago by Carambakaracho ★ 3.2k

0

Many thanks for your reply! I don't understand what you mean by the first option. Maybe I didn't explain the question well. I mapped re-sequencing reads (100 bp paired-end) from hexaploid wheat with bwa-mem (default parameters). I found that many of the reads that map to the reference have very low mapping quality (actually 0 for most mapped reads), which makes the number of uniquely mapped reads very low.
I was wondering whether there is a method that can improve the accuracy, perhaps by adding some parameters or switching to another mapping software.
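Before changing aligners it can help to quantify the problem. Below is a minimal sketch that tallies primary alignments by mapping quality from SAM text. It relies only on the SAM column layout (FLAG in column 2, MAPQ in column 5) and the standard flag bits; the function name, the category labels and the `min_mapq` threshold are hypothetical choices for illustration.

```python
from collections import Counter

# Standard SAM flag bits.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def mapq_summary(sam_lines, min_mapq=1):
    """Count alignment records by category from an iterable of SAM lines.

    Primary alignments with MAPQ below min_mapq are counted as 'ambiguous',
    the rest as 'confident'. Header lines are skipped.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        mapq = int(fields[4])
        if flag & UNMAPPED:
            counts["unmapped"] += 1
        elif flag & (SECONDARY | SUPPLEMENTARY):
            counts["secondary_or_supplementary"] += 1
        elif mapq < min_mapq:
            counts["ambiguous"] += 1
        else:
            counts["confident"] += 1
    return counts
```

It can be fed an open SAM file, for example `mapq_summary(open("aln.sam"))`, where `aln.sam` is a placeholder for your own alignment file. Comparing the 'ambiguous' and 'confident' counts across aligners or parameter sets gives a like-for-like measure of how much multimapping each setup leaves.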

ADD REPLY • link 5.0 years ago by Baoxu ▴ 10

0

See this thread for some details on how BWA calculates the MAPQ field. In my opinion the field is often close to meaningless, and I have had my share of difficulties with big plant genomes, the wheat (Triticum aestivum) genome in particular.

You may need to play with the parameters and/or other aligners like bowtie2, though I'm really not enough of a Triticum specialist to evaluate whether the duplication rate you observe is expected or abnormal.

ADD REPLY • link 5.0 years ago by Carambakaracho ★ 3.2k

score 0 · Answer 3 · 2022-06-10

I suspect the real answer to these questions is that our standard procedure, i.e. mapping to a standard haploid reference and using well-mapped reads to call good SNPs, is not going to work well.

If we are looking at true diploid, triploid and polyploid reference genomes, then mapping quality will very frequently be low (0 or close to it) because many near-identical sequences are present.
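A back-of-the-envelope sketch of why that happens: the SAM specification defines MAPQ as -10·log10 of the probability that the reported position is wrong, rounded to the nearest integer. Assuming, as a simplification, that a read has k equally good placements and the aligner picks one at random, that probability is 1 - 1/k. The function name and the cap of 60 are assumptions for illustration; real aligners use their own heuristics and commonly just report 0 for ties.

```python
from math import log10


def idealized_mapq(n_equal_hits, cap=60):
    """Phred-scaled mapping quality for a read with n equally good placements.

    Idealized model: one of the n placements is chosen uniformly at random,
    so P(wrong) = 1 - 1/n. The cap is an arbitrary ceiling for unique hits.
    """
    if n_equal_hits < 1:
        raise ValueError("a mapped read needs at least one placement")
    p_wrong = 1 - 1 / n_equal_hits
    if p_wrong == 0:
        return cap  # unique placement: no ambiguity in this model
    return min(cap, round(-10 * log10(p_wrong)))
```

Under this model a read with two identical placements (say, one per haplotype copy in the reference) already drops to a MAPQ of about 3, and with three it is about 2, so any MAPQ filter in a downstream pipeline would discard most reads from regions present in several copies.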

I suspect this is a perfect use case for pangenome software like PGGB -> ODGI, rather than the BWA vs haploid reference we are all more familiar with.