Question: How To Check For Ploidy Using NGS Only?

Hello,

I am working on a single-celled organism that I am isolating from a natural environment and sequencing with Illumina paired-end (PE) reads.

How can I determine its ploidy? There is no reference genome.

I was thinking of mapping the reads to a de novo assembly and checking the maximum number of alleles I can find per locus, and what their frequencies are.

Adrian
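A minimal sketch of the per-locus idea described above, assuming the mapped reads have already been turned into pileup columns (one string of base calls per site). The function names, the toy data, and the `min_count` / `min_fraction` cutoffs are all hypothetical and only illustrate how alleles might be separated from likely sequencing errors.

```python
from collections import Counter

VALID_BASES = set("ACGT")

def allele_frequencies(bases, min_count=3, min_fraction=0.1):
    """Return {allele: frequency} for alleles passing both cutoffs.

    bases: iterable of single base calls observed at one locus (one per read).
    min_count / min_fraction: hypothetical cutoffs to drop likely errors.
    """
    counts = Counter(b.upper() for b in bases if b.upper() in VALID_BASES)
    depth = sum(counts.values())
    if depth == 0:
        return {}
    return {
        allele: n / depth
        for allele, n in counts.items()
        if n >= min_count and n / depth >= min_fraction
    }

def max_alleles_per_locus(pileup_columns, **cutoffs):
    """Largest number of supported alleles seen at any single locus."""
    return max(
        (len(allele_frequencies(col, **cutoffs)) for col in pileup_columns),
        default=0,
    )

# Toy pileup columns (made-up data): each string is the bases covering one site.
columns = [
    "A" * 40,                       # single allele
    "A" * 30 + "G" * 10,            # 0.75 / 0.25 split
    "C" * 20 + "T" * 19 + "G" * 1,  # roughly 0.5 / 0.5 plus one likely error
]
for col in columns:
    print(allele_frequencies(col))
print(max_alleles_per_locus(columns))
```

The maximum allele count is at best a lower bound on ploidy, and collapsed repeats or paralogs in a draft assembly can inflate it, so the distribution of frequencies across many loci is probably more informative than any single site.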

6.0 years ago Adrian Pelin ♦ 2.3k • updated 17 months ago kamiljaron • 120

Sounds reasonable. Maybe you can do this even faster with a k-mer-based approach, provided you find a way to differentiate between alleles and sequencing errors.

6.0 years ago Christian ♦ 2.8k

This is an excellent idea, and I have already tried it :) I built a k-mer graph (a histogram of k-mer coverage) and I see 3 peaks. The last peak is the fattest (if you know what I mean), the second peak is at half the coverage of the last, and the first peak is at half the coverage of the second. This suggests that allele frequencies in the data set are 0.25, 0.50, or 1.00.

This potentially suggests that the organism is tetraploid. However, I am missing a peak for 0.75; I believe the 1.00 peak is so thick that it is hiding the 0.75 peak.

Do these conclusions sound correct?
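The peak reasoning above can be checked with a little arithmetic. The sketch below assumes that the highest-coverage peak corresponds to k-mers shared by all genome copies, and that a k-mer present in c of p copies peaks at c/p of that coverage. The coverage values, the function names, and the `tolerance` parameter are made up for illustration.

```python
def expected_peaks(full_coverage, ploidy):
    """Coverages at which k-mers present in 1..ploidy genome copies should peak."""
    return [full_coverage * copies / ploidy for copies in range(1, ploidy + 1)]

def unexplained_peaks(observed, full_coverage, ploidy, tolerance=0.05):
    """Observed peaks further than tolerance * full_coverage from every expected peak."""
    expected = expected_peaks(full_coverage, ploidy)
    return [
        peak
        for peak in observed
        if min(abs(peak - e) for e in expected) > tolerance * full_coverage
    ]

# Made-up peak coverages with the 1 : 2 : 4 ratio described in the post.
observed = [25.0, 50.0, 100.0]
for ploidy in (1, 2, 3, 4):
    print(ploidy, expected_peaks(100.0, ploidy),
          unexplained_peaks(observed, 100.0, ploidy))
```

Under these assumptions, 4 is the smallest ploidy that leaves no observed peak unexplained, with the 0.75 position (coverage 75) expected but unobserved. Higher multiples such as 8 would fit the same peaks, so this is consistent with tetraploidy rather than proof of it.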

As for sequencing errors: this is Illumina data and the run was of high quality, so I suppose the errors would simply contribute to the bell curve in my peaks. Any other suggestions? I suppose I could filter reads based on quality, but I have heard people warn against this, since it introduces bias.

6.0 years ago Adrian Pelin ♦ 2.3k

Your conclusion sounds reasonable to me, although I am not an expert on interpreting these peaks. With respect to the sequencing errors, I think you are also right. To clean up your data, you could also just throw away all k-mers that occur only a few times.
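As a rough illustration of discarding rare k-mers, here is a pure-Python sketch. It counts forward-strand k-mers only and uses a hypothetical `min_count` cutoff; the reads are toy data, and a real data set would need a dedicated k-mer counter and probably canonical (strand-independent) k-mers.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every forward-strand k-mer across an iterable of read strings."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    return counts

def drop_rare_kmers(counts, min_count=3):
    """Keep only k-mers seen at least min_count times (hypothetical cutoff)."""
    return {kmer: n for kmer, n in counts.items() if n >= min_count}

def coverage_histogram(counts):
    """Map coverage -> number of distinct k-mers observed at that coverage."""
    return Counter(counts.values())

# Toy reads: five identical reads plus one carrying a single-base difference.
reads = ["ACGTACGT"] * 5 + ["ACGTACCT"]
counts = count_kmers(reads, k=4)
kept = drop_rare_kmers(counts, min_count=3)
print(sorted(kept.items()))
print(sorted(coverage_histogram(kept).items()))
```

In this toy case the two k-mers created by the single-base difference occur once each and are dropped. On real data the cutoff would have to sit well below the lowest genuine peak, otherwise low-frequency alleles would be discarded along with the errors.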

6.0 years ago Christian ♦ 2.8k

Hi Adrian, I am also using a k-mer strategy to determine ploidy. Can you tell me more about the tool you used and what you did with the output, please?

3.0 years ago dilution • 0

Couldn't one of the additional peaks be an organellar genome?

6.0 years ago bewickaj • 10

Good point, but I work on an organism that does not have an organellar genome. The peaks could also originate from contaminants such as bacteria.

That is why I mapped my reads (with bwa) to my draft assembly, restricted to contigs I am certain come from the correct nuclear genome, and used only the reads that mapped to construct the k-mer graph.

6.0 years ago Adrian Pelin ♦ 2.3k

We just released a method for detecting ploidy from sequencing reads without an assembly (it is basically an extension of k-mer spectra analysis). We call the approach smudgeplot, and it is available on GitHub: https://github.com/tbenavi1/smudgeplot

17 months ago kamiljaron • 120
