How to calculate genome size?
9.4 years ago
123clouds123 ▴ 10

What websites or programs could I use to calculate genome side from the reads produced by NGS platforms such as Illumina or Ion Torrent? And what sequencing coverage is needed for an RNA-Seq experiment?

Illumina NGS genome • 7.6k views

'side'? Do you mean 'strand'?


Maybe it's "size".


That's probably the best bet.


Sorry, that was a typo. I meant 'genome size'. Thank you so much!

9.4 years ago

You can use BBTools's kmercountexact program for this purpose, as outlined here.
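For intuition, here is a minimal Python sketch of the textbook k-mer approach to genome size: drop the low-depth error k-mers, find the main coverage peak, and divide the total k-mer count by the peak depth. This is not how kmercountexact is implemented; the function name, the dict-based histogram input, and the simple valley detection are all assumptions made for illustration, and it only makes sense for a haploid genome with a single main peak.

```python
def estimate_genome_size(hist):
    """Rough haploid genome-size estimate from a k-mer depth histogram.

    hist maps depth -> number of distinct k-mers seen at that depth
    (hypothetical input format, chosen for this sketch).
    Returns (estimated_size, peak_depth).
    """
    depths = sorted(hist)
    # Assumption: error k-mers pile up at low depth, so the first point
    # where the histogram starts rising again marks the error valley.
    min_depth = depths[0]
    for prev, cur in zip(depths, depths[1:]):
        if hist[cur] > hist[prev]:
            min_depth = prev
            break
    # Keep only k-mers at or above the valley.
    solid = {d: c for d, c in hist.items() if d >= min_depth}
    # The main peak is the depth holding the most distinct k-mers.
    peak_depth = max(solid, key=solid.get)
    # Total k-mer occurrences divided by the peak depth.
    total_kmers = sum(d * c for d, c in solid.items())
    return total_kmers / peak_depth, peak_depth


toy_hist = {1: 5000, 2: 800, 3: 100, 9: 200, 10: 600, 11: 200}
size, peak = estimate_genome_size(toy_hist)
print(size, peak)
```

On real data the valley and peak are usually checked by eye on the histogram plot, since repeats and heterozygosity add extra peaks that this sketch ignores.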

As for minimum coverage in RNA-seq, that depends on the goal of your experiment.


Thank you so much, Brian Bushnell. I am very grateful. Your information can help me in my studies!


Look at papers that discuss the coverage vs. replicates trade-off. They might give you an idea. For example: http://www.nature.com/nrg/journal/v15/n2/full/nrg3642.html

This post could also help you:

How Much Coverage Do We Need For An Rna-Seq Experiment?


About this approach: what if the genome is a heterozygous tetraploid and the k-mer graph shows 3 peaks in a 1:2:4 ratio? Assuming the 3rd peak is the homozygous regions in single copy, would it be correct to calculate genome size as:

3rd_peak(nr of kmers) + 0.5*2nd_peak(nr of kmers) + 0.25*1st_peak(nr of kmers)?

Does BBTools handle these genomes?


If the genome is tetraploid with 1:2:4-ratio primary peaks, then yes, the 3rd peak should be single-copy het content, so your math is correct. I have not looked at a tetraploid with this program but I would expect that if there was a 1-copy peak, there should also be a 3-copy peak of similar magnitude, so you'd probably end up with 4 peaks, with the 3rd needing a 0.75 multiplier.
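The weighting discussed here (0.25, 0.5, 0.75 and 1 for a tetraploid) can be written as a small Python sketch. The function name and the input format are hypothetical: each peak is keyed by its relative depth (1, 2, 3 or 4 for a tetraploid), and its value is the number of distinct k-mers assigned to that peak, which you would have to get from the histogram yourself.

```python
def ploidy_weighted_size(peak_kmers, ploidy):
    """Sum distinct k-mers per peak, weighted by relative depth / ploidy.

    peak_kmers maps a peak's relative depth (1..ploidy) to the number of
    distinct k-mers in that peak (hypothetical input for this sketch).
    """
    for copies in peak_kmers:
        if not 1 <= copies <= ploidy:
            raise ValueError(f"relative depth {copies} is outside 1..{ploidy}")
    # A k-mer at relative depth c counts as c/ploidy of a genome position.
    return sum(n * copies / ploidy for copies, n in peak_kmers.items())


# Three peaks in a 1:2:4 ratio, as in the question (made-up k-mer counts).
print(ploidy_weighted_size({1: 400, 2: 200, 4: 1000}, ploidy=4))

# Same data with an additional 3-copy peak, weighted by 0.75.
print(ploidy_weighted_size({1: 400, 2: 200, 3: 400, 4: 1000}, ploidy=4))
```

The numbers are invented purely to show the arithmetic; the hard part in practice is deciding which k-mers belong to which peak.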

BBTools does handle this, with one caveat: the peak detection is not very sophisticated, so it may lump smaller peaks together. It generally identifies the two to four most prominent peaks correctly, but if there are more than that you may need to do some manual work on the kmer frequency histogram for a precise estimate. I developed it primarily for microbes, which are haploid, and fungi, which are haploid or diploid; I may put in more robust peak modeling later.

Also note that if you are working on a large dataset for a large genome, you can reduce memory consumption by enabling a Bloom filter for error kmers with the "prefilter" flag. That's not necessary for microbes or most fungi, which tend to be much smaller than plants and animals.
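To show why a prefilter saves memory, here is a toy Python sketch of the general idea: a compact Bloom filter remembers k-mers seen once, and only k-mers seen a second time get a full entry in the count table, so most error k-mers (which tend to occur once) never take up a table slot. This is an illustration only and not the BBTools implementation; the class, the function, and all parameter values are made up for the example.

```python
import hashlib
from collections import Counter


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item):
        # Derive n_hashes bit positions from salted SHA-256 digests.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in self._positions(item))


def count_kmers_prefiltered(reads, k):
    """Count k-mers, giving table entries only to k-mers seen at least twice."""
    seen_once = BloomFilter()
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in counts:
                counts[kmer] += 1
            elif kmer in seen_once:
                counts[kmer] = 2  # second sighting: promote to the table
            else:
                seen_once.add(kmer)  # first sighting: only set a few bits
    return counts


counts = count_kmers_prefiltered(["ACGTACGT", "ACGTTTTT"], k=4)
print(counts)
```

Because a Bloom filter can return false positives, an occasional singleton may still slip into the table with an inflated count, which is the usual trade-off for the memory saved.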
