Question

How to calculate genome size????

0

9.4 years ago

123clouds123 ▴ 10

What websites or programs can I use to calculate genome side from the reads produced by NGS platforms such as Illumina or Ion Torrent? And what sequencing coverage is needed for an RNA-Seq experiment?

Illumina NGS genome • 7.6k views

ADD COMMENT • link updated 2.2 years ago by Ram 43k • written 9.4 years ago by 123clouds123 ▴ 10

1

'side' ? do you mean 'strand' ?

ADD REPLY • link 9.4 years ago by Pierre Lindenbaum 161k

1

Maybe it's "size".

ADD REPLY • link 9.4 years ago by GouthamAtla 12k

0

That's probably the best bet.

ADD REPLY • link 9.4 years ago by Devon Ryan 104k

0

Sorry!! That was a typo. I meant 'genome size'. Thank you so much!

ADD REPLY • link 9.4 years ago by 123clouds123 ▴ 10

Ram · Answer 1 · 2014-11-29

3

9.4 years ago

Brian Bushnell 20k

You can use BBTools's kmercountexact program for this purpose, as outlined here.

As for min coverage in RNA-seq, that depends on the goal of your experiment.

ADD COMMENT • link 9.4 years ago by Brian Bushnell 20k

1

Thank you so much, Brian Bushnell. I am very grateful. Your information can help me in my studies!

ADD REPLY • link updated 2.2 years ago by Ram 43k • written 9.4 years ago by 123clouds123 ▴ 10

1

Look at papers that discuss the coverage vs. replicates trade-off; they might give you an idea. For example: http://www.nature.com/nrg/journal/v15/n2/full/nrg3642.html

This post could also help you:

How Much Coverage Do We Need For An Rna-Seq Experiment?

ADD REPLY • link updated 2.2 years ago by Ram 43k • written 9.4 years ago by GouthamAtla 12k

1

About this approach: what if the genome is a heterozygous tetraploid and the k-mer graph shows 3 peaks in a 1:2:4 ratio? Assuming the 3rd peak is the homozygous regions in single copy, would it be correct to count genome size as:

3rd_peak(nr of kmers) + 0.5*2nd_peak(nr of kmers) + 0.25*1st_peak(nr of kmers)?

Does BBTools handle these genomes?

ADD REPLY • link 9.4 years ago by Adrian Pelin ★ 2.6k

0

If the genome is tetraploid with 1:2:4-ratio primary peaks, then yes, the 3rd peak should be single-copy het content, so your math is correct. I have not looked at a tetraploid with this program but I would expect that if there was a 1-copy peak, there should also be a 3-copy peak of similar magnitude, so you'd probably end up with 4 peaks, with the 3rd needing a 0.75 multiplier.
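The weighted sum discussed above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not something BBTools provides: the function name `haploid_kmer_count` and the k-mer counts are hypothetical, and the weights are the ones proposed in this thread (0.25, 0.5, 1.0 for three peaks; 0.25, 0.5, 0.75, 1.0 if a 3-copy peak is also present).

```python
def haploid_kmer_count(peak_kmers, weights):
    """Weighted sum of the distinct k-mers under each peak.

    peak_kmers: number of distinct k-mers assigned to each peak,
                ordered from lowest to highest depth.
    weights:    one multiplier per peak (taken from the thread, not from BBTools).
    """
    if len(peak_kmers) != len(weights):
        raise ValueError("need exactly one weight per peak")
    return sum(n * w for n, w in zip(peak_kmers, weights))


# Hypothetical counts for a 1:2:4 three-peak histogram (made-up numbers).
three_peaks = [4_000_000, 6_000_000, 50_000_000]
size = haploid_kmer_count(three_peaks, [0.25, 0.5, 1.0])
print(f"{size:,.0f}")  # 54,000,000

# Same idea with a fourth (3-copy) peak and the 0.75 multiplier.
four_peaks = [4_000_000, 6_000_000, 4_000_000, 50_000_000]
print(f"{haploid_kmer_count(four_peaks, [0.25, 0.5, 0.75, 1.0]):,.0f}")  # 57,000,000
```

Passing the weights in explicitly keeps the ploidy assumption visible, so the same function covers the three-peak and four-peak cases.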

BBTools does handle this, with one caveat: the peak detection is not very sophisticated, so it may lump smaller peaks together. It generally identifies the two to four most prominent peaks correctly, but if there are more than that you may need to do some manual labor on the kmer frequency histogram for a precise estimate. I developed it primarily for microbes and fungi, which are haploid and haploid or diploid, respectively; I may put in more robust peak modeling later.
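For the manual route, a minimal sketch of picking peaks out of a kmer frequency histogram might look like the following. It is a plain local-maximum scan, not the peak caller used in BBTools; `find_peaks`, the `min_depth` cutoff, the `window` size, and the toy histogram are all assumptions made for illustration.

```python
def find_peaks(hist, min_depth=3, window=2):
    """Return depths that are local maxima of a k-mer frequency histogram.

    hist:      dict mapping depth -> number of distinct k-mers at that depth.
    min_depth: depths below this are never reported (assumed error-k-mer region).
    window:    a depth is a peak if its count is strictly greater than every
               count within +/- window depths (missing depths count as 0).
    """
    peaks = []
    for depth in sorted(hist):
        if depth < min_depth:
            continue
        count = hist[depth]
        neighbours = [
            hist.get(d, 0)
            for d in range(depth - window, depth + window + 1)
            if d != depth
        ]
        if all(count > n for n in neighbours):
            peaks.append(depth)
    return peaks


# Toy histogram: an error spike at low depth, then two bumps (made-up numbers).
toy = {1: 900, 2: 300, 3: 50, 4: 20, 5: 30, 6: 60, 7: 40, 8: 25,
       9: 30, 10: 55, 11: 90, 12: 120, 13: 100, 14: 60, 15: 20}
print(find_peaks(toy))  # [6, 12]
```

Real histograms are noisier than this, so a larger window or some smoothing would probably be needed before trusting the result.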

Also note that if you are working on a large dataset for a large genome, you can reduce memory consumption by enabling a bloom filter for error kmers with the "prefilter" flag. That's not necessary for microbes or most fungi, which tend to be much smaller than plants and animals.

ADD REPLY • link updated 2.2 years ago by Ram 43k • written 9.4 years ago by Brian Bushnell 20k