Biostar Beta. Not for public use.
Question: How do I compute the effective genome size?
2

Several pieces of software require this parameter.

Is counting the number of masked nucleotides in fasta files going to give a good approximate result?

If not, is there a simple way to do it?

ADD COMMENTlink 4.5 years ago Endre Bakken Stovner • 880 • updated 4.2 years ago Biostar 20
3

EDIT: If you understand effective genome size as "mappable" genome size, then Devon is right, of course.

Assemblies will only provide you with a good size estimate if they are of really high quality. This is usually only the case for model organisms or small bacterial genomes.

Assemblies of larger genomes such as those of plants and animals, and draft genomes in particular, usually do not contain a complete representation of a single haploid genome, which is what you would need to get your size estimate right. The reasons are that assembly algorithms usually cannot resolve all repeats and centromeric/telomeric regions, and are also prone to generating multiple sequences for different alleles of the same region.

In my opinion, there are two better approaches:

1) Use the experimentally determined nuclear DNA content (e.g. from www.genomesize.com) to calculate the haploid genome size. DNA content in pg can be converted directly into a bp estimate.

2) Use a k-mer based approach to estimate the genome size from a high-coverage NGS data set of your organism.
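
For approach 2, here is a rough Python sketch of the usual back-of-the-envelope calculation: divide the total number of k-mers in the reads by the depth of the main peak in the k-mer histogram. It assumes you already have a histogram from a k-mer counter. The function name, the `min_depth` error cutoff and the toy numbers are my own illustrative choices, not something from a specific tool, and real data with heterozygosity or many repeats needs more care than this.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size from a k-mer depth histogram.

    histogram maps a depth (how many times a k-mer occurs in the reads)
    to the number of distinct k-mers observed at that depth.
    min_depth is an assumed cutoff: k-mers seen fewer times are treated
    as sequencing errors and ignored.
    """
    solid = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    # The depth shared by the most distinct k-mers approximates the
    # k-mer coverage of single-copy sequence.
    peak_depth = max(solid, key=solid.get)
    # Total k-mer occurrences in the reads, excluding likely errors.
    total_kmers = sum(depth * n for depth, n in solid.items())
    return total_kmers / peak_depth


# Toy histogram (made-up numbers): error k-mers at depth 1-2,
# a main peak at depth 20 and a small repeat peak at depth 40.
toy = {1: 1000, 2: 200, 19: 50, 20: 100, 21: 50, 40: 10}
print(estimate_genome_size(toy))  # 220.0
```

Repeats are counted according to their depth here (a k-mer at depth 40 contributes two copies), which is why this kind of estimate can come out larger than a collapsed assembly.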

ADD COMMENTlink 4.5 years ago thackl ♦ 2.6k
1

Note to self and others: converting pg to bp is just multiplying by 0.978 * 10^9

http://www.genomesize.com/faq.php
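
As a small Python sketch of that conversion (the function name and the 3.5 pg input are just illustrative, not values taken from the database):

```python
BP_PER_PG = 0.978e9  # base pairs per picogram of DNA, the factor quoted above


def pg_to_bp(dna_content_pg):
    """Convert a DNA content in picograms to an approximate number of base pairs."""
    return dna_content_pg * BP_PER_PG


# Illustrative input: a haploid DNA content (C-value) of 3.5 pg.
print(f"{pg_to_bp(3.5) / 1e9:.2f} Gbp")  # 3.42 Gbp
```

If the measurement you have is for a diploid nucleus (2C), halve it first to get the haploid value.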

ADD REPLYlink 4.5 years ago
Endre Bakken Stovner
• 880
0

Thanks. It is mostly for assembled genomes (if that is what assembly means). It will also mostly be used for model species.

ADD REPLYlink 4.5 years ago
Endre Bakken Stovner
• 880
3

The simplest method is to just subtract the number of Ns from the total length of the genome. That will overestimate things, but since the real number depends on read length, paired-end vs. single-end sequencing, and insert size, this is a simpler and quicker approximation.
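
A minimal Python sketch of that count, assuming a plain FASTA file where gaps are written as `N` or `n` (the function name is made up, and the example uses an in-memory string so it runs without a real genome file):

```python
import io


def count_non_n(fasta_handle):
    """Return (total_length, non_n_length) for the sequences in a FASTA handle."""
    total = 0
    n_count = 0
    for line in fasta_handle:
        line = line.strip()
        # Skip blank lines and header lines; everything else is sequence.
        if not line or line.startswith(">"):
            continue
        total += len(line)
        n_count += line.count("N") + line.count("n")
    return total, total - n_count


# Tiny in-memory example; for a real genome, pass an open file handle instead.
example = io.StringIO(">chr1\nACGTNNNN\nacgtnn\n>chr2\nNNAC\n")
print(count_non_n(example))  # (18, 10)
```

Soft-masked (lowercase) bases are counted as sequence here; only Ns are subtracted.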

ADD COMMENTlink 4.5 years ago Devon Ryan 90k
0

I'll go this route then. It is probably good enough for SICER/MACS.

ADD REPLYlink 4.5 years ago
Endre Bakken Stovner
• 880
0

Somebody in my group pointed out that this is a very bad approximation: doing it for the human genome gives 95%, while the actual number is 74%.

ADD REPLYlink 4.5 years ago
Endre Bakken Stovner
• 880
1

In programs like MACS, the effective genome size is used to compute statistics of mapped reads with respect to the size of the genome covered by reads. This size varies depending on read length and mapping strategy. By mapping strategy I just mean whether multi-mapping reads are kept or discarded. This can introduce a difference of about 20% in human and mouse effective genome sizes.

If multi-mapping reads (reads that map to multiple positions) are kept, then the strategy given by Devon can be used, because all positions in the genome can be covered by reads except for stretches of Ns.

Otherwise, the best way to compute the effective genome size is to add up all positions covered by reads or, if you are using a model organism, you can use this table, although it is a bit outdated as they used reads of length 30.
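
To illustrate "adding up all positions covered by reads", here is a rough Python sketch that merges overlapping alignment intervals per chromosome and sums their lengths. It assumes the alignments have already been extracted as `(chrom, start, end)` tuples with 0-based, half-open coordinates (for example from a BED file); the function name and the toy intervals are hypothetical.

```python
from collections import defaultdict


def covered_bases(alignments):
    """Count the positions covered by at least one alignment.

    alignments is an iterable of (chrom, start, end) tuples using
    0-based, half-open coordinates.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in alignments:
        by_chrom[chrom].append((start, end))

    covered = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or adjacent: extend the current block.
                cur_end = max(cur_end, end)
            else:
                # Gap: close the current block and start a new one.
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
    return covered


reads = [("chr1", 0, 10), ("chr1", 5, 15), ("chr1", 20, 30), ("chr2", 0, 5)]
print(covered_bases(reads))  # 30
```

With uniquely mapped reads only, this number reflects the read length and mapping settings used, which is the dependence described above.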

ADD COMMENTlink 4.4 years ago Fidel ♦ 1.9k
0

I am wondering how to calculate the pg content of a given organism.

ADD COMMENTlink 4.5 years ago Antonio R. Franco ♦ 4.0k
0

It isn't computed but experimentally determined, and you can look it up at www.genomesize.com

ADD REPLYlink 4.5 years ago
Endre Bakken Stovner
• 880
0

The most common approach is probably flow cytometry: you can use DNA-binding dyes, such as propidium iodide, and measure fluorescence per nucleus in a FACS machine.

Dolezel, 2007 and Veselska, 2014 describe some protocols I can recommend.

ADD REPLYlink 4.5 years ago
thackl
♦ 2.6k
