What Coverage For Genome Re-Sequencing By Illumina?
6.2 years ago
University Paris South

Hello,

I was wondering what coverage you need for genome re-sequencing on Illumina (Illumina HiSeq 2000)?

I was told 100x, which seems high, but I read that people often seem to use a 20-30x coverage.

Moreover, is it necessary to have a higher coverage to look for intra-population selective sweeps (from individual samples) than to investigate the genomic architecture of differentiation between sister species?

Thank you in advance for your answer.


Which species do you intend to work on? Do you know the quality of its genome?


It's a phytopathogenic fungus genome with a high GC content, so I think we're going to re-sequence several individuals at high coverage at first (100x). Then we'll subsample the reads to see how far we can lower the coverage for further experiments without losing sensitivity.

15 months ago
United States

Just an example of whole genome coverage:

[Figure: whole-genome coverage plot; image not preserved]

Rather than giving you a hard number, here are two articles that answer your questions.

Assessing the accuracy and power of population genetic inference from low-pass next-generation sequencing data. Crawford & Lazzaro 2012.

Low-coverage sequencing: Implications for design of complex trait association studies. Li et al. 2011.

Whole genome depth modeling:

Exome dist:


Thank you for the links. The first one in particular is very relevant to my interests (non-human populations with small sample sizes).

15 months ago
Philadelphia, PA

Coverage _should_ follow a Poisson distribution, so if your mean coverage is 30X, you will be at or below 20X about 3.5% of the time (strictly below 20X about 2.2% of the time). In theory, to get at least 30X at 99% of locations you will need a mean of about 45X coverage.
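Tail probabilities like these are easy to check yourself. Here is a minimal Python sketch, assuming depth at each site really is Poisson with the stated mean; the helper names `poisson_cdf` and `min_mean_for_depth` are made up for this illustration and are not from any library.

```python
import math


def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean), with each term computed in log space."""
    if k < 0:
        return 0.0
    log_mean = math.log(mean)
    return sum(
        math.exp(i * log_mean - mean - math.lgamma(i + 1))
        for i in range(k + 1)
    )


def min_mean_for_depth(min_depth, fraction, max_mean=1000):
    """Smallest integer mean coverage such that P(X >= min_depth) >= fraction."""
    for mean in range(1, max_mean + 1):
        if 1.0 - poisson_cdf(min_depth - 1, mean) >= fraction:
            return mean
    raise ValueError("no mean coverage up to max_mean reaches the target")


# Share of sites at or below 20X, and strictly below 20X, at a mean of 30X.
print(f"P(depth <= 20 | mean 30) = {poisson_cdf(20, 30):.4f}")
print(f"P(depth <  20 | mean 30) = {poisson_cdf(19, 30):.4f}")

# Mean coverage needed so that 99% of sites reach at least 30X.
print("mean needed:", min_mean_for_depth(30, 0.99))
```

Note that "at or below 20X" and "strictly below 20X" give noticeably different percentages, so it is worth being explicit about which one you mean when quoting a number.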

Unfortunately the genome does not respect this distribution and you will often see deserts and hotspots with thousands of reads, although this is largely a mappability issue.


Yup, it is naughty data. I often see that a negative binomial is a better fit.


Using the negative binomial, what mean coverage is necessary to have 99% of bases covered at 30X?


I guess I should have been more clear: this was for exome data. I also added a plot for WG data in my original post.

Exome depth histograms often look more like:

n <- 100000
hist(rpois(n, rgamma(n, 2, 0.0333)))
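A Poisson whose rate is itself gamma-distributed is a negative binomial, so the snippet above corresponds to a negative binomial with dispersion (size) 2 and mean 2/0.0333, about 60. As a rough sketch of what that implies for the "99% at 30X" question, here is the same calculation in Python. The big assumption is that size = 2 from the toy example carries over to your data; real dispersion has to be estimated from your own depth histogram. The function names are hypothetical.

```python
import math


def nbinom_cdf(k, size, mean):
    """P(X <= k) for a negative binomial with dispersion `size` and given mean."""
    if k < 0:
        return 0.0
    p = size / (size + mean)  # "success" probability in the usual parametrisation
    log_p = math.log(p)
    log_q = math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (
            math.lgamma(i + size) - math.lgamma(size) - math.lgamma(i + 1)
            + size * log_p + i * log_q
        )
        total += math.exp(log_pmf)
    return total


def min_mean_nb(min_depth, fraction, size, max_mean=100000):
    """Smallest integer mean such that P(X >= min_depth) >= fraction."""
    for mean in range(1, max_mean + 1):
        if 1.0 - nbinom_cdf(min_depth - 1, size, mean) >= fraction:
            return mean
    raise ValueError("no mean coverage up to max_mean reaches the target")


# Share of sites below 30X at a mean of 60X with size = 2 (assumed dispersion).
print(f"P(depth < 30 | mean 60, size 2) = {nbinom_cdf(29, 2, 60):.3f}")

# Mean needed for 99% of sites at >= 30X under the same assumed dispersion.
print("mean needed:", min_mean_nb(30, 0.99, 2))
```

With dispersion this heavy, roughly a quarter of sites sit below 30X even at a mean of 60X, and the mean needed to reach 30X at 99% of sites comes out on the order of 400X rather than 45X. In practice that usually means the tail is better fixed with improved capture or mapping than with more sequencing.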

17 months ago
Lee Katz ♦ 2.9k
Atlanta, GA

With bacteria, we are aiming for something like 50x. For high quality SNPs, we aim for 100x so that even the lower-coverage bases will have good coverage.

17 months ago
swbarnes2 5.7k
United States

It also depends on what you are looking for. For homozygous SNPs, 30x average will do pretty well. For heterozygous, or mixed SNPs, 50x is more like it.
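One way to see why heterozygous or mixed sites need more depth is a simple binomial model: at a site with a given depth, each read carries the alternate allele with probability equal to the allele fraction (1.0 for homozygous, 0.5 for heterozygous, lower for a mixed sample). The sketch below is only an illustration under that model; the threshold of 8 supporting reads is a made-up number, not any variant caller's default, and sequencing error is ignored.

```python
import math


def prob_enough_alt_reads(depth, min_alt, alt_fraction):
    """P(at least min_alt of `depth` reads carry the alternate allele)."""
    return sum(
        math.comb(depth, k)
        * alt_fraction ** k
        * (1.0 - alt_fraction) ** (depth - k)
        for k in range(min_alt, depth + 1)
    )


MIN_ALT = 8  # hypothetical number of supporting reads required to call a SNP

for depth in (20, 30, 50):
    hom = prob_enough_alt_reads(depth, MIN_ALT, 1.0)
    het = prob_enough_alt_reads(depth, MIN_ALT, 0.5)
    mixed = prob_enough_alt_reads(depth, MIN_ALT, 0.2)
    print(f"depth {depth}: hom {hom:.3f}  het {het:.3f}  20% mixed {mixed:.3f}")
```

Since depth itself varies from site to site, these per-site probabilities would still need to be averaged over your actual depth distribution to get a genome-wide detection rate.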
