Question

What Coverage For Genome Re-Sequencing By Illumina?

1

11.5 years ago

helene.badouin ▴ 20

Hello,

I was wondering what coverage you need for genome re-sequencing on Illumina (Illumina HiSeq 2000)?

I was told 100x, which seems high; from what I have read, people often use 20-30x coverage.

Moreover, is a higher coverage necessary to look for intra-population selective sweeps (from individual samples) than to investigate the genomic architecture of differentiation between sister species?

Thank you in advance for your answer.

illumina mapping coverage • 5.6k views

Updated 11.5 years ago by Zev.Kronenberg 12k • written 11.5 years ago by helene.badouin ▴ 20

0

Which species do you intend to work on? Do you know the quality of its reference genome?

11.5 years ago by Biojl ★ 1.7k

0

It's a phytopathogenic fungus genome with a high GC content, so I think we're going to re-sequence several individuals at high coverage at first (100x). Then we'll subsample the reads to see how much we can lower the coverage for further experiments without decreasing sensitivity.
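A rough sketch of the arithmetic behind that kind of subsampling plan, assuming the usual approximation that mean coverage is (number of reads × read length) / genome size. The 40 Mb genome size, 100 bp read length, read count and function names are hypothetical placeholders, not values from this thread.

```python
def mean_coverage(n_reads, read_length, genome_size):
    """Approximate mean depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def subsample_fraction(current_cov, target_cov):
    """Fraction of reads to keep to go from current_cov down to target_cov."""
    if target_cov > current_cov:
        raise ValueError("cannot subsample up to a higher coverage")
    return target_cov / current_cov


# Hypothetical numbers: 40 Mb genome, 100 bp reads, 40 million reads.
GENOME_SIZE = 40_000_000
READ_LENGTH = 100
N_READS = 40_000_000

cov = mean_coverage(N_READS, READ_LENGTH, GENOME_SIZE)
print(f"starting mean coverage: {cov:.0f}x")
for target in (50, 30, 20, 10):
    frac = subsample_fraction(cov, target)
    print(f"{target}x: keep {frac:.0%} of reads")
```

The resulting fraction is what you would pass to whatever read-subsampling tool you use; calling variants at each level then shows where sensitivity starts to drop.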

11.5 years ago by helene.badouin ▴ 20

score 3 · Answer 1 · 2012-10-31

3

11.5 years ago

Zev.Kronenberg 12k

Just an example of whole genome coverage:

Rather than giving you a hard number, here are two articles that answer your questions.

Assessing the accuracy and power of population genetic inference from low-pass next-generation sequencing data. Crawford & Lazzaro 2012.

Low-coverage sequencing: Implications for design of complex trait association studies. Li et al. 2011.

Whole genome depth modeling:

Exome depth distribution:

0

Thank you for the link. The first one in particular is very relevant for my interests (non human populations with small sample sizes).

11.5 years ago by helene.badouin ▴ 20

score 3 · Answer 2 · 2012-10-31

3

11.5 years ago

Jeremy Leipzig 22k

Coverage should follow a Poisson distribution, so if your mean coverage is 30X, a given position will be at 20X or lower about 3.5% of the time (strictly below 20X about 2.2% of the time). In theory, to get at least 30X at 99% of locations you will need a mean coverage of about 45X.

Unfortunately the genome does not respect this distribution and you will often see deserts and hotspots with thousands of reads, although this is largely a mappability issue.
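The Poisson figures can be checked directly. This is a minimal sketch using only the Python standard library, under the idealised assumption that per-base depth really is Poisson. The function names are made up for illustration. It prints the tail probability for both "20X or lower" and "strictly below 20X", since the two differ noticeably.

```python
import math


def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean); each term is computed via logs to avoid overflow."""
    if k < 0:
        return 0.0
    return sum(
        math.exp(-mean + i * math.log(mean) - math.lgamma(i + 1))
        for i in range(k + 1)
    )


def min_mean_for_depth(min_depth, fraction, max_mean=1000):
    """Smallest integer mean coverage with P(depth >= min_depth) >= fraction."""
    for mean in range(1, max_mean + 1):
        if 1.0 - poisson_cdf(min_depth - 1, mean) >= fraction:
            return mean
    raise ValueError("no mean coverage up to max_mean satisfies the target")


print(f"P(depth <= 20 | mean 30) = {poisson_cdf(20, 30):.3f}")
print(f"P(depth <  20 | mean 30) = {poisson_cdf(19, 30):.3f}")
print(f"mean needed for >=30x at 99% of sites: {min_mean_for_depth(30, 0.99)}")
```

Real data are overdispersed relative to this model, so treat the result as a lower bound on the mean coverage to aim for.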

0

Yup, it is naughty data. I often see that a negative binomial is a better fit.

11.5 years ago by Zev.Kronenberg 12k

0

Using the negative binomial, what mean coverage is necessary to have 99% of bases covered at 30X?

11.5 years ago by Jeremy Leipzig 22k

1

I guess I should have been more clear: this was for exome data. I also added a plot for WG data in my original post.

Exome depth histograms often look more like:

n <- 100000
hist(rpois(n, rgamma(n, shape = 2, rate = 0.0333)))
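A Poisson whose rate is gamma-distributed is a negative binomial, so the question of what mean coverage gives 30X at 99% of bases can be explored analytically rather than by simulation. The sketch below is one way to do that in Python, assuming the gamma shape stays fixed at 2 (the value in the R snippet) while the mean is varied. That fixed-shape assumption and the function names are mine, not something stated in the thread.

```python
import math


def nbinom_cdf(k, mean, shape):
    """P(X <= k) for a gamma-Poisson (negative binomial) with the given mean and shape."""
    if k < 0:
        return 0.0
    p = shape / (shape + mean)  # "prob" in the size/prob parameterisation
    total = 0.0
    for i in range(k + 1):
        log_pmf = (
            math.lgamma(i + shape) - math.lgamma(shape) - math.lgamma(i + 1)
            + shape * math.log(p) + i * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    return total


def min_mean_nbinom(min_depth, fraction, shape, max_mean=100_000):
    """Smallest integer mean with P(depth >= min_depth) >= fraction, shape held fixed."""
    for mean in range(1, max_mean + 1):
        if 1.0 - nbinom_cdf(min_depth - 1, mean, shape) >= fraction:
            return mean
    raise ValueError("no mean coverage up to max_mean satisfies the target")


SHAPE = 2.0            # gamma shape taken from the R snippet
MEAN = SHAPE / 0.0333  # shape / rate, roughly 60x

print(f"P(depth < 30 | mean {MEAN:.0f}, shape {SHAPE}) = {nbinom_cdf(29, MEAN, SHAPE):.3f}")
print(f"mean needed for >=30x at 99% of sites: {min_mean_nbinom(30, 0.99, SHAPE)}")
```

With a shape this small the distribution is heavily overdispersed, so the required mean comes out far above the Poisson estimate. A shape fitted to your own depth histogram would give a more realistic answer.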

11.5 years ago by Zev.Kronenberg 12k

score 2 · Answer 3 · 2012-10-31

2

11.5 years ago

Lee Katz ★ 3.1k

With bacteria, we are aiming for something like 50x. For high quality SNPs, we aim for 100x so that even the lower-coverage bases will have good coverage.

score 2 · Answer 4 · 2012-10-31

2

11.5 years ago

swbarnes2 14k

It also depends on what you are looking for. For homozygous SNPs, 30x average will do pretty well. For heterozygous or mixed SNPs, 50x is more like it.
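One way to see why heterozygous or mixed sites need more depth: at a given position only a fraction of the reads carry the alternate allele, so the chance of seeing enough supporting reads follows a binomial. The sketch below assumes a simple model with no sequencing error and a hypothetical caller that requires at least 5 alternate reads. Both the threshold and the 20% mixed-allele fraction are illustrative choices, not values from this thread.

```python
import math


def prob_at_least_k_alt(depth, k, alt_fraction=0.5):
    """P(at least k reads carry the alternate allele) under Binomial(depth, alt_fraction)."""
    return sum(
        math.comb(depth, i) * alt_fraction**i * (1 - alt_fraction) ** (depth - i)
        for i in range(k, depth + 1)
    )


MIN_ALT_READS = 5  # hypothetical caller threshold

for depth in (10, 20, 30, 50):
    het = prob_at_least_k_alt(depth, MIN_ALT_READS, alt_fraction=0.5)
    mixed = prob_at_least_k_alt(depth, MIN_ALT_READS, alt_fraction=0.2)
    print(f"depth {depth:>2}: het {het:.3f}, 20% mixed {mixed:.3f}")
```

Because local depth also varies around the mean, the mean coverage needed in practice is higher than the per-site depth this suggests.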
