Question: What Coverage For Genome Re-Sequencing By Illumina?

Hello,

I was wondering what coverage you need for genome re-sequencing on Illumina (Illumina HiSeq 2000)?

I was told 100x, which seems high, but I have read that people often use 20-30x coverage.

Moreover, is higher coverage necessary to look for intra-population selective sweeps (from individual samples) than to investigate the genomic architecture of differentiation between sister species?

Thank you in advance for your answers.

helene.badouin, 7.2 years ago (last updated 7.2 years ago by Zev.Kronenberg)

Which species do you intend to work on? Do you know the quality of its reference genome?

Biojl, 7.2 years ago

It's a phytopathogenic fungus with a high GC content, so I think we will first re-sequence several individuals at high coverage (100x). Then we will subsample the data to see how far we can lower the coverage in further experiments without losing sensitivity.

helene.badouin, 7.2 years ago
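
A minimal sketch of that subsampling step, assuming you already have one depth value per position from the 100x pilot. Keeping each read independently with probability `fraction` is binomial thinning, so each position's new depth is Binomial(depth, fraction); thinning the observed depths, rather than simulating fresh Poisson ones, keeps the real deserts and hotspots. The function names, the 20x threshold and the simulated stand-in depths are all hypothetical.

```python
import math
import random


def thin_depths(depths, fraction, rng):
    """Binomially thin per-position depths: keep each read with probability `fraction`."""
    return [sum(rng.random() < fraction for _ in range(d)) for d in depths]


def fraction_at_least(depths, min_depth):
    """Fraction of positions whose depth is >= min_depth."""
    return sum(d >= min_depth for d in depths) / len(depths)


def poisson_draw(lam, rng):
    """Knuth's Poisson sampler; only used here to fake input depths for the demo."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


rng = random.Random(42)
# Hypothetical stand-in for the real per-position depths of a 100x pilot run.
depths_100x = [poisson_draw(100, rng) for _ in range(2000)]
for fraction in (1.0, 0.5, 0.3, 0.2):
    thinned = thin_depths(depths_100x, fraction, rng)
    share = fraction_at_least(thinned, 20)
    print(f"~{100 * fraction:.0f}x: {share:.3f} of positions at >= 20x")
```

On real data you would replace `depths_100x` with the observed depths. Read-level subsampling (for example `samtools view -s`) also correlates neighbouring positions, which this per-position sketch ignores.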

Just an example of whole genome coverage:

[Figure: whole genome coverage plot]

Rather than giving you a hard number, here are two articles that address your questions.

Crawford & Lazzaro (2012). Assessing the accuracy and power of population genetic inference from low-pass next-generation sequencing data.

Li et al. (2011). Low-coverage sequencing: implications for design of complex trait association studies.

Whole genome depth modeling:

[Figure: whole genome depth model]

Exome depth distribution:

[Figure: exome depth distribution]

Zev.Kronenberg, 7.2 years ago

Thank you for the links. The first one in particular is very relevant to my interests (non-human populations with small sample sizes).

helene.badouin, 7.2 years ago

Coverage _should_ follow a Poisson distribution, so if your mean coverage is 30X, about 3.5% of positions will be at or below 20X (about 2.2% strictly below 20X). In theory, to get at least 30X at 99% of positions you need a mean coverage of about 45X.

Unfortunately, real coverage across the genome does not follow this distribution: you will often see deserts and hotspots with thousands of reads, although this is largely a mappability issue.

Jeremy Leipzig, 7.2 years ago
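
The Poisson figures above can be checked with the standard library alone. This sketch sums the Poisson probability mass function in log space; the function names are illustrative. Under a pure Poisson model with a mean of 30X, about 2.2% of positions fall strictly below 20X and about 3.5% are at or below 20X, and 45X is the smallest whole-number mean that puts at least 99% of positions at 30X or more.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), summing the pmf in log space."""
    return sum(
        math.exp(i * math.log(lam) - lam - math.lgamma(i + 1)) for i in range(k + 1)
    )


def frac_below(depth: int, mean: float) -> float:
    """Expected fraction of positions with coverage strictly below `depth`."""
    return poisson_cdf(depth - 1, mean)


def min_poisson_mean(depth: int, target: float = 0.99) -> int:
    """Smallest whole-number mean with at least `target` of positions at >= `depth`."""
    mean = depth
    while 1.0 - frac_below(depth, mean) < target:
        mean += 1
    return mean


print(f"mean 30X: {frac_below(20, 30):.1%} below 20X, "
      f"{poisson_cdf(20, 30):.1%} at or below 20X")
print("mean needed for 99% of positions at >= 30X:", min_poisson_mean(30))
```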

Yup, it is naughty data. I often find that a negative binomial is a better fit.

Zev.Kronenberg, 7.2 years ago

Using the negative binomial, what mean coverage is necessary to have 99% of bases covered at 30X?

Jeremy Leipzig, 7.2 years ago

I guess I should have been clearer: this was for exome data. I have also added a plot for whole-genome (WG) data to my original post.

Exome depth histograms often look more like:

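    n <- 100000
    # gamma-Poisson (negative binomial) depths: per-site mean ~ Gamma(shape 2, rate 0.0333), about 60X
    hist(rpois(n, rgamma(n, shape = 2, rate = 0.0333)))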

Zev.Kronenberg, 7.2 years ago
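
To put a number on the negative binomial question, here is a sketch that repeats the Poisson calculation with a negative binomial written as the same gamma-Poisson mixture as the R snippet above (per-site rate drawn from a gamma, depth drawn from a Poisson around it). The shape values are assumptions: r = 2 mirrors the exome example (`rgamma(n, 2, 0.0333)`), whole-genome data are typically less overdispersed, and in practice r should be fitted to your own depth histogram. A very large r recovers the Poisson answer of 45X; the function names are illustrative.

```python
import math


def nb_cdf(k: int, mean: float, shape: float) -> float:
    """P(X <= k) for a negative binomial with the given mean and shape r.

    Gamma-Poisson mixture: variance = mean + mean**2 / shape.
    """
    log_p = math.log(shape / (shape + mean))  # log of the "success" probability p
    log_q = math.log(mean / (shape + mean))   # log of (1 - p)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(i + shape) - math.lgamma(shape) - math.lgamma(i + 1)
                   + shape * log_p + i * log_q)
        total += math.exp(log_pmf)
    return total


def min_nb_mean(depth: int, shape: float, target: float = 0.99) -> int:
    """Smallest whole-number mean with at least `target` of positions at >= `depth`."""
    mean = depth
    while 1.0 - nb_cdf(depth - 1, mean, shape) < target:
        mean += 1
    return mean


for r in (2, 10, 50, 1e6):
    print(f"shape r = {r:g}: mean needed for 99% of positions at >= 30X is "
          f"{min_nb_mean(30, r)}X")
```

The heavier the left tail (smaller r), the more mean coverage is wasted on hotspots just to lift the deserts above 30X.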

With bacteria, we aim for something like 50x. For high-quality SNPs, we aim for 100x so that even the lower-coverage positions still have good coverage.

Lee Katz, 7.2 years ago

It also depends on what you are looking for. For homozygous SNPs, a 30x average will do pretty well. For heterozygous or mixed SNPs, 50x is more like it.

swbarnes2, 7.2 years ago
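
A sketch of why heterozygous and mixed sites need more depth, assuming Poisson depth and a hypothetical calling rule that requires at least `MIN_READS` reads supporting each allele present. At a heterozygous site only about half the reads carry each allele, so each allele effectively sees half the coverage; at a mixed site with a 20% minor allele, the minor allele sees a fifth of it. The threshold of 8 reads and the function names are illustrative, not taken from any particular variant caller.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(i * math.log(lam) - lam - math.lgamma(i + 1)) for i in range(k))
    return 1.0 - cdf


def p_detect(mean_depth: float, alt_fraction: float, min_reads: int) -> float:
    """Chance of seeing >= min_reads reads for every allele present at a site.

    Depth is Poisson(mean_depth) and each read carries the alt allele with
    probability alt_fraction, so alt and ref counts are independent Poissons.
    """
    p = poisson_sf(min_reads, mean_depth * alt_fraction)
    if alt_fraction < 1.0:  # hypothetical rule: the ref allele must also be seen
        p *= poisson_sf(min_reads, mean_depth * (1.0 - alt_fraction))
    return p


MIN_READS = 8  # hypothetical caller threshold, not from the thread
for depth in (30, 50):
    for label, frac in (("homozygous", 1.0), ("heterozygous", 0.5), ("mixed 20%", 0.2)):
        print(f"{depth}x {label:>12}: {p_detect(depth, frac, MIN_READS):.3f}")
```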
