Question

ddRADseq and segregating SNPs

0

8.8 years ago

biogirl ▴ 210

Hi there,

I'm completely new (as of this morning) to ddRADseq, but am trying to get my head around the theory. If I have a 30 Mb genome and use ddRADseq with ~6500 digestion sites, roughly how many segregating SNPs can I expect to find? WGS shows that there are roughly 20,000 SNPs separating each isolate.

Does this depend on the amount of the reference genome covered by the ddRAD?

Any help is greatly appreciated - thanks.

SNPs RADseq ddRAD • 2.8k views

ADD COMMENT • link updated 17 months ago by Ram 43k • written 8.8 years ago by biogirl ▴ 210

Ram · Answer 1 · 2015-07-29

2

8.8 years ago

SNPsaurus ▴ 50

If you have 6500 digestion sites within your planned fragment size selection range and sequence with 100 bp reads, you will be sampling 6500 x 100 bp = 650 kb. If the SNPs are spaced about every 1.5 kb (30 Mb / 20,000 SNPs), you should end up assaying roughly 400 SNPs in total (650 kb / 1.5 kb ≈ 430). If you sequence fragments that are 150-200 bp with 100 bp paired-end reads, you'll sample more of the genome and have more SNPs.
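The back-of-the-envelope estimate above can be written as a few lines of Python. This is only a sketch: the function name `expected_snps` is made up for illustration, and it assumes SNPs are spread evenly across the genome, which real data will not strictly follow.

```python
def expected_snps(n_sites, bases_per_site, genome_size, n_snps):
    """Rough number of SNPs expected inside the sequenced bases.

    Assumes SNPs are uniformly distributed along the genome.
    """
    sampled_bases = n_sites * bases_per_site   # 6500 * 100 = 650 kb
    snp_spacing = genome_size / n_snps         # 30 Mb / 20,000 = 1.5 kb
    return sampled_bases / snp_spacing


# Numbers taken from the question and the answer above.
estimate = expected_snps(6500, 100, 30_000_000, 20_000)
print(round(estimate))
```

Changing `bases_per_site` is the quickest way to see how read length or paired-end sequencing shifts the estimate.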

How did you arrive at 6500 digestion sites? Most ddRAD protocols cut with a 6-cutter and a 4-cutter enzyme, and 6500 is probably about the number of 6-cutter sites alone in your genome. With ddRAD, you have to find the subset of fragments that are flanked by the two enzyme sites and fall within the desired size range. Just checking... maybe you did all that.
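As a sanity check on the 6500 figure, here is a rough sketch of how many sites a cutter of a given recognition length would be expected to have. It assumes equal base frequencies, independent positions, and a non-degenerate recognition sequence, none of which holds exactly for a real genome; `expected_cut_sites` is a hypothetical helper name.

```python
def expected_cut_sites(genome_size, site_length):
    """Expected number of recognition sites for an n-cutter.

    Assumes each base is A, C, G or T with probability 1/4,
    so a given site of length n occurs once every 4**n bases.
    """
    return genome_size / 4 ** site_length


six_cutter = expected_cut_sites(30_000_000, 6)   # one site per 4096 bp
four_cutter = expected_cut_sites(30_000_000, 4)  # one site per 256 bp
print(round(six_cutter), round(four_cutter))
```

For a 30 Mb genome the 6-cutter expectation lands in the same range as 6500, which is why that number looks like a single-enzyme count.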

ADD COMMENT • link updated 17 months ago by Ram 43k • written 8.8 years ago by SNPsaurus ▴ 50

0

Maybe 6500 x 200? ddRAD-seq is generally sequenced paired-end.

ADD REPLY • link 8.8 years ago by GouthamAtla 12k

0

Right, that's why I included the "If you sequence fragments that are 150-200 bp with 100 bp paired-end reads, you'll sample more of the genome and have more SNPs."

I thought it might be helpful to start with the simpler case of a fixed 100 bp per site to show how the calculation is done. Sequencing a size range (150-200 bp fragments with PE) is less exact, since the result depends on the distribution of fragment sizes.
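To illustrate why the paired-end case depends on the fragment size distribution, here is a minimal sketch. It assumes fragment lengths are uniform between 150 and 200 bp (a real size selection will not be), that lengths exclude adapters, and that overlapping read pairs count each base only once; the function names are invented for this example.

```python
def bases_covered(fragment_len, read_len, paired=True):
    """Distinct bases of one fragment covered by its read(s)."""
    n_reads = 2 if paired else 1
    # Overlapping paired reads cannot cover more than the fragment itself.
    return min(fragment_len, n_reads * read_len)


def mean_bases_covered(fragment_lengths, read_len, paired=True):
    """Average coverage per fragment over a list of fragment lengths."""
    lengths = list(fragment_lengths)
    return sum(bases_covered(f, read_len, paired) for f in lengths) / len(lengths)


# Assumption: fragment sizes uniformly spread over 150-200 bp.
fragment_lengths = range(150, 201)
mean_pe = mean_bases_covered(fragment_lengths, 100, paired=True)
mean_se = mean_bases_covered(fragment_lengths, 100, paired=False)
print(mean_pe, mean_se)
```

With 100 bp paired-end reads on 150-200 bp fragments the two reads overlap, so the bases sampled per fragment equal the fragment length rather than 200.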

ADD REPLY • link 8.8 years ago by SNPsaurus ▴ 50

0

Ah, I see! That makes complete sense, thank you for going through that.

In reply to the 6500 digestion sites, I would actually have more because I'd use two cutters (as you mention). 6500 was just for the one cutter. But thank you for your reply, the theory makes sense now.

ADD REPLY • link updated 4.5 years ago by Ram 43k • written 8.8 years ago by biogirl ▴ 210

Ram · Answer 2 · 2015-07-29

1

8.8 years ago

GouthamAtla 12k

Here is some code (adapted from Peterson et al.) to double-digest your genome in silico.

Usage: edit the restriction sites in the script and give the full path to your genome file, then run:

RE_Digestion.py > Rest-sites.txt

If you would like to select fragments with specific size:

RE_Digestion.py | awk '($3 - $2) >= 300 && ($3 - $2) <= 500' > 300_500_sites.txt
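For readers without the script to hand, the idea of an in silico double digest with size selection can be sketched as below. This is not the RE_Digestion.py referred to above, just an illustrative stand-in: `double_digest` is a hypothetical function, EcoRI (GAATTC) and MspI (CCGG) are used only as example enzymes, the toy sequence is made up, and positions are recognition-site starts rather than exact cut positions.

```python
import re


def double_digest(seq, site_a, site_b, min_len=300, max_len=500):
    """Return (start, end) for fragments flanked by one site of each enzyme.

    Coordinates are 0-based start positions of the recognition sites,
    not the exact cut positions within each site.
    """
    seq = seq.upper()
    cuts = [(m.start(), "A") for m in re.finditer(site_a, seq)]
    cuts += [(m.start(), "B") for m in re.finditer(site_b, seq)]
    cuts.sort()
    fragments = []
    # Adjacent cut sites define a fragment; keep it only if the two ends
    # come from different enzymes and the length is in the selected range.
    for (start, enz1), (end, enz2) in zip(cuts, cuts[1:]):
        if enz1 != enz2 and min_len <= end - start <= max_len:
            fragments.append((start, end))
    return fragments


# Toy sequence for illustration only; a real run would read a FASTA file.
toy = "GAATTC" + "A" * 10 + "CCGG" + "T" * 5 + "CCGG" + "A" * 8 + "GAATTC"
frags = double_digest(toy, "GAATTC", "CCGG", min_len=10, max_len=20)
print(frags)
```

The number of fragments returned for your real genome and size window, rather than the raw count of cut sites, is what should go into the SNP estimate.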

ADD COMMENT • link updated 18 months ago by Ram 43k • written 8.8 years ago by GouthamAtla 12k