Question

Do I need to reconstruct haplotypes from SNP data to calculate nucleotide diversity?


6.2 years ago

suzyhocking • 0

Hi,

I'm new to working with SNP data and I'm quite confused about how best to analyse what I have. I work with a haploid species.

My SNP files are in text format:

CHR SNP_POS SAMPLE1 SAMPLE2 etc...

chr1 5 A -

chr1 12 T G

etc... for about 400,000 SNPs and 20 samples. I use this format because I run custom scripts that do extra quality control and call the most likely base at each position based on read depth; I have no option of doing this another way. SNPs are filtered to <10% missingness in the dataset.

I want to work out genome-wide nucleotide diversity based on these SNPs. My questions are:

1) for nucleotide diversity (pi): do I need to reconstruct whole genome haplotypes for each sample by substituting each appropriate base of the reference with the alternative 'SNP base' for each sample?

2) If so - any suggestions on how to do this? I've found tools that work with VCF files but not the text files I have

3) Otherwise, can I calculate pi based only on SNP data? This doesn't seem like a valid method to me.

4) I can't seem to find a programme to find pi/theta that will work with text files - I can happily reformat them within a text format - but I can't convert them to VCF.
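To make question 3 concrete, this is the sort of SNP-only calculation I have in mind. It is only a minimal sketch, not a validated method: it assumes the whitespace-separated layout shown above with `-` as the missing call, haploid samples, and that every site not listed in the file is invariant. The function names and the `callable_sites` argument (the number of genome positions that passed my filters, variant or not) are my own placeholders, not from any existing tool.

```python
from collections import Counter

MISSING = "-"  # assumed missing-call symbol, as in the example rows above


def site_diversity(alleles):
    """Proportion of sample pairs that differ at one site (haploid calls)."""
    called = [a for a in alleles if a != MISSING]
    n = len(called)
    if n < 2:
        return 0.0  # no pair can be compared at this site
    counts = Counter(called)
    # Ordered pairs carrying the same allele, out of all ordered pairs.
    same_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_pairs / (n * (n - 1))


def nucleotide_diversity(lines, callable_sites):
    """Sum per-site diversity over the SNP rows, then divide by the
    number of callable sites (hypothetical input, variant + invariant)."""
    rows = [line.split() for line in lines if line.strip()]
    total = 0.0
    for fields in rows[1:]:  # rows[0] is the CHR SNP_POS SAMPLE... header
        total += site_diversity(fields[2:])
    return total / callable_sites


if __name__ == "__main__":
    example = [
        "CHR SNP_POS SAMPLE1 SAMPLE2 SAMPLE3",
        "chr1 5 A - A",
        "chr1 12 T G T",
    ]
    print(nucleotide_diversity(example, callable_sites=100))
```

Dividing by the number of SNPs instead of `callable_sites` would presumably inflate pi, which is part of why I'm unsure whether SNP data alone is enough.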

Any clarifications or advice would be very welcome! Thanks

haplotypes SNPs nucleotide diversity • 1.5k views
