Question

Is HapMap or 1000 Genomes VCF data from diseased or healthy individuals?

0

8.6 years ago

Being Bioinformatician ▴ 250

Respected Member,

I am trying to calculate dN/dS, Tajima's D, and other statistical tests on a VCF file of some genes obtained from HapMap. Although I got significant results in my initial analyses, I am a bit confused, because the VCF file obtained from the HapMap populations may come from healthy individuals too.

My question is: am I doing the correct analysis? My only objective is to analyze these genes statistically and see whether they show positive or negative selection during evolution.

For an evolutionary study, do we need data from diseased individuals, or will the 1000 Genomes data alone be sufficient?

Thank you in advance.
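For readers who want to see what the Tajima's D calculation mentioned above involves, here is a minimal Python sketch using the standard formula from Tajima (1989). It assumes you have already extracted, for each biallelic site, the number of ALT alleles observed among `n` sampled chromosomes (for example from genotype calls in the VCF). The function name `tajimas_d` and the example counts are hypothetical and for illustration only. Note that the inputs are allele counts alone; no phenotype or disease-status information enters the statistic.

```python
from math import sqrt

def tajimas_d(alt_counts, n):
    """Tajima's D from per-site alternate-allele counts.

    alt_counts: one integer per biallelic site (copies of the ALT allele)
    n: number of sampled chromosomes (2 x number of diploid individuals)
    Returns None when D is undefined (n < 4 or no segregating sites).
    """
    # Keep only sites that are actually polymorphic in this sample.
    seg = [k for k in alt_counts if 0 < k < n]
    s = len(seg)
    if n < 4 or s == 0:
        return None

    # pi: mean number of pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in seg)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # D = (pi - Watterson's theta) / estimated standard deviation.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical ALT-allele counts for 8 sites in 10 chromosomes.
print(tajimas_d([1, 1, 1, 1, 1, 1, 1, 1], 10))  # excess of rare variants: D < 0
print(tajimas_d([5, 5, 5, 5, 5, 5, 5, 5], 10))  # intermediate frequencies: D > 0
```

In practice an established tool is a safer choice than hand-rolled code, since this sketch ignores missing genotypes, multiallelic sites, and windowing.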

RNA-Seq vcf purifying-selection • 1.9k views

ADD COMMENT • link updated 19 months ago by Ram 43k • written 8.6 years ago by Being Bioinformatician ▴ 250

3

please, define "healthy" :-)

ADD REPLY • link 8.6 years ago by Pierre Lindenbaum 161k

0

Sorry, by healthy I meant control ... ;)

ADD REPLY • link 8.6 years ago by Being Bioinformatician ▴ 250

1

You should be aware that there are many "evolutionary studies". In fact, Giovanni M Dall'Olio has done a really nice study on selection in the 1000 Genomes data: A Database Of Signatures Of Selection In The 1000 Genomes Dataset

ADD REPLY • link updated 19 months ago by Ram 43k • written 8.6 years ago by Zev.Kronenberg 12k

0

Thanks, sir, for this valuable information. I have also come across two papers.

According to "1000 GENOMES: A World of Variation":

"I think the real key . . . is being able to translate the gene activity into the operation of biological networks," Hood says. "What can be useful is to look at the genes that are present in the 1000 Genomes Project, the nature of the variation, and map them into key biological networks in cardiovascular disease, neurodegenerative disease, whatever you are interested in and see if there are candidates that stand out. Are there variants that might lead to interesting behaviors of those biological networks?"

According to "A map of human genome variation from population-scale sequencing":

Although data from the 1000 Genomes Project pilots are neither fully comprehensive nor fully free of ascertainment bias (issues include low power for rare variants, noise in allele frequency estimates, some false positives, non-random data collection across samples, platforms and populations, and the use of imputed genotypes), they can be used to address key questions about the extent of differentiation among populations, the presence of highly differentiated variants and the ability to fine-map signals of local adaptation.

ADD REPLY • link updated 19 months ago by Ram 43k • written 8.6 years ago by Being Bioinformatician ▴ 250