How To Retrieve Coding SNPs Typed Only In 1000G Data
12.7 years ago
Sarah Tyrell ▴ 20

Good afternoon,

I have a list of 100 genes. For one particular transcript of each gene, I would like to retrieve the "synonymous coding" and "non-synonymous coding" SNPs that are observed in the 1000G data (n=629).

I would also like to extract the heterozygosity status of those SNPs.

I tried the Ensembl 1000G browser, but it is inconsistent: some SNPs that appear in the VCF file do not show up in the browser view. In addition, I do not want to go through dbSNP; I am only interested in the SNPs observed in 1000G.

Any help would be much appreciated.


Do you have the VCF file describing the 1000G variants that you want to use?


The inconsistencies you see are probably due to differences between 1000 Genomes releases. In particular, a new release was published in October 2011 that includes almost 2000 individuals (http://www.1000genomes.org/announcements/october-2011-integrated-variant-set-release-ichg2011-2011-10-12). Which release are you interested in?

12.5 years ago
Simon P ▴ 10

Sarah,

A fairly direct pipeline should do this:

  1. Get the chromosomal coordinates of your genes.

  2. Extract the SNPs that fall within the regions found in step 1 (make sure the coordinates and the VCF use the same genome build and annotation version).

  3. Annotate the SNPs with a variant annotation tool (e.g. ANNOVAR) and keep those classified as synonymous or non-synonymous coding.
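
For step 2 and the heterozygosity question, here is a minimal Python sketch, not a definitive implementation. It assumes an uncompressed, tab-delimited VCF with per-sample genotype (GT) columns, and gene regions given as 1-based inclusive coordinates. The names `snps_in_regions`, `in_regions`, `DEMO_VCF` and `DEMO_REGIONS`, as well as the variant IDs, positions and sample names, are made up for illustration. Real 1000G files are large and usually compressed and indexed, so in practice you would probably extract the regions with an indexed-VCF tool first and then run something like this on the result. The functional annotation in step 3 is still left to a tool such as ANNOVAR.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical gene regions: chromosome -> list of (start, end), 1-based inclusive.
Regions = Dict[str, List[Tuple[int, int]]]


def in_regions(chrom: str, pos: int, regions: Regions) -> bool:
    """Return True if chrom:pos falls inside any region listed for that chromosome."""
    return any(start <= pos <= end for start, end in regions.get(chrom, []))


def snps_in_regions(
    vcf_lines: Iterable[str], regions: Regions
) -> Iterator[Tuple[str, int, str, str, str, int, int]]:
    """Yield (chrom, pos, id, ref, alt, n_het, n_called) for biallelic SNPs in regions."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no genotype columns, so heterozygosity cannot be computed
        chrom, pos = fields[0], int(fields[1])
        vid, ref, alt = fields[2], fields[3], fields[4]
        # Keep only single-base, biallelic substitutions (drops indels and multi-allelic sites).
        if len(ref) != 1 or len(alt) != 1 or alt == ".":
            continue
        if not in_regions(chrom, pos, regions):
            continue
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        n_het = n_called = 0
        for sample in fields[9:]:
            # Treat phased (|) and unphased (/) genotypes the same way.
            alleles = sample.split(":")[gt_idx].replace("|", "/").split("/")
            if len(alleles) != 2 or "." in alleles:
                continue  # missing or non-diploid genotype
            n_called += 1
            if alleles[0] != alleles[1]:
                n_het += 1
        yield (chrom, pos, vid, ref, alt, n_het, n_called)


# Made-up demo data: three samples, four records.
DEMO_VCF = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "1\t150\trsA\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|1\t0|0",
    "1\t900\trsB\tC\tT\t.\tPASS\t.\tGT\t0|1\t0|1\t0|0",
    "1\t180\trsC\tAT\tA\t.\tPASS\t.\tGT\t0|1\t0|0\t0|0",
    "2\t50\trsD\tG\tC\t.\tPASS\t.\tGT:DP\t1|0:12\t.|.:0\t0|1:9",
]
DEMO_REGIONS: Regions = {"1": [(100, 200)], "2": [(1, 100)]}

RESULTS = list(snps_in_regions(DEMO_VCF, DEMO_REGIONS))

for chrom, pos, vid, ref, alt, n_het, n_called in RESULTS:
    het_rate = n_het / n_called if n_called else float("nan")
    print(chrom, pos, vid, ref, alt, n_het, n_called, het_rate)
```

In the demo, the record outside the region and the indel are dropped, and the missing genotype is excluded from the called count. Dividing `n_het` by `n_called` gives the observed fraction of heterozygous samples at each SNP.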