Query for a list of SNPs: alleles, CEU frequency, strand
13.2 years ago
jvijai ★ 1.2k

Hi,

Say I have a list of 1 million SNPs from one of the common arrays, and I want to retrieve HapMap CEU data for them: alleles, minor allele frequency, strand, etc. How can I do this?
The UCSC Table Browser does not give me allele frequencies for dbSNP 130.
Does BioMart have a limit on the number of SNPs that can be queried at one time?

Thank you

13.2 years ago

You'll find all of that data in the subdirectories of the HapMap FTP site: ftp://ftp.ncbi.nlm.nih.gov/hapmap/frequencies/

13.2 years ago

Although bulk downloading the data from the HapMap FTP site and processing it yourself, as Pierre suggests, would be the most appropriate approach (I also suggest going to the latest FTP release folder, currently 2010-08_phaseII+III, downloading the data for all chromosomes, and parsing those files for the SNPs you are interested in), I understand that you may think "why should I deal with that whole bunch of files if they have already done so?". If that is the case and you want to use the BioMart retrieval tool on top of HapMap, you can also obtain the data you are interested in from there.
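A minimal Python sketch of the bulk-parsing route, assuming the per-chromosome CEU frequency files have been downloaded as gzipped, whitespace-delimited text with a header row. The column names in `REQUIRED` are an assumption about that header, so check them against the files you actually get; the function names are hypothetical. Each file is streamed once and only rs IDs in your list are kept, so a million-SNP list costs just a set lookup per line.

```python
import gzip
from typing import Iterable, Iterator, Optional, Set

# ASSUMPTION: column names as they appear in the header row of the HapMap
# allele frequency files; verify against the files you actually download.
REQUIRED = ("rs#", "strand", "refallele", "refallele_freq",
            "otherallele", "otherallele_freq")


def _to_float(value: str) -> Optional[float]:
    try:
        return float(value)
    except ValueError:
        return None  # non-numeric placeholder, e.g. for missing data


def parse_freq_lines(lines: Iterable[str], wanted: Set[str]) -> Iterator[dict]:
    """Yield rs ID, strand, alleles and MAF for every rs ID in `wanted`."""
    it = iter(lines)
    header_line = next(it, None)
    if header_line is None:
        return  # empty input
    header = header_line.split()
    missing = [c for c in REQUIRED if c not in header]
    if missing:
        raise ValueError(f"unexpected header, missing columns: {missing}")
    col = {name: header.index(name) for name in REQUIRED}
    for line in it:
        fields = line.split()
        if len(fields) < len(header):
            continue  # skip blank or truncated lines
        rs = fields[col["rs#"]]
        if rs not in wanted:
            continue
        ref_f = _to_float(fields[col["refallele_freq"]])
        oth_f = _to_float(fields[col["otherallele_freq"]])
        maf = None if ref_f is None or oth_f is None else min(ref_f, oth_f)
        yield {
            "rs": rs,
            "strand": fields[col["strand"]],
            "alleles": fields[col["refallele"]] + "/" + fields[col["otherallele"]],
            "maf": maf,  # minor allele frequency = smaller of the two frequencies
        }


def scan_files(paths: Iterable[str], wanted: Set[str]) -> Iterator[dict]:
    """Stream each gzipped per-chromosome file in turn."""
    for path in paths:
        with gzip.open(path, "rt") as handle:
            yield from parse_freq_lines(handle, wanted)
```

With your rs numbers loaded into a set `wanted`, something like `scan_files(sorted(glob.glob("allele_freqs_chr*_CEU_*.txt.gz")), wanted)` would walk all chromosomes; that file-name pattern is a guess, so adjust it to whatever the release folder actually contains.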

I have tested this retrieval tool in the past and didn't find any limitation on query size. It should be capable of letting you download all the data you are interested in: upload your rs numbers in a single file as the only filter, and select the attributes you need (frequency, MAF, strand, ...). Note that this tool serves HapMap release #27, not the current release #28, so go ahead if you can live with that. If not, you will need to fall back on the bulk-parsing approach.
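If you go the BioMart route, the rs numbers first need to end up in one clean file. A small hypothetical helper for that, assuming the upload filter takes plain text with one rs ID per line (check the form you are actually using): it trims whitespace, lowercases, drops anything not shaped like `rs<digits>` and removes duplicates while keeping the original order.

```python
import re
from typing import Iterable, List

RS_ID = re.compile(r"rs\d+")


def clean_rs_ids(raw: Iterable[str]) -> List[str]:
    """Return unique, well-formed rs IDs in first-seen order."""
    seen = set()
    cleaned = []
    for item in raw:
        rs = item.strip().lower()
        if RS_ID.fullmatch(rs) and rs not in seen:
            seen.add(rs)
            cleaned.append(rs)
    return cleaned


def write_upload_file(raw: Iterable[str], path: str) -> int:
    """Write one rs ID per line (assumed upload format); return the count."""
    ids = clean_rs_ids(raw)
    with open(path, "w") as handle:
        for rs in ids:
            handle.write(rs + "\n")
    return len(ids)
```

Array manifests often mix rs IDs with vendor-specific probe names; those are silently dropped here, so compare the returned count with the length of your input list.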


Thank you, Jorge; your reply was most useful.

13.2 years ago
lh3 33k

With the release of the 1000 Genomes data, which is far more complete than HapMap, the best approach is to always use the latest dbSNP build (currently 132). As I remember, the latest build has other improvements as well. I recommend the VCF format:

ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/v4.0/ByChromosomeNoGeno/

Other formats are also available if you prefer.
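Since VCF is plain tab-separated text, pulling your SNPs out of the dbSNP files needs no special library. A minimal sketch, assuming the standard VCF layout (`#`-prefixed header lines, then CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO); the function names are hypothetical. Which INFO keys carry frequency information in a given build is deliberately not assumed here: check the `##INFO` lines at the top of the file you download and read the value you need from the `info` dict.

```python
import gzip
from typing import Dict, Iterable, Iterator, Set


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column (KEY=VALUE;FLAG;...) into a dict; flags map to ''."""
    if info in (".", ""):
        return {}
    result = {}
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        result[key] = value
    return result


def vcf_records(lines: Iterable[str], wanted: Set[str]) -> Iterator[dict]:
    """Yield basic fields for data lines whose ID column matches an rs ID in `wanted`."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip ## meta lines and the #CHROM column header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        chrom, pos, ids, ref, alt = fields[:5]
        id_list = ids.split(";")  # the ID column may hold several IDs
        if not wanted.intersection(id_list):
            continue
        yield {
            "chrom": chrom,
            "pos": int(pos),
            "ids": id_list,
            "ref": ref,
            "alt": alt.split(","),
            "info": parse_info(fields[7]),
        }


def scan_vcf(path: str, wanted: Set[str]) -> Iterator[dict]:
    """Stream one gzip-compressed VCF file."""
    with gzip.open(path, "rt") as handle:
        yield from vcf_records(handle, wanted)
```

Looping `scan_vcf` over the per-chromosome files gives the same one-pass, set-lookup behaviour as parsing the HapMap files, just against a more complete catalogue.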


Thank you, Heng Li. Perhaps slightly off-topic, but if one wants to filter known variants out of a novel-variant disease discovery project (say, exome sequencing), would one use the combined VCF file from this page?


For this purpose, dbSNP is definitely more appropriate than HapMap, which does not cover all common SNPs. Nonetheless, you may want to set a frequency threshold and leave dbSNP SNPs with very low frequency out of the filter.
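A sketch of the threshold idea, assuming you have already turned the dbSNP records into `((chrom, pos, ref, alt), frequency)` pairs and your exome calls into the same tuples. Matching on the full tuple rather than position only, the handling of sites with no frequency at all (`keep_unknown`), and any particular `min_freq` value are assumptions for illustration, not recommendations from this thread; the function names are hypothetical.

```python
from typing import Iterable, List, Optional, Set, Tuple

# (chrom, pos, ref, alt); matching on all four fields is an assumption.
Site = Tuple[str, int, str, str]


def known_sites(dbsnp: Iterable[Tuple[Site, Optional[float]]],
                min_freq: float,
                keep_unknown: bool = False) -> Set[Site]:
    """Return the dbSNP sites to use as a 'known variant' filter.

    Sites below `min_freq` are left out, so rare dbSNP entries do not remove
    your candidates. Sites with no frequency (None) are left out unless
    `keep_unknown` is True; this default is an assumption.
    """
    known = set()
    for site, freq in dbsnp:
        if freq is None:
            if keep_unknown:
                known.add(site)
        elif freq >= min_freq:
            known.add(site)
    return known


def novel_candidates(calls: Iterable[Site], known: Set[Site]) -> List[Site]:
    """Keep only the calls that are not in the known-site filter."""
    return [call for call in calls if call not in known]
```

Building the filter set once and then testing each call against it keeps the exome-side pass linear in the number of calls, however large the dbSNP set is.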
