Search dbSNP for hg19-based coordinates
8.0 years ago
curious ▴ 50

Hi,

I'm new to bioinformatics and have a couple of questions about searching dbSNP.

  1. I'm searching dbSNP for SNP coordinates by submitting a list of rs# in batch mode through the browser, with BED as the requested output format. dbSNP returns the coordinates on the hg38 assembly, but I'd like them on hg19. Is there a way to do this? The dbSNP FAQ doesn't say whether it's possible.

  2. I would also like to know whether it's possible to retrieve genotype information for a given SNP (rs#), again in batch mode.

Any help is appreciated!

dbsnp SNP • 14k views

A general comment: why are you using rs# to retrieve SNPs? SNP IDs are neither (1) fixed nor (2) stable. Use genomic positions instead, for integrity and reproducibility.
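To illustrate the suggestion to query by position rather than by rs#: a minimal sketch that pulls records covering a given 1-based position out of a BED file such as hg19.snp151.bed. It assumes tab-separated BED with NCBI-style chromosome names (`1`, not `chr1`); the function name `lookup_by_position` is hypothetical. Because BED intervals are 0-based and half-open, a 1-based position p is covered when start < p and p <= end.

```shell
# Hypothetical helper: print BED records whose interval covers a 1-based position.
# BED intervals are 0-based, half-open [start, end), so 1-based position p
# corresponds to 0-based offset p-1 and is covered when start <= p-1 < end,
# i.e. start < p && p <= end.
lookup_by_position() {
  local chrom="$1" pos="$2" bed="$3"
  awk -F '\t' -v c="$chrom" -v p="$pos" '
    BEGIN { p += 0 }                     # force numeric comparison on the position
    $1 == c && $2 + 0 < p && p <= $3 + 0
  ' "$bed"
}

# Usage (chromosome names without a "chr" prefix, as in the NCBI VCF):
# lookup_by_position 1 13550 hg19.snp151.bed
```

Matching on the interval rather than only on the end coordinate means multi-base records (e.g. deletions) that span the position are returned too.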

8.0 years ago

One way to do this is via the command line. You could download SNP annotations via wget. For example:

$ wget -qO- ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/VCF/common_all_20180423.vcf.gz | gunzip -c | convert2bed --input=vcf --output=bed --sort-tmpdir=${PWD} - > hg19.snp151.bed

Filter via grep for the SNP of interest. For example, to search on a single SNP ID:

$ grep -wF rs554008981 hg19.snp151.bed
1       13549   13550   rs554008981     .       G       A       .       RS=554008981;RSPOS=13550;dbSNPBuildID=142;SSR=0;SAO=0;VP=0x050000000005000026000100;GENEINFO=DDX11L1:100287102;WGT=1;VC=SNV;ASP;KGPhase3;CAF=0.9966,0.003395,.;COMMON=1;TOPMED=0.99221139143730886,0.00778064475025484,0.00000796381243628

To search on a file of IDs, e.g. a list of SNP IDs in rsIDs.txt:

$ grep -fF rsIDs.txt hg19.snp151.bed > matches.bed
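A note on option order in that command: grep's short options can be bundled, but `-f` takes an argument, so in `-fF` the `F` is read as the name of the pattern file (which produces a `grep: F: No such file or directory` error). Writing `-Ff` (or `-F -f`) avoids this. Adding `-w` also restricts matches to whole words, so rs554008981 does not pull in a longer ID such as rs5540089810. A minimal sketch, with a hypothetical wrapper name:

```shell
# Hypothetical wrapper: exact, fixed-string, whole-word matching of rs IDs.
#   -w  require a whole-word match (rs554008981 won't match rs5540089810)
#   -F  treat each line of the pattern file as a literal string, not a regex
#   -f  read patterns from the file that follows; keep it last in the bundle
match_rsids() {
  local ids="$1" bed="$2"
  grep -wFf "$ids" "$bed"
}

# Usage:
# match_rsids rsIDs.txt hg19.snp151.bed > matches.bed
```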

Thanks for the quick response! I'm trying out the MySQL interface, but there seem to be connectivity issues, which I'm guessing are due to query timeouts. I'll change the timeout settings and try again.


I edited my answer to use wget instead of mysql, which should probably get around timeouts. Feel free to give that a try, if you like.


Thanks Alex! Using wget is a better approach. The most recent build available in the UCSC hg19 database is snp144, while dbSNP batch query mode returns build 146. For now, I'm going with snp144.


I think my previous answer was inaccurate in that it did not adjust the start and stop positions to 0-based, half-open indexing. I updated my answer to use the current VCF file from NCBI, using convert2bed to convert from VCF to BED with the correct coordinate system adjustment. It is probably better to go directly to NCBI for SNPs, instead of using UCSC database files.
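For anyone wondering what that adjustment amounts to: VCF POS is 1-based, while BED uses 0-based, half-open intervals, so a record with POS p and reference allele REF maps to start = p - 1 and end = p - 1 + length(REF). For the SNV with RSPOS=13550 in the answer, that gives 13549 and 13550. A minimal awk sketch of only the coordinate arithmetic, assuming uncompressed VCF input; it is not a replacement for convert2bed, which also carries over the other VCF columns and sorts the output. The function name is hypothetical, and using the REF length for the end coordinate is this sketch's convention for multi-base alleles.

```shell
# Hypothetical helper: convert VCF data lines to a minimal 4-column BED.
# VCF POS (column 2) is 1-based; BED start is 0-based and end is exclusive.
#   start = POS - 1
#   end   = POS - 1 + length(REF)   (REF is column 4)
vcf_to_min_bed() {
  awk -F '\t' 'BEGIN { OFS = "\t" }
    /^#/ { next }                     # skip meta-information and header lines
    {
      start = $2 - 1
      print $1, start, start + length($4), $3
    }' "$@"
}

# Usage (pipe through gunzip -c first for .vcf.gz input):
# vcf_to_min_bed calls.vcf > calls.bed
```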


Hi Alex, What's the difference between All_20180423.vcf.gz and 00-All.vcf.gz in the link you mentioned?


If you compare the MD5 checksums of the two files, they are likely the same. The dated file is kept so that older versions of the "All" set (and, correspondingly, the other sets) of variants remain accessible as newer files are generated. The undated file is the currently available dataset. At the moment they appear to be identical.
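A minimal sketch of that comparison, assuming both files have already been downloaded to the working directory and that `md5sum` (GNU coreutils) is available; on macOS, `md5 -q` plays the same role. The function name `same_md5` is hypothetical.

```shell
# Hypothetical helper: succeed (exit 0) if two files have identical MD5 checksums.
same_md5() {
  local a b
  # Reading from stdin makes md5sum print "<hash>  -", so cut keeps just the hash.
  a=$(md5sum < "$1" | cut -d ' ' -f 1)
  b=$(md5sum < "$2" | cut -d ' ' -f 1)
  [ "$a" = "$b" ]
}

# Usage:
# same_md5 All_20180423.vcf.gz 00-All.vcf.gz && echo "identical" || echo "different"
```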


Is it certain that the third command will match only the exact rs ID (e.g., rs554008981) and not a longer one that contains it (e.g., rs55400898123)? Is the option -w useful in that context? (e.g., $ grep -fFw rsIDs.txt hg19.snp151.bed > matches.bed)
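`-w` does help, but grep still accepts the ID anywhere on the line. (Also, in `-fFw` the `Fw` would be taken as the pattern-file name; keep `f` last, as in `-wFf`.) A stricter alternative is to compare only the name column, column 4 of the BED file, against the ID list. A minimal awk sketch, assuming one rs ID per line in the list and tab-separated BED input; the function name is hypothetical.

```shell
# Hypothetical helper: keep BED lines whose 4th column exactly equals an ID in the list.
# First pass (NR == FNR) loads the IDs into an array; second pass filters the BED.
# Trailing carriage returns are stripped in case the list was saved on Windows.
filter_by_rsid() {
  local ids="$1" bed="$2"
  awk -F '\t' '
    NR == FNR { sub(/\r$/, ""); if ($0 != "") want[$0] = 1; next }
    ($4 in want)
  ' "$ids" "$bed"
}

# Usage:
# filter_by_rsid rsIDs.txt hg19.snp151.bed > matches.bed
```

Because the lookup is a hash on the exact column value, an ID never matches a longer ID or text elsewhere on the line (for example in the INFO column).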


EDIT: Looks like Windows line endings were messing with my text file. Running it through dos2unix sorted everything out.

Hi Alex,

I'm trying to follow the suggestions you made here and am having some issues that I hoped you might be able to help with.

I'm using a different file (dbsnp_138.b37.vcf, converted to BED) than the one in the example.

If I search for a single SNP manually, it works fine:

$ grep -F rs2230577 dbsnp_138.b37.bed

But if I give the same command a list of rsIDs, it writes only the last rsID to the file.

$ grep -Ff rsIDs.txt dbsnp_138.b37.bed > output.txt

If I reorder the options to -fF, as in your answer, I get an error:

grep: F: No such file or directory

Any idea on how to fix this would be greatly appreciated.

Thanks
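On the Windows line-ending problem: a list saved with CRLF endings leaves a carriage return at the end of each ID, so those patterns no longer match, which fits only the last ID (typically the one without a trailing line break) being found. If dos2unix isn't installed, `tr -d '\r'` does the same cleanup. A minimal sketch that cleans a copy of the list and then runs the grep with `f` last in the option bundle; the function name is hypothetical.

```shell
# Hypothetical helper: strip Windows carriage returns from an ID list, then grep.
# tr -d '\r' deletes every CR character, turning CRLF line endings into LF.
grep_clean_ids() {
  local ids="$1" bed="$2" clean
  clean=$(mktemp)
  tr -d '\r' < "$ids" > "$clean"
  grep -wFf "$clean" "$bed"
  local status=$?        # keep grep's exit status (1 means no matches)
  rm -f "$clean"
  return "$status"
}

# Usage:
# grep_clean_ids rsIDs.txt dbsnp_138.b37.bed > output.txt
```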
