How to find SNP positions (for non-bioinformaticians)
8.9 years ago
CrazyB ▴ 280

First off, I apologize for re-posting an "old" question. I know similar questions have been asked before, but I am hoping to find new solutions.

I am trying to find the genomic positions for a list of SNPs, given their rs IDs, and I need a "new" solution.

What I have tried so far:

(a) Sending a batch query to dbSNP at NCBI. This worked well in the past, but today, ~10 hr after sending the batch query, no result has come back yet (is the server down?).

(b) Downloading all dbSNP positions from BioMart, hoping to do some "intersection" to find the positions for specific rs IDs. The download was terminated prematurely (the first attempt took over an hour).

(c) Downloading cruzdb, which was suggested as a solution in one of the earlier posts. I read the documentation and still could not run it, my apologies! (Does running cruzdb require an understanding of Python, which I currently don't possess?) I have to say, in contrast to the cruzdb documentation, I had better luck with vcftools and plink thanks to their "more friendly" documentation.

Are there any other solutions that allow non-bioinformaticians to do this task (i.e. find positions for a list of SNPs)?

I certainly hope to get some useful responses, but it's understandable if the admin chooses to close this thread (possibly as a duplicate question). Thank you.

snp dbsnp BioMart • 6.0k views

How many rs IDs do you have? You can use the UCSC Table Browser: give it a list of rs IDs (< 1000) and select whatever information you need in the output file.


Thanks. I will try the UCSC Table Browser and see how it runs. I have only ~1000 rs IDs, so I wasn't sure why the dbSNP batch query failed me yesterday.

8.9 years ago

Hi, as suggested by Ashutosh Pandey, you can use the UCSC Table Browser:

1. Select the genome and assembly of interest, then select Variation from the group menu and All SNPs(142) from the track menu.
2. Paste or upload your rs IDs using the buttons next to identifiers (names/accessions).
3. Under output format, select "selected fields from primary and related tables", then click get output.
4. On the next page, select the fields of interest (e.g. input rs ID, chromosome, genomic position) to include in the final output table, then click get output again to retrieve it.

8.9 years ago
Emily 23k

Don't download all the variants from BioMart, whatever you do! There are >114M variants in human and BioMart cannot handle a download of that size, which is why it's failing. Instead, use the Variation database and filter by Variation name, then input your list of IDs.

8.0 years ago

You can use the mysql client to download a BED file containing the SNP position and rs* ID:

$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -D hg19 -e 'SELECT chrom, chromStart, chromEnd, name FROM snp144Common' > snp144Common.bed

This download took about 4-5 minutes to complete.

The complete schema for the snp144Common table is available from UCSC. In the example above, we retrieve data for the chrom, chromStart, chromEnd and name fields. You can add other fields to the example command if they are useful to you, such as the observed and func annotations.

Once you have this file of SNP positions, you can use awk to find the position of a single SNP, given the ID.
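If only a modest list of IDs is needed, the same query can in principle be narrowed with a standard SQL WHERE ... IN clause instead of downloading the whole table. The following is a minimal sketch, not a tested recipe against the UCSC server: the file names my-ids.txt and my-snps.bed are hypothetical, and the mysql line is left commented out because it needs network access and the mysql client. The sketch only builds and prints the query.

```shell
# Hypothetical input file: one rs ID per line.
printf 'rs10409603\nrs123\n' > my-ids.txt

# Keep only well-formed rs IDs, wrap each in single quotes, join with commas.
id_list=$(grep -E '^rs[0-9]+$' my-ids.txt | sed "s/.*/'&'/" | paste -sd, -)

# Assemble a query against the same table and fields used above.
query="SELECT chrom, chromStart, chromEnd, name FROM snp144Common WHERE name IN (${id_list})"
echo "$query"

# To run it (assumes the same server and options as the command above), uncomment:
# mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -D hg19 -e "$query" > my-snps.bed
```

Filtering the IDs through grep first means that stray lines in the input file never reach the SQL text.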

For example:

$ awk -v id='rs10409603' '{ if ($4 == id) { print $0; exit; } }' snp144Common.bed
chr19   8313572 8313573 rs10409603
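Note that BED-style coordinates from UCSC use a 0-based start and a 1-based end, so for a single-base SNP the conventional 1-based position is the chromEnd column. A small sketch of that conversion follows; example.bed and positions.tsv are hypothetical file names, and the single input row is copied from the output above.

```shell
# One BED record, copied from the awk output above (tab-separated).
printf 'chr19\t8313572\t8313573\trs10409603\n' > example.bed

# For a single-base SNP, the 1-based position equals chromEnd (column 3).
# Rows where chromEnd - chromStart is not 1 (e.g. indels) are skipped here.
awk 'BEGIN { FS = OFS = "\t" } ($3 - $2) == 1 { print $4, $1, $3 }' example.bed > positions.tsv
cat positions.tsv
```

The output columns are rs ID, chromosome and 1-based position, which is the layout most association tools expect for a SNP map.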

If you have a list of IDs, you can use grep -w -F -f <filename> and pass in a file containing a list of IDs to do fixed-string (quick) searches against. The -w option restricts matches to whole words, so that an ID such as rs123 does not also match rs1234.

For example:

$ grep -w -F -f list-of-SNP-IDs.txt snp144Common.bed > answer.bed
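Because a fixed-string search matches anywhere in a line, a short ID such as rs123 can also pull in rs1234 unless whole-word matching is requested. As an alternative, here is a minimal awk sketch that compares the ID column exactly. The file names ids-demo.txt, demo.bed and demo-answer.bed and their contents are made up for illustration.

```shell
# Hypothetical miniature inputs: note that rs123 is a prefix of rs1234.
printf 'rs123\n' > ids-demo.txt
printf 'chr1\t99\t100\trs123\nchr2\t199\t200\trs1234\n' > demo.bed

# First pass (NR == FNR) loads the IDs into a lookup table;
# second pass prints only BED rows whose 4th column is exactly one of them.
awk 'NR == FNR { wanted[$1]; next } $4 in wanted' ids-demo.txt demo.bed > demo-answer.bed
cat demo-answer.bed
```

Only the rs123 row is kept. The same command should work on the full snp144Common.bed file by swapping in the real file names.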

Learning a few basics of working on the command line will pay massive dividends in the long term.


Could you tell me what the difference is between snp144.txt.gz and snp144Common? The former contains more than 130 million SNPs, while snp144Common contains only 14760200 SNPs. Thank you very much!


I think this page answers my question: http://genome.ucsc.edu/goldenPath/newsarch.html Thank you!

8.2 years ago

For non-bioinformaticians, I would use Knime.org

8.0 years ago

Hi,

I wrote a piece of software for myself a year ago for visualizing SNVs in a specific gene, given the gene name. Whether you use a VCF file or a variant file from BioMart, you can generate these graphs for a given gene. If your SNPs/SNVs are gene-based, you can easily generate graphs like these:

http://i-pv.org/EGFR.html

http://i-pv.org/JAK2.html

Maybe it helps.
