How to collect SNP dataset from databases for SNP analysis?
5.3 years ago
arr234 ▴ 40

In this study (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5449402/), the authors report that the human APOE gene has 183 validated SNPs: 31 missense, 21 synonymous, 2 nonsense, 98 intronic, 7 5′ UTR, 6 3′ UTR, 7 downstream, 8 upstream, 1 splice donor, and 2 splice acceptor variants. These data were collected from dbSNP. How can I retrieve this set of validated SNPs from dbSNP?

SNP databases • 1.6k views

ExSNP database (http://www.exsnp.org/DZeQTL)

5.3 years ago

The link provided by maryamtavasoli71 relates to eQTL studies, which is not what you want.

If you are not comfortable using the command line or working with the dbSNP data locally, you can use the Ensembl Genome Browser to look up all variants in a particular gene. Here is a search configured for APOE:

Click on the Excel® sheet icon (at right) to download the data as CSV.

The data contain scores from in silico predictors such as SIFT, PolyPhen, MutationAssessor, and CADD. I did my own quick filtering and identified roughly 200 'damaging' variants in the gene.
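For anyone who wants to repeat this kind of filtering on the downloaded CSV, here is a minimal Python sketch. It is not necessarily the filtering used above. The column names `SIFT` and `PolyPhen` and the cutoffs (SIFT at or below 0.05, PolyPhen at or above 0.908) are assumptions, so check them against the header row of your own export and adjust as needed.

```python
import csv


def to_float(value):
    """Return value as a float, or None when the cell is empty or non-numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def filter_damaging(rows, sift_col="SIFT", polyphen_col="PolyPhen",
                    sift_max=0.05, polyphen_min=0.908):
    """Keep rows that at least one predictor scores as likely damaging.

    Column names and cutoffs are assumptions; adjust them to the exported file.
    """
    kept = []
    for row in rows:
        sift = to_float(row.get(sift_col))
        polyphen = to_float(row.get(polyphen_col))
        low_sift = sift is not None and sift <= sift_max
        high_polyphen = polyphen is not None and polyphen >= polyphen_min
        if low_sift or high_polyphen:
            kept.append(row)
    return kept


def load_rows(path):
    """Read the exported CSV into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

With a hypothetical file name, usage would look like `filter_damaging(load_rows("apoe_variants.csv"))`. Rows with missing or non-numeric scores are skipped rather than counted as damaging.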

Kevin

5.3 years ago

Using the UCSC public MySQL server:

$ mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -P 3306 -D hg38 -e 'select func,valid,count(*) from snp142 where chrom="chr19" and chromStart>=44905749 and chromEnd<=44909395 group by func,valid'
+----------------------------+--------------------------------------------------------+----------+
| func                       | valid                                                  | count(*) |
+----------------------------+--------------------------------------------------------+----------+
| coding-synon               | unknown                                                |       13 |
| coding-synon               | by-frequency                                           |        1 |
| coding-synon               | by-1000genomes                                         |        3 |
| coding-synon               | by-cluster,by-1000genomes                              |        2 |
| coding-synon               | by-frequency,by-1000genomes                            |        2 |
| intron                     | unknown                                                |       25 |
| intron                     | by-cluster                                             |        4 |
| intron                     | by-1000genomes                                         |       33 |
| intron                     | by-cluster,by-1000genomes                              |        3 |
| intron                     | by-frequency,by-1000genomes                            |       24 |
| intron                     | by-cluster,by-frequency,by-1000genomes                 |       11 |
| near-gene-5                | by-cluster,by-frequency,by-1000genomes                 |        1 |
| nonsense                   | unknown                                                |        2 |
| missense                   | unknown                                                |       28 |
| missense                   | by-cluster                                             |        8 |
| missense                   | by-1000genomes                                         |       11 |
| missense                   | by-frequency,by-1000genomes                            |        7 |
| missense                   | by-cluster,by-frequency,by-1000genomes                 |        1 |
| missense                   | by-cluster,by-frequency,by-2hit-2allele,by-1000genomes |        2 |
| missense                   | by-frequency,by-hapmap,by-1000genomes                  |        1 |
| intron,missense            | by-1000genomes                                         |        1 |
| intron,missense            | by-frequency,by-1000genomes                            |        1 |
| intron,missense            | by-cluster,by-frequency,by-hapmap,by-1000genomes       |        1 |
| frameshift                 | unknown                                                |        1 |
| cds-indel                  | unknown                                                |        2 |
| untranslated-3             | unknown                                                |        2 |
| untranslated-3             | by-1000genomes                                         |        1 |
| untranslated-3             | by-frequency,by-1000genomes                            |        3 |
| untranslated-5             | by-1000genomes                                         |        3 |
| intron,untranslated-5      | by-1000genomes                                         |        1 |
| intron,untranslated-5      | by-frequency,by-1000genomes                            |        1 |
| near-gene-5,untranslated-5 | by-1000genomes                                         |        1 |
| splice-3                   | unknown                                                |        1 |
| splice-3                   | by-cluster                                             |        1 |
| splice-5                   | by-cluster                                             |        1 |
+----------------------------+--------------------------------------------------------+----------+
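To compare these numbers with the per-class totals quoted in the question, the rows need to be summed per `func` value. The following is a small Python sketch that does this from the boxed text that `mysql` prints. The function name `tally_by_func` is made up for illustration, and treating `unknown` in the `valid` column as "not validated" is an assumption about what that value means.

```python
from collections import Counter


def tally_by_func(table_text, validated_only=False):
    """Sum the count(*) column per func value from mysql's boxed table output."""
    totals = Counter()
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders and any non-table lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 3 or not cells[2].isdigit():
            continue  # skip the header row
        func, valid, count = cells
        # Assumption: "unknown" means no validation evidence is recorded.
        if validated_only and valid == "unknown":
            continue
        totals[func] += int(count)
    return dict(totals)
```

Paste the table into a string and call `tally_by_func(text, validated_only=True)` to count only the rows with some validation evidence. Combined classes such as `intron,missense` are kept as their own keys rather than split, so decide how to assign them before comparing with the paper's categories.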