How to find out whether an rsID is exonic?
4.9 years ago

Hello, I have imputed and annotated a VCF file, and now I want to filter for SNPs that are exonic. How can I find out whether a gene name (or rsID or position) is exonic?

gene • 1.2k views

Thank you for all your proposals, I will analyse them.

4.9 years ago
Emily 23k

How did you "annotate" it? If you ran a programme like the VEP then the information you need is in the variant consequences.


I mean that I added rsIDs to all variants using GATK.


OK, then run it through the VEP

4.9 years ago
mks002 ▴ 220

Check the link: EVA (European Variation Archive)

4.9 years ago
ATpoint 82k

Get the coordinates of the rsID or the gene and intersect them with an annotation file (GTF). Please use the search function; this has been asked many times before.

A: how to get intronic and intergenic sequences based on gff file?
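As a rough illustration of the intersection idea, here is a minimal base R sketch. The GTF records, the SNP IDs and all coordinates are made up for the example, and `is_exonic` is a hypothetical helper, not part of any package. For real data, a dedicated tool such as bedtools or the Bioconductor GenomicRanges package would normally be a better fit.

```r
# Toy records in GTF column order (coordinates are made up for illustration).
gtf_lines <- c(
  "chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id \"geneA\";",
  "chr1\ttoy\texon\t400\t500\t.\t+\t.\tgene_id \"geneA\";",
  "chr2\ttoy\tgene\t50\t900\t.\t-\t.\tgene_id \"geneB\";",
  "chr2\ttoy\texon\t50\t150\t.\t-\t.\tgene_id \"geneB\";"
)
gtf <- read.delim(text = gtf_lines, header = FALSE, quote = "",
                  stringsAsFactors = FALSE,
                  col.names = c("seqname", "source", "feature", "start", "end",
                                "score", "strand", "frame", "attribute"))

# Keep only exon features; other feature types (gene, transcript, ...) are ignored.
exons <- gtf[gtf$feature == "exon", ]

# Hypothetical SNPs with 1-based positions (GTF intervals are 1-based and closed).
snps <- data.frame(id  = c("snp1", "snp2", "snp3"),
                   chr = c("chr1", "chr1", "chr2"),
                   pos = c(150, 300, 50),
                   stringsAsFactors = FALSE)

# A SNP counts as exonic if it falls inside at least one exon on the same chromosome.
is_exonic <- function(chr, pos, exons) {
  any(exons$seqname == chr & exons$start <= pos & exons$end >= pos)
}
snps$exonic <- mapply(is_exonic, snps$chr, snps$pos,
                      MoreArgs = list(exons = exons), USE.NAMES = FALSE)
print(snps)
```

In this toy data, snp1 and snp3 fall inside an exon and snp2 does not. The chromosome naming ("chr1" versus "1") must match between the SNP table and the annotation file, otherwise nothing will overlap.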

4.9 years ago

Another answer via biomaRt:

# Load biomaRt (a Bioconductor package)
library(biomaRt)

# rsIDs to look up
snps <- c("rs6025", "rs424964", "rs199473684")

# Connect to the Ensembl variation mart (human SNPs)
ensembl <- useMart("ENSEMBL_MART_SNP", dataset = "hsapiens_snp")

# Retrieve the annotation for each rsID
out <- getBM(
  attributes = c("refsnp_id", "chr_name", "chrom_start", "chrom_end",
    "allele", "mapweight", "validated", "allele_1", "minor_allele",
    "minor_allele_freq", "minor_allele_count", "clinical_significance",
    "synonym_name", "ensembl_gene_stable_id", "consequence_type_tv"),
  filters = "snp_filter",
  values = snps,
  mart = ensembl,
  uniqueRows = TRUE)

This will return a lot of information, some of which you don't need for your purpose (so remove what you don't need from the `attributes` parameter). You can infer whether an rsID is exonic in various ways, one being the final column, `consequence_type_tv`:

unique(out[,c("refsnp_id","ensembl_gene_stable_id", "consequence_type_tv")])
     refsnp_id ensembl_gene_stable_id           consequence_type_tv
1  rs199473684        ENSG00000257529                intron_variant
2  rs199473684        ENSG00000102393           3_prime_UTR_variant
3  rs199473684        ENSG00000102393        NMD_transcript_variant
4  rs199473684        ENSG00000102393                intron_variant
5  rs199473684        ENSG00000102393 non_coding_transcript_variant
6  rs199473684                LRG_672                intron_variant
43    rs424964        ENSG00000257636 non_coding_transcript_variant
44    rs424964        ENSG00000257636                intron_variant
49      rs6025        ENSG00000198734              missense_variant
50      rs6025                LRG_553              missense_variant
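To turn the table above into a yes/no call per rsID, one option is to match the consequence column against a list of terms you consider exonic. The sketch below rebuilds a few rows of that table by hand so it runs without a network connection. The `exonic_terms` vector is an assumption about what "exonic" should mean (here: coding and UTR consequences), not an official list, so adjust it to your own definition.

```r
# A few rows of the table shown above, rebuilt by hand so the example runs offline.
out <- data.frame(
  refsnp_id = c("rs199473684", "rs199473684", "rs424964", "rs424964", "rs6025"),
  consequence_type_tv = c("intron_variant", "3_prime_UTR_variant",
                          "non_coding_transcript_variant", "intron_variant",
                          "missense_variant"),
  stringsAsFactors = FALSE
)

# Assumed set of consequence terms treated as exonic; edit to suit your analysis.
exonic_terms <- c("missense_variant", "synonymous_variant", "stop_gained",
                  "stop_lost", "start_lost", "frameshift_variant",
                  "inframe_insertion", "inframe_deletion",
                  "5_prime_UTR_variant", "3_prime_UTR_variant",
                  "non_coding_transcript_exon_variant")

# Keep rows whose consequence is in the list, then collapse to unique rsIDs.
hits <- out[out$consequence_type_tv %in% exonic_terms, ]
exonic_ids <- unique(hits$refsnp_id)
print(exonic_ids)
```

With these rows, rs199473684 (3' UTR) and rs6025 (missense) are kept, while rs424964 is dropped because it only has intronic and generic non-coding transcript consequences. Note that one rsID can have several consequences across transcripts, so the call depends on whether any matching row is enough for your purpose.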

See here, also: A: How to retrieve Gene name from SNP ID using biomaRt

Kevin
