Question

Retrieve mutation position and ID for a mutation in hgvs format

0

9.7 years ago

vigprasud ▴ 60

How can I find the chromosome, position and ID of a mutation that is given in HGVS format?

Eg:

Gene: TMEM231    cdna_Change: NM_001077418.1:c.582+3A>G    protein_change: p.?

The mutations are represented in HGVS format. How and where can I find the rs#, chr and pos for this particular mutation?

I have a set of 10000 mutations and would like to annotate them with their chr, pos and rs#.

mutation hgvs annotation dbsnp • 8.5k views

ADD COMMENT • link updated 2.4 years ago by Ram 43k • written 9.7 years ago by vigprasud ▴ 60

0

What programming languages do you know? This could be done in R (and presumably biopython/bioperl) relatively easily.

ADD REPLY • link 9.7 years ago by Devon Ryan 104k

0

I know python and R

ADD REPLY • link 9.7 years ago by vigprasud ▴ 60

1

If VEP doesn't work for you, then you can do this in R. The general steps would be to:

1. Load this file as a data frame and parse the cDNA field to split the transcript ID from the position information.
2. Load a TxDb that contains these IDs (not all of them do).
3. Apply a function to each transcript to calculate the cDNA position of each exon (you'd just use the 5'-most or 3'-most coordinate).
4. Now that you have numbers you can compare, apply a function to extract the appropriate transcript, then determine (A) which exon the variant falls in (or which exon the intron follows, as in your example) and (B) increment or decrement the genomic position of that exon by the appropriate offset.
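
Since the asker also knows Python, here is a minimal sketch of the same arithmetic in Python rather than R. It is an illustration, not a replacement for VEP or a real transcript database. The function names, the exon coordinates and the CDS start used in the usage lines are hypothetical toy values, and the parser only handles simple substitutions with a positive coding position (no `c.-12` or `c.*5` forms).

```python
import re

# Simple substitutions only, e.g. NM_001077418.1:c.582+3A>G
HGVS_C = re.compile(
    r"^(?P<ac>[^:]+):c\.(?P<pos>\d+)(?P<offset>[+-]\d+)?"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_hgvs_c(hgvs):
    """Split an HGVS c. substitution into (accession, pos, offset, ref, alt)."""
    m = HGVS_C.match(hgvs.strip())
    if m is None:
        raise ValueError("unsupported HGVS string: " + hgvs)
    return m["ac"], int(m["pos"]), int(m["offset"] or 0), m["ref"], m["alt"]


def c_to_genomic(c_pos, offset, exons, cds_start_tx, strand):
    """Map a coding position plus intronic offset to a genomic coordinate.

    exons        -- (start, end) pairs, 1-based inclusive, in transcript order
    cds_start_tx -- 1-based transcript position of c.1 (the A of the ATG)
    strand       -- "+" or "-"
    """
    tx_pos = cds_start_tx + c_pos - 1  # position along the spliced transcript
    consumed = 0                       # transcript bases in earlier exons
    for start, end in exons:
        length = end - start + 1
        if tx_pos <= consumed + length:
            within = tx_pos - consumed - 1  # 0-based index inside this exon
            if strand == "+":
                return start + within + offset
            # Minus strand: the transcript runs towards lower coordinates.
            # Note that ref/alt would also need complementing on this strand.
            return end - within - offset
        consumed += length
    raise ValueError("coding position lies beyond the last exon")


# Toy transcript: two 10-bp exons, c.1 at transcript position 3.
ac, pos, off, ref, alt = parse_hgvs_c("NM_001077418.1:c.582+3A>G")
toy_plus = c_to_genomic(8, 3, [(100, 109), (200, 209)], 3, "+")
toy_minus = c_to_genomic(8, 3, [(200, 209), (100, 109)], 3, "-")
```

In the toy transcript, c.8 is the last base of the first exon, so `c.8+3` lands three bases into the following intron. For real data the exon table and CDS start would come from a transcript annotation that actually contains the RefSeq accession and version in question.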

ADD REPLY • link 9.7 years ago by Devon Ryan 104k

0

The Ensembl VEP should work fine for this, as it does accept HGVS notations on RefSeq transcripts as input. No need for any programming. The documentation for the VEP is excellent!

ADD REPLY • link 9.7 years ago by Bert Overduin ★ 3.7k

2

8.5 years ago

Reece ▴ 310

Also consider the Python hgvs package. [Disclosure: I'm one of the authors.]

ADD COMMENT • link updated 4.4 years ago by Ram 43k • written 8.5 years ago by Reece ▴ 310

0

As a non-Python user I didn't like this tool initially, but thanks to its well-written documentation I was able to follow along. Posting an example in case someone like me is struggling.

I had an NC ID as follows: "NC_000002.11:g.113890610C>T", and I wanted the NP IDs for it.

## Start the hgvs shell from the command line
hgvs-shell

## Then, inside the shell:
var_g = parse("NC_000002.11:g.113890610C>T")
transcripts = am37.relevant_transcripts(var_g)

for ac in sorted(transcripts):
    var_t = g_to_t(var_g, ac)
    var_p = t_to_p(var_t)
    print("-> " + str(var_t) + " (" + str(var_p) + ") ")

This returned all the NP IDs.

ADD REPLY • link 3.3 years ago by rohitsatyam102 ▴ 840

1

9.7 years ago

Vivek ★ 2.7k

The lesser mentioned yet totally awesome webpage

ADD COMMENT • link updated 2.4 years ago by Ram 43k • written 9.7 years ago by Vivek ★ 2.7k

0

I see that Mutalyzer works for converting rsIDs to HGVS, but not the other way around.

ADD REPLY • link 7.1 years ago by Ritika • 0

Ram · Accepted Answer · 2014-07-24

6

9.7 years ago

Jeremy Leipzig 22k

I would try VEP

ADD COMMENT • link updated 2.4 years ago by Ram 43k • written 9.7 years ago by Jeremy Leipzig 22k

0

I second VEP; it works quite well for this. I forget what the limit is for the number of variants through the online web interface, but you can either submit them that way in batches or use the command-line version. You just have to switch from the default, and then you can put in HGVS mutations using RefSeq sequences.
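
For the batch route, a small Python sketch like the one below could split the 10000 HGVS strings into upload-sized files with one variant per line. The batch size is left as a parameter because the actual web-interface limit is not stated here. The function names and the `vep_input` file prefix are hypothetical and chosen for illustration only.

```python
from pathlib import Path


def batches(items, size):
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for i in range(0, len(items), size):
        yield list(items[i:i + size])


def write_batches(hgvs_list, size, out_dir, prefix="vep_input"):
    """Write one HGVS string per line into numbered files; return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, chunk in enumerate(batches(hgvs_list, size), start=1):
        path = out_dir / f"{prefix}_{n:03d}.txt"
        path.write_text("\n".join(chunk) + "\n")
        paths.append(path)
    return paths


# Example: five variants in batches of two give chunks of 2, 2 and 1.
example = ["NM_001077418.1:c.582+3A>G"] * 5
chunk_sizes = [len(chunk) for chunk in batches(example, 2)]
```

Each resulting file can then be submitted separately, or the whole list can be passed to the command-line version in one go.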