Hello,
I'm trying to analyze exome data in a .vcf file. Every row in this file has columns for the chromosome, position, ref sequence and alt sequence, as well as a ton of other information, e.g. chr14 105258893 A G
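To make the input concrete, here is a minimal sketch of pulling those four fields out of one VCF data line and writing them as a genomic-level (g.) HGVS-style string. It assumes the standard tab-separated VCF column order (CHROM, POS, ID, REF, ALT, ...) and only handles single-base substitutions. The function name and the optional `accession` argument are made up for illustration. Strict HGVS expects a reference sequence accession rather than a name like chr14, and this does not produce the transcript-level NM_ form.

```python
def vcf_row_to_genomic_hgvs(row, accession=None):
    """Format one single-nucleotide VCF data line as a g.-level HGVS-style string.

    Assumes standard VCF columns: CHROM, POS, ID, REF, ALT, ...
    `accession` is a hypothetical optional override for the reference name.
    """
    fields = row.rstrip("\n").split("\t")
    chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]

    # Only simple substitutions are handled in this sketch.
    if len(ref) != 1 or len(alt) != 1:
        raise ValueError("only single-base substitutions are supported")

    reference = accession if accession is not None else chrom
    return f"{reference}:g.{pos}{ref}>{alt}"


line = "chr14\t105258893\t.\tA\tG\t.\t.\t."
print(vcf_row_to_genomic_hgvs(line))
```

Getting from this genomic description to a transcript-level one still requires transcript annotation, which is the part I'm asking about.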
How can I go about transforming this data to the HGVS format, such as NM_? For example, if I search for chr14 105258893 on Google, this link from dbSNP comes up: https://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=rs2494749
Here I can see that this SNP is labeled rs2494749, and on the right I see every HGVS annotation, e.g. NM_001014432.1:c.46+42T>C. If I then type NM_001014432.1:c.46+42T>C into a service like Varsome, I can see a TON of information: https://varsome.com/variant/hg19/NM_001014432.1%3Ac.46%2B42T%3EC - this link shows me the original mutation from A to G, the chromosome, and the original position, as well as all sorts of cool stuff.
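Note that the VCF row reads A to G while the HGVS string reads T>C. A likely explanation (an assumption on my part, not something stated on the pages linked here) is that the transcript is annotated on the reverse strand, so the alleles are written as their complements. A small sketch of that conversion, with a hypothetical helper name:

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def to_reverse_strand(ref, alt):
    """Return (ref, alt) as read on the opposite strand.

    Each allele is reverse-complemented; for single bases this is
    just the complement.
    """
    def revcomp(seq):
        return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

    return revcomp(ref), revcomp(alt)


print(to_reverse_strand("A", "G"))
```

This only flips the alleles; it says nothing about the c.46+42 position, which depends on the transcript's exon structure.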
I'm looking for a package in Python (preferably) or R (if I have to) that I can use to transform chromosome number, position, ref and alt into this HGVS format without all this tedious Google searching. Most of the info I've found only covers formatting existing HGVS annotations. I've browsed Biostars and found people trying to do the reverse of what I'm asking: converting HGVS back to genomic coordinates.
Also, I'm wondering why a single variant has so many different annotations. Is one better or more informative than the others? If I'm trying to get a single annotation out of this mapping, which one do I choose?
Thank you!