I am looking at this variant: http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=754264214. It is represented as NM_000309.3:c.-8-11C>T.
The gene PPOX (NM_000309.3) has a 5' UTR, of which 8 bases lie in the first CDS exon. Usually, upstream variants are denoted as c.-[number_of_bases][REF]>[ALT], but here I see c.-8-11, which could indicate that the variant is 19 bases upstream of the CDS start and 11 bases upstream of CDS_Exon-1.
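To make that reading concrete, here is a minimal Python sketch that splits the description into its parts and adds them up. It assumes the interpretation suggested above: "-8" is a 5' UTR position at the start of the first CDS exon, and "-11" is an offset counted further upstream from that base. The names HGVS_SUB, parse_c_substitution and bases_upstream_of_atg are hypothetical helpers written for this post, not part of any HGVS library, and the pattern only handles simple substitutions.

```python
import re

# Hypothetical pattern for a simple coding-DNA substitution such as
# "c.-8-11C>T": optional "-" (5' UTR) or "*" (3' UTR) prefix, an anchor
# position, an optional signed offset, then REF>ALT.
HGVS_SUB = re.compile(
    r"^c\.(?P<utr>[-*]?)(?P<base>\d+)(?P<offset>[+-]\d+)?"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_c_substitution(desc):
    """Split a description like 'c.-8-11C>T' into its components."""
    m = HGVS_SUB.match(desc)
    if m is None:
        raise ValueError(f"not a simple c. substitution: {desc!r}")
    base = int(m["base"])
    if m["utr"] == "-":
        base = -base  # 5' UTR positions are negative
    return {
        "base": base,
        "three_prime_utr": m["utr"] == "*",
        "offset": int(m["offset"]) if m["offset"] else 0,
        "ref": m["ref"],
        "alt": m["alt"],
    }


def bases_upstream_of_atg(parsed):
    """Distance upstream of the CDS start, under the assumption that the
    offset continues directly upstream of the anchor position."""
    if parsed["three_prime_utr"] or parsed["base"] >= 0 or parsed["offset"] > 0:
        raise ValueError("only 5' UTR positions with an upstream offset are handled")
    return -parsed["base"] - parsed["offset"]


variant = parse_c_substitution("c.-8-11C>T")
print(variant)
print(bases_upstream_of_atg(variant))  # 8 + 11 = 19
```

Under the same assumption, a plain c.-19C>T gives the same distance, which is exactly why the two notations look interchangeable at first glance.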
Is this representation canonical/acceptable? Can we insert random pieces of information into standardized nomenclature for supposed clarification?
P.S.: Let us ignore the part where an NM identifier is followed by a negative coordinate. I am not sure that is valid, but I see it quite often. The problem, of course, is that NM sequences may not always have the REF base at the upstream position; they may only include the sequence from a later position.
That nomenclature makes absolutely no sense to me, though people seem really good at coming up with the craziest HGVS representations.
But does dbSNP not curate entries? I mean, expecting them to follow the latest in HGVS rules might be problematic, but this?