Biostar Beta. Not for public use.
How does RefSeq Calculate their Positions?
5.3 years ago
pwg46 • 370
United States

I have an Ensembl transcript, ENST00000029410, with a C>T mutation at 1-based position 808. Mapping this position to the protein is very simple, because the transcript's position is its position within its coding chunks, so the mutation's position on the protein is simply ceil(808/3) = 270.
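The ceil(808/3) arithmetic above can be sketched in Python as follows. The function name `cds_pos_to_codon` is made up for illustration, and the sketch assumes the input is already a 1-based position within the coding sequence, as described in the question.

```python
import math


def cds_pos_to_codon(cds_pos):
    """Return (1-based amino-acid position, 0-based offset inside the codon).

    Assumes cds_pos is a 1-based position within the coding sequence,
    i.e. position 1 is the first base of the start codon.
    """
    if cds_pos < 1:
        raise ValueError("cds_pos must be a positive, 1-based position")
    aa_pos = math.ceil(cds_pos / 3)
    codon_offset = (cds_pos - 1) % 3  # 0 = first base of the codon
    return aa_pos, codon_offset


aa_pos, codon_offset = cds_pos_to_codon(808)
print(aa_pos, codon_offset)
```

For position 808 this gives amino acid 270, with the mutated base being the first base of that codon.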

This transcript also maps to 3 RefSeq transcripts (according to Ensembl's BioMart): XM_006714816, XM_006714815, and XM_005265805. I assumed that a RefSeq transcript (XM or NM) represents the entire Ensembl transcript that maps to it, so I expected position 808 on ENST00000029410 to also map to position 808 on each of the three RefSeq transcripts. Instead, it mapped to three different positions: 833, 1044 and 1319, respectively. Where are these positions coming from? And how can they be used to find the mutation's position on the resulting protein? Clearly, dividing these positions by 3 does not give position 270 on the protein.

13 months ago
UK, Hinxton, EMBL-EBI

An NM or XM may represent an entire Ensembl transcript, but it may not. These 3 XMs are an example of the latter. See the pairwise alignment on the Ensembl browser between ENST00000029410 and XM_005265805.1, XM_006714815.1 and XM_006714816.1. The XMs have different lengths, hence the different target %ID and query %ID values. On the other hand, the NM is a perfect match with ENST00000029410, so I'd expect the variant to be at position 808 of the NM entry but not at position 808 of the XMs. I'd not take the XMs into consideration, as they are predicted mRNAs. I'd just work on the NM or, better still, I would only worry about ENST00000029410 and get the corresponding affected amino-acid residue from that transcript alone. By the way, VEP can easily tell you which position of the protein is affected by your variant. See this example
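If a position is counted from the first base of an mRNA record rather than from the start codon, it has to be shifted by the CDS start before dividing by 3. The following is only a sketch under that assumption; the thread does not confirm that this is the cause of the 833, 1044 and 1319 values. The function names and the numbers in the example are hypothetical, and the real CDS start and end would have to be read from the CDS feature of the transcript record.

```python
def mrna_to_cds_pos(mrna_pos, cds_start, cds_end):
    """Convert a 1-based mRNA position to a 1-based CDS position.

    cds_start and cds_end are the 1-based, inclusive mRNA coordinates
    of the coding sequence. Returns None for positions in the UTRs.
    """
    if not (cds_start <= mrna_pos <= cds_end):
        return None
    return mrna_pos - cds_start + 1


def cds_to_aa_pos(cds_pos):
    """1-based amino-acid position for a 1-based CDS position."""
    return (cds_pos + 2) // 3  # integer form of ceil(cds_pos / 3)


# Hypothetical transcript: CDS spans mRNA positions 101..400.
cds_pos = mrna_to_cds_pos(150, cds_start=101, cds_end=400)
print(cds_pos, cds_to_aa_pos(cds_pos))
```

With these made-up coordinates, mRNA position 150 is CDS position 50, which falls in codon 17. Transcripts with different exon content would need an alignment-based mapping instead of a simple offset.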


Hmm, interesting. But NMs are not always perfect matches with the ENST that maps to them? I'm basically just trying to figure out how to connect RefSeq transcripts to my node graph of other identifiers (ENST, ENSG, ENSP, GRCh38 chromosome, UniProt, RefSeq protein). But in order to connect a RefSeq transcript to a node, I have to be confident about not only my ID conversions, but also my position conversions. Perhaps I could just connect it to the GRCh38 chromosome instead of to the ENST? Would you happen to know of a file that maps RefSeq transcripts to their chromosomal positions? I have been looking through RefSeq's DB to no avail.


There can be cases where the NMs aren't a perfect match to an ENST. You can get the genomic coordinates for the NMs from this GFF3 file from NCBI. You can also get the start and end coordinates of the RefSeq transcripts using the Ensembl REST API. See this example
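As a rough sketch of how a GFF3 file could be used for this, the code below collects the exon rows of one transcript and maps a 1-based transcript position to a genomic coordinate. It relies only on the standard nine-column GFF3 layout. It assumes that exon rows carry a `transcript_id` attribute, which should be checked against the actual file. The accession `NM_TEST.1`, the sequence name `chrTest` and all coordinates are invented for illustration, and the mapping ignores any alignment gaps between the transcript and the genome.

```python
def parse_attributes(field):
    """Turn a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def exons_for_transcript(gff3_lines, transcript_id):
    """Collect (seqid, start, end, strand) for the exon rows of one transcript."""
    exons = []
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        # Assumption: exon rows expose the accession as 'transcript_id'.
        if parse_attributes(cols[8]).get("transcript_id") == transcript_id:
            exons.append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return exons


def transcript_to_genomic(exons, tx_pos):
    """Map a 1-based transcript position to (seqid, 1-based genomic position)."""
    if not exons:
        raise ValueError("no exons found for this transcript")
    strand = exons[0][3]
    # Walk exons in transcription order: ascending on '+', descending on '-'.
    ordered = sorted(exons, key=lambda e: e[1], reverse=(strand == "-"))
    remaining = tx_pos
    for seqid, start, end, _ in ordered:
        length = end - start + 1
        if remaining <= length:
            if strand == "-":
                return seqid, end - remaining + 1
            return seqid, start + remaining - 1
        remaining -= length
    raise ValueError("position lies beyond the end of the transcript")


# Invented two-exon transcript on the plus strand (100 bp + 50 bp).
rows = [
    ("chrTest", "demo", "exon", "100", "199", ".", "+", ".",
     "ID=exon-1;transcript_id=NM_TEST.1"),
    ("chrTest", "demo", "exon", "300", "349", ".", "+", ".",
     "ID=exon-2;transcript_id=NM_TEST.1"),
]
demo_lines = ["##gff-version 3"] + ["\t".join(r) for r in rows]
demo_exons = exons_for_transcript(demo_lines, "NM_TEST.1")
print(transcript_to_genomic(demo_exons, 120))
```

Note that the input here is a position on the full transcript, including any 5' UTR, not a position within the coding sequence.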
