RefSeq Version Numbers/Mapping File
9.2 years ago
pwg46 ▴ 540

Hello,

I notice that the RefSeq data files contain transcripts, proteins, etc. with version numbers. Approximately how often do these version numbers change? Also, is it likely that two RefSeq transcripts that are the same transcript (but different versions) would have different sequences if they are both GRCh38 annotations?

Also, I am looking for a data file that maps RefSeq transcripts to proteins while taking version numbers into account. I know BioMart maps RefSeq transcripts to proteins, but it doesn't append version numbers to the transcript/protein IDs (it only reports the latest versions).

Thanks

transcript mapping version refseq protein
9.2 years ago
Prakki Rama ★ 2.7k

how often these version numbers change?

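There is no fixed schedule: a record's version number is incremented only when its sequence is updated, so how often it changes varies from record to record. To see the version changes for a particular record, open it on the NCBI website and look for Revision History under Display settings, below the search box.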
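To check the current version of a record from the command line instead of the web page, here is a minimal sketch. It assumes EFetch's `rettype=acc` returns one accession.version per line for `db=nuccore` or `db=protein`; `acc_url`, `current_versions` and `split_accver` are hypothetical helper names, not NCBI tools.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking RefSeq versions with NCBI EFetch.
# Assumption: rettype=acc returns one "ACCESSION.VERSION" per line.

EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Build an EFetch URL; $1 = database (nuccore or protein), $2 = comma-separated IDs.
acc_url() {
  printf '%s?db=%s&id=%s&rettype=acc&retmode=text\n' "$EUTILS" "$1" "$2"
}

# Print the current accession.version of each requested record (needs network access).
current_versions() {
  curl -s "$(acc_url "$1" "$2")"
}

# Split "NM_001279661.1" into "NM_001279661<TAB>1"; the version field is empty if absent.
split_accver() {
  case $1 in
    *.*) printf '%s\t%s\n' "${1%.*}" "${1##*.}" ;;
    *)   printf '%s\t\n' "$1" ;;
  esac
}
```

For example, `current_versions nuccore NM_001279661` should print that record's current accession.version, and `split_accver` separates the version suffix so you can compare it with the one in your own data file.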

Is it likely that two RefSeq transcripts, which are the same transcript (but different versions), would have different sequences?

Yes. Within one accession, a new version number is assigned precisely because the sequence changed, so two versions of the same accession will differ in sequence. A related case is a predicted record being replaced by a curated one: for example, XM_003440720 is now obsolete and was the predecessor of NM_001279661. They are not completely different sequences in a strict sense; the newer one appears to be an improved version, with additional bases compared to the previous one.
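Note that the XM_003440720 → NM_001279661 example is a change of accession (a predicted model replaced by a curated record), not just a version bump. A small sketch that tells these cases apart, assuming only the standard RefSeq prefix meanings (NM_/NP_ curated, XM_/XP_ predicted); `refseq_kind` and `compare_ids` are hypothetical helpers:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing two RefSeq identifiers.

# Describe a RefSeq accession by its prefix (standard RefSeq conventions).
refseq_kind() {
  case $1 in
    NM_*) echo "curated mRNA" ;;
    XM_*) echo "predicted mRNA" ;;
    NP_*) echo "curated protein" ;;
    XP_*) echo "predicted protein" ;;
    *)    echo "other" ;;
  esac
}

# Compare two IDs, each optionally versioned (e.g. NM_001279661.1).
compare_ids() {
  a_acc=${1%%.*}   # accession without its version suffix
  b_acc=${2%%.*}
  if [ "$1" = "$2" ]; then
    echo "identical"
  elif [ "$a_acc" = "$b_acc" ]; then
    # Same record; the versions differ (or one ID is unversioned).
    echo "same accession, different version"
  else
    echo "different accessions: $(refseq_kind "$1") vs $(refseq_kind "$2")"
  fi
}
```

`compare_ids XM_003440720 NM_001279661` reports two different accessions (predicted vs curated mRNA), while a pair such as NM_001279661.1 and a hypothetical NM_001279661.2 would be reported as the same accession with a different version.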

looking for a data file which maps refSeq transcripts to proteins, but also takes into account version numbers.

One way to do this is with NCBI E-utilities (EFetch). In a terminal:

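# Fetch the nucleotide records as ASN.1 text, keep the protein accessions (NP_/XP_)
# they reference (first occurrence only), then fetch each protein as FASTA.
curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=XM_003440720,NM_001279661&retmode=text" | \
  grep 'accession "' | \
  sed -E 's/.*accession "([^"]+)".*/\1/' | \
  grep -E '^(NP|XP)_' | \
  awk '!seen[$0]++' | \
  while read -r ID; do
    curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=${ID}&rettype=fasta&retmode=text"
  done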

The output contains the protein sequences for both the older record (XM_003440720) and the newer one (NM_001279661); the FASTA headers carry the versioned protein accessions:

>gi|348506442|ref|XP_003440768.1| PREDICTED: 40S ribosomal protein S12 [Oreochromis niloticus]
MAEEGRQAHLCVLAANCDEPMYVKLVEALCAEHQINLIKVDDNKKLGEWVGLCKIDREGKPRKVVGCSCV
VVKDYGKESQAKDVIEEYFKSKK

>gi|525343327|ref|NP_001266590.1| 40S ribosomal protein S12 [Oreochromis niloticus]
MAEEGSPAGGVMDVNTALPEVLKTALIHDGLAPGIREAAKALDKRQAHLCVLAANCDEPMYVKLVEALCA
EHQINLIKVDDNKKLGEWVGLCKIDREGKPRKVVGCSCVVVKDYGKESQAKDVIEEYFKSKK
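If what you need is a ready-made versioned mapping, EFetch can also return the CDS translations of nucleotide records directly (`db=nuccore`, `rettype=fasta_cds_aa`), and their FASTA headers carry both the transcript's and the protein's accession.version. A minimal sketch, assuming headers of the form `>lcl|<transcript.version>_prot_<protein.version>_<n> ... [protein_id=<protein.version>] ...`; `cds_url` and `map_versions` are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Sketch: versioned transcript -> protein table from EFetch fasta_cds_aa output.
# Assumption: CDS headers look like
#   >lcl|<transcript.version>_prot_<protein.version>_<n> ... [protein_id=<protein.version>] ...

# Build the EFetch URL; $1 = comma-separated nucleotide accessions.
cds_url() {
  printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=%s&rettype=fasta_cds_aa&retmode=text\n' "$1"
}

# Read FASTA on stdin; print "transcript.version<TAB>protein.version" for each CDS header.
map_versions() {
  awk '
    /^>lcl[|]/ {
      # Transcript: the text between "lcl|" and "_prot_" in the first field.
      t = $1
      sub(/^>lcl[|]/, "", t)
      sub(/_prot_.*/, "", t)
      # Protein: the value of the [protein_id=...] attribute, if present.
      if (match($0, /\[protein_id=[A-Za-z0-9_.]+\]/)) {
        p = substr($0, RSTART + 12, RLENGTH - 13)
        print t "\t" p
      }
    }'
}
```

For example, `curl -s "$(cds_url NM_001279661)" | map_versions` should print one tab-separated transcript.version / protein.version pair per CDS. Reading the protein from the `[protein_id=...]` attribute rather than from the `lcl|` ID keeps it independent of the trailing `_<n>` counter.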