Question: How to relate protein IDs (protein sequences) to GenBank file sequences
Hi everyone,

I am completely new to bioinformatics and I'm working on a project about tomato. I used a package to identify the S. pennellii orthologs of the S. lycopersicum transcription factors. I did that by aligning the S. lycopersicum transcription factor protein sequences against all the S. pennellii protein sequences (FASTA file from NCBI).

Now I basically have something like this

Solyc07g053610.2.1 100%,Sopen07g027560.100%

I want to relate these protein IDs to the GenBank files (nucleotide sequences). Does anyone have any idea how I can do this? The protein IDs may not be compatible with the GenBank files, as they use a different naming system. Thank you very much.

4.0 years ago kws15 • 40 • updated 17 months ago RamRS 21k
It seems that these protein identifiers have only been used internally by ITAG (International Tomato Annotation Group) and were never submitted to GenBank.

There is currently only one full genome of tomato in GenBank. It has seen some upgrades in recent years, but with every upgrade the chromosomal coordinates are shifted. The latest assembly from ITAG is available as an NCBI RefSeq sequence. This RefSeq sequence has been automatically reannotated by NCBI, but the original ITAG annotation can be downloaded from


The GFF file can be grepped for the position of protein Solyc07g053610 in the chromosomal DNA sequence:

awk -F'\t' '$3=="gene" && $9~/Solyc07g053610/' ITAG2.4_gene_models.gff3 | sed 's/SL2\.50ch07/NC_015444.2/'

NC_015444.2     ITAG_eugene     gene    62033451        62049779        .       +       .       ID=gene:Solyc07g053610.2;Name=Solyc07g053610.2;Alias=Solyc07g053610;from_BOGAS=1;length=16329
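
For more than a handful of IDs, the same lookup can be scripted. Below is a minimal Python sketch, not part of the original workflow. The function names `protein_to_gene_id` and `find_genes` are made up for illustration. It assumes that the gene ID is the part of the protein ID before the first dot, and that gene lines carry an `Alias` attribute as in the line above. Only the chromosome 7 pair in the lookup table comes from the command above; the other chromosomes would have to be added from the mapping table.

```python
# Assumption: only this pair is taken from the sed command in the post;
# add the remaining chromosomes from the mapping table before real use.
SEQID_TO_REFSEQ = {"SL2.50ch07": "NC_015444.2"}


def protein_to_gene_id(protein_id):
    """Drop the version suffixes: 'Solyc07g053610.2.1' -> 'Solyc07g053610'."""
    return protein_id.split(".")[0]


def find_genes(gff_lines, gene_ids, seqid_map=SEQID_TO_REFSEQ):
    """Yield (gene_id, seqid, start, end, strand) for matching 'gene' features."""
    wanted = set(gene_ids)
    for line in gff_lines:
        # Skip comments, directives and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        # Column 9 holds key=value pairs separated by semicolons.
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        # Prefer the unversioned Alias; fall back to Name without its version.
        gene_id = attrs.get("Alias") or attrs.get("Name", "").split(".")[0]
        if gene_id in wanted:
            # Translate the ITAG sequence name if a RefSeq accession is known.
            seqid = seqid_map.get(cols[0], cols[0])
            yield (gene_id, seqid, int(cols[3]), int(cols[4]), cols[6])


# Small in-memory example using the gene line shown in the post.
example = (
    "SL2.50ch07\tITAG_eugene\tgene\t62033451\t62049779\t.\t+\t.\t"
    "ID=gene:Solyc07g053610.2;Name=Solyc07g053610.2;"
    "Alias=Solyc07g053610;from_BOGAS=1;length=16329\n"
)
hits = list(find_genes([example], [protein_to_gene_id("Solyc07g053610.2.1")]))
print(hits)
```

For the real file, the list passed as `gff_lines` can simply be an open file handle on ITAG2.4_gene_models.gff3, since the function only iterates over lines.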

A table mapping chromosome numbers to NCBI RefSeq accessions is available here
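
With the RefSeq accession and the coordinates in hand, the gene's nucleotide sequence can be cut out of the chromosome record. Here is a minimal Python sketch, assuming the chromosome sequence has already been read into a plain string (for example from a FASTA download of NC_015444.2). `extract_feature` is a hypothetical helper name. GFF3 coordinates are 1-based and inclusive, which agrees with the gene line above: 62049779 - 62033451 + 1 = 16329, the value of its length attribute.

```python
# Complement table for DNA, keeping N and lower-case (soft-masked) bases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def extract_feature(chrom_seq, start, end, strand="+"):
    """Return the subsequence for 1-based, inclusive GFF3 coordinates.

    Features on the minus strand are returned reverse-complemented.
    """
    if start < 1 or end > len(chrom_seq) or start > end:
        raise ValueError("coordinates outside the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    sub = chrom_seq[start - 1:end]
    if strand == "-":
        sub = sub.translate(_COMPLEMENT)[::-1]
    return sub


# Toy sequence only; a real call would pass the full chromosome string.
print(extract_feature("AAACGTTT", 2, 4))
print(extract_feature("AAACGTTT", 2, 4, "-"))
```

Because the coordinates shift with every assembly upgrade, this is only meaningful when the annotation file and the chromosome sequence belong to the same assembly version.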

4.0 years ago piet ♦ 1.7k • updated 17 months ago RamRS 21k
