Check programmatically whether a protein is mitochondrial using its annotation
3.7 years ago
lumal29 ▴ 80

Hi, I have a FASTA file with more than 38,000 protein sequences inferred from a Diplonema genome. All sequences have an ID and an annotation, but the IDs are not referenced in any database. I need to work out which proteins are mitochondrial using the annotations.

Here is an example, with an ID and an annotation:

XXXXX12345 Succinyl-CoA ligase [ADP-forming] subunit beta

I know this one is mitochondrial because I also used BLAST to check for similarity with the mitochondrial proteins of another organism. But I only know it because I looked up on Google what a "Succinyl-CoA ligase" was, among the small subset (30 proteins) I found with BLAST.

But is there a way to programmatically check each annotation in the FASTA file to see whether it corresponds to a mitochondrial protein? Which resource(s) can I use to at least see if proteins are mitochondrial?

Thanks in advance.

proteomics GO terms Annotations • 1.4k views

Hi,

The only thing I can think of, though I'm not sure it is feasible or the best option, is to build a mitochondrial database and then map/align all the Diplonema genes/proteins against it. The ones that align well, i.e., with high percent identity and low e-value, would be annotated as mitochondrial. I believe there is a human mitochondrial database. Another possibility, though I don't think it will work well, is to annotate a protein/gene as mitochondrial based on its name, by comparing each gene/protein name against a list of mitochondrial genes/proteins (though you can have mitochondrial genes without annotation).

António
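The name-based idea above could be sketched roughly as follows. This is only an illustration, assuming FASTA headers of the form `>ID annotation` as in the question; the function names, the second example ID, the placeholder sequences, and the short keyword list are all hypothetical, and a real list of mitochondrial protein names would have to come from a curated resource.

```python
def read_fasta_headers(lines):
    """Yield (seq_id, annotation) for every FASTA header line."""
    for line in lines:
        if line.startswith(">"):
            # Split at the first space: ID on the left, annotation on the right.
            seq_id, _, annotation = line[1:].strip().partition(" ")
            yield seq_id, annotation


def flag_by_name(headers, mito_names):
    """Return IDs whose annotation contains any known name (case-insensitive)."""
    names = [name.lower() for name in mito_names]
    hits = []
    for seq_id, annotation in headers:
        text = annotation.lower()
        if any(name in text for name in names):
            hits.append(seq_id)
    return hits


# Hypothetical input: the first header is the example from the question,
# the second header and both sequences are placeholders.
example_fasta = [
    ">XXXXX12345 Succinyl-CoA ligase [ADP-forming] subunit beta",
    "MSEQ",
    ">XXXXX67890 Hypothetical protein",
    "MSEQ",
]
# Hypothetical keyword list; a real one would be much longer.
mito_names = ["succinyl-coa ligase", "cytochrome c oxidase"]

flagged = flag_by_name(read_fasta_headers(example_fasta), mito_names)
print(flagged)
```

Substring matching like this misses proteins without an informative annotation and can flag names shared with non-mitochondrial isoforms, so it is best treated as a first-pass filter.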


Hi Antonio, thank you for your advice! I actually did what you suggest, but with an organism that is closer in the evolutionary tree, called Andalucia. I took the proteins that I was sure were mitochondrial, made a database with BLAST, and searched my proteins against it. By doing this, I found proteins like the one I mentioned in my post. Thank you again, because it reassures me about what I'm doing!
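For the BLAST step described here, a minimal sketch of the filtering could look like the code below. It assumes the search was saved in BLAST+ tabular format (`-outfmt 6`, whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The thresholds, the function name, and the example rows are made up for illustration and are not values from this thread.

```python
import csv
import io


def best_hits(blast_tsv, max_evalue=1e-10, min_identity=30.0):
    """Keep the highest-bitscore hit per query that passes both thresholds.

    blast_tsv is any iterable of tab-separated lines in BLAST+ outfmt 6.
    The default thresholds are arbitrary assumptions and should be tuned.
    """
    best = {}
    for row in csv.reader(blast_tsv, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        identity = float(row[2])   # pident column
        evalue = float(row[10])    # evalue column
        bitscore = float(row[11])  # bitscore column
        if evalue > max_evalue or identity < min_identity:
            continue
        if query not in best or bitscore > best[query][3]:
            best[query] = (subject, identity, evalue, bitscore)
    return best


# Made-up rows, only to show the expected layout of the input.
example = io.StringIO(
    "XXXXX12345\tmito_ref_1\t62.5\t400\t150\t0\t1\t400\t1\t400\t1e-150\t480\n"
    "XXXXX12345\tmito_ref_2\t35.0\t200\t130\t0\t1\t200\t1\t200\t1e-20\t90.5\n"
    "XXXXX67890\tmito_ref_3\t25.0\t80\t60\t0\t1\t80\t1\t80\t0.5\t30.1\n"
)
hits = best_hits(example)
for query, (subject, identity, evalue, bitscore) in hits.items():
    print(query, subject, identity, evalue, bitscore)
```

With a real results file, `best_hits(open("results.tsv"))` would work the same way, where `results.tsv` is a hypothetical file name.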

3.7 years ago
Mensur Dlakic ★ 27k

It is not a perfect solution, but you can try predicting protein localization from sequence. For example:

Most of them should work well with mitochondrial proteins.


Thank you Mensur for your answer. I have already used 3 tools to predict the localization of the sequences: TargetP, MitoFates and PredSL. It's really hard to make a decision based on their results because the tools don't agree. If I require a positive prediction from all 3 tools, I get more than 600 proteins out of 38,000, and if I require a positive prediction from at least one tool, I get more than 4,000 sequences. How can I decide which to choose? What I did was take mitochondrial proteins from another closely related organism called Andalucia and check the accuracy of the tools. I had 33 proteins, and 32 of them were predicted when I required a positive prediction from at least one tool. So, as you said, it's not perfect, but perhaps it can give me a good idea. I saw new tools in the links you gave me, so I may try some. Thank you again for your help!
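One way to compare the "all tools", "at least one tool", and in-between choices is to count how many tools call each protein positive and measure recall on the known set at each cutoff. The sketch below assumes each tool's positive predictions have already been parsed into a set of sequence IDs; the IDs, the set contents, and the function names are invented for illustration and are not real output from TargetP, MitoFates or PredSL.

```python
from collections import Counter


def vote_counts(predictions):
    """Count, for every sequence ID, how many tools predicted it as positive."""
    counts = Counter()
    for ids in predictions.values():
        counts.update(set(ids))  # set() so each tool votes at most once per ID
    return counts


def select(counts, min_votes):
    """Return the IDs predicted positive by at least min_votes tools."""
    return {seq_id for seq_id, n in counts.items() if n >= min_votes}


def recall(selected, known_positives):
    """Fraction of known mitochondrial proteins present in the selection."""
    known = set(known_positives)
    return len(selected & known) / len(known) if known else 0.0


# Invented toy data standing in for parsed tool outputs.
predictions = {
    "TargetP": {"p1", "p2", "p3"},
    "MitoFates": {"p1", "p2"},
    "PredSL": {"p1", "p4"},
}
known_mito = {"p1", "p2", "p5"}  # stand-in for a benchmark set of known proteins

counts = vote_counts(predictions)
for min_votes in (1, 2, 3):
    chosen = select(counts, min_votes)
    print(min_votes, len(chosen), recall(chosen, known_mito))
```

Recall on a positive-only benchmark says nothing about false positives, so a set of proteins known not to be mitochondrial would also be needed to judge which cutoff is too permissive.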
