next generation of BLAST2GO: software recommendations needed
8.4 years ago
jeremy.cox.2 ▴ 130

Hey everyone,

I am trying to find multiple methods to annotate assembled mRNA sequences, which I am treating as candidate gene transcripts.

An obvious choice is BLAST2GO, and I have read about the CAFA project (http://www.nature.com/nmeth/journal/v10/n3/pdf/nmeth.2340.pdf), which 3 years ago sampled some 54 in-development next-generation GO annotation systems. I would guess after 3 years, some of these might have been turned into tools, but finding said tools is difficult.

Does anyone know of or recommend any GO annotation tools?

RNA-Seq GO annotation • 2.4k views
8.4 years ago
pld 5.1k

http://trinotate.github.io/

It isn't too hard to string together something on your own if you can't find anything that fits the bill.

8.4 years ago
cyril-cros ▴ 950

First, InterProScan / interpro2go. You often only get the molecular function though (e.g. you learn that this is a DNA-binding protein, but not which functional pathway it affects), since you are looking at protein domains and large families.

UniprotKB/Swissprot is better (it is a curated protein database), but it does not contain that many proteins. You can run a tabular blast against UniprotKB and retrieve GO codes using the GO Lead SQL database. Even better, you can query the GO annotation evidence. Other protein databases may give you a name but no GO terms.
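The blast-then-lookup step could be sketched roughly as below. This is only a minimal illustration, not what Trinotate or Blast2GO actually do: it assumes the standard 12-column tabular BLAST output (`-outfmt 6`), Swiss-Prot style subject IDs of the form `sp|ACCESSION|NAME`, and a GO annotation file in GAF format in place of the SQL database mentioned above. The function names and the e-value cutoff are made up for the example.

```python
import csv
from collections import defaultdict


def best_hits(blast_lines, max_evalue=1e-5):
    """Keep the highest-bitscore hit per query from 12-column tabular BLAST output."""
    best = {}
    for row in csv.reader(blast_lines, delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue
        # Assumption: subject IDs look like "sp|P12345|NAME_SPECIES"
        parts = subject.split("|")
        accession = parts[1] if len(parts) >= 3 else subject
        if query not in best or bitscore > best[query][1]:
            best[query] = (accession, bitscore)
    return best


def load_gaf(gaf_lines):
    """Map protein accession -> set of (GO ID, evidence code) from GAF lines."""
    go = defaultdict(set)
    for line in gaf_lines:
        if line.startswith("!"):
            continue  # GAF header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7:
            continue
        # GAF columns: 2 = DB object ID, 5 = GO ID, 7 = evidence code
        go[cols[1]].add((cols[4], cols[6]))
    return go


def annotate(blast_lines, gaf_lines, max_evalue=1e-5):
    """Return {transcript: sorted list of (GO ID, evidence code)} via the best hit."""
    go = load_gaf(gaf_lines)
    hits = best_hits(blast_lines, max_evalue)
    return {q: sorted(go.get(acc, ())) for q, (acc, _) in hits.items()}
```

Both arguments can be open file handles, e.g. `annotate(open("hits.tsv"), open("annotations.gaf"))` (file names hypothetical). Keeping the evidence code next to each GO ID makes it easy to filter on it later.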

Last, but not least, you can try to scan the relevant HMMs in Eggnog (I am not sure whether Trinotate does that).

http://geneontology.org/page/go-annotation-standard-operating-procedures could also be a start. Some consortia are much more organized than others and have better annotations (http://www.wormbase.org/, not to name it). Some GO terms may be assigned a bit too hastily, so keeping track of the source you used is a must.

Make no mistake, Trinotate is just a Perl wrapper around a SQL database that does the job of summarizing what you got from your searches of various databases, and can export it in a convenient format. I have a few bash scripts that do a much poorer job, but it is similar in spirit. Did I say Blast2GO (free version) is waaaaay slower than a direct search on your hard drive/nearest cluster, since it uses public servers?

I really like what they did in this paper on crocodiles: http://www.sciencemag.org/content/346/6215/1254449. In the functional annotation part of their Supplementary Materials, they explain how they proceeded. Their annotation is easy to parse, they use clear codes, and they show their matches.


Not sure what you're getting at with Trinotate; all of these programs function in more or less the same way. It does assign eggNOG annotations.

Another option would be to map GO terms via PFAM hits.
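A rough sketch of the PFAM route, assuming you already have the Pfam accessions hit by each transcript (e.g. parsed from an hmmscan run) and a copy of the `pfam2go` mapping file distributed by the GO consortium, whose lines look like `Pfam:PF00001 7tm_1 > GO:G protein-coupled receptor activity ; GO:0004930`. The function names and the shape of the `hits` dictionary are assumptions for the example.

```python
from collections import defaultdict


def parse_pfam2go(lines):
    """Parse pfam2go lines into {Pfam accession: set of GO IDs}."""
    mapping = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # header/comment lines start with "!"
        left, sep, right = line.partition(" > ")
        if not sep:
            continue  # not a mapping line
        pfam_acc = left.split()[0].split(":", 1)[-1]  # "Pfam:PF00001" -> "PF00001"
        go_id = right.rsplit(";", 1)[-1].strip()      # text after the last ";"
        mapping[pfam_acc].add(go_id)
    return mapping


def go_from_pfam_hits(hits, mapping):
    """hits: {transcript: iterable of Pfam accessions, with or without version}."""
    result = {}
    for transcript, accessions in hits.items():
        terms = set()
        for acc in accessions:
            # Drop the version suffix, e.g. "PF00001.24" -> "PF00001"
            terms |= mapping.get(acc.split(".")[0], set())
        result[transcript] = sorted(terms)
    return result
```

Transcripts whose domains have no entry in the mapping simply get an empty list, which makes the coverage gap visible instead of hiding it.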


My point is that Trinotate and Blast2GO are useful programs, which both build an annotation report after parsing the output of searches against third-party sources. What I wanted to do is to introduce these resources and some potential pitfalls.

  • You can do a blast search against curated proteins with known GO terms (UniprotKB/Swissprot), find orthologs (Eggnog/OrthoDB), or identify protein families or domains with HMM models (Interpro covers this).
  • There are different kinds of evidence for GO terms. Manual assignment can be trusted; automatic annotation might be wrong.
  • Orthology is not usually transitive, but Eggnog and OrthoDB use hierarchical groups and can be trusted.
  • Synteny and comparison with a close relative are another source of information.
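
To make the evidence point concrete, here is a small sketch that separates electronically inferred GO terms from the rest. It assumes annotations are available as (GO ID, evidence code) pairs, and it treats only the `IEA` code (inferred from electronic annotation) as automatic, which is a simplification; the function name is made up for the example.

```python
# Assumption: IEA is the only evidence code treated as "automatic" here.
ELECTRONIC_CODES = {"IEA"}


def split_by_evidence(annotations):
    """Split (GO ID, evidence code) pairs into (curated, electronic) lists of GO IDs."""
    curated, electronic = [], []
    for go_id, code in annotations:
        if code in ELECTRONIC_CODES:
            electronic.append(go_id)
        else:
            curated.append(go_id)
    return curated, electronic
```

Reporting the two lists separately, rather than dropping the electronic ones, keeps track of where each term came from.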
