Simplest genome protein annotation pipeline possible
6.8 years ago

I often work with draft genomes of non-model species (mostly fish), and we need to annotate these genomes. In cases like this, we do not really care about putative proteins predicted from ORFs or by any ab initio method.

What we really need is a GFF3 annotation file listing known proteins (from Swiss-Prot, for example), with an accompanying .csv file that gives more information about the proteins (scaffold, position, protein name, etc.).
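To make the target output concrete, here is a minimal Python sketch of the GFF3-to-CSV step, not part of any existing pipeline. It assumes a standard 9-column, tab-separated GFF3 file and, as an assumption of mine, that the protein name is stored in the `Name` attribute of `gene` features; real annotation tools may store it elsewhere (for example in `Note` or `product`). The function names `parse_attributes`, `gff3_to_rows` and `rows_to_csv` are hypothetical.

```python
import csv
import io
from urllib.parse import unquote

CSV_FIELDS = ["scaffold", "start", "end", "strand", "id", "protein_name"]


def parse_attributes(field):
    """Parse GFF3 column 9 ('key=value;key=value') into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            # GFF3 percent-encodes reserved characters in attribute values.
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def gff3_to_rows(lines, feature_type="gene"):
    """Return one dict per feature of the requested type."""
    rows = []
    for line in lines:
        # Skip blank lines, directives (##...) and comments (#...).
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        attrs = parse_attributes(cols[8])
        rows.append({
            "scaffold": cols[0],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "strand": cols[6],
            "id": attrs.get("ID", ""),
            # Assumption: the protein name lives in the 'Name' attribute.
            "protein_name": attrs.get("Name", ""),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

With a file on disk, something like `rows_to_csv(gff3_to_rows(open("annotation.gff3")))` would produce the table; the file name is only a placeholder.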

What would be the simplest approach to achieve that goal while treating introns and exons properly and producing the usual feature types (gene, CDS, exon, UTR...)?

Right now, I am considering a workflow like this:

  1. RepeatMasker
  2. PASA
  3. EVidenceModeler (EVM)

and skipping anything to do with ab initio detection (AUGUSTUS, exonerate...).

Am I missing a simpler approach? The approach needs to work for eukaryote genomes (~1-3 Gbp).

EDIT: Ah well... Please do not suggest MAKER 1 or 2. I am not going to use MAKER unless my actual survival depends on it ;)

genome annotation proteins • 2.7k views

In the end, it looks like MAKER is still the best/correct approach... Eukaryote genome annotation needs some serious streamlining.


Aren't "eukaryotic genome annotation" and "simple" an oxymoron (unless there is a "not" between them)?

I have never used them (and they seem to be anything but simple), but do you know JAMg and JAMp?


Yes, I know firsthand that genome annotation and simple don't go hand in hand ;)

I'm investigating JAMg. Thanks for the suggestion!


JAMg looks a bit more complex than our current pipeline (which fails) and depends on some of the same software that fails on our genomes... ¯\_(ツ)_/¯

6.7 years ago

I ended up developing a genome annotation pipeline based on suggestions from a colleague. You can find more about it in this Biostar post: GAWN - Genome Annotation Without Nightmares
