Question: Implement the Ensembl gene annotation pipeline for my assembly

I'm new to gene annotation. Can anyone help me implement the Ensembl gene annotation pipeline for my data? Is Ensembl web-based, or is there a Linux package? Can I run it on my own server, or only on their website? Can anyone give a tutorial link?

TIA

15 months ago, Mbillah • 110 • updated 15 months ago by WouterDeCoster • 39k

It is unclear which data you have and what you aim to obtain. Please elaborate (e.g. on file formats) and be specific.

15 months ago, WouterDeCoster • 39k

I have paired reads, contigs, scaffolds, and a GFF file. Now I want to annotate the genes: protein-coding genes, small non-coding genes, long non-coding genes, other non-coding genes, pseudogenes, and gene transcripts.

15 months ago, Mbillah • 110

Maybe this repo suits you: https://github.com/Ensembl/ensembl-annotation, but it hasn't been finished.

15 months ago, hsiaoyi0504 • 40

Actually, I don't understand how to get started. Can you tell me where to begin? Can you also please explain this command:

find . -name '*.p[l|m]' -exec perltidy -pro=perltidyrc -b {} \;

15 months ago, Mbillah • 110

So what you have is an assembly and a GFF file. Please edit your post to make this clearer.

15 months ago, WouterDeCoster • 39k

There is currently no easy or streamlined way to install the Ensembl annotation pipeline locally, so I do not recommend even attempting this as a beginner. This doesn't mean it has to stay like this: Ensembl and EBI have been working on a distributed Ensembl infrastructure within Elixir, which involves the EBI, Elixir-Norway and Sweden. Possibly, part of the outcome will be a Docker container that runs the whole annotation pipeline, with documentation. Have a look at the webinar to see if you might be interested in testing it out anyway. If you want, I can try to find out more about the current state of the Ensembl Docker images.

In the meantime, I recommend using the MAKER2 pipeline.

Update:

Unfortunately, it is unlikely that there will be an installable Ensembl annotation pipeline, in a Docker container or otherwise, in the foreseeable future. The efforts towards distributed Ensembl have mainly focussed on the services, like the genome browser and back-end. In summary, that means only Ensembl can run the Ensembl annotation pipeline. Also, the Ensembl annotation pipeline relies heavily on protein evidence, while in your case you might mostly have RNA-seq evidence. For that kind of evidence, MAKER is more suitable.

15 months ago, Michael Dondrup • 46k

Can MAKER2 give me all of these: protein-coding genes, small non-coding genes, long non-coding genes, other non-coding genes, pseudogenes, and gene transcripts?

15 months ago, Mbillah • 110

Possibly not all of them; what you can get also depends on the evidence data you have. De novo prediction of non-coding genes is often pretty bad anyway, except for tRNA. I recommend you start with a feasible approach first, e.g. look at the MAKER docs and install some of its dependencies, then predict the protein-coding genes and transcripts plus tRNA. Don't try to solve everything in one go. Getting MAKER to run is tough enough; try bioconda for installing dependencies.
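
Once a first MAKER run has produced a GFF3 file, a quick way to see which feature classes you actually got is to tally the type column (column 3). This is only a minimal sketch, assuming a standard nine-column, tab-separated GFF3 that may end with a `##FASTA` section; the function name and the example records are made up for illustration.

```python
from collections import Counter


def count_feature_types(gff_lines):
    """Tally the feature type (column 3) of each record in a GFF3 file."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more feature records
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:
            counts[fields[2]] += 1
    return counts


# Hypothetical records, only to show the expected input shape.
EXAMPLE_GFF = [
    "##gff-version 3\n",
    "scaffold_1\tmaker\tgene\t100\t900\t.\t+\t.\tID=gene1\n",
    "scaffold_1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1\n",
    "scaffold_1\tmaker\tgene\t2000\t2072\t.\t-\t.\tID=gene2\n",
    "scaffold_1\tmaker\ttRNA\t2000\t2072\t.\t-\t.\tID=trna1;Parent=gene2\n",
    "##FASTA\n",
    ">scaffold_1\n",
    "ACGT\n",
]

if __name__ == "__main__":
    for feature_type, n in sorted(count_feature_types(EXAMPLE_GFF).items()):
        print(f"{feature_type}\t{n}")
```

For a real file you would pass an open file handle instead of the example list, e.g. `count_feature_types(open("my_annotation.gff"))`, where the file name is a placeholder.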

15 months ago, Michael Dondrup • 46k

You can convert your Ensembl GTF into a gene-based annotation table (tab-delimited). Then you can import this simple table into R, or just use Linux command-line tools, to annotate your results.

Check out this post: "A: extract only geneID and gene symbol from GTF file"
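
As a rough illustration of that conversion, here is a minimal Python sketch. It assumes an Ensembl-style GTF: nine tab-separated columns, with attributes such as `gene_id`, `gene_name` and `gene_biotype` written as `key "value";` pairs in the last column. The function names and the example records are made up, and attribute names may differ in your file.

```python
import csv
import io
import re

# Matches quoted attributes such as: gene_id "G0001";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gtf_genes_to_rows(gtf_lines):
    """Yield (gene_id, gene_name, gene_biotype) for every 'gene' record."""
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue  # keep gene-level records only
        attrs = dict(ATTR_RE.findall(fields[8]))
        yield (
            attrs.get("gene_id", ""),
            attrs.get("gene_name", ""),  # empty if the gene has no symbol
            attrs.get("gene_biotype", ""),
        )


def write_gene_table(gtf_lines, out_handle):
    """Write a tab-delimited gene table with a header row."""
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene_id", "gene_name", "gene_biotype"])
    for row in gtf_genes_to_rows(gtf_lines):
        writer.writerow(row)


# Hypothetical records, only to show the expected input shape.
EXAMPLE_GTF = [
    "#!genome-build example\n",
    '1\tensembl\tgene\t100\t900\t.\t+\t.\tgene_id "G0001"; gene_name "abc1"; gene_biotype "protein_coding";\n',
    '1\tensembl\ttranscript\t100\t900\t.\t+\t.\tgene_id "G0001"; transcript_id "T0001"; gene_name "abc1"; gene_biotype "protein_coding";\n',
    '2\tensembl\tgene\t50\t400\t.\t-\t.\tgene_id "G0002"; gene_biotype "lncRNA";\n',
]

if __name__ == "__main__":
    buffer = io.StringIO()
    write_gene_table(EXAMPLE_GTF, buffer)
    print(buffer.getvalue(), end="")
```

The resulting table can be read in R with `read.delim()`, or joined to other results on the `gene_id` column.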

15 months ago, EagleEye • 6.4k
