Implementing the Ensembl gene annotation pipeline for my assembly
Asked by Mbillah, 2.2 years ago

I'm new to gene annotation. Can anyone help me implement the Ensembl gene annotation pipeline for my data? Is Ensembl only web-based, or is there a Linux package? Can I run it on my own server, or only through their website? Could anyone point me to a tutorial?

TIA

Tags: gene annotation, ensembl

It is unclear which data you have and what you aim to obtain. Please elaborate (e.g. on file formats) and be specific.


I have paired-end reads, contigs, scaffolds and a GFF file, and now I want to annotate genes such as protein-coding genes, small non-coding genes, long non-coding genes, other non-coding genes, pseudogenes and gene transcripts.


Maybe this repo suits you: https://github.com/Ensembl/ensembl-annotation, but it is not finished yet.


Actually, I don't understand how to start. Can you tell me where to begin? Could you also explain this command:

find . -name '*.p[l|m]' -exec perltidy -pro=perltidyrc -b {} \;
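For what it's worth, that command is a code-style step for people contributing to the repository, not part of running any annotation. `find` walks the current directory looking for Perl files, and `-exec` runs `perltidy` on each one, reading formatting options from the repository's `perltidyrc` file (`-pro=`) and rewriting the file in place while keeping a backup (`-b`). A minimal dry-run sketch that only lists the files it would touch (`list_perl_files` is a hypothetical helper name, not something in the repo):

```shell
# Hypothetical helper: dry run of the command quoted above. It prints the files
# that command would reformat, without running perltidy on them.
#   -name '*.p[lm]'  matches names ending in .pl (Perl scripts) or .pm (Perl modules).
#   The quoted pattern '*.p[l|m]' behaves the same, except that inside [...] the
#   '|' is an ordinary character, so it would also match a name ending in ".p|".
list_perl_files() {
    find "${1:-.}" -name '*.p[lm]' | sort
}

# The real command then runs this once per matched file ({} is replaced by the path):
#   perltidy -pro=perltidyrc -b path/to/file.pl
#     -pro=perltidyrc  read formatting options from the file named perltidyrc
#     -b               rewrite the file in place, keeping a backup (default suffix .bak)
```

Running `list_perl_files` from the repository root shows which files would be reformatted; nothing is modified.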


So what you have is an assembly and a GFF file. Please edit your post to make this clearer.

Answer by Michael Dondrup, 2.2 years ago

There is currently no easy or streamlined way to install the Ensembl annotation pipeline locally, so I do not recommend even attempting this as a beginner. That may change: Ensembl and the EBI have been working on a distributed Ensembl infrastructure within ELIXIR, involving the EBI, ELIXIR Norway and ELIXIR Sweden. Part of the outcome may be a Docker container, with documentation, that runs the whole annotation pipeline. Have a look at the webinar to see whether you might be interested in testing it anyway. If you want, I can try to find out more about the current state of the Ensembl Docker images.

In the meantime, I recommend using the MAKER2 pipeline.

Update:

Unfortunately, it is unlikely that an installable Ensembl annotation pipeline, in a Docker container or otherwise, will be available in the foreseeable future. The work on distributed Ensembl has mainly focused on services such as the genome browser and the back-end. In short, only Ensembl can run the Ensembl annotation pipeline. Also, the Ensembl annotation pipeline relies heavily on protein evidence, whereas you probably have mostly RNA-seq evidence. For that kind of evidence, MAKER is more suitable.


Can MAKER2 produce all of these: protein-coding genes, small non-coding genes, long non-coding genes, other non-coding genes, pseudogenes and gene transcripts?


Possibly not all of them, and what you can get also depends on the evidence data you have. De novo prediction of non-coding genes is often quite poor anyway, except for tRNAs. I recommend starting with a feasible approach: read the MAKER documentation, install its dependencies, then predict the protein-coding genes and transcripts plus tRNAs. Don't try to solve everything in one go; getting MAKER to run is hard enough. Try Bioconda for installing the dependencies.
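Once a run finishes, a quick sanity check is to count the feature types in the resulting GFF3, for example how many gene, mRNA and tRNA features were predicted. The sketch below is a hypothetical helper (`gff_feature_counts` is not part of MAKER) that assumes a standard nine-column, tab-delimited GFF3 file and stops at an embedded `##FASTA` section, which the GFF3 format allows at the end of a file:

```shell
# Hypothetical helper: count GFF3 features by type (column 3), e.g. to see how
# many gene, mRNA and tRNA features an annotation run produced.
# Pass one file at a time: reading stops at the first ##FASTA line.
gff_feature_counts() {
    awk -F'\t' '
        /^##FASTA/ { exit }          # embedded sequences follow; stop reading
        /^#/ || NF < 9 { next }      # skip comments, directives and malformed lines
        { count[$3]++ }
        END { for (t in count) print t "\t" count[t] }
    ' "$@" | LC_ALL=C sort
}
```

For example, `gff_feature_counts annotation.gff3` (the file name is only a placeholder) prints one `type<TAB>count` line per feature type, sorted by type.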

Answer by EagleEye, 2.2 years ago

You could convert your Ensembl GTF into a gene-based annotation table (tab-delimited). You can then import this simple table into R, or just use Linux command-line tools to annotate your results.

Check out this post: "A: extract only geneID and gene symbol from GTF file".
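A minimal sketch of that conversion with awk, assuming an Ensembl-style GTF whose `gene` lines carry `gene_id`, `gene_name` and `gene_biotype` attributes (GENCODE files use `gene_type` instead, which the sketch also accepts). `gtf_gene_table` is a hypothetical helper name:

```shell
# Hypothetical helper: one row per gene (gene_id, gene_name, gene_biotype),
# read from GTF file(s) given as arguments, or from stdin if none are given.
# Attributes missing from a gene line are written as NA.
gtf_gene_table() {
    awk -F'\t' '
        BEGIN { OFS = "\t"; print "gene_id", "gene_name", "gene_biotype" }
        $3 == "gene" {
            id = "NA"; name = "NA"; type = "NA"
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++) {
                a = attrs[i]
                sub(/^ +/, "", a)                  # drop the space after each semicolon
                key = a; sub(/ .*/, "", key)       # attribute name: text before first space
                val = a; sub(/^[^ ]+ /, "", val)   # attribute value: text after first space
                gsub(/"/, "", val)                 # remove the surrounding quotes
                if (key == "gene_id") id = val
                else if (key == "gene_name") name = val
                else if (key == "gene_biotype" || key == "gene_type") type = val
            }
            print id, name, type
        }
    ' "$@"
}
```

For example, `gtf_gene_table annotation.gtf > genes.tsv` writes the table (file names are placeholders), and `gtf_gene_table annotation.gtf | cut -f3 | tail -n +2 | sort | uniq -c` gives a rough count of genes per biotype (protein_coding, lncRNA, pseudogene and so on), which lines up with the categories asked about in the question.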
