GO annotation from Ensembl Plants and JGI
5.6 years ago
ytlin610 ▴ 70

Hi, does anyone know how GO terms are assigned to genes by Ensembl Plants? I'm working on a GO enrichment analysis with my Chlamydomonas RNA-seq data, and I've downloaded GO annotation tables from two resources: Ensembl Plants and JGI-Phytozome. However, I found that the number of genes with at least one GO term differs between the two tables: 9,687 genes have matching GO terms in the Ensembl Plants table, while the JGI table has only 6,181. I compared the two gene lists, and here's what it looks like:

https://imgur.com/a/W03zd9u

It seems that, for Chlamydomonas, Ensembl Plants somehow has far more GO-annotated genes than JGI. Given that both have the same total gene number and assembly version (v5.5), I'm very curious about the reason behind this, and which one may be more reliable?
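A comparison like this can be sketched in a few lines of Python. This is only a minimal sketch, assuming each annotation table has already been reduced to (gene ID, GO ID) pairs; the gene IDs and GO IDs below are placeholders rather than real Chlamydomonas data, and `genes_with_go` is a hypothetical helper name.

```python
from collections import defaultdict


def genes_with_go(pairs):
    """Group (gene_id, go_id) pairs into {gene_id: set of GO IDs}."""
    annotated = defaultdict(set)
    for gene_id, go_id in pairs:
        if go_id:  # skip rows whose GO column is empty
            annotated[gene_id].add(go_id)
    return dict(annotated)


# Placeholder rows standing in for the two downloaded tables (not real data).
ensembl_pairs = [
    ("geneA", "GO:0000001"),
    ("geneB", "GO:0000002"),
    ("geneC", "GO:0000003"),
    ("geneD", ""),  # gene present in the table but without a GO term
]
jgi_pairs = [
    ("geneA", "GO:0000001"),
    ("geneB", "GO:0000009"),
    ("geneE", "GO:0000004"),
]

ensembl = genes_with_go(ensembl_pairs)
jgi = genes_with_go(jgi_pairs)

# Gene-level overlap: the three regions of a two-set Venn diagram.
both = ensembl.keys() & jgi.keys()
only_ensembl = ensembl.keys() - jgi.keys()
only_jgi = jgi.keys() - ensembl.keys()

# Among shared genes, which ones carry exactly the same GO terms?
same_terms = {gene for gene in both if ensembl[gene] == jgi[gene]}

print(len(both), len(only_ensembl), len(only_jgi), len(same_terms))
```

Checking `same_terms` in addition to the gene-level overlap shows whether the shared genes actually agree on their GO terms, or only on the fact of being annotated.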

Here're the links to both resources:

Thank you!

JGI Ensembl GO chlamydomonas • 2.8k views
5.6 years ago

Hey,

So Ensembl generate the GO term annotations based on links with UniProt protein identifiers; further details here: http://www.ensembl.org/Help/View?id=285. I cannot seem to find any information about how Phytozome carry out these annotations for their site. The genome version (v5.5), annotation (v.11.6) and gene count seem to be the same for both resources, so I can only assume that the way they match GO terms to genes is different.

You can see the details for the assembly in Ensembl here: http://plants.ensembl.org/Chlamydomonas_reinhardtii/Info/Annotation/#assembly

Hope that helps. If you cannot find the information about GO term annotation in Phytozome, you may wish to contact them directly; you can find the link at the bottom left of their webpages.

5.6 years ago
ytlin610 ▴ 70

Hi Erin,

Thank you very much for the answer, it's very helpful.

I finally found a file in Phytozome called Creinhardtii_281_v5.5.readme.txt, which states: "Gene Ontology terms (NOTE: these are automated results from interpro2go in most genomes, not empirically derived)."

So it seems the two resources use different pipelines for protein sequence analysis and GO assignment (UniProt for Ensembl and interpro2go for JGI). I analyzed several unknown Chlamydomonas proteins with these two protein predictors, and the results are quite different.

For example, Cre01.g005100 has no predicted protein family or functional domains in InterPro, and there's no GO term assigned to this gene in JGI-Phytozome. However, UniProt identifies a transmembrane domain, and the gene is assigned the GO term GO:0016021 (integral component of membrane) in Ensembl.
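To illustrate why a gene without any InterPro hit ends up with no GO term under an interpro2go-style approach, here is a minimal Python sketch. It assumes the usual external2go line layout of the mapping file (`InterPro:IPR... description > GO:term name ; GO:id`, with `!` marking comment lines); the sample lines, the gene-to-domain hits, and the function names are illustrative assumptions and are not taken from Phytozome's actual pipeline.

```python
from collections import defaultdict


def parse_interpro2go(lines):
    """Parse interpro2go-style lines into {InterPro ID: set of GO IDs}."""
    mapping = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):  # skip blanks and comments
            continue
        left, _, right = line.partition(" > ")
        ipr_id = left.split()[0].replace("InterPro:", "")
        go_id = right.rsplit(" ; ", 1)[-1].strip()
        if go_id.startswith("GO:"):
            mapping[ipr_id].add(go_id)
    return mapping


def annotate(domain_hits, ipr2go):
    """Transfer GO terms to each gene via its InterPro hits."""
    return {
        gene: set().union(*(ipr2go.get(ipr, set()) for ipr in iprs))
        for gene, iprs in domain_hits.items()
    }


# Sample lines in the assumed layout (illustrative only).
sample = [
    "!example comment line",
    "InterPro:IPR000003 Retinoid X receptor/HNF4 > GO:DNA binding ; GO:0003677",
    "InterPro:IPR000003 Retinoid X receptor/HNF4 > GO:zinc ion binding ; GO:0008270",
]
ipr2go = parse_interpro2go(sample)

# Hypothetical InterPro hits per gene; the second gene has none.
hits = {"gene_with_domain": ["IPR000003"], "gene_without_domain": []}
go_by_gene = annotate(hits, ipr2go)
print(go_by_gene)
```

In this sketch a gene with no InterPro entry comes out with an empty GO set, which is consistent with the Cre01.g005100 case above.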

I think that's why the GO assignment tables from the two resources are quite different.

Thank you!
