How to interpret the result of GO analysis using Ontologizer / mapping GO IDs to GO TERMS ?
19 months ago
jack • 760
Germany
I have done a gene ontology enrichment analysis using Ontologizer. The output looks like this:

ID    Pop.total    Pop.term    Study.total    Study.term    Pop.family    Study.family    nparents    is.trivial    p    p.adjusted    p.min
GO:0000000    15117    15075    3743    3733    0    0    0    true    1.0    1.0    1.0
GO:0008800    15117    3    3743    1    4    1    1    false    0.7500000000000001    0.7500000000000001    0.25000000000000006
GO:0052547    15117    23    3743    6    670    208    3    false    0.7699531541574813    0.7699531541574813    3.789971590544467E-43
GO:0000003    15117    1    3743    1    11028    2874    1    false    0.26060935799429585    0.26060935799429585    9.067827348505459E-5
GO:0052548    15117    22    3743    5    280    100    2    false    0.9446003360272346    0.9446003360272346    3.811732507786439E-33

How can I translate the GO IDs into GO terms? And what does this table mean?

16 months ago
EagleEye 6.4k
Sweden

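You can use a simple bash script; I hope this works:

go_convert.sh

----------------------------------------

#!/bin/bash

# $1 = input file, $2 = GO annotation file, $3 = number of the column holding the GO IDs
# Drop duplicate lines, then collect the GO ID column into an array
GOlist=($(awk '!x[$0]++' "$1" | cut -f "$3"))

for i in "${GOlist[@]}"
do

# Append every annotation line that mentions the current GO ID
grep "$i" "$2" >> GO_mapped.txt

done

----------------------------------------------

Run:

./go_convert.sh <YOUR_INPUT_FILE> <GO_DB_FILE_FROM_github> <YOUR_column_number_having_GO_IDs>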


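Sorry, use this instead:

#!/bin/bash

# $1 = input file, $2 = GO annotation file, $3 = number of the column holding the GO IDs
# Extract the GO ID column first, then drop duplicate IDs
GOlist=($(cut -f "$3" "$1" | awk '!x[$0]++'))

for i in "${GOlist[@]}"
do

# Append every annotation line that mentions the current GO ID
grep "$i" "$2" >> GO_mapped.txt

done

I have run it on sample files and got results. Try: ./go_convert.sh input_file.txt sample_go_db.txt 1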

bioinformatics.kandurilab.org/biostars/files/mapping_ids.zip



Thanks. Which GO_DB_FILE_FROM_github should I use? There are a few files there. And can I ask how you generated the files that are on GitHub?


This file has all of biological_process, molecular_function and cellular_component: gene_association.grouped.annotated140122_new.txt


Those files were generated from geneontology.org data and are used by the tool GeneSCF.
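
For anyone who wants to build a similar ID-to-name table directly from Gene Ontology data, here is a minimal sketch. It assumes an OBO-format ontology file (such as go-basic.obo) in which each [Term] stanza carries "id:", "name:" and "namespace:" lines. The function name obo_to_tsv and the sample file are made up for illustration, and this is not necessarily how the GeneSCF files were produced.

```bash
#!/bin/bash
# Sketch: build a GO ID -> name/namespace table from an OBO file.
# Assumption: [Term] stanzas carry "id: ", "name: " and "namespace: " lines.
# obo_to_tsv and the sample file below are illustrative names only.

obo_to_tsv() {
    awk '
        # Print the stanza collected so far (if any) and reset the fields
        function emit() {
            if (id != "") print id "\t" name "\t" ns
            id = ""; name = ""; ns = ""
        }
        # A new stanza header: flush the previous one, note whether it is a [Term]
        /^\[/ { emit(); in_term = ($0 == "[Term]"); next }
        in_term && /^id: /        { id = substr($0, 5) }
        in_term && /^name: /      { name = substr($0, 7) }
        in_term && /^namespace: / { ns = substr($0, 12) }
        END { emit() }
    ' "$1"
}

# Tiny sample file so the sketch can be tried without downloading anything
obo_dir=$(mktemp -d)
cat > "$obo_dir/sample.obo" <<'EOF'
format-version: 1.2

[Term]
id: GO:0000003
name: reproduction
namespace: biological_process

[Term]
id: GO:0016021
name: integral component of membrane
namespace: cellular_component

[Typedef]
id: part_of
name: part of
EOF

obo_to_tsv "$obo_dir/sample.obo" > "$obo_dir/go_names.tsv"
cat "$obo_dir/go_names.tsv"
```

The result is a three-column TAB-separated file (GO ID, name, namespace) that can be used as a lookup table. Non-term stanzas such as [Typedef] are skipped.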

16 months ago
EagleEye 6.4k
Sweden

You can try this tool, which gives more detailed results. If you are working with human data on a Linux system, it will be useful for you: https://www.biostars.org/p/108669/

Or, if you still want to translate the IDs you already have, go to http://geneontology.org/ and search for your GO ID there.

Update: GeneSCF now supports all organisms/species from KEGG and Gene Ontology repository.


I want to translate them, but the question is how to do that in an automated manner. There are a lot of GO IDs for my gene cases (2000), and it's not feasible to copy and paste them into the Gene Ontology website to search for them individually.


You can use these annotation files from GeneSCF to map them, if you are comfortable working with files: https://github.com/santhilalsubhash/geneSCF/tree/master/annotation


My organism is not a model organism, and I had to parse everything myself. Now I have the enrichment of GO IDs and I need to translate them, but I don't know exactly how to parse them or which files I should use. Can you help a bit more?

3.2 years ago
SES 8.2k
Vancouver, BC

This information is all in the documentation. Click "Help" and then "Help Contents..." Honestly, I'm confused how you got this far without knowing what these fields are, such as the population and study IDs. These would have to be created before the analysis, so you might want to think about whether these results are exactly what you want to test. From the docs:

GO id: The accession number of the GO term
Name: The name of the GO term
NSP: The namespace, or subontology: biological process (B), cellular component (C) or molecular function (F)
P-value: The nominal (uncorrected) P-value resulting from the observed overrepresentation of the GO term
Adj. P-Value: The adjusted P-Value (adjusted by the MTC procedure chosen by the user)
Pop. Count: The number of genes in the population set that are annotated to the GO term in question
Study Count: The number of genes in the study set that are annotated to the GO term in question
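
To make the count columns more concrete, here is a rough sketch that turns them into fractions and a fold ratio for each non-trivial term. It assumes the Ontologizer table is TAB-separated with the column order shown in the question (ID, Pop.total, Pop.term, Study.total, Study.term, ..., is.trivial in column 9, p.adjusted in column 11). The function name summarize_enrichment is made up, and the ratio is only a descriptive number, not the statistic Ontologizer computes.

```bash
#!/bin/bash
# Sketch: compare the study-set fraction with the population fraction per term.
# Assumption: TAB-separated table with the column order from the question.

summarize_enrichment() {
    LC_ALL=C awk -F'\t' '
        NR == 1 { print "ID\tstudy.frac\tpop.frac\tfold\tp.adjusted"; next }
        $9 == "true" { next }                        # skip trivial terms
        $2 == 0 || $3 == 0 || $4 == 0 { next }       # avoid division by zero
        {
            study = $5 / $4      # Study.term / Study.total
            pop   = $3 / $2      # Pop.term / Pop.total
            printf "%s\t%.4f\t%.4f\t%.2f\t%s\n", $1, study, pop, study / pop, $11
        }
    ' "$1"
}

# Rows copied from the question, written out TAB-separated
enrich_dir=$(mktemp -d)
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    ID Pop.total Pop.term Study.total Study.term Pop.family Study.family nparents is.trivial p p.adjusted p.min \
    GO:0000000 15117 15075 3743 3733 0 0 0 true 1.0 1.0 1.0 \
    GO:0008800 15117 3 3743 1 4 1 1 false 0.7500000000000001 0.7500000000000001 0.25000000000000006 \
    GO:0052547 15117 23 3743 6 670 208 3 false 0.7699531541574813 0.7699531541574813 3.789971590544467E-43 \
    > "$enrich_dir/table.tsv"

summarize_enrichment "$enrich_dir/table.tsv"
```

A fold value near 1 means the term is about as common in the study set as in the population, which fits the high adjusted P-values of these rows.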

If you want to know the definition of your GO term, search it on QuickGO. For example, https://www.ebi.ac.uk/QuickGO/GSearch?q=GO:0008800


I know what the population set, study set, etc. are. What I need is an automated way to translate the GO IDs to their concepts, like glycolysis. Because my study has thousands of them, it doesn't make sense to search for them individually.


Did you try my script and file? Please let me know if you need more help with that.


It works, but it creates a messy file with unnecessary information. What I need is for the script to add just one line (the line that begins with the GO ID) from GO_mapped.txt as the last column of my YOUR_INPUT_FILE. Basically, the first column of my input file is the GO ID, and I want to add just the translation of that GO ID as the last column of my input file. For example, for GO:0016021 the last column would be: integral component of membrane cellular_component. Can you help me with this?
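
A minimal sketch of that kind of merge with awk, under stated assumptions: both files are TAB-separated, the lookup table has the GO ID in column 1, the term name in column 2 and optionally the namespace in column 3, and the input table has the GO ID in column 1. The function name append_go_names, the file names and the toy numbers are made up for illustration; this is not the script from the zip file.

```bash
#!/bin/bash
# Sketch: append the GO term (and namespace, if present) as a new last column.
# Assumption: both files are TAB-separated and the GO ID is column 1 in each.

append_go_names() {
    # $1 = lookup table (ID <TAB> name [<TAB> namespace]); $2 = table to annotate
    awk -F'\t' -v OFS='\t' '
        # First file: remember the label for every GO ID
        NR == FNR { label[$1] = ($3 == "" ? $2 : $2 " " $3); next }
        # Second file: give the header row a column title
        FNR == 1 && $1 == "ID" { print $0, "GO.term"; next }
        # Data rows: add the label, or NA when the ID is unknown
        { print $0, (($1 in label) ? label[$1] : "NA") }
    ' "$1" "$2"
}

# Toy data so the sketch runs on its own
join_dir=$(mktemp -d)
printf 'GO:0016021\tintegral component of membrane\tcellular_component\n' > "$join_dir/go_names.tsv"
printf 'GO:0006351\ttranscription, DNA-templated\tbiological_process\n' >> "$join_dir/go_names.tsv"
printf 'ID\tPop.total\tp.adjusted\nGO:0016021\t15117\t0.01\nGO:9999999\t15117\t0.5\n' > "$join_dir/table.tsv"

append_go_names "$join_dir/go_names.tsv" "$join_dir/table.tsv" > "$join_dir/table.annotated.tsv"
cat "$join_dir/table.annotated.tsv"
```

Because the lookup is an exact match on the first column, each input row gets exactly one added field, and the row order of the input file is preserved.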


You can try this new script, which merges the output into the last column of your input file (keep in mind that all files should be TAB-separated):

Note: whenever you run this script, please delete the output created by the previous run; otherwise it will keep appending to the previously created file.

bioinformatics.kandurilab.org/biostars/files/mapping_ids_mergingWithInput.zip
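
If deleting the old output by hand is easy to forget, one option is to redirect the whole loop with ">" so the file is truncated once per run. This is only a sketch of that idea, not the script in the zip file. The function name map_go_ids and the sample files are made up, and it uses grep -F (fixed-string matching) where the earlier script used plain grep.

```bash
#!/bin/bash
# Sketch: write a fresh output file on every run instead of appending.
# map_go_ids and the sample files are illustrative names only.

map_go_ids() {
    # $1 = input file, $2 = annotation file, $3 = column with GO IDs, $4 = output file
    local id
    cut -f "$3" "$1" | awk '!seen[$0]++' | while IFS= read -r id
    do
        [ -n "$id" ] || continue          # skip empty lines
        grep -F -- "$id" "$2"             # print annotation lines mentioning this ID
    done > "$4"                           # ">" on the loop truncates the file once per run
}

# Toy data: one duplicated ID in the input, three annotated terms
map_dir=$(mktemp -d)
printf 'GO:0002040\tdsrg\nGO:0006351\tdrfh\nGO:0002040\tdsrg\n' > "$map_dir/input.txt"
printf 'GO:0002040\tsprouting angiogenesis\nGO:0006351\ttranscription, DNA-templated\nGO:0008283\tcell proliferation\n' > "$map_dir/go_db.txt"

# Running twice leaves the same content, because nothing is appended
map_go_ids "$map_dir/input.txt" "$map_dir/go_db.txt" 1 "$map_dir/GO_mapped.txt"
map_go_ids "$map_dir/input.txt" "$map_dir/go_db.txt" 1 "$map_dir/GO_mapped.txt"
cat "$map_dir/GO_mapped.txt"
```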


Thanks, but this does not add it to the last column of my input file. For example, one line of my input file looks like this:

GO:0000000    15117    15075    3743    3733    0    0    0    true    1.0    1.0    1.0

and what I expect as output is:

GO:0000000    15117    15075    3743    3733    0    0    0    true    1.0    1.0  transcription, DNA-templated

Yes. When I use the sample files included with the script, it gives output exactly like you wanted. You can check my sample input and the generated output file in the same compressed folder.


Sample Input file:


GO:0002040 dsrg dg
GO:0006351 drfh gjfj
GO:0008283 ksjhgk skjrhgfl
GO:0032466 kjf ksjgf
GO:0032877 ol g
GO:0033301 fnbl ksjg
GO:0045944 hfo jgp
GO:0060707 jpgs jge


Merged annotation to input:


GO:0002040 dsrg dg sprouting angiogenesis
GO:0006351 drfh gjfj transcription, DNA-templated
GO:0008283 ksjhgk skjrhgfl cell proliferation
GO:0032466 kjf ksjgf negative regulation of cytokinesis
GO:0032877 ol g positive regulation of DNA endoreduplication
GO:0033301 fnbl ksjg cell cycle comprising mitosis without cytokinesis
GO:0045944 hfo jgp positive regulation of transcription from RNA polymerase II promoter
GO:0060707 jpgs jge trophoblast giant cell differentiation


What exactly do you mean by input file? What I mean by input file is the one in my original post, which in your command corresponds to <YOUR_INPUT_FILE>. Am I right? :)


Your input file is the file you want to annotate, that is, the file you showed in your first post.


But please make sure that your input file is TAB-separated.
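
A quick way to check this before running the script, as a sketch: count the TAB-separated fields on every line and make sure there are at least two and that the count never changes. The function name is_tab_separated and the sample files are made up for illustration.

```bash
#!/bin/bash
# Sketch: succeed only if every line has the same number (>= 2) of TAB-separated fields.
# is_tab_separated and the sample files are illustrative names only.

is_tab_separated() {
    awk -F'\t' '
        NR == 1 { n = NF }                 # field count of the first line
        NF != n || NF < 2 { bad = 1 }      # a differing or single-field line fails the check
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# Two toy files: one with TABs, one with spaces
tab_dir=$(mktemp -d)
printf 'GO:0002040\tdsrg\tdg\nGO:0006351\tdrfh\tgjfj\n' > "$tab_dir/tabs.txt"
printf 'GO:0002040 dsrg dg\nGO:0006351 drfh gjfj\n' > "$tab_dir/spaces.txt"

for f in "$tab_dir/tabs.txt" "$tab_dir/spaces.txt"
do
    if is_tab_separated "$f"; then
        echo "$(basename "$f"): looks TAB-separated"
    else
        echo "$(basename "$f"): not consistently TAB-separated"
    fi
done
```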


You don't have to search one by one. There is a link on the QuickGO page showing very simple ways of getting descriptions for terms in different programming languages. In Bash, it can be done with one line.


@SES How did you get this information? I'm using it on Linux, and the header of my file after running is this:

ID    Pop.total    Pop.term    Study.total    Study.term    Pop.family    Study.family    nparents    is.trivial    p    p.adjusted    p.min


In your original post you asked what that table means and I explained it, and also showed how you could get this information from the documentation. Then, you answered and said you know what that information means but your main interest is in the GO definitions. Now, you are asking what the table means again? This is obviously confusing. Please refer to the documentation or my post for a description of the results.

For getting the GO definitions, see the QuickGO WebServices page. There are examples for numerous languages on that page and if you read the documentation you'll see that you can come up with a Bash or Perl script for your task in no time.



Hi Jack, please let me know whether you managed to add the terms to your file. I want to know whether it worked so that I can decide whether to keep the script or remove it. That way, other people will know in the future whether to use it.

And as SES says please change the post topic from

How to interpret the result of GO analysis using Ontologizer ? To How to interpret the result of GO analysis using Ontologizer / mapping GO IDs to GO TERMS.

Because you are asking two different questions in same post.


it worked, thanks


Santhilal Subhash, I ran into another problem. Can you help me with that? https://www.biostars.org/p/129057/
