How to interpret the result of GO analysis using Ontologizer / mapping GO IDs to GO TERMS?
0
19 months ago
jack • 760
Germany
I have done a Gene Ontology enrichment analysis using Ontologizer. The output looks like this:

ID    Pop.total    Pop.term    Study.total    Study.term    Pop.family    Study.family    nparents    is.trivial    p    p.adjusted    p.min
GO:0000000    15117    15075    3743    3733    0    0    0    true    1.0    1.0    1.0
GO:0008800    15117    3    3743    1    4    1    1    false    0.7500000000000001    0.7500000000000001    0.25000000000000006
GO:0052547    15117    23    3743    6    670    208    3    false    0.7699531541574813    0.7699531541574813    3.789971590544467E-43
GO:0000003    15117    1    3743    1    11028    2874    1    false    0.26060935799429585    0.26060935799429585    9.067827348505459E-5
GO:0052548    15117    22    3743    5    280    100    2    false    0.9446003360272346    0.9446003360272346    3.811732507786439E-33


How can I translate the GO IDs into GO terms? And what does this table mean?

2
16 months ago
EagleEye 6.4k
Sweden

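You can use a simple bash script; I hope this works.

go_convert.sh

----------------------------------------

#!/bin/bash

# $1 = input file, $2 = GO database file, $3 = number of the column holding the GO IDs
# Drop duplicate lines, then extract the GO ID column into an array
GOlist=($(cat "$1" | awk '!x[$0]++' | cut -f "$3"))
for i in "${GOlist[@]}"
do

# Append every database line that contains the current GO ID
cat "$2" | grep "$i" >> GO_mapped.txt

done

----------------------------------------------

Run:

./go_convert.sh <YOUR_INPUT_FILE> <GO_DB_FILE_FROM_github> <YOUR_column_number_having_GO_IDs>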

1

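Sorry, use this instead:

#!/bin/bash

# $1 = input file, $2 = GO database file, $3 = number of the column holding the GO IDs
# Extract the GO ID column first, then drop duplicate IDs
GOlist=($(cut -f"$3" "$1" | awk '!x[$0]++'))
for i in "${GOlist[@]}"
do

# Append every database line that contains the current GO ID
grep "$i" "$2" >> GO_mapped.txt

done

I have run it on sample files and got results. Try: ./go_convert.sh input_file.txt sample_go_db.txt 1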

bioinformatics.kandurilab.org/biostars/files/mapping_ids.zip
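
The loop above starts one grep per GO ID, which gets slow with thousands of IDs. A minimal sketch of the same lookup in a single pass, assuming the same three arguments as go_convert.sh and GNU grep (which accepts `-f -` to read patterns from standard input). The function name `go_lookup` is made up for illustration.

```shell
# go_lookup INPUT_FILE GO_DB_FILE COLUMN   (hypothetical helper, not part of GeneSCF)
# Prints every line of GO_DB_FILE that mentions a GO ID found in the given
# tab-separated column of INPUT_FILE.
go_lookup() {
    # cut: extract the ID column
    # grep -E: keep only well-formed IDs (drops the header and empty lines)
    # sort -u: remove duplicate IDs
    # grep -F -w -f -: treat the IDs as fixed strings and match whole words only
    cut -f"$3" "$1" | grep -E '^GO:[0-9]+$' | sort -u | grep -F -w -f - "$2"
}
```

It could be called as `go_lookup input_file.txt sample_go_db.txt 1 > GO_mapped.txt`. Writing with `>` replaces the output file on every run instead of appending to it.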

0

Thanks. Which GO_DB_FILE_FROM_github should I use? There are a few files there. And can I ask how you generated the files that are on GitHub?

1

This file contains all biological_process, molecular_function and cellular_component terms: gene_association.grouped.annotated140122_new.txt

1

Those files are generated from geneontology.org and are used by the tool GeneSCF.

1
16 months ago
EagleEye 6.4k
Sweden

You can try this tool, which gives more detailed results. If you are working with human data on a Linux system, it will be useful for you: https://www.biostars.org/p/108669/

Or, if you just want to translate the IDs you already have, go to http://geneontology.org/ and search for your GO ID there.

Update: GeneSCF now supports all organisms/species from KEGG and Gene Ontology repository.

0

I want to translate them, but the question is how I can do that in an automated manner. There are a lot of GO IDs for my genes (2000), and it's not feasible to copy and paste them into the Gene Ontology website and search for them individually.

0

If you are comfortable working with files, you can use these annotation files from GeneSCF to map them: https://github.com/santhilalsubhash/geneSCF/tree/master/annotation

0

My organism is not a model organism and I had to parse everything myself. Now I have the enrichment of GO IDs and I need to translate them, but I don't know exactly how to parse them or which files I should use. Can you help a bit more?

1
3.2 years ago
SES 8.2k
Vancouver, BC

This information is all in the documentation. Click "Help" and then "Help Contents..." Honestly, I'm confused how you got this far without knowing what these fields are, such as the population and study IDs. These would have to be created before the analysis, so you might want to think about whether these results are exactly what you want to test. From the docs:

GO id: The accession number of the GO term
Name: The name of the GO term
NSP: The namespace, or subontology: biological process (B), cellular component (C) or molecular function (F)
P-value: The nominal (uncorrected) P-value resulting from the observed overrepresentation of the GO term
Pop. Count: The number of genes in the population set that are annotated to the GO term in question
Study Count: The number of genes in the study set that are annotated to the GO term in question
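
To make the count columns concrete: a term looks over-represented when the fraction of study genes annotated to it (Study.term / Study.total) is higher than the corresponding population fraction (Pop.term / Pop.total). A minimal awk sketch, assuming the tab-separated layout from the question (ID, Pop.total, Pop.term, Study.total, Study.term in columns 1 to 5, with a header row). The function name `go_fractions` and the ratio column are illustrative and not part of Ontologizer.

```shell
# go_fractions TABLE   (hypothetical helper)
# Prints: GO ID, study fraction, population fraction, study/population ratio.
go_fractions() {
    awk -F'\t' 'NR > 1 && $2 > 0 && $3 > 0 && $4 > 0 {
        pop   = $3 / $2    # Pop.term / Pop.total
        study = $5 / $4    # Study.term / Study.total
        printf "%s\t%.6f\t%.6f\t%.2f\n", $1, study, pop, study / pop
    }' "$1"
}
```

For GO:0008800 in the question's table this compares 1/3743 with 3/15117, a ratio of about 1.35. The ratio is only a descriptive aid; the p-value depends on the calculation method selected in Ontologizer, and here it is 0.75, so this term is not significant.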


If you want to know the definition of your GO term, search it on QuickGO. For example, https://www.ebi.ac.uk/QuickGO/GSearch?q=GO:0008800

0

I know what the population set, study set, etc. are. What I need is an automated way to translate the GO IDs into their concepts, such as glycolysis. Because I have thousands of them, it doesn't make sense to search for them individually.

1

Did you try my script and file? Please let me know if you need more help with that.

0

It works, but it creates a messy file with unnecessary information. What I need is for the script to add just one line from GO_mapped.txt (the line that begins with the GO ID) as the last column of my YOUR_INPUT_FILE. Basically, the first column of my input file is the GO ID, and I want to add only the translation of that GO ID as the last column. For example, for GO:0016021 the last column would be "integral component of membrane cellular_component". Can you help me with this?

1

You can try this new script, which merges the output into your input file as the last column (keep in mind that all files should be TAB-separated):

Note: whenever you run this script, delete the output created by the previous run first; otherwise it will keep appending to the existing file.
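
The cleanup step is needed when the output is written with `>>` inside the loop, as in the earlier go_convert.sh. A minimal sketch that avoids it by redirecting the whole loop once with `>`, which truncates the file at the start of every run. The function name `go_map` and its fourth argument (the output path) are assumptions for illustration, not the script referred to above.

```shell
# go_map INPUT_FILE GO_DB_FILE COLUMN OUTPUT_FILE   (hypothetical helper)
go_map() {
    # Extract the ID column, skip empty lines, keep the first occurrence of each ID
    cut -f"$3" "$1" | awk 'NF && !seen[$0]++' | while IFS= read -r id
    do
        # -F: fixed-string match; an ID without a hit is not treated as an error
        grep -F -- "$id" "$2" || true
    done > "$4"    # one redirection for the whole loop: overwrites, never appends
}
```

Running `go_map input_file.txt sample_go_db.txt 1 GO_mapped.txt` twice should leave the same content in GO_mapped.txt as running it once.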

0

Thanks, but this does not add it as the last column of my input file. For example, one line of my input file looks like this:

GO:0000000    15117    15075    3743    3733    0    0    0    true    1.0    1.0    1.0

and what I expect as output is:

GO:0000000    15117    15075    3743    3733    0    0    0    true    1.0    1.0  transcription, DNA-templated

1

Yes, when I use the sample files provided with the script, it gives exactly the output you wanted. You can check my sample inputs and the generated output file in the same compressed folder.

1

Sample Input file:

GO:0002040 dsrg dg
GO:0006351 drfh gjfj
GO:0008283 ksjhgk skjrhgfl
GO:0032466 kjf ksjgf
GO:0032877 ol g
GO:0033301 fnbl ksjg
GO:0045944 hfo jgp
GO:0060707 jpgs jge

Merged annotation to input:

GO:0002040 dsrg dg sprouting angiogenesis
GO:0006351 drfh gjfj transcription, DNA-templated
GO:0008283 ksjhgk skjrhgfl cell proliferation
GO:0032466 kjf ksjgf negative regulation of cytokinesis
GO:0032877 ol g positive regulation of DNA endoreduplication
GO:0033301 fnbl ksjg cell cycle comprising mitosis without cytokinesis
GO:0045944 hfo jgp positive regulation of transcription from RNA polymerase II promoter
GO:0060707 jpgs jge trophoblast giant cell differentiation
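
The merge shown above can be sketched with a two-file awk lookup. This is not the script from the compressed folder; it assumes a tab-separated database file with the GO ID in column 1 and the term name in column 2 (the real GeneSCF annotation files may be laid out differently), and the function name `go_annotate` is made up for illustration.

```shell
# go_annotate INPUT_FILE GO_DB_FILE   (hypothetical helper)
# Prints INPUT_FILE with the term name appended as a new last column.
go_annotate() {
    awk -F'\t' -v OFS='\t' '
        # First file (the database): remember the name for each GO ID
        NR == FNR { name[$1] = $2; next }
        # Second file (the input): append the name, or NA if the ID is unknown
        { print $0, (($1 in name) ? name[$1] : "NA") }
    ' "$2" "$1"
}
```

It writes to standard output, for example `go_annotate input_file.txt sample_go_db.txt > annotated.txt`, so rerunning it overwrites the result instead of appending. A header row in the input gets NA in the new column.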

0

What exactly do you mean by input file? What I mean by input file is the one I showed in the original post, which corresponds to YOUR_INPUT_FILE in your command. Am I right? :)

1

Your input file is the file you want to annotate, that is, the file you showed in your first post.

1

You don't have to search one by one. There is a link on the QuickGO page showing very simple ways of getting term descriptions in different programming languages. In Bash, it can be done in one line.

0

@SES How did you get this information? I'm using it on Linux, and the header of my files after running is this:

ID    Pop.total    Pop.term    Study.total    Study.term    Pop.family    Study.family    nparents    is.trivial    p    p.adjusted    p.min

1

In your original post you asked what that table means and I explained it, and also showed how you could get this information from the documentation. Then, you answered and said you know what that information means but your main interest is in the GO definitions. Now, you are asking what the table means again? This is obviously confusing. Please refer to the documentation or my post for a description of the results.

For getting the GO definitions, see the QuickGO WebServices page. There are examples for numerous languages on that page and if you read the documentation you'll see that you can come up with a Bash or Perl script for your task in no time.
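
A minimal sketch of such a lookup, assuming the current QuickGO REST endpoint `https://www.ebi.ac.uk/QuickGO/services/ontology/go/terms/<ids>`. The service has changed since this thread was written, so check the QuickGO documentation before relying on it. Both function names are made up, and `quickgo_fetch` needs curl and network access.

```shell
# quickgo_url GO_ID...   (hypothetical helper)
# Builds the request URL for one or more GO IDs (comma-separated, ':' URL-encoded).
quickgo_url() (
    IFS=,    # "$*" joins the arguments with the first character of IFS
    ids=$(printf '%s' "$*" | sed 's/:/%3A/g')
    printf 'https://www.ebi.ac.uk/QuickGO/services/ontology/go/terms/%s\n' "$ids"
)

# quickgo_fetch GO_ID...   (hypothetical helper)
# Requests the terms as JSON; assumes the endpoint above is still valid.
quickgo_fetch() {
    curl -s -H 'Accept: application/json' "$(quickgo_url "$@")"
}
```

For example, `quickgo_fetch GO:0008800 GO:0052547` requests both terms in one call. The JSON response is expected to contain each term's ID and name, which a JSON tool can then extract.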


1

Hi Jack, please let me know whether you managed to add the terms to your file. I would like to know whether it worked so that I can decide to keep the script or remove it. That way other people will know in future whether to use it.

And as SES says, please change the post title from

"How to interpret the result of GO analysis using Ontologizer ?" to "How to interpret the result of GO analysis using Ontologizer / mapping GO IDs to GO TERMS",

because you are asking two different questions in the same post.

0

It worked, thanks.