Retrieve All Genes Associated With A GO Term

score 21 · Question · enricoferrero ▴ 900 · 11.6 years ago

Hi,

I'm looking for an easy way to retrieve all the genes in a list that are associated with a certain GO term, preferably using R/Bioconductor packages. I'm not interested in under/overrepresentation or enrichment.

For instance, say I have a list of 1000 genes and I want to create a sublist with only the genes known to be involved in 'heart development'.

Thanks!

r bioconductor go gene-ontology • 37k views

updated 2.1 years ago by ATpoint 81k • written 11.6 years ago by enricoferrero ▴ 900

score 7 · Answer · Pierre Lindenbaum 161k · 11.6 years ago

Not using R: use QuickGO to download all the annotations for your term (and its descendants):

$ curl -s "http://www.ebi.ac.uk/QuickGO/GAnnotation?tax=9606&relType=IP&goid=%20GO:0007507%20&format=tsv" | head | verticalize
>>>    2
$1    DB           UniProtKB
$2    ID           A0PJ49
$3    Splice       -
$4    Symbol       FGFRL1
$5    Taxon        9606
$6    Qualifier    -
$7    GO ID        GO:0003179
$8    GO Name      heart valve morphogenesis
$9    Reference    GO_REF:0000019
$10    Evidence     IEA
$11    With         Ensembl:ENSMUSP00000013633
$12    Aspect       Process
$13    Date         20120825
$14    Source       ENSEMBL
<<<    2

>>>    3
$1    DB           UniProtKB
$2    ID           A0PJ49
$3    Splice       -
$4    Symbol       FGFRL1
$5    Taxon        9606
$6    Qualifier    -
$7    GO ID        GO:0060412
$8    GO Name      ventricular septum morphogenesis
$9    Reference    GO_REF:0000019
$10    Evidence     IEA
$11    With         Ensembl:ENSMUSP00000013633
$12    Aspect       Process
$13    Date         20120825
$14    Source       ENSEMBL
<<<    3

>>>    4
$1    DB           UniProtKB
$2    ID           A0SZU5
$3    Splice       -
$4    Symbol       -
$5    Taxon        9606
$6    Qualifier    -
$7    GO ID        GO:0003007
$8    GO Name      heart morphogenesis
$9    Reference    GO_REF:0000019
$10    Evidence     IEA
$11    With         Ensembl:ENSMUSP00000058354
$12    Aspect       Process
$13    Date         20120825
$14    Source       ENSEMBL
<<<    4

(...)

Then sort the result and join it with your list of genes.
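Here is a minimal sketch of that sort-and-join step in the shell. It assumes the QuickGO TSV was saved to a file without a header line and that column 4 holds the gene symbol, as in the output above. The file names and the gene list are hypothetical. The three annotation rows are copied from the output above so that the script runs on its own:

```shell
#!/bin/sh
# Scratch directory for the toy input files (file names are hypothetical).
dir=$(mktemp -d)

# Toy QuickGO rows (DB, ID, Splice, Symbol, Taxon, Qualifier, GO ID) and a toy gene list.
printf 'UniProtKB\tA0PJ49\t-\tFGFRL1\t9606\t-\tGO:0003179\n' >  "$dir/quickgo.tsv"
printf 'UniProtKB\tA0PJ49\t-\tFGFRL1\t9606\t-\tGO:0060412\n' >> "$dir/quickgo.tsv"
printf 'UniProtKB\tA0SZU5\t-\t-\t9606\t-\tGO:0003007\n'      >> "$dir/quickgo.tsv"
printf 'TP53\nFGFRL1\nALB\n' > "$dir/my_genes.txt"

# Unique symbols annotated to the term; "-" marks rows without a symbol.
cut -f4 "$dir/quickgo.tsv" | grep -v '^-$' | sort -u > "$dir/term_genes.txt"
sort -u "$dir/my_genes.txt" > "$dir/my_genes.sorted.txt"

# join prints the symbols that occur in both sorted lists.
hits=$(join "$dir/term_genes.txt" "$dir/my_genes.sorted.txt")
printf '%s\n' "$hits"
```

To use it on real data, save the curl output without the "| head | verticalize" part and drop its header line first if it has one (e.g. with tail -n +2). Then substitute your own gene list. Running comm -12 on the two sorted files gives the same result as join here.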

score 2 · Reply

I love how easily you can query data and customize the output with QuickGO, but the verticalize in that command really threw me. I thought it was a cool member of the Unix toolchain I didn't know, but I googled it and found out it's actually a program you wrote! What a great way to make sense of records that wrap over many lines.

11.6 years ago by SES 8.6k

score 4 · Answer · Malachi Griffith 19k · 11.6 years ago

Similarly, with AmiGO you can build URLs in a script to pull this information for a list of GO IDs (or query it in many other ways). For example:

http://amigo.geneontology.org/cgi-bin/amigo/term-assoc.cgi?gptype=all&speciesdb=all&taxid=9606&evcode=all&term_assocs=all&term=GO:0007507&action=filter&format=rdfxml

That returns annotations to GO:0007507 for human only (taxid=9606), with all evidence codes and all term associations allowed. Results come back as XML (format=rdfxml), which is convenient to parse. You can also download them in the GO association ('go_assoc') format.
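Building those URLs for a list of GO IDs can be scripted, for example in the shell. This is only a sketch. It reuses the query parameters from the example URL, and the second GO ID (heart morphogenesis, taken from the QuickGO output in the previous answer) is just an illustration. The legacy AmiGO CGI may no longer respond, so the URLs are printed rather than fetched:

```shell
#!/bin/sh
# Build the AmiGO term-association URL for one GO ID ($1), using the same
# parameters as the example URL: human only, all evidence codes, RDF-XML output.
amigo_url() {
  printf '%s?%s&term=%s&%s\n' \
    'http://amigo.geneontology.org/cgi-bin/amigo/term-assoc.cgi' \
    'gptype=all&speciesdb=all&taxid=9606&evcode=all&term_assocs=all' \
    "$1" \
    'action=filter&format=rdfxml'
}

for go_id in GO:0007507 GO:0003007; do
  amigo_url "$go_id"
  # To download instead: curl -s "$(amigo_url "$go_id")" > "${go_id#GO:}.rdf"
done
```

For GO:0007507 the helper reproduces the example URL above exactly. Other IDs only change the term= parameter.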

score 2 · Answer · enricoferrero ▴ 900 · 11.6 years ago

In the end I just used FlyMine, which is handy because that's where I store my gene lists anyway. On the list page it's just a matter of selecting the right filter (here, GO parent term: heart development).

Thanks for your suggestions anyway!

score 1 · Answer · ATpoint 81k · 2.1 years ago

You can retrieve genes directly from the Bioconductor annotation databases. For example, to get all mouse genes annotated with "cell junction" (GO:0030054) or any of its descendant terms:

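library(org.Mm.eg.db)
# keytype "GOALL" matches genes annotated to GO:0030054 or to any of its descendant terms
retrieved <- AnnotationDbi::select(org.Mm.eg.db, keytype="GOALL", keys="GO:0030054", columns="ENSEMBL")
genes <- unique(retrieved$ENSEMBL)  # rows repeat per evidence code, so deduplicate the IDs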

score 0 · Answer · EagleEye 7.5k · 6.3 years ago

Try a GeneSCF enrichment analysis: its output covers all 1000 genes and lists every associated GO term, irrespective of statistical significance.

score 20 · Accepted Answer · 2013-04-03

Dave Bridges ★ 1.4k · 11.0 years ago

Using biomaRt within R:

library(biomaRt)
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")  # human Ensembl annotations
# Gets gene symbol, transcript ID and GO ID for all genes annotated with GO:0007507.
# Newer Ensembl releases name this filter 'go' instead of 'go_id' (check listFilters(ensembl)).
gene.data <- getBM(attributes = c('hgnc_symbol', 'ensembl_transcript_id', 'go_id'),
                   filters = 'go_id', values = 'GO:0007507', mart = ensembl)

score 1 · Reply

Hi, I've been trying to use this. It worked ages ago, but now I keep getting this error:

"Error in getBM(attributes = c("wikigene_name", "ensembl_transcript_id",  : 
  Invalid filters(s): go_id"

Any suggestions? Thanks.

6.7 years ago by V ▴ 380

score 1 · Reply

Change "go_id" to "go".

You can list the valid filter names with listFilters(ensembl).

6.7 years ago by agatawesol ▴ 50

score 1 · Reply

This seems to return only the genes annotated directly to the given GO term, not the genes annotated to its child terms. Usually one wants ALL the genes for a GO term, i.e. both direct and indirect annotations.
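To make the direct versus indirect distinction concrete, here is a small shell/awk sketch. It collects a term and all of its descendants from a child-to-parent edge list, then reports every gene annotated directly to any of those terms. The term IDs, gene names and file layout are made up for illustration. Real GO relationships would come from the ontology itself (for example via GO.db):

```shell
#!/bin/sh
dir=$(mktemp -d)

# Made-up ontology fragment (child<TAB>parent) and direct annotations (gene<TAB>term).
printf 'T_CHILD\tT_ROOT\nT_GRANDCHILD\tT_CHILD\nT_OTHER\tT_UNRELATED\n' > "$dir/edges.tsv"
printf 'GENE_A\tT_ROOT\nGENE_B\tT_GRANDCHILD\nGENE_C\tT_OTHER\n' > "$dir/direct.tsv"

# Print every gene annotated to term $1 or to any of its descendants.
all_genes() {
  awk -F'\t' -v root="$1" '
    NR == FNR { kids[$2] = kids[$2] " " $1; next }   # edges file: parent -> children
    FNR == 1 {                                       # before the first annotation row:
      keep[root]; queue[1] = root; head = 1; tail = 1  # breadth-first search from root
      while (head <= tail) {
        n = split(kids[queue[head++]], c, " ")
        for (i = 1; i <= n; i++)
          if (!(c[i] in keep)) { keep[c[i]]; queue[++tail] = c[i] }
      }
    }
    ($2 in keep) { print $1 }                        # annotated to a kept term
  ' "$dir/edges.tsv" "$dir/direct.tsv" | sort -u
}

all_genes T_ROOT
```

A direct-only lookup (rows whose second column is exactly T_ROOT) would return just GENE_A, which is the behaviour described above. The propagated query also picks up GENE_B through T_GRANDCHILD.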

5.0 years ago by hbw ▴ 90

score 1 · Reply

Forgive me if I misunderstand something here.

library(GO.db)
library(biomaRt)
GOBPOFFSPRING[["GO:1903450"]]
[1] "GO:1903451" "GO:1903452"
GOBPOFFSPRING[["GO:1903452"]]
[1] NA

According to GO.db, GO:1903452 is a child of GO:1903450 and has no children of its own. However, I get nothing from

gene.1903450 <- getBM(attributes=c('hgnc_symbol', 'ensembl_transcript_id', 'go_id'),
                   filters = 'go_id', values = 'GO:1903450', mart = ensembl)

whereas the following query returns 25 rows for RAB11FIP4, with various go_id values:

gene.1903452 <- getBM(attributes=c('hgnc_symbol', 'ensembl_transcript_id', 'go_id'),
                   filters = 'go_id', values = 'GO:1903452', mart = ensembl)

and none of these go_id values is an ancestor of GO:1903452:

gene.1903452$go_id %in% GOBPANCESTOR[["GO:1903452"]]

So what is going on here? And how can I know that I really retrieve ALL genes associated with a certain GO term, and nothing else?

4.5 years ago by ZeroFung ▴ 10

score 0 · Reply

Hi ZeroFung,

I recently ran into a similar issue: many of the tools I tried did not capture the genes annotated to the child terms. What I ended up doing was:

1) downloading the GO terms with their corresponding gene names from Ensembl's BioMart;
2) loading this into R as a data frame, along with the GO.db package;
3) using GO.db's GOBPOFFSPRING map to pull all of the child terms:

library(GO.db)
# Vector of the term itself plus all of its offspring (descendant) BP terms
terms <- c(GOBPOFFSPRING[["GO:0042110"]], "GO:0042110")

This vector can then be used to filter the GO annotations downloaded from Ensembl, giving all genes annotated to the GO term or to any of its descendant terms.
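The filtering step could also be done outside R with awk, as in the sketch below. It assumes the Ensembl BioMart export was saved as a two-column, tab-separated file (gene name, GO term accession) without a header. It also assumes the GO IDs from the R snippet were written to a file with one ID per line (for example with R's writeLines()). File names, gene names and rows are placeholders, and the second GO ID merely stands in for an offspring term:

```shell
#!/bin/sh
dir=$(mktemp -d)

# Placeholder inputs: the wanted GO IDs (term plus offspring) and a BioMart export.
printf 'GO:0042110\nGO:0046631\n' > "$dir/go_ids.txt"
printf 'GENE_A\tGO:0042110\nGENE_B\tGO:0046631\nGENE_C\tGO:0008150\nGENE_A\tGO:0046631\n' \
  > "$dir/biomart_export.tsv"

# First file: remember the wanted GO IDs. Second file: print the gene name
# whenever column 2 is one of them. sort -u collapses genes hit by several terms.
awk -F'\t' 'NR == FNR { want[$1]; next } ($2 in want) { print $1 }' \
  "$dir/go_ids.txt" "$dir/biomart_export.tsv" | sort -u > "$dir/genes_in_term.txt"
cat "$dir/genes_in_term.txt"
```

Running it on real data only requires replacing the two printf lines with your own GO ID file and BioMart export.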

4.0 years ago by reberya • 0

score 0 · Reply

How do I get only the BP GO terms? Given a gene ID, say 6713, I am only interested in its Biological Process GO terms.

5.0 years ago by arya64898 • 0

score 0 · Reply

Have you tried 'prepare_database' from GeneSCF?

5.0 years ago by EagleEye 7.5k

score 0 · Reply

Hello EagleEye, no, I haven't tried that.

5.0 years ago by arya64898 • 0