I'm using Bioconductor, with the packages topGO and biomaRt. I'm about to conduct my first GO enrichment analysis, but I'm a little lost on how to correctly assign the mapping database for my study organism (three-spined stickleback, Gasterosteus aculeatus). I've seen that the human one is "org.Hs.eg.db", and the mouse database used in the workflow I found (below) is "org.Mm.eg". My guess would be "org.Ga.eg", but I'd like to confirm this if possible.
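```r
library(topGO)

# gene_universe: named factor (levels 0/1) over all genes, names = Ensembl gene IDs
go_data <- new("topGOdata",
               ontology = "BP",              # Biological Process
               allGenes = gene_universe,
               nodeSize = 5,                 # prune GO terms with < 5 annotated genes
               annotationFun = annFUN.org,   # read annotations from an org.* package
               mapping = "org.Mm.eg",        # mouse annotation package from the workflow
               ID = "ensembl")               # gene names are Ensembl IDs
```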
I've had a look at the Ensembl website, but I can't find the appropriate information. So my main question is: where on Ensembl would I find this information?
Thanks.
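Ensembl's GO annotations for stickleback can be pulled through BioMart, which biomaRt already wraps, and handed to topGO as a custom gene-to-GO list instead of an org package. A minimal sketch, assuming the BioMart dataset name `gaculeatus_gene_ensembl` and the attributes `ensembl_gene_id` and `go_id`; `fetch_stickleback_go()` and `table_to_gene2go()` are hypothetical helper names. The BioMart query needs network access, so only the conversion step runs offline.

```r
# Assumption: "gaculeatus_gene_ensembl" is the Ensembl BioMart dataset for
# three-spined stickleback. Requires biomaRt and network access.
fetch_stickleback_go <- function() {
  mart <- biomaRt::useEnsembl(biomart = "genes",
                              dataset = "gaculeatus_gene_ensembl")
  biomaRt::getBM(attributes = c("ensembl_gene_id", "go_id"), mart = mart)
}

# Hypothetical helper: turn a two-column (gene, GO ID) table into the named
# list topGO's annFUN.gene2GO expects (one character vector of GO IDs per gene).
table_to_gene2go <- function(tbl, gene_col = "ensembl_gene_id", go_col = "go_id") {
  go <- as.character(tbl[[go_col]])
  keep <- !is.na(go) & nzchar(go)          # BioMart uses "" for "no GO term"
  pairs <- unique(data.frame(gene = as.character(tbl[[gene_col]])[keep],
                             go = go[keep], stringsAsFactors = FALSE))
  split(pairs$go, pairs$gene)
}

# Usage (not run offline):
# gene2go <- table_to_gene2go(fetch_stickleback_go())
# go_data <- new("topGOdata", ontology = "BP", allGenes = gene_universe,
#                nodeSize = 5, annotationFun = annFUN.gene2GO, gene2GO = gene2go)
```

This keeps the stickleback Ensembl IDs as they are, so no org.*.db package for the species is needed.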
Just in case anyone else comes across this post, I have just been shown a package on Bioconductor that could be of use. It's called AnnotationDbi (https://www.bioconductor.org/packages/devel/bioc/manuals/AnnotationDbi/man/AnnotationDbi.pdf), and it can apparently retrieve GO annotations for three-spined sticklebacks (Gasterosteus aculeatus).

Ah, I assumed that since it said Ensembl ID, it was using an Ensembl database. I think the best way forward would be to use orthologous gene IDs from human, mouse, or zebrafish as my gene IDs instead of the stickleback ones.
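The ortholog route can be sketched as: fetch stickleback-to-zebrafish gene pairs from BioMart, then re-key the gene universe to zebrafish IDs so that `annFUN.org` can use `org.Dr.eg.db`. This assumes the BioMart homolog attribute `drerio_homolog_ensembl_gene`; `fetch_zebrafish_orthologs()` and `universe_to_orthologs()` are hypothetical helper names. Flagging a zebrafish gene as selected when *any* of its stickleback orthologs is selected is one possible rule, not the only one.

```r
# Assumption: dataset and homolog attribute names as used by Ensembl BioMart.
fetch_zebrafish_orthologs <- function() {
  mart <- biomaRt::useEnsembl(biomart = "genes",
                              dataset = "gaculeatus_gene_ensembl")
  biomaRt::getBM(attributes = c("ensembl_gene_id", "drerio_homolog_ensembl_gene"),
                 mart = mart)
}

# Hypothetical helper: re-key a named 0/1 factor (stickleback IDs) to zebrafish
# IDs. Unmapped genes are dropped; a zebrafish gene is "1" if any ortholog is "1".
universe_to_orthologs <- function(gene_universe, orthologs,
                                  from = "ensembl_gene_id",
                                  to = "drerio_homolog_ensembl_gene") {
  target <- as.character(orthologs[[to]])
  orthologs <- orthologs[!is.na(target) & nzchar(target), ]
  hit <- match(as.character(orthologs[[from]]), names(gene_universe))
  orthologs <- orthologs[!is.na(hit), ]
  selected <- as.character(gene_universe[hit[!is.na(hit)]]) == "1"
  by_target <- tapply(selected, as.character(orthologs[[to]]), any)
  out <- factor(as.integer(by_target), levels = c(0, 1))
  names(out) <- names(by_target)
  out
}

# Usage (not run offline):
# zf_universe <- universe_to_orthologs(gene_universe, fetch_zebrafish_orthologs())
# go_data <- new("topGOdata", ontology = "BP", allGenes = zf_universe,
#                nodeSize = 5, annotationFun = annFUN.org,
#                mapping = "org.Dr.eg.db", ID = "ensembl")
```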
The org.*.eg.db packages provide mappings between several features and databases, such as mappings between Entrez and Ensembl gene identifiers, between genes and GO categories, and so on. Each one is an amalgamation of information from several sources.

Thanks. I think I'll forgo making my own, as I reckon it would be trickier than it superficially sounds. Also, I'm guessing a sizeable portion of the gene predictions in sticklebacks would have come from zebrafish, so it makes sense to use it for gene ontology.
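To see the kind of Entrez/Ensembl/GO mappings an org.*.eg.db package holds, it can be queried with `AnnotationDbi::select()`. A minimal sketch, assuming AnnotationDbi and `org.Dr.eg.db` (zebrafish) are installed from Bioconductor; `lookup_orgdb_go()` is a hypothetical helper name.

```r
# Hypothetical helper: look up Entrez IDs and GO terms for Ensembl gene IDs
# in an installed org.*.eg.db package (zebrafish is an assumed default).
lookup_orgdb_go <- function(ensembl_ids, orgdb_pkg = "org.Dr.eg.db") {
  if (!requireNamespace("AnnotationDbi", quietly = TRUE) ||
      !requireNamespace(orgdb_pkg, quietly = TRUE)) {
    stop("Install AnnotationDbi and ", orgdb_pkg, " from Bioconductor first")
  }
  # Each org.*.eg.db package exports an OrgDb object named after the package
  db <- getExportedValue(orgdb_pkg, orgdb_pkg)
  # Requesting "GO" also returns the EVIDENCE and ONTOLOGY columns,
  # giving one row per gene/GO-term pair
  AnnotationDbi::select(db, keys = ensembl_ids,
                        columns = c("ENTREZID", "GO"),
                        keytype = "ENSEMBL")
}
```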