My aim is to get all the genes annotated to a Gene Ontology (GO) term as Entrez Gene IDs. I currently have three different solutions that do this. Below are my example solutions for human and GO:0005634 (nucleus).
biomaRt
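library(biomaRt)
# Human genes from the Ensembl BioMart
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# 'go' filter: genes annotated with this GO accession.
# The attribute is 'entrezgene_id' in current Ensembl releases ('entrezgene' in older ones).
gene.data <- getBM(attributes = c("entrezgene_id"),
                   filters = "go",
                   values = "GO:0005634",
                   mart = ensembl)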
org.Hs.eg.db
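library(org.Hs.eg.db)
# GO2ALLEGS maps a GO ID to Entrez IDs annotated to that term or any of its child terms.
# The names of the returned vector are evidence codes, so one gene can appear several times.
gene_ids <- unique(mget("GO:0005634", org.Hs.egGO2ALLEGS)[[1]])
gene_list <- data.frame(entrez_id = gene_ids)
print(gene_list)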
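org.Hs.egGO2ALLEGS is documented as returning genes annotated at a GO term or at any of its child terms, whereas a lookup on the exact term returns only direct annotations. A toy sketch of that propagation follows; the ontology and gene IDs are hypothetical placeholders, not real GO data, and the function name is made up for illustration.

```python
from collections import deque


def genes_with_descendants(term, children, direct):
    """Return genes annotated to `term` or to any term below it.

    children: dict of term -> list of direct child terms (hypothetical input)
    direct:   dict of term -> set of gene IDs annotated directly to that term
    """
    seen = {term}
    queue = deque([term])
    genes = set()
    while queue:
        current = queue.popleft()
        genes |= direct.get(current, set())
        for child in children.get(current, []):
            # GO is a DAG, so the same term can be reached via two parents
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return genes


# Toy ontology with placeholder term names, not real GO IDs
children = {"nucleus": ["child_a"], "child_a": ["grandchild_b"]}
direct = {"nucleus": {"1"}, "child_a": {"2"}, "grandchild_b": {"3"}}

print(direct["nucleus"])                                    # direct annotations only
print(genes_with_descendants("nucleus", children, direct))  # term plus all descendants
```

With real data the second set is usually much larger than the first, which is one way two methods can disagree on the "same" GO term.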
Running an SQL query on the GO database server
-- Matches only annotations made directly to GO:0005634 (descendant terms are not included)
SELECT
  gene_product.symbol AS gp_symbol
FROM term
  INNER JOIN association ON (term.id = association.term_id)
  INNER JOIN gene_product ON (association.gene_product_id = gene_product.id)
  INNER JOIN species ON (gene_product.species_id = species.id)
  INNER JOIN dbxref ON (gene_product.dbxref_id = dbxref.id)
  INNER JOIN db ON (association.source_db_id = db.id)
WHERE
  term.acc = 'GO:0005634'
  AND species.ncbi_taxa_id = 9606;
You can try running the same query in this link. The first two solutions give me Entrez IDs, but the last one gives gene symbols, and as far as I can tell there is no way to get Entrez IDs directly from the Gene Ontology database (please correct me if I am wrong). So I use the mygene library in Python to convert the gene symbols to Entrez IDs, searching both the symbol and the alias scopes.
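When converting symbols with mygene, one query can return several hits and some return none, so it helps to separate the two cases explicitly. A minimal sketch, assuming hits in the list-of-dicts shape that mygene's `querymany` returns (each hit has a `"query"` key and either an `"entrezgene"` field or `"notfound": True`); `split_hits` is a hypothetical helper, and the network call is only shown as a comment.

```python
from collections import defaultdict


def split_hits(hits):
    """Split querymany-style hits into mapped and unmapped queries.

    Assumes each hit is a dict with a "query" key and either an
    "entrezgene" field or "notfound": True. One query can produce
    several hits, so mapped values are sets of Entrez IDs.
    """
    mapped = defaultdict(set)
    for hit in hits:
        if not hit.get("notfound") and "entrezgene" in hit:
            # normalize to str so IDs compare equal across tools
            mapped[hit["query"]].add(str(hit["entrezgene"]))
    # a query is unmapped if none of its hits carried an Entrez ID
    unmapped = sorted({hit["query"] for hit in hits} - set(mapped))
    return dict(mapped), unmapped


# Real input would come from something like:
# hits = mygene.MyGeneInfo().querymany(symbols, scopes="symbol,alias",
#                                      fields="entrezgene", species="human")
hits = [
    {"query": "TP53", "_id": "7157", "entrezgene": 7157},
    {"query": "Q14547", "notfound": True},
]
mapped, unmapped = split_hits(hits)
print(mapped)    # {'TP53': {'7157'}}
print(unmapped)  # ['Q14547']
```

The `hits` list above is illustrative input, not real output from the question.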
When I compare the sets of Entrez Gene IDs obtained from the three methods with each other, I get this:
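One way to make such a comparison reproducible is to compute every Venn region explicitly once all three result lists are converted to sets of strings. A small sketch; `venn_regions` is a hypothetical helper and the ID sets below are made-up placeholders, not the results from this question.

```python
from itertools import combinations


def venn_regions(sets):
    """Map each combination of set names to the IDs found in exactly those sets."""
    names = list(sets)
    regions = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            inside = set.intersection(*(sets[n] for n in combo))
            outside = set().union(*(sets[n] for n in names if n not in combo))
            regions[combo] = inside - outside
    return regions


# Placeholder ID sets; real ones would be the three result lists as strings
results = {
    "biomaRt": {"1", "2", "3"},
    "org.Hs.eg.db": {"2", "3", "4"},
    "GO SQL + mygene": {"3", "5"},
}
for combo, ids in venn_regions(results).items():
    print(" & ".join(combo), len(ids))
```

Converting every ID to a string first matters: an Entrez ID stored as the integer 7157 in one list and the string "7157" in another would otherwise count as two different genes.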
So my question is:
Why do these return such different results?
Another problem that I have is:
Converting all gene symbols into Entrez Gene IDs
Using the mygene Python library for human and the nucleus term, I get 4955 Entrez Gene IDs, but 980 gene symbols cannot be converted. Below are six of the symbols that mygene is not able to convert into Entrez IDs:
'A2RUA4', 'B3KY84', 'ENSP00000368480', 'OTTHUMP00000081030', 'Q14547', 'XP_933608'
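These six strings do not look like HGNC gene symbols; their formats resemble protein accessions from UniProtKB, Ensembl, Havana/Vega and RefSeq. A rough sketch that tags each string by format, so the unmapped list could be split and each group queried with a matching mygene scope instead of "symbol,alias" (check the MyGene.info documentation for the exact scope names). The regex rules are hand-written assumptions and not exhaustive; only the UniProtKB pattern follows UniProt's published accession format, and `guess_namespace` is a hypothetical helper.

```python
import re

# Rough format rules (assumed, not exhaustive); first match wins.
PATTERNS = [
    ("Ensembl protein", re.compile(r"ENSP\d{11}(\.\d+)?")),
    ("Havana/Vega protein", re.compile(r"OTTHUMP\d{11}")),
    ("RefSeq protein", re.compile(r"[NXYW]P_\d+(\.\d+)?")),
    # UniProtKB accession format as published by UniProt
    ("UniProtKB", re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}")),
]


def guess_namespace(identifier):
    """Return the first namespace whose pattern matches the whole identifier."""
    for name, pattern in PATTERNS:
        if pattern.fullmatch(identifier):
            return name
    return "other (possibly a gene symbol)"


ids = ['A2RUA4', 'B3KY84', 'ENSP00000368480',
       'OTTHUMP00000081030', 'Q14547', 'XP_933608']
for identifier in ids:
    print(identifier, "->", guess_namespace(identifier))
```

With these rules, A2RUA4, B3KY84 and Q14547 are tagged as UniProtKB, and the other three as Ensembl, Havana/Vega and RefSeq protein IDs respectively.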
I described this problem in more detail in this link, but couldn't reach a conclusion.
Any help with these problems would be appreciated, and I am also open to new solutions.
tagging: Mike Smith