Mysterious genes in my Biomart results (genes that were not part of original query)
13 months ago
adam.faranda • 10

I'm using the R package 'biomaRt' to retrieve Ensembl IDs and descriptions for a list of gene symbols (e.g. "Sf1", "Rhox7a", etc.). My query consists of 41203 symbols; biomart returns a result set with 30774 records corresponding to the gene symbols recognized by Ensembl. Those 30774 records include four genes that were not part of the original query.

My first thought was that the four 'mystery' genes were synonyms for something in my original query. I've since verified that none of the synonyms of these genes are in my query.

I am querying the mouse dataset, using the attribute 'external_gene_name' as my filter column. The code used to query biomaRt:

library(biomaRt)

# 'gq': unique gene symbols (from dg$gene) submitted as the biomart query
gq <- unique(dg$gene)

# attributes to retrieve for each matching gene
attr <- c("ensembl_gene_id", "external_gene_name", "description",
          "ensembl_gene_id_version", "chromosome_name",
          "gene_biotype")

# query submission against the mouse gene dataset
mart <- useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")
result <- getBM(mart = mart,
                attributes = attr,
                filters = "external_gene_name",
                values = gq)

The mystery genes are:

setdiff(result$external_gene_name, gq)
[1] "Trdd2"   "Trdv4"   "Trdd1"   "SPATA24"

Here "gq" is the list of genes submitted to Ensembl. None of the above genes, nor any of their synonyms (at least those recognized by Ensembl), is in my original query. If anyone is willing to help me troubleshoot, I would be happy to send the gene list I'm querying with.


That's very strange. Could you please send the list to helpdesk [at] ensembl.org? My colleagues and I will take a look at it.

13 months ago
Mike Smith ♦ 1.2k
EMBL Heidelberg / de.NBI

"SPATA24" doesn't look like a normal MGI symbol since it's all in caps, so I wouldn't be surprised if your query contains "Spata24" and the all-caps version is retrieved too. Is it possible your gene list includes capitalised versions of 'Trdd1' etc.? I don't think BioMart is case sensitive, so it will still retrieve results for them, e.g.

getBM(mart=mart, 
      attributes=attr, 
      filters='external_gene_name',
      values="TRDV4"
)
     ensembl_gene_id external_gene_name
1 ENSMUSG00000076867              Trdv4

If that's not it, my advice would be to break your query down into smaller chunks and submit these independently, to try and narrow down where the unexpected entries are being introduced. I'm happy to help identify whether it's a problem in biomaRt; my email address is on the biomaRt landing page (https://bioconductor.org/packages/biomaRt/).
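As a rough sketch of the chunking idea, the helper below splits a query vector into chunks, runs a query function on each chunk, and reports any returned symbols that were not in that chunk. The names `find_unexpected`, `query_fun` and `mock_query` are hypothetical and not part of biomaRt; the mock stands in for the real query so the example runs offline, and it assumes the service matches symbols case-insensitively.

```r
# Hypothetical helper (not part of biomaRt): query in chunks and list,
# per chunk, the returned symbols that were not submitted in that chunk.
find_unexpected <- function(values, query_fun, chunk_size = 500) {
  # Assign consecutive values to chunks of at most 'chunk_size' elements
  chunks <- split(values, ceiling(seq_along(values) / chunk_size))
  lapply(chunks, function(chunk) {
    returned <- query_fun(chunk)
    setdiff(returned, chunk)
  })
}

# Offline stand-in for the real query. ASSUMPTION: matching is
# case-insensitive and the canonical symbol is returned.
known <- c("Sf1", "Trdv4", "Rhox7a")
mock_query <- function(v) known[toupper(known) %in% toupper(v)]

unexpected <- find_unexpected(c("Sf1", "TRDV4", "Rhox7a"), mock_query,
                              chunk_size = 2)
print(unexpected)
```

With a live connection, `query_fun` could instead be something like `function(v) getBM(mart = mart, attributes = "external_gene_name", filters = "external_gene_name", values = v)$external_gene_name`, reusing the `mart` object from the question.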

13 months ago
adam.faranda • 10

Thank you both for your prompt responses. Mike's answer was correct -- this appears to have been an issue with capitalization.
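For anyone hitting the same thing, here is a minimal sketch of how the capitalization mismatch could be checked in base R. The vectors below are toy data standing in for the real query (`gq`) and the returned `external_gene_name` column; they are not the actual gene list.

```r
# Toy stand-ins (ASSUMPTION: illustrative values, not the real query)
gq       <- c("Sf1", "Rhox7a", "TRDV4", "Spata24")
returned <- c("Sf1", "Rhox7a", "Trdv4", "Spata24", "SPATA24")

# Case-sensitive comparison reports "mystery" genes
mystery <- setdiff(returned, gq)

# Case-insensitive comparison: nothing is left unexplained
still_unexplained <- setdiff(toupper(returned), toupper(gq))

# For each mystery gene, find the query entries that differ only by case
culprits <- lapply(mystery, function(m) gq[toupper(gq) == toupper(m)])
names(culprits) <- mystery

print(mystery)
print(still_unexplained)
print(culprits)
```

If `still_unexplained` is empty, every unexpected symbol is accounted for by a query entry that differs only in case.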
