how to query biomaRt from a GRanges object
1
1
Entering edit mode
9.2 years ago

I can't quite figure this out.

I'd like to query a list of co-ordinates using biomart to find out interesting stuff about the co-ordinates and basically play around with this new toy I've discovered and see what it can do.

However I can't figure out how to add this list of genome co-ordinates (from a Grange list) to the query.

What I have so far is

library(GenomicRanges)

BEDfile = "/Normalized_countsbed.bed"

#Load the annotation and reduce it
BED <- import.gff(BEDfile, format="BED", genome="hg19", asRangedData=F)

library("biomaRt")
#List all the databases available
listMarts()
#Select the top level database
ensembl=useMart("ensembl")
#Select the dataset from ensembl database
listDatasets(ensembl)
listDatasets(ensembl)$dataset
ensembl = useMart("ensembl",dataset="hsapiens_gene_ensembl")

filters = listFilters(ensembl)

attributes = listAttributes(ensembl)

#Perform the query using the vignette example
affyids=c("202763_at","209310_s_at","207500_at")
getBM(attributes=c('affy_hg_u133_plus_2', 'entrezgene'), filters = 'affy_hg_u133_plus_2', values = BED, mart = ensembl)

but I get the error as follows:

Error in as.vector(x, mode = "character") : no method for coercing this S4 class to a vector

What to do?

sequencing granges biomart r • 4.2k views
ADD COMMENT
0
Entering edit mode
9.2 years ago

First of all, convert your GRanges object to a data frame:

> BED.df = as.data.frame(BED)
  seqnames  start    end width strand
1       16 543277 543366    90      *
2       16 770183 770277    95      *
3       16 178554 178656   103      *
4       16 439297 439397   101      *
5       16 396725 400746  4022      *
6       16 396769 399474  2706      *

Then, query biomaRt as you would normally do with a data frame:

getBM(
    attributes=c('entrezgene', 'chromosome_name', 'transcript_start', 'transcript_end'), 
    filters = c('chromosome_name', 'start', 'end' ), 
    values = with(BED.df, list(gsub("chr", "", as.character(seqnames)), start, end)), 
    mart = ensembl, 
    verbose=T
)
ADD COMMENT
0
Entering edit mode

OK. That's almost there. When I run that it gives me lots of output to the console but seems to write it as an xml document e.g.

<?xml version='1.0' encoding='UTF-8'?><!DOCTYPE Query><Query virtualSchemaName = 'default' uniqueRows = '1' count = '0' datasetConfigVersion = '0.6' header='0' requestid= 'biomaRt'> <Dataset name = 'hsapiens_gene_ensembl'><Attribute name = 'entrezgene'/><Attribute name = 'chromosome_name'/><Attribute name = 'transcript_start'/><Attribute name = 'transcript_end'/><Filter name = 'chromosome_name' value = 'chr1,chr1,chr1,chr1,chr1,chr1

I thought the output should be a dataframe but when I do .....

mylovelylist = getBM(
    attributes=c('entrezgene', 'chromosome_name', 'transcript_start', 'transcript_end'), 
    filters = c('chromosome_name', 'start', 'end' ), 
    values = with(BED.df, list(as.character(seqnames), start, end)), 
    mart = ensembl, 
    verbose=T
)

I get no error, but when I try to get the output from mylovelylist I get

[1] entrezgene       chromosome_name  transcript_start transcript_end
<0 rows> (or 0-length row.names).

So how do I convert this xml output to a dataframe? How come it doesn't output as a dataframe by default. I tried adding output= "data.frame" and it just gives my the error: (output = "data.frame")

ADD REPLY
0
Entering edit mode

Please don't answer to your own question like this - use the comments system instead. The XML output is because in the example I added the option verbose=T. Just remove it and the output will be less dense. Your final output list is empty because the chromosome names should not contain the word "chr" (e.g. see the chromosome_name field in the XML output). Try using

values = with(BED.df, list(gsub("chr", "", as.character(seqnames)), start, end))
ADD REPLY

Login before adding your answer.

Traffic: 2631 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6