Question

Trying to annotate genes using R and eutils, curious bug

8.6 years ago

Ram 44k

This question might be more R than bioinformatics, but I'm trying to find which part of the logical pipeline causes an error, so please bear with me.

I recently ran cuffdiff and used cummeRbund to read the output. With the diffData() function, I extracted data frames of differential expression results. I am now using the gene_id to fetch a human-readable description of the significantly differentially expressed genes; the fetching is done with reutils, the R package for eutils.

In the search, I need to restrict the results to mouse genes, so I query the gene database with the term gene_id AND Mus musculus[organism]. I then process the output with content(), strsplit() and subscripting to pick just the first line for annotation. (I know it's a jury-rigged solution; if you have better alternatives, please suggest them. But that is not the primary problem.)

When I run the command:

diff.genes.sig$gene_annotation <- strsplit(
    content(
        efetch(
            esearch(term = paste(diff.genes.sig$gene_id, ' AND Mus musculus[organism]', sep = ""), db = "gene"),
            rettype = 'gene_table', retmode = 'text', retmax = 1),
        as = 'text')[1],
    split = "\n")[[1]][1]

Every row in the data frame is annotated with the output from the first row. Is the fetching not being repeated for each row? Is there some kind of cache in action?
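The symptom can be reproduced without any network call, which suggests plain R recycling rather than a cache: the trailing `[[1]][1]` reduces the whole right-hand side to a single string, and assigning a length-1 value to a data frame column repeats it for every row. A minimal sketch, assuming a toy data frame and a made-up response string (neither comes from the real cuffdiff or eutils output):

```r
# Toy stand-in for the data frame; the gene IDs are made up for illustration.
df <- data.frame(gene_id = c("geneA", "geneB", "geneC"),
                 stringsAsFactors = FALSE)

# Pretend this is the text that came back from the fetch (hypothetical content).
fake_text <- "description for geneA\nmore lines\nand more"

# Same tail as the command above: split on newlines, keep only the first piece.
first_line <- strsplit(fake_text, split = "\n")[[1]][1]
length(first_line)  # 1

# A length-1 value is recycled, so every row gets the same annotation.
df$gene_annotation <- first_line
```

Whatever esearch() does with a vector of terms, the final subscripting keeps only one string, so the assignment cannot produce per-row values.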

For now I use a for loop to work around this. But there has to be a better way, right? What are your thoughts on this?
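One possible alternative to the explicit loop is to wrap the per-gene lookup in a function and apply it with vapply(), so each gene_id gets its own query. This is only a sketch under assumptions: `fetch_first_line` and `annotate_genes` are hypothetical helper names, the reutils calls simply mirror the ones in the command above, and a search with no hits is not handled.

```r
# Hypothetical helper: look up one gene and return the first line of the result.
# Needs the reutils package and network access when actually called.
fetch_first_line <- function(gene_id) {
  hits <- reutils::esearch(
    term = paste0(gene_id, " AND Mus musculus[organism]"),
    db = "gene"
  )
  txt <- reutils::content(
    reutils::efetch(hits, rettype = "gene_table", retmode = "text", retmax = 1),
    as = "text"
  )
  strsplit(txt[1], split = "\n")[[1]][1]
}

# Hypothetical wrapper: one call per gene_id, always returning a character vector
# of the same length as the input. fetch_one can be swapped for a stub in tests.
annotate_genes <- function(gene_ids, fetch_one = fetch_first_line) {
  vapply(gene_ids, fetch_one, character(1), USE.NAMES = FALSE)
}
```

Usage would then be `diff.genes.sig$gene_annotation <- annotate_genes(diff.genes.sig$gene_id)`. Since this still sends separate requests for every gene, a short pause between calls may be needed to stay within NCBI's request rate limits.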

r reutils • 1.3k views
