Loading previously downloaded GEO SOFT data into R, using getGEO from the GEOquery package
1
0
Entering edit mode
5.0 years ago
mosesoo ▴ 30

Hello.

I'm a beginner at programming with R. I'm trying to download and parse a GEO SOFT format file into an R data structure. When I use the following function:

gset = getGEO(series, GSEMatrix = TRUE, AnnotGPL = TRUE, destdir = "Data/")

Calling class(gset) shows that the object stored in gset is a list, which is what I intended:

class(gset)
[1] "list"

But when I try to load the same downloaded data into the variable gset via getGEO(), using the following code:

gset = getGEO(filename = "Data/GSE9476_series_matrix.txt.gz", GSEMatrix = TRUE, AnnotGPL = TRUE)

class(gset) returns the following:

class(gset)
[1] "ExpressionSet"
attr(,"package")
[1] "Biobase"

I can't understand what I'm doing wrong in the second scenario and would be grateful if someone could kindly explain.

microarray r GEOquery • 4.0k views

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.

Thank you, I wasn't familiar with the formatting. I will do exactly as you did from now on. :)

5.0 years ago

If in doubt, follow the code for downloading each dataset via the Analyze with GEO2R button on each GEO accession page. Through that button, you'll see an R script tab, with code:

library(Biobase)
library(GEOquery)

# load series and platform data from GEO
gset <- getGEO("GSE9476", GSEMatrix =TRUE, AnnotGPL=TRUE)[[1]]

# make proper column names to match toptable 
fvarLabels(gset) <- make.names(fvarLabels(gset))

# group names for all samples
gsms <- "undefined"
sml <- c()
for (i in 1:nchar(gsms)) { sml[i] <- substr(gsms,i,i) }

# log2 transform
ex <- exprs(gset)
qx <- as.numeric(quantile(ex, c(0., 0.25, 0.5, 0.75, 0.99, 1.0), na.rm=TRUE))
LogC <- (qx[5] > 100) ||
          (qx[6]-qx[1] > 50 && qx[2] > 0) ||
          (qx[2] > 0 && qx[2] < 1 && qx[4] > 1 && qx[4] < 2)
if (LogC) {
  ex[which(ex <= 0)] <- NaN
  exprs(gset) <- log2(ex)
}

This particular study seems a bit different from others; however, running this code will give you the normalised and log2 transformed data:

boxplot(exprs(gset), outline = FALSE)


Kevin


Dear Kevin, thank you for your informative response. I have actually tried to replicate the GEO2R code, as you mentioned. But my question is specifically about the getGEO() function. To be clearer: why does loading the same pre-downloaded data, with the exact same parameters, return a totally different object class?

P.S.: I looked through the package documentation before posting the question, but still couldn't identify what I'm doing wrong.


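They are both the same. The only difference is the way the data is returned, which depends on how getGEO() is invoked. When you give it an accession and let it download from the repository, the data is returned as a list (a series can span more than one platform, and each platform gets its own series matrix file), so you have to access the list elements to obtain the ExpressionSet object. When you point it at a single file via filename, it returns the ExpressionSet directly. Note the [[1]] at the end of the getGEO() call in my code: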

library(Biobase)
library(GEOquery)

gset1 <- getGEO("GSE9476", GSEMatrix =TRUE, AnnotGPL=TRUE)[[1]]
class(gset1)
[1] "ExpressionSet"
attr(,"package")
[1] "Biobase"

To prove that they are the exact same data:

gset2 <- getGEO(filename = "GSE9476_series_matrix.txt.gz")
class(gset2)
[1] "ExpressionSet"
attr(,"package")
[1] "Biobase"

table(exprs(gset1) == exprs(gset2))    
   TRUE 
1426112

Here it is without specifying any list element:

gset3 <- getGEO("GSE9476", GSEMatrix =TRUE, AnnotGPL=TRUE)
gset3
$GSE9476_series_matrix.txt.gz
ExpressionSet (storageMode: lockedEnvironment)
assayData: 22283 features, 64 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: GSM239170 GSM239323 ... GSM240509 (64 total)
  varLabels: title geo_accession ... relation.1 (41 total)
  varMetadata: labelDescription
featureData
  featureNames: 1007_s_at 1053_at ... AFFX-TrpnX-M_at (22283 total)
  fvarLabels: ID Gene title ... GO:Component ID (21 total)
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
  pubMedIds: 17910043 
Annotation: GPL96

Note the $GSE9476_series_matrix.txt.gz, which indicates that the list element containing the ExpressionSet object is named GSE9476_series_matrix.txt.gz.
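
If you want the rest of a script to work regardless of how getGEO() was called, one option is a small wrapper that always hands back a single ExpressionSet. The following is only a sketch under assumptions: as_eset() is a hypothetical helper name, not part of GEOquery, and the toy ExpressionSet stands in for real GEO data so that the snippet runs without a download.

```r
library(Biobase)

# Hypothetical helper (not a GEOquery function): accept either an
# ExpressionSet or a list of ExpressionSets and return one ExpressionSet.
as_eset <- function(x, index = 1) {
  # Case 1: getGEO(filename = ...) style result, already an ExpressionSet
  if (inherits(x, "ExpressionSet")) {
    return(x)
  }
  # Case 2: getGEO("GSExxxx", GSEMatrix = TRUE) style result, a list
  if (is.list(x) && length(x) >= index && inherits(x[[index]], "ExpressionSet")) {
    if (length(x) > 1) {
      message("List has ", length(x), " elements; returning element ", index)
    }
    return(x[[index]])
  }
  stop("Expected an ExpressionSet or a list of ExpressionSets")
}

# Toy data standing in for a real series matrix (assumption, for illustration)
m <- matrix(as.numeric(1:6), nrow = 3,
            dimnames = list(paste0("probe", 1:3), c("s1", "s2")))
toy <- ExpressionSet(assayData = m)
wrapped <- list(toy_series_matrix.txt.gz = toy)

class(as_eset(toy))      # ExpressionSet passed straight through
class(as_eset(wrapped))  # first list element extracted
```

With real data, the same call would wrap either form, for example as_eset(getGEO("GSE9476", GSEMatrix = TRUE, AnnotGPL = TRUE)) or as_eset(getGEO(filename = "GSE9476_series_matrix.txt.gz")). For a series with several platforms you would still need to choose the right index yourself.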


Thank you very much for your help. I get it now. :)
