How to get data from GEO
5.7 years ago
wenbinm ▴ 40

Hi there,

I am using the R package GEOquery to download data from GEO. I use:

library(GEOquery)
library(Biobase)
data <- getGEO('GSE2034')
data <- as.data.frame(exprs(data[[1]])) #extracting expression data

This leaves me with a downloaded file named "GSE2034_family.soft.gz", and so far it works well. However, when I tried reading "GSE2034_family.soft.gz" directly:

library(GEOquery)
library(Biobase)
data <- getGEO(filename = 'GSE2034_family.soft.gz' )
data <- as.data.frame(exprs(data[[1]]))

Then I got

"Error in data[[1]] : this S4 class is not subsettable"

Does anyone know how to fix this?

Thank you!

microarray • 14k views
5.7 years ago

Edit (1st September 2018): see a quick distinction of the GEO files, here: A: Parsing values from GSE file

----------------------------

With your first chunk of code, you are obtaining the 'series matrix' data, which, in the vast majority of cases, is already normalized and log2-transformed. The result, data, is a list holding an ExpressionSet object, which is the standard way to store microarray data:

data <- getGEO('GSE2034', GSEMatrix=TRUE)

data

$GSE2034_series_matrix.txt.gz
ExpressionSet (storageMode: lockedEnvironment)
assayData: 22283 features, 286 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: GSM36777 GSM36778 ... GSM37062 (286 total)
  varLabels: title geo_accession ... bone relapses (1=yes, 0=no):ch1
    (28 total)
  varMetadata: labelDescription
featureData
  featureNames: 1007_s_at 1053_at ... AFFX-TrpnX-M_at (22283 total)
  fvarLabels: ID GB_ACC ... Gene Ontology Molecular Function (16 total)
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
Annotation: GPL96

You can proceed to downstream analyses with this data, accessed via exprs(data[[1]])
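
Since "already normalized and log2-transformed" holds in most but not all cases, it can be worth checking the scale before any downstream analysis. The following is a minimal sketch only: looks_log2 is a hypothetical helper name, the threshold of 100 is an assumed rule of thumb, and the toy matrices stand in for exprs(data[[1]]).

```r
# Hypothetical helper: guess whether an expression matrix is on a log2 scale.
# Heuristic only: log2 microarray intensities rarely exceed about 20, so a
# 99th percentile above 100 suggests raw (unlogged) values.
looks_log2 <- function(x) {
  q99 <- quantile(as.numeric(x), probs = 0.99, na.rm = TRUE, names = FALSE)
  q99 <= 100
}

# Toy matrices standing in for exprs(data[[1]])
logged <- matrix(c(3.2, 7.9, 12.4, 5.5), nrow = 2)
raw    <- matrix(c(120, 5400, 23000, 880), nrow = 2)

looks_log2(logged)  # TRUE
looks_log2(raw)     # FALSE
```

On real data, one could run x <- exprs(data[[1]]) and apply log2() only when looks_log2(x) is FALSE, after checking for zeros or negative values.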

------------------------------------------------

Note that, on the GEO home page for GSE2034, there's a big blue button at the bottom labelled ANALYZE WITH GEO2R

Click on that and then go to the R script tab. There, you'll find a ready-made way to read in what is [usually] the normalized data.

Kevin


Thank you for your response! I am sorry, I made a mistake here: getGEO('GSE2034') downloads the 'series matrix' data. I ran into the problem when I tried to read the downloaded series matrix file directly:

data <- getGEO(filename = 'GSE2034_series_matrix.txt.gz' )
data <- as.data.frame(exprs(data[[1]]))

And I got the error. I am just looking for a way to use local files instead of downloading every time. data <- getGEO('GSE2034') will download the data again, right?


What is the error? Yes, you can just download the series matrix file and then load it with:

gse <- getGEO(filename="GSE2034_series_matrix.txt.gz")

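Then, access the normalised expression values with:

exprs(gse)

Note that getGEO() with filename= returns a single ExpressionSet rather than a list, so [[1]] is not needed here (and is what triggers "this S4 class is not subsettable"). Indexing with [[1]] applies only when you download by accession and get a list back:

exprs(data[[1]])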
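
To make the same script work for both a download by accession and a local file, one option is to normalise the return value first. This is a sketch under the assumption that getGEO() returns a list in the first case and a single S4 object in the second; first_eset is a hypothetical helper name, and the file path is the one used in this thread.

```r
# Hypothetical helper: return one object whether getGEO() gave back a list
# (download by accession) or a single object (local file via filename=).
first_eset <- function(x) {
  if (is.list(x)) x[[1]] else x
}

path <- "GSE2034_series_matrix.txt.gz"

# Only runs when the local file and the GEOquery package are both available.
if (file.exists(path) && requireNamespace("GEOquery", quietly = TRUE)) {
  gse  <- first_eset(GEOquery::getGEO(filename = path))
  expr <- as.data.frame(Biobase::exprs(gse))
}
```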

--------------------------------

If you run getGEO('GSE2034', GSEMatrix=TRUE) twice in the same session, then it will use the data that was already downloaded:

data <- getGEO('GSE2034', GSEMatrix=TRUE)
Found 1 file(s)
GSE2034_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE2nnn/GSE2034/matrix/GSE2034_series_matrix.txt.gz'
Content type 'application/x-gzip' length 14344700 bytes (13.7 MB)
==================================================
downloaded 13.7 MB


data <- getGEO('GSE2034', GSEMatrix=TRUE)
Found 1 file(s)
GSE2034_series_matrix.txt.gz
Using locally cached version: /tmp/RtmppE74xT/GSE2034_series_matrix.txt.gz
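
The cached copy above lives in R's temporary directory, which is cleared when the session ends. A possible way to keep it between sessions is the destdir argument of getGEO(). The sketch below assumes a directory called geo_cache (a made-up name), and ensure_cache_dir and load_gse are hypothetical helpers; getGEO() may still contact GEO to list the files in the series even when the matrix itself is read from disk.

```r
# Hypothetical helper: create the cache directory if needed, return its path.
ensure_cache_dir <- function(path) {
  dir.create(path, showWarnings = FALSE, recursive = TRUE)
  normalizePath(path)
}

# Hypothetical wrapper: download once into cache_dir, reuse the file afterwards.
load_gse <- function(accession, cache_dir = "geo_cache") {
  GEOquery::getGEO(accession, GSEMatrix = TRUE,
                   destdir = ensure_cache_dir(cache_dir))
}

# Example call (needs GEOquery and network access on the first run):
# data <- load_gse("GSE2034")
```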
I'm getting an error:

library(GEOquery)
library(Biobase)

gse <- getGEO("GSE53987",GSEMatrix=TRUE) # you want GSEMatrix = TRUE


gse <- gse$GSE53987_series_matrix.txt.gz
gse


#data <- getGEO('GSE2034', GSEMatrix=TRUE)




# now get the phenotypic data (covariates etc.) using pData()
pd <- pData(gse)
names(pd)
#library(dplyr)

x <- exprs(gset[[1]])


x <- x[-grep('^AFFX', rownames(x)),]

# extract information of interest from the phenotype data (pdata)
idx <- which(colnames(pData(gse[[1]])) %in%
               c('age:ch1'))

Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘pData’ for signature ‘"factor"’

I then downloaded the data fresh, and now I see:

library(GEOquery)
library(Biobase)

gse <- getGEO("GSE53987",GSEMatrix=TRUE) # you want GSEMatrix = TRUE


gse <- gse$GSE53987_series_matrix.txt.gz
gse


#data <- getGEO('GSE2034', GSEMatrix=TRUE)




# now get the phenotypic data (covariates etc.) using pData()
pd <- pData(gse)
names(pd)
#library(dplyr)

x <- exprs(gse[[1]])

Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘exprs’ for signature ‘"factor"’


I managed to fix it. Now it's working.
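
For readers who hit the same 'factor' errors: the poster did not say what the fix was, so the following is only a likely explanation. After gse <- gse$GSE53987_series_matrix.txt.gz, gse is already an ExpressionSet, and [[1]] on an ExpressionSet returns a phenotype column rather than an ExpressionSet. The sketch uses a small toy ExpressionSet (made-up probe and sample names) in place of the real data.

```r
library(Biobase)

# Toy ExpressionSet standing in for gse$GSE53987_series_matrix.txt.gz
expr <- matrix(rnorm(6), nrow = 3,
               dimnames = list(c("p1", "p2", "p3"), c("s1", "s2")))
pheno <- data.frame(title = factor(c("a", "b")), row.names = c("s1", "s2"))
eset <- ExpressionSet(assayData = expr,
                      phenoData = AnnotatedDataFrame(pheno))

# [[ on an ExpressionSet extracts a phenotype column (here a factor),
# which is why exprs(eset[[1]]) and pData(eset[[1]]) fail.
class(eset[[1]])

# Call the accessors on the ExpressionSet itself instead.
x  <- exprs(eset)
pd <- pData(eset)
```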
