How to search/filter GEO datasets according to platform (Illumina/Affymetrix)
5.7 years ago

I am trying to find a very specific GEO dataset (http://www.ncbi.nlm.nih.gov/gds) for my study. I can filter GEO datasets by organism and study type, for example:

"Homo sapiens"[Organism] AND "Methylation profiling by genome tiling array"[Filter]

But how can I extend the search with a platform criterion, for example:

"Homo sapiens"[Organism] AND "Methylation profiling by genome tiling array"[Filter] AND "Affymetrix"[Platform]

Or add a tissue filter, for example:

"Homo sapiens"[Organism] AND "Methylation profiling by genome tiling array"[Filter] AND "Affymetrix"[Platform] AND "brain"[Tissue]

geo • 2.5k views
13 months ago
National Institutes of Health, Bethesda…

You might take a look at the GEOmetadb package for full-text or SQL queries of NCBI GEO metadata. NCBI GEO metadata have been parsed into a SQLite database that can be queried from R, any other language that has SQLite bindings, or using the sqlite command-line interface.


Sean Davis, "getSQLiteFile()" returns

trying URL 'http://dl.dropbox.com/u/51653511/GEOmetadb.sqlite.gz'
Error in download.file(url_geo, destfile = localfile, mode = "wb") :
  cannot open URL 'http://dl.dropbox.com/u/51653511/GEOmetadb.sqlite.gz'

Can you please check this?


I don't see that (R 3.1.1, Bioconductor 2.14, GEOmetadb 1.24.0):

getSQLiteFile()
trying URL 'http://gbnci.abcc.ncifcrf.gov/geo/GEOmetadb.sqlite.gz'
Content type 'text/plain; charset=ISO-8859-1' length 230197789 bytes (219.5 Mb)
opened URL
===========

For your purpose, I would try something like the following in R with the GEOmetadb package:

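library(GEOmetadb)  # also attaches RSQLite, which provides dbConnect() and SQLite()
getSQLiteFile()     # download and unzip GEOmetadb.sqlite into the working directory
file.info('GEOmetadb.sqlite')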

size isdir mode mtime ctime atime uid gid uname grname

GEOmetadb.sqlite 3282480128 FALSE 644 2014-09-16 11:36:22 2014-09-16 11:36:22 2014-09-16 11:35:40 1612422931 1360859114 zhujack NIH\\Domain Users

con <- dbConnect(SQLite(),'GEOmetadb.sqlite')
geo_tables <- dbListTables(con)
geo_tables

[1] "gds" "gds_subset" "geoConvert" "geodb_column_desc" "gpl" "gse" "gse_gpl"

[8] "gse_gsm" "gsm" "metaInfo" "sMatrix"

dbListFields(con,'gsm')

[1] "ID" "title" "gsm" "series_id" "gpl"

[6] "status" "submission_date" "last_update_date" "type" "source_name_ch1"

[11] "organism_ch1" "characteristics_ch1" "molecule_ch1" "label_ch1" "treatment_protocol_ch1"

[16] "extract_protocol_ch1" "label_protocol_ch1" "source_name_ch2" "organism_ch2" "characteristics_ch2"

[21] "molecule_ch2" "label_ch2" "treatment_protocol_ch2" "extract_protocol_ch2" "label_protocol_ch2"

[26] "hyb_protocol" "description" "data_processing" "contact" "supplementary_file"

[31] "data_row_count" "channel_count"

dbListFields(con,'gpl')

[1] "ID" "title" "gpl" "status" "submission_date"

[6] "last_update_date" "technology" "distribution" "organism" "manufacturer"

[11] "manufacture_protocol" "coating" "catalog_number" "support" "description"

[16] "web_link" "contact" "data_row_count" "supplementary_file" "bioc_package"

You can join different tables here and refine the query with different terms:

rs <- dbGetQuery(con,paste("select gpl.manufacturer,gsm.gpl,",
"gpl.organism,gpl.title as gpl_title,gsm,",
"gsm.title as gsm_title,gsm.series_id ",
"from gsm join gpl on gsm.gpl=gpl.gpl",
"where gpl.manufacturer='Affymetrix' ",
"and gpl.organism = 'Homo sapiens' ",
"and gpl.description like '%tiling%'"))

dim(rs)

[1] 3483 7

head(rs)

manufacturer gpl organism gpl_title gsm

1 Affymetrix GPL3111 Homo sapiens Affymetrix ENCODE Tiling Array - ANTISENSE - NCBI build 35 GSM84453

2 Affymetrix GPL3111 Homo sapiens Affymetrix ENCODE Tiling Array - ANTISENSE - NCBI build 35 GSM84454

3 Affymetrix GPL3111 Homo sapiens Affymetrix ENCODE Tiling Array - ANTISENSE - NCBI build 35 GSM84455

4 Affymetrix GPL3111 Homo sapiens Affymetrix ENCODE Tiling Array - ANTISENSE - NCBI build 35 GSM84456

5 Affymetrix GPL3111 Homo sapiens Affymetrix ENCODE Tiling Array - ANTISENSE - NCBI build 35 GSM84495

6 Affymetrix GPL3111 Homo sapiens Affymetrix ENCODE Tiling Array - ANTISENSE - NCBI build 35 GSM84496

gsm_title series_id

1 H3K9K14D ChIP from Retinoic Acid Stimulated HL60 Cells, 0 hours (NCBI build 35) - strict analysis parameters GSE3658

2 H3K9K14D ChIP from Retinoic Acid Stimulated HL60 Cells, 2 hours (NCBI build 35) - strict analysis parameters GSE3658

3 H3K9K14D ChIP from Retinoic Acid Stimulated HL60 Cells, 8 hours (NCBI build 35) - strict analysis parameters GSE3658

4 H3K9K14D ChIP from Retinoic Acid Stimulated HL60 Cells, 32 hours (NCBI build 35) - strict analysis parameters GSE3658

5 HisH4 ChIP from Retinoic Acid Stimulated HL60 Cells, 0 hours (NCBI build 35) - strict analysis parameters GSE3659

6 HisH4 ChIP from Retinoic Acid Stimulated HL60 Cells, 2 hours (NCBI build 35) - strict analysis parameters GSE3659

Close the connection:

dbDisconnect(con)
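
The original question also asked about a tissue filter. A minimal sketch of how that might look is below. It assumes the tissue is recorded in gsm.source_name_ch1, which will not hold for every submission, since some put it in characteristics_ch1 instead. To keep the example runnable without the multi-gigabyte download, it builds a toy in-memory database with a few invented rows (the GPL_A, GSM_1, GSE_X identifiers are hypothetical). Against the real file you would connect to 'GEOmetadb.sqlite' instead.

```r
library(DBI)
library(RSQLite)

# Toy stand-in for GEOmetadb.sqlite; all rows below are invented for illustration.
con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "gpl", data.frame(
  gpl          = c("GPL_A", "GPL_B"),
  manufacturer = c("Affymetrix", "Illumina"),
  organism     = c("Homo sapiens", "Homo sapiens"),
  stringsAsFactors = FALSE))
dbWriteTable(con, "gsm", data.frame(
  gsm             = c("GSM_1", "GSM_2", "GSM_3"),
  gpl             = c("GPL_A", "GPL_A", "GPL_B"),
  series_id       = c("GSE_X", "GSE_X", "GSE_Y"),
  source_name_ch1 = c("Brain, frontal cortex", "HL60 cells", "brain"),
  stringsAsFactors = FALSE))

# Placeholders (?) keep the search terms out of the SQL string itself.
sql <- paste(
  "select gsm.gsm, gsm.series_id, gsm.source_name_ch1, gpl.manufacturer",
  "from gsm join gpl on gsm.gpl = gpl.gpl",
  "where gpl.manufacturer = ?",
  "and gpl.organism = ?",
  "and gsm.source_name_ch1 like ?")

# SQLite's LIKE is case-insensitive for ASCII by default, so '%brain%' also matches 'Brain'.
hits <- dbGetQuery(con, sql,
                   params = list("Affymetrix", "Homo sapiens", "%brain%"))
print(hits)

dbDisconnect(con)
```

Swapping the first parameter for "Illumina" gives the other platform. On the real database it is worth inspecting the distinct values of gpl.manufacturer first, because the manufacturer field is free text entered by submitters.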

22 months ago
Neilfws 48k
Sydney, Australia

There is no specific filter for platform manufacturer, nor for tissue. The keys that you can use to filter are listed in this text file.

I'd guess that words like Affymetrix or Illumina are specific enough by themselves in most cases, without the need for qualifiers.
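
As a rough sketch of that suggestion, the search term can be assembled by keeping the two fielded parts from the question and appending the platform and tissue words without qualifiers. The variable names are only illustrative. Because unqualified words can match any indexed field, the hits still need to be checked by eye.

```r
# Fielded terms copied from the question.
fielded <- c('"Homo sapiens"[Organism]',
             '"Methylation profiling by genome tiling array"[Filter]')

# Unqualified keywords: no [Field] tag, so they are not tied to one field.
keywords <- c("Affymetrix", "brain")

# Join everything with AND to get a string for the GEO DataSets search box.
term <- paste(c(fielded, keywords), collapse = " AND ")
cat(term, "\n")
```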
