Convert GDCResults object (nested lists) to a human readable data frame
5.8 years ago
user31888

Is there a way to convert a GDCResults object (i.e. nested lists) obtained with the R package GenomicDataCommons into a data frame?

Test example:

library(GenomicDataCommons)
test = cases() %>% filter(~ project.project_id=='TCGA-CHOL') %>% results(n=10)

I tried converting it to a data frame using code suggested in two other posts, but it returns a two-column data frame with hundreds of rows (not very handy). I also lose the column names when converting to a matrix:

df <- as.data.frame(matrix(unlist(test), nrow=length(unlist(test[1]))), stringsAsFactors=F)
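A likely reason for the unwieldy result: `unlist()` applied to the whole object collapses every field of every case into one long named vector, so the case boundaries disappear, and `matrix()` then drops the names. Below is a minimal sketch of an alternative, assuming the results can be viewed as one nested list per case: flatten each case on its own, then bind the rows by field name. `cases_list`, `flatten_case()` and `records_to_df()` are hypothetical names, and the toy data only mimics a plausible shape; check the real object with `str(test)` first, because its layout may differ.

```r
# Hypothetical toy input: one nested list per case. The real GDCResults
# object may be laid out differently; inspect it with str() first.
cases_list <- list(
  list(case_id = "c1", demographic = list(gender = "male", race = "white")),
  list(case_id = "c2", demographic = list(gender = "female"))
)

# Flatten one case: unlist() turns nested names into dotted names such as
# "demographic.gender" and coerces all values to a common type (character here).
flatten_case <- function(x) unlist(x, recursive = TRUE, use.names = TRUE)

# Hypothetical helper: one row per case, one column per distinct field name.
records_to_df <- function(records) {
  flat <- lapply(records, flatten_case)
  cols <- unique(unlist(lapply(flat, names)))
  # Index each flattened case by the full column set; missing fields become NA.
  rows <- lapply(flat, function(r) unname(r[cols]))
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(out) <- cols
  out
}

df <- records_to_df(cases_list)
print(df)
```

Every column comes back as character because `unlist()` coerces values. Repeated sub-records (for example several diagnoses for one case) produce duplicated dotted names, and this sketch keeps only the first of them.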
Reply:
https://www.rdocumentation.org/packages/GenomicDataCommons/versions/1.3.1/topics/as.data.frame.GDCResults

Copied from that page:

expands = c("diagnoses","diagnoses.treatments","annotations", "demographic","exposures")
head(cases() %>% expand(expands) %>% results() %>% as.data.frame())
Reply:

No luck.

# Not working with 'as.data.frame()'
> expands = c("diagnoses","diagnoses.treatments","annotations","demographic","exposures")
> head(cases() %>% expand(expands) %>% results() %>% as.data.frame())
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
   arguments imply differing number of rows: 1, 0

# Not working with 'as.data.frame.GDCResults()'
> head(cases() %>% expand(expands) %>% results() %>% as.data.frame.GDCResults())
Error in as.data.frame.GDCResults(.) :
  could not find function "as.data.frame.GDCResults"

# Working without 'as.data.frame()'
> head(cases() %>% expand(expands) %>% results())
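The `differing number of rows: 1, 0` message comes from base R's `data.frame()`, which requires every column to have the same length; it is consistent with some case having an empty field next to a scalar one. The snippet below reproduces that message with a hypothetical one-case record and shows one possible workaround, padding empty fields with `NA` before building the data frame. The field names are made up for illustration.

```r
# Hypothetical single-case record: one scalar field and one empty field.
rec <- list(case_id = "c1", primary_site = character(0))

# data.frame() needs equal-length columns, so this fails; the caught message
# mentions "differing number of rows".
err <- tryCatch(data.frame(rec), error = function(e) conditionMessage(e))
print(err)

# Possible workaround: replace zero-length fields with NA first.
na_if_empty <- function(x) if (length(x) == 0) NA else x
fixed <- data.frame(lapply(rec, na_if_empty), stringsAsFactors = FALSE)
print(fixed)
```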
Reply:

@OP: Try this. Tagging the author: Sean Davis

library("GenomicDataCommons")
test = cases() %>% filter(~ project.project_id=='TCGA-CHOL') %>% results(n=10) %>% as.data.frame()

Unfortunately, I am not able to connect to the GDC server.

Reply:

I've just reinstalled GenomicDataCommons and all its dependencies. I can no longer connect to the server either.

> source('https://bioconductor.org/biocLite.R')
> biocLite('Bioconductor/GenomicDataCommons')

> library(GenomicDataCommons)

> test = cases() %>% filter(~ project.project_id=='TCGA-CHOL') %>% results(n=10) %>% as.data.frame()
Error in curl::curl_fetch_memory(url, handle = handle) :
  Could not resolve host: gdc-api.nci.nih.gov

> status()
Error in curl::curl_fetch_memory(url, handle = handle) :
  Could not resolve host: gdc-api.nci.nih.gov

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux release 6.2 (Carbon)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] GenomicDataCommons_1.2.0 magrittr_1.5             BiocInstaller_1.28.0
[4] RevoUtils_10.0.8         RevoUtilsMath_10.0.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.17           xml2_1.2.0             XVector_0.18.0
 [4] GenomicRanges_1.30.3   BiocGenerics_0.24.0    hms_0.4.2
 [7] zlibbioc_1.24.0        IRanges_2.12.0         R6_2.2.2
[10] rlang_0.2.1            httr_1.3.1             GenomeInfoDb_1.14.0
[13] tools_3.4.3            parallel_3.4.3         data.table_1.11.4
[16] lazyeval_0.2.1         tibble_1.4.2           crayon_1.3.4
[19] GenomeInfoDbData_1.0.0 readr_1.1.1            S4Vectors_0.16.0
[22] bitops_1.0-6           curl_3.2               RCurl_1.95-4.11
[25] pillar_1.3.0           compiler_3.4.3         stats4_3.4.3
[28] jsonlite_1.5           pkgconfig_2.0.1

The version available from Bioconductor (installed on my system) is 1.2.0, while the documentation page describing the as.data.frame.GDCResults function (linked above) is for version 1.3.1. Maybe the function was added recently. Where can we get version 1.3.1 or 1.3.4?
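Under Bioconductor's versioning convention, an odd middle number (1.3.x, 1.5.x) marks a devel-branch build, which would explain why the release repository offered 1.2.0. Before relying on a function documented for a particular version, it can help to check what is actually installed. A minimal sketch using only base/utils functions; `has_min_version()` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: TRUE if `pkg` is installed and at least `min_version`.
has_min_version <- function(pkg, min_version) {
  # requireNamespace() returns FALSE instead of erroring when pkg is missing.
  if (!requireNamespace(pkg, quietly = TRUE)) return(FALSE)
  # packageVersion() returns a package_version object, which compares
  # component by component against a version string ("1.10.0" > "1.9.0").
  utils::packageVersion(pkg) >= min_version
}

has_min_version("GenomicDataCommons", "1.3.1")
```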

Reply:

I installed GenomicDataCommons v1.5.4 on macOS WITHOUT updating the dependencies. I can connect to the GDC server, but as.data.frame() still does not work (note that as.data.frame.GDCResults does not seem to exist):

> source('https://bioconductor.org/biocLite.R')
> biocLite('Bioconductor/GenomicDataCommons')

> test = cases() %>% filter(~ project.project_id=='TCGA-CHOL') %>% results(n=10) %>% as.data.frame()
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
  arguments imply differing number of rows: 2, 4, 5, 3

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] GenomicDataCommons_1.5.4 magrittr_1.5             BiocInstaller_1.28.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.17               compiler_3.4.4             pillar_1.2.3               git2r_0.21.0               GenomeInfoDb_1.14.0
 [6] XVector_0.18.0             bindr_0.1.1                bitops_1.0-6               tools_3.4.4                zlibbioc_1.24.0
[11] digest_0.6.15              jsonlite_1.5               memoise_1.1.0              tibble_1.4.2               lattice_0.20-35
[16] pkgconfig_2.0.1            rlang_0.2.1                Matrix_1.2-14              DelayedArray_0.4.1         curl_3.2
[21] parallel_3.4.4             bindrcpp_0.2.2             GenomeInfoDbData_1.0.0     xml2_1.2.0                 withr_2.1.2
[26] httr_1.3.1                 dplyr_0.7.6                knitr_1.20                 hms_0.4.2                  rappdirs_0.3.1
[31] S4Vectors_0.16.0           IRanges_2.12.0             devtools_1.13.5            stats4_3.4.4               grid_3.4.4
[36] tidyselect_0.2.4           glue_1.3.0                 Biobase_2.38.0             R6_2.2.2                   tcltk_3.4.4
[41] readr_1.1.1                purrr_0.2.5                matrixStats_0.53.1         BiocGenerics_0.24.0        GenomicRanges_1.30.3
[46] assertthat_0.2.0           SummarizedExperiment_1.8.1 lazyeval_0.2.1             RCurl_1.95-4.10
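The counts `2, 4, 5, 3` suggest that some fields hold a different number of values per case (for example several diagnoses for one case), so they cannot become ordinary one-value-per-row columns. One option, sketched below under the assumption that the result is column-oriented (one list entry per case for each field) with atomic entries, is to collapse each entry into a single string. `res`, `collapse_entry()` and `ragged_to_df()` are hypothetical; check the real object's shape with `str()` before adapting this.

```r
# Hypothetical ragged, column-oriented result: each field has one entry per
# case, but an entry may hold zero, one or several values.
res <- list(
  case_id   = list("c1", "c2", "c3"),
  diagnosis = list(c("d1", "d2"), character(0), "d3")
)

# Collapse one entry to a single string; empty entries become NA.
# Assumes atomic entries: nested records would need flattening first.
collapse_entry <- function(v) {
  if (length(v) == 0) NA_character_ else paste(v, collapse = ";")
}

# Every column ends up with exactly one value per case.
ragged_to_df <- function(fields) {
  cols <- lapply(fields, function(f) {
    vapply(f, collapse_entry, character(1), USE.NAMES = FALSE)
  })
  as.data.frame(cols, stringsAsFactors = FALSE)
}

df <- ragged_to_df(res)
print(df)
```

The trade-off is that multi-valued fields become `;`-separated text; if you need one row per diagnosis instead, a long table that repeats the case identifier on each row is the usual alternative.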
Reply:

GenomicDataCommons v1.2.0 already has the as.data.frame.GDCResults function; run help(package = GenomicDataCommons) to list the functions. I think there is a more basic functionality issue. Try GenomicDataCommons::status().

Reply:

Correct, as.data.frame.GDCResults appears in the help index for both v1.2.0 and v1.5.4. But still:

> test = cases() %>% filter(~ project.project_id=='TCGA-CHOL') %>% results(n=10) %>% as.data.frame.GDCResults()
Error in as.data.frame.GDCResults(.) :
  could not find function "as.data.frame.GDCResults"

Also tried on Linux (fresh install of v1.2.0, dependencies updated):

> GenomicDataCommons::status()
Error in curl::curl_fetch_memory(url, handle = handle) :
  Could not resolve host: gdc-api.nci.nih.gov

On macOS (fresh install of v1.5.4, dependencies not updated):

> GenomicDataCommons::status()
$commit
[1] "e9e20d6f97f2bf6dd3b3261e36ead57c56a4c7cc"

$data_release
[1] "Data Release 12.0 - June 13, 2018"

$status
[1] "OK"

$tag
[1] "1.14.1"

$version
[1] 1
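The `could not find function "as.data.frame.GDCResults"` error is what R reports when a package registers an S3 method for dispatch (via `S3method()` in its NAMESPACE) without exporting it by name. Assuming that is the case here, the method is reached by calling the generic `as.data.frame()`, which the earlier attempts already did, so calling it by its full name would likely not avoid the row-count error. The self-contained sketch below mimics that situation with a hypothetical `ToyResults` class: the method is registered but not visible as a function, yet dispatch and `getS3method()` still find it. For the real package, `getS3method("as.data.frame", "GDCResults")` or `GenomicDataCommons:::as.data.frame.GDCResults` should similarly let you inspect the method's source.

```r
# Hypothetical stand-in for a package-internal S3 method: it is registered
# for dispatch but never assigned to a visible name.
local({
  method <- function(x, row.names = NULL, optional = FALSE, ...) {
    data.frame(id = x$id, stringsAsFactors = FALSE)
  }
  registerS3method("as.data.frame", "ToyResults", method)
})

res <- structure(list(id = c("c1", "c2")), class = "ToyResults")

# Not visible by name: calling as.data.frame.ToyResults() directly would fail
# with "could not find function", mirroring the error in this thread.
visible <- exists("as.data.frame.ToyResults")
print(visible)

# Calling the generic dispatches to the registered method.
df <- as.data.frame(res)
print(df)

# getS3method() retrieves the registered method for inspection.
m <- utils::getS3method("as.data.frame", "ToyResults")
```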