Hello!
I'm new to bioinformatics and wanted to get some array data from TCGA using TCGAbiolinks.
I ran a query for 12 files (2 groups of 6 samples with the same type of cancer), and when I ran GDCdownload I got this confusing output:
GDCdownload(query)
GDCdownload will download 12 files. A total of 119.416574 MB
Downloading as: Fri_Nov_25_09_39_55_2016.tar.gz
Downloading: 31 MB
[1] 1
The download always stops at 31 MB. Splitting it into chunks gives me the same final total of 31 MB: for example, with chunks of 2 files, the message says each chunk is 20 MB, but each one completes at 5.1 MB. Setting chunks of 1 file each gets me only one file downloaded, followed by an error message.
I tried working with the data as downloaded but keep getting errors, which I'm guessing are due to missing information.
Here's the query and graph for reference:
query2 <- GDCquery(project = "TCGA-BRCA",
                   data.category = "DNA Methylation",
                   platform = "Illumina Human Methylation 27",
                   barcode = casos)  # casos is a character vector with the barcodes of the cases I wanted
And here is the error message:
Group1:solid tissue normal
Group2:primary solid tumor
Error in TCGAanalyze_DMR(resu1, groupCol = "definition", group1 = "solid tissue normal", :
  Sorry, but solid tissue normal has no samples
In addition: Warning message:
In any(rowSums(!is.na(assay(data)))) :
  coercing argument of type 'double' to logical
If I run
resu1$definition
I get
[1] "Primary solid Tumor" "Primary solid Tumor" "Primary solid Tumor" "Primary solid Tumor"
[5] "Primary solid Tumor" "Primary solid Tumor" "Solid Tissue Normal" "Solid Tissue Normal"
[9] "Solid Tissue Normal" "Solid Tissue Normal" "Solid Tissue Normal" "Solid Tissue Normal"
This shows that groupCol is actually pointing at a column that can be split into 2 groups.
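One more thing I can check on my side is whether the group labels I passed match the values in that column exactly, including capitalization. Below is a minimal base R sketch of that check. The `definition` vector is only a stand-in copied from the output above, not the real `resu1` object, and I have not confirmed that an exact string comparison is what TCGAanalyze_DMR does internally, so treat this as a guess rather than a diagnosis.

```r
# Stand-in for resu1$definition, copied from the printed output above
# (a hypothetical plain vector, not the real SummarizedExperiment column)
definition <- c(rep("Primary solid Tumor", 6), rep("Solid Tissue Normal", 6))

# Group labels as they appear in the error output
group1 <- "solid tissue normal"
group2 <- "primary solid tumor"

# Exact, case-sensitive comparison of the labels against the column values
exact_hits <- c(group1, group2) %in% definition

# Same comparison after lower-casing both sides
loose_hits <- tolower(c(group1, group2)) %in% tolower(definition)

# Number of samples per distinct value in the column
counts <- table(definition)

print(exact_hits)
print(loose_hits)
print(counts)
```

If the exact comparison fails while the lower-cased one succeeds, then the labels I typed differ from the column values only in capitalization, which might explain the "has no samples" message.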
Hoping to get some help, thank you in advance
Juan