Issues in Gene Set Enrichment Analysis using TCGAbiolinks
7.3 years ago
ammarsabir15 ▴ 70

I want to perform gene set enrichment analysis on the Glioblastoma Multiforme (GBM) dataset in TCGA using GO terms or KEGG pathways. For this purpose I downloaded the data from TCGA using this code:

library(TCGAbiolinks)
query <- GDCquery(project = "TCGA-GBM",
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification")

GDCdownload(query)

This downloaded the data according to the given parameters, but a problem appeared when I tried to prepare the query using the command below:

 data <- GDCprepare(query)

I got the following error: Unable to prepare query there are duplicates in the data. I tried to remove the duplicates using fdupes, but it found no duplicate files in the downloaded data.

So I have the following questions:

  • How can this error be resolved?

  • For enrichment analysis, do I need the datasets from all workflows, i.e. HTseq_counts, HTseq_FPKM and HTseq_FPKM_UQ, or would any one or two of them suffice?

  • Once I have the data, what are the next steps to perform the enrichment analysis using GO or KEGG pathways?
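
On the first question, one possible explanation (an assumption, not confirmed in this thread) is that the duplicates are not identical files, which is all fdupes looks for, but the same sample barcode appearing once per workflow, because the query above does not restrict the workflow. A minimal sketch of how that could be checked and worked around is given here. `find_duplicated_cases` and `prepare_single_workflow` are hypothetical helper names, the `cases` column is assumed to be present in the table returned by `getResults()`, and the default workflow name "HTSeq - Counts" is assumed from the workflows listed above, so it may differ on other GDC data releases.

```r
# Hypothetical helper: list sample barcodes that occur more than once in a
# query result table. Assumes a data frame with a `cases` column, such as
# the one returned by TCGAbiolinks::getResults(query).
find_duplicated_cases <- function(results) {
  unique(results$cases[duplicated(results$cases)])
}

# Hypothetical wrapper: repeat the query from the question, restricted to a
# single workflow so that each sample should appear only once.
# The default workflow name is an assumption; adjust it to the data release.
prepare_single_workflow <- function(workflow = "HTSeq - Counts") {
  query <- TCGAbiolinks::GDCquery(
    project       = "TCGA-GBM",
    data.category = "Transcriptome Profiling",
    data.type     = "Gene Expression Quantification",
    workflow.type = workflow
  )

  # Report any barcodes that are still duplicated before downloading.
  dup <- find_duplicated_cases(TCGAbiolinks::getResults(query))
  if (length(dup) > 0) {
    warning(length(dup), " sample barcode(s) still occur more than once")
  }

  TCGAbiolinks::GDCdownload(query)
  TCGAbiolinks::GDCprepare(query)
}
```

Calling `data <- prepare_single_workflow()` would then take the place of the separate query, download and prepare steps. It needs network access and the TCGAbiolinks package installed.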
bioconductor TCGAbiolinks R • 2.0k views