R: Inconsistency between data from TCGAbiolinks and GDC (does TCGAbiolinks retrieve Legacy data by default?)
7.3 years ago
fr ▴ 210

Hi!

I'm using R and TCGAbiolinks to retrieve data and clinical data from GDC. To do so I use the following:

library(TCGAbiolinks)
# Download and parse the patient-level clinical data for one TCGA project
patientdownload <- function(project = "TCGA-LIHC") {
  clinquery <- GDCquery(project = project, data.category = "Clinical")
  GDCdownload(clinquery, chunks.per.download = 30)
  prepatientout <- GDCprepare_clinic(clinquery, clinical.info = "patient")
  prepatientout
}

However, I am finding some inconsistencies between what I get and what is in GDC. For instance, for the subject with 'bcr_patient_barcode=TCGA-DD-AADB' I retrieve the following data from 'GDCquery':

    bcr_patient_barcode  gender  race_list  vital_status  neoplasm_histologic_grade  stage_event_pathologic_stage
18  TCGA-DD-AADF         FEMALE  ASIAN      Dead          G4                         Stage I

However, when you look at the subject's data in GDC (here), everything agrees except for Grade, which is never reported.

Why?

Could this mean that 'neoplasm_histologic_grade' is not the tumor grade? Or that 'GDCquery' is retrieving some Legacy data?

EDIT: this has now been cross-posted on GitHub.

r genome rna-seq • 2.4k views
7.3 years ago
fr ▴ 210

An answer to this question was posted on the TCGAbiolinks GitHub. Full credit goes to tiagochst (who is also here on Biostars, but I can't tag him).
