Retrieving data from the new TCGA data portal
7.8 years ago

I am retrieving RNA-seq data from TCGA via https://gdc-portal.nci.nih.gov/. I can download the different expression-level data and the clinical data, but I am struggling to map the two. I don't know how to map a patient's ID in the mRNA data to the corresponding ID in the clinical data. Is there any way to sort this out? Thanks in advance.

RNA-Seq Annotation gdc-portal • 7.0k views

Have they changed the structure of the TCGA barcodes?

7.8 years ago
Mike ★ 1.9k

You can match by TCGA barcode: match the "patient.bcr_patient_barcode" column of the clinical data against the expression sample ID.
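As a rough sketch of that matching step: a TCGA sample barcode starts with the patient barcode (the first three hyphen-separated fields), so truncating it gives a key you can look up in the clinical data. The barcodes and the clinical dictionary below are made up for illustration; real values would come from your own files.

```python
def patient_barcode(barcode):
    """Return the patient-level part (first three fields) of a TCGA barcode."""
    return "-".join(barcode.split("-")[:3])

# Hypothetical clinical records keyed by patient barcode (illustration only).
clinical = {
    "TCGA-AA-0001": {"gender": "female"},
    "TCGA-AA-0002": {"gender": "male"},
}

# Hypothetical expression sample IDs (illustration only).
expression_samples = [
    "TCGA-AA-0001-01A-11R-0000-07",
    "TCGA-AA-0001-11A-11R-0000-07",
    "TCGA-AA-0003-01A-11R-0000-07",
]

# Look up the clinical record for each sample; None means no match was found.
matched = {s: clinical.get(patient_barcode(s)) for s in expression_samples}
for sample, record in matched.items():
    print(sample, record)
```

Samples with no clinical record come back as None, which makes unmatched IDs easy to spot.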


The thing is, for the clinical data all I have are XML files, from which I can still extract the patient barcode. The problem is with the expression data: each file comes with a sample ID that does not seem to correspond to a barcode.

7.8 years ago

I got it in the end. I had not looked at the metadata files. Once downloaded, they give the one-to-one correspondence between sample ID and filename.


May I ask how you mapped a patient's ID in the mRNA data to the corresponding ID in the clinical data? I've seen the MANIFEST you mentioned, but the IDs in the clinical data and the mRNA data are not consistent...


Please refer to the answer below. Sorry for the late reply.

7.6 years ago
rli012 • 0

Hi Noorpratap, could you please explain how to link the biospecimen/clinical data to the transcriptome profiling data? Thanks


When you download the respective files (expression, clinical, biospecimen, etc.), there is also an option to download the metadata file along with them. That is a JSON file, and for each entry there is a field 'entity submitter id' (entity_submitter_id in the JSON) holding the TCGA barcode (TCGA-..-...), which identifies the patient and is common to all the respective metadata files. For mRNA, though, the barcode is an extended form that also encodes the sample type (normal or cancer), so you have to truncate it to the patient portion before mapping. As a result, the mRNA data can have more entries than there are patients, sometimes two or more for the same patient, corresponding to tumor and adjacent normal tissue samples. For the clinical and biospecimen files, however, the total number of entries equals the total number of patients. For more information about barcodes, follow the link posted by @russhh above.
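To illustrate the "extended form", here is a small sketch that reads the sample-type code from the fourth field of the barcode and counts entries per patient. In the TCGA barcode scheme, codes 01-09 are tumor, 10-19 are normal and 20-29 are control samples. The barcodes are invented for the example, and the function name is just a placeholder.

```python
from collections import Counter

def sample_type(barcode):
    """Classify a TCGA sample barcode by its two-digit sample-type code."""
    code = int(barcode.split("-")[3][:2])  # fourth field, e.g. "01A" -> 1
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    return "unknown"

# Hypothetical barcodes (illustration only).
barcodes = [
    "TCGA-AA-0001-01A-11R-0000-07",
    "TCGA-AA-0001-11A-11R-0000-07",
    "TCGA-AA-0002-01A-11R-0000-07",
]

# The patient barcode is the first 12 characters (three fields).
entries_per_patient = Counter(b[:12] for b in barcodes)

for b in barcodes:
    print(b[:12], sample_type(b))
print(entries_per_patient)
```

A patient with a count above one has both tumor and normal (or repeated) samples, which is why the mRNA entries can outnumber the patients.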


Just in case anyone needs it, here is some example code in Python because JSON is fiddly.

If you download the metadata file (JSON) with your FPKM data, you can match your files to their info like this:

import json

# Metadata JSON downloaded alongside the data files
file_name = 'metadata.cart.2017-06-RESTOFID.json'

with open(file_name) as data_file:
    data = json.load(data_file)

# Print each file name next to the barcode of its first associated entity
for entry in data:
    print(entry['file_name'], entry['associated_entities'][0]['entity_submitter_id'])
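If you want a lookup table rather than printed lines, the same fields can be collected into a dictionary from file name to patient barcode. This is only a sketch: the records below are invented to mimic the shape of the metadata entries, and file_to_patient is a made-up helper name.

```python
def file_to_patient(records):
    """Map each file name to the patient barcodes of its associated entities."""
    mapping = {}
    for record in records:
        patients = {
            "-".join(entity["entity_submitter_id"].split("-")[:3])
            for entity in record.get("associated_entities", [])
        }
        mapping[record["file_name"]] = sorted(patients)
    return mapping

# Hypothetical records shaped like entries of the metadata JSON.
records = [
    {
        "file_name": "example1.FPKM.txt.gz",
        "associated_entities": [
            {"entity_submitter_id": "TCGA-AA-0001-01A-11R-0000-07"}
        ],
    },
    {
        "file_name": "example2.FPKM.txt.gz",
        "associated_entities": [
            {"entity_submitter_id": "TCGA-AA-0001-11A-11R-0000-07"}
        ],
    },
]

lookup = file_to_patient(records)
for name, patients in lookup.items():
    print(name, patients)
```

With real data you would pass in the list loaded by json.load instead of the hand-written records.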