Biostar Beta. Not for public use.
Interpretation of TCGA clinical data
24 months ago
bxia • 140

When I parse the clinical XML files from TCGA, I see entries like "age_at_initial_diagnosis" with the value 63 and precision = day. When I check the TCGA website, however, the patient is listed as 63 years old.

There are also several numbers in the XML for the days to last follow-up. Which one is correct? Some patients appear to have been followed up more than once, but when I do the calculation, the numbers do not match.

Thanks

RNA-Seq • 1.9k views

Hi,

Can I just ask how you parsed the XML files?


Python has some XML (and JSON) libraries that can be imported, which you might find helpful:

https://docs.python.org/2/library/xml.etree.elementtree.html

https://docs.python.org/2/library/json.html
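
As a rough sketch of what parsing with ElementTree could look like: the XML snippet below is a hypothetical, heavily simplified record (the namespace URI, the nesting, and the `precision` attribute placement are assumptions, not the real TCGA schema). It matches elements by local tag name so the namespace prefix does not matter, and it collects every follow-up value instead of just the first one. Treating the largest value as the most recent follow-up is also only an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified record; real TCGA clinical XML is much larger
# and uses its own namespaces and nesting.
SAMPLE_XML = """\
<patient xmlns:clin="http://example.org/clinical">
  <clin:age_at_initial_diagnosis precision="day">63</clin:age_at_initial_diagnosis>
  <clin:days_to_last_followup>120</clin:days_to_last_followup>
  <follow_ups>
    <follow_up>
      <clin:days_to_last_followup>410</clin:days_to_last_followup>
    </follow_up>
    <follow_up>
      <clin:days_to_last_followup></clin:days_to_last_followup>
    </follow_up>
  </follow_ups>
</patient>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def collect(root, name):
    """Return the non-empty text of every element whose local name matches."""
    values = []
    for elem in root.iter():
        if local_name(elem.tag) == name and elem.text and elem.text.strip():
            values.append(elem.text.strip())
    return values


root = ET.fromstring(SAMPLE_XML)
ages = collect(root, "age_at_initial_diagnosis")
followups = [int(v) for v in collect(root, "days_to_last_followup")]

# Assumption: the largest value is the most recent follow-up.
latest_followup = max(followups) if followups else None

print(ages, followups, latest_followup)
```

Listing all the values side by side like this makes it easier to see where repeated follow-up entries come from before deciding which one to use.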

18 months ago
European Union

I would suggest using the data provided by the TCGA CDR (Clinical Data Resource), which has been manually curated and concatenated. It is described in this paper and should solve many of those problems.

20 months ago
igor 7.7k
United States

I personally find that Xena is the easiest way to download TCGA-related data. All the datasets are aggregated in a basic table format with consistent sample names.

For example, the Pan-Cancer data is here: https://xenabrowser.net/datapages/?cohort=TCGA%20Pan-Cancer%20(PANCAN)&removeHub=https%3A%2F%2Fxena.treehouse.gi.ucsc.edu%3A443
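
Since the data come as plain tables, loading them is simpler than walking XML. Here is a minimal sketch, assuming a tab-separated file with one row per sample; the column names and values are invented for illustration and will differ in the actual downloads.

```python
import csv
import io

# Hypothetical stand-in for a downloaded tab-separated clinical table.
# Column names and values are invented for illustration.
TABLE_TEXT = (
    "sample\tage_at_diagnosis\tdays_to_last_followup\n"
    "SAMPLE-A\t63\t410\n"
    "SAMPLE-B\t58\t\n"
)


def load_table(handle, key="sample"):
    """Read a tab-separated table into a dict keyed by the sample column."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[key]: row for row in reader}


# With a real download, pass an open file handle instead of io.StringIO.
clinical = load_table(io.StringIO(TABLE_TEXT))
print(clinical["SAMPLE-A"]["age_at_diagnosis"])
```

Missing values come through as empty strings here, so they need to be handled before converting columns to numbers.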
