Meaning of header in TCGA metadata

5.0 years ago

foehn • 0

Hi,

Following the tips in the post "TCGA UUIDS to TCGA barcode (SampleID) in R", I was able to download the metadata associated with the UUIDs from the TCGA database. However, I have trouble understanding some of the column names in the metadata. For example, cases.0.samples.0.portions.0.slides.2.percent_tumor_cells is one such column, and I want to know what the 0 and 2 stand for. Does it mean the 3rd slide of the 1st portion of the 1st sample of the 1st case? If so, I would expect this to be tracked by additional columns such as slide, portion, sample and case, rather than encoded in a composite column name.

There are many other columns in a similar format, such as cases.0.samples.0.portions.0.slides.1.section_location, cases.0.diagnoses.0.days_to_death, cases.0.samples.0.portions.1.slides.0.created_datetime, etc. Does anybody have an idea?

Thanks,
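The reading proposed in the question can be sketched in code. The example below is only an illustration, and it assumes (this is not confirmed anywhere in the thread) that each number is a zero-based position in a nested list that was flattened into the TSV header. The helper name parse_flat_column is hypothetical.

```python
def parse_flat_column(name):
    """Split a flattened column name into its field path and list positions.

    Assumption: a purely numeric part is the zero-based position of an item
    in the list named by the part just before it.
    """
    path = []      # field names, in order
    indices = {}   # list name -> zero-based position
    for part in name.split("."):
        if part.isdigit() and path:
            indices[path[-1]] = int(part)
        else:
            path.append(part)
    return ".".join(path), indices


field, idx = parse_flat_column(
    "cases.0.samples.0.portions.0.slides.2.percent_tumor_cells"
)
print(field)  # cases.samples.portions.slides.percent_tumor_cells
print(idx)    # {'cases': 0, 'samples': 0, 'portions': 0, 'slides': 2}
```

Under that assumption, the positions recovered here are exactly the extra case, sample, portion and slide columns the question asks for, so a wide table could be reshaped into one row per slide.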

tcga metadata • 1.5k views

Can you please link to the exact data that you downloaded?

5.0 years ago by Kevin Blighe 87k

You can find an excerpt (10 samples) of the metadata here. Thanks.

5.0 years ago by foehn • 0

Thanks, but where did you obtain this from? GDC Legacy?

5.0 years ago by Kevin Blighe 87k

I pulled it from the GDC website via the API:

    curl --request POST --header "Content-Type: application/json" --data @Query.txt "https://gdc-api.nci.nih.gov/files" > Metadata.txt

where Query.txt is a JSON file in the following format (each %s is a placeholder filled in before sending):

    {
        "filters":{
            "op":"and",
            "content":[
                {
                    "op":"in",
                    "content":{
                        "field":"files.file_id",
                        "value":[ %s ]
                    }
                },
                {
                    "op":"=",
                    "content":{
                        "field":"files.data_type",
                        "value":"Gene Expression Quantification"
                    }
                }
            ]
        },
        "format":"tsv",
        "fields":%s,
        "size":"%s"
    }
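For readers who want to reproduce the request body, here is a minimal Python sketch that fills in the three %s placeholders. It only builds the JSON text and does not contact the API. The function name build_query, the all-zero UUID and the example field list are hypothetical, and it assumes that "fields" is sent as a single comma-separated string.

```python
import json


def build_query(file_ids, fields, size):
    """Return the JSON request body described in the post as a string."""
    payload = {
        "filters": {
            "op": "and",
            "content": [
                {
                    "op": "in",
                    "content": {"field": "files.file_id", "value": list(file_ids)},
                },
                {
                    "op": "=",
                    "content": {
                        "field": "files.data_type",
                        "value": "Gene Expression Quantification",
                    },
                },
            ],
        },
        "format": "tsv",
        # Assumption: the field names are joined into one comma-separated string.
        "fields": ",".join(fields),
        "size": str(size),
    }
    return json.dumps(payload, indent=2)


# Placeholder UUID and field names, for illustration only.
query_text = build_query(
    ["00000000-0000-0000-0000-000000000000"],
    ["file_id", "cases.samples.portions.slides.percent_tumor_cells"],
    size=10,
)
print(query_text)
```

Saving query_text as Query.txt would give a file that can be passed to the curl command via --data @Query.txt.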

5.0 years ago by foehn • 0
