Question: TCGA: Do TCGA cancer studies have mRNA expression data for Control/Normal samples?
15

Hi everyone,

I am using the TCGA portal to get mRNA expression data for various cancer studies (e.g. lung, liver, thyroid). We have been on the lookout for control/normal samples for these studies. On the website we could find case/tumor samples but no control samples.

Has anyone used control/normal samples from TCGA and can point me to them? Or do you know of a good resource (preferably using RNASeq V2 RSEM normalized expression values or z-scores) for control/normal samples in tissues like lung, liver, thyroid etc. (basically all the fore-gut tissues)?

Thanks!

ADD COMMENTlink 5.6 years ago komal.rathi ♦ 3.4k • updated 3.6 years ago JJ • 430
3

You can use TCGA-Assembler for that. There is a Nature Methods paper describing it (see the reference on the link).

When you download the data using the "DownloadRNASeqData" function, you can specify whether you want normal, primary tumor, recurrent tumor or metastatic samples. This downloads RNASeqV1 or V2 level 3 data (RSEM normalized or not). You will have to transform it into z-scores yourself, though.

You can do that by following this thread in Google Groups: match the sample names (for matched samples), or take the average of the normal controls for the non-matched data.
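As a rough illustration of the non-matched case described above, here is a minimal Python sketch that scores tumor values for one gene against the mean and standard deviation of the normal samples. The log2(x + 1) transform, the function names, and the example numbers are assumptions made for illustration; they are not taken from TCGA-Assembler or from the Google Groups thread.

```python
import math
import statistics


def log2p1(value):
    """log2(x + 1), a common transform for RSEM values (an assumption here)."""
    return math.log2(value + 1.0)


def zscores_vs_normals(tumor_values, normal_values):
    """Z-score each tumor value against the distribution of the normal samples."""
    normal_log = [log2p1(v) for v in normal_values]
    mu = statistics.mean(normal_log)
    sd = statistics.stdev(normal_log)  # sample standard deviation, needs >= 2 normals
    if sd == 0:
        raise ValueError("normal samples have zero variance for this gene")
    return [(log2p1(v) - mu) / sd for v in tumor_values]


# Hypothetical RSEM values for a single gene (not real TCGA data).
normal_rsem = [3.0, 7.0, 15.0]
tumor_rsem = [31.0, 7.0]
z = zscores_vs_normals(tumor_rsem, normal_rsem)
print(z)
```

For matched samples, a per-patient comparison (for example the difference of the log values between tumor and normal) may be more appropriate than a pooled z-score.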

ADD REPLYlink 5.5 years ago
TriS
♦ 3.8k
0

Thanks, what https://www.biostars.org/u/11208/ said worked for me, but I will definitely give this a try. Looks promising!

ADD REPLYlink 5.5 years ago
komal.rathi
♦ 3.4k
0

TCGA-Assembler is out of service. Any good alternative?

ADD REPLYlink 3.4 years ago
arup
♦ 1.3k
1

TCGA Firehose

ADD REPLYlink 3.4 years ago
TriS
♦ 3.8k
0

There's certainly RNASeq data from matched normal samples (i.e. normal lung tissue from a lung cancer patient) for the lung samples, e.g. TCGA-44-2655-11 here

ADD REPLYlink 5.6 years ago
russhh
♦ 4.4k
0

So, there are a lot more TN (tumor samples that have matched normals) than NT (normal samples that have matched tumors). How is this possible? Shouldn't the number of TN be the same as NT?

ADD REPLYlink 5.6 years ago
komal.rathi
♦ 3.4k
0

I don't know what you mean; that's certainly not what I thought I'd said - apologies.

There are very few control samples (i.e. normal lung tissue from individuals who do not have cancer), but for around 20-25% of the lung tumour samples there is an associated matched-normal lung sample.

Hence, there are more tumour samples without a matched-normal sample than tumour samples with one.

ADD REPLYlink 5.6 years ago
russhh
♦ 4.4k
0

I meant that I referred to this & this: sample names ending in 01 are Tumor and those ending in 11 are Normal. When I went to the data matrix on TCGA for LUAD, there were options like Tumor-matched & Normal-matched. Also, according to this:

  • TN (Tumor, matched normal) - Data for a tumor tissue for which matched normal tissue exists.

  • NT (Normal, matched tumor) - Data for normal tissue for which matched tumor tissue exists.

So I am a bit confused: shouldn't there be an equal number of TN & NT samples when you check the data matrix?

ADD REPLYlink 5.6 years ago
komal.rathi
♦ 3.4k
0

Hi komal.rathi, if I want to use the TCGA data discussed above for a differential expression test (for paired data), is the TN set too small compared with the NT set for a given cancer type? That might bias the result.

Would it be better to use RNASeq data from normal samples (from individuals without any cancer) as the control set for the differential analysis against a given cancer? Could you point me to an RNASeq dataset that is comparable with TCGA?

Thanks!

ADD REPLYlink 5.3 years ago
Miao Yu
• 70
0

@[komal.rathi](https://www.biostars.org/u/7631/)

I need to download RNA-Seq data (raw read counts for gene quantification only) for ovarian cancer patients from TCGA. I am not interested in downloading all the cases present in TCGA; I want a reasonable number of patients with both a tumor sample and its matched normal, for which I can retrieve the RNA-Seq raw counts. I am a bit confused about which selection criteria to use. I have downloaded the 489 cases of OvaCa data from TCGA with gene expression values, but there is no mention of which are normal and which are tumor. Can you let me know how I should do this from the portal? Correct me if I am wrong: I should first select TN RNA-Seq data for OV (color code blue), which will give batch-wise RNA-Seq V1 data for tumor tissues. Then I should select NT to find the expression data of the normal samples that correspond to the tumor data I downloaded, right? Please share your thoughts.

ADD REPLYlink 5.2 years ago
ivivek_ngs
♦ 4.8k
1

https://www.biostars.org/u/8620/

I am assuming you have the barcodes, e.g. TCGA-09-0364-01, for each of your samples. This is the code table I referred to. The last two digits tell you whether it is a tumor or a normal sample. I used TCGA-Assembler to first download everything and then extracted the matched Tumor and Normal samples. When you download from the data matrix, blue is for Matched Tumor samples and yellow is for Matched Normal samples.
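A minimal Python sketch of the "download everything, then extract the matched pairs" step might look like the following. It assumes barcodes of the form TCGA-XX-XXXX-NN, where the first 12 characters identify the patient and NN is the sample type code (01 for primary tumor, 11 for normal, as used in this thread). The function names are illustrative and are not part of TCGA-Assembler. TCGA-44-2655-01 is a hypothetical barcode added only to complete a pair.

```python
from collections import defaultdict


def patient_id(barcode):
    """First 12 characters, e.g. 'TCGA-44-2655'."""
    return barcode[:12].upper()


def sample_type_code(barcode):
    """Characters 14-15 (1-based) of the barcode, e.g. '01' or '11'."""
    return barcode[13:15]


def match_tumor_normal(barcodes, tumor_code="01", normal_code="11"):
    """Keep only patients that have at least one tumor and one normal sample."""
    by_patient = defaultdict(lambda: {"tumor": [], "normal": []})
    for barcode in barcodes:
        code = sample_type_code(barcode)
        if code == tumor_code:
            by_patient[patient_id(barcode)]["tumor"].append(barcode)
        elif code == normal_code:
            by_patient[patient_id(barcode)]["normal"].append(barcode)
    return {
        patient: samples
        for patient, samples in by_patient.items()
        if samples["tumor"] and samples["normal"]
    }


barcodes = [
    "TCGA-44-2655-01",  # hypothetical tumor sample for the same patient
    "TCGA-44-2655-11",  # matched normal mentioned earlier in the thread
    "TCGA-09-0364-01",  # tumor sample without a normal in this list
]
pairs = match_tumor_normal(barcodes)
print(pairs)
```

Patients with only a tumor sample or only a normal sample drop out of the result, which is the subset you would want for a paired analysis.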

But I just checked, there is no matched normal sample available for download for Ovarian serous cystadenocarcinoma in TCGA. I went to the data matrix portal, selected RNASeq and RNASeqV2 in Data Type, Level 3 in Data Level, and Tumor - matched & Normal - matched in Tumor/Normal section. It returned only Matched Tumor samples but no matched Normal samples. I guess they are not available for download yet.

ADD REPLYlink 5.2 years ago
komal.rathi
♦ 3.4k
0

@komal.rathi

Yes, I could not find matched normal samples either, for both RNASeq and RNASeqV2 at Level 3. It also returned only blue codes, which are for matched tumor samples. So I guess it will not be possible for me to get even a small patient cohort with matched tumor and normal RNA-Seq data. Would it help to download the clinical data from any other repository? Any inputs on that? I have asked a question in another link, if you would like to answer.

ADD REPLYlink 5.2 years ago
ivivek_ngs
♦ 4.8k
0

https://www.biostars.org/u/8620/ I am not aware of any other repository but I will try to find it.

ADD REPLYlink 5.2 years ago
komal.rathi
♦ 3.4k
0

Oh, alright! Thanks!

ADD REPLYlink 5.6 years ago
komal.rathi
♦ 3.4k
0

Download-->TCGA-Assembler software

Download-->TCGA-Assembler Manual: "http://www.compgenome.org/TCGA-Assembler/documents/TCGA-Assembler%20User%20Manual.pdf"

Refer to section--> "ExtractTissueSpecificSamples" on page 27.

ADD REPLYlink 5.1 years ago
kerem.senses
• 0
2

Hi,

Since TCGA data are now on the NCI website, how can I download gene expression data (FPKM) for breast cancer and the associated normal tissue? I cannot find any "normal tissue" option (maybe I missed it...).

For example, here's the selection for breast cancer expression data:

https://gdc-portal.nci.nih.gov/search/s?filters={%22op%22:%22and%22,%22content%22:[{%22op%22:%22in%22,%22content%22:{%22field%22:%22cases.project.primary_site%22,%22value%22:[%22Breast%22]}},{%22op%22:%22in%22,%22content%22:{%22field%22:%22files.data_category%22,%22value%22:[%22Transcriptome%20Profiling%22]}},{%22op%22:%22in%22,%22content%22:{%22field%22:%22files.data_type%22,%22value%22:[%22Gene%20Expression%20Quantification%22]}},{%22op%22:%22in%22,%22content%22:{%22field%22:%22files.analysis.workflow_type%22,%22value%22:[%22HTSeq%20-%20FPKM%22]}}]}
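One possible way to narrow such a selection to normal tissue is to add a sample-type clause to the same filter structure. The Python sketch below rebuilds the filter JSON from the URL above and appends one more clause. The field name `cases.samples.sample_type` and the value "Solid Tissue Normal" are assumptions about GDC naming that do not appear in the original post, and the workflow name is copied from the URL and may no longer be current, so check all of them against the GDC documentation before relying on this.

```python
import json
from urllib.parse import quote


def in_filter(field, values):
    """One 'in' clause in the same shape as the clauses in the portal URL."""
    return {"op": "in", "content": {"field": field, "value": list(values)}}


def build_filters(sample_types):
    """Breast / FPKM selection from the post plus a sample-type clause."""
    return {
        "op": "and",
        "content": [
            in_filter("cases.project.primary_site", ["Breast"]),
            in_filter("files.data_category", ["Transcriptome Profiling"]),
            in_filter("files.data_type", ["Gene Expression Quantification"]),
            in_filter("files.analysis.workflow_type", ["HTSeq - FPKM"]),
            # Assumed field and value names; verify against the GDC docs.
            in_filter("cases.samples.sample_type", sample_types),
        ],
    }


filters = build_filters(["Solid Tissue Normal"])
# Percent-encode the compact JSON so it can be used as a 'filters' query value.
encoded = quote(json.dumps(filters, separators=(",", ":")))
print(encoded)
```

The sketch only builds and encodes the filter; it makes no network request.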

ADD COMMENTlink 3.6 years ago Nicolas Rosewick 7.7k
0

Since this is a separate query, you might consider starting a new question

ADD REPLYlink 3.6 years ago
russhh
♦ 4.4k
2

Hi,

Download the clinical files, e.g. here: http://firebrowse.org

If you then look at one of the merged_only_clinical files, e.g. KIRC.merged_only_clinical_clin_format.txt, look at the barcodes (https://wiki.nci.nih.gov/display/TCGA/TCGA+barcode). The two digits at positions 14-15 of the barcode indicate the sample type. Tumor types range from 01 to 09, normal types from 10 to 19, and control samples from 20 to 29. So codes starting with 0 are tumors and codes starting with 1 are normals, e.g. 01 is a primary tumour.
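The barcode rule above can be sketched in Python as follows. The function name is illustrative, and the code assumes dash-separated barcodes in which the fourth field starts with the two-digit sample type code (which is where positions 14-15 fall). Patient-level barcodes such as TCGA-08-0531 have no such field, so the function returns None for them.

```python
def classify_barcode(barcode):
    """Return 'tumor', 'normal', 'control', or None if no sample code is present."""
    parts = barcode.strip().split("-")
    # A sample-level barcode has at least four fields: TCGA-TSS-participant-sample.
    if len(parts) < 4 or not parts[3][:2].isdigit() or len(parts[3]) < 2:
        return None
    code = int(parts[3][:2])
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    if 20 <= code <= 29:
        return "control"
    return None


for example in ["TCGA-09-0364-01", "TCGA-44-2655-11", "TCGA-08-0531"]:
    print(example, classify_barcode(example))
```

Only the sample field is read here; the second and third fields identify the tissue source site and the participant, not the sample type.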

Some datasets will contain normals, some only cancer samples.

EDIT: RNASeq V2 RSEM normalized expression values are available over http://firebrowse.org as well.

Best, Julia

ADD COMMENTlink 3.6 years ago JJ • 430
0

OK, thanks. They should add this option to their search tool... It's a bit of a pain in the a#* ;)

ADD REPLYlink 3.6 years ago
Nicolas Rosewick
7.7k
0

For filenames that don't have position 14-15, is position 6-7 equivalent?

e.g. TCGA-08-0531 -> Tumor ; TCGA-12-0615 -> Control ; TCGA-26-1438 -> Normal ;

Thanks for the link to firebrowse Julia. Great resource!

ADD REPLYlink 3.6 years ago
SplitInf
• 0
0

nope, that is not the same

ADD REPLYlink 3.2 years ago
TriS
♦ 3.8k
0

Hi Julia,

As bann13 pointed out, I don't see the format that you mentioned in the KIRC.merged_only_clinical_clin_format.txt file; instead I saw "tcga-3z-a93z", which is missing positions 14-15. I am looking for lung cancer (LUAD) gene expression data for normal and cancer patients. I have also checked the LUAD file and found the same format, "tcga-05-4244".

Help will be appreciated.

ADD REPLYlink 3.2 years ago
umesh
• 0
0

The clinical data (mostly) won't tell you whether a sample is normal or tumor, i.e. the barcodes lack positions 14-15, simply because both samples come from the same patient, so the clinical file does not repeat the information for each sample.
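In practice this means the clinical file is keyed by patient while the expression data is keyed by sample, so the two are joined on the patient part of the barcode. The Python sketch below shows the idea; the function names, the sample barcodes, and the placeholder clinical record are hypothetical, and the lower-casing simply mirrors the "tcga-05-4244" style seen in the clinical file.

```python
def patient_key(barcode):
    """Lower-cased patient part of a barcode (first three dash-separated fields)."""
    return "-".join(barcode.strip().lower().split("-")[:3])


def attach_clinical(sample_barcodes, clinical_by_patient):
    """Pair each sample barcode with its patient's clinical record, if any."""
    rows = []
    for barcode in sample_barcodes:
        key = patient_key(barcode)
        rows.append((barcode, key, clinical_by_patient.get(key)))
    return rows


# Placeholder clinical record; real files contain many fields per patient.
clinical = {"tcga-05-4244": {"note": "placeholder clinical record"}}
# Hypothetical sample-level barcodes from an expression matrix.
samples = ["TCGA-05-4244-01", "TCGA-05-4244-11", "TCGA-44-2655-11"]

rows = attach_clinical(samples, clinical)
for row in rows:
    print(row)
```

Both samples from the same patient receive the same clinical record, and the tumor/normal distinction still has to come from the sample barcode itself.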

ADD REPLYlink 3.2 years ago
TriS
♦ 3.8k
