Question

Differential Expression of a normalized RNAseq expression dataset

0

6.0 years ago

Arko ▴ 30

I have an RNA-seq expression matrix obtained from TCGA (TCGA-GBM); it has been normalized. The columns are sample IDs and the rows are genes. I want to do a differential expression analysis. I also have the copy number matrix for the same dataset from cBioPortal, but I'm not sure how to use it to my advantage or how to divide the dataset based on amplification. I am completely clueless about how to use DESeq2 (creating the DESeq2 object without the required information). If someone could elaborate on dividing the samples by copy number amplification and on the differential expression analysis, that would help a lot.

RNA-Seq R DeSeq TCGA glioblastoma • 2.3k views

ADD COMMENT • link updated 6.0 years ago by Kevin Blighe 87k • written 6.0 years ago by Arko ▴ 30

score 3 · Answer 1 · 2018-04-17

3

6.0 years ago

Kevin Blighe 87k

If you have downloaded an expression matrix, then you have most likely obtained FPKM-UQ normalised 'counts' (expression levels), or (possibly, and mistakenly) downloaded the microarray normalised expression log base 2 ratios. You must not use either of these for differential expression analysis via DESeq2 or any other tool. If you want to conduct your own differential expression analysis, then obtain the RSEM or HTSeq raw counts for each sample and merge them into a single data matrix (of raw counts). That matrix then becomes your input to DESeq2.

For GBM, I can see that the RNA-seq data were quantified with RSEM, which is fine. Here is the sample listing on the GDC Legacy Archive:

https://portal.gdc.cancer.gov/legacy-archive/search/f?filters=%7...

So, here's the plan:

1. Download those raw count TXT files by obtaining the file manifest and using the GDC Data Transfer Tool.
2. Input the data by following the instructions here.

Then, proceed from there by following the tutorial...
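To make the "merge them into a single data matrix" step concrete, here is a minimal sketch in Python using only the standard library. It assumes each per-sample file is tab-separated with a header row containing `gene_id` and `raw_count` columns (adjust the column names to whatever your files actually contain), and it rounds the RSEM estimates to whole numbers because DESeq2 expects integer counts. The function names and the sample-ID-to-path mapping are hypothetical, purely for illustration.

```python
import csv


def read_counts(path, gene_col="gene_id", count_col="raw_count"):
    """Read one tab-separated results file into {gene_id: rounded count}.

    The column names are assumptions; check the header of your own files.
    """
    counts = {}
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            # RSEM estimates can be fractional; round them to integers.
            counts[row[gene_col]] = int(round(float(row[count_col])))
    return counts


def merge_counts(sample_files, **kwargs):
    """Merge {sample_id: path} into (genes, samples, rows).

    rows[i][j] is the count for genes[i] in samples[j].
    """
    if not sample_files:
        raise ValueError("no sample files given")
    per_sample = {s: read_counts(p, **kwargs) for s, p in sample_files.items()}
    samples = sorted(per_sample)
    # Keep only genes present in every file so the matrix has no gaps.
    genes = sorted(set.intersection(*(set(c) for c in per_sample.values())))
    rows = [[per_sample[s][g] for s in samples] for g in genes]
    return genes, samples, rows


def write_matrix(out_path, genes, samples, rows):
    """Write the merged matrix as a tab-separated file, genes in rows."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["gene_id"] + samples)
        for gene, row in zip(genes, rows):
            writer.writerow([gene] + row)
```

The resulting file (genes in rows, samples in columns) is the kind of raw count matrix that can be read into R and passed to DESeq2's `DESeqDataSetFromMatrix` together with a sample table.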

Kevin

ADD COMMENT • link 4.1 years ago by Kevin Blighe 87k

1

Thanks Kevin, I shall follow up on it and let you know how it works out!

ADD REPLY • link 6.0 years ago by Arko ▴ 30

0

Good luck - stay in touch.

ADD REPLY • link 6.0 years ago by Kevin Blighe 87k

0

A rather silly question, but how many of the files in the manifest need to be downloaded? From what I see, each .results file contains counts and the corresponding genes, apart from the annotation.txt files. There is both normalized and raw data, so if I were to download only the raw data, how would I filter the files?

Apart from that, the CNV link seems to be broken.

I did read your publication and the methodology that was implemented; it is quite helpful, and the workflow seems similar to what I'm trying to do.

ADD REPLY • link 6.0 years ago by Arko ▴ 30

0

I see what you mean. You can just remove files that you don't need from the manifest. It is just a plain text file. Are you using Mac or Linux? The command you'd need would be:

grep -F ".rsem.genes.results" Manifest.txt

If you are using Windows, you could possibly edit the file in Excel.

I say this assuming that the rsem.genes.results files contain the estimated 'raw' counts.
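If you would rather not depend on the operating system, the same filtering can be sketched in a few lines of Python. This is only an illustration: it assumes the manifest is a plain-text file whose first line is a column header (which is copied through unchanged), and the function name and output file name are hypothetical. Note that the `.rsem.genes.normalized_results` files do not contain the text `.rsem.genes.results`, so they are dropped.

```python
def filter_manifest(in_path, out_path, pattern=".rsem.genes.results"):
    """Copy the header line plus every row that contains `pattern`.

    Returns the number of data rows kept.
    """
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        # Assumption: the first line is the column header; keep it as-is.
        dst.write(src.readline())
        for line in src:
            # Plain substring match, so the dots are treated literally.
            if pattern in line:
                dst.write(line)
                kept += 1
    return kept
```

For example, `filter_manifest("Manifest.txt", "Manifest.rsem.txt")` would write a reduced manifest containing only the rsem.genes.results entries, which could then be used for the download.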

You may additionally want to look at this very old thread: Interpreting TCGA .rsem.genes.results and .rsem.genes.normalized_results files.

ADD REPLY • link 6.0 years ago by Kevin Blighe 87k