Differential Expression of a normalized RNAseq expression dataset
6.0 years ago
Arko ▴ 30

I have an RNA-seq expression matrix obtained from TCGA (TCGA-GBM) that has already been normalized. The columns are sample IDs and the rows are genes. I want to do a differential expression analysis. I also have the copy number matrix for the same dataset from cBioPortal, but I'm not sure how to use it to my advantage or how to divide the dataset based on amplification. I am also completely clueless about how to use DESeq2 (specifically, how to create the DESeq2 object without the required information). If someone could elaborate on dividing the samples based on copy number amplification and on the differential expression analysis, that would help a lot.

RNA-Seq R DeSeq TCGA glioblastoma • 2.3k views
6.0 years ago

If you have downloaded an expression matrix, then you have most likely obtained FPKM-UQ normalised 'counts' (expression levels) or, possibly by mistake, the normalised microarray expression values (log base 2 ratios). You must not use either of these for differential expression analysis via DESeq2 or any other tool that expects raw counts. If you want to conduct your own differential expression analysis, obtain the RSEM or HTSeq raw counts for each sample and merge them into a single data matrix of raw counts. That matrix then becomes your input to DESeq2.
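As a rough illustration of the merging step, here is a minimal Python sketch using only the standard library. It assumes each per-sample file is tab-separated with `gene_id` and `raw_count` columns (an assumption about the file layout, so check your own files), and it rounds the RSEM estimates to integers because DESeq2 expects integer counts. The sample names, gene IDs and values are made up.

```python
import csv
import io


def read_raw_counts(handle, gene_col="gene_id", count_col="raw_count"):
    """Read one tab-separated per-sample file into {gene_id: integer count}.

    The column names are assumptions; adjust them to match your files.
    RSEM estimates can be fractional, so they are rounded to integers.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[gene_col]: int(round(float(row[count_col]))) for row in reader}


def merge_counts(per_sample):
    """Turn {sample_id: {gene_id: count}} into a genes-by-samples matrix."""
    samples = sorted(per_sample)
    # Keep only genes present in every sample so the matrix has no gaps.
    genes = sorted(set.intersection(*(set(c) for c in per_sample.values())))
    header = ["gene_id"] + samples
    rows = [[g] + [per_sample[s][g] for s in samples] for g in genes]
    return header, rows


# Made-up example data standing in for two downloaded files.
demo_files = {
    "sampleA": io.StringIO("gene_id\traw_count\nGENE1|1\t10.00\nGENE2|2\t3.60\n"),
    "sampleB": io.StringIO("gene_id\traw_count\nGENE1|1\t0.00\nGENE2|2\t7.20\n"),
}
per_sample = {name: read_raw_counts(fh) for name, fh in demo_files.items()}
header, rows = merge_counts(per_sample)

# Write the merged matrix as tab-separated text (here to a string buffer).
out = io.StringIO()
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
merged_text = out.getvalue()
```

With real data you would open each downloaded file instead of the `io.StringIO` stand-ins and write the result to disk, then read that table into R as the count matrix.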

For GBM, I can see that the RNA-seq data were quantified with RSEM, which is fine. Here is the sample listing on the GDC Legacy Archive:

So, here's the plan:

  1. Download those raw count TXT files by obtaining the file manifest and using the GDC Data Transfer Tool
  2. Input the data by following the instructions Here

Then, proceed from there by following the tutorial...
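On the copy-number part of the question, one possible way to divide the samples is sketched below in Python. It assumes the cBioPortal matrix holds discrete calls per gene and sample, with 2 meaning high-level amplification, and that you pick a single gene of interest. The function names, sample IDs and values are hypothetical. The resulting sample-to-group table is the kind of information DESeq2 needs as its sample annotation.

```python
def split_by_amplification(calls, amp_value=2):
    """Split samples into amplified / not amplified for one gene.

    calls: {sample_id: discrete copy-number call}; amp_value=2 is the
    assumed code for high-level amplification.
    """
    amplified = sorted(s for s, v in calls.items() if v == amp_value)
    not_amplified = sorted(s for s, v in calls.items() if v != amp_value)
    return amplified, not_amplified


def build_sample_table(calls, expression_samples, amp_value=2):
    """Label each expression sample that also has a copy-number call.

    The output keeps the order of the expression matrix columns; samples
    without a copy-number call are dropped.
    """
    amplified, not_amplified = split_by_amplification(calls, amp_value)
    labels = {s: "amplified" for s in amplified}
    labels.update({s: "not_amplified" for s in not_amplified})
    return [(s, labels[s]) for s in expression_samples if s in labels]


# Made-up calls for one gene, and made-up expression matrix column names.
demo_calls = {"S1": 2, "S2": 0, "S3": 1, "S4": 2, "S5": -1}
demo_expression_samples = ["S2", "S1", "S4", "S6"]

sample_table = build_sample_table(demo_calls, demo_expression_samples)
for sample_id, group in sample_table:
    print(sample_id, group, sep="\t")
```

Whether to treat low-level gain (1) as amplified is a judgment call; here it is grouped with the non-amplified samples.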

Kevin


Thanks Kevin, I shall follow up on it and let you know how it works out!


Good luck - stay in touch.


A rather silly question, but how many of the files in the manifest actually need to be downloaded? From what I can see, each .results file contains the counts and the corresponding genes, and there are also annotation.txt files. There is both normalized and raw data, so if I only want to download the raw data, how would I filter the manifest?

Apart from that the CNV link seems to be broken.

I did read your publication and the methodology that was implemented. It is quite helpful, and the workflow seems to be similar to what I'm trying to do.


I see what you mean. You can just remove files that you don't need from the manifest. It is just a plain text file. Are you using Mac or Linux? The command you'd need would be:

grep -e ".rsem.genes.results" Manifest.txt

If you are using Windows, you could possibly edit the file in Excel.

I say this on the assumption that the rsem.genes.results files contain the estimated 'raw' counts.
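If you would rather not depend on grep or Excel, the same filtering can be sketched in Python, which also keeps the manifest's header line. This assumes the manifest is tab-separated with a header row that includes a `filename` column (check your own file). The rows below are made up.

```python
def filter_manifest(lines, suffix=".rsem.genes.results"):
    """Keep the header plus rows whose filename ends with the given suffix.

    Assumes a tab-separated manifest whose first line is a header that
    contains a 'filename' column.
    """
    lines = list(lines)
    header, body = lines[0], lines[1:]
    name_idx = header.rstrip("\n").split("\t").index("filename")
    kept = [
        line for line in body
        if line.rstrip("\n").split("\t")[name_idx].endswith(suffix)
    ]
    return [header] + kept


# Made-up manifest content; real IDs, checksums and sizes will differ.
demo_manifest = [
    "id\tfilename\tmd5\tsize\tstate\n",
    "aaa\tsample1.rsem.genes.results\tx1\t100\tlive\n",
    "bbb\tsample1.rsem.genes.normalized_results\tx2\t90\tlive\n",
    "ccc\tsample1.annotation.txt\tx3\t10\tlive\n",
    "ddd\tsample2.rsem.genes.results\tx4\t110\tlive\n",
]

filtered = filter_manifest(demo_manifest)
print("".join(filtered), end="")
```

For a real manifest, read the lines with `open("Manifest.txt")` and write `filtered` to a new file to use for the download.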

You may additionally want to look at this very old thread: Interpreting TCGA .rsem.genes.results and .rsem.genes.normalized_results files.
