I have an RNA-seq expression matrix obtained from TCGA (TCGA-GBM) that has already been normalized. The columns are sample IDs and the rows are genes. I want to do a differential expression analysis. I also have the copy number matrix for the same dataset from cBioPortal, but I'm not sure how to use it to my advantage or how to divide the dataset based on amplification. I am also completely clueless about how to use DESeq2, in particular how to create the DESeq2 object without the required information. If someone could elaborate on dividing the samples based on copy number amplification and on the differential expression analysis, that would help a lot.
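To make the sample-division step concrete, here is a minimal sketch of one possible approach, not a definitive method. It assumes the cBioPortal matrix holds discrete GISTIC-style calls (-2, -1, 0, 1, 2, where 2 means amplification), that the calls for one gene of interest have already been read into a dictionary, and that sample IDs have been harmonized between the two matrices. The function name and the sample IDs are hypothetical.

```python
# Assumption: GISTIC-style discrete copy number calls, where
# -2 = deep deletion, -1 = shallow deletion, 0 = diploid,
#  1 = low-level gain, 2 = high-level amplification.
AMPLIFIED = 2


def split_by_amplification(cna_calls, expression_samples):
    """Split expression samples into amplified / not amplified groups.

    cna_calls: dict mapping sample ID -> integer call for ONE gene.
    expression_samples: iterable of sample IDs (expression matrix columns).
    Samples without a copy number call are skipped.
    """
    amplified, not_amplified = [], []
    for sample in expression_samples:
        call = cna_calls.get(sample)
        if call is None:
            continue  # no copy number data for this sample
        if call >= AMPLIFIED:
            amplified.append(sample)
        else:
            not_amplified.append(sample)
    return amplified, not_amplified


# Hypothetical sample IDs and calls, for illustration only.
calls = {"S1": 2, "S2": 0, "S3": -1, "S4": 2}
amp, non_amp = split_by_amplification(calls, ["S1", "S2", "S3", "S4", "S5"])
print(amp)      # samples called as amplified
print(non_amp)  # all other samples with a call
```

The two groups could then serve as the condition column of the sample table for the differential expression step. Note that DESeq2 expects un-normalized counts as input, so the normalized matrix is not suitable for it.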
Thanks Kevin, I shall follow up on it and let you know how it works out!
Good luck - stay in touch.
A rather silly question, but how many of the files in the file manifest do I actually need to download? From what I can see, each .results file contains counts and the corresponding genes, and then there are the annotation.txt files. There is both normalized and raw data, so if I wanted to download only the raw data, how would I filter the manifest?
Apart from that, the CNV link seems to be broken.
I did read your publication and the methodology that was implemented. It is quite helpful, and the workflow seems to be similar to what I'm trying to do.
I see what you mean. You can just remove files that you don't need from the manifest. It is just a plain text file. Are you using Mac or Linux? The command you'd need would be:
If you are using Windows, you could possibly edit the file in Excel.
I say this assuming that the rsem.genes.results files contain the estimated 'raw' counts.
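For a platform-independent way of trimming the manifest, a minimal Python sketch follows. It assumes the manifest is a tab-separated text file whose first line is a header and in which one field per row holds the file name; the function name and the example rows are hypothetical, and the real manifest layout may differ.

```python
def filter_manifest_lines(lines, suffix=".rsem.genes.results"):
    """Keep the header plus rows that reference a file ending in `suffix`.

    lines: list of manifest lines (strings), header first.
    A row is kept if any of its tab-separated fields ends with `suffix`,
    so .rsem.genes.normalized_results rows are dropped.
    """
    if not lines:
        return []
    kept = [lines[0]]  # assumption: the first line is a header
    for line in lines[1:]:
        fields = line.rstrip("\n").split("\t")
        if any(field.strip().endswith(suffix) for field in fields):
            kept.append(line)
    return kept


# Hypothetical manifest content, for illustration only.
manifest = [
    "Sample\tFile Name\n",
    "S1\tabc.rsem.genes.results\n",
    "S1\tabc.rsem.genes.normalized_results\n",
    "S2\tdef.rsem.genes.results\n",
    "S2\tdef.annotation.txt\n",
]
for row in filter_manifest_lines(manifest):
    print(row, end="")
```

To apply this to a real file, read it with `open(path).readlines()`, pass the result to `filter_manifest_lines`, and write the returned lines to a new manifest file so the original stays untouched.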
You may additionally want to look at this very old thread: Interpreting TCGA .rsem.genes.results and .rsem.genes.normalized_results files.