I'm new to RNA-seq analysis and would like input on how to proceed with data I received from my PI. The goal of the experiment was to compare gene expression in blood cells from different donors, all under the same condition. There are donors of one phenotype (e.g., S1, S2...) and donors of another phenotype (e.g., P1, P2...). I have been given two files: one with read counts and one with quantile-normalized values. The files are organized as follows:
Read Count File
Gene S1 S2 P1
B2M 174991 119507 166104
LYZ 69046 35013 24405
Quantile Normalized File
Gene S1 S2 P1
B2M 8449.38 8449.38 2821.43
LYZ 5186.47 1476.66 850.11
I have been told to assess differences between samples using the quantile-normalized values. However, if I want to compare the expression of B2M, for example, between samples (e.g., S1 and P1), do I first need to normalize the quantile-normalized values to a housekeeping gene (e.g., GPI), or do I just compare 8449.38 to 2821.43 directly?
Alternatively, should I go back to the read count file and re-analyze from there?
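To make the direct comparison concrete, here is a minimal Python sketch of what I mean, using only the B2M values from the two tables above. The function name `log2_ratio` and the choice of a log2 ratio as the comparison metric are my own assumptions, not something I was instructed to use, and I am not sure this is the right way to do it.

```python
import math

# B2M values copied from the two tables above.
read_counts = {"S1": 174991, "S2": 119507, "P1": 166104}
quantile_norm = {"S1": 8449.38, "S2": 8449.38, "P1": 2821.43}

def log2_ratio(values, sample_a, sample_b):
    """Return log2(sample_a / sample_b) for one gene (hypothetical helper).

    Both values must be positive; no pseudocount is added here.
    """
    return math.log2(values[sample_a] / values[sample_b])

# Same gene, same pair of samples, two different input files.
b2m_qn_lfc = log2_ratio(quantile_norm, "S1", "P1")
b2m_raw_lfc = log2_ratio(read_counts, "S1", "P1")

print(f"B2M S1 vs P1, quantile normalized: {b2m_qn_lfc:.2f}")
print(f"B2M S1 vs P1, read counts:         {b2m_raw_lfc:.2f}")
```

The two files give quite different ratios for the same gene and the same pair of samples, which is part of why I am unsure which file to trust for this kind of comparison.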
Furthermore, we'd like to run a GSEA comparing the two phenotypes (i.e., S samples versus P samples). Any advice on how to combine the data across S donors and across P donors for this?
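One way I imagined combining the donors is to average the log2 values within each phenotype and rank genes by the difference, then feed the ranked list to a pre-ranked GSEA. The sketch below does this on the two genes shown above. The ranking metric, the pseudocount of 1, and the helper names `group_mean_log2` and `rank_genes` are all my own assumptions for illustration, and I don't know whether this is statistically appropriate with so few donors.

```python
import math

# Quantile-normalized values copied from the table above.
expr = {
    "B2M": {"S1": 8449.38, "S2": 8449.38, "P1": 2821.43},
    "LYZ": {"S1": 5186.47, "S2": 1476.66, "P1": 850.11},
}
s_samples = ["S1", "S2"]  # donors of the S phenotype
p_samples = ["P1"]        # donors of the P phenotype

def group_mean_log2(row, samples, pseudo=1.0):
    """Mean of log2(value + pseudo) over the given samples (assumed metric)."""
    return sum(math.log2(row[s] + pseudo) for s in samples) / len(samples)

def rank_genes(expr, group_a, group_b):
    """Rank genes by mean log2 difference (group_a minus group_b), highest first."""
    scores = {
        gene: group_mean_log2(row, group_a) - group_mean_log2(row, group_b)
        for gene, row in expr.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranked = rank_genes(expr, s_samples, p_samples)

# Print as tab-separated gene/score pairs, one gene per line.
for gene, score in ranked:
    print(f"{gene}\t{score:.3f}")
```

Is a ranked list like this a reasonable input, or should the ranking come from a proper differential expression analysis of the read counts instead?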
Any advice, insight, or pointers to relevant questions on Biostars would be greatly appreciated.