PCA on selected genes.
6.1 years ago
a.archana • 0

I am analysing RNA-Seq data. I have 45 samples: nine different treatments with five replicates each (9 × 5 = 45, experimental design).

I ran a PCA and the samples do not separate well by treatment. Variance explained: PC1 = 12%, PC2 = 10%, PC3 = 7%.

I ran pairwise differential expression analysis among the samples, and I am planning to do PCA on the DE genes only. I have a normalised matrix of all the genes. My question is:

To do PCA on selected genes, should I go: normalised matrix of all genes -> correlation matrix -> retrieve DE genes only -> prcomp

OR: normalised matrix of all genes -> retrieve DE genes -> correlation matrix -> prcomp?

Any help would be much appreciated. Thanks

RNA-Seq gene • 2.6k views

Thanks everyone. Thanks for your input.


Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted.

6.1 years ago

Why would you want to do PCA on DE genes? You already know that these genes are going to separate the treatments, since this is, presumably, the comparison you did to get those DE genes in the first place.

If PCA on normalized (!) values does not yield the expected pattern, this is an indication that you might have batch effects that explain more variability than your conditions of interest. That is valuable information, as you might be able to identify the factor that explains the batch effect (a typical example would be the type of sample rather than the type of treatment).
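A minimal sketch of such a PCA on normalized values, using simulated data in place of the real matrix. The matrix `mat`, the sample names, and the `treatment` factor are assumptions made for illustration. Note that `prcomp` takes the expression matrix directly (samples in rows), so no separate correlation-matrix step is needed; with `scale. = TRUE` the result corresponds to a PCA on the gene-gene correlation matrix. Any gene subsetting would be done on the rows of `mat` before this call.

```r
set.seed(42)

# Hypothetical normalized, log-scale matrix: 500 genes x 45 samples
n_genes <- 500
treatment <- factor(rep(paste0("trt", 1:9), each = 5))
mat <- matrix(rnorm(n_genes * 45), nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)),
                              paste0("s", 1:45)))

# Drop zero-variance genes, since scaling them would produce NaN
keep <- apply(mat, 1, var) > 0

# prcomp expects samples in rows and genes in columns, hence t()
pca <- prcomp(t(mat[keep, ]), center = TRUE, scale. = TRUE)

# Percentage of variance explained by each component
pct_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)
print(round(pct_var[1:3], 1))

# Scores for plotting; swap in any other known covariate
# (e.g. sample type or processing batch) to look for batch effects
scores <- data.frame(pca$x[, 1:2], treatment = treatment)
head(scores)
```

Plotting `scores$PC1` against `scores$PC2` colored by each available covariate in turn is one way to see which factor, if any, lines up with the leading components.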


Thanks Friederike.

I should have been clearer in my question.

We are not expecting the samples to separate completely by treatment. Some of the treatments have more effect than the others (this is also an observation). Based on known studies of similar species, we know there are some genes that respond more strongly to treatment. We want to see whether these known genes have a similar effect in our study, which is why I want to do PCA on selected genes.

I hope that makes sense. Thanks


So select the genes of interest and do a heatmap with them. You can also do a Venn diagram to see how many of the known genes are found by your study as well.
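The counts behind such a Venn diagram can be obtained with base R set operations. A minimal sketch, where the gene identifiers in `known_genes` and `de_genes` are made up for illustration:

```r
# Hypothetical gene identifiers, for illustration only
known_genes <- c("geneA", "geneB", "geneC", "geneD", "geneE")
de_genes    <- c("geneB", "geneD", "geneE", "geneX", "geneY", "geneZ")

# The three regions of a two-set Venn diagram
shared     <- intersect(known_genes, de_genes)
known_only <- setdiff(known_genes, de_genes)
de_only    <- setdiff(de_genes, known_genes)

overlap_counts <- c(known_only = length(known_only),
                    shared     = length(shared),
                    de_only    = length(de_only))
print(overlap_counts)

# Fraction of the known genes that the study also reports as DE
frac_known_recovered <- length(shared) / length(known_genes)
print(frac_known_recovered)
```

The `shared` vector can then be used to subset the normalized matrix for the heatmap.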


I agree with both Friederike and h.mon here. Following on from h.mon's point, just do the clustering with a heatmap and think about further refining your DEGs via regression modelling and 'gene signature' creation.

The only realm where I have seen PCA used on DEGs is when one would want to develop a new scoring system based on eigenvalues, as is performed in WGCNA, i.e., network analysis.


Thanks h.mon.

I am planning to do this, and I have a similar question about it. For the heatmap, when calculating z-scores, should I (1) calculate z-scores on all genes (normalised reads) and then pull out my genes of interest for the heatmap, or (2) take my genes of interest (normalised reads) and then calculate z-scores? Does this change the result?

Thanks


I don't understand the distinction you're making, so I would just say:

  1. Make a matrix of normalized expression values for all your genes of interest (e.g., your DEGs) and all your samples.
  2. Use, for example, pheatmap, which will allow you to see the effects of row- and column-based z-scores as well as the actual values without z-score transformation (you can set this via the parameter scale = c("none","row","column")).
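
On the order of subsetting and scaling: a row z-score uses only that gene's own values across samples, so for row scaling the order should not matter, whereas column scaling depends on which genes are present. A minimal sketch on simulated data; the matrix, the gene names, and the `row_z` helper are hypothetical:

```r
set.seed(1)

# Hypothetical normalized matrix: 100 genes x 6 samples
mat <- matrix(rnorm(600, mean = 8, sd = 2), nrow = 100,
              dimnames = list(paste0("gene", 1:100),
                              paste0("sample", 1:6)))
genes_of_interest <- paste0("gene", c(3, 17, 42, 77))

# Row z-score: (value - gene mean) / gene sd.
# scale() standardizes columns, so transpose before and after.
row_z <- function(m) t(scale(t(m)))

z_scale_then_subset <- row_z(mat)[genes_of_interest, ]
z_subset_then_scale <- row_z(mat[genes_of_interest, ])

# Row scaling: both orders give the same values
same_row_z <- isTRUE(all.equal(z_scale_then_subset, z_subset_then_scale,
                               check.attributes = FALSE))
print(same_row_z)

# Column scaling: each sample's mean and sd depend on the genes included,
# so the two orders give different values
col_scale_then_subset <- scale(mat)[genes_of_interest, ]
col_subset_then_scale <- scale(mat[genes_of_interest, ])
same_col_z <- isTRUE(all.equal(col_scale_then_subset, col_subset_then_scale,
                               check.attributes = FALSE))
print(same_col_z)
```

With the pheatmap package installed, `pheatmap::pheatmap(mat[genes_of_interest, ], scale = "row")` applies the row scaling internally, so the subset matrix of normalized values can be passed as it is.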