Question: Is this a good PCA result?
6 months ago
fernardo • 130
Italy

Hello All,

Can somebody please tell me whether this PCA result is a good one, and what the best way to validate it would be?

Note: the PCA is based on around 20 features and around 100 samples.

[Image: PCA plot]

Thanks a lot


What question do you want to answer?

How to add images to a Biostars post

6 months ago
ATpoint
17k

Actually, I was asking a question, not trying to answer one :) Thanks for the link too.

6 months ago
fernardo
• 130
6 months ago
genomax 68k
United States

We can see a clear separation along the two components you are plotting, but beyond that there is not enough information to make any judgement. You need to tell us more about the experiment you are working on and whether these components represent the main effect you are trying to study.


Thanks. The study compares two conditions (disease vs. normal).

6 months ago
fernardo
• 130

Then it looks like you have a clear difference between them.

6 months ago
Devon Ryan
90k

You are just doing PCA using the differentially expressed genes, right? - 20 genes? You may also want to show the separation in a cluster dendrogram and heatmap.

6 months ago
Kevin Blighe
43k

@Devon and @Kevin, thanks to you both. I am picking genes at random, and most of them are not differentially expressed, or at least not statistically significantly so. So my point is that perhaps only 3 of those 20 genes are differentially expressed, and they still produce this result. Can this be significant? Also, would a heatmap and clustering be enough to demonstrate this separation? And what if I use a classification method such as SVM? In fact I have already applied one, and the accuracy and Kappa values are very high.

6 months ago
fernardo
• 130

Picking genes at random does not sound scientific in this situation. Why would you do that? Why not do PCA on the entire dataset?

Usually, people perform a differential expression analysis and then subset their original data matrix with the statistically significant genes. Clustering with heatmap generation may then be performed on the subset data matrix.
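As a rough illustration of that workflow, here is a minimal Python sketch on made-up data. Everything specific in it is an assumption for illustration only: the synthetic matrix, the size of the shift, the Welch t-test as the differential expression test, the 0.05 cutoff, and the `benjamini_hochberg` helper. Drawing the heatmap itself is left out.

```python
import numpy as np
from scipy import stats
from scipy.cluster.hierarchy import linkage, fcluster


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values (hypothetical helper)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted


# Synthetic expression matrix (made-up data): 100 samples x 500 genes.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 50)        # 0 = normal, 1 = disease
X = rng.normal(size=(100, 500))
X[labels == 1, :30] += 2.0            # assume 30 genes truly differ

# 1) Differential expression: one Welch t-test per gene, then FDR correction.
t_stat, pvals = stats.ttest_ind(X[labels == 1], X[labels == 0],
                                axis=0, equal_var=False)
padj = benjamini_hochberg(pvals)
significant = np.flatnonzero(padj < 0.05)

# 2) Subset the original matrix to the significant genes.
X_sig = X[:, significant]

# 3) Hierarchical clustering of the samples on the subset, cut into 2 clusters.
Z = linkage(X_sig, method="ward")
clusters = fcluster(Z, t=2, criterion="maxclust")

# Cluster ids are arbitrary, so score agreement under both possible mappings.
match = np.mean((clusters == 1) == (labels == 0))
agreement = max(match, 1 - match)
print(f"{significant.size} significant genes, "
      f"cluster/label agreement = {agreement:.2f}")
```

Keep in mind that genes selected because they differ between the groups will tend to separate the groups in any downstream clustering, so the resulting plot describes the selected panel rather than independently validating it.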

6 months ago
Kevin Blighe
43k

I have two answers.

First, if a subset of genes gives me the same output as the entire dataset, why would that not be useful and scientific, given that it reaches the same good result with less effort and less information? What do you think?

Second, following what others generally do, such as DE analysis and heatmaps, is not mandatory, and it discourages new approaches, at least in my view.

6 months ago
fernardo
• 130

Hey, well, in that case, you should be performing the random sampling many times, and then checking the reproducibility of the results. Another name for this is bootstrapping.
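A minimal sketch of such a reproducibility check, in Python on made-up data: draw many random 20-gene subsets (without replacement), run PCA on each, and record how well PC1 separates the two groups. The function names `pc1_separation` and `subset_reproducibility`, the separation score (standardized gap between group means on PC1), and the synthetic matrix are all assumptions for illustration, not a standard recipe.

```python
import numpy as np


def pc1_separation(X, labels):
    """Standardized gap between the two group means along PC1."""
    Xc = X - X.mean(axis=0)                    # center each gene
    # Rows of Vt are the principal axes of the centered matrix.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    pc1 = Xc @ Vt[0]
    a, b = pc1[labels == 0], pc1[labels == 1]
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return abs(a.mean() - b.mean()) / pooled_sd


def subset_reproducibility(X, labels, n_genes=20, n_repeats=200, seed=0):
    """Separation score for many random gene subsets (no replacement)."""
    rng = np.random.default_rng(seed)
    scores = np.empty(n_repeats)
    for i in range(n_repeats):
        genes = rng.choice(X.shape[1], size=n_genes, replace=False)
        scores[i] = pc1_separation(X[:, genes], labels)
    return scores


# Synthetic data (made up): 100 samples x 1000 genes, 150 genes shifted.
rng = np.random.default_rng(42)
labels = np.repeat([0, 1], 50)                 # 0 = normal, 1 = disease
X = rng.normal(size=(100, 1000))
X[labels == 1, :150] += 2.0

scores = subset_reproducibility(X, labels)
low, median, high = np.percentile(scores, [5, 50, 95])
print(f"PC1 separation over 200 random subsets: "
      f"5% = {low:.2f}, median = {median:.2f}, 95% = {high:.2f}")
```

If the separation only shows up for a few lucky subsets, the spread of these scores will be wide; if most random subsets separate the groups, the signal is probably spread across many genes.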

I do not 100% understand your second point. Clustering / heatmap can show to what degree a panel of genes can segregate, for example, cases and controls.

6 months ago
Kevin Blighe
43k

Yes, exactly, I do random sampling / bootstrapping.

6 months ago
fernardo
• 130
