Is it OK to do principal component analysis (PCA) with only high-variance genes?
6.1 years ago
bioinfo_wen ▴ 10

Many papers perform PCA using only the top high-variance genes (the top 500, for example), and the plots look good: cells from different conditions separate from each other. But I am wondering whether that is reasonable. Since all the genes in my expression data together represent all the characteristics of my samples, selecting only the top 500 high-variance genes for PCA seems too artificial. Could it be done on purpose just to separate the samples?

PCA • 10k views

Ideally, I think you would use some kind of statistical method to find genes that vary significantly across all your samples, and then run PCA with only those genes. There really isn't any point in including genes that don't vary significantly, as they offer little information for telling samples apart.

I am not sure if there are any good/easy methods for doing that right now, so that's probably why you see a lot of papers just plotting the top X most varying genes.
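The "top X most varying genes" selection mentioned here can be sketched in a few lines. This is only a minimal illustration using the Python standard library: the function name `top_variance_genes`, the gene names, and the expression values are all made up for the example, and it assumes the values are already normalized and on a log scale.

```python
from statistics import variance

def top_variance_genes(expr, n_top):
    """Return the names of the n_top genes with the highest sample variance.

    expr maps gene name -> list of expression values, one per sample
    (assumed here to be normalized, log-scale values).
    """
    ranked = sorted(expr, key=lambda gene: variance(expr[gene]), reverse=True)
    return ranked[:n_top]

# Toy, made-up values for four samples (hypothetical, not real data).
expr = {
    "geneA": [1.0, 1.1, 0.9, 1.0],  # nearly constant across samples
    "geneB": [2.0, 8.0, 2.5, 7.5],  # varies strongly between samples
    "geneC": [5.0, 5.2, 4.9, 5.1],  # nearly constant across samples
    "geneD": [0.5, 0.4, 6.0, 6.2],  # varies strongly between samples
}

# Keep only the two most variable genes; these would be the PCA input.
selected = top_variance_genes(expr, n_top=2)
print(selected)
```

With real data the same idea would typically be applied to a full genes-by-samples matrix, with `n_top` set to something like the 500 mentioned in the question.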

6.1 years ago
h.mon 35k

One of the assumptions of several microarray and RNA-seq analysis packages (e.g. limma, edgeR and DESeq2) is that the number of differentially expressed genes is not too large relative to the total number of genes being measured. So although 500 is an arbitrary number, yes, it does make sense not to use all genes for these exploratory analyses.

6.1 years ago
Hussain Ather ▴ 990

If you ran PCA with all of the genes, the plot would just be bigger. The point of performing PCA with only the high-variance genes is to differentiate among samples that are close to one another to begin with.
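One way to check how much is lost by dropping the low-variance genes is to compare variances directly. For PCA on centered, unscaled data, the total variance spread across the components equals the sum of the per-gene variances, so the share carried by the selected genes gives a rough idea of how much they dominate the result. The sketch below is illustrative only: `variance_share` is a hypothetical helper and the values are made up.

```python
from statistics import variance

def variance_share(expr, genes):
    """Fraction of the total per-gene variance carried by the given genes."""
    total = sum(variance(values) for values in expr.values())
    kept = sum(variance(expr[gene]) for gene in genes)
    return kept / total

# Toy, made-up log-scale values for four samples (hypothetical, not real data).
expr = {
    "geneA": [1.0, 1.1, 0.9, 1.0],  # nearly constant across samples
    "geneB": [2.0, 8.0, 2.5, 7.5],  # varies strongly between samples
    "geneC": [5.0, 5.2, 4.9, 5.1],  # nearly constant across samples
    "geneD": [0.5, 0.4, 6.0, 6.2],  # varies strongly between samples
}

# Share of total variance carried by the two high-variance genes.
share = variance_share(expr, ["geneB", "geneD"])
print(f"{share:.3f}")
```

In this toy case the two variable genes carry almost all of the variance, so adding the other genes would barely change an unscaled PCA. Whether that holds for a real dataset depends on the data and on whether genes are scaled before PCA.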
