Question about PCA plot using RPKM/FPKM.
20 months ago
hxlei613 • 80

Hi, after searching for how to draw a PCA plot from FPKM values, one question still confuses me. For example, I have an FPKM matrix (let's say matrix_sample) with sample1, sample2 ... sampleN, control1, control2 ... controlN as columns and gene1, gene2 ... geneN as rows. I want to check whether the data have a batch effect. So ideally I want to see the points representing samples and the points representing controls separated into 2 groups in the plot.

I note that there are 2 methods to draw PCA plots.

a) # note that in this method, the rows and columns of matrix_sample are geneN and sampleN (or controlN).

   pca = prcomp(matrix_sample)
   plot(pca$rotation[,1],pca$rotation[,2], xlab = "PC1", ylab = "PC2")

b) # note that in this method, matrix_sample is transposed.

   pca = prcomp(t(matrix_sample))

I don't know which method is correct, since a) doesn't transpose the matrix and b) does. I know that usually each row is an observation and each column is a variable. But in biology there are fewer samples than genes, so rows are genes and columns are samples, which makes the matrix easier to read. However, the plots generated by these 2 methods are not the same, and I don't know why. I couldn't find any information on this, or maybe I'm missing something. Please help me out. Thank you very much!

14 months ago
Republic of Ireland

I would not use FPKM units for PCA, nor would I use these units for any analysis whose aim is to compare samples. FPKM units are produced by a normalisation process that renders samples incomparable, because the method includes no cross-sample normalisation at all; some also question the within-sample normalisation that produces FPKM. If you must use FPKM, at least convert these units to the Z-scale with the zFPKM package in R before running the PCA transformation.

It is perfectly fine to perform PCA on either the transposed or the un-transposed data matrix. However, in each case, the x variable returned by prcomp() will naturally relate to different things: your genes for the un-transposed matrix (genes in rows) and your samples for the transposed one.


From the R documentation for prcomp(), on the returned x: "if retx is true the value of the rotated data (the centred (and scaled if requested) data multiplied by the rotation matrix) is returned. Hence, cov(x) is the diagonal matrix diag(sdev^2). For the formula method, napredict() is applied to handle the treatment of values omitted by the na.action."
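
To make the difference between the two calls concrete, here is a minimal sketch on a made-up matrix. The toy data, its dimensions (100 genes, 3 samples, 3 controls) and the names pca_a and pca_b are assumptions for illustration only; they are not from the question. It only shows what prcomp() returns in each orientation and says nothing about whether FPKM is a suitable input.

```r
set.seed(1)

# Hypothetical toy matrix: 100 genes (rows) x 6 samples/controls (columns)
n_genes <- 100
matrix_sample <- matrix(
  rexp(n_genes * 6, rate = 0.1),
  nrow = n_genes,
  dimnames = list(
    paste0("gene", seq_len(n_genes)),
    c(paste0("sample", 1:3), paste0("control", 1:3))
  )
)

# a) un-transposed: genes are the observations, samples are the variables.
#    prcomp() centres each column, i.e. each sample across all genes.
pca_a <- prcomp(matrix_sample)
dim(pca_a$x)         # one row of scores per gene
dim(pca_a$rotation)  # one row of loadings per sample

# b) transposed: samples are the observations, genes are the variables.
#    prcomp() now centres each gene across all samples.
pca_b <- prcomp(t(matrix_sample))
dim(pca_b$x)         # one row of scores per sample
dim(pca_b$rotation)  # one row of loadings per gene

# The property quoted from the documentation: cov(x) equals diag(sdev^2)
# up to floating-point error.
cov_check <- isTRUE(all.equal(unname(cov(pca_b$x)), diag(pca_b$sdev^2)))
```

In a), pca$rotation has one row per sample, but those rows are loadings from a PCA in which each sample was centred across genes; in b), pca$x holds sample scores from a PCA in which each gene was centred across samples. The two are different quantities, which is why the plots differ. For one point per sample, plotting `pca_b$x[, 1]` against `pca_b$x[, 2]` is the usual choice.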

