Question

correlation and PCA analysis R code


6.9 years ago

Bioinfonext ▴ 460

Your suggestions are valuable to me. Now I need some more help. I have basic knowledge of R, and I have generated raw read counts for multiple libraries. Now I need to do correlation and PCA analysis after normalization. Please suggest which PCA method I should use, share R code for both analyses, and recommend any books or notes.

There are two lines: 240 and 250.

Three developmental stages: 5 weeks (5W), 7W, and 9W.

Three tissues: Ca, Co, Pa.

Each has two biological replicates.

RNA-Seq • 2.6k views

ADD COMMENT • link updated 6.9 years ago by User000 ▴ 690 • written 6.9 years ago by Bioinfonext ▴ 460


Your question is far too broad, and you need to sit back and do a little more homework. You can see this from the following points:

What have you tried? Have you read any review papers? Are you trying to follow a similar paper you found? That would be advisable.
You chose a method without having framed a biological question properly:

"Now I need to do correlation and PCA" (these are actually two questions)

Why? Possibly you don't need these and should use another technique?

Please suggest which PCA method should I used (third question)

Possibly you mean dimension reduction? What do you want to learn about your samples? Do you want to see grouping of genes or samples and replicates?

I don't know why everyone wants to use PCA on RNA-seq data; MDS, correspondence analysis, and clustering are often much better suited to showing the grouping of samples and genes.

also share R code for both analysis and please suggest the any book or notes. (another two questions)

ADD REPLY • link 6.9 years ago by Michael 54k


"I don't know why everyone wants to use PCA on rna-seq, MDS, correspondence analysis, and clustering are often much better suited to see grouping of samples and genes."

Finally, someone has said what I was thinking. Every time, I try to explain to wet lab people, or even fellow bioinformaticians, that PCA is not always the best way to group samples or genes unless dimension reduction is actually needed. If you have few observations, you should not go for dimension reduction; if you have very many, MDS can be better, though it is not a gold standard either. Thanks for pointing this out; I hope the OP and other readers come to understand it well. There is a reason why we have partitioning and clustering methods as well.
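As a rough illustration of the alternatives mentioned above, here is a minimal base R sketch of classical MDS and hierarchical clustering on samples. The data are simulated and the sample names (`Ca_rep1` and so on) are only assumptions modelled on the question; with real data, `expr` would be your normalized, log-scale expression matrix with genes in rows and samples in columns.

```r
set.seed(1)

# Hypothetical design: 3 tissues x 2 biological replicates
tissue  <- rep(c("Ca", "Co", "Pa"), each = 2)
samples <- paste0(tissue, "_rep", rep(1:2, times = 3))

# Simulated log-scale expression: 500 genes x 6 samples
n_genes <- 500
expr <- matrix(rnorm(n_genes * length(samples), mean = 8, sd = 0.3),
               nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), samples))

# Give the first 50 genes a tissue-specific shift so there is structure to find
shift <- c(Ca = 0, Co = 2, Pa = 4)
expr[1:50, ] <- expr[1:50, ] + rep(shift[tissue], each = 50)

# dist() works on rows, so transpose to get sample-to-sample distances
d <- dist(t(expr))

# Classical (metric) MDS in two dimensions
mds <- cmdscale(d, k = 2)

# Hierarchical clustering of the same distances, cut into 3 groups
hc       <- hclust(d, method = "average")
clusters <- cutree(hc, k = 3)

plot(mds, col = as.integer(factor(tissue)), pch = 19,
     xlab = "MDS 1", ylab = "MDS 2")
text(mds, labels = samples, pos = 3, cex = 0.8)
plot(hc, main = "Sample clustering")
```

Both views use the same distance matrix, so if replicates sit together in the dendrogram they should also sit together in the MDS plot.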

ADD REPLY • link 6.9 years ago by ivivek_ngs ★ 5.2k


Actually, my aim is to see the correlation between biological replicates, and I also want to visualize how the data cluster by tissue.

Thanks for your suggestion. I need to find publications where people used MDS or correspondence analysis for RNA-seq data.

ADD REPLY • link 6.9 years ago by Bioinfonext ▴ 460


Have you seen the DESeq2 vignette? It has a very nice workflow: even before doing any differential expression analysis, you can use PCA, MDS, or hierarchical clustering to check how the samples behave and whether they show your tissue-specific segregation. Look for this pattern in both the raw counts and the normalized data. You will not always use all samples for your downstream DE analysis; you might find outlier samples that you can remove.

Another point is surrogate variable analysis to find confounders. This is usually done when the sequencing data were generated by different labs, at different times, or by different people or machines. These factors are then used to model your data, as a reductionist approach. Take a look at the DESeq2 tutorial and you will find all your answers.

Now, about correlation. You can simply create a data frame in R with genes in rows and samples in columns holding the count data, compute the correlation matrix between samples, and plot it as a heatmap with unsupervised clustering to find what you want to see. You should also do that on your normalized data. This will give you the information you need.

MDS is another matter. Mind that the MDS in DESeq2 is not the classical MDS plot. You do not always have to perform an MDS if your PCA or clustering already conveys the information you are after. This is what exploratory analysis is all about.
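The correlation step described above could look roughly like the sketch below. It is only an illustration under stated assumptions: the counts are simulated, the sample names are hypothetical, and log2 counts-per-million stands in for normalization so that the example runs in base R. With a real DESeq2 workflow you would use its normalized or transformed values instead.

```r
set.seed(42)

# Hypothetical sample sheet: 3 tissues x 2 biological replicates
tissue  <- rep(c("Ca", "Co", "Pa"), each = 2)
samples <- paste0(tissue, "_rep", rep(1:2, times = 3))
n_genes <- 1000

# Simulated raw counts: per-gene baseline, tissue-specific fold changes,
# and a different sequencing depth for every library
base_mean <- rgamma(n_genes, shape = 1, rate = 0.01)
tissue_fc <- sapply(c("Ca", "Co", "Pa"), function(t) 2^rnorm(n_genes, sd = 1))
lib_scale <- runif(length(samples), min = 0.5, max = 2)
mu        <- sweep(base_mean * tissue_fc[, tissue], 2, lib_scale, "*")
counts    <- matrix(rnbinom(length(mu), mu = mu, size = 10),
                    nrow = n_genes,
                    dimnames = list(paste0("gene", seq_len(n_genes)), samples))

# Stand-in normalization (an assumption, not DESeq2's method):
# counts per million, then log2 with a pseudo-count of 1
cpm     <- sweep(counts, 2, colSums(counts), "/") * 1e6
log_cpm <- log2(cpm + 1)

# Sample-to-sample correlation; cor() correlates columns, i.e. samples
cor_mat <- cor(log_cpm, method = "pearson")

# Heatmap with unsupervised clustering of the correlation matrix
heatmap(cor_mat, symm = TRUE, scale = "none", margins = c(8, 8))

# Correlation between the two replicates of each tissue
replicate_cor <- sapply(c("Ca", "Co", "Pa"), function(t)
  cor_mat[paste0(t, "_rep1"), paste0(t, "_rep2")])
print(round(replicate_cor, 3))
```

Replicates that correlate clearly better with each other than with other tissues are what you would hope to see; a replicate that does not is a candidate outlier worth inspecting.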

ADD REPLY • link 6.9 years ago by ivivek_ngs ★ 5.2k

score 0 · Answer 1 · 2017-06-01

Hi,

I was new to R as well when I had to do a similar analysis, so I spent several days searching and experimenting to get what I needed, because there is a lot of information on this. This is the PCA function I used after the normalization step.

## Modified plotPCA from DESeq2 package.
## Packages needed: DESeq2 (assay, colData), matrixStats (rowVars),
## ggplot2 (plotting) and ggrepel (geom_text_repel).
library(DESeq2)
library(matrixStats)
library(ggplot2)
library(ggrepel)

plotPCA.san <- function(rld, intgroup = c("condition", "sizeFactor"),
                        ntop = 60000, returnData = FALSE)
{
  # Rank genes by variance across samples and keep the ntop most variable
  rv <- rowVars(assay(rld))
  select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
  # prcomp expects samples in rows, so transpose the selected genes
  pca <- prcomp(t(assay(rld)[select, ]))
  # Fraction of total variance explained by each principal component
  percentVar <- pca$sdev^2 / sum(pca$sdev^2)
  if (!all(intgroup %in% names(colData(rld)))) {
    stop("the argument 'intgroup' should specify columns of colData(rld)")
  }
  intgroup.df <- as.data.frame(colData(rld)[, intgroup, drop = FALSE])
  # Combine several grouping columns into a single factor for colouring
  group <- if (length(intgroup) > 1) {
    factor(apply(intgroup.df, 1, paste, collapse = " : "))
  } else {
    colData(rld)[[intgroup]]
  }
  # Point labels come from the first column of colData
  d <- data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2], group = group,
                  intgroup.df, name = colData(rld)[, 1])
  if (returnData) {
    attr(d, "percentVar") <- percentVar[1:2]
    return(d)
  }
  ggplot(data = d, aes(x = PC1, y = PC2, color = group, label = name)) +
    geom_point(size = 3) +
    xlab(paste0("PC1: ", round(percentVar[1] * 100), "% variance")) +
    ylab(paste0("PC2: ", round(percentVar[2] * 100), "% variance")) +
    coord_fixed() +
    geom_text_repel(size = 3)
}

Then I used ggplot2 to plot the PCA. Please see this link, which I found extremely useful: http://rpubs.com/crazyhottommy/heatmap_demystified
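If you only have a plain matrix of normalized values and no DESeq2 object, the same idea (keep the most variable genes, run `prcomp`, report the variance explained) can be sketched in base R. This is a hedged illustration on simulated data; the sample names, the `ntop` value, and the matrix `expr` are assumptions, not part of the function above.

```r
set.seed(7)

# Hypothetical design: 3 tissues x 2 biological replicates
tissue  <- rep(c("Ca", "Co", "Pa"), each = 2)
samples <- paste0(tissue, "_rep", rep(1:2, times = 3))

# Simulated normalized, log-scale expression: 800 genes x 6 samples
n_genes <- 800
expr <- matrix(rnorm(n_genes * length(samples), mean = 8, sd = 0.3),
               nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), samples))

# The first 100 genes differ between tissues
shift <- c(Ca = -2, Co = 0, Pa = 2)
expr[1:100, ] <- expr[1:100, ] + rep(shift[tissue], each = 100)

# Keep the ntop most variable genes
ntop   <- 200
rv     <- apply(expr, 1, var)
select <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]

# PCA with samples in rows; prcomp centers the data by default
pca         <- prcomp(t(expr[select, ]))
percent_var <- pca$sdev^2 / sum(pca$sdev^2)

plot(pca$x[, 1], pca$x[, 2],
     col = as.integer(factor(tissue)), pch = 19,
     xlab = paste0("PC1: ", round(percent_var[1] * 100), "% variance"),
     ylab = paste0("PC2: ", round(percent_var[2] * 100), "% variance"))
text(pca$x[, 1], pca$x[, 2], labels = samples, pos = 3, cex = 0.8)
```

Lowering `ntop` focuses the PCA on the genes that vary most, which often sharpens the separation between tissues but can also exaggerate it, so it is worth trying a few values.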