Correlation between data normalized with different methods
13 months ago
star • 150
Netherlands

I have ChIP-seq data from several different studies, and I would like to normalise them with the TMM and upper-quartile methods from the edgeR package and then see which method works better for my data.

As you can see in the tables below, the normalized values differ between the two methods, but when I compute the correlation between samples and draw a heatmap, all the values are the same for both methods.

  • Is computing the correlation a good way to compare the methods, and why are all the values after cor() the same?
  • Is it correct to draw a heatmap of the correlation matrix?

    > library(edgeR)
    
    > dge <- DGEList(counts=data)
    
    > data_upperquartile <- calcNormFactors(dge, method="upperquartile")
    
    > data_upperquartile <- data.frame(cpm(data_upperquartile, normalized.lib.sizes = TRUE))
    
    > data_upperquartile[c(100:105),c(1:3)]
    
            A         B          C
    0.1007585 0.1230328 0.01741683
    0.1151526 0.1730148 0.03483366
    0.1439407 0.2268417 0.04644487
    0.1727289 0.2768238 0.05225048
    0.1631328 0.2460656 0.04644487
    0.1103546 0.1461014 0.02902805
    
    
    > data_TMM <- calcNormFactors(dge, method="TMM")
    
    > data_TMM <- data.frame(cpm(data_TMM, normalized.lib.sizes = TRUE))
    
    > data_TMM[c(100:105),c(1:3)]
    
    
             A         B          C
    0.09484844 0.1153246 0.01901974
    0.10839821 0.1621753 0.03803947
    0.13549776 0.2126298 0.05071930
    0.16259732 0.2594804 0.05705921
    0.15356413 0.2306493 0.05071930
    0.10388162 0.1369480 0.03169956
    
    
    > cor_data_upperquartile <- cor(data_upperquartile)
    > cor_data_upperquartile
              A         B         C
    A 1.0000000 0.9878731 0.9383675
    B 0.9878731 1.0000000 0.9739410
    C 0.9383675 0.9739410 1.0000000
    
    
    > cor_data_TMM <- cor(data_TMM)
    > cor_data_TMM
              A         B         C
    A 1.0000000 0.9878731 0.9383675
    B 0.9878731 1.0000000 0.9739410
    C 0.9383675 0.9739410 1.0000000
    
14 months ago
Dinara • 0
New York

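Scaling normalization doesn't change the correlation. It is simply a mathematical fact that cor(x, y) = cor(ax, by) when a and b are positive scalars. With normalized.lib.sizes = TRUE, cpm() divides each sample's counts by a single number (its library size times its normalization factor), so the TMM and upper-quartile CPMs, and even the raw counts, all give exactly the same sample-by-sample correlation matrix.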
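A quick base-R check of this identity, using simulated counts. The matrix `counts`, the per-sample sizes `size_1`/`size_2` and the helper `scale_cols()` are made up for illustration and are not the poster's data or edgeR internals. Dividing each column by its own positive constant (which is what a CPM transform with per-sample normalization factors amounts to) leaves the Pearson correlation matrix unchanged:

```r
# Simulated counts: 1000 bins x 3 correlated samples (illustration only)
set.seed(1)
mu <- rgamma(1000, shape = 2, rate = 0.1)           # shared per-bin signal
counts <- sapply(1:3, function(i) rpois(1000, mu))  # 1000 x 3 integer matrix
colnames(counts) <- c("A", "B", "C")

# Hypothetical effective library sizes (library size x normalization factor)
# standing in for two different normalization methods
size_1 <- c(A = 1.1e6, B = 0.9e6, C = 1.3e6)
size_2 <- c(A = 0.8e6, B = 1.2e6, C = 1.0e6)

# CPM-style scaling: divide each column by its own positive constant
scale_cols <- function(m, s) sweep(m, 2, s, "/") * 1e6

cor_raw <- cor(counts)
cor_1 <- cor(scale_cols(counts, size_1))
cor_2 <- cor(scale_cols(counts, size_2))

# Pearson r = cov(x, y) / (sd(x) * sd(y)). Replacing x by a*x and y by b*y
# multiplies the numerator by a*b and the denominator by a*b, so r is unchanged.
print(all.equal(cor_raw, cor_1))
print(all.equal(cor_1, cor_2))
```

Because the factors cancel exactly, a correlation heatmap cannot distinguish between scaling-based normalization methods; any difference between them has to be judged some other way.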


As a remark, that is only true if the normalization applies one linear scaling factor per sample, as TMM, upper-quartile or the geometric-mean approach of DESeq2 do. If you use a non-linear method such as quantile normalization or loess regression, the correlation can change substantially.
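To illustrate the contrast, here is a minimal quantile normalization in base R applied to simulated counts. The helper `quantile_normalize()` and the data are hypothetical, written only for this sketch (limma's `normalizeQuantiles()` is one packaged alternative). Because it maps values through a rank-based, non-linear transform, the Pearson correlation between samples is no longer preserved:

```r
# Simulated counts: 3 samples with the same signal at different depths
set.seed(2)
mu <- rgamma(1000, shape = 2, rate = 0.1)
counts <- sapply(c(0.5, 1, 2), function(f) rpois(1000, f * mu))
colnames(counts) <- c("A", "B", "C")

# Hypothetical helper: classic quantile normalization.
# Each value is replaced by the mean (across samples) of the values with the
# same within-sample rank, so every column ends up with the same distribution.
quantile_normalize <- function(m) {
  ranks <- apply(m, 2, rank, ties.method = "first")  # ties broken by order
  ref <- rowMeans(apply(m, 2, sort))                 # reference distribution
  out <- apply(ranks, 2, function(r) ref[r])
  dimnames(out) <- dimnames(m)
  out
}

cor_before <- cor(counts)
cor_after <- cor(quantile_normalize(counts))
print(round(cor_before, 4))
print(round(cor_after, 4))
print(isTRUE(all.equal(cor_before, cor_after)))
```

Unlike the per-sample scaling case, the two printed correlation matrices differ.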


Thanks for your reply. So how can I tell which method is better?


I recommend reading the csaw manual on ChIP-seq normalization. It explains the concepts well and contains code for MA plots to visually check how well the normalization works.
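For a rough idea of what such a check looks like, here is a base-R MA plot of two simulated samples before and after simple library-size scaling. The data, the helper `ma_values()` and the pseudo-count `prior` are assumptions for illustration and are not csaw's code. The idea is that, after a sensible normalization, the bulk of non-differential (background) bins should be centred on M = 0:

```r
# Simulated counts: sample B has the same signal but ~2x the sequencing depth
set.seed(3)
n <- 5000
mu <- rgamma(n, shape = 2, rate = 0.02)   # shared background signal
x <- rpois(n, mu)                         # sample A
y <- rpois(n, 2 * mu)                     # sample B

# Hypothetical helper: M = log2 ratio (B vs A), A = mean log2 abundance,
# after dividing each sample by a scaling factor (sx, sy)
ma_values <- function(x, y, sx = 1, sy = 1, prior = 0.5) {
  lx <- log2((x + prior) / sx)
  ly <- log2((y + prior) / sy)
  list(M = ly - lx, A = (lx + ly) / 2)
}

raw <- ma_values(x, y)
norm <- ma_values(x, y, sx = sum(x), sy = sum(y))  # library-size scaling

op <- par(mfrow = c(1, 2))
plot(raw$A, raw$M, pch = ".", main = "Raw", xlab = "A", ylab = "M")
abline(h = 0, col = "red")
plot(norm$A, norm$M, pch = ".", main = "Scaled", xlab = "A", ylab = "M")
abline(h = 0, col = "red")
par(op)

# Median M of all bins: shifted away from 0 before scaling, near 0 after
print(c(raw = median(raw$M), scaled = median(norm$M)))
```

In real ChIP-seq data the enriched bins would form a separate cloud away from M = 0, and comparing how each normalization places the background cloud is more informative than a correlation matrix.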

