How to get the correlation between the counts for each gene at the same timepoint (two replicates)
5.3 years ago
Lila M ★ 1.2k

Hi everybody, I have counts (obtained with HTSeq) for many genes (~58,000) at different time points, with replicates:

gene                           t1_S1    t1_S2
ENSG00000000003.14              0        0
ENSG00000000005.5               0        0
ENSG00000000419.12              1        3
 [...]

I would like to calculate the correlation between the counts for each gene at the same time point, to understand how reproducible the replication timing and progression are for each repeat. Any suggestions?

RNA-Seq HTSeq replication correlation

Check out the cor function in R. Different kinds of correlation measures are available, including Spearman and Pearson.
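A minimal sketch of the two methods on a single pair of replicate columns. The counts below are hypothetical toy values, not data from the thread; with real data `xx` would be the full count table.

```r
# Hypothetical toy counts: rows = genes, columns = two replicates at t1
xx <- data.frame(
  t1_S1 = c(0, 0, 1, 250, 40, 7, 1200, 3),
  t1_S2 = c(0, 0, 3, 230, 55, 5, 1100, 9),
  row.names = paste0("g", 1:8)
)

# Pearson: linear agreement, dominated by the few highly expressed genes
r_pearson <- cor(xx$t1_S1, xx$t1_S2, method = "pearson")

# Spearman: correlation of ranks, less sensitive to the skew of count data
r_spearman <- cor(xx$t1_S1, xx$t1_S2, method = "spearman")

print(c(pearson = r_pearson, spearman = r_spearman))
```

Because raw counts span several orders of magnitude, Pearson on untransformed counts is driven mostly by the highest-count genes; Spearman (or Pearson on log-transformed counts) weighs all genes more evenly.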


This is what I am doing, but because I have a huge number of genes, R gets stuck. This is what I'm trying:

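# row.names = 1 turns the gene IDs into row names, so every remaining column is numeric
xx <- read.table(file = "matrix_count", sep = "\t", header = TRUE, row.names = 1)
cor(t(xx), method = "pearson")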

Any other suggestions?


Do I understand correctly that you aim to calculate 58000 correlation coefficients?

5.3 years ago

Do you want to test the correlation between the different timepoints or between the different genes?

Let's say you have 10 timepoints and 58,000 genes.

To test the different timepoints:

cor(xx, method="pearson")

will give you a 10 x 10 matrix, so 100 correlation values (although I guess cor() is smart enough not to compute both cor(A, B) and cor(B, A), and the diagonal is always 1, so only 45 correlations actually need to be computed).
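A small sketch of the column-wise case, using a hypothetical simulated matrix of 1,000 genes by 10 timepoints (Poisson counts, purely illustrative). `cor()` correlates columns, and the distinct pairs are the cells above the diagonal.

```r
set.seed(1)
# Hypothetical matrix: 1,000 genes (rows) x 10 timepoints (columns)
xx <- matrix(rpois(1000 * 10, lambda = 20), nrow = 1000,
             dimnames = list(paste0("gene", 1:1000), paste0("t", 1:10)))

cm <- cor(xx, method = "pearson")  # correlates columns, giving a 10 x 10 matrix
print(dim(cm))

# Only the upper triangle, excluding the diagonal, holds distinct pairs
n_unique <- sum(upper.tri(cm))
print(n_unique)                    # equals choose(10, 2)
```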

To test the different genes (in a pairwise manner):

cor(t(xx), method="pearson")

Here you get a 58,000 x 58,000 matrix, i.e. 3.364e+09 correlations (or 1,681,971,000 if cor() computes each pair only once). That's why R gets stuck: computing so many correlations takes far too long.
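The arithmetic behind those numbers can be checked directly in R. The memory estimate assumes the result is stored as an ordinary double-precision matrix (8 bytes per value) and ignores any temporary copies, so the real footprint would be at least this large.

```r
n_genes <- 58000

full_entries <- n_genes^2                    # cells in a 58,000 x 58,000 matrix
unique_pairs <- choose(n_genes, 2)           # distinct gene pairs, n * (n - 1) / 2
gib_needed   <- full_entries * 8 / 1024^3    # R stores each double in 8 bytes

print(full_entries)
print(unique_pairs)
print(round(gib_needed, 1))                  # GiB for the result matrix alone
```

So besides the run time, the full gene-by-gene correlation matrix alone would need roughly 25 GiB of RAM.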


Edit based on the OP's comments:

Use the coefficient of variation (https://en.wikipedia.org/wiki/Coefficient_of_variation):

# Per-gene coefficient of variation (rows = genes, columns = replicates)
dat.coeff.var <- apply(dat, 1, function(x) { sd(x) / mean(x) })
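A sketch of the same per-gene CV on hypothetical toy counts for two replicates. Genes with zero counts in both replicates give 0/0, which R returns as NaN; setting those to NA is an illustrative choice, not part of the original answer.

```r
# Hypothetical counts for two replicates at t1 (rows = genes)
dat <- matrix(c(0, 0, 1, 250, 40,
                0, 0, 3, 230, 55),
              ncol = 2,
              dimnames = list(paste0("g", 1:5), c("t1_S1", "t1_S2")))

# Coefficient of variation per gene: sd / mean across the replicate columns
dat.coeff.var <- apply(dat, 1, function(x) sd(x) / mean(x))

# All-zero genes give 0 / 0 = NaN; mark them as missing instead
dat.coeff.var[is.nan(dat.coeff.var)] <- NA

print(round(dat.coeff.var, 3))
```

With only two replicates, sd(x) equals |x1 - x2| / sqrt(2), so the CV is a scaled relative difference. Note that low-count genes get inflated values: g3 (1 vs 3 reads) scores higher than g5 (40 vs 55 reads), so filtering out or down-weighting low-count genes before ranking by CV is usually worth considering.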

Maybe I explained poorly what I want. I want to know the correlation for, let's say, gene ENSG00000000003.14 between the two replicates, to see whether there are differences between replicates for each gene. I'm not interested in the correlation between ENSG00000000003.14 and ENSG00000000005.5. Does that make more sense?


OK, so you want to check the correlation between replicates; then use cor(xx, method = "pearson").


Not exactly, because that gives me the correlation between replicates, whereas what I want to know is whether the counts for gene ENSG00000000003.14 differ between t1_S1 and t1_S2 (and likewise for the other genes).


Maybe use the coefficient of variation (https://en.wikipedia.org/wiki/Coefficient_of_variation): dat.coeff.var <- apply(dat, 1, function(x) { sd(x) / mean(x) })


That's exactly what I want! Thanks!


OK, great. I edited my answer to record the right solution. If it suits you, you can accept the answer.


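There is no correlation for a single pair of measures: with two replicates, each gene has only one pair of values, so a per-gene correlation cannot be computed. The correlation between samples gives you a general view of how similar the samples are, and you can plot the values to check for outliers. However, you have to take differences in sequencing depth between samples into account.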
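A minimal sketch of accounting for sequencing depth before comparing replicates, on hypothetical toy counts. It uses plain counts per million (CPM) with a pseudocount of 1 before the log; both are illustrative assumptions, and packages such as edgeR or DESeq2 use more careful normalization.

```r
# Hypothetical raw counts; S2 was sequenced roughly twice as deeply as S1
xx <- matrix(c(10, 200, 50, 0, 740,
               22, 390, 98, 1, 1490),
             ncol = 2,
             dimnames = list(paste0("g", 1:5), c("t1_S1", "t1_S2")))

# Counts per million: divide each column by its library size
cpm <- sweep(xx, 2, colSums(xx), "/") * 1e6

# log2 with a pseudocount of 1 so that zero counts stay finite
logcpm <- log2(cpm + 1)

# Sample-level agreement after depth correction
r <- cor(logcpm[, "t1_S1"], logcpm[, "t1_S2"])

# Per-gene difference between replicates; large absolute values flag outliers
log2_ratio <- logcpm[, "t1_S1"] - logcpm[, "t1_S2"]
print(round(log2_ratio, 2))

plot(logcpm[, "t1_S1"], logcpm[, "t1_S2"],
     xlab = "t1_S1 log2 CPM", ylab = "t1_S2 log2 CPM")
```

Here the largest per-gene difference comes from g4 (0 vs 1 read), which is a reminder that low-count genes are the noisiest when judging replicate agreement.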


Do you know any other way to do that?
