Question

What is the best way to compare transcriptomes between different conditions?

1

5.4 years ago

ch8316f5eyu ▴ 10

My goal is to compare transcriptomes between different conditions. For example, I knocked down (KD) gene A, gene B, and gene C, and I want to know whether the consequence of knocking down gene A is closer to that of gene B or gene C. My first approach was to compare the CPM values of KD gene A control, KD gene A, KD gene B control, KD gene B, and so on. But in that result, KD gene A and its own control are the closest pair. So I think I should account for the background effect. Next, I compared the log2FoldChange values from the DESeq2 results, but then I lose the p-value information. So, what is the best way to compare transcriptomes from RNA-seq?

RNA-Seq • 1.7k views

updated 5.4 years ago by bharata1803 ▴ 560 • written 5.4 years ago by ch8316f5eyu ▴ 10

2

If you are just interested in knowing which of the knockdowns (B or C) is closer to, let's say, A, you can do hierarchical clustering on the counts after applying a transform like vst() or rlog() in DESeq2. You can find an example here.

5.4 years ago by rizoic ▴ 250

0

But there is a batch effect. I did not knock down those genes at the same time. Each set of KD samples has its own corresponding control. Can I just cluster the KD samples without the controls? If I add the control samples, the KD samples cluster with their corresponding controls.

5.4 years ago by ch8316f5eyu ▴ 10

0

I used the first way you mentioned. I'm not confident because I don't know whether it is acceptable. Thank you for your help.

5.4 years ago by ch8316f5eyu ▴ 10

score 2 · Answer 1 · 2018-11-23

Then you can either:

Do the hierarchical clustering on the log2FC values produced by DESeq2, or
Batch correct the entire expression matrix (using sva::ComBat (see section 7 here) or limma::removeBatchEffect (see page 190 here)) and do the hierarchical clustering on the corrected matrix.
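
To make the batch-correction option a bit more concrete, here is a minimal Python sketch of the underlying idea: on log-scale expression, subtract each gene's per-batch mean. This is a simplified stand-in, not a reimplementation of sva::ComBat or limma::removeBatchEffect, and the function name center_by_batch and all values are hypothetical.

```python
from collections import defaultdict


def center_by_batch(expr, batch):
    """Subtract each gene's per-batch mean from every sample in that batch.

    expr  : dict mapping sample name -> list of log-scale expression values
            (same gene order in every sample)
    batch : dict mapping sample name -> batch label
    """
    samples_by_batch = defaultdict(list)
    for sample, label in batch.items():
        samples_by_batch[label].append(sample)

    corrected = {}
    for samples in samples_by_batch.values():
        n_genes = len(expr[samples[0]])
        # Per-gene mean over the samples of this batch only
        means = [
            sum(expr[s][g] for s in samples) / len(samples)
            for g in range(n_genes)
        ]
        for s in samples:
            corrected[s] = [expr[s][g] - means[g] for g in range(n_genes)]
    return corrected


# Hypothetical toy data: two genes, two batches, one control and one KD each.
expr = {
    "ctrl_A": [5.0, 8.0],
    "kd_A": [3.0, 9.0],
    "ctrl_B": [7.0, 6.0],
    "kd_B": [5.0, 7.0],
}
batch = {"ctrl_A": "batch1", "kd_A": "batch1",
         "ctrl_B": "batch2", "kd_B": "batch2"}

corrected = center_by_batch(expr, batch)
print(corrected["kd_A"], corrected["kd_B"])
```

In this toy case the two knockdowns end up with the same corrected profile, because each one differs from its own control by the same amount. Centering like this only makes sense when every batch contains the same kinds of samples (here, one control and one knockdown per batch).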

Btw, for a global comparison of which knockdowns are more or less similar, I would not use p-values (or only the significant features) but rather the entire transcriptome.
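
As a rough illustration of comparing knockdowns over the whole transcriptome, the sketch below correlates log2 fold-change profiles (each knockdown vs its own control) across all genes, without any p-value filter. It assumes the log2FC values were already exported from DESeq2; the numbers and the helper pearson are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical log2FC values, one entry per gene, same gene order everywhere.
# A real analysis would use every gene in the DESeq2 results table.
log2fc = {
    "KD_A": [1.2, -0.8, 0.1, 2.0, -1.5],
    "KD_B": [1.0, -0.6, 0.0, 1.7, -1.1],
    "KD_C": [-0.9, 0.7, 0.2, -1.8, 1.3],
}

r_ab = pearson(log2fc["KD_A"], log2fc["KD_B"])
r_ac = pearson(log2fc["KD_A"], log2fc["KD_C"])

# The knockdown whose profile correlates best with KD_A is the "closest" one.
closest = "KD_B" if r_ab > r_ac else "KD_C"
print(f"r(A,B)={r_ab:.2f}  r(A,C)={r_ac:.2f}  closest to KD_A: {closest}")
```

Using 1 - r as a distance would let the same profiles be fed into hierarchical clustering when there are more than a handful of knockdowns.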

score 0 · Answer 2 · 2018-11-23

If I understand your setting, the question is basically which treated-vs-control change is most similar across the knocked-down genes, right? In that case you first need to measure the change within each group (knockdown vs its control), and then compare those changes across genes. Clustering the log2FC is okay, I guess, but I think it will not show any direct relationship, because the up- or down-regulation of two genes can be caused by many things.

I think calculating the correlation between the expression of two genes is better. Calculate it using normalized expression, for example from the cpm() function in edgeR or from vst() in DESeq2.

Why do I think it is better? The correlation between the expression of two genes basically checks whether gene A is affected by gene B or vice versa. If one gene affects another, the effect will be present in both the control condition and the treatment condition. That means that, whatever the condition, there would be an effect of gene A on gene B.
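
A minimal Python sketch of that idea, assuming a small table of raw counts: it computes a simplified log2-CPM (counts divided by library size, times one million, plus a pseudocount) and then the Pearson correlation between two genes across samples. The gene names and counts are hypothetical, and the log2_cpm helper is a simplification rather than the exact calculation edgeR's cpm() performs.

```python
import math


def log2_cpm(counts, prior=1.0):
    """Simplified log2 counts-per-million.

    counts : dict mapping gene -> list of raw counts, one per sample
    prior  : pseudocount added to the CPM to avoid log2(0) (an assumption)
    """
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(c[j] for c in counts.values()) for j in range(n_samples)]
    return {
        gene: [math.log2(c[j] / lib_sizes[j] * 1e6 + prior)
               for j in range(n_samples)]
        for gene, c in counts.items()
    }


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical raw counts for four samples (controls and knockdowns mixed).
counts = {
    "geneA": [100, 20, 110, 90],
    "geneB": [200, 50, 210, 180],
    "other": [9700, 9930, 9680, 9730],  # stands in for the rest of the genes
}

norm = log2_cpm(counts)
r = pearson(norm["geneA"], norm["geneB"])
print(f"correlation between geneA and geneB: {r:.3f}")
```

With only a few samples per condition such a correlation is very noisy, so it is best read as a rough indication rather than evidence that one gene affects the other.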