DESeq2 log2FoldChange vs Salmon log2 TPM
5.1 years ago
liartom2 ▴ 10

Hi! I am having trouble analyzing the output of a Salmon-tximport-DESeq2 pipeline. Naturally, I used counts for the differential expression analysis, and I then use mean TPM values to look at various other aspects, such as the median expression level of some subset of genes. One peculiar thing: when I plot log2 TPM (treated) against log2 TPM (untreated) and color the dots by whether they were called differentially expressed (log2FoldChange > 1 or < -1, and adjusted p < 0.05 in the DESeq2 output), the up- and down-regulated genes are asymmetric about the x=y line. Can someone please explain to me why this happens? Here is the resulting plot. I can attach some R code that I used too.

RNA-Seq R DESeq2 Salmon • 3.5k views

This is interesting. According to the plot, a lot of the highly expressed genes are down-regulated. If you think that's not biology, then the DESeq normalization was off. Did you use the default one?


No, we think this can actually occur because of biology.


Then I think what you see is the effect of the different normalizations. Try plotting the normalized counts from DESeq (basically an MA plot) and see if the picture is more balanced; my guess is that it should be.
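To illustrate the point about different normalizations, here is a minimal base R sketch on simulated data. It assumes a hypothetical scenario in which the 50 most abundant genes drop 8-fold and everything else is unchanged, treats all gene lengths as equal, and uses a hand-rolled median-of-ratios scaling (the idea behind DESeq2's default size factors, not DESeq2's actual code). All variable names are made up for the example.

```r
set.seed(1)
n_genes <- 2000

# Hypothetical "true" abundances (arbitrary units) in the untreated condition
untreated <- rlnorm(n_genes, meanlog = 3, sdlog = 2)
treated <- untreated

# Assumption: the 50 most abundant genes drop 8-fold, all others are unchanged
top <- order(untreated, decreasing = TRUE)[1:50]
treated[top] <- untreated[top] / 8

# TPM-style scaling: each condition is forced to sum to one million
# (gene lengths are taken as equal, so only the sum constraint matters)
to_tpm <- function(x) x / sum(x) * 1e6
tpm_lfc <- log2(to_tpm(treated)) - log2(to_tpm(untreated))

# Hand-rolled median-of-ratios scaling across the two columns
mat <- cbind(untreated = untreated, treated = treated)
log_geo_mean <- rowMeans(log(mat))
size_factors <- apply(mat, 2, function(x) exp(median(log(x) - log_geo_mean)))
norm <- sweep(mat, 2, size_factors, "/")
mor_lfc <- log2(norm[, "treated"]) - log2(norm[, "untreated"])

unchanged <- setdiff(seq_len(n_genes), top)
median(tpm_lfc[unchanged])  # positive: unchanged genes sit above the x = y line
median(mor_lfc[unchanged])  # about 0: unchanged genes stay on the x = y line
```

Because TPM forces each sample to sum to one million, the loss of a few abundant genes pushes every unchanged gene above the x=y line, while the median-of-ratios scaling keeps them on it. That would be one way to get an asymmetric picture in a TPM scatter plot, though it may not be the whole story for the data in this thread.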


Yeah, I did an MA plot and it looks OK, but the thing is that I am using TPMs for subsequent analysis and now I'm not sure whether they are total garbage. DESeq doesn't produce normalized counts for each condition though, so I can't use those either. I'm also kind of curious what normalization they used for this log fold change. Based on their tutorials and article, I figured that only genes with low read counts should be affected by their normalization algorithm.


You're confusing two things: normalization and dispersion estimation. Normalization brings all libraries to a comparable level by scaling the read counts with a normalization factor that is different for each library and can be determined using several methods. I guess the method used by DESeq and the one used for computing TPM were different. You can get the normalized counts from DESeq2 using counts(dds, normalized=TRUE), and you can use the rlog function if you really want to work with log values.
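As a rough sketch of how per-condition values could be derived from those normalized counts, the base R snippet below averages samples within each condition. The matrix `norm` and the factor `condition` are hypothetical stand-ins for `counts(dds, normalized=TRUE)` and the condition column of the sample table; the gene and sample names are invented.

```r
# Hypothetical stand-in for counts(dds, normalized = TRUE): genes x samples
norm <- matrix(
  c( 10,  12,  40,  44,
    100,  90, 100, 110,
      0,   0,   5,   3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("untr1", "untr2", "trt1", "trt2"))
)

# Hypothetical stand-in for the condition column of the sample table
condition <- factor(c("untreated", "untreated", "treated", "treated"),
                    levels = c("untreated", "treated"))

# Mean normalized count per gene within each condition
cond_means <- sapply(levels(condition), function(lv) {
  rowMeans(norm[, condition == lv, drop = FALSE])
})

# log2 with a pseudocount of 1 so that zero counts stay finite
log2_means <- log2(cond_means + 1)
log2_means
```

Note that normalized counts are not corrected for gene length, so they suit treated-versus-untreated comparisons of the same gene better than comparisons between different genes.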


Thank you, Asaf! Although you explained normalization, which I already knew, and said nothing about the dispersion estimate. Could you please elaborate a little? I am really confused. Can I use these normalized counts to compare expression instead of TPMs?


You can use the normalized expression, but it's best to use the DESeq results directly. You can read about dispersion estimation in the DESeq2 manual and paper; in short, it's their way of estimating the "noise" of each gene.


You can read about the dispersion estimation in DESeq2 manual and paper

Thanks! I will

it's best if you used the DESeq results directly

Yeah, maybe, but don't you want to include all the other genes when you analyze, let's say, ChIP-seq and RNA-seq together, and not only the 1000 that are differentially expressed? Or do you mean that I should use only log2FC and baseMean (I still don't understand the use of this) from the DESeq output to test hypotheses?


Basically, baseMean and log2FC (with its SE) give you all you need to know.
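For what it's worth, the DESeq2 results table has a row for every gene, not only the significant ones, and baseMean is the mean of the normalized counts across all samples. A small base R sketch of working with those columns follows. The data frame `res_df` is a hypothetical stand-in for `as.data.frame(results(dds))` with invented values, and the thresholds are the ones from the question.

```r
# Hypothetical stand-in for as.data.frame(results(dds)); values are invented
res_df <- data.frame(
  baseMean       = c(1500, 20, 300, 0.5),
  log2FoldChange = c(-2.1, 1.4, 0.2, 3.0),
  lfcSE          = c(0.2, 0.5, 0.1, 2.5),
  padj           = c(1e-10, 0.03, 0.80, NA),
  row.names      = c("geneA", "geneB", "geneC", "geneD")
)

# Thresholds from the question: |log2FC| > 1 and adjusted p < 0.05.
# padj can be NA (for example for very low-count genes), so guard against it.
is_de <- !is.na(res_df$padj) &
  res_df$padj < 0.05 &
  abs(res_df$log2FoldChange) > 1

res_df$status <- ifelse(!is_de, "not DE",
                        ifelse(res_df$log2FoldChange > 0, "up", "down"))

# MA-plot coordinates for every gene, whether or not it is called DE
ma <- data.frame(
  x = log2(res_df$baseMean + 1),
  y = res_df$log2FoldChange,
  status = res_df$status,
  row.names = rownames(res_df)
)
ma
```

Keeping all rows this way lets every gene enter a joint analysis, with lfcSE available to down-weight noisy estimates such as the low-count geneD here.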
