Log2(x + 1)-transformed gene expression data are not normally distributed
18 months ago
rin • 30

Hi all!

I am using raw count data from TCGA. Because I want to compute a Z-score between tumor and normal samples, I first have to ensure that my data are normally distributed. So far I have downloaded the raw counts, normalized them for GC content using the TCGAanalyze_Normalization() function from TCGAbiolinks, and log2(x+1)-transformed them, but the distribution is right-skewed and definitely not normal, as seen in the qqnorm() plots.


How could I tackle that? I have been trying to figure it out for days, but I cannot find a solution.

Thanks a lot, R.


Could you reattach the link to your plot, please?


Edited! Sorry about that! :)

15 months ago
Benn 6.9k
Netherlands

Some data cannot be transformed into a normal distribution. RNA-seq count data fits a Poisson distribution or a negative binomial distribution. There is a great answer here about how RNA-seq data is distributed.
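To illustrate the point about count data, here is a minimal simulation sketch, not an analysis of the TCGA data. It assumes counts follow a negative binomial distribution (drawn as a gamma-Poisson mixture) with a hypothetical mean of 2 and dispersion of 1, and compares the sample skewness before and after log2(x + 1). All function names and parameter values are made up for the illustration.

```python
import math
import random
import statistics


def poisson(lam, rng):
    """Draw one Poisson value with Knuth's method (fine for small means)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def negative_binomial(mean, dispersion, rng):
    """Gamma-Poisson mixture with variance = mean + dispersion * mean**2."""
    shape = 1.0 / dispersion
    # gammavariate(shape, scale) has mean shape * scale, which equals `mean` here
    lam = rng.gammavariate(shape, mean * dispersion)
    return poisson(lam, rng)


def skewness(values):
    """Sample skewness: third central moment divided by sd cubed."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)


rng = random.Random(0)  # fixed seed so the run is repeatable
# Hypothetical low-expression gene: mean 2, dispersion 1 (assumed values)
counts = [negative_binomial(2.0, 1.0, rng) for _ in range(5000)]
logged = [math.log2(c + 1) for c in counts]

print("skewness of raw counts:  ", round(skewness(counts), 2))
print("skewness of log2(x + 1): ", round(skewness(logged), 2))
```

Under these assumptions the log transform reduces the skew but does not remove it, because many values pile up at zero and the transform cannot spread them out. Genes with higher counts tend to look more symmetric after the transform.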


RNA-seq data are typically fitted with a Poisson or negative binomial (NB) distribution. Claiming that they fit those distributions is a bit strong, though.

15 months ago
Freiburg, Germany

This is expected: RNA-seq data should be right-skewed or multimodal.


@Devon Ryan @b.nota @russhh Really helpful link and answers! Thank you! The reason I want them to be normally distributed is to assess the change between tumor and normal expression by computing a Z-score. Would that be possible / have the same interpretation if they fit a Poisson or NB distribution?
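For reference, the Z-score described here can be sketched as follows for a single gene. This is only one reading of the question, with a hypothetical function name: each tumor value is compared with the mean and standard deviation of the normal samples on the log2(x + 1) scale. The result can be read as a standard normal deviate only if the normal-sample values are roughly normal, which is exactly what is in doubt in this thread.

```python
import math
import statistics


def tumor_z_scores(tumor_counts, normal_counts):
    """Z-score of each tumor value against the normal samples for one gene.

    Both inputs are counts for the same gene; values are compared
    on the log2(x + 1) scale. Needs at least two normal samples with
    non-identical values, otherwise the standard deviation is undefined or zero.
    """
    normal_log = [math.log2(c + 1) for c in normal_counts]
    mu = statistics.fmean(normal_log)
    sd = statistics.stdev(normal_log)  # sample standard deviation (n - 1)
    return [(math.log2(c + 1) - mu) / sd for c in tumor_counts]


# Toy numbers chosen so the logs are whole: normals give log2 values 2, 3, 4
print(tumor_z_scores([31, 7], [3, 7, 15]))
```

With these toy counts the normal samples have mean 3 and standard deviation 1 on the log scale, so the two tumor values score 2.0 and 0.0.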


Try to use limma or edgeR for this kind of analysis.
