RNA-seq RPKM data normalization for clustering
7.7 years ago

I have an RNA-seq dataset of gene expression in RPKM, with one gene per row and four conditions. I need to cluster these data with k-means and hierarchical clustering.

My question is: do I have to normalize the dataset with a log(x+1) transformation, or can I use it directly?

RNA-Seq rpkm normalization • 4.0k views

This page gives some pointers for clustering which I found useful: http://www.statmethods.net/advstats/cluster.html


Thank you. What about the RPKM data?


RPKM is already a normalization. Should your clustering weigh more heavily on highly expressed genes, or should all genes be taken into account to the same extent? That's the question you have to ask about log transformation. A log transformation will squeeze all values closer together, limiting the effect of the most strongly expressed genes...
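
To make the point about highly expressed genes concrete, here is a minimal sketch in plain Python. The gene names and RPKM values are made up for illustration, and Euclidean distance is assumed as the clustering distance. It compares gene-to-gene distances before and after a log2(x+1) transform:

```python
import math

# Hypothetical RPKM values for three genes across four conditions
# (illustrative numbers only, not real data).
rpkm = {
    "geneA": [1000.0, 2000.0, 1000.0, 2000.0],  # highly expressed
    "geneB": [1.0, 2.0, 1.0, 2.0],              # lowly expressed, same pattern as geneA
    "geneC": [2.0, 1.0, 2.0, 1.0],              # lowly expressed, opposite pattern
}

def log2p1(values):
    """Apply a log2(x + 1) transform to a list of RPKM values."""
    return [math.log2(v + 1.0) for v in values]

def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Distances on the raw RPKM scale.
raw_ab = euclidean(rpkm["geneA"], rpkm["geneB"])
raw_bc = euclidean(rpkm["geneB"], rpkm["geneC"])

# Distances after the log transform.
log_ab = euclidean(log2p1(rpkm["geneA"]), log2p1(rpkm["geneB"]))
log_bc = euclidean(log2p1(rpkm["geneB"]), log2p1(rpkm["geneC"]))

# How much larger the A-B distance is than the B-C distance, on each scale.
print("raw ratio:", raw_ab / raw_bc)
print("log ratio:", log_ab / log_bc)
```

On the raw scale the distance involving the highly expressed gene is far larger than the distance between the two lowly expressed genes. After the log transform the gap shrinks considerably, so the highly expressed gene dominates the clustering less.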

7.7 years ago
seta ★ 1.9k

It's recommended to log2-transform and then mean-center the data when creating a heatmap based on RPKM.


How do I mean-center? I did not understand. Can you explain, please?
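
For reference, one common reading of "mean-center" is to subtract each gene's own mean (across the conditions) from its log-transformed values, so that every row ends up with mean zero. The thread does not say whether centering is meant per gene or per condition, so per-gene centering is an assumption here. The pseudocount of 1 and the example values are also just illustrative. A minimal sketch in plain Python:

```python
import math

# Hypothetical RPKM matrix: one gene per row, four conditions per gene
# (illustrative numbers only).
rpkm = {
    "gene1": [0.0, 3.0, 7.0, 15.0],
    "gene2": [100.0, 200.0, 400.0, 800.0],
}

def log2_mean_center(values, pseudocount=1.0):
    """log2(x + pseudocount) transform, then subtract the row mean.

    Assumption: centering is done per gene (per row), so each gene's
    transformed profile has mean zero across conditions.
    """
    logged = [math.log2(v + pseudocount) for v in values]
    mean = sum(logged) / len(logged)
    return [x - mean for x in logged]

centered = {gene: log2_mean_center(vals) for gene, vals in rpkm.items()}

for gene, vals in centered.items():
    print(gene, [round(v, 2) for v in vals])
```

After this step, genes are compared by the shape of their expression profile across conditions rather than by their absolute expression level. The `centered` rows could then be passed to k-means or hierarchical clustering.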


OP doesn't want to make a heatmap...


Sorry, you're right. However, many tools that generate heatmaps also cluster the data at the same time.
