Question

gene expression correlation, FPKM, WGCNA

5.8 years ago

Reza ▴ 10

Hi all, I have expression values in FPKM. Can I directly use the average of the log2(FPKM) values as input for WGCNA?

Thanks

R RNA-Seq • 2.8k views

updated 11 months ago by Ram 43k • written 5.8 years ago by Reza ▴ 10

score 2 · Answer 1 · 2018-07-19

I googled "FPKM WGCNA". First link: https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/faq.html

Text from first link:

Can WGCNA be used to analyze RNA-Seq data?

Yes. As far as WGCNA is concerned, working with (properly normalized) RNA-seq data isn't really any different from working with (properly normalized) microarray data.

We suggest removing features whose counts are consistently low (for example, removing all features that have a count of less than say 10 in more than 90% of the samples) because such low-expressed features tend to reflect noise and correlations based on counts that are mostly zero aren't really meaningful. The actual thresholds should be based on experimental design, sequencing depth and sample counts.
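As an illustration of the filtering rule quoted above (not part of the FAQ itself), here is a minimal NumPy sketch. The function name `filter_low_counts`, its parameters, and the toy matrix are hypothetical; the defaults of 10 and 90% just mirror the FAQ's example and should be tuned to the experiment.

```python
import numpy as np

def filter_low_counts(counts, min_count=10, max_low_fraction=0.9):
    """Drop features (rows) whose count is below min_count in more than
    max_low_fraction of the samples (columns)."""
    counts = np.asarray(counts)
    # Fraction of samples in which each feature falls below the threshold.
    low_fraction = (counts < min_count).mean(axis=1)
    keep = low_fraction <= max_low_fraction
    return counts[keep], keep

# Toy matrix (made-up values): 3 genes x 20 samples.
counts = np.array([
    [0] * 19 + [50],        # low in 95% of samples, so it is removed
    [12] * 20,              # never low, so it is kept
    [3] * 10 + [40] * 10,   # low in 50% of samples, so it is kept
])

filtered, keep = filter_low_counts(counts)
print(keep)            # [False  True  True]
print(filtered.shape)  # (2, 20)
```

Note that the FAQ's thresholds are stated for counts; if only FPKM values are available, a different cutoff would probably be needed.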

We then recommend a variance-stabilizing transformation. For example, package DESeq2 implements the function varianceStabilizingTransformation which we have found useful, but one could also start with normalized counts (or RPKM/FPKM data) and log-transform them using log2(x+1). For highly expressed features, the differences between full variance stabilization and a simple log transformation are small.

Whether one uses RPKM, FPKM, or simply normalized counts doesn't make a whole lot of difference for WGCNA analysis as long as all samples were processed the same way. These normalization methods make a big difference if one wants to compare expression of gene A to expression of gene B; but WGCNA calculates correlations for which gene-wise scaling factors make no difference. (Sample-wise scaling factors of course do, so samples do need to be normalized.)
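The point about scaling factors can be checked numerically. The sketch below is not from the FAQ; it uses simulated values and Pearson correlation via `np.corrcoef`, and assumes positive scaling factors applied to untransformed expression values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated expression matrix (made-up data): 5 genes x 12 samples.
expr = rng.gamma(shape=2.0, scale=50.0, size=(5, 12))

# Gene-gene Pearson correlations (np.corrcoef treats rows as variables).
base = np.corrcoef(expr)

# Gene-wise scaling: one positive factor per gene (e.g. a gene-length term).
gene_factors = rng.uniform(0.5, 5.0, size=(5, 1))
gene_scaled = np.corrcoef(expr * gene_factors)

# Sample-wise scaling: one positive factor per sample (e.g. library size).
sample_factors = rng.uniform(0.5, 5.0, size=(1, 12))
sample_scaled = np.corrcoef(expr * sample_factors)

# Gene-wise factors cancel out of the correlation; sample-wise ones do not.
print(np.allclose(base, gene_scaled))
print(np.allclose(base, sample_scaled))
```

The first comparison should hold because multiplying a whole gene profile by a positive constant leaves its Pearson correlation with any other profile unchanged. The second should not, which is why samples still need to be normalized against each other.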

If data come from different batches, we recommend to check for batch effects and, if needed, adjust for them. We use ComBat for batch effect removal but other methods should also work.

Finally, we usually check quantile scatterplots to make sure there are no systematic shifts between samples; if sample quantiles show correlations (which they usually do), quantile normalization can be used to remove this effect.
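For readers unfamiliar with quantile normalization, here is a rough NumPy sketch of the basic idea (again, not taken from the FAQ). The function `quantile_normalize` is a hypothetical helper that breaks ties by sort order; established implementations handle ties and missing values more carefully.

```python
import numpy as np

def quantile_normalize(expr):
    """Quantile-normalize the columns (samples) of a genes x samples matrix
    so that every sample ends up with the same distribution of values."""
    expr = np.asarray(expr, dtype=float)
    # Row indices that would sort each sample's values.
    order = np.argsort(expr, axis=0)
    sorted_vals = np.take_along_axis(expr, order, axis=0)
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = sorted_vals.mean(axis=1)
    normalized = np.empty_like(expr)
    # Write the reference value for each rank back to the gene holding that rank.
    np.put_along_axis(normalized, order, reference[:, None], axis=0)
    return normalized

# Toy data (made-up values): 4 genes x 3 samples.
expr = np.array([
    [5.0, 4.5, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 2.0, 8.0],
])
normalized = quantile_normalize(expr)

# After normalization, the sorted values are identical in every sample.
print(np.sort(normalized, axis=0))
```

Within each sample the ranking of genes is preserved; only the values attached to each rank change.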

Relevant part:

but one could also start with normalized counts (or RPKM/FPKM data) and log-transform them using log2(x+1).
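A small sketch of that transformation, using NumPy and made-up FPKM values. The `+ 1` matters because FPKM can be exactly 0 and a plain log2 of 0 is negative infinity. The variable names are hypothetical; in R the equivalent expression would be `log2(x + 1)`.

```python
import numpy as np

# Hypothetical FPKM matrix: 2 genes (rows) x 3 samples (columns).
fpkm = np.array([
    [0.0, 1.0, 3.0],
    [7.0, 15.0, 1023.0],
])

# log2(x + 1) keeps zeros at zero instead of sending them to -inf.
log_expr = np.log2(fpkm + 1.0)
print(log_expr)
# [[ 0.  1.  2.]
#  [ 3.  4. 10.]]

# WGCNA expects samples in rows and genes in columns, so transpose
# before handing the matrix over.
dat_expr = log_expr.T
print(dat_expr.shape)  # (3, 2)
```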

I know neither concept (FPKM/WGCNA). ~~I think you should invest a little more effort into your questions before you expect others to put in any effort for you.~~ Apologies if you're in Iran, this site is not accessible there. I'd recommend editing your profile ( https://www.biostars.org/u/edit/34000/ ) and adding your location there so anyone responding to your questions knows you're working with restricted access to the Internet. Also, see if you could possibly mention your location in future posts.