Similar to this question on biostars (Which Expression Units To Use, Fpkm Or Rpkm ?), can someone provide some insight on when using RNA-seq raw count data is advantageous over using FPKM (and vice-versa)? Thanks!
In most cases, raw counts are preferred. The exception is something like isoform comparisons, where using raw counts would vastly reduce the usable data (whether you use FPKM or "expected counts" there depends on how you perform the analysis). Raw counts are preferred because they convey precision information that is useful in downstream statistics: you know the technical variance of a measurement and how to weight it, which can't be said of RPKMs.
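To make the point about precision concrete, here is a minimal sketch in Python. It assumes a simple Poisson model for technical counting noise, and the gene lengths, counts, and library size are made-up numbers for illustration only. Two genes end up with the same FPKM, yet the one backed by more reads is measured far more precisely, and that information is gone once you only keep the FPKM.

```python
import math

def fpkm(count, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return count * 1e9 / (gene_length_bp * total_mapped_fragments)

def poisson_cv(count):
    """Relative technical noise (sd / mean), assuming Poisson counting noise."""
    return 1.0 / math.sqrt(count)

# Hypothetical library size and genes (illustrative values, not real data).
total_fragments = 20_000_000
short_gene = {"count": 10, "length_bp": 500}
long_gene = {"count": 1000, "length_bp": 50_000}

fpkm_short = fpkm(short_gene["count"], short_gene["length_bp"], total_fragments)
fpkm_long = fpkm(long_gene["count"], long_gene["length_bp"], total_fragments)

cv_short = poisson_cv(short_gene["count"])
cv_long = poisson_cv(long_gene["count"])

print(f"short gene: FPKM={fpkm_short:.2f}, relative noise={cv_short:.3f}")
print(f"long gene:  FPKM={fpkm_long:.2f}, relative noise={cv_long:.3f}")
```

Both genes come out at the same FPKM, but the short gene's estimate rests on 10 fragments and the long gene's on 1000, so under this model the short gene's relative noise is ten times larger. Count-based tools can use that difference; FPKM-based input hides it.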
If this is true, then why did the authors of Ballgown still elect to use FPKM as late as 2016?
The Ballgown developers don't call themselves statisticians, as far as I am aware.
An update (12th August 2018):
You should abandon RPKM / FPKM normalisation. They are not ideal where cross-sample differential expression analysis is your aim; indeed, they render samples incomparable for differential expression analysis. Please read this: A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis
In their key points:
Note - FPKM is essentially the same as RPKM
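As a rough illustration of why FPKM values are hard to compare across samples, here is a toy sketch in Python. The gene names, lengths, and counts are invented, and the scenario assumes that the same number of fragments is captured for geneA and geneB in both samples while only geneC changes. Because FPKM divides by each sample's own total, the unchanged genes appear to drop in the second sample.

```python
def fpkm_per_sample(counts, lengths_bp):
    """FPKM for every gene in one sample, scaled by that sample's own total."""
    total = sum(counts.values())
    return {gene: c * 1e9 / (lengths_bp[gene] * total) for gene, c in counts.items()}

# Hypothetical genes, all 1 kb long (illustrative values, not real data).
lengths = {"geneA": 1000, "geneB": 1000, "geneC": 1000}

# geneA and geneB have identical counts in both samples; only geneC differs.
sample1 = {"geneA": 100, "geneB": 100, "geneC": 100}
sample2 = {"geneA": 100, "geneB": 100, "geneC": 800}

fpkm1 = fpkm_per_sample(sample1, lengths)
fpkm2 = fpkm_per_sample(sample2, lengths)

for gene in lengths:
    ratio = fpkm2[gene] / fpkm1[gene]
    print(f"{gene}: sample2 / sample1 FPKM ratio = {ratio:.2f}")
```

In this toy case geneA and geneB look roughly three-fold lower in sample 2 even though their counts did not change, purely because geneC takes up a larger share of the library. Between-sample normalisation methods such as edgeR's TMM or the median-of-ratios approach in DESeq2 are designed to be more robust to this kind of composition effect.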