What is RPKM/FPKM > 1 or 3 or 5?
22 months ago
Bangalore

Hello all,

I have a very basic question. In many papers and analyses, I see that only genes above a threshold such as RPKM/FPKM > 1, 3 or 5 are used. What is this threshold? What does it mean, and how is it calculated? I'm having trouble understanding this and finding papers or articles that explain it. Any help is appreciated.

Thanks, Susmita

For a nice explanation, also see StatQuest

14 months ago
Freiburg, Germany

The threshold itself is pretty arbitrary and should be based on your own data. In general, what people are trying to do with it is to look only at "expressed" genes, for some hopefully reasonable meaning of "expressed".

RPKM/FPKM is computed as follows:

RPKM/FPKM = "number of reads" / "length of gene or region in kb" / "total mapped reads in millions"

For paired-end data, substitute "number of fragments" for reads (this is the "F" in FPKM). You can also get these values from a number of programs, such as StringTie and RSEM.

And how do you decide which ones are the "expressed" genes?

Those which have their RPKM/FPKM above a certain threshold are considered "expressed".

Using an arbitrary cutoff on these expression values - as you say typically 1, 3 or 5.

Does this cutoff mean that all the genes kept in a particular sample have at least this RPKM?

Yes. You filter the RPKM values to keep only the genes with expression above that cutoff.
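
Applying the cutoff is just a filter on one sample's values. The following is a minimal sketch. It assumes the FPKM values are already in a dict keyed by gene ID, and it treats "above" as strictly greater than the cutoff. The function name, gene names and numbers are all made up for illustration.

```python
def expressed_genes(fpkm, cutoff=1.0):
    """Keep only the genes whose FPKM/RPKM is strictly above `cutoff`."""
    return {gene: value for gene, value in fpkm.items() if value > cutoff}


# Hypothetical FPKM values for a single sample.
sample = {"GENE_A": 0.4, "GENE_B": 2.7, "GENE_C": 15.0, "GENE_D": 4.1}
for cutoff in (1, 3, 5):
    print(cutoff, sorted(expressed_genes(sample, cutoff)))
# 1 ['GENE_B', 'GENE_C', 'GENE_D']
# 3 ['GENE_C', 'GENE_D']
# 5 ['GENE_C']
```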

Keep in mind, though, that because of the way these units are derived, the values are not comparable across samples.

To derive RPKM/FPKM, each sample is normalised only "within itself"; there is no cross-sample normalisation. Because of external factors that this method does not control for, a value of 10 in one sample is not equivalent to a value of 10 in another. For the same reason, these units are not suitable for differential expression analysis, and you should abandon them if your aim is to perform differential expression.
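
A toy example makes the "within itself" problem concrete. This is a hypothetical scenario, not real data. Genes 1 to 4 receive exactly the same read counts in both samples, while gene 5 is far more abundant in sample B. Because RPKM divides by each sample's own total, gene 1 ends up about five times lower in B. RPKM on its own cannot tell whether gene 1 really went down or gene 5 simply took up more of the library.

```python
def rpkm(counts, lengths_bp):
    """RPKM for one sample: reads / kb / millions of reads in that sample."""
    total_millions = sum(counts) / 1e6
    return [c / (l / 1e3) / total_millions for c, l in zip(counts, lengths_bp)]


lengths_bp = [1_000, 2_000, 1_500, 500, 3_000]
# Hypothetical raw counts: genes 1-4 identical, gene 5 much higher in B.
sample_a = [100, 200, 300, 400, 1_000]
sample_b = [100, 200, 300, 400, 9_000]

rpkm_a = rpkm(sample_a, lengths_bp)
rpkm_b = rpkm(sample_b, lengths_bp)
# Same count for gene 1, yet its RPKM differs about 5-fold between samples,
# because sample B's total (10,000 reads) is 5x sample A's (2,000 reads).
print(rpkm_a[0] / rpkm_b[0])
```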

What would you suggest instead?

Obtain the raw counts, if you can, and then use edgeR or DESeq2 to perform normalisation and differential expression comparisons.
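
To show what cross-sample normalisation of raw counts adds, here is a sketch of the median-of-ratios size-factor idea described for DESeq2. edgeR's default, TMM, is a different method. This is a simplified illustration, not DESeq2's code. The `size_factors` function is hypothetical, and the toy counts have genes 1 to 4 identical in both samples with only gene 5 differing. Because the median ignores the single outlying gene, both size factors come out as 1, so the normalised counts for genes 1 to 4 stay equal across samples.

```python
import math
from statistics import median


def size_factors(counts_by_sample):
    """Median-of-ratios size factors (simplified sketch of DESeq2's idea).

    counts_by_sample: one list of raw counts per sample, genes in the same order.
    """
    n_genes = len(counts_by_sample[0])
    # Per-gene geometric mean across samples, on the log scale; genes with a
    # zero count in any sample are skipped (their geometric mean would be 0).
    log_geo_means = []
    for g in range(n_genes):
        values = [sample[g] for sample in counts_by_sample]
        if all(v > 0 for v in values):
            log_geo_means.append((g, sum(math.log(v) for v in values) / len(values)))
    factors = []
    for sample in counts_by_sample:
        # Ratio of this sample's count to the gene's geometric mean, per gene.
        log_ratios = [math.log(sample[g]) - lgm for g, lgm in log_geo_means]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical raw counts: genes 1-4 identical, gene 5 much higher in B.
sample_a = [100, 200, 300, 400, 1_000]
sample_b = [100, 200, 300, 400, 9_000]
sf = size_factors([sample_a, sample_b])
normalised_a = [c / sf[0] for c in sample_a]
normalised_b = [c / sf[1] for c in sample_b]
print(sf)  # [1.0, 1.0]
```

Real analyses should use the packages themselves. They also handle dispersion estimation and the statistical testing, which this sketch does not attempt.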

19 months ago
United States

As mentioned, the purpose is to set a cutoff for what is considered "expressed". This is also where TPM (transcripts per million) started becoming more popular than RPKM/FPKM, since the aim is to quantify expression at the level of complete transcripts. What counts as a good cutoff is debated among analysis groups. The Sequencing Quality Control (SEQC) Consortium (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4810084/ and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4321899/) is an FDA-led group that was assembled because pharmaceutical companies were submitting RNA-Seq results, rather than microarray data, as evidence of expression. This group did a fairly thorough assessment of the consistency of RNA-Seq data and of reasonable cutoffs. They reported that expression as low as 1 FPKM could be verified by RT-PCR. It is also well known that variability in RNA-Seq data increases greatly at lower expression levels.
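
For comparison, TPM uses the same ingredients as RPKM/FPKM but in a different order. It divides by gene length first and by the per-million total second, so every sample's TPM values sum to one million. Below is a minimal sketch with hypothetical function names and counts. It also shows that a sample's FPKM values can be rescaled to TPM. TPM is still a within-sample measure, so it does not replace the count-based normalisation done by edgeR or DESeq2.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million for one sample."""
    # Step 1: length-normalise, giving reads per kilobase (RPK) for each gene.
    rpk = [c / (l / 1e3) for c, l in zip(counts, lengths_bp)]
    # Step 2: scale so that this sample's values sum to one million.
    per_million = sum(rpk) / 1e6
    return [x / per_million for x in rpk]


def fpkm_to_tpm(fpkm):
    """Rescale one sample's FPKM values so that they sum to one million."""
    total = sum(fpkm)
    return [f / total * 1e6 for f in fpkm]


# Hypothetical sample: three genes, 2,000,000 mapped reads in total.
counts = [500, 1_000, 1_998_500]
lengths_bp = [2_000, 500, 10_000]
values = tpm(counts, lengths_bp)
print(sum(values))  # one million, up to floating-point rounding
```

The two functions agree because, within one sample, FPKM is just RPK divided by a constant (the library size in millions). Normalising either to a total of one million gives the same TPM values.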
