Why are the FPKM values of gene expression profiles very close to zero?
8.6 years ago
jack ▴ 960

Hi all,

I have human gene expression profiles in FPKM scale. When I look at the expression distribution, the mean is close to zero. Does that make sense? The FPKM values for most genes are small and close to zero.

When I create a box plot of the expressed genes, the median is very close to zero. Is this normal, or might something be wrong with my data?

next-gen RNA-Seq • 2.9k views
8.6 years ago
TriS ★ 4.7k

If you plot a histogram of untransformed FPKM/RPKM data, you will see a tall peak close to zero and a long tail stretching to much higher values. People generally log2-transform the data to "fix" this skewed distribution; there are a few other options, each with its own pros and cons. The skew more or less reflects what happens in a cell: most transcripts are present at low concentrations, and a few (e.g. housekeeping genes) at much higher concentrations/read counts. For downstream analysis, however, the raw distribution is not ideal, since it puts a lot of weight on the low-count transcripts.
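To make the skew concrete, here is a minimal sketch using simulated data rather than a real expression profile. It assumes FPKM-like values can be roughly mimicked by a log-normal distribution; the function names (`simulate_fpkm`, `summarize`) and the parameters `mu=0.0`, `sigma=2.0` are made up for illustration and are not fitted to any dataset.

```python
import math
import random
import statistics


def simulate_fpkm(n_genes=10000, seed=0):
    """Draw hypothetical FPKM-like values from a log-normal distribution.

    mu=0.0 and sigma=2.0 are arbitrary illustration values (an assumption),
    chosen only to produce a peak near zero with a long right tail.
    """
    rng = random.Random(seed)
    return [rng.lognormvariate(0.0, 2.0) for _ in range(n_genes)]


def summarize(values):
    """Return mean, median and maximum of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "max": max(values),
    }


fpkm = simulate_fpkm()
# All simulated values are strictly positive, so no pseudocount is needed here.
log_fpkm = [math.log2(v) for v in fpkm]

raw_stats = summarize(fpkm)
log_stats = summarize(log_fpkm)

print("raw :", raw_stats)   # mean well above median, maximum far out in the tail
print("log2:", log_stats)   # mean and median close together
```

On the raw scale the few very large values pull the mean far above the median; after the log2 transform the two are close, which is why boxplots and histograms are usually drawn on the log scale.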


...this kind of reflects what happens in a cell: you will have most of the transcripts at low concentrations and a few (e.g. housekeeping) at higher concentrations/reads.

I agree. To emphasize this point, consider that a total RNA extract is about 95% rRNA (which comes from only a few genes!), with the remaining 5% split mostly between tRNAs and mRNAs.

Log-transforming your data (you might need pseudocounts, since the log of zero is undefined) before drawing the boxplot is a good idea.
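As a rough sketch of the pseudocount idea: add a small constant to every value before taking the log, so that genes with FPKM = 0 do not break the transform. The helper name `log2_with_pseudocount`, the default pseudocount of 1, and the toy values below are all assumptions for illustration; the right pseudocount depends on your data.

```python
import math
import statistics


def log2_with_pseudocount(values, pseudocount=1.0):
    """Return log2(value + pseudocount) for each value.

    The pseudocount keeps zero-expression genes defined:
    with pseudocount=1, an FPKM of 0 maps to log2(1) = 0.
    """
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    return [math.log2(v + pseudocount) for v in values]


# Toy FPKM values (made up): mostly small, with a few very large ones.
fpkm = [0.0, 0.0, 0.2, 0.5, 1.3, 4.0, 15.0, 250.0, 3000.0]
log_fpkm = log2_with_pseudocount(fpkm)

print("median raw :", statistics.median(fpkm))
print("median log2:", statistics.median(log_fpkm))
print("max raw    :", max(fpkm))
print("max log2   :", max(log_fpkm))
```

The transformed values are what you would pass to the boxplot. Note that a pseudocount of 1 compresses differences among values well below 1, so a smaller constant is sometimes preferred.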
