Question

FPKM To Transcript Quantification

0


10.3 years ago

biola ▴ 20

Hello Biostars, I'm quite new to bioinformatics, so I'm sorry if I ask something stupid. I'm stuck on a problem with RNA-Seq data manipulation. I have my samples and I want a quantity that measures their expression, without getting involved in the normalisation pipelines provided by edgeR or DESeq, since their results depend on the set of samples being analysed. Given that I can't normalise all my samples together because they belong to different treatments and conditions, I thought I would use the FPKM values from the output of RSEM, the algorithm used to perform the first part of the analysis.

Reading the paper "Transcriptome analyses of the human retina identify unprecedented transcript diversity..." (Farkas 2013), I found this statement: "[...]Using the standard of 1–4 RPKM being equal to one transcript/cell, this suggests that we have detected between 1 to 2500 transcripts, at a minimum per cell [54]. Approximately 50% of all expressed transcripts fall within the 5–25 RPKM (5–25 transcripts/cell) range."

I deduce that the assumption of 1-4 RPKM being equal to one transcript/cell derives from the Mortazavi 2008 paper, which introduced RPKM as a measure of expression, but I read that they had internal standards with which they could actually measure transcript levels and correlate them with RPKM.

Assuming I have understood the difference between FPKM and RPKM (I hope so), I honestly don't understand why Farkas and colleagues use RPKM, since they performed the analysis with an Illumina HiSeq 2000 instrument (which gives paired-end reads, so I presume they should use FPKM instead of RPKM). Given that I need a correspondence between FPKM and the number of transcripts per cell, just like what they describe in the paragraph I quoted above, what should I do? Should I treat my FPKM as an RPKM if and only if the library size is the same for my experiments and Farkas's or, better, the libraries from Mortazavi? Is there anywhere an assumption like "5 FPKM = 1 transcript/cell"?

thanks in advance!

fab

fpkm transcript expression • 4.1k views

ADD COMMENT • link updated 10.3 years ago by Devon Ryan 104k • written 10.3 years ago by biola ▴ 20

score 3 · Answer 1 · 2014-01-15

3


10.3 years ago

Devon Ryan 104k

No clue why they used RPKM instead of FPKM, though they'll normally be either identical or nearly identical.
Using a predefined conversion from RPKM (or FPKM) to transcripts per cell is completely and utterly useless. Any paper doing so without having run the necessary standards themselves should be automatically rejected.
Your reasoning for not using DESeq/edgeR/etc. (as stated at least) makes no sense. There's no reason you can't normalize everything at once if you're going to be comparing everything. I presume that you left out some important details here.
If you need to know how your FPKM values relate to actual transcript counts then you need to do spike-ins with known concentrations (the internal standards you referred to). There's no way around that unless you know the values for a useful range of genes/transcripts and can use those instead of the spike-ins.
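To make the spike-in idea concrete, here is a minimal sketch of how such a calibration could look. It assumes a log-log linear relationship between spike-in FPKM and known copies per cell, which is an assumption to check against your own data, not a given. The function names and all numbers below are hypothetical and made up purely for illustration; the resulting fit would only be valid for the experiment the spike-ins were run in.

```python
import math

def fit_loglog(fpkm, copies):
    """Least-squares fit of log10(copies) = intercept + slope * log10(fpkm)."""
    xs = [math.log10(v) for v in fpkm]
    ys = [math.log10(v) for v in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def fpkm_to_copies(value, slope, intercept):
    """Apply the fitted calibration to one FPKM value."""
    return 10 ** (intercept + slope * math.log10(value))

# Hypothetical spike-ins: measured FPKM and known copies per cell (made-up values).
spike_fpkm = [0.5, 5.0, 50.0, 500.0]
spike_copies = [0.2, 2.0, 20.0, 200.0]

slope, intercept = fit_loglog(spike_fpkm, spike_copies)
estimate = fpkm_to_copies(10.0, slope, intercept)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, 10 FPKM ~ {estimate:.2f} copies/cell")
```

In practice you would also want to look at the residuals and restrict predictions to the FPKM range actually covered by the spike-ins, since extrapolating outside it is not supported by the fit.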

ADD COMMENT • link 10.3 years ago by Devon Ryan 104k

0


I would guess that if splicing is accounted for, then fragments make more sense, but I am not sure. Also, while normalization is highly desirable, one can work with counts using Fisher's exact test.

ADD REPLY • link 10.3 years ago by Pavel Senin ★ 1.9k

0


Agreed on the splicing consideration. I hesitate to suggest Fisher's test, since it's often not testing what people naively think ("naively" only since the average biologist doesn't have much of a grasp of data analysis).
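For readers wondering what Fisher's test on counts actually evaluates, here is a small sketch. One common reading of the caveat (an assumption here, not something spelled out in the comment) is that a 2x2 table of "reads on this gene" versus "all other reads" in two libraries only asks whether the gene's share of reads differs between those two specific libraries. It treats every read as an independent draw and knows nothing about variability between biological replicates. The function name and the counts are hypothetical, and the two-sided p-value uses the usual convention of summing all tables no more probable than the observed one.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at most as probable as the observed one (small tolerance for float ties).
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

# Hypothetical counts: reads on one gene vs. all other reads, in two tiny made-up libraries.
gene_a, other_a = 30, 970
gene_b, other_b = 10, 990
p_value = fisher_exact_two_sided(gene_a, other_a, gene_b, other_b)
print(f"p = {p_value:.4g}")
```

A small p-value here says the two libraries differ in that gene's proportion of reads; it does not by itself say the two conditions differ, which is the question most people actually have.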