Question

Why are the TPM values not the same?

4.4 years ago

star ▴ 350

I would like to normalize my data with the TPM method, as explained at https://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained/:

TPM is very similar to RPKM and FPKM. The only difference is the order of operations. Here’s how you calculate TPM:

Divide the read counts by the length of each gene in kilobases. This gives you reads per kilobase (RPK).

Count up all the RPK values in a sample and divide this number by 1,000,000. This is your “per million” scaling factor.

Divide the RPK values by the “per million” scaling factor. This gives you TPM.

I used the code below, but I do not understand why the output is incorrect.

CODE:

RPK <- data.matrix(Data[-1] / Data$Length.Kbp)
TPM <- t(t(RPK)*1e6 / colSums(RPK))

Data:

                Length.Kbp    FB_1    FB_2    FB_3
1:15040-15500         0.46       0       4       0
1:108570-109500       0.93       1       5       0
1:248240-249110       0.87       2       1       1

RPK:

                      FB_1       FB_2       FB_3
1:15040-15500            0   8.695652          0
1:108570-109500   1.075269   5.376344          0
1:248240-249110   2.298851   1.149425   1.149425

TPM:

                        FB_1        FB_2        FB_3
1:15040-15500              0   2577162.0           0
1:108570-109500     70641.81    353209.1           0
1:248240-249110   2000000.00   1000000.0   1000000.0

whereas the value in the first row for FB_2 should be:

8.695652 * 1000000 / 15.221422 = 571277.2
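
Both numbers can be reproduced from the tables above. The sketch below rebuilds the three rows shown and assumes the posted TPM table came from dividing without any transpose; that is a guess, since the command that produced it is not shown. The names `wrong` and `right` are only illustrative. Without the transposes, R recycles the `colSums()` vector down the rows, so each region is divided by a different sample's total.

```r
# Rebuild the three example rows from the post
Data <- data.frame(
  Length.Kbp = c(0.46, 0.93, 0.87),
  FB_1 = c(0, 1, 2),
  FB_2 = c(4, 5, 1),
  FB_3 = c(0, 0, 1),
  row.names = c("1:15040-15500", "1:108570-109500", "1:248240-249110")
)

# Reads per kilobase: drop the length column, divide each row by its length
RPK <- data.matrix(Data[-1] / Data$Length.Kbp)

# Assumed original attempt: the length-3 vector is recycled down each column,
# so row i ends up divided by the total of sample i
wrong <- RPK * 1e6 / colSums(RPK)

# Transposed version: each column is divided by its own sample total
right <- t(t(RPK) * 1e6 / colSums(RPK))

round(wrong["1:15040-15500", "FB_2"])   # 2577162, as in the TPM table
round(right["1:15040-15500", "FB_2"])   # 571277, the expected value
colSums(right)                          # each sample totals 1e6
```

The third row of the posted table comes out as exactly 1e6 times the counts under the same assumption, because that row is divided by the FB_3 total, which equals 1 / 0.87.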

R RPKM TPM edgeR normalizing • 1.6k views

ADD COMMENT • link updated 4.4 years ago by Ram 43k • written 4.4 years ago by star ▴ 350

Did you try storing colSums2(RPK) in a vector and verifying a few values in it to ensure you're dividing by the right value? There is something odd about the third row - it seems to be exactly 1e6 x original_counts.

Also, your datasets don't conform to the code. If RPK <- data.matrix(Data / Data$Length.Kbp) is exactly what was run, then RPK would also have a column titled Length.Kbp with all values = 1. Did you remove that column?
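
A quick way to see the effect described here, using two of the rows from the question (the object name `with_length` is just for illustration):

```r
# Two example rows from the post
Data <- data.frame(
  Length.Kbp = c(0.46, 0.93),
  FB_1 = c(0, 1),
  FB_2 = c(4, 5)
)

# Dividing the whole data frame also divides Length.Kbp by itself
with_length <- data.matrix(Data / Data$Length.Kbp)
with_length[, "Length.Kbp"]   # every value is 1

# Dropping the first column beforehand keeps only the sample columns
RPK <- data.matrix(Data[-1] / Data$Length.Kbp)
colnames(RPK)                 # "FB_1" "FB_2"
```

If the column of ones were left in, it would also be counted in any later column-wise operation on the matrix, so it is worth removing before normalizing.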

ADD REPLY • link 4.4 years ago by Ram 43k

Thanks for your reply! Yes, I have removed it and edited the code now.

In my code I just used a transpose:

TPM <- t(t(RPK)*1e6 / colSums(RPK))

and it seems to work, but I don't know what exactly happens when transposing twice.
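
On what the two transposes do: R stores a matrix column by column and recycles a shorter vector in that same order, so `M / v` with `length(v) == nrow(M)` divides row i by `v[i]`. `t(RPK)` puts the samples in rows, so the per-sample sums line up with them, and the outer `t()` restores the regions-by-samples layout. A small sketch with made-up numbers (the matrix `m` is purely illustrative), with base R's `sweep()` shown as an equivalent that needs no transposing:

```r
# Toy matrix: 2 regions (rows) x 3 samples (columns), made-up values
m <- matrix(c(1, 3,
              2, 2,
              5, 5), nrow = 2,
            dimnames = list(c("r1", "r2"), c("s1", "s2", "s3")))

totals <- colSums(m)   # one total per sample: s1 = 4, s2 = 4, s3 = 10

# t(m) has samples as rows, so recycling pairs each row with its own total;
# the outer t() flips the result back to regions x samples
via_transpose <- t(t(m) / totals)

# sweep() divides along columns (MARGIN = 2) without any transposing
via_sweep <- sweep(m, 2, totals, "/")

all.equal(via_transpose, via_sweep)   # TRUE
colSums(via_transpose)                # every sample now sums to 1
```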

ADD REPLY • link 4.4 years ago by star ▴ 350

Are you sure you should be using colSums and not rowSums? You're dividing transposed-RPK by per-sample RPK sums, not per-region RPK sums. Try using rowSums instead.

ADD REPLY • link 4.4 years ago by Ram 43k

I want to divide each RPK value by the per-sample RPK sum, following the explanation below:

1) Divide the read counts by the length of each gene in kilobases. This gives you reads per kilobase (RPK).

2) Count up all the RPK values in a sample and divide this number by 1,000,000. This is your “per million” scaling factor.

3) Divide the RPK values by the “per million” scaling factor. This gives you TPM.
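
The three steps can be sketched in R roughly as follows. `counts_to_tpm` is a hypothetical helper name, not a function from any package, and it assumes a regions-by-samples count matrix plus one length in kilobases per region. The scaling factor comes from `colSums()` because each column is one sample.

```r
# Hypothetical helper: counts is a regions x samples matrix, length_kb has one
# length (in kilobases) per region
counts_to_tpm <- function(counts, length_kb) {
  # Step 1: reads per kilobase for every region in every sample
  rpk <- counts / length_kb
  # Step 2: per-sample "per million" scaling factor
  per_million <- colSums(rpk) / 1e6
  # Step 3: divide each sample's column by its own scaling factor
  sweep(rpk, 2, per_million, "/")
}

# Counts and lengths taken from the table in the question
counts <- matrix(c(0, 1, 2,
                   4, 5, 1,
                   0, 0, 1), nrow = 3,
                 dimnames = list(c("1:15040-15500", "1:108570-109500",
                                   "1:248240-249110"),
                                 c("FB_1", "FB_2", "FB_3")))
length_kb <- c(0.46, 0.93, 0.87)

tpm <- counts_to_tpm(counts, length_kb)
colSums(tpm)   # each sample totals 1e6 (up to floating-point rounding)
```

A useful sanity check for any TPM calculation is that every sample's column sums to one million, as in the last line.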

ADD REPLY • link 4.4 years ago by star ▴ 350

Please read those three statements and interpret them to get to the denominator you need to use. I can help you with specific questions, but I will not read English and translate it to reproducible code for you - you should be able to do that on your own.

ADD REPLY • link 4.4 years ago by Ram 43k

I have edited your post and updated the TPM object with the formula above. Going forward, please give us the exact code you use - it is impossible to help you when you withhold critical information.

ADD REPLY • link 4.4 years ago by Ram 43k