Survival plot comparing low and high expression of a gene
5.7 years ago
Biologist

Hi,

I wanted to make a survival plot comparing samples with low and high expression of a gene. I followed this cutpoint approach, using the maxstat package to divide samples into low and high groups. That tutorial used RSEM-normalised counts as the gene expression data.

I have raw counts from featureCounts, and I also have RPKM values.

First, I used the RPKM data to plot survival, and it looks like this: [Figure: survival plot between low- and high-expression groups, RPKM expression]. This gave p-value = 0.026.

Second, I converted the counts to normalised counts with DESeq2 and plotted survival again: [Figure: survival plot between low- and high-expression groups, normalised counts]. Here the p-value = 0.1.

Both plots show the same pattern and look unchanged to me, so why are the p-values so different? With RPKM the difference is significant, but with normalized counts it is not. What could be the reason?

Which gene expression units should I use to divide samples into low and high groups?

Tags: RNA-Seq, r, survival, geneexpression
5.7 years ago

But there is a very important difference between the plots: the "low" curve in the bottom plot is MUCH closer to the "high" curve. That is why the p-values differ. You can see this in the "Strata" plot, where there is a constant difference of 1 between the top and bottom sets of plots.


Oh yes, thank you. What could be the reason for that? Is it because of the different expression data?

And what would you recommend for dividing samples into low and high expression: normalized counts, RPKM, FPKM, or something else?


For a single gene it won't matter, unless you have isoform switching or something like that. If your gene-level metric is a summary of transcript-level metrics, then TPM is going to be the most useful.
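A short worked comparison of the standard definitions may help show why this holds for a single gene. The notation is assumed here, not taken from the thread: $k_{ij}$ is the read count of gene $i$ in sample $j$, $L_i$ the gene length, $N_j$ the total mapped reads of sample $j$, and $s_j$ a DESeq2-style size factor.

$$
\mathrm{RPKM}_{ij} = \frac{10^9\,k_{ij}}{L_i\,N_j},
\qquad
\mathrm{TPM}_{ij} = 10^6\,\frac{k_{ij}/L_i}{\sum_g k_{gj}/L_g},
\qquad
\tilde{k}_{ij} = \frac{k_{ij}}{s_j}
$$

For a fixed gene $i$ with one length $L_i$ in every sample, the length is a constant and cannot reorder the samples. Which side of a cutpoint a sample lands on is then driven only by $k_{ij}$ divided by a per-sample factor: $N_j$ for RPKM, $\sum_g k_{gj}/L_g$ for TPM, and $s_j$ for DESeq2 normalized counts. If isoform switching changes the effective length per sample, that constancy no longer holds.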


Hi Devon,

A small question: can TPM converted from the raw featureCounts counts be used for this analysis? I used the following function to convert them:

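tpm <- function(counts, lengths) {
  # counts: one sample's read counts; lengths: gene lengths in the same gene order
  rate <- counts / lengths  # reads per unit of gene length
  # sum(rate) is taken over the whole input, so call this on one sample at a time
  rate / sum(rate) * 1e6    # rescale so the sample's values sum to one million
}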

That looks right at least.
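As a sketch of applying that per-sample function to a whole genes x samples matrix, `apply()` can call it once per column; each resulting column should sum to one million. The matrix, gene names and lengths below are made-up toy values, not data from the thread.

```r
# Per-sample TPM: counts and lengths are numeric vectors over the same genes
tpm <- function(counts, lengths) {
  rate <- counts / lengths   # reads per unit of gene length
  rate / sum(rate) * 1e6     # rescale so the sample sums to one million
}

# Toy example (hypothetical numbers): 3 genes x 2 samples
counts_mat <- matrix(c(10, 20, 30,
                       5, 50, 45),
                     nrow = 3,
                     dimnames = list(c("geneA", "geneB", "geneC"), c("S1", "S2")))
gene_lengths <- c(1000, 2000, 500)

# apply() runs tpm() on each column (sample) and returns a genes x samples matrix
tpm_mat <- apply(counts_mat, 2, tpm, lengths = gene_lengths)
colSums(tpm_mat)   # each column sums to 1e6, up to floating-point rounding
```

Working column by column keeps each sample's scaling separate; passing the whole matrix to `tpm()` directly would divide by the sum over all samples at once.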

5.7 years ago

The curves look slightly different because in the first case the maxstat algorithm assigns 18 samples to the low group, whereas in the second case it assigns 20. This means that in the second plot the fraction of samples surviving is higher at most event times, which moves the blue curve up a little so that it comes closer to the yellow curve, giving a higher p-value.

> And what would you recommend to use for dividing samples into low and high based on expression, normalized counts or rpkm? or fpkm or any other?

If your choice of expression measure gives different results, the right question to ask is whether the results are robust, and in my view they are not. There may also be too little power: with a single cutpoint, low and high are separated by a thin boundary line, which blurs the distinction between the two groups. You could try categorizing samples as low | medium | high and check whether the low vs high comparison holds up with every expression measure. Robustness matters more than any particular method, because all of them essentially measure the same thing.
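A minimal sketch of that low | medium | high check in R, assuming a log-rank test from the `survival` package (`survdiff`) as the comparison. The helper `low_vs_high_p()` and the simulated data are hypothetical illustrations, not the tutorial's maxstat procedure.

```r
library(survival)  # recommended package that ships with standard R installs

# Hypothetical helper: split samples into expression tertiles,
# drop the middle third, and log-rank test low vs high.
low_vs_high_p <- function(expr, time, status) {
  # Rank first so tied expression values cannot create duplicate breaks
  grp <- cut(rank(expr, ties.method = "first"), breaks = 3,
             labels = c("low", "medium", "high"))
  keep <- grp != "medium"
  dat <- data.frame(time = time[keep], status = status[keep],
                    grp = droplevels(grp[keep]))
  fit <- survdiff(Surv(time, status) ~ grp, data = dat)
  # Log-rank chi-square with (number of groups - 1) degrees of freedom
  pchisq(fit$chisq, df = length(fit$n) - 1, lower.tail = FALSE)
}

# Simulated data for illustration only: one gene, 60 patients
set.seed(1)
n <- 60
time <- rexp(n, rate = 0.1)
status <- rbinom(n, size = 1, prob = 0.7)
raw <- rnbinom(n, mu = 500, size = 5)
lib_size <- runif(n, 2e7, 4e7)      # made-up library sizes
size_factor <- runif(n, 0.7, 1.3)   # made-up normalization factors

measures <- list(per_million = raw / lib_size * 1e6,
                 normalized = raw / size_factor)
p_values <- sapply(measures, low_vs_high_p, time = time, status = status)
p_values
```

With real data, each element of `measures` would be the gene's RPKM, TPM or normalized counts; broadly similar p-values across measures suggest a robust result, while large swings point back to the samples near the boundary.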


Thank you. I will do that with normalized counts. And do you think using RPKM for the cutpoint is a bad idea?


As I said above, RPKM and normalized counts measure the same thing, just in different ways. So if your choice of measure changes the result, dig deeper into why: look at which samples move from the high group to the low group, and why. There is no universal answer to whether RPKM is better than normalized counts. You have to get your hands dirty!
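One way to list the samples that switch groups is sketched below. Everything in it is an assumption for illustration: the simulated count matrix, the gene lengths, the base-R median-of-ratios size factors (a re-implementation of the idea behind DESeq2's normalization, not a DESeq2 call), and a simple median split standing in for the maxstat cutpoint.

```r
# Simulated genes x samples count matrix (made-up data, illustration only)
set.seed(42)
counts <- matrix(rnbinom(200 * 12, mu = 200, size = 3), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("S", 1:12)))
gene_len <- sample(500:5000, 200, replace = TRUE)  # hypothetical lengths (bp)

# Median-of-ratios size factors: a base-R sketch of the DESeq2 idea
size_factors <- function(m) {
  ok <- rowSums(m == 0) == 0                       # genes with no zero counts
  log_geo_mean <- rowMeans(log(m[ok, , drop = FALSE]))
  apply(m[ok, , drop = FALSE], 2,
        function(col) exp(median(log(col) - log_geo_mean)))
}

norm_counts <- sweep(counts, 2, size_factors(counts), "/")
# RPKM: counts * 1e9 / (gene length * total counts in the sample)
rpkm <- sweep(counts / gene_len * 1e9, 2, colSums(counts), "/")

# Median split for one gene, a simple stand-in for the maxstat cutpoint
g <- "gene1"
high_norm <- norm_counts[g, ] > median(norm_counts[g, ])
high_rpkm <- rpkm[g, ] > median(rpkm[g, ])
switched <- names(which(high_norm != high_rpkm))
switched   # samples whose group depends on the expression measure
```

With real data, `counts` would be the featureCounts matrix and the cutpoint would come from maxstat; the samples in `switched` are the ones worth inspecting, for example for an unusual library size or size factor.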
