Hello,
The survival plot based on the best separation of high- and low-expression samples of GPAM, with an expression cutoff of 23.6 FPKM, looks like the one below (this plot is from the Human Protein Atlas database):
[Figure: Survival plot between high and low samples of GPAM expression]
I took the GPAM FPKM data given in the above database and merged it with the clinical data. Everything is stored in a data frame, df:
head(df)
  times bcr_patient_barcode patient.vital_status FPKM
1   724        TCGA-2Y-A9GS                    1 30.3
2  1624        TCGA-2Y-A9GT                    1  5.6
3  1569        TCGA-2Y-A9GU                    0 26.6
4  2532        TCGA-2Y-A9GV                    1 18.4
5  1271        TCGA-2Y-A9GW                    1  4.7
6  2442        TCGA-2Y-A9GX                    0 19.4
I used the survminer package to find the cutpoint that divides the samples into low and high expression.
library(survminer)
surv_rnaseq.cut <- surv_cutpoint(
  df,
  time = "times",
  event = "patient.vital_status",
  variables = c("FPKM")
)
summary(surv_rnaseq.cut)
          cutpoint statistic
GPAM_FPKM     23.6  2.834408
Then the samples are categorized:
surv_rnaseq.cat <- surv_categorize(surv_rnaseq.cut)
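For comparison with the Human Protein Atlas plot, the same split can be reproduced by hand with a fixed cutoff. This is only a minimal sketch in base R: df_demo is a hypothetical toy data frame built from the six rows shown by head(df), and it assumes that samples strictly above the cutoff count as "high", which may not match how the database handles ties.

```r
# Toy data frame (hypothetical name df_demo) built from the head(df) rows above
df_demo <- data.frame(
  times = c(724, 1624, 1569, 2532, 1271, 2442),
  patient.vital_status = c(1, 1, 0, 1, 1, 0),
  FPKM = c(30.3, 5.6, 26.6, 18.4, 4.7, 19.4)
)

# Fixed cutoff reported by the database and by summary(surv_rnaseq.cut)
cutoff <- 23.6

# Assumption: values strictly above the cutoff are "high", the rest are "low"
df_demo$group <- ifelse(df_demo$FPKM > cutoff, "high", "low")

# Group sizes, to compare with the numbers at risk at time 0 in each plot
group_sizes <- table(df_demo$group)
print(group_sizes)
```

Running the same two lines on the full df gives the group sizes to compare against the risk table of the published plot.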
Then I plotted the data as follows:
library(survival)
library(RTCGA)
fit <- survfit(Surv(times, patient.vital_status) ~ FPKM,
               data = surv_rnaseq.cat)
pdf("Survival_high_vs_low.pdf", width = 10, height = 10)
ggsurvplot(
  fit,                          # survfit object with calculated statistics
  data = surv_rnaseq.cat,       # data used to fit the survival curves
  risk.table = TRUE,            # show the risk table
  pval = TRUE,                  # show the p-value of the log-rank test
  conf.int = TRUE,              # show confidence intervals for the point
                                # estimates of the survival curves
  xlim = c(0, 3000),            # narrow the X axis without affecting
                                # the survival estimates
  break.x.by = 1000,            # break the X axis into intervals of 1000
  break.y.by = 0.1,             # break the Y axis into intervals of 0.1
  ggtheme = theme_RTCGA(),      # customize plot and risk table with a theme
  risk.table.y.text.col = TRUE, # colour the risk table text annotations
  risk.table.y.text = FALSE     # show bars instead of names in the text
                                # annotations in the legend of the risk table
)
dev.off()
The survival plot I got looks like this:
[Figure: Survival plot with my analysis]
Basically, I used the same data that was used in the Human Protein Atlas database, but the plot from my analysis looks different from the plot in the database.
What could be the reason for this? The Kaplan-Meier statistics?
Any help is appreciated.
Yes, the trend looks the same, but in my plot the high-expression curve drops sharply after 2000 days, which I didn't see in the HPA plot. I used the same cutoff of 23.6 that they used, so I don't know where that small difference comes from.
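One thing that may help when reading the tail of the curve: the Kaplan-Meier estimate is multiplied by (1 - events / number at risk) at every event time, so a single death late in follow-up, when few patients remain at risk, produces a large step. The sketch below computes the estimate by hand in base R on the six rows from head(df). The helper km_by_hand is hypothetical, and it assumes patient.vital_status = 1 means the event (death) occurred.

```r
# Hypothetical helper: Kaplan-Meier estimate computed step by step
km_by_hand <- function(time, status) {
  event_times <- sort(unique(time[status == 1]))
  surv <- 1
  estimate <- numeric(length(event_times))
  at_risk <- integer(length(event_times))
  for (i in seq_along(event_times)) {
    t_i <- event_times[i]
    n_at_risk <- sum(time >= t_i)               # still under observation at t_i
    n_events <- sum(time == t_i & status == 1)  # deaths exactly at t_i
    surv <- surv * (1 - n_events / n_at_risk)   # product-limit update
    estimate[i] <- surv
    at_risk[i] <- n_at_risk
  }
  data.frame(time = event_times, n.risk = at_risk, survival = estimate)
}

# The six rows shown by head(df); assumes status 1 = death, 0 = censored
km_demo <- km_by_hand(
  time = c(724, 1624, 1569, 2532, 1271, 2442),
  status = c(1, 1, 0, 1, 1, 0)
)
print(km_demo)
```

In this toy example the last event happens with one patient at risk, so the curve falls all the way to zero in a single step. With a sample missing or censored differently, late steps like this can move visibly even when the overall trend is unchanged.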
You have one sample fewer (247 instead of 248 in one group). Also, did you remove everything with FPKM < 1?
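A quick way to look for the missing sample is to check which rows would be dropped before the fit and how many samples fall below FPKM 1. This is only a sketch on made-up values: df_check, its patient IDs, and its numbers are hypothetical, and whether a missing value is actually the cause here is an assumption. Rows with a missing time or status are removed by the default na.action (na.omit) when the model frame is built.

```r
# Hypothetical data with one missing follow-up time and one FPKM below 1
df_check <- data.frame(
  bcr_patient_barcode = c("P1", "P2", "P3", "P4"),  # made-up IDs
  times = c(724, 1624, NA, 2532),
  patient.vital_status = c(1, 1, 0, 1),
  FPKM = c(30.3, 5.6, 26.6, 0.4),
  stringsAsFactors = FALSE
)

# Rows with any missing value in the columns used for the survival fit
needed <- c("times", "patient.vital_status", "FPKM")
incomplete <- !complete.cases(df_check[, needed])
dropped <- df_check$bcr_patient_barcode[incomplete]
print(dropped)

# Number of samples that an FPKM < 1 filter would remove
n_low_fpkm <- sum(df_check$FPKM < 1, na.rm = TRUE)
print(n_low_fpkm)

# Duplicated barcodes left over from merging expression and clinical data
n_duplicated <- sum(duplicated(df_check$bcr_patient_barcode))
print(n_duplicated)
```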
Yes, I see that in my case I have one sample fewer. I guess it won't make much difference. In their analysis they removed genes with FPKM < 1; in my case I'm looking at only a single gene.