Question

heat map and Survival plot

5.9 years ago

Dataminer ★ 2.8k

Dear Community,

I am struggling with a problem: I have RNA-seq data from 24 samples, and I have survival data for the same samples as well. What I want is to identify the most interesting genes based on survival analysis and clustering.

The sample data that I have is

RNAseq (read counts):

Gene    S1_cellTr   S2_cellTr   S3_cellTr   S4_cellTr   S5_cellTr   S6_cellTr   S7_cellTr   S8_cellTr   S9_cellTr   S10_cellTr  S11_cellTr  S12_cellTr  S13_cellT   S14_cellT   S15_cellT   S16_cellT   S17_cellT   S18_cellT   S19_cellT   S20_cellT   S21_cellT   S22_cellT   S23_cellT   S24_cellT
gene1   725 230 2344    657 243 246 290 868 1722    1534    86  332 174 812 101 310 530 820 1380    200 520 1416    548 196
gene2   7   18  1   5   2   31  48  7   1   63  1   1   0   0   1   12  1   17  66  2   78  12  76  118
gene3   426 242 854 1490    336 308 20929   3515    858 1205    498 2941    959 555 113 185 1295    5579    9828    173 721 385 468 20169
gene4   11  43  1   110 19  10  3   95  3   86  167 11  274 3779    25  7   2   69  220 16  548 11  38  131
gene5   1567    1392    1224    2317    731 1436    1213    6124    5214    1861    416 1145    2666    2314    408 2939    1108    2178    4357    1699    3199    1462    1623    2056
gene6   1055    1695    209 502 1408    922 738 164 3699    700 589 31  1655    1351    481 2212    645 2023    2932    755 1278    937 193 229
gene7   77  596 185 248 40  145 396 62  437 678 128 103 47  2323    178 106 49  131 1797    110 329 125 244 64
gene8   130 415 1369    518 28  604 1693    311 961 383 959 610 1831    194 562 165 5   2228    1135    593 436 47  34  1170
gene9   8191    2975    3032    3497    3317    1682    3205    5322    13686   6487    2398    3127    2729    4431    1931    8238    2670    10236   10720   3501    11154   6477    14769   7201
gene10  1043    655 917 859 530 457 502 1447    1160    837 259 369 569 2930    412 1911    296 764 1096    722 1266    477 708 920
gene11  70  68  13  256 198 46  1443    1011    154 59  19  119 91  381 109 103 40  95  163 80  533 62  29  920
gene12  1404    755 3237    1653    2719    1460    958 11393   6973    2901    1853    2843    38  4402    411 614 3146    1829    2721    1600    464 3920    3094    2677
gene13  1115    1667    979 1791    424 878 1560    2180    3395    1262    924 1204    4778    1342    476 1779    1571    1827    2810    416 2524    828 1719    1617
gene14  225 2017    687 206 167 260 1519    157 396 365 88  93  122 1105    197 54  132 1944    1562    97  381 765 40  184
gene15  11  60  22  40  70  107 306 10  16  18  13  19  252 9   8   370 10  315 191 64  66  8   33  134

Clinical data

Sample  AGE CLINICAL_STAGE  DAYS_TO_BIRTH   DAYS_TO_COLLECTION  DAYS_TO_DEATH   DAYS_TO_INITIAL_PATHOLOGIC_DIAGNOSIS    DAYS_TO_LAST_FOLLOWUP   DFS_MONTHS  DFS_STATUS  OS_MONTHS   OS_STATUS
S1_cellTr   55  Stage IIB   -20114  903 NA  0   1005    39.26   DiseaseFree 39.26   LIVING
S2_cellTr   80  Stage IB    -29420  349 NA  0   377 22.54   DiseaseFree 22.54   LIVING
S3_cellTr   NA      NA  957 NA  NA  NA  NA      NA  
S4_cellTr   74  Stage IIIC2 -27317  332 50  0   NA  NA      1.64    DECEASED
S5_cellTr   60  Stage IA    -22169  2772    1106    0   NA  9.92    Recurred/Progressed 36.33   DECEASED
S6_cellTr   72  Stage IVB   -26556  199 NA  0   22  0.72    DiseaseFree 0.72    LIVING
S7_cellTr   60  Stage IIIA  -22010  323 NA  0   210 25.99   Recurred/Progressed 32.69   DECEASED
S8_cellTr   71  Stage IA    -26044  253 NA  0   248 11.5    Recurred/Progressed 18.4    LIVING
S9_cellTr   57  Stage II    -20841  158 NA  0   275 22.63   DiseaseFree 22.63   LIVING
S10_cellTr  68  Stage IIIC1 -25063  878 NA  0   1026    13.37   Recurred/Progressed 36.1    DECEASED
S11_cellTr  70  Stage IIB   -25812  274 NA  0   337 23.03   DiseaseFree 23.03   LIVING
S12_cellTr  66  Stage IB    -24260  975 NA  0   1095    62.98   Recurred/Progressed 77.27   DECEASED
S13_cellT   80  Stage IB    -29234  241 NA  0   469 27.14   DiseaseFree 27.14   LIVING
S14_cellT   74  Stage IB    -27077  NA  NA  0   239 22.47   Recurred/Progressed 30.98   DECEASED
S15_cellT   75  Stage IIIC1 -27581  156 NA  0   257 15.8    DiseaseFree 15.8    LIVING
S16_cellT   70  Stage IB    -25573  1416    NA  0   107 48.75   DiseaseFree 48.75   LIVING
S17_cellT   47  Stage IA    -17308  128 NA  0   241 31.96   DiseaseFree 31.96   LIVING
S18_cellT   56  Stage IB    -20797  715 NA  0   774 45.73   DiseaseFree 45.73   LIVING
S19_cellT   83  Stage II    -30351  101 NA  0   52  15.6    DiseaseFree 15.6    LIVING
S20_cellT   60  Stage IA    -22098  1067    NA  0   1064    68.33   DiseaseFree 68.33   LIVING
S21_cellT   53  Stage IV    -19448  254 72  0   NA  1.97    Recurred/Progressed 2.37    DECEASED
S22_cellT   61  Stage IIIC  -22378  86  NA  0   371 34.95   DiseaseFree 34.95   LIVING
S23_cellT   65  Stage IA    -23861  137 NA  0   23  15.77   Recurred/Progressed 18.07   LIVING
S24_cellT   82  Stage IIIC2 -30095  70  NA  0   166 12.58   DiseaseFree 12.58   LIVING

This is just a small subset of the data. The original data set has 80 genes and more than 400 samples.

Has anyone done this before, and could you guide me through an R script or something similar?

Many thanks in advance.

RNA-Seq Survival-plot K-M plot R • 2.8k views

updated 5.9 years ago by Kevin Blighe 87k • written 5.9 years ago by Dataminer ★ 2.8k

What are your covariates?

5.9 years ago by cpad0112 21k

This is all I have. Maybe OS can be used; I am very new to this kind of analysis.

5.9 years ago by Dataminer ★ 2.8k

When you are looking for significant genes, you need one or more conditions (factors) to compare against, e.g. treated vs. untreated, normal vs. proband, male vs. female, or a time series, alone or in combination.

5.9 years ago by cpad0112 21k

These are all female samples, and I don't have normals. What I can think of is that the median expression could perhaps be used to compare the deceased and living samples.

5.9 years ago by Dataminer ★ 2.8k

score 3 · Accepted Answer · 2018-06-03

For survival, you need an endpoint / event, usually progression-free survival (PFS) or overall survival (OS). You then compare various categorical variables in relation to these, such as tumour grade, tumour stage, etc. With genes, you can create categorical variables based on tertile or quartile expression. For example, you could compare the lower, mid, and upper tertile of EGFR expression in relation to OS. You can then derive a Cox proportional hazards p-value and hazard ratio for each tertile relative to a reference tertile. Take a look here: Cox proportional hazards model
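As a rough sketch of the tertile idea, assuming OS_MONTHS / OS_STATUS as the endpoint and using the gene1 counts from the question (S3_cellTr is dropped because it has no survival data), the grouping and a Kaplan-Meier estimate per tertile could look like this in plain Python. The helper names `tertile_groups` and `kaplan_meier` are made up for illustration, and raw counts are used without any normalisation. The Cox fit itself is not implemented here; in R that would typically be done with `coxph()` from the survival package.

```python
from statistics import quantiles

# (sample, gene1 read count, OS_MONTHS, event) copied from the tables in the
# question; event = 1 for DECEASED, 0 for LIVING (censored).
# S3_cellTr is omitted because it has no survival data.
records = [
    ("S1_cellTr", 725, 39.26, 0), ("S2_cellTr", 230, 22.54, 0),
    ("S4_cellTr", 657, 1.64, 1), ("S5_cellTr", 243, 36.33, 1),
    ("S6_cellTr", 246, 0.72, 0), ("S7_cellTr", 290, 32.69, 1),
    ("S8_cellTr", 868, 18.4, 0), ("S9_cellTr", 1722, 22.63, 0),
    ("S10_cellTr", 1534, 36.1, 1), ("S11_cellTr", 86, 23.03, 0),
    ("S12_cellTr", 332, 77.27, 1), ("S13_cellT", 174, 27.14, 0),
    ("S14_cellT", 812, 30.98, 1), ("S15_cellT", 101, 15.8, 0),
    ("S16_cellT", 310, 48.75, 0), ("S17_cellT", 530, 31.96, 0),
    ("S18_cellT", 820, 45.73, 0), ("S19_cellT", 1380, 15.6, 0),
    ("S20_cellT", 200, 68.33, 0), ("S21_cellT", 520, 2.37, 1),
    ("S22_cellT", 1416, 34.95, 0), ("S23_cellT", 548, 18.07, 0),
    ("S24_cellT", 196, 12.58, 0),
]


def tertile_groups(rows):
    """Split rows into low / mid / high groups by expression tertile."""
    cut1, cut2 = quantiles([r[1] for r in rows], n=3)
    groups = {"low": [], "mid": [], "high": []}
    for row in rows:
        key = "low" if row[1] <= cut1 else "mid" if row[1] <= cut2 else "high"
        groups[key].append(row)
    return groups


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # events and censored subjects both leave the risk set
    return curve


groups = tertile_groups(records)
for name, rows in groups.items():
    curve = kaplan_meier([r[2] for r in rows], [r[3] for r in rows])
    print(name, len(rows), [(t, round(s, 3)) for t, s in curve])
```

With only 23 usable samples each tertile holds 7 or 8 patients, so the curves are illustrative at best; the full data set with more than 400 samples is where this kind of split becomes informative.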

For the heatmap, there are many posts on Biostars about how to do that.
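A common preparation step before drawing a clustered heatmap (an assumption on my part, not something spelled out in this thread) is to log-transform the counts, scale each gene to z-scores across samples, and compute sample-to-sample distances as the input for hierarchical clustering. A minimal sketch in plain Python, using the first three genes and four samples from the question; `row_zscores` and `sample_distance` are illustrative names, and the actual clustering and drawing are left to whichever plotting library you use:

```python
from math import log2, sqrt
from statistics import mean, stdev

# First four samples and first three genes from the read-count table above.
samples = ["S1_cellTr", "S2_cellTr", "S3_cellTr", "S4_cellTr"]
counts = {
    "gene1": [725, 230, 2344, 657],
    "gene2": [7, 18, 1, 5],
    "gene3": [426, 242, 854, 1490],
}


def row_zscores(values):
    """log2(count + 1), then centre and scale one gene across samples."""
    logged = [log2(v + 1) for v in values]
    mu, sd = mean(logged), stdev(logged)
    # A gene with identical values in every sample gets all zeros.
    return [(x - mu) / sd if sd > 0 else 0.0 for x in logged]


z = {gene: row_zscores(vals) for gene, vals in counts.items()}


def sample_distance(i, j):
    """Euclidean distance between two samples over all scaled genes."""
    return sqrt(sum((row[i] - row[j]) ** 2 for row in z.values()))


n = len(samples)
dist = [[sample_distance(i, j) for j in range(n)] for i in range(n)]

for name, row in zip(samples, dist):
    print(name, [round(d, 2) for d in row])
```

Scaling per gene keeps highly expressed genes from dominating the colour range, and the distance matrix is what a hierarchical clustering routine would use to order the columns.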

Kevin