Question

Disease Free Survival Time & Event

3

13.3 years ago

Saman ▴ 260

I am trying to derive a discrete label (0/1) for a classification (supervised learning) task from two pieces of information available for each patient, dfs.t and dfs.e, in several cancer-related studies. My main concern is how researchers fill in the dfs.e column for patients. This is my understanding:

dfs.e = 0: no relapse/recurrence/distant-metastasis within dfs.t time frame
dfs.e = 1: relapse/recurrence/distant metastasis/death-caused-by-cancer occurred at dfs.t

Is this interpretation right? I was wondering if there is any conventional way for dealing with data like this.

Thanks in advance,

--Saman

disease classification • 5.3k views

ADD COMMENT • link updated 13.1 years ago by David Quigley 11k • written 13.3 years ago by Saman ▴ 260

score 2 · Answer 1 · 2010-12-17

2

13.3 years ago

David Quigley 11k

DFS usually means "disease-free survival" in a cancer context. The exact meaning can be tricky to pin down, though; some groups don't consider a local (or contralateral) recurrence a metastasis, so you have to read the paper carefully.

These data are often used for Kaplan-Meier analysis, where you need to know:

If the event occurred, when did it occur?
If the event did not occur, what is the last time point for which there is follow-up?

Usually this is stored as two columns (dfs.e and dfs.t, or some other names), as you've described. There is also usually a third column with some distinguishing category (treated/untreated, predicted good outcome/predicted bad outcome, etc.). The "survival" package in R is useful for this analysis.
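To make the two-column layout concrete, here is a minimal sketch of the Kaplan-Meier product-limit estimate in plain Python. It is not the R "survival" package's implementation; the function name `kaplan_meier` and the toy values are hypothetical, and it assumes dfs.e is coded 1 for an event and 0 for a censored patient.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time per patient (dfs.t)
    events -- 1 if the event occurred at that time, 0 if censored (dfs.e)
    """
    event_counts = Counter(t for t, e in zip(times, events) if e == 1)
    exit_counts = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the fraction surviving this time point.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Events and censored patients both drop out of the risk set after t.
        at_risk -= exit_counts[t]
    return curve


# Hypothetical toy data, not taken from any study.
dfs_t = [2.0, 3.0, 3.0, 5.0, 8.0, 8.0]
dfs_e = [1, 0, 1, 1, 0, 0]
for t, s in kaplan_meier(dfs_t, dfs_e):
    print(t, round(s, 3))
# 2.0 0.833
# 3.0 0.667
# 5.0 0.444
```

Censored patients never lower the curve directly; they only shrink the number at risk for later events, which is why both columns are needed.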

ADD COMMENT • link 13.3 years ago by David Quigley 11k

0

I have seen the survival package and have used it to plot KM curves and to run tests comparing survival times in two studies.

What I am interested in is dividing patients into two meaningful, distinct groups based on their dfs.t and dfs.e values. I cannot find anything useful on this!

ADD REPLY • link 13.3 years ago by Saman ▴ 260

0

From dfs.t and dfs.e you can extract "did/did not recur in a given time interval", which is a useful start. The time interval shouldn't be chosen casually; look at clinical papers or talk to a specialist to find a meaningful one. Finding the features to build your classifier is up to you...
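One way to turn "did/did not recur in a given time interval" into a 0/1 label is sketched below. The function name `label_by_cutoff`, the cutoff of 5.0 and the toy values are hypothetical. Returning `None` for patients censored before the cutoff is an assumption of this sketch: their status at the cutoff is unknown, and dropping them may bias the resulting training set.

```python
def label_by_cutoff(dfs_t, dfs_e, cutoff):
    """Label each patient by whether the event occurred within `cutoff`.

    1    -- event observed at or before the cutoff
    0    -- known to be event-free through the cutoff
    None -- censored before the cutoff, so the status is unknown
    """
    labels = []
    for t, e in zip(dfs_t, dfs_e):
        if e == 1 and t <= cutoff:
            labels.append(1)
        elif t >= cutoff:
            # Either censored at/after the cutoff or the event came later.
            labels.append(0)
        else:
            labels.append(None)
    return labels


# Hypothetical toy data; times are in years and the cutoff is 5 years.
dfs_t = [1.5, 6.0, 3.0, 7.2, 4.9]
dfs_e = [1, 0, 0, 1, 1]
print(label_by_cutoff(dfs_t, dfs_e, cutoff=5.0))
# [1, 0, None, 0, 1]
```

Note that a patient with dfs.e = 1 after the cutoff is labelled 0 here, because they were still disease-free when the window closed.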

ADD REPLY • link 13.3 years ago by David Quigley 11k

0

I agree with David that looking to the clinical literature is a good idea. For some diseases, 5-year or 10-year DFS is the metric that everyone cares and talks about. Another idea is to plot just the frequency of events (where dfs.e=1) versus their time (dfs.t). Is there a linear accumulation of events over time? Or is there a point at which the rate of accumulation of events changes? That could inform your choice of cutoff.
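The event-accumulation idea can be sketched roughly as follows. The function name `cumulative_events` and the toy values are hypothetical; the output pairs could be handed to any plotting library. Raw counts like these ignore the fact that censoring shrinks the number of patients still at risk, so treat the shape as a rough guide only.

```python
def cumulative_events(dfs_t, dfs_e):
    """Return [(time, cumulative number of events)] for patients with dfs.e == 1."""
    event_times = sorted(t for t, e in zip(dfs_t, dfs_e) if e == 1)
    return [(t, count) for count, t in enumerate(event_times, start=1)]


# Hypothetical toy data, not taken from any study.
dfs_t = [1.0, 2.0, 2.5, 3.0, 6.0, 9.0]
dfs_e = [1, 1, 0, 1, 1, 0]
for t, count in cumulative_events(dfs_t, dfs_e):
    print(t, count)
# 1.0 1
# 2.0 2
# 3.0 3
# 6.0 4
```

In this toy example events accumulate quickly up to time 3 and then slow down, which is the kind of change in rate that might suggest a cutoff.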

ADD REPLY • link 12.2 years ago by Obi Griffith 20k