Disease Free Survival Time & Event
13.3 years ago
Saman ▴ 260

I am trying to derive a binary label (0/1) for a supervised classification task from two fields available for each patient, dfs.t and dfs.e, across several cancer-related studies. My main concern is how researchers fill in the dfs.e column for patients. This is how I understand it:

  • dfs.e = 0: no relapse/recurrence/distant metastasis within the dfs.t time frame

  • dfs.e = 1: relapse/recurrence/distant metastasis/death-caused-by-cancer occurred at dfs.t

Is this interpretation right? I was wondering if there is any conventional way for dealing with data like this.

Thanks in advance,

--Saman

disease classification • 5.3k views
13.3 years ago

DFS usually means "disease-free survival" in a cancer context. The exact meaning can be tricky to pin down, though; some groups don't consider a local (or contralateral) recurrence a metastasis, so you have to read the paper carefully.

These data are often used for Kaplan-Meier analysis, where you need to know

  1. If the event occurred, when did it occur?

  2. If the event did not occur, what is the last time point for which there is follow-up?

Usually these are two columns (dfs.e and dfs.t, or some other names), as you've described. There is also usually a third column with some distinguishing category (treated/untreated, predicted good outcome/predicted bad outcome, etc.). The "survival" package in R is useful for this analysis.
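To make the two-column layout concrete, here is a minimal sketch of the Kaplan-Meier estimate in plain Python (not the R survival package). The function name `kaplan_meier` and the sample values are made up for illustration; it assumes dfs.e = 1 marks an observed event and dfs.e = 0 marks a patient censored at the last follow-up.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time per patient (the dfs.t column)
    events -- 1 if the event was observed at that time, 0 if censored (dfs.e)
    """
    observed = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            # Patients censored at t still count as at risk at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical data, times in months.
dfs_t = [6, 12, 12, 20, 30, 30, 45, 60]
dfs_e = [1, 0, 1, 1, 0, 1, 0, 0]
for t, s in kaplan_meier(dfs_t, dfs_e):
    print(t, round(s, 3))
```

Censored patients never lower the curve directly; they only shrink the risk set for later event times, which is why both columns are needed.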


I have seen the survival package and have used it to plot KM curves and run tests comparing survival times between two studies.

What I am interested in is dividing patients into two meaningful, distinct groups based on their dfs.t and dfs.e values. I cannot find anything useful in this regard!


From dfs.t and dfs.e you can extract "did/did not recur in a given time interval", which is a useful start. Selecting the time interval shouldn't be done casually; look at clinical papers or talk to a specialist to find a meaningful interval. Finding the features to build your classifier is up to you...
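A rough sketch of that "did/did not recur in a given interval" labelling, in Python. The function name `label_at_horizon` and the 60-month horizon are assumptions for illustration only, not a recommended cutoff. One thing to watch: a patient with dfs.e = 0 whose follow-up ended before the horizon cannot be assigned to either class, so the sketch returns None for them.

```python
def label_at_horizon(dfs_t, dfs_e, horizon):
    """Binary label for one patient at a fixed time horizon.

    Returns 1 if the event occurred at or before `horizon`,
    0 if the patient was known event-free through `horizon`,
    None if the patient was censored before `horizon` (status unknown).
    """
    if dfs_e == 1 and dfs_t <= horizon:
        return 1
    if dfs_t >= horizon:
        # Either still event-free at the horizon, or the event came later.
        return 0
    return None


# Hypothetical patients as (dfs.t in months, dfs.e), with an assumed 60-month horizon.
patients = [(14, 1), (72, 0), (80, 1), (30, 0)]
labels = [label_at_horizon(t, e, horizon=60) for t, e in patients]
print(labels)  # [1, 0, 0, None]
```

Patients labelled None would need to be dropped or handled with a survival-aware method; treating them as 0 would count incomplete follow-up as "no recurrence".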


I agree with David that looking to the clinical literature is a good idea. For some diseases, 5-year or 10-year DFS is the metric that everyone cares and talks about. Another idea is to plot just the frequency of events (where dfs.e=1) against their time (dfs.t). Do events accumulate linearly over time, or is there a point at which the rate of accumulation changes? That could inform your choice of cutoff.
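A small Python sketch of that idea: tabulate the running count of observed events against time, then plot or eyeball it for a change in slope. The function name `cumulative_events` and the sample values are hypothetical. Note that raw counts ignore the shrinking number of patients still under follow-up, so a flattening late in the series may partly reflect censoring.

```python
def cumulative_events(dfs_t, dfs_e):
    """Return [(time, cumulative number of observed events up to that time)]."""
    event_times = sorted(t for t, e in zip(dfs_t, dfs_e) if e == 1)
    running = {}
    for count, t in enumerate(event_times, start=1):
        running[t] = count  # tied times keep the highest count
    return sorted(running.items())


# Hypothetical data, times in months.
dfs_t = [6, 12, 12, 20, 30, 30, 45, 60]
dfs_e = [1, 0, 1, 1, 0, 1, 0, 0]
for t, n in cumulative_events(dfs_t, dfs_e):
    print(t, n)
```

The resulting pairs can be passed to any plotting tool as a step curve.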
