Exploiting Censored Information in Self-Training for Time-to-Event Prediction

Bibliographic Details
Main Authors: Fateme Nateghi Haredasht, Kazeem Adesina Dauda, Celine Vens
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10239393/
Description
Summary: A common problem in medical applications is predicting the time until an event of interest, such as the onset of a disease, tumor recurrence, or mortality. Traditionally, classical survival analysis techniques have been used to address this problem. However, these techniques are of limited use when considering nonlinear and interaction effects among biomarkers, and high profiling survival datasets. Although supervised machine learning techniques have shown some advantages over standard statistical methods in handling high-dimensional datasets, their application to survival analysis, particularly in the context of feature-based approaches, is at best limited. A major reason for this is the difficulty of processing censored data, which is a common component of survival analysis. In this paper, we transform the time-to-event prediction problem into a semi-supervised regression problem. We use a self-training wrapper approach, in which an outer layer guides the iterative refinement of predictions. This approach enhances the performance of our model by leveraging confident predictions on censored instances. The self-training wrapper is applied with random survival forests as the base learner. Censored observations are treated as partially labeled observations, since their predicted time (target value) should exceed the censoring time. The algorithm first builds a base model over the observed instances and then iteratively augments them with highly confident predictions over the censored set, using a smart stopping criterion based on the censoring time. The proposed approach has been evaluated and compared on fifteen real-world survival analysis datasets, including clinical and high-dimensional data. The ability of the proposed approach to integrate partial supervision information within a semi-supervised learning strategy has enabled it to achieve competitive performance compared with baseline models, particularly in the high-dimensional regime.
ISSN: 2169-3536
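
Illustrative sketch: The self-training loop described in the Summary can be sketched roughly as follows. This is not the authors' implementation. It swaps the random survival forest for a small k-nearest-neighbour regressor so that the example needs only the Python standard library, and it assumes a simple confidence measure (the spread of the neighbours' targets) and a simple stopping rule (stop when no censored instance is accepted). The names `knn_predict`, `self_train`, `max_spread`, and `max_rounds` are hypothetical, and the toy data is invented for illustration.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def knn_predict(train_x: List[Vector], train_t: List[float],
                x: Vector, k: int = 3) -> Tuple[float, float]:
    """Stand-in base learner: mean and spread of the k nearest targets."""
    ranked = sorted(zip(train_x, train_t),
                    key=lambda pair: math.dist(pair[0], x))
    nearest = [t for _, t in ranked[:k]]
    return statistics.fmean(nearest), statistics.pstdev(nearest)


def self_train(observed_x: List[Vector], observed_t: List[float],
               censored_x: List[Vector], censored_c: List[float],
               k: int = 3, max_spread: float = 1.0,
               max_rounds: int = 10):
    """Augment the observed set with confident pseudo-labels for censored rows."""
    train_x, train_t = list(observed_x), list(observed_t)
    remaining = list(range(len(censored_x)))
    adopted: Dict[int, float] = {}
    for _ in range(max_rounds):
        accepted = []
        for i in remaining:
            pred, spread = knn_predict(train_x, train_t, censored_x[i], k)
            # Partial label: a valid prediction must exceed the censoring time.
            # Assumed confidence rule: neighbours must agree within max_spread.
            if pred > censored_c[i] and spread <= max_spread:
                accepted.append((i, pred))
        if not accepted:
            break  # assumed stopping rule: nothing new to add
        for i, pred in accepted:
            train_x.append(censored_x[i])
            train_t.append(pred)
            adopted[i] = pred
            remaining.remove(i)
    return train_x, train_t, adopted


# Toy data (invented): one feature, event times decrease as it grows.
observed_x = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
observed_t = [10.0, 9.0, 8.0, 7.0, 6.0]
censored_x = [(0.5,), (3.5,)]
censored_c = [5.0, 12.0]  # censoring times

train_x, train_t, adopted = self_train(observed_x, observed_t,
                                       censored_x, censored_c)
print(sorted(adopted))
```

In this toy run, the first censored instance is pseudo-labeled because its prediction exceeds its censoring time, while the second is left out because its prediction stays below its censoring time. A real base learner would be refitted on the augmented set in each round; the k-nearest-neighbour stand-in does this implicitly by searching the grown training set.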