Analysis of complex survival data subject to semi-competing risks
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis |
Language: | English |
Published: | 2019 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/102662 http://hdl.handle.net/10220/47771 |
Summary: | This thesis is devoted to developing novel methods for the analysis of complex survival data subject to semi-competing risks. In many clinical trials, multiple time-to-event endpoints, including the primary endpoint (e.g., time to death) and secondary endpoints (e.g., progression-related endpoints), are commonly used to determine treatment efficacy. These endpoints are often biologically related. A secondary endpoint paired with a primary endpoint arises often in medical research and forms semi-competing risks: the secondary (nonterminal) endpoint is censored by the primary (terminal) endpoint when the primary endpoint occurs first, but not vice versa. There has been longstanding research interest in the analysis of semi-competing risks data.
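The one-directional censoring described above can be illustrated with a minimal sketch. It assumes latent times `t1` (nonterminal), `t2` (terminal), and an independent censoring time `c`; these names and the helper `observe_semicompeting` are hypothetical and are not taken from the thesis.

```python
def observe_semicompeting(t1, t2, c):
    """Map latent times to observed semi-competing risks data for one subject.

    t1: latent nonterminal (secondary) event time
    t2: latent terminal (primary) event time
    c:  independent censoring time (an assumption of this sketch)
    """
    # The terminal event can only be censored by c.
    y2 = min(t2, c)
    d2 = int(t2 <= c)
    # The nonterminal event is censored by whichever of t2 and c comes first.
    y1 = min(t1, y2)
    d1 = int(t1 <= y2)
    return y1, d1, y2, d2


# Three hypothetical subjects given as (t1, t2, c).
subjects = [(2.0, 3.0, 10.0), (5.0, 1.0, 10.0), (4.0, 9.0, 6.0)]
observed = [observe_semicompeting(*s) for s in subjects]
```

For the second subject the terminal event occurs first, so the nonterminal time is censored at the terminal time. For the third subject the nonterminal event is observed even though the terminal event is later censored.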
We focus on the study of complex survival data, including clustered data, data subject to two sets of semi-competing risks, and data with ultrahigh-dimensional covariates.
In many biomedical studies, subjects may also be nested within clusters, such as patients in a multi-center study, leading to possible association among event times due to unobserved factors shared across subjects. Methods for analyzing clustered event times with semi-competing risks are therefore needed. We propose a flexible semiparametric modeling framework in which a copula model is employed for the joint distribution of the nonterminal and terminal event times, and their marginal distributions are modeled by Cox proportional hazards (PH) models with random effects to incorporate the association of event times within clusters. We develop a nonparametric maximum likelihood estimation procedure via a Monte Carlo EM algorithm and establish desirable asymptotic properties of the resulting estimators.
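One way to write down a model of this kind is sketched here. The notation is an assumption made for illustration: $i$ indexes clusters, $j$ indexes subjects, $Z_{ij}$ are covariates, $b_i = (b_{i1}, b_{i2})$ are cluster-level random effects, and $\Lambda_{0k}(t) = \int_0^t \lambda_{0k}(s)\,ds$. The summary does not state the copula family or the random-effect structure, so the Clayton copula appears only as an example.

```latex
\begin{align*}
  \lambda_k(t \mid Z_{ij}, b_i) &= \lambda_{0k}(t)\,\exp\!\left(\beta_k^\top Z_{ij} + b_{ik}\right), \qquad k = 1, 2, \\
  S_k(t \mid Z_{ij}, b_i) &= \exp\!\left\{-\Lambda_{0k}(t)\,\exp\!\left(\beta_k^\top Z_{ij} + b_{ik}\right)\right\}, \\
  P(T_{1ij} > t_1,\ T_{2ij} > t_2 \mid Z_{ij}, b_i) &= C_\theta\!\left\{S_1(t_1 \mid Z_{ij}, b_i),\ S_2(t_2 \mid Z_{ij}, b_i)\right\}, \qquad 0 \le t_1 \le t_2, \\
  C_\theta(u, v) &= \left(u^{-\theta} + v^{-\theta} - 1\right)^{-1/\theta}, \qquad \theta > 0 \quad \text{(Clayton, as an example)}.
\end{align*}
```

Here $k = 1$ denotes the nonterminal event and $k = 2$ the terminal event. The joint survival function is specified only on the region $t_1 \le t_2$ because the nonterminal event cannot be observed after the terminal event.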
In addition, multiple endpoints, including the primary endpoint and two secondary endpoints, are often encountered in clinical trials. These give rise to data that are subject to two sets of semi-competing risks. We propose a novel statistical approach that jointly models such data via a pair of copulas to account for multiple dependence structures, while the marginal distribution of each endpoint is formulated by a Cox PH model. We also develop an estimation procedure based on a pseudo-likelihood method.
We also consider semi-competing risks data collected together with ultrahigh-dimensional gene features in modern cancer studies, where the number of gene features p is far larger than the sample size n.
Microarray-based expression studies have demonstrated that gene expression patterns are associated with outcomes of breast cancer patients. Identifying the important gene features plays an essential role in disease classification, survival prediction, and the development of precision medicine. We aim to identify the important features that strongly affect both the nonterminal and the terminal endpoints. We propose a model-free screening method based on ranking the correlation between each feature and the joint survival function of the two endpoints. The method incorporates the relationship between the two endpoints and is simple to compute. We establish the ranking consistency and sure independent screening properties of the proposed method.
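The general shape of a marginal ranking screen can be sketched as follows. This is not the thesis's statistic: as a stand-in, it ranks features by the absolute Pearson correlation between each feature and the empirical joint survival function evaluated at each subject's own pair of times, and it assumes that both times are fully observed (no censoring). The function names, the simulated data, and the choice of `keep=5` are all hypothetical.

```python
import math
import random


def pearson(a, b):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    denom = math.sqrt(va * vb)
    return cov / denom if denom > 0 else 0.0


def joint_survival_at_own_times(y1, y2):
    """Empirical P(Y1 > y1_i, Y2 > y2_i) for each subject i (no censoring assumed)."""
    n = len(y1)
    return [
        sum(1 for k in range(n) if y1[k] > y1[i] and y2[k] > y2[i]) / n
        for i in range(n)
    ]


def screen(features, y1, y2, keep):
    """Rank features by |corr(feature, joint survival)| and keep the top ones."""
    s = joint_survival_at_own_times(y1, y2)
    p = len(features[0])
    utility = [abs(pearson([row[j] for row in features], s)) for j in range(p)]
    ranked = sorted(range(p), key=lambda j: utility[j], reverse=True)
    return ranked[:keep]


# Toy data with p > n: only feature 0 affects the two event times.
random.seed(0)
n, p = 150, 400
features = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y1 = [random.expovariate(math.exp(1.5 * row[0])) for row in features]
y2 = [t + random.expovariate(math.exp(1.5 * row[0])) for t, row in zip(y1, features)]
selected = screen(features, y1, y2, keep=5)
```

Each feature is scored on its own, so the cost grows linearly in p. That is what makes screens of this type practical when p is far larger than n. Handling censored observations would require a proper estimator of the joint survival function, which this sketch omits.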
Extensive simulation studies have been carried out, and the results show promising finite-sample performance of all proposed methods. Analysis of real data sets from various medical studies demonstrates the practical utility of our proposed methods. |
---|