Predicting venous thromboembolism (VTE) risk in cancer patients using machine learning

Abstract Background The association between cancer and venous thromboembolism (VTE) is well‐established with cancer patients accounting for approximately 20% of all VTE incidents. In this paper, we have performed a comparison of machine learning (ML) methods to traditional clinical scoring models fo...

Bibliographic Details
Main Authors:	Samir Khan Townsley, Debraj Basu, Jayneel Vora, Ted Wun, Chen‐Nee Chuah, Prabhu R. V. Shankar
Format:	Article
Language:	English
Published:	Wiley 2023-08-01
Series:	Health Care Science
Subjects:	binary classification; cancer; machine learning pipeline; VTE
Online Access:	https://doi.org/10.1002/hcs2.55

_version_	1827861668641636352
author	Samir Khan Townsley, Debraj Basu, Jayneel Vora, Ted Wun, Chen‐Nee Chuah, Prabhu R. V. Shankar
collection	DOAJ
description	Abstract
Background: The association between cancer and venous thromboembolism (VTE) is well‐established with cancer patients accounting for approximately 20% of all VTE incidents. In this paper, we have performed a comparison of machine learning (ML) methods to traditional clinical scoring models for predicting the occurrence of VTE in a cancer patient population, identified important features (clinical biomarkers) for ML model predictions, and examined how different approaches to reducing the number of features used in the model impact model performance.
Methods: We have developed an ML pipeline including three separate feature selection processes and applied it to routine patient care data from the electronic health records of 1910 cancer patients at the University of California Davis Medical Center.
Results: Our ML‐based prediction model achieved an area under the receiver operating characteristic curve of 0.778 ± 0.006 (mean ± SD) when trained on a set of 15 features. This result is comparable with the model performance when trained on all features in our feature pool [0.779 ± 0.006 (mean ± SD) with 29 features]. Our result surpasses the most validated clinical scoring system for VTE risk assessment in cancer patients by 16.1%. We additionally found cancer stage information to be a useful predictor after all performed feature selection processes despite not being used in existing score‐based approaches.
Conclusion: From these findings, we observe that ML can offer new insights and a significant improvement over the most validated clinical VTE risk scoring systems in cancer patients. The results of this study also allowed us to draw insight into our feature pool and identify the features that could have the most utility in the context of developing an efficient ML classifier. While a model trained on our entire feature pool of 29 features significantly outperformed the traditionally used clinical scoring system, we were able to achieve an equivalent performance using a subset of only 15 features through strategic feature selection methods. These results are encouraging for potential applications of ML to predicting cancer‐associated VTE in clinical settings such as in bedside decision support systems where feature availability may be limited.
first_indexed	2024-03-12T13:42:48Z
format	Article
id	doaj.art-1e6a40e63a724c609814631828474f6e
institution	Directory Open Access Journal
issn	2771-1757
language	English
last_indexed	2024-03-12T13:42:48Z
publishDate	2023-08-01
publisher	Wiley
record_format	Article
series	Health Care Science
spelling	doaj.art-1e6a40e63a724c609814631828474f6e; 2023-08-23T11:16:21Z; eng; Wiley; Health Care Science; 2771-1757; 2023-08-01; volume 2, issue 4, pages 205-222; 10.1002/hcs2.55; Predicting venous thromboembolism (VTE) risk in cancer patients using machine learning
Author affiliations:
Samir Khan Townsley: Department of Electrical and Computer Engineering, University of California, Davis, California, USA
Debraj Basu: Department of Electrical and Computer Engineering, University of California, Davis, California, USA
Jayneel Vora: Department of Computer Science, University of California, Davis, California, USA
Ted Wun: School of Medicine, Davis Health, University of California, Sacramento, California, USA
Chen‐Nee Chuah: Department of Electrical and Computer Engineering, University of California, Davis, California, USA
Prabhu R. V. Shankar: School of Medicine, Davis Health, University of California, Sacramento, California, USA
title	Predicting venous thromboembolism (VTE) risk in cancer patients using machine learning
topic	binary classification; cancer; machine learning pipeline; VTE
url	https://doi.org/10.1002/hcs2.55

Similar Items