Explainable Machine Learning Model to Prediction EGFR Mutation in Lung Cancer

Bibliographic Details
Main Authors: Ruiyuan Yang, Xingyu Xiong, Haoyu Wang, Weimin Li
Format: Article
Language: English
Published: Frontiers Media S.A. 2022-06-01
Series: Frontiers in Oncology
Subjects: EGFR mutation; lung cancer; prediction; machine learning; SHAP value
Online Access: https://www.frontiersin.org/articles/10.3389/fonc.2022.924144/full
author Ruiyuan Yang
Xingyu Xiong
Haoyu Wang
Weimin Li
collection DOAJ
description Objectives: The aim of this study was to determine whether clinical features, including blood markers, can be used to build an explainable machine learning model that predicts epidermal growth factor receptor (EGFR) mutation in lung cancer.
Methods: We retrospectively analyzed 7,413 patients with lung adenocarcinoma (LA) diagnosed by gene sequencing at West China Hospital of Sichuan University from April 2015 to June 2019. The machine learning algorithms (MLAs) included logistic regression (LR), random forest (RF), LightGBM, support vector machine (SVM), multi-layer perceptron (MLP), extreme gradient boosting (XGBoost), and decision tree (DT). Demographic characteristics, personal history, and blood markers were taken into account. The area under the receiver operating characteristic curve (AUC) and SHapley Additive exPlanation (SHAP) values were used to evaluate and explain the prediction models, respectively.
Results: Of the 7,413 patients with LA, 3,527 (47.6%) were identified with EGFR mutation. RF achieved the best performance in predicting EGFR mutation [AUC 0.771, 95% confidence interval (CI): 0.770, 0.772], which was similar to that of XGBoost (AUC 0.740, 95% CI: 0.739, 0.741). The five most influential features were smoking consumption, sex, cholesterol, age, and albumin-globulin ratio. The SHAP summary plot and dependence plot were used to explain the effect of the 12 features on the model and how a single feature influences the output, respectively.
Conclusion: We established EGFR mutation prediction models using MLAs and found that RF performed best (AUC 0.771, 95% CI: 0.770, 0.772), outperforming the traditional models. Therefore, the artificial intelligence-based MLA prediction model may become a practical tool to guide the diagnosis and therapy of LA.
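The abstract reports each model's AUC with a 95% confidence interval but does not say how the interval was obtained. As a rough illustration only, the sketch below computes a rank-based AUC and a percentile bootstrap interval on synthetic data. The function names `roc_auc` and `bootstrap_auc_ci`, the bootstrap method, the sample size, and the simulated scores are all assumptions made for this example; none of it comes from the study's code or data.

```python
import random


def roc_auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random
    negative, with tied scores counted as one half."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the group of tied scores occupying ranks i+1 .. j.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_pos += avg_rank * sum(lab for _, lab in pairs[i:j])
        i = j
    # Mann-Whitney U statistic for the positives, scaled to [0, 1].
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[k] for k in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            aucs.append(roc_auc(ys, [scores[k] for k in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Synthetic cohort: about 47.6% positives, mirroring the reported prevalence.
# The scores are simulated and do not represent any model from the study.
rng = random.Random(42)
labels = [1 if rng.random() < 0.476 else 0 for _ in range(500)]
scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]

auc = roc_auc(labels, scores)
lo, hi = bootstrap_auc_ci(labels, scores, n_boot=200)
print(f"AUC = {auc:.3f}, 95% CI: {lo:.3f}, {hi:.3f}")
```

In practice, a library routine such as scikit-learn's `roc_auc_score` would normally replace the hand-written `roc_auc`; the pure-Python version is used here only to keep the example free of dependencies.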
first_indexed 2024-04-12T13:36:11Z
format Article
id doaj.art-15a66990b2b345f4b1d41ae45a73ea4c
institution Directory of Open Access Journals
issn 2234-943X
language English
last_indexed 2024-04-12T13:36:11Z
publishDate 2022-06-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Oncology
spelling doaj.art-15a66990b2b345f4b1d41ae45a73ea4c
2022-12-22T03:30:59Z
eng
Frontiers Media S.A.
Frontiers in Oncology, ISSN 2234-943X
2022-06-01, volume 12, article 924144
doi 10.3389/fonc.2022.924144
Ruiyuan Yang (0)
Xingyu Xiong (1)
Haoyu Wang (2)
Weimin Li (3, 4, 5, 6)
0, 1, 2, 3: Department of Respiratory and Critical Care Medicine, West China Hospital, Sichuan University, Chengdu, China
4: Institute of Respiratory Health Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
5: Precision Medicine Center, Precision Medicine Key Laboratory of Sichuan Province, West China Hospital, Sichuan University, Chengdu, China
6: The Research Units of West China, Chinese Academy of Medical Sciences, West China Hospital, Chengdu, China
title Explainable Machine Learning Model to Prediction EGFR Mutation in Lung Cancer
topic EGFR mutation
lung cancer
prediction
machine learning
SHAP value
url https://www.frontiersin.org/articles/10.3389/fonc.2022.924144/full