Cancer Metastasis Prediction and Genomic Biomarker Identification through Machine Learning and eXplainable Artificial Intelligence in Breast Cancer Research

Aim: This research presents a model combining machine learning (ML) techniques and eXplainable artificial intelligence (XAI) to predict breast cancer (BC) metastasis and reveal important genomic biomarkers in metastasis patients. Method: A total of 98 primary BC samples was analyzed, compris...


Bibliographic Details
Main Authors: Burak Yagin, Fatma Hilal Yagin, Cemil Colak, Feyza Inceoglu, Seifedine Kadry, Jungeun Kim
Format: Article
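Language: English
Published: MDPI AG 2023-10-01
Series: Diagnostics
Subjects: breast cancer metastasis; machine learning algorithms; genomic biomarkers; eXplainable artificial intelligence; SHAP
Online Access: https://www.mdpi.com/2075-4418/13/21/3314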
_version_ 1827765759594463232
author Burak Yagin
Fatma Hilal Yagin
Cemil Colak
Feyza Inceoglu
Seifedine Kadry
Jungeun Kim
author_sort Burak Yagin
collection DOAJ
description Aim: This research presents a model combining machine learning (ML) techniques and eXplainable artificial intelligence (XAI) to predict breast cancer (BC) metastasis and reveal important genomic biomarkers in metastasis patients. Method: A total of 98 primary BC samples was analyzed, comprising 34 samples from patients who developed distant metastases within a 5-year follow-up period and 44 samples from patients who remained disease-free for at least 5 years after diagnosis. Genomic data were then subjected to biostatistical analysis, followed by the application of the elastic net feature selection method. This technique identified a restricted number of genomic biomarkers associated with BC metastasis. Light gradient boosting machine (LightGBM), categorical boosting (CatBoost), extreme gradient boosting (XGBoost), gradient boosting trees (GBT), and adaptive boosting (AdaBoost) algorithms were utilized for prediction. To assess the models’ predictive abilities, the accuracy, F1 score, precision, recall, area under the ROC curve (AUC), and Brier score were calculated as performance evaluation metrics. To promote interpretability and overcome the “black box” problem of ML models, the SHapley Additive exPlanations (SHAP) method was employed. Results: The LightGBM model outperformed the other models, yielding an accuracy of 96% and an AUC of 99.3%. In addition to biostatistical evaluation, in XAI-based SHAP results, increased expression levels of TSPYL5, ATP5E, CA9, NUP210, SLC37A1, ARIH1, PSMD7, UBQLN1, PRAME, and UBE2T (p ≤ 0.05) were found to be associated with an increased incidence of BC metastasis. Finally, decreased expression levels of the CACTIN, TGFB3, SCUBE2, ARL4D, OR1F1, ALDH4A1, PHF1, and CROCC genes (p ≤ 0.05) were also determined to increase the risk of metastasis in BC.
Conclusion: The findings of this study may prevent disease progression and metastases and potentially improve clinical outcomes by recommending customized treatment approaches for BC patients.
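The evaluation metrics named in the abstract (accuracy, precision, recall, F1 score, AUC, and Brier score) can be illustrated with a minimal sketch. This is not the authors' code: the function name binary_metrics, the 0.5 decision threshold, and the toy labels and probabilities are all assumptions made for illustration, and the record does not say how the study computed these values.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int],
                   y_prob: Sequence[float],
                   threshold: float = 0.5) -> Dict[str, float]:
    """Compute common binary-classification metrics from predicted probabilities.

    y_true holds 0/1 labels (1 = metastasis in this illustration);
    y_prob holds the predicted probability of class 1.
    The 0.5 threshold is an assumption, not a value taken from the study.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))

    # Confusion-matrix counts.
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    n = len(y_true)
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Brier score: mean squared difference between probability and outcome
    # (lower is better).
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / n

    # AUC as the probability that a random positive is scored above a random
    # negative, counting ties as one half (pairwise definition).
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if pos and neg:
        wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
                   for a in pos for b in neg)
        auc = wins / (len(pos) * len(neg))
    else:
        auc = float("nan")  # undefined when only one class is present

    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "auc": auc, "brier": brier}


# Toy data, invented for illustration only.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_probs = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]
toy_result = binary_metrics(toy_labels, toy_probs)
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a cohort of this size; a rank-based formulation would be the usual choice for larger data.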
first_indexed 2024-03-11T11:31:27Z
format Article
id doaj.art-31188b19e7534b2ba13a2d52ea031486
institution Directory Open Access Journal
issn 2075-4418
language English
last_indexed 2024-03-11T11:31:27Z
publishDate 2023-10-01
publisher MDPI AG
record_format Article
series Diagnostics
spelling doaj.art-31188b19e7534b2ba13a2d52ea031486
2023-11-10T15:00:56Z
eng
MDPI AG
Diagnostics
2075-4418
2023-10-01
Volume 13, Issue 21, Article 3314
10.3390/diagnostics13213314
Cancer Metastasis Prediction and Genomic Biomarker Identification through Machine Learning and eXplainable Artificial Intelligence in Breast Cancer Research
Burak Yagin (0): Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, Malatya 44280, Turkey
Fatma Hilal Yagin (1): Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, Malatya 44280, Turkey
Cemil Colak (2): Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, Malatya 44280, Turkey
Feyza Inceoglu (3): Department of Biostatistics, Faculty of Medicine, Malatya Turgut Ozal University, Malatya 44090, Turkey
Seifedine Kadry (4): Department of Applied Data Science, Noroff University College, 4612 Kristiansand, Norway
Jungeun Kim (5): Department of Software, Kongju National University, Cheonan 31080, Republic of Korea
https://www.mdpi.com/2075-4418/13/21/3314
title Cancer Metastasis Prediction and Genomic Biomarker Identification through Machine Learning and eXplainable Artificial Intelligence in Breast Cancer Research
topic breast cancer metastasis
machine learning algorithms
genomic biomarkers
eXplainable artificial intelligence
SHAP
url https://www.mdpi.com/2075-4418/13/21/3314