Development of a machine learning framework for radiation biomarker discovery and absorbed dose prediction

Background: Molecular radiation biomarkers are an emerging tool in radiation research with applications for cancer radiotherapy, radiation risk assessment, and even human space travel. However, biomarker screening in genome-wide expression datasets using conventional tools is time-consuming and underl...

Bibliographic Details
Main Authors: Björn Andersson, Britta Langen, Peidi Liu, Marcela Dávila López
Format: Article
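Language: English
Published: Frontiers Media S.A. 2023-05-01
Series: Frontiers in Oncology
Subjects: ionizing radiation; radionuclides; absorbed dose; biomarkers; transcriptomics; kNN (k nearest neighbor)
Online Access: https://www.frontiersin.org/articles/10.3389/fonc.2023.1156009/full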
_version_ 1797826830891220992
author Björn Andersson
Britta Langen
Peidi Liu
Marcela Dávila López
author_sort Björn Andersson
collection DOAJ
description Background: Molecular radiation biomarkers are an emerging tool in radiation research with applications for cancer radiotherapy, radiation risk assessment, and even human space travel. However, biomarker screening in genome-wide expression datasets using conventional tools is time-consuming and subject to analyst (human) bias. Machine Learning (ML) methods can improve the sensitivity and specificity of biomarker identification, increase analytical speed, and avoid multicollinearity and human bias.
Aim: To develop a resource-efficient ML framework for radiation biomarker discovery using gene expression data from irradiated normal tissues. Further, to identify biomarker panels predicting radiation dose with tissue specificity.
Methods: A strategic search in the Gene Expression Omnibus database identified a transcriptomic dataset (GSE44762) of normal-tissue radiation responses (murine kidney cortex and medulla) suited for biomarker discovery using an ML approach. The dataset was pre-processed in R and separated into train and test data subsets. The high computational cost of the Genetic Algorithm/k-Nearest Neighbor (GA/KNN) approach mandated optimization, and 13 ML models were tested using the caret package in R. Biomarker performance was evaluated and visualized via Principal Component Analysis (PCA) and dose regression. The novelty of ML-identified biomarker panels was evaluated by literature search.
Results: Caret-based feature selection and ML methods vastly improved processing time over the GA approach. The KNN method yielded the best overall performance values on train and test data and was implemented in the framework. The top-ranking genes were Cdkn1a, Gria3, Mdm2, and Plk2 in cortex, and Brf2, Ccng1, Cdkn1a, Ddit4l, and Gria3 in medulla. These candidates successfully categorized dose groups and tissues in PCA. Regression analysis showed that the correlation between predicted and true dose was high, with R² of 0.97 and 0.99 for cortex and medulla, respectively.
Conclusion: The caret framework is a powerful tool for radiation biomarker discovery, optimizing performance with resource efficiency for broad implementation in the field. The KNN-based approach identified Brf2, Ddit4l, and Gria3 mRNA as novel candidates that have not been characterized as radiation biomarkers to date. The biomarker panel showed good performance in dose and tissue separation and dose regression. Further training with larger cohorts is warranted to improve accuracy, especially for lower doses.
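The abstract describes a KNN model whose predicted dose is compared with the true dose on held-out data. As a rough illustration of that idea only, the Python sketch below runs k-nearest-neighbor regression on synthetic data and reports the coefficient of determination. The study itself used the caret package in R; the synthetic expression values, the dose levels, the 7/3 split per dose group, k = 3, and all function names here are assumptions made for illustration and do not come from GSE44762 or from the authors' code.

```python
import math
import random


def knn_predict(train_x, train_y, query, k=3):
    """Predict dose as the mean dose of the k nearest training samples."""
    # Rank training samples by Euclidean distance in expression space.
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def r_squared(y_true, y_pred):
    """Coefficient of determination between true and predicted dose."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def synthetic_samples(dose, n_samples, n_genes, rng):
    """Hypothetical expression profiles that shift linearly with dose, plus noise."""
    return [
        [0.1 * (g + 1) * dose + rng.gauss(0.0, 0.05) for g in range(n_genes)]
        for _ in range(n_samples)
    ]


rng = random.Random(0)
train_x, train_y, test_x, test_y = [], [], [], []
for dose in (0.0, 1.0, 2.0, 4.0):  # made-up dose levels, arbitrary units
    samples = synthetic_samples(dose, n_samples=10, n_genes=4, rng=rng)
    # Stratified split (an assumption): 7 training and 3 test samples per dose group.
    train_x += samples[:7]
    train_y += [dose] * 7
    test_x += samples[7:]
    test_y += [dose] * 3

predicted = [knn_predict(train_x, train_y, x, k=3) for x in test_x]
print(f"R^2 on held-out synthetic samples: {r_squared(test_y, predicted):.3f}")
```

Averaging the doses of the nearest neighbors is one common way to turn KNN into a regressor; whether the published framework treats dose as a class label or as a continuous target is not stated in this record.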
first_indexed 2024-04-09T12:38:32Z
format Article
id doaj.art-4b70692530994e46bf2fc9d37004417a
institution Directory Open Access Journal
issn 2234-943X
language English
last_indexed 2024-04-09T12:38:32Z
publishDate 2023-05-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Oncology
spelling doaj.art-4b70692530994e46bf2fc9d37004417a; 2023-05-15T04:59:17Z; eng; Frontiers Media S.A.; Frontiers in Oncology; 2234-943X; 2023-05-01; 13; 10.3389/fonc.2023.1156009; 1156009
Björn Andersson (0), Britta Langen (1), Peidi Liu (2), Marcela Dávila López (3)
(0) Bioinformatics Core Facility, The Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
(1) Department of Radiation Oncology, Division of Molecular Radiation Biology, University of Texas (UT) Southwestern Medical Center, Dallas, TX, United States
(2) Bioinformatics Core Facility, The Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
(3) Bioinformatics Core Facility, The Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
title Development of a machine learning framework for radiation biomarker discovery and absorbed dose prediction
topic ionizing radiation
radionuclides
absorbed dose
biomarkers
transcriptomics
kNN (k nearest neighbor)
url https://www.frontiersin.org/articles/10.3389/fonc.2023.1156009/full