Development of a machine learning framework for radiation biomarker discovery and absorbed dose prediction
Background: Molecular radiation biomarkers are an emerging tool in radiation research with applications for cancer radiotherapy, radiation risk assessment, and even human space travel. However, biomarker screening in genome-wide expression datasets using conventional tools is time-consuming and underl...
Main Authors: | Björn Andersson, Britta Langen, Peidi Liu, Marcela Dávila López |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2023-05-01 |
Series: | Frontiers in Oncology |
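Subjects: | ionizing radiation; radionuclides; absorbed dose; biomarkers; transcriptomics; kNN (k nearest neighbor) |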
Online Access: | https://www.frontiersin.org/articles/10.3389/fonc.2023.1156009/full |
_version_ | 1797826830891220992 |
---|---|
author | Björn Andersson; Britta Langen; Peidi Liu; Marcela Dávila López |
author_facet | Björn Andersson; Britta Langen; Peidi Liu; Marcela Dávila López |
author_sort | Björn Andersson |
collection | DOAJ |
description | Background: Molecular radiation biomarkers are an emerging tool in radiation research with applications for cancer radiotherapy, radiation risk assessment, and even human space travel. However, biomarker screening in genome-wide expression datasets using conventional tools is time-consuming and subject to analyst (human) bias. Machine Learning (ML) methods can improve the sensitivity and specificity of biomarker identification, increase analytical speed, and avoid multicollinearity and human bias. Aim: To develop a resource-efficient ML framework for radiation biomarker discovery using gene expression data from irradiated normal tissues, and to identify biomarker panels predicting radiation dose with tissue specificity. Methods: A strategic search in the Gene Expression Omnibus database identified a transcriptomic dataset (GSE44762) for normal tissue radiation responses (murine kidney cortex and medulla) suited for biomarker discovery using an ML approach. The dataset was pre-processed in R and separated into train and test data subsets. The high computational cost of Genetic Algorithm/k-Nearest Neighbor (GA/KNN) mandated optimization, and 13 ML models were tested using the caret package in R. Biomarker performance was evaluated and visualized via Principal Component Analysis (PCA) and dose regression. The novelty of ML-identified biomarker panels was evaluated by literature search. Results: Caret-based feature selection and ML methods vastly improved processing time over the GA approach. The KNN method yielded the best overall performance values on train and test data and was implemented into the framework. The top-ranking genes were Cdkn1a, Gria3, Mdm2, and Plk2 in cortex, and Brf2, Ccng1, Cdkn1a, Ddit4l, and Gria3 in medulla. These candidates successfully categorized dose groups and tissues in PCA. Regression analysis showed that correlation between predicted and true dose was high, with R² of 0.97 and 0.99 for cortex and medulla, respectively. Conclusion: The caret framework is a powerful tool for radiation biomarker discovery, optimizing performance with resource-efficiency for broad implementation in the field. The KNN-based approach identified Brf2, Ddit4l, and Gria3 mRNA as novel candidates that have been uncharacterized as radiation biomarkers to date. The biomarker panel showed good performance in dose and tissue separation and dose regression. Further training with larger cohorts is warranted to improve accuracy, especially for lower doses. |
first_indexed | 2024-04-09T12:38:32Z |
format | Article |
id | doaj.art-4b70692530994e46bf2fc9d37004417a |
institution | Directory of Open Access Journals |
issn | 2234-943X |
language | English |
last_indexed | 2024-04-09T12:38:32Z |
publishDate | 2023-05-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Oncology |
spelling | doaj.art-4b70692530994e46bf2fc9d37004417a; 2023-05-15T04:59:17Z; eng; Frontiers Media S.A.; Frontiers in Oncology; 2234-943X; 2023-05-01; 13; 10.3389/fonc.2023.1156009; 1156009; Development of a machine learning framework for radiation biomarker discovery and absorbed dose prediction; Björn Andersson [0]; Britta Langen [1]; Peidi Liu [2]; Marcela Dávila López [3]; [0] Bioinformatics Core Facility, The Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden; [1] Department of Radiation Oncology, Division of Molecular Radiation Biology, University of Texas (UT) Southwestern Medical Center, Dallas, TX, United States; [2] Bioinformatics Core Facility, The Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden; [3] Bioinformatics Core Facility, The Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden; https://www.frontiersin.org/articles/10.3389/fonc.2023.1156009/full; ionizing radiation; radionuclides; absorbed dose; biomarkers; transcriptomics; kNN (k nearest neighbor) |
title | Development of a machine learning framework for radiation biomarker discovery and absorbed dose prediction |
topic | ionizing radiation; radionuclides; absorbed dose; biomarkers; transcriptomics; kNN (k nearest neighbor) |
url | https://www.frontiersin.org/articles/10.3389/fonc.2023.1156009/full |
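
The evaluation step summarized in the description (a k-nearest-neighbor model fitted on a training subset, then R² between predicted and true dose on held-out samples) can be illustrated with a minimal sketch. This is not the authors' caret/R pipeline: it is plain Python on synthetic data, and the function names, the toy "expression" values, the dose levels (arbitrary units), the per-dose hold-out split, and k = 3 are all assumptions made for illustration.

```python
import math
import random


def knn_predict(train_x, train_y, query, k=3):
    """Predict a dose as the mean dose of the k nearest training profiles."""
    # Rank training samples by Euclidean distance to the query profile.
    ranked = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest = ranked[:k]
    return sum(dose for _, dose in nearest) / len(nearest)


def r_squared(true, pred):
    """Coefficient of determination between true and predicted values."""
    mean_true = sum(true) / len(true)
    ss_res = sum((t - p) ** 2 for t, p in zip(true, pred))
    ss_tot = sum((t - mean_true) ** 2 for t in true)
    return 1.0 - ss_res / ss_tot


def make_toy_samples(n_per_dose, doses, n_genes=4, seed=0):
    """Synthetic profiles (hypothetical): each gene rises linearly with dose, plus noise."""
    rng = random.Random(seed)
    xs, ys = [], []
    for dose in doses:
        for _ in range(n_per_dose):
            xs.append([dose * (g + 1) + rng.gauss(0, 0.1) for g in range(n_genes)])
            ys.append(dose)
    return xs, ys


def split_per_dose(xs, ys, n_test_per_dose=2):
    """Hold out the last n_test_per_dose samples of every dose group as test data."""
    by_dose = {}
    for x, y in zip(xs, ys):
        by_dose.setdefault(y, []).append(x)
    train_x, train_y, test_x, test_y = [], [], [], []
    for dose, profiles in by_dose.items():
        for x in profiles[:-n_test_per_dose]:
            train_x.append(x)
            train_y.append(dose)
        for x in profiles[-n_test_per_dose:]:
            test_x.append(x)
            test_y.append(dose)
    return train_x, train_y, test_x, test_y


def run_demo():
    """Fit-free KNN regression on toy data; returns R^2 on the held-out samples."""
    xs, ys = make_toy_samples(n_per_dose=10, doses=[0.0, 1.0, 2.0, 4.0])
    train_x, train_y, test_x, test_y = split_per_dose(xs, ys)
    preds = [knn_predict(train_x, train_y, x, k=3) for x in test_x]
    return r_squared(test_y, preds)


if __name__ == "__main__":
    print(f"R^2 on held-out toy data: {run_demo():.3f}")
```

Holding out samples within each dose group keeps every dose level represented in both subsets, which matters for a neighbor-based model; the toy data are far cleaner than real transcriptomic measurements, so the resulting R² says nothing about the values reported in the record.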