Omics-CNN: A comprehensive pipeline for predictive analytics in quantitative omics using one-dimensional convolutional neural networks
Background and objective: The development of machine learning-based models that can be used for the prediction of severe diseases has been one of the main concerns of the scientific community. The current study seeks to expand a highly sophisticated tool, the Convolutional Neural Networks, making it...
Main Authors: | Anastasia Zompola, Aigli Korfiati, Konstantinos Theofilatos, Seferina Mavroudi |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2023-11-01 |
Series: | Heliyon |
Subjects: | Convolutional neural networks; Transcriptomics; Personalized medicine; Covid-19; Ischemic stroke |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2405844023083731 |
_version_ | 1797429923094200320 |
---|---|
author | Anastasia Zompola; Aigli Korfiati; Konstantinos Theofilatos; Seferina Mavroudi |
author_sort | Anastasia Zompola |
collection | DOAJ |
description | Background and objective: Developing machine learning-based models for the prediction of severe diseases has been one of the main concerns of the scientific community. The current study seeks to extend a highly sophisticated tool, the Convolutional Neural Network, making it applicable to multidimensional omics data classification problems, and tests the newly introduced method on publicly available transcriptomics and proteomics data. Methods: In this study, we introduce Omics-CNN, a Convolutional Neural Network-based pipeline that couples Convolutional Neural Networks with dimensionality reduction, preprocessing, clustering, and explainability techniques, making them suitable for building highly accurate and interpretable classification models from high-throughput omics data. The developed tool has the potential to classify patients according to the expression of genetic and clinical factors and to identify features that can act as diagnostic biomarkers. For dimensionality reduction, univariate and multivariate techniques were explored and compared. After training the model, Gradient-weighted Class Activation Mapping analysis was performed to determine the features most important for classifying the samples. Results: The newly introduced pipeline was applied to one transcriptomics and one proteomics dataset to identify diagnostic models and biosignatures for Ischemic Stroke (IS) and COVID-19 infection, reporting highly accurate biosignatures with accuracies of 96 % and 95.41 %, respectively. Classification models based on only a small subset of attributes provided lower predictive accuracy but identified a compact transcript biosignature (KRT15, VPRBP, TNFRSF4, GORASP2) for Ischemic Stroke diagnosis and a compact protein biosignature (ADGRB3, VNN2, AGER, CIAPIN1) for COVID-19 infection diagnosis. Conclusions: Omics-CNN overcame the inherent problems of applying Convolutional Neural Networks to the training of diagnostic models with quantitative omics data. It outperformed previous machine learning models developed using the same datasets for Ischemic Stroke and COVID-19 infection diagnosis and determined the most contributing biomarkers for both diseases. |
first_indexed | 2024-03-09T09:21:21Z |
format | Article |
id | doaj.art-9b4fa4298b014d8a8ccfb666bcb1d431 |
institution | Directory Open Access Journal |
issn | 2405-8440 |
language | English |
last_indexed | 2024-03-09T09:21:21Z |
publishDate | 2023-11-01 |
publisher | Elsevier |
record_format | Article |
series | Heliyon |
spelling | doaj.art-9b4fa4298b014d8a8ccfb666bcb1d431; 2023-12-02T07:01:21Z; eng; Elsevier; Heliyon; 2405-8440; 2023-11-01; 9; 11; e21165. Omics-CNN: A comprehensive pipeline for predictive analytics in quantitative omics using one-dimensional convolutional neural networks. Authors: Anastasia Zompola (0); Aigli Korfiati (1); Konstantinos Theofilatos (2); Seferina Mavroudi (3). Affiliations, in record order: Department of Electrical and Computer Engineering, University of Patras, Patras, Greece; InSyBio PC, Patras Science Park, Patras, Greece; King's British Heart Foundation Centre, Kings College London, United Kingdom; Department of Nursing, School of Rehabilitation Sciences, University of Patras, Patras, Greece (corresponding author). http://www.sciencedirect.com/science/article/pii/S2405844023083731. Keywords: Convolutional neural networks; Transcriptomics; Personalized medicine; Covid-19; Ischemic stroke |
title | Omics-CNN: A comprehensive pipeline for predictive analytics in quantitative omics using one-dimensional convolutional neural networks |
topic | Convolutional neural networks; Transcriptomics; Personalized medicine; Covid-19; Ischemic stroke |
url | http://www.sciencedirect.com/science/article/pii/S2405844023083731 |