Omics-CNN: A comprehensive pipeline for predictive analytics in quantitative omics using one-dimensional convolutional neural networks

Background and objective: The development of machine learning-based models that can be used for the prediction of severe diseases has been one of the main concerns of the scientific community. The current study seeks to expand a highly sophisticated tool, the Convolutional Neural Networks, making it...

Bibliographic Details
Main Authors: Anastasia Zompola, Aigli Korfiati, Konstantinos Theofilatos, Seferina Mavroudi
Format: Article
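Language: English
Published: Elsevier 2023-11-01
Series: Heliyon
Subjects: Convolutional neural networks; Transcriptomics; Personalized medicine; Covid-19; Ischemic stroke
Online Access: http://www.sciencedirect.com/science/article/pii/S2405844023083731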
_version_ 1797429923094200320
author Anastasia Zompola
Aigli Korfiati
Konstantinos Theofilatos
Seferina Mavroudi
author_sort Anastasia Zompola
collection DOAJ
description Background and objective: The development of machine learning-based models that can be used for the prediction of severe diseases has been one of the main concerns of the scientific community. The current study seeks to extend a highly sophisticated tool, Convolutional Neural Networks, making it applicable to multidimensional omics data classification problems, and tests the newly introduced method on publicly available transcriptomics and proteomics data. Methods: In this study, we introduce Omics-CNN, a Convolutional Neural Network-based pipeline that couples Convolutional Neural Networks with dimensionality reduction, preprocessing, clustering, and explainability techniques, making them suitable for building highly accurate and interpretable classification models from high-throughput omics data. The developed tool has the potential to classify patients according to the expression of genetic and clinical factors and to identify features that can act as diagnostic biomarkers. For dimensionality reduction, univariate and multivariate techniques were explored and compared. After training the model, Gradient Weighted Class Activation Mapping analysis was performed to determine the features most important for classifying the samples. Results: The newly introduced pipeline was applied to one transcriptomics and one proteomics dataset to identify diagnostic models and biosignatures for Ischemic Stroke (IS) and COVID-19 infection, reporting highly accurate biosignatures with accuracies of 96 % and 95.41 %, respectively. Classification models based on only a small subset of attributes provided lower predictive accuracy, but identified a compact transcript biosignature (KRT15, VPRBP, TNFRSF4, GORASP2) for Ischemic Stroke and a compact protein biosignature (ADGRB3, VNN2, AGER, CIAPIN1) for COVID-19 infection diagnosis.
Conclusions: Omics-CNN overcame the inherent problems of applying Convolutional Neural Networks to the training of diagnostic models with quantitative omics data. It outperformed previous machine learning models developed on the same datasets for Ischemic Stroke and COVID-19 infection diagnosis, and determined the most contributing biomarkers for both diseases.
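The description names Gradient Weighted Class Activation Mapping (Grad-CAM) as the explainability step applied to a one-dimensional CNN. The following is a minimal NumPy sketch of that idea only, not the authors' implementation. It assumes a toy architecture that the record does not specify (a single valid 1D convolution with ReLU, global average pooling, and a linear output layer), and every function and parameter name is hypothetical. For this particular architecture the gradient of a class score with respect to each activation is simply the corresponding output weight divided by the number of positions, so it is written in closed form.

```python
import numpy as np


def conv1d_relu(x, kernels, bias):
    """Valid 1D cross-correlation followed by ReLU.

    x: (n_features,), kernels: (n_filters, width), bias: (n_filters,)
    Returns activations of shape (n_filters, n_features - width + 1).
    """
    width = kernels.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(x, width)  # (T, width)
    pre_activation = kernels @ windows.T + bias[:, None]
    return np.maximum(pre_activation, 0.0)


def forward(x, kernels, bias, dense_w, dense_b):
    """Assumed toy model: conv + ReLU -> global average pooling -> linear scores."""
    acts = conv1d_relu(x, kernels, bias)
    pooled = acts.mean(axis=1)            # one value per filter
    scores = dense_w @ pooled + dense_b   # one score per class
    return acts, scores


def grad_cam_1d(x, kernels, bias, dense_w, dense_b, target_class):
    """Grad-CAM over the positions of the convolutional feature map."""
    acts, scores = forward(x, kernels, bias, dense_w, dense_b)
    n_positions = acts.shape[1]
    # d score_c / d acts[k, t] = dense_w[c, k] / T for this pooling + linear head.
    grads = np.repeat(dense_w[target_class][:, None] / n_positions,
                      n_positions, axis=1)
    alphas = grads.mean(axis=1)           # Grad-CAM filter weights
    cam = np.maximum((alphas[:, None] * acts).sum(axis=0), 0.0)
    return cam, scores


# Toy input: six "features", one filter of width two, two classes.
toy_x = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.0])
toy_cam, toy_scores = grad_cam_1d(
    toy_x,
    kernels=np.array([[1.0, 1.0]]),
    bias=np.zeros(1),
    dense_w=np.array([[1.0], [-1.0]]),
    dense_b=np.zeros(2),
    target_class=0,
)
```

In this sketch the map is non-zero only at the positions whose window covers the single active feature, which is the sense in which such a map can rank input features by their contribution to a class. A trained network with several layers would need gradients from automatic differentiation rather than the closed form used here.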
first_indexed 2024-03-09T09:21:21Z
format Article
id doaj.art-9b4fa4298b014d8a8ccfb666bcb1d431
institution Directory Open Access Journal
issn 2405-8440
language English
last_indexed 2024-03-09T09:21:21Z
publishDate 2023-11-01
publisher Elsevier
record_format Article
series Heliyon
spelling doaj.art-9b4fa4298b014d8a8ccfb666bcb1d431 2023-12-02T07:01:21Z eng Elsevier Heliyon 2405-8440 2023-11-01 9 11 e21165
Anastasia Zompola (Department of Electrical and Computer Engineering, University of Patras, Patras, Greece)
Aigli Korfiati (InSyBio PC, Patras Science Park, Patras, Greece)
Konstantinos Theofilatos (King's British Heart Foundation Centre, Kings College London, United Kingdom)
Seferina Mavroudi (Department of Nursing, School of Rehabilitation Sciences, University of Patras, Patras, Greece; corresponding author)
title Omics-CNN: A comprehensive pipeline for predictive analytics in quantitative omics using one-dimensional convolutional neural networks
topic Convolutional neural networks
Transcriptomics
Personalized medicine
Covid-19
Ischemic stroke
url http://www.sciencedirect.com/science/article/pii/S2405844023083731