A comprehensive analysis of gene expression profiling data in COVID-19 patients for discovery of specific and differential blood biomarker signatures

Abstract COVID-19 is a newly recognized illness with a predominantly respiratory presentation. Although initial analyses have identified groups of candidate gene biomarkers for the diagnosis of COVID-19, they have yet to identify clinically applicable biomarkers, so we need disease-specific diagnost...


Bibliographic Details
Main Authors: Maryam Momeni, Maryam Rashidifar, Farinaz Hosseini Balam, Amir Roointan, Alieh Gholaminejad
Format: Article
Language: English
Published: Nature Portfolio 2023-04-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-023-32268-2
author Maryam Momeni
Maryam Rashidifar
Farinaz Hosseini Balam
Amir Roointan
Alieh Gholaminejad
collection DOAJ
description Abstract COVID-19 is a newly recognized illness with a predominantly respiratory presentation. Although initial analyses have identified groups of candidate gene biomarkers for the diagnosis of COVID-19, none has yet yielded clinically applicable biomarkers, so disease-specific diagnostic biomarkers in biofluids are still needed, along with markers for differential diagnosis against other infectious diseases. Such biomarkers could also improve understanding of pathogenesis and help guide treatment. Eight transcriptomic profiles of COVID-19-infected versus control samples from peripheral blood (PB), lung tissue, nasopharyngeal swab and bronchoalveolar lavage fluid (BALF) were considered. To find potential COVID-19 Specific Blood Differentially expressed genes (SpeBDs), we implemented a strategy based on finding pathways shared between peripheral blood and the tissues most involved in COVID-19 patients. This step filtered the blood DEGs down to those with a role in the shared pathways. In the second step, nine datasets covering three types of influenza (H1N1, H3N2, and B) were used. Potential Differential Blood DEGs of COVID-19 versus influenza (DifBDs) were found by extracting DEGs involved only in pathways enriched by SpeBDs and not by influenza DEGs. In the third step, a machine learning method (a wrapper feature selection approach supervised by four classifiers: k-NN, Random Forest, SVM and Naïve Bayes) was used to narrow down the SpeBDs and DifBDs and find their most predictive combinations, yielding COVID-19 potential Specific Blood Biomarker Signatures (SpeBBSs) and COVID-19 versus influenza Differential Blood Biomarker Signatures (DifBBSs), respectively. Models based on the SpeBBSs and DifBBSs and the corresponding algorithms were then built, and their performance was assessed on an external dataset. Among all the DEGs extracted from the PB dataset (from pathways common to PB and BALF, lung and swab), 108 unique SpeBDs were obtained.
Feature selection using Random Forest outperformed its counterparts and selected IGKC, IGLV3-16 and SRP9 among the SpeBDs as SpeBBSs. Validating the model built on these genes with Random Forest on an external dataset gave 93.09% accuracy. Eighty-three pathways enriched by SpeBDs and not by any of the influenza strains were identified, including 87 DifBDs. Using feature selection with the Naïve Bayes classifier on the DifBDs, FMNL2, IGHV3-23, IGLV2-11 and RPL31 were selected as the most predictive DifBBSs. The model built on these genes with Naïve Bayes was validated on an external dataset with 87.2% accuracy. Our study identified several candidate blood biomarkers for a potential specific and differential diagnosis of COVID-19. The proposed biomarkers could be valuable targets for practical investigations to validate their potential.
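The first two filtering steps described in the abstract are essentially set operations over pathways and genes, and can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy pathway labels (`P1`, `P2`, ...) and gene labels (`G1`, `G2`, ...) are hypothetical, a pathway is assumed to count as "shared" when it is enriched in blood and in at least one other tissue, and the pathways enriched by SpeBDs are assumed to be the shared pathways themselves rather than the result of a fresh enrichment analysis.

```python
from typing import Dict, Set


def shared_pathways(blood: Dict[str, Set[str]],
                    tissues: Dict[str, Set[str]]) -> Set[str]:
    """Pathways enriched in blood AND in at least one other tissue.

    Assumption: "shared" means present in any of the other tissues;
    the abstract does not say whether all tissues are required.
    """
    enriched_elsewhere = set().union(*tissues.values())
    return set(blood) & enriched_elsewhere


def specific_blood_degs(blood: Dict[str, Set[str]],
                        tissues: Dict[str, Set[str]]) -> Set[str]:
    """Step 1 sketch: blood DEGs that belong to a shared pathway (SpeBD candidates)."""
    return set().union(*(blood[p] for p in shared_pathways(blood, tissues)))


def differential_blood_degs(spebd_pathways: Dict[str, Set[str]],
                            influenza_pathways: Set[str]) -> Set[str]:
    """Step 2 sketch: genes in pathways enriched by SpeBDs but by no influenza DEGs."""
    covid_only = set(spebd_pathways) - influenza_pathways
    return set().union(*(spebd_pathways[p] for p in covid_only))


# Hypothetical toy data: pathway -> blood DEGs annotated to that pathway.
blood = {"P1": {"G1", "G2"}, "P2": {"G2", "G3"}, "P3": {"G4"}}
# Hypothetical toy data: tissue -> pathways enriched in that tissue.
tissues = {"lung": {"P1", "P9"}, "swab": {"P2"}, "balf": {"P1"}}
# Hypothetical toy data: pathways enriched by DEGs of any influenza type.
influenza_pathways = {"P2", "P7"}

shared = shared_pathways(blood, tissues)
spebds = specific_blood_degs(blood, tissues)
# Assumption: reuse the shared pathways as the "pathways enriched by SpeBDs".
spebd_pathways = {p: blood[p] for p in shared}
difbds = differential_blood_degs(spebd_pathways, influenza_pathways)

print(sorted(shared), sorted(spebds), sorted(difbds))
```

In this toy run, `P3` is dropped because it is enriched only in blood, and `P2` is dropped in the second step because influenza DEGs also enrich it. The wrapper feature selection of the third step would then operate on the resulting gene sets.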
first_indexed 2024-04-09T18:55:33Z
format Article
id doaj.art-f9d4ecc98e004892aac0f3094b901b30
institution Directory Open Access Journal
issn 2045-2322
language English
last_indexed 2024-04-09T18:55:33Z
publishDate 2023-04-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj.art-f9d4ecc98e004892aac0f3094b901b30 2023-04-09T11:15:53Z eng Nature Portfolio Scientific Reports 2045-2322 2023-04-01 13 1 1 15 10.1038/s41598-023-32268-2
Maryam Momeni: Department of Biotechnology, Faculty of Biological Science and Technology, The University of Isfahan
Maryam Rashidifar: Department of Plant Sciences and Biotechnology, Faculty of Life Sciences and Biotechnology, Shahid Beheshti University
Farinaz Hosseini Balam: Department of Cellular and Molecular Nutrition, Faculty of Nutrition and Food Technology, National Nutrition and Food Technology Research Institute, Shahid Beheshti University of Medical Sciences
Amir Roointan: Regenerative Medicine Research Center, Faculty of Medicine, Isfahan University of Medical Sciences
Alieh Gholaminejad: Regenerative Medicine Research Center, Faculty of Medicine, Isfahan University of Medical Sciences
title A comprehensive analysis of gene expression profiling data in COVID-19 patients for discovery of specific and differential blood biomarker signatures
url https://doi.org/10.1038/s41598-023-32268-2