Convolutional Neural Network Applied to SARS-CoV-2 Sequence Classification

Bibliographic Details
Main Authors: Gabriel B. M. Câmara, Maria G. F. Coutinho, Lucileide M. D. da Silva, Walter V. do N. Gadelha, Matheus F. Torquato, Raquel de M. Barbosa, Marcelo A. C. Fernandes
Format: Article
Language: English
Published: MDPI AG 2022-07-01
Series: Sensors
Subjects: SARS-CoV-2; COVID-19; deep learning; CNN
Online Access:https://www.mdpi.com/1424-8220/22/15/5730
collection DOAJ
description COVID-19, the illness caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a virus of the <i>Coronaviridae</i> family with a single-stranded, positive-sense RNA genome, has spread around the world and has been declared a pandemic by the World Health Organization. As of 17 January 2022, more than 329 million cases and more than 5.5 million deaths had been reported. Although COVID-19 has a low mortality rate, its high transmissibility and capacity to mutate concern health authorities, especially since the emergence of the Omicron variant, which spreads readily and can more easily infect even vaccinated people. Such outbreaks require determining the taxonomic classification and origin of the virus from its genomic sequence to support strategic planning, containment, and treatment of the disease. Therefore, this work proposes a high-accuracy technique that uses a deep learning convolutional neural network (CNN) to classify viruses and other organisms from their genome sequences. Unlike other approaches in the literature, the proposed approach does not limit the length of the genome sequence. The results show that the proposed method accurately distinguishes SARS-CoV-2 from the sequences of other viruses. The experiments used 1557 SARS-CoV-2 sequences from the National Center for Biotechnology Information (NCBI) and sequences of 14,684 different viruses from the Virus-Host DB. Because a CNN has many adjustable parameters, forty-eight different architectures were tested; the best of these achieved an accuracy of 91.94 ± 2.62% in classifying viruses into their correct realms, as well as 100% accuracy in classifying SARS-CoV-2 into its realm, <i>Riboviria</i>. At the subsequent taxonomic levels (family, genus, and subgenus), accuracy increased, which suggests that the proposed architecture may be viable for classifying the virus that causes COVID-19.
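The abstract states that the approach does not limit genome sequence length, but it does not say how. One common way for a CNN to accept inputs of any length, assumed here for illustration and not necessarily the authors' design, is to one-hot encode the nucleotides, apply 1D convolutions, and then use global max pooling so that the feature vector size depends only on the number of filters. The pure-Python sketch below uses random, untrained weights; the class `TinySeqCNN`, the filter count, kernel size, and three output classes are all hypothetical.

```python
import math
import random

BASES = "ACGT"  # assumed alphabet; RNA "U" is read as "T"


def one_hot(seq):
    """Encode a nucleotide string as an L x 4 list of one-hot rows.

    Ambiguous symbols such as "N" become all-zero rows (an assumption).
    """
    return [[1.0 if b == n else 0.0 for n in BASES]
            for b in seq.upper().replace("U", "T")]


def conv1d_relu(x, kernels, biases):
    """'Valid' 1D convolution of an L x 4 input with F kernels of shape K x 4,
    followed by ReLU; returns an (L - K + 1) x F feature map."""
    k = len(kernels[0])
    fmap = []
    for i in range(len(x) - k + 1):
        window = x[i:i + k]
        row = []
        for kern, b in zip(kernels, biases):
            s = b + sum(w * v
                        for krow, xrow in zip(kern, window)
                        for w, v in zip(krow, xrow))
            row.append(max(0.0, s))
        fmap.append(row)
    return fmap


def global_max_pool(fmap):
    """Maximum of each filter over all positions: one value per filter,
    whatever the sequence length."""
    return [max(column) for column in zip(*fmap)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


class TinySeqCNN:
    """Hypothetical toy model with random, untrained weights (illustration only)."""

    def __init__(self, n_filters=8, kernel_size=5, n_classes=3, seed=0):
        rng = random.Random(seed)
        self.k = kernel_size
        self.kernels = [[[rng.gauss(0.0, 0.5) for _ in BASES]
                         for _ in range(kernel_size)]
                        for _ in range(n_filters)]
        self.conv_b = [0.0] * n_filters
        self.dense_w = [[rng.gauss(0.0, 0.5) for _ in range(n_filters)]
                        for _ in range(n_classes)]
        self.dense_b = [0.0] * n_classes

    def features(self, seq):
        x = one_hot(seq)
        # Zero-pad inputs shorter than the kernel so at least one window fits.
        x.extend([0.0] * len(BASES) for _ in range(self.k - len(x)))
        return global_max_pool(conv1d_relu(x, self.kernels, self.conv_b))

    def predict_proba(self, seq):
        h = self.features(seq)
        logits = [b + sum(w * v for w, v in zip(row, h))
                  for row, b in zip(self.dense_w, self.dense_b)]
        return softmax(logits)


if __name__ == "__main__":
    model = TinySeqCNN()
    for s in ["ACGTAC", "ACGU" * 50, "NNN"]:
        # The feature vector has 8 entries regardless of the input length.
        print(len(s), len(model.features(s)), model.predict_proba(s))
```

The key design choice in this sketch is the global pooling step: a 6-base input and a 200-base input both yield an 8-value feature vector, so the dense classification layer never needs a fixed input length. A real classifier would be trained on labeled sequences (for example, one output per realm) rather than using random weights.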
first_indexed 2024-03-09T05:00:38Z
id doaj.art-fe77f99658d540d18ab4b0e3748d7b98
institution Directory Open Access Journal
issn 1424-8220
last_indexed 2024-03-09T05:00:38Z
doi 10.3390/s22155730
citation Sensors 2022, 22(15), 5730
affiliation Gabriel B. M. Câmara: Bioinformatics Multidisciplinary Environment (BioME), Federal University of Rio Grande do Norte, Natal 59078-970, RN, Brazil
Maria G. F. Coutinho: Laboratory of Machine Learning and Intelligent Instrumentation, Federal University of Rio Grande do Norte, Natal 59078-970, RN, Brazil
Lucileide M. D. da Silva: Laboratory of Machine Learning and Intelligent Instrumentation, Federal University of Rio Grande do Norte, Natal 59078-970, RN, Brazil
Walter V. do N. Gadelha: Laboratory of Machine Learning and Intelligent Instrumentation, Federal University of Rio Grande do Norte, Natal 59078-970, RN, Brazil
Matheus F. Torquato: Laboratory of Machine Learning and Intelligent Instrumentation, Federal University of Rio Grande do Norte, Natal 59078-970, RN, Brazil
Raquel de M. Barbosa: Laboratory of Machine Learning and Intelligent Instrumentation, Federal University of Rio Grande do Norte, Natal 59078-970, RN, Brazil
Marcelo A. C. Fernandes: Bioinformatics Multidisciplinary Environment (BioME), Federal University of Rio Grande do Norte, Natal 59078-970, RN, Brazil
topic SARS-CoV-2
COVID-19
deep learning
CNN