Anomaly Detection Models for SARS-CoV-2 Surveillance Based on Genome <i>k</i>-mers

Bibliographic Details
Main Authors: Haotian Ren, Yixue Li, Tao Huang
Format: Article
Language: English
Published: MDPI AG 2023-11-01
Series: Microorganisms
Subjects: anomaly detection; virus surveillance; SARS-CoV-2; <i>k</i>-mer; machine learning
Online Access: https://www.mdpi.com/2076-2607/11/11/2773
author Haotian Ren
Yixue Li
Tao Huang
collection DOAJ
description COVID-19 has posed great challenges to global public health governance, so methods that track the evolution of the virus over the course of an epidemic or pandemic are valuable for public health. This paper uses anomaly detection models to analyze SARS-CoV-2 genome <i>k</i>-mers and predict possible new critical variants among collected samples. Using sample data from Argentina, China and Portugal obtained from the Global Initiative on Sharing All Influenza Data (GISAID), we conducted multiple rounds of evaluation on several anomaly detection models, both to verify the feasibility of this early-warning and surveillance idea and to find anomaly detection models suitable for actual epidemic surveillance. Across these rounds of testing, the LUNAR (learnable unified neighborhood-based anomaly ranking) model and the LUNAR+LUNAR stacking model performed well in detecting new critical variants. The results of simulated dynamic detection validate the feasibility of this approach, which can help efficiently monitor samples in local areas.
first_indexed 2024-03-09T16:34:55Z
format Article
id doaj.art-11d412dd1aed4e4f9d0eb04dbde92edd
institution Directory of Open Access Journals
issn 2076-2607
language English
last_indexed 2024-03-09T16:34:55Z
publishDate 2023-11-01
publisher MDPI AG
record_format Article
series Microorganisms
spelling doaj.art-11d412dd1aed4e4f9d0eb04dbde92edd
2023-11-24T14:57:13Z
eng
MDPI AG
Microorganisms
2076-2607
2023-11-01
volume 11, issue 11, article 2773
10.3390/microorganisms11112773
Anomaly Detection Models for SARS-CoV-2 Surveillance Based on Genome <i>k</i>-mers
Haotian Ren (0), Yixue Li (1), Tao Huang (2)
Affiliation (0, 1, 2): Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
https://www.mdpi.com/2076-2607/11/11/2773
anomaly detection
virus surveillance
SARS-CoV-2
<i>k</i>-mer
machine learning
title Anomaly Detection Models for SARS-CoV-2 Surveillance Based on Genome <i>k</i>-mers
topic anomaly detection
virus surveillance
SARS-CoV-2
<i>k</i>-mer
machine learning
url https://www.mdpi.com/2076-2607/11/11/2773
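
The description field in this record summarizes an approach in which genomes are represented by their k-mers and unusual samples are flagged by an anomaly detection model. The following is a minimal illustrative sketch of that general idea, not the authors' pipeline. It assumes normalized k-mer frequencies as features and substitutes a plain k-nearest-neighbor distance score for the LUNAR model named in the abstract. The function names `kmer_vector` and `knn_scores`, the choice of k = 3, and the toy sequences are all hypothetical.

```python
from collections import Counter
from itertools import product
import math


def kmer_vector(seq, k=3):
    """Return normalized k-mer frequencies over the ACGT alphabet.

    k-mers containing other symbols (for example N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers) or 1  # avoid division by zero
    return [counts[m] / total for m in kmers]


def knn_scores(vectors, n_neighbors=2):
    """Score each vector by its mean Euclidean distance to its nearest neighbors.

    A larger score means the sample lies farther from the others.
    This is a simple stand-in, NOT the LUNAR model from the abstract.
    """
    scores = []
    for i, v in enumerate(vectors):
        dists = sorted(math.dist(v, w) for j, w in enumerate(vectors) if j != i)
        nearest = dists[:n_neighbors]
        scores.append(sum(nearest) / len(nearest))
    return scores


# Toy sequences (hypothetical): three similar samples and one very different one.
sequences = ["ACGTACGTACGT", "ACGTACGTACGA", "ACGTACGTACGT", "GGGGGGGGGGGG"]
scores = knn_scores([kmer_vector(s) for s in sequences])
most_anomalous = scores.index(max(scores))
```

In this sketch the sample with the highest score is the candidate to inspect first. A real surveillance setting would use full-length genomes, a larger k, and a model validated on labeled variant data.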