Using machine learning to detect coronaviruses potentially infectious to humans

Bibliographic Details
Main Authors: Georgina Gonzalez-Isunza, M. Zaki Jawaid, Pengyu Liu, Daniel L. Cox, Mariel Vazquez, Javier Arsuaga
Format: Article
Language: English
Published: Nature Portfolio 2023-06-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-023-35861-7
collection DOAJ
description Abstract Establishing the host range for novel viruses remains a challenge. Here, we address the challenge of identifying non-human animal coronaviruses that may infect humans by creating an artificial neural network model that learns from spike protein sequences of alpha and beta coronaviruses and their binding annotation to their host receptor. The proposed method produces a human-Binding Potential (h-BiP) score that distinguishes, with high accuracy, the binding potential among coronaviruses. Three viruses, previously unknown to bind human receptors, were identified: Bat coronavirus BtCoV/133/2005 and Pipistrellus abramus bat coronavirus HKU5-related (both MERS-related viruses), and Rhinolophus affinis coronavirus isolate LYRa3 (a SARS-related virus). We further analyze the binding properties of BtCoV/133/2005 and LYRa3 using molecular dynamics. To test whether this model can be used for surveillance of novel coronaviruses, we re-trained the model on a set that excludes SARS-CoV-2 and all viral sequences released after the SARS-CoV-2 sequence was published. The results predict the binding of SARS-CoV-2 with a human receptor, indicating that machine learning methods are an excellent tool for the prediction of host expansion events.
first_indexed 2024-03-13T06:11:45Z
id doaj.art-b2c2e98ae64347ba969d848fbde6139e
institution Directory of Open Access Journals
issn 2045-2322
last_indexed 2024-03-13T06:11:45Z
Volume: 13
Issue: 1
Pages: 1-12
Author affiliations:
Georgina Gonzalez-Isunza: Department of Microbiology and Molecular Genetics, University of California
M. Zaki Jawaid: Department of Physics, University of California
Pengyu Liu: Department of Microbiology and Molecular Genetics, University of California
Daniel L. Cox: Department of Physics, University of California
Mariel Vazquez: Department of Microbiology and Molecular Genetics, University of California
Javier Arsuaga: Department of Molecular and Cellular Biology, University of California