Applications of Machine Learning for the Classification of Porcine Reproductive and Respiratory Syndrome Virus Sublineages Using Amino Acid Scores of ORF5 Gene

Porcine reproductive and respiratory syndrome (PRRS) is an infectious disease of pigs caused by PRRS virus (PRRSV). A modified live-attenuated vaccine has been widely used to control the spread of PRRSV, and classification of field strains is key to successful control and prevention. Restriction fr...


Bibliographic Details
Main Authors: Jeonghoon Kim, Kyuyoung Lee, Ruwini Rupasinghe, Shahbaz Rezaei, Beatriz Martínez-López, Xin Liu
Format: Article
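Language: English
Published: Frontiers Media S.A. 2021-07-01
Series: Frontiers in Veterinary Science
Subjects: artificial intelligence; random forest; k-nearest neighbor; support vector machine; swine health; phylogenetic tree
Online Access: https://www.frontiersin.org/articles/10.3389/fvets.2021.683134/full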
collection DOAJ
description Porcine reproductive and respiratory syndrome (PRRS) is an infectious disease of pigs caused by PRRS virus (PRRSV). A modified live-attenuated vaccine has been widely used to control the spread of PRRSV, and classification of field strains is key to successful control and prevention. Restriction fragment length polymorphism targeting the open reading frame 5 (ORF5) gene is widely used to classify PRRSV strains but has shown unstable accuracy. Phylogenetic analysis is a powerful tool for PRRSV classification with consistent accuracy, but it demands large computational power as the number of sequences increases. Our study aimed to apply four machine learning (ML) algorithms (random forest, k-nearest neighbor, support vector machine, and multilayer perceptron) to classify field PRRSV strains into four clades using amino acid scores based on the ORF5 gene sequence. We used amino acid sequences of the ORF5 gene from 1931 field PRRSV strains collected in the US from 2012 to 2020. Phylogenetic analysis was used to label each field PRRSV strain as one of four clades: Lineage 5 or one of three clades in Lineage 1. We measured the accuracy and time consumption of classification by the four ML approaches across different sizes of gene sequence data. All four ML algorithms classified a large number of field strains in a very short time (<2.5 s) with very high accuracy (area under the receiver operating characteristic curve >0.99). Furthermore, the random forest approach identified four key amino acid positions for the classification of field PRRSV strains into the four clades. Our findings provide insight for developing rapid and accurate classification models using genetic information, which would also enable handling of large genome datasets in real time or semi-real time for data-driven decision-making and more timely surveillance.
first_indexed 2024-12-16T23:11:26Z
format Article
id doaj.art-d5960785c3d94045aa24fbc3d6cb84b9
institution Directory of Open Access Journals
issn 2297-1769
language English
last_indexed 2024-12-16T23:11:26Z
publishDate 2021-07-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Veterinary Science
spelling doaj.art-d5960785c3d94045aa24fbc3d6cb84b9
2022-12-21T22:12:24Z
eng
Frontiers Media S.A.
Frontiers in Veterinary Science
2297-1769
2021-07-01
8
10.3389/fvets.2021.683134
683134
Jeonghoon Kim (0), Kyuyoung Lee (1), Ruwini Rupasinghe (2), Shahbaz Rezaei (3), Beatriz Martínez-López (4), Xin Liu (5)
(0) Department of Mathematics, University of California, Davis, Davis, CA, United States
(1) Department of Medicine and Epidemiology, Center for Animal Disease Modeling and Surveillance (CADMS), School of Veterinary Medicine, University of California, Davis, Davis, CA, United States
(2) Department of Medicine and Epidemiology, Center for Animal Disease Modeling and Surveillance (CADMS), School of Veterinary Medicine, University of California, Davis, Davis, CA, United States
(3) Department of Computer Science, University of California, Davis, Davis, CA, United States
(4) Department of Medicine and Epidemiology, Center for Animal Disease Modeling and Surveillance (CADMS), School of Veterinary Medicine, University of California, Davis, Davis, CA, United States
(5) Department of Computer Science, University of California, Davis, Davis, CA, United States
title Applications of Machine Learning for the Classification of Porcine Reproductive and Respiratory Syndrome Virus Sublineages Using Amino Acid Scores of ORF5 Gene
topic artificial intelligence
random forest
k-nearest neighbor
support vector machine
swine health
phylogenetic tree
url https://www.frontiersin.org/articles/10.3389/fvets.2021.683134/full