Orchestrating an Optimized Next-Generation Sequencing-Based Cloud Workflow for Robust Viral Identification during Pandemics

Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has recently become a novel pandemic event following the swine flu that occurred in 2009, which was caused by the influenza A virus (H1N1 subtype). The accurate identification of the huge num...

Bibliographic Details
Main Authors: Hendrick Gao-Min Lim, Shih-Hsin Hsiao, Yuan-Chii Gladys Lee
Format: Article
Language: English
Published: MDPI AG 2021-10-01
Series: Biology
Subjects: next-generation sequencing, cloud computing, cloud workflow, pandemics, COVID-19, SARS-CoV-2
Online Access: https://www.mdpi.com/2079-7737/10/10/1023
author Hendrick Gao-Min Lim
Shih-Hsin Hsiao
Yuan-Chii Gladys Lee
collection DOAJ
description Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has become a new pandemic following the 2009 swine flu, which was caused by the influenza A virus (H1N1 subtype). Accurately identifying the huge number of samples generated during a pandemic remains a challenge. In this study, we integrate two technologies, next-generation sequencing and cloud computing, into an optimized version of a workflow that uses a specific identification algorithm on a designated cloud platform. We use 182 samples (92 for COVID-19 and 90 for swine flu) with short-read sequencing data from two open-access datasets, one representing each pandemic, and evaluate workflow performance against an index created specifically for SARS-CoV-2 or H1N1. Results show that our workflow can differentiate cases between the two pandemics, with accuracy that depends on the index used and is higher when the index exclusively represents each dataset. Our workflow substantially outperforms the original complete identification workflow available on the same platform in terms of time and cost by preserving essential tools internally. Our workflow can serve as a powerful tool for the robust identification of cases and thus aid in controlling current and future pandemics.
first_indexed 2024-03-10T06:43:01Z
format Article
id doaj.art-05f9128145db48f4b530e1d8c08cf3ec
institution Directory of Open Access Journals
issn 2079-7737
language English
last_indexed 2024-03-10T06:43:01Z
publishDate 2021-10-01
publisher MDPI AG
record_format Article
series Biology
spelling doaj.art-05f9128145db48f4b530e1d8c08cf3ec
2023-11-22T17:28:35Z
eng
MDPI AG
Biology
2079-7737
2021-10-01
10 10 1023
10.3390/biology10101023
Orchestrating an Optimized Next-Generation Sequencing-Based Cloud Workflow for Robust Viral Identification during Pandemics
Hendrick Gao-Min Lim, Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei 11031, Taiwan
Shih-Hsin Hsiao, Division of Pulmonary Medicine, Department of Internal Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei 11031, Taiwan
Yuan-Chii Gladys Lee, Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei 11031, Taiwan
https://www.mdpi.com/2079-7737/10/10/1023
next-generation sequencing
cloud computing
cloud workflow
pandemics
COVID-19
SARS-CoV-2
title Orchestrating an Optimized Next-Generation Sequencing-Based Cloud Workflow for Robust Viral Identification during Pandemics
topic next-generation sequencing
cloud computing
cloud workflow
pandemics
COVID-19
SARS-CoV-2
url https://www.mdpi.com/2079-7737/10/10/1023