Orchestrating an Optimized Next-Generation Sequencing-Based Cloud Workflow for Robust Viral Identification during Pandemics
Main Authors: | Hendrick Gao-Min Lim, Shih-Hsin Hsiao, Yuan-Chii Gladys Lee |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-10-01 |
Series: | Biology |
Online Access: | https://www.mdpi.com/2079-7737/10/10/1023 |
Field | Value |
---|---|
author | Hendrick Gao-Min Lim, Shih-Hsin Hsiao, Yuan-Chii Gladys Lee |
collection | DOAJ |
description | Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has recently become a novel pandemic event following the swine flu that occurred in 2009, which was caused by the influenza A virus (H1N1 subtype). Accurately identifying the huge number of samples during a pandemic remains a challenge. In this study, we integrate two technologies, next-generation sequencing and cloud computing, into an optimized workflow version that uses a specific identification algorithm on the designated cloud platform. We use 182 samples (92 for COVID-19 and 90 for swine flu) with short-read sequencing data from two open-access datasets to represent each pandemic and evaluate our workflow performance based on an index specifically created for SARS-CoV-2 or H1N1. Results show that our workflow could differentiate cases between the two pandemics with higher accuracy depending on the index used, especially when the index that exclusively represented each dataset was used. Our workflow substantially outperforms the original complete identification workflow available on the same platform in terms of time and cost by preserving essential tools internally. Our workflow can serve as a powerful tool for the robust identification of cases and, thus, aid in controlling the current and future pandemics. |
first_indexed | 2024-03-10T06:43:01Z |
format | Article |
id | doaj.art-05f9128145db48f4b530e1d8c08cf3ec |
institution | Directory Open Access Journal |
issn | 2079-7737 |
language | English |
last_indexed | 2024-03-10T06:43:01Z |
publishDate | 2021-10-01 |
publisher | MDPI AG |
series | Biology |
doi | 10.3390/biology10101023 |
volume | 10 |
issue | 10 |
article_number | 1023 |
affiliations | Hendrick Gao-Min Lim: Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei 11031, Taiwan; Shih-Hsin Hsiao: Division of Pulmonary Medicine, Department of Internal Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei 11031, Taiwan; Yuan-Chii Gladys Lee: Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei 11031, Taiwan |
title | Orchestrating an Optimized Next-Generation Sequencing-Based Cloud Workflow for Robust Viral Identification during Pandemics |
topic | next-generation sequencing; cloud computing; cloud workflow; pandemics; COVID-19; SARS-CoV-2 |
url | https://www.mdpi.com/2079-7737/10/10/1023 |