A Comparison of Bioinformatics Pipelines for Enrichment Illumina Next Generation Sequencing Systems in Detecting SARS-CoV-2 Virus Strains
Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) is a newly emerged virus known as the cause of the worldwide Coronavirus Disease 2019 (COVID-19) pandemic. Major advances in the Next Generation Sequencing (NGS) field followed the first release of...
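Main Authors: | AFIAHAYATI, AFIAHAYATI; Bernard, Stefanus; Gunadi, Gunadi; Wibawa, Hendra; Hakim, Mohamad Saifudin; Marcellus, Marcellus; Parikesit, Arli Aditya; Dewa, Chandra Kusuma; Sakakibara, Yasubumi |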
---|---|
Format: | Article |
Language: | English |
Published: | MDPI 2022 |
Subjects: | Computational Logic and Formal Languages |
Online Access: | https://repository.ugm.ac.id/278593/1/Afiahayati_MA.pdf |
_version_ | 1826050285256048640 |
---|---|
author | AFIAHAYATI, AFIAHAYATI Bernard, Stefanus Gunadi, Gunadi Wibawa, Hendra Hakim, Mohamad Saifudin Marcellus, Marcellus Parikesit, Arli Aditya Dewa, Chandra Kusuma Sakakibara, Yasubumi |
author_sort | AFIAHAYATI, AFIAHAYATI |
collection | UGM |
description | Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) is a newly emerged virus known as the cause of the worldwide Coronavirus Disease 2019 (COVID-19) pandemic. Major advances in the Next Generation Sequencing (NGS) field followed the first release of a full-length SARS-CoV-2 genome on 10 January 2020, raising hopes of turning the tide against the worsening pandemic. Previous studies of respiratory virus characterization required mapping raw sequences to the human genome in the downstream bioinformatics pipeline, in line with metagenomic principles. Illumina, a major player in NGS, released guidelines for an improved enrichment kit, the Respiratory Virus Oligo Panel (RVOP), based on a hybridization capture method that captures targeted respiratory viruses, including SARS-CoV-2, thereby allowing raw sequence data to be mapped directly to the SARS-CoV-2 genome in the downstream bioinformatics pipeline. Consequently, two bioinformatics pipelines emerged, and no previous study has benchmarked them. This study examines Illumina's target enrichment workflow by applying two bioinformatics pipelines, named ‘Fast Pipeline’ and ‘Normal Pipeline’, to SARS-CoV-2 strains isolated from Yogyakarta and Central Java, Indonesia. Overall, both pipelines work well in characterizing SARS-CoV-2 samples, including in identifying the major studied nucleotide substitutions and amino acid mutations. More reads mapped to the SARS-CoV-2 genome in the Fast Pipeline, and this was found to be merely a contributing factor to its higher coverage depth and larger number of identified variations (SNPs, insertions, and deletions). The Fast Pipeline works well when time is critical, whereas the Normal Pipeline takes longer because it maps reads to the human genome. Certain limitations were identified in the pipeline algorithms; future studies are strongly encouraged to design a pipeline within an integrated framework, for instance by using Nextflow, a workflow framework, to combine all scripts into one fully integrated pipeline. |
first_indexed | 2024-03-14T00:01:28Z |
format | Article |
id | oai:generic.eprints.org:278593 |
institution | Universitas Gadjah Mada |
language | English |
last_indexed | 2024-03-14T00:01:28Z |
publishDate | 2022 |
publisher | MDPI |
record_format | dspace |
spelling | oai:generic.eprints.org:278593 2023-11-02T02:19:36Z https://repository.ugm.ac.id/278593/ A Comparison of Bioinformatics Pipelines for Enrichment Illumina Next Generation Sequencing Systems in Detecting SARS-CoV-2 Virus Strains AFIAHAYATI, AFIAHAYATI Bernard, Stefanus Gunadi, Gunadi Wibawa, Hendra Hakim, Mohamad Saifudin Marcellus, Marcellus Parikesit, Arli Aditya Dewa, Chandra Kusuma Sakakibara, Yasubumi Computational Logic and Formal Languages MDPI 2022-07-26 Article PeerReviewed application/pdf en https://repository.ugm.ac.id/278593/1/Afiahayati_MA.pdf AFIAHAYATI, AFIAHAYATI and Bernard, Stefanus and Gunadi, Gunadi and Wibawa, Hendra and Hakim, Mohamad Saifudin and Marcellus, Marcellus and Parikesit, Arli Aditya and Dewa, Chandra Kusuma and Sakakibara, Yasubumi (2022) A Comparison of Bioinformatics Pipelines for Enrichment Illumina Next Generation Sequencing Systems in Detecting SARS-CoV-2 Virus Strains. Genes, 13 (1330). pp. 1-24. ISSN 2073-4425 https://www.mdpi.com/journal/genes https://doi.org/10.3390/genes13081330 |
title | A Comparison of Bioinformatics Pipelines for Enrichment Illumina Next Generation Sequencing Systems in Detecting SARS-CoV-2 Virus Strains |
title_sort | comparison of bioinformatics pipelines for enrichment illumina next generation sequencing systems in detecting sars cov 2 virus strains |
topic | Computational Logic and Formal Languages |
url | https://repository.ugm.ac.id/278593/1/Afiahayati_MA.pdf |