A Comparison of Bioinformatics Pipelines for Enrichment Illumina Next Generation Sequencing Systems in Detecting SARS-CoV-2 Virus Strains

Bibliographic Details
Main Authors: AFIAHAYATI, AFIAHAYATI, Bernard, Stefanus, Gunadi, Gunadi, Wibawa, Hendra, Hakim, Mohamad Saifudin, Marcellus, Marcellus, Parikesit, Arli Aditya, Dewa, Chandra Kusuma, Sakakibara, Yasubumi
Format: Article
Language: English
Published: MDPI 2022
Subjects: Computational Logic and Formal Languages
Online Access: https://repository.ugm.ac.id/278593/1/Afiahayati_MA.pdf
author AFIAHAYATI, AFIAHAYATI
Bernard, Stefanus
Gunadi, Gunadi
Wibawa, Hendra
Hakim, Mohamad Saifudin
Marcellus, Marcellus
Parikesit, Arli Aditya
Dewa, Chandra Kusuma
Sakakibara, Yasubumi
author_sort AFIAHAYATI, AFIAHAYATI
collection UGM
description Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) is a newly emerged virus, well known as the cause of the worldwide Coronavirus Disease 2019 (COVID-19) pandemic. Major breakthroughs in the Next Generation Sequencing (NGS) field followed the first release of a full-length SARS-CoV-2 genome on 10 January 2020, raising hopes of turning the tide against the worsening pandemic. Previous studies in respiratory virus characterization required mapping raw sequences to the human genome in the downstream bioinformatics pipeline, in line with metagenomic principles. Illumina, a major player in the NGS arena, responded by releasing guidelines for an improved enrichment kit, the Respiratory Virus Oligo Panel (RVOP), which is based on a hybridization capture method that captures targeted respiratory viruses, including SARS-CoV-2, and therefore allows raw sequence data to be mapped directly to the SARS-CoV-2 genome in the downstream bioinformatics pipeline. Consequently, two bioinformatics pipelines emerged, with no previous study benchmarking them. This study aims to gain insight into Illumina's target enrichment workflow by applying two different bioinformatics pipelines, named ‘Fast Pipeline’ and ‘Normal Pipeline’, to SARS-CoV-2 strains isolated from Yogyakarta and Central Java, Indonesia. Overall, both pipelines work well in characterizing SARS-CoV-2 samples, including identifying the major studied nucleotide substitutions and amino acid mutations. A higher number of reads mapped to the SARS-CoV-2 genome in Fast Pipeline, and this was found to be a contributing factor to its higher coverage depth and larger number of identified variations (SNPs, insertions, and deletions). Fast Pipeline ultimately works well in situations where time is a critical factor. Normal Pipeline, on the other hand, requires more time because it also maps reads to the human genome. Certain limitations were identified in the pipeline algorithms, and future studies are strongly recommended to design the pipeline within an integrated framework, for instance by using Nextflow, a workflow framework, to combine all scripts into one fully integrated pipeline.
first_indexed 2024-03-14T00:01:28Z
format Article
id oai:generic.eprints.org:278593
institution Universitas Gadjah Mada
language English
last_indexed 2024-03-14T00:01:28Z
publishDate 2022
publisher MDPI
record_format dspace
spelling oai:generic.eprints.org:278593 2023-11-02T02:19:36Z https://repository.ugm.ac.id/278593/
MDPI 2022-07-26 Article PeerReviewed application/pdf en https://repository.ugm.ac.id/278593/1/Afiahayati_MA.pdf
AFIAHAYATI, AFIAHAYATI and Bernard, Stefanus and Gunadi, Gunadi and Wibawa, Hendra and Hakim, Mohamad Saifudin and Marcellus, Marcellus and Parikesit, Arli Aditya and Dewa, Chandra Kusuma and Sakakibara, Yasubumi (2022) A Comparison of Bioinformatics Pipelines for Enrichment Illumina Next Generation Sequencing Systems in Detecting SARS-CoV-2 Virus Strains. Genes, 13 (1330). pp. 1-24. ISSN 2073-4425 https://www.mdpi.com/journal/genes https://doi.org/10.3390/genes13081330
title A Comparison of Bioinformatics Pipelines for Enrichment Illumina Next Generation Sequencing Systems in Detecting SARS-CoV-2 Virus Strains
topic Computational Logic and Formal Languages
url https://repository.ugm.ac.id/278593/1/Afiahayati_MA.pdf