Nanopore sequencing data analysis using Microsoft Azure cloud computing service

Bibliographic Details
Main Authors: Linh Truong, Felipe Ayora, Lloyd D’Orsogna, Patricia Martinez, Dianne De Santis
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2022-01-01
Series: PLoS ONE
Online Access:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9718390/?tool=EBI
Collection: DOAJ
ISSN: 1932-6203
Volume: 17, Issue: 12

Abstract
Genetic information provides insights into the exome, genome, epigenetics and structural organisation of an organism. With this wealth of genetic information, scientists can undertake large-scale tasks that improve the standard of health care, such as determining genetic influences on the outcome of allogeneic transplantation. Cloud-based computing has increasingly become a key choice for many scientists, engineers and institutions because it offers on-demand network access and lets users rent, rather than buy, all the computing resources they require. Motivated by advances in cloud computing and in nanopore sequencing data output, we developed an automated, scalable analysis pipeline on Microsoft Azure cloud infrastructure to accelerate the HLA genotyping service and improve the efficiency of the workflow at lower cost. In this study, we describe (i) the selection of virtual machine sizes that balance performance against cost-effectiveness; (ii) the building of Docker containers that include all tools required in the cloud computing environment; and (iii) the comparison of HLA genotype concordance between the in-house manual method and the automated cloud-based pipeline to assess data accuracy. In conclusion, the Microsoft Azure cloud-based data analysis pipeline was shown to meet all key requirements for performance, cost, usability, simplicity and accuracy. Importantly, the pipeline supports ongoing maintenance and allows version changes to be tested before implementation. The pipeline is suitable for analysing data from the MinION sequencing platform and could be adopted for other data analysis applications.