Automated quality control for a molecular surveillance system

Bibliographic Details
Main Authors: Seth Sims, Atkinson G. Longmire, David S. Campo, Sumathi Ramachandran, Magdalena Medrzycki, Lilia Ganova-Raeva, Yulin Lin, Amanda Sue, Hong Thai, Alexander Zelikovsky, Yury Khudyakov
Format: Article
Language: English
Published: BMC 2018-10-01
Series: BMC Bioinformatics
Subjects: HVR1; HCV; Transmission; Outbreak detection; Molecular surveillance; Quality control
Online Access: http://link.springer.com/article/10.1186/s12859-018-2329-5
author Seth Sims
Atkinson G. Longmire
David S. Campo
Sumathi Ramachandran
Magdalena Medrzycki
Lilia Ganova-Raeva
Yulin Lin
Amanda Sue
Hong Thai
Alexander Zelikovsky
Yury Khudyakov
collection DOAJ
description Abstract
Background: Molecular surveillance and outbreak investigation are important for elimination of hepatitis C virus (HCV) infection in the United States. A web-based system, Global Hepatitis Outbreak and Surveillance Technology (GHOST), has been developed using Illumina MiSeq-based amplicon sequence data derived from the HCV E1/E2-junction genomic region to enable public health institutions to conduct cost-effective and accurate molecular surveillance, outbreak detection and strain characterization. However, GHOST is not completely immune to the many factors that can degrade input data quality, so the accuracy of the epidemiological inferences it generates may be affected. Here, we analyze the data submitted to the GHOST system during its pilot phase to assess the nature of the data and to identify common quality concerns that can be detected and corrected automatically.
Results: The GHOST quality control filters were individually examined, and quality failure rates were measured for all samples, including negative controls. New filters were developed and introduced to detect primer dimers, loss of specimen-specific product, or short products. The genotyping tool was adjusted to improve the accuracy of subtype calls. The identification of “chordless” cycles in a transmission network from data generated with known laboratory-based quality concerns allowed for further improvement of transmission detection by GHOST in surveillance settings. Parameters derived to detect actionable common quality control anomalies were incorporated into the automatic quality control module, which rejects data depending on the magnitude of a quality problem and warns and guides users in performing corrective actions. The guiding responses generated by the system are tailored to the GHOST laboratory protocol.
Conclusions: Several new quality control problems were identified in MiSeq data submitted to GHOST and used to improve protection of the system from erroneous data and of users from erroneous inferences. The GHOST system was upgraded to include identification of the causes of erroneous data and recommendation of corrective actions to laboratory users.
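For readers unfamiliar with the term used in the abstract, a cycle in a network is "chordless" when its nodes are linked only to their immediate neighbours along the cycle, with no shortcut link between two non-adjacent nodes. The sketch below only illustrates that graph-theory definition. The abstract does not say how GHOST searches for or interprets such cycles, so the function name, the adjacency-dictionary layout and the minimum length of four nodes are assumptions made for this example, not part of the GHOST software.

```python
from itertools import combinations


def is_chordless_cycle(adjacency, cycle):
    """Check whether `cycle` is a chordless cycle in an undirected graph.

    adjacency: dict mapping each node to the set of its neighbours
               (assumed symmetric; hypothetical layout for this sketch).
    cycle:     list of distinct nodes in the order they are visited.
    """
    n = len(cycle)
    # Assumption: only cycles of four or more distinct nodes are of interest,
    # since a triangle cannot contain a chord.
    if n < 4 or len(set(cycle)) != n:
        return False
    for i, j in combinations(range(n), 2):
        # Positions i and j are neighbours on the cycle if they are adjacent
        # in the list or are the first and last entries.
        consecutive = (j - i == 1) or (i == 0 and j == n - 1)
        linked = cycle[j] in adjacency.get(cycle[i], set())
        # Neighbours on the cycle must be linked; any other link is a chord.
        if consecutive != linked:
            return False
    return True
```

Given a four-node ring A-B-C-D, the function returns True; adding a link between A and C introduces a chord and the same node list is then rejected.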
first_indexed 2024-12-20T06:49:58Z
format Article
id doaj.art-0123f253243749639ba01dfc5fb62ee5
institution Directory Open Access Journal
issn 1471-2105
language English
last_indexed 2024-12-20T06:49:58Z
publishDate 2018-10-01
publisher BMC
record_format Article
series BMC Bioinformatics
spelling doaj.art-0123f253243749639ba01dfc5fb62ee5; 2022-12-21T19:49:33Z; eng; BMC; BMC Bioinformatics; ISSN 1471-2105; 2018-10-01; volume 19, issue S11, pages 1-15; doi 10.1186/s12859-018-2329-5
Automated quality control for a molecular surveillance system
Seth Sims, Atkinson G. Longmire, David S. Campo, Sumathi Ramachandran, Magdalena Medrzycki, Lilia Ganova-Raeva, Yulin Lin, Amanda Sue, Hong Thai, Yury Khudyakov: Division of Viral Hepatitis, Centers for Disease Control and Prevention
Alexander Zelikovsky: Department of Computer Science, Georgia State University
title Automated quality control for a molecular surveillance system
topic HVR1
HCV
Transmission
Outbreak detection
Molecular surveillance
Quality control
url http://link.springer.com/article/10.1186/s12859-018-2329-5