Identification of highly variable sequence fragments in unmapped reads for rapid bacterial genotyping

Abstract. Background: Bacterial genotyping is a crucial process in outbreak investigation and epidemiological studies. Several typing methods, such as pulsed-field gel electrophoresis, multilocus sequence typing (MLST) and whole genome sequencing, are currently used in routine clinical practice. However...

Bibliographic Details
Main Authors: Marketa Nykrynova, Vojtech Barton, Matej Bezdicek, Martina Lengerova, Helena Skutkova
Format: Article
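Language: English
Published: BMC 2022-12-01
Series: BMC Genomics
Subjects: Bacterial genotyping; Genome assembly; Unmapped reads; De novo assembly; Multilocus sequence typing; Mini-MLST
Online Access: https://doi.org/10.1186/s12864-022-08550-4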
author Marketa Nykrynova
Vojtech Barton
Matej Bezdicek
Martina Lengerova
Helena Skutkova
collection DOAJ
description Abstract.
Background: Bacterial genotyping is a crucial process in outbreak investigation and epidemiological studies. Several typing methods, such as pulsed-field gel electrophoresis, multilocus sequence typing (MLST) and whole genome sequencing, are currently used in routine clinical practice. However, these methods are costly and time-consuming and have high computational demands. An alternative is mini-MLST, a quick, cost-effective and robust method based on high-resolution melting analysis. Nevertheless, no standardized approach for identifying markers suitable for mini-MLST exists. Here, we present a pipeline for variable fragment detection in unmapped reads based on a modified hybrid assembly approach using data from one sequencing platform.
Results: In routine assembly against a reference sequence, highly variable reads are not aligned and remain unmapped. If these unmapped reads are assembled de novo, variable genomic regions can be located in the resulting scaffolds. Based on the calculation of variability rates, it is possible to find a highly variable region with the same discriminatory power as the seven housekeeping gene fragments used in MLST. We demonstrate the identification of one variable fragment in de novo assembled scaffolds of 21 Escherichia coli genomes and three variable regions in scaffolds of 31 Klebsiella pneumoniae genomes. For each identified fragment, melting temperatures are calculated using the nearest neighbor method to verify the discriminatory power of mini-MLST.
Conclusions: A pipeline for a modified hybrid assembly approach consisting of reference-based mapping and de novo assembly of unmapped reads is presented. This approach can be employed to identify highly variable genomic fragments in unmapped reads. The identified variable regions can then be used in efficient laboratory typing methods such as mini-MLST, which offer high discriminatory power and can fully replace expensive methods such as MLST. Results can be delivered in a shorter time, allowing immediate infection monitoring in clinical practice.
first_indexed 2024-04-11T04:08:11Z
format Article
id doaj.art-1730819455ee4403b41293743281dad9
institution Directory Open Access Journal
issn 1471-2164
language English
last_indexed 2024-04-11T04:08:11Z
publishDate 2022-12-01
publisher BMC
record_format Article
series BMC Genomics
spelling doaj.art-1730819455ee4403b41293743281dad9
2023-01-01T12:13:48Z
eng
BMC
BMC Genomics
1471-2164
2022-12-01
volume 23, issue S3, pages 1-12
10.1186/s12864-022-08550-4
Identification of highly variable sequence fragments in unmapped reads for rapid bacterial genotyping
Marketa Nykrynova (Department of Biomedical Engineering, Faculty of Electrical Engineering and Communication, Brno University of Technology)
Vojtech Barton (Department of Biomedical Engineering, Faculty of Electrical Engineering and Communication, Brno University of Technology)
Matej Bezdicek (Department of Internal Medicine, Hematology and Oncology, University Hospital Brno)
Martina Lengerova (Department of Internal Medicine, Hematology and Oncology, University Hospital Brno)
Helena Skutkova (Department of Biomedical Engineering, Faculty of Electrical Engineering and Communication, Brno University of Technology)
https://doi.org/10.1186/s12864-022-08550-4
Bacterial genotyping; Genome assembly; Unmapped reads; De novo assembly; Multilocus sequence typing; Mini-MLST
title Identification of highly variable sequence fragments in unmapped reads for rapid bacterial genotyping
topic Bacterial genotyping
Genome assembly
Unmapped reads
De novo assembly
Multilocus sequence typing
Mini-MLST
url https://doi.org/10.1186/s12864-022-08550-4