The K-mer antibiotic resistance gene variant analyzer (KARGVA)
Characterization of antibiotic resistance genes (ARGs) from high-throughput sequencing data of metagenomics and cultured bacterial samples is a challenging task, with the need to account for both computational (e.g., string algorithms) and biological (e.g., gene transfers, rearrangements) aspects. C...
Main Authors: | Simone Marini, Christina Boucher, Noelle Noyes, Mattia Prosperi |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2023-03-01 |
Series: | Frontiers in Microbiology |
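Subjects: | antibiotic resistance; gene variants; metagenomics; high-throughput sequencing; bioinformatics; statistical learning |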
Online Access: | https://www.frontiersin.org/articles/10.3389/fmicb.2023.1060891/full |
_version_ | 1811159002093977600 |
---|---|
author | Simone Marini; Christina Boucher; Noelle Noyes; Mattia Prosperi |
author_sort | Simone Marini |
collection | DOAJ |
description | Characterization of antibiotic resistance genes (ARGs) from high-throughput sequencing data of metagenomic and cultured bacterial samples is a challenging task that must account for both computational (e.g., string algorithms) and biological (e.g., gene transfers, rearrangements) aspects. Curated ARG databases exist alongside assorted ARG classification approaches (e.g., database alignment, machine learning). Besides ARGs that occur naturally in bacterial strains or are acquired through mobile elements, there are chromosomal genes that can render a bacterium resistant to antibiotics through point mutations, i.e., ARG variants (ARGVs). While ARG repositories also collect ARGVs, only a few tools can identify ARGVs from metagenomics and high-throughput sequencing data, and they have a number of limitations (e.g., pre-assembly, a posteriori verification of mutations, or specification of species). In this work we present the k-mer ARGV analyzer (KARGVA), where a k-mer is a string of fixed length k. KARGVA is an open-source, multi-platform tool that provides: (i) an ad hoc, large ARGV database derived from multiple sources; (ii) input capability for various types of high-throughput sequencing data; (iii) a three-way, hash-based, k-mer search setup to process data efficiently, linking k-mers to ARGVs, k-mers to point mutations, and ARGVs to k-mers, respectively; (iv) a statistical filter on sequence classification to reduce type I and II errors. On semi-synthetic data, KARGVA provides very high accuracy even in the presence of high sequencing errors or mutations (99.2% and 86.6% accuracy within 1% and 5% base change rates, respectively) and genome rearrangements (98.2% accuracy), with robust performance on ad hoc false positive sets. On data from the worldwide MetaSUB consortium, comprising 3,700+ metagenomics experiments, KARGVA identifies more ARGVs than Resistance Gene Identifier (4.8x) and PointFinder (6.8x), yet all predictions are below the expected false positive estimates. The prevalence of ARGVs is correlated with that of ARGs, but ecological characteristics do not explain ARGV variance well. KARGVA is publicly available at https://github.com/DataIntellSystLab/KARGVA under the MIT license. |
first_indexed | 2024-04-10T05:33:47Z |
format | Article |
id | doaj.art-b8d723e56d6d4f91a59ba0acf7f7e4d9 |
institution | Directory Open Access Journal |
issn | 1664-302X |
language | English |
last_indexed | 2024-04-10T05:33:47Z |
publishDate | 2023-03-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Microbiology |
spelling | doaj.art-b8d723e56d6d4f91a59ba0acf7f7e4d9; 2023-03-07T05:09:18Z; eng; Frontiers Media S.A.; Frontiers in Microbiology; ISSN 1664-302X; 2023-03-01; volume 14; DOI 10.3389/fmicb.2023.1060891; article 1060891; The K-mer antibiotic resistance gene variant analyzer (KARGVA). Authors: Simone Marini (0, 1); Christina Boucher (2); Noelle Noyes (3); Mattia Prosperi (4). Affiliations: (0) Department of Epidemiology, University of Florida, Gainesville, FL, United States; (1) Department of Pathology, University of Florida, Gainesville, FL, United States; (2) Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL, United States; (3) Department of Veterinary Population Medicine, University of Minnesota, St. Paul, MN, United States; (4) Department of Epidemiology, University of Florida, Gainesville, FL, United States. URL: https://www.frontiersin.org/articles/10.3389/fmicb.2023.1060891/full. Keywords: antibiotic resistance; gene variants; metagenomics; high-throughput sequencing; bioinformatics; statistical learning |
title | The K-mer antibiotic resistance gene variant analyzer (KARGVA) |
topic | antibiotic resistance; gene variants; metagenomics; high-throughput sequencing; bioinformatics; statistical learning |
url | https://www.frontiersin.org/articles/10.3389/fmicb.2023.1060891/full |
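
The abstract describes a three-way, hash-based k-mer search that links k-mers to ARGVs, k-mers to point mutations, and ARGVs to k-mers. The record gives no implementation details, so the following is only a minimal sketch of that idea and not KARGVA's actual code. The function names, the toy sequences, the choice of nucleotide strings, and the rule of accepting a read only when a mutation-spanning k-mer matches are all assumptions made for illustration; the statistical filter mentioned in the abstract is not modeled.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return all overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(variants, k):
    """Build three hash maps from a toy variant database.

    variants: dict mapping a (hypothetical) variant name to a tuple
    (sequence, set of 0-based positions carrying the resistance mutation).
    """
    kmer_to_argvs = defaultdict(set)      # k-mer -> variants containing it
    kmer_to_mutations = defaultdict(set)  # k-mer -> (variant, position) it spans
    argv_to_kmers = defaultdict(set)      # variant -> all of its k-mers
    for name, (seq, positions) in variants.items():
        for i, kmer in enumerate(kmers(seq, k)):
            kmer_to_argvs[kmer].add(name)
            argv_to_kmers[name].add(kmer)
            for p in positions:
                if i <= p < i + k:  # the k-mer window covers the mutated site
                    kmer_to_mutations[kmer].add((name, p))
    return kmer_to_argvs, kmer_to_mutations, argv_to_kmers


def classify_read(read, index, k):
    """Assign a read to the variant with the most k-mer hits.

    Assumption of this sketch: a variant is a candidate only if at least
    one matching k-mer spans one of its mutated positions.
    """
    kmer_to_argvs, kmer_to_mutations, _ = index
    hits = defaultdict(int)
    mutation_hits = defaultdict(set)
    for kmer in kmers(read, k):
        for name in kmer_to_argvs.get(kmer, ()):
            hits[name] += 1
        for name, p in kmer_to_mutations.get(kmer, ()):
            mutation_hits[name].add(p)
    candidates = {n: c for n, c in hits.items() if mutation_hits[n]}
    if not candidates:
        return None
    best = max(candidates, key=lambda n: (candidates[n], n))
    return best, candidates[best], sorted(mutation_hits[best])


# Toy database: one made-up variant whose mutation sits at position 6.
K = 5
INDEX = build_index({"variantA": ("ATGGCTTTGACCGA", {6})}, K)
mutant_read = "GCTTTGACC"     # carries the mutated base
wild_type_read = "GCTCTGACC"  # differs at the mutated site
print(classify_read(mutant_read, INDEX, K))
print(classify_read(wild_type_read, INDEX, K))
```

In this sketch the read carrying the mutated base matches k-mers that span the mutation and is assigned to the variant, while the read that differs at that site shares only a flanking k-mer and is rejected. The third map (variant to k-mers) is built but unused here; one plausible use would be measuring how much of a variant's k-mer set a sample covers.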