The K-mer antibiotic resistance gene variant analyzer (KARGVA)

Characterization of antibiotic resistance genes (ARGs) from high-throughput sequencing data of metagenomics and cultured bacterial samples is a challenging task, with the need to account for both computational (e.g., string algorithms) and biological (e.g., gene transfers, rearrangements) aspects. C...


Bibliographic Details
Main Authors: Simone Marini, Christina Boucher, Noelle Noyes, Mattia Prosperi
Format: Article
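Language: English
Published: Frontiers Media S.A. 2023-03-01
Series: Frontiers in Microbiology
Subjects: antibiotic resistance; gene variants; metagenomics; high-throughput sequencing; bioinformatics; statistical learning
Online Access: https://www.frontiersin.org/articles/10.3389/fmicb.2023.1060891/full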
collection DOAJ
description Characterization of antibiotic resistance genes (ARGs) from high-throughput sequencing data of metagenomics and cultured bacterial samples is a challenging task that must account for both computational (e.g., string algorithms) and biological (e.g., gene transfers, rearrangements) aspects. Curated ARG databases exist together with assorted ARG classification approaches (e.g., database alignment, machine learning). Besides ARGs that naturally occur in bacterial strains or are acquired through mobile elements, there are chromosomal genes that can render a bacterium resistant to antibiotics through point mutations, i.e., ARG variants (ARGVs). While ARG repositories also collect ARGVs, only a few tools can identify ARGVs from metagenomics and high-throughput sequencing data, and they have a number of limitations (e.g., pre-assembly, a posteriori verification of mutations, or specification of species). In this work we present KARGVA, the k-mer (i.e., string of fixed length k) ARGV analyzer, an open-source, multi-platform tool that provides: (i) an ad hoc, large ARGV database derived from multiple sources; (ii) input capability for various types of high-throughput sequencing data; (iii) a three-way, hash-based, k-mer search setup to process data efficiently, linking k-mers to ARGVs, k-mers to point mutations, and ARGVs to k-mers, respectively; (iv) a statistical filter on sequence classification to reduce type I and type II errors. On semi-synthetic data, KARGVA provides very high accuracy even in the presence of high sequencing errors or mutations (99.2% and 86.6% accuracy within 1% and 5% base change rates, respectively) and genome rearrangements (98.2% accuracy), with robust performance on ad hoc false positive sets. On data from the worldwide MetaSUB consortium, comprising 3,700+ metagenomics experiments, KARGVA identifies more ARGVs than Resistance Gene Identifier (4.8x) and PointFinder (6.8x), yet all predictions are below the expected false positive estimates. The prevalence of ARGVs is correlated with that of ARGs, but ecological characteristics do not explain ARGV variance well. KARGVA is publicly available at https://github.com/DataIntellSystLab/KARGVA under the MIT license.
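To make the "three-way, hash-based, k-mer search" in the description more concrete, the following Python sketch builds three hash maps over a toy variant database: k-mers to variants, k-mers to point mutations, and variants to k-mers. It is a minimal illustration under stated assumptions, not KARGVA's implementation: the function names (`kmers`, `build_indexes`, `score_read`), the `(sequence, mutation position)` database layout, and the per-read hit counting are all hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return every substring of length k, in order of position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_indexes(variants, k):
    """Build three hash maps from a toy variant database.

    `variants` maps a variant id to (sequence, mutation_position).
    This layout is an assumption for illustration only.
    """
    kmer_to_variants = defaultdict(set)   # k-mer -> ids of variants containing it
    kmer_to_mutations = defaultdict(set)  # k-mer -> (id, position) of mutations it spans
    variant_to_kmers = {}                 # variant id -> set of its k-mers
    for vid, (seq, mut_pos) in variants.items():
        variant_to_kmers[vid] = set()
        for i, km in enumerate(kmers(seq, k)):
            kmer_to_variants[km].add(vid)
            variant_to_kmers[vid].add(km)
            # The k-mer starting at i covers positions i .. i + k - 1.
            if i <= mut_pos < i + k:
                kmer_to_mutations[km].add((vid, mut_pos))
    return kmer_to_variants, kmer_to_mutations, variant_to_kmers


def score_read(read, k, kmer_to_variants, kmer_to_mutations):
    """Count read k-mers matching each variant and collect covered mutations."""
    hits = defaultdict(int)
    covered = set()
    for km in kmers(read, k):
        for vid in kmer_to_variants.get(km, ()):
            hits[vid] += 1
        covered |= kmer_to_mutations.get(km, set())
    return dict(hits), covered
```

In this sketch, a read would be a candidate for a variant only if it both matches some of the variant's k-mers and covers the mutated position; the statistical filter mentioned in the description is not modeled here.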
first_indexed 2024-04-10T05:33:47Z
id doaj.art-b8d723e56d6d4f91a59ba0acf7f7e4d9
institution Directory Open Access Journal
issn 1664-302X
last_indexed 2024-04-10T05:33:47Z
spelling doaj.art-b8d723e56d6d4f91a59ba0acf7f7e4d9 (2023-03-07T05:09:18Z)
Volume: 14
DOI: 10.3389/fmicb.2023.1060891
Article number: 1060891
Author affiliations:
Simone Marini: Department of Epidemiology, University of Florida, Gainesville, FL, United States; Department of Pathology, University of Florida, Gainesville, FL, United States
Christina Boucher: Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL, United States
Noelle Noyes: Department of Veterinary Population Medicine, University of Minnesota, St. Paul, MN, United States
Mattia Prosperi: Department of Epidemiology, University of Florida, Gainesville, FL, United States
topic antibiotic resistance
gene variants
metagenomics
high-throughput sequencing
bioinformatics
statistical learning