Genome-Wide Mutation Scoring for Machine-Learning-Based Antimicrobial Resistance Prediction

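The prediction of antimicrobial resistance (AMR) based on genomic information can improve patient outcomes. Genetic mechanisms have been shown to explain AMR with accuracies in line with standard microbiology laboratory testing. To translate genetic mechanisms into phenotypic AMR, machine learning has been successfully applied. AMR machine learning models typically use nucleotide k-mer counts to represent genomic sequences. While k-mer representation efficiently captures sequence variation, it also results in high-dimensional and sparse data. With limited training data available, achieving acceptable model performance or model interpretability is challenging. In this study, we explore the utility of feature engineering with several biologically relevant signals. We propose to predict the functional impact of observed mutations with PROVEAN to use the predicted impact as a new feature for each protein in an organism’s proteome. The addition of the new features was tested on a total of 19,521 isolates across nine clinically relevant pathogens and 30 different antibiotics. The new features significantly improved the predictive performance of trained AMR models for Pseudomonas aeruginosa, Citrobacter freundii, and Escherichia coli. The balanced accuracy of the respective models of those three pathogens improved by 6.0% on average.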
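
The abstract of this record contrasts nucleotide k-mer counts with one predicted-impact feature per protein. The Python sketch below illustrates that contrast only; it is not the authors' pipeline. It does not call PROVEAN: the per-variant scores, the protein names used as labels, and the rule of keeping the lowest score per protein are all assumptions made for illustration, with more negative scores taken to mean more damaging.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count overlapping nucleotide k-mers in a sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_vector(sequence: str, k: int) -> list[int]:
    """Dense count vector over all 4**k possible k-mers of A, C, G, T."""
    counts = kmer_counts(sequence, k)
    # Counter returns 0 for k-mers that never occur, so most entries are 0.
    return [counts["".join(p)] for p in product("ACGT", repeat=k)]


def protein_impact_features(variant_scores, proteins):
    """One feature per protein: the lowest observed variant score.

    Assumption for this sketch: proteins start at 0.0 (treated as neutral),
    so the feature is the lowest score capped at 0.0.
    """
    features = {name: 0.0 for name in proteins}
    for protein, score in variant_scores:
        if protein in features:
            features[protein] = min(features[protein], score)
    return [features[name] for name in proteins]


# Toy sequence: 4**4 = 256 possible 4-mers, only a handful are observed.
vector = kmer_vector("ACGTACGTGA", 4)
print(len(vector), sum(1 for c in vector if c))  # 256 6

# Made-up per-variant scores (hypothetical values, not PROVEAN output).
scores = [("gyrA", -4.1), ("gyrA", -0.3), ("parC", -2.8), ("oprD", 1.2)]
print(protein_impact_features(scores, ["gyrA", "parC", "oprD", "ampC"]))
# [-4.1, -2.8, 0.0, 0.0]
```

The k-mer vector grows as 4**k and is mostly zeros, whereas the per-protein vector has one entry per protein in the proteome, which is the dimensionality argument the abstract makes.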

Bibliographic Details
Main Authors: Peter Májek, Lukas Lüftinger, Stephan Beisken, Thomas Rattei, Arne Materna
Format: Article
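Language: English
Published: MDPI AG 2021-12-01
Series: International Journal of Molecular Sciences
Subjects: machine learning; genomics; antimicrobial resistance; antibiotics; WGS; genome-wide mutation scoring
Online Access: https://www.mdpi.com/1422-0067/22/23/13049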
author Peter Májek
Lukas Lüftinger
Stephan Beisken
Thomas Rattei
Arne Materna
collection DOAJ
description The prediction of antimicrobial resistance (AMR) based on genomic information can improve patient outcomes. Genetic mechanisms have been shown to explain AMR with accuracies in line with standard microbiology laboratory testing. To translate genetic mechanisms into phenotypic AMR, machine learning has been successfully applied. AMR machine learning models typically use nucleotide k-mer counts to represent genomic sequences. While k-mer representation efficiently captures sequence variation, it also results in high-dimensional and sparse data. With limited training data available, achieving acceptable model performance or model interpretability is challenging. In this study, we explore the utility of feature engineering with several biologically relevant signals. We propose to predict the functional impact of observed mutations with PROVEAN to use the predicted impact as a new feature for each protein in an organism’s proteome. The addition of the new features was tested on a total of 19,521 isolates across nine clinically relevant pathogens and 30 different antibiotics. The new features significantly improved the predictive performance of trained AMR models for Pseudomonas aeruginosa, Citrobacter freundii, and Escherichia coli. The balanced accuracy of the respective models of those three pathogens improved by 6.0% on average.
first_indexed 2024-03-10T04:52:22Z
format Article
id doaj.art-80457dfda7e2469eb06d637b50e5a16b
institution Directory Open Access Journal
issn 1661-6596
1422-0067
language English
last_indexed 2024-03-10T04:52:22Z
publishDate 2021-12-01
publisher MDPI AG
record_format Article
series International Journal of Molecular Sciences
spelling doaj.art-80457dfda7e2469eb06d637b50e5a16b; 2023-11-23T02:32:20Z; eng; MDPI AG; International Journal of Molecular Sciences; ISSN 1661-6596, 1422-0067; 2021-12-01; 22(23):13049; DOI 10.3390/ijms222313049
Peter Májek (Ares Genetics GmbH, Vienna 1030, Austria)
Lukas Lüftinger (Ares Genetics GmbH, Vienna 1030, Austria)
Stephan Beisken (Ares Genetics GmbH, Vienna 1030, Austria)
Thomas Rattei (Centre for Microbiology and Environmental Systems Science, Division of Computational Systems Biology, University of Vienna, Vienna 1030, Austria)
Arne Materna (Ares Genetics GmbH, Vienna 1030, Austria)
title Genome-Wide Mutation Scoring for Machine-Learning-Based Antimicrobial Resistance Prediction
topic machine learning
genomics
antimicrobial resistance
antibiotics
WGS
genome-wide mutation scoring
url https://www.mdpi.com/1422-0067/22/23/13049