Genome-Wide Mutation Scoring for Machine-Learning-Based Antimicrobial Resistance Prediction
The prediction of antimicrobial resistance (AMR) based on genomic information can improve patient outcomes. Genetic mechanisms have been shown to explain AMR with accuracies in line with standard microbiology laboratory testing. To translate genetic mechanisms into phenotypic AMR, machine learning h...
Main Authors: | Peter Májek, Lukas Lüftinger, Stephan Beisken, Thomas Rattei, Arne Materna |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-12-01 |
Series: | International Journal of Molecular Sciences |
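Subjects: | machine learning, genomics, antimicrobial resistance, antibiotics, WGS, genome-wide mutation scoring |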
Online Access: | https://www.mdpi.com/1422-0067/22/23/13049 |
_version_ | 1797507714175205376 |
---|---|
author | Peter Májek; Lukas Lüftinger; Stephan Beisken; Thomas Rattei; Arne Materna |
author_facet | Peter Májek; Lukas Lüftinger; Stephan Beisken; Thomas Rattei; Arne Materna |
author_sort | Peter Májek |
collection | DOAJ |
description | The prediction of antimicrobial resistance (AMR) based on genomic information can improve patient outcomes. Genetic mechanisms have been shown to explain AMR with accuracies in line with standard microbiology laboratory testing. To translate genetic mechanisms into phenotypic AMR, machine learning has been successfully applied. AMR machine learning models typically use nucleotide k-mer counts to represent genomic sequences. While k-mer representation efficiently captures sequence variation, it also results in high-dimensional and sparse data. With limited training data available, achieving acceptable model performance or model interpretability is challenging. In this study, we explore the utility of feature engineering with several biologically relevant signals. We propose to predict the functional impact of observed mutations with PROVEAN to use the predicted impact as a new feature for each protein in an organism’s proteome. The addition of the new features was tested on a total of 19,521 isolates across nine clinically relevant pathogens and 30 different antibiotics. The new features significantly improved the predictive performance of trained AMR models for <i>Pseudomonas aeruginosa</i>, <i>Citrobacter freundii</i>, and <i>Escherichia coli</i>. The balanced accuracy of the respective models of those three pathogens improved by 6.0% on average. |
first_indexed | 2024-03-10T04:52:22Z |
format | Article |
id | doaj.art-80457dfda7e2469eb06d637b50e5a16b |
institution | Directory Open Access Journal |
issn | 1661-6596 1422-0067 |
language | English |
last_indexed | 2024-03-10T04:52:22Z |
publishDate | 2021-12-01 |
publisher | MDPI AG |
record_format | Article |
series | International Journal of Molecular Sciences |
spelling | doaj.art-80457dfda7e2469eb06d637b50e5a16b; 2023-11-23T02:32:20Z; eng; MDPI AG; International Journal of Molecular Sciences; 1661-6596; 1422-0067; 2021-12-01; 22; 23; 13049; 10.3390/ijms222313049; Genome-Wide Mutation Scoring for Machine-Learning-Based Antimicrobial Resistance Prediction; Peter Májek (0); Lukas Lüftinger (1); Stephan Beisken (2); Thomas Rattei (3); Arne Materna (4); Ares Genetics GmbH, Vienna 1030, Austria; Ares Genetics GmbH, Vienna 1030, Austria; Ares Genetics GmbH, Vienna 1030, Austria; Centre for Microbiology and Environmental Systems Science, Division of Computational Systems Biology, University of Vienna, Vienna 1030, Austria; Ares Genetics GmbH, Vienna 1030, Austria; The prediction of antimicrobial resistance (AMR) based on genomic information can improve patient outcomes. Genetic mechanisms have been shown to explain AMR with accuracies in line with standard microbiology laboratory testing. To translate genetic mechanisms into phenotypic AMR, machine learning has been successfully applied. AMR machine learning models typically use nucleotide k-mer counts to represent genomic sequences. While k-mer representation efficiently captures sequence variation, it also results in high-dimensional and sparse data. With limited training data available, achieving acceptable model performance or model interpretability is challenging. In this study, we explore the utility of feature engineering with several biologically relevant signals. We propose to predict the functional impact of observed mutations with PROVEAN to use the predicted impact as a new feature for each protein in an organism’s proteome. The addition of the new features was tested on a total of 19,521 isolates across nine clinically relevant pathogens and 30 different antibiotics. The new features significantly improved the predictive performance of trained AMR models for <i>Pseudomonas aeruginosa</i>, <i>Citrobacter freundii</i>, and <i>Escherichia coli</i>. The balanced accuracy of the respective models of those three pathogens improved by 6.0% on average.; https://www.mdpi.com/1422-0067/22/23/13049; machine learning; genomics; antimicrobial resistance; antibiotics; WGS; genome-wide mutation scoring |
spellingShingle | Peter Májek Lukas Lüftinger Stephan Beisken Thomas Rattei Arne Materna Genome-Wide Mutation Scoring for Machine-Learning-Based Antimicrobial Resistance Prediction International Journal of Molecular Sciences machine learning genomics antimicrobial resistance antibiotics WGS genome-wide mutation scoring |
title | Genome-Wide Mutation Scoring for Machine-Learning-Based Antimicrobial Resistance Prediction |
title_full | Genome-Wide Mutation Scoring for Machine-Learning-Based Antimicrobial Resistance Prediction |
title_fullStr | Genome-Wide Mutation Scoring for Machine-Learning-Based Antimicrobial Resistance Prediction |
title_full_unstemmed | Genome-Wide Mutation Scoring for Machine-Learning-Based Antimicrobial Resistance Prediction |
title_short | Genome-Wide Mutation Scoring for Machine-Learning-Based Antimicrobial Resistance Prediction |
title_sort | genome wide mutation scoring for machine learning based antimicrobial resistance prediction |
topic | machine learning; genomics; antimicrobial resistance; antibiotics; WGS; genome-wide mutation scoring |
url | https://www.mdpi.com/1422-0067/22/23/13049 |
work_keys_str_mv | AT petermajek genomewidemutationscoringformachinelearningbasedantimicrobialresistanceprediction AT lukasluftinger genomewidemutationscoringformachinelearningbasedantimicrobialresistanceprediction AT stephanbeisken genomewidemutationscoringformachinelearningbasedantimicrobialresistanceprediction AT thomasrattei genomewidemutationscoringformachinelearningbasedantimicrobialresistanceprediction AT arnematerna genomewidemutationscoringformachinelearningbasedantimicrobialresistanceprediction |
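
The abstract in this record refers to two concepts that a short example may help illustrate: representing a genomic sequence by nucleotide k-mer counts (which becomes high-dimensional and sparse as k grows, since there are 4^k possible k-mers), and evaluating models by balanced accuracy. The following is a minimal Python sketch of both, not the authors' pipeline. The function names `kmer_counts`, `kmer_vector`, and `balanced_accuracy`, the toy sequence, and the binary label encoding (1 = resistant, 0 = susceptible) are assumptions made for illustration only; the PROVEAN-based features described in the abstract are not reproduced here.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence, k):
    """Count overlapping nucleotide k-mers; windows with non-ACGT symbols are skipped."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def kmer_vector(sequence, k):
    """Dense count vector over all 4**k possible k-mers, in lexicographic order.

    Most entries are zero for realistic k, which is the sparsity the abstract mentions.
    """
    counts = kmer_counts(sequence, k)
    return [counts["".join(p)] for p in product("ACGT", repeat=k)]


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels.

    Assumed encoding (hypothetical): 1 = resistant, 0 = susceptible.
    Both classes must be present in y_true, otherwise a division by zero occurs.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


if __name__ == "__main__":
    toy = "ACGTAC"  # toy sequence, not real isolate data
    print(kmer_counts(toy, 2))
    print(len(kmer_vector(toy, 2)))
    print(balanced_accuracy([1, 1, 0, 0], [1, 0, 0, 0]))
```

Even for this toy input, only 4 of the 16 possible 2-mers occur; with the longer k-mers typical of genomic models, the fraction of non-zero features drops much further.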