Large-scale genomic analyses with machine learning uncover predictive patterns associated with fungal phytopathogenic lifestyles and traits

Bibliographic Details
Main Authors: E. N. Dort, E. Layne, N. Feau, A. Butyaev, B. Henrissat, F. M. Martin, S. Haridas, A. Salamov, I. V. Grigoriev, M. Blanchette, R. C. Hamelin
Format: Article
Language: English
Published: Nature Portfolio 2023-10-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-023-44005-w
collection DOAJ
description Abstract Invasive plant pathogenic fungi have a global impact, with devastating economic and environmental effects on crops and forests. Biosurveillance, a critical component of threat mitigation, requires risk prediction based on fungal lifestyles and traits. Recent studies have revealed distinct genomic patterns associated with specific groups of plant pathogenic fungi. We sought to establish whether these phytopathogenic genomic patterns hold across diverse taxonomic and ecological groups from the Ascomycota and Basidiomycota, and furthermore, if those patterns can be used in a predictive capacity for biosurveillance. Using a supervised machine learning approach that integrates phylogenetic and genomic data, we analyzed 387 fungal genomes to test a proof-of-concept for the use of genomic signatures in predicting fungal phytopathogenic lifestyles and traits during biosurveillance activities. Our machine learning feature sets were derived from genome annotation data of carbohydrate-active enzymes (CAZymes), peptidases, secondary metabolite clusters (SMCs), transporters, and transcription factors. We found that machine learning could successfully predict fungal lifestyles and traits across taxonomic groups, with the best predictive performance coming from feature sets comprising CAZyme, peptidase, and SMC data. While phylogeny was an important component in most predictions, the inclusion of genomic data improved prediction performance for every lifestyle and trait tested. Plant pathogenicity was one of the best-predicted traits, showing the promise of predictive genomics for biosurveillance applications. Furthermore, our machine learning approach revealed expansions in the number of genes from specific CAZyme and peptidase families in the genomes of plant pathogens compared to non-phytopathogenic genomes (saprotrophs, endo- and ectomycorrhizal fungi). 
Such genomic feature profiles give insight into the evolution of fungal phytopathogenicity and could be useful to predict the risks of unknown fungi in future biosurveillance activities.
id doaj.art-9fbf839202454de4826c70ec8aa714b0
institution Directory Open Access Journal
issn 2045-2322
Citation: Scientific Reports, vol. 13, no. 1, pp. 1-15 (2023-10-01)
Record updated: 2023-11-26T13:17:41Z
Author affiliations:
E. N. Dort: Department of Forest and Conservation Sciences, Faculty of Forestry, University of British Columbia
E. Layne: School of Computer Science, McGill University
N. Feau: Pacific Forestry Centre, Canadian Forest Service, Natural Resources Canada
A. Butyaev: School of Computer Science, McGill University
B. Henrissat: Department of Biotechnology and Biomedicine (DTU Bioengineering), Technical University of Denmark
F. M. Martin: Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement, Unité Mixte de Recherche Interactions Arbres/Microorganismes, Centre INRAE, Grand Est-Nancy, Université de Lorraine
S. Haridas: Lawrence Berkeley National Laboratory, U.S. Department of Energy Joint Genome Institute
A. Salamov: Lawrence Berkeley National Laboratory, U.S. Department of Energy Joint Genome Institute
I. V. Grigoriev: Lawrence Berkeley National Laboratory, U.S. Department of Energy Joint Genome Institute
M. Blanchette: School of Computer Science, McGill University
R. C. Hamelin: Department of Forest and Conservation Sciences, Faculty of Forestry, University of British Columbia