Enhanced correlation-based linking of biosynthetic gene clusters to their metabolic products through chemical class matching

Abstract. Background: It is well known that the microbiome produces a myriad of specialised metabolites with diverse functions. To better characterise their structures and identify their producers in complex samples, integrative genome and metabolome mining is becoming increasingly popular. Metabologe...

Bibliographic Details
Main Authors: Joris J. R. Louwen, Marnix H. Medema, Justin J. J. van der Hooft
Format: Article
Language: English
Published: BMC 2023-01-01
Series: Microbiome
Online Access: https://doi.org/10.1186/s40168-022-01444-3
Description:

Background: It is well known that the microbiome produces a myriad of specialised metabolites with diverse functions. To better characterise their structures and identify their producers in complex samples, integrative genome and metabolome mining is becoming increasingly popular. Metabologenomic co-occurrence-based correlation scoring methods facilitate the linking of metabolite mass fragmentation spectra (MS/MS) to their cognate biosynthetic gene clusters (BGCs) based on shared absence/presence patterns of metabolites and BGCs in paired omics datasets of multiple strains. Recently, these methods have been made more readily accessible through the NPLinker platform. However, co-occurrence-based approaches usually yield too many candidate links to validate manually. To address this issue, we introduce a generic feature-based correlation method that matches chemical compound classes between BGCs and MS/MS spectra.

Results: To automatically reduce the long lists of potential BGC-MS/MS spectrum links, we matched natural product (NP) ontologies that were previously developed independently for genomics and metabolomics, and developed NPClassScore, an empirical class matching score that we also implemented in the NPLinker platform. By applying NPClassScore to three paired omics datasets totalling 189 bacterial strains, we show that the number of links is reduced by 63% on average compared with using a co-occurrence-based strategy alone. We further demonstrate that 96% of experimentally validated links in these datasets are retained and prioritised when using NPClassScore.

Conclusion: The matched genome and metabolome class ontologies provide a starting point for selecting plausible BGC and MS/MS spectrum candidates based on chemical compound class. NPClassScore expedites genome/metabolome data integration, as relevant BGC-metabolite links are prioritised and researchers are faced with substantially fewer proposed BGC-MS/MS links to inspect manually. We anticipate that our addition to the NPLinker platform will aid integrative omics mining workflows in discovering novel NPs and understanding complex metabolic interactions in the microbiome.
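
The two-stage idea described in the abstract (first propose links from shared presence/absence patterns across strains, then discard links whose chemical classes do not match) can be illustrated with a small toy sketch. This is not the NPLinker API and not the NPClassScore formula: the strain sets, class names, class-match values, thresholds, and all function names below are hypothetical, and a plain Jaccard overlap stands in for the co-occurrence score.

```python
from itertools import product

# Hypothetical toy data: the strains in which each BGC / MS/MS spectrum occurs.
bgc_strains = {
    "bgc_A": {"s1", "s2", "s3"},
    "bgc_B": {"s2", "s4"},
}
spectrum_strains = {
    "spec_1": {"s1", "s2", "s3"},
    "spec_2": {"s2", "s4"},
}

# Hypothetical class annotations (placeholder names, not a real ontology).
bgc_class = {"bgc_A": "polyketide", "bgc_B": "peptide"}
spectrum_class = {"spec_1": "peptide", "spec_2": "peptide"}

# Hypothetical class-match scores in [0, 1]; these values are invented
# for the illustration and are not taken from the paper.
class_match = {
    ("polyketide", "polyketide"): 0.9,
    ("peptide", "peptide"): 0.8,
    ("polyketide", "peptide"): 0.1,
    ("peptide", "polyketide"): 0.1,
}


def cooccurrence(a, b):
    """Jaccard overlap of two strain sets (a simple stand-in score)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_links(min_cooc=0.5):
    """Stage 1: keep BGC-spectrum pairs with similar strain patterns."""
    links = []
    for bgc, spec in product(bgc_strains, spectrum_strains):
        score = cooccurrence(bgc_strains[bgc], spectrum_strains[spec])
        if score >= min_cooc:
            links.append((bgc, spec, score))
    return links


def filter_by_class(links, min_class=0.5):
    """Stage 2: drop links whose predicted classes do not match well."""
    kept = []
    for bgc, spec, score in links:
        key = (bgc_class[bgc], spectrum_class[spec])
        if class_match.get(key, 0.0) >= min_class:
            kept.append((bgc, spec, score))
    return kept


links = candidate_links()
kept = filter_by_class(links)
# Fraction of co-occurrence links removed by the class filter.
reduction = 1 - len(kept) / len(links) if links else 0.0
print(links, kept, reduction)
```

In this toy input, both co-occurrence links have identical strain patterns, so co-occurrence alone cannot rank them; the class filter is what separates the pair with matching classes from the pair without.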
ISSN: 2049-2618
Author affiliations: Bioinformatics Group, Wageningen University & Research (all three authors)
Citation details: Microbiome, vol. 11, no. 1, pp. 1-12 (2023-01-01)
Subjects: Multi-omics; Genome mining; Genomics; Metabolome mining; Metabolomics; Chemical compound classification