Predicting the Skin Sensitization Potential of Small Molecules with Machine Learning Models Trained on Biologically Meaningful Descriptors


Bibliographic Details
Main Authors: Anke Wilm, Marina Garcia de Lomana, Conrad Stork, Neann Mathai, Steffen Hirte, Ulf Norinder, Jochen Kühnl, Johannes Kirchmair
Format: Article
Language: English
Published: MDPI AG 2021-08-01
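Series: Pharmaceuticals
Subjects: skin sensitization, toxicity prediction, in silico prediction, machine learning, random forest, conformal prediction
Online Access: https://www.mdpi.com/1424-8247/14/8/790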
collection DOAJ
description In recent years, a number of machine learning models for the prediction of the skin sensitization potential of small organic molecules have been reported and become available. These models generally perform well within their applicability domains but, as a result of the use of molecular fingerprints and other non-intuitive descriptors, the interpretability of the existing models is limited. The aim of this work is to develop a strategy to replace the non-intuitive features by predicted outcomes of bioassays. We show that such replacement is indeed possible and that as few as ten interpretable, predicted bioactivities are sufficient to reach competitive performance. On a holdout data set of 257 compounds, the best model (“Skin Doctor CP:Bio”) obtained an efficiency of 0.82 and an MCC of 0.52 (at the significance level of 0.20). Skin Doctor CP:Bio is available free of charge for academic research. The modeling strategies explored in this work are easily transferable and could be adopted for the development of more interpretable machine learning models for the prediction of the bioactivity and toxicity of small organic compounds.
first_indexed 2024-03-10T08:29:47Z
format Article
id doaj.art-f44cc0a25c6a4a5abfab2fabee939e30
institution Directory Open Access Journal
issn 1424-8247
language English
last_indexed 2024-03-10T08:29:47Z
publishDate 2021-08-01
publisher MDPI AG
record_format Article
series Pharmaceuticals
spelling doaj.art-f44cc0a25c6a4a5abfab2fabee939e30 2023-11-22T09:11:53Z eng
MDPI AG, Pharmaceuticals, ISSN 1424-8247, 2021-08-01, volume 14, issue 8, article 790, DOI 10.3390/ph14080790
Anke Wilm: Center for Bioinformatics (ZBH), Department of Informatics, Universität Hamburg, 20146 Hamburg, Germany
Marina Garcia de Lomana: Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, 1090 Vienna, Austria
Conrad Stork: Center for Bioinformatics (ZBH), Department of Informatics, Universität Hamburg, 20146 Hamburg, Germany
Neann Mathai: Computational Biology Unit (CBU), Department of Chemistry, University of Bergen, N-5020 Bergen, Norway
Steffen Hirte: Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, 1090 Vienna, Austria
Ulf Norinder: MTM Research Centre, School of Science and Technology, Örebro University, SE-70182 Örebro, Sweden
Jochen Kühnl: Front End Innovation, Beiersdorf AG, 22529 Hamburg, Germany
Johannes Kirchmair: Center for Bioinformatics (ZBH), Department of Informatics, Universität Hamburg, 20146 Hamburg, Germany
topic skin sensitization
toxicity prediction
in silico prediction
machine learning
random forest
conformal prediction
url https://www.mdpi.com/1424-8247/14/8/790