Predicting the Skin Sensitization Potential of Small Molecules with Machine Learning Models Trained on Biologically Meaningful Descriptors
In recent years, a number of machine learning models for the prediction of the skin sensitization potential of small organic molecules have been reported and become available. These models generally perform well within their applicability domains but, as a result of the use of molecular fingerprints...
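Main Authors: | Anke Wilm, Marina Garcia de Lomana, Conrad Stork, Neann Mathai, Steffen Hirte, Ulf Norinder, Jochen Kühnl, Johannes Kirchmair |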
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-08-01 |
Series: | Pharmaceuticals |
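Subjects: | skin sensitization; toxicity prediction; in silico prediction; machine learning; random forest; conformal prediction |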
Online Access: | https://www.mdpi.com/1424-8247/14/8/790 |
_version_ | 1797522466383331328 |
---|---|
author | Anke Wilm; Marina Garcia de Lomana; Conrad Stork; Neann Mathai; Steffen Hirte; Ulf Norinder; Jochen Kühnl; Johannes Kirchmair |
author_sort | Anke Wilm |
collection | DOAJ |
description | In recent years, a number of machine learning models for the prediction of the skin sensitization potential of small organic molecules have been reported and become available. These models generally perform well within their applicability domains but, as a result of the use of molecular fingerprints and other non-intuitive descriptors, the interpretability of the existing models is limited. The aim of this work is to develop a strategy to replace the non-intuitive features by predicted outcomes of bioassays. We show that such replacement is indeed possible and that as few as ten interpretable, predicted bioactivities are sufficient to reach competitive performance. On a holdout data set of 257 compounds, the best model (“Skin Doctor CP:Bio”) obtained an efficiency of 0.82 and an MCC of 0.52 (at the significance level of 0.20). Skin Doctor CP:Bio is available free of charge for academic research. The modeling strategies explored in this work are easily transferable and could be adopted for the development of more interpretable machine learning models for the prediction of the bioactivity and toxicity of small organic compounds. |
first_indexed | 2024-03-10T08:29:47Z |
format | Article |
id | doaj.art-f44cc0a25c6a4a5abfab2fabee939e30 |
institution | Directory Open Access Journal |
issn | 1424-8247 |
language | English |
last_indexed | 2024-03-10T08:29:47Z |
publishDate | 2021-08-01 |
publisher | MDPI AG |
record_format | Article |
series | Pharmaceuticals |
spelling | doaj.art-f44cc0a25c6a4a5abfab2fabee939e30; 2023-11-22T09:11:53Z; eng; MDPI AG; Pharmaceuticals; ISSN 1424-8247; 2021-08-01; vol. 14, no. 8, art. 790; DOI 10.3390/ph14080790; Predicting the Skin Sensitization Potential of Small Molecules with Machine Learning Models Trained on Biologically Meaningful Descriptors; Authors: Anke Wilm [0], Marina Garcia de Lomana [1], Conrad Stork [2], Neann Mathai [3], Steffen Hirte [4], Ulf Norinder [5], Jochen Kühnl [6], Johannes Kirchmair [7]; Affiliations (in author order): [0] Center for Bioinformatics (ZBH), Department of Informatics, Universität Hamburg, 20146 Hamburg, Germany; [1] Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, 1090 Vienna, Austria; [2] Center for Bioinformatics (ZBH), Department of Informatics, Universität Hamburg, 20146 Hamburg, Germany; [3] Computational Biology Unit (CBU), Department of Chemistry, University of Bergen, N-5020 Bergen, Norway; [4] Department of Pharmaceutical Sciences, Faculty of Life Sciences, University of Vienna, 1090 Vienna, Austria; [5] MTM Research Centre, School of Science and Technology, Örebro University, SE-70182 Örebro, Sweden; [6] Front End Innovation, Beiersdorf AG, 22529 Hamburg, Germany; [7] Center for Bioinformatics (ZBH), Department of Informatics, Universität Hamburg, 20146 Hamburg, Germany; https://www.mdpi.com/1424-8247/14/8/790; Keywords: skin sensitization, toxicity prediction, in silico prediction, machine learning, random forest, conformal prediction |
title | Predicting the Skin Sensitization Potential of Small Molecules with Machine Learning Models Trained on Biologically Meaningful Descriptors |
topic | skin sensitization; toxicity prediction; in silico prediction; machine learning; random forest; conformal prediction |
url | https://www.mdpi.com/1424-8247/14/8/790 |
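
The description above reports an efficiency and an MCC "at the significance level of 0.20". As a rough illustration of how such figures are commonly derived for a binary conformal predictor, the sketch below assumes that a class enters the prediction set when its p-value exceeds the significance level, that efficiency is the fraction of compounds with exactly one predicted class, and that MCC is computed on those single-class predictions only. The record itself does not state these definitions, so they are assumptions, and the function names and p-values are invented for illustration; they are not taken from Skin Doctor CP:Bio.

```python
from math import sqrt


def prediction_set(p_values, significance):
    """Return the class labels whose p-value exceeds the significance level."""
    return {label for label, p in p_values.items() if p > significance}


def efficiency_and_mcc(p_value_rows, true_labels, significance):
    """Compute efficiency and MCC from per-class conformal p-values.

    Assumed definitions (not confirmed by the record):
    efficiency = share of compounds with exactly one predicted class;
    MCC is computed on those single-class predictions only.
    Labels: 1 = sensitizer, 0 = non-sensitizer.
    """
    tp = tn = fp = fn = 0
    for p_values, truth in zip(p_value_rows, true_labels):
        labels = prediction_set(p_values, significance)
        if len(labels) != 1:
            continue  # empty or two-class sets give no definite prediction
        (predicted,) = labels
        if predicted == 1:
            if truth == 1:
                tp += 1
            else:
                fp += 1
        else:
            if truth == 0:
                tn += 1
            else:
                fn += 1
    single = tp + tn + fp + fn
    efficiency = single / len(true_labels)
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return efficiency, mcc


# Hypothetical p-values for seven compounds (made up for this example).
toy_p_values = [
    {0: 0.05, 1: 0.80},  # only class 1 exceeds 0.20
    {0: 0.70, 1: 0.10},  # only class 0
    {0: 0.45, 1: 0.40},  # both classes: no definite prediction
    {0: 0.10, 1: 0.15},  # neither class: empty set
    {0: 0.60, 1: 0.05},  # only class 0
    {0: 0.12, 1: 0.55},  # only class 1 (a false positive here)
    {0: 0.03, 1: 0.90},  # only class 1
]
toy_truth = [1, 0, 1, 0, 0, 0, 1]

efficiency, mcc = efficiency_and_mcc(toy_p_values, toy_truth, significance=0.20)
print(f"efficiency={efficiency:.2f}, MCC={mcc:.2f}")  # efficiency=0.71, MCC=0.67
```

In this toy case five of the seven compounds receive a single-class prediction, so the efficiency is 5/7. Raising the significance level generally shrinks the prediction sets, which trades two-class sets for single-class or empty ones.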