Increasing the performance, trustworthiness and practical value of machine learning models: a case study predicting hydrogen bond network dimensionalities from molecular diagrams

Bibliographic Details
Main Authors: Frade, Ap; McCabe, P; Cooper, R
Format: Journal article
Language: English
Published: Royal Society of Chemistry 2020
Institution: University of Oxford
Identifier: oxford-uuid:b039ab93-d236-485f-843b-e5bf57975537

Description: The performance of a model depends on the quality and information content of the data used to build it. By applying machine learning approaches to a standard chemical dataset, we developed a 4-class classification algorithm that predicts the hydrogen bond network dimensionality a molecule would adopt in its crystal form with an accuracy of 59% (compared with a 25% random baseline), using only two- and lower-dimensional molecular descriptors. Although better than random, this level of performance did not meet the standards required for reliable application. We improved the practical value of the model by wrapping it in a confidence tool that increases model robustness, quantifies trust in each prediction, and allows a classifier to be operated at virtually any accuracy level. Using this tool, the performance of the model could be raised to 73% or 89%, at the cost of predicting only 34% or 8% of the test set, respectively. We anticipate that the ability to tune the performance of reliable 2D-based models to the requirements of different applications may increase their practical value, making them suitable for tasks ranging from initial virtual library filtering to profile-specific compound identification.
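The accuracy-versus-coverage trade-off described in the abstract (higher accuracy on fewer predicted examples) can be illustrated with a minimal sketch. It assumes, hypothetically, that confidence is the classifier's highest predicted class probability and that a prediction is accepted only when that confidence reaches a chosen threshold; the function name `selective_accuracy` and the toy probabilities are illustrative inventions, not the authors' confidence tool or data.

```python
from typing import Sequence, Tuple


def selective_accuracy(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    threshold: float,
) -> Tuple[float, float]:
    """Return (accuracy on accepted predictions, coverage).

    Hypothetical acceptance rule: keep a prediction only when its top class
    probability is at least `threshold`; abstain otherwise.
    """
    accepted = 0
    correct = 0
    for p, y in zip(probs, labels):
        # Predicted class = index of the largest probability (argmax).
        predicted = max(range(len(p)), key=lambda k: p[k])
        if p[predicted] >= threshold:
            accepted += 1
            correct += int(predicted == y)
    coverage = accepted / len(labels)
    # Accuracy is undefined (NaN) when every example is rejected.
    accuracy = correct / accepted if accepted else float("nan")
    return accuracy, coverage


# Made-up 4-class probabilities and labels (NOT data from the paper).
probs = [
    [0.70, 0.10, 0.10, 0.10],  # true class 0, predicted 0
    [0.40, 0.30, 0.20, 0.10],  # true class 1, predicted 0
    [0.10, 0.60, 0.20, 0.10],  # true class 1, predicted 1
    [0.30, 0.35, 0.25, 0.10],  # true class 2, predicted 1
]
labels = [0, 1, 1, 2]

for t in (0.0, 0.5, 0.65):
    acc, cov = selective_accuracy(probs, labels, t)
    print(f"threshold={t:.2f}  accuracy={acc:.2f}  coverage={cov:.2f}")
```

On this toy data, raising the threshold from 0.0 to 0.5 lifts accuracy on the accepted examples from 0.50 to 1.00 while coverage falls from 1.00 to 0.50, the same kind of compromise the abstract reports (73% or 89% performance on 34% or 8% of the test set). A real application would likely need calibrated probabilities for the threshold to be meaningful.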