Unleashing high content screening in hit detection – Benchmarking AI workflows including novelty detection

Complex mixtures containing natural products remain an interesting source of novel drug candidates, and high content screening (HCS) is a popular tool for screening them. Multiplexed HCS assays in particular promise comprehensive bioactivity profiles, but they also generate large amounts of data. Yet, onl...


Bibliographic Details
Main Authors:	Erwin Kupczyk, Kenji Schorpp, Kamyar Hadian, Sean Lin, Dimitrios Tziotis, Philippe Schmitt-Kopplin, Constanze Mueller
Format:	Article
Language:	English
Published:	Elsevier 2022-01-01
Series:	Computational and Structural Biotechnology Journal
Subjects:	High-content screening; Machine learning; Deep learning; Classifier; Novelty detection; Bioactives
Online Access:	http://www.sciencedirect.com/science/article/pii/S2001037022004299

_version_	1797978146426847232
author	Erwin Kupczyk; Kenji Schorpp; Kamyar Hadian; Sean Lin; Dimitrios Tziotis; Philippe Schmitt-Kopplin; Constanze Mueller
author_sort	Erwin Kupczyk
collection	DOAJ
description	Complex mixtures containing natural products remain an interesting source of novel drug candidates, and high content screening (HCS) is a popular tool for screening them. Multiplexed HCS assays in particular promise comprehensive bioactivity profiles, but they also generate large amounts of data. Yet only a few machine learning (ML) applications for data analysis are available, and these usually require profound knowledge of the underlying cell biology. Unfortunately, there are no applications that simply predict whether samples are biologically active or not (any kind of bioactivity). In this work, we benchmark ML algorithms for binary classification, starting with classical ML models, which are the standard classifiers of the scikit-learn library or ensemble models of these classifiers (92 models tested in total), followed by a partial least squares regression (PLSR)-based classification (44 models tested in total) and simple artificial neural networks (ANNs) with dense layers (72 models tested in total). In addition, novelty detection (ND), which is intended to handle unknown patterns, was examined. For the final analysis, the models, with and without upstream ND, were tested on two independent data sets. In our analysis, a stacking model, an ensemble model of classical ML algorithms, performed best at predicting new and unknown data. ND improved the predictions of the models and was useful for handling unknown patterns. Importantly, the classifier presented here can be easily rebuilt and adapted to the data and demands of other groups. The hit detector (ND + stacking model) is universal and suitable for broader application to support the search for new drug candidates.
first_indexed	2024-04-11T05:18:17Z
format	Article
id	doaj.art-4aa5cd7a00e84195a70b11e89e400f7e
institution	Directory Open Access Journal
issn	2001-0370
language	English
last_indexed	2024-04-11T05:18:17Z
publishDate	2022-01-01
publisher	Elsevier
record_format	Article
series	Computational and Structural Biotechnology Journal
spelling	doaj.art-4aa5cd7a00e84195a70b11e89e400f7e; 2022-12-24T04:54:31Z; eng; Elsevier; Computational and Structural Biotechnology Journal; 2001-0370; 2022-01-01; 20; 5453; 5465; Unleashing high content screening in hit detection – Benchmarking AI workflows including novelty detection
	Erwin Kupczyk (0): Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany; Comprehensive Foodomics Platform, Chair of Analytical Food Chemistry, TUM School of Life Sciences, Technical University of Munich, Maximus-von-Imhof-Forum 2, 85354 Freising, Germany
	Kenji Schorpp (1): Institute for Molecular Toxicology and Pharmacology, Cell Signaling and Chemical Biology, Helmholtz Zentrum München, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany
	Kamyar Hadian (2): Institute for Molecular Toxicology and Pharmacology, Cell Signaling and Chemical Biology, Helmholtz Zentrum München, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany
	Sean Lin (3): Institute for Molecular Toxicology and Pharmacology, Cell Signaling and Chemical Biology, Helmholtz Zentrum München, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany
	Dimitrios Tziotis (4): Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany
	Philippe Schmitt-Kopplin (5): Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany; Comprehensive Foodomics Platform, Chair of Analytical Food Chemistry, TUM School of Life Sciences, Technical University of Munich, Maximus-von-Imhof-Forum 2, 85354 Freising, Germany; Corresponding author at: Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany.
	Constanze Mueller (6): Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany; Corresponding author at: Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany.
title	Unleashing high content screening in hit detection – Benchmarking AI workflows including novelty detection
topic	High-content screening; Machine learning; Deep learning; Classifier; Novelty detection; Bioactives
url	http://www.sciencedirect.com/science/article/pii/S2001037022004299
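
The description above outlines a hit detector that places novelty detection (ND) upstream of a stacking model built from scikit-learn classifiers. The following is a minimal sketch of that idea, not the authors' implementation. The choice of LocalOutlierFactor as the novelty detector, the base estimators, the synthetic feature data, the NOVEL label, and the function names fit_hit_detector and predict_hits are all assumptions made for illustration; the record does not state which components the authors used or how they treated samples flagged as novel.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier, LocalOutlierFactor
from sklearn.preprocessing import StandardScaler

NOVEL = -1  # hypothetical label for samples rejected by the novelty detector


def fit_hit_detector(X_train, y_train):
    """Fit a scaler, a novelty detector, and a stacking classifier."""
    scaler = StandardScaler().fit(X_train)
    Xs = scaler.transform(X_train)

    # novelty=True lets LocalOutlierFactor score samples it has not seen.
    novelty = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(Xs)

    # Stacking: base classifiers feed their predictions to a final estimator.
    stack = StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
            ("knn", KNeighborsClassifier()),
        ],
        final_estimator=LogisticRegression(),
        cv=5,
    ).fit(Xs, y_train)
    return scaler, novelty, stack


def predict_hits(model, X):
    """Return 0 (inactive), 1 (active), or NOVEL for each row of X."""
    scaler, novelty, stack = model
    Xs = scaler.transform(X)
    known = novelty.predict(Xs) == 1  # +1 = familiar pattern, -1 = novel
    labels = np.full(len(X), NOVEL)
    if known.any():
        labels[known] = stack.predict(Xs[known])
    return labels


# Synthetic stand-in for HCS feature vectors (assumption, not real data).
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0.0, 1.0, (100, 5)),   # inactive samples
                     rng.normal(3.0, 1.0, (100, 5))])  # active samples
y_train = np.array([0] * 100 + [1] * 100)

model = fit_hit_detector(X_train, y_train)
X_new = np.array([[0.0] * 5, [3.0] * 5, [50.0] * 5])
print(predict_hits(model, X_new))
```

In this sketch, samples that the novelty detector does not recognize are set aside with a separate label rather than forced into the active or inactive class; that routing is one possible design and would need to be adapted to the data at hand.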


Similar Items